CHAPTER 10 — PRESENTATION

Compression in Linux

Master tar, gzip, bzip2, xz, and zip. Understand the difference between archiving and compression, when to use each tool, and how to automate backup pipelines from the command line.

Slide 1 — Archiving vs Compression

Two Different Concepts — Often Combined

Archiving bundles multiple files and directories into a single file, preserving structure, permissions, and timestamps. The classic tool is tar (Tape ARchive). By itself, tar does not compress. The archive is usually slightly larger than the source files, because tar adds a 512-byte header for each file and pads the result to a full block.
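
The header overhead is easy to observe by archiving a single 1-byte file. This is a minimal sketch that assumes GNU tar and GNU coreutils `stat` (for `-c %s`). The file name `one-byte.txt` is hypothetical. With GNU tar's default record size, the archive typically comes out at 10240 bytes.

```shell
#!/usr/bin/env bash
# Show that tar alone adds overhead instead of shrinking data.
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

printf 'x' > "$workdir/one-byte.txt"          # a 1-byte source file

# Uncompressed archive; -C makes tar store the member under a relative name.
tar cf "$workdir/one-byte.tar" -C "$workdir" one-byte.txt

src_size=$(stat -c %s "$workdir/one-byte.txt")
tar_size=$(stat -c %s "$workdir/one-byte.tar")
echo "source: ${src_size} bytes, archive: ${tar_size} bytes"
```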

Compression reduces file size by encoding data more efficiently using algorithms (LZ77, Huffman coding, etc.). Linux tools include gzip, bzip2, xz, and zip. gzip, bzip2, and xz work on single files or streams and do not bundle directories. zip is the exception: it archives and compresses in one step.

Combining both is the standard practice: use tar to bundle files, then pipe through a compression algorithm. The resulting .tar.gz, .tar.bz2, or .tar.xz is both archived and compressed.

KEY DISTINCTION

tar cf archive.tar folder/ creates an uncompressed archive.
tar czf archive.tar.gz folder/ creates a gzip-compressed archive.
The z flag tells tar to pipe through gzip. Use j for bzip2, J for xz.
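
To make the "pipe through gzip" idea concrete, the sketch below builds the same archive two ways. The first uses the z flag. The second has tar write to stdout (`f -`) and pipes the output into gzip explicitly. The directory and file names are hypothetical. The script compares the member lists rather than the raw bytes, because the gzip headers of the two files are not guaranteed to be identical.

```shell
#!/usr/bin/env bash
# Compare "tar czf" with an explicit "tar cf - | gzip" pipeline.
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
mkdir -p "$workdir/folder/sub"
echo "hello" > "$workdir/folder/a.txt"
echo "world" > "$workdir/folder/sub/b.txt"
cd "$workdir"

# One step: the z flag runs the archive through gzip.
tar czf one-step.tar.gz folder/

# Two steps: tar writes the archive to stdout, gzip compresses it.
tar cf - folder/ | gzip > two-step.tar.gz

# Both archives should list the same members (sorted to ignore ordering).
list_one=$(tar tzf one-step.tar.gz | sort)
list_two=$(tar tzf two-step.tar.gz | sort)
if [ "$list_one" = "$list_two" ]; then
  echo "same contents"
fi
```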

Slide 2 — Linux Compression Tools Compared

gzip

Fastest of the compressors covered here, with a moderate ratio. Extension: .gz. Ubiquitous: available on virtually every system. The compressor most commonly paired with tar (the z flag). Use when speed matters more than size.

bzip2

Better compression than gzip, but slower to compress and decompress. Extension: .bz2. Based on the Burrows-Wheeler transform. Good for distributing source code archives where size matters.

xz

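Best compression ratio of the group. Slowest to compress, although it decompresses faster than bzip2. Extension: .xz. Uses the LZMA2 algorithm. Used for Linux kernel source tarballs and many major software distributions. RAM-intensive during compression, especially at high levels such as -9.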

zip

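Cross-platform (Windows/Linux/macOS). Archives AND compresses in one step. Extension: .zip. Good for sharing with Windows users. Usually slightly larger than an equivalent .tar.gz, because zip compresses each file separately instead of compressing the whole archive as one stream.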

tar (archive only)

No compression. Bundles files and preserves Unix metadata (permissions, ownership, symlinks). Extension: .tar. The foundation of most Linux backup pipelines.

Lossless vs Lossy

All the compression tools covered here are lossless: decompression reconstructs the original data exactly. Lossy compression (JPEG, MP3) is used for media, where some quality loss is acceptable.
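
Lossless means a compress/decompress round trip reproduces the input byte for byte. The sketch below checks this claim with `cmp`. It compresses to stdout (`-c`) and decompresses from stdin (`-dc`), and all three tools accept both flags. Any compressor that is not installed is skipped. The sample data from `seq` is arbitrary.

```shell
#!/usr/bin/env bash
# Verify that gzip, bzip2, and xz round trips are byte-for-byte identical.
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
sample="$workdir/data.txt"
seq 1 100000 > "$sample"                      # arbitrary sample input

results=""
failures=""
for tool in gzip bzip2 xz; do
  if ! command -v "$tool" > /dev/null; then
    echo "$tool: not installed, skipped"
    continue
  fi
  # Compress to stdout, decompress back, and compare with the original.
  if "$tool" -c "$sample" | "$tool" -dc | cmp -s - "$sample"; then
    echo "$tool: round trip identical"
    results+="$tool "
  else
    echo "$tool: MISMATCH"
    failures+="$tool "
  fi
done
```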

Slide 3 — tar Command Deep Dive

Understanding tar Flags

c            Create a new archive
x            Extract files from an archive
t            List (table of) contents of an archive
f            Specify the archive filename. In a bundled group such as czf it must come last, because the next argument is taken as the filename
v            Verbose: print each file as it is processed
z            Compress/decompress with gzip (.tar.gz / .tgz)
j            Compress/decompress with bzip2 (.tar.bz2)
J            Compress/decompress with xz (.tar.xz)
-C           Change to a directory first, e.g. extract into it: tar xf archive.tar.gz -C /target/
--exclude    Skip files matching a pattern: tar czf backup.tgz --exclude='*.log' /var/www

# CREATE archives
tar cf archive.tar folder/           # uncompressed archive
tar czf archive.tar.gz folder/       # gzip compressed
tar cjf archive.tar.bz2 folder/      # bzip2 compressed
tar cJf archive.tar.xz folder/       # xz compressed
tar cvzf archive.tar.gz folder/      # verbose gzip

# EXTRACT archives
tar xf archive.tar.gz                # auto-detect compression when reading a file
tar xzf archive.tar.gz               # explicitly gzip
tar xjf archive.tar.bz2              # explicitly bzip2
tar xvf archive.tar.gz               # verbose extract
tar xf archive.tar.gz -C /opt/       # extract into /opt/ (directory must already exist)

# LIST contents without extracting
tar tf archive.tar.gz                # list files in archive
tar tvf archive.tar.gz               # verbose listing with permissions

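
The create, list, and extract flags fit together in one round trip, shown in the sketch below with `--exclude` and `-C`. It assumes GNU tar. All paths live in a throwaway temp directory, and the file names are hypothetical. Placing `--exclude` before the file arguments avoids position warnings in recent GNU tar.

```shell
#!/usr/bin/env bash
# Create, list, and extract a gzip tarball using --exclude and -C.
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
mkdir -p "$workdir/src/folder"
echo "keep me" > "$workdir/src/folder/notes.txt"
echo "noise"   > "$workdir/src/folder/debug.log"

# Create: -C changes into src/ first, so members are stored as folder/...
tar czf "$workdir/archive.tar.gz" --exclude='*.log' -C "$workdir/src" folder/

# List without extracting.
members=$(tar tzf "$workdir/archive.tar.gz")
echo "$members"

# Extract into a separate directory; the -C target must already exist.
mkdir -p "$workdir/restore"
tar xzf "$workdir/archive.tar.gz" -C "$workdir/restore"
```
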
Slide 4 — gzip, bzip2, and xz Direct Commands

Working with Individual Files

# GZIP: compress and decompress single files
gzip file.txt              # creates file.txt.gz, removes original
gzip -k file.txt           # keep original file
gzip -d file.txt.gz        # decompress (-d flag)
gunzip file.txt.gz         # same as gzip -d
gzip -l file.txt.gz        # list compression info (ratio)
gzip -9 file.txt           # maximum compression (slower)
gzip -1 file.txt           # fastest compression (larger output)

# BZIP2
bzip2 file.txt             # creates file.txt.bz2
bunzip2 file.txt.bz2       # decompress
bzip2 -k file.txt          # keep original

# XZ
xz file.txt                # creates file.txt.xz
xz -d file.txt.xz          # decompress
unxz file.txt.xz           # same as xz -d
xz -k file.txt             # keep original
xz -9 file.txt             # maximum compression

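
The trade-off between compression levels can be measured directly. The sketch below compresses the same input at `-1` and `-9` and prints the resulting sizes, then shows the `gzip -l` report. It writes to stdout with gzip's `-c` flag, which does not appear in the slide above. This leaves the original untouched and lets both outputs coexist. The sample input is arbitrary, so the exact numbers will vary.

```shell
#!/usr/bin/env bash
# Compare gzip's fastest (-1) and maximum (-9) levels on the same input.
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
seq 1 200000 > "$workdir/file.txt"            # arbitrary sample input

# -c writes to stdout and leaves file.txt in place (similar effect to -k).
gzip -1 -c "$workdir/file.txt" > "$workdir/fast.gz"
gzip -9 -c "$workdir/file.txt" > "$workdir/best.gz"

orig=$(stat -c %s "$workdir/file.txt")
fast=$(stat -c %s "$workdir/fast.gz")
best=$(stat -c %s "$workdir/best.gz")
echo "original=$orig  gzip -1=$fast  gzip -9=$best"

# gzip -l reports compressed size, uncompressed size, and ratio.
gzip -l "$workdir/best.gz"
```
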
Slide 5 — zip and unzip (Cross-Platform)

zip for Windows Compatibility

# ZIP: archive AND compress multiple files/directories
zip archive.zip file1 file2 file3
zip -r archive.zip folder/            # recursive (-r for directories)
zip -9r archive.zip folder/           # max compression + recursive

# UNZIP
unzip archive.zip                     # extract to current directory
unzip archive.zip -d /target/dir/     # extract to specific directory
unzip -l archive.zip                  # list contents without extracting
unzip -p archive.zip file.txt         # pipe single file to stdout

# Install zip tools if missing:
sudo apt install zip unzip            # Debian/Ubuntu
sudo dnf install zip unzip            # Fedora/RHEL

Format      Extension   Archive?   Compress?   Speed    Use When
tar         .tar        Yes        No          Fast     Archiving with metadata
tar+gzip    .tar.gz     Yes        Yes         Fast     General-purpose backup
tar+bzip2   .tar.bz2    Yes        Yes         Medium   Source code distributions
tar+xz      .tar.xz     Yes        Yes         Slow     Maximum compression needed
zip         .zip        Yes        Yes         Fast     Cross-platform sharing
gzip        .gz         No         Yes         Fast     Single-file compression

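
Ratios depend heavily on the data, so measuring on your own files is worthwhile. The sketch below builds the same sample directory as .tar, .tar.gz, .tar.bz2, and .tar.xz and prints each size. It skips any format whose compressor is not installed. The sample files are hypothetical, and the script shows how to measure rather than giving benchmark results.

```shell
#!/usr/bin/env bash
# Build one directory in several tar formats and print the archive sizes.
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
mkdir -p "$workdir/folder"
for i in 1 2 3; do seq 1 50000 > "$workdir/folder/file$i.txt"; done
cd "$workdir"

sizes=""
for ext in tar tar.gz tar.bz2 tar.xz; do
  case $ext in
    tar)     flag="";  tool="" ;;
    tar.gz)  flag="z"; tool="gzip" ;;
    tar.bz2) flag="j"; tool="bzip2" ;;
    tar.xz)  flag="J"; tool="xz" ;;
  esac
  # Skip formats whose compressor is not installed.
  if [ -n "$tool" ] && ! command -v "$tool" > /dev/null; then
    echo "$ext: $tool not installed, skipped"
    continue
  fi
  tar "c${flag}f" "archive.$ext" folder/      # e.g. "czf" for tar.gz
  size=$(stat -c %s "archive.$ext")
  printf '%-8s %10d bytes\n' "$ext" "$size"
  sizes+="$ext=$size "
done

tar_size=$(stat -c %s archive.tar)
gz_size=$(stat -c %s archive.tar.gz)
```
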
Slide 6 — Real-World Backup Pipeline

Practical Compression for System Administration

# Backup /var/www to a dated archive file (date in YYYYMMDD form)
tar czf /backup/www-$(date +%Y%m%d).tar.gz /var/www/

# Backup with exclusions (skip logs and cache)
tar czf /backup/app-backup.tar.gz \
    --exclude='*.log' \
    --exclude='*/cache/*' \
    /var/www/myapp/

# Check the size reduction achieved
ls -lh /backup/www-*.tar.gz
gzip -l /backup/www-*.gz        # shows compressed vs uncompressed size

# Verify archive integrity before counting on it: a full listing reads the
# whole archive (piping into head would stop early and check only the start)
tar tzf /backup/www-backup.tar.gz > /dev/null && echo "archive OK"

# Extract specific files (tar strips the leading / at creation, so omit it here)
tar xzf /backup/www-backup.tar.gz var/www/html/index.php

# Split a large archive into 4 GiB chunks (small enough for a single-layer DVD)
tar czf - /large/directory/ | split -b 4G - backup.tar.gz.part

# Reassemble and extract the split archive (the glob sorts partaa, partab, ...)
cat backup.tar.gz.part* | tar xzf -

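
The split-and-reassemble step is the easiest part of this pipeline to get wrong. The sketch below runs it on a small scale so it finishes quickly. It uses 20 KiB parts instead of the 4G in the slide so that several parts appear, and it then confirms the restored file with `cmp`. The directory names are hypothetical stand-ins for /large/directory/.

```shell
#!/usr/bin/env bash
# Split a compressed tar stream into parts, then reassemble and verify it.
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
mkdir -p "$workdir/large"
seq 1 300000 > "$workdir/large/data.txt"      # stand-in for a large directory
cd "$workdir"

# Stream the archive to stdout and cut it into 20 KiB parts
# (the slide uses -b 4G; a small size keeps this demo quick).
tar czf - large/ | split -b 20K - backup.tar.gz.part

parts=(backup.tar.gz.part*)
echo "parts created: ${#parts[@]}"

# Reassemble: the glob expands in sorted order (partaa, partab, ...).
mkdir restore
cat backup.tar.gz.part* | tar xzf - -C restore

# Confirm the restored file matches the original byte for byte.
cmp large/data.txt restore/large/data.txt && echo "restore verified"
```
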
BACKUP RULE

Always verify your archive immediately after creation with tar tzf archive.tar.gz. A corrupted or incomplete archive discovered weeks later during a crisis is worthless. Test extraction to a temp directory at least monthly.
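
The monthly test extraction described above can be scripted roughly as follows. This sketch creates a hypothetical site/ tree and backup file so that it is self-contained. It does a full listing, extracts into a fresh temporary directory rather than over live data, and compares the result with `diff -r`. In real use, point it at your actual backup and source paths.

```shell
#!/usr/bin/env bash
# Restore test: extract a backup into a scratch directory and diff it.
set -euo pipefail

workdir=$(mktemp -d)
scratch=$(mktemp -d)
trap 'rm -rf "$workdir" "$scratch"' EXIT

# Hypothetical source tree and backup for the demo.
mkdir -p "$workdir/site/html"
echo "<h1>hi</h1>" > "$workdir/site/html/index.html"
tar czf "$workdir/site-backup.tar.gz" -C "$workdir" site/

# Step 1: a full listing reads the whole archive; errors stop the script.
tar tzf "$workdir/site-backup.tar.gz" > /dev/null

# Step 2: extract into the scratch directory, never over live data.
tar xzf "$workdir/site-backup.tar.gz" -C "$scratch"

# Step 3: compare restored and source trees; diff -r exits 0 if identical.
if diff -r "$workdir/site" "$scratch/site" > /dev/null; then
  status="restore OK"
else
  status="restore FAILED"
fi
echo "$status"
```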
