Script House — Linux Admin

Chapter 10: Compression & Archiving Lab

Hands-on practice with tar, gzip, bzip2, xz, and zip

Lab Overview

Archiving and compression are essential sysadmin skills for backups, log rotation, and software distribution. In this lab you will create archives in multiple formats, measure compression ratios, extract to specific locations, and build an automated backup pipeline using only command-line tools.

1

Creating Archives with tar

Build .tar, .tar.gz, .tar.bz2, and .tar.xz archives and compare their sizes

BEGINNER
Context: tar works in two stages: it first bundles files into a single archive, then optionally passes that archive through a compressor to reduce its size. The flags z, j, and J select gzip, bzip2, and xz respectively. The c flag creates an archive, v prints each file name as it is processed, and f names the archive file.
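The two stages can be made visible by running them separately. The sketch below assumes a scratch directory from mktemp and made-up file names; it builds the same archive once with the z flag and once with an explicit pipe into gzip, then compares the member lists.

```shell
# Sketch: "tar czf" versus an explicit "tar cf - | gzip" pipeline.
# The scratch directory and file names are made up for this demo.
workdir=$(mktemp -d)
mkdir "$workdir/demo"
printf 'hello\n' > "$workdir/demo/a.txt"
printf 'world\n' > "$workdir/demo/b.txt"

# One step: tar bundles the files and runs gzip itself.
tar czf "$workdir/one-step.tar.gz" -C "$workdir" demo

# Two steps: tar writes the bundle to stdout (-), gzip compresses the stream.
tar cf - -C "$workdir" demo | gzip > "$workdir/two-step.tar.gz"

# Both archives should list the same members.
one_step_list=$(tar tzf "$workdir/one-step.tar.gz" | sort)
two_step_list=$(tar tzf "$workdir/two-step.tar.gz" | sort)
if [ "$one_step_list" = "$two_step_list" ]; then
    echo "same contents"
fi

rm -rf "$workdir"
```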
1

Create a test workspace with some sample files to archive:

mkdir -p ~/lab-compress/source
cd ~/lab-compress/source

# Generate 20 binary files of 50K random data each
for i in {1..20}; do
    dd if=/dev/urandom bs=50K count=1 of="datafile-$i.bin" 2>/dev/null
done

# Also add some text files (these compress well)
for i in {1..10}; do
    yes "This is test line $i repeated many times for compression" | head -n 500 > "textfile-$i.txt"
done

ls -lh | head -15
Expected output
total 1.1M
-rw-rw-r-- 1 user user 50K Jan 15 09:00 datafile-1.bin
-rw-rw-r-- 1 user user 50K Jan 15 09:00 datafile-10.bin
...
-rw-rw-r-- 1 user user 26K Jan 15 09:00 textfile-1.txt
...
2

Create a raw tar archive (no compression) to establish a baseline size:

cd ~/lab-compress
tar cvf source.tar source/
Expected output (truncated)
source/
source/datafile-1.bin
source/datafile-2.bin
...
source/textfile-10.txt
ls -lh source.tar
-rw-rw-r-- 1 user user 1.3M Jan 15 09:01 source.tar

Note: the .tar file may be slightly larger than the source because tar adds header metadata per file.
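The header overhead is easiest to see with very small files. The tar format stores each member behind a 512-byte header and pads the data to 512-byte blocks, so an archive of three one-byte files is far larger than three bytes. A minimal sketch, with hypothetical file names in a scratch directory:

```shell
# Sketch: compare payload size with archive size for three 1-byte files.
workdir=$(mktemp -d)
mkdir "$workdir/tiny"
printf 'a' > "$workdir/tiny/one.txt"
printf 'b' > "$workdir/tiny/two.txt"
printf 'c' > "$workdir/tiny/three.txt"

tar cf "$workdir/tiny.tar" -C "$workdir" tiny

# tr strips the padding some wc implementations put before the number.
payload_bytes=$(cat "$workdir"/tiny/*.txt | wc -c | tr -d ' ')
archive_bytes=$(wc -c < "$workdir/tiny.tar" | tr -d ' ')
echo "payload: $payload_bytes bytes, archive: $archive_bytes bytes"

rm -rf "$workdir"
```

The exact archive size depends on the tar implementation and its record size, so the sketch prints it rather than assuming a value.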

3

Create a gzip-compressed archive (fastest, moderate compression):

tar czf source.tar.gz source/
ls -lh source.tar.gz
Expected output
-rw-rw-r-- 1 user user 1021K Jan 15 09:01 source.tar.gz
4

Create a bzip2-compressed archive (slower, better compression):

tar cjf source.tar.bz2 source/
ls -lh source.tar.bz2
Expected output
-rw-rw-r-- 1 user user 998K Jan 15 09:02 source.tar.bz2
5

Create an xz-compressed archive (slowest, best compression):

tar cJf source.tar.xz source/
ls -lh source.tar.xz
Expected output
-rw-rw-r-- 1 user user 975K Jan 15 09:04 source.tar.xz
6

Compare all archive sizes side by side. The binary files (random data) resist compression — that is expected:

ls -lh source.tar source.tar.gz source.tar.bz2 source.tar.xz
Expected output
-rw-rw-r-- 1 user user 1.3M Jan 15 09:01 source.tar
-rw-rw-r-- 1 user user 1021K Jan 15 09:01 source.tar.gz
-rw-rw-r-- 1 user user 998K Jan 15 09:02 source.tar.bz2
-rw-rw-r-- 1 user user 975K Jan 15 09:04 source.tar.xz
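The difference between random data and repeated text can be reproduced at small scale. The sketch below is an illustration, not part of the lab workspace: ratio_percent is a hypothetical helper that prints the gzip-compressed size as a percentage of the original, and both sample files live in a scratch directory.

```shell
# Sketch: how well does gzip compress random bytes versus repeated text?
workdir=$(mktemp -d)
dd if=/dev/urandom of="$workdir/random.bin" bs=1024 count=50 2>/dev/null
yes "This is a repeated test line" | head -n 500 > "$workdir/repeated.txt"

# Hypothetical helper: compressed size as a whole-number percentage.
ratio_percent() {
    original=$(wc -c < "$1" | tr -d ' ')
    compressed=$(gzip -c "$1" | wc -c | tr -d ' ')
    echo $(( compressed * 100 / original ))
}

random_ratio=$(ratio_percent "$workdir/random.bin")
text_ratio=$(ratio_percent "$workdir/repeated.txt")
echo "random data:   ${random_ratio}% of original size"
echo "repeated text: ${text_ratio}% of original size"

rm -rf "$workdir"
```

Random bytes should stay at roughly their original size, while the repeated lines shrink to a small fraction of theirs. The same effect explains why the lab archives hover near 1 MB regardless of the compressor.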
Key Flag Reference
c   Create new archive
x   Extract archive
t   List contents (no extract)
v   Verbose (show filenames)
f   Archive filename follows this flag
z   Use gzip compression
j   Use bzip2 compression
J   Use xz compression
C   Change to the given directory first
p   Preserve permissions
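The C flag is useful when creating archives as well as when extracting them. In the sketch below (directory names are invented for the demo), tar changes into a nested directory before archiving, so the member names start at project/ rather than carrying the full path.

```shell
# Sketch: -C at creation time controls the paths stored in the archive.
workdir=$(mktemp -d)
mkdir -p "$workdir/deep/nested/project"
echo "data" > "$workdir/deep/nested/project/file.txt"

# tar first changes into deep/nested, then archives "project" from there.
tar cf "$workdir/project.tar" -C "$workdir/deep/nested" project

members=$(tar tf "$workdir/project.tar")
echo "$members"

rm -rf "$workdir"
```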
2

Listing and Extracting Archives

Inspect archive contents and extract to specific directories

BEGINNER
1

List archive contents without extracting — essential before unpacking untrusted archives:

tar tf source.tar.gz | head -10
Expected output
source/
source/datafile-1.bin
source/datafile-2.bin
source/datafile-3.bin
source/textfile-1.txt
source/textfile-2.txt
...

The t flag lists, f specifies the file. Notice paths include the source/ prefix.

2

Count how many files are in the archive:

tar tf source.tar.gz | wc -l
31

31 entries = 1 directory + 20 binary files + 10 text files.
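The same breakdown can be computed from the listing itself, because tar prints directory members with a trailing slash. A small sketch on a throwaway archive (three files in one directory, names chosen for the demo):

```shell
# Sketch: split a tar listing into directory entries and file entries.
workdir=$(mktemp -d)
mkdir "$workdir/sample"
for n in 1 2 3; do
    echo "file $n" > "$workdir/sample/file-$n.txt"
done
tar czf "$workdir/sample.tar.gz" -C "$workdir" sample

total_entries=$(tar tzf "$workdir/sample.tar.gz" | wc -l | tr -d ' ')
dir_entries=$(tar tzf "$workdir/sample.tar.gz" | grep -c '/$')    # names ending in /
file_entries=$(tar tzf "$workdir/sample.tar.gz" | grep -vc '/$')  # everything else
echo "$total_entries entries = $dir_entries directory + $file_entries files"

rm -rf "$workdir"
```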

3

Extract the gzip archive into a specific target directory using -C:

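mkdir -p ~/lab-compress/extracted-gz
tar xzf source.tar.gz -C ~/lab-compress/extracted-gz/
ls ~/lab-compress/extracted-gz/source/ | head -5
Expected output (ls sorts names as text, so datafile-10.bin follows datafile-1.bin; the exact order can vary with locale)
datafile-1.bin
datafile-10.bin
datafile-11.bin
datafile-12.bin
datafile-13.bin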
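Extraction with -C keeps the source/ prefix, so files land in extracted-gz/source/. If the prefix is unwanted, GNU tar and bsdtar both accept --strip-components to drop leading path elements. This option is not used elsewhere in the lab; the sketch below shows it on a scratch archive with hypothetical names.

```shell
# Sketch: extract without the leading "source/" directory.
workdir=$(mktemp -d)
mkdir "$workdir/source" "$workdir/target"
echo "sample" > "$workdir/source/notes.txt"
tar czf "$workdir/source.tar.gz" -C "$workdir" source

# --strip-components=1 removes the first path element of every member.
tar xzf "$workdir/source.tar.gz" -C "$workdir/target" --strip-components=1

if [ -f "$workdir/target/notes.txt" ]; then
    stripped=yes
else
    stripped=no
fi
echo "notes.txt directly inside target: $stripped"

rm -rf "$workdir"
```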
4

Extract the xz archive with verbose output to see each file as it is restored:

mkdir -p ~/lab-compress/extracted-xz
tar xJvf source.tar.xz -C ~/lab-compress/extracted-xz/ 2>&1 | tail -5
Expected output
source/textfile-6.txt
source/textfile-7.txt
source/textfile-8.txt
source/textfile-9.txt
source/textfile-10.txt
5

Extract only specific files from an archive (not the whole thing):

# Extract only textfile-1.txt from the bz2 archive
mkdir -p ~/lab-compress/selective
tar xjf source.tar.bz2 -C ~/lab-compress/selective/ source/textfile-1.txt
ls -lh ~/lab-compress/selective/source/
Expected output
total 28K
-rw-rw-r-- 1 user user 26K Jan 15 09:00 textfile-1.txt
Security Tip: Always run tar tf archive.tar.gz before extracting untrusted archives. Malicious archives can use path traversal (e.g., ../../etc/passwd) to overwrite system files. The list command lets you review paths before they land on disk.
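Reviewing a long listing by eye is error-prone, so the check can be scripted. The sketch below is an assumption-laden illustration, not a complete defense: paths_look_safe is a hypothetical helper that reads member names on stdin and fails if any name is absolute or contains a ".." component. It does not detect other tricks such as hostile symlinks.

```shell
# Hypothetical helper: succeed only if no name on stdin is absolute
# or contains a ".." path component.
paths_look_safe() {
    ! grep -Eq '^/|(^|/)\.\.(/|$)'
}

workdir=$(mktemp -d)
mkdir "$workdir/source"
echo "ok" > "$workdir/source/file.txt"
tar czf "$workdir/safe.tar.gz" -C "$workdir" source

# Check a normal archive via its listing.
if tar tzf "$workdir/safe.tar.gz" | paths_look_safe; then
    safe_result="safe"
else
    safe_result="suspicious"
fi
echo "safe.tar.gz: $safe_result"

# Feed a traversal-style name in directly to show the rejection path.
if printf '%s\n' '../../etc/passwd' | paths_look_safe; then
    traversal_result="safe"
else
    traversal_result="suspicious"
fi
echo "../../etc/passwd: $traversal_result"

rm -rf "$workdir"
```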
3

Using gzip, bzip2, and xz Standalone

Compress and decompress individual files without tar

INTERMEDIATE
Standalone vs. Paired with tar: gzip, bzip2, and xz compress single files only; they do not bundle multiple files into one archive. That is tar's job. By default these tools replace the original file with the compressed version.
1

Create a test text file and compress it with gzip:

cd ~/lab-compress
cp source/textfile-1.txt ./test-gzip.txt
ls -lh test-gzip.txt
gzip test-gzip.txt
ls -lh test-gzip.txt.gz
Expected output
-rw-rw-r-- 1 user user 26K Jan 15 09:00 test-gzip.txt
-rw-rw-r-- 1 user user 462 Jan 15 09:05 test-gzip.txt.gz

The text file compresses dramatically because it contains repeated content. The original test-gzip.txt is gone — replaced by .gz.

2

Decompress with gunzip (or gzip -d):

gunzip test-gzip.txt.gz
ls -lh test-gzip.txt
Expected output
-rw-rw-r-- 1 user user 26K Jan 15 09:00 test-gzip.txt
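Compression with these tools is lossless, which can be confirmed by comparing a file against its own round trip. A minimal sketch using a scratch copy (file names are made up):

```shell
# Sketch: compress, decompress, and confirm the bytes are unchanged.
workdir=$(mktemp -d)
yes "sample line" | head -n 200 > "$workdir/original.txt"
cp "$workdir/original.txt" "$workdir/work.txt"

gzip "$workdir/work.txt"        # leaves only work.txt.gz
gunzip "$workdir/work.txt.gz"   # restores work.txt

# cmp -s is silent and reports the result through its exit status.
if cmp -s "$workdir/original.txt" "$workdir/work.txt"; then
    roundtrip="identical"
else
    roundtrip="different"
fi
echo "round trip: $roundtrip"

rm -rf "$workdir"
```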
3

Use -k (keep) to compress without deleting the original:

gzip -k test-gzip.txt
ls -lh test-gzip.txt test-gzip.txt.gz
Expected output
-rw-rw-r-- 1 user user 26K Jan 15 09:00 test-gzip.txt
-rw-rw-r-- 1 user user 462 Jan 15 09:05 test-gzip.txt.gz
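Older gzip releases may lack -k. A portable alternative is -c, which writes the compressed data to standard output and leaves the input untouched; redirecting the output also lets you choose the destination name. A short sketch with a hypothetical log file:

```shell
# Sketch: keep the original by compressing to stdout with -c.
workdir=$(mktemp -d)
yes "log line" | head -n 100 > "$workdir/app.log"

gzip -c "$workdir/app.log" > "$workdir/app.log.gz"

original_kept=no
compressed_created=no
[ -f "$workdir/app.log" ] && original_kept=yes
[ -s "$workdir/app.log.gz" ] && compressed_created=yes
echo "original kept: $original_kept, compressed copy created: $compressed_created"

rm -rf "$workdir"
```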
4

Compress with bzip2 and compare the result:

cp source/textfile-1.txt ./test-bzip2.txt
bzip2 test-bzip2.txt
ls -lh test-bzip2.txt.bz2
Expected output
-rw-rw-r-- 1 user user 285 Jan 15 09:05 test-bzip2.txt.bz2

On this repetitive text file, bzip2 produces a smaller result than gzip (285 bytes versus 462). Decompress with bunzip2 or bzip2 -d.

5

Compress with xz at maximum compression level and view stats:

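cp source/textfile-1.txt ./test-xz.txt
xz -9 -v test-xz.txt
Expected output
test-xz.txt (1/1)
  100 %        248 B / 26.0 KiB = 0.009
ls -lh test-xz.txt.xz
-rw-rw-r-- 1 user user 248 Jan 15 09:06 test-xz.txt.xz

The final figure in the verbose line is the compressed-to-original size ratio, shown as a plain fraction rather than a percentage. The -9 flag selects the highest compression preset, which also needs the most time and memory. Decompress with unxz or xz -d.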

6

Compare all three tools on the same source file:

Tool    Command            Extension   Speed     Ratio (text)
gzip    gzip / gunzip      .gz         Fast      Good
bzip2   bzip2 / bunzip2    .bz2        Slower    Better
xz      xz / unxz          .xz         Slowest   Best
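The table can be checked against real numbers with a short loop. The sketch below compresses one sample file with each tool that is installed and prints the resulting size. It writes to standard output with -c so nothing is replaced, and it skips any tool that is missing. The sample file is generated for the demo and is not part of the lab workspace.

```shell
# Sketch: compare compressed sizes across whichever tools are installed.
workdir=$(mktemp -d)
yes "This is a repeated test line" | head -n 500 > "$workdir/sample.txt"
original_bytes=$(wc -c < "$workdir/sample.txt" | tr -d ' ')
echo "original: $original_bytes bytes"

gzip_bytes=""
for tool in gzip bzip2 xz; do
    if command -v "$tool" >/dev/null 2>&1; then
        bytes=$("$tool" -c "$workdir/sample.txt" | wc -c | tr -d ' ')
        echo "$tool: $bytes bytes"
        if [ "$tool" = "gzip" ]; then
            gzip_bytes=$bytes
        fi
    else
        echo "$tool: not installed, skipped"
    fi
done

rm -rf "$workdir"
```

On tiny, highly repetitive inputs the ranking between tools can differ from the table, since fixed container overhead dominates; larger, more varied text gives a fairer comparison.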
4

Cross-Platform Archives with zip

Create and inspect .zip files compatible with Windows, macOS, and Linux

INTERMEDIATE
Why zip? While tar + gzip is the Linux standard, .zip archives are universally compatible with Windows and macOS without extra tools. Use zip when sharing files across operating systems.
1

Install zip if not present, then create a zip archive:

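# apt is the Debian/Ubuntu package manager; use your distribution's equivalent elsewhere
which zip || sudo apt install -y zip
cd ~/lab-compress
zip -r source.zip source/
Expected output
adding: source/ (stored 0%)
adding: source/datafile-1.bin (deflated 0%)
...
adding: source/textfile-1.txt (deflated 98%)
...

The -r flag is required for directories. Without it, zip stores only the directory entry itself and none of the files inside it.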
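The effect of -r can be observed by counting entries in two archives built from the same directory. This sketch assumes Info-ZIP's zip and unzip and skips itself if either is missing; unzip -Z1 lists member names one per line. Directory and file names are invented for the demo.

```shell
# Sketch: entries stored with and without -r (requires zip and unzip).
workdir=$(mktemp -d)
mkdir "$workdir/docs"
echo "one" > "$workdir/docs/a.txt"
echo "two" > "$workdir/docs/b.txt"

if command -v zip >/dev/null 2>&1 && command -v unzip >/dev/null 2>&1; then
    # Subshell so the cd does not affect the calling shell.
    (
        cd "$workdir" || exit 1
        zip -q flat.zip docs/       # no -r: directory entry only
        zip -q -r full.zip docs/    # -r: directory and its contents
    )
    flat_entries=$(unzip -Z1 "$workdir/flat.zip" 2>/dev/null | wc -l | tr -d ' ')
    full_entries=$(unzip -Z1 "$workdir/full.zip" 2>/dev/null | wc -l | tr -d ' ')
    zip_demo="ran"
    echo "without -r: $flat_entries entries, with -r: $full_entries entries"
else
    zip_demo="skipped"
    echo "zip or unzip not installed, demo skipped"
fi

rm -rf "$workdir"
```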

2

List the zip archive contents (similar to tar tf):

unzip -l source.zip | head -15
Expected output
Archive:  source.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
        0  2025-01-15 09:00   source/
    51200  2025-01-15 09:00   source/datafile-1.bin
...
    26624  2025-01-15 09:00   source/textfile-1.txt
...
3

Extract the zip archive to a specific directory:

mkdir -p ~/lab-compress/extracted-zip
unzip source.zip -d ~/lab-compress/extracted-zip/
ls ~/lab-compress/extracted-zip/source/ | wc -l
Expected output (last line)
30
4

Extract a single file from a zip archive without unpacking everything:

unzip source.zip source/textfile-5.txt -d ~/lab-compress/selective/
ls ~/lab-compress/selective/source/
Expected output (last line)
textfile-1.txt textfile-5.txt
5

Compare final archive sizes across all formats:

ls -lh ~/lab-compress/source.tar ~/lab-compress/source.tar.gz \
    ~/lab-compress/source.tar.bz2 ~/lab-compress/source.tar.xz \
    ~/lab-compress/source.zip
Expected output (abbreviated; sizes match the archives built in Exercise 1)
-rw-rw-r-- 1 user user 1.3M source.tar
-rw-rw-r-- 1 user user 1021K source.tar.gz
-rw-rw-r-- 1 user user 998K source.tar.bz2
-rw-rw-r-- 1 user user 975K source.tar.xz
-rw-rw-r-- 1 user user 1.0M source.zip
5

Automated Backup Pipeline

Build a dated backup script using tar, gzip, and rotation logic

ADVANCED
Real-World Application: Production backup scripts combine tar with date-stamped filenames, retention policies, and logging. This exercise builds a minimal but functional version of what sysadmins run daily via cron.
1

Create a backup destination directory and examine the dated filename pattern:

mkdir -p ~/lab-compress/backups

# Preview how dated filenames work
echo "backup-$(date +%Y%m%d-%H%M%S).tar.gz"
Expected output
backup-20250115-090730.tar.gz
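A useful property of the YYYYMMDD-HHMMSS pattern is that plain text sorting puts the names in chronological order, because the most significant fields come first and every field has a fixed width. The sketch below sorts three names; the first is the lab's example and the other two are made up for illustration.

```shell
# Sketch: timestamped names sort chronologically as plain text.
sorted_names=$(printf '%s\n' \
    "backup-20250115-090730.tar.gz" \
    "backup-20241231-235959.tar.gz" \
    "backup-20250115-090201.tar.gz" | LC_ALL=C sort)

echo "$sorted_names"
oldest=$(echo "$sorted_names" | head -n 1)
newest=$(echo "$sorted_names" | tail -n 1)
echo "oldest: $oldest"
echo "newest: $newest"
```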
2

Run a one-liner backup of the source directory with a timestamp:

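BACKUP_NAME="backup-$(date +%Y%m%d-%H%M%S).tar.gz"
# -C stores member names relative to ~/lab-compress (source/...), so tar has
# no leading "/" to strip and stderr stays visible for real errors
tar czf ~/lab-compress/backups/"$BACKUP_NAME" -C ~/lab-compress source/
echo "Created: $BACKUP_NAME"
ls -lh ~/lab-compress/backups/
Expected output
Created: backup-20250115-090730.tar.gz
-rw-rw-r-- 1 user user 1.0M Jan 15 09:07 backup-20250115-090730.tar.gz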
3

Simulate multiple backups accumulating over time. The loop creates four more archives, two seconds apart so that each gets a distinct timestamp:

for i in {1..4}; do
    BACKUP_NAME="backup-$(date +%Y%m%d-%H%M%S).tar.gz"
    tar czf ~/lab-compress/backups/"$BACKUP_NAME" -C ~/lab-compress source/
    echo "Created: $BACKUP_NAME"
    sleep 2
done
ls -lt ~/lab-compress/backups/
Expected output (directory listing omitted)
Created: backup-20250115-090732.tar.gz
Created: backup-20250115-090734.tar.gz
Created: backup-20250115-090736.tar.gz
Created: backup-20250115-090738.tar.gz
4

Write a rotation script that keeps only the 3 most recent backups:

BACKUP_DIR=~/lab-compress/backups
KEEP=3

# List backups sorted by time (newest first), skip the first KEEP
EXCESS=$(ls -t "$BACKUP_DIR"/backup-*.tar.gz | tail -n +$((KEEP + 1)))

if [ -n "$EXCESS" ]; then
    echo "Removing old backups:"
    echo "$EXCESS" | while IFS= read -r f; do
        echo "  Deleting: $(basename "$f")"
        rm -- "$f"
    done
else
    echo "No old backups to remove."
fi

echo ""
echo "Remaining backups:"
ls -lh "$BACKUP_DIR"/
Expected output
Removing old backups:
  Deleting: backup-20250115-090732.tar.gz
  Deleting: backup-20250115-090730.tar.gz

Remaining backups:
-rw-rw-r-- 1 user user 1.0M Jan 15 09:07 backup-20250115-090734.tar.gz
-rw-rw-r-- 1 user user 1.0M Jan 15 09:07 backup-20250115-090736.tar.gz
-rw-rw-r-- 1 user user 1.0M Jan 15 09:07 backup-20250115-090738.tar.gz
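Because the timestamped names sort chronologically, rotation can also be done without parsing ls output. The sketch below is one possible variant, not the lab's script: it expands a glob (which the shell returns in sorted order) into the positional parameters and deletes from the front until KEEP names remain. It runs against empty placeholder files in a scratch directory that stands in for ~/lab-compress/backups.

```shell
# Sketch: rotation driven by sorted glob expansion instead of ls -t.
backup_dir=$(mktemp -d)   # stand-in for ~/lab-compress/backups
keep=3

# Five empty placeholders named like the lab's backups.
for stamp in 090730 090732 090734 090736 090738; do
    : > "$backup_dir/backup-20250115-$stamp.tar.gz"
done

# "set --" replaces the positional parameters with the sorted matches,
# oldest first; remove from the front until only $keep are left.
set -- "$backup_dir"/backup-*.tar.gz
while [ "$#" -gt "$keep" ]; do
    echo "Deleting: $(basename "$1")"
    rm -- "$1"
    shift
done

remaining=$(ls "$backup_dir" | wc -l | tr -d ' ')
oldest_left=$(ls "$backup_dir" | head -n 1)
echo "Remaining: $remaining, oldest kept: $oldest_left"

rm -rf "$backup_dir"
```

This approach depends on the naming scheme rather than on file modification times, so it keeps working if a backup file is copied or touched after creation.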
5

Verify backup integrity by listing its contents before any restore attempt:

LATEST=$(ls -t ~/lab-compress/backups/backup-*.tar.gz | head -1)
echo "Verifying: $(basename "$LATEST")"
# Count entries (the directory itself plus the files) on a single output line
echo "$(tar tzf "$LATEST" | wc -l) entries in archive"

# Test integrity (will fail on corrupt archives)
tar tzf "$LATEST" >/dev/null && echo "Integrity: OK" || echo "Integrity: FAILED"
Expected output
Verifying: backup-20250115-090738.tar.gz
31 entries in archive
Integrity: OK
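To see the integrity check fail, it helps to damage an archive on purpose. The sketch below builds a small archive in a scratch directory, writes a copy truncated to half its size, and runs both through check_archive, a hypothetical helper that combines gzip -t (tests the compressed stream) with a tar listing.

```shell
# Sketch: an intact archive passes, a truncated copy fails.
workdir=$(mktemp -d)
mkdir "$workdir/source"
yes "backup payload line" | head -n 2000 > "$workdir/source/data.txt"
tar czf "$workdir/good.tar.gz" -C "$workdir" source

# Simulate damage: keep only the first half of the compressed file.
good_bytes=$(wc -c < "$workdir/good.tar.gz" | tr -d ' ')
head -c $((good_bytes / 2)) "$workdir/good.tar.gz" > "$workdir/truncated.tar.gz"

# Hypothetical helper: test the gzip stream, then the tar structure.
check_archive() {
    if gzip -t "$1" 2>/dev/null && tar tzf "$1" >/dev/null 2>&1; then
        echo "OK"
    else
        echo "FAILED"
    fi
}

good_status=$(check_archive "$workdir/good.tar.gz")
bad_status=$(check_archive "$workdir/truncated.tar.gz")
echo "good.tar.gz: $good_status"
echo "truncated.tar.gz: $bad_status"

rm -rf "$workdir"
```

A passing check only shows that the archive is readable. It does not prove the contents match what was backed up; that would need checksums recorded at backup time.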
Production Note: This script is a learning exercise. Production backup systems (Bacula, Amanda, rsnapshot, Borg) add features like incremental backups, remote targets, encryption, and detailed logging. The concepts here (dated filenames, rotation, integrity checks) apply directly to those tools.

Lab Complete

You created archives in five formats, measured compression ratios, extracted to specific directories, and built an automated backup pipeline with rotation logic.
