Linux Automation | Advanced Linux Administration

Slide 1 of 35 | ALA-06 | Week 3 of 4

Linux Automation
Scheduling, Backups and Monitoring

Cron • systemd Timers • logrotate • rsync • at/batch • Monitoring Scripts

Manual intervention is a liability. Every task that runs on a schedule, every backup that fires at 02:00, every log that rotates before it fills the disk -- these run without a human. This lecture teaches you to build automation that is reliable, observable, and recoverable.

35 Slides | ALA-06 | Week 3 of 4 | Ubuntu 22.04 LTS

Slide 2 of 35

The Automation Principle

Automate anything you do more than twice. But automate it correctly, or you have built a time bomb.

Idempotency

An automated task must be safe to run multiple times. If it fails halfway and retries, it should not corrupt state. Design every automated operation to check before acting: create only if the target does not exist, deploy only if something changed.
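A minimal sketch of check-before-act in shell. The helper name ensure_line and the demo setting are hypothetical, chosen only for illustration; the point is that the second run changes nothing.

```shell
#!/usr/bin/env bash
# ensure_line: hypothetical helper -- append LINE to FILE only if it is not already there.
set -euo pipefail

ensure_line() {
    local file="$1" line="$2"
    touch "$file"
    # -q quiet, -x match the whole line, -F treat the pattern as a fixed string
    if ! grep -qxF -- "$line" "$file"; then
        printf '%s\n' "$line" >> "$file"
    fi
}

demo_file="$(mktemp)"
ensure_line "$demo_file" "vm.swappiness=10"
ensure_line "$demo_file" "vm.swappiness=10"   # second run is a no-op
wc -l < "$demo_file"                          # -> 1
```

A job built from helpers like this can be retried after a partial failure without duplicating its earlier work.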

Observability

Automation that runs silently cannot be debugged. Every scheduled job must log its start time, completion status, and any errors. If you cannot answer "did that job run successfully last Tuesday?", it is not observable.
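One way to meet the observability requirement is a small wrapper around every job. This is a sketch, not a standard tool: the name run_logged and the key=value log layout are assumptions made for this example.

```shell
#!/usr/bin/env bash
# run_logged: hypothetical wrapper that records start, end, duration and exit status.
set -uo pipefail

run_logged() {
    local job="$1" log="$2"; shift 2
    local start end rc=0
    start="$(date +%s)"
    printf '%s job=%s event=start\n' "$(date -Is)" "$job" >> "$log"
    # Capture the exit status instead of letting a failure end the wrapper
    "$@" >> "$log" 2>&1 || rc=$?
    end="$(date +%s)"
    printf '%s job=%s event=end rc=%d duration_s=%d\n' \
        "$(date -Is)" "$job" "$rc" "$(( end - start ))" >> "$log"
    return "$rc"
}
```

Called as run_logged backup /var/log/backup.log /usr/local/bin/backup.sh, it leaves one start line and one end line per run, so "did that job run successfully last Tuesday?" becomes a grep for event=end on that date.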

Failure Isolation

A failed automated task should never cascade. Backups failing should not prevent the next night's backup attempt. Log rotation failing should not fill disks for weeks. Each run is independent, and failures are reported immediately.

Two Automation Tools on Linux

cron is the classic Unix scheduler -- simple, universal, available everywhere. systemd timers are the modern replacement -- better logging, dependency awareness, and missed-run handling. Know both. Use timers for new work on systemd systems.

Slide 3 of 35

Cron: The Five-Field Schedule

Every cron expression is five fields followed by the command. Master the field order and you can schedule anything.

# Field positions:
# .--------- minute       (0-59)
# |  .------- hour         (0-23)
# |  |  .---- day of month (1-31)
# |  |  |  .- month        (1-12 or jan-dec)
# |  |  |  |  . day of week (0-7 or sun-sat; 0 and 7 = Sunday)
# |  |  |  |  |
# *  *  *  *  *  command

# Every minute
*  *  *  *  *   /usr/local/bin/heartbeat.sh

# Every day at 02:30
30 2  *  *  *   /usr/local/bin/backup.sh

# Every Monday at 04:00
0  4  *  *  1   /usr/local/bin/weekly-report.sh

# Every 15 minutes
*/15 * * * *   /usr/local/bin/check-disk.sh

# First day of every month at 00:00
0  0  1  *  *   /usr/local/bin/monthly-cleanup.sh

# Weekdays (Mon-Fri) at 08:00
0  8  *  *  1-5  /usr/local/bin/workday-start.sh

# Multiple specific hours: 06:00, 12:00, 18:00
0  6,12,18 * * *  /usr/local/bin/sync.sh
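To check your reading of a single field, it can help to expand it into the values it matches. The sketch below is a simplified model, not cron's parser: expand_field is a hypothetical helper, and it handles only numbers, ranges, lists and steps (no month or weekday names).

```shell
#!/usr/bin/env bash
# expand_field: hypothetical helper that lists the values one cron field matches.
# Supports "*", "a", "a-b", "a,b,c" and "/step" on "*" or on a range.
set -euo pipefail

expand_field() {
    local field="$1" min="$2" max="$3"
    local part range step lo hi v
    local -a parts out=()
    # Split on commas with read, which does not glob-expand "*"
    IFS=',' read -ra parts <<< "$field"
    for part in "${parts[@]}"; do
        step=1
        range="$part"
        if [[ "$part" == */* ]]; then
            step="${part#*/}"
            range="${part%/*}"
        fi
        if [[ "$range" == "*" ]]; then
            lo="$min"; hi="$max"
        elif [[ "$range" == *-* ]]; then
            lo="${range%-*}"; hi="${range#*-}"
        else
            lo="$range"; hi="$range"
        fi
        for (( v = lo; v <= hi; v += step )); do
            out+=("$v")
        done
    done
    echo "${out[*]}"
}

expand_field '*/15' 0 59      # minute field of check-disk.sh  -> 0 15 30 45
expand_field '6,12,18' 0 23   # hour field of sync.sh          -> 6 12 18
expand_field '1-5' 0 7        # weekday field of workday-start -> 1 2 3 4 5
```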

Slide 4 of 35

Crontab: User vs System vs /etc/cron.d

Cron jobs can live in multiple locations. Know which to use and why.

User Crontabs (crontab -e)

Stored in /var/spool/cron/crontabs/USER. Run as that user. No username field in the schedule. Edit with crontab -e -- never edit the spool file directly. List with crontab -l. Remove all with crontab -r.

/etc/cron.d/ Drop-ins

System cron jobs. Files placed here follow the system crontab format: they have a username field after the time fields. Packages install their cron jobs here. Maintained under version control and deployed as files.

/etc/cron.hourly|daily|weekly|monthly

Place executable scripts directly in these directories. The run-parts mechanism executes them at the corresponding interval. No time syntax needed. Filenames must not contain dots. Good for simple, frequency-based tasks.

# /etc/cron.d/sector-backups — system cron job with username field
# Format: minute hour dom month dow USERNAME command
30 2 * * *  root   /usr/local/bin/sector-backup.sh >> /var/log/backup.log 2>&1

# Check when run-parts will execute scripts in cron.daily
grep -r 'run-parts' /etc/crontab /etc/cron.d/

# View all cron jobs for a user
crontab -u deploy -l

# Validate cron syntax before saving (crontab -e opens $EDITOR)
# Use https://crontab.guru/ or: man 5 crontab
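The "no dots" rule for /etc/cron.daily and its siblings is easy to trip over, because a script named backup.sh is skipped without any error. The sketch below mirrors the default Debian/Ubuntu run-parts naming rule (ASCII letters, digits, underscores and hyphens only); runparts_ok is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# runparts_ok: hypothetical check mirroring the default run-parts naming rule
# on Debian/Ubuntu: only ASCII letters, digits, underscores and hyphens.
set -euo pipefail

runparts_ok() {
    [[ "$1" =~ ^[A-Za-z0-9_-]+$ ]]
}

for name in sector-backup sector-backup.sh logrotate cleanup.old; do
    if runparts_ok "$name"; then
        echo "runs:    $name"
    else
        echo "skipped: $name"
    fi
done
```

On a real system, run-parts --test /etc/cron.daily lists the scripts that would actually be executed, which is the authoritative check.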

Slide 5 of 35

Cron Environment: The Most Common Source of Failures

Cron does not run your shell profile. Commands that work interactively often fail silently in cron.

Minimal PATH

Cron's default PATH is typically /usr/bin:/bin. Commands in /usr/local/bin, /usr/sbin, or user-specific directories are NOT found. Always use absolute paths or set PATH explicitly at the top of the crontab.

No Profile, No Aliases

Cron does not source ~/.bashrc, ~/.profile, or /etc/profile. Environment variables you set interactively do not exist. Functions and aliases do not exist. Scripts that depend on them will fail silently.

# Top of crontab: set environment variables explicitly
SHELL=/bin/bash
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
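# MAILTO: where cron mails job output; set MAILTO="" to silence it.
# Keep comments on their own lines: cron does not allow them after a variable assignment.
MAILTO=ops@sector.local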
HOME=/root

# Always redirect output in the crontab entry itself
0 3 * * *  /usr/local/bin/backup.sh >> /var/log/backup.log 2>&1

# Suppress all output (use only when you know it works)
*/5 * * * *  /usr/local/bin/heartbeat.sh >/dev/null 2>&1

# Debug a failing cron job: run it with cron's environment manually
env -i HOME=/root PATH=/usr/bin:/bin SHELL=/bin/bash /usr/local/bin/backup.sh

Root Cause of Most Cron Failures

A script works in your terminal but not in cron. The difference is almost always PATH. Set PATH at the top of every crontab file and use absolute paths in every script called by cron.
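The difference between the two environments can be made visible with env -i. A minimal sketch, assuming a hypothetical helper run_like_cron and a made-up variable DEPLOY_TOKEN that stands in for anything exported by your profile:

```shell
#!/usr/bin/env bash
# run_like_cron: hypothetical helper that runs a command with a cron-like
# environment: nothing inherited, minimal PATH, /bin/sh as SHELL.
set -euo pipefail

run_like_cron() {
    env -i HOME="${HOME:-/}" LOGNAME="$(id -un)" \
        PATH=/usr/bin:/bin SHELL=/bin/sh \
        "$@"
}

export DEPLOY_TOKEN="set-in-my-profile"   # visible in the interactive shell...
run_like_cron sh -c 'echo "PATH=$PATH"; echo "DEPLOY_TOKEN=${DEPLOY_TOKEN:-<unset>}"'
# ...but unset inside the cron-like environment
```

Running a script through such a wrapper before scheduling it surfaces missing PATH entries and missing variables while you are still at the terminal.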

Slide 6 of 35

systemd Timers: The Modern Scheduler

systemd timers replace cron with dependency awareness, missed-run catching, and journald logging.

Advantages Over cron

Missed runs are caught at next boot (Persistent=true). Full journald logging -- output captured automatically. Dependencies can delay execution. Precise calendar and relative scheduling. Easy enable/disable with systemctl.

Unit Pair

A timer always pairs with a service unit. The .timer unit defines when to run. The .service unit defines what to run. They share the same base name: backup.timer activates backup.service.

Two Schedule Types

Realtime (calendar) timers fire at specific wall clock times, like cron. Monotonic timers fire relative to an event: OnBootSec=5min fires 5 minutes after boot. Both types can be combined in one timer unit.

# List all active timers and their next trigger time
systemctl list-timers --all

# Enable and start a timer
systemctl enable --now backup.timer

# Check timer status and last run time
systemctl status backup.timer

# View output from the last run of the associated service
journalctl -u backup.service -n 50

Slide 7 of 35

Writing a systemd Timer Unit

Two files: a service unit that does the work, and a timer unit that schedules it.

# /etc/systemd/system/sector-backup.service
[Unit]
Description=Sector Node Backup
After=network-online.target
Wants=network-online.target

[Service]
Type=oneshot
User=backup
ExecStart=/usr/local/bin/sector-backup.sh
StandardOutput=journal
StandardError=journal
# SyslogIdentifier makes filtering in journalctl easy
SyslogIdentifier=sector-backup

# /etc/systemd/system/sector-backup.timer
[Unit]
Description=Daily sector backup at 02:30

[Timer]
# Calendar schedule (same expressive power as cron, clearer syntax)
OnCalendar=*-*-* 02:30:00
# Catch up on missed runs (e.g., server was off at 02:30)
Persistent=true
# Randomize within 15 minutes to avoid thundering herd on multiple nodes
RandomizedDelaySec=15min

[Install]
WantedBy=timers.target

# Deploy and activate
systemctl daemon-reload
systemctl enable --now sector-backup.timer
systemctl list-timers sector-backup.timer

Slide 8 of 35

Timer Calendar Syntax

systemd calendar expressions are more readable than cron -- and can be validated before deployment.

# Validate any calendar expression (prints next trigger times)
systemd-analyze calendar '*-*-* 02:30:00'
systemd-analyze calendar 'Mon *-*-* 08:00:00'
systemd-analyze calendar 'weekly'

# Common calendar expressions (meaning on the line above each one;
# unit files do not support trailing comments or shell-style quoting)
# *-*-* 00:00:00
OnCalendar=daily
# Mon *-*-* 00:00:00
OnCalendar=weekly
# *-*-01 00:00:00
OnCalendar=monthly
# *-*-* *:00:00
OnCalendar=hourly
# every 15 minutes
OnCalendar=*:0/15
# weekdays at 09:00
OnCalendar=Mon..Fri *-*-* 09:00
# 1st and 15th of every month at 02:00
OnCalendar=*-*-1,15 02:00

# Monotonic (relative) triggers
# 2 minutes after boot
OnBootSec=2min
# 1 hour after the unit was last activated
OnUnitActiveSec=1h
# 30 seconds after systemd started
OnStartupSec=30s

systemd-analyze calendar

Always run systemd-analyze calendar 'your expression' before committing a timer unit. It prints the normalized form and the next trigger time; add --iterations=10 to list the next ten, so you can confirm the schedule is exactly what you intended before it goes near production.

Slide 9 of 35

logrotate: Preventing Disk Exhaustion

logrotate manages log files automatically: rotating, compressing, removing old files, and reloading daemons.

How It Works

logrotate runs daily (via cron or systemd timer). It reads config files in /etc/logrotate.d/. For each configured log, it renames the current file, creates a new empty one, optionally compresses older copies, and deletes files beyond the retention count.

Daemon Reload

Many daemons hold a file descriptor open to their log. After rotation, they would keep writing to the renamed file. A postrotate script fixes this by signalling the daemon (often SIGHUP) to reopen its log file; sharedscripts makes that script run once per block rather than once per rotated log.

# /etc/logrotate.d/sector-app
/var/log/sector/app.log /var/log/sector/worker.log {
    daily                   # rotate frequency
    rotate 14               # keep 14 rotated files
    compress                # compress rotated files with gzip
    delaycompress           # compress the previous rotation, not the current
    missingok               # do not error if log file does not exist
    notifempty              # skip rotation if the file is empty
    create 0640 sectorapp adm  # permissions and owner for new log file
    dateext                 # add date to rotated filename: app.log-20260409
    sharedscripts           # run postrotate once for all logs in this block
    postrotate
        systemctl reload sector-app 2>/dev/null || true
    endscript
}
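The rename-and-recreate cycle described under "How It Works" can be modelled in a few lines. This is a simplified sketch, not logrotate's implementation: rotate_once is a hypothetical function, it uses numeric suffixes (logrotate's default when dateext is not set), and it skips compression and postrotate.

```shell
#!/usr/bin/env bash
# rotate_once: hypothetical, simplified model of one logrotate pass.
set -euo pipefail

rotate_once() {
    local log="$1" keep="$2" i
    rm -f -- "${log}.${keep}"                 # drop the oldest copy
    for (( i = keep - 1; i >= 1; i-- )); do   # shift .1 -> .2, .2 -> .3, ...
        if [[ -e "${log}.${i}" ]]; then
            mv -- "${log}.${i}" "${log}.$(( i + 1 ))"
        fi
    done
    if [[ -e "$log" ]]; then
        mv -- "$log" "${log}.1"               # current log becomes .1
    fi
    : > "$log"                                # the "create" step: new empty log
}

dir="$(mktemp -d)"
for day in 1 2 3 4; do
    echo "entries from day ${day}" >> "${dir}/app.log"
    rotate_once "${dir}/app.log" 2            # like "rotate 2"
done
ls "${dir}"                                   # app.log  app.log.1  app.log.2
```

After four passes with a retention of 2, only the two most recent days survive, which is the behavior rotate 14 gives you over a two-week window.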

Slide 10 of 35

logrotate: Advanced Configuration

Size-based rotation, copytruncate for open file handles, and manual testing.

# Size-based rotation (not time-based)
/var/log/nginx/access.log {
    size 100M               # rotate when file reaches 100MB
    rotate 5
    compress
    copytruncate            # copy then truncate -- no daemon reload needed
    # WARNING: copytruncate has a brief race window -- use postrotate when possible
}

# Hourly rotation for high-volume logs
# (only effective if logrotate itself is run hourly; the stock schedule is daily)
/var/log/sector/audit.log {
    hourly
    rotate 168              # 168 hours = 7 days of hourly files
    compress
    delaycompress
    missingok
    notifempty
    postrotate
        kill -USR1 $(cat /var/run/sector-audit.pid 2>/dev/null) 2>/dev/null || true
    endscript
}

# Test logrotate config without actually rotating
logrotate --debug /etc/logrotate.d/sector-app

# Force rotation immediately (for testing postrotate scripts)
logrotate --force /etc/logrotate.d/sector-app

# Check logrotate status (last run times per log)
grep sector /var/lib/logrotate/status

Slide 11 of 35

rsync: Efficient File Synchronization

rsync transfers only what changed. It is the backbone of Linux backup, deployment, and replication workflows.

# Core flags you will use on every rsync call
# -a  archive mode: -rlptgoD (recursive, preserve links/perms/times/owner/group/devices)
# -v  verbose (add to see what was transferred)
# -z  compress in transit (useful on slow links)
# --delete  remove files from destination not in source (true sync)
# -n / --dry-run  show what would change without changing anything

# Local sync
rsync -av /opt/sector/ /mnt/backup/sector/

# Remote sync over SSH
rsync -avz --delete /opt/sector/ backup-node:/mnt/backup/sector/

# Dry run first -- always
rsync -avn --delete /opt/sector/ backup-node:/mnt/backup/sector/

# Show progress and transfer statistics
rsync -av --progress --stats /data/ /mnt/backup/data/

# Exclude directories
rsync -av --exclude='*.tmp' --exclude='.git/' /opt/sector/ /mnt/backup/sector/

# Bandwidth limit (useful for background sync, units: KB/s)
rsync -avz --bwlimit=5000 /data/ backup-node:/backup/data/

Slide 12 of 35

rsync Backup Patterns: Incremental with --link-dest

The --link-dest pattern creates full backup snapshots at incremental cost using hard links.

You want 30 days of daily backups, but storing 30 full copies is prohibitively expensive. The --link-dest pattern solves this: each backup appears to be a full copy, but unchanged files are hard links to the previous backup. Storage cost is proportional to change rate, not total size.

#!/usr/bin/env bash
# incremental-backup.sh -- daily snapshot backup with hard-link deduplication
set -euo pipefail

SRC="/opt/sector"
DEST="/mnt/backup"
TODAY="$(date +%Y-%m-%d)"
SNAPSHOT="${DEST}/snapshots/${TODAY}"
LATEST="${DEST}/latest"

mkdir -p "$SNAPSHOT"

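# --link-dest: files unchanged since LATEST are hard-linked, not copied
rsync -av --delete \
    --link-dest="${LATEST}" \
    "${SRC}/" "${SNAPSHOT}/"

# rsync -a copies the source directory's mtime onto the snapshot directory;
# reset it so the age-based pruning below measures the snapshot's real age
touch "${SNAPSHOT}"

# Update the 'latest' symlink to point to today's snapshot
ln -sfn "${SNAPSHOT}" "${LATEST}"

# Remove snapshots older than 30 days
# (-mindepth 1 keeps find from ever selecting the snapshots directory itself)
find "${DEST}/snapshots" -mindepth 1 -maxdepth 1 -type d -mtime +30 -exec rm -rf {} +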

echo "Backup complete: ${SNAPSHOT}"
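To convince yourself that unchanged files really cost no extra space, compare device and inode numbers across two snapshots. The sketch below fakes two snapshots with ln instead of running rsync, so it works anywhere; shares_storage is a hypothetical helper and the file names are made up. It relies on GNU stat.

```shell
#!/usr/bin/env bash
# shares_storage: hypothetical check that two paths are hard links to the same
# file (same device and inode), which is what --link-dest produces for
# unchanged files.
set -euo pipefail

shares_storage() {
    [[ "$(stat -c '%d:%i' -- "$1")" == "$(stat -c '%d:%i' -- "$2")" ]]
}

# Demo with two fake snapshots: one unchanged file, one changed file.
root="$(mktemp -d)"
mkdir -p "${root}/2026-04-08" "${root}/2026-04-09"
echo "same" > "${root}/2026-04-08/config.json"
ln "${root}/2026-04-08/config.json" "${root}/2026-04-09/config.json"   # unchanged: hard link
echo "old" > "${root}/2026-04-08/state.db"
echo "new" > "${root}/2026-04-09/state.db"                             # changed: real copy

for f in config.json state.db; do
    if shares_storage "${root}/2026-04-08/${f}" "${root}/2026-04-09/${f}"; then
        echo "${f}: shared"
    else
        echo "${f}: separate copy"
    fi
done
```

The same check pointed at two real snapshot directories shows which files rsync linked and which it had to copy.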

Slide 13 of 35

rsync over SSH: Key Auth and Restricted Shell

Automated rsync must authenticate without a password. Use SSH keys with forced commands for least-privilege access.

# Generate a dedicated key for backup automation (no passphrase)
ssh-keygen -t ed25519 -C "backup-auto@sector" -f /root/.ssh/id_backup -N ""

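# On the backup destination server: restrict the key to rsync only
# ~/.ssh/authorized_keys entry (one line):
# command="rsync --server --daemon .",restrict,from="10.0.0.0/8" ssh-ed25519 AAAA... backup-auto@sector
# This forced command runs rsync in daemon-over-SSH mode, so clients must use
# module syntax (host::module/). Modules are defined in rsyncd.conf, which a
# non-root remote user reads from its home directory. The module name
# sector-data below is an example.

# Custom SSH options for the rsync call
rsync -av -e "ssh -i /root/.ssh/id_backup -o StrictHostKeyChecking=yes -p 2222" \
    /opt/sector/ backup@backup-node::sector-data/

# rsync with SSH config alias (/root/.ssh/config)
# Host backup-node
#     HostName 10.0.1.50
#     User backup
#     IdentityFile /root/.ssh/id_backup
#     Port 2222
# -e ssh is required: without it, :: connects to a standalone daemon on TCP 873
rsync -av -e ssh /opt/sector/ backup-node::sector-data/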

Principle of Least Privilege

The backup key should be able to do exactly one thing: run rsync on the destination. The forced command in authorized_keys ensures that even if the private key is compromised, the attacker can only run rsync -- not an interactive shell.

Slide 14 of 35

at and batch: One-Shot Deferred Execution

For tasks that need to run once in the future -- not on a recurring schedule -- at is the right tool.

at: Schedule at a Specific Time

at executes a command once at a future time. After the job runs, it is removed from the queue. The job inherits the current environment, unlike cron. Output is mailed or captured. Useful for: scheduled maintenance, deferred deployment, one-time reminders.

batch: Run When Load Allows

batch runs commands when system load drops below 1.5 (configurable). It is at with load-sensitive scheduling. Use it for resource-intensive tasks you want to defer during peak hours without specifying an exact time.

# Schedule a command at a specific time
echo '/usr/local/bin/maintenance.sh >> /var/log/maint.log 2>&1' | at 02:30

# Human-readable time expressions
at 4pm tomorrow
at noon next friday
at now + 2 hours
at 08:00 2026-04-10        # the time of day must come before the date

# Several commands from a here-document
# (interactive alternative: run "at 03:00", type commands, Ctrl+D to submit)
at 03:00 <<'EOF'
apt-get update -qq
apt-get upgrade -y
systemctl reboot
EOF

# View pending at jobs
atq

# View the commands in a job (job number from atq)
at -c 3

# Remove a pending job
atrm 3

# batch: defer heavy work to low-load time
echo '/usr/local/bin/reindex-db.sh' | batch

Slide 15 of 35

Monitoring Script: Disk Usage Alerting

A production-grade disk usage monitor that alerts on threshold breach and details which paths are responsible.

#!/usr/bin/env bash
# disk-monitor.sh — alert when any filesystem exceeds threshold
set -euo pipefail

WARN_THRESHOLD=80
CRIT_THRESHOLD=90
ALERT_EMAIL="ops@sector.local"
HOST="$(hostname -f)"    # avoid reusing HOSTNAME, which bash sets itself

# read splits the three df columns directly (assumes mount points without spaces)
df --output=source,pcent,target | tail -n +2 | while read -r DEV PCT MOUNT; do
    PCT="${PCT%\%}"
    [[ "$PCT" =~ ^[0-9]+$ ]] || continue    # skip filesystems that report "-"

    if (( PCT >= CRIT_THRESHOLD )); then
        BODY="CRITICAL: ${MOUNT} on ${HOST} is at ${PCT}% (${DEV})"$'\n'
        # || true: du may hit unreadable paths; that must not abort the script under set -e
        BODY+="$(du -xsh "${MOUNT%/}"/* 2>/dev/null | sort -rh | head -10 || true)"
        printf '%s\n' "$BODY" | mail -s "[CRITICAL] Disk ${PCT}% on ${HOST}:${MOUNT}" "$ALERT_EMAIL"
    elif (( PCT >= WARN_THRESHOLD )); then
        logger -t disk-monitor -p local0.warning \
            "WARN: ${MOUNT} at ${PCT}% on ${DEV}"
    fi
done

Slide 16 of 35

Monitoring Script: Process Watchdog

Ensure critical processes are running and restart them if they die. A lightweight alternative to systemd for legacy processes.

#!/usr/bin/env bash
# watchdog.sh — monitor critical processes, restart if down
set -euo pipefail

# Associative array: process name (as matched by pgrep -x) -> start command
declare -A PROCESSES
PROCESSES["nginx"]="systemctl start nginx"
PROCESSES["sector-worker"]="/opt/sector/bin/worker --daemonize"
PROCESSES["redis-server"]="systemctl start redis"

MAX_RESTARTS=3
STATE_DIR="/var/run/watchdog"
mkdir -p "$STATE_DIR"

for proc in "${!PROCESSES[@]}"; do
    STATE_FILE="${STATE_DIR}/${proc}.restarts"
    RESTARTS="$(cat "$STATE_FILE" 2>/dev/null || echo 0)"

    if ! pgrep -x "$proc" >/dev/null 2>&1; then
        if (( RESTARTS < MAX_RESTARTS )); then
            echo "[WARN] $proc down -- restarting (attempt $((RESTARTS+1)))"
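            # Count the attempt first; "|| ..." keeps a failed start from aborting the loop under set -e
            echo "$((RESTARTS+1))" > "$STATE_FILE"
            eval "${PROCESSES[$proc]}" || echo "[WARN] start command for $proc failed"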
        else
            echo "[CRIT] $proc down -- max restarts reached, alerting"
            logger -t watchdog -p local0.crit "$proc failed after $RESTARTS restarts"
        fi
    else
        echo 0 > "$STATE_FILE"   # reset counter when process is healthy
    fi
done

Slide 17 of 35

Monitoring Script: Network Connectivity

Check that critical endpoints are reachable and services are responding at the protocol level.

#!/usr/bin/env bash
# net-check.sh — verify reachability and service response
set -euo pipefail

declare -A ENDPOINTS
ENDPOINTS["gateway"]="10.0.0.1:icmp"
ENDPOINTS["web-lb"]="10.0.1.10:443"
ENDPOINTS["db-primary"]="10.0.1.20:5432"
ENDPOINTS["api-health"]="https://api.sector.local/health:http"

FAIL=0

check_tcp() {
    local host="$1" port="$2"
    timeout 3 bash -c "echo >/dev/tcp/${host}/${port}" 2>/dev/null
}

check_http() {
    local url="$1"
    curl -sf --max-time 5 "$url" >/dev/null
}

for name in "${!ENDPOINTS[@]}"; do
    spec="${ENDPOINTS[$name]}"
    case "$spec" in
        *:icmp) ping -c1 -W2 "${spec%:icmp}" >/dev/null 2>&1 || { echo "FAIL: $name"; FAIL=$((FAIL+1)); } ;;
        https://*:http) check_http "${spec%:http}"          || { echo "FAIL: $name"; FAIL=$((FAIL+1)); } ;;
        *:*) check_tcp "${spec%:*}" "${spec##*:}"          || { echo "FAIL: $name"; FAIL=$((FAIL+1)); } ;;
    esac
done

# FAIL counts the failed endpoints, so the log message reports the real number
(( FAIL > 0 )) && { logger -t net-check -p local0.err "$FAIL endpoint(s) unreachable"; exit 1; }
echo "All endpoints OK"

Slide 18 of 35

Monitoring Script: Log Anomaly Detection

Scan logs for patterns that indicate problems and report them before users notice.

#!/usr/bin/env bash
# log-anomaly.sh — scan recent logs for error patterns
set -euo pipefail

SINCE="1 hour ago"
PATTERNS=(
    "CRITICAL" "OOM killer" "segfault"
    "Connection refused" "disk quota exceeded"
    "authentication failure" "permission denied"
)
REPORT=""

for pattern in "${PATTERNS[@]}"; do
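    # grep -c prints 0 and exits 1 when nothing matches; "|| true" keeps that
    # single 0 (an "|| echo 0" would append a second 0 and break the arithmetic)
    COUNT="$(journalctl --since="$SINCE" -q 2>/dev/null \
        | grep -ci "$pattern" || true)"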

    if (( COUNT > 0 )); then
        REPORT+="[${COUNT}x] ${pattern}\n"
        # Show the most recent 3 occurrences for context
        REPORT+="$(journalctl --since="$SINCE" -q 2>/dev/null \
            | grep -i "$pattern" | tail -3 | sed 's/^/   /')\n\n"
    fi
done

if [[ -n "$REPORT" ]]; then
    echo -e "[ANOMALY REPORT] $(hostname) @ $(date)\n$REPORT" | \
        mail -s "[Log Anomaly] $(hostname)" ops@sector.local
fi

Slide 19 of 35

flock: Prevent Concurrent Cron Runs

A slow job that runs longer than its schedule interval will overlap with the next run. flock prevents this.

Your backup script takes 35 minutes to run. The cron job fires every 30 minutes. Without locking, two instances run simultaneously, fighting over the same files. flock acquires an exclusive lock: if the first run is still in progress, the second run exits immediately.

# flock approach 1: wrap in crontab
*/30 * * * *  flock -n /var/lock/backup.lock /usr/local/bin/backup.sh

# -n = non-blocking (exit 1 immediately if lock not available)
# Default is blocking (wait for lock to be released)

# flock approach 2: inside the script (preferred for complex scripts)
#!/usr/bin/env bash
set -euo pipefail
LOCK_FILE="/var/lock/backup.lock"

# Open file descriptor 200 and acquire exclusive lock
exec 200>"$LOCK_FILE"
flock -n 200 || { echo "Another instance is running. Exiting."; exit 0; }

# Lock is held for the remainder of the script
# Automatically released when script exits (fd 200 closes)
echo "$BASHPID" >&200   # write PID into lock file for debugging

# ... rest of script
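The two approaches can be combined into a reusable wrapper. A minimal sketch, assuming flock from util-linux is installed; with_lock is a hypothetical helper, and exit status 75 is a value chosen for this example, not a flock convention.

```shell
#!/usr/bin/env bash
# with_lock: hypothetical wrapper -- run a command only if LOCK is free.
# Exit status 75 (chosen here) means "another instance holds the lock".
set -euo pipefail

with_lock() {
    local lock="$1"; shift
    (
        flock -n 9 || exit 75
        "$@"
    ) 9>"$lock"
}

lock="$(mktemp)"
with_lock "$lock" echo "got the lock"

# Nested call on the same lock file: the inner attempt opens the file again,
# finds the lock held by the outer call, and gives up immediately.
rc=0
with_lock "$lock" with_lock "$lock" echo "never printed" || rc=$?
echo "inner attempt exit status: ${rc}"      # -> 75
```

Because the lock is tied to the open file descriptor, it is released when the subshell exits, even if the wrapped command crashes.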

Slide 20 of 35

anacron: Scheduling for Non-24/7 Systems

cron misses jobs when the system is off. anacron catches up on missed daily/weekly/monthly jobs at next boot.

How anacron Works

anacron reads /etc/anacrontab and tracks the date of the last run in /var/spool/anacron/. On startup, if a job has not run within its period, anacron runs it after that job's configured delay. Giving jobs different delays keeps them from all starting at once right after boot.

Use Case

Laptops, dev workstations, VMs that are frequently suspended or shut down. Also useful for batch jobs on servers where you care that the job runs at least once per day, not necessarily at exactly 02:00.

# /etc/anacrontab format:
# period(days)  delay(minutes)  job-id  command
1  5   daily-backup      /usr/local/bin/backup.sh
7  10  weekly-report     /usr/local/bin/weekly-report.sh
30 15  monthly-cleanup   /usr/local/bin/cleanup.sh

# View last run times
ls -la /var/spool/anacron/
cat /var/spool/anacron/daily-backup    # contains date of last run

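# Force anacron to run all jobs immediately (testing)
# -f ignore timestamps, -n skip the delays, -d stay in the foreground and print to stderr
anacron -f -n -d

# Check if anacron is installed and active on this system
dpkg -s anacron 2>/dev/null | grep -i '^status'
systemctl status anacron.timer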
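The decision anacron makes for each job is a date comparison against its timestamp file. The sketch below models that decision only; is_due is a hypothetical function, the optional third argument exists to make the demo reproducible, and the date arithmetic relies on GNU date.

```shell
#!/usr/bin/env bash
# is_due: hypothetical model of anacron's per-job decision. STAMP holds the date
# of the last run as YYYYMMDD, the format used in /var/spool/anacron/.
set -euo pipefail

is_due() {
    local period_days="$1" stamp_file="$2" today="${3:-$(date +%Y%m%d)}"
    [[ -s "$stamp_file" ]] || return 0            # no timestamp yet: due
    local last now_s last_s
    last="$(< "$stamp_file")"
    now_s="$(date -u -d "$today" +%s)"
    last_s="$(date -u -d "$last" +%s)"
    (( (now_s - last_s) / 86400 >= period_days ))
}

stamp="$(mktemp)"
echo 20260401 > "$stamp"                          # last run: 1 April 2026
is_due 7  "$stamp" 20260409 && echo "weekly job: due"
is_due 30 "$stamp" 20260409 || echo "monthly job: not due"
```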

Slide 21 of 35

Backup Verification: Backups Are Worthless Until Tested

A backup that cannot be restored is not a backup. Automate verification as rigorously as the backup itself.

#!/usr/bin/env bash
# verify-backup.sh -- confirm backup integrity and restore readiness
set -euo pipefail

BACKUP_DIR="/mnt/backup/latest"    # the 'latest' symlink maintained by the backup script
VERIFY_LOG="/var/log/backup-verify.log"
FAIL=0

log() { echo "$(date +%T) $*" | tee -a "$VERIFY_LOG"; }

# Check 1: backup exists and is recent. stat reads the symlink's own mtime,
# which is the time 'latest' was last repointed.
if [[ -d "$BACKUP_DIR" ]]; then
    AGE=$(( ($(date +%s) - $(stat -c %Y "$BACKUP_DIR")) / 3600 ))
    if (( AGE > 26 )); then log "FAIL: backup is ${AGE}h old -- expected <26h"; FAIL=1; fi
else
    log "FAIL: backup dir missing"; FAIL=1
fi

# Check 2: critical files are present
for f in /etc/passwd /etc/nginx/nginx.conf /opt/sector/config.json; do
    [[ -f "${BACKUP_DIR}${f}" ]] || { log "FAIL: missing ${f} in backup"; FAIL=1; }
done

# Check 3: compare checksums of key configs (a mismatch can also mean the live
# file changed after the backup ran). "|| true" keeps a missing file from
# aborting the script before the failure is logged.
LIVE_SUM="$(sha256sum /etc/nginx/nginx.conf 2>/dev/null | awk '{print $1}' || true)"
BACKUP_SUM="$(sha256sum "${BACKUP_DIR}/etc/nginx/nginx.conf" 2>/dev/null | awk '{print $1}' || true)"
[[ -n "$LIVE_SUM" && "$LIVE_SUM" == "$BACKUP_SUM" ]] || { log "FAIL: checksum mismatch: nginx.conf"; FAIL=1; }

(( FAIL > 0 )) && logger -t backup-verify -p local0.crit "Backup verification FAILED"
log "Verification complete: FAIL=$FAIL"
exit $FAIL

Slide 22 of 35

Cron Security: Least Privilege for Scheduled Jobs

Running cron jobs as root is a common misconfiguration. Scope every job to the minimum required privilege.

Running Everything as Root

If a cron script contains a bug or is modified by an attacker, running as root means unrestricted system access. A backup script does not need root. A log rotation script does not need root. Audit and minimize.

Dedicated Service Accounts

Create dedicated accounts: backup, monitor, deploy. Grant them only the permissions required using sudo rules, file ACLs, or group membership. If the account is compromised, blast radius is contained.

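# Create a dedicated backup user with no shell login
# (Debian/Ubuntu already ship a 'backup' system account, so create it only if missing)
id backup >/dev/null 2>&1 || useradd -r -s /usr/sbin/nologin -m backup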

# Grant backup user read access to required directories
setfacl -R -m u:backup:rX /opt/sector
setfacl -m u:backup:rX /etc/nginx

# Allow backup user to run rsync as root without password (sudoers)
# /etc/sudoers.d/backup:
# backup ALL=(root) NOPASSWD: /usr/bin/rsync --server *

# /etc/cron.d/sector-backups (using explicit user field)
30 2 * * *  backup  /usr/local/bin/backup.sh >> /var/log/backup.log 2>&1

# Restrict crontab access: /etc/cron.allow lists permitted users
# /etc/cron.deny lists blocked users

Slide 23 of 35

systemd Timers: Advanced Patterns

Resource limits, dependencies, and transient timers for one-shot scheduled operations.

# Resource-limited service unit
# /etc/systemd/system/sector-cleanup.service
[Service]
Type=oneshot
User=cleanup-svc
ExecStart=/usr/local/bin/cleanup.sh
# CPU: maximum 20% of one core
CPUQuota=20%
# Memory: kill if it tries to use more than 512MB
MemoryMax=512M
# Nice level: low priority, yield to other processes
Nice=15
# I/O: low priority scheduling class
IOSchedulingClass=idle
# Prevent writing outside /tmp and /var/log/sector
ReadWritePaths=/var/log/sector /tmp
ProtectSystem=strict

# Transient timer: run once at a specific time, then gone (no unit file needed)
systemd-run --on-calendar='2026-04-10 03:00' \
    --unit=one-shot-reboot \
    /sbin/reboot

# Transient relative timer: 30 minutes from now
systemd-run --on-active=30m \
    --description="Deferred maintenance" \
    /usr/local/bin/maintenance.sh

# View transient jobs
systemctl list-timers

Slide 24 of 35

logger: Scripts Writing to syslog

Use logger to write structured messages from scripts into the system log, where they can be monitored and forwarded.

# Basic usage: write to syslog with a custom tag
logger -t backup "Backup started for /opt/sector"

# Specify facility and priority (facility.priority)
logger -p local0.info    -t backup "Backup completed successfully"
logger -p local0.warning -t backup "Backup completed with 3 skipped files"
logger -p local0.err     -t backup "Backup FAILED: rsync exited 23"
logger -p local0.crit    -t backup "Backup destination unreachable"

# Structured key=value logging (machine-parseable)
logger -t sector-monitor \
    "event=disk_warn host=$(hostname) mount=/var/log usage_pct=87"

# Log stdin (pipe output of a command directly to syslog)
rsync -av /data/ /backup/ 2>&1 | logger -t rsync-backup

# See messages written by logger
journalctl -t backup -n 20
journalctl -t sector-monitor --since "1 hour ago"

# Filter by facility in rsyslog (routes local0.* to a dedicated log)
# /etc/rsyslog.d/sector.conf: local0.* /var/log/sector/automation.log

Slide 25 of 35

Testing Automation: Dry Runs and Staging

Never run a new automated job directly on production. Test it at every level before scheduling it.

Dry Run Mode

Build a -d / --dry-run flag into every automation script. When active, it logs what it would do without doing it. Run in dry-run mode on production first to validate the script sees the right environment.

Manual Trigger Test

Before scheduling with cron or a timer, run the script manually in the exact environment it will have: as the correct user, with the cron PATH, without your shell profile. If it works manually in that context, it will almost certainly work when scheduled.

Log Review

After the first few scheduled runs, review the logs in detail. Check runtime, output, exit codes. A job that succeeds but takes 10x longer than expected is a problem waiting to happen. Set up alerting on non-zero exit codes from day one.

# Run a cron job manually with cron's environment (replicate the context)
sudo -u backup env -i HOME=/home/backup \
    PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \
    SHELL=/bin/bash \
    /usr/local/bin/backup.sh

# Force a systemd timer to fire now (without waiting for schedule)
systemctl start sector-backup.service   # runs the service immediately
journalctl -u sector-backup.service -f  # watch its output
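A dry-run flag is easiest to keep honest when every state-changing command goes through a single function. A minimal sketch of that pattern; DRY_RUN and run are hypothetical names, and the mkdir target is a throwaway path for the demo.

```shell
#!/usr/bin/env bash
# Dry-run pattern: every state-changing command goes through run().
set -euo pipefail

DRY_RUN=0
for arg in "$@"; do
    case "$arg" in
        -d|--dry-run) DRY_RUN=1 ;;
    esac
done

run() {
    if (( DRY_RUN )); then
        # %q quotes each argument so the logged command can be pasted back into a shell
        printf '[dry-run] would run:'
        printf ' %q' "$@"
        printf '\n'
    else
        "$@"
    fi
}

target="$(mktemp -d)/releases"
run mkdir -p "$target"       # created for real unless -d / --dry-run was given
```

Read-only commands (stat, df, rsync -n) can stay outside run(), so a dry run on production still exercises the script's view of the environment.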

Slide 26 of 35

Cron Best Practices: Production Standards

Rules that separate cron jobs that work reliably for years from ones that cause incidents.

1. Set SHELL, PATH, and MAILTO at the top of every crontab or cron.d file. Never assume defaults.

2. Redirect all output explicitly: >> /var/log/jobname.log 2>&1. Never let output silently disappear to sendmail or /dev/null without having tested it first.

3. Use flock for any job whose duration might exceed its schedule interval. Concurrent runs of the same job are a production hazard.

4. Never run jobs as root unless root access is truly required. Create service accounts and use the username field in /etc/cron.d/ entries.

5. Spread scheduled jobs with random offsets. Do not run 20 jobs at exactly midnight. Stagger them across the window to avoid I/O and CPU contention.

6. Store cron job scripts in version control. Deploying cron jobs by hand is untraceable. If the server is rebuilt, all manually-added jobs are lost.

7. Alert on non-zero exit codes. A script that runs but fails silently every night is worse than one that never runs, because you think it is working.
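cron has no equivalent of RandomizedDelaySec, so staggering has to be baked into the schedule. One approach, sketched under the assumption that each node has a distinct host name, is to derive a stable minute from that name; stagger_minute and the node names are hypothetical.

```shell
#!/usr/bin/env bash
# stagger_minute: hypothetical helper that maps a host name to a stable minute
# offset (0-59), so each node gets its own slot without coordination.
set -euo pipefail

stagger_minute() {
    local host="$1" sum
    # cksum prints "<crc> <byte count>"; keep the CRC
    sum="$(printf '%s' "$host" | cksum | cut -d' ' -f1)"
    echo $(( sum % 60 ))
}

# Emit one /etc/cron.d style entry per node, comment on its own line
for h in node-01 node-02 node-03; do
    printf '# %s\n%s 2 * * *  backup  /usr/local/bin/backup.sh\n' \
        "$h" "$(stagger_minute "$h")"
done
```

The offset is deterministic, so regenerating the cron file from version control produces the same schedule every time.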

Slide 27 of 35 | Applied Automation

Applied: Complete Backup Automation System

Combining rsync, systemd timer, flock, logging, and verification into a single coherent system.

sector-backup.service

Type=oneshot. User=backup. ExecStart=/usr/local/bin/sector-backup.sh. StandardOutput and StandardError both to journal. SyslogIdentifier=sector-backup. MemoryMax=1G. CPUQuota=40%.

sector-backup.timer

OnCalendar=*-*-* 02:30:00. Persistent=true (catch missed runs). RandomizedDelaySec=5min (prevent multi-node thundering herd). WantedBy=timers.target.

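# sector-backup.sh: the actual work, called by the service
TODAY="$(date +%F)"
# flock needs either a command to run or an open file descriptor
exec 200>/var/lock/sector-backup.lock
flock -n 200 || { logger -t sector-backup "already running"; exit 0; }
rsync -av --delete --link-dest=/mnt/backup/latest \
    /opt/sector/ "/mnt/backup/snapshots/${TODAY}/"
touch "/mnt/backup/snapshots/${TODAY}"   # rsync -a copied the source mtime; reset it for pruning
ln -sfn "/mnt/backup/snapshots/${TODAY}" /mnt/backup/latest
find /mnt/backup/snapshots -mindepth 1 -maxdepth 1 -type d -mtime +30 -exec rm -rf {} +

# Verification: OnUnitActiveSec= counts from the last activation of the unit the
# timer itself starts, so it cannot trail a different service. Either give
# sector-backup-verify.timer its own OnCalendar= slot (for example 03:30), or
# chain it from sector-backup.service with OnSuccess=sector-backup-verify.service
# (systemd 249 or newer).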

# Monitor: check last run time from systemd
systemctl show sector-backup.service --property=ExecMainExitTimestamp

Slide 28 of 35

rsync Daemon Mode: Pull-Based Replication

When SSH is not available or you need high-performance native rsync protocol, run rsyncd on the destination.

# /etc/rsyncd.conf on the destination/backup server
uid = backup
gid = backup
use chroot = yes
max connections = 4
log file = /var/log/rsyncd.log
pid file = /var/run/rsyncd.pid

[sector-data]
    path = /mnt/backup/sector
    comment = Sector node data backup
    read only = no
    hosts allow = 10.0.1.0/24
    auth users = sector-backup
    secrets file = /etc/rsyncd.secrets
    # /etc/rsyncd.secrets: sector-backup:s3cr3tpassword  (chmod 600)

# Client: rsync to daemon (use :: double colon for daemon protocol)
rsync -av --password-file=/etc/sector/rsync.pass \
    /opt/sector/ backup-node::sector-data/

# Start rsyncd as a systemd service
systemctl enable --now rsync

# Or: launch from xinetd for on-demand connections (legacy environments)
# Or: run with --daemon flag for standalone operation
rsync --daemon --config=/etc/rsyncd.conf

Slide 29 of 35

inotifywait: Event-Driven Automation

React to filesystem events in real time without polling. When a file changes, act immediately.

# Install: apt install inotify-tools

# Wait for a single event (blocking)
inotifywait -e close_write /etc/nginx/nginx.conf
# Blocks until the file is written and closed
systemctl reload nginx

# Watch a directory recursively for any file modification
inotifywait -m -r -e modify,create,delete /etc/sector/

# Event-driven deployment: sync whenever configs change
# The timestamp format must not contain spaces, or read would split it across ts and path
inotifywait -m -r -e close_write \
    --format '%T %w%f' --timefmt '%FT%T' \
    /etc/sector/configs/ | while read -r ts path; do
        echo "[$ts] Changed: $path -- redeploying"
        /usr/local/bin/deploy-config.sh "$path"
done

# Common events to watch for
# modify      -- file content changed
# create      -- new file created
# delete      -- file removed
# moved_to    -- file moved into watched dir
# close_write -- file closed after writing (safer than modify for deployments)
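
A recursive watch also reports editor swap and backup files, which would trigger needless redeploys in a loop like the one above. The filter below is a minimal sketch: the function name is_deployable and the pattern list are assumptions, not part of inotify-tools.

```shell
#!/usr/bin/env bash
# is_deployable PATH: return 0 for real config files, 1 for editor debris.
# The pattern list is an illustrative assumption; extend it for your editors.
is_deployable() {
    local name="${1##*/}"      # strip the directory part without spawning basename
    case "$name" in
        .*.sw?)  return 1 ;;   # vim swap files such as .node.conf.swp
        *~)      return 1 ;;   # backup copies such as node.conf~
        .\#*)    return 1 ;;   # emacs lock files such as .#node.conf
        \#*\#)   return 1 ;;   # emacs autosave files such as #node.conf#
        *.tmp)   return 1 ;;   # generic temporary files
        *)       return 0 ;;
    esac
}
```

Inside the while loop, a line such as is_deployable "$path" || continue placed before the deploy call skips these events.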

Slide 30 of 35

Performance Automation: Scheduled Tuning Scripts

Some performance parameters are workload-dependent. Automate adjustment based on observed load.

#!/usr/bin/env bash
# adaptive-tuner.sh -- adjust kernel parameters based on observed load
set -euo pipefail

NCPU="$(nproc)"
# Compute in awk: shell arithmetic is integer-only, so int(load) would truncate 0.9 to 0
LOAD_PER_CPU="$(awk -v n="$NCPU" '{print int($1 * 100 / n)}' /proc/loadavg)"   # 1-min load as % of CPU count

if (( LOAD_PER_CPU > 150 )); then
    # High load: deepen the request queue and pick a throughput-oriented scheduler
    echo 256 > /sys/block/sda/queue/nr_requests
    echo "mq-deadline" > /sys/block/sda/queue/scheduler
    logger -t adaptive-tuner "High load (${LOAD_PER_CPU}%) -- throughput mode"
elif (( LOAD_PER_CPU < 30 )); then
    # Low load: reduce latency for interactive workloads
    echo 64 > /sys/block/sda/queue/nr_requests
    echo "bfq" > /sys/block/sda/queue/scheduler
    logger -t adaptive-tuner "Low load (${LOAD_PER_CPU}%) -- latency mode"
fi

# Schedule this every 5 minutes via cron or systemd timer
# The line below is in /etc/cron.d format, which includes the user field (root)
# */5 * * * *  root  flock -n /var/lock/tuner.lock /usr/local/bin/adaptive-tuner.sh

Slide 31 of 35

Alerting: Email, Webhook, and Slack from Scripts

Automation is only as good as its alerting. Scripts must notify humans when action is required.

# Email via mail command (requires postfix/sendmail configured)
mail -s "[ALERT] Backup failed on $(hostname)" ops@sector.local <<'EOF'
Backup script failed at 02:31.
Last successful backup: check /var/log/backup.log
Action required: verify /mnt/backup is mounted and accessible
EOF

# Webhook alert (Slack, PagerDuty, Mattermost) via curl
send_alert() {
    local msg="$1"
    curl -sf -X POST "${WEBHOOK_URL}" \
        -H 'Content-Type: application/json' \
        -d "{\"text\": \"[${HOSTNAME}] ${msg}\"}" >/dev/null
}

# Integrate alert into any script
run_backup || send_alert "Backup FAILED -- exit code $?"

# Throttle alerts: write a timestamp file, alert only if not alerted recently
ALERT_FILE="/var/run/backup-alert.ts"
LAST="$(cat "$ALERT_FILE" 2>/dev/null || echo 0)"
NOW="$(date +%s)"
if (( NOW - LAST > 3600 )); then   # alert at most once per hour
    send_alert "Backup FAILED"
    echo "$NOW" > "$ALERT_FILE"
fi
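
The send_alert function above builds JSON by string interpolation, so a message containing a double quote or a newline would produce an invalid payload. One way to guard against that is a small escaping helper. This is a sketch under stated assumptions: json_escape is a hypothetical name, and it covers only backslash, double quote, newline, carriage return, and tab.

```shell
#!/usr/bin/env bash
# json_escape STRING: print STRING with JSON string escapes applied.
# Other control characters are passed through unchanged, which is an
# accepted limitation of this sketch.
json_escape() {
    local s="$1"
    s=${s//\\/\\\\}        # backslash first, so later escapes are not doubled
    s=${s//\"/\\\"}        # double quote
    s=${s//$'\n'/\\n}      # newline
    s=${s//$'\r'/\\r}      # carriage return
    s=${s//$'\t'/\\t}      # tab
    printf '%s' "$s"
}
```

In send_alert, the payload line would then read -d "{\"text\": \"[${HOSTNAME}] $(json_escape "$msg")\"}". Where jq is installed, building the payload with jq is the more robust choice.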

Slide 32 of 35

Automation Inventory: Know What Runs on Your System

Most production systems accumulate years of scheduled jobs from many different owners. Audit them before you inherit an incident.

# Comprehensive cron audit: all sources on the system
echo "=== /etc/crontab ==="; cat /etc/crontab

echo "=== /etc/cron.d/ ==="; ls -la /etc/cron.d/ && cat /etc/cron.d/*

echo "=== User crontabs ==="
for user in $(cut -d: -f1 /etc/passwd); do   # unquoted on purpose: one word per user (run as root)
    CTAB="$(crontab -u "$user" -l 2>/dev/null)"
    [[ -n "$CTAB" ]] && { echo "-- $user --"; echo "$CTAB"; }
done

echo "=== /etc/cron.{hourly,daily,weekly,monthly} ==="
for dir in hourly daily weekly monthly; do
    echo "-- cron.$dir --"
    ls -la /etc/cron."$dir"/ 2>/dev/null
done

echo "=== systemd timers ==="; systemctl list-timers --all

Slide 33 of 35

Troubleshooting: Why Is the Cron Job Not Running?

A systematic diagnostic process for cron jobs that silently fail to execute.

1. Check that cron is running: systemctl status cron. If cron is dead, nothing runs.

2. Check syslog for execution evidence: grep CRON /var/log/syslog | tail -20. Cron logs every execution attempt, so a missing entry means the job was never started.

3. Verify the script is executable: ls -la /usr/local/bin/myscript.sh. The x bit must be set.

4. Run the script manually as the cron user with cron's PATH: sudo -u cronuser env -i PATH=/usr/bin:/bin SHELL=/bin/bash /path/to/script.sh

5. Check MAILTO: cron mails job output to the MAILTO address, or to the crontab owner if MAILTO is unset. If no mail transfer agent is installed, that output is silently discarded. Set MAILTO="" and redirect output to a log file instead.

6. Check /etc/cron.allow and /etc/cron.deny. If cron.allow exists and the user is not in it, the user cannot run cron jobs.
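
Step 6 can be sketched as a small function that applies the allow/deny precedence. The function name cron_access and its optional file arguments are assumptions added so the logic can be exercised against test files. The "neither file exists" branch assumes the Debian/Ubuntu default of allowing all users, and special handling of root is not modeled.

```shell
#!/usr/bin/env bash
# cron_access USER [ALLOW_FILE] [DENY_FILE]
# Prints "allowed" or "denied" using this precedence:
#   1. cron.allow exists       -> only users listed in it may use cron
#   2. otherwise cron.deny exists -> users listed in it are denied
#   3. neither exists          -> allowed (assumed Debian/Ubuntu default)
cron_access() {
    local user="$1"
    local allow="${2:-/etc/cron.allow}" deny="${3:-/etc/cron.deny}"

    if [[ -f "$allow" ]]; then
        # -x matches whole lines, -F treats the name as a fixed string
        if grep -qxF -- "$user" "$allow"; then echo allowed; else echo denied; fi
    elif [[ -f "$deny" ]]; then
        if grep -qxF -- "$user" "$deny"; then echo denied; else echo allowed; fi
    else
        echo allowed
    fi
}
```

Calling cron_access cronuser with no file arguments checks the real /etc/cron.allow and /etc/cron.deny.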

Slide 34 of 35

Scheduler Comparison: Pick the Right Tool

cron, systemd timers, at, anacron, and inotifywait each solve a distinct scheduling problem.

cron

Recurring jobs on servers that run 24/7. Universal -- on every Linux system. Simple syntax. Use for jobs that have been working well for years and do not need advanced features.

systemd timers

New recurring jobs on systemd systems. Better logging (journald), missed-run catching, resource limits, dependency ordering. The default choice for anything new on Ubuntu 22.04+.

at / batch

One-shot future execution. Use at for scheduled maintenance windows or deferred deployments. Use batch for heavy jobs you want to run during low-load periods without picking an exact time.

anacron

Systems that are not always on. Laptops, intermittently-running VMs, dev workstations. Catches up on missed daily/weekly/monthly jobs at next boot.

inotifywait

Event-driven, not time-driven. Use when you need to react immediately when a file changes. Config-reload triggers, hot-deploy systems, file-drop processing pipelines.

Slide 35 of 35 | ALA-06 Summary

Linux Automation: What You Now Know

A system that relies on humans for recurring tasks is a liability. Every cell operator is now equipped to build automation that runs reliably in the background, catches missed executions, alerts when things go wrong, and leaves detailed logs for every operation.

8 Facts to Carry Out of This Lecture

1. Cron fields: minute, hour, day-of-month, month, day-of-week. Set PATH and SHELL at the top of every crontab.

2. Cron does not source shell profiles. Commands not in /usr/bin:/bin need absolute paths or an explicit PATH setting.

3. systemd timers: pair a .service with a .timer. Use Persistent=true to catch missed runs. Validate schedules with systemd-analyze calendar.

4. Prefixing a command with flock -n /var/lock/job.lock prevents concurrent execution when a job runs longer than its interval.

5. rsync --link-dest creates full-snapshot incremental backups using hard links. Storage cost is proportional to change rate only.

6. logrotate prevents disk exhaustion. Use postrotate to reload daemons. Use copytruncate only when you cannot reload the writing process.

7. logger -t tag -p facility.priority 'message' writes to syslog from any script. Enables central log collection and monitoring.

8. Run automated jobs in dry-run mode first. Test in cron's exact environment. Alert on non-zero exit codes. Review logs after the first real runs.
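
Facts 4, 7 and 8 combine naturally into one wrapper: lock, log, and surface the exit code. The sketch below is one way to do that, not a prescribed implementation. The names run_job, _job_log and the LOCK_DIR variable are assumptions, and the stderr fallback exists only for environments without a syslog socket.

```shell
#!/usr/bin/env bash
# _job_log TAG PRIORITY MESSAGE: write to syslog, or to stderr if logger fails.
_job_log() {
    logger -t "$1" -p "user.$2" -- "$3" 2>/dev/null || echo "$1: $3" >&2
}

# run_job NAME COMMAND [ARGS...]
# Runs COMMAND under an exclusive non-blocking lock, logs start and result,
# and returns COMMAND's exit code so the caller can alert on failure.
# A run that is skipped because the lock is held returns 0.
run_job() {
    local name="$1"; shift
    local lock="${LOCK_DIR:-/var/lock}/${name}.lock"
    (
        flock -n 9 || { _job_log "$name" notice "already running, skipping"; exit 0; }
        _job_log "$name" info "started"
        rc=0
        "$@" || rc=$?
        if (( rc == 0 )); then
            _job_log "$name" info "finished OK"
        else
            _job_log "$name" err "FAILED with exit code $rc"
        fi
        exit "$rc"
    ) 9>"$lock"    # the lock is released when the subshell closes descriptor 9
}
```

A job script could then call run_job sector-backup /usr/local/bin/sector-backup.sh and follow it with an alert on any non-zero return.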