Linux Automation | Advanced Linux Administration

Slide 1 of 35  |  ALA-06  |  Week 3 of 8
Linux Automation
Scheduling, Backups and Monitoring
Cron  •  systemd Timers  •  logrotate  •  rsync  •  at/batch  •  Monitoring Scripts
Manual intervention is a liability. Every task that runs on a schedule, every backup that fires at 02:00, every log that rotates before it fills the disk -- these run without a human. This lecture teaches you to build automation that is reliable, observable, and recoverable.
35 Slides  |  ALA-06  |  Week 3 of 8  |  Ubuntu 22.04 LTS
Slide 2 of 35
The Automation Principle
Automate anything you do more than twice. But automate it correctly, or you have built a time bomb.
Idempotency
An automated task must be safe to run multiple times. If it fails halfway and retries, it should not corrupt state. Design every automated operation to check before acting: create only if not exists, deploy only if changed.
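A minimal sketch of the check-before-act pattern, assuming a POSIX-style shell. The helper names ensure_line and ensure_dir are hypothetical and exist only for this illustration. Each helper inspects the current state first, so running it a second time changes nothing.

```shell
#!/usr/bin/env bash
# Idempotent helpers (hypothetical names): check state before changing it.

# ensure_line FILE LINE: append LINE only if an identical line is not present.
ensure_line() {
    local file="$1" line="$2"
    [ -f "$file" ] || : > "$file"
    # -q quiet, -x whole-line match, -F fixed string (no regex)
    if ! grep -qxF -- "$line" "$file"; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# ensure_dir DIR: create DIR only if it does not exist yet.
ensure_dir() {
    [ -d "$1" ] || mkdir -p "$1"
}
```

Calling ensure_line twice with the same arguments leaves exactly one copy of the line in the file, which is the property a retried job relies on.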
Observability
Automation that runs silently cannot be debugged. Every scheduled job must log its start time, completion status, and any errors. If you cannot answer "did that job run successfully last Tuesday?", it is not observable.
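One way to get start time, completion status, and errors into a log is a small wrapper around the job. The sketch below is an assumption about how that could look; run_logged is a hypothetical name, not a standard tool.

```shell
#!/usr/bin/env bash
# run_logged NAME LOGFILE CMD [ARGS...]  (hypothetical helper)
# Records a START line, the command's output, and an END line with the exit status.
run_logged() {
    local name="$1" logfile="$2" rc=0
    shift 2
    printf '%s %s START\n' "$(date '+%F %T')" "$name" >> "$logfile"
    # Capture stdout and stderr; remember the exit status without aborting
    "$@" >> "$logfile" 2>&1 || rc=$?
    printf '%s %s END exit=%d\n' "$(date '+%F %T')" "$name" "$rc" >> "$logfile"
    return "$rc"
}
```

With this in place, "did that job run successfully last Tuesday?" becomes a grep for the job name and exit=0 in the log file.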
Failure Isolation
A failed automated task should never cascade. Backups failing should not prevent the next night's backup attempt. Log rotation failing should not fill disks for weeks. Each run is independent, and failures are reported immediately.
[Diagram: choosing a tool for the task. One-shot: at / batch. Recurring: cron / systemd timer. Event-driven: inotifywait.]
Two Automation Tools on Linux
cron is the classic Unix scheduler -- simple, universal, available everywhere. systemd timers are the modern replacement -- better logging, dependency awareness, and missed-run handling. Know both. Use timers for new work on systemd systems.
Slide 3 of 35
Cron: The Five-Field Schedule
Every cron expression is five fields followed by the command. Master the field order and you can schedule anything.
MIN    HOUR   DOM    MON    DOW    COMMAND
0-59   0-23   1-31   1-12   0-7
*      *      *      *      *      /path/to/command
# Field positions:
# .--------- minute (0-59)
# | .------- hour (0-23)
# | | .----- day of month (1-31)
# | | | .--- month (1-12 or jan-dec)
# | | | | .- day of week (0-7 or sun-sat; 0 and 7 = Sunday)
# | | | | |
# * * * * * command

# Every minute
* * * * * /usr/local/bin/heartbeat.sh

# Every day at 02:30
30 2 * * * /usr/local/bin/backup.sh

# Every Monday at 04:00
0 4 * * 1 /usr/local/bin/weekly-report.sh

# Every 15 minutes
*/15 * * * * /usr/local/bin/check-disk.sh

# First day of every month at 00:00
0 0 1 * * /usr/local/bin/monthly-cleanup.sh

# Weekdays (Mon-Fri) at 08:00
0 8 * * 1-5 /usr/local/bin/workday-start.sh

# Multiple specific hours: 06:00, 12:00, 18:00
0 6,12,18 * * * /usr/local/bin/sync.sh
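To make the field syntax concrete, here is a rough sketch of how a single field could be checked against a value. cron_field_matches is a hypothetical helper, not part of cron. It covers only numeric forms (*, */n, a-b, a-b/n, and comma lists); month and day names and the 7 = Sunday alias are left out, and values must be plain integers without leading zeros.

```shell
#!/usr/bin/env bash
# cron_field_matches FIELD VALUE [MIN]  (hypothetical helper)
# Returns 0 when VALUE satisfies FIELD. MIN is the lowest legal value for the
# field (0 for minute/hour, 1 for day-of-month/month); it anchors "*/n" steps.
cron_field_matches() {
    local field="$1" value="$2" min="${3:-0}"
    local rest part range step lo hi
    rest="$field,"
    while [ -n "$rest" ]; do
        part="${rest%%,*}"      # next comma-separated element
        rest="${rest#*,}"
        step=1
        range="$part"
        case "$part" in
            */*) range="${part%%/*}"; step="${part#*/}" ;;
        esac
        case "$range" in
            '*') lo="$min"; hi="$value" ;;
            *-*) lo="${range%-*}"; hi="${range#*-}" ;;
            *)   lo="$range"; hi="$range" ;;
        esac
        if [ "$value" -ge "$lo" ] && [ "$value" -le "$hi" ] \
           && [ $(( (value - lo) % step )) -eq 0 ]; then
            return 0
        fi
    done
    return 1
}
```

For example, cron_field_matches '*/15' 30 succeeds and cron_field_matches '1-5' 6 fails, which mirrors the "every 15 minutes" and "weekdays" entries above.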
Slide 4 of 35
Crontab: User vs System vs /etc/cron.d
Cron jobs can live in multiple locations. Know which to use and why.
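[Diagram: crond reads three sources: user crontabs (crontab -e, stored under /var/spool/cron/), system drop-ins in /etc/cron.d/, and run-parts scripts in cron.daily/ and the other interval directories.]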
User Crontabs (crontab -e)
Stored in /var/spool/cron/crontabs/USER. Run as that user. No username field in the schedule. Edit with crontab -e -- never edit the spool file directly. List with crontab -l. Remove all with crontab -r.
/etc/cron.d/ Drop-ins
System cron jobs. Files placed here follow the system crontab format: they have a username field after the time fields. Packages install their cron jobs here. Maintained under version control and deployed as files.
/etc/cron.hourly|daily|weekly|monthly
Place executable scripts directly in these directories. The run-parts mechanism executes them at the corresponding interval. No time syntax needed. Filenames must not contain dots. Good for simple, frequency-based tasks.
# /etc/cron.d/sector-backups -- system cron job with username field
# Format: minute hour dom month dow USERNAME command
30 2 * * * root /usr/local/bin/sector-backup.sh >> /var/log/backup.log 2>&1

# Check when run-parts will execute scripts in cron.daily
grep -r 'run-parts' /etc/crontab /etc/cron.d/

# View all cron jobs for a user
crontab -u deploy -l

# Validate cron syntax before saving (crontab -e opens $EDITOR)
# Use https://crontab.guru/ or: man 5 crontab
Slide 5 of 35
Cron Environment: The Most Common Source of Failures
Cron does not run your shell profile. Commands that work interactively often fail silently in cron.
Minimal PATH
Cron's default PATH is typically /usr/bin:/bin. Commands in /usr/local/bin, /usr/sbin, or user-specific directories are NOT found. Always use absolute paths or set PATH explicitly at the top of the crontab.
No Profile, No Aliases
Cron does not source ~/.bashrc, ~/.profile, or /etc/profile. Environment variables you set interactively do not exist. Functions and aliases do not exist. Scripts that depend on them will fail silently.
# Top of crontab: set environment variables explicitly.
# Comments must sit on their own lines: crontab does not allow a trailing
# comment after a variable assignment or a command.
SHELL=/bin/bash
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
# Email job output to this address; set MAILTO="" to silence
MAILTO=ops@sector.local
HOME=/root

# Always redirect output in the crontab entry itself
0 3 * * * /usr/local/bin/backup.sh >> /var/log/backup.log 2>&1

# Suppress all output (use only when you know it works)
*/5 * * * * /usr/local/bin/heartbeat.sh >/dev/null 2>&1

# Debug a failing cron job: run it with cron's environment manually
env -i HOME=/root PATH=/usr/bin:/bin SHELL=/bin/bash /usr/local/bin/backup.sh
Root Cause of Most Cron Failures
A script works in your terminal but not in cron. The difference is almost always PATH. Set PATH at the top of every crontab file and use absolute paths in every script called by cron.
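A script can also defend itself against a minimal PATH by checking its dependencies before doing any work. The sketch below assumes a hypothetical helper named require_cmds; it is an illustration, not an existing tool.

```shell
#!/usr/bin/env bash
# require_cmds CMD...  (hypothetical helper)
# Reports every command that cannot be found on the current PATH and
# returns 1 if at least one is missing.
require_cmds() {
    local missing=0 cmd
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing required command: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Near the top of a cron-run script, a line such as require_cmds rsync flock logger || exit 1 turns a silent PATH failure into an explicit message in the job log.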
Slide 6 of 35
systemd Timers: The Modern Scheduler
systemd timers replace cron with dependency awareness, missed-run catching, and journald logging.
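[Diagram: the .timer unit (OnCalendar=, Persistent=true) activates the .service unit (ExecStart=), which runs script.sh; the exit status (0 or 1) is recorded and output is logged to journald (journalctl -u svc).]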
Advantages Over cron
Missed runs are caught at next boot (Persistent=true). Full journald logging -- output captured automatically. Dependencies can delay execution. Precise calendar and relative scheduling. Easy enable/disable with systemctl.
Unit Pair
A timer always pairs with a service unit. The .timer unit defines when to run. The .service unit defines what to run. They share the same base name: backup.timer activates backup.service.
Two Schedule Types
Realtime (calendar) timers fire at specific wall clock times, like cron. Monotonic timers fire relative to an event: OnBootSec=5min fires 5 minutes after boot. Both types can be combined in one timer unit.
# List all active timers and their next trigger time
systemctl list-timers --all

# Enable and start a timer
systemctl enable --now backup.timer

# Check timer status and last run time
systemctl status backup.timer

# View output from the last run of the associated service
journalctl -u backup.service -n 50
Slide 7 of 35
Writing a systemd Timer Unit
Two files: a service unit that does the work, and a timer unit that schedules it.
[Diagram: backup.timer ([Timer], OnCalendar=) and backup.service ([Service], ExecStart=) share the same base name; deploy with systemctl daemon-reload, then systemctl enable --now.]
# /etc/systemd/system/sector-backup.service
[Unit]
Description=Sector Node Backup
After=network-online.target
Wants=network-online.target

[Service]
Type=oneshot
User=backup
ExecStart=/usr/local/bin/sector-backup.sh
StandardOutput=journal
StandardError=journal
# SyslogIdentifier makes filtering in journalctl easy
SyslogIdentifier=sector-backup

# /etc/systemd/system/sector-backup.timer
[Unit]
Description=Daily sector backup at 02:30

[Timer]
# Calendar schedule (same expressive power as cron, clearer syntax)
OnCalendar=*-*-* 02:30:00
# Catch up on missed runs (e.g., server was off at 02:30)
Persistent=true
# Randomize within 15 minutes to avoid thundering herd on multiple nodes
RandomizedDelaySec=15min

[Install]
WantedBy=timers.target

# Deploy and activate
systemctl daemon-reload
systemctl enable --now sector-backup.timer
systemctl list-timers sector-backup.timer
Slide 8 of 35
Timer Calendar Syntax
systemd calendar expressions are more readable than cron -- and can be validated before deployment.
cron              systemd calendar
0 0 * * *         daily
0 0 * * 1         Mon *-*-* 00:00:00
*/15 * * * *      *:0/15
# Validate any calendar expression (prints the normalized form and the next trigger time)
systemd-analyze calendar '*-*-* 02:30:00'
systemd-analyze calendar 'Mon *-*-* 08:00:00'
systemd-analyze calendar 'weekly'

# Common calendar expressions.
# The trailing "# ..." notes are slide annotations: inside a real unit file, put
# comments on their own lines and leave the values unquoted.
OnCalendar=daily                  # *-*-* 00:00:00
OnCalendar=weekly                 # Mon *-*-* 00:00:00
OnCalendar=monthly                # *-*-01 00:00:00
OnCalendar=hourly                 # *-*-* *:00:00
OnCalendar=*:0/15                 # every 15 minutes
OnCalendar=Mon..Fri *-*-* 09:00   # weekdays at 09:00
OnCalendar=*-*-1,15 02:00         # 1st and 15th of every month

# Monotonic (relative) triggers
OnBootSec=2min        # 2 minutes after boot
OnUnitActiveSec=1h    # 1 hour after the last activation
OnStartupSec=30s      # 30 seconds after systemd started
systemd-analyze calendar
Always run systemd-analyze calendar 'your expression' before committing a timer unit. By default it prints the normalized expression and the next trigger time; add --iterations=10 to list the next ten, so you can confirm the schedule is exactly what you intended before it goes near production.
Slide 9 of 35
logrotate: Preventing Disk Exhaustion
logrotate manages log files automatically: rotating, compressing, removing old files, and reloading daemons.
How It Works
logrotate runs daily (via cron or systemd timer). It reads config files in /etc/logrotate.d/. For each configured log, it renames the current file, creates a new empty one, optionally compresses older copies, and deletes files beyond the retention count.
Daemon Reload
Many daemons hold a file descriptor open to their log. After rotation, they would keep writing to the renamed file. A postrotate script sends a signal (often SIGHUP) or a reload request so the daemon reopens its log file. The sharedscripts directive makes that script run once per config block instead of once per rotated log.
# /etc/logrotate.d/sector-app
/var/log/sector/app.log
/var/log/sector/worker.log {
    daily                       # rotate frequency
    rotate 14                   # keep 14 rotated files
    compress                    # compress rotated files with gzip
    delaycompress               # compress the previous rotation, not the current
    missingok                   # do not error if log file does not exist
    notifempty                  # skip rotation if the file is empty
    create 0640 sectorapp adm   # permissions and owner for new log file
    dateext                     # add date to rotated filename: app.log-20260409
    sharedscripts               # run postrotate once for all logs in this block
    postrotate
        systemctl reload sector-app 2>/dev/null || true
    endscript
}
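The rename, create, and prune sequence that logrotate performs can be sketched in a few lines of shell. This is an illustration of the idea only, using numbered suffixes; rotate_log is a hypothetical function, and real logrotate additionally handles compression, date suffixes, ownership, and state tracking.

```shell
#!/usr/bin/env bash
# rotate_log FILE KEEP  (hypothetical helper, illustration only)
# Shifts FILE.1 -> FILE.2 -> ... keeping at most KEEP old copies,
# then moves FILE to FILE.1 and creates a new empty FILE.
rotate_log() {
    local file="$1" keep="$2" i
    [ -f "$file" ] || return 0      # same idea as missingok
    [ -s "$file" ] || return 0      # same idea as notifempty
    rm -f "$file.$keep"             # drop the oldest copy
    i=$((keep - 1))
    while [ "$i" -ge 1 ]; do
        if [ -f "$file.$i" ]; then
            mv "$file.$i" "$file.$((i + 1))"
        fi
        i=$((i - 1))
    done
    mv "$file" "$file.1"
    : > "$file"                     # new empty log
}
```

A daemon that still holds the old file descriptor keeps writing to FILE.1 after this runs, which is exactly why the postrotate reload step exists.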
Slide 10 of 35
logrotate: Advanced Configuration
Size-based rotation, copytruncate for open file handles, and manual testing.
# Size-based rotation (not time-based)
/var/log/nginx/access.log {
    size 100M          # rotate if the file exceeds 100 MB when logrotate runs
    rotate 5
    compress
    copytruncate       # copy then truncate -- no daemon reload needed
    # WARNING: copytruncate has a brief race window -- use postrotate when possible
}

# Hourly rotation for high-volume logs.
# logrotate itself must be invoked hourly for this to take effect;
# the default schedule runs it once a day.
/var/log/sector/audit.log {
    hourly
    rotate 168         # 168 hours = 7 days of hourly files
    compress
    delaycompress
    missingok
    notifempty
    postrotate
        kill -USR1 $(cat /var/run/sector-audit.pid 2>/dev/null) 2>/dev/null || true
    endscript
}
# Test logrotate config without actually rotating
logrotate --debug /etc/logrotate.d/sector-app

# Force rotation immediately (for testing postrotate scripts)
logrotate --force /etc/logrotate.d/sector-app

# Check logrotate status (last run times per log)
grep sector /var/lib/logrotate/status
Slide 11 of 35
rsync: Efficient File Synchronization
rsync transfers only what changed. It is the backbone of Linux backup, deployment, and replication workflows.
# Core flags you will use on every rsync call
# -a            archive mode: -rlptgoD (recursive, preserve links/perms/times/owner/group/devices)
# -v            verbose (add to see what was transferred)
# -z            compress in transit (useful on slow links)
# --delete      remove files from destination not in source (true sync)
# -n / --dry-run  show what would change without changing anything

# Local sync
rsync -av /opt/sector/ /mnt/backup/sector/

# Remote sync over SSH
rsync -avz --delete /opt/sector/ backup-node:/mnt/backup/sector/

# Dry run first -- always
rsync -avn --delete /opt/sector/ backup-node:/mnt/backup/sector/

# Show progress and transfer statistics
rsync -av --progress --stats /data/ /mnt/backup/data/

# Exclude directories
rsync -av --exclude='*.tmp' --exclude='.git/' /opt/sector/ /mnt/backup/sector/

# Bandwidth limit (useful for background sync, units: KB/s)
rsync -avz --bwlimit=5000 /data/ backup-node:/backup/data/
Slide 12 of 35
rsync Backup Patterns: Incremental with --link-dest
The --link-dest pattern creates full backup snapshots at incremental cost using hard links.
You want 30 days of daily backups, but storing 30 full copies is prohibitively expensive. The --link-dest pattern solves this: each backup appears to be a full copy, but unchanged files are hard links to the previous backup. Storage cost is proportional to change rate, not total size.
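#!/usr/bin/env bash
# incremental-backup.sh -- daily snapshot backup with hard-link deduplication
set -euo pipefail

SRC="/opt/sector"
DEST="/mnt/backup"
TODAY="$(date +%Y-%m-%d)"
SNAPSHOT="${DEST}/snapshots/${TODAY}"
LATEST="${DEST}/latest"

mkdir -p "$SNAPSHOT"

# --link-dest: files unchanged since LATEST are hard-linked, not copied.
# On the very first run LATEST does not exist yet; rsync warns and copies everything.
rsync -av --delete \
    --link-dest="${LATEST}" \
    "${SRC}/" "${SNAPSHOT}/"

# rsync -a copies the source directory's mtime onto the snapshot directory.
# Reset it to "now" so the age-based pruning below measures snapshot age.
touch "${SNAPSHOT}"

# Update the 'latest' symlink to point to today's snapshot
ln -sfn "${SNAPSHOT}" "${LATEST}"

# Remove snapshots older than 30 days (-mindepth 1 protects the snapshots directory itself)
find "${DEST}/snapshots" -mindepth 1 -maxdepth 1 -type d -mtime +30 -exec rm -rf {} +

echo "Backup complete: ${SNAPSHOT}"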
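The hard-link idea behind --link-dest can be shown for a single file. The sketch below is a simplification with hypothetical helper names: it compares file content with cmp, whereas rsync by default decides based on size and modification time.

```shell
#!/usr/bin/env bash
# is_same_file A B  (hypothetical helper): true when both names point at the same inode.
is_same_file() {
    [ "$1" -ef "$2" ]
}

# link_or_copy SRC PREV DEST  (hypothetical helper)
# If SRC is identical to the copy in the previous snapshot (PREV), create DEST
# as a hard link to PREV, which costs no extra data blocks. Otherwise copy SRC.
link_or_copy() {
    local src="$1" prev="$2" dest="$3"
    if [ -f "$prev" ] && cmp -s "$src" "$prev"; then
        ln "$prev" "$dest"
    else
        cp -p "$src" "$dest"
    fi
}
```

After two "snapshots" of an unchanged file, is_same_file reports that both snapshot entries are the same inode, so the second snapshot consumed only a directory entry.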
Slide 13 of 35
rsync over SSH: Key Auth and Restricted Shell
Automated rsync must authenticate without a password. Use SSH keys with forced commands for least-privilege access.
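# Generate a dedicated key for backup automation (no passphrase)
ssh-keygen -t ed25519 -C "backup-auto@sector" -f /root/.ssh/id_backup -N ""

# On the backup destination server: restrict the key to rsync only.
# ~/.ssh/authorized_keys entry (one line):
# restrict,command="rsync --server --daemon .",from="10.0.0.0/8" ssh-ed25519 AAAA... backup-auto@sector
# Note: this forced command starts rsync in daemon-over-SSH mode, so the client
# must address a module (backup-node::module) defined in an rsyncd.conf in that
# user's home directory. For plain host:/path transfers like the ones below, a
# forced command that confines rsync to one directory (for example the rrsync
# helper script shipped with rsync) is the usual choice.

# Custom SSH options for the rsync call
rsync -av -e "ssh -i /root/.ssh/id_backup -o StrictHostKeyChecking=yes -p 2222" \
    /opt/sector/ backup-node:/backup/sector/

# rsync with SSH config alias (/root/.ssh/config)
# Host backup-node
#     HostName 10.0.1.50
#     User backup
#     IdentityFile /root/.ssh/id_backup
#     Port 2222
rsync -av /opt/sector/ backup-node:/backup/sector/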
Principle of Least Privilege
The backup key should be able to do exactly one thing: run rsync on the destination. The forced command in authorized_keys ensures that even if the private key is compromised, the attacker can only run rsync -- not an interactive shell.
Slide 14 of 35
at and batch: One-Shot Deferred Execution
For tasks that need to run once in the future -- not on a recurring schedule -- at is the right tool.
[Diagram: a user submits jobs with at 02:30 (specific time) or batch (load < 1.5); atd holds them in a queue managed with atq / atrm; each job runs once and is then removed.]
at: Schedule at a Specific Time
at executes a command once at a future time. After the job runs, it is removed from the queue. The job inherits the current environment, unlike cron. Output is mailed or captured. Useful for: scheduled maintenance, deferred deployment, one-time reminders.
batch: Run When Load Allows
batch runs commands when system load drops below 1.5 (configurable). It is at with load-sensitive scheduling. Use it for resource-intensive tasks you want to defer during peak hours without specifying an exact time.
# Schedule a command at a specific time
echo '/usr/local/bin/maintenance.sh >> /var/log/maint.log 2>&1' | at 02:30

# Human-readable time expressions (the time of day comes first, then the date)
at 4pm tomorrow
at noon next friday
at now + 2 hours
at 08:00 2026-04-10

# Multi-line job via here-document (interactively: type commands, Ctrl+D to submit)
at 03:00 <<'EOF'
apt-get update -qq
apt-get upgrade -y
systemctl reboot
EOF

# View pending at jobs
atq

# View the commands in a job (job number from atq)
at -c 3

# Remove a pending job
atrm 3

# batch: defer heavy work to low-load time
echo '/usr/local/bin/reindex-db.sh' | batch
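The load condition that batch waits for can be illustrated with a small check against /proc/loadavg. This is only a sketch of the comparison, not how atd is implemented; load_below is a hypothetical helper, and the optional second argument exists so the function can be tried against a sample file.

```shell
#!/usr/bin/env bash
# load_below THRESHOLD [LOADAVG_FILE]  (hypothetical helper)
# Returns 0 when the 1-minute load average (first field) is below THRESHOLD.
load_below() {
    local threshold="$1" file="${2:-/proc/loadavg}"
    awk -v t="$threshold" '{ exit !($1 < t) }' "$file"
}
```

A script could use it as a guard, for example: load_below 1.5 && /usr/local/bin/reindex-db.sh.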
Slide 15 of 35
Monitoring Script: Disk Usage Alerting
A production-grade disk usage monitor that alerts on threshold breach and details which paths are responsible.
#!/usr/bin/env bash
# disk-monitor.sh -- alert when any filesystem exceeds threshold
set -euo pipefail

WARN_THRESHOLD=80
CRIT_THRESHOLD=90
ALERT_EMAIL="ops@sector.local"
HOST="$(hostname -f)"

# read splits each df row on whitespace; MOUNT takes the rest of the line,
# so mount points containing spaces stay intact
df --output=source,pcent,target | tail -n +2 | while read -r DEV PCT MOUNT; do
    PCT="${PCT%\%}"
    # Skip filesystems that report "-" instead of a percentage
    [[ "$PCT" =~ ^[0-9]+$ ]] || continue

    if (( PCT >= CRIT_THRESHOLD )); then
        BODY="CRITICAL: ${MOUNT} on ${HOST} is at ${PCT}% (${DEV})"$'\n'
        # || true: du exits non-zero on unreadable paths; do not abort under pipefail
        BODY+="$(du -sh "${MOUNT%/}"/* 2>/dev/null | sort -rh | head -10 || true)"
        printf '%s\n' "$BODY" | mail -s "[CRITICAL] Disk ${PCT}% on ${HOST}:${MOUNT}" "$ALERT_EMAIL"
    elif (( PCT >= WARN_THRESHOLD )); then
        logger -t disk-monitor -p local0.warning \
            "WARN: ${MOUNT} at ${PCT}% on ${DEV}"
    fi
done
Slide 16 of 35
Monitoring Script: Process Watchdog
Ensure critical processes are running and restart them if they die. A lightweight alternative to systemd for legacy processes.
#!/usr/bin/env bash
# watchdog.sh -- monitor critical processes, restart if down
set -euo pipefail

# Associative array: process name -> start command
declare -A PROCESSES
PROCESSES["nginx"]="systemctl start nginx"
PROCESSES["sector-worker"]="/opt/sector/bin/worker --daemonize"
PROCESSES["redis-server"]="systemctl start redis"

MAX_RESTARTS=3
STATE_DIR="/var/run/watchdog"
mkdir -p "$STATE_DIR"

for proc in "${!PROCESSES[@]}"; do
    STATE_FILE="${STATE_DIR}/${proc}.restarts"
    RESTARTS="$(cat "$STATE_FILE" 2>/dev/null || echo 0)"

    if ! pgrep -x "$proc" >/dev/null 2>&1; then
        if (( RESTARTS < MAX_RESTARTS )); then
            echo "[WARN] $proc down -- restarting (attempt $((RESTARTS+1)))"
            # Record the attempt first, and do not let a failed start abort the
            # loop under set -e: the remaining processes still need checking.
            echo "$((RESTARTS+1))" > "$STATE_FILE"
            eval "${PROCESSES[$proc]}" || echo "[WARN] start command for $proc failed"
        else
            echo "[CRIT] $proc down -- max restarts reached, alerting"
            logger -t watchdog -p local0.crit "$proc failed after $RESTARTS restarts"
        fi
    else
        echo 0 > "$STATE_FILE"   # reset counter when process is healthy
    fi
done
Slide 17 of 35
Monitoring Script: Network Connectivity
Check that critical endpoints are reachable and services are responding at the protocol level.
#!/usr/bin/env bash
# net-check.sh -- verify reachability and service response
set -euo pipefail

declare -A ENDPOINTS
ENDPOINTS["gateway"]="10.0.0.1:icmp"
ENDPOINTS["web-lb"]="10.0.1.10:443"
ENDPOINTS["db-primary"]="10.0.1.20:5432"
ENDPOINTS["api-health"]="https://api.sector.local/health:http"

FAIL=0   # number of failed endpoints

check_tcp() {
    local host="$1" port="$2"
    timeout 3 bash -c "echo >/dev/tcp/${host}/${port}" 2>/dev/null
}

check_http() {
    local url="$1"
    curl -sf --max-time 5 "$url" >/dev/null
}

# Count failures so the final log message reports the real number
fail() { echo "FAIL: $1"; FAIL=$((FAIL + 1)); }

for name in "${!ENDPOINTS[@]}"; do
    spec="${ENDPOINTS[$name]}"
    case "$spec" in
        *:icmp)         ping -c1 -W2 "${spec%:icmp}" >/dev/null 2>&1 || fail "$name" ;;
        https://*:http) check_http "${spec%:http}" || fail "$name" ;;
        *:*)            check_tcp "${spec%:*}" "${spec##*:}" || fail "$name" ;;
    esac
done

if (( FAIL > 0 )); then
    logger -t net-check -p local0.err "$FAIL endpoint(s) unreachable"
    exit 1
fi
echo "All endpoints OK"
Slide 18 of 35
Monitoring Script: Log Anomaly Detection
Scan logs for patterns that indicate problems and report them before users notice.
#!/usr/bin/env bash
# log-anomaly.sh -- scan recent logs for error patterns
set -euo pipefail

SINCE="1 hour ago"
PATTERNS=(
    "CRITICAL"
    "OOM killer"
    "segfault"
    "Connection refused"
    "disk quota exceeded"
    "authentication failure"
    "permission denied"
)

REPORT=""
for pattern in "${PATTERNS[@]}"; do
    # grep -c prints 0 and exits 1 when nothing matches. "|| true" keeps that
    # single 0; "|| echo 0" would produce two lines and break the arithmetic below.
    COUNT="$(journalctl --since="$SINCE" -q 2>/dev/null \
        | grep -ciF -- "$pattern" || true)"
    if (( COUNT > 0 )); then
        REPORT+="[${COUNT}x] ${pattern}\n"
        # Show the most recent 3 occurrences for context
        REPORT+="$(journalctl --since="$SINCE" -q 2>/dev/null \
            | grep -iF -- "$pattern" | tail -3 | sed 's/^/ /')\n\n"
    fi
done

if [[ -n "$REPORT" ]]; then
    echo -e "[ANOMALY REPORT] $(hostname) @ $(date)\n$REPORT" | \
        mail -s "[Log Anomaly] $(hostname)" ops@sector.local
fi
Slide 19 of 35
flock: Prevent Concurrent Cron Runs
A slow job that runs longer than its schedule interval will overlap with the next run. flock prevents this.
Your backup script takes 35 minutes to run. The cron job fires every 30 minutes. Without locking, two instances run simultaneously, fighting over the same files. flock acquires an exclusive lock: if the first run is still in progress, the second run exits immediately.
# flock approach 1: wrap in crontab
*/30 * * * * flock -n /var/lock/backup.lock /usr/local/bin/backup.sh
# -n = non-blocking (exit 1 immediately if lock not available)
# Default is blocking (wait for lock to be released)

# flock approach 2: inside the script (preferred for complex scripts)
#!/usr/bin/env bash
set -euo pipefail
LOCK_FILE="/var/lock/backup.lock"

# Open file descriptor 200 and acquire exclusive lock
exec 200>"$LOCK_FILE"
flock -n 200 || { echo "Another instance is running. Exiting."; exit 0; }

# Lock is held for the remainder of the script
# Automatically released when script exits (fd 200 closes)
echo "$BASHPID" >&200   # write PID into lock file for debugging
# ... rest of script
Slide 20 of 35
anacron: Scheduling for Non-24/7 Systems
cron misses jobs when the system is off. anacron catches up on missed daily/weekly/monthly jobs at next boot.
How anacron Works
anacron reads /etc/anacrontab and tracks the last run date of each job in /var/spool/anacron/. On startup, if a job has not run within its period, anacron runs it after the delay (in minutes) configured for that job. Giving jobs different delays keeps them from all starting at once right after boot.
Use Case
Laptops, dev workstations, VMs that are frequently suspended or shut down. Also useful for batch jobs on servers where you care that the job runs at least once per day, not necessarily at exactly 02:00.
# /etc/anacrontab format:
# period(days)  delay(minutes)  job-id  command
1   5   daily-backup     /usr/local/bin/backup.sh
7   10  weekly-report    /usr/local/bin/weekly-report.sh
30  15  monthly-cleanup  /usr/local/bin/cleanup.sh

# View last run times
ls -la /var/spool/anacron/
cat /var/spool/anacron/daily-backup   # contains date of last run

# Force anacron to run all jobs immediately (testing)
# -f ignore timestamps, -n skip the per-job delays, -d stay in foreground and print messages
anacron -f -n -d

# Check if anacron is installed and active on this system
systemctl status anacron.timer
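The "has this job run within its period?" decision can be sketched as a date comparison. This is an illustration, not anacron's actual code. It assumes the timestamp is a date in YYYYMMDD form and that GNU date (as on Ubuntu) is available; job_is_due is a hypothetical name.

```shell
#!/usr/bin/env bash
# job_is_due LAST_RUN PERIOD_DAYS [TODAY]  (hypothetical helper)
# LAST_RUN and TODAY are dates in YYYYMMDD form (assumed timestamp format).
# Returns 0 when at least PERIOD_DAYS days have passed since LAST_RUN.
# Requires GNU date for the -d option.
job_is_due() {
    local last="$1" period="$2" today="${3:-$(date +%Y%m%d)}"
    local last_s today_s
    last_s="$(date -u -d "$last" +%s)" || return 2
    today_s="$(date -u -d "$today" +%s)" || return 2
    [ $(( (today_s - last_s) / 86400 )) -ge "$period" ]
}
```

For a weekly job last run on 20260401, job_is_due 20260401 7 20260409 succeeds because eight days have passed.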
Slide 21 of 35
Backup Verification: Backups Are Worthless Until Tested
A backup that cannot be restored is not a backup. Automate verification as rigorously as the backup itself.
#!/usr/bin/env bash
# verify-backup.sh -- confirm backup integrity and restore readiness
set -euo pipefail

# The 'latest' symlink maintained by the backup script.
# The checks below assume the snapshot mirrors the filesystem layout (etc/, opt/, ...).
BACKUP_DIR="/mnt/backup/latest"
VERIFY_LOG="/var/log/backup-verify.log"
FAIL=0

log() { echo "$(date +%T) $*" | tee -a "$VERIFY_LOG"; }

# Check 1: backup directory exists and is recent
if [[ -d "$BACKUP_DIR" ]]; then
    AGE=$(( ($(date +%s) - $(stat -c %Y "$BACKUP_DIR")) / 3600 ))
    (( AGE <= 26 )) || { log "FAIL: backup is ${AGE}h old -- expected <26h"; FAIL=1; }
else
    log "FAIL: backup dir missing"; FAIL=1
fi

# Check 2: critical files are present
for f in /etc/passwd /etc/nginx/nginx.conf /opt/sector/config.json; do
    [[ -f "${BACKUP_DIR}${f}" ]] || { log "FAIL: missing ${f} in backup"; FAIL=1; }
done

# Check 3: compare checksums of key configs. A mismatch may only mean the live
# file changed since the backup, so it is logged as a warning.
if [[ -f "${BACKUP_DIR}/etc/nginx/nginx.conf" ]]; then
    LIVE_SUM="$(sha256sum < /etc/nginx/nginx.conf)"
    BACKUP_SUM="$(sha256sum < "${BACKUP_DIR}/etc/nginx/nginx.conf")"
    [[ "$LIVE_SUM" == "$BACKUP_SUM" ]] || log "WARN: checksum mismatch: nginx.conf"
fi

if (( FAIL > 0 )); then
    logger -t backup-verify -p local0.crit "Backup verification FAILED"
fi
log "Verification complete: FAIL=$FAIL"
exit "$FAIL"
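Presence and checksum checks still stop short of an actual restore. A rough sketch of a restore test follows: copy the backup into a scratch directory and compare it with a reference tree. restore_test is a hypothetical helper, and comparing against the live tree is an assumption that only holds when the data has not changed since the backup; a stored manifest would be a sturdier reference.

```shell
#!/usr/bin/env bash
# restore_test BACKUP_DIR REFERENCE_DIR  (hypothetical helper)
# Restores BACKUP_DIR into a scratch directory and compares the result with
# REFERENCE_DIR. Returns 0 on match, 1 on differences, 2 if the restore failed.
restore_test() {
    local backup="$1" reference="$2" scratch rc=0
    scratch="$(mktemp -d)" || return 2
    # The "restore": copy everything, preserving attributes
    cp -a "$backup/." "$scratch/" || rc=2
    if [ "$rc" -eq 0 ]; then
        diff -r -q -- "$scratch" "$reference" >/dev/null 2>&1 || rc=1
    fi
    rm -rf "$scratch"
    return "$rc"
}
```

Run on a schedule, a non-zero return from restore_test is a signal to alert before the backup is ever needed.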
Slide 22 of 35
Cron Security: Least Privilege for Scheduled Jobs
Running cron jobs as root is a common misconfiguration. Scope every job to the minimum required privilege.
Running Everything as Root
If a cron script contains a bug or is modified by an attacker, running as root means unrestricted system access. A backup script does not need root. A log rotation script does not need root. Audit and minimize.
Dedicated Service Accounts
Create dedicated accounts: backup, monitor, deploy. Grant them only the permissions required using sudo rules, file ACLs, or group membership. If the account is compromised, blast radius is contained.
# Create a dedicated backup user with no shell login.
# Ubuntu already ships a system account named "backup"; if useradd reports that
# the user exists, reuse that account or choose a different name.
useradd -r -s /usr/sbin/nologin -m backup

# Grant backup user read access to required directories
setfacl -R -m u:backup:rX /opt/sector
setfacl -m u:backup:rX /etc/nginx

# Allow backup user to run rsync as root without password (sudoers)
# /etc/sudoers.d/backup:
# backup ALL=(root) NOPASSWD: /usr/bin/rsync --server *

# /etc/cron.d/sector-backups (using explicit user field)
30 2 * * * backup /usr/local/bin/backup.sh >> /var/log/backup.log 2>&1

# Restrict crontab access:
# /etc/cron.allow lists permitted users
# /etc/cron.deny lists blocked users
Slide 23 of 35
systemd Timers: Advanced Patterns
Resource limits, dependencies, and transient timers for one-shot scheduled operations.
# Resource-limited service unit
# /etc/systemd/system/sector-cleanup.service
[Service]
Type=oneshot
User=cleanup-svc
ExecStart=/usr/local/bin/cleanup.sh
# CPU: maximum 20% of one core
CPUQuota=20%
# Memory: kill if it tries to use more than 512MB
MemoryMax=512M
# Nice level: low priority, yield to other processes
Nice=15
# I/O: low priority scheduling class
IOSchedulingClass=idle
# Prevent writing outside /tmp and /var/log/sector
ReadWritePaths=/var/log/sector /tmp
ProtectSystem=strict

# Transient timer: run once at a specific time, then gone (no unit file needed)
systemd-run --on-calendar='2026-04-10 03:00' \
    --unit=one-shot-reboot \
    /sbin/reboot

# Transient relative timer: 30 minutes from now
systemd-run --on-active=30m \
    --description="Deferred maintenance" \
    /usr/local/bin/maintenance.sh

# View transient jobs
systemctl list-timers
Slide 24 of 35
logger: Scripts Writing to syslog
Use logger to write structured messages from scripts into the system log, where they can be monitored and forwarded.
# Basic usage: write to syslog with a custom tag
logger -t backup "Backup started for /opt/sector"

# Specify facility and priority (facility.priority)
logger -p local0.info -t backup "Backup completed successfully"
logger -p local0.warning -t backup "Backup completed with 3 skipped files"
logger -p local0.err -t backup "Backup FAILED: rsync exited 23"
logger -p local0.crit -t backup "Backup destination unreachable"

# Structured key=value logging (machine-parseable)
logger -t sector-monitor \
    "event=disk_warn host=$(hostname) mount=/var/log usage_pct=87"

# Log stdin (pipe output of a command directly to syslog)
rsync -av /data/ /backup/ 2>&1 | logger -t rsync-backup

# See messages written by logger
journalctl -t backup -n 20
journalctl -t sector-monitor --since "1 hour ago"

# Filter by facility in rsyslog (routes local0.* to a dedicated log)
# /etc/rsyslog.d/sector.conf:
# local0.* /var/log/sector/automation.log
Slide 25 of 35
Testing Automation: Dry Runs and Staging
Never run a new automated job directly on production. Test it at every level before scheduling it.
Dry Run Mode
Build a -d / --dry-run flag into every automation script. When active, it logs what it would do without doing it. Run in dry-run mode on production first to validate the script sees the right environment.
Manual Trigger Test
Before scheduling with cron or a timer, run the script manually in the exact environment it will have: as the correct user, with the cron PATH, without your shell profile. If it works manually in that context, it will almost certainly work when scheduled; the first scheduled runs still need a log review.
Log Review
After the first few scheduled runs, review the logs in detail. Check runtime, output, exit codes. A job that succeeds but takes 10x longer than expected is a problem waiting to happen. Set up alerting on non-zero exit codes from day one.
# Run a cron job manually with cron's environment (replicate the context)
sudo -u backup env -i HOME=/home/backup \
    PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \
    SHELL=/bin/bash \
    /usr/local/bin/backup.sh

# Force a systemd timer to fire now (without waiting for schedule)
systemctl start sector-backup.service      # runs the service immediately
journalctl -u sector-backup.service -f     # watch its output
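A dry-run flag can be implemented with a single wrapper that every state-changing command goes through. The sketch below is one possible shape; the run function and the DRY_RUN variable are names chosen for this example, not an established convention of any tool.

```shell
#!/usr/bin/env bash
# DRY_RUN=1 makes run() print commands instead of executing them.
DRY_RUN="${DRY_RUN:-0}"

# run CMD [ARGS...]  (hypothetical helper)
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf 'DRY-RUN:'
        printf ' %s' "$@"
        printf '\n'
        return 0
    fi
    "$@"
}
```

A script would then write run rsync -a ... or run rm -rf ... throughout, and DRY_RUN=1 /usr/local/bin/backup.sh shows the full plan without touching anything.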
Slide 26 of 35
Cron Best Practices: Production Standards
Rules that separate cron jobs that work reliably for years from ones that cause incidents.
1. Set SHELL, PATH, and MAILTO at the top of every crontab or cron.d file. Never assume defaults.
2. Redirect all output explicitly: >> /var/log/jobname.log 2>&1. Never let output silently disappear to sendmail or /dev/null without having tested it first.
3. Use flock for any job whose duration might exceed its schedule interval. Concurrent runs of the same job are a production hazard.
4. Never run jobs as root unless root access is truly required. Create service accounts and use the username field in /etc/cron.d/ entries.
5. Spread scheduled jobs with random offsets. Do not run 20 jobs at exactly midnight. Stagger them across the window to avoid I/O and CPU contention.
6. Store cron job scripts in version control. Deploying cron jobs by hand is untraceable. If the server is rebuilt, all manually-added jobs are lost.
7. Alert on non-zero exit codes. A script that runs but fails silently every night is worse than one that never runs, because you think it is working.
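For rule 5, one variant worth considering is a deterministic stagger instead of a random one: derive the offset from the host name so each node always gets the same slot. This is a swapped-in technique, not what the rule literally prescribes, and stagger_offset is a hypothetical helper.

```shell
#!/usr/bin/env bash
# stagger_offset KEY WINDOW  (hypothetical helper)
# Prints a stable offset in the range [0, WINDOW) derived from KEY
# using the POSIX cksum utility.
stagger_offset() {
    local key="$1" window="$2" sum
    sum="$(printf '%s' "$key" | cksum | cut -d ' ' -f 1)"
    echo $(( sum % window ))
}
```

At the start of a job, sleep "$(stagger_offset "$(hostname)" 300)" spreads a fleet across a five-minute window while keeping each node's start time predictable for log review.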
Slide 27 of 35  |  Applied Automation
Applied: Complete Backup Automation System
Combining rsync, systemd timer, flock, logging, and verification into a single coherent system.
sector-backup.service
Type=oneshot. User=backup. ExecStart=/usr/local/bin/sector-backup.sh. StandardOutput and StandardError both to journal. SyslogIdentifier=sector-backup. MemoryMax=1G. CPUQuota=40%.
sector-backup.timer
OnCalendar=*-*-* 02:30:00. Persistent=true (catch missed runs). RandomizedDelaySec=5min (prevent multi-node thundering herd). WantedBy=timers.target.
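# sector-backup.sh: the actual work, called by the service
# flock needs either a command to run or an open file descriptor; use fd 200 here
exec 200>/var/lock/sector-backup.lock
flock -n 200 || { logger -t sector-backup "already running"; exit 0; }

SNAP="/mnt/backup/snapshots/$(date +%F)"
rsync -av --delete --link-dest=/mnt/backup/latest \
    /opt/sector/ "${SNAP}/"
touch "${SNAP}"   # rsync -a copied the source mtime; reset it for age-based pruning
ln -sfn "${SNAP}" /mnt/backup/latest
find /mnt/backup/snapshots -mindepth 1 -maxdepth 1 -type d -mtime +30 -exec rm -rf {} +

# Verification after the backup:
# OnUnitActiveSec= counts from the last activation of the unit the timer itself
# triggers, so OnUnitActiveSec=30m in sector-backup-verify.timer would not track
# sector-backup.service. Two options that do:
#   - add OnSuccess=sector-backup-verify.service to [Unit] in sector-backup.service
#     (systemd 249 or later), or
#   - give sector-backup-verify.timer a fixed OnCalendar= slot after the backup window.

# Monitor: check last run time from systemd
systemctl show sector-backup.service --property=ExecMainExitTimestamp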
Slide 28 of 35
rsync Daemon Mode: Pull-Based Replication
When SSH is not available or you need high-performance native rsync protocol, run rsyncd on the destination.
# /etc/rsyncd.conf on the destination/backup server
uid = backup
gid = backup
use chroot = yes
max connections = 4
log file = /var/log/rsyncd.log
pid file = /var/run/rsyncd.pid

[sector-data]
    path = /mnt/backup/sector
    comment = Sector node data backup
    read only = no
    hosts allow = 10.0.1.0/24
    auth users = sector-backup
    secrets file = /etc/rsyncd.secrets

# /etc/rsyncd.secrets: sector-backup:s3cr3tpassword (chmod 600)

# Client: rsync to daemon (use :: double colon for daemon protocol)
rsync -av --password-file=/etc/sector/rsync.pass \
    /opt/sector/ backup-node::sector-data/

# Start rsyncd as a systemd service
systemctl enable --now rsync

# Or: launch from xinetd for on-demand connections (legacy environments)
# Or: run with --daemon flag for standalone operation
rsync --daemon --config=/etc/rsyncd.conf
Slide 29 of 35
inotifywait: Event-Driven Automation
React to filesystem events in real time without polling. When a file changes, act immediately.
[Diagram: filesystem events (modify/create) reach inotifywait -m -r -e ..., whose output is piped into a while read loop that triggers the action (deploy.sh).]
# Install: apt install inotify-tools

# Wait for a single event (blocks until the file is written and closed)
inotifywait -e close_write /etc/nginx/nginx.conf
systemctl reload nginx

# Watch a directory recursively for any file modification
inotifywait -m -r -e modify,create,delete /etc/sector/

# Event-driven deployment: sync whenever configs change.
# The time format must not contain spaces, because read splits on whitespace.
inotifywait -m -r -e close_write \
    --format '%T %w%f' --timefmt '%FT%T' \
    /etc/sector/configs/ | while read -r ts path; do
    echo "[$ts] Changed: $path -- redeploying"
    /usr/local/bin/deploy-config.sh "$path"
done

# Common events to watch for
# modify      -- file content changed
# create      -- new file created
# delete      -- file removed
# moved_to    -- file moved into watched dir
# close_write -- file closed after writing (safer than modify for deployments)
Slide 30 of 35
Performance Automation: Scheduled Tuning Scripts
Some performance parameters are workload-dependent. Automate adjustment based on observed load.
#!/usr/bin/env bash
# adaptive-tuner.sh -- adjust kernel parameters based on observed load
set -euo pipefail

NCPU="$(nproc)"
# 1-minute load average as a percentage of CPU capacity.
# Computed in awk so the fractional part of the load is not truncated first.
LOAD_PER_CPU="$(awk -v n="$NCPU" '{ printf "%d", $1 * 100 / n }' /proc/loadavg)"

if (( LOAD_PER_CPU > 150 )); then
    # High load: deeper request queue for throughput.
    # Set the scheduler first; switching schedulers can reset nr_requests.
    echo "mq-deadline" > /sys/block/sda/queue/scheduler
    echo 256 > /sys/block/sda/queue/nr_requests
    logger -t adaptive-tuner "High load (${LOAD_PER_CPU}%) -- throughput mode"
elif (( LOAD_PER_CPU < 30 )); then
    # Low load: shallower queue to reduce latency for interactive workloads
    echo "bfq" > /sys/block/sda/queue/scheduler
    echo 64 > /sys/block/sda/queue/nr_requests
    logger -t adaptive-tuner "Low load (${LOAD_PER_CPU}%) -- latency mode"
fi

# Schedule this every 5 minutes via a systemd timer or /etc/cron.d (note the user field)
# */5 * * * * root flock -n /var/lock/tuner.lock /usr/local/bin/adaptive-tuner.sh
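A tuner that runs every five minutes should be idempotent: it should not rewrite a setting that is already in place. The kernel reports the active scheduler in square brackets (for example `mq-deadline kyber [bfq] none`), so the script can check before it writes. A minimal sketch; both function names are hypothetical:

```shell
#!/bin/sh
# active_scheduler "LINE"  (hypothetical helper)
# Prints the bracketed entry from the contents of a sysfs scheduler file.
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# set_scheduler DEVICE NAME  (hypothetical helper)
# Writes the scheduler only when it differs from the one already active.
set_scheduler() {
    local file="/sys/block/$1/queue/scheduler"
    if [ "$(active_scheduler "$(cat "$file")")" = "$2" ]; then
        return 0                      # already set: nothing to do
    fi
    echo "$2" > "$file"
}
```

With this in place, `set_scheduler sda bfq` is safe to call on every run and only touches sysfs when the mode actually changes.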
Slide 31 of 35
Alerting: Email, Webhook, and Slack from Scripts
Automation is only as good as its alerting. Scripts must notify humans when action is required.
# Email via mail command (requires postfix/sendmail configured)
mail -s "[ALERT] Backup failed on $(hostname)" ops@sector.local <<'EOF'
Backup script failed at 02:31.
Last successful backup: check /var/log/backup.log
Action required: verify /mnt/backup is mounted and accessible
EOF

# Webhook alert (Slack, PagerDuty, Mattermost) via curl
send_alert() {
    local msg="$1"
    curl -sf -X POST "${WEBHOOK_URL}" \
        -H 'Content-Type: application/json' \
        -d "{\"text\": \"[${HOSTNAME}] ${msg}\"}" >/dev/null
}

# Integrate alert into any script
run_backup || send_alert "Backup FAILED -- exit code $?"

# Throttle alerts: write a timestamp file, alert only if not alerted recently
ALERT_FILE="/var/run/backup-alert.ts"
LAST="$(cat "$ALERT_FILE" 2>/dev/null || echo 0)"
NOW="$(date +%s)"
if (( NOW - LAST > 3600 )); then    # alert at most once per hour
    send_alert "Backup FAILED"
    echo "$NOW" > "$ALERT_FILE"
fi
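The webhook payload is built by string interpolation, so a message containing a double quote or a backslash produces invalid JSON and the alert is lost. A minimal escaping sketch, assuming single-line messages; the function name json_escape is hypothetical:

```shell
#!/bin/sh
# json_escape STRING  (hypothetical helper)
# Escapes backslashes and double quotes so STRING can sit inside a JSON string.
# Assumes a single-line message; newlines and control characters are not handled.
json_escape() {
    printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}
```

In send_alert, the payload would then be built from `$(json_escape "$msg")` instead of the raw `${msg}`.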
Slide 32 of 35
Automation Inventory: Know What Runs on Your System
Most production systems accumulate years of scheduled jobs. Audit them before you inherit an incident.
# Comprehensive cron audit: all sources on the system (run as root)
echo "=== /etc/crontab ==="; cat /etc/crontab
echo "=== /etc/cron.d/ ==="; ls -la /etc/cron.d/ && cat /etc/cron.d/*

echo "=== User crontabs ==="
# Read one username per line; quoting the whole $(cut ...) output
# would collapse every user into a single loop iteration
while IFS=: read -r user _; do
    CTAB="$(crontab -u "$user" -l 2>/dev/null)"
    [[ -n "$CTAB" ]] && { echo "-- $user --"; echo "$CTAB"; }
done < /etc/passwd

echo "=== /etc/cron.{hourly,daily,weekly,monthly} ==="
for dir in hourly daily weekly monthly; do
    echo "-- cron.$dir --"
    ls -la /etc/cron."$dir"/ 2>/dev/null
done

echo "=== systemd timers ==="; systemctl list-timers --all
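Raw crontab output is mostly comments and blank lines, which makes a long audit hard to read. A small filter that keeps only active lines (schedule entries and variable assignments such as MAILTO) helps; this is a sketch, and the function name active_cron_lines is hypothetical:

```shell
#!/bin/sh
# active_cron_lines  (hypothetical helper)
# Reads crontab text on stdin and drops comment lines and blank lines.
active_cron_lines() {
    grep -Ev '^[[:space:]]*(#|$)'
}
```

For example, `crontab -u "$user" -l 2>/dev/null | active_cron_lines` prints only the lines cron will actually act on.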
Slide 33 of 35
Troubleshooting: Why Is the Cron Job Not Running?
A systematic diagnostic process for cron jobs that silently fail to execute.
1. Check that cron is running: systemctl status cron. If cron is dead, nothing runs.
2. Check syslog for execution evidence: grep CRON /var/log/syslog | tail -20. Cron logs every command it starts.
3. Verify the script is executable: ls -la /usr/local/bin/myscript.sh. The x bit must be set.
4. Run the script manually as the cron user with cron's minimal environment: sudo -u cronuser env -i PATH=/usr/bin:/bin SHELL=/bin/bash /path/to/script.sh (cron uses /bin/sh unless the crontab sets SHELL, so match whatever the crontab declares).
5. Check MAILTO: cron mails job output to the crontab owner, or to the MAILTO address if one is set. If no mail transfer agent is installed, that output is silently discarded. Set MAILTO="" and redirect output to a log file instead.
6. Check /etc/cron.allow and /etc/cron.deny. If cron.allow exists, only the users listed in it can use crontab; otherwise users listed in cron.deny are blocked.
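The allow/deny rule in step 6 can be sketched as a small check. This assumes the Debian/Ubuntu behaviour where every user is permitted when neither file exists and root is always permitted; the function name cron_allowed is hypothetical, and the optional file arguments exist only to make the logic testable.

```shell
#!/bin/sh
# cron_allowed USER [ALLOW_FILE] [DENY_FILE]  (hypothetical helper)
# Returns 0 if USER may use crontab under the cron.allow / cron.deny rules.
cron_allowed() {
    local user="$1"
    local allow="${2:-/etc/cron.allow}"
    local deny="${3:-/etc/cron.deny}"

    if [ "$user" = "root" ]; then
        return 0                              # root is always permitted
    fi
    if [ -f "$allow" ]; then
        grep -qxF -- "$user" "$allow"         # allow list wins if it exists
    elif [ -f "$deny" ]; then
        ! grep -qxF -- "$user" "$deny"        # otherwise honour the deny list
    else
        return 0                              # assumption: Debian default, all users allowed
    fi
}
```

Running `cron_allowed cronuser || echo "cronuser is blocked from cron"` gives a quick answer before digging further.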
Slide 34 of 35
Scheduler Comparison: Pick the Right Tool
cron, systemd timers, at, anacron, and inotifywait each solve a distinct scheduling problem.
[Diagram: RECURRING (cron, systemd timer, anacron) vs. ONE-SHOT / EVENT (at, batch, inotifywait); labels: 24/7 servers, laptops, file events]
cron
Recurring jobs on servers that run 24/7. Universal -- on every Linux system. Simple syntax. Use for jobs that have been working well for years and do not need advanced features.
systemd timers
New recurring jobs on systemd systems. Better logging (journald), missed-run catching, resource limits, dependency ordering. The default choice for anything new on Ubuntu 22.04+.
at / batch
One-shot future execution. Use at for scheduled maintenance windows or deferred deployments. Use batch for heavy jobs you want to run during low-load periods without picking an exact time.
anacron
Systems that are not always on. Laptops, intermittently-running VMs, dev workstations. Catches up on missed daily/weekly/monthly jobs at next boot.
inotifywait
Event-driven, not time-driven. Use when you need to react immediately when a file changes. Config-reload triggers, hot-deploy systems, file-drop processing pipelines.
Slide 35 of 35  |  ALA-06 Summary
Linux Automation: What You Now Know
A system that relies on humans for recurring tasks is a liability. Every cell operator is now equipped to build automation that runs reliably in the background, catches missed executions, alerts when things go wrong, and leaves detailed logs for every operation.
[Diagram: Ansible Push Model. Control node runs ansible-playbook and pushes over SSH to web-01, db-01, app-01 (ok=3 changed=2 failed=0)]
1. Cron fields: minute, hour, day-of-month, month, day-of-week. Set PATH and SHELL at the top of every crontab.
2. Cron does not source shell profiles. Commands not in /usr/bin:/bin need absolute paths or an explicit PATH setting.
3. systemd timers: pair a .service with a .timer. Use Persistent=true to catch missed runs. Validate schedules with systemd-analyze calendar.
4. flock -n /var/lock/job.lock prevents concurrent execution when a job runs longer than its interval.
5. rsync --link-dest creates full-snapshot incremental backups using hard links. Storage cost is proportional to change rate only.
6. logrotate prevents disk exhaustion. Use postrotate to reload daemons. Use copytruncate only when you cannot reload the writing process.
7. logger -t tag -p facility.priority 'message' writes to syslog from any script. Enables central log collection and monitoring.
8. Run automated jobs in dry-run mode first. Test in cron's exact environment. Alert on non-zero exit codes. Review logs after the first real runs.
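Several of these points (log every run, act on non-zero exit codes) can be combined in one small wrapper. The sketch below is an illustration under assumptions, not a prescribed implementation: the name run_job and the log line format are hypothetical, and it writes to a plain log file rather than syslog to stay self-contained.

```shell
#!/bin/sh
# run_job NAME LOGFILE COMMAND [ARGS...]  (hypothetical wrapper)
# Logs start, outcome, and exit code of COMMAND, then returns that exit code.
run_job() {
    local name="$1" log="$2" rc=0
    shift 2
    printf '%s %s START\n' "$(date '+%F %T')" "$name" >> "$log"
    "$@" >> "$log" 2>&1 || rc=$?              # capture output and exit code
    if [ "$rc" -eq 0 ]; then
        printf '%s %s OK\n' "$(date '+%F %T')" "$name" >> "$log"
    else
        printf '%s %s FAILED rc=%d\n' "$(date '+%F %T')" "$name" "$rc" >> "$log"
    fi
    return "$rc"
}
```

A scheduled entry could then call `run_job nightly-backup /var/log/backup.log /usr/local/bin/backup.sh` under `flock -n`, and chain an alert on failure.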