CH03 — LAB

Grep, Pipes & Text Processing — Hands-On

Five progressive exercises taking you from basic grep searches through real security log analysis, I/O redirection, and multi-stage pipeline construction. These are production-level skills.

Platform: Linux Terminal
Exercises: 5
Difficulty: Intermediate
Est. Time: 50–70 min

Lab Objectives

1
grep Flag Mastery
GREP
SCENARIO: Practice each major grep flag against real system files. Build muscle memory for the flags that you will use thousands of times in your career.
  1. Case-insensitive search: grep -i "root" /etc/passwd — how many lines match?
  2. Count matches: grep -c "bash" /etc/passwd — how many users use bash?
  3. Show line numbers: grep -n "PermitRootLogin" /etc/ssh/sshd_config
  4. Invert the match (find non-comment, non-blank lines): grep -v "^#" /etc/ssh/sshd_config | grep -v "^$"
  5. Recursive search: grep -r "PasswordAuthentication" /etc/ssh/ 2>/dev/null
  6. Show context: grep -A 2 -B 1 "Port" /etc/ssh/sshd_config
  7. Extended regex — match multiple patterns: grep -E "root|admin|sudo" /etc/passwd
  8. Only show matching part: grep -oE "[0-9]+" /etc/passwd | sort -rn | head -5
# Active SSH config lines (filter out comments and blanks)
$ grep -v "^#" /etc/ssh/sshd_config | grep -v "^$"
Include /etc/ssh/sshd_config.d/*.conf
KbdInteractiveAuthentication no
UsePAM yes
X11Forwarding yes
PrintMotd no
AcceptEnv LANG LC_*
Subsystem sftp /usr/lib/openssh/sftp-server
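The flags above can also be tried against a small, predictable file before running them on live system files. The following is a minimal sketch, assuming a hypothetical sample config written to a temporary file; the variable names and file contents are illustrative only.

```shell
# Hypothetical sample data in a temp file so the results are reproducible
sample=$(mktemp)
cat > "$sample" <<'EOF'
# sample config
Port 22

PermitRootLogin no
#PasswordAuthentication yes
passwordauthentication no
EOF

# -c counts matching lines; -i ignores case
match_count=$(grep -ci "passwordauthentication" "$sample")

# -v inverts the match; -E allows alternation, so a single grep
# drops both comment lines and blank lines
active_lines=$(grep -vE '^(#|$)' "$sample")

# -o prints only the matched text; here, every run of digits
digits=$(grep -oE '[0-9]+' "$sample")

echo "matches: $match_count"
echo "$active_lines"
echo "digits: $digits"
rm -f "$sample"
```

Combining -v with -E is one way to collapse the two-stage filter from step 4 into a single command; the two-stage version gives the same result.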
2
I/O Redirection Drills
I/O REDIRECT
SCENARIO: Master the redirection operators that control where command input comes from and where output goes. These operators are used in every script and pipeline you will ever write.
  1. Capture command output to a file: date > ~/timestamp.txt && cat ~/timestamp.txt
  2. Append to a file (don't overwrite): date >> ~/timestamp.txt && date >> ~/timestamp.txt && cat ~/timestamp.txt — should have 3 lines now
  3. Suppress errors: ls /nonexistent 2>/dev/null — verify no error is printed
  4. Capture both stdout and stderr: ls /etc /nonexistent > ~/output.txt 2>&1 && cat ~/output.txt
  5. Read from a file as stdin: sort < ~/timestamp.txt
  6. Use tee to split output: ls -la /etc | tee ~/etc-listing.txt | wc -l
  7. Chain: cat /etc/passwd | grep -v "^#" | wc -l > ~/usercount.txt && cat ~/usercount.txt
# Redirect both stdout and stderr example
$ ls /etc /nonexistent > ~/output.txt 2>&1
$ cat ~/output.txt
ls: cannot access '/nonexistent': No such file or directory
/etc:
adduser.conf
apt
# Both the error message AND the /etc listing are in the file
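One detail the drills rely on is that redirections are processed left to right, so the position of 2>&1 matters. The sketch below illustrates this with a hypothetical helper function, emit, that writes one line to each stream; the function and file names are assumptions made for the example.

```shell
workdir=$(mktemp -d)

# Hypothetical helper: one line to stdout, one line to stderr
emit() {
  echo "normal output"
  echo "error output" >&2
}

# Send each stream to its own file
emit > "$workdir/out.txt" 2> "$workdir/err.txt"
out=$(cat "$workdir/out.txt")
err=$(cat "$workdir/err.txt")

# stdout is pointed at the file first, then stderr is copied to it:
# both lines land in the file
emit > "$workdir/both.txt" 2>&1
both_lines=$(wc -l < "$workdir/both.txt" | tr -d ' ')

# Reversed order: stderr is copied to the ORIGINAL stdout (captured here
# by the command substitution), and only then is stdout sent to the file
leaked=$(emit 2>&1 > "$workdir/only_out.txt")

echo "out=$out err=$err both_lines=$both_lines leaked=$leaked"
rm -rf "$workdir"
```

If both streams need to end up in one file, writing the file redirection before 2>&1 (as in step 4) is the order that works.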
3
Pipeline Construction
PIPELINES
SCENARIO: Build increasingly complex pipelines to process system data. Each pipeline adds another stage of transformation, demonstrating the Unix "do one thing well" philosophy in action.
  1. List all unique shells used by system accounts: cat /etc/passwd | cut -d: -f7 | sort | uniq
  2. Count accounts per shell: cat /etc/passwd | cut -d: -f7 | sort | uniq -c | sort -rn
  3. Find the 10 largest files in /usr/bin: ls -la /usr/bin | awk '{print $5, $9}' | sort -rn | head -10
  4. Count running processes by state: ps aux | awk '{print $8}' | sort | uniq -c | sort -rn
  5. Find all listening TCP ports: ss -tlnp 2>/dev/null | grep LISTEN | awk '{print $4}' | sort
  6. Count words in a config file: cat /etc/hosts | wc -w
  7. Challenge: list the top 5 users by number of processes: ps aux | awk '{print $1}' | tail -n +2 | sort | uniq -c | sort -rn | head -5
# Count accounts per shell (typical output)
$ cat /etc/passwd | cut -d: -f7 | sort | uniq -c | sort -rn
 18 /usr/sbin/nologin
  4 /bin/bash
  2 /bin/sh
  1 /usr/bin/fish
# Top processes by user
$ ps aux | awk '{print $1}' | tail -n +2 | sort | uniq -c | sort -rn | head -5
 35 root
  8 www-data
  4 student
  2 postgres
  1 nobody
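Most of these pipelines put sort before uniq, and the reason is that uniq only collapses adjacent duplicate lines. A small sketch, using a hypothetical list of shells in place of field 7 of /etc/passwd, shows the difference:

```shell
# Hypothetical shell list standing in for "cut -d: -f7 /etc/passwd";
# no two identical lines are adjacent
shells=$(printf '%s\n' /bin/bash /usr/sbin/nologin /bin/bash \
                       /usr/sbin/nologin /bin/sh /usr/sbin/nologin)

# Without sort, nothing is adjacent, so uniq keeps all six lines
unsorted_count=$(printf '%s\n' "$shells" | uniq | wc -l | tr -d ' ')

# With sort, identical lines become neighbours and uniq collapses them
sorted_count=$(printf '%s\n' "$shells" | sort | uniq | wc -l | tr -d ' ')

# uniq -c prefixes each line with its count; sort -rn ranks by that count
top_shell=$(printf '%s\n' "$shells" | sort | uniq -c | sort -rn | head -1 | awk '{print $2}')

echo "unsorted: $unsorted_count  sorted: $sorted_count  top: $top_shell"
```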
4
awk Text Processing
AWK
SCENARIO: awk is the column processor of the Unix toolkit. Practice field extraction, filtering, arithmetic, and formatted output with awk against real system data.
  1. Extract username and UID from /etc/passwd: awk -F: '{print $1, $3}' /etc/passwd | column -t
  2. Show only system accounts (UID below 1000): awk -F: '$3 < 1000 {print $1, $3}' /etc/passwd
  3. Show only human users (UID 1000 and above): awk -F: '$3 >= 1000 && $3 < 65534 {print $1, $3, $6}' /etc/passwd
  4. Skip the header row with NR>1 and print each filesystem with its size: df -h | awk 'NR>1 {print $1, $2}'
  5. Custom formatted output: awk -F: '{printf "User: %-20s Shell: %s\n", $1, $7}' /etc/passwd | head -10
  6. Count lines in a file: awk 'END {print NR, "lines"}' /etc/passwd
# awk field extraction with custom formatting
$ awk -F: '{printf "User: %-20s Shell: %s\n", $1, $7}' /etc/passwd | head -5
User: root                 Shell: /bin/bash
User: daemon               Shell: /usr/sbin/nologin
User: bin                  Shell: /usr/sbin/nologin
User: sys                  Shell: /usr/sbin/nologin
User: sync                 Shell: /bin/sync
# Filter by numeric field value
$ awk -F: '$3 >= 1000 && $3 < 65534 {print $1, $3}' /etc/passwd
student 1000
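The scenario mentions arithmetic, which in awk usually means accumulating values while reading lines and printing the result in an END block. The following is a minimal sketch, assuming a handful of hypothetical passwd-style records rather than the real /etc/passwd:

```shell
# Hypothetical passwd-style records: name:password:UID:GID:comment:home:shell
records='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
sys:x:3:3:sys:/dev:/usr/sbin/nologin
student:x:1000:1000:Student:/home/student:/bin/bash'

# Count accounts per shell in an awk array and print the totals in END.
# The order of "for (s in count)" is unspecified, so sort the output.
per_shell=$(printf '%s\n' "$records" \
  | awk -F: '{count[$7]++} END {for (s in count) print count[s], s}' \
  | sort -rn)

# Sum a numeric field: how many UIDs are below 1000, and what they add up to
system_summary=$(printf '%s\n' "$records" \
  | awk -F: '$3 < 1000 {n++; sum += $3} END {print n, sum}')

echo "$per_shell"
echo "system accounts and UID sum: $system_summary"
```

The array version does in one awk process what sort | uniq -c does in the pipeline exercises; either approach is reasonable for files of this size.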
5
Security Log Analysis Mission
LOG ANALYSIS
SCENARIO: You are the on-call analyst and have received an alert about unusual SSH activity. Use grep pipelines to investigate the authentication log and identify potential brute-force attacks. If /var/log/auth.log does not exist on your system, use /var/log/syslog or create a simulated log file with the command in step 2.
  1. Check if auth.log exists: ls -la /var/log/auth.log 2>/dev/null || echo "auth.log not found"
  2. Create a simulated auth log for practice (if you use it, substitute /var/log/auth.log.practice for /var/log/auth.log in the steps below): sudo bash -c 'for i in 1 2 3 4 5; do echo "$(date) sshd[$$]: Failed password for invalid user admin from 192.168.1.$((RANDOM % 255)) port $((RANDOM % 60000 + 1024)) ssh2" >> /var/log/auth.log.practice; done'
  3. Find all failed authentication attempts: grep "Failed password" /var/log/auth.log 2>/dev/null | head -20
  4. Extract and count attacking IPs: grep "Failed password" /var/log/auth.log 2>/dev/null | grep -oE "([0-9]+\.){3}[0-9]+" | sort | uniq -c | sort -rn | head -10
  5. Find successful logins: grep "Accepted" /var/log/auth.log 2>/dev/null
  6. Check for root login attempts: grep -i "failed.*root\|root.*failed" /var/log/auth.log 2>/dev/null | wc -l
  7. Save your findings to a report: grep "Failed password" /var/log/auth.log 2>/dev/null | grep -oE "([0-9]+\.){3}[0-9]+" | sort | uniq -c | sort -rn > ~/ssh-attack-report.txt && echo "Report saved" && cat ~/ssh-attack-report.txt
# Extract attacking IPs with count (security analyst's pipeline)
$ grep "Failed password" /var/log/auth.log | \
    grep -oE "([0-9]+\.){3}[0-9]+" | \
    sort | uniq -c | sort -rn | head -10
 47 45.142.212.100
 23 185.220.101.45
 18 194.165.16.71
  9 103.99.0.122
  3 192.168.1.99
# These are the top attackers by attempt count
# The top IPs are likely automated scanners or botnets
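A natural follow-up to the counting pipeline is to keep only the addresses that cross a threshold. The sketch below assumes hypothetical log lines in the usual sshd "Failed password" shape, addresses from the reserved documentation ranges, and an arbitrary cutoff of 3; none of these values come from a real log.

```shell
# Hypothetical log lines written to a temp file
log=$(mktemp)
for i in 1 2 3 4; do
  echo "Jan 10 03:14:0$i host sshd[1234]: Failed password for invalid user admin from 203.0.113.7 port 4000$i ssh2" >> "$log"
done
echo "Jan 10 03:15:00 host sshd[1235]: Failed password for root from 198.51.100.9 port 50122 ssh2" >> "$log"
echo "Jan 10 03:16:00 host sshd[1236]: Accepted password for student from 192.0.2.10 port 50200 ssh2" >> "$log"

# Same extraction as the lab pipeline, then keep only IPs whose count
# (field 1 of uniq -c output) is at or above the threshold
threshold=3   # assumed cutoff for illustration, not a recommended value
suspects=$(grep "Failed password" "$log" \
  | grep -oE '([0-9]+\.){3}[0-9]+' \
  | sort | uniq -c \
  | awk -v min="$threshold" '$1 >= min {print $2, $1}')

echo "$suspects"
rm -f "$log"
```

Passing the cutoff with awk -v keeps the shell variable out of the awk program text, which avoids quoting problems when the pipeline is reused in a script.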
REAL WORLD CONTEXT
The pipeline you just built is a simplified version of what SIEM platforms such as Splunk and the ELK Stack do automatically at scale, often alongside network intrusion detection engines such as Suricata. Understanding the underlying grep/awk mechanics makes you a better security analyst even when using GUI tools: you understand what the tool is actually doing, can build custom queries, and can keep working when the SIEM is unavailable. SSH servers exposed to the public internet routinely receive large volumes of automated login attempts. Tools such as Fail2ban, together with firewall rules managed through ufw or iptables, provide the automated defenses; manual analysis like this is for investigation.
