ALA-R3: Process Authority

Operational Briefing

Mission Context:

"Every service in your cell is a process. When a cell malfunctions, the first question is always: what is running, who started it, and what is it doing to the system? This module re-establishes the tools and vocabulary to answer that question under operational pressure."

Process Fundamentals

Every running program is a process with a unique PID (Process ID) and a PPID (Parent Process ID). Processes form a tree rooted at PID 1 (systemd on Ubuntu 22.04). Understanding this tree is the starting point for diagnosing any cell malfunction.

    # List all running processes with full detail
    ps aux

    # Custom columns: PID, parent PID, owner, command - sorted by CPU
    ps -eo pid,ppid,user,cmd --sort=-%cpu | head -20

    # Visualize the full process tree with PIDs
    pstree -p

Process states visible in ps output: R (running), S (sleeping, waiting for an event), D (uninterruptible sleep, usually waiting on I/O), Z (zombie, terminated but not yet reaped by parent), T (stopped by a signal). A process stuck in D state is a common sign of disk or NFS issues.
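
One way to see the tree and the state codes together is to walk from your own shell up toward PID 1. The sketch below assumes a Linux /proc filesystem and reads /proc/<pid>/status directly; the function name ancestry is illustrative, not a standard tool.

```shell
#!/usr/bin/env bash
# Sketch: print "PID STATE NAME" for a process and each of its ancestors.
# Reads /proc/<pid>/status, which is Linux-specific.
ancestry() {
    local pid=$1 name state ppid
    while [ -r "/proc/$pid/status" ]; do
        name=$(awk '/^Name:/ {print $2}' "/proc/$pid/status")
        state=$(awk '/^State:/ {print $2}' "/proc/$pid/status")
        ppid=$(awk '/^PPid:/ {print $2}' "/proc/$pid/status")
        printf '%s %s %s\n' "$pid" "$state" "$name"
        # PID 1 is the root of the tree; stop there.
        [ "$pid" -eq 1 ] && break
        pid=$ppid
    done
}

# Start from the current shell.
ancestry "$$"
```

The first line is the shell itself and each following line is its parent. On a regular Ubuntu host the walk should end at PID 1 (systemd); inside a container the chain can end earlier.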

Operational Context:

Every service in your cell is a process. When a cell malfunctions, the first question is always "what is running and who started it?"

Signals and Job Control

Signals are asynchronous notifications sent to processes. The operating system, other processes, or the terminal can deliver them. Operators send signals to stop, restart, or terminate services. Always attempt a graceful shutdown before using force.

    # Graceful termination request (process cleans up first)
    kill -15 <pid>

    # Forced, immediate kill (cannot be caught or ignored)
    kill -9 <pid>

    # Kill all processes named sshd
    killall sshd

    # Kill all processes owned by a specific user
    pkill -u operator

    # Trap SIGTERM in a script to run cleanup before exit
    trap 'cleanup_function' SIGTERM

Signal    Number   Behavior
SIGHUP    1        Reload configuration (many daemons respond to this)
SIGINT    2        Terminal interrupt (Ctrl+C)
SIGTERM   15       Graceful termination request (can be caught)
SIGKILL   9        Immediate kill (cannot be caught, blocked, or ignored)
SIGSTOP   19       Suspend process (cannot be caught or ignored)

Operational Context:

Graceful shutdown vs forced kill. SIGTERM asks politely. SIGKILL does not negotiate. Grid operators always try SIGTERM first.
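
The SIGTERM-first rule can be wrapped in a small helper. This is a sketch, assuming bash on Linux; stop_gracefully, is_running, and the timeout values are hypothetical names chosen for illustration. It sends SIGTERM, polls until a timeout, and escalates to SIGKILL only if the process is still alive.

```shell
#!/usr/bin/env bash
# Sketch: SIGTERM first, SIGKILL only after a timeout.
# stop_gracefully and is_running are illustrative helpers, not standard tools.

# True if the PID exists and is not a zombie (a zombie has already terminated
# and is only waiting to be reaped by its parent).
is_running() {
    kill -0 "$1" 2>/dev/null &&
        ! grep -q '^State:[[:space:]]*Z' "/proc/$1/status" 2>/dev/null
}

# Usage: stop_gracefully <pid> [timeout_seconds]
stop_gracefully() {
    local pid=$1 timeout=${2:-5} waited=0
    if ! is_running "$pid"; then
        echo "not running"
        return 0
    fi
    kill -TERM "$pid"
    while is_running "$pid"; do
        if [ "$waited" -ge "$timeout" ]; then
            kill -KILL "$pid"
            echo "forced"
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "graceful"
}

# Demo worker 1 traps SIGTERM and exits cleanly.
( trap 'exit 0' TERM; while :; do sleep 1; done ) &
polite_pid=$!

# Demo worker 2 ignores SIGTERM, so only SIGKILL can end it.
( trap '' TERM; while :; do sleep 1; done ) &
stubborn_pid=$!

sleep 1   # give both workers time to install their traps

polite_result=$(stop_gracefully "$polite_pid" 5)
stubborn_result=$(stop_gracefully "$stubborn_pid" 1)
echo "polite worker:   $polite_result"
echo "stubborn worker: $stubborn_result"

# Reap both children; the stubborn one reports a non-zero status.
wait "$polite_pid" "$stubborn_pid" 2>/dev/null || true
```

The polite worker should stop on SIGTERM alone, while the stubborn one is only ended by the SIGKILL escalation. In real use the timeout should reflect how long the service legitimately needs to clean up.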

Foreground / Background Job Control

Operators regularly need to launch long-running diagnostics while keeping their shell prompt available. Background job control and process detachment are the tools for this.

    # Launch a process in the background
    sleep 3600 &

    # List background jobs with their PIDs
    jobs -l

    # Bring job 1 to the foreground
    fg %1

    # Resume a stopped job in the background
    bg %2

    # Remove a job from the shell's job table (survives shell exit)
    disown %1

    # Run a command immune to hangup (survives SSH disconnect)
    nohup ./long-script.sh &

The difference between disown and nohup: disown removes an already-running job from the shell's job table, so the shell no longer sends it SIGHUP when it exits. nohup makes the command ignore SIGHUP before it starts and, when stdout is a terminal, redirects output to nohup.out. Use nohup proactively; use disown to rescue a running job you forgot to protect.
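
To see what SIGHUP immunity means in practice, the sketch below starts two identical sleep processes, one plain and one under nohup, then sends both a SIGHUP, the signal that reaches background jobs when a terminal session hangs up. The variable names are illustrative, and output is redirected explicitly, so no nohup.out file is created here.

```shell
#!/usr/bin/env bash
# Sketch: compare a plain background process with one started under nohup.

sleep 10 &
plain_pid=$!

nohup sleep 10 >/dev/null 2>&1 &
protected_pid=$!

sleep 1   # let nohup finish setting up before the signal arrives

# Simulate the hangup that a closing session would deliver.
kill -HUP "$plain_pid" "$protected_pid"

# The plain process dies from the signal; wait reports 128 + signal number.
plain_status=0
wait "$plain_pid" || plain_status=$?

sleep 1
if kill -0 "$protected_pid" 2>/dev/null; then
    protected_alive=yes
else
    protected_alive=no
fi

echo "plain sleep exit status: $plain_status"
echo "nohup sleep still alive: $protected_alive"

# Clean up the survivor.
kill -TERM "$protected_pid"
wait "$protected_pid" 2>/dev/null || true
```

SIGHUP is signal 1, so the plain process is expected to report status 129, while the nohup-protected one keeps running until it is cleaned up.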

Operational Context:

You will frequently need to launch long-running diagnostics and return to your shell. Background jobs are standard operational practice.

systemctl Basics

On Ubuntu 22.04, systemd is PID 1 and manages all services. Every service is a unit with a state. systemctl is the primary interface for inspecting and controlling units. Week 1 goes deep on systemd internals; here we establish the vocabulary.

    # On Ubuntu the OpenSSH server unit is ssh.service; sshd.service is only
    # an alias, and enable/disable refuse to operate on alias names.

    # Check the current status of a service
    systemctl status ssh

    # Start, stop, restart, or reload a service
    systemctl start ssh
    systemctl stop ssh
    systemctl restart ssh
    systemctl reload ssh

    # Enable (start at boot) or disable a service
    systemctl enable ssh
    systemctl disable ssh

    # Quick boolean check: is the service currently running?
    systemctl is-active ssh

The distinction between enable and start: start runs the service now. enable creates the symlinks that cause it to start automatically at boot. A service can be started without being enabled (runs now, not after reboot) and enabled without being started (will run after reboot, not now). Use both together with systemctl enable --now ssh (on Ubuntu the OpenSSH unit is named ssh; sshd is an alias, which enable does not accept).
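
Because start and enable are independent, it helps to check both states at once. The helper below is a sketch: unit_summary is a hypothetical name, while systemctl is-active and systemctl is-enabled are the real subcommands it wraps. It assumes a systemd host and prints unknown where systemctl gives no answer.

```shell
#!/usr/bin/env bash
# Sketch: report "running now?" and "starts at boot?" side by side.
# unit_summary is an illustrative helper, not part of systemd.
unit_summary() {
    local unit=$1 active enabled
    # Both subcommands exit non-zero for "no", so ignore the exit status
    # and keep only the printed state word.
    active=$(systemctl is-active "$unit" 2>/dev/null) || true
    enabled=$(systemctl is-enabled "$unit" 2>/dev/null) || true
    printf '%s: %s now, %s at boot\n' "$unit" "${active:-unknown}" "${enabled:-unknown}"
}

# Example: the OpenSSH server unit (named ssh on Ubuntu).
unit_summary ssh
```

A unit that was started but never enabled would show as active now, disabled at boot; the reverse case would show inactive now, enabled at boot.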

Operational Context:

On modern cells (Ubuntu 22.04), systemd controls every service. Week 1 goes deep; here we establish the vocabulary.

nice and renice

Process priority determines how much CPU time the scheduler allocates. When running intensive background operations (backups, integrity checks, compilation), lowering their priority keeps the cell's foreground services responsive.

    # Launch a process with a reduced scheduling priority (nicer to others)
    nice -n 10 ./heavy-backup.sh

    # Raise priority of an already-running process (requires root for negative values)
    sudo renice -n -5 -p 4872

    # Verify the nice value of a running process
    ps -eo pid,ni,cmd | grep backup

Nice values range from -20 (highest priority, least nice to other processes) to +19 (lowest priority, most yielding). Only root can set negative nice values. By default, a standard user can only raise a process's nice value (lower its priority) and cannot lower it again afterwards. Background maintenance jobs should run at +10 to +19.
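
The value passed to nice -n is an adjustment added to the caller's current niceness, not an absolute setting. A quick way to observe this, assuming GNU coreutils (where nice with no command prints the current niceness); the variable names are illustrative:

```shell
#!/usr/bin/env bash
# Sketch: show that "nice -n 10" adds 10 to the current niceness.

base=$(nice)                 # niceness of this shell
lowered=$(nice -n 10 nice)   # niceness seen by a child started with +10

# The kernel caps niceness at 19.
expected=$((base + 10))
if [ "$expected" -gt 19 ]; then
    expected=19
fi

echo "shell niceness:   $base"
echo "under nice -n 10: $lowered (expected $expected)"
```

From a shell at niceness 0 the child reports 10. Started from a shell that is already at 10, the same command would yield 19 because of the cap.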

Operational Context:

When running intensive operations (backups, checksums, compilation), operators adjust priority so cell services stay responsive.

Self-Check

  1. What command would you use to find all processes owned by www-data, sorted by memory usage, showing PID, command name, and RSS?
  2. What is the difference between SIGTERM and SIGKILL? Why should you always try SIGTERM first?
  3. A process you launched in the background is running, but you need to close your SSH session. What two approaches can you use to prevent it from dying?
  4. What is the difference between systemctl start and systemctl enable?

If you are comfortable with all of these, proceed to ALA-R4: Grid Basics.