"Every service in your cell is a process. When a cell malfunctions, the first question is always: what is running, who started it, and what is it doing to the system? This module re-establishes the tools and vocabulary to answer that question under operational pressure."
Every running program is a process with a unique PID (Process ID) and a PPID (Parent Process ID). Processes form a tree rooted at PID 1 (systemd on Ubuntu 22.04). Understanding this tree is the starting point for diagnosing any cell malfunction.
Process states visible in ps output: R (running), S (sleeping, waiting for an event), D (uninterruptible sleep, usually waiting on I/O), Z (zombie, terminated but not yet reaped by parent), T (stopped by a signal). A process stuck in D state is a common sign of disk or NFS issues.
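To see where ps gets its PID, PPID, and state columns, here is a minimal Python sketch that reads them straight from /proc/<pid>/stat on Linux. The names ProcInfo, parse_stat, read_proc, and ancestry are illustrative, not part of any tool. In practice ps and pstree do this, and much more, for you.

```python
import os
from typing import NamedTuple


class ProcInfo(NamedTuple):
    pid: int
    comm: str   # executable name as recorded by the kernel
    state: str  # single-letter state: R, S, D, Z, T, ...
    ppid: int


def parse_stat(text: str) -> ProcInfo:
    """Parse the contents of /proc/<pid>/stat.

    The command name sits inside parentheses and may itself contain spaces
    or ')', so split on the LAST ')' instead of on whitespace.
    """
    head, _, tail = text.rpartition(")")
    pid_str, _, comm = head.partition(" (")
    fields = tail.split()
    # After the closing parenthesis, the first field is the state and the second is the PPID.
    return ProcInfo(int(pid_str), comm, fields[0], int(fields[1]))


def read_proc(pid: int) -> ProcInfo:
    with open(f"/proc/{pid}/stat") as f:
        return parse_stat(f.read())


def ancestry(pid: int) -> list[ProcInfo]:
    """Follow PPID links upward until the root of the visible process tree.

    On a host the chain ends at PID 1 (systemd), whose PPID is 0. Inside a
    PID namespace such as a container, the topmost visible process reports
    PPID 0 as well.
    """
    chain = []
    while pid != 0:
        info = read_proc(pid)
        chain.append(info)
        pid = info.ppid
    return chain


if __name__ == "__main__":
    for p in ancestry(os.getpid()):
        print(f"{p.pid:>7} {p.ppid:>7} {p.state} {p.comm}")
```

Running it prints the chain from the current process up to the root. This is similar to what pstree -s shows for a single PID.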
Signals are asynchronous notifications delivered to processes by the kernel. They can originate from the kernel itself, from other processes (for example, the kill command), or from the terminal (Ctrl+C sends SIGINT). Operators send signals to reload, stop, or terminate services. Always attempt a graceful shutdown before using force.
| Signal | Number | Behavior |
|---|---|---|
| SIGHUP | 1 | Hangup (controlling terminal closed). The default action terminates the process, but many daemons catch it and reload their configuration |
| SIGINT | 2 | Terminal interrupt (Ctrl+C) |
| SIGTERM | 15 | Graceful termination request (can be caught) |
| SIGKILL | 9 | Immediate kill (cannot be caught, blocked, or ignored) |
| SIGSTOP | 19 | Suspend process (cannot be caught or ignored) |
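"Can be caught" means a process may install its own handler for the signal. The sketch below uses a hypothetical Daemon class with a placeholder load_config to show the common daemon pattern from the table: reload on SIGHUP, and treat SIGTERM as a request to finish up and exit. The handler only sets a flag, and the actual cleanup happens in the main loop.

```python
import signal
import time


class Daemon:
    """Hypothetical long-running service that reacts to operator signals."""

    def __init__(self) -> None:
        self.reloads = 0
        self.stopping = False

    def load_config(self) -> None:
        # Placeholder: a real service would re-read its config file here.
        self.reloads += 1

    def _on_hup(self, signum, frame) -> None:
        self.load_config()

    def _on_term(self, signum, frame) -> None:
        # Only record the request; cleanup runs in the main loop, not here.
        self.stopping = True

    def install_handlers(self) -> None:
        signal.signal(signal.SIGHUP, self._on_hup)
        signal.signal(signal.SIGTERM, self._on_term)
        # Trying the same for SIGKILL or SIGSTOP raises OSError (EINVAL):
        # the kernel does not allow those two to be caught.

    def run(self, tick: float = 0.05) -> None:
        self.load_config()
        while not self.stopping:
            time.sleep(tick)  # stand-in for real work
        print("flushed state, exiting cleanly")
```

You would start it with d = Daemon(); d.install_handlers(); d.run(). From another shell you can then drive it with kill -HUP <pid> and kill -TERM <pid>. Running kill -9 <pid> bypasses both handlers.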
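Graceful shutdown vs forced kill: SIGTERM asks politely, giving the process a chance to flush buffers, close connections, and remove lock or PID files before exiting. SIGKILL does not negotiate: the kernel ends the process immediately and none of its cleanup code runs. Grid operators always try SIGTERM first and escalate to SIGKILL only if the process has not exited after a reasonable wait.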
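The SIGTERM-then-SIGKILL escalation can be sketched in Python for a child process you launched yourself. Here stop_gracefully is a hypothetical helper. Popen.terminate() and Popen.kill() send SIGTERM and SIGKILL respectively on POSIX systems. The 10-second grace period is an arbitrary default, not a standard; systemd applies the same pattern when stopping units, with its own configurable timeout (TimeoutStopSec).

```python
import subprocess
import sys


def stop_gracefully(proc: subprocess.Popen, grace: float = 10.0) -> str:
    """Send SIGTERM, wait up to `grace` seconds, then fall back to SIGKILL."""
    proc.terminate()                  # SIGTERM: the process may clean up
    try:
        proc.wait(timeout=grace)
        return "terminated"
    except subprocess.TimeoutExpired:
        proc.kill()                   # SIGKILL: cannot be caught or ignored
        proc.wait()                   # reap the child so it does not linger as a zombie
        return "killed"


if __name__ == "__main__":
    worker = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print(stop_gracefully(worker, grace=2.0), worker.returncode)
```

A negative returncode means the child died from that signal number: -15 after SIGTERM, -9 after SIGKILL.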
Operators regularly need to launch long-running diagnostics while keeping their shell prompt available. Background job control and process detachment are the tools for this.
The difference between disown and nohup: disown (a shell builtin) removes an already-running job from the shell's job table, so the shell does not send it SIGHUP when you log out. nohup sets SIGHUP to be ignored before the command starts. If standard output is a terminal, nohup also redirects output to nohup.out. Use nohup proactively; use disown to rescue a running job you forgot to protect.
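The core of what nohup does fits in a few lines: ignore SIGHUP in the child before exec, and point its output at a log file. This sketch assumes Python on Linux. The name launch_nohup_style is hypothetical. Unlike nohup, it always writes to the file you pass rather than choosing nohup.out.

```python
import os
import signal
import subprocess
import sys
import tempfile


def _ignore_sighup() -> None:
    # Runs in the child after fork, before exec. An ignored disposition
    # survives exec, so the new program starts with SIGHUP already ignored.
    signal.signal(signal.SIGHUP, signal.SIG_IGN)


def launch_nohup_style(cmd: list[str], log_path: str) -> subprocess.Popen:
    """Start cmd immune to SIGHUP, appending stdout and stderr to log_path."""
    with open(log_path, "ab") as log:
        # Popen duplicates the descriptor into the child, so closing our copy
        # when the with-block ends does not cut off the child's output.
        return subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,
            stdout=log,
            stderr=subprocess.STDOUT,
            preexec_fn=_ignore_sighup,  # not safe in multithreaded parents
        )


if __name__ == "__main__":
    log_file = os.path.join(tempfile.gettempdir(), "diagnostic.log")
    job = launch_nohup_style([sys.executable, "-c", "print('diagnostic done')"], log_file)
    print(f"started PID {job.pid}, logging to {log_file}")
    job.wait()
```

The disposition is set before the target program runs, which is why nohup is the proactive option. disown, by contrast, changes only the shell's bookkeeping for a job that is already running.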
On Ubuntu 22.04, systemd is PID 1 and manages all services. Every service is a unit with a state. systemctl is the primary interface for inspecting and controlling units. Week 1 goes deep on systemd internals; here we establish the vocabulary.
The distinction between enable and start: start runs the service now. enable creates the symlinks that cause it to start automatically at boot. A service can be started without being enabled (runs now, not after reboot) and enabled without being started (will run after reboot, not now). Use both together with systemctl enable --now ssh. On Ubuntu the OpenSSH server unit is ssh.service, and sshd is only an alias for it.
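Because "enabled" and "active" are independent, it helps to check both. systemctl is-active ssh and systemctl is-enabled ssh each answer one question directly. The sketch below instead reads both properties with systemctl show --property=ActiveState,UnitFileState and combines them into one sentence. parse_show, describe, and query are hypothetical helpers. Only UnitFileState=enabled is treated as enabled for boot; any other value, such as disabled or static, is reported as-is.

```python
import subprocess


def parse_show(output: str) -> dict[str, str]:
    """Turn the KEY=VALUE lines printed by `systemctl show` into a dict."""
    props = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip blank or malformed lines
            props[key] = value
    return props


def describe(props: dict[str, str]) -> str:
    """Report the two independent axes: running now, and enabled for boot."""
    now = "running now" if props.get("ActiveState") == "active" else "not running now"
    unit_file_state = props.get("UnitFileState", "unknown")
    if unit_file_state == "enabled":
        boot = "enabled for boot"
    else:
        boot = f"not enabled ({unit_file_state})"
    return f"{now}; {boot}"


def query(unit: str) -> str:
    """Ask systemd about one unit, e.g. query("ssh"). Needs a systemd host."""
    result = subprocess.run(
        ["systemctl", "show", "--property=ActiveState,UnitFileState", unit],
        capture_output=True, text=True, check=True,
    )
    return describe(parse_show(result.stdout))
```

After systemctl enable --now ssh you would expect query("ssh") to report "running now; enabled for boot". After a plain systemctl start on a disabled unit, the second half would read "not enabled (disabled)".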
Process priority determines how much CPU time the scheduler allocates. When running intensive background operations (backups, integrity checks, compilation), lowering their priority keeps the cell's foreground services responsive.
Nice values range from -20 (highest priority, least nice to other processes) to +19 (lowest priority, most yielding). Only root can set negative nice values. A standard user can only lower a process's priority (raise its nice value), never raise it. Background maintenance jobs should run at +10 to +19.
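From the shell, nice -n 10 <command> starts a job at nice +10, and renice adjusts one that is already running. The Python sketch below does the same for a child process. os.nice() runs in the child just before exec, so the parent's priority is untouched. launch_niced and current_nice are hypothetical names. The default increment of 10 follows the +10 to +19 guideline above.

```python
import os
import subprocess
import sys


def launch_niced(cmd: list[str], increment: int = 10) -> subprocess.Popen:
    """Start cmd with its nice value raised by `increment` (lower priority).

    os.nice() runs in the child between fork and exec, so only the new job
    is affected. Raising the nice value needs no privileges, and the kernel
    clamps the result at +19. preexec_fn is not safe in multithreaded parents.
    """
    return subprocess.Popen(cmd, preexec_fn=lambda: os.nice(increment))


def current_nice(pid: int = 0) -> int:
    """Return the nice value of pid (0 means the calling process)."""
    return os.getpriority(os.PRIO_PROCESS, pid)


if __name__ == "__main__":
    job = launch_niced([sys.executable, "-c", "import time; time.sleep(1)"], increment=15)
    print(f"parent nice {current_nice()}, job nice {current_nice(job.pid)}")
    job.wait()
```

Going the other way (a negative increment, or os.setpriority to a lower value) fails with PermissionError for an unprivileged user, which matches the rule that only root can raise priority.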
If you are comfortable with all of these, proceed to ALA-R4: Grid Basics.