A Security Operations Center (SOC) is a centralized unit that deals with security issues on an organizational and technical level. The SOC is the first line of defense against cyber threats, providing continuous monitoring, detection, analysis, and response to security incidents.
Core SOC Mission
Protect the organization's assets, data, and reputation through proactive monitoring and rapid incident response.
Primary Functions
Continuous Monitoring: 24/7/365 surveillance of networks, systems, and applications
Threat Detection: Identify potential security incidents through SIEM, IDS/IPS, EDR, and other tools
Incident Response: Rapid triage, investigation, containment, and remediation of threats
Threat Intelligence: Collection and analysis of threat data to anticipate attacks
Vulnerability Management: Identify and prioritize security weaknesses
Compliance: Ensure adherence to security policies and regulatory requirements
Security Tool Management: Maintain and optimize security infrastructure
Key SOC Technologies
SIEM
Security Information & Event Management
Central log aggregation, correlation, and alerting (Splunk, QRadar, Sentinel)
EDR/XDR
Endpoint Detection & Response
Advanced endpoint monitoring and threat hunting (CrowdStrike, SentinelOne)
SOAR
Security Orchestration, Automation & Response
Automated playbooks and response workflows (Phantom, XSOAR)
SOC Value Proposition
A well-functioning SOC reduces Mean Time to Detect (MTTD) and Mean Time to Respond (MTTR), minimizing the impact of security incidents and protecting business operations.
SOC Organizational Models
Organizations can structure their SOC in various ways depending on resources, expertise, and business requirements. Each model has distinct advantages and challenges.
In-House SOC
Fully owned and operated by the organization with dedicated staff and infrastructure.
✓ Pros:
Full control and customization
Deep organizational knowledge
Immediate access to systems
Better data privacy
✗ Cons:
High operational costs
Recruitment challenges
24/7 staffing requirements
Technology investment
Managed SOC (MSSP)
Outsourced to a Managed Security Service Provider who provides monitoring and response services.
✓ Pros:
Lower upfront costs
Access to expertise
24/7 coverage included
Faster deployment
✗ Cons:
Less control
Limited customization
Data sharing concerns
Dependency on vendor
Hybrid SOC
Combination of in-house and outsourced capabilities, leveraging strengths of both approaches.
✓ Pros:
Balanced control and cost
Flexible scalability
Shared expertise
Risk distribution
✗ Cons:
Complex coordination
Integration challenges
Unclear responsibilities
Communication overhead
Virtual SOC
Distributed team working remotely with cloud-based tools and infrastructure.
✓ Pros:
Global talent access
Lower facility costs
Flexible workforce
Cloud-native tools
✗ Cons:
Communication challenges
Time zone coordination
Remote security risks
Team cohesion
Choosing the Right Model
Factors to Consider:
Budget: Available resources for staff, tools, and infrastructure
Expertise: Internal security talent and hiring capacity
Compliance: Regulatory requirements for data handling
Scale: Organization size and complexity
Risk Tolerance: Acceptable levels of outsourcing and control
Common Pitfall
Many organizations underestimate the total cost of ownership for an in-house SOC. Beyond tools and salaries, consider training, retention, facility costs, and the challenge of maintaining 24/7 coverage.
SOC Roles and Tier Structure
Most SOCs operate with a tiered structure where analysts are organized by skill level and responsibility. This creates clear escalation paths and ensures appropriate expertise is applied to each incident.
Tier 1 - Alert Analyst
Front Line Defense
Primary Responsibilities:
Monitor SIEM and security tool alerts
Perform initial alert triage
Classify alerts (TP/FP/BTP)
Document findings in tickets
Escalate confirmed threats
Follow established runbooks
Basic log analysis
Skills Required: Basic security concepts, log analysis, ticketing systems, communication
Lateral moves are also common: Threat Intelligence, Security Engineering, Penetration Testing, GRC
Alert Lifecycle and Workflow
Understanding the complete lifecycle of a security alert is fundamental to SOC operations. Every alert follows a structured path from initial detection through final resolution.
Standard Alert Lifecycle
1. Detection
Alert generated by SIEM, EDR, IDS, or other security tool
Closure Checklist: Threat eradicated, systems restored, monitoring in place, stakeholders notified, documentation complete
Alert Triage Methodology
Triage is the most critical skill for Tier 1 analysts. Effective triage reduces noise, prevents alert fatigue, and ensures real threats are escalated promptly. Poor triage leads to missed incidents or wasted resources.
The Triage Decision Tree
START: New Alert Received
↓
Question 1: Is this activity malicious or suspicious?
↓
NO → FALSE POSITIVE
→ Document reason
→ Tune detection rule if needed
→ Close ticket
↓
YES → Continue
↓
Question 2: Is this activity authorized or expected?
↓
YES → BENIGN TRUE POSITIVE
→ Verify authorization
→ Document exception
→ Add to whitelist if recurring
→ Close ticket
↓
NO → TRUE POSITIVE
→ Assess severity and urgency
→ Escalate to Tier 2
→ Begin containment if time-critical
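The decision tree above can be sketched as a small function. This is only an illustration: the `Disposition` enum, the `triage` function, and the `NEXT_ACTIONS` table are hypothetical names, and in practice the two boolean answers come from an analyst's investigation, not from a flag on the alert.

```python
from enum import Enum


class Disposition(Enum):
    FALSE_POSITIVE = "FP"
    BENIGN_TRUE_POSITIVE = "BTP"
    TRUE_POSITIVE = "TP"


# Follow-up actions copied from the decision tree, keyed by outcome.
NEXT_ACTIONS = {
    Disposition.FALSE_POSITIVE: [
        "Document reason",
        "Tune detection rule if needed",
        "Close ticket",
    ],
    Disposition.BENIGN_TRUE_POSITIVE: [
        "Verify authorization",
        "Document exception",
        "Add to whitelist if recurring",
        "Close ticket",
    ],
    Disposition.TRUE_POSITIVE: [
        "Assess severity and urgency",
        "Escalate to Tier 2",
        "Begin containment if time-critical",
    ],
}


def triage(is_suspicious: bool, is_authorized: bool) -> Disposition:
    """Apply the two triage questions in order."""
    # Question 1: is this activity malicious or suspicious?
    if not is_suspicious:
        return Disposition.FALSE_POSITIVE
    # Question 2: is this activity authorized or expected?
    if is_authorized:
        return Disposition.BENIGN_TRUE_POSITIVE
    return Disposition.TRUE_POSITIVE
```

For example, `triage(True, True)` yields `Disposition.BENIGN_TRUE_POSITIVE`, and `NEXT_ACTIONS` then lists the steps to take before closing the ticket.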
Detailed Triage Categories
1. True Positive (TP) - Real Threat
Indicators:
Known malicious IP/domain communication
Malware file hash match on VirusTotal
Exploitation of known vulnerability
Credential theft or brute force success
Unauthorized data access or exfiltration
Command-and-control (C2) beaconing
Example: EDR alert for PowerShell executing encoded commands; investigation shows download of a Cobalt Strike beacon from a known malicious domain.
Action: Escalate immediately with HIGH severity. Include all context: affected user/host, IOCs, initial containment actions.
2. False Positive (FP) - Benign Activity
Common Causes:
Overly broad detection signatures
Legitimate tools flagged as malicious (Admin tools, pentesting software)
Normal business processes triggering behavioral rules
Outdated threat intelligence (old IOCs)
Misconfigured security tools
Example: IDS alert for SQL injection; investigation shows the traffic came from an automated vulnerability scanner run by the authorized security team.
Action: Close ticket as FP. Document the reason. Submit rule tuning request to reduce future FPs. Consider whitelisting source.
3. Benign True Positive (BTP) - Authorized But Flagged
Common Scenarios:
IT admin using remote access tools outside business hours
Developer accessing production database per change control
Security team running penetration tests
Authorized third-party vendor access
Unusual but legitimate user travel (VPN from foreign country)
Example: Alert for abnormal login time and location. User confirms they are traveling internationally for business.
Action: Verify authorization through ticketing system, email, or manager confirmation. Document justification. Add exception if recurring.
Triage Best Practices
Speed vs. Accuracy
Balance is critical. Rapid triage prevents alert backlog, but rushing leads to missed threats. Aim for 5-15 minutes per alert depending on complexity.
Context is Everything
Never make decisions based solely on the alert. Check user role, system criticality, time of day, geolocation, recent changes.
Document Everything
Your notes may be reviewed during audits or legal proceedings. Include what you checked, why you made your decision, and next steps.
When in Doubt, Escalate
It's better to escalate a questionable alert than to close a real incident. Tier 2 can always de-escalate if needed.
Common Triage Mistakes
Confirmation Bias: Seeing what you expect rather than what's there (assuming all alerts are FPs)
Alert Fatigue: Closing alerts without investigation due to high volume
Insufficient Context: Making decisions without checking related logs
Over-Reliance on Severity: Dismissing low-severity alerts that are part of a larger attack
Poor Documentation: Not recording triage logic for future reference
Escalation Procedures and Criteria
Effective escalation ensures that the right expertise is applied to each incident while preventing bottlenecks. Understanding when, how, and to whom to escalate is essential for SOC efficiency.
Escalation Decision Matrix
LOW Severity
Tier 1 Handles
Single user affected, no data loss, known FP pattern, standard remediation available
Example: Phishing email blocked by gateway
MEDIUM Severity
Escalate to Tier 2
Multiple users, confirmed malware, lateral movement indicators, requires investigation
Example: Trojan detected on workstation, contained but needs analysis
HIGH Severity
Escalate to Tier 3
Critical systems, data exfiltration, advanced techniques, zero-day exploit
Example: Ransomware encryption across file servers
CRITICAL Severity
Incident Commander
Active breach, widespread impact, executive involvement, regulatory reporting needed
Example: Nation-state APT with confirmed data exfiltration
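As a rough sketch, the matrix above can be encoded as a lookup table so that ticketing or SOAR automation routes incidents consistently. The names `ESCALATION_MATRIX` and `escalation_target` are hypothetical, and a real SOC would adapt the mapping to its own tier structure.

```python
# Severity level -> who handles the incident, taken from the matrix above.
ESCALATION_MATRIX = {
    "LOW": "Tier 1",
    "MEDIUM": "Tier 2",
    "HIGH": "Tier 3",
    "CRITICAL": "Incident Commander",
}


def escalation_target(severity: str) -> str:
    """Return the handler for a severity label (case-insensitive)."""
    key = severity.strip().upper()
    if key not in ESCALATION_MATRIX:
        # Fail loudly instead of silently routing an unknown severity.
        raise ValueError(f"Unknown severity: {severity!r}")
    return ESCALATION_MATRIX[key]
```

Raising on an unknown label is a deliberate choice here: a typo in a severity field should surface immediately and not default to the lowest tier.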
Escalation Criteria by Category
Technical Escalation (Tier 1 → Tier 2)
Escalate when:
Confirmed malware that bypassed preventive controls
Successful exploitation of a vulnerability
Evidence of credential compromise or privilege escalation
Lateral movement between systems detected
Data exfiltration indicators (large uploads, unusual protocols)
Multiple related alerts suggesting coordinated attack
Investigation requires forensic tools or deep analysis
Runbook doesn't cover the scenario
Management Escalation (SOC → Leadership)
Escalate when:
Incident affects executive systems or data
Business-critical systems are compromised or unavailable
Suspected data breach requiring regulatory notification
Media or public attention likely
Attack suggests targeted campaign or APT
Financial fraud or wire transfer compromise
Response requires significant business decisions (shutdown systems, notify customers)
External Escalation
Legal / Compliance
PII/PHI data breach
Regulatory reporting required (GDPR, HIPAA, PCI DSS)
Law enforcement involvement needed
Contractual breach notification
Executive Management
Business continuity impact
Reputational risk
Strategic decision required
Major financial impact
IT Operations
System patching required
Network changes needed
Service restoration
Configuration changes
External IR / Law Enforcement
Capabilities exceeded
Criminal investigation
Advanced forensics needed
Nation-state actor
Effective Escalation Communication
GOOD Escalation Example:
SUBJECT: [HIGH] Confirmed Malware - User jdoe - Finance Workstation
SUMMARY: Tier 1 confirmed malware on Finance user workstation. System isolated.
Escalating for malware analysis and scope determination.
DETAILS:
- Alert: EDR behavioral detection "Suspicious PowerShell Activity"
- User: jdoe (Finance Department - Payroll Access)
- Host: FIN-WS-042 (10.20.30.42)
- Time: 2025-12-21 14:32 UTC
- Initial Triage: User clicked email link, PowerShell downloaded and executed file from
hxxp://malicious-domain[.]com/payload.exe
- VirusTotal: 42/70 engines detect as Emotet variant
- Actions Taken: Host isolated via EDR, user notified, manager informed
- Urgency: User has access to payroll systems and bank account information
ESCALATION REASON: Requires malware analysis, lateral movement check, and
credential reset scope determination.
ATTACHMENTS: Screenshot of EDR alert, VirusTotal report, initial timeline
BAD Escalation Example (Don't Do This):
SUBJECT: Alert
There's an alert on some computer. Looks bad. Can someone check it out?
Problems: No context, no urgency, no details, no actions taken, unprofessional
Escalation Best Practices
Include all relevant context (who, what, when, where, why)
Provide your initial assessment and recommendations
Follow your SOC's escalation SLA (typically 15-30 min for high severity)
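One way to make the difference between the good and bad examples checkable is a completeness check that runs before an escalation is sent. The sketch below assumes escalations are captured as a dictionary; the field names in `REQUIRED_FIELDS` are hypothetical and simply mirror the headings used in the good example.

```python
# Fields mirrored from the good escalation example; names are illustrative.
REQUIRED_FIELDS = (
    "subject",
    "summary",
    "alert",
    "user",
    "host",
    "time",
    "actions_taken",
    "escalation_reason",
)


def missing_fields(escalation: dict) -> list:
    """Return the required fields that are absent or blank."""
    missing = []
    for name in REQUIRED_FIELDS:
        value = escalation.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing
```

Applied to the bad example, which supplies only a subject and a one-line body, the check reports every other field as missing.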
Shift Handoff and Communication
Effective shift handoff is critical in a 24/7 SOC. Poor handoffs lead to missed incidents, duplicated work, and delayed escalations. Treat handoff as a formal process, not an afterthought.
Why Shift Handoff Matters
Consequences of Poor Handoff:
Ongoing incidents fall through the cracks
Next shift re-investigates already triaged alerts (wasted effort)
Context is lost, delaying incident response
Escalations are delayed or forgotten
Alert fatigue increases from duplicate work
Shift Handoff Components
1. Ongoing Incidents
For each active incident, document:
Incident ID and Summary: Ticket number and one-line description
Current Status: Investigation, containment, waiting for external input
Actions Taken: What has been done so far
Next Steps: What needs to happen next and by when
Waiting On: Any blockers or dependencies (IT team, vendor response)
Severity/Urgency: How critical is immediate action
2. Pending Alerts
Alert queue status:
Backlog Count: How many alerts are still untriaged
Priority Alerts: Any high-severity alerts that need immediate attention
Trends: Spike in specific alert types (may indicate ongoing attack)
Known Issues: Tool malfunctions causing alert storms
3. Escalations and Follow-ups
Track escalated items:
Tickets escalated to Tier 2/3 awaiting response
Items escalated to IT/management needing follow-up
Expected callback times from vendors or external teams
Scheduled maintenance or changes that may cause alerts
4. Environmental Notes
Situational awareness:
Scheduled maintenance windows (patching, upgrades)
Known false positive sources being investigated
New detection rules deployed (may cause alert increase)
SOC SHIFT HANDOFF REPORT
Shift: Day Shift (08:00 - 16:00 UTC)
Date: 2025-12-21
Analyst: Sarah Chen
Next Shift: Evening (16:00 - 00:00 UTC) - Mike Johnson
═══════════════════════════════════════════════════════════════
ONGOING INCIDENTS (Action Required):
[INC-12456] HIGH - Suspected Credential Stuffing Attack
Status: Investigation in progress
Summary: Multiple failed logins from distributed IPs targeting VPN portal
Actions Taken:
- Identified 47 targeted accounts
- Blocked 23 malicious IPs via firewall
- Notified affected users to reset passwords
Next Steps:
- Tier 2 performing log analysis for successful logins (ETA 17:00)
- Monitor for additional login attempts
Urgency: HIGH - Active attack
[INC-12461] MEDIUM - Malware Quarantined on HR Workstation
Status: Containment complete, awaiting final verification
Summary: Emotet trojan quarantined by EDR, no execution occurred
Actions Taken:
- EDR quarantined file automatically
- Verified no C2 communication
- User educated on phishing
Next Steps:
- IT patching system tonight
- Close ticket after patch verification tomorrow
Urgency: LOW - Contained
═══════════════════════════════════════════════════════════════
ALERT QUEUE STATUS:
Total Pending: 12 alerts
- HIGH: 0
- MEDIUM: 3 (prioritize IDS alerts from DMZ)
- LOW: 9
Trend: Increase in blocked phishing emails (34 today vs. 12 avg)
- Appears to be targeting Finance department
- Threat intel notified, investigating campaign
═══════════════════════════════════════════════════════════════
ESCALATIONS PENDING RESPONSE:
- Ticket #12450: Escalated to IT Ops for patch deployment (waiting since 12:00)
- Ticket #12455: Escalated to Tier 2 for C2 beacon analysis (under investigation)
═══════════════════════════════════════════════════════════════
ENVIRONMENTAL NOTES:
- Scheduled firewall maintenance 20:00-22:00 (may lose connectivity alerts)
- New SIEM rule deployed for detecting Kerberoasting (may see initial FPs)
- CISO requested daily summary of phishing metrics (send by EOD)
═══════════════════════════════════════════════════════════════
METRICS (This Shift):
Alerts Triaged: 87
- True Positives: 3
- False Positives: 76
- Benign True Positives: 8
Incidents Created: 4
Escalations: 2
MTTD: 8 minutes
MTTR: 32 minutes
═══════════════════════════════════════════════════════════════
Questions? Contact me: sarah.chen@company.com / ext. 5423
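The per-incident fields in a handoff report like the one above can be modeled as a small record, which makes it easy to spot entries the incoming shift could not act on. This is a minimal sketch; the `HandoffIncident` class and its `gaps` method are hypothetical and not tied to any ticketing system.

```python
from dataclasses import dataclass, field


@dataclass
class HandoffIncident:
    incident_id: str
    severity: str
    summary: str
    status: str
    actions_taken: list = field(default_factory=list)
    next_steps: list = field(default_factory=list)
    waiting_on: str = ""

    def gaps(self) -> list:
        """List omissions that would force the next shift to re-investigate."""
        problems = []
        if not self.actions_taken:
            problems.append("no actions recorded")
        if not self.next_steps:
            problems.append("no next steps")
        return problems
```

An entry built from INC-12456 with its actions filled in but an empty `next_steps` list would return `["no next steps"]`, flagging it before the handoff meeting.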
Handoff Best Practices
Document During Shift
Don't wait until handoff time to document. Update your handoff notes throughout the shift so you don't forget critical details.
Verbal + Written
Overlap shifts by 15-30 minutes where possible for a verbal handoff. Walk through critical items. The written document is a backup, not a replacement.
Flag the Critical
Clearly mark items needing immediate attention. Use severity tags, bold text, or highlighting so they stand out.
Encourage Questions
Make sure incoming shift understands and has your contact info. Ambiguity leads to mistakes.
Common Handoff Failures
"Everything's fine": Even if quiet, document what you checked and current queue status
Incomplete Context: "Some server has an issue" - be specific about what, where, severity
No Next Steps: Leaving incoming shift to figure out what to do next
Lost Escalations: Forgetting to mention items sent to Tier 2 or management
Tribal Knowledge: Assuming next shift knows about ongoing situations or tool quirks
SOC Metrics and Key Performance Indicators
Metrics provide visibility into SOC performance, identify areas for improvement, and demonstrate value to stakeholders. However, metrics must be meaningful and actionable—avoid "vanity metrics" that look good but don't drive improvement.
Core SOC Metrics
1. Mean Time to Detect (MTTD)
MTTD
12 minutes
Definition: Average time from when an attack begins to when it's detected
Calculation: (Sum over all incidents of (Detection Time - Incident Start Time)) / Number of Incidents
Target: <15 minutes for most organizations, <5 minutes for high-security environments
Why It Matters: Faster detection limits attacker dwell time and reduces damage
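The MTTD calculation can be sketched in a few lines. The function name `mean_time_to_detect` and the input shape (pairs of start and detection timestamps) are assumptions for illustration; real data would come from incident tickets, where the attack start time is itself often an estimate.

```python
from datetime import datetime


def mean_time_to_detect(incidents):
    """Average detection delay in minutes.

    incidents: iterable of (incident_start, detection_time) datetime pairs.
    """
    delays = []
    for started, detected in incidents:
        if detected < started:
            raise ValueError("detection time precedes incident start")
        delays.append((detected - started).total_seconds() / 60)
    if not delays:
        raise ValueError("need at least one incident")
    return sum(delays) / len(delays)


# Hypothetical sample: delays of 10, 12, and 14 minutes.
sample = [
    (datetime(2025, 12, 21, 9, 0), datetime(2025, 12, 21, 9, 10)),
    (datetime(2025, 12, 21, 11, 0), datetime(2025, 12, 21, 11, 12)),
    (datetime(2025, 12, 21, 14, 0), datetime(2025, 12, 21, 14, 14)),
]
mttd_minutes = mean_time_to_detect(sample)  # 12.0
```

The same function computes MTTR if the pairs are (detection time, response time) instead.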
Open incidents not yet resolved. Should trend toward zero.
Recurring Incidents
18%
Percentage of repeat incidents. Indicates root cause not addressed.
5. Coverage and Visibility Metrics
Log Source Coverage
94% of critical assets sending logs to SIEM
Target: >95% for critical systems, >80% for all systems
EDR Deployment
98% of endpoints with EDR agent installed
Target: >95% for workstations, 100% for servers
MITRE ATT&CK Coverage
78% of techniques have detection coverage
Target: >70% overall, >90% for high-priority techniques
Detection Rule Health
87% of rules firing in last 30 days
Dead rules should be reviewed and retired/improved
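Each coverage figure above is a simple ratio compared against a target. A minimal sketch follows, with hypothetical function names and made-up asset counts; the targets are treated as strict "greater than" thresholds because that is how they are written above.

```python
def coverage_percent(covered: int, total: int) -> float:
    """Share of items (assets, endpoints, techniques) that are covered."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * covered / total


def meets_target(covered: int, total: int, target_percent: float) -> bool:
    """True when coverage strictly exceeds the target, e.g. >95%."""
    return coverage_percent(covered, total) > target_percent


# Hypothetical inventory: 470 of 500 critical assets send logs to the SIEM.
log_coverage = coverage_percent(470, 500)   # 94.0
log_target_met = meets_target(470, 500, 95)  # False
```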
6. Analyst Performance Metrics
Use with Caution: Analyst metrics can be helpful for training but should never be weaponized. Focusing too heavily on individual metrics creates perverse incentives (e.g., closing tickets quickly without proper investigation).
Alerts Triaged per Shift: Productivity indicator (typical: 50-100 depending on complexity)
Triage Accuracy: Percentage of triage decisions upheld by Tier 2 review
Escalation Rate: Percentage of alerts escalated (typical: 5-15%)
Documentation Quality: Completeness of ticket notes (subjective, peer-reviewed)
SLA Compliance: Percentage of alerts triaged within SLA time (e.g., 15 min)
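Two of the metrics above, SLA compliance and escalation rate, reduce to straightforward arithmetic. The sketch below uses hypothetical function names and sample values, and it treats an alert triaged in exactly the SLA time as compliant, which is an assumption a SOC should set explicitly.

```python
def sla_compliance(triage_minutes, sla_minutes=15.0):
    """Percentage of alerts triaged within the SLA (limit inclusive)."""
    times = list(triage_minutes)
    if not times:
        raise ValueError("no triage times supplied")
    within = sum(1 for minutes in times if minutes <= sla_minutes)
    return 100.0 * within / len(times)


def escalation_rate(escalated: int, triaged: int) -> float:
    """Percentage of triaged alerts that were escalated."""
    if triaged <= 0:
        raise ValueError("triaged must be positive")
    return 100.0 * escalated / triaged
```

With triage times of 5, 12, 15, and 20 minutes against a 15-minute SLA, `sla_compliance` returns 75.0; 10 escalations out of 100 triaged alerts gives an escalation rate of 10.0, inside the typical 5-15% range.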
How to Use Metrics Effectively
Trend Over Time: Don't obsess over single data points. Look for trends over weeks/months.
Context Matters: Spike in alerts may be due to new detection rule, not worsening security.
Drive Action: Metrics should lead to concrete improvements (tuning, training, tool changes).
Communicate Value: Use metrics to show leadership the SOC's impact and justify resources.
Balance Leading and Lagging: MTTD/MTTR are lagging (measure past). Coverage is leading (predict future).
Avoid Vanity Metrics: "Blocked 1 million threats" sounds impressive but lacks context. What's the trend? The impact?
Dashboard Example
SOC WEEKLY DASHBOARD - Week of 2025-12-15
┌─────────────────────────────────────────────────────────────┐
│ DETECTION & RESPONSE │
├─────────────────────────────────────────────────────────────┤
│ Mean Time to Detect (MTTD): 11 min [↓ -2 min] ✓ │
│ Mean Time to Respond (MTTR): 38 min [↓ -7 min] ✓ │
│ Incidents Created: 18 [↑ +3] │
│ Critical Incidents: 1 [→ same] │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ ALERT METRICS │
├─────────────────────────────────────────────────────────────┤
│ Total Alerts: 8,734 [↑ +12%] │
│ True Positive Rate: 3.8% [→ same] │
│ False Positive Rate: 88.1% [↑ +3%] │
│ Avg Triage Time: 7 min [→ same] │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ COVERAGE │
├─────────────────────────────────────────────────────────────┤
│ Log Source Coverage: 94% [→ same] │
│ EDR Agent Deployment: 97% [↑ +1%] ✓ │
│ Detection Rules Active: 247 [↑ +5] │
└─────────────────────────────────────────────────────────────┘
KEY FINDINGS:
✓ Response times improved due to new SOAR playbooks
Alert volume spike from new cloud monitoring (tuning in progress)
FP rate increase due to new rule deployment (tuning scheduled)
SOC Runbooks and Playbooks
Runbooks and playbooks are essential SOC documentation that standardizes incident response, reduces decision fatigue, and ensures consistent handling of common scenarios. They're especially critical for Tier 1 analysts who may encounter unfamiliar situations.
Runbook vs. Playbook
Runbook
Step-by-Step Investigation Guide
Purpose: Provide detailed instructions for investigating a specific alert type or scenario
Scope: Single alert type or specific detection rule
Audience: Primarily Tier 1 analysts
Example: "Runbook for Investigating Brute Force Login Alerts"
Contents: Triage steps, data to collect, decision tree, escalation criteria
Playbook
End-to-End Response Workflow
Purpose: Orchestrate complete incident response from detection through recovery
Scope: Entire incident category or attack type
Audience: All SOC tiers, IR team, IT, management
Example: "Ransomware Incident Response Playbook"
Contents: Response phases, roles & responsibilities, communication plan, containment/eradication steps
Sample Runbook: Malware Detection Alert
RUNBOOK: EDR Malware Detection Alert
Version: 2.1 | Last Updated: 2025-12-01 | Owner: SOC Team
═══════════════════════════════════════════════════════════════
1. INITIAL TRIAGE (5 minutes)
□ Verify alert details in EDR console (CrowdStrike/SentinelOne)
- Alert timestamp, severity, detection method
- Affected hostname and IP address
- Username and user department
- Malware family/type identified
- Current host status (online, isolated, offline)
□ Check VirusTotal/hybrid-analysis.com for file hash
- Search by file hash (do NOT upload the file itself)
- Document detection ratio (e.g., 45/70)
- Note malware family classification
□ Determine if system is already isolated
- If YES: Proceed to investigation
- If NO: Consider immediate isolation if high severity
DECISION POINT:
- Fewer than 10 vendors detect: Likely FP → Verify with vendor, document, close
- 10-30 vendors detect: Investigate further (proceed to step 2)
- More than 30 vendors detect: Confirmed malware → Isolate immediately, escalate
═══════════════════════════════════════════════════════════════
2. CONTEXT GATHERING (10 minutes)
□ User Information
- Check Active Directory: User role, department, privileges
- Contact user: Ask about recent downloads, email clicks
- Recent access: Check if user accessed sensitive systems
□ Host Information
- Asset criticality: Production server? Executive workstation?
- Check CMDB for system purpose and data classification
- Recent changes: Software installs, patches applied?
□ Malware Execution Status
- Was file executed or just downloaded?
- Check EDR process tree for indicators of execution
- Look for persistence mechanisms (registry, scheduled tasks)
□ Network Activity
- Check firewall/proxy logs for C2 communication
- Look for data exfiltration (large uploads, unusual protocols)
- Identify any lateral movement attempts
DECISION POINT:
- File quarantined before execution: Lower priority, likely can handle at Tier 1
- File executed with C2 communication: ESCALATE TO TIER 2 IMMEDIATELY
- Multiple hosts affected: ESCALATE TO TIER 2 IMMEDIATELY
═══════════════════════════════════════════════════════════════
3. CONTAINMENT (If Not Already Isolated)
□ For confirmed malware with execution:
- Isolate host via EDR (prevents network communication)
- Disable user account in AD
- Block C2 IPs/domains at firewall
- Alert other teams (IT, IR) via Slack/email
□ Document containment actions with timestamps
═══════════════════════════════════════════════════════════════
4. TRIAGE CLASSIFICATION
FALSE POSITIVE - Close ticket if:
✓ Low VirusTotal detection (<10)
✓ File is known legitimate software
✓ Vendor confirms as FP
→ Action: Document reason, submit FP to vendor, close
TRUE POSITIVE (Tier 1 Handles) - Continue if:
✓ Malware quarantined before execution
✓ No C2 communication detected
✓ Single host affected
✓ Not a critical system or privileged user
→ Action: Proceed to step 5
TRUE POSITIVE (Escalate to Tier 2) - Escalate if:
✗ Malware executed successfully
✗ C2 communication detected
✗ Multiple hosts affected
✗ Critical system or executive user
✗ Ransomware indicators
→ Action: Create high-priority escalation ticket with all context
═══════════════════════════════════════════════════════════════
5. REMEDIATION (Tier 1 - Simple Cases Only)
□ Verify malware quarantined by EDR
□ Run full endpoint scan to ensure no other malware
□ Check for scheduled tasks, registry run keys (persistence)
□ User education: Send phishing awareness reminder
□ Document incident in ticket with full timeline
□ Un-isolate host once verified clean
□ Re-enable user account
□ Monitor for 24 hours for re-infection
═══════════════════════════════════════════════════════════════
6. DOCUMENTATION
Required fields in ticket:
- Malware family and hash
- VirusTotal detection ratio
- Execution status (yes/no)
- C2 communication (yes/no)
- User notification (yes/no)
- Containment actions taken
- Final disposition (TP/FP/BTP)
═══════════════════════════════════════════════════════════════
ESCALATION CRITERIA:
→ Malware executed + C2 communication
→ Ransomware indicators
→ Multiple hosts infected
→ Executive or critical system
→ Unusual/unknown malware family
→ Analyst unsure of next steps
CONTACTS:
Tier 2 Escalation: tier2@company.com / Slack #soc-tier2
EDR Support: edr-support@company.com
User Support: helpdesk@company.com
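The runbook's decision points lend themselves to a short sketch. The function names below are hypothetical, and the boundary handling (counts of exactly 10 and exactly 30 both fall into the "investigate" band) is an assumption, since the runbook does not say which band the boundary values belong to.

```python
def vt_decision(detections: int) -> str:
    """Map a VirusTotal detection count to the step 1 decision."""
    if detections < 0:
        raise ValueError("detection count cannot be negative")
    if detections < 10:
        return "likely_fp"          # verify with vendor, document, close
    if detections <= 30:
        return "investigate"        # proceed to context gathering
    return "confirmed_malware"      # isolate immediately, escalate


def needs_tier2(executed: bool, c2_seen: bool, hosts_affected: int,
                critical_asset: bool, ransomware: bool = False) -> bool:
    """Step 4 escalation test: any single condition is enough."""
    return (executed or c2_seen or hosts_affected > 1
            or critical_asset or ransomware)
```

A quarantined file on a single non-critical workstation, `needs_tier2(False, False, 1, False)`, stays with Tier 1; changing any one argument to its escalating value sends it to Tier 2.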
Key Elements of Effective Runbooks
Clear Steps
Use checkboxes, numbered steps, and action verbs. Avoid ambiguity like "check for suspicious activity"—specify what to check and where.
Time Estimates
Help analysts manage their time and identify when they're going down a rabbit hole.
Decision Points
Clearly defined "if this, then that" logic helps analysts make confident triage decisions.
Examples
Include screenshots, sample logs, and example scenarios to illustrate concepts.
Contact Info
Who to escalate to, who to call for help, and where to find additional resources.
Version Control
Date, version number, and owner. Outdated runbooks are worse than no runbooks.
Common Runbook Topics
Brute Force / Password Spray Attacks
Phishing Email Investigations
Malware Detection (EDR/AV alerts)
Suspicious PowerShell Activity
Unusual Login Location / Impossible Travel
DDoS Attack Response
Data Exfiltration Indicators
Privilege Escalation Attempts
Web Application Attacks (SQLi, XSS)
Insider Threat Indicators
Runbook Maintenance Best Practices
Review and update runbooks quarterly or after major incidents
Incorporate lessons learned from post-incident reviews
Get feedback from analysts who actually use them
Test runbooks during tabletop exercises
Retire outdated runbooks rather than letting them accumulate
Make runbooks easily searchable (wiki, Confluence, SharePoint)
Include runbooks in new analyst onboarding training
Runbook Anti-Patterns
Too Vague: "Investigate the alert" without specifics
Too Rigid: No room for analyst judgment or unusual scenarios
Outdated: References tools or processes no longer in use
Overly Complex: 50-page document when 2 pages would suffice
No Ownership: No one accountable for keeping it current
Analyst Well-Being and Burnout Prevention
SOC analyst burnout is a critical issue in cybersecurity. The combination of high stress, shift work, alert fatigue, and constant exposure to threats takes a toll. Sustainable SOC operations require proactive attention to analyst well-being.
The SOC Burnout Crisis
Industry Statistics:
Average SOC analyst tenure: 18-24 months (high turnover)
70% of SOC analysts report high stress levels
Alert fatigue cited as top reason for leaving SOC roles
24/7 shift work disrupts sleep and personal life
Constant exposure to threats can lead to anxiety and cynicism
Primary Burnout Contributors
1. Alert Fatigue
The Problem: Hundreds or thousands of alerts daily, most of which are false positives. Analysts become desensitized and may miss real threats.
Solutions:
Aggressive false positive tuning and alert reduction programs
Automation of low-value triage tasks via SOAR
Alert prioritization and risk-based routing
Regular "alert health" reviews to retire noisy, low-value rules
Give analysts permission to question and challenge unhelpful alerts
2. Shift Work Challenges
The Problem: 24/7 coverage requires night shifts, weekend work, and rotating schedules that disrupt circadian rhythms and personal life.
Solutions:
Limit consecutive night shifts (no more than 3-4 in a row)
Allow analyst input on scheduling preferences when possible
Provide adequate shift differential pay for nights/weekends
Consider "follow-the-sun" model with geographically distributed teams
Ensure minimum time off between shifts (8-12 hours)
Provide quiet break rooms and encourage regular breaks
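Two of the scheduling rules above (a cap on consecutive night shifts and a minimum rest period) are easy to check automatically when a roster is drafted. The sketch below is illustrative only: `schedule_violations` and its input format are hypothetical, and the defaults of 4 nights and 8 hours are simply one choice within the ranges given above.

```python
from datetime import datetime, timedelta


def schedule_violations(shifts, max_consecutive_nights=4, min_rest_hours=8):
    """Check one analyst's roster.

    shifts: chronologically ordered list of (start, end, is_night) tuples.
    Returns a list of human-readable violation messages.
    """
    violations = []
    night_run = 0
    previous_end = None
    for start, end, is_night in shifts:
        # Rest rule: gap since the previous shift must meet the minimum.
        if previous_end is not None:
            if start - previous_end < timedelta(hours=min_rest_hours):
                violations.append(
                    f"short rest before shift starting {start:%Y-%m-%d %H:%M}")
        # Night rule: count the current run of back-to-back night shifts.
        night_run = night_run + 1 if is_night else 0
        if night_run > max_consecutive_nights:
            violations.append(
                f"night shift {night_run} in a row starting {start:%Y-%m-%d %H:%M}")
        previous_end = end
    return violations
```

Running this on a draft roster before publishing it gives schedulers a chance to fix problems while analysts can still give input.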
3. Lack of Career Growth
The Problem: Analysts feel stuck in reactive triage work with no clear path forward, leading to frustration and attrition.
Cynicism: "Nothing matters," "all alerts are false positives," detachment from impact of work
Irritability: Short temper, conflicts with colleagues, negative attitude
Absenteeism: Calling in sick more often, dreading going to work
If you notice these signs in yourself or colleagues, speak up and seek support.
Remember: You're Protecting People
SOC work can feel thankless—most of what you do prevents incidents that never happen. But your work matters enormously. You protect:
Customer data and privacy
Employee personal information
Business operations and revenue
Your organization's reputation
Jobs and livelihoods of your colleagues
Your vigilance, even when handling the 100th false positive of the day, keeps the organization safe. That's valuable work worthy of respect and sustainability.
Module Complete!
You've finished the SOC Operations presentation. You now understand the structure, workflows, and daily realities of Security Operations Center analyst work.
Key Takeaways:
SOC structure and organizational models
Tier 1/2/3 roles and responsibilities
Alert lifecycle and triage methodology
Escalation criteria and communication
Shift handoff best practices
SOC metrics and KPIs (MTTD, MTTR)
Runbooks and playbooks for consistency
Analyst well-being and burnout prevention