A production domain controller has crashed. Critical services are down. Users cannot authenticate. The clock is ticking. Your mission: execute a complete disaster recovery operation and restore full functionality.
Incident ID: INC-2026-0131-001
Priority: P1 - Critical
Status: Active - Awaiting Recovery
Summary: At 02:47 AM, monitoring detected that DC02.hexworth.local became unresponsive. Initial diagnosis indicates storage subsystem failure resulting in OS corruption. The server is currently offline. DC01 is operational but showing replication warnings. User authentication is degraded, and several services report connectivity issues.
- Respond to a critical alert from home
- Evaluate current environment state
- Execute disaster recovery procedures
- Validate recovery success
- Complete incident documentation
- Remove orphaned objects and finalize
- Implement preventive measures
This capstone tests your mastery of the complete WSA curriculum:
| Phase | Objectives | Requirements |
|---|---|---|
| On-Call Prologue | 3 | Acknowledge alert, verify DC01 connectivity, confirm DC02 is down, attempt remote session |
| Assessment | 6 | Run diagnostics, identify FSMO holders, assess all affected services |
| Backup & Deployment | 5 | Select correct backup with valid rationale, configure replacement server with correct IP/DNS |
| AD / DNS / DHCP Recovery | 17 | Seize FSMO roles, promote new DC, restore DNS zones, configure DHCP failover |
| Sites & Verification | 11 | Update AD Sites, verify GPO/SYSVOL replication, confirm all services healthy |
| Documentation | 5 | Incident timeline, recovery actions, root cause with failure type identified, preventive measures |
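
The Assessment and AD recovery rows ask you to identify the FSMO holders and then seize roles from the failed controller. As a planning aid, the decision can be sketched as a small lookup: given which DC holds each of the five FSMO roles, list the roles stranded on the failed DC. The role placement below is hypothetical (the scenario does not state it), and the name `DC01.hexworth.local` is an assumption based on the domain suffix of DC02. The function only plans; the seizure itself would be carried out with the usual AD tooling, for example `Move-ADDirectoryServerOperationMasterRole -Force` or `ntdsutil`.

```python
# The five FSMO roles: two forest-wide, three per domain.
FSMO_ROLES = (
    "SchemaMaster",
    "DomainNamingMaster",
    "PDCEmulator",
    "RIDMaster",
    "InfrastructureMaster",
)


def roles_to_seize(holders: dict[str, str], failed_dc: str) -> list[str]:
    """Return the FSMO roles whose recorded holder is the failed DC."""
    missing = [role for role in FSMO_ROLES if role not in holders]
    if missing:
        # Refuse to plan from an incomplete inventory.
        raise ValueError(f"no holder recorded for: {', '.join(missing)}")
    failed = failed_dc.lower()  # DNS host names are case-insensitive
    return [role for role in FSMO_ROLES if holders[role].lower() == failed]


# Hypothetical placement, for illustration only.
example_holders = {
    "SchemaMaster": "DC01.hexworth.local",
    "DomainNamingMaster": "DC01.hexworth.local",
    "PDCEmulator": "DC02.hexworth.local",
    "RIDMaster": "DC02.hexworth.local",
    "InfrastructureMaster": "DC02.hexworth.local",
}

print(roles_to_seize(example_holders, "DC02.hexworth.local"))
# ['PDCEmulator', 'RIDMaster', 'InfrastructureMaster']
```

With this assumed placement, three roles would need to be seized onto a surviving controller; an empty list would mean the failed DC held no roles and no seizure is required.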
All 47 objectives across 9 phases must be completed to pass. There is no partial credit — this simulates real-world expectations where incomplete recovery is not acceptable. Your elapsed time is recorded but not scored.
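
The all-or-nothing rule can be illustrated with a short tally. This is a minimal sketch, not the platform's actual grading logic, which is not described here: the per-row counts are copied from the table above, and `passed` and `remaining` are hypothetical helper names.

```python
# Objective counts per table row, copied from the table above.
OBJECTIVES = {
    "On-Call Prologue": 3,
    "Assessment": 6,
    "Backup & Deployment": 5,
    "AD / DNS / DHCP Recovery": 17,
    "Sites & Verification": 11,
    "Documentation": 5,
}

TOTAL = sum(OBJECTIVES.values())  # 3 + 6 + 5 + 17 + 11 + 5 = 47


def remaining(completed: dict[str, int]) -> dict[str, int]:
    """Return the number of objectives still open in each unfinished row."""
    return {
        phase: count - completed.get(phase, 0)
        for phase, count in OBJECTIVES.items()
        if completed.get(phase, 0) < count
    }


def passed(completed: dict[str, int]) -> bool:
    """No partial credit: pass only when nothing remains open."""
    return not remaining(completed)


# Example: everything done except one verification objective.
progress = dict(OBJECTIVES)
progress["Sites & Verification"] = 10

print(TOTAL)                # 47
print(passed(progress))     # False
print(remaining(progress))  # {'Sites & Verification': 1}
```

Completing 46 of 47 objectives is still a fail under this rule, which mirrors the point that a partially recovered domain is not a recovered domain.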
There is no strict time limit, but in a real disaster recovery scenario every minute of downtime impacts the business. Work efficiently but carefully: a hasty recovery that causes additional issues is worse than a slower, methodical one.
The incident is active. Users are waiting. Systems are down. Your expertise is needed now.
Begin FAILSAFE Operation