FINAL CAPSTONE PROJECT

FAILSAFE

Disaster Recovery Operation

A production domain controller has crashed. Critical services are down. Users cannot authenticate. The clock is ticking. Your mission: execute a complete disaster recovery operation and restore full functionality.


Incident Report

Incident ID: INC-2026-0131-001
Priority: P1 - Critical
Status: Active - Awaiting Recovery

Summary: At 02:47 AM, monitoring detected that DC02.hexworth.local became unresponsive. Initial diagnosis indicates storage subsystem failure resulting in OS corruption. The server is currently offline. DC01 is operational but showing replication warnings. User authentication is degraded, and several services report connectivity issues.

02:47 AM - DC02 stopped responding to ping and authentication requests
02:48 AM - Monitoring alerts triggered; multiple service failures detected
02:52 AM - On-call engineer notified; initial assessment: storage failure
03:15 AM - Hardware team confirms RAID controller failure, OS partition corrupted
03:30 AM - Replacement server provisioned; you are now activated for recovery

Mission Objectives

Phase 0: On-Call Prologue

Respond to a critical alert from home

Acknowledge the incoming alert
Ping DC01 and DC02 to assess connectivity
Attempt a remote session to DC02
Determine that on-site response is required
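
The Phase 0 checks reduce to a small decision: does DC02 answer at all, and can it be managed remotely? The following is a minimal Python sketch of that triage, not the simulation's own logic. The function names `tcp_reachable` and `needs_onsite` are hypothetical, and a TCP connection attempt stands in for ICMP ping, which normally requires elevated privileges from a script.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def needs_onsite(dc02_answers: bool, dc02_remote_session: bool) -> bool:
    """On-site response is required when DC02 cannot be managed remotely."""
    return not (dc02_answers and dc02_remote_session)
```

As a usage example, `tcp_reachable("DC02.hexworth.local", 5985)` probes the default WinRM HTTP port as a proxy for "can I open a remote session"; the port choice is an assumption about how remote management is configured in this environment.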

Phase 1: Assessment

Evaluate current environment state

Verify DC01 is healthy and operational
Check AD replication status
Identify affected services and users
Locate and verify backup availability

Phase 2: Recovery

Execute disaster recovery procedures

Deploy replacement server (DC02-NEW)
Restore from Windows Server Backup
Perform authoritative restore if needed
Seize FSMO roles if DC02 held them
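
Whether any FSMO roles need seizing depends on which of the five roles the failed controller held. A minimal Python sketch of that check follows. The role placement in `holders` is invented for illustration, and `roles_to_seize` is a hypothetical helper, not a real tool. In practice the holders would come from a query against the surviving DC.

```python
# The five Active Directory FSMO roles.
FSMO_ROLES = (
    "SchemaMaster",
    "DomainNamingMaster",
    "PDCEmulator",
    "RIDMaster",
    "InfrastructureMaster",
)


def roles_to_seize(holders: dict[str, str], failed_dc: str) -> list[str]:
    """Return the FSMO roles whose recorded holder is the failed DC."""
    failed = failed_dc.lower()
    return [r for r in FSMO_ROLES if holders.get(r, "").lower() == failed]


# Hypothetical role placement, for illustration only.
holders = {
    "SchemaMaster": "DC01",
    "DomainNamingMaster": "DC01",
    "PDCEmulator": "DC02",
    "RIDMaster": "DC02",
    "InfrastructureMaster": "DC01",
}
print(roles_to_seize(holders, "dc02"))  # ['PDCEmulator', 'RIDMaster']
```

An empty result means no seizure is needed. Seizing, rather than transferring, applies here only because the original DC02 is offline and will not return to the network.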

Phase 3: Verification

Validate recovery success

Test AD replication between DCs
Verify DNS resolution
Confirm user authentication
Test dependent services (DHCP, etc.)

Phase 4: Documentation

Complete incident documentation

Document timeline of events
Record all recovery actions taken
Identify root cause
Recommend preventive measures
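
A documented timeline is easier to review when each event carries its offset from first detection. The sketch below, in Python, uses the times from the incident report. The `minutes_between` helper is a hypothetical name, and the code assumes all events fall on the same day.

```python
from datetime import datetime

# Times and events taken from the incident report.
TIMELINE = [
    ("02:47", "DC02 stopped responding to ping and authentication requests"),
    ("02:48", "Monitoring alerts triggered"),
    ("02:52", "On-call engineer notified"),
    ("03:15", "Hardware team confirms RAID controller failure"),
    ("03:30", "Replacement server provisioned"),
]


def minutes_between(start: str, end: str) -> int:
    """Whole minutes between two HH:MM times on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


first = TIMELINE[0][0]
for stamp, event in TIMELINE:
    print(f"T+{minutes_between(first, stamp):>3} min  {stamp}  {event}")
```

The last line printed shows that 43 minutes passed between the first failure and the point at which recovery could begin, a figure worth stating explicitly in the post-incident report.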

Phase 5: Cleanup

Remove orphaned objects and finalize

Remove old DC02 from AD Sites
Clean up DNS records
Perform metadata cleanup if needed
Update monitoring systems

Phase 6: Prevention

Implement preventive measures

Configure improved backup schedule
Set up storage health monitoring
Create runbook for future incidents
Schedule DR drill

Skills Assessment

This capstone tests your mastery of the complete WSA curriculum:

M02: Active Directory
M03: Storage Management
M07: Monitoring
M08: DNS
M10: Group Policy
M18: PowerShell Automation
M19: Troubleshooting
Disaster Recovery
FSMO Roles
AD Replication

Deliverables

Complete all recovery tasks in the simulation
Restore full DC functionality and replication
Verify that all dependent services are operational
Document incident timeline and actions taken
Submit post-incident report with recommendations

Grading Criteria

Phase	Objectives	Requirements
On-Call Prologue	3	Acknowledge alert, verify DC01 connectivity, confirm DC02 is down, attempt remote session
Assessment	6	Run diagnostics, identify FSMO holders, assess all affected services
Backup & Deployment	5	Select correct backup with valid rationale, configure replacement server with correct IP/DNS
AD / DNS / DHCP Recovery	17	Seize FSMO roles, promote new DC, restore DNS zones, configure DHCP failover
Sites & Verification	11	Update AD Sites, verify GPO/SYSVOL replication, confirm all services healthy
Documentation	5	Incident timeline, recovery actions, root cause with failure type identified, preventive measures

Critical Success Factors

All 47 objectives across 9 phases must be completed to pass. There is no partial credit: this simulates real-world expectations, where an incomplete recovery is not acceptable. Your elapsed time is recorded but not scored.
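
The pass rule can be checked against the grading table, whose per-row objective counts sum to 47. The following Python sketch does that arithmetic and applies the all-or-nothing rule. The `passed` function is a hypothetical illustration, not the simulation's actual scoring code.

```python
# Objective counts copied from the grading table.
OBJECTIVES = {
    "On-Call Prologue": 3,
    "Assessment": 6,
    "Backup & Deployment": 5,
    "AD / DNS / DHCP Recovery": 17,
    "Sites & Verification": 11,
    "Documentation": 5,
}


def passed(completed: dict[str, int]) -> bool:
    """Pass only if every row's objectives are all complete (no partial credit)."""
    return all(completed.get(phase, 0) >= n for phase, n in OBJECTIVES.items())


print(sum(OBJECTIVES.values()))  # 47
```

Completing 46 of the 47 objectives therefore fails in the same way as completing none.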

Time Consideration

There is no strict time limit, but in a real disaster recovery scenario every minute of downtime impacts the business. Work efficiently but carefully: a hasty recovery that causes additional issues is worse than a methodical one.

Ready to Execute Recovery?

The incident is active. Users are waiting. Systems are down. Your expertise is needed now.
