Chapter 11: Troubleshooting Methodology

A+ Core 1 — 220-1101  |  Objectives 5.1, 5.2
The systematic, six-step approach to diagnosing and resolving hardware problems. CompTIA's defined methodology is one of the most tested topics on the A+ exam.
21 Slides | Objectives 5.1 & 5.2 | 6-Step Process • POST • RAM • PSU • Cooling | Exam 220-1101
Slide 2 of 21
Why Methodology Matters
Random guessing wastes time, risks data loss, and violates policy. Structure solves problems faster.
Consistency
A defined process produces repeatable results regardless of who performs the work. A junior tech following the methodology correctly will outperform an experienced tech who guesses — because guessing creates new problems while chasing the original one.
Protection
The methodology requires backups before changes, documentation throughout, and policy review before acting. These steps protect the user's data, protect the technician legally, and keep even urgent repairs within corporate policy.
Escalation Path
When testing confirms the issue is beyond your expertise, the methodology tells you to escalate — not guess further. Knowing when to stop is as valuable as knowing how to fix. Escalation is not failure; it is correct process execution.
A technician skips the backup step and directly reinstalls Windows to fix a BSOD. The user's documents were not in the default location. Data is gone. The fix is correct but the method failed. The 6-step methodology exists because "just fixing it" has a cost when it goes wrong.
Slide 3 of 21
The 6-Step Methodology
CompTIA's defined order. Scenario questions will test whether you know what comes FIRST.
1. Identify the Problem: Question the user, gather information, review logs, perform backups before any changes.
2. Establish a Theory of Probable Cause: Question the obvious first. Consider multiple possibilities. Research internally and externally.
3. Test the Theory to Determine Cause: Confirm or disprove the theory. If not confirmed, re-establish a new theory or escalate.
4. Establish a Plan of Action and Implement: Create a resolution plan. Consider impact on other systems. Execute with vendor guidance.
5. Verify Full System Functionality: Test the fix thoroughly. Implement preventive measures. Confirm user satisfaction.
6. Document Findings, Actions, and Outcomes: Record the problem, steps taken, and resolution. Update the knowledge base.
Memory Aid
"I Eat Tacos Every Very Day" — Identify, Establish theory, Test theory, Establish plan, Verify, Document
Flow diagram: 1. Identify (problem + backup) → 2. Theory (probable cause) → 3. Test (confirm or revise) → 4. Plan (establish + execute) → 5. Verify (full functionality) → 6. Document (never skip this). If Step 3 fails: return to Step 2 with a new theory or escalate.
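The fixed order and the Step 3 loop-back can be sketched as a small state machine. This is only an illustration; the names `Step` and `next_step` are hypothetical, and escalation is left out of the model.

```python
from enum import IntEnum
from typing import Optional


class Step(IntEnum):
    """The six steps, numbered as on the slide."""
    IDENTIFY = 1
    THEORY = 2
    TEST = 3
    PLAN = 4
    VERIFY = 5
    DOCUMENT = 6


def next_step(current: Step, theory_confirmed: bool = True) -> Optional[Step]:
    """Return the step that follows `current`, or None once documentation is done."""
    if current is Step.TEST and not theory_confirmed:
        # A disproved theory sends the technician back to Step 2.
        # (Escalation is the other option; this sketch does not model it.)
        return Step.THEORY
    if current is Step.DOCUMENT:
        return None
    return Step(current + 1)


if __name__ == "__main__":
    step = Step.IDENTIFY
    while step is not None:
        print(int(step), step.name)
        step = next_step(step)
```

Passing `theory_confirmed=False` at `Step.TEST` is the only way to move backwards, which mirrors the rule that no other step may be skipped or reordered.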
Slide 4 of 21
Step 1: Identify the Problem
Gather information before touching anything. The first question is always: what changed?
Question the User
Ask open-ended questions first: "What were you doing when this started?" Then specific: "Has anything changed recently — new software, hardware, updates?" Users often know the cause without realizing it. Never make them feel blamed; just gather facts.
Review Logs & Environment
Check Windows Event Viewer, application error logs, and system logs for timestamps that correlate with the problem. Ask about environmental changes: power outages, office moves, temperature changes, new equipment nearby. Infrastructure changes by another team often cause symptoms.
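Correlating log timestamps with the reported problem time can be illustrated with a short filter. The entries below are invented sample data, not real Event Viewer output, and `events_near` is a hypothetical helper name.

```python
from datetime import datetime, timedelta


def events_near(events, reported_at, window_minutes=30):
    """Return events whose timestamp falls within the window around reported_at."""
    window = timedelta(minutes=window_minutes)
    return [event for event in events if abs(event[0] - reported_at) <= window]


# Invented sample entries: (timestamp, source, message)
sample_events = [
    (datetime(2024, 3, 5, 2, 10), "update", "Cumulative update installed"),
    (datetime(2024, 3, 5, 8, 52), "disk", "Controller error on drive 0"),
    (datetime(2024, 3, 5, 9, 5), "system", "Unexpected shutdown recorded"),
]

# The user says the problem started around 09:00.
reported_at = datetime(2024, 3, 5, 9, 0)

if __name__ == "__main__":
    for timestamp, source, message in events_near(sample_events, reported_at):
        print(timestamp.isoformat(), source, message)
```

Widening `window_minutes` pulls in earlier events such as the overnight update, which is often where the answer to "what changed?" sits.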
Backup First
Before making any changes, create a backup of critical data. This is non-negotiable even when time pressure is high. A failed repair that destroys data is far worse than a delayed repair. Document what was backed up, where, and when.
Exam Alert
Scenario: "A user's PC won't boot after a Windows update last night. What should you do FIRST?" — Identify the problem by questioning the user and reviewing update logs. Backup comes before any repair attempt.
Slide 5 of 21
Steps 2 & 3: Theory and Testing
Form the most likely explanation, then prove or disprove it with a controlled test.
Step 2 — Establish Theory
Start with the simplest, most obvious explanation. Is it plugged in? Is the cable seated? Is the service running? Avoid complex theories until simple ones are ruled out. Consider multiple possibilities simultaneously, then rank by probability. Research manufacturer documentation, known issues databases, and forums if needed.
Step 3 — Test the Theory
Make one change at a time. If the theory is confirmed — the change fixes the problem — proceed to Step 4. If not confirmed, re-establish a new theory and test again. If you exhaust your theories or the problem exceeds your expertise, escalate to a senior technician or specialist rather than continuing to guess.
Key Principle
Change one variable at a time. If you swap RAM and update drivers simultaneously and the problem goes away, you do not know which action fixed it. This makes future troubleshooting of the same symptom harder, not easier.
Theory: "The monitor is getting no signal because the PCIe power cable to the GPU came loose." Test: Power off, reseat the cable, power on. Result: No change; still no signal. A new theory is required. Next, check whether the iGPU output works, which would point to a GPU failure rather than a cable issue.
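Ranking theories and testing them one change at a time might look like the sketch below. The theory names, the probabilities, and the `work_through_theories` helper are all hypothetical; `apply_and_check` stands in for whatever single change and re-test the technician performs.

```python
def work_through_theories(theories, apply_and_check):
    """Try theories in descending probability, one change per test.

    theories: list of (name, estimated_probability) pairs.
    apply_and_check: callable that makes ONE change for the named theory
        and returns True if the symptom is gone.
    Returns (confirmed_theory_or_None, list_of_theories_tried).
    """
    ranked = sorted(theories, key=lambda item: item[1], reverse=True)
    tried = []
    for name, _probability in ranked:
        tried.append(name)
        if apply_and_check(name):
            return name, tried
    # Every theory was disproved: time to research further or escalate.
    return None, tried


if __name__ == "__main__":
    candidates = [
        ("failed GPU", 0.3),
        ("loose cable", 0.5),
        ("bad monitor", 0.2),
    ]
    # Pretend the real fault is the GPU.
    confirmed, tried = work_through_theories(candidates, lambda name: name == "failed GPU")
    print("confirmed:", confirmed)
    print("tried in order:", tried)
```

Because each test changes exactly one thing, the list of theories tried doubles as the record of failed attempts that Step 6 asks for.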
Slide 6 of 21
Steps 4, 5 & 6: Plan, Verify, Document
Executing the fix is only one-third of the final phase — verification and documentation are equally required.
Step 4 — Plan of Action
Before implementing, consider: Does this fix require management approval? Will it affect other users or systems? Are vendor instructions required? What is the rollback plan if it fails? Corporate policy may prohibit certain actions without change control sign-off. Never bypass policy even if you know the fix.
Step 5 — Verify
Test that the original problem is resolved. Run the affected application, reproduce the previous scenario, and confirm the symptom is gone. Check for new problems introduced by the fix — a common exam scenario. Ask the user to confirm the system works for their workflow, not just your test.
Step 6 — Document
Record: exact symptoms reported, all steps taken (including failed attempts), root cause identified, resolution applied, and outcome. Update the knowledge base or ticketing system. Documentation builds organizational knowledge, enables future troubleshooting, and provides a record if the issue recurs.
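The fields listed above map naturally onto a simple record type. The sketch below is one possible shape, assuming a hypothetical `TroubleshootingRecord` class; a real ticketing system will have its own schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TroubleshootingRecord:
    """One record per incident, mirroring the Step 6 checklist."""
    symptoms: str = ""
    steps_taken: List[str] = field(default_factory=list)  # include failed attempts
    root_cause: str = ""
    resolution: str = ""
    outcome: str = ""

    def missing_fields(self) -> List[str]:
        """Names of fields that are still empty; an empty list means complete."""
        names = ("symptoms", "steps_taken", "root_cause", "resolution", "outcome")
        return [name for name in names if not getattr(self, name)]


if __name__ == "__main__":
    record = TroubleshootingRecord(
        symptoms="No video signal at boot",
        steps_taken=["Reseated GPU power cable (no change)"],
    )
    print("still to document:", record.missing_fields())
```

A ticket would only be closed once `missing_fields()` returns an empty list.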
Never Skip
Documentation is the step most skipped under time pressure. The A+ exam treats it as mandatory. "The technician fixed the problem but did not document it" is always the wrong answer in scenario questions.
Slide 7 of 21
Common Hardware Symptoms
Map symptoms to likely components. The exam presents symptoms and asks you to identify probable cause.
Symptom | Likely Cause | Immediate Action
POST beep codes | CPU, RAM, or video card failure | Look up beep pattern in motherboard documentation
BSOD / spinning pinwheel | RAM failure, driver error, OS corruption | Note error code; test RAM with MemTest86
Black screen at boot | Video card, monitor, or power issue | Check cable, test with different monitor
No power at all | PSU, power cable, outlet | Try different outlet and cable; test PSU
Intermittent shutdown | Overheating, failing RAM, PSU | Monitor temps; test components individually
Grinding noise | HDD mechanical failure imminent | Back up data immediately; replace drive
Burning smell | PSU or motherboard failure | Shut down immediately; do not power on
Inaccurate date/time | Dead CMOS battery (CR2032) | Replace coin cell battery
Capacitor swelling | Motherboard failure | Replace motherboard
Slide 8 of 21
Immediate Shutdown Required
Some symptoms demand power-off before any other action. Delay causes irreversible damage.
Burning Smell or Smoke
Indicates active component failure generating heat beyond design limits. Most commonly PSU or motherboard. Shut down immediately and disconnect from mains. Do not power on again until the failed component is identified and replaced. Never ignore a burning smell and hope it resolves.
Electrical Sparks or Arcing
Short circuit in progress. Can cause fire, destroy connected components, and risk personal safety. Shut down and disconnect power immediately. If there is fire, use a Class C (electrical) extinguisher — never water. Report the incident per company safety procedures.
Grinding or Clicking From HDD
Read/write heads are physically contacting platters or the motor is failing. Data loss is imminent, not eventual. Do not run defragmentation or disk utilities — this accelerates failure. Back up immediately using the most direct method available (cloning or file copy), then replace the drive.
Swollen Battery (Mobile)
Lithium battery undergoing thermal runaway or chemical decomposition. The battery is a fire and explosion hazard. Stop using the device immediately. Do not charge it. Do not puncture or compress the battery. Transport to an electronics recycling facility for safe disposal. This applies to laptops and phones.
Safety Rule
When in doubt, power off first and investigate second. A few minutes of downtime is recoverable. A fire or data loss from continued operation is not.
Slide 9 of 21
POST and BIOS/UEFI Problems
Power-On Self-Test runs every boot and signals failure via beeps, codes, or frozen screens.
What POST Tests
On every power cycle the BIOS/UEFI runs POST to verify CPU functionality, RAM integrity, video card presence, keyboard detection, and storage detection. A successful POST results in one short beep (on most systems) and handoff to the bootloader. POST failure halts the boot process.
Beep Codes
Codes vary by BIOS manufacturer (AMI, Award, Phoenix). Always consult the motherboard manual. Common patterns: 1 short = OK; continuous beeping = RAM not seated; 1 long + 2-3 short = video card issue; no beep + no display = CPU or motherboard failure. POST cards (PCI/USB) display two-digit codes when beeps are absent.
BIOS/UEFI Issues
Out-of-date BIOS can cause hardware compatibility failures. Update by flashing — but only when necessary and with stable power (use a UPS). A failed flash can permanently brick the board. Lost settings after shutdown indicate a dead CR2032 CMOS battery. Boot order misconfiguration causes "no bootable device" errors.
Exam Alert
"One beep is GOOD." A single short beep at boot means POST passed. Multiple beeps or continuous beeping indicate hardware failure. Know that beep code patterns differ between BIOS vendors — always reference the specific board's documentation.
Flowchart: Power On → POST (hardware check) → Beep? YES (1 short): boot to OS. NO / multiple beeps: continuous = RAM; 1 long + 2 short = video card; no beep + no display = CPU / motherboard.
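The common patterns from this slide can be captured in a lookup table, as a rough study aid. The pattern strings and the `interpret_beeps` helper are hypothetical, and real codes differ by BIOS vendor, so the fallback always points back to the board manual.

```python
# Common patterns as listed on the slide; NOT vendor-specific.
COMMON_BEEP_PATTERNS = {
    "1 short": "POST passed",
    "continuous": "RAM not seated",
    "1 long, 2 short": "video card issue",
    "1 long, 3 short": "video card issue",
    "none": "CPU or motherboard failure (when there is also no display)",
}


def interpret_beeps(pattern: str) -> str:
    """Map a described beep pattern to the slide's likely cause."""
    key = pattern.strip().lower()
    return COMMON_BEEP_PATTERNS.get(
        key, "unknown pattern: consult the motherboard manual"
    )


if __name__ == "__main__":
    for heard in ("1 short", "Continuous", "3 short"):
        print(heard, "->", interpret_beeps(heard))
```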
Slide 10 of 21
Motherboard & CPU Troubleshooting
Port failures, swollen capacitors, overheating, and diagnostic LED indicators.
I/O Port Failures
Systematic approach: reseat cables; inspect for bent or broken pins; verify the port is enabled in Device Manager; check if disabled in BIOS; replace cable; use a loopback plug to test port functionality in isolation. Loopback plugs connect transmit to receive and confirm the port itself is working, separate from the device.
Swollen Capacitors
Capacitors on a healthy board have flat tops. Bulging, leaking, or domed-top capacitors indicate the motherboard is failing. Replace the board; do not attempt capacitor repair without specialized training. Capacitors can hold a charge after power-off, and those inside a PSU can hold a dangerous one. This is a visual inspection finding that leads directly to board replacement.
CPU Problems
Overheating is the most common CPU issue: check thermal paste application, heat sink seating, and fan operation. Overclocking instability: reset to default speeds in BIOS. Incompatibility: verify the CPU appears on the motherboard manufacturer's CPU support list. Bent pins on PGA processors (for example, AMD AM4, where the pins are on the CPU rather than in the socket): inspect carefully; bent pins can often be straightened with a fine tool.
Diagnostic LEDs
Many modern motherboards have onboard diagnostic LEDs or two-digit debug displays. Green typically means system OK or standby power present. Red indicates a detected error. Amber or orange often indicates a warning state or USB power issue. Consult the board manual for the specific LED meaning — implementations vary by manufacturer.
Slide 11 of 21
RAM Troubleshooting
Memory failures disguise themselves as software problems — BSOD, random crashes, application errors.
Failure Symptoms
Blue Screen of Death (Windows) or spinning pinwheel crash (macOS). General Protection Faults. Random system lockups or unexpected reboots. Application crashes with no clear pattern. Corrupted files after write operations. The symptom appears to be software but RAM is the actual cause because software uses memory addresses that return incorrect data.
Diagnostic Tools
Windows Memory Diagnostic (mdsched.exe): built-in, runs at next reboot, basic test. MemTest86: bootable, comprehensive, the gold standard for RAM testing. Run for at least one full pass. Test one module at a time to isolate the failing stick. Try known-good RAM in the same slot to rule out a slot fault vs. a stick fault.
1. Run Windows Memory Diagnostic (mdsched.exe) or MemTest86 to confirm RAM is the issue.
2. Test modules one at a time. Remove all but one stick, boot, and test. Rotate through each module.
3. Try the suspected bad stick in a different slot to distinguish slot failure from module failure.
4. Verify RAM compatibility: matching speeds, timings, and voltage. Check motherboard QVL list.
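The module-versus-slot isolation in steps 2 and 3 amounts to reading a small results matrix. A minimal sketch, assuming the technician records each single-module test as pass or fail; `isolate_ram_fault` and the module labels are hypothetical.

```python
def isolate_ram_fault(results):
    """Separate module faults from slot faults.

    results maps (module, slot) -> True if that single-module test passed.
    A module is suspect if it failed in every slot it was tried in;
    a slot is suspect if every module tried in it failed.
    """
    modules = {module for module, _slot in results}
    slots = {slot for _module, slot in results}
    bad_modules = sorted(
        m for m in modules
        if all(not ok for (mod, _s), ok in results.items() if mod == m)
    )
    bad_slots = sorted(
        s for s in slots
        if all(not ok for (_m, slot), ok in results.items() if slot == s)
    )
    return bad_modules, bad_slots


if __name__ == "__main__":
    # Module "A" fails everywhere; module "B" passes everywhere.
    tests = {("A", 1): False, ("A", 2): False, ("B", 1): True, ("B", 2): True}
    print(isolate_ram_fault(tests))
```

With only one test recorded, both the module and the slot are flagged, which is the correct reading: a single failure cannot tell them apart until the stick is tried elsewhere.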
Slide 12 of 21
Power Supply Troubleshooting
PSU failures range from total dead-system to intermittent crashes under load.
No Power at All
Verify the obvious first: is it plugged in? Is the outlet live (test with another device)? Is the rear PSU switch set to I (on)? Try a different power cable. If still no power, the PSU may be failed. Test with a known-good PSU or a dedicated PSU tester tool that validates all voltage rails.
Fan Spins But No Boot
Standby voltage (+5VSB) works but main rails may not be within spec. Check all power connectors: 24-pin ATX main, 8-pin EPS CPU power, PCIe power to GPU. A partially failed PSU that cannot deliver adequate power on the +12V rail will spin the fan but fail to POST. Measure the rails with a multimeter while the system is under load; a PSU tester gives only a quick no-load check.
Voltage Rails
PSU outputs: +3.3V, +5V, +12V, and –12V. Test under load, not idle. Voltages more than 5% out of spec indicate PSU degradation. The +12V rail is most critical — it powers CPU and GPU. An underpowered +12V causes instability, crashes under load, and intermittent shutdowns that appear as software or thermal issues.
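The 5% rule is simple arithmetic: a +12V rail is in spec between 11.4 V and 12.6 V. The sketch below applies the slide's 5% figure to the positive rails only; the function names and sample readings are hypothetical.

```python
# Nominal voltages for the positive rails named on the slide.
NOMINAL_RAILS = {"+3.3V": 3.3, "+5V": 5.0, "+12V": 12.0}


def rail_in_spec(nominal, measured, tolerance=0.05):
    """True if the measured voltage is within +/- tolerance of nominal."""
    return abs(measured - nominal) <= abs(nominal) * tolerance


def out_of_spec(readings, tolerance=0.05):
    """Return the names of rails whose readings fall outside tolerance."""
    return [
        rail for rail, measured in readings.items()
        if not rail_in_spec(NOMINAL_RAILS[rail], measured, tolerance)
    ]


if __name__ == "__main__":
    # Invented multimeter readings taken under load.
    readings = {"+3.3V": 3.28, "+5V": 5.10, "+12V": 11.2}
    print("rails out of spec:", out_of_spec(readings))
```

Here the +12V reading of 11.2 V sits below the 11.4 V floor, which matches the crash-under-load pattern described above.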
Voltage Selector Warning
Some older PSUs have a 110V / 220V selector switch on the rear. If set incorrectly for your region, you risk damaging connected hardware or causing fire. Most modern PSUs auto-detect input voltage (universal input). Verify before using any older or refurbished PSU.
Slide 13 of 21
Cooling System Issues
A system that works for minutes then locks up or shuts down is almost always a thermal problem.
Air Cooling Checklist
CPU fan spinning and connected to the correct header. Case fans oriented correctly — front/bottom intake, rear/top exhaust. Heat sinks fully seated with correct thermal paste application. All empty PCIe slot covers installed (missing covers disrupt airflow channeling). Case side panel in place. Dust buildup cleaned from fans, heat sinks, and intake vents.
Liquid Cooling Issues
AIO or custom loop failures: pump failure causes rapid CPU overheat with no airflow sound; air bubbles in the loop reduce flow and create noise; leaks will short-circuit components — shut down immediately if liquid is detected. Radiator dust buildup reduces heat dissipation. Monitor coolant temperature in software; most AIO controllers report pump and coolant temp.
Monitoring Tools
Use SpeedFan, HWMonitor, or Core Temp to monitor real-time temperatures. CPU temperatures above 90°C under load indicate a cooling problem. Most modern CPUs throttle (reduce clock speed) to protect themselves before hitting a thermal shutdown at approximately 95–100°C. Throttling presents as sudden performance degradation under sustained load.
A workstation reboots randomly after 10–15 minutes of use but works fine sitting idle. The likely cause is thermal shutdown. Check CPU temps under load using HWMonitor. If the CPU hits 95°C, the heat sink is probably unseated or the thermal paste has dried out. Reseating the heat sink with fresh paste often resolves the issue.
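The temperature thresholds on this slide can be turned into a rough classifier for logged readings. The cut-offs below are the slide's figures, not the limits of any particular CPU, and the function names are hypothetical.

```python
def classify_cpu_temp(celsius, warn_above=90.0, shutdown_at=95.0):
    """Classify one under-load reading using the slide's thresholds."""
    if celsius >= shutdown_at:
        return "critical"   # in the approximate thermal-shutdown range
    if celsius > warn_above:
        return "warning"    # points to a cooling problem
    return "normal"


def worst_reading(samples):
    """Classify the hottest sample from a monitoring session."""
    return classify_cpu_temp(max(samples))


if __name__ == "__main__":
    # Invented readings (degrees C) logged while the system was under load.
    session = [52.0, 71.5, 88.0, 93.5, 96.0]
    print("peak:", max(session), "->", worst_reading(session))
```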
Slide 14 of 21
Hardware Diagnostic Tools
Know the purpose of each tool. The exam tests tool selection as frequently as symptom diagnosis.
Multimeter
Tests DC voltages on PSU rails (+3.3V, +5V, +12V), AC outlet voltage, cable continuity, and resistance. Use for PSU testing under load and for verifying outlet voltage before connecting sensitive equipment. Must be set to correct measurement type; incorrect setting can damage the meter.
POST Card
Plugs into a PCI/PCIe slot or USB port and displays two-digit hexadecimal codes indicating which hardware initialization phase failed during POST. Essential in server environments where no speaker is present for beep codes. Decode using BIOS vendor documentation.
Loopback Plug
Connects transmit pins to receive pins on serial, parallel, or USB ports. The port sends data to itself. If the loopback test succeeds, the port is functional and the problem is external. Available for RJ-45 (network loopback), DB-9 serial, and USB. Used to confirm NIC functionality independent of the network.
PSU Tester
Connects to the 24-pin ATX connector and displays all voltage rail readings without requiring a motherboard. Quick pass/fail verification. Cannot test under load, so a PSU that passes the tester may still fail when the system draws full power.
MemTest86
Bootable RAM testing tool. Runs independently of the OS. Performs multiple test patterns (reading, writing, and comparing data) across all RAM addresses. The most reliable software diagnostic for RAM failure. A single error in any test pass indicates faulty RAM.
CrystalDiskInfo
Reads S.M.A.R.T. data from HDDs and SSDs. Displays health status, reallocated sector count, uncorrectable errors, and temperature. A drive showing yellow (caution) or red (bad) S.M.A.R.T. status should be replaced immediately regardless of whether it is currently functioning.
Slide 15 of 21
Software Diagnostic Tools
Built-in Windows utilities and third-party tools for diagnosing hardware from within the OS.
Tool | Location / Access | Purpose
Windows Memory Diagnostic | mdsched.exe; runs at reboot | Basic RAM test built into Windows
Event Viewer | eventvwr.msc | System, application, and security logs with timestamps
Device Manager | devmgmt.msc | Hardware status, driver issues, disabled ports (yellow bang = error)
CHKDSK | Command Prompt: chkdsk /f /r | Scan and repair file system errors and bad sectors on HDD/SSD
Task Manager | Ctrl+Shift+Esc or Taskmgr | CPU, RAM, disk, and network utilization; identify resource hogs
HWMonitor / SpeedFan | Third-party download | Real-time voltage, temperature, and fan speed monitoring
CrystalDiskInfo | Third-party download | S.M.A.R.T. drive health status and detailed metrics
Exam Tip
Device Manager shows a yellow exclamation mark next to any hardware with a driver error or conflict. This is always one of the first checks after a hardware installation or OS upgrade. A disabled device is marked with a down arrow in current Windows versions (a red X in older versions such as Windows XP).
Slide 16 of 21
Corporate Policy & Escalation
Policy compliance is part of the methodology, not separate from it. Escalation is correct process, not failure.
Policy Compliance
Before implementing any fix, review corporate policies and procedures. Some changes require change management approval, supervisor sign-off, or a maintenance window. You may know exactly how to fix the problem but be prohibited from acting without authorization. Violating policy to fix a problem faster is always wrong — even if the fix works.
When to Escalate
Escalate when: the issue is beyond your skill level; required tools or access are unavailable; the fix requires permissions you do not have; multiple failed theories with no resolution; or the system scope is too large for a single technician. Document what you found and what you tried before escalating — this information speeds resolution by the receiving technician.
Pre-Change Checklist
Before implementing any solution: backup created? — policy reviewed? — approval obtained if required? — rollback plan defined? — change window scheduled if needed? — user notified of expected downtime? Checking these before acting prevents the fix from creating a larger problem.
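The pre-change checklist works as a gate: nothing is implemented while any item is outstanding. A minimal sketch, assuming a hypothetical `PreChangeChecklist` class in which approval and a change window only count when policy requires them.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PreChangeChecklist:
    backup_created: bool = False
    policy_reviewed: bool = False
    rollback_plan_defined: bool = False
    user_notified: bool = False
    approval_required: bool = False
    approval_obtained: bool = False
    change_window_required: bool = False
    change_window_scheduled: bool = False

    def blockers(self) -> List[str]:
        """Items that must be completed before the fix may be implemented."""
        missing = []
        if not self.backup_created:
            missing.append("backup")
        if not self.policy_reviewed:
            missing.append("policy review")
        if self.approval_required and not self.approval_obtained:
            missing.append("approval")
        if not self.rollback_plan_defined:
            missing.append("rollback plan")
        if self.change_window_required and not self.change_window_scheduled:
            missing.append("change window")
        if not self.user_notified:
            missing.append("user notification")
        return missing


if __name__ == "__main__":
    checklist = PreChangeChecklist(backup_created=True, policy_reviewed=True,
                                   approval_required=True)
    print("blocked by:", checklist.blockers())
```

An empty list from `blockers()` is the signal that the plan of action can proceed.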
A technician diagnoses a failed NIC and knows the fix is to replace it. However, the system is a production server requiring a change control ticket and a scheduled maintenance window. The correct answer is to open the ticket, schedule the window, and wait — not to replace the NIC now because it is technically simple.
Escalation path: Tier 1 (Help Desk / Field Technician) → Tier 2 (Systems Admin / Senior Tech) → Tier 3 (Engineers / Specialists) → Vendor (Manufacturer Support / OEM Warranty, for hardware or vendor bugs). Each hand-off happens when the issue is unresolved and has been documented. Document findings at EACH tier before passing up; the receiving tech needs context.
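The escalation tiers and the document-before-passing-up rule can be expressed in a few lines. This is a sketch only; the `escalate` helper is hypothetical and real organizations define their own tiers.

```python
ESCALATION_PATH = [
    "Tier 1: Help Desk / Field Technician",
    "Tier 2: Systems Admin / Senior Tech",
    "Tier 3: Engineers / Specialists",
    "Vendor: Manufacturer Support / OEM Warranty",
]


def escalate(current_index, findings):
    """Return the next tier, refusing to escalate without documented findings.

    Returns None when the issue is already at the last tier.
    """
    if not findings.strip():
        raise ValueError("Document findings before escalating")
    if current_index >= len(ESCALATION_PATH) - 1:
        return None
    return ESCALATION_PATH[current_index + 1]


if __name__ == "__main__":
    notes = "Reseated NIC, swapped cable, loopback test failed"
    print(escalate(0, notes))
```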
Slide 17 of 21
Common Mistakes to Avoid
The exam presents these as wrong answers in scenarios. Know what not to do.
Jumping to Conclusions
Assuming the cause without gathering information. Example: immediately replacing the RAM because BSOD appeared, without running any diagnostic to confirm RAM is actually the issue. The symptom could be caused by multiple components. Gather data first.
Not Questioning the User
Skipping the user interview eliminates the fastest path to root cause identification. Users often introduced the problem themselves — installed software, changed a setting, dropped the device. They rarely volunteer this information; you must ask specifically about recent changes.
No Backup Before Changes
Making changes — especially OS-level changes — without first backing up data. If the repair fails or causes data loss, there is no recovery path. A backup that takes 20 minutes to create could save days of data recovery or client relationship repair.
Skipping Verification
Declaring a problem "fixed" after applying the solution without testing that the original symptom is resolved. The fix may have resolved one issue while introducing another. Always reproduce the original scenario to confirm resolution before closing the ticket.
No Documentation
Not recording what was found, what was done, and what the outcome was. The next technician who encounters the same issue cannot benefit from your work. The client cannot reference what was done. Audit trails are lost. Documentation is not optional.
Ignoring Policy
Acting outside corporate procedures, even with good intentions and correct technical knowledge. Policy violations can result in termination, legal liability, or security incidents. Always check whether approval is required before acting, regardless of technical confidence.
Slide 18 of 21
Practice Scenarios
Apply the methodology. Identify what step applies and what the correct action is.
S1. Scenario: A user reports their PC randomly reboots. They mentioned they installed a new GPU last week. What do you do FIRST? Answer: Step 1 (Identify). Question the user about what changed. Review Event Viewer for critical errors. Back up data before any repair action.
S2. Scenario: You replaced the PSU and the system still randomly reboots. What should you do? Answer: Step 3: the theory was not confirmed. Establish a new theory. Test RAM with MemTest86. Check CPU temps under load. Do not keep replacing components blindly.
S3. Scenario: You fixed the problem but the client calls back next week with the same issue. What likely went wrong? Answer: Step 5: verification was incomplete. The root cause was not fully resolved, or preventive measures were not implemented after the fix.
S4. Scenario: You smell something burning from a tower PC while helping a user. What is the FIRST action? Answer: Shut down the system immediately. Do not run additional diagnostics. A burning smell indicates active component failure and potential fire risk.
S5. Scenario: After fixing a network issue, a user says everything works now. Is the process complete? Answer: No. Step 6: document the problem, root cause, steps taken, and resolution. The process is not complete until documentation is written.
Slide 19 of 21
More Exam Scenarios
POST, beep codes, and physical symptoms mapped to correct diagnostic responses.
S6. Scenario: A PC powers on but there is no display and three short beeps are heard. What should you check first? Answer: Look up the beep code in the motherboard documentation. Three short beeps is often a RAM error (AMI BIOS). Reseat RAM modules or test individually.
S7. Scenario: After every shutdown, the system loses BIOS settings and the clock resets to January 2000. What is the likely cause? Answer: Dead CMOS battery (CR2032). Replace the coin cell battery on the motherboard. Settings will persist after replacement.
S8. Scenario: A laptop screen shows a dark, barely visible image. An external monitor works fine. What is the probable cause? Answer: Failing LCD backlight or inverter (on CCFL-lit displays). The panel itself is functional but not illuminated. Replace the backlight or entire LCD assembly.
S9. Scenario: CHKDSK reports bad sectors on a 2-year-old HDD that is still functioning. What action is correct? Answer: Back up the drive immediately and replace it. Bad sectors indicate physical degradation. A functioning drive with bad sectors is in the process of failing, not a stable device to continue using.
Slide 20 of 21
Key Vocabulary
Chapter 11 terms organized by topic area.
Methodology Terms
POST — Power-On Self-Test, runs at every boot
Beep code — auditory POST error indicator
Escalation — transferring issue to a higher-level technician
Change control — approval process for system modifications
Knowledge base — documented repository of known issues and fixes
Loopback plug — port self-test adapter (TX connects to RX)
Hardware Failure Terms
BSOD — Blue Screen of Death (Windows stop error)
GPF — General Protection Fault (memory access error)
Capacitor swelling — physical sign of motherboard failure
Thermal throttling — CPU reducing speed to shed heat
CMOS battery (CR2032) — maintains BIOS settings when powered off
S.M.A.R.T. — Self-Monitoring, Analysis, and Reporting Technology
Diagnostic Tools
MemTest86 — bootable comprehensive RAM tester
mdsched.exe — Windows Memory Diagnostic tool
eventvwr.msc — Windows Event Viewer
devmgmt.msc — Device Manager
CrystalDiskInfo — S.M.A.R.T. drive health viewer
HWMonitor — temperature, voltage, and fan monitoring
Slide 21 of 21 — Chapter 11 Complete
Chapter 11 Summary
Eight key takeaways from Troubleshooting Methodology.
1. The 6-step methodology order is fixed: Identify → Establish theory → Test theory → Establish plan → Verify → Document. Steps cannot be skipped or reordered.
2. Backup before any change. Creating a backup is part of Step 1. It is never optional, regardless of time pressure or perceived simplicity of the repair.
3. Change one variable at a time. Multiple simultaneous changes prevent root cause identification and make future troubleshooting harder.
4. Burning smell, smoke, sparks, grinding HDD, and swollen batteries all require immediate power-off before any diagnostic work. Safety before troubleshooting.
5. One short beep at POST = success. Continuous beeping = RAM. 1 long + 2-3 short = video. No beep + no display = CPU or motherboard. Always reference the specific board's documentation.
6. RAM failures mimic software problems: BSOD, random crashes, app freezes. When software explanations are exhausted, test RAM with MemTest86.
7. Corporate policy compliance is part of the methodology. Always check whether approval is required before implementing any fix, even a technically simple one.
8. Documentation is never optional. The process is not complete until findings, actions, and outcomes are recorded. An undocumented fix is an incomplete fix.