Reverse Engineering Basics

1. What is Reverse Engineering?

Reverse engineering is the process of analyzing a system, software, or device to understand its internal workings, design, and functionality without access to its source code or documentation. In cybersecurity and malware analysis, it's an essential skill for understanding how malicious software operates.

Forward vs Reverse Engineering

Forward Engineering

Design → Source Code → Binary → Execution
You control the entire process

Reverse Engineering

Binary → Analysis → Understanding → Recreation
You work backwards from the result

Applications in Malware Analysis

Legal Considerations: Reverse engineering may be restricted by law in some jurisdictions (DMCA, CFAA in the US). Always ensure you have proper authorization and are working in a controlled environment. Only analyze malware in isolated lab environments.
Ethics Note: Reverse engineering skills are powerful. Use them responsibly for defensive security, research, and education - never for unauthorized access or malicious purposes.

2. Assembly Language Fundamentals

Assembly language is a low-level programming language that provides a human-readable representation of machine code. Understanding assembly is crucial for reverse engineering because compiled programs are ultimately executed as machine instructions.

x86/x64 Registers

Registers are small, fast storage locations inside the CPU. Each 64-bit register (R prefix) has a 32-bit counterpart (E prefix) that names its lower half:

  • RAX/EAX: Accumulator
  • RBX/EBX: Base
  • RCX/ECX: Counter
  • RDX/EDX: Data
  • RSI/ESI: Source Index
  • RDI/EDI: Destination Index
  • RBP/EBP: Base Pointer
  • RSP/ESP: Stack Pointer
  • RIP/EIP: Instruction Pointer

Common Instructions

Instruction  Syntax               Description                              Example
MOV          MOV dest, src        Copy data from source to destination     MOV rax, 5
PUSH         PUSH value           Push value onto the stack                PUSH rbx
POP          POP dest             Pop value from stack to destination      POP rcx
CALL         CALL address         Call a function at address               CALL 0x401000
RET          RET                  Return from function                     RET
JMP          JMP address          Unconditional jump to address            JMP 0x401020
JE/JZ        JE address           Jump if equal (or zero)                  JE 0x401030
JNE/JNZ      JNE address          Jump if not equal (or not zero)          JNE 0x401040
CMP          CMP op1, op2         Compare two operands (sets flags)        CMP rax, 10
TEST         TEST op1, op2        Bitwise AND (sets flags, doesn't store)  TEST rax, rax
ADD          ADD dest, src        Add source to destination                ADD rax, 5
SUB          SUB dest, src        Subtract source from destination         SUB rax, 3
XOR          XOR dest, src        Exclusive OR operation                   XOR rax, rax
LEA          LEA dest, [address]  Load effective address                   LEA rax, [rbp-8]

Calling Conventions

Windows x64 (Microsoft ABI)
  • First 4 integer args: RCX, RDX, R8, R9
  • Floating point args: XMM0, XMM1, XMM2, XMM3
  • Additional args: pushed on stack (right to left)
  • Return value: RAX (or XMM0 for floating point)
  • Caller must allocate 32 bytes of shadow space
🐧 Linux x64 (System V ABI)
  • First 6 integer args: RDI, RSI, RDX, RCX, R8, R9
  • Floating point args: XMM0-XMM7
  • Additional args: pushed on stack (right to left)
  • Return value: RAX (or XMM0 for floating point)
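
The integer-argument rules of the two conventions can be sketched as a small lookup. This is an illustrative helper, not part of any tool: the names `INT_ARG_REGS` and `place_int_args` and the `stack[n]` labels are assumptions made for the example, and floating-point arguments and shadow space are deliberately ignored.

```python
# Integer argument registers, in order, as listed for each ABI above.
INT_ARG_REGS = {
    "win64": ["RCX", "RDX", "R8", "R9"],
    "sysv": ["RDI", "RSI", "RDX", "RCX", "R8", "R9"],
}


def place_int_args(abi, arg_count):
    """Return where each integer argument (0-based order) is passed."""
    regs = INT_ARG_REGS[abi]
    places = []
    for index in range(arg_count):
        if index < len(regs):
            places.append(regs[index])
        else:
            # Hypothetical label: the n-th argument that spills to the stack.
            places.append(f"stack[{index - len(regs)}]")
    return places


print(place_int_args("win64", 5))  # ['RCX', 'RDX', 'R8', 'R9', 'stack[0]']
print(place_int_args("sysv", 5))   # ['RDI', 'RSI', 'RDX', 'RCX', 'R8']
```

When reading a disassembly, spotting which of these registers are loaded just before a CALL is often the quickest way to tell which ABI a binary targets and how many arguments a function takes.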

3. Disassemblers and Debuggers

Tools are essential for reverse engineering. Disassemblers convert machine code back into assembly language, while debuggers allow you to execute and inspect programs step-by-step.

Tool Comparison Chart

IDA Pro
  • Industry standard disassembler
  • Excellent decompiler (Hex-Rays)
  • Cross-platform support
  • Extensive plugin ecosystem
  • Commercial (expensive)
Ghidra
  • Free and open source (NSA)
  • Built-in decompiler
  • Multi-architecture support
  • Collaborative features
  • Java-based (requires a JDK)
x64dbg
  • Modern Windows debugger
  • User-friendly interface
  • Active development
  • Plugin support
  • Free and open source
OllyDbg
  • Classic Windows debugger
  • Simple interface
  • Good for malware analysis
  • 32-bit focused
  • Development ceased
GDB
  • GNU debugger for Linux
  • Command-line based
  • Powerful scripting
  • GEF/PEDA extensions
  • Free and open source
Binary Ninja
  • Modern disassembler
  • Fast and responsive UI
  • Built-in IL (intermediate language)
  • Python API
  • Commercial (affordable)

Typical Workflow

1. Static Analysis (Disassembler)
  • Load binary into IDA/Ghidra
  • Analyze strings and imports
  • Identify interesting functions
  • Study control flow graphs
  • Use decompiler for high-level view
2. Dynamic Analysis (Debugger)
  • Load binary in x64dbg/GDB
  • Set breakpoints at key locations
  • Step through execution
  • Inspect memory and registers
  • Monitor API calls and behavior
3. Iterate
  • Alternate between static and dynamic analysis
  • Verify hypotheses from static with dynamic behavior
Pro Tip: Start with static analysis to understand the overall structure, then use dynamic analysis to verify your hypotheses and understand runtime behavior. The combination of both approaches is more powerful than either alone.

4. Understanding Control Flow

Control flow refers to the order in which individual instructions are executed. Understanding how high-level constructs (if/else, loops) translate to assembly is crucial for reverse engineering.

If/Else Statements in Assembly

; C Code: if (x == 5) { y = 10; } else { y = 20; }
; Assembly equivalent:
CMP eax, 5 ; Compare x with 5
JNE else_block ; Jump if not equal
MOV ebx, 10 ; y = 10 (if block)
JMP end_if ; Skip else block
else_block:
MOV ebx, 20 ; y = 20 (else block)
end_if:
; Continue execution

Loop Recognition

; C Code: for (i = 0; i < 10; i++)
XOR ecx, ecx ; i = 0 (ECX is loop counter)
loop_start:
CMP ecx, 10 ; Compare i with 10
JGE loop_end ; Jump if greater or equal
; Loop body here
INC ecx ; i++
JMP loop_start ; Jump back to start
loop_end:
; Continue after loop

; C Code: while (x > 0)
while_start:
CMP eax, 0 ; Compare x with 0
JLE while_end ; Jump if less or equal
; Loop body here
JMP while_start ; Jump back to condition
while_end:

Function Calls and Returns

; Function prologue and epilogue (as seen inside the called function)
PUSH rbp ; Save caller's base pointer
MOV rbp, rsp ; Set up new stack frame
SUB rsp, 32 ; Allocate stack space
; Function body
MOV rsp, rbp ; Restore stack pointer
POP rbp ; Restore caller's base pointer
RET ; Return to caller
Guided Walkthrough: Building Your First CFG

Let's walk through decomposing a simple assembly snippet into a control flow graph, step by step.

CMP eax, 0 ; Compare eax with zero
JE is_zero ; Jump if equal
MOV ebx, 1 ; Set ebx to 1
JMP done ; Skip to end
is_zero:
XOR ebx, ebx ; Set ebx to 0
done:
RET ; Return
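Resulting graph: ENTRY → [CMP eax, 0 / JE is_zero]. On the false branch (F): [MOV ebx, 1 / JMP done] → EXIT. On the true branch (T): [XOR ebx, ebx (ebx = 0)] → EXIT. Both paths meet at done: RET.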
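
The decomposition can also be done mechanically. Below is a minimal Python sketch of basic-block splitting for the walkthrough snippet: a label starts a new block, and a jump or RET ends one. The function `build_cfg`, the generated `blockN` names, and the edge labels are assumptions made for this example; real disassemblers handle many more cases (indirect jumps, calls, overlapping code).

```python
SNIPPET = """\
CMP eax, 0
JE is_zero
MOV ebx, 1
JMP done
is_zero:
XOR ebx, ebx
done:
RET
"""

CONDITIONAL = {"JE", "JZ", "JNE", "JNZ"}


def build_cfg(source):
    """Split assembly text into basic blocks and list the edges between them."""
    blocks = []
    current = {"label": "entry", "insns": []}
    for raw in source.splitlines():
        line = raw.split(";")[0].strip()      # drop comments and whitespace
        if not line:
            continue
        if line.endswith(":"):                # a label is a jump target: new block
            if current["insns"]:
                blocks.append(current)
            current = {"label": line[:-1], "insns": []}
            continue
        current["insns"].append(line)
        mnemonic = line.split()[0].upper()
        if mnemonic in CONDITIONAL or mnemonic in ("JMP", "RET"):
            blocks.append(current)            # a branch or return ends the block
            current = {"label": None, "insns": []}
    if current["insns"]:
        blocks.append(current)

    names = [b["label"] or f"block{i}" for i, b in enumerate(blocks)]
    edges = []
    for i, block in enumerate(blocks):
        last = block["insns"][-1].split()
        mnemonic = last[0].upper()
        nxt = names[i + 1] if i + 1 < len(blocks) else None
        if mnemonic == "RET":
            continue                          # exit node: no outgoing edges
        if mnemonic == "JMP":
            edges.append((names[i], last[1], "jump"))
        elif mnemonic in CONDITIONAL:
            edges.append((names[i], last[1], "taken"))
            if nxt:
                edges.append((names[i], nxt, "not taken"))
        elif nxt:
            edges.append((names[i], nxt, "fallthrough"))
    return names, edges


names, edges = build_cfg(SNIPPET)
print(names)
for edge in edges:
    print(edge)
```

For the snippet above this yields four blocks (`entry`, `block1`, `is_zero`, `done`) and four edges, matching the graph built by hand.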

5. Recognizing C Constructs in Assembly

When reverse engineering, you'll often need to recognize how high-level C constructs appear in assembly code. This skill helps you understand the program's logic faster.

Variables and Data Types

; int x = 5;
MOV DWORD PTR [rbp-4], 5

; char c = 'A';
MOV BYTE PTR [rbp-5], 41h ; 0x41 = 'A'

; long long value = 1000000;
MOV QWORD PTR [rbp-16], 0F4240h

; float f = 3.14;
MOVSS xmm0, DWORD PTR [.float_constant]
MOVSS DWORD PTR [rbp-20], xmm0
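
To see what those stores actually put in memory, the following sketch uses Python's standard `struct` module to pack the same four values in little-endian order, as x86/x64 does. The variable names are chosen for this example only.

```python
import struct

int_bytes = struct.pack("<i", 5)           # int x = 5          -> DWORD (4 bytes)
char_bytes = struct.pack("<c", b"A")       # char c = 'A'       -> BYTE  (1 byte)
qword_bytes = struct.pack("<q", 1000000)   # long long value    -> QWORD (8 bytes)
float_bytes = struct.pack("<f", 3.14)      # float f = 3.14     -> DWORD (4 bytes)

for name, raw in [
    ("int", int_bytes),
    ("char", char_bytes),
    ("qword", qword_bytes),
    ("float", float_bytes),
]:
    print(f"{name:6} {len(raw)} bytes  {raw.hex(' ')}")

# int    4 bytes  05 00 00 00
# char   1 bytes  41
# qword  8 bytes  40 42 0f 00 00 00 00 00
# float  4 bytes  c3 f5 48 40
```

The operand size (BYTE, DWORD, QWORD) in the disassembly is therefore a direct hint at the C type, and the byte order explains why 0F4240h appears as `40 42 0f` in a hex dump.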

Structures and Arrays

; struct Person { int age; char name[20]; };
; person.age = 25;
LEA rax, [rbp-32] ; Load address of struct
MOV DWORD PTR [rax], 25 ; Set age field (offset 0)

; strcpy(person.name, "John");
LEA rax, [rbp-32] ; Load struct address
ADD rax, 4 ; Offset to name field
MOV rcx, rax ; Destination (first argument, Windows x64)
LEA rdx, [.string] ; Source string (second argument)
CALL strcpy

; int arr[5]; arr[2] = 10;
LEA rax, [rbp-40] ; Load array base address
MOV DWORD PTR [rax+8], 10 ; arr + (2 * 4 bytes)
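
The field and element offsets used in that listing can be checked with Python's standard `ctypes` module, which lays out structures the way a C compiler would on the host platform. This is a small sketch; it assumes the usual 4-byte `int`.

```python
import ctypes


class Person(ctypes.Structure):
    # Mirrors: struct Person { int age; char name[20]; };
    _fields_ = [
        ("age", ctypes.c_int),
        ("name", ctypes.c_char * 20),
    ]


age_offset = Person.age.offset                    # offset used by MOV DWORD PTR [rax], 25
name_offset = Person.name.offset                  # offset added by ADD rax, 4
struct_size = ctypes.sizeof(Person)
element_offset = 2 * ctypes.sizeof(ctypes.c_int)  # arr[2] in an int array

print(age_offset, name_offset, struct_size, element_offset)  # 0 4 24 8
```

Constant displacements such as `+4` or `+8` off a base register are the main clue that code is touching a struct field or array element rather than an independent variable.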

String Operations

; strlen(str) - count characters until null terminator
MOV rdi, str_ptr ; String address in RDI
XOR rcx, rcx ; Counter = 0
strlen_loop:
CMP BYTE PTR [rdi+rcx], 0
JE strlen_done
INC rcx
JMP strlen_loop
strlen_done:
; Length now in RCX

; strcmp(str1, str2) - compare strings
MOV rsi, str1_ptr
MOV rdi, str2_ptr
strcmp_loop:
MOVZX rax, BYTE PTR [rsi]
MOVZX rdx, BYTE PTR [rdi]
CMP rax, rdx
JNE strcmp_done
TEST rax, rax
JE strcmp_done
INC rsi
INC rdi
JMP strcmp_loop
strcmp_done:
SUB rax, rdx ; Return difference

Pointer Dereferencing

; int *ptr = &x;
LEA rax, [rbp-4] ; Get address of x
MOV QWORD PTR [rbp-16], rax ; Store in ptr

; *ptr = 10;
MOV rax, QWORD PTR [rbp-16] ; Load pointer value
MOV DWORD PTR [rax], 10 ; Dereference and assign

; y = *ptr;
MOV rax, QWORD PTR [rbp-16] ; Load pointer
MOV edx, DWORD PTR [rax] ; Dereference
MOV DWORD PTR [rbp-8], edx ; Store in y
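
The same three pointer operations can be replayed with `ctypes` to confirm that a write through the pointer changes the original variable. This is only a sketch of the C semantics, not of the generated assembly.

```python
import ctypes

x = ctypes.c_int(5)
ptr = ctypes.pointer(x)      # int *ptr = &x;   (LEA, then store the address)
ptr.contents.value = 10      # *ptr = 10;       (load pointer, store through it)
y = ptr.contents.value       # y = *ptr;        (load pointer, load through it)

same_address = ctypes.addressof(ptr.contents) == ctypes.addressof(x)
print(x.value, y, same_address)  # 10 10 True
```

In the assembly, the tell-tale sign of a dereference is the two-step pattern: first load the pointer into a register, then use that register inside square brackets.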

6. Anti-Analysis Techniques

Malware authors employ various techniques to make analysis difficult. Understanding these methods helps you recognize and defeat them during reverse engineering.

Anti-Debugging Tricks

; Technique 1: IsDebuggerPresent API
CALL IsDebuggerPresent
TEST eax, eax
JNE debugger_detected ; Jump if debugger present

; Technique 2: PEB (Process Environment Block) check
MOV rax, QWORD PTR gs:[60h] ; Get PEB address
CMP BYTE PTR [rax+2], 0 ; Check BeingDebugged flag
JNE debugger_detected

; Technique 3: Timing check
RDTSC ; Read timestamp counter (result in EDX:EAX)
MOV r8, rax ; Save timestamp (low 32 bits only)
; Some instructions here
RDTSC ; Read again
SUB rax, r8 ; Calculate difference
CMP rax, 1000 ; Too slow = debugger?
JA debugger_detected

; Technique 4: Exception-based detection
INT 3 ; Software breakpoint
; If execution continues here without the program's own exception handler having run, a debugger handled the breakpoint
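
The logic of the timing check (Technique 3) is easy to model outside assembly. The sketch below uses `time.perf_counter_ns` as a stand-in for RDTSC, which is an assumption of this example: it measures nanoseconds rather than CPU cycles. The function name `looks_debugged` and the injectable `clock` parameter are hypothetical and exist only to make the logic testable.

```python
import time


def looks_debugged(work, threshold_ns, clock=time.perf_counter_ns):
    """Return True if running `work` took longer than `threshold_ns`."""
    start = clock()              # first timestamp read (RDTSC)
    work()                       # "some instructions here"
    elapsed = clock() - start    # second read, then SUB
    return elapsed > threshold_ns  # CMP / JA debugger_detected


# Simulated clocks: the first run takes 500 ticks, the second 5000,
# as if an analyst had single-stepped through the measured code.
fast = iter([0, 500])
slow = iter([0, 5000])
print(looks_debugged(lambda: None, 1000, clock=lambda: next(fast)))  # False
print(looks_debugged(lambda: None, 1000, clock=lambda: next(slow)))  # True
```

Seen this way, the usual countermeasures make sense: either avoid stepping between the two reads (set a breakpoint after the check) or patch the comparison so the branch is never taken.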

Packing and Obfuscation

; UPX-style unpacking stub pattern:
PUSH esi
PUSH edi
MOV esi, packed_data ; Source
MOV edi, unpack_dest ; Destination
MOV ecx, packed_size ; Size
CALL decompress_routine
JMP original_entry ; Jump to unpacked code

; Code obfuscation examples:
; Instead of: MOV eax, 5
XOR eax, eax ; eax = 0
ADD eax, 3 ; eax = 3
ADD eax, 2 ; eax = 5 (obfuscated)

; Junk instructions (never executed):
JMP skip_junk
DB 0E8h, 12h, 34h, 56h ; Random bytes
skip_junk:
; Real code continues

VM Detection

; Check for VMware via I/O port
MOV eax, 564D5868h ; 'VMXh' magic value
MOV ebx, 0
MOV ecx, 10
MOV edx, 5658h ; 'VX'
IN eax, dx ; VMware I/O port
CMP ebx, 564D5868h ; Check response
JE vm_detected

; Check for VirtualBox via registry/files
LEA rcx, vbox_path
CALL GetFileAttributesA
CMP eax, -1
JNE vm_detected ; File exists = VirtualBox

; CPUID-based VM detection
MOV eax, 1
CPUID
BT ecx, 31 ; Check hypervisor bit
JC vm_detected
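
Two details of these checks are worth verifying by hand: the magic constants are just ASCII text, and the CPUID test looks at a single bit. The sketch below shows both. The helper name `hypervisor_present` and the sample ECX values are hypothetical; Python cannot execute CPUID directly, so the ECX value is passed in as a plain integer.

```python
# The VMware constants are the ASCII strings 'VMXh' and 'VX' read as big-endian numbers.
vmxh = int.from_bytes(b"VMXh", "big")
vx = int.from_bytes(b"VX", "big")
print(hex(vmxh), hex(vx))  # 0x564d5868 0x5658

HYPERVISOR_BIT = 31  # bit tested by BT ecx, 31 after CPUID with EAX = 1


def hypervisor_present(cpuid_leaf1_ecx):
    """Return True if the hypervisor-present bit is set in the given ECX value."""
    return bool((cpuid_leaf1_ecx >> HYPERVISOR_BIT) & 1)


# Hypothetical ECX values, chosen only to show the bit set and clear.
print(hypervisor_present(0x80000000))  # True
print(hypervisor_present(0x7FFFFFFF))  # False
```

Recognizing these constants in a disassembly (or searching for them as immediate values) is a quick way to locate VM-detection code.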

Analysis Warning: When encountering anti-analysis techniques, document them carefully. Use patches, plugins, or modified environments to bypass them. Always maintain detailed notes about what techniques were used and how you defeated them.

7. Practical Workflow and Best Practices

A systematic approach to reverse engineering increases efficiency and ensures you don't miss critical details. Here's a proven workflow used by professional malware analysts.

Triage Process

1. Initial Assessment (5-10 min)
  • File type identification (file, TrID)
  • Hash calculation (MD5, SHA256)
  • VirusTotal lookup
  • String extraction (strings, FLOSS)
  • PE analysis (pecheck, pestudio)
  • Packer detection (Detect It Easy, PEiD)
2. Static Analysis (30-60 min)
  • Load in disassembler (IDA/Ghidra)
  • Analyze imports/exports
  • Identify entry point
  • Map out main functions
  • Study interesting routines
  • Document findings
3. Dynamic Analysis (30-60 min)
  • Set up isolated environment
  • Configure monitoring tools (Process Monitor, Wireshark)
  • Load in debugger
  • Set strategic breakpoints
  • Step through execution
  • Monitor API calls and behavior
  • Capture artifacts
4. Deep Dive (hours to days)
  • Focus on key functionality
  • Reverse engineer algorithms
  • Extract IOCs (Indicators of Compromise)
  • Write detection rules (YARA, Snort)
  • Document complete analysis
  • Create remediation guidance
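
The hash calculation from the Initial Assessment step can be sketched with Python's standard `hashlib` module. The function names `hash_sample` and `hash_file` are chosen for this example; in practice any hashing tool gives the same digests.

```python
import hashlib


def hash_sample(data):
    """Return the MD5 and SHA256 hex digests of a bytes object."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def hash_file(path, chunk_size=65536):
    """Hash a file in chunks so large samples need not fit in memory."""
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return {"md5": md5.hexdigest(), "sha256": sha256.hexdigest()}


digests = hash_sample(b"")
print(digests["md5"])     # d41d8cd98f00b204e9800998ecf8427e
print(digests["sha256"])  # e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
```

Hashing reads the sample without executing it, so it is safe to do before anything else; the SHA256 value is what goes into the VirusTotal lookup and the report.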

String Analysis Strategy

Start with Strings: Strings are often the fastest path to understanding a binary's purpose. Look for:
  • URLs, IP addresses, domain names
  • File paths and registry keys
  • Error messages and debug strings
  • Cryptographic constants
  • API function names (if dynamically resolved)
  • Command-and-control protocols
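
A basic ASCII string extractor, similar in spirit to the `strings` utility, fits in a few lines. This is a minimal sketch: it only finds runs of printable ASCII, so it will miss UTF-16 and obfuscated strings, which is the gap tools like FLOSS address. The sample blob, including its URL and registry path, is made up for illustration.

```python
import re


def extract_strings(data, min_len=4):
    """Return runs of at least `min_len` printable ASCII characters."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


# Hypothetical bytes standing in for part of a binary.
blob = (
    b"\x00\x01http://example.test/gate.php\x00\xff\xfeabc\x00"
    b"SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Run\x00"
)

for text in extract_strings(blob):
    print(text)
# http://example.test/gate.php
# SOFTWARE\Microsoft\Windows\CurrentVersion\Run
```

The minimum length is a trade-off: too short and the output fills with random byte runs, too long and short but meaningful strings are lost.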

Import Analysis

Key Windows APIs that reveal functionality:

🌐 Network Activity
  • WSAStartup, socket, connect, send, recv
  • InternetOpenA, InternetConnectA, HttpSendRequestA
  • URLDownloadToFileA
📁 File Operations
  • CreateFileA, ReadFile, WriteFile, DeleteFileA
  • FindFirstFileA, FindNextFileA
  • CopyFileA, MoveFileA
Process / Thread Manipulation
  • CreateProcess, CreateRemoteThread
  • VirtualAllocEx, WriteProcessMemory
  • OpenProcess, TerminateProcess
📌 Persistence
  • RegCreateKeyA, RegSetValueExA
  • CreateServiceA, StartServiceA
  • WriteFile (to Startup folder)
🔒 Cryptography
  • CryptAcquireContextA, CryptEncrypt, CryptDecrypt
  • BCryptGenRandom, BCryptEncrypt
🛡 Anti-Analysis
  • IsDebuggerPresent, CheckRemoteDebuggerPresent
  • NtQueryInformationProcess
  • GetTickCount, QueryPerformanceCounter
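
Once the import table has been dumped, grouping the names by capability is a simple lookup. The sketch below uses the API lists above as its data; the dictionary layout, the category keys, and the function `classify_imports` are assumptions made for this example, and the sample import list is hypothetical.

```python
CAPABILITIES = {
    "network": {"WSAStartup", "socket", "connect", "send", "recv", "InternetOpenA",
                "InternetConnectA", "HttpSendRequestA", "URLDownloadToFileA"},
    "file": {"CreateFileA", "ReadFile", "WriteFile", "DeleteFileA",
             "FindFirstFileA", "FindNextFileA", "CopyFileA", "MoveFileA"},
    "process": {"CreateProcess", "CreateRemoteThread", "VirtualAllocEx",
                "WriteProcessMemory", "OpenProcess", "TerminateProcess"},
    "persistence": {"RegCreateKeyA", "RegSetValueExA", "CreateServiceA", "StartServiceA"},
    "crypto": {"CryptAcquireContextA", "CryptEncrypt", "CryptDecrypt",
               "BCryptGenRandom", "BCryptEncrypt"},
    "anti_analysis": {"IsDebuggerPresent", "CheckRemoteDebuggerPresent",
                      "NtQueryInformationProcess", "GetTickCount",
                      "QueryPerformanceCounter"},
}


def classify_imports(imports):
    """Group imported API names by the capability they suggest."""
    found = {}
    for name in imports:
        for capability, apis in CAPABILITIES.items():
            if name in apis:
                found.setdefault(capability, []).append(name)
    return found


# Hypothetical import list for a sample under analysis.
sample_imports = ["OpenProcess", "VirtualAllocEx", "WriteProcessMemory",
                  "CreateRemoteThread", "IsDebuggerPresent", "printf"]
print(classify_imports(sample_imports))
```

A cluster such as OpenProcess, VirtualAllocEx, WriteProcessMemory, and CreateRemoteThread in one binary is a strong hint of process injection, but imports only suggest capability; confirm the behavior in the disassembly or debugger.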

Documenting Findings

Analysis Report Template:

1 Executive Summary
  • Sample hash (SHA256)
  • File type and size
  • Threat classification
  • Key behaviors
  • Risk assessment
2 Technical Details
  • Packer/obfuscation used
  • Architecture (x86/x64)
  • Compilation timestamp
  • Dependencies
  • Code structure
3 Behavioral Analysis
  • Network communication
  • File system changes
  • Registry modifications
  • Process injection
  • Privilege escalation
4 Indicators of Compromise (IOCs)
  • File hashes
  • IP addresses / domains
  • File paths
  • Registry keys
  • Mutexes / named objects
5 Detection & Mitigation
  • YARA rules
  • Network signatures (Snort/Suricata)
  • Host-based detection
  • Remediation steps
Exercise: Analyze This Function

Study this assembly function and work out what it does with its two parameters:

; Mystery Function
push rbp
mov rbp, rsp
sub rsp, 20h
mov QWORD PTR [rbp-8], rcx ; First parameter
mov QWORD PTR [rbp-10h], rdx ; Second parameter
mov rax, QWORD PTR [rbp-8]
movzx eax, BYTE PTR [rax]
test al, al
je done
xor r8d, r8d ; Counter = 0
loop_start:
mov rax, QWORD PTR [rbp-8]
add rax, r8
movzx edx, BYTE PTR [rax]
mov rax, QWORD PTR [rbp-10h]
xor BYTE PTR [rax], dl ; XOR operation
inc r8
mov rax, QWORD PTR [rbp-8]
add rax, r8
movzx eax, BYTE PTR [rax]
test al, al
jne loop_start
done:
mov rax, QWORD PTR [rbp-10h]
leave
ret

Best Practices Checklist

✓ Professional Reverse Engineering Habits:
  • Always work in isolated/sandboxed environments
  • Take extensive notes and screenshots throughout analysis
  • Rename functions and variables with descriptive names
  • Add comments to explain non-obvious behavior
  • Cross-reference between static and dynamic analysis
  • Document all anti-analysis techniques encountered
  • Save IDA/Ghidra database regularly with version numbers
  • Create YARA rules for unique code patterns
  • Share findings with the security community (responsibly)
  • Keep learning - malware techniques constantly evolve

Module Complete

Congratulations! You've completed the Reverse Engineering Basics module. You now have foundational knowledge of assembly language, disassemblers, debuggers, and common anti-analysis techniques.

Next Steps:
  • Practice analyzing real malware samples in a safe lab environment
  • Learn advanced topics: shellcode analysis, kernel debugging, fuzzing
  • Study malware families: ransomware, trojans, rootkits
  • Participate in CTF challenges and reverse engineering wargames
  • Explore advanced tools: Frida, x64dbg scripting, IDA plugins