Extract Files from TCP Stream

Forensic file extraction from network captures

Week 4 TOPIC

TCP Stream File Extraction

During incident response, analysts often need to extract files that were transferred over the network. Malware downloads, exfiltrated data, and lateral movement artifacts can all be recovered from packet captures. Per NIST SP 800-86, network forensics provides critical evidence that may not exist on the endpoint (especially if the attacker wiped the disk).

Common Extraction Scenarios

Malware Downloads

Extract malicious executables downloaded via HTTP/HTTPS for analysis in a sandbox. Look for PE files (MZ header) in HTTP responses with suspicious Content-Disposition headers or non-standard content types.

Data Exfiltration

Recover files sent to external servers to assess breach scope. Look for large HTTP POST bodies, base64-encoded data, or files uploaded via WebDAV/FTP. Recovering these files shows exactly what data left the organization.

Email Attachments

Extract attachments from SMTP traffic for phishing analysis. MIME-encoded attachments can be decoded and analyzed for malicious content, macros, or embedded exploits. Critical for initial access vector determination.
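As a rough illustration of the decoding step, the sketch below uses Python's standard email package to pull attachments out of a message. It assumes the input is the message itself (the bytes sent after the SMTP DATA command, with the SMTP commands and the terminating dot line already removed); the function name extract_attachments is hypothetical.

```python
from email import message_from_bytes, policy
from typing import List, Tuple


def extract_attachments(raw_message: bytes) -> List[Tuple[str, bytes]]:
    """Return (filename, decoded_bytes) for each attachment in a raw message."""
    # policy.default yields an EmailMessage, which provides iter_attachments()
    msg = message_from_bytes(raw_message, policy=policy.default)
    attachments = []
    for part in msg.iter_attachments():
        name = part.get_filename() or "unnamed.bin"
        # decode=True reverses the base64 / quoted-printable transfer encoding
        data = part.get_payload(decode=True) or b""
        attachments.append((name, data))
    return attachments
```

The decoded bytes should be hashed and handled like any other extracted file; the filename comes from the sender and should not be trusted as a path.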

Lateral Movement

Recover files transferred via SMB during internal propagation. PsExec, WMI, and PowerShell remoting all leave artifacts in network traffic. SMB file transfers directly between workstations are unusual in most environments and should be treated as suspicious until explained.

Chain of Custody

Extracted files are forensic evidence. Always maintain chain of custody: document who captured the pcap, when, from what interface, using what tool and version. Hash the pcap immediately (SHA-256) before any analysis. Store original pcap on write-protected media.
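The hashing step can be sketched in a few lines of Python. This is a minimal example, not a replacement for your lab's evidence-handling tooling; the function name sha256_file and the chunk size are arbitrary choices. It reads the file in chunks so that multi-gigabyte captures do not need to fit in memory.

```python
import hashlib


def sha256_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF)
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Record the resulting digest in the case notes before opening the pcap in any analysis tool, and recompute it later to show the evidence was not altered.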


Extraction Process

Follow this systematic workflow when extracting files from packet captures. Each step builds on the previous one to maintain forensic integrity.

1. Identify the TCP Stream

Filter traffic to find the file transfer. Look for HTTP 200 responses with a large Content-Length, specific MIME types (application/octet-stream, application/x-executable), or destination IPs flagged by threat intelligence. Use a display filter such as http.response.code == 200 && http.content_length > 100000. Note that http.content_length is the numeric field; http.content_length_header holds the raw header string and is not suited to numeric comparison.

2. Follow TCP Stream

Right-click the packet, select Follow, then TCP Stream. This reassembles the entire conversation, showing request and response in order. Check both directions -- the request may reveal what triggered the download (e.g., a malicious URL parameter or POST payload).

3. Identify File Boundaries

Look for magic bytes (file signatures) at the start of the response payload. The HTTP Content-Length header indicates expected file size. For non-HTTP protocols, use the magic bytes table below to find where the file data begins and ends within the raw stream.
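For a raw stream saved from a single HTTP response, the boundary check can be sketched as follows. This assumes the stream holds exactly one response with no chunked transfer encoding or compression; the function names are hypothetical. The header block ends at the first blank line (CRLF CRLF), and the body length is compared against Content-Length to spot truncated captures.

```python
from typing import Dict, Optional, Tuple


def split_http_response(raw: bytes) -> Tuple[Dict[str, str], bytes]:
    """Split a raw HTTP response into a lowercase header dict and the body."""
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no blank line found; headers may be incomplete")
    headers = {}
    for line in head.split(b"\r\n")[1:]:  # skip the status line
        name, _, value = line.partition(b":")
        headers[name.strip().lower().decode("latin-1")] = value.strip().decode("latin-1")
    return headers, body


def body_is_complete(headers: Dict[str, str], body: bytes) -> Optional[bool]:
    """Compare body length to Content-Length; None if the header is unusable."""
    length = headers.get("content-length")
    if length is None or not length.isdigit():
        return None  # e.g. chunked encoding: this check does not apply
    return len(body) == int(length)
```

A result of False usually means dropped packets or a capture that ended mid-transfer, which should be noted before the file is treated as a faithful copy.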

4. Export/Save Raw Data

For HTTP: File, Export Objects, HTTP -- this auto-extracts all transferred files. For raw streams: change the stream display to "Raw" format, then "Save As" to disk. For SMB: File, Export Objects, SMB. For TFTP (common in firmware exfil): File, Export Objects, TFTP.

5. Verify and Analyze

Calculate the hash (SHA-256) of the extracted file. Query the hash against VirusTotal, MalwareBazaar, and internal threat intel. If the file is an executable, analyze in an isolated sandbox (e.g., Joe Sandbox, ANY.RUN, Cuckoo). Never execute on production systems.

Common File Signatures (Magic Bytes)

Magic bytes are the first few bytes of a file that identify its format. Forensic tools and analysts use these to determine file type regardless of extension:

File Type      | Hex Bytes   | ASCII | Forensic Significance
PDF            | 25 50 44 46 | %PDF  | Document exfil, malicious PDFs with JS exploits
ZIP/DOCX/XLSX  | 50 4B 03 04 | PK..  | Archives, Office docs (OOXML), password-protected exfil
EXE/DLL (PE)   | 4D 5A       | MZ    | Windows executables -- malware, backdoors, RATs
ELF (Linux)    | 7F 45 4C 46 | .ELF  | Linux malware, coinminers, reverse shells
PNG            | 89 50 4E 47 | .PNG  | Steganography -- data hidden in image pixels
JPEG           | FF D8 FF    | ---   | Steganography, screenshot exfil, surveillance
GZIP           | 1F 8B 08    | ---   | Compressed exfil data, tar.gz archives
RAR            | 52 61 72 21 | Rar!  | Password-protected archives (common in ransomware exfil)
7-Zip          | 37 7A BC AF | 7z..  | High-compression archives used for staging data
SQLite         | 53 51 4C 69 | SQLi  | Browser databases, credential stores
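A lookup over these signatures is straightforward to script. The sketch below encodes the same ten signatures and checks the first bytes of a payload; the names SIGNATURES and identify are illustrative, and a prefix match is only a hint (a two-byte signature like MZ will produce false positives on arbitrary data).

```python
# (signature bytes, label) pairs taken from the table of common file signatures
SIGNATURES = [
    (bytes.fromhex("25504446"), "PDF"),
    (bytes.fromhex("504B0304"), "ZIP/DOCX/XLSX"),
    (bytes.fromhex("4D5A"), "EXE/DLL (PE)"),
    (bytes.fromhex("7F454C46"), "ELF (Linux)"),
    (bytes.fromhex("89504E47"), "PNG"),
    (bytes.fromhex("FFD8FF"), "JPEG"),
    (bytes.fromhex("1F8B08"), "GZIP"),
    (bytes.fromhex("52617221"), "RAR"),
    (bytes.fromhex("377ABCAF"), "7-Zip"),
    (bytes.fromhex("53514C69"), "SQLite"),
]


def identify(data: bytes) -> str:
    """Return the label of the first signature that prefixes data, else 'unknown'."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"
```

For example, identify(b"%PDF-1.7") returns "PDF", while a payload that starts with plain text returns "unknown".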

Hex Dump Example: Identifying a PE File in a TCP Stream

00000000  48 54 54 50 2F 31 2E 31 20 32 30 30 20 4F 4B 0D  HTTP/1.1 200 OK.
00000010  0A 43 6F 6E 74 65 6E 74 2D 54 79 70 65 3A 20 61  .Content-Type: a
00000020  70 70 6C 69 63 61 74 69 6F 6E 2F 6F 63 74 65 74  pplication/octet
00000030  2D 73 74 72 65 61 6D 0D 0A 0D 0A 4D 5A 90 00 03  -stream....MZ...
00000040  00 00 00 04 00 00 00 FF FF 00 00 B8 00 00 00 00  ................
00000050  00 00 00 40 00 00 00 00 00 00 00 00 00 00 00 00  ...@............

The MZ bytes (4D 5A) at offset 0x3B mark the beginning of a Windows PE executable embedded in the HTTP response. Everything before it is the HTTP header. A forensic analyst would carve the file starting at the MZ signature.
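Because the two-byte MZ signature can occur by chance, a carving script usually confirms the match before cutting. The sketch below is one way to do that, assuming a standard DOS header: it reads the 4-byte little-endian e_lfanew pointer at offset 0x3C from the MZ position and checks that it points at the "PE\0\0" signature. The function name find_pe_offset is hypothetical.

```python
import struct


def find_pe_offset(stream: bytes) -> int:
    """Return the offset of the first plausible PE image in stream, or -1."""
    pos = stream.find(b"MZ")
    while pos != -1:
        # The DOS header is 0x40 bytes; e_lfanew is its last 4-byte field
        if pos + 0x40 <= len(stream):
            (e_lfanew,) = struct.unpack_from("<I", stream, pos + 0x3C)
            pe_sig = pos + e_lfanew
            if stream[pe_sig:pe_sig + 4] == b"PE\x00\x00":
                return pos
        pos = stream.find(b"MZ", pos + 1)  # keep scanning past false positives
    return -1
```

With the offset in hand, stream[offset:] is the carved candidate; the end of the file still has to come from Content-Length or from the PE section table.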

Evidence Documentation Checklist

  1. Record the pcap filename, SHA-256 hash, capture start/end time, and interface
  2. Document the stream index, source/dest IPs and ports, protocol
  3. Record the extraction method (Wireshark Export Objects vs manual carving)
  4. Hash the extracted file immediately (MD5 + SHA-256)
  5. Record VirusTotal detection ratio and any AV family names
  6. Note the analyst name, date/time of extraction, and case number
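One way to keep these items together is a small structured record saved alongside the extracted file. The sketch below is illustrative only: the class name ExtractionRecord and every field name are assumptions, not a standard schema, and your case management system may dictate its own format.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class ExtractionRecord:
    case_number: str
    analyst: str
    pcap_filename: str
    pcap_sha256: str
    stream_index: int
    src: str            # source IP:port
    dst: str            # destination IP:port
    protocol: str
    method: str         # e.g. "Wireshark Export Objects" or "manual carving"
    file_md5: str
    file_sha256: str
    capture_start: str = ""
    capture_end: str = ""
    interface: str = ""
    vt_detection: str = ""   # detection ratio and AV family names, if any
    extracted_at: str = ""   # filled with the current UTC time when left empty

    def to_json(self) -> str:
        record = asdict(self)
        if not record["extracted_at"]:
            record["extracted_at"] = datetime.now(timezone.utc).isoformat()
        return json.dumps(record, indent=2, sort_keys=True)
```

Writing the output of to_json() to a file next to the artifact gives a timestamped record that can be attached to the case.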

SOC Analyst Tip

VirusTotal Upload Warning: Hash the file and search VirusTotal by hash first. Uploading the actual file makes it available to VirusTotal's paying subscribers and partners, and an attacker who monitors the service for their own samples can be among them. If the file is unique to your incident, uploading it tells the adversary you found their malware. Use private sandboxes for sensitive files.

Extraction Tools

Wireshark Export Objects

File, Export Objects, HTTP/SMB/TFTP. Automatic extraction of transferred files. Shows filename, hostname, content-type, and size. Best for quick extraction of HTTP-transferred files. Handles chunked transfer encoding automatically.

NetworkMiner

Automated file extraction. Reconstructs files from pcap. Shows thumbnails for images. Also extracts credentials, DNS queries, and session data. Open-source version available (limited features).

Foremost / Scalpel

File carving tools that extract files based on headers and footers. They work on raw data and do not require protocol understanding, which makes them useful when the stream is corrupted or a non-standard protocol is used. Originally developed for disk forensics, they are equally useful on raw TCP streams.

tshark (Command-line)

Scriptable extraction for automation. Process large pcap collections in batch. Integrates with SOAR playbooks for automated malware extraction and sandboxing. Syntax: tshark -r capture.pcap --export-objects http,./extracted/

Wireshark HTTP Export Workflow

# In Wireshark GUI:
1. Open pcap file
2. File -> Export Objects -> HTTP
3. Browse the list of transferred files
4. Check Content-Type for executables: application/octet-stream, application/x-msdownload
5. Select suspicious file(s) -> Click "Save" or "Save All"
6. Hash immediately: sha256sum extracted_file.exe

# tshark automated extraction (batch processing):
tshark -r capture.pcap --export-objects http,./extracted_http/
tshark -r capture.pcap --export-objects smb,./extracted_smb/

# Foremost file carving from raw stream:
tcpflow -r capture.pcap -o ./streams/
foremost -i ./streams/* -o ./carved_files/

# Hash all extracted files for VirusTotal bulk lookup:
sha256sum ./extracted_http/* > hashes.txt
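For a folder of captures, the tshark step can be wrapped in a short script. The sketch below is an assumption-laden example rather than a finished playbook: the function names are hypothetical, it expects files ending in .pcap, it adds tshark's -q flag to suppress per-packet output, and it writes each capture's objects to a separate subdirectory so filenames from different pcaps do not collide.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_export_cmd(pcap: str, protocol: str, out_dir: str) -> List[str]:
    """Build the tshark argument list for exporting objects of one protocol."""
    return ["tshark", "-q", "-r", pcap, "--export-objects", f"{protocol},{out_dir}"]


def export_all(pcap_dir: str, out_root: str, protocol: str = "http") -> List[Path]:
    """Run tshark object export for every .pcap in pcap_dir; return output dirs."""
    if shutil.which("tshark") is None:
        raise RuntimeError("tshark not found on PATH")
    outputs = []
    for pcap in sorted(Path(pcap_dir).glob("*.pcap")):
        out_dir = Path(out_root) / pcap.stem  # one subdirectory per capture
        out_dir.mkdir(parents=True, exist_ok=True)
        subprocess.run(build_export_cmd(str(pcap), protocol, str(out_dir)), check=True)
        outputs.append(out_dir)
    return outputs
```

Passing the arguments as a list (rather than a shell string) avoids quoting problems with unusual pcap filenames.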

Protocol-Specific Extraction Notes

Protocol   | Extraction Method                                           | Complications
HTTP       | Export Objects, Follow TCP Stream                           | Chunked encoding, gzip compression -- Wireshark handles both
HTTPS/TLS  | Requires TLS keys (SSLKEYLOGFILE) or SSL inspection proxy   | Without keys, only metadata is visible -- cannot extract files
SMB        | Export Objects -> SMB                                       | SMB3 encryption, multi-fragment transfers
FTP        | Follow TCP Stream on data channel (port 20 or passive port) | Passive mode uses random ports -- filter by IP pair
SMTP       | Follow TCP Stream, decode base64 MIME attachments           | Multi-part MIME, quoted-printable encoding
DNS        | Reassemble TXT record payloads, decode base32/base64        | Data split across many queries -- manual reassembly needed
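The manual reassembly mentioned for DNS can be sketched as below. This assumes a simple, hypothetical tunneling scheme: the sender base64-encoded the whole file, split the encoded text into pieces, and tagged each piece with a sequence number that you have already parsed out of the queries or TXT records. Real tools vary widely (base32, custom alphabets, per-chunk encoding), so treat this only as a starting point.

```python
import base64
from typing import Iterable, Tuple


def reassemble_chunks(chunks: Iterable[Tuple[int, str]]) -> bytes:
    """Order (sequence_number, base64_text) pairs and decode the joined payload."""
    ordered = [text for _, text in sorted(chunks, key=lambda item: item[0])]
    joined = "".join(ordered)
    # Tunneling tools often strip '=' padding; restore it before decoding
    joined += "=" * (-len(joined) % 4)
    return base64.b64decode(joined)
```

Sorting by sequence number matters because DNS queries can arrive and be captured out of order.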

SOC Analyst Tip

Hash Before Analysis: Always calculate the MD5 and SHA-256 of extracted files before opening them. Submit the hash (not the file) to VirusTotal first. This documents the artifact, checks against known malware, and avoids tipping off the adversary that you found their tools.

Magic Bytes Identification Lab

In this exercise, you will match file signatures (magic bytes) to their file types. This is a fundamental forensics skill -- when you see hex data in a TCP stream, you need to instantly recognize what file type is being transferred.

Hex Dump Analysis Challenge

Examine the following hex dump extracted from a TCP stream and answer the questions below:

00000000  50 4B 03 04 14 00 00 00 08 00 A1 B2              PK............
00000010  C3 D4 E5 F6 A7 B8 C9 DA 00 00 12 40 63 6F 6E 66  ...........@conf
00000020  69 64 65 6E 74 69 61 6C 2D 63 6C 69 65 6E 74 73  idential-clients
00000030  2E 78 6C 73 78 00 00 00 00 00 00 00 00 00 00 00  .xlsx...........

What file type is being transferred?

Based on the filename visible in the hex dump, what is the security concern?

Knowledge Check

1. What is the magic byte signature for Windows executables (EXE/DLL)?

2. In Wireshark, how do you export HTTP transferred files?

3. What should you do FIRST after extracting a suspicious file?

4. Which tool performs automated file carving from raw data?

5. Why should extracted executables never run on production systems?

6. Why should you search VirusTotal by hash instead of uploading the file?

7. Which protocol is most difficult to extract files from without decryption keys?