Extract Files from TCP Stream

Forensic file extraction from network captures

Week 4 TOPIC

TCP Stream File Extraction

During incident response, analysts often need to extract files that were transferred over the network. Malware downloads, exfiltrated data, and lateral movement artifacts can all be recovered from packet captures. Per NIST SP 800-86, network forensics provides critical evidence that may not exist on the endpoint (especially if the attacker wiped the disk).

Common Extraction Scenarios

Malware Downloads

Extract malicious executables downloaded via HTTP/HTTPS for analysis in a sandbox. Look for PE files (MZ header) in HTTP responses with suspicious Content-Disposition headers or non-standard content types.

Data Exfiltration

Recover files sent to external servers to assess breach scope. Look for large HTTP POST bodies, base64-encoded data, or files uploaded via WebDAV/FTP. Recovering these files establishes exactly what data left the organization.

Email Attachments

Extract attachments from SMTP traffic for phishing analysis. MIME-encoded attachments can be decoded and analyzed for malicious content, macros, or embedded exploits. Critical for initial access vector determination.
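The MIME decoding step can be sketched with Python's standard email package. This is a minimal illustration, not a full SMTP parser: it assumes the message (headers plus body, as sent after the SMTP DATA command) has already been saved from the followed stream with the SMTP commands and dot-stuffing removed. The function name extract_attachments is hypothetical.

```python
from email import message_from_bytes, policy
from typing import List, Tuple


def extract_attachments(raw_message: bytes) -> List[Tuple[str, bytes]]:
    """Return (filename, decoded bytes) for each attachment in a MIME message."""
    msg = message_from_bytes(raw_message, policy=policy.default)
    attachments = []
    for part in msg.iter_attachments():
        # Fall back to a placeholder name when the part carries no filename.
        filename = part.get_filename() or "unnamed.bin"
        # decode=True reverses the Content-Transfer-Encoding (base64, quoted-printable).
        payload = part.get_payload(decode=True) or b""
        attachments.append((filename, payload))
    return attachments
```

The decoded bytes should be hashed and written to an evidence directory, never opened directly on the analysis workstation.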

Lateral Movement

Recover files transferred via SMB during internal propagation. PsExec, WMI, and PowerShell remoting all leave artifacts in network traffic. SMB file transfers directly between workstations are unusual in most environments and should be investigated.

Chain of Custody

Extracted files are forensic evidence. Always maintain chain of custody: document who captured the pcap, when, from what interface, using what tool and version. Hash the pcap immediately (SHA-256) before any analysis. Store original pcap on write-protected media.
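Hashing the pcap before analysis can be sketched as follows. This is a minimal example using only the standard library; the function name sha256_file and the chunk size are illustrative choices. Reading in chunks keeps memory use flat for multi-gigabyte captures.

```python
import hashlib


def sha256_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Record the returned digest in the case notes alongside the capture metadata before opening the file in any analysis tool.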


Extraction Process

Follow this systematic workflow when extracting files from packet captures. Each step builds on the previous one to maintain forensic integrity.

Identify the TCP Stream

Filter traffic to find the file transfer. Look for HTTP 200 responses with large content-length, specific MIME types (application/octet-stream, application/x-executable), or destination IPs flagged by threat intelligence. Use display filters like http.response.code == 200 && http.content_length > 100000. (http.content_length is the numeric field; http.content_length_header holds the same value as a string and is not suited to numeric comparison.)

Follow TCP Stream

Right-click the packet, select Follow, then TCP Stream. This reassembles the entire conversation, showing request and response in order. Check both directions -- the request may reveal what triggered the download (e.g., a malicious URL parameter or POST payload).

Identify File Boundaries

Look for magic bytes (file signatures) at the start of the response payload. The HTTP Content-Length header indicates expected file size. For non-HTTP protocols, use the magic bytes table below to find where the file data begins and ends within the raw stream.
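For HTTP, finding the file boundary amounts to splitting the raw response at the blank line and trimming the body to Content-Length. The sketch below assumes a single, non-chunked, uncompressed response saved from one direction of the stream; the function name split_http_response is hypothetical.

```python
from typing import Dict, Tuple


def split_http_response(raw: bytes) -> Tuple[Dict[str, str], bytes]:
    """Split a raw HTTP response into (lower-cased headers, body)."""
    head, separator, body = raw.partition(b"\r\n\r\n")
    if not separator:
        raise ValueError("no blank line found; headers may be incomplete")
    headers = {}
    # Skip the status line, then parse "Name: value" pairs.
    for line in head.decode("iso-8859-1").split("\r\n")[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    # Trim trailing data (for example a pipelined response) to the declared size.
    length = headers.get("content-length")
    if length is not None and length.isdigit():
        body = body[: int(length)]
    return headers, body
```

If the body is shorter than the declared Content-Length, the capture is probably missing packets and the extracted file should be treated as incomplete.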

Export/Save Raw Data

For HTTP: File, Export Objects, HTTP -- this auto-extracts all transferred files. For raw streams: change the stream display to "Raw" format, then "Save As" to disk. For SMB: File, Export Objects, SMB. For TFTP (common in firmware exfil): File, Export Objects, TFTP.

Verify and Analyze

Calculate the hash (SHA-256) of the extracted file. Query the hash against VirusTotal, MalwareBazaar, and internal threat intel. If the file is an executable, analyze in an isolated sandbox (e.g., Joe Sandbox, ANY.RUN, Cuckoo). Never execute on production systems.

Common File Signatures (Magic Bytes)

Magic bytes are the first few bytes of a file that identify its format. Forensic tools and analysts use these to determine file type regardless of extension:

File Type	Hex Bytes	ASCII	Forensic Significance
PDF	25 50 44 46	%PDF	Document exfil, malicious PDFs with JS exploits
ZIP/DOCX/XLSX	50 4B 03 04	PK..	Archives, Office docs (OOXML), password-protected exfil
EXE/DLL (PE)	4D 5A	MZ	Windows executables -- malware, backdoors, RATs
ELF (Linux)	7F 45 4C 46	.ELF	Linux malware, coinminers, reverse shells
PNG	89 50 4E 47	.PNG	Steganography -- data hidden in image pixels
JPEG	FF D8 FF	---	Steganography, screenshot exfil, surveillance
GZIP	1F 8B 08	---	Compressed exfil data, tar.gz archives
RAR	52 61 72 21	Rar!	Password-protected archives (common in ransomware exfil)
7-Zip	37 7A BC AF	7z..	High-compression archives used for staging data
SQLite	53 51 4C 69	SQLi	Browser databases, credential stores
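The table above can be turned into a simple lookup. This is a minimal sketch that checks only the leading bytes listed in the table; the names SIGNATURES and identify_file_type are illustrative, and real tools such as file/libmagic inspect far more than a prefix.

```python
from typing import Optional

# (signature bytes, label) pairs taken from the table above.
SIGNATURES = [
    (bytes.fromhex("25504446"), "PDF"),
    (bytes.fromhex("504B0304"), "ZIP/DOCX/XLSX"),
    (bytes.fromhex("4D5A"), "EXE/DLL (PE)"),
    (bytes.fromhex("7F454C46"), "ELF (Linux)"),
    (bytes.fromhex("89504E47"), "PNG"),
    (bytes.fromhex("FFD8FF"), "JPEG"),
    (bytes.fromhex("1F8B08"), "GZIP"),
    (bytes.fromhex("52617221"), "RAR"),
    (bytes.fromhex("377ABCAF"), "7-Zip"),
    (bytes.fromhex("53514C69"), "SQLite"),
]


def identify_file_type(data: bytes) -> Optional[str]:
    """Return the table label whose signature prefixes data, or None."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return None
```

A two-byte signature such as MZ will match some unrelated data by chance, so a hit is a lead to confirm, not a verdict.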

Hex Dump Example: Identifying a PE File in a TCP Stream

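00000000  48 54 54 50 2F 31 2E 31 20 32 30 30 20 4F 4B 0D  HTTP/1.1 200 OK.
00000010  0A 43 6F 6E 74 65 6E 74 2D 54 79 70 65 3A 20 61  .Content-Type: a
00000020  70 70 6C 69 63 61 74 69 6F 6E 2F 6F 63 74 65 74  pplication/octet
00000030  2D 73 74 72 65 61 6D 0D 0A 0D 0A 4D 5A 90 00 03  -stream....MZ...
00000040  00 00 00 04 00 00 00 FF FF 00 00 B8 00 00 00 00  ................
00000050  00 00 00 40 00 00 00 00 00 00 00 00 00 00 00 00  ...@............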

The MZ bytes (4D 5A) at offset 0x3B mark the beginning of a Windows PE executable embedded in the HTTP response. Everything before it is the HTTP header. A forensic analyst would carve the file starting at the MZ signature.
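Carving at the MZ signature can be sketched as below. SAMPLE_STREAM rebuilds the bytes of the hex dump above; the function name carve_pe_from_response is hypothetical, and the sketch assumes the whole response body is the executable. Searching only after the header terminator avoids matching the letters "MZ" if they happen to appear inside a header value.

```python
from typing import Tuple

PE_MAGIC = b"MZ"

# Bytes of the example response: two header lines, a blank line, then the PE data.
SAMPLE_STREAM = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: application/octet-stream\r\n"
    b"\r\n"
    + bytes.fromhex("4D5A9000 03000000 04000000 FFFF0000 B8000000 00")
)


def carve_pe_from_response(stream: bytes) -> Tuple[int, bytes]:
    """Return (offset of MZ, bytes from MZ to end of stream)."""
    header_end = stream.find(b"\r\n\r\n")
    if header_end == -1:
        raise ValueError("HTTP header terminator not found")
    offset = stream.find(PE_MAGIC, header_end + 4)
    if offset == -1:
        raise ValueError("MZ signature not found in response body")
    return offset, stream[offset:]


offset, carved = carve_pe_from_response(SAMPLE_STREAM)
print(hex(offset))  # 0x3b, matching the offset noted in the text
```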

Evidence Documentation Checklist

Record the pcap filename, SHA-256 hash, capture start/end time, and interface
Document the stream index, source/dest IPs and ports, protocol
Record the extraction method (Wireshark Export Objects vs manual carving)
Hash the extracted file immediately (MD5 + SHA-256)
Record VirusTotal detection ratio and any AV family names
Note the analyst name, date/time of extraction, and case number
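The checklist can be captured as a structured record at extraction time. The sketch below is one possible layout, not a standard schema: every field name, and the function build_evidence_record itself, is an assumption chosen for illustration.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_evidence_record(data: bytes, pcap_name: str, pcap_sha256: str,
                          stream_index: int, method: str,
                          analyst: str, case_number: str) -> dict:
    """Build a JSON-serializable record for one extracted file."""
    return {
        "pcap_name": pcap_name,
        "pcap_sha256": pcap_sha256,
        "stream_index": stream_index,
        "extraction_method": method,
        "file_size": len(data),
        "file_md5": hashlib.md5(data).hexdigest(),
        "file_sha256": hashlib.sha256(data).hexdigest(),
        "analyst": analyst,
        "case_number": case_number,
        # UTC timestamp so records from different analysts sort consistently.
        "extracted_at": datetime.now(timezone.utc).isoformat(),
    }
```

Serializing the record with json.dumps(record, indent=2) and storing it next to the extracted file keeps the documentation attached to the artifact.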

SOC Analyst Tip

VirusTotal Upload Warning: Hash the file and search VirusTotal by hash first. Uploading the actual file makes it available to VirusTotal subscribers with download access, which can include the attacker. If the file is unique to your incident, uploading it tells the adversary you found their malware. Use private sandboxes for sensitive files.

Extraction Tools

Wireshark Export Objects

File, Export Objects, HTTP/SMB/TFTP. Automatic extraction of transferred files. Shows filename, hostname, content-type, and size. Best for quick extraction of HTTP-transferred files. Handles chunked transfer encoding automatically.
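When carving manually from a raw stream, chunked transfer encoding has to be undone by hand. The sketch below shows the idea under simplifying assumptions: the input is the complete response body only, and trailer headers after the final chunk are ignored. The function name decode_chunked is hypothetical.

```python
def decode_chunked(body: bytes) -> bytes:
    """Reassemble a chunked HTTP body into the original payload."""
    output = bytearray()
    pos = 0
    while True:
        line_end = body.find(b"\r\n", pos)
        if line_end == -1:
            raise ValueError("missing chunk-size line")
        # The size is hexadecimal; anything after ';' is a chunk extension.
        size_field = body[pos:line_end].split(b";", 1)[0].strip()
        size = int(size_field.decode("ascii"), 16)
        pos = line_end + 2
        if size == 0:
            break  # zero-length chunk marks the end of the body
        chunk = body[pos:pos + size]
        if len(chunk) < size:
            raise ValueError("truncated chunk; capture may be incomplete")
        output += chunk
        pos += size + 2  # skip the chunk data and its trailing CRLF
    return bytes(output)
```

Hashing the chunked bytes instead of the decoded payload produces a digest that will not match the file as it existed on disk.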

NetworkMiner

Automated file extraction. Reconstructs files from pcap. Shows thumbnails for images. Also extracts credentials, DNS queries, and session data. Open-source version available (limited features).

Foremost / Scalpel

File carving tools that extract files based on headers and footers. Work on raw data -- do not require protocol understanding. Useful when the stream is corrupted or non-standard protocol is used. Originally developed for disk forensics, equally useful on raw TCP streams.
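The header/footer idea can be illustrated in a few lines. This is a simplified sketch of the general technique, not the algorithm Foremost or Scalpel actually implement; the function name and the max_size cap are assumptions. The JPEG markers are used as the example pair.

```python
from typing import List

JPEG_HEADER = bytes.fromhex("FFD8FF")
JPEG_FOOTER = bytes.fromhex("FFD9")


def carve_by_header_footer(data: bytes, header: bytes, footer: bytes,
                           max_size: int = 10 * 1024 * 1024) -> List[bytes]:
    """Return every header..footer span in data no larger than max_size."""
    carved = []
    pos = 0
    while True:
        start = data.find(header, pos)
        if start == -1:
            break
        end = data.find(footer, start + len(header))
        if end == -1:
            break  # no footer after this header, so nothing more to carve
        stop = end + len(footer)
        if stop - start <= max_size:
            carved.append(data[start:stop])
            pos = stop
        else:
            pos = start + 1  # oversized candidate; resume after this header
    return carved
```

Because a footer sequence can also occur inside file data, carved output may be truncated and should be validated by opening it in a format-aware tool inside the analysis environment.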

tshark (Command-line)

Scriptable extraction for automation. Process large pcap collections in batch. Integrates with SOAR playbooks for automated malware extraction and sandboxing. Syntax: tshark -r capture.pcap --export-objects http,./extracted/

Wireshark HTTP Export Workflow

# In Wireshark GUI:
1. Open pcap file
2. File -> Export Objects -> HTTP
3. Browse the list of transferred files
4. Check Content-Type for executables: application/octet-stream, application/x-msdownload
5. Select suspicious file(s) -> Click "Save" or "Save All"
6. Hash immediately: sha256sum extracted_file.exe

# tshark automated extraction (batch processing):
tshark -r capture.pcap --export-objects http,./extracted_http/
tshark -r capture.pcap --export-objects smb,./extracted_smb/

# Foremost file carving from raw stream:
tcpflow -r capture.pcap -o ./streams/
foremost -i ./streams/* -o ./carved_files/

# Hash all extracted files for VirusTotal bulk lookup:
sha256sum ./extracted_http/* > hashes.txt
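Batch processing with tshark can be wrapped in a short script. The sketch below assumes tshark is installed and on the PATH, and that captures use the .pcap extension; the function names and the per-capture output layout are hypothetical. It uses only the -r and --export-objects options shown above.

```python
import subprocess
from pathlib import Path
from typing import List, Sequence


def export_command(pcap: Path, protocol: str, out_root: Path) -> List[str]:
    """Build the tshark argument list for one capture and one protocol."""
    destination = out_root / pcap.stem / protocol
    return ["tshark", "-r", str(pcap),
            "--export-objects", f"{protocol},{destination}"]


def export_all(pcap_dir: Path, out_root: Path,
               protocols: Sequence[str] = ("http", "smb")) -> None:
    """Run object export for every .pcap file in pcap_dir."""
    for pcap in sorted(pcap_dir.glob("*.pcap")):
        for protocol in protocols:
            (out_root / pcap.stem / protocol).mkdir(parents=True, exist_ok=True)
            # Discard the packet summary tshark prints while reading the file.
            subprocess.run(export_command(pcap, protocol, out_root),
                           check=True, stdout=subprocess.DEVNULL)
```

Keeping one output directory per capture and protocol makes it possible to trace each extracted file back to its source pcap.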

Protocol-Specific Extraction Notes

Protocol	Extraction Method	Complications
HTTP	Export Objects, Follow TCP Stream	Chunked encoding, gzip compression -- Wireshark handles both
HTTPS/TLS	Requires TLS keys (SSLKEYLOGFILE) or SSL inspection proxy	Without keys, only metadata is visible -- cannot extract files
SMB	Export Objects -> SMB	SMB3 encryption, multi-fragment transfers
FTP	Follow TCP Stream on data channel (port 20 or passive port)	Passive mode uses random ports -- filter by IP pair
SMTP	Follow TCP Stream, decode base64 MIME attachments	Multi-part MIME, quoted-printable encoding
DNS	Reassemble TXT record payloads, decode base32/base64	Data split across many queries -- manual reassembly needed
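Manual reassembly of DNS-tunneled data depends entirely on the encoding the attacker chose. The sketch below assumes a purely hypothetical scheme in which each query name has the form <sequence>.<base32 chunk>.<suffix>, with padding stripped from the chunks; the function name and the example domain are illustrative only.

```python
import base64
from typing import Iterable


def reassemble_base32(query_names: Iterable[str], suffix: str) -> bytes:
    """Rebuild a payload from names shaped like '<seq>.<chunk>.<suffix>'."""
    chunks = {}
    suffix = suffix.lower().strip(".")
    for name in query_names:
        name = name.rstrip(".").lower()
        if not name.endswith("." + suffix):
            continue  # unrelated query
        labels = name[: -len(suffix) - 1].split(".")
        if len(labels) != 2 or not labels[0].isdigit():
            continue  # does not match the assumed layout
        chunks[int(labels[0])] = labels[1]
    # Order by sequence number, then restore the padding base32 requires.
    encoded = "".join(chunks[i] for i in sorted(chunks)).upper()
    encoded += "=" * (-len(encoded) % 8)
    return base64.b32decode(encoded)
```

Queries often arrive out of order or are retransmitted, which is why the chunks are keyed by sequence number before being joined.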

SOC Analyst Tip

Hash Before Analysis: Always calculate MD5/SHA256 of extracted files before opening them. Submit the hash (not the file) to VirusTotal first. This documents the artifact, checks against known malware, and avoids tipping off the adversary that you found their tools.

Magic Bytes Identification Lab

In this exercise, you will match file signatures (magic bytes) to their file types. This is a fundamental forensics skill -- when you see hex data in a TCP stream, you need to instantly recognize what file type is being transferred.


Hex Dump Analysis Challenge

Examine the following hex dump extracted from a TCP stream and answer the questions below:

00000000  50 4B 03 04                                      PK............
00000010  63 6F 6E 66                                      ...........@conf
00000020  69 64 65 6E 74 69 61 6C 2D 63 6C 69 65 6E 74 73  idential-clients
00000030  2E 78 6C 73 78                                   .xlsx...........

What file type is being transferred?

Windows executable (PE/EXE)
ZIP archive (containing an XLSX spreadsheet)
PDF document
RAR archive

Based on the filename visible in the hex dump, what is the security concern?

It is a system configuration file
It contains encrypted malware
It appears to contain confidential client data (potential data exfiltration)
It is a harmless image file