Outlook PST Forensic Parsing: Extracting Evidence from Email Archives

A single PST file can contain years of communication, deleted messages users thought were gone, and metadata that contradicts the story they’re telling in deposition.

Microsoft Outlook’s Personal Storage Table format has been around since the late 1990s and remains one of the most evidence-rich file types in civil litigation. Employment disputes, business email compromise investigations, harassment cases, intellectual property theft — if email is involved and the user ran Outlook, there’s likely a PST or OST file somewhere that contains more than the current inbox shows.

This guide covers the difference between PST and OST, how these files are structured at a binary level, the tools that parse them reliably, and how to approach deleted item recovery with appropriate methodological rigor.

PST vs. OST: Knowing Which File You’re Looking At

These terms get conflated constantly. They’re different files with different forensic properties.

PST (Personal Storage Table) is a local archive file. Users (or IT administrators) create PSTs to archive email off the live server. PSTs can exist anywhere on the filesystem — the Documents folder, an external drive, a network share. They work offline and don’t require a live server connection. PST files accumulate over years. A user who’s been with a company for a decade might have multiple PST archives going back to their first days of employment.

OST (Offline Storage Table) is the local synchronization cache for a live Exchange or Microsoft 365 mailbox. Outlook creates it automatically when you add an Exchange account. The OST mirrors what’s on the server, allowing offline access. When connectivity is restored, changes sync.

The forensic implications are significant:

A PST is independent. What’s in a PST exists there because someone put it there (or archived it). Its contents don’t change based on server-side deletions.
An OST reflects the server. Items deleted from the server disappear from the OST once the next sync completes. But items deleted locally, before sync, may persist in the OST’s deleted items structure even after the user empties the trash.

When you’re examining a user’s machine, look for both. PSTs are often hiding in unexpected locations — users drag and drop them to external drives, email them to personal accounts, copy them to home directories. A thorough search of the filesystem for `.pst` and `.ost` is always worth running.
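
A minimal sketch of that filesystem sweep, assuming the evidence is a mounted, read-only forensic image. The function name `find_mail_stores` is illustrative. Matching on extension alone will miss renamed files, so treat this as a first pass rather than a complete search.

```python
import os
from pathlib import Path

MAIL_STORE_EXTENSIONS = {".pst", ".ost"}


def find_mail_stores(root):
    """Yield (path, size_in_bytes) for every .pst/.ost file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Compare case-insensitively: ARCHIVE.PST is as valid as archive.pst
            if Path(name).suffix.lower() in MAIL_STORE_EXTENSIONS:
                full = Path(dirpath) / name
                try:
                    size = full.stat().st_size
                except OSError:
                    continue  # unreadable entry; a real run should log this
                yield full, size
```

Recording the size alongside the path helps with triage: a multi-gigabyte archive on an external drive deserves attention before a freshly created, near-empty OST.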

The Internal Structure of a PST File

Understanding PST structure matters for two reasons: it helps you understand what recovery is possible, and it makes you a more credible expert when opposing counsel challenges your methodology.

PST files use a layered architecture:

The NDB Layer (Node Database)

At the lowest level, a PST is a B-tree node database. Everything in the file — every message, folder, attachment, contact — is stored as a node. The NDB layer manages how these nodes are stored and referenced using a system of page structures and block structures.

The B-tree structure means that PST files can grow to very large sizes while maintaining searchable, indexed access to any item. It also means that when items are “deleted,” the node entries are marked as free but the underlying data blocks may not be immediately overwritten. This is the foundation of deleted item recovery.

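Modern PST files use the Unicode format, introduced with Outlook 2003; older files use the ANSI format. ANSI PSTs are limited to 2GB. Unicode PSTs can hold up to 50GB by default in current Outlook versions, and the limit is configurable. Both formats begin with the same four-byte signature, `!BDN`, so the signature alone only tells you the file is a PST or OST. What distinguishes the formats is the two-byte version field (`wVer`) at offset 10 in the header: 14 or 15 for ANSI, 23 or higher for Unicode.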
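
A short sketch of that header check, assuming the layout in Microsoft's published PST specification: a four-byte `!BDN` signature at offset 0 and a little-endian 16-bit `wVer` field at offset 10. The function names are illustrative, and version values outside the documented ranges are reported as unknown rather than guessed.

```python
import struct

PST_MAGIC = b"!BDN"


def identify_pst_format(header: bytes) -> str:
    """Classify the first 12 bytes of a file as 'ansi', 'unicode', 'unknown', or 'not-pst'."""
    if len(header) < 12 or header[:4] != PST_MAGIC:
        return "not-pst"
    # wVer is a little-endian unsigned 16-bit integer at offset 10
    (w_ver,) = struct.unpack_from("<H", header, 10)
    if w_ver in (14, 15):
        return "ansi"
    if w_ver >= 23:
        return "unicode"
    return "unknown"


def identify_pst_file(path) -> str:
    """Read only the start of the file, so this is safe on very large archives."""
    with open(path, "rb") as fh:
        return identify_pst_format(fh.read(12))
```

Checking the signature also catches PSTs that have been renamed to hide them, which an extension-based search would miss.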

The LTP Layer (Lists, Tables, Properties)

Above the NDB layer sits the LTP layer, which organizes nodes into meaningful data structures. This is where property tables live — the key-value stores that hold message properties like sender address, subject, send time, and delivery time.

Property tags are four-byte identifiers: a two-byte property ID combined with a two-byte property type (string, timestamp, integer, and so on). A forensic examiner looking at raw PST data will encounter them constantly. Some important property IDs:

`0x0037`: Subject
`0x0042`: Sent representing name (the display name of the sender)
`0x0E06`: Delivery time
`0x0039`: Client submit time (when the user hit send)
`0x0044`: Received representing name
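
To make the tag layout concrete, here is a small sketch that splits a 32-bit MAPI property tag into its property ID (high 16 bits) and property type (low 16 bits). The lookup tables are deliberately partial and the function names are illustrative.

```python
# Partial lookup tables; a real examination would use the full MAPI property list.
PROPERTY_NAMES = {
    0x0037: "Subject",
    0x0039: "Client submit time",
    0x0042: "Sent representing name",
    0x0E06: "Delivery time",
}
PROPERTY_TYPES = {
    0x0003: "PT_LONG",
    0x001E: "PT_STRING8",
    0x001F: "PT_UNICODE",
    0x0040: "PT_SYSTIME",
    0x0102: "PT_BINARY",
}


def split_property_tag(tag: int):
    """Return (property_id, property_type) from a 32-bit property tag."""
    return (tag >> 16) & 0xFFFF, tag & 0xFFFF


def describe_property_tag(tag: int) -> str:
    """Render a tag as 'name [type]', flagging anything not in the tables."""
    prop_id, prop_type = split_property_tag(tag)
    name = PROPERTY_NAMES.get(prop_id, f"unknown ID 0x{prop_id:04X}")
    type_name = PROPERTY_TYPES.get(prop_type, f"unknown type 0x{prop_type:04X}")
    return f"{name} [{type_name}]"
```

For example, the tag `0x0037001F` is the subject stored as a UTF-16 string, while `0x0037001E` is the same property stored as an 8-bit string, which is what you would expect in an older ANSI file.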

The distinction between `0x0E06` (delivery time) and `0x0039` (submit time) matters. Submit time is when the client sent the message. Delivery time is when the server delivered it. In cases where message timing is disputed, both timestamps are relevant and potentially significant if they differ substantially.
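
Both timestamps are stored as Windows FILETIME values: a 64-bit count of 100-nanosecond intervals since 1601-01-01 UTC. A minimal sketch of converting the raw values and measuring the gap follows; the function names are illustrative, and what counts as a "substantial" gap is a judgment call for the examiner, not something the code decides.

```python
from datetime import datetime, timedelta, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(filetime: int) -> datetime:
    """Convert a raw FILETIME integer to a timezone-aware UTC datetime."""
    # FILETIME ticks are 100 ns; integer-divide by 10 to get microseconds
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)


def submit_to_delivery_gap(submit_ft: int, delivery_ft: int) -> timedelta:
    """Return delivery time minus submit time (negative if delivery came first)."""
    return filetime_to_datetime(delivery_ft) - filetime_to_datetime(submit_ft)
```

A negative gap, where delivery appears to precede submission, is worth noting explicitly in the report, since clock skew between client and server is one ordinary explanation that needs to be ruled in or out.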

The Messaging Layer

The top layer organizes the LTP structures into familiar email concepts: folders, messages, attachments, recipients. This is what tools like Outlook, forensic parsers, and viewers present to the user.

Attachments are stored as sub-objects of their parent message, with their own property tables. Embedded attachments (attachments within attachments) create nested sub-object trees that some tools handle poorly. When you’re looking for a specific file that was allegedly attached to a message, verify your tool handles embedded attachments fully.

Parsing Tools

Kernel PST Viewer

Kernel PST Viewer is a commercial tool that handles both PST and OST formats, including password-protected and corrupted files. Its interface presents the familiar folder/message tree structure. For investigators who aren’t primarily technical, the Outlook-like interface reduces errors in navigation.

Kernel PST Viewer exports to multiple formats — EML, MSG, PDF, MBOX — which is useful when you need to deliver specific messages as exhibits rather than requiring opposing counsel to use a specialized viewer.

SysTools PST Viewer and Forensic Suite

SysTools produces a range of email forensic tools, including a PST Viewer and a broader Forensic Suite. The Forensic Suite includes hash verification, case management, and the ability to document your examination process within the tool. For investigators who need audit logging of their examination steps, SysTools provides this natively.

One practical advantage: SysTools handles large PST files (20GB+) without the performance degradation that plagues some tools. When you’re examining a decade of archived email, performance matters.

pffexport (libpff)

libpff is an open-source library for accessing PST and OST files, and `pffexport` is its command-line export utility. It’s the forensic tool of choice for examiners who want to verify their commercial tool results, work in Linux environments, or need scriptable processing of many PST files.

`pffexport` extracts every message, folder, and attachment from a PST into a directory structure. Each message becomes its own directory, with its headers, body, and attachments written out as separate files. This makes it easy to hash individual items, search across the extracted content, and build timelines with standard command-line tools.

The output isn’t pretty — you get a directory tree rather than an Outlook-style interface — but the completeness and verifiability are strengths in expert witness contexts.
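
For scripted processing, a thin wrapper along these lines keeps the exact command in your notes. This is a sketch under assumptions: the `-m` (export mode), `-t` (target base name), and `-l` (log file) options are as described in the pffexport man page, so confirm them against `pffexport -h` for the version you have installed. The wrapper function names are illustrative.

```python
import subprocess


def build_pffexport_command(pst_path, target_base, mode="all", log_path=None):
    """Build the argument list for pffexport without running it.

    mode is passed to -m; "all" is assumed to export allocated items plus
    recovered and orphaned items, per the man page.
    """
    cmd = ["pffexport", "-m", mode, "-t", str(target_base)]
    if log_path is not None:
        cmd += ["-l", str(log_path)]
    cmd.append(str(pst_path))
    return cmd


def run_pffexport(pst_path, target_base, mode="all", log_path=None):
    """Run pffexport and raise CalledProcessError on a non-zero exit code."""
    cmd = build_pffexport_command(pst_path, target_base, mode, log_path)
    return subprocess.run(cmd, check=True, capture_output=True, text=True)
```

Building the command as a list, separately from running it, means the exact invocation can be written to case notes verbatim and avoids shell quoting problems with paths that contain spaces.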

Magnet AXIOM and FTK

Both major commercial forensic platforms handle PST/OST parsing as part of their email artifact analysis. If you’re running a full examination of a device, these tools integrate email parsing with the rest of your artifacts automatically, which makes timeline correlation much easier.

The limitation is transparency: understanding exactly how these tools handle edge cases (malformed PSTs, partial files, overlapping property tags) requires going deeper into the documentation than most examiners do. Know your tool before you testify about what it found.

Deleted Item Recovery

This is where PST forensics gets interesting.

When a user deletes a message in Outlook, the message moves to the Deleted Items folder. Most users know this. What they don’t know is what happens at the two levels below that.

Level 1: Soft delete. Message moves to Deleted Items. The message is still there, fully intact, in the PST. Recovery is trivial with any viewer.

Level 2: Hard delete / Emptying trash. User empties Deleted Items, or presses Shift+Delete to bypass the trash entirely. For Exchange and Microsoft 365 mailboxes, the message moves to a “hidden” folder called `Recoverable Items` (or `Dumpster` in older nomenclature). This folder isn’t visible in the standard Outlook folder list. It holds deleted messages for a configurable retention period. A standalone PST file has no such folder; there, the deletion is handled at the NDB layer, where the node is marked as unallocated but the data blocks remain until overwritten.

Level 3: Purge from recoverable items. The user explicitly purges the message from Recoverable Items, or the retention period expires. At this point, the node is fully deallocated. The data blocks are available for reuse, but haven’t been overwritten yet.

For forensic recovery of Level 2 and Level 3 deleted items:

Parse the PST for unallocated nodes. Tools like libpff and some commercial parsers can scan the NDB layer for nodes marked as free and attempt to reconstruct message structures from the data blocks. This is carving — the same concept as file carving from unallocated space, applied to the PST’s internal structure.

Look for the Recoverable Items folder. In PSTs from Exchange/365-connected accounts, this folder (shown by some tools at a path such as `\IPM_SUBTREE\Recoverable Items`; the exact location depends on how the file was produced) contains messages the user explicitly deleted. Navigate to it directly. Many viewing tools don’t show it by default, but it’s there.

Examine block file slack. When a PST reallocates a block to a new message, the old content may remain in the slack space within that block. This is the PST equivalent of file system slack space — partially overwritten data that can sometimes be recovered.

Document your recovery methodology. “I found this deleted message” is not sufficient. Explain which folder or structure it came from, how messages end up there, and what state the message was in (soft-deleted vs. hard-deleted vs. recovered from unallocated nodes).

Message Integrity and Authentication

A recurring challenge in email litigation: “How do we know this PST wasn’t modified?”

The honest answer is that PST files can be modified. Outlook modifies them constantly during normal use. The question is whether specific messages within the PST show signs of modification inconsistent with normal Outlook operation.

Internal consistency checks. Well-formed messages have consistent sender/delivery timestamps, proper MIME structure, and property tags that align with the message type. A message that claims to be a calendar invite but has no calendar-specific properties is anomalous.

Header-to-body consistency. Extract the internet headers stored with specific messages (the `PR_TRANSPORT_MESSAGE_HEADERS` property) and compare them to the other PST property values. The `Received:` headers should tell the same routing story as the delivery metadata in the PST properties.

Cross-reference with server logs. If you have access to the mail server (or can obtain logs via legal process), server-side message IDs, delivery times, and routing records can be compared against the PST content. Discrepancies warrant investigation.

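Tool-generated hash comparison. Some investigators hash the PST file at intake and again after examination. More useful is hashing specific messages and comparing them against any server-side exports of the same messages. Keep in mind that a PST stores messages as MAPI properties rather than as the original MIME stream, so an EML export is a reconstruction, and its hash will match another export only if both were produced the same way. Where the export paths differ, compare stable elements instead: the Message-ID, the internet headers, and hashes of each attachment.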
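
The header comparison described above can be sketched with Python's standard `email` package. This assumes you have already extracted the message's internet headers as text and its delivery time as a timezone-aware datetime. The function names and the five-minute default tolerance are illustrative choices, not a standard.

```python
from datetime import datetime, timedelta, timezone
from email import message_from_string
from email.utils import parsedate_to_datetime


def final_hop_time(transport_headers: str):
    """Return the timestamp of the top-most Received: header, or None."""
    received = message_from_string(transport_headers).get_all("Received") or []
    if not received:
        return None
    # Each hop prepends its header, so the first one is the last hop.
    # The timestamp follows the final semicolon in the header value.
    _, _, date_part = received[0].rpartition(";")
    try:
        hop = parsedate_to_datetime(date_part.strip())
    except (TypeError, ValueError):
        return None
    if hop.tzinfo is None:
        hop = hop.replace(tzinfo=timezone.utc)  # "-0000" parses as naive; treat as UTC
    return hop


def delivery_matches_headers(transport_headers, delivery_time,
                             tolerance=timedelta(minutes=5)):
    """True/False if comparable, None if the headers carry no usable timestamp."""
    hop = final_hop_time(transport_headers)
    if hop is None:
        return None
    return abs(hop - delivery_time) <= tolerance
```

A `False` result is a lead, not a finding: clock drift on a relay or a delayed mailbox delivery can produce a mismatch without any tampering.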

See the companion guide on [email header analysis](/email-header-analysis-authentication/) for the technical details of reading internet headers inside extracted PST messages.

Practical Workflow for a PST Examination

Hash the PST at intake. SHA-256 the file before opening it with any tool. Document this hash in your case notes.

Open with a read-only tool or on a forensic copy. Some PST parsers write to the file when they open it (particularly commercial viewers that try to repair corruption). Use libpff/pffexport for initial cataloging — it’s read-only.

Document the folder structure. Map every folder, including hidden and special-purpose folders. Note the message count in each. This establishes a baseline before you start examining individual messages.

Identify relevant date range. Before diving into individual messages, bound your search. If the relevant period is January to March, filter your tool to that range first. This prevents scope creep and focuses your analysis on what’s legally relevant.

Export and hash relevant messages. For messages you’re going to present as exhibits, export them individually (EML or MSG format) and hash each export. The hash proves the exhibit matches the source at the time you pulled it.

Examine deleted items. Run your recovery process on Deleted Items, Recoverable Items, and unallocated nodes. Document what you found and where, even if the deleted items aren’t ultimately relevant to your case.

Document your tool versions and methodology. This is the piece that gets skipped and then questioned in court. Write it down before you start.
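
Three of the steps above (hashing at intake, mapping the folder structure, and hashing exported exhibits) lend themselves to small helpers. The sketch below uses only the standard library. `inventory_folders` is written against any object exposing `name`, `number_of_sub_messages`, and `sub_folders`, which is how libpff's Python bindings (pypff) are assumed to present folders; verify those attribute names against your installed version.

```python
import csv
import hashlib
from pathlib import Path


def sha256_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so multi-gigabyte PSTs never load into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def inventory_folders(folder, parent_path=""):
    """Yield (folder_path, message_count) for a folder and all descendants.

    Assumes pypff-style attributes: name, number_of_sub_messages, sub_folders.
    """
    name = folder.name or "(unnamed)"  # the root folder may have no name
    path = f"{parent_path}/{name}"
    yield path, folder.number_of_sub_messages
    for child in folder.sub_folders:
        yield from inventory_folders(child, path)


def write_hash_manifest(export_dir, manifest_path):
    """Write a CSV of relative path, size, and SHA-256 for each exported file."""
    export_dir = Path(export_dir)
    rows = []
    for path in sorted(p for p in export_dir.rglob("*") if p.is_file()):
        rows.append((path.relative_to(export_dir).as_posix(),
                     path.stat().st_size,
                     sha256_file(path)))
    with open(manifest_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["relative_path", "size_bytes", "sha256"])
        writer.writerows(rows)
    return rows
```

Write the manifest outside the export directory so that it does not end up hashing itself on a later run, and keep it with your case notes alongside the intake hash.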

For cases where email evidence connects to broader employment investigations, the [wrongful termination Gmail cache case study](/wrongful-termination-gmail-cache/) illustrates how local email artifacts — whether PST files or cache data — build a communication timeline.

Frequently Asked Questions

Can a PST file be authenticated if there’s no chain of custody from the original server?

Partial authentication is possible through internal consistency analysis — checking that timestamps, message IDs, and MIME structures are self-consistent and align with external records where available. Full authentication without server-side corroboration is difficult and should be presented as such. Where authentication is contested, the best approach is to identify specific messages that can be verified against external evidence (the recipient’s copy, server logs obtained via subpoena, internet provider records) and use those as anchors for the broader collection.

How do you handle password-protected PST files?

PST password protection in older Outlook versions (pre-2010) was notoriously weak. The file stored only a simple hash of the password (a CRC-32 checksum), which tools like pst-password or various commercial tools can bypass quickly. Newer PST encryption (when used) is stronger, but true encryption of PST content is rare, and most “password-protected” PSTs use only the weak hash-based protection. Document your method for bypassing protection, confirm you have legal authority to access the file, and note the original protection state in your report.

What’s the difference between a message’s “sent” time and “delivery” time in a PST?

The submit time (`PR_CLIENT_SUBMIT_TIME`) reflects when the Outlook client submitted the message to the mail server — when the user hit Send. The delivery time (`PR_MESSAGE_DELIVERY_TIME`) reflects when the recipient’s mail server marked the message as delivered. For messages in a sender’s PST, the submit time is the primary timestamp. For messages in a recipient’s PST, the delivery time matters more. Discrepancies between the two can reflect normal network delays (usually seconds to minutes) or, in cases of suspected manipulation, may warrant deeper investigation against server logs.

How large can PST files get, and does size affect analysis?

Unicode PST files can theoretically reach 50GB, though performance degrades significantly above 20GB in most tools. Very large PSTs are common in matters involving senior executives with years of retained email. For large PSTs, command-line tools (pffexport) are more reliable than GUI tools, and breaking the analysis into date ranges helps manage scope. Some commercial forensic platforms handle large PSTs better than others — test your tool against a large file before you’re under deadline pressure in an active case.

Do PST files contain any metadata beyond what’s visible in message headers?

Yes. In addition to standard email headers, PST messages carry a rich set of MAPI (Messaging Application Programming Interface) properties that Outlook uses internally. These include modification timestamps for the PST record itself (`PR_LAST_MODIFICATION_TIME`, distinct from the email send/receive dates), properties that track Outlook-specific state like read/unread status and reply state, and in some cases extended metadata from transport providers. This MAPI layer is where you find evidence of message modification: if a message’s MAPI record modification timestamp postdates the message’s claimed send time by years, that’s worth investigating.