Investigating white-collar fraud when the evidence can be fake

Occupational fraud is not a rounding error. The Association of Certified Fraud Examiners estimates that the typical organization loses roughly five percent of revenue to fraud each year, with a median loss per case of about $145,000 across the more than 1,900 cases in its most recent global study.^[1] Those numbers describe schemes that were eventually detected. The cases that matter most to an investigator are the ones still hiding behind a plausible explanation, and the presenting issue is almost never the whole problem. It is the symptom that happens to be visible from where the company stands when it begins to look.

Closing the gap between what is presenting and what is actually occurring is the work of curiosity, not just process. Procedural rigor, technical capability, and broad data access are all necessary; none of them, on its own, forces an investigator to keep asking why a fact pattern exists after a plausible answer has already been offered. Generative AI raises the stakes on both halves of that sentence. It gives fraudsters cheaper ways to fabricate the documents, invoices, and voices an investigation relies on, and it gives the investigator a fluent assistant that will confirm almost any theory it is asked to confirm.

The first theory is the most dangerous one

Investigations run under cost pressure, time pressure, and a structural temptation to close. The team that scopes tightly to the presenting issue can deliver a report on schedule; the team that follows an open question into adjacent systems and unanticipated counterparties cannot promise when the work will end. The economics favor the first team. The outcomes do not.

Confirmation bias here is not a beginner's mistake — it is the failing of a working theory that becomes too useful too early. An investigator forms a hypothesis in the first days because it is necessary to organize the work: which custodians to image, which transactions to sample, which interviews to prioritize. The hypothesis is a tool. The discipline lies in remembering that it is a tool, not a finding. A better framing than “does the evidence support the theory that this executive directed the misstatement?” is “what would I expect to see if someone else directed it — and are those indicators present, absent, or simply not yet examined?” The first question can be answered with the data already collected. The second usually cannot, and that gap is where investigations either deepen or quietly close.

A large language model turns that asymmetry into a trap. Ask a model whether the evidence supports a theory and it returns a fluent, well-organized answer that supports it. Pose the inverse and it returns an equally fluent answer for the inverse. The model is not validating anything; it is performing the task it was given, in the register requested. Treating that output as confirmation rather than as a hypothesis to be tested amounts to checking the working theory against itself, only with better grammar.
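One way to keep the alternative theory visible, rather than asking a model or a colleague to confirm the first one, is to write down what each competing theory predicts and mark every indicator as present, absent, or not yet examined. The sketch below is illustrative only: the theories, indicator names, and helper functions are hypothetical and not drawn from any real matter or tool.

```python
from collections import Counter

PRESENT, ABSENT, UNEXAMINED = "present", "absent", "not yet examined"

# Hypothetical matrix: each competing theory maps expected indicators to a status.
matrix = {
    "executive directed the misstatement": {
        "approval emails from executive": PRESENT,
        "executive edits to the close workbook": PRESENT,
    },
    "someone else directed it": {
        "journal entries posted under another user ID": UNEXAMINED,
        "instructions sent over a messaging app": UNEXAMINED,
        "controller overrides during close": ABSENT,
    },
}


def status_counts(indicators):
    """Count how many indicators fall under each status for one theory."""
    return Counter(indicators.values())


def untested_theories(m):
    """Return theories with at least one indicator nobody has examined yet."""
    return [theory for theory, ind in m.items() if UNEXAMINED in ind.values()]


for theory, indicators in matrix.items():
    print(theory, dict(status_counts(indicators)))
print("Still open:", untested_theories(matrix))
```

A theory with unexamined indicators has not been ruled out; it simply has not been looked at, which is a different statement in a report.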

The dataset you are offered is not the dataset that exists

The same skepticism is owed to the data. Custodian lists are negotiated, sometimes by people whose visibility into the conduct is partial by design. Sources are omitted because they are administratively inconvenient or technically unfamiliar — encrypted messaging apps, collaboration platforms, voice and video systems with retention windows measured in days, personal devices used for work. Most collection workflows were built around email, so everything else is under-collected by default.

AI adds a layer that did not exist a few years ago: the prompts, retrieval traces, intermediate outputs, and agent action logs generated by the systems employees now use to draft, summarize, code, and decide. When the conduct involves those systems, the relevant evidence is not only the resulting document but the interaction history behind it. The curious investigator asks where that history is stored, who has rights to it, and what its retention period is — and asks early enough that the answers still exist. A defensible methodology matters precisely because it keeps that question from depending on the instincts of the most experienced person in the room.
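The "ask early enough" point can be made concrete with a small preservation queue. This is a minimal sketch, assuming the investigator has a list of candidate sources and, for some of them, a known retention period; the source names, owners, retention figures, and the `preservation_queue` helper are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class EvidenceSource:
    name: str                      # illustrative label, not a real system
    owner: str                     # who can authorize or perform the export
    retention_days: Optional[int]  # None means retention is not yet known


def preservation_queue(sources, conduct_date, today):
    """Order sources by how soon records from conduct_date would expire.

    Returns (name, days_left) pairs. days_left is None when retention is
    unknown, and negative when the records may already be gone.
    """
    rows = []
    for s in sources:
        if s.retention_days is None:
            rows.append((s.name, None))
        else:
            expires = conduct_date + timedelta(days=s.retention_days)
            rows.append((s.name, (expires - today).days))
    # Unknown retention sorts first, then the fewest days remaining.
    return sorted(rows, key=lambda r: (r[1] is not None, r[1] or 0))


# Hypothetical inventory for illustration only.
sources = [
    EvidenceSource("email archive", "IT", 365),
    EvidenceSource("video meeting recordings", "IT", 30),
    EvidenceSource("AI assistant prompt logs", "vendor admin", None),
    EvidenceSource("chat platform", "IT", 90),
]

for name, days_left in preservation_queue(sources, date(2025, 1, 1), date(2025, 1, 21)):
    print(name, "retention unknown" if days_left is None else f"{days_left} days left")
```

Sorting unknown retention to the top is a design choice: a source nobody can answer for is the one most likely to lapse unnoticed.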

FIG. 1 — Claim → corroborate → authenticate, with an AI-artifact checkpoint. A synthetic finding is not an endpoint; it widens the collection and reopens the theory.

Authenticity is now a separate inquiry

The risk is no longer hypothetical. The FBI’s Internet Crime Complaint Center has warned that criminals are using generative AI to facilitate financial fraud — fabricating text, images, audio, and video to make schemes more believable while reducing the effort required to deceive.^[2] Voice cloning now takes only seconds of source audio. In late 2024 the Financial Crimes Enforcement Network issued a dedicated alert on fraud schemes that use deepfake media to defeat the identity-verification, authentication, and due-diligence controls banks rely on, describing falsified documents, photographs, and videos created with generative tools.^[3] Synthetic documents, AI-generated invoices, and voice-cloned audio are within the operational reach of mid-sophistication actors, and they turn up in matters that begin as ordinary commercial disputes.

That changes the investigator’s posture toward every artifact. Accepting that a document is authentic because it looks authentic is no longer a defensible default. Authenticity has become a distinct inquiry — corroborate the claim against independent records, then check the artifact itself for provenance, metadata, and synthetic markers — and it has to be contemplated from the outset rather than discovered on cross-examination. Where it exists, signed provenance such as C2PA Content Credentials does more work than any after-the-fact recollection; see /provenance.
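The corroborate-then-authenticate sequence can be sketched as a small triage function. This is a hedged illustration, not a forensic tool: the `Assessment` type, the `assess_artifact` helper, and the ledger comparison are hypothetical, and the provenance result is assumed to come from separate tooling (for example a Content Credentials check) and is passed in as a plain flag.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Assessment:
    sha256: str                    # fixes which exact bytes were examined
    corroborated: bool             # claim matches an independent record
    authenticated: Optional[bool]  # None means not yet examined
    next_step: str


def assess_artifact(data: bytes, claimed_amount: float,
                    ledger_amounts: Sequence[float],
                    provenance_ok: Optional[bool]) -> Assessment:
    """Triage one artifact: hash it, corroborate the claim, then weigh provenance."""
    digest = hashlib.sha256(data).hexdigest()
    # Corroboration here is a simple amount match against independent records.
    corroborated = any(abs(claimed_amount - a) < 0.005 for a in ledger_amounts)
    if provenance_ok is False:
        step = "treat as possibly synthetic: widen collection, reopen theory"
    elif not corroborated:
        step = "seek independent records before relying on the artifact"
    elif provenance_ok is None:
        step = "corroborated but unauthenticated: examine provenance and metadata"
    else:
        step = "corroborated and authenticated: record the basis in the file"
    return Assessment(digest, corroborated, provenance_ok, step)


# Hypothetical invoice bytes and ledger entries for illustration only.
result = assess_artifact(b"invoice 1042", 1200.00, [1200.00, 50.00], None)
print(result.sha256[:12], result.next_step)
```

Note that a failed provenance check takes priority over corroboration: a matching ledger entry does not rescue an artifact that appears to be fabricated.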

Curiosity at the human layer

Interviews are where curiosity produces the highest return, and where synthetic risk meets human behavior. An interview run as a checklist confirms what the interviewer already suspects: the questions come from the working theory, and the theory exits slightly more confirmed than it entered. A curious interview proceeds differently. The interviewer notices when a witness answers a different question than the one posed, which topics the witness steers toward unprompted, and which names appear in the answer that were not in the question. None of those is decisive alone. Each is a thread, and the willingness to pull on threads is what separates an interview that confirms from one that opens.

Silence is the most useful instrument in the room. A prepared answer is short; an unprepared one is longer, and a witness invited to fill a pause will often fill it with something that was not in the rehearsed response. A name mentioned in passing in the third interview, untethered to anything the investigator was looking for, sometimes turns out to organize the whole matter when it reappears in the seventh.

Curiosity is not license

None of this means following every interesting question off the books. A forensic investigator works within a scope set by counsel, a privilege framework that protects the work product, proportionality limits on discovery, and an authorized budget. The professional discipline is recognizing when an unanswered question calls for a conversation with counsel about scope, rather than a quiet exploration that produces a record nobody can defend. The hobbyist follows curiosity wherever it leads; the professional follows it to the edge of authorization and stops there, with a memo. A defensible record of what was asked, pursued, deferred, and why is what lets curiosity operate without becoming undisciplined.
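A record of what was asked, pursued, deferred, and why can be as simple as a structured log with one check run against it. The sketch below is a minimal illustration under assumed conventions; the `Question` fields, the `Disposition` values, and the sample entries are hypothetical, and a real matter would follow whatever format counsel directs.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Disposition(Enum):
    PURSUED = "pursued"
    DEFERRED = "deferred"
    ESCALATED = "escalated to counsel"


@dataclass(frozen=True)
class Question:
    raised_on: date
    text: str
    in_scope: bool
    disposition: Disposition
    reason: str


def needs_scope_conversation(log):
    """Return out-of-scope questions that were pursued without escalation."""
    return [q for q in log
            if not q.in_scope and q.disposition is Disposition.PURSUED]


def missing_reasons(log):
    """Return entries whose 'why' was left blank."""
    return [q for q in log if not q.reason.strip()]


# Hypothetical entries for illustration only.
log = [
    Question(date(2025, 3, 3), "Who approved the vendor onboarding?", True,
             Disposition.PURSUED, "within engagement letter"),
    Question(date(2025, 3, 5), "Does the affiliate use the same vendor?", False,
             Disposition.ESCALATED, "outside scope; memo sent to counsel"),
    Question(date(2025, 3, 9), "Review the CFO's personal device?", False,
             Disposition.PURSUED, ""),
]

for q in needs_scope_conversation(log):
    print("Raise with counsel:", q.text)
```

In this example the third entry is flagged twice, once for being pursued outside scope and once for having no stated reason, which is the kind of record the paragraph above warns nobody can defend.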

Regulators are watching that shape. The Department of Justice’s 2024 update to its Evaluation of Corporate Compliance Programs asks prosecutors whether a company is vulnerable to schemes enabled by new technology — including false approvals and documentation generated by AI — and whether it has controls to identify and mitigate those risks.^[4] A prosecutor, agent, or successor regulator who later evaluates an internal investigation does so with full compulsory process and an internal benchmark: what a thorough independent inquiry into the same conduct would have produced. An investigation that never asks whether AI tools served as a vector — for the conduct, for its concealment, or for misidentifying who did what — is increasingly exposed.^[4] A report that looks scoped to avoid finding the larger problem does not earn credit for its conclusions, and the company ends up defending the conduct twice: once on the merits, once on the credibility of its own inquiry.

Curiosity, in this discipline, is not temperament. It is a methodological commitment: that the working theory is the thing most in need of testing, that the dataset offered is not the dataset that exists, that the value of an interview lies in the answers rather than the questions, and that an artifact’s authenticity is something to prove rather than assume. The tools have changed; AI expands both the volume of relevant material and the number of ways it can be misread. The discipline has not. The investigations that close cleanly tend to be the ones that asked the additional question. The difference is rarely visible at the time. It is almost always visible afterward.