Using AI in e-discovery: how to defend the review, not just run it
Technology-assisted review has been judicially blessed for more than a decade. Generative-AI tools are now joining the workflow. The hard part was never running the algorithm — it is building a record that survives a defensibility challenge.
The volume problem in modern disputes is old news: a mid-size commercial arbitration can carry hundreds of gigabytes of email, chat, and file shares, and the cost of reviewing all of it by hand is the single largest line item in most discovery budgets. Technology-assisted review (TAR), the family of supervised-learning methods also called predictive coding, was the legal profession's first encounter with machine learning as a production workflow, and courts approved it years ago. The newer entrants are generative-AI review tools that draft issue summaries, propose relevance calls, and answer natural-language questions across a corpus. The technology has changed; the governing question has not. It is not “may I use AI?” It is “can I defend the process I used?”
That distinction matters because e-discovery is governed by a process standard, not an outcome standard. No review — human or machine — finds every relevant document. What a producing party owes is a reasonable, proportional, and good-faith effort, and the ability to show its work if challenged. AI changes the tooling inside that obligation. It does not change the obligation.
The case law settled the threshold question
Judicial acceptance of TAR is neither new nor contested. In 2012, in Da Silva Moore v. Publicis Groupe, Magistrate Judge Andrew Peck issued the first opinion approving computer-assisted review, holding that it “is an acceptable way to search for relevant ESI in appropriate cases.”[1] Three years later, in Rio Tinto PLC v. Vale S.A., the same judge went further, writing that “the case law has developed to the point that it is now black letter law that where the producing party wants to utilize TAR for document review, courts will permit it.”[2] The empirical case had been made even earlier: a widely cited 2011 study by Maura Grossman and Gordon Cormack found that TAR could achieve recall and precision at least comparable to, and often better than, exhaustive manual review, at a fraction of the cost.[3]
There is an important limit on the other side. In Hyles v. City of New York, Judge Peck declined to force a responding party to use TAR over its preference for keyword search, invoking Sedona Principle 6: the responding party is generally best positioned to choose the tools for producing its own ESI.[4] The producing party owns the methodology — and therefore owns the burden of defending it.
Proportionality is the frame, not a footnote
In federal practice, the analysis starts with FRCP 26(b)(1): discovery must be relevant and proportional to the needs of the case, weighing the amount in controversy, the parties' resources, the importance of the issues, and whether the burden outweighs the likely benefit.[5] Proportionality is the strongest argument for AI review and, paradoxically, a constraint on it. A six-figure manual review in a dispute worth a few hundred thousand dollars is hard to justify; TAR is the proportionate answer. But the same logic cuts against gold-plating an AI validation protocol beyond what the stakes warrant.
Arbitration sharpens this. The whole premise of ADR is a faster, cheaper, more party-controlled process, and providers expect discovery to be tailored, not litigation-grade by default. That makes AI review a natural fit — and makes the parties' agreement, or the arbitrator's procedural order, the real source of the rules. JAMS, for its part, has published dedicated rules for disputes involving AI systems, built around default protective orders and disclosure of the systems at issue, signaling that ADR institutions now treat AI as a first-class procedural subject rather than an afterthought.[6] Practitioners should assume that how they used AI in review may itself become a disclosable, negotiable term.
Validation: the part that actually gets challenged
When TAR disputes reach the courts, they rarely turn on whether AI was permissible. They turn on validation — whether the producing party can show the result was adequate. The Sedona Conference's TAR Case Law Primer tracks exactly this shift, noting that the unsettled questions are now about methodology, metrics, and validation rather than basic acceptability.[7] Two metrics dominate. Recall measures completeness — the share of truly relevant documents the process actually found. Precision measures discipline — the share of documents the process flagged that were in fact relevant. They trade off against each other, and the defensible posture is a documented recall target, a statistically valid control sample to estimate it, and a record of where the process landed.
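The two metrics and the control-sample estimate can be sketched in a few lines of Python. This is a minimal illustration, not a validation protocol: it assumes a simple random control sample coded by human reviewers, the counts are invented for the example, and the Wilson score interval is one common choice for a binomial proportion rather than a method any court or the Sedona primer prescribes. The names `ControlSampleResult` and `wilson_interval` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class ControlSampleResult:
    """Counts from a human-coded control sample (illustrative numbers only)."""
    true_positives: int   # relevant per reviewers and flagged by the process
    false_positives: int  # flagged by the process but not relevant per reviewers
    false_negatives: int  # relevant per reviewers but missed by the process


def recall(r: ControlSampleResult) -> float:
    """Share of truly relevant documents the process found."""
    relevant = r.true_positives + r.false_negatives
    return r.true_positives / relevant if relevant else 0.0


def precision(r: ControlSampleResult) -> float:
    """Share of flagged documents that were in fact relevant."""
    flagged = r.true_positives + r.false_positives
    return r.true_positives / flagged if flagged else 0.0


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion; z=1.96 is roughly 95% confidence."""
    if trials == 0:
        return (0.0, 1.0)
    p = successes / trials
    denom = 1 + z**2 / trials
    centre = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


# Hypothetical control sample: 200 relevant documents, 160 of them found.
sample = ControlSampleResult(true_positives=160, false_positives=40, false_negatives=40)
relevant_in_sample = sample.true_positives + sample.false_negatives
low, high = wilson_interval(sample.true_positives, relevant_in_sample)
print(f"recall {recall(sample):.2f} (interval {low:.2f} to {high:.2f}), "
      f"precision {precision(sample):.2f}")
```

The interval matters as much as the point estimate. A recall figure drawn from a small sample can sit well below the documented target once its uncertainty is counted, which is why the sampling plan belongs in the record alongside the result.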
The generative-AI layer raises the validation bar rather than lowering it. A large language model can produce a fluent relevance rationale that is confidently wrong, and its calls can drift as prompts or model versions change. None of that is fatal, but it means the same empirical discipline applies: sample the model's output, measure it against human judgment, log the prompt and model configuration, and be ready to explain it. Treat a generative tool's output as a proposal to be validated, not a conclusion to be trusted.
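One way to make that discipline concrete is to log each validation run together with a fingerprint of the prompt and model settings, so that drift in either is visible later. The sketch below is a hedged illustration using only the Python standard library. The function names, the document IDs, the model name, and the log fields are all hypothetical, and it does not call any real review platform or model API.

```python
import hashlib
import json
from datetime import datetime, timezone


def config_fingerprint(prompt: str, model_name: str, model_version: str, params: dict) -> str:
    """SHA-256 over the prompt and settings, so any change yields a new fingerprint."""
    payload = json.dumps(
        {"prompt": prompt, "model": model_name, "version": model_version, "params": params},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def compare_to_human(model_calls: dict, human_calls: dict) -> dict:
    """Measure model relevance calls against human calls on documents both coded."""
    shared = sorted(set(model_calls) & set(human_calls))
    tp = sum(1 for d in shared if model_calls[d] and human_calls[d])
    fp = sum(1 for d in shared if model_calls[d] and not human_calls[d])
    fn = sum(1 for d in shared if not model_calls[d] and human_calls[d])
    return {
        "sampled": len(shared),
        "recall": tp / (tp + fn) if tp + fn else None,
        "precision": tp / (tp + fp) if tp + fp else None,
        "disagreements": [d for d in shared if model_calls[d] != human_calls[d]],
    }


def validation_log_entry(prompt, model_name, model_version, params,
                         model_calls, human_calls) -> str:
    """Serialize one validation run as a JSON line for an append-only log."""
    entry = {
        "logged_at": datetime.now(timezone.utc).isoformat(),
        "model": model_name,
        "model_version": model_version,
        "params": params,
        "prompt": prompt,
        "fingerprint": config_fingerprint(prompt, model_name, model_version, params),
        "results": compare_to_human(model_calls, human_calls),
    }
    return json.dumps(entry, sort_keys=True)


# Hypothetical sample: True means "relevant".
model_calls = {"DOC-001": True, "DOC-002": True, "DOC-003": False, "DOC-004": False}
human_calls = {"DOC-001": True, "DOC-002": False, "DOC-003": True, "DOC-004": False}
line = validation_log_entry("Is this document relevant to Issue A?", "example-model",
                            "v1", {"temperature": 0}, model_calls, human_calls)
print(line)
```

The list of disagreements is kept deliberately: those are the documents a reviewer should read first, and the ones most likely to come up if the protocol is challenged.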
The human-in-the-loop is a legal requirement, not a courtesy
Under FRCP 26(g), an attorney's signature on a discovery response certifies that, to the best of the signer's knowledge, information, and belief formed after a reasonable inquiry, the response is consistent with the rules, is not interposed for an improper purpose, and is neither unreasonable nor unduly burdensome; on a disclosure, the signature certifies that it is complete and correct as of the time it is made.[8] That certification cannot be delegated to a model. A lawyer signs; a lawyer must therefore understand and stand behind the process. Practically, the human-in-the-loop does the work the rule assumes: coding the seed set, reviewing the model's edge cases, running the validation sample, and deciding when the metrics are good enough to certify. In arbitration the same point carries an ethical overlay: the duty of competence reaches the technology a lawyer deploys, and candor with the tribunal and opposing party about the use of AI review tools protects the process from a later integrity challenge.
A second human-judgment point is bias. A skewed seed set can teach the model to systematically miss a category of documents, producing an under-inclusive result that looks clean on its face. That is precisely the kind of defect a validation sample is designed to catch — and precisely why a reviewer, not the algorithm, has to own the final call.
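A validation sample catches this kind of defect only if someone looks below the headline number. As a rough sketch, assuming each sampled document carries a category label such as document type or custodian, recall can be broken out per category and compared with the overall target. The function names, the categories, and the 0.10 tolerance are assumptions made for illustration, not a recognized standard.

```python
from collections import defaultdict


def recall_by_category(sample):
    """sample: iterable of (category, human_relevant, process_flagged) tuples."""
    found = defaultdict(int)
    relevant = defaultdict(int)
    for category, human_relevant, process_flagged in sample:
        if human_relevant:
            relevant[category] += 1
            if process_flagged:
                found[category] += 1
    # Only categories with at least one relevant document get a recall figure.
    return {c: found[c] / relevant[c] for c in relevant}


def flag_weak_categories(per_category, overall_target, tolerance=0.10):
    """Categories whose recall falls more than `tolerance` below the target."""
    return sorted(c for c, r in per_category.items() if r < overall_target - tolerance)


# Hypothetical validation sample: the process finds email but mostly misses chat.
sample = (
    [("email", True, True)] * 4
    + [("chat", True, True)]
    + [("chat", True, False)] * 3
    + [("chat", False, False)] * 2
)
per_category = recall_by_category(sample)
print(per_category)
print(flag_weak_categories(per_category, overall_target=0.75))
```

Per-category counts in a real sample are often small, so a flagged category is a prompt for a reviewer to investigate, not a finding in itself.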
What to do before the dispute, not during it
Defensibility is built into the workflow or it is not built at all. Four moves pay off regardless of which tool you run:
- Document the protocol contemporaneously. Record the tool and version, the seed-set methodology, the recall target, the validation sampling plan, and the metrics achieved. Reconstructing this after a challenge is far weaker than a record made as you went.
- Negotiate the methodology up front. In both arbitration and litigation, transparency about the approach — the cooperative posture the early TAR opinions rewarded — converts a potential motion into an agreed protocol.
- Keep the human gate explicit. Identify who reviews edge cases, who runs validation, and who signs under 26(g). Make the certification a deliberate step, not a formality.
- Validate generative output like any other measurement. Sample it, measure recall and precision against human calls, and log the prompts and configuration.
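The four moves above amount to a record kept as the review runs. One possible shape for that record, sketched in Python under the assumption that it is written once per review phase and stored with the matter file, is shown here. The class name, field names, and example values are hypothetical, and the check at the end only compares numbers: the decision to certify stays with the attorney.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ReviewProtocolRecord:
    """Contemporaneous record of a review protocol (field names are illustrative)."""
    tool_name: str
    tool_version: str
    seed_set_methodology: str
    recall_target: float
    validation_sampling_plan: str
    edge_case_reviewer: str
    validation_owner: str
    certifying_attorney: str
    achieved_recall: Optional[float] = None
    achieved_precision: Optional[float] = None
    recorded_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def metrics_meet_target(self) -> bool:
        """True only when both metrics are recorded and recall meets the target."""
        return (
            self.achieved_recall is not None
            and self.achieved_precision is not None
            and self.achieved_recall >= self.recall_target
        )

    def to_json(self) -> str:
        """Serialize the record for storage alongside the matter file."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Hypothetical entries for illustration only.
record = ReviewProtocolRecord(
    tool_name="ExampleReview",
    tool_version="1.0",
    seed_set_methodology="Random sample coded by two senior reviewers",
    recall_target=0.75,
    validation_sampling_plan="Simple random control sample, coded blind",
    edge_case_reviewer="Senior associate",
    validation_owner="Review manager",
    certifying_attorney="Partner of record",
)
print(record.metrics_meet_target())  # False until metrics are recorded
record.achieved_recall, record.achieved_precision = 0.80, 0.62
print(record.to_json())
```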
The honest summary is that AI in e-discovery is no longer a frontier question — it is a competence expectation. The tools will keep improving, and generative review will keep absorbing tasks that used to be manual. What endures is the standard underneath: a reasonable, proportional, documented process, validated by numbers, certified by a human who understands it. Defend the process, and the technology takes care of itself.
Sources
- [1] Da Silva Moore v. Publicis Groupe, 287 F.R.D. 182 (S.D.N.Y. 2012) (Peck, M.J.). First judicial opinion approving computer-assisted review.
- [2] Rio Tinto PLC v. Vale S.A., 306 F.R.D. 125, 127 (S.D.N.Y. 2015) (Peck, M.J.). TAR is “black letter law” where the producing party elects it.
- [3] Maura R. Grossman & Gordon V. Cormack, “Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review,” 17 Rich. J.L. & Tech. 11 (2011).
- [4] Hyles v. City of New York, No. 10 Civ. 3119 (S.D.N.Y. Aug. 1, 2016) (Peck, M.J.). A responding party cannot be forced to use TAR; Sedona Principle 6 controls.
- [5] Federal Rule of Civil Procedure 26(b)(1), proportionality factors. Legal Information Institute, Cornell Law School.
- [6] JAMS, “Artificial Intelligence Disputes Clause and Rules” (2024). Default protective order and disclosure of AI systems at issue.
- [7] The Sedona Conference, “TAR Case Law Primer, Second Edition,” 24 Sedona Conf. J. (2023). Methodology, metrics, and validation as the live issues.
- [8] Federal Rule of Civil Procedure 26(g), attorney certification after reasonable inquiry. Legal Information Institute, Cornell Law School.