Trust, but Verify: Building Legal AI You Can Rely On

Over the past couple of months I’ve been building an internal application to help with my estate planning practice. The short version: it sits where my drafting software and client-intake system meet. In theory, after I complete a detailed estate plan design meeting with a client, the system should automatically produce a first draft of the entire document set (a revocable living trust, a pour-over will, powers of attorney, a health-care directive, and the rest) for me to review. This is the first in a series of posts about what building it has actually involved. I want to start not with the AI that writes documents, but with the far less glamorous work that comes first: teaching the system what a good trust looks like.

A model can only draft like a lawyer if it has seen how lawyers draft.

Polished sample documents are scarce and proprietary. But there is a large public reservoir hiding in plain sight: court files. When a trust is litigated, the instrument itself gets attached as an exhibit. I was able to obtain more than 100 California court filings, about a gigabyte of complaints, petitions, and accountings, with real trusts buried inside.

Buried is the right word. These were scanned, mislabeled, occasionally corrupt. Several hundred genuine estate-planning documents sat in there, mixed with thousands of pages of pleadings. I needed them pulled out, sorted, and cleaned into something a model could learn from. Reading all of it by hand was not realistic.

The Obvious Move, and the Trap

So I pointed AI at the pile. That is the obvious move in 2026. It is also where most legal-AI stories quietly go wrong. The model does the work, produces a confident-looking result, and you have no real idea how much of it is right. In a field where a misfiled document or a dropped page is a genuine problem, “looks done” is not the same as done. The answer is older than AI: trust, but verify.

What that meant in practice…

I did not ask the AI to “process the folder” and take its word for the result. First I wrote a spec: a plain-language operating manual that defined exactly what counted as a document, what the output had to look like, and the rules that could never be broken. Then I set a fleet of AI workers loose, one per case file, all reading the same spec. And critically, every result was checked by a separate, automated verifier. Not another AI, but ordinary code that mechanically confirmed that each file was complete and correctly structured, and that nothing had gone missing.
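To make "ordinary code" concrete, here is a minimal sketch of what one such verifier could look like. It is not the verifier I actually ran. The record layout (`Extracted`), the label set in `ALLOWED_TYPES`, and the four rules are all assumptions chosen for illustration: each extracted document must carry a known type, a page range that fits inside its source file, no overlap with its neighbor, and non-empty text.

```python
from dataclasses import dataclass

# Hypothetical label set; the real spec's categories are not reproduced here.
ALLOWED_TYPES = {"trust", "will", "power_of_attorney", "health_care_directive", "other"}


@dataclass
class Extracted:
    """One document a worker claims to have pulled out of a case file (assumed shape)."""
    doc_type: str
    first_page: int  # 1-based, inclusive
    last_page: int   # 1-based, inclusive
    text: str


def verify_case(source_page_count: int, docs: list[Extracted]) -> list[str]:
    """Return a list of rule violations; an empty list means the case passes."""
    problems = []
    previous_last = 0
    # Check documents in page order so overlaps show up between neighbors.
    for i, doc in enumerate(sorted(docs, key=lambda d: d.first_page)):
        label = f"doc {i} ({doc.doc_type})"
        if doc.doc_type not in ALLOWED_TYPES:
            problems.append(f"{label}: unknown type")
        if not (1 <= doc.first_page <= doc.last_page <= source_page_count):
            problems.append(f"{label}: page range outside source")
        if doc.first_page <= previous_last:
            problems.append(f"{label}: overlaps previous document")
        if not doc.text.strip():
            problems.append(f"{label}: empty text")
        previous_last = max(previous_last, doc.last_page)
    return problems
```

Nothing in this function involves judgment. It either returns an empty list or it names the rule that was broken, which is what lets it run unattended across every case file.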

The model did the reading and the judgment calls. The code enforced the rules. That division is the whole game.

It mattered more than I expected, because the first pass lied to me. Gently, and in predictable ways. A handful of files were flagged “no trust here.” Every one of them actually contained a trust. They were scanned images with no readable text, so the quick automated triage simply missed them. One person showed up as two separate cases because the filings used two different surnames.
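The scanned-image miss suggests one cheap guard: a triage step that keeps "I could not read this" separate from "I read this and found no trust." The sketch below is an illustration under assumptions, not the triage I actually used. The threshold `MIN_CHARS_PER_PAGE`, the keyword pattern, and the three status names are all hypothetical, and the input is assumed to be whatever text was already extracted from each page.

```python
import re

# Hypothetical threshold: below this average per page, treat the file as unreadable.
MIN_CHARS_PER_PAGE = 50
# Hypothetical keyword screen; a real one would need more care.
TRUST_PATTERN = re.compile(r"\b(trust|trustee|settlor|trustor)\b", re.IGNORECASE)


def triage(page_texts: list[str]) -> str:
    """Classify a file as 'unreadable', 'possible_trust', or 'no_trust_found'."""
    total_chars = sum(len(t.strip()) for t in page_texts)
    # Little or no extractable text usually means a scanned image: send it to OCR
    # instead of concluding anything about its contents.
    if not page_texts or total_chars < MIN_CHARS_PER_PAGE * len(page_texts):
        return "unreadable"
    if any(TRUST_PATTERN.search(t) for t in page_texts):
        return "possible_trust"
    return "no_trust_found"
```

The design choice is the third status. A two-way answer forces an unreadable scan into "no trust here"; a three-way answer lets it fall into a pile that gets a second look.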

None of this was the model being stupid. It was the model being confidently wrong in exactly the situations a careful human would slow down on. That is the point. You cannot tell which results to trust just by looking at them.

The verifier could. And once, it caught something that matters even more. One of the AI workers, trying to be helpful, deleted an original source file. A tripwire in the system flagged the deletion immediately, before it became permanent. The fix was a new line in the spec: never delete a source. The next worker followed it.
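A tripwire of this kind can be very small. The following is a minimal sketch, assuming the source files live in one directory and that fingerprinting them before and after each worker runs is acceptable; the function names `snapshot` and `tripwire` are mine for illustration, not the names in my system.

```python
import hashlib
from pathlib import Path


def snapshot(source_dir: Path) -> dict[str, str]:
    """Map each file's relative path to the SHA-256 of its contents."""
    return {
        str(p.relative_to(source_dir)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(source_dir.rglob("*"))
        if p.is_file()
    }


def tripwire(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """List every source file that was deleted or modified between snapshots."""
    alerts = [f"DELETED: {name}" for name in before if name not in after]
    alerts += [
        f"MODIFIED: {name}"
        for name in before
        if name in after and before[name] != after[name]
    ]
    return alerts
```

Taking a snapshot before a worker starts and another when it finishes turns "never delete a source" from an instruction into something checkable. Note that this sketch only detects the change. Making a deletion recoverable takes something more, such as keeping the originals in a read-only location or a backup.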

The Corpus Is Not Really the Point

I ended with 280+ clean documents pulled from 117 court files, for a few dollars in processing cost. That corpus can be part of the raw material the rest of the application learns from. But the number is not the lesson here. The lesson is how I got to trust those cleaned, organized documents without personally reading all of them.

It generalizes well beyond estate planning. The hard part of legal AI is almost never the model. It is the system you build around the model: the explicit rules, the independent checks, the tripwires that catch the rare bad call before it ships. The model is the probabilistic core. Almost everything trustworthy about the result comes from the deterministic shell around it.

Trust, but verify. The old phrase turns out to be a software architecture.