Intuit compressed months of tax code implementation into hours — and built a workflow any regulated-industry team can adapt

When the One Big Beautiful Bill arrived as a 900-page unstructured document, with no standardized schema, no published IRS forms and a tight shipping deadline, Intuit's TurboTax team had a question: could AI compress a months-long implementation into days without sacrificing accuracy?

What they created is less a one-off story than a template: a workflow combining commercial AI tools, a proprietary domain-specific language and a custom unit testing framework that any domain-constrained development team can learn from.

Joey Shaw, Director of Tax at Intuit, has spent more than 30 years at the company and worked on both the Tax Cuts and Jobs Act and the OBBB. "There was a lot of noise in the legislation and we were able to remove the tax implications, limit it to individual tax provisions, limit it to our clients," Shaw told VentureBeat. "That kind of distillation using the tools was really fast, and then enabled us to start coding even before we had the forms and instructions."

How OBBB raised the stakes

When the Tax Cuts and Jobs Act passed in 2017, the TurboTax team worked through the law without AI assistance. It took months, and the accuracy requirements left no room for shortcuts.

"We had to study the law and we would code sections that referenced other law code sections and try to figure it out on our own," Shaw said.

OBBB came with similar accuracy requirements but a different profile. At more than 900 pages, it was structurally more complex than the TCJA. It arrived as an unstructured document without any standardized schema. The House and Senate versions used different language to describe the same provisions. And the team had to begin implementation before the IRS published official forms or instructions.

The question was whether AI tools could compress the timeline without compromising the output. The answer required a specific sequence and tooling that did not yet exist.

From unstructured documents to domain-specific code

The OBBB was still moving through Congress when the TurboTax team began working on it. Using large language models, the team summarized the House version, then the Senate version, and then resolved the differences. Both chambers referenced the same underlying tax code sections, a consistent anchor point that let the models make comparisons across structurally inconsistent documents.

By signing day, the team had already filtered down to the provisions affecting TurboTax customers, limited to specific tax situations and customer profiles. Parsing, reconciliation and provision filtering moved from weeks to hours.
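The article describes no code, but the reconciliation step lends itself to a sketch. Assuming (hypothetically) that an LLM has already produced per-chamber summaries keyed by the tax code sections they amend, aligning the two versions reduces to a join on those section anchors. The section numbers and summary strings below are invented for illustration, not real provisions.

```python
# Hypothetical sketch: House and Senate summaries (imagined as already
# produced by an LLM) are keyed by the IRC section they amend, so alignment
# is a dictionary join. All section IDs and summaries are illustrative.

def reconcile(house: dict[str, str], senate: dict[str, str]) -> dict[str, dict]:
    """Align chamber summaries by the tax code section they reference."""
    result = {}
    for section in sorted(house.keys() | senate.keys()):
        h, s = house.get(section), senate.get(section)
        result[section] = {
            "house": h,
            "senate": s,
            # Sections present in both chambers but described differently
            # are the ones a human (or a second LLM pass) must resolve.
            "needs_review": h is not None and s is not None and h != s,
        }
    return result

house = {"sec_199A": "pass-through deduction extended",
         "sec_24": "child tax credit raised to $2,200"}
senate = {"sec_199A": "pass-through deduction extended",
          "sec_24": "child tax credit raised to $2,500",
          "sec_164": "SALT cap modified"}

merged = reconcile(house, senate)
```

The shared section numbering is what makes the comparison tractable even when the two chambers' prose differs; only entries flagged `needs_review` need human attention.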

Those functions were handled by ChatGPT and other general-purpose LLMs. But those tools hit their limits as the work shifted from analysis to implementation. TurboTax does not run on a standard programming language. Its tax calculation engine is built on a proprietary domain-specific language created internally at Intuit. Any model generating code for that codebase would have to translate legal text into syntax it was never trained on, and recognize how new provisions interact with decades of existing code without breaking what was already working.

Claude became the primary tool for that translation and dependency-mapping work. Shaw said it can identify what changed and what didn't, allowing developers to focus only on the new provisions.

"It is able to integrate with things that do not change and identify dependencies on those that do change," he said. "This sped up the development process and enabled us to focus only on the things that made a difference."
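The dependency-mapping idea Shaw describes can be sketched as a graph traversal. The rule names and dependency structure below are invented (Intuit's DSL is proprietary and undisclosed); the point is that once the changed provisions are known, a breadth-first walk over the rule graph finds everything downstream that needs review and leaves untouched rules alone.

```python
# Illustrative sketch of dependency mapping over a rule graph. Rule names
# and the graph itself are hypothetical stand-ins for DSL calculations.
from collections import deque

def affected_rules(deps: dict[str, set[str]], changed: set[str]) -> set[str]:
    """Return every rule that transitively depends on a changed provision."""
    # Invert the graph: input -> rules that read it.
    readers: dict[str, set[str]] = {}
    for rule, inputs in deps.items():
        for inp in inputs:
            readers.setdefault(inp, set()).add(rule)
    seen, queue = set(changed), deque(changed)
    while queue:  # BFS from each changed node through its readers
        node = queue.popleft()
        for r in readers.get(node, ()):
            if r not in seen:
                seen.add(r)
                queue.append(r)
    return seen - changed  # rules to review, excluding the inputs themselves

deps = {
    "agi": {"wages", "sec_199A_deduction"},
    "taxable_income": {"agi", "standard_deduction"},
    "total_tax": {"taxable_income", "child_tax_credit"},
}
```

With this toy graph, changing `sec_199A_deduction` flags `agi`, `taxable_income` and `total_tax` for review, while a change to `child_tax_credit` touches only `total_tax`.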

Building tooling to meet near-zero error tolerance

The general-purpose LLMs got the team to working code. Making that code shippable required two proprietary tools built during the OBBB cycle.

The first auto-generated TurboTax product screens directly from the law changes. Previously, developers curated those screens individually for each provision. The new tool handled most of it automatically, with manual optimization only where necessary.

The second was a purpose-built unit testing framework. Intuit had always run automated tests, but the previous system only produced pass/fail results. When a test failed, developers had to manually open the underlying tax return data file to find the cause.

"Automation will tell you pass/fail; you have to dig into the actual tax data file to see what could be wrong," Shaw said. The new framework identifies the specific code segment responsible, generates an explanation and allows fixes to be made within the framework.
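A minimal sketch of the diagnostic idea behind that framework, with everything invented for illustration (the real tool and its DSL are proprietary): instead of a single pass/fail verdict, each calculation rule is evaluated against the expected values for a scenario, and every mismatch is reported with the responsible rule and an explanation.

```python
# Hypothetical diagnostic test runner: rules map field names to callables
# that compute that field from earlier values, in declaration order.

def run_diagnostic_tests(rules, inputs, expected):
    """Evaluate each rule; attach rule name and explanation to mismatches."""
    values = dict(inputs)
    failures = []
    for field, rule in rules.items():
        values[field] = rule(values)
        if field in expected and values[field] != expected[field]:
            failures.append({
                "field": field,
                "rule": rule.__name__,
                "got": values[field],
                "want": expected[field],
                "explanation": (f"{rule.__name__} produced {values[field]} "
                                f"but the scenario expects {expected[field]}"),
            })
    return failures

# Toy tax return with a deliberately wrong credit rule, so the report
# pinpoints it instead of reporting a bare failure.
def child_tax_credit(v):  # bug: still uses the old $2,000 amount
    return 2000 * v["qualifying_children"]

def total_tax(v):
    return max(v["tax_before_credits"] - v["child_tax_credit"], 0)

rules = {"child_tax_credit": child_tax_credit, "total_tax": total_tax}
report = run_diagnostic_tests(
    rules,
    inputs={"qualifying_children": 2, "tax_before_credits": 5000},
    expected={"child_tax_credit": 4400, "total_tax": 600},
)
```

The report names `child_tax_credit` as the first failing rule, mirroring the article's contrast between "pass/fail" automation and tooling that says what broke and why.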

Shaw said the accuracy of the consumer tax product should be close to 100 percent. Sarah Ernie, Intuit’s vice president of technology for the consumer group, said the architecture must produce deterministic outcomes.

"Having the kinds of abilities around determinism and being verifiably true through tests – that’s what leads to that kind of confidence," Ernie said.

Tooling handles the speed, but accuracy requires layered review. Intuit also uses LLM-based assessment tools to validate AI-generated outputs, and requires a human tax expert to confirm the results are correct. "It takes human expertise to be able to validate and verify anything," Ernie said.

Four components any regulated-industry team can use

OBBB was a tax problem, but the underlying conditions are not unique to tax. Healthcare, financial services, legal tech and government contracting teams routinely face the same combination: complex regulatory documentation, tight deadlines, proprietary codebases and near-zero error tolerance.

Based on Intuit’s implementation, four elements of the workflow are transferable to other domain-constrained development environments:

  1. Use commercial LLMs for document analysis. General-purpose models handle parsing, reconciliation and provision filtering well. This is where they add speed without jeopardizing accuracy.

  2. When analysis shifts to implementation, switch to domain-aware tooling. General-purpose models generating code for proprietary environments they do not understand will produce output that cannot be trusted at scale.

  3. Build evaluation infrastructure before the deadline, not during the sprint. Generic automated tests produce only pass/fail outputs. Domain-specific testing tooling that pinpoints failures and enables fixes in context is what makes AI-generated code shippable.

  4. Deploy AI tools across the entire organization, not just engineering. Shaw said Intuit provided training and oversight of usage across all operations. AI fluency was distributed throughout the organization rather than concentrated among early adopters.
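The four components above can be tied together as pipeline stages. Every function body here is a placeholder (the article names no APIs); the structure is the point: general-purpose analysis first, domain-aware code generation second, diagnostic evaluation third, and a mandatory human-expert gate before anything ships.

```python
# Transferable skeleton of the four-part workflow; all stages are stubs.

def analyze(document: str) -> list[str]:
    # Stage 1: commercial LLMs parse and filter provisions (stubbed here
    # as picking out lines that look like section headers).
    return [line for line in document.splitlines() if line.startswith("SEC")]

def generate_code(provisions: list[str]) -> list[str]:
    # Stage 2: domain-aware tooling turns provisions into DSL rules (stubbed).
    return [f"rule_for({p})" for p in provisions]

def evaluate(rules: list[str]) -> bool:
    # Stage 3: diagnostic tests built before the sprint (stubbed: non-empty).
    return len(rules) > 0

def ship(document: str, human_approved: bool) -> list[str]:
    # Stage 4: nothing ships without passing evals AND human expert sign-off.
    rules = generate_code(analyze(document))
    if not (evaluate(rules) and human_approved):
        raise RuntimeError("blocked: failing evals or missing expert sign-off")
    return rules
```

The human gate is deliberately a hard failure rather than a warning, matching Ernie's point that human expertise is required to validate and verify the outputs.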

"We continue to lean toward AI and human intelligence opportunities here, so our customers can get what they need from the experiences we build," Ernie said.



