In a recent article published by Futurism, Jad Tarifi, the founder of Google’s inaugural generative AI team, cautioned against pursuing careers in law and medicine, asserting that those professions primarily entail memorizing information. That assertion prompted a critical examination of prevalent misconceptions concerning the true nature of knowledge and skill acquisition in these domains.
A compelling illustration of this misunderstanding is the consistently erroneous responses from three distinct AI large language models (LLMs) when asked whether specific provisions of the One Big Beautiful Bill Act (OBBBA) reduce an individual’s adjusted gross income. As I documented in a prior blog entry, I, a certified public accountant (CPA) with no additional legal training, and certainly without comprehensive recall of the entire OBBBA, was immediately able to identify the flaw in the models’ conclusions. This occurred despite the theoretical premise that these models “knew” the entirety of the Internal Revenue Code (IRC), along with all pertinent binding guidance, and had unfettered access to the complete legislative text that amended that law.
Within tax law, rote memorization has never been, nor will it ever be, the sole method for ascertaining legal principles and their application. Instead, a meticulous reading of the law is imperative, necessitating careful attention to its enumerated items, cross-references, and the subsequent tracing of those references (which may lead to further cross-references) to construct a comprehensive understanding of the statute. Furthermore, the process involves distinguishing unambiguous statutory provisions, which dictate a singular interpretation, from those exhibiting ambiguity, which require interpretation. This interpretive process relies upon established canons of statutory construction developed over time.
Subsequent to this initial statutory analysis, the practitioner must then explore other existing binding official interpretive guidance, assess whether the law has changed since such guidance was issued, and finally consult expert discussions from third-party sources and non-binding official guidance. This comprehensive approach yields potentially supportable interpretations of the matter at hand, alongside an assessment of the likelihood that any given interpretation would ultimately be accepted by an IRS agent, an appellate conferee, or the highest court to which the case might escalate.
The fundamental flaw in the analyses provided by ChatGPT, Gemini, and BlueJ in this instance was their failure to begin with an examination of the law in isolation. Instead, they prematurely resorted to other sources rather than first analyzing the memorized legal text and the legislative amendments on their own. Moreover, it remains unclear whether such isolated analysis is even feasible for these models given their operational paradigms.
As previously detailed, the OBBBA provisions concerning the exemption of tips, overtime pay, and car loan interest from taxation were appended as deductions to Section 63 of the IRC. This classification renders them deductions taken along with either the standard deduction or itemized deductions in computing taxable income, rather than deductions that reduce adjusted gross income. As elaborated upon in the earlier article, deductions utilized in the computation of adjusted gross income would be allocated to Section 62.
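To make the distinction concrete, below is a minimal arithmetic sketch. The figures and variable names are purely illustrative and my own; they are not drawn from the statute.

```python
# Minimal sketch, with purely illustrative figures, of why placement matters.
# A deduction listed in Section 62 reduces adjusted gross income (AGI);
# the OBBBA items, placed in Section 63, are instead taken after AGI,
# alongside the standard deduction, in arriving at taxable income.

gross_income = 80_000
new_deduction = 5_000        # hypothetical amount for a tips/overtime deduction
standard_deduction = 15_000  # hypothetical figure, for illustration only

# If Congress had placed the deduction in Section 62 ("above the line"):
agi_if_section_62 = gross_income - new_deduction                # 75,000 -- AGI is reduced
taxable_if_section_62 = agi_if_section_62 - standard_deduction  # 60,000

# As actually enacted, in Section 63 ("below the line," yet still allowed
# whether or not the taxpayer itemizes):
agi_as_enacted = gross_income                                             # 80,000 -- AGI unchanged
taxable_as_enacted = agi_as_enacted - standard_deduction - new_deduction  # 60,000

# Taxable income happens to match in this simplified case, but AGI does not,
# and that is precisely the distinction the models got wrong.
print(agi_if_section_62, agi_as_enacted)          # 75000 80000
print(taxable_if_section_62, taxable_as_enacted)  # 60000 60000
```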
While the models had purportedly memorized the entire structure of the IRC, theoretically imparting this knowledge, they failed to undertake the crucial step that tax CPAs and attorneys should perform at this juncture: consulting the statutory text and following its cross-references to gain an initial understanding of the IRC’s provisions, as amended, pertaining to this issue. Instead, they were diverted by other related information contained within their models and acquired from the web during data updates intended to cover developments occurring after their training cutoff. This data included articles quoting Congressional sources that correctly noted these items were deductible regardless of whether a taxpayer itemized deductions. The training data also encompassed years of articles that referred to various earlier provisions, each deductible irrespective of itemization, as “above the line” deductions, along with material indicating that “above the line” signifies a deduction used in computing adjusted gross income. None of this memorized data was factually incorrect, though much of it proved irrelevant in this specific context.
Consequently, the models quickly identified a high correlation between the terms “deductible even if the taxpayer does not itemize deductions” and a deduction being categorized as “above the line,” as well as a nearly one-to-one correlation between a deduction being referred to as “above the line” and its use in computing adjusted gross income. Again, everything derived from this memorized information was accurate.
However, at this juncture, the analysis fell apart due to an incorrect inference. The models utilized all data indicating these deductions were available to both itemizing and non-itemizing taxpayers (which was entirely correct) to infer that these deductions were “above the line” and utilized in computing adjusted gross income (absolutely not correct). This inference was reinforced by the historical commentary present within their vast memorized data. Nevertheless, it was an inference that proved completely erroneous.
The question then arises: why did the models’ access to more current sources not challenge their conclusions? The answer highlights the fallibility of many human authors who were rushing to produce commentary on the bill. Numerous such articles, some originating from highly reputable sources, incorrectly labeled these provisions as “above the line” deductions.
While the precise reasons for these authors’ misclassification remain speculative, it appears evident that they did not base their initial articles on a direct analysis of the bill itself and the Internal Revenue Code for this issue. It is plausible that many made the same inference that the LLMs derived from their extensive memorized data—namely, that historically, Congress had enacted provisions for items deductible even without itemization as deductions used to compute adjusted gross income. Others, encountering previously published articles making the claim that these deductions reduced adjusted gross income, simply relied upon those sources. Furthermore, I suspect that some authors, operating under time constraints, posed the question to LLMs, which, based on their initial model assumptions, provided the erroneous result.
In any event, the articles discovered by the LLMs analyzing the law served to corroborate their models’ initial assumptions, which were rooted in their extensive memorized data, thereby creating a significant feedback loop.
My approach to the new legislation mirrored that of previous laws, with the notable addition of Google’s NotebookLM, an AI application primarily reliant on user-provided sources. The notebook I established contained exclusively the tax provisions of the statute. As I systematically reviewed each section of the bill, I utilized NotebookLM to generate summaries of the respective sections, while simultaneously conducting my own independent review to ensure comprehensive understanding and prevent oversight during rapid analysis of the law.
NotebookLM’s analysis was largely confined to the provided document, offering little beyond the observation that each provision modified Section 63, since that was all that was available to the application. Consequently, I consulted Section 63 to ascertain the specific implications of those modifications. I knew that Congress had, on rare occasions, introduced an additional “below the line” deduction by placing it in this section, making it available regardless of whether a taxpayer itemized deductions, and in the new bill Congress had chosen to do this multiple times.
A precedent for such a deduction can be found in the Qualified Business Income (QBI) Deduction under IRC §199A, enacted by the Tax Cuts and Jobs Act of 2017. This provision has remained in effect and its application was extended as part of the OBBBA.
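To see why the models’ earlier inference fails, the following sketch uses a handful of simplified examples of my own choosing; the list and flags are illustrative, not a complete or authoritative classification. Being deductible without itemizing has historically correlated with reducing adjusted gross income, but the QBI deduction, and now the OBBBA items, break that implication.

```python
# Illustrative sketch (my own simplified framing, not the models' internals) of
# why "deductible without itemizing" does not imply "reduces AGI."
# Each tuple: (provision, allowed_without_itemizing, reduces_AGI)
provisions = [
    ("Deductible IRA contribution (Section 62 list)", True, True),
    ("Student loan interest (Section 62 list)",       True, True),
    ("QBI deduction under IRC §199A / Section 63",    True, False),  # the 2017 precedent
    ("OBBBA tips / overtime / car loan interest",     True, False),  # placed in Section 63
]

# The inference the models effectively drew: allowed_without_itemizing => reduces_AGI.
# The loop surfaces the counterexamples that make that inference invalid.
for name, allowed_without_itemizing, reduces_agi in provisions:
    if allowed_without_itemizing and not reduces_agi:
        print(f"Counterexample to the inference: {name}")
```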
Asking the models whether these deductions had been placed in Section 63, akin to the QBI deduction in 2017, frequently overcame the models’ prior limitations. This shift in focus directed their attention to articles detailing that specific deduction and highlighting the impact of its placement in Section 63 rather than Section 62. The models subsequently confirmed that the bill text indeed added these deductions to Section 63 and, likely drawing from the newly surfaced articles, concluded that these deductions do not reduce adjusted gross income.
Even though their approach remained flawed, they did finally arrive at a correct solution.
I am not an attorney and do not claim to be one. However, I would be astonished if similar problems did not frequently emerge when parties rely on LLMs for tasks typically handled by the law students who would go on to become attorneys, the same students Mr. Tarifi suggests should discontinue their legal studies.
Mr. Tarifi undoubtedly possesses far greater knowledge of LLMs and AI than I ever will. Yet I question whether proponents of AI solutions truly grasp the issues they aim to resolve, or merely presume to understand them. The immediate availability of the entire IRC and all supporting documentation within an LLM’s model is genuinely useful and will revolutionize our approach to tax research and analysis. Nevertheless, I hope I have demonstrated that mere rote memorization is insufficient to perform the job being done today, and I see no signs that newer models are progressing beyond memorization to conduct a comprehensive analysis of new laws and developments.
Large language models represent a significant advancement that professionals in tax practice must understand to remain relevant. However, just as the introduction of computers in the 1960s and 1970s, and the machine learning enhancements that later went beyond the initial automation of routine preparation tasks, did not eliminate the profession, LLMs are not anticipated to be the technology that renders tax professionals obsolete.