
Model training data already contains essentially all the text there is[0], so models can already answer questions like this (especially with web search), but they aren't good at the actual tax calculations.

https://arxiv.org/abs/2507.16126v1

[0] though it's quite possible the conversion from HTML to text is lossy
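
To illustrate the calculation gap: bracket math is trivially deterministic code, exactly the kind of thing you'd rather hand to a tool than have a model grind out token by token. A minimal sketch in Python (the bracket figures are made up, not real IRS tables):

    # Deterministic bracket math a model should delegate to a tool
    # rather than compute token by token. Bracket figures are
    # invented for illustration, not real IRS tables.
    BRACKETS = [  # (upper bound of bracket, marginal rate)
        (10_000, 0.10),
        (40_000, 0.12),
        (90_000, 0.22),
        (float("inf"), 0.24),
    ]

    def tax_owed(taxable_income: float) -> float:
        owed, lower = 0.0, 0.0
        for upper, rate in BRACKETS:
            if taxable_income <= lower:
                break  # no income left in higher brackets
            owed += (min(taxable_income, upper) - lower) * rate
            lower = upper
        return round(owed, 2)

    print(tax_owed(50_000))  # 6800.0 under these made-up brackets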



The problem is that the text of the US tax code isn't enough to know the correct action to take. The IRS has semi-formal policies based on how it has chosen to interpret the statutes, and there are gray areas it doesn't clearly specify. Some of this is covered in supplementary publications, but it still has subjective elements. One example: settlements for "serious injuries" are regarded as non-taxable income, and what constitutes "serious" is a squishy concept.


Yeah, you'd have to pull in a lot of case law and do a lot of fine-tuning on expert tax advice (you'd probably have to create that training data yourself).

Would be neat (and still legally fraught!).
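
A rough sketch of what that hand-built training data could look like, as instruction-style JSONL pairs. The field names, citation, and answer below are invented for illustration, not a known format or tax advice:

    import json

    # One hand-built supervised example pairing a question with the
    # statute excerpt it should be grounded in. Field names, citation,
    # and answer are all assumptions for illustration.
    example = {
        "instruction": "Is a settlement for emotional distress taxable?",
        "context": "26 USC 104(a)(2) excludes damages received on "
                   "account of personal physical injuries or sickness.",
        "response": "Generally yes, unless the distress is attributable "
                    "to a physical injury; squishy cases need expert "
                    "review.",
    }

    # Append to a JSONL file, one training example per line.
    with open("tax_sft.jsonl", "a") as f:
        f.write(json.dumps(example) + "\n")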


You can technically use the language model itself as the data store. That was the quick hack that started it all: autocomplete on a question produces the answer.

However, it's clear that we're moving towards separating the data from the language model. Even base ChatGPT is given search tools and Python tools instead of producing everything as text, though the tool call itself may still be generated by the model.

You can for sure ask a pure LLM questions about the tax code, but we'll probably see purpose-built tools that contain only the canonical statutes and kosher case law, and that cite their sources properly. Y'know, instead of hallucinating.
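
A minimal sketch of that separation, where the model only drafts a tool call and a retrieval layer over a vetted corpus supplies both the text and the citation. All the names here (search_corpus, CORPUS, answer) are hypothetical, not a real library API:

    # The model drafts a tool call; a retrieval layer over a vetted
    # legal corpus supplies the text and the citation. Corpus content
    # is a single illustrative excerpt, not a complete statute.
    CORPUS = {
        "26 USC 104(a)(2)": "Gross income does not include damages "
                            "received on account of personal physical "
                            "injuries or physical sickness...",
    }

    def search_corpus(query: str) -> list[tuple[str, str]]:
        """Return (citation, text) pairs whose text contains the query."""
        q = query.lower()
        return [(cite, text) for cite, text in CORPUS.items()
                if q in text.lower()]

    def answer(question: str) -> str:
        # In a real system the model would emit this tool call; here
        # it's hardcoded so the end-to-end flow is visible.
        hits = search_corpus("physical injuries")
        if not hits:
            return "No source found; decline to answer."
        cite, text = hits[0]
        return f"{text} [{cite}]"  # answer grounded in a citation

    print(answer("Are injury settlements taxable?"))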



