No. The way to code going forward with AI is Test Driven Development. The code i...

DanHulton · 2025-10-28T03:48:13 1761623293

This is incorrect for a lot of reasons, many of which have already been explored, but also:

> with every new iteration of the AI, the internal code will get better

This is a claim that requires proof; it cannot just be asserted as fact. Especially because there's a silent "appreciably" hidden in there between "get" and "better" which has been less and less apparent with each new model. In fact, it more and more looks like "Moore's law for AI" is dead or dying, and we're approaching an upper limit where we'll need to find ways to be properly productive with models only effectively as good as what we already have!

Additionally, there's a relevant adage in computer science: "Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." If the code being written is already at the frontier capabilities of these models, how the hell are they supposed to fix the bugs that crop up, especially if we can't rely on them getting twice as smart? ("They won't write the bugs in the first place" is not a realistic answer, btw.)

CuriouslyC · 2025-10-28T10:59:38 1761649178

Just because you're not writing code where you can see that the new models are appreciably better doesn't mean they aren't. LLM progress now isn't in making it magically appear smarter at the top end (that's in diminishing returns as you imply), but at filling in weak points in knowledge, holes in capability, improving default process, etc. That's relevant because it turns out most of the time the LLM doesn't fail at coding because it's not a general super genius, but because it just had a hole in its capabilities that caused it to be dumb in a specific scenario.

Additionally, while the intelligence floor is shooting up and the intelligence ceiling is very slowly rising, the models are also getting better at following directions, writing cleaner prose, and their context length support is increasing so they can handle larger systems. The progress is still going strong, it just isn't well represented by top line "IQ" style tests.

LLMs and humans are good at dealing with different kinds of complexity. Humans can deal with messy imperative systems more easily assuming they have some real world intuition about it, whereas LLMs handily beat most humans when working with pure functions. It just so happens that messy imperative systems are bad for a number of reasons, so the fact that LLMs are really good at accelerating functional systems gives them an advantage. Since functional systems are harder to write but easier to reason about and test, this directly addresses the issue of comprehending code.

lugu · 2025-10-28T07:53:04 1761637984

The argument they are making is that if a bug is discovered, the agent will not debug it, instead a new test case is created, and the code is regenerated (I suppose if a quick fix isn't found). That is why they don't need debugging agent twice as capable as coding agent. I don't know if this works in practice, as in my experience, tests are intertwined with the code base.

sarchertech · 2025-10-28T00:43:31 1761612211

> Then, just wait a few months and then rerun the code generation with a new version of the AI and the code will be better.

How many times have you seen a code change that “passed all the tests” take down production or break an important customer’s workflow?

Usually that was just a relatively small change.

Now imagine that you regenerated literally all the code.

The code is the spec. Any other spec comprehensive enough to cover all possible functionality has to be at least as complex as the code.

wonnage · 2025-10-28T02:04:03 1761617043

TDD is testing in production in disguise. After all, bugs are unexpected and you can’t write tests for a bug you don’t expect. Then the bug crops up in production and you update the test suite.

embedding-shape · 2025-10-28T02:33:44 1761618824

TDD has always been about two things for me; be able to move forward faster because I have something easy to execute that compares it against the known wanted state, and in the future preventing unwanted regressions. I'm not sure I've ever thought of unit testing as "prevent potential future bugs", mostly up front design prevents that, or I'd use property testing, but neither of those are inside the whole "write test then write code" flow.

Retric · 2025-10-28T03:58:42 1761623922

The intended workflow of TDD is to write a set of tests before some code. The only reason that makes sense conceptually is to prevent possible future bugs from going undetected.

Put another way if your TDD always pass then there’s no point in writing them, and there’s no known bugs before you have any code. So discovering future bugs that didn’t exist when you’re writing those tests is the point.

fainpul · 2025-10-28T08:38:23 1761640703

But with tests you can only prevent those future bugs you managed to think of. Anything you didn't anticipate will not be covered by tests.

TDD is useful to build some initial "guard rails" when writing new code and it's useful to prevent regressions (by adding more guard rails when you notice the program went off the road). You can't just add "all the guard rails ever needed" in advance.

Retric · 2025-10-28T20:34:58 1761683698

Some classes of bugs need specific tests to find, but I can catch a spelling error without specifically looking for a spelling error.

Similarly, bugs often crop up because of interactions which aren’t obvious at the time. Thus the reason a test is failing can be wildly different than the intended use case of a test. Perhaps the test failed because the continuous integration environment has some bad RAM, you’ll need to investigate to discover why a test fails.

XorNot · 2025-10-28T10:15:44 1761646544

Honestly the way I use testing these days is as a more persistent version of a Jupyter notebook. Some piece of code is just complex enough I don't fully understand it, so hopefully the test framework in language of choice will make it easy enough to isolate it and right a bunch of quick to execute explorations of things I expect and do not expect about it.

black_knight · 2025-10-28T04:29:35 1761625775

I don’t really understand how to write tests before the code… When I write code, the hard part is writing the code which establishes the language to solve the problem in, which is the same language the tests will be written in. Also, once I have written the code I have a much better understanding as the problem, and I am in a way better position to write the correct tests.

numpy-thagoras · 2025-10-28T04:59:18 1761627558

You write the requirements, you write the spec, etc. before you write the code.

You then determine what are the inputs / outputs that you're taking for each function / method / class / etc.

You also determine what these functions / methods / classes / etc. compute within their blocks.

Now you have that on paper and have it planned out, so you write tests first for valid / invalid values, edge cases, etc.

There are workflows that work for this, but nowadays I automate a lot of test creation. It's a lot easier to hack a few iterations first, play with it, then when I have my desired behaviour I write some tests. Gradually you just write tests first, you may even keep a repo somewhere for tests you might use again for common patterns.

pjmlp · 2025-10-28T08:11:42 1761639102

I want to have a CUDA based shader that decays the colours of a deformable mesh, based on texture data fetched via Perlin noise, it also has to have a wow look as per designer requirements.

Quite curious about the TDD approach to that, espcially taking into account the religious "no code without broken tests" mantra.

CuriouslyC · 2025-10-28T11:08:18 1761649698

Break it down into its independent steps, you're not trying to write an integration test out of the gate. Color decay code, perlin noise, etc. Get all the sub-parts of the problem mapped out and tested.

Once you've got unit tests and built what you think you need, write integration/e2e tests and try to get those green as well. As you integrate you'll probably also run into more bugs, make sure you add regression tests for those and fix them as you're working.

pjmlp · 2025-10-28T11:20:16 1761650416

Got to figure that TDD for the UX wow designer part.

sarchertech · 2025-10-28T11:53:47 1761652427

TDD is terrible for anything where the hard part is the subjective look and feel.

MoreQARespect · 2025-10-28T12:00:17 1761652817

1. Write test that generates an artefact (e.g. picture) where you can check look and feel (red).

2. Write code that makes it look right, running the test and checking that picture periodically. When it looks right, lock in the artefact which should now be checked against the actual picture (green, if it matches).

3. Refactor.

The only criticism ive heard of this is that it doesnt fit some people's conceptions of what they think TDD "ought to be" (i.e. some bullshit with a low level unit test).

CuriouslyC · 2025-10-28T12:16:41 1761653801

You can even do this with LLM as a judge as well. Feed screenshots into a LLM as a judge panel and get them to rank the design 1-10. Give the LLM judge panel a few different perspectives/models to get a good distribution of ranks, and establish a rank floor for test passing.

embedding-shape · 2025-10-28T12:48:04 1761655684

Parent mentioned "subjective look and feel", LLMs are absolutely trash at that and have no subjective taste, you'll get the blandest designs out of LLMs, which makes sense considering how they were created and trained.

CuriouslyC · 2025-10-28T13:47:40 1761659260

LLMs can get you to about a 7.5-8/10 just by iterating itself. The main thing you have to do is just wireframe the layout and give it the agent a design that you think is good to target.

embedding-shape · 2025-10-28T17:08:15 1761671295

Again, they have literally zero artistic vision and no, you cannot get an LLM to create a 7.5 out of 10 web design or anything else artistic, unless you too miss the facilities to properly judge what actually works and looks good.

CuriouslyC · 2025-10-28T18:41:40 1761676900

You can get an AI to produce a 10/10 design trivially by taking an existing 10/10 design and introducing variation along axes that are orthogonal to user experience.

You are right that most people wouldn't know what 10/10 design looks/behaves like. That's the real bottleneck: people can't prompt for what they don't understand.

embedding-shape · 2025-10-28T20:00:30 1761681630

Yeah, obviously if you're talking about copying/cloning, but that's not what I thought the context here was, I thought we were talking about LLMs themselves being able to create something that would look and feel good for a human, without just "Copy this design from here".

sarchertech · 2025-10-28T20:45:51 1761684351

That only works for the simplest minimally interactive examples.

It is also so monumentally brittle that if you do this for interactive software, you will drive yours nuts trying.

Retric · 2025-10-28T05:23:23 1761629003

TDD fits better when you use a bottom up style of coding.

For a simple example, FuzzBuzz as a loop that has some if statements inside is not so easy to test. Instead break it in half so you have a function that does the fiddly bits and a loop that just contains “output += MakeFizzBizzLineForNumeber(X);” Now it’s easy to come up tests for likely mistakes and conceptually you’re working with two simpler problems with clear boundaries between them.

In a slightly different context you might have a function that decides which kind of account to create based on some criteria which then returns the account type rather than creating the account. That function’s logic is then testable by passing in some parameters and then looking at the type of account returned without actually creating any accounts. Getting good at this requires looking at programs in a more abstract way, but a secondary benefit is rather easy to maintain code at the cost of a little bookkeeping. Just don’t go overboard, the value is breaking out bits that are likely to contain bugs at some point where abstraction for abstraction’s sake is just wasted effort.

leptons · 2025-10-28T06:52:49 1761634369

That's great for rote work, simple CRUD, and other things where you already know how the code should work so you can write a test first. Not all programming works well that way. I often have a goal I want to achieve, but no clue exactly how to get there at first. It takes quite a lot of experimentation, iteration and refinement before I have anything worth testing - and I've been programming 40+ years, so it's not because I don't know what I'm doing.

Retric · 2025-10-28T20:17:46 1761682666

Not every approach works for every problem, still we’re all writing a lot of straightforward code over our careers. I also find longer term projects eventually favor TDD style coding as over time unknown unknowns get filled in.

Your edge case depends on the kind of experimentation you’re doing. I sometimes treat CSS as kind of black magic and just look for the right incantation that happens to work across a bunch of browsers. It’s not efficient, but I’m ok punting because I don’t have the time to become an expert on everything.

On the other hand when looking for an efficient algorithm or optimization I likely to know what kind of results I’m looking for at some stage before creating the relevant code. In such cases tests help clarify what exactly the mysterious code needs to do so in a few hours to weeks later when inspiration hits you haven’t forgotten any relevant details. I might have gone in a wildly different direction, but as long as I consider why each test was made before deleting it the process of drilling down into the details has value.

lisbbb · 2025-10-28T13:48:29 1761659309

I don't want to insult you, but I had to re-program myself in order to accept TDD and newer processes and there are a lot of systems out there that weren't written with testability in mind and are very difficult to deal with as a result. You are describing a prototype-until-you-reach-done type of approach, which is how we ended up with so much untestable code. My take is that you do a PoC, then throw it out and write the real application. "Build one to throw away" as Brooks said back in 1975.

I get where you're coming from, because I'm about a decade behind you, but resisting change is not a good look. I feel the same way about all this vibe coding and junk--don't really think it's a good idea, but there it is. Get used to being wrong about everything.

leptons · 2025-10-28T21:13:49 1761686029

>but resisting change is not a good look

Your condescending attitude is not a good look. You don't know me at all.

lisbbb · 2025-10-28T13:42:47 1761658967

It's as matter of practice. The major problem is that business folks don't even know how to produce a testable spec, they just give you some vague idea about what it is they want and you're supposed to produce a PoC and show it to them so they can refine their idea. If you go and produce a bunch of tests based on what they asked for, but no working code, you're getting fired. The whole process is on its head because we don't have solid engineering minds in most roles, we have people with liberal arts degrees faking it until they make it.

There were a few places I worked that TDD actually succeeded because the project was fairly well baked and the requirements that came it could be understood. That was the exception, not the rule.

Ekaros · 2025-10-28T07:00:50 1761634850

I am not really sure if TDD often is compatible with modern agile development. It lends well to more waterfall style. Or clearly defined systems.

If you can design fully what your system does before starting it is more reasonable. And often that means going down to level of are inputs and states. Think more of something like control systems for say mobile networks or planes or factory control. You could design whole operation and all states that should happen or could happen before single line of code.

Retric · 2025-10-28T20:25:15 1761683115

TDD operates at a vastly smaller scale. You don’t write every single test for the entire project before writing a single line of code.

Write some tests for a non trivial function before creating the function and the entire cycle might take as little as 20 minutes.

lisbbb · 2025-10-28T13:49:50 1761659390

There is no relationship between agile/waterfall and TDD. Same as there is no relationship to pair programming and agile/waterfall, either.

embedding-shape · 2025-10-28T12:36:54 1761655014

> The intended workflow of TDD is to write a set of tests before some code. The only reason that makes sense conceptually is to prevent possible future bugs from going undetected.

Again, I don't do that for correctness, I do it because it's faster than not having something to work against, that you can run with one command that tells you "Yup, you did the thing!" or "Nope, not there yet". When I don't do TDD, I'm slower, because I have to manually verify things and sometimes there are regressions.

Catching these things and automating the process is what makes (for me) TDD worth it.

> Put another way if your TDD always pass then there’s no point in writing them

Uuh, no one said this?

I'm not sure where people got the idea that TDD is this very strict "one way and one way only", the core idea is that your work gets easier to do, if it doesn't, then you're doing it wrong, probably following the rules too tightly.

We don't have to be so dogmatic about any methodologies out there, everything has tradeoffs, chose wisely.

enraged_camel · 2025-10-28T14:17:55 1761661075

>> After all, bugs are unexpected and you can’t write tests for a bug you don’t expect.

Ironically, AI can. In my experience it is extremely good at thinking about edge cases and writing tests to defend against them.

tcmart14 · 2025-10-28T04:37:46 1761626266

While TDD can have some merits, I think this is being way to generous to the value of tests. As Dijkstra said once, "Testing shows the presence, not the absence of bugs." I'm not a devout follower of Uncle Bob, but I was just thumbing through Clean Architecture today and he has a whole section to this point (including the above quote). Right after that quote he writes, "a program can be proven incorrect by a test, but it can not be proven correct." Which is largely true. The only garuntee of TDD is you can show a set of behaviors your program doesn't do, it never proves what the program actually does. To extrapolate to here, all TDD does it put up guardrails for the the AI should not generate.

astahlx · 2025-10-28T05:43:52 1761630232

It depends on how you define testing now: Property-based testing would test sets of behaviors. The main idea is: Formalize your goal before implementing. So specification driven development would be the thing to aim for. And at some point we might be able to model check (proof) the code that has been generated. Then we are the good old idea of code synthesis.

AstralStorm · 2025-10-28T06:59:28 1761634768

Don't worry, you're going to be searching for logic vs requirements mismatches instead if the thing provides proofs.

That means, you have to understand if it is even proving the properties you require for the software to work.

It's very easy to write a proof akin to a test that does not test anything useful...

practal · 2025-10-28T08:58:55 1761641935

No, that misunderstands what a proof is. It is very easy to write a SPEC that does not specify anything useful. A proof does exactly what it is supposed to do.

svieira · 2025-10-28T15:51:09 1761666669

No, a proof proves what it proves. It does not prove what the designer of the proof intended it to prove unless the intention and the proof align. Proving that is outside of the realm of software.

practal · 2025-11-03T09:57:56 1762163876

Yes, indeed, a proof proves what it proves.

You confuse spec and proof.

MoreQARespect · 2025-10-28T08:59:44 1761641984

The reason why property testing isnt used that much is because it is useful at catching tests only for a specific type of code which most people arent writing.

9rx · 2025-10-28T13:43:51 1761659031

I'm not sure that's true. In essence, property tests are a method for defining types where a language lacks natural expression. In a vacuum, nearly all code could benefit from (more advanced) types. But:

1. Tradeoffs, as always. The more advanced typing you head towards, the much more time consuming it becomes to reason about the program. There is good reason for why even the most staunch type advocates rarely push for anything more advanced than monads. A handful of assertive tests is usually good enough, while requiring significantly less effort.

2. Not just time consuming, but often beyond comprehension. Most developers just don't know how to think in terms of formal proofs. Throw a language with an advanced type system, like Coq or Idris, in from of them and they wouldn't have a clue what to do with it (even ignoring the unfamiliar syntax). And with property tests, now you're asking them to not only think in advanced types, but to also effectively define the types themselves from scratch. Despite #1, I fully expect we would still see more property testing if it weren't for this huge impediment.

MoreQARespect · 2025-10-28T15:01:11 1761663671

>Most developers just don't know how to think in terms of formal proofs

Formal proofs are useful on the same class of bug property tests are.

And vice versa.

The issue isnt necessarily that devs cant use them, it's that the problems they have which cause most bugs do not map on to the space of "what formal proofs are good at".

9rx · 2025-10-28T21:00:05 1761685205

What do you consider to be the source of most bugs?

thesz · 2025-10-28T06:14:31 1761632071

  > You give the AI a set of requirements, ie. tests that need to pass, and then let it code whatever way it needs to in order to fulfill those requirements.

SQLite has tests-lines-to-code-lines ratio above 1000 (yes, 1000 lines of tests for single line of code) and still has bugs.

AMD, at the time it decided to apply ACL2 to its FPU, had 29 million tests (not lines of code, but test inputs and outputs). ACL2 verification found several bugs in the FPU.

Just to make a couple of points for someone to draw a line.

pjmlp · 2025-10-28T08:08:03 1761638883

Try to do TDD with graphics programming.

I never bought into TDD because it is only usefull for business logic, plain algorithms and data structures, it is no accident that is what 99% of conference talks and books focus on.

There isn't a single TDD talk about shader programming for GPGPU, and validating that what the shader algorithms produce via automated tests, the reason being the amount of enginneering effort only to make it work, and still lacks human sensitivity for what gets rendered.

mettamage · 2025-10-28T08:46:49 1761641209

Relatable. Every time I read something about testing it seems backend web dev related. Of course, that’s great but what about the rest?

MoreQARespect · 2025-10-28T08:35:56 1761640556

I have. I call it snapshot test driven development. You put the preconditions in, generate and record the graphics as an artefact at runtime and when it looks right, freeze it.

pjmlp · 2025-10-28T09:07:17 1761642437

But that isn't TDD, no line of code should be written without broken tests.

MoreQARespect · 2025-10-28T09:24:52 1761643492

Yes it is. Until the artefact which has been visually validated is locked in it is still a broken test.

You can argue semantics until you're blue in the face it still follows red-green-refactor and it confers the same benefits as TDD.

brazukadev · 2025-10-28T10:13:20 1761646400

Your nickname tells me you are not talking bs.

BobbyTables2 · 2025-10-28T03:33:51 1761622431

The problem is — nobody commits code that fails tests.

The bugs occur because the initial tests didn’t fully capture the desired and undesired behaviors.

I’ve never seen a formal list of software requirements state that a product cannot take more than an hour to do a (trivial) operation. Nobody writes that out because it’s implicitly understood.

Imagine writing a “life for dummies” textbook on how to grow from a 5yr old to 10yr old. It’s impossible to fully cover.

rob_c · 2025-10-28T08:21:19 1761639679

> The problem is — nobody commits code that fails tests.

Hah, if that were true the industry would be a better place. Or a worse place. Or a slower place but exactly the same. I should build a test for that...

I've worked on many projects where tests get disabled as nobody can tell why it's failing (or why it was even written in some cases).

I've rewritten test systems from scratch in the past to drag projects out of the dumpster fire by getting them into a state of passing simple startup/shutdown safely routines and then watched as I pass the project onto others how it rots until some "genius" young coder comes along and "removes the slow test-suite because it takes 2hr+ to run on my way out of spec laptop".

heavyset_go · 2025-10-28T03:24:07 1761621847

The code always matters. Black box coding like this leads to systems you can't explain, and that's your whole damn job: to understand the system you're building. Anything less is negligence.

fainpul · 2025-10-28T08:18:52 1761639532

No.

TDD combined with vibe-coding can create code that has unwanted side-effects, because your tests only check the result. It can also have various security vulnerabilities, which you don't test for, because how would you know what to test. It can also lead to massive duplication and code bloat, while tests still pass. It can lead to software which wastes a lot of resources (memory, cpu, inefficient network requests and the like) due to bad algorithms. If you try to keep that in check by writing performance tests, how do you know what acceptable performance is, if you have no idea how your program works?

CuriouslyC · 2025-10-28T11:12:57 1761649977

TDD doesn't solve those problems for human code either. That's why every org has several security scanners that most engineers ignore unless you hard gate them, linting, code duplication detection, etc.

Also, you can give AI a SLO for code and fail stress tests that don't meet it. AI will happily respond to a failing stress test with profiling and well thought out optimizations in many cases.

skywhopper · 2025-10-28T13:08:57 1761656937

Who is arguing that TDD solves those problems with human coders?

tartoran · 2025-10-28T04:52:27 1761627147

If the code doesn't matter anymore, in order of it to be of any quality the test should be as detailed as was the code in the first place, you'd end up writing the code in tests more or less.

HellDunkel · 2025-10-27T23:02:08 1761606128

No.

The reason AI code generation works so well is a) it is text based- the training data is huge and b) the output is not the final result but a human readable blueprint (source code), ready to be made fit by a human who can form an abstract idea of the whole in his head. The final product is the compiled machine code, we use compilers to do that, not LLMs.

Ai genereted code is not suitable to be directly transferred to the final product awaiting validation by TDD, it would simply be very inefficient to do so.

epicureanideal · 2025-10-28T02:52:25 1761619945

TDD doesn’t ensure the code is maintainable, extendable, follows best practices, etc, and while AI might write some code that can pass tests while the code is relatively small, I would expect in the long run it will find it extremely difficult to just “rewrite everything based on this set of new requirements” and then do that again, and again, and again, each time potentially choosing entirely different architectures for the solution.

lkjdsklf · 2025-10-28T03:24:32 1761621872

> TDD doesn’t ensure the code is maintainable, extendable, follows best practices, etc, and while AI might write some code

None of that matters of its not a person writing the code

sarchertech · 2025-10-28T03:59:38 1761623978

AI has a hard time working with code that humans would consider hard to maintain and hard to extend.

If you give AI a set of tests to pass and turn it loose with no oversight, it will happily spit out 500k LOC when 500 would do. And then it will have a very hard time when you ask it to add some functionality.

AI routinely writes code that is beyond its ability to maintain and extend. It can’t just one shot large code bases either, so any attempt to “regenerate the code” is going to run into these same issues.

jjav · 2025-10-28T08:04:46 1761638686

> If you give AI a set of tests to pass and turn it loose with no oversight, it will happily spit out 500k LOC when 500 would do. And then it will have a very hard time when you ask it to add some functionality.

I've been playing around with getting the AI to write a program, where I pretend I don't know anything about coding, only giving it scenarios that need to work in a specific way. The program is about financial planning and tax computations.

I recently discovered AI had implemented four different tax predictions to meet different scenarios. All of them incompatible and all incorrect but able to pass the specific test scenarios because it hardcoded which one to use for which test.

This is the kind of mess I'm seeing in the code when AI is left alone to just meet requirements without any oversight on the code itself.

rvz · 2025-10-28T00:42:48 1761612168

> We will need to make sure the test cases are accurate and describe what the AI needs to generate, but that's it.

Yes. The first thing I always check in every project (an especially vibe-coded projects) is whether if:

A. Does it have tests?

B. Is the coverage over 70%?

C. Do the tests actually test for the behaviour of the code (good) or just its implementation (bad.)

If any of those requirements are missing, then that is a red flag for the project.

While TDD is absolutely valuable for clean code, focusing too much on it can be the death of a startup.

As you said the code itself is $0, then the first product is still worth $10 and the finished product is worth $1M+ once it makes money, which is what matters.

lisbbb · 2025-10-28T13:38:28 1761658708

Why did you think TDD was garbage? Formalizing a specification is all that test first is. It's just that most devs I know had big egos and believing writing tests was somehow below them. I prefer the "build a little, test a little" approach, personally, but there's nothing inherently wrong with TDD.

My prediction is that in the future, a lot of desperate companies are going to need living, breathing reverse software engineers to aid them because they have lost the ability to understand their own codebases.

Oh, and why is code worth $0? A lot of code is throwaway, but I still got paid to produce it and much of it makes money for the company or saves them money.

blibble · 2025-10-27T22:56:07 1761605767

you will end up with something that passes all your tests then smashes into the back of the lorry the moment it sees anything unexpected

writing comprehensive tests is harder than writing the code

throwaway7783 · 2025-10-28T03:30:09 1761622209

AI can help here too, by exploding the spec into a series of questions to clarify behavior.

Today, it just does something and when corrected it says "You are right!....".

reenorap · 2025-10-27T22:58:12 1761605892

Then you write another test. That's the whole point of TDD. As you keep writing more tests, the closer it gets to its final form.

BobbyTables2 · 2025-10-28T03:38:54 1761622734

Have you ever seen someone carve the inverse of a statue from a solid block of stone? If so, they are doing TDD.

Yeah, me neither…

topaz0 · 2025-10-28T03:08:42 1761620922

The idea of TDD is that you should have the tests before you have the code. If your code is failing in real life before you have the tests, that's no longer TDD.

blibble · 2025-10-27T23:06:59 1761606419

right, and by the time I have 2^googolplex tests then the "AI" will finally be able to produce a correctly operating hello world

oh no! another bug!

3vidence · 2025-10-28T02:41:59 1761619319

I've definitely seen a number of files where the implementation is maybe like 500 LOC and the test file is 10000+ LOC.

I agree rigidly defining exactly what the code does through tests is harder than people think.

halfcat · 2025-10-28T14:24:19 1761661459

I could not disagree more strongly with everything you’ve said in this comment.

> The way to code going forward with AI is Test Driven Development.

No. TDD already collapses under its own weight as a project grows.

> The code itself no longer matters.

No. Definitely no. That’s absurd. You can’t box in a correct solution with guard rails. Especially since, even if you could get something close to that, you would also lose the ability to understand the tests.

> You give the AI a set of requirements, ie. tests that need to pass, and then let it code whatever way it needs to in order to fulfill those requirements. That's it. The new reality us programmers need to face is that code itself has an exact value of $0.

No. The opposite. When code is cheap, understanding and control become expensive. Code a human can understand will be the most valuable going forward.

> That's because AI can generate it, and with every new iteration of the AI, the internal code will get better.

No. All code is technical debt. AI produces code faster. Therefore AI produces bugs faster.

”Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it” -Brian Kernighan

This is literally where we’re at. AI writes code just beyond its ability to fix.

> What matters now are the prompts.

No. This is such a dead end. It’s a roll of the dice, and so we have examples of people who seem to get it to build something faster. That’s like saying there are people who win the lottery. It’s true, and it also says nothing of your ability to repeat their process. Confirmation bias of the wins. But in building something reliable, we care more about the floor (minimum quality) than the ceiling (the peak it can reach sometimes).

zeroq · 2025-10-28T04:27:04 1761625624

> code itself has an exact value of $0. That's because AI can generate it

That's only true for problems that has been solved and well documented before. AI can't solve novel problems. I have ton of examples I use from time to time when new models come out. I've tried to ride the hype train, and I've been frustrated working with people before, but I've never been so frustrated as trying to make AI follow simple set of rules and getting:

  "Oh yes, my bad, I get that now. Black is white and white is black. Let me rewrite the code..."

My favorite example is tasked AI with a rudimentary task and it gave me a working answer but it was fishy, so I googled the answer and lo and behold I landed on stackoverflow page with exact same answer being top voted answer to question very similar to my task. But that answer also had a ton of comments explaining why you never should do it that way.

I've been told many times that "you know, kubernetes is so complicated, but I tell AI what I want and it gives me a command I simply paste in my terminal". Fuck no.

AI is great for scaffolding projects, working with typical web apps where you have repeatable, well documented scenarios, etc.

But it's not a silver bullet.

tmoertel · 2025-10-28T11:48:45 1761652125

Show me the TDD tests you would use to show that your AI-generated code isn't creating security vulnerabilities.

Yhippa · 2025-10-28T04:05:51 1761624351

I remember talking about this with a friend a long time ago. Basically, you'd write up tests and there was a magic engine that would generate code that would self-assemble and pass tests. There was no guarantee that the code would look good or be efficient--just that it passed the tests.

We had no clue that this could actually happen one day in the form of gen AI. I want to agree with you just to prove that I was right!

This is going to bring up a huge issue though: nailing requirements. Because of the nature of this, you're going to have to spec out everything in great detail to avoid edge cases. At that point, will the juice be worth the squeeze? Maybe. It feels like good businesses are thorough with those kinds of requirements.

jcgrillo · 2025-10-28T12:27:24 1761654444

How would you handle production incidents in such a codebase? The primary focus of a software engineer is to make the codebase easy (or at least possible) to understand. To tame complexity while achieving some business objectives. If we're going to just throw that part out the window you need to have a plan for how to operate the resultant mess in production.

pcarolan · 2025-10-27T22:46:59 1761605219

I mostly agree, but why stop at tests? Shouldn’t it be spec driven development? Then neither the code or the language matter. Wouldn’t user stories and requirements à la bdd (see cucumber) be the right abstraction?

int_19h · 2025-10-28T04:26:14 1761625574

Natural language is too ambiguous for this, which makes it impossible to automatically verify

What you need is indeed spec-driven development, but specs need to be written in some kind of language that allows for more formal verification. Something like https://en.wikipedia.org/wiki/Design_by_contract, basically.

It is extremely ironic that, instead, the two languages that LLMs are the most proficient in - and thus the ones most heavily used for AI coding - are JavaScript and Python...

__MatrixMan__ · 2025-10-27T22:58:26 1761605906

Maybe one day. I find myself doing plenty of course correction at the test level. Safely zooming out doesn't feel imminent.

reenorap · 2025-10-27T22:52:57 1761605577

I don't think you're wrong but I feel like there's a big bridge between the spec and the code. I think the tests are the part that will be able to give the AI enough context to "get it right" quicker.

It's sort of like a director telling an AI the high level plot of a movie, vs giving an AI the actual storyboards. The storyboards will better capture the vision of the director vs just a high level plot description, in my opinion.

gmd63 · 2025-10-27T22:59:15 1761605955

Why stop there? Whichever shareholders flood the datacenter with the most electrical signals get the most profits.

baq · 2025-10-29T07:12:59 1761721979

> The new reality us programmers need to face is that code itself has an exact value of $0.

This is not new at all. Code has always been a liability. It having $0 value would be a great improvement IMHO.

The value was always in the product regardless of the amount of code in it and regardless of its quality. Customers don’t buy code. (Except of course when the code is the product, which is very unusual nowadays.)

jcgrillo · 2025-10-28T12:15:33 1761653733

IME devs actually do precisely the opposite. They write code and then ask the LLM to do the "boring" part and write the tests for them.

dboreham · 2025-10-28T13:26:50 1761658010

Tests are also code and can be buggy, incomplete etc.

MarcelOlsz · 2025-10-28T08:43:04 1761640984

I've been experimenting with various TDD methods with AI and it cannot do frontend work. Frontend has too many ancient illogical incantations and ways of doing things that it has no clarity on, you have to handhold it every step of the way. When I let AI go off the rails and build a frontend it's an absolute mess and it frequently chooses the hardest and dumbest way to do things. Stellar for low surface-area work though.

Once AI has cheap real-time eyes it might get slightly better, but all the logs and browser MCP tools and yadda yadda in the world will not get it to produce anything remotely efficient.

mettamage · 2025-10-28T08:48:18 1761641298

You can have eyes by pasting in screenshots. So you could write tests that create a screenshot and send it to an llm if it doesn’t match the output.

MarcelOlsz · 2025-10-28T08:52:14 1761641534

Been there done that lol. It needs real-time extremely badly. If I wanted to write English instead of code I'd have been a writer instead. It will nudge pixels but it will not take in the myriad of reasons that button is the way that it is and solve it in any meaningful way. Decent for MVPing with stuff like shadcn/tailwind but falls apart with anything else.

rob_c · 2025-10-28T08:17:00 1761639420

> The code itself no longer matters.

Good luck explaining that when you get hacked out of oblivion.

This is like saying the fine-print of contracts don't matter so I get "AI" to regurgitate them all for me as a lawyer. It's so wrong as to be beyond laughable.

Put the coffee down and go for a walk, preferably to a library, and LEARN SOMETHING.

lifeformed · 2025-10-28T04:26:14 1761625574

Not everything can be tested by a computer.

RayVR · 2025-10-28T18:10:05 1761675005

I feel terrible for anyone relying on anything you produce as a proompt engineer

android521 · 2025-10-28T05:39:43 1761629983

This is wrong in so many ways. Have you even tried what you believe? If you have tried, you would find out it is nonsense quickly.

heavyset_go · 2025-10-28T06:09:51 1761631791

The irony is that I tried this with a project I've been meaning to bang out for years, and I think the OP's idea a natural thought to have when working with LLMs: "what if TTD but with LLMs"

When I tried it, it "worked", I admittedly felt really good about it, but I stepped away for a few weeks because of life and now I can't tell you how it works beyond the high level concepts I fed into the LLM.

When there's bugs, I basically have to derive from first principles where/how/why the bug happens instead of having good intuition on where the problem lies because I read/wrote/reviewed/integrated with the code myself.

I've tried this method of development with various levels of involvement in implementation itself and the conclusion I came to is if I didn't write the code, it isn't "mine" in every sense of the term, not just in terms of legal or moral ownership, but also in the sense of having a full mental model of the code in a way I can intellectually and intuitively own it.

Really digging into the tests and code, there are fundamental misunderstandings that are very, very hard to discern when doing the whole agent interfacing loop. I believe they're the types of errors you'd only pick up on if you wrote the code yourself, you have to be in that headspace to see the problem.

Also, I'd be embarrassed to put my name on the project, given my lack of implementation, understanding and the overall quality of the code, tests, architecture, etc. It isn't honest and it's clearly AI slop.

It did make me feel really productive and clever while doing it, though.

svieira · 2025-10-28T15:55:58 1761666958

> It did make me feel really productive and clever while doing it, though.

And that's the greatest trap of this whole thing. That the _feels_ are so quickly diverged from the actual.