It is really important that such posts exist. There is the risk that we only hear about the wild successes and never the failures. But from the failures we learn much more.
One difference between this story and the various success stories is that the latter all had comprehensive test suites as part of the source material that agents could use to gain feedback without human intervention. This doesn’t seem to exist in this case, which may simply be the deal breaker.
>> This doesn’t seem to exist in this case, which may simply be the deal breaker.
Perhaps, but perhaps not. The reason tests are valuable in these scenarios is they are actually a kind of system spec. LLMs can look at them to figure out how a system should (and should not) behave, and use that to guide the implementation.
I don’t see why regular specs (e.g. markdown files) could not serve the same purpose. Of course, most GitHub projects don’t include such files, but maybe that will change as time goes on.
One difference between this story and the various success stories is that the latter all had comprehensive test suites as part of the source material that agents could use to gain feedback without human intervention. This doesn’t seem to exist in this case, which may simply be the deal breaker.