It would be interesting to see an in-depth breakdown of a project that has gone through the full vibe-code-to-cleanup pipeline, or even just a 'heavy LLM usage' to 'cleanup needed' process. If the commits were tagged as LLM-written vs. human-written, similar to how it's done for Aider[0], you could ask: at which point does the LLM's capability start to drop off a cliff, and which parts of the code needed the most drastic refactors shortly after?
* come up with requirements for a non-obvious system
* try to vibe-code it
* clean it up manually
* add a detailed description and before/after comparison; especially, was it faster or slower than just writing everything manually in the first place?
That's also my intuition, but I would like to test and measure it against a real, non-obvious use case. Some day I will, and I'll write about it :)
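For the commit-tagging idea, a rough first measurement could look like the sketch below. It assumes LLM-written commits are marked by an "(aider)" suffix on the git author name, which is my understanding of Aider's default attribution; adjust `llm_suffix` if your setup tags commits differently. The `@@` marker and the function names `read_log` and `summarize` are made up for illustration.

```python
import subprocess
from collections import Counter

MARKER = "@@"  # hypothetical prefix used to spot commit header lines


def read_log(repo="."):
    """Return git log text: one '@@<author>' line per commit, then numstat rows."""
    return subprocess.run(
        ["git", "-C", repo, "log", "--numstat", f"--format={MARKER}%an"],
        capture_output=True, text=True, check=True,
    ).stdout


def summarize(log_text, llm_suffix="(aider)"):
    """Count commits and changed lines per origin ('llm' or 'human').

    Assumption: an author name ending in llm_suffix means LLM-written.
    """
    commits, lines = Counter(), Counter()
    origin = None
    for row in log_text.splitlines():
        if row.startswith(MARKER):
            author = row[len(MARKER):].strip()
            origin = "llm" if author.endswith(llm_suffix) else "human"
            commits[origin] += 1
        elif origin and row.strip():
            added, deleted, _path = row.split("\t", 2)
            if added != "-":  # binary files report '-' instead of line counts
                lines[origin] += int(added) + int(deleted)
    return commits, lines
```

Calling `commits, lines = summarize(read_log("."))` inside a repository gives raw churn per origin. It says nothing yet about which LLM-written lines were later rewritten by a human; that would need something like `git blame` over time.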
[0]: https://aider.chat/HISTORY.html