I disagree, Codex always gets stuck and wants to double check and clarify things...

ewoodrich · 2025-10-20T22:54:43 1761000883

I've been dealing with this on Codex a lot lately. It confidently wraps up a task, I go to check it's work... and it's not even close.

Then I do a double take and re-read the summary message and realize that it pulled a "and then draw the rest of the owl", seemingly arbitrarily picking and choosing what it felt like doing in that session and what it punted over to "next steps to actually get it running".

Claude is more prone to occasional "cheating" with mocked data or "tbd: make this an actual conditional instead of hardcoded If True" stuff when it gets overwhelmed which is annoying and bad. But it at least has strong task adherence for the user's prompt and doesn't make me write a lawyer-esque contract to avoid any loopholes Codex will use to avoid doing work.

aaronblohowiak · 2025-10-20T23:45:18 1761003918

Are you using something like spec-kit?