I think the key part of that advice is the "without evidence" bit:
> I suggest not thinking of switching model as the main axes of how to improve your system off the bat without evidence.
If you try to fix problems by switching from, e.g., Gemini 2.5 Flash to OpenAI o3, but you don't have any evals in place, how will you tell whether the model switch actually helped?
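A minimal sketch of what "evals in place" means in practice: a fixed set of test cases scored the same way against both models, so the comparison is apples to apples. The model functions here are hypothetical stubs standing in for real API calls; the eval cases are illustrative, not from the original.

```python
# Minimal eval-harness sketch. Without a fixed eval set like this,
# a model swap leaves you with nothing to compare against.
# `current_model` / `candidate_model` are hypothetical stubs standing in
# for real API calls (e.g. Gemini 2.5 Flash vs o3).

EVAL_SET = [
    {"prompt": "2 + 2 = ?", "expected": "4"},
    {"prompt": "Capital of France?", "expected": "Paris"},
    {"prompt": "5 * 6 = ?", "expected": "30"},
]

def current_model(prompt: str) -> str:
    # Stub: pretend the current model misses one case.
    canned = {"2 + 2 = ?": "4", "Capital of France?": "Paris", "5 * 6 = ?": "35"}
    return canned[prompt]

def candidate_model(prompt: str) -> str:
    # Stub: pretend the candidate model answers every case.
    canned = {"2 + 2 = ?": "4", "Capital of France?": "Paris", "5 * 6 = ?": "30"}
    return canned[prompt]

def score(model) -> float:
    """Fraction of eval cases the model answers exactly."""
    hits = sum(1 for case in EVAL_SET if model(case["prompt"]) == case["expected"])
    return hits / len(EVAL_SET)

if __name__ == "__main__":
    before, after = score(current_model), score(candidate_model)
    print(f"current: {before:.2f}  candidate: {after:.2f}")
    # Only with both numbers can you say whether the switch actually helped.
```

In a real system the stubs would be replaced by API calls and the exact-match check by whatever grading fits the task (LLM-as-judge, regex, unit tests), but the shape stays the same: same inputs, same scorer, two models.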
This is not ideal, but it's pragmatic - or at least it was, for the last two years - since new models showed large improvements across the board. If your main problem was capability, not cost, then switching was an easy win: from GPT-3.5 to GPT-4, from GPT-4 to, say, Sonnet 3.5, then to Gemini 2.5 Pro, and now to Opus (if you can afford it); or from Sonnet 3.5 to DeepSeek-R1, then to o3 (and that doesn't even consider multi-model solutions). The jump in capability was usually quite apparent.
Of course, Hamel is right too. In the long run, people will need to take a more scientific approach. They already do, when inference costs are the main concern.