Yes, but they are reasoning within their dataset, which will contain multiple examples of HTML+CSS clocks.
They are just struggling to produce good results because, as language models, they don't have great spatial reasoning skills.
Their output normally has all the elements, just not in the right place/shape/orientation.
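To be concrete about where the spatial reasoning comes in, here's a minimal sketch of the kind of thing they have to get right (class names and values are illustrative, not any particular model's output): the hands only land correctly if the anchoring, the transform-origin, and the rotation angles all line up, which is exactly the geometric detail that tends to come out wrong.

```html
<!-- Minimal sketch of an analog clock face; names and numbers are illustrative -->
<div class="clock">
  <div class="hand hour"></div>
  <div class="hand minute"></div>
</div>
<style>
  .clock {
    position: relative;
    width: 200px; height: 200px;
    border: 4px solid #333;
    border-radius: 50%;
  }
  .hand {
    position: absolute;
    left: 50%;
    bottom: 50%;              /* pin the hand's base at the dial centre */
    transform-origin: bottom; /* rotate about the base, not the middle of the hand */
    background: #333;
  }
  /* showing 10:10 — angles measured clockwise from 12 o'clock */
  .hour   { width: 6px; height: 50px; transform: translateX(-50%) rotate(300deg); }
  .minute { width: 4px; height: 80px; transform: translateX(-50%) rotate(60deg); }
</style>
```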
So I suspect it's more that lessons from diffusion image models don't carry over to text LLMs.
And the image models that are built on multimodal LLMs (like Nano Banana) seem to do a lot better at novel concepts.