
LLMs can't "look" at the rendered HTML output to see if what they generated makes sense or not. But there ought to be a way to do that, right? To let the model iterate until what it generates looks right.

Currently, at work, I'm using Cursor for something that has an OpenGL visualization program. It's incredibly frustrating trying to describe bugs to the AI because it is completely blind. Like I just wanna tell it "there's no line connecting these two points but there ought to be one!" or "your polygon is obviously malformed as it is missing a bunch of points and intersects itself" but it's impossible. I end up having to make the AI add debug prints to, say, print out the position of each vertex, in order to convince it that it has a bug. Very high friction and annoying!!!
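One workaround that might cut the friction (a sketch, not something from the thread): have the program dump each rendered frame to a PNG you can attach to the conversation, instead of relaying vertex printouts. A minimal sketch with PyOpenGL and Pillow, assuming an active GL context and a known viewport size:

    # Sketch: dump the current OpenGL framebuffer to a PNG so the image
    # can be attached to the AI conversation. Assumes an active GL
    # context and that width/height match the viewport.
    from OpenGL.GL import glReadPixels, GL_RGBA, GL_UNSIGNED_BYTE
    from PIL import Image

    def dump_framebuffer(width: int, height: int, path: str = "frame.png") -> None:
        pixels = glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE)
        img = Image.frombytes("RGBA", (width, height), pixels)
        # GL's origin is bottom-left, so flip to normal image orientation.
        img.transpose(Image.Transpose.FLIP_TOP_BOTTOM).save(path)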



Cursor has this with their "browser" function for web dev; it's quite useful.

You can also give it an MCP setup that lets it send a screenshot into the conversation, though I'm not sure anyone has made an easy enough "take a screenshot of a specific window ID" kind of MCP server, so it may need to be built first.
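For what it's worth, a window-specific screenshot tool is only a few lines with the official Python MCP SDK. A minimal sketch, assuming X11 with ImageMagick's `import` command available (the server and tool names here are made up):

    # Sketch of a "screenshot a specific window" MCP server using the
    # official Python SDK's FastMCP. Assumes X11 with ImageMagick's
    # `import` command; window IDs come from e.g. `xwininfo`.
    import pathlib
    import subprocess
    import tempfile

    from mcp.server.fastmcp import FastMCP, Image

    mcp = FastMCP("window-screenshot")  # server name is arbitrary

    @mcp.tool()
    def screenshot_window(window_id: str) -> Image:
        """Capture one X11 window (by id) and return it as a PNG."""
        with tempfile.TemporaryDirectory() as d:
            path = pathlib.Path(d) / "shot.png"
            subprocess.run(["import", "-window", window_id, str(path)], check=True)
            return Image(data=path.read_bytes(), format="png")

    if __name__ == "__main__":
        mcp.run()  # serves MCP over stdio by default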

I guess you could also ask it to build that MCP server for you...


You can absolutely do this. In fact, with Claude, Anthropic encourages you to send it screenshots. It works very well if you aren't expecting pixel-perfection.

YMMV with other models, but Sonnet 4.5 is good at things like this: writing the code, "seeing" the output, and then iterating on it.


I had some success providing screenshots to Cursor directly. It worked well for web UIs as well as for graphs generated in Python. It makes them a bit less blind, though I feel more iterations are required.


Claude totally can, and so can ChatGPT. Upload a picture to either of them via the app and tell it there's no line where there should be. There's some plumbing involved to get it to work in Claude Code or Codex, but yes, computers can "see". If you have lm-server, there are tons of non-text models you can point your code at.
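The plumbing is small if you go through the API directly; a minimal sketch with the Anthropic Python SDK (the model name and file path are just example values):

    # Sketch: send a screenshot plus a bug description to Claude via the
    # Messages API. Assumes ANTHROPIC_API_KEY is set in the environment.
    import base64

    import anthropic

    client = anthropic.Anthropic()

    with open("screenshot.png", "rb") as f:  # example path
        png_b64 = base64.standard_b64encode(f.read()).decode("ascii")

    message = client.messages.create(
        model="claude-sonnet-4-5",  # example model name
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": [
                {"type": "image",
                 "source": {"type": "base64",
                            "media_type": "image/png",
                            "data": png_b64}},
                {"type": "text",
                 "text": "There's no line connecting these two points, "
                         "but there ought to be one. What's wrong?"},
            ],
        }],
    )
    print(message.content[0].text)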


Kinda - hand-waving over the question of whether an LLM can really "look", but you can connect Cursor to a Puppeteer MCP server, which lets it iterate with "eyes" by using Puppeteer to screenshot its own output. It still has issues, but it often fixes really silly mistakes simply by having this MCP available.
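The loop itself is tiny; a minimal sketch of the screenshot step using Playwright's Python API in place of Puppeteer (the URL and output path are placeholders):

    # Sketch: the screenshot-your-own-output step that the Puppeteer MCP
    # server automates, shown with Playwright's Python API instead.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto("http://localhost:3000")  # placeholder: the generated page
        page.screenshot(path="render.png", full_page=True)
        browser.close()
    # render.png can now go back into the conversation for the next pass.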



