
The SVG created for the first prompt is valid, but it's a garbage image.


In general I've had poor results with LLMs generating pictures from text instructions (in my case I've tried to get them to generate pictures as plots in KQL). They work, but the pictures are very, very basic.

I'd be interested in any LLM that can emit any kind of text-to-picture instructions and get results beyond kindergartner-cardboard-cutout levels of art.



I've had success with LLMs producing mermaid.js or three.js output, but that is a different use case.


That's why I use the SVG pelican riding a bicycle thing as a benchmark: it's a deliberately absurd and extremely difficult task.
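A minimal sketch of running the benchmark yourself, assuming the official openai Python package and an API key in the environment (the model name is only an example; swap in whatever you want to test):

    # Send the pelican prompt to a model and save whatever SVG comes back.
    from openai import OpenAI

    client = OpenAI()
    reply = client.chat.completions.create(
        model="gpt-4o",  # example model name only
        messages=[{"role": "user",
                   "content": "Generate an SVG of a pelican riding a bicycle"}],
    )
    with open("pelican.svg", "w") as f:
        f.write(reply.choices[0].message.content)  # open in a browser to judge it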


Appreciate your rapid analysis of new models, Simon. Have any models you've tested performed well on the pelican SVG task?



The Gemini result is great. I modified your prompt to encourage more detail ("Generate an SVG of a pelican riding a bicycle. The degree of detail should be surprisingly high and should spark delight for the viewer.")

This is what o1-pro yielded: https://gist.github.com/carbocation/8d780ad4c3312693ca9a43c6...


The Gemini result is quite impressive, thanks for sharing these!


They probably trained it for this specific task (generating SVG images), right?


I'm hoping that nobody has deliberately trained on SVG images of pelicans riding bicycles yet.


I'm really glad to see someone else doing something similar. I had the epiphany a while ago that if LLMs can interpret textual instructions to draw a picture and output the design in another textual format, then that's a strong indicator that they're more than just stochastic parrots.

My personal test has been "A horse eating apples next to a tree" but the deliberate absurdity of your example is a much more useful test.

Do you know if this is a recognized technique that people use to study LLMs?


I've seen people using "draw a unicorn using tikz" https://adamkdean.co.uk/posts/gpt-unicorn-a-daily-exploratio...


I did some experiments of my own after this paper, but let GPT-4 run wild and pick its own scene. It wanted to draw a boat on a lake, and I also asked it to throw in some JS animations, so it made the sun set:

https://int19h.org/chatgpt/lakeside/index.html

One interesting thing I found out while doing this is that if you ask GPT-4 to produce SVG suitable for use in HTML, it will often just generate base64-encoded data: URIs directly, which do contain valid SVG inside as requested.
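A small sketch of unpacking that kind of output: the base64 payload inside the data: URI decodes back to the SVG itself. The URI below is a made-up stub, not actual model output:

    import base64

    data_uri = ("data:image/svg+xml;base64,"
                "PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciLz4=")
    header, payload = data_uri.split(",", 1)
    assert header == "data:image/svg+xml;base64"

    svg = base64.b64decode(payload).decode("utf-8")
    print(svg)  # -> <svg xmlns="http://www.w3.org/2000/svg"/>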


That came, IIRC, from one of the Microsoft people (Sébastien Bubeck); it was recounted in the This American Life episode "Greetings, People of Earth":

https://www.thisamericanlife.org/803/transcript


It's in this presentation: https://www.youtube.com/watch?v=qbIk7-JPB2c

The most significant thing I took away is that when the safety "alignment" was applied, the model's drawing ability plummeted. So that really makes me wonder how much better these models would be if they weren't lobotomized to prevent them from saying bad words.


But how would that prove it's more than a stochastic parrot? Honestly curious.

Isn't it just like any other kind of conversion or translation? I.e., a relationship mapping between different domains, and just as much parroting of "known" paths between parts of those domains?

If "sun" is associated with "round", "up high", "yellow","heat" in english that will map to those things in SVG or in whatever bizarre format you throw at with relatively isomorphic paths existing there just knitted together as a different metamorphosis or cluster of nodes.

On a tangent, it's interesting what constitutes the heaviest nodes in the data, how shared "yellow" or "up high" is between different domains, and what sits above and below them hierarchically weight-wise. Is there a heaviest "thing" in the entire dataset?

If you dumped a heatmap of a description of the sun and of an SVG of a sun - of the neuron/axon-like cloud of data in some model - would they look similar in some way?


That's a huge stretch for parroting.


Not sure if this counts. I recently went from a description of a screenshot of a graph to generated pandas code and a plot, purely from that description. Conceptually it was accurate.

I don't think it reflects any understanding, but going from a screenshot to conceptually accurate, working code was impressive.
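For flavor, a minimal sketch of the sort of code that comes back; the described chart, column names, and numbers here are all invented:

    # Invented example: reconstruct a described bar chart of monthly sales.
    import pandas as pd
    import matplotlib.pyplot as plt

    df = pd.DataFrame({
        "month": ["Jan", "Feb", "Mar", "Apr"],
        "sales": [120, 135, 150, 160],
    })
    df.plot(kind="bar", x="month", y="sales", legend=False, title="Monthly sales")
    plt.tight_layout()
    plt.savefig("sales.png")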



Yeah, it didn't do very well on that one. The best I've had from a local model there was from QwQ: https://simonwillison.net/2024/Nov/27/qwq/


For context, pelican riding a bicycle: https://imgur.com/a/2nhm0XM

Copied the SVG from the gist into Figma, added a dark gray #444444 background, and exported as a PNG at 1x.
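Roughly the same step can be scripted instead of going through Figma, assuming the cairosvg package is installed; the filenames here are placeholders:

    # Render the gist's SVG to PNG on a dark gray background, at 1x.
    import cairosvg

    cairosvg.svg2png(
        url="pelican.svg",           # SVG saved from the gist
        write_to="pelican.png",
        background_color="#444444",  # dark gray backdrop
        scale=1,
    )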



