
The SVG created for the first prompt is valid, but it's a garbage image.


In general I've had poor results with LLMs generating pictures from text instructions (in my case I've tried to get them to generate pictures as plots in KQL). They work, but the pictures are very, very basic.

I'd be interested in any LLM that can emit any kind of text-to-picture instructions and get results beyond kindergartner-cardboard-cutout levels of art.



I've had success with LLMs producing mermaid.js or three.js output, but that is a different use case.


That's why I use the SVG pelican riding a bicycle thing as a benchmark: it's a deliberately absurd and extremely difficult task.
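A minimal sketch of running the benchmark yourself, assuming the official openai Python package and an API key in the environment (the model name is only an example; swap in whatever you want to test):

    # Send the pelican prompt to a model and save whatever SVG comes back.
    from openai import OpenAI

    client = OpenAI()
    reply = client.chat.completions.create(
        model="gpt-4o",  # example model name only
        messages=[{"role": "user",
                   "content": "Generate an SVG of a pelican riding a bicycle"}],
    )
    with open("pelican.svg", "w") as f:
        f.write(reply.choices[0].message.content)  # open in a browser to judge it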


Appreciate your rapid analysis of new models, Simon. Have any models you've tested performed well on the pelican SVG task?



The Gemini result is great. I modified your prompt to encourage more detail ("Generate an SVG of a pelican riding a bicycle. The degree of detail should be surprisingly high and should spark delight for the viewer.")

This is what o1-pro yielded: https://gist.github.com/carbocation/8d780ad4c3312693ca9a43c6...


The Gemini result is quite impressive, thanks for sharing these!


They probably trained it for this specific task (generating SVG images), right?


I'm hoping that nobody has deliberately trained on SVG images of pelicans riding bicycles yet.


I'm really glad to see someone else doing something similar. I had the epiphany a while ago that if LLMs can interpret textual instructions to draw a picture and output the design in another textual format, then that's a strong indicator that they're more than just stochastic parrots.

My personal test has been "A horse eating apples next to a tree" but the deliberate absurdity of your example is a much more useful test.

Do you know if this is a recognized technique that people use to study LLMs?


I've seen people using "draw a unicorn using tikz" https://adamkdean.co.uk/posts/gpt-unicorn-a-daily-exploratio...


I did some experiments of my own after this paper, but let GPT-4 run wild and pick its own scene. It wanted to draw a boat on a lake, and I also asked it to throw in some JS animations, so it made the sun set:

https://int19h.org/chatgpt/lakeside/index.html

One interesting thing I found out while doing this is that if you ask GPT-4 to produce SVG suitable for use in HTML, it will often just generate base64-encoded data: URIs directly, which do contain valid SVG inside as requested.
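A small sketch of unpacking that kind of output: the base64 payload inside the data: URI decodes back to the SVG itself. The URI below is a made-up stub, not actual model output:

    import base64

    data_uri = ("data:image/svg+xml;base64,"
                "PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciLz4=")
    header, payload = data_uri.split(",", 1)
    assert header == "data:image/svg+xml;base64"

    svg = base64.b64decode(payload).decode("utf-8")
    print(svg)  # -> <svg xmlns="http://www.w3.org/2000/svg"/>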


That came, IIRC, from one of the Microsoft people (Sébastien Bubeck); it was recounted in the This American Life episode "Greetings, People of Earth":

https://www.thisamericanlife.org/803/transcript


It's in this presentation: https://www.youtube.com/watch?v=qbIk7-JPB2c

The most significant thing I took away is that when the safety "alignment" was applied, the model's drawing ability plummeted. So that really makes me wonder how much better these models would be if they weren't lobotomized to prevent them from saying bad words.


But how would that prove it's more than a stochastic parrot? Honestly curious.

Isn't it just like any other kind of conversion or translation? I.e., a relationship mapping between different domains, and just as much parroting of "known" paths between parts of those domains?

If "sun" is associated with "round", "up high", "yellow","heat" in english that will map to those things in SVG or in whatever bizarre format you throw at with relatively isomorphic paths existing there just knitted together as a different metamorphosis or cluster of nodes.

On a tangent, it's interesting what constitutes the heaviest nodes in the data, how shared "yellow" or "up high" is between different domains, and what sits above and below them hierarchically weight-wise. Is there a heaviest "thing" in the entire dataset?

If you dumped a heatmap of a description of the sun and of an SVG of a sun - of the neuron/axon-like cloud of data in some model - would they look similar in some way?


That's a huge stretch for parroting.


Not sure if this counts. I recently went from a description of a screenshot of a graph to generated pandas code and a plot, purely from that description. Conceptually it was accurate.

I don't think it reflects any understanding, but going from a screenshot to conceptually accurate, working code was impressive.
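For flavor, a minimal sketch of the sort of code that comes back; the described chart, column names, and numbers here are all invented:

    # Invented example: reconstruct a described bar chart of monthly sales.
    import pandas as pd
    import matplotlib.pyplot as plt

    df = pd.DataFrame({
        "month": ["Jan", "Feb", "Mar", "Apr"],
        "sales": [120, 135, 150, 160],
    })
    df.plot(kind="bar", x="month", y="sales", legend=False, title="Monthly sales")
    plt.tight_layout()
    plt.savefig("sales.png")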



Yeah, it didn't do very well on that one. The best I've had from a local model there was from QwQ: https://simonwillison.net/2024/Nov/27/qwq/


For context, pelican riding a bicycle: https://imgur.com/a/2nhm0XM

Copied the SVG from the gist into Figma, added a dark gray #444444 background, and exported as a PNG at 1x.
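Roughly the same step can be scripted instead of going through Figma, assuming the cairosvg package is installed; the filenames here are placeholders:

    # Render the gist's SVG to PNG on a dark gray background, at 1x.
    import cairosvg

    cairosvg.svg2png(
        url="pelican.svg",           # SVG saved from the gist
        write_to="pelican.png",
        background_color="#444444",  # dark gray backdrop
        scale=1,
    )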



