Instead you have retreated to qualia like "well" and "sucks hard".
> hallucinating
Literally every human memory. They may seem tangible to you, but they're all in your head. The result of neurons behaving in ways which have directly inspired ML algorithms for nearly a century.
Further, history is rife with examples of humans learning from books and other written words. And also of humans thinking themselves special and unique in ways we are not.
> When using Claude Code or codex to write Swift code, I need to be very careful to provide all the APIs that are relevant in context (or let it web search), or garbage will be the result.
Yep. And humans often need to reference the documentation to get details right as well.
Unfortunately, we can’t know at this point whether transformers really understand chess, or just pattern-match on textual representations of good moves in their training data. They are pretty good players, but far from the quality of specialized chess bots. Can you please explain how we can discern that GPT-2 in this instance really built a model of the board?
Regarding qualia, that’s ok on HN.
Regarding humans - yes, humans also hallucinate. Sounds a bit like whataboutism in this context though.
> Can you please explain how we can discern that GPT-2 in this instance really built a model of the board?
Read the article. It's very clear. To quote it:
"Next, I wanted to see if my model could accurately track the state of the board. A quick overview of linear probes: We can take the internal activations of a model as it’s predicting the next token, and train a linear model to take the model’s activations as inputs and predict board state as output. Because a linear probe is very simple, we can have confidence that it reflects the model’s internal knowledge rather than the capacity of the probe itself."
Thanks for putting these sources together. It’s impressive that they got to this level of accuracy.
And is your argument now that an LLM can capture arbitrary state of the wider world as a general rule, e.g. pretending to be a Swift compiler (or LSP), without overfitting to that one task and thereby making all other usages impossible?
> is your argument now that an LLM can capture arbitrary state of the wider world as a general rule, e.g. pretending to be a Swift compiler (or LSP), without overfitting to that one task and thereby making all other usages impossible?
Overfitting happens, even in humans. Have you ever met a scientist?
My points have been only that 1: language encodes a symbolic model of the world, and 2: training on enough of it results in a representation of that model within the LLM.
The exhaustiveness and accuracy of that internal world model exist on a spectrum governed by many variables: model size, training corpus and regimen, etc. As is also the case with humans.