Hacker News
RLHF Is Cr*P, It's a Paint Job on a Rusty Car: Geoffrey Hinton (officechai.com)
16 points by gitremote 8 months ago | hide | past | favorite | 6 comments


Of course it has fundamental problems. They are likely more prevalent than in other techniques, for LLM training or otherwise.

But one also needs to acknowledge how immensely useful and powerful the resulting models have become, achieving results pure supervised fine-tuning could likely not have achieved at all. Whether fundamental alignment of pre-trained models is even possible is a very different question.
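To make the contrast concrete: supervised fine-tuning imitates reference answers token by token, while RLHF first fits a reward model to pairwise human preferences and then optimizes the policy against it. A minimal sketch of the standard Bradley-Terry preference loss used to train such a reward model (function names are illustrative, not from any particular library):

```python
import math

def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry preference loss: -log sigmoid(r_chosen - r_rejected).

    reward_chosen/reward_rejected are the reward model's scalar scores
    for the human-preferred and human-rejected responses."""
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# The loss shrinks as the model scores the preferred answer higher:
print(preference_loss(2.0, 0.0))  # small: model agrees with the human label
print(preference_loss(0.0, 2.0))  # large: model disagrees
```

The point of the debate above is that this only patches behavior the reward model happens to score, which is where the "finger in each hole" criticism comes from.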


Related: 'Fine-tuning on human preferences is a fool’s errand'

https://lifearchitect.ai/alignment/


> Hinton said in a recent interview. “You design a huge piece of software that has gazillions of bugs in it. And then you say what I’m going to do is I’m going to go through and try and block each and put a finger in each hole in the dyke,” he said.

While I agree with the underlying sentiment, that transcription, uh... I'd change that to "dike".

Anyway, it seems that while we can round off a lot of an LLM's rough edges, it's fundamentally so much of a dream-machine that there's always something else: a never-ending game of whack-a-mole.

It gets even worse when it comes to malicious or adversarial stuff, and not just "ensure it doesn't generate a rude word most of the time."


> uh... I'd change that

The variant spellings are all standard English, and, notably, in the British English that Hinton speaks, dyke is a common spelling for a levee built against water. As in, Offa's Dyke, a large earthwork in the UK.


See also: Dick Van Dyke, which is another example of a perfectly normal usage without any possibilities of euphemistic interpretation.


As lampooned in Austin Powers: https://www.youtube.com/watch?v=CpiP_jN1Pv4



