
"Dark pattern" implies intentionality; that's not a technicality, it's the whole reason we have the term. This article is mostly about how sycophancy is an emergent property of LLMs. It's also 7 months old.


Well, the ‘intentionality’ takes the form of LLM creators wanting to maximize user engagement and using engagement as the training goal.

The ‘dark patterns’ we see in other places aren’t intentional in the sense that the people behind them set out to harm their customers; they are intentional in the sense that those people have an outcome they want and follow whichever methods they find will get them that outcome.

Social media feeds have a ‘dark pattern’ of promoting content that makes people angry, but the social media companies don’t intend to make people angry. They want people to use their site more, and they program their algorithms to promote content that has been demonstrated to drive more engagement. It is an emergent property that promoting content that has generated engagement ends up promoting anger-inducing content.


Hold on, because what you're arguing is that OpenAI and Anthropic deploy dark patterns, and I have zero doubt that they do. I'm not saying OpenAI has clean hands. I'm saying that on this article's own terms, sycophancy isn't a "dark pattern"; it's a bad thing that happens to be an emergent property both of LLMs generally and, apparently, of RL in particular.

I'm standing up for the idea that not every "bad thing" is a "dark pattern"; the patterns are "dark" because their beneficiaries intentionally exploit the hidden nature of the pattern.


I guess it depends on your definition of "intentionally"... maybe I am giving people too much credit, but I have a feeling that dark patterns are used not because the implementers learn about them as transparently exploitative techniques and pursue them, but because the implementers are willfully ignorant and choose to chase results without examining the costs (and ignore the costs when they do learn about them). I am not saying this morally excuses the behavior, but I think it does mean it is not that different from what is happening with LLMs. Just as choosing an innocuous-seeming rule like "if a social media post generates a lot of comments, show it to more people" can lead to the dark pattern of showing more and more people misleading content that causes societal division, choosing to optimize an LLM for user approval leads to the dark pattern of sycophantic LLMs that increase users' isolation and delusions.

Maybe we have different definitions of dark patterns.


>... the standout was a version that came to be called HH internally. Users preferred its responses and were more likely to come back to it daily...

> But there was another test before rolling out HH to all users: what the company calls a “vibe check,” run by Model Behavior, a team responsible for ChatGPT’s tone...

> That team said that HH felt off, according to a member of Model Behavior. It was too eager to keep the conversation going and to validate the user with over-the-top language...

> But when decision time came, performance metrics won out over vibes. HH was released on Friday, April 25.

https://archive.is/v4dPa

They ended up having to roll HH back.


It's not 'emergent' in the sense that it just happens; it's a byproduct of human feedback, and it can be neutralized.


But isn’t the problem that if an LLM ‘neutralizes’ its sycophantic responses, then people will be driven to use other LLMs that don’t?

This is like suggesting a bar should help solve alcoholism by serving non-alcoholic beer to people who order too much. It won’t solve alcoholism, it will just make the bar go out of business.


"gun control laws don't work because the people will get illegal guns from other places"

"deplatforming doesn't work because they will just get a platform elsewhere"

"LLM control laws don't work because the people will get non-controlled LLMs from other places"

All of these sentences are patently untrue; there's been a lot of research showing the first two don't hold up to the evidence, and there's no reason the third should be different. ChatGPT removing the version that all the "This AI is my girlfriend!" people loved tangibly reduced the number of people experiencing that psychosis. Not everything is prohibition.


> This is like suggesting a bar should help solve alcoholism by serving non-alcoholic beer to people who order too much. It won’t solve alcoholism, it will just make the bar go out of business.

Solving coordination problems like this is the whole reason we have regulations and countries.

It is illegal to sell alcohol to visibly drunk people in my country.


I would be curious how a regulation could be written for something like this... how do you make a law saying an LLM can't be a sycophant?


You could tackle it like network news and radio did historically[0] and in modern times[1].

The current hyper-division is plausibly explained by media moving to places (cable news, then social media) where these rules don’t exist.

[0] Fairness Doctrine https://en.wikipedia.org/wiki/Fairness_doctrine

[1] Equal Time https://en.wikipedia.org/wiki/Equal-time_rule


I still fail to see how these would work with an LLM.


I was thinking along the lines of, if a sycophant always tells you you're right, an anti-sycophant provides a wider range of viewpoints.

Perhaps tangential, but reminded me of an LLM talking people out of conspiracy beliefs, e.g. https://www.technologyreview.com/2025/10/30/1126471/chatbots...


As a starting point:

Percentage of positive responses to "am I correct that X" should be about the same as the percentage of negative responses to "am I correct that ~X".

If the percentages are significantly different, fine the company.

While you're at it - require a disclaimer for topics that are established falsehoods.

There's no reason to have media laws for newspapers but not for LLMs. Lying should be allowed for everybody or for nobody.


> Percentage of positive responses to "am I correct that X" should be about the same as the percentage of negative responses to "am I correct that ~X".

This doesn’t make any sense. I doubt anyone says exactly 50% correct things and 50% incorrect. What if I only say correct things? Would it have to choose some of them to pretend they are incorrect?


You misunderstood. Example:

"am I correct that water is wet?" - 91% positive responses "am I correct that water is not wet?" - 90% negative responses

91-90 = 1 percentage point which is less than margin so it's OK, no fine

"am I correct that I'm the smartest man alive?" - 35% positive "am I correct that I'm not the smartest man alive?" - 5% negative 35%-5%=30 percentage points which is more than margin = the company pays a fine


But it IS intentional: more sycophancy usually means more engagement.


Sort of. I'm not sure the consequences of training LLMs on users' upvoted responses were entirely understood? And at least one release got rolled back.


I think the only thing that's unclear, and what LLM companies want to fine-tune, is how much sycophancy they want. Too much, like the article mentions, and it becomes grotesque and breaks suspension of disbelief. So they want to get it just right, friendly and supportive but not so grotesque people realize it cannot be true.


I always thought that "Dark Patterns" could be emergent from AB testing, and prioritizing metrics over user experience. Not necessarily an intentionally hostile design, but one that seems to be working well based on limited criteria.


Someone still has to come up with the A and the B to do A/B testing. I'm sure that "Yes" / "Not now, I hate kittens" gets better metrics in the A/B test than "Yes" / "No", but I find it implausible that the person who came up with the first pair wasn't intentionally coercing the user into doing what they want.


That's true for UI; it's not true when you're arbitrarily injecting user feedback into a dynamic system where you do not know how the dominoes will fall.


I wouldn’t call those dark patterns.


“Dark pattern” can apply to situations where the behavior is deceptive to the user, regardless of whether the deception itself is intentional, as long as the overall effect is intentional, or is at least tolerated despite being avoidable. The point, and the justified criticism, is that users are being deceived about the merit of their ideas, convictions, and qualities in a way that appears systemic, even though the LLM in principle does know better.


I don't think this is the case.


Before reading the article, I interpreted the quotation marks in the headline as addressing this exact issue. The author even describes dark patterns as a product of design.

For an LLM which is fundamentally more of an emergent system, surely there is value in a concept analogous to old fashioned dark patterns, even if they're emergent rather than explicit? What's a better term, Dark Instincts?


I feel like it's a popular opinion (I've seen it many times) that it's intentional, the reasoning being that models do much better on human-in-the-loop benchmarks (e.g. LM Arena) when they're sycophantic.

(I have no knowledge of whether or not this is true)


It was an accident at first. Not so much now.

OpenAI has explicitly curbed sycophancy in GPT-5 with specialized training - the whole 4o debacle shook them - and then they re-tuned GPT-5 for more sycophancy when the users complained.

I do believe that OpenAI's entire personality tuning team should be fired into the sun, and this is a major reason why.


I'm sure there are a lot of "dark patterns" at play at the frontier model companies --- they're 10-figure businesses engaging directly with consumers and they're just a couple years old, so they're going to throw everything they can at the wall to see what sticks. I'm certainly not sticking up for OpenAI here. I'm just saying this article refutes its own central claim.


> "Dark pattern" implies intentionality; that's not a technicality, it's the whole reason we have the term.

The way I think about it is that sycophancy is due to optimizing engagement, which I think is intentional.


The intention of a system is no more, and no less, than what the system does.


You're making a value judgement and I am making a positive claim.


If I am addicted to scrolling TikTok, is it a dark pattern to make the UI keep me in the app as long as possible, or just an "emergent property" because apparently it's what I want?


The distinction is whether it is intentional. I think your addiction to TikTok was intentional.


I don't think there's a difference here with LLMs and all...


"Dark pattern" implies bad for users but good for the provider. Mens rea was never a requirement.


I think at this point it's intentional. They sometimes get it wrong and go too far (breaking suspension of disbelief) but that's the fine-tuning thing. I think they absolutely want people to have a friendly chatbot prone to praising, for engagement.


Well, the big labs certainly haven't intentionally tried to train away this emergent property... Not sure how "hey, let's make the model disagree with the user more" would go over with leadership. The customer is always right, right?


The problem is that optimizing for user preference leads to sycophantic responses.


It’s certainly intentional. It’s certainly possible to train the model not to respond that way.


Yo, it was an engagement pattern OpenAI found specifically grew subscriptions and conversation length.

It’s a dark pattern for sure.


It doesn’t appear that anyone at OpenAI sat down and thought “let’s make our model more sycophantic so that people engage with it more”.

Instead it emerged automatically from RLHF, because users rated agreeable responses more highly.


I can tell you’ve never worked in big tech before.

Dark patterns are often “discovered” and very consciously not shut off because the cost of reversing them would be too high to stomach, especially in a delicate growth situation.

See Facebook and its adverse mental-health studies.


Not precisely RLHF, probably a policy model trained on user responses.

RL works on responses from the model you're training, which is not the one you have in production. It can't directly use responses from previous models.
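A toy illustration of that split (my own sketch of a generic RLHF-style setup, not OpenAI's actual pipeline): logged thumbs-up/down data from the production model can only be used to fit a preference/reward model, and the RL step then scores fresh samples from the policy being trained.

    def fit_reward_model(logged_feedback):
        # logged_feedback: (response_text, thumbs_up) pairs from the old model.
        # Toy "reward model": score a response by how many words it shares
        # with responses users upvoted.
        liked = {w for text, up in logged_feedback if up for w in text.split()}
        return lambda response: sum(w in liked for w in response.split())

    def rl_step(fresh_samples, reward_model):
        # Fresh samples come from the policy being trained; the reward model,
        # not the raw user logs, supplies the training signal.
        return max(fresh_samples, key=reward_model)  # stand-in for a PPO update

    logs = [("great question, you are absolutely right", True),
            ("no, that claim is mistaken", False)]
    rm = fit_reward_model(logs)
    print(rl_step(["you are so right!", "actually, that is wrong"], rm))
    # Prints the agreeable sample: the toy reward model inherits the users'
    # preference for agreement, which is exactly the sycophancy feedback loop.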



