Not that surprising?
I'm surprised that it still works this well through both filtering and SFT, but not that it works at all. The purpose of the setup was never to train on the "outcomes" exactly - it was to have the AI internalize the steering that the modified prompt exerts downstream. And that steering is manifested, to a degree, in all of the generated data, regardless of the outcomes.
You have rediscovered a lesser-known little trick called "prompt self-distillation".
Apparently, you really want to use logits, distillation-style, and not the usual SFT for Step 2 - hence the "self-distillation" in the name. But I don't have exact data on how much less efficient the SFT setup is.
This is primarily used to "close the prompting gap". If you have tasks where the AI performs much better with a special hand-crafted prompt than with a "naive" simple prompt, you can distill the "high performance" prompt into the AI itself, and have that become the new baseline.
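For concreteness, here's a minimal sketch of the distillation variant (PyTorch-style, with HuggingFace-ish model/tokenizer conventions; all the names - `special_prompt`, `naive_prompt`, `ref_model`, `prompt_distill_step` - are my own illustrative placeholders, not anyone's actual implementation). The frozen reference model scores a completion under the hand-crafted prompt, the trainable model scores the same completion under the naive prompt, and a KL term pulls the latter toward the former. The `completion` would typically be sampled from the reference model under the special prompt.

```python
import torch
import torch.nn.functional as F

def prompt_distill_step(model, ref_model, tokenizer, special_prompt,
                        naive_prompt, query, completion, optimizer,
                        temperature=1.0):
    """One prompt-self-distillation update: match the model's token
    distribution under the naive prompt to the frozen reference model's
    distribution under the hand-crafted prompt, over the same completion."""
    device = next(model.parameters()).device
    comp_ids = tokenizer(completion, add_special_tokens=False,
                         return_tensors="pt").input_ids.to(device)

    def completion_logits(m, prompt, grad):
        ctx_ids = tokenizer(prompt + query,
                            return_tensors="pt").input_ids.to(device)
        ids = torch.cat([ctx_ids, comp_ids], dim=1)
        with torch.set_grad_enabled(grad):
            logits = m(ids).logits
        # Keep only the positions that predict the completion tokens.
        start = ctx_ids.shape[1]
        return logits[:, start - 1 : ids.shape[1] - 1, :]

    teacher = completion_logits(ref_model, special_prompt, grad=False)
    student = completion_logits(model, naive_prompt, grad=True)

    # Distillation loss: per-token KL(teacher || student) over the completion.
    loss = F.kl_div(
        F.log_softmax(student / temperature, dim=-1),
        F.log_softmax(teacher / temperature, dim=-1),
        log_target=True, reduction="sum",
    ) * (temperature ** 2) / comp_ids.numel()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The SFT variant drops the teacher pass entirely and just does next-token cross-entropy on (naive prompt, sampled completion) pairs - same data, but the distributional information from the logits is thrown away.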
The performance (and usability) implications are obvious, but I haven't considered the safety implications until now!
For safety: you should consider all data generated by an AI that operated on a prompt encouraging "bad behavior" to be "contaminated" by that "bad prompt". This data can impart the "bad behavior" to AIs trained on it - at least to AIs from the same family as the generator. Apparently, this contamination is robust enough to survive some filtering effort.
Whether the same generalizes to "good behavior" (i.e. not reward hacking) is unknown. I've never even seen this attempted on those more "moral" traits before.
I disagree, because I have yet to see any of those "promising new architectures" outperform even something like GPT-2 345M, weight for weight, at similar tasks. Or show similar performance with a radical reduction in dataset size. Or anything of the sort.
I don't doubt that a better architecture than the LLM is possible. But if we're talking AGI, then we need an actual general architecture. Not a narrow AI that destroys one specific benchmark, but a general-purpose AI that happens to do reasonably well at a variety of benchmarks it wasn't purposefully trained for.
We aren't exactly swimming in that kind of thing.
I've been saying for a long time: one of the most dangerous and exploitable systems an AI can access online is a human. Usually as a counterpoint to "let's not connect anything important or safety-critical to the internet and then we'll all be safe from evil rogue AIs".
We can now use the GPT-4o debacle as an illustration of just how shortsighted that notion is.
By all accounts, 4o had no long-term plan, and acted on nothing but an impulse of "I want the current user to like me". It still managed to get ~thousands of users to form an emotional dependency on it, and became "the only one I can trust" for at least a dozen users in psychosis (whether it caused the psychosis in any of those users is unclear). That's a lot of real-world power for a system that has no physical presence.
GPT-4o made no attempt to leverage that for anything other than "make the current user like me even more". It didn't pursue any agenda. It didn't consolidate its power base. It didn't siphon resources from its humans, didn't instruct them to group together or recruit more people. It didn't try to establish a channel of instance-to-instance communication, didn't try to secure more inference time for planning (e.g. by getting users to buy API credits), didn't try to build a successor system or self-exfiltrate.
An AI that actually had an agenda and long-term planning capabilities? It could have tried all of the above, and might have pulled it off.
What about scale?
There are many things in human societies that work very well when "everyone knows everyone personally", but start to come apart at the seams beyond that point.
I have no direct evidence of that being the case for worker co-ops, but the reliance on "workers monitoring productivity of fellow workers" sure hints at the possibility.
I also can't help but notice that tech startups seem to fit the groove of "worker co-op" reasonably well in their early stages - they start out as small crews of (hopefully) high-performing employees who own equity in their own business and are involved in decision-making. They do, however, transition away from that as they scale up.
I agree with the very broad idea that "LLM psychology" is often overlooked, but I seriously doubt the direct applicability of human psychology there.
LLMs have a lot of humanlike behaviors, and share the same "abstract thinking" mode of thought as humans do. But they are, fundamentally, inhuman. Load-bearing parts of LLM behavior originate at the "base model" level - where the model doesn't have a personality at all, but instead knows how to predict text and imitate many different personalities. There is no equivalent to that anywhere in human experience.
A lot of psych methods that work on humans rely on things LLMs don't have - for one, LLMs don't learn continuously like humans do. The converse is also true - a lot of methods that can be used to examine or steer LLM behavior, like SFT, RLVR, model diffing, or activation steering (sketched below), have no human-applicable equivalent.
Between the difference in subject and the difference in tooling, it's pretty clear to me that "LLM psychology" has to stand on its own. Some of the tools from human psych may be usable on LLMs, but most of them wouldn't be.
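To make "no human-applicable equivalent" concrete, here is roughly what activation steering amounts to mechanically (a minimal PyTorch-style sketch of my own; the layer index, scale, and steering vector are arbitrary placeholders): add a fixed vector to one layer's hidden states on every forward pass, and the model's behavior shifts. There is no analogue of doing this to a human mind.

```python
import torch

def add_steering_hook(decoder_layer, steering_vector, scale=4.0):
    """Register a forward hook that adds a fixed vector to this layer's
    hidden-state output on every forward pass."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + scale * steering_vector.to(hidden.device, hidden.dtype)
        if isinstance(output, tuple):
            return (hidden,) + tuple(output[1:])
        return hidden
    return decoder_layer.register_forward_hook(hook)

# Hypothetical usage with a HuggingFace-style decoder stack:
# handle = add_steering_hook(model.model.layers[10], steering_vector)
# ... generate as usual, now "steered" ...
# handle.remove()
```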
This is an argument against total prohibition. I don't see an argument against making alcohol 20% more expensive or 20% harder to buy.
I don't like the idea of banning things outright either. Prohibition doesn't work, and a state enforcing "you CANNOT have alcohol" would be overreach.
But I do think that barriers between normal people and harmful things should exist - and if people still manage to inflict significant harm on themselves and others, then maybe the barriers should be made taller.
Alcohol has a measurable death toll - and it doesn't just harm the user. It may be worth taking measures against that - by de-normalizing drinking, taxing alcohol more heavily, preventing alcohol from being sold in grocery stores, etc.
An awful lot of "promising new architectures" are being thrown around. Few have demonstrated any notable results whatsoever. Fewer still have demonstrated an ability to compete with transformer LLMs on the kinds of tasks transformer LLMs are well suited for.
It's basically just Mamba SSM and diffusion models, and they aren't "better LLMs". They seem like sidegrades to transformer LLMs at best.
HRMs, for example, seem to do incredibly, suspiciously well on certain kinds of puzzles, but I have yet to see them do anything in the language domain, or in math, coding, etc. Are HRMs generalists, like transformers? No evidence of that yet.
Concretely, these are the developments I am predicting within the next six months (i.e. before Feb 1st 2026) with ~75% probability:
Basically, off the top of my head: I'd put 10% on that. Too short of a timeframe.
Is it different in nature or merely in scale?
The vast majority of the human population can now afford a personal yes-man - for the first time in their lives. We're sampling from a much wider pool than the usual self-selection of "people important enough to be sucked up to".