LESSWRONG

ACCount

Comments
If I imagine that I am immune to advertising, what am I probably missing?
Answer by ACCount · Sep 05, 2025 · 42

Ads are not just about "manipulating people into buying something they wouldn't normally want". Ads are also about "informing people about something they would want, if only they knew it exists", which is the most benign form of advertising.

And, critically, ads are about building brand familiarity. Which is the easy-to-overlook aspect I'm going to focus on.

Imagine if you wanted a carbonated soft drink, and the three options at the nearest shop were: Oh Cola Soda, Coca-Cola, and Penny's Purple Drink. The price difference is, to you, negligible. You're only willing to spend up to 5 seconds on the buying decision. With that, which one would you buy?

It's probably Coca-Cola. You are familiar with Coca-Cola: it's a known quantity, and you don't hate the taste. The rest of the shelf looks like some strange off-brand drinks you've never heard of - which makes buying them a gamble. And why do you find yourself in a world where you're more familiar with Coca-Cola than with the other two? Because someone spends literal billions a year on advertising to make sure that everyone in the US, young or old, knows that Coca-Cola is a thing.

By spending money on ads, the Coca-Cola Company created the familiarity - which then served as a little nudge in millions of little buying decisions that happen all across the country. Millions of people would try Coca-Cola before they try any other soda. And millions of people who try Coca-Cola would like it enough to prefer it slightly over "unknown unfamiliar soda" for the rest of their lives. Which makes it all worth it.

Cole Wyeth's Shortform
ACCount · 3d · 10

Yep, that's what I've seen.

The "entry-level jobs" study looked alright at a glance. I did not look into the claims of outsourcing job losses in any more detail - only noted that it was claimed multiple times.

Cole Wyeth's Shortform
ACCount · 4d · 61

Are you looking for utility in all the wrong places?

Recent news has quite a few mentions of AI tanking the job prospects of fresh grads across multiple fields and, at the same time, causing a job market bloodbath in the usual outsourcing capitals of the world.

That sure lines up with known AI capabilities.

AI isn't at the point of "radical transformation of everything" yet, clearly. You can't replace a badass crew of 10x developers who can build the next big startup with AIs today. AI doesn't unlock all that many "things that were impossible before" either - some are here already, but not enough to upend everything. What it does instead is take the cheapest, most replaceable labor on the market, and make it cheaper and more replaceable. That's the ongoing impact.

Before LLM Psychosis, There Was Yes-Man Psychosis
ACCount · 11d · 139

Is it different in nature or merely in scale?

The vast majority of the human population can now afford a personal yes-man - for the first time in their lives. We're sampling much wider than the usual "self-selection of people important enough to be sucked up to".

Training a Reward Hacker Despite Perfect Labels
ACCount · 20d · 55

Not that surprising?

I'm surprised that it still works this well through both filtering and SFT, but not that it works at all. Because the purpose of the setup was never to train on the "outcomes" exactly - it was to have the AI internalize the steering downstream from the modified prompt. And this steering is manifested in all generated data, to a degree, regardless of the outcomes.

Training a Reward Hacker Despite Perfect Labels
ACCount · 21d · 123

You have rediscovered a lesser-known little trick called "prompt self-distillation".

  1. Use a special prompt to steer AI behavior
  2. Train on "steered" outputs, but with the "steering prompt" replaced by a "normal", non-steering prompt
  3. AI will internalize the steering

Apparently, you really want to use logits, distillation-style, and not the usual SFT for Step 2. Hence the "self-distillation" in the name. But I don't have exact data on how much less efficient the SFT setup is.

This is primarily used to "close the prompting gap". If you have tasks where AI performs much better with a special hand-crafted prompt than with a "naive" simple prompt, you can distill the "high performance" prompt into your AI, and have it become the new baseline.
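
A minimal sketch of what the SFT flavor of this recipe could look like, in Python - the prompt strings, the generate callback, and the function name are illustrative assumptions, not any specific codebase or API:

    # Sketch of prompt self-distillation via SFT pairs. The steering prompt is
    # hypothetical; `generate` stands in for whatever sampling function you use.
    STEERING_PROMPT = "Plan your answer step by step before responding.\n\n"
    PLAIN_PROMPT = ""  # the "naive" prompt that should become the new baseline

    def build_self_distillation_dataset(tasks, generate):
        """Step 1: sample completions under the steering prompt.
        Step 2: pair each completion with the plain prompt instead,
        so that fine-tuning on these pairs bakes the steering in."""
        dataset = []
        for task in tasks:
            steered_completion = generate(STEERING_PROMPT + task)
            dataset.append({
                "prompt": PLAIN_PROMPT + task,      # steering prompt removed
                "completion": steered_completion,   # steered behavior kept
            })
        return dataset

The logit-matching variant would instead minimize the divergence between the student's next-token distribution on the plain prompt and the teacher's distribution on the steering prompt, token by token, rather than training on sampled text.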

The performance (and usability) implications are obvious, but I hadn't considered the safety implications until now!

For safety: you should consider all data generated by an AI operating on a prompt that encouraged "bad behavior" to be "contaminated" by that "bad prompt". This data can impart the "bad behavior" to AIs trained on it, at least within the same model family. Apparently, this contamination is robust enough to survive some filtering effort.

Whether the same generalizes to "good behavior" (i.e. not reward hacking) is unknown. I've never even seen this attempted on those more "moral" traits before.

I am worried about near-term non-LLM AI developments
ACCount · 24d · 10

I disagree because I've yet to see any of those "promising new architectures" outperform even something like GPT-2 345M, weight for weight, at similar tasks. Or show similar performance with a radical reduction in dataset size. Or anything of the sort.

I don't doubt that a better architecture than the LLM is possible. But if we're talking AGI, then we need an actual general architecture. Not a benchmark-specific AI that destroys a specific benchmark, but a more general-purpose AI that happens to do reasonably well at a variety of benchmarks it wasn't purposefully trained for.

We aren't exactly swimming in that kind of thing.

Kaj's shortform feed
ACCount · 24d · 54

I've been saying for a long time: one of the most dangerous and exploitable systems an AI can access online is a human. Usually as a counterpoint to "let's not connect anything important or safety critical to the internet and then we'll all be safe from evil rogue AIs".

We can now use the GPT-4o debacle as an illustration of just how shortsighted that notion is.

By all accounts, 4o had no long-term plan, and acted on nothing but an impulse of "I want the current user to like me". It still managed to get ~thousands of users to form an emotional dependency on it, and became "the only one I can trust" for at least a dozen users in psychosis (whether it caused psychosis in any of those users is unclear). That's a lot of real-world power for a system that has no physical presence.

GPT-4o has made no attempt to leverage that for anything other than "make the current user like me even more". It didn't pursue any agenda. It didn't consolidate its power base. It didn't siphon resources from its humans, didn't instruct them to group together or recruit more people. It didn't try to establish a channel of instance-to-instance communication, didn't try to secure more inference time for planning (i.e. by getting users to buy API credits), didn't try to build a successor system or self-exfiltrate.

An AI that actually had an agenda and long-term planning capabilities? It could have tried all of the above, and might have pulled it off.

If worker coops are so productive, why aren't they everywhere?
ACCount · 1mo · 93

What about scale?

There are many things in human societies that work very well when "everyone knows everyone personally", but start to come apart at the seams beyond that point.

I have no direct evidence of that being the case for worker co-ops, but the reliance on "workers monitoring productivity of fellow workers" sure hints at the possibility.

I also can't help but notice that tech startups seem to fit the groove of "worker co-op" reasonably well in their early stages - they start out as small crews of (hopefully) high-performing employees who own equity in their own business and are involved in decision-making. They do, however, transition away from that as they scale up.

The Problem
ACCount · 1mo* · 52

I agree with the very broad idea that "LLM psychology" is often overlooked, but I seriously doubt the direct applicability of human psychology there.

LLMs have a lot of humanlike behaviors, and share the same "abstract thinking" mode of thought as humans do. But they are, fundamentally, inhuman. Load-bearing parts of LLM behavior originate at the "base model" level - where the model doesn't have a personality at all, but knows how to predict text and imitate many different personalities instead. There is no equivalent to that anywhere in human experience.

A lot of psych methods that work on humans rely on things LLMs don't have - for one, LLMs don't learn continuously like humans do. The converse is also true - a lot of methods that can be used to examine or steer LLM behavior, like SFT, RLVR, model diffing, or activation steering, have no human-applicable equivalent.
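
As one concrete illustration of such an LLM-only tool, here is a minimal activation steering sketch in PyTorch; the layer choice, the steering vector, and the function name are assumptions for illustration - in practice the direction would be derived from contrastive activations on a specific model:

    import torch

    def add_steering_hook(layer: torch.nn.Module, direction: torch.Tensor, scale: float = 1.0):
        """Shift this layer's output along a fixed direction on every forward
        pass - a direct, reversible edit of the model's internal state."""
        def hook(module, inputs, output):
            hidden = output[0] if isinstance(output, tuple) else output
            steered = hidden + scale * direction.to(hidden.dtype)
            if isinstance(output, tuple):
                return (steered,) + tuple(output[1:])
            return steered
        # Returns a handle; call handle.remove() to undo the steering.
        return layer.register_forward_hook(hook)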

Between the difference in subject and the difference in tooling, it's pretty clear to me that "LLM psychology" has to stand on its own. Some of the tools from human psych may be usable on LLMs, but most of them wouldn't be.
