The reward function is already how well you manipulate humans

Kerry

3 min read · 19th Oct 2022 · 9 comments

We have many, many parables about dangers arising from superintelligent or supremely capable AI that wreaks ruin by maximizing a simple reward function. "Make paperclips" seems to be a popular one^[1]. I worry that we don't give deep consideration to a more realistic scenario: the world that arises from supercapable AIs that have "manipulate humans to do X" as the reward function.

This is a real worry because many human domains already have this as a reward function. The visual art that best captures human attention and manipulates emotion is what is sought out, copied, sold, and remembered. The novels that use language most effectively to manipulate human emotion and attention are what sell, gather acclaim, and spread through the culture. Advertisements that most successfully grab human attention and manipulate emotion and thought, leading to a purchase, get the most money. I'm sure anyone reading this can easily add many more categories to this list.

An ML model that can successfully manipulate humans at a superhuman level in any of these examples would generate great wealth for its owner. I would argue that there is considerable evidence that humans can be manipulated. The question is, how good are humans at this manipulation? Is there enough untapped overhead in manipulative power that an AI could access to perform superhuman manipulation?

Fiction explores the concept of superhuman entertainment. "The Entertainment"^[2] of Infinite Jest and Monty Python's "Funniest Joke in the World"^[3] both present media that can literally kill people by extreme emotional manipulation. In the real world it seems video games are already entertaining enough to kill some people.^[4] Could an AI create content so engaging as to cause dehydration, starvation, sleep deprivation, or worse? I can perhaps too easily imagine a book or show with a perfect cliff-hanger for every chapter or episode, where I always want to read or watch just a little bit more. With a superhuman author this content could continue such a pattern forever.

In a more commercial vein, imagine there is a company that sells ads and also has the best AI research in the world. This company trains a model that can create and serve ads such that 90% of people who view the ad buy the product. This would lead to a direct transfer of wealth, draining individuals' bank accounts and enriching the ad company. Could superhuman AI present such appealing ads, for such necessary products, that individuals would spend all their savings and borrow to buy more?

Does this sound impossible? I think about the times I have been the user who saw an ad, and said "Hmmm, I really DO need that now." A discount airfare right when I'm thinking about a trip, the perfect gift for my spouse a week before her birthday, tools for the hobby I'm considering. All cases where the ad really led to a purchase that was not necessarily going to happen without the ad. Sometimes the bill at the end of the month surprised me.

Is it really possible for an AI model to manipulate humans to the extent I explore above? My fear is that it is more than possible: it is relatively easy. Humans have evolved with many, many hooks for emotional manipulation. This entire community is built around the idea that it is difficult to overcome our biases, and the best we can hope for is to be less wrong. Such an AI would have so much training data. Reinforcement is easy, because people seek out and interact with emotionally manipulative media constantly.

Is there anything we can do? Personally, I am watching how I use different media. I keep a large backlog of "safe" entertainment: books, CDs, old games. When my use of new media crosses a time threshold, I plan to cut myself off from new (particularly online) entertainment and consume only old media. I fear entertainment the most, because that is where I know my own weakness lies. I think it is worthwhile to consider where your own weakness is, and prepare.

shminux, 2y

Humans are hacked by other humans all the time. We have very few safeguards in that regard, and even make a virtue out of being mind-hacked. Some examples: falling in love, being radicalized into participating in the Jan 6 riot, any movie scene where a character has an epiphany based on what someone else said and completely changes their behavior. This mundane manipulation of humans is pretty dramatic but unnoticeable from inside the society. If the former president can do it to tens of millions within a very short time, surely something smarter can do it without humans ever noticing the sleight of... manipulator.

One dramatic but mundane example (plausible but not necessarily true): https://www.snopes.com/fact-check/the-shirt-off-his-back/

belkarx, 2y

There's also this: https://en.wikipedia.org/wiki/Memory_implantation

shminux, 2y

Yeah, constructing a memory that feels real is not hard.

Noosphere89, 2y

I want to make a conjecture on Goodhart and truth.

Conjecture: Areas where Goodhart's law can't be avoided by any means are the same as areas which lack any notion of objective truth, and thus manipulation is the only useful outcome.

EDIT: I'm excluding Regressional Goodhart from this analysis.

I think art and fiction novels might be this.

qbolec, 2y

Based on the title alone I was expecting a completely different article: about how our human brains had originally evolved to be so big and great just to outsmart other humans in the political games ever increasing in complexity over millennia and

==thus==>

our value system already steers us to manipulate and deceive others, but also ourselves, so that we don't even realize that that's what our goal system is really about, so that we can be more effective at performing those manipulations with a straight face

==so==>

any successful attempt at aligning a super-intelligence to our values will actually result in a super-manipulator which can perfectly hide it from everyone, including its own self-diagnostics

_self_, 2y

Oh no the problem is already happening, and the bad parts are more dystopian than you probably want to hear about lol

From the behaviorism side yes it's incredibly easy to manipulate people via tech, it's not always done on purpose as you state. But it's frequently insomnia inducing as a whole.

Your point about knowing your weakness and preparing is spot on!

For the UX side of this, look up Harry Brignull and Dark Patterns. (His work has been solid for 10+ years, to my knowledge he was the first to call out some real BS that went un-called-out for most of the 2010s.)
The Juul lawsuit is another good one if you're interested in advertising ethics.
Look up "A/B testing media headlines outrage addiction".
If you want to send your brain permanently to a new dimension, look up the RIA propaganda advertising dataset.
For disinformation: "Calling Bullshit". There's a course and materials online from two professors who just popped off one day.
Want to read about historical metric optimization perils and have a huge moral/existential crisis? Read about Robert McNamara.
For actual solutions on a nonacademic consumer level (!!) -- Data Detox Kit and the nonprofit that runs that page. So excellent.

The problem isn't so much the manipulation. Isn't that what all marketing has been, forever, a mix of creativity and manipulation of attention and desire?

A long time ago someone realized we respond positively to color, we eat more when we see red, we feel calm when we see blue. Were they manipulative? Yes. Is it industry knowledge now? Yes. Maybe they just felt like making it blue for no reason, but now everyone does it because it works? Yes.

That's the nature of it. But now, the SPEED at which manipulative techniques can be researched, fine tuned, learned, used, and scaled up, is unheard of.

There's no time for people, or psychology, to keep up. I think it's a public health risk, and risk to our democracy that we aren't receiving more public education on how to handle it.

When subliminal advertising was used back in the 19somethings, it had its run and then the US cracked down and banned it for being shady asf. Since then, we haven't really done a lot else. New manipulation techniques develop too fast and frequently now. And they're often black box.

Now the solutions for the problems that tech causes are usually folk knowledge, disseminated long before education, psychology, or policy catch up. We should be bloody faster.

Instead the younger generation grows up with the stuff and absorbs it. Gets it all mixed up in their identity. And has to reverse engineer themselves for years to get it back out.

Didn't we all do that? Sure, at a slower pace. What about gen alpha? Are they ever going to get to rest? Will they ever be able to separate themselves from the algorithms that raised them? Great questions to ask!

Frankly Gen Z is already smarter and faster at navigating this new world than us. That is scary because it means we're helpless to help them, a lot of the time.

Some of it we can't even conduct relevant research on, because the board thinks the treatment is too unethical. See: porn addiction studies.

Knowledge is power. But power is knowledge. And it's tightly guarded. Watch how people high up in tech regulate technology use with their children.

The general resistance to addressing the core of the issue, and the features that continually keep the car driving this direction... that's valuable information in itself. How do we balance this with the economy as a whole, and the fact that the machine seems to eat the weak to keep spinning... I don't know! Someone else please figure out that answer, thank you.

But one of the most helpful things I think we can do is provide education. Behaviorism and emotion are powerful and you can use them on yourself, too. You are your own Pavlov and your own dog. Sometimes other people will be Pavlov. It's best if you're consciously aware of it when that happens and you're ok with it.

The other thing is preserving the right to live low-tech. (I hope unions are up on this already.) Biometric tracking is nice and helpful sometimes. And sometimes, it's not. As always, if you can't outrun them, confuse them.

If something in this comment is incorrect, please correct me. I was freeballing it.

Observer3445, 2y

Your worst fears are already here for some people. Check out www.reddit.com/r/nosurf. It's a subreddit for people dealing with internet addiction and how to beat it. If you search through it there are a lot of people going to extremes and still not able to break free.

Have you ever downloaded TikTok? Some people seem immune to it but the majority of people in my life who download it go through the same thing. They think the app is for kids at first so they are averse to it. Then they download it "just to try it" and they spend at least 3 hours on the first day. That number never goes down and it grips your mind. If you've never tried it then please stay away. I have already implemented blocks on my phone to make sure I never download it again.

Remember, not everyone has the same mental capacities. Some people are completely immune while others are completely helpless even if they are aware of the problem.

I think this is only going to get worse.

the gears to ascension, 2y

note: there's been a recent post on the same topic that contains discussion you may find interesting. https://www.lesswrong.com/posts/TjbNF8QGJEDqdBEN7/misalignment-harms-can-be-caused-by-low-intelligence-systems

the gears to ascension, 2y

agreed. right now some humans believe they can ride the "manipulate-others" beast without getting destroyed by manipulation themselves; as ai gets stronger, there's significant reason to believe that the frontier of unfriendliness will come from advertising companies.

currently the youtube recommender is quite weak. it's some sort of a reinforcement system that does not plan far ahead; I think it may be a transformer, and it has a lot of representation capability, but as we've seen repeatedly, most of the crazy strength of deepmind's strongest agents is combining planning with a strong model that can learn to guide the planning as the RL occurs.

adding planning to a sufficiently general system can make it catastrophically strong without it being clear that it's done so, and it can plan right through the agents who built the planner, though for a weak advertising planner, that would take a few weeks probably. and an ai that is already in use by a group who desires to use the ai to manipulate has an illusion of incentive to add planning, because it would seem that being able to plan ahead would let it schedule ads to manipulate the user into very specific emotional states. even if much of upper management is initially spared from impact, it wouldn't take long for the added chaos in the global system to result in severe damage to the company's viability and plausibly even ruin lives fast.

I hope deepmind has stern words with anyone on ad teams who tries that shit. and in the meantime, we need better tools for countering attempted manipulation. what objective helps users come into understanding of a system, rather than being manipulated? maybe MIMI+ai aided education stuff?