There has been some recent discussion about how to conceptualize how we lose sight of goals, and I think there's a critical conceptual tool missing. Simply put, there's a difference between addiction and exploitation. Deciding to try heroin and getting hooked is addiction, but the free sample from a dealer given to an unwary kid to 'try it just once' is exploitation. This difference is not about the way the addict acts once they are hooked, it's about how they got where they are, and how wary non-addicts need to be when they notice manipulation.

I'm going to claim that the modern world is more addictive overall not just because humans have gotten better at wireheading ourselves. The reason is that humanity has gotten better at exploitation at scale. People (correctly) resent exploitation more than addiction, and we should pay more attention to how we are getting manipulated. Finally, I'll claim that this all ties back to my pet theory that some form of Goodhart's law can explain almost anything that matters. (I'm kind of joking about that last. But only kind of.)


Humans are adaptation executers, not fitness maximizers - so sometimes we figure out how to wirehead using those adaptations in a way that doesn't maximize fitness. Sex is great for enhancing natural fitness because it leads to babies; condoms let us wirehead. High-fat foods are great for enhancing natural fitness because food is scarce; modern abundance lets us wirehead. Runner's high enhances natural fitness, #punintentional, because it allows people to continue running from a predator, but running marathons for the exhilarating feeling lets us wirehead. These examples are mostly innocuous - they aren't (usually) addictions.

That doesn't mean they can't be bad for us. Valentine suggested that we try to notice the taste of the lotus when our minds get hijacked by something other than our goal. I think this is exactly right - we get hijacked by our own brain into doing something other than our original goal. Our preferences get modified by doing something, and several hours later it's 2 AM and we realize we should stop p̶l̶a̶y̶i̶n̶g̶ ̶p̶u̶z̶z̶l̶e̶s̶ ̶a̶n̶d̶ ̶d̶r̶a̶g̶o̶n̶s̶ r̶e̶a̶d̶i̶n̶g̶ ̶o̶l̶d̶ ̶p̶o̶s̶t̶s̶ ̶o̶n̶ ̶S̶l̶a̶t̶e̶s̶t̶a̶r̶c̶o̶d̶e̶x doing whatever it is we're currently getting distracted by.

In some of those cases natural behavior reinforcement systems are easier to trick than they are to satisfy and wireheading takes over. That's probably a bad thing, and most people would prefer not to have it happen. Valentine says: "I claim you can come to notice what lotuses taste like. Then you can choose to break useless addictions." But this isn't addiction. Addiction certainly involves hijacked motivations, but it goes further. It doesn't count as addiction unless it breaks the system a lot more than just preference hijacking.

"Addiction is a primary, chronic disease of brain reward, motivation, memory and related circuitry. Dysfunction in these circuits leads to characteristic biological, psychological, social and spiritual manifestations. This is reflected in an individual pathologically pursuing reward and/or relief by substance use and other behaviors." - American Society of Addiction Medicine

Addiction is when the system breaks, and breaking is more than being hijacked a bit - but the breaking that occurs in addiction is often passive. No one broke it, the system just failed.

Loot Boxes

It's not hard to see that companies have figured out how to exploit people's natural behavior reinforcement systems. Foods are now engineered to hit all the right physiological and gustatory triggers that make you want more of them. The engineering of preferences is not an art. (I'd call it a science, but instead of being ignored, it's really well funded.)

There is big money riding on a company's ability to hijack preferences. When Lay's chips said “Betcha Can't Eat Just One,” they were being literal - they invested money into a product and a marketing campaign in order to bet that consumers would be unable to stop themselves from eating more than would be good for them. Food companies have been making this and similar bets for a few decades now, and each time they tilt the odds even further in their favor.

Modern video games include loot boxes, a game feature explicitly designed to turn otherwise at least kind-of normal people into money pumps. The victims are not money pumps in the classical Dutch book sense, though. Instead, they are money pumps because playing the game is addictive, and companies have discovered it's possible to hijack people's limbic systems by coupling immersive games with addictive gambling. The players' preferences are modified by playing the game, and had players been told beforehand that they would end up spending $1,000 and six months of their life playing the game, many would have chosen not to play.

Writing this post instead of the paper I should be writing on multipandemics is a distraction. Browsing Reddit mindlessly until 2 AM is an addiction. But it's Cheez-Its that rewired my mind to make a company money. This last isn't addiction in the classical sense, it's exploitation of an addictive tendency. And that's where I think Goodhart's Law comes in.

Being Tricked

Scott Garrabrant suggested the notion of "Adversarial Goodhart," and in our paper we started to explore how it works a bit. Basically, Goodhart's law is when the metric, which in this case is whatever gets promoted by our natural behavior reinforcement systems, diverges from the goal (of the blind idiot god, evolution), which is evolutionary fitness. This divergence isn't inevitably a problem when selection pressure is only moderate, but because people are motivated to munchkin their own limbic system, it gets a bit worse.

Still, individuals are limited - we have only so much ability to optimize, and when we're playing in single player mode, the degree to which metrics and goals get misaligned is limited by the error in our models. It turns out that when you're screwing up on your own, you're probably safe. No one is there to goad you into further errors, and noticing the taste of the lotus might be enough. When the lotus is being sold by someone, however, we end up deep in adversarial-goodhart territory.
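As a toy illustration of the single-player versus adversarial distinction, here is a minimal Python sketch. Everything in it is an assumption made for illustration, not the model from the paper: the uniform goal values, the bounded error term, the function names, and the engineered option that scores highest on the metric while delivering nothing toward the goal.

```python
import random

def pick_by_metric(options):
    """Choose whatever scores highest on the metric (what our reward system promotes)."""
    return max(options, key=lambda o: o["metric"])

def single_player_options(n, error, rng):
    """Options we stumble on ourselves: metric = goal + bounded model error (assumed)."""
    options = []
    for _ in range(n):
        goal = rng.uniform(0.0, 1.0)
        metric = goal + rng.uniform(-error, error)
        options.append({"goal": goal, "metric": metric})
    return options

def adversarial_option(options):
    """Hypothetical engineered option: beats every existing metric, ignores the goal."""
    best_metric = max(o["metric"] for o in options)
    return {"goal": 0.0, "metric": best_metric + 1.0}

def regret(options, chosen):
    """Goal value lost relative to the best option that was available."""
    return max(o["goal"] for o in options) - chosen["goal"]

rng = random.Random(0)  # fixed seed so the run is repeatable
error = 0.1

# Single player mode: the metric is only as wrong as our own model error.
alone = single_player_options(50, error, rng)
solo_choice = pick_by_metric(alone)
solo_regret = regret(alone, solo_choice)

# Someone is selling the lotus: one extra option built to win on the metric.
sold = alone + [adversarial_option(alone)]
sold_choice = pick_by_metric(sold)
sold_regret = regret(sold, sold_choice)

print("regret on our own:      ", solo_regret)
print("regret with an adversary:", sold_regret)
```

In this setup the loss on our own can never exceed twice the error bound, because the chosen option has to beat the truly best one on the metric. Once a single option is built to win on the metric, that bound disappears and the loss is as large as the best thing we gave up.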

And speaking of being tricked, if I had started this post saying I was going to mention AI safety and multi-agent dynamics, most of you wouldn't have read this far. But now that I've got you this far, I'll misquote Paul Ehrlich, “To err is human but to really foul things up requires [multi-agent competitive optimizations].” I'll get back to psychology and wireheading in the next paragraph, but the better a system becomes at understanding another system it's manipulating, the worse this problem gets. If there is ever any form of superintelligence that wants to manipulate us, we're in trouble - people don't seem to be hard to exploit.

As I've discussed above, exploitation doesn't require AI, or even being particularly clever - our goals can get hijacked by simple nerd-snipes, gamification (even if it's just magic internet points), or any other idea that eats smart people. So I'll conclude by saying that, while I agree we should notice the taste of the lotus, it's worth differentiating between being distracted, being addicted, and being exploited.

2 comments

I agree with this, except for the implication that evolutionary fitness is a sufficient or even applicable goal-set for an individual. I fully intend to hijack my primitive desires in order to achieve my higher-order goals, and I recommend this to others.

I certainly didn't mean to imply that a person's goals should be related to evolutionary fitness - I was attempting to draw a parallel between the "goal" of evolution and the goals people set for themselves. Evolutionary goals get subverted by clever agents with their own agenda, and human goals can be affected by the same process.