Reducing Agents: When abstractions break

Hazard

Epistemic Effort: A month of dwelling on this idea, 12-16 hours of writing to explore the idea, and 2-5 hours rereading old LW stuff.

In the past few months, I’ve been noticing more things that lead me to believe there’s something incomplete about how I think about beliefs, motives, and agents. There have been one too many instances of me wondering, “Yeah, but what do you really believe?” or “Is that what you really want?”

This post is the first in a series where I'm going to apply More Dakka to a lot of LessWrong ideas I was already familiar with but hadn’t quite connected the dots on.

Here are the main points:

Agents are an abstraction for making predictions more quickly.
In what contexts does this abstraction break down?
What model should be used in places where it does break down?

Relevant LW posts (will also be linked throughout)

Reductionism 101

Blue Minimizing Robot

Adaptation-Executors, Not Fitness-Maximizers

Abstractions

Abstraction is awesome. Being able to make quality high-level abstractions is like a superpower that bends the universe to your will. Think of an abstraction as a model. A given abstraction defines its ontologically basic building blocks, as well as the rules that govern their interactions.

Imagine a universe where 2x6 Lego bricks are the base level of reality. They connect to each other just like they do in our universe, but they do so axiomatically, not because of friction or anything. In this universe, we might make a higher-level abstraction by defining a handful of multi-brick structures to be ontologically basic. You lose some of the resolution of having 2x6s as your basis, but it also doesn’t take as long to build large things.

That’s the fundamental trade-off of abstractions. Each time you hop up a layer, you lose some detail (and thus model accuracy), but you gain computational simplicity. You could talk about a complex system like a computer at the quark level, but the computation time would be silly. Same for the atom or molecule layer. There’s a sparkle of hope when you reach the level where transistors are basic. Now you can quickly talk about tons of cool stuff, but talking about an entire computer is still out of reach. Hop up to logic gates. Hop up to basic components like adders, multiplexers, and flip-flops. Now we've reached a level where you could actually design a useful piece of hardware that does something. Hop up to registers and ALUs. Make the awesome leap to a 16-bit CPU that can be programmed in assembly. Keep going all the way up until it’s possible to say, “Did you see Kevin’s skiing photos on Facebook?”
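To make the layer-hopping concrete, here is a minimal Python sketch (an illustration only, not how real hardware is designed or simulated). Logic gates are treated as basic, adders are built only out of gates, and a 16-bit adder is built only out of full adders. All the function names are made up for this example.

```python
# Layer 1: logic gates, treated as ontologically basic (transistors are ignored).
def and_gate(a: int, b: int) -> int:
    return a & b

def or_gate(a: int, b: int) -> int:
    return a | b

def xor_gate(a: int, b: int) -> int:
    return a ^ b

# Layer 2: adders, built only out of gates.
def half_adder(a: int, b: int) -> tuple[int, int]:
    """Return (sum_bit, carry_bit) for two input bits."""
    return xor_gate(a, b), and_gate(a, b)

def full_adder(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """Two half adders plus an OR gate: returns (sum_bit, carry_out)."""
    s1, c1 = half_adder(a, b)
    s2, c2 = half_adder(s1, carry_in)
    return s2, or_gate(c1, c2)

# Layer 3: an n-bit ripple-carry adder, built only out of full adders.
def ripple_add(x: int, y: int, width: int = 16) -> int:
    """Add two unsigned integers bit by bit, wrapping at `width` bits like a register."""
    result, carry = 0, 0
    for i in range(width):
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= bit << i
    return result  # the final carry-out is dropped, so overflow wraps around

print(ripple_add(1234, 4321))  # 5555
print(ripple_add(65535, 1))    # 0 (16-bit overflow)
```

Each function only calls the layer directly below it, so someone working with `ripple_add` never has to think about gates, let alone transistors.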

Each time we hopped a layer of abstraction, we decided to simplify our models by ignoring some details. Luckily for us, many brave souls have pledged their lives to studying the gaps between layers of abstraction. There’s someone who works on making transistors behave more like ontologically basic things. On the flip side, there’s someone whose job is knowing how transistors really work, so they can design circuits where it truly doesn’t matter that transistors aren’t basic. The rest of us can just hang out and work at our own level of abstraction, carefree and joyful.

For something like computers, lots of smart people have put lots of thought into each layer of abstraction. That's not to say that computer engineering abstractions are the best they can be; the point is that you don’t get good abstractions for free. Most possible next-level abstractions you could make would suck. You only get a nice abstraction when you put in hard work. And even then, abstractions are always leaky (except maybe in math, where you declare your model to have nothing to do with reality).

The thing that’s really nice about engineering abstractions is that they are normally completely specified. Even if you don’t know how the IEEE defines floating-point arithmetic, there is a canonical version of “what things mean”. In engineering, looking up definitions is often a totally valid and useful way to resolve arguments.
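A small example of settling an argument by looking up the definition: Python's `float` is an IEEE 754 double on essentially every mainstream platform (an assumption about your platform, though a very safe one), so the "surprising" result below is exactly what the standard prescribes, since neither 0.1 nor 0.2 can be represented exactly in binary.

```python
import math

total = 0.1 + 0.2
print(total)                     # 0.30000000000000004
print(total == 0.3)              # False: each literal is rounded to the nearest double
print(math.isclose(total, 0.3))  # True: compare with a tolerance instead
print((0.1).hex())               # 0x1.999999999999ap-4, the value actually stored
```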

When an abstraction is under-specified, there’s lots of wiggle room concerning how something is supposed to work, and everyone fills in the blanks with their own intuition. It’s totally acceptable for parts of your abstraction to be under-specified. The C programming language declares the outcome of certain situations to be undefined; for example, C does not specify what happens when you dereference a null pointer. Problems arise when you don’t realize that parts of your abstraction are under-specified. Language is a great example of a useful abstraction that is under-specified, yet feels completely specified to the untrained eye, and that difference leads to all sorts of silly arguments.

Agents and Ghosts

I’m no historian, but from what I’ve gathered most philosophers for most of history have modeled people as having some sort of soul. There is some ethereal other thing which is one’s soul, and it is the source of you, your decisions, and your consciousness. There is this machine which is your body, and some Ghost that inhabits it and makes it do things.

Even though it’s less common nowadays to think Ghosts are part of reality, we still model ourselves and others as having Ghosts, which isn’t very helpful. Ghosts are so under-specified that they shift almost all of the explanatory burden onto one’s intuition. Ghosts do not help explain anything, because they can stretch as far as one’s intuition can, which is a lot.

Lucky for us, people have since made better abstractions than the basic Ghost. The decision-theory notion of an Agent does a pretty good job of capturing the important parts of “a thing that thinks and decides”. Agents have beliefs about the world, some way to value world states, some way of generating actions, and some way to choose between them (if there are any models of agents that differ from this, let me know in the comments).
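The four parts listed above can be written down as a hypothetical Python sketch. It is a deterministic simplification, assuming beliefs map an action straight to a single predicted outcome; a proper decision-theoretic agent would hold probability distributions over outcomes and maximize expected utility. The `Agent` class and the hungry-agent example are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable

State = str
Action = str

@dataclass
class Agent:
    beliefs: Callable[[State, Action], State]          # predicted world state after an action
    utility: Callable[[State], float]                  # how much it values a world state
    generate_actions: Callable[[State], list[Action]]  # which options it considers

    def choose(self, state: State) -> Action:
        """Pick the considered action whose predicted outcome has the highest utility."""
        return max(self.generate_actions(state),
                   key=lambda a: self.utility(self.beliefs(state, a)))

hungry_agent = Agent(
    beliefs=lambda s, a: "fed" if a == "eat" else s,
    utility=lambda s: {"fed": 1.0, "hungry": 0.0}[s],
    generate_actions=lambda s: ["wait", "eat"],
)
print(hungry_agent.choose("hungry"))  # eat
```

Everything interesting about this "agent" lives in the three functions passed in; the abstraction only tells you how they are combined.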

Again, we are well-versed in reductionism and know that there are no agents in the territory. Agents are a useful abstraction that we use to predict what people do. We use it all the time, often with great success. It seems to be a major load-bearing abstraction in our toolkit for comprehending the world.

The rest of this series is a sustained meditation on two questions that are vital to ask any time one asks an abstraction to do a lot of work:

In what contexts does the Agent abstraction break down?
When it breaks down, what model do we use instead?

The rest of this post gives some primer examples of the Agent abstraction breaking down.

The Blue Minimizing Robot

Remember the Blue Minimizing Robot? (Scott’s sequence was a strong primer for my thoughts here)

Imagine a robot with a turret-mounted camera and laser. Each moment, it is programmed to move forward a certain distance and perform a sweep with its camera. As it sweeps, the robot continuously analyzes the average RGB value of the pixels in the camera image; if the blue component passes a certain threshold, the robot stops, fires its laser at the part of the world corresponding to the blue area in the camera image, and then continues on its way.

It’s tempting to look at that robot and go, “Aha! It’s a blue minimizing robot.” Now you can model the robot as an agent with goals and go about making predictions. Yet time and time again, the robot fails to achieve the goal of minimizing blue.

In fact, there are many ways to subvert this robot. What if we put a lens over its camera which inverts the image, so that white appears as black, red as green, blue as yellow, and so on? The robot will not shoot us with its laser to prevent such a violation (unless we happen to be wearing blue clothes when we approach) - its entire program was detailed in the first paragraph, and there's nothing about resisting lens alterations. Nor will the robot correct itself and shoot only at objects that appear yellow - its entire program was detailed in the first paragraph, and there's nothing about correcting its program for new lenses. The robot will continue to zap objects that register a blue RGB value; but now it'll be shooting at anything that is yellow.
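Since the robot's entire program is spelled out, it can be sketched directly. This is a hypothetical Python version: the threshold of 128, the per-region averaging, and the function names are assumptions, because the description only says "a certain threshold".

```python
Pixel = tuple[int, int, int]  # (R, G, B), each channel 0-255

BLUE_THRESHOLD = 128  # hypothetical value; the spec only says "a certain threshold"

def average_blue(region: list[Pixel]) -> float:
    return sum(b for _, _, b in region) / len(region)

def should_fire(region: list[Pixel]) -> bool:
    """The robot's entire decision rule: fire iff the average blue component exceeds the threshold."""
    return average_blue(region) > BLUE_THRESHOLD

def inverting_lens(region: list[Pixel]) -> list[Pixel]:
    """The lens from the thought experiment: every channel c becomes 255 - c."""
    return [(255 - r, 255 - g, 255 - b) for r, g, b in region]

blue_shirt = [(0, 0, 255)] * 4
yellow_shirt = [(255, 255, 0)] * 4

print(should_fire(blue_shirt), should_fire(yellow_shirt))  # True False
print(should_fire(inverting_lens(blue_shirt)),
      should_fire(inverting_lens(yellow_shirt)))           # False True
```

Notice that under this literal rule a plain white wall, whose blue component is also 255, trips the threshold too. A "blue minimizer" model would not predict that; reading the code does.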

Maybe you conclude that the robot is just a Dumb Agent™. It wants to minimize blue, but it just isn’t clever enough to figure out how. But as Scott points out, the key error in such an analysis is modeling the robot as an agent in the first place. The robot’s code is all that’s needed to fully predict how the robot will operate in all future scenarios. If you were in the business of anticipating the actions of such robots, you’d best forget about trying to model it as an agent and just use the source code.

The Connect 4 VNM Robot

I’ve got a Connect 4 playing robot that beats you 37 times in a row. You conclude it’s a robot whose goal is to win at Connect 4. I even let you peek at the source code, and aha! It’s explicitly encoded as a VNM agent using a minimax algorithm. Clearly this can safely be modeled as an expected utility maximizer with the goal of whupping you at Connect 4, right?

Well, that depends on what counts as safely. If the ICC (International Connect 4 Committee) declares that winning at Connect 4 is actually defined by getting 5 in a row, my robot is going to start losing games to you. Wait, isn’t it cheating to just redefine what winning is? Okay, maybe. Instead of redefining winning, let’s run interference. Every time my robot is about to place a piece, you block the top of the board (but only for a few seconds). My robot will let go of its piece, not realizing it never made a move. Argh! If only the robot were smart enough to wait until you stopped blocking the board, then it could have achieved its true goal of winning at Connect 4!

Except this robot doesn’t have any such goal. The robot is only code, and even though it’s doing a faithful recreation of a VNM agent, it’s still not a Connect 4 winning robot. Until you make an Agent model that is at least as complex as the source code, I can put the robot in a context where your Agent model will make an incorrect prediction.
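Here is a toy Python sketch of what "winning" looks like from the inside of such a robot, assuming a standard 6x7 board and a shallow depth-limited minimax search (the robot is hypothetical, and so is this code; a real engine would search deeper and use a heuristic evaluation). The robot's entire notion of winning is the constant `ROBOT_WIN_LENGTH` and the `has_line` check, not whatever the ICC publishes.

```python
ROWS, COLS = 6, 7
ROBOT_WIN_LENGTH = 4  # "winning", as far as the robot is concerned, is this constant

def empty_board():
    return [[None] * COLS for _ in range(ROWS)]

def legal_moves(board):
    return [c for c in range(COLS) if board[0][c] is None]  # row 0 is the top

def play(board, col, piece):
    """Return a new board with `piece` dropped into `col`."""
    new = [row[:] for row in board]
    for r in range(ROWS - 1, -1, -1):
        if new[r][col] is None:
            new[r][col] = piece
            return new
    raise ValueError("column is full")

def has_line(board, piece, length):
    """True if `piece` has `length` in a row horizontally, vertically, or diagonally."""
    for r in range(ROWS):
        for c in range(COLS):
            for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
                cells = [(r + i * dr, c + i * dc) for i in range(length)]
                if all(0 <= rr < ROWS and 0 <= cc < COLS and board[rr][cc] == piece
                       for rr, cc in cells):
                    return True
    return False

def minimax(board, depth, robot_to_move, me, them):
    if has_line(board, me, ROBOT_WIN_LENGTH):
        return 1
    if has_line(board, them, ROBOT_WIN_LENGTH):
        return -1
    moves = legal_moves(board)
    if depth == 0 or not moves:
        return 0
    mover = me if robot_to_move else them
    scores = [minimax(play(board, c, mover), depth - 1, not robot_to_move, me, them)
              for c in moves]
    return max(scores) if robot_to_move else min(scores)

def choose_move(board, me="R", them="Y", depth=3):
    """Pick the column with the best minimax score (ties go to the leftmost column)."""
    return max(legal_moves(board),
               key=lambda c: minimax(play(board, c, me), depth - 1, False, me, them))

# The robot (R) has three in a row along the bottom; the opponent (Y) is stacked on top.
board = empty_board()
for col, piece in [(0, "R"), (0, "Y"), (1, "R"), (1, "Y"), (2, "R"), (2, "Y")]:
    board = play(board, col, piece)

move = choose_move(board)
after = play(board, move, "R")
print(move)                     # 3
print(has_line(after, "R", 4))  # True: a win by the robot's hard-coded rule
print(has_line(after, "R", 5))  # False: not a win if the ICC switches to 5 in a row
```

Nothing in this code refers to a goal outside itself. Change the rulebook and the robot keeps optimizing `has_line(..., 4)`, exactly as written.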

“So what?” you might ask. What if we don’t care about every possible context? Why can’t we use an Agent model and only put the robot in contexts where we know the abstraction works? We absolutely can do that. We just want to make sure we never forget that this model breaks down in certain places, and we'd also like to know exactly where and how it will break down.

Adaptation-Executors, Not Fitness-Maximizers

Things get harder when we talk about humans. We can’t yet “use the source code” to make predictions. At first glance, using Agents might seem like a perfect fit. We want things, we believe things, and we have intelligence. You can even look at evolution and go, “Aha! People are fitness maximizers!” But then you notice weird things like the fact that humans eat cookies.

Eliezer has already tackled that idea.

No human being with the deliberate goal of maximizing their alleles' inclusive genetic fitness, would ever eat a cookie unless they were starving. But individual organisms are best thought of as adaptation-executors, not fitness-maximizers.

Adaptation executors, not fitness-maximizers.

Repeat that 5 more times every morning upon waking, and then thrice more at night before going to bed. I’ve certainly been muttering it to myself for the last month that I’ve been dwelling on this post. Even if you’ve already read the Sequences, give that chunk another read through.

Rebuttal: Maybe fitness isn’t the goal. Maybe we should model humans as Agents who want cookies.

We could, but that doesn’t work either. More from Scott:

If there is a cookie in front of me and I am on a diet, I may feel an ego dystonic temptation to eat the cookie - one someone might attribute to the "unconscious". But this isn't a preference - there's not some lobe of my brain trying to steer the universe into a state where cookies get eaten. If there were no cookie in front of me, but a red button that teleported one cookie from the store to my stomach, I would have no urge whatsoever to press the button; if there were a green button that removed the urge to eat cookies, I would feel no hesitation in pressing it, even though that would steer away from the state in which cookies get eaten. If you took the cookie away, and then distracted me so I forgot all about it, when I remembered it later I wouldn't get upset that your action had decreased the number of cookies eaten by me. The urge to eat cookies is not stable across changes of context, so it's just an urge, not a preference.
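Scott's test can be phrased as the difference between two kinds of program. In this crude, hypothetical Python sketch (the functions and the dictionary "world" are invented for illustration), an urge is a rule keyed to what is currently in view, while a preference scores whole world states. Only the preference has any reason to care about the red teleport button.

```python
def urge_fires(world: dict) -> bool:
    """Urge: a context-triggered rule. It fires only when a cookie is actually in view."""
    return world.get("cookie_in_view", False)

def preference_score(world: dict) -> int:
    """Preference (hypothetical): a utility over world states, counting cookies eaten."""
    return world.get("cookies_eaten", 0)

def would_press(button_result: dict, world: dict) -> bool:
    """A preference-maximizer presses a button iff the resulting world state scores higher."""
    return preference_score(button_result) > preference_score(world)

no_cookie_room = {"cookie_in_view": False, "cookies_eaten": 0}
red_button_result = {"cookie_in_view": False, "cookies_eaten": 1}  # cookie teleported to stomach

print(urge_fires(no_cookie_room))                      # False: no stimulus, no urge
print(would_press(red_button_result, no_cookie_room))  # True: a cookie preference wants the button
```

The person in Scott's example behaves like `urge_fires`, not like `would_press`, which is the sense in which the urge is not a preference.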

Like with the blue minimizing robot, it’s tempting to resort to a Dumb Agent™ model. Maybe you really do have a preference for cookies, but there is a counter-preference for staying on your diet. Maybe proximity to cookies increases how much you value the cookie world-state. There are all sorts of weird ways you could specify your Dumb Agent™ to reproduce human cookie-eating behavior. But please, don’t.

I can’t appeal to “just use the source code” anymore, but hopefully I’m getting across the point that it’s at least a little bit suspicious that we (I) want to force all human behavior into the Agent abstraction.

So if we aren’t agents, what are we?

Hopefully that last sentence triggered a strong reflex. Remember, it’s not a question of whether or not we are agents. We are quarks/whatever-is-below, all hail reductionism. We are trying to get a better understanding of when the Agent abstraction breaks down, and what alternative models to use when it does.

This post’s main intent was to motivate this exploration, and put to rest any fears that I am naively trying to explain away agents, beliefs, and motives.

Next Post: What are difficult parts of intelligence that the Agent abstraction glosses over?

[anonymous]

This was great, and I found it to be a clearer way of pointing at how typical "straw rationalist-y" ways of looking at decision-making might not work well in certain situations. I'm looking forward to the rest of the sequence!

rk

Something I struggle with in the adaptation-executor-vs-fitness-maximiser is not dismissing too much as just 'misfiring adaptation' and allowing that some of my intuitive behaviour is sensible. I'm particularly prone to doing this with productivity: "ah, a part of me that is fairly well described as being agent-y, is being stymied by a non-agenty part that just interacts with shiny things! I should make sure that insofar as I have shininess in my vicinity, it should mostly be attached to things the agenty part of me wants to interact with!"

I don't know if that is something this sequence will cover, but I hope so!

Qiaochu_Yuan

The word "just" in "just interacts with shiny things" is giving your parts too little credit, I think. There are a lot of flavors of being attracted to shiny things and some of them look like genuinely poor impulse control but some of them look like trying to avoid feeling pain, or perhaps trying to avoid existing in some sense. Your description isn't detailed enough for me to say more, though.

Clearly this wasn't the best description, because the comment was supposed to not be giving the part in quotes much credit. So as you say, the word 'just' is needlessly dismissive, but I am dismissing the part of me that is so dismissive!

(I added some more detail in another comment)

Hazard

Some of the topics I was planning on will be related, though there might not be a direct commentary on it.

What are some specific things your paragraph in quotes refers to? I don't see that problem description as fitting nicely into "not dismissing too much as just 'misfiring adaptation' and allowing that some of my intuitive behaviour is sensible", so an example would help.

You're right, that isn't very clear. By the way, the thought in this example is not that I definitely dismiss unfairly in this case, the idea is that I'm doing it in an unnuanced way that doesn't take into account possible reasons the impulse can't be ignored. The 'silly shiny things' attitude is not able to tease apart when my behaviour is sensible from when it's not.

As a hopefully clearer example, say I'm trying to get some work done, but I'm just sending stupid messages and gifs to WhatsApp groups instead. This could be a case where I should be getting the work done and should turn off notifications on my phone, but it could be an unresolved fear of the consequences of failure. It could also be that I'm exhausted and there's no point working. It could even be that I feel isolated and need some interaction!

Does that make the case I'm talking about a bit clearer?

You might also think your work is stupid and pointless on top of all the other stuff, cf. the Procrastination Equation; that part's important too, because you might also have narratives around the kind of work that you "should" be doing, underneath which are fears that if you don't do the work you "should" be doing then something terrible will happen.

confusedperson

I think I have a bug of this form, and it’s been an issue for me for a long time.

When it’s school break or a huge weight has been lifted off my shoulders, I find that it’s peaceful for me to study mathematics because I feel like I’m not being forced to study certain things. But as soon as school starts, and I take a math class where the material starts to become unfamiliar to me, and there’s no motivation provided for why we are doing what we’re doing, it feels forced. So I become stymied by listening to music and browsing articles on psychology/cognitive science, Reddit, or LessWrong to figure out this burnout, and then bouncing back to wondering whether I should suppress my curiosity to do well in the class, or let my curiosity run free but in return not do so great in the class.

Because if I don’t suppress my curiosity and instead flow with mathematics, I feel like I’ll run the risk of diverging from the course material, and in turn I’ll do badly on exams because I didn’t focus on the required topics enough.

I’ve taken some proof-based courses like real analysis, so it’s not like I don’t know how to prove things in a typical undergrad math course. It’s just that I feel guilty not focusing on school, so I retreat to listening to music, reading psychology, or browsing LessWrong articles to escape the negative feelings about whether I should focus exclusively on the math in the class or play with math I find interesting at the risk of performing poorly in the class. I know these two activities don’t have to be mutually exclusive: you can play with math you find interesting that’s been assigned by the professor. However, the math assigned by the professor is sometimes not interesting at first, so I burn out from being bombarded by "should" statements.

Any input/advice/guidance from anyone here would be greatly appreciated. I've been having trouble fixing this bug alone.

My suggestion is that if it feels effortful to do well enough in the class to get the grade you want, then drop it. School is terrible. If you want to optimize for credentials then you can do it by taking the easiest classes that allow you to graduate in the major you want, and you can optimize for credentials completely separately from your actual learning.
