SlimeMoldTimeMold (SMTM) recently finished a 14-part series on psychology, “The Mind in the Wheel”, centering on feedback loops (“cybernetics”) as kind of a grand unified theory of the brain.

There are parts of their framework I like—in particular, I think there are innate drives (pain is bad, eating-when-hungry is good, etc.), and I think that the top priority of neuroscience & psychology should be to figure out exactly what they are and how they work.

But when we drill into details, I have a bunch of disagreements with SMTM’s account.

Just kidding! Control theory is great. I’m all for it. …I just don’t think it’s a grand unified theory of the brain. (source)

I think there’s a general failure mode of “grand unified theories of the brain”: the brain needn’t have any grand unified theory in the first place! I find that people with such theories are often doing the thing where they have a hammer and everything looks like a nail. (I throw this same criticism at the “brain = Bayesian inference” people, and the “brain = prediction” people, and the “brain = neural net with gradient descent” people, etc.).

I’m likewise generally opposed to grand unified theories of the body. The shoulder involves a ball-and-socket joint, and the kidney filters blood. OK cool, those are two important facts about the body. I’m happy to know them! I don’t feel the need for a grand unified theory of the body that includes both ball-and-socket joints and blood filtration as two pieces of a single overly-cute grand narrative. Ditto for the cortex, striatum, cerebellum, and all the other components of the brain.

So, how do homeostatic feedback control loops actually work—what’s the not-overly-cute big picture that they really fit into? Read on for my opinions on that, along with my spirited defense of brain-like AGI doomerism, of reinforcement learning and value functions in the brain, of neural tracer studies, and much more!

Anyway, onto the posts!

(Thanks SMTM for kindly chatting with me before I published, but I haven’t run this piece by them, and I wrote it quickly, and I have little doubt that there are mistakes and mischaracterizations below.)

0. PROLOGUE – EVERYBODY WANTS A ROCK

This post was a very long-winded way to say: “In psychology, just like in every other domain, we should try to actually understand how things work at a nuts-and-bolts level.”

Great! I agree.

(…Except insofar as a sufficiently good nuts-and-bolts understanding of how the brain works would enable brain-like AGI, which would be an extremely dangerous technology that we are not ready for. But I don’t think SMTM is working towards understanding the dangerous aspects.^[1] So it’s all good!)

1. PART I – THERMOSTAT

Here SMTM comes out with their big idea: feedback control is their grand unified theory of the brain. Like I said at the top, I think there’s an important kernel of truth, but it’s overstated, and I disagree with many of the details.

I’ll just start complaining about stuff they wrote, in no particular order. Then I’ll circle back to a positive account of how I think about homeostatic feedback loops in the brain.

1.1 “Feedback” is broader than the setpoint + comparator thing

SMTM uses the term “governor” to refer to a feedback loop—for example, a “hunger governor” would track energy stores and send a hunger signal if it’s inadequate.

They draw “governors” as classic cookie-cutter feedback control loops, with an input signal, comparator, and error signal. I think that’s overly specific compared to how feedback control works in the brain.

For example, consider the famous AgRP/NPY neurons of the arcuate nucleus in the hypothalamus, closely related to hunger (and hence well-studied by obesity-motivated neuroscience researchers). As I discussed here, on the input side, these neurons are (among other things) excited by ghrelin (which is emitted by an empty stomach), inhibited by Peptide YY (which is emitted by a full gut), and inhibited by leptin (which is emitted by fat cells). On the output side, they project to (among other things) parts of the brainstem related to pain tolerance (when you’re about to starve, better to focus on eating rather than invest in long-term health), to nonshivering thermogenesis (when you’re about to starve, better to save energy even at the expense of mild hypothermia), and I think indirectly to the conscious sensation of hunger^[2], and to motivation systems that make hunger seem bad and eating seem good^[3].

So, take a look at those inputs and outputs. Is this feedback control? Absolutely! Is there a setpoint, comparator, and error signal? Umm, well, kinda, but not really per se. Is there PID control, or PI, or just P? Umm, kinda none of the above, although one could argue that there are P-ish and I-ish elements.

So we have “feedback control” in a broad, qualitative sense, but I think SMTM is conceptualizing how that works too narrowly.

1.2 “Emotions” do not correspond to error signals in homeostatic feedback loops

SMTM identifies “emotions” with the error signals in homeostatic feedback loops. I disagree.

For one thing, happiness is an emotion, but obviously not a homeostatic error signal. SMTM bravely bites this bullet—“happiness is not an emotion”, they say—and instead they mount a spirited defense of redefining everyday words when our scientific knowledge grows.

But it’s worse than that. Angry people don’t generally behave in a way that systematically reduces their anger, but rather seem to deliberately brainstorm additional reasons to be angry; likewise sad people seek out reasons to be sad, anxious people will ruminate on why everything is terrible and doomed, and so on. Are we going to define anger, sadness, and anxiety as “not emotions” too? Isn’t this getting a bit ridiculous?

Even worse, their positive examples of “emotions”, like needing to pee, are not what people normally call an “emotion” at all—I think most people would call that a “feeling”.

So SMTM’s proposed definition of “emotion” seems to have approximately zero overlap with the everyday usage of the word “emotion”!

1.3 “Happiness” does not correspond to correcting error signals

I would distinguish two things: “happiness” (a mood) and “pleasure” (a feeling). I think SMTM mixes these up.

SMTM’s explicit definition of “happiness” is the latter: “When a governor sends its signal back into alignment, correcting an error signal, this causes happiness.”

For example, I have to pee, and then I pee, and that feels good. Is that “happiness”? I don’t think most people would call it that. They would call it pleasure.

(But SMTM also says “happiness is a signal used to calibrate explore versus exploit”, which I think indicates that they’re inconsistently flipping back to the “mood” definition sometimes.)

1.4 Pleasure does not really correspond to correcting error signals either

So when I correct that (IMO) terminological error, SMTM’s claim is now “When a governor sends its signal back into alignment, correcting an error signal, this causes [pleasure].” Is that right? No, I don’t think that’s right either. Or if it is, it’s only right in a way that’s basically tautological.

Like, name something that causes pleasure. Sex? Ah, it’s “correcting the error” of insufficient sex. Eating a brownie sundae? Ah, it’s “correcting the error” of insufficient salt, sugar, and fat. Getting licked in the face by your beloved puppy? Ah, it’s “correcting the error” of insufficient friendship. Enjoying a beautiful view? Ah, it’s “correcting the error” of insufficient beautiful views. You thought of something funny and it made you laugh? Ah, it’s “correcting the error” of insufficient laughter. Injecting heroin? Ah, it’s “correcting the error” of, umm, I guess insufficient heroin?

Anyway, the first few of these examples are undoubtedly useful framings. The later ones are IMO rather dubious. Anyway, why not just say “pleasure is a brain signal that says that something good is happening right now”? That’s what I say! To me, this framing makes everything about pleasure very intuitive—why it exists in the first place, and why it has the various properties that it has.

And then, sure, after we say “pleasure is a signal that something good is happening”, we can also add that “I corrected a homeostatic imbalance” is a very important special case situation in which something good is happening.

(The common term “relief” is more specifically about correcting an unpleasant error signal. I don’t think relief is its own innate brain signal, separate from pleasure. I think “relief” is a learned concept that points to a common situation in which pleasure occurs.)

1.5 My picture

Enough criticism. Again, I think feedback control is real and important. So what’s my positive account of how feedback control works in the brain? I would draw it as something like the following:

My (mildly oversimplified) sketch of how homeostatic feedback control loops work in the brain. See text.

Let’s go through it. I drew “Learning Subsystem” and “Steering Subsystem” boxes, as I am wont to do, see discussion here & here. The red circle is an example of what SMTM would call a “governor”.

(a) The bottom-left input is one or more “ground truth” signals from the body—bladder stretch receptors or whatever. For other governors, these signals might be detecting blood osmolality, temperature, energy reserves, salt levels, or whatever else.
(b) The bottom-right output is saying that the governor probably has one or more direct “involuntary” motor outputs to the body, related to the kinds of homeostatic actions that are generally outside conscious control—maybe relaxing the bladder wall in this case. For other governors, these signals might be adjusting nonshivering thermogenesis, vasoconstriction, goosebumps, hormone levels, and so on.
(c) The top-left signal is an interoceptive sensory input, leading to conscious awareness of a feeling of having to pee. (More discussion here.) This particular signal probably lands in the insular cortex, which houses a “primary interoceptive cortex” much like the “primary visual cortex”, “primary auditory cortex”, and so on.
(d) The top-middle signal is a motivational signal, which leads to displeasure related to the feeling of having to pee, and pleasure when that feeling is relieved. It probably lands in one or more parts of the basal ganglia.
(e) The top-right signal is an involuntary attention control signal, which makes it difficult to think about anything else besides the need to pee.
(f) The purple arrow labeled “motivated behavior” is where the person eagerly seeks out a bathroom, thanks to the motivations stemming ultimately from (d) (but signposted by (c), to make the learning algorithm’s job easier).

Some more commentary:

Conscious awareness is in the top box (“Learning Subsystem”). In terms of Hume’s is-ought distinction, (c) is “is”, while (d) is “ought”.
(d) and (e) can kinda work at cross-purposes: if I’m anxious, then it’s unpleasant to think about all the different terrible things that are going to happen to me. So (d) is saying: stop doing that! But then (e) laughs and says “lol, nope”. (More discussion here.)
While (b) and (f) are both behaviors that subserve homeostasis, they’re very different! (f) leverages our life experience and intelligent understanding of the world, whereas (b) is basically an innate reflex.
I’m leaving some things out for simplicity; also, some of these ingredients may be absent or inactive in certain cases. Multiple “governors” can interact with each other, or one “governor” can consist of multiple interconnected cell groups, etc.
The distinction between “voluntary” and “involuntary” is often conflated with the distinction between “ego-syntonic” and “ego-dystonic”, but they’re actually quite different, cf. my handy diagram ↓. “Voluntary” means that it’s controlled by reinforcement learning, and happens if and only if the idea of doing it is positive valence (a.k.a. motivating). “Voluntary and ego-dystonic” means that we do find it motivating, but we wish that we didn’t.

OK, so that’s my view on feedback control in the brain.

As for other things: What about emotions? “Emotion concepts” are their own thing (see Lisa Feldman Barrett versus Paul Ekman on facial expressions & basic emotions), but “emotions” (as I use the term) more-or-less refer to innate reactions in the hypothalamus or brainstem. And then “moods” are the special case where it involves a state variable in the hypothalamus or brainstem persists over minutes-to-hours. These can have any number of triggers and any number of effects, which might be (but need not be) related to one or more homeostatic feedback control systems.

2. PART II – MOTIVATION

SMTM talks about how “governors” (feedback control systems) compete with each other for control. SMTM doesn’t make very strong claims here (“We can’t say exactly how the selector works, there are too many mysteries, lots more work to be done, a lot of possible lines of research.”). But I have some skepticism about what they do say.

2.1 Do governors vote for different voluntary actions?

My answer is “at the end of the day, yes, but not in the specific mechanistic way that SMTM is proposing”.

Instead, I think the path is

(Step 1) “governors” contribute to the ground-truth reinforcement-learning (RL) reward signals,
(Step 2) the RL reward signals guide voluntary actions and behaviors.

I’ll defend this position when we return to this topic in §4 below.

2.2 Is there a “gate” that stops votes below a minimum threshold?

My answer to that question is: basically no. But a kernel of truth in the vicinity of this idea is: a plan needs to be motivating (positive valence) for you to do it. If you’re in deep clinical depression, every thought and plan is negative valence (see here). This means: no voluntary motor control, no voluntary attention control. You’re lying in bed, unable to move or even think.

But most of the examples that SMTM invokes for their “gate” theory are not like that. Consider their example: “If you are a tiny bit hungry, you shouldn’t bother leaving the house to get a meal, even if there is nothing better to do.” I don’t think “gates” are the right way to explain that. Instead I would say: (A) going to get a meal is a plan, (B) staying home and watching TV is also a plan. Your brain compares those two plans. They each have advantages and disadvantages. For example, we generally have an innate drive to minimize energy expenditure (unless we’re in a restless, fidgety mood, in which case it flips sign), and that innate drive is a pro tanto reason to prefer (B) over (A). If you’re hungry, then hunger provides a pro tanto reason to prefer (A) over (B), but it’s a very small thumb on the scale if you’re only slightly hungry, so other considerations will win out. That’s not a “gate”, it’s just a matter of having better things to do.

2.3 How do we think about “executive function” and “self-control”?

SMTM indulges in an extremely common intuition in which “executive function” is a kind of agent, and this agent possesses a superpower called “willpower” (a.k.a. “free will”), and it deploys that superpower in battle with other forces called “urges”. However, this whole way of thinking is cursed nonsense—it’s a deeply misleading perceptual illusion, hiding an underlying picture that is much simpler: willpower is not a magical force, and the brain is just running a step-by-step algorithm involving innate drives and reinforcement learning, all the way down.

However, after relying on these cursed intuitions for our entire lifetime, it takes a lot of work to untangle them. I have attempted to walk readers through this process in my post [Intuitive self-models] 8. Rooting Out Free Will Intuitions.

SMTM, to their credit, hints that something closer to my view might be the right answer, even if they haven’t fully sorted it out for themselves: "Eventually we may discover that what appears to be “self-control” is actually just the combined action of social emotions like shame. It may be that there is no such thing as an executive function, and what feels like self-control is really the result of different social emotions, the drives to do things like maintain our status or avoid shame, voting for things that are in their interest…” Yup, that’s the right idea.

2.4 Is suffering good when it’s under your control?

SMTM asks why people seek things like “roller coasters, saunas, horror movies, extreme sports, and even outright suffering”. Their answer is that “suffering” is only bad when it's not under your control.

I think I would just say that none of these had anything to do with suffering in the first place. People do these things because they enjoy them. And why shouldn’t they enjoy them? Whoever said that roller coasters shouldn’t be enjoyable? That’s their whole point!

More specifically, I think there’s a thing where physiological arousal can by itself cause pleasure under certain conditions, as described by words like “exciting”. I think it’s partly related to play drive (see A Theory of Laughter for mechanisms and evolutionary explanation). To be clear, physiological arousal can also cause displeasure under other circumstances (cf. “feeling overwhelmed”). It’s complicated.

So it’s not about being in control, and it has nothing to do with suffering. It’s just that pleasurable things are pleasurable, and things can be pleasurable for all sorts of non-obvious reasons.

3. PART III – PERSONALITY AND INDIVIDUAL DIFFERENCES

I have nitpicks about a bunch of the details, but one of the big ideas here is that personality differences are often related to the strengths of various innate drives. I enthusiastically agree with that part, and bring this up often myself (e.g. here, here).

I think their discussion of autism and psychopathy is a bit off—for my take, see shorter version here, or longer version here with psychopathy at the end.

4. PART IV – LEARNING

Here SMTM comes out against the ideas of “reward function” and “value function”. Well, I like those ideas, although I don’t think they manifest in the brain in exactly the same form as they do in the AI literature. Instead, SMTM proposes a system where governors vote directly for plans and ideas.

By contrast, copying from §2 above, my preferred model says:

(Step 1) “governors” contribute to the ground-truth reinforcement-learning (RL) reward signals,
(Step 2) the RL reward signals guide voluntary actions and behaviors.

Some relevant evidence that (IMO) support my model would be:

(1) See the literature on “hedonic interface theory”, e.g. Anthony Dickenson’s watermelon story that I copied here.

(2) Sometimes I don’t drink water now so I won’t have to pee later. Is the need-to-pee governor supposed to figure out that that’s a good idea? But the need-to-pee governor is right now in a state where the error signal is zero, so shouldn’t it be inactive? (Maybe I’m misunderstanding this part.)

(3) We learn and know a lot of things that are not tied to any particular governor, e.g. how to spell words.

(4) …Honestly, the thing is, I think I have this great nuts-and-bolts model of what the cortex does, and striatum, and hypothalamus, etc., and it all fits with my understanding of algorithms and evolution and everyday life and neuroscience data etc., and I’m reluctant to drop that model in favor of this alternative model that I find vague and confusing, and that IMO involves kinda personified components that I am unable to flesh out into nuts-and-bolts algorithmic steps.

One more specific dispute is whether there is such a thing as “value” and “reward” in the brain. I think there is—I call it “valence guess” and “actual valence” respectively in this diagram (well, the correspondence isn’t perfect, see also here). SMTM offers some arguments against this, but I don’t find them convincing.

For example, SMTM writes: “…And there is nothing at all that is always ‘rewarding’. Your first donut after a long day at work will be rewarding, but by the 10th donut you will start to find donuts ‘punishing’.” But that’s not an argument against “reward” or “value”, but rather against the assumption that “reward” and “value” are real-world stuff, as opposed to signals in the brain.

It’s not a paradox to say that eating-donuts-when-I'm-hungry is rewarding and eating-donuts-when-I'm-full is not, just like it's not a paradox to say that chocolate donuts taste good but mud donuts taste bad. They're two different things!

More generally, in brain planning, value is assigned to consequences as much as to actions. “Feeling the relief from cold” is an expectation, just as “seeing the lever go down” is an expectation. Expectations can be motivating. Our world-model is sophisticated enough to predict that, when I'm cold, pressing the lever will lead to feeling-of-relief-from-cold. And our value function is sophisticated enough to recognize that feeling-of-relief-from-cold is good, based on past experience.

Again, I think SMTM is anchored on a specific conceptualization of how reinforcement learning works, based on how it works in AI today, whereas I think of reinforcement learning as a broad paradigm with many yet-to-be-invented variations. See more of my RL-related opinions here & here.

5. PART V – DEPRESSION AND OTHER DIAGNOSES

Again, my take on clinical depression is that the proximal cause is that every thought seems bad (negative valence), so the thought gets thrown out and the brain goes fishing for a new thought, but that one seems bad too, and so on. In the worst cases, you wind up laying in bed, unable to do any voluntary attention control or motor control at all.

That’s the proximal cause. What’s the upstream cause? I don’t really know. I think it can be lots of things. For example, if you have sufficiently severe OCD, then maybe 50% of your thoughts are “I’m going to not wash my hands now, and then get sick and die”, and the other 50% of your thoughts are “I’m going to wash my hands now, thus exacerbating how my OCD is ruining my life and relationships”, and so 50%+50%=100% of your thoughts are negative valence.

Anyway, SMTM’s proposal is that “depression [is] a disorder of negative-feedback-loop emotions”. I guess that’s not mutually exclusive with what I wrote, but I couldn’t really follow their argument (probably because I don’t sufficiently buy into their detailed picture), so I’m not sure.

One of their claims is that “everything is going great” is a risk factor for depression, I guess? “Some people have such perfect control over their life that they never meaningfully get hungry, thirsty, tired, lonely, cold, etc. etc. This sounds good, great even. But in fact the person ends up very depressed, for a simple reason. Nothing is actually wrong with this person, there’s no damage, it’s not even really a malfunction. But the fact that they almost never correct major errors means they very rarely produce any happiness.”

Well, I’m skeptical that “everything is going great” is in fact a risk factor for depression. I’d be interested to see statistics that back that up.

My understanding is, quite the contrary, that an episode of depression is often triggered by something going wrong, like the death of a loved one. That event naturally makes the person feel very miserable. And then instead of recovering, they spiral into a months-long depressive episode. Someone can correct me if I’m wrong.

6. PART VI – CONFLICT AND OSCILLATION

I think anxiety (a.k.a. stress) is just another innate reaction with all the ingredients in the diagram of §1 above: a conscious feeling of anxiety, involuntary attention to that feeling, negative valence, and involuntary actions like physiological arousal. What about the input? What triggers it? I think it’s an expectation that something bad (displeasure / negative reward) is going to happen. Or something like that—I haven’t tried to write out the relevant circuits in detail, although I’m optimistic that I could if I tried. I should do that sometime.

SMTM’s account is kinda more specific and complicated, and I don’t really buy it. Something about different governors duking it out. But I can also be stressed because I’m in a prison cell and the guard is going to come by this afternoon and beat me senseless. SMTM mentions this at the end, but only as a funny edge-case, whereas I see it as a central example that incidentally explains everything else.

7. PART VII – NO REALLY, SERIOUSLY, WHAT IS GOING ON?

SMTM makes an eye-catching claim in this post: “The really weird implication here is that a faster form of exposure therapy might be to get you to freeze and cower, and then actually hurt you! This would hopefully teach your fear and pain governors that ‘freeze and cower’ has negative value, and it should get you to that conclusion faster. That said, the ethics of actually harming your patients seems questionable.”

Ethics aside, I predict that this would not work.

Anyway, I think everything in SMTM’s post 7—at least the parts of it that I skimmed—are downstream from parts of their model that I already don’t buy, particularly the way that SMTM is thinking about these governors and how they are updated in the course of learning. See above; I don’t have much to add here.

7½. INTERLUDE – I LOVE YOU FOR PSYCHOLOGICAL REASONS

This post is kind of a digression into how cybernetics fits into the history of psychology.

I wonder whether SMTM is overstating the extent to which behaviorist psychologists were ignorant of innate drives (which I believe the behaviorists called “primary rewards”, “primary reinforcers”, and/or “primary punishers”)? But I dunno, I don’t really know the history.

Again, I join SMTM in advocating that experts spend more time studying innate drives in general (and social innate drives in particular), as mentioned above (see e.g. here). In particular, I find it very frustrating that almost 100% of the neuroscientists with a knack for algorithms spend their energy studying the cortex, thalamus, and basal ganglia (which would unlock powerful and dangerous AGI), and don’t give a crap about the hypothalamus and brainstem (which would be helpful for Safe & Beneficial AGI). Speaking of which:

8. PART VIII – ARTIFICIAL INTELLIGENCE

SMTM proposes that these kinds of feedback control systems offer a safe way to make powerful future AI. I disagree.

Let’s start with a simple positive case that AI will be dangerous:

The path we’re heading down is to eventually make AIs that are like a new intelligent species on our planet, and able to do everything that humans can do—understand what’s going on, creatively solve problems, take initiative, get stuff done, make plans, pivot when the plans fail, invent new tools to solve their problems, etc.—but with various advantages over humans, like the ability to make copies of themselves and think at superhuman speed.
Nobody currently has a great plan to figure out whether such AIs have our best interests at heart—not at the designing & programming phase, nor while they run. We can ask the AI, but it will probably just say “yes”, and we won’t know if it’s lying.
The path we’re heading down is to eventually wind up with billions or trillions of such AIs, with billions or trillions of robot bodies spread all around the world.
It seems pretty obvious to me that by the time we get to that point—and indeed probably much much earlier—human extinction should be at least on the table as a possibility.

SMTM’s solution is vaguely like: “well, what if we make AIs that don’t work tirelessly, creatively, and autonomously towards ambitious long-term goals?” Umm, OK. I’m sure that we will make such AIs, and indeed we’re already doing so. But people will also make the dangerous kind of AIs.

A few more specific issues that SMTM seems to be pointing at:

Maximization versus minimization. But that’s kind of irrelevant, because maximizing X is equivalent to minimizing –X, or 1/X (for X>0).

Open-ended problems versus solvable problems. Intuitively, “put three paperclips in the bin” is a solvable problem, whereas “make as much money as possible” is an open-ended problem. Two issues are: (1) it’s possible to try to make an AI whose motivations are not open-ended, but nevertheless make one with open-ended motivations by accident, for various reasons including the alignment problem, (2) there are clearly some people who will explicitly try to make AIs with open-ended motivations, even if you and I think it’s a bad idea.

Convex optimization versus non-convex optimization^[4]. Convex optimization is things like optimizing stock at a warehouse. Too much or too little is bad, right in the middle is perfect. Innate homeostatic reactions are like that: you want just the right amount of vasoconstriction, just the right amount of nonshivering thermogenesis, just the right amount of sweating, and so on, to optimally balance everything.

By contrast, non-convex optimization is things like making a billion dollars, or winning at chess. This involves planning, creative exploration of different strategies, instrumental reasoning, and so on.

I think SMTM is implicitly suggesting a convex optimization kind of picture: “Reward maximizers are always unstable. Even very simple reinforcement learning agents show very crazy specification behaviors. But control systems can be made very stable.”

…But people want to make the kind of AIs that can earn a billion dollars through creative out-of-the-box planning! They are trying to do that right now, and they have been trying to do that since the dawn of AI. And that’s the problem that I want to talk about.

Lazy AI versus indefatigable AI. If you imagine some achievable goal like “improve the solar cell design”, you might imagine the AI spending a reasonable amount of time and resources to do reasonable things, like what humans might do—run simulations, use the available lab space for experiments, etc. If it finds a decent design, then it will submit it and declare victory. Whereas you might not imagine the AI “going hard”—making as much money as possible to buy more experimental facilities, hacking into competitors’ systems to steal trade secrets, preventing the humans from ever turning it off so that it has more time for experiments, and so on.

Humans generally don’t “go hard” in this sense for various reasons, including a desire to follow norms, but also just laziness! Good enough is good enough.

So why not make AIs like that? Two problems.

First, the competitiveness problem. If I make a lazy AI that does straightforward things then quits, then good for me, but the person down the street may well make an indefatigable AI, because that’s a better way to earn money and do impressive things. (This comes with the price of risking catastrophic escape and takeover. But people have bad judgment! It’s going to happen sooner or later.)

Second, the reflective stability problem. If I make a lazy AI, it might just comment out the “laziness” part of its code, or spin off a non-lazy copy of itself. This doesn’t require much effort, and it leads to marginally better results (from the AI’s own perspective), so why not? It’s a bit like a person choosing to take Adderall.

…It’s also worth remembering that the absurdity heuristic for AI goals and motivations is a bad guideline. For example, fun fact, I’ve already eaten 40,000 meals in my life. But I still want to eat another meal right now. Isn’t that absurd? Isn’t 40,000 meals enough already? And yet, apparently not! So by the same token, if an AI has a 23% efficient solar cell design, is it really going to want to work tirelessly and ruthlessly, perhaps wiping out humans in the process, for a chance to get the efficiency up to 23.1%? It sounds absurd from our perspective, but it’s entirely possible!

8.1 Do humans do “instrumental convergence” stuff?

SMTM writes: “instrumental convergence is not a problem for cybernetic agents”. I disagree.

How would one construct a counterexample? Easy. Recall that instrumental convergence is what naturally happens when an agent cares about the state of the world in the future. So let’s put a human in such a situation!

Meet Joe. Joe is a smart ambitious guy who gets things done—his day job is as a startup founder, and he “unwinds” by welding and programming robots in his garage, running a community campaign to fix the playground, etc.

A mob boss has just made a highly credible threat to murder both of Joe’s young children, exactly 10 years from today. So now Joe cares very much about the state of the world in the future—he wants his children to live.

Accordingly, you’ll find that Joe does most or all of the “instrumental convergence” things in response. He will try to accumulate flexible resources like money (to hire security staff), and friendship with the police chief and local government officials. He will try to not die for those ten years, so he can be around on the critical day. He will try to not get brainwashed into not caring about his children, and more generally will be motivated to never lose sight of what he’s trying to do. For example, if he learns that Buddhist enlightenment, or acid, would make him feel less ambitious, and more content to let the world be as it is, then he’ll stay the hell away from Buddhist enlightenment and acid! He will increase his own knowledge and capabilities, such as an ability to learn fast and think on his feet, to the extent that this is possible. Etc.

…So there’s nothing special about “cybernetic agents” like humans that prevents them from feeling the tug of “instrumental convergence”. Rather, most humans don’t aggressively seek power and influence most of the time because they have innate social and moral drives that make them want to follow norms and fit in. (And let’s remember that Hitler and Stalin were cybernetic agents with innate drives too.) By the same token, the fact that (IMO) future powerful AI will have innate drives doesn’t mean everything is fine. It means we have our work cut out to figure out exactly what innate drives to put in.

9. PART IX – ANIMAL WELFARE

SMTM suggests (IIUC) that feedback control loops imply the possibility of pain and suffering, and moral consideration. I don’t buy it. (My opinion about consciousness is hinted at here.) I wonder whether SMTM would ascribe moral consideration to a bimetallic strip thermostat?

10. PART X – DYNAMIC METHODS

Here SMTM proposes experiments to sort out what the innate drives are and how they work. I’m very enthusiastic about this goal, as I keep mentioning. But I think actually sorting these things out experimentally would be much harder than SMTM would have us believe.

For many, perhaps even all, of their proposed experiments, I picture myself seeing the future results, and I ask my future self whether he will change his beliefs about what is or isn’t an innate drive on the basis of those results. And the answer keeps coming back “no”.

For example, SMTM proposes to see whether people prefer iodized versus non-iodized salt. Suppose they do. Does that mean they have an innate “iodine governor”? Well, maybe. Or maybe the two types of salt taste slightly different, and one kind of salt subconsciously reminds people of the taste of their favorite regular pizza place.

Ah, but what if the experiment shows that people prefer iodized salt more when they’re iodine deficient? Hmm, well, I guess that would be a bit more convincing, but I still have questions. How were the people made iodine deficient? Couldn’t that interact with what they feel like eating in other ways besides the iodine per se? Might those other ways be modulated by their life history of iodized salt correlating with some food types more than others? Or here’s another thing: maybe iodine deficiency causes general mild bad feelings or lack-of-energy or something, and maybe those symptoms go away after eating iodized salt, and maybe the brain has, over the course of life experience, noticed that delayed correlation and thus started to associate good vibes with the taste of iodine in conjunction with correlates of iodine deficiency? That story would be a bona fide iodine-deficiency-dependent preference for iodine, but developed through pure within-lifetime learning, with no innate iodine governor at all.

(And conversely, if these kinds of experiments reported a negative result, I would still be totally open-minded to the possibility that there is in fact an innate iodine drive. Maybe it’s weak! Maybe it’s only present during childhood! Maybe it’s overruled by various learned taste associations! Who knows?)

That’s just one example. Everything was like that.

…But maybe the more damning piece of evidence is that when SMTM actually tried to reason from everyday observations and common sense about which innate drives do or don’t exist, many of their conclusions were things that I happen to disagree with!

So, what kind of evidence would convince me that such-and-such innate drive (“governor”) actually exists? Ooh, I know one thing! I would be convinced by a nuts-and-bolts hypothesis wherein:

there’s a particular cell group, probably in the hypothalamus, doing a particular thing,
this hypothesis is algorithmically plausible, neuroscientifically plausible, evolutionarily plausible, and compatible with everyday experience,
this hypothesis is borne out by targeted neuroscience experiments, especially neural tracer (or connectomics) studies that prove that the cell group has the inputs and outputs posited by this hypothesis, and/or optogenetic studies, lesion studies, or other observations and manipulations of this cell group showing results consistent with this hypothesis.

This is a high bar! (It’s sufficient, maybe not fully necessary?) But there are in fact things that pass this bar! Two examples are: The literature on AgRP/NPY neurons in the arcuate nucleus of the hypothalamus (see §1 above, also here), and the recent Liu et al. 2025 study showing a rodent “loneliness” homeostatic feedback control circuit.

…And as a possible future third example, in A Theory of Laughter, I’m at least part-way there, with a nuts-and-bolts hypothesis for laughter and play drive that more-or-less passes the first two bullet points. As for the third bullet point, in §5 of that post, I describe a specific neural tracer experiment that I think would pin down the exact cell group that plays a key role in this alleged circuit.

I also hope that the post Neuroscience of human social instincts: a sketch can eventually lead to more examples—but for that one, I haven’t quite pinned things down in enough detail to propose an experimental test, so far.

11. PART XI – OTHER METHODS

More on possible experimental tests. Ditto previous section.

12. PART XII – HELP WANTED

More proposed projects. Again I strongly endorse the broader idea of reverse-engineering innate drives, even if I question some of these detailed proposals.

Good on SMTM for setting their sights high and thinking big thoughts! I think they got some important things right, and I’m hopeful they can keep iterating their way forward.

^{^}
To make a long story short, I think the danger comes from figuring out exactly how the cortex and thalamus (and to a lesser extent, basal ganglia) work. Whereas this SMTM series has (IMO) more to do with the hypothalamus, which is mostly great to reverse-engineer, from my perspective.
^{^}
The pathway might be AgRP/NPY neurons → parabrachial nucleus → insular cortex maybe?
^{^}
The pathway might be AgRP/NPY neurons → parabrachial nucleus → VTA maybe?
^{^}
h/t @tailcalled

[-]Jonas Hallgren8mo10

I'm not sure of my reading of this was correct but I will describe it below in the hopes of someone engaging with me and telling me if I'm wrong or not. (Low confidence in the following claim:)

This view seems like what you would have if you thought that P(y|x) would be here:

y is what the brain is doing as an agent and x is the prior which in this case is that the human body is a homeostatic control system trying to minimise the time it spends out of balance then it seems you might converge to a view like this?

There's also a couple of ther assumptions around:

Fixed points within our evolutionary past not mattering as much as the functional of homeostatic control.
Emotions not being related to exploration somehow? (In that there's no innate positive drives?)
That the brain is a relatively straightforwardly coupled control system?

It seems a bit like taking dynamic programming and saying that it is what Reinforcement Learning is? You're missing part of the algorithm (the action part of the action-perception loop?).

(I also pre-emptively apologise for invoking Active Inference in a Steven Byrnes comment field)

LESSWRONG
is fundraising!
LW