Social agency

Elias Schmied

Crossposted from Substack.

I wrote this three years ago, before becoming extremely depressed and developing a lot of aversiveness around it (even though I had gotten a bunch of positive feedback). As a result, it’s a bit “out of step” with the current state of the conversation, and the writing is not fully up to my current standard. I still believe the core idea could be very valuable though, and wanted to get it out there.

January 2023

This is a braindump sketching out a major change in intuition that I went through a few months ago, and that I would guess either hasn’t been experienced by most people who are thinking about AI or hasn’t been properly updated on. I’m not going to hedge as much as I naturally would, to get my point across. I have a decent amount of uncertainty of course, especially about the specifics, and I also barely know anything about the relevant fields.

Summary

There’s a model of how agency works that lots of people are explicitly or implicitly assuming that goes like “During the training of an intelligent agent, low-level reflexes generalize to heuristics, which in turn generalize to a general planning algorithm”. I believe this isn’t what happens in humans, but that “planning” is a bunch of superficial, distinct, socially learned behaviors itself, that are not learned primarily/at root through feedback about how well they fulfill your goals. I think this has some important consequences for thinking about AI - for example, it leaves us with no reason to think there is such a thing as a simple core of agency, and it leaves us less worried about inner misalignment, since sophisticated planning and reasoning is not acquired inaccessibly in the agent’s cognition, but learned by itself.

The minimal takeaway is that even if I’m wrong about my interpretations here, introspective evidence about cognition seems extremely neglected, and the fact that seemingly no one is having the debates I’m gesturing towards in this essay is crazy.

The common model

There’s a model of the emergence of general planning/agency that goes something like “Low-level reflexes generalize to heuristics, which in turn generalize to a general planning algorithm”. I would guess that MIRI believes this, since Yudkowsky talks about “safe” or “unsafe” tasks (with respect to AGI arising) and about how humans “generalize” from the savannah to the moon. Even the shard theory people, who in some ways define themselves as being contra MIRI, seem to believe that a general planning algorithm gets bootstrapped out of low-level motor command planning. I would also guess Steven Byrnes believes this (see below).

I don’t think this is what happens in humans.

My alternative

Here’s Steven Byrnes’ example of a “foresighted plan” (prinsesstårta is a type of cake, and the “plan” is to order it)^[1]:

This is framed as the brain planning using its self-supervised learned world model. But what I think is actually happening is that Steven has a socially learned association between being hungry / thinking about food and ordering food far in advance. (I could also imagine there being a self-image of being someone who treats themselves sometimes, or someone who is disciplined/rational enough to pursue delayed gratification - there are a lot of possibilities.) I’ve literally never thought about ordering food a week in advance, even though I’ve enjoyed cake a lot too - it’s not a socially learned affordance to me.

Calling this “planning through a world model” stretches the concept for me. It’s a much smaller world model than is portrayed here, namely (I would guess) only eating the cake is viscerally modeled and then there is a socially learned belief-about-concepts/vocalization/story of “If I order food, food will arrive in a week”. (plus imagining eating the cake / generally being hungry being associated with the behavior of ordering food).

It’s not clear to me how “order food -> food will come” is even supposed to be learned by the brain’s self-supervised learning/predictive processing or RL. The prediction error/reward comes in a week after the prediction. And if it’s somehow deduced from higher-level knowledge about the world - how did that get learned? I think this is called the “temporal credit assignment problem” in RL and neuroscience (how do we correctly identify and reward the actions responsible for long-term outcomes?) - I guess my thesis is that there is a simple explanation which fits the evidence better, which is that it actually doesn’t get solved, and humans don’t viscerally model the wider world.
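To make the delayed-reward worry concrete, here is a minimal toy sketch, not a claim about how the brain or any particular RL system works. It assumes tabular TD(0) learning on a deterministic chain with one state per day and a single reward at the very end; the function name `episodes_until_credit` and all of its parameters are hypothetical, chosen only for illustration. Under these assumptions, value information moves back by one state per episode, so the first action (“order food”) only receives any credit after as many full repetitions as there are days of delay.

```python
def episodes_until_credit(delay_steps, alpha=0.5, gamma=1.0, max_episodes=10_000):
    """Count episodes until the first state of a reward-at-the-end chain
    gets a nonzero TD(0) value estimate.

    Assumption (illustrative only): state 0 is "order food", each later
    state is one more day of waiting, and the only reward (1.0) arrives
    on the final transition.
    """
    values = [0.0] * delay_steps  # value estimate per state, all start at zero
    for episode in range(1, max_episodes + 1):
        # Walk the chain forward in time, as the agent would experience it.
        for state in range(delay_steps):
            is_last = state == delay_steps - 1
            reward = 1.0 if is_last else 0.0
            next_value = 0.0 if is_last else values[state + 1]
            # Standard TD(0) update toward reward + discounted next value.
            values[state] += alpha * (reward + gamma * next_value - values[state])
        if values[0] > 0.0:
            return episode
    return None  # no credit reached the first state within max_episodes


# With a seven-step delay, the first action gets credit only in episode 7.
print(episodes_until_credit(7))  # 7
```

Methods such as eligibility traces or model-based rollouts propagate credit faster in this toy setting, so the sketch only shows why the naive version struggles with week-long delays; it does not settle whether the problem goes unsolved in humans.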

I’ve gotten into the habit of trying to model what’s going on when I experience an impulse for an action or reasoning step that could be interpreted as “long-term planning”, and it seems to me that it’s all actually just a bunch of superficial, distinct, socially learned behavioral patterns, rather than any planning through a world model or any general/sophisticated heuristics for accomplishing long-term goals (in the domain of long-term planning, to be clear; obviously we have a bunch of very general heuristics for navigating our immediate physical and social environments).

An uncontroversial example is when the average person is getting close to finishing high school and (say) starts thinking about which college they want to go to - clearly they are only in an eyebrow-raisingly concept-stretching way “planning to optimize for their long-term goals” - they are looking at colleges because they feel like they have to because that’s the normal thing to do, or because they don’t have an internal affordance to do anything else, or because it feels like that’s what you do if you’re the kind of person they want to be.^[2]

So in that case, it’s probably intuitive. But I think all human behavior is like that, just in more subtle ways.

For example, on the more sophisticated end of the scale, a very agentic person with good epistemics became this way not because they’re smarter. In the best case, they have a mental motion of paying attention to small doubts, biases and known failure modes of what they are doing - but this has been internalized via an escalating internal social desire to be a smart, diligent and exceptional person (probably combined with more specific memes), not in any direct way because they’re more intelligent (of course, being more intelligent helps with learning in general, but it’s not the cause of learning any particular behavior). And in any particular case of this person planning ahead or contemplating a decision, they are internally (sequentially) applying some of their portfolio of self-socially learned patterns based on how much they get activated by the given mental context.

It would probably be useful to add more examples, e.g. of someone reasoning about the wider world and making a decision, or of someone changing their mind, but this is already way too long.

Why do humans today look and behave so much like agents then? Why are agentic stories so easy to tell about our behavior? (“I want to get this job” etc). I think it’s that behavioral patterns that involved some goal-directed behavior got memetically selected for (assigned higher status through people acting with those patterns being more successful) over the past (tens of?) thousands of years, and so achieved higher rates of being reproduced through mimesis / imitative learning (and this process has probably intensified as people’s memetic environments became bigger and more interconnected - cultural FOOM). In other words, cultural evolution has preselected our behavioral impulses to be vaguely goal-directed for us.

More abstractly, I think the reason why agency arose out of deeply social animals is that your reward signals being dependent on other agents’ approval makes the behaviors that you can learn extremely variable, and allows selection among them to take place.^[3]

An example of such a very advanced mimetic role: the social role of the entrepreneur/founder (e.g. in Silicon Valley) gets you a lot of status if successful and intrinsically requires you to have a self-narrative of goal-directed behavior - in addition to lots of smaller behavioral patterns that help you succeed at founding companies that you get acculturated to (e.g. work hard, be flexible, push people, ask for help, solve problems).

Some arguments and intuition pumps

It fits the behavioral evidence much better: the deep irrationality, inflexibility, lack of agency and status quo bias of humans, the way ideology is immediately reified, the way that we change our mind mostly when something socially significant happens to us, the way that nonsocial low-level desires/goals don’t matter for our long-term goals (e.g. most drug addicts don’t long-term plan to get drugs). In retrospect, there was very clearly a constant slight doesn’t-quite-fit in the way that humans are modeled as “His goal was X, but he was biased in way Y” - at some point, if you keep adding on epicycles and supposed imperfections to the hypothesis that human beings algorithmically plan ahead, maybe there’s just no there there after all.
It fits the introspective evidence much better - my thinking about this only started because I was trying really hard to model myself, my desires, and psychology in general, around August and September 2022. All that I feel happening in me are simple behavioral patterns easily explained as triggered by my immediate mental context - I don’t feel myself (algorithmically) planning ahead.
A priori, we would expect the first, naturally arising, agent to attain this agency in the stupidest, most hacky way possible.
This hypothesis is strictly simpler because there is no additional cognitive process of planning or modeling the wider world, so insofar as we think it fits the evidence we should prefer it.
I would guess that humans provide an untapped wealth of evidence about cognition as well, not just alignment. In the MIRI framing, AGI is seen as so alien that evidence from humans isn’t worth much, and in the strains of thought around concrete ML safety, it feels like there is a disinclination towards speculative-feeling reasoning. Stepping back, it actually seems very weird that people aren’t basing their models of general intelligence/agency/etc on the one example that we have immediate and introspective access to, and a priori I think we should expect a community of such people to be missing something really important.
[Added in 2026]: Clearly, current LLMs are in an important way agentic - they pursue coherent goals in e.g. complicated coding tasks, think of new options to try, decompose tasks into subtasks - but simply via imitating human planning patterns (and the right ones among those being reinforced in post-training), not because there was any simple core of agency or sophisticated planning algorithm that was found.

Some (oversimplifying) catchphrases:

Planning and reasoning is behavior, not cognition. (socially learned behavior, to be exact).
Agency is first a behavioral, not an algorithmic/functional/internal (?), property of a cognitive system.
Human long-term planning and agency was bootstrapped out of local/internal social-symbolic maneuvering.

Miscellaneous comments and caveats

This misconception can probably also be seen as a consequence of the intelligence fetishism of rationalists and nerds in general. There’s probably something to be said about enactivism, and a more mundane conceptualization of intelligence as the ability to adapt to one’s environment or something.^[4]
It feels like there is in some way a deep philosophical mistake being made in thinking that modeling the world or general planning, on the algorithmic level, is at all tractable for the brain’s learning algorithms. It seems like people mostly don’t realize that there even is another option? In retrospect, it’s very clear how the story of “the brain learns to move its limbs, then how to affect its immediate physical environment, andthenitsuddenlymodelstheentirerestoftheuniversedontworryabouthowthishappens” functioned as a semantic stopsign / fake reified abstraction / the part where my eyes glazed over in my models.
A less radical / more incremental way of putting all this might be: Long-term planning, credit assignment on very delayed reward, and modeling the wider world are computationally intractable, and so the agentic behavior of cognitive systems is much more determined by the learned “prior”^[5] about what kind of actions to take, what kind of stories to act out (most importantly in practice, “this is the kind of plan that a helpful hard-working AI assistant would come up with”). We can also see this introspectively in the example of humans. So the update I am talking about is simply one of downgrading the influence of competence/feedback from reality/goal-directedness/instrumental rationality/instrumental convergence in stories of how powerful cognition would work, and upgrading the importance of the learned superficial agentic behavior/internalized prior over plans/“just do the thing you were trained to do” (this is very vague, I’m sorry).
Again trying to be properly nuanced (and stepping out of the non-hedging intuition-pumping mode the rest of the post is in), humans definitely show some interaction (more than a naive application of the ontology I’ve expounded here would imply) between the social behavior and the visceral, “small” world model. E.g. the self-narratives that define or deeply influence our social behavior also use words that refer to things in our visceral world model. So some version of the “we are planning for outcomes” story survives - it would be absurd to claim otherwise. E.g. we can imagine how having our dream job would feel in the moment, get motivated by it, and backchain to get a long-term plan by filling in the rest of the story. Some version of “general planning” gets constructed out of the building blocks of social behavior. So there is a lot more nuance, of course - my point is just that framing it as “social stories first, general-ish planning out of that”, as opposed to “general planning, distorted by social stories”, overall requires fewer ad hoc modifications to fit reality.
Another example (to not come across like a crank, which I’m somewhat worried about): I would never say something as unnuanced as “the classic paperclipper conception of a rogue AI could never exist because powerful AI will have high-level humanlike stories”. That would be completely insane - even humans, over the course of their lives, often get ground down to care more about their base desires (comfort, sex, power) and less about the abstract high-level stories that originally drove them when they were young. Getting feedback from the world can still make you shift your high-level stories to accord more with your low-level reflexes. All the same things are still possible, and instrumental convergence remains a very useful concept. All the same worries remain - I am just trying to shift you smoothly, give you a different “lens” to incorporate into your portfolio, not say anything radical/insane.
There is probably a lot that could be said about precursors to this - enactivism, the Cultural Intelligence Hypothesis, this Scott Alexander post on Janus’ Simulators, this and this hit some of the same notes. This still feels pretty different from anything I’ve seen before though, with my focus on planning and introspective evidence.

Possible consequences

Some possible implications/updates (which I haven’t thought that much about):

More important:

Nothing left to elevate the hypothesis of a simple core structure of general planning or agency to our attention.
As a consequence, a much lower probability of agentic behavior “emerging” without it being in the dataset or directly rewarded in training.
Downweighting cognition’s power in general (for threat models etc), although that might be more specific to me since I was overvaluing intelligence before.
Shorter timelines since humans seemed much smarter before this change in intuition, but also maybe more specific to me as above.
I’m confused by the implications for takeoff speed, and shouldn’t put in more thought before finishing this, but probably the tendency is slower.
Somewhat lower concern about inner misalignment, because if we think reasoning/planning is a behavior, it seems much more likely that it will directly be given feedback on in the training process (as is basically already done in post-training) - and this feedback won’t be internalized as low-level, “motor” constraints on some kind of general planning module which will surely figure out a way to outmaneuver them, but the feedback will shape the reasoning, agentic behaviors and wider world model themselves - because they’re happening out in the open. (cf Externalized Reasoning Oversight?).
On balance, probably lower p(doom)?

Less important:

Situational awareness seems somewhat less natural than before, if “knowledge” like that needs to get internalized specifically, instead of being deduced through reasoning somehow.
An even higher probability (although it was already pretty high) of LLMs being involved in powerful AI, since a lot of agentic behavioral patterns (ones that systematically lead to a bigger effect on the world) are present in language.
The long-term goals emergent out of the agentic behaviors of an agentic AI could be completely unrelated to low-level reflexes (=nonagentic behavioral patterns), because in this framing they’re just separate behaviors. (for example, no human long-term plans to get candy or sex without a supporting socially learned story, and lots of humans don’t do it at all).^[6]

Conclusion

This all feels quite important to me and like a lot of people might be confused about it. It’s not clear to me how much of this people already know or not, how much they “know” on some level but haven’t internalized and propagated to other beliefs, how much they have thought about it and disagree, how much they haven’t thought about it, etc.

^[1] [Added in 2026]: I see now that Steven has updated the post a few times over the last few years, so quite possibly he appreciates my points here more nowadays.
^[2] (again, for the average person - obviously there are some people who do some amount of backchaining).
^[3] I could imagine an animal just as smart as humans, with learning algorithms just as good, but with less hardcoded social reward - I would guess they would just get very good at moving through their immediate physical environment and meeting their hardcoded needs, but would never ever develop what we would call “general” planning or agency (cf this famous paper that argues that chimpanzees actually are this (although I’m skeptical), and is generally the closest thing to my theory here I’ve found).
^[4] see also “realism about rationality”. Also, note that I consider all this pretty orthogonal to the debate around whether human intelligence (as in, the capacity to learn to do tasks competently or something) is general or a bunch of specialized hacks - it seems like the former is likely right - I’m talking about agency, how you get from intelligence to long-term planning.
^[5] Thanks to Quintin Pope for inspiring this way of framing it.
^[6] In other words, shard theory is exactly wrong.

I agree that the social world is usually very very important for (1) making options salient, and (2) making options seem appealing, and (3) providing evidence about the consequences of different options. I think that’s the kernel of truth that this post is gesturing at.

But I think you’re taking that observation WAY too far.

In particular, the social world is not REQUIRED for any of those three things.

For one thing, if people learn planning from other people, where did it come from in the first place? Somebody had to have been the first, right?

For another thing, sometimes people do quite unusual things in the effective pursuit of goals. E.g. Jeff Bezos founded Amazon in order to get enough money to pursue his real dream of running a space exploration company. Who would he have learned that from?

(I think some people are more motivated by following norms than others. Sociopaths, autistics, and “high-agency people” would typically be on the lower end of norm-following motivation, so I would look there first to find especially clear-cut evidence of non-social agency.)

For another thing, if you take someone’s general advice (say, they counsel “it’s better to ask for forgiveness than permission!”), and then next week you end up humiliated and with a painful broken arm and giant hospital bill, aren’t you marginally less likely to follow that same heuristic in the future? Conversely, if you adopt their general advice and then next week you end up with a proud new accomplishment under your belt, aren’t you marginally more likely to follow that same heuristic in the future? Obviously yes, right? So doesn’t this constitute “[learning] through feedback about how well they fulfill your goals”?

“Low-level reflexes generalize to heuristics, which in turn generalize to a general planning algorithm”… I would also guess Steven Byrnes believes this (see below).

No, I think I’d mostly disagree with that statement. I think planning is basically innate, although it’s augmented by a lifetime of learning how to plan better (e.g. you can learn metacognitive heuristics from experience, or from reading a book etc.).

It’s not clear to me how “order food -> food will come” is even supposed to be learned by the brain’s self-supervised learning/predictive processing or RL. The prediction error/reward comes in _a week_ after the prediction. And if it’s somehow deduced from higher-level knowledge about the world - how did that get learned?

Obviously it’s a hard problem that AI researchers have not solved yet, but it’s equally obvious to me that a solution exists in the brain. It seems crazy to me to deny that. We make a zillion accurate long-term predictions about the world every day (e.g. “if I put on ripped pants right now at 8am, then my knees might get cold when I’m outside at 10pm tonight, and I know this because it happened to me yesterday”). We make way too many long-term predictions in way too many circumstances to have learned all of them from observing or listening to other people. And even the things that we did learn from someone else telling us, that person in turn had to have learned it somehow, and if we trace that chain back then it eventually has to end in somebody actually figuring something out by observing the world. Right?

Have you really never in your life figured out something like "X today implies Y tomorrow" all by yourself that you didn't learn from someone else??

I feel like I’m probably misunderstanding your position here, because it really seems crazy to me.

I’ve gotten into the habit of trying to model what’s going on when I experience an impulse for an action that could be interpreted as “long-term planning”, and it seems to me that it’s all actually just a bunch of superficial, distinct, socially learned behavioral patterns, rather than any planning through a world model or any general/sophisticated heuristics for accomplishing long-term goals

Maybe you should read Cate Hall’s book when it comes out? :-P

OK here’s an example that I challenge you to explain: if I’m hungry, I might take a bus to the restaurant to get a slice of pizza, but if I’m not hungry, then I won’t.

The obvious explanation that I endorse is: when I’m hungry, eating pizza seems good and motivating, so I make a plan to eat pizza, and execute the plan. When I’m not hungry, eating pizza seems pointless or aversive, so I don’t.

By contrast, this seems impossible to explain in your framework. If I’m just copying people, how can that get linked to my own interoceptive sensation of hunger? That sensation is private to me, and other people’s sensations of hunger are private to them. There’s no SOCIAL logic behind connecting my own internal sensation of hunger to a plan-to-eat. Right?

Moreover, the plan to eat pizza is clearly “planning through a world-model”. For example, if it’s 4am and the buses aren’t running and the pizza place is closed, then I won’t try to take a bus to the pizza place. If there’s a wildfire blazing between me and the restaurant, then I also won’t try to go there. I will set out to the restaurant only if it seems like eating pizza is the plausible result of doing so. Because I want to eat pizza.

Of course, I’m not omniscient, and even beyond that, sometimes I “know” something but temporarily forgot about it. Like, maybe I forgot that the restaurant owner was on vacation. Oops. But that doesn’t undermine the idea that I am hungry, and trying to get pizza so I can eat it. The goal (eating pizza) is in my mind, and I am brainstorming how to make that goal happen. Right??

Anyway, I reiterate the first paragraph of my comment, that there’s a kernel of truth here, and that it’s very important, even if I think you’re taking it way too far.

Thank you Steven! Really appreciate the comment.

I feel like I’m probably misunderstanding your position here, because it really seems crazy to me.

Yes, I think you are, but I take a lot of responsibility for that :-) - the post is kind of a whirlwind, undecided between slightly different framings, simplified to intuition-pump better (as I say in the introduction), and not very optimized for comprehensibility. I tried pretty hard in the caveats section to make it clear that I'm not being this radical, but I was probably still not being loud enough about it. There is a lot of work left to be done for me to give better examples and intuition pumps, and to lay things out more.

For one thing, if people learn planning from other people, where did it come from in the first place? Somebody had to have been the first, right?

Well, no - that's like the old creationist argument that an eye couldn't have evolved from nothing because the eye is so complex. Complex machinery can still evolve gradually, even if it seems counterintuitive when you only see the final product. The many separate planning heuristics and long-term planning social stories can gradually culturally evolve from initial small random variation of social vibes or ideas. (and then also be improved on through individual internal cognitive processes, of course).

(and you agree with gradual emergence of it being possible anyway, I imagine, since you just think the planning algorithm evolved biologically instead)

(not sure how much you skimmed, more on this if you ctrl-F "mimesis")

(I think some people are more motivated by following norms than others. Sociopaths, autistics, and “high-agency people” would typically be on the lower end of norm-following motivation, so I would look there first to find especially clear-cut evidence of non-social agency.)

For the record, "following norms" (and also as you later say, "just copying people") is a much more impoverished notion of social agency than what I'm talking about. It's more about e.g. what's elevated to attention, subtle life stories or vibes that you saw somewhere (including ones that purposely break norms!), and the social tool of language providing a substrate for abstract ideas like long-term plans. I imagine you agree that sociopaths and autists still have lots of social RL reward in their heads (even if it's unusual in some ways), so they're not pure examples of some kind of hypothetical nonsocial humans.

I think planning is basically innate, although it’s augmented by a lifetime of learning how to plan better (e.g. you can learn metacognitive heuristics from experience, or from reading a book etc.).

Ah I see, maybe I should reread Brain-like-AGI safety (again, it's been a few years). After asking Opus 4.8 about what exactly you mean (a somewhat "deflationary" notion of planning, it says), I actually think what we're saying is probably compatible.

Maybe you should read Cate Hall’s book when it comes out? :-P

:P

Let's get into the meat of it - you mentioned a few different examples, let me zoom in where it feels most useful.

the plan to eat pizza is clearly “planning through a world-model”.

Yes, absolutely - let me say this very loudly. PLANNING THROUGH A WORLD MODEL IS REAL AND WORKS. :-)

It's just that this world model is substantively one of social stories, not of the physical world. (or less radically, you might say the abstractions used to plan through the world model are extremely socially mediated, and the more so the more sophisticated and long-term the plans are).

if you adopt their general advice and then next week you end up with a proud new accomplishment under your belt, aren’t you marginally more likely to follow that same heuristic in the future? Obviously yes, right? So doesn’t this constitute “[learning] through feedback about how well they fulfill your goals”?

Yes, absolutely. I should probably have written "primarily/at root through feedback about how well they fulfill your goals" - that's closer to my beliefs. Let me edit that, thank you.

Let me say this very loudly as well. WE IMPROVE AT PLANNING THROUGHOUT OUR LIFETIME, IN PART BY SEEING WHAT WORKS AND WHAT DOESN'T. :-)

Again, the caveat section goes into a little more detail here.

“if I put on ripped pants right now at 8am, then my knees might get cold when I’m outside at 10pm tonight, and I know this because it happened to me yesterday”

This is a really useful example, because I'd actually tend to agree with you here that this is nonsocial! It's a negative update about simple physical things in your environment (ripped pants, cold weather), and a simple physical action cognitively connected to them (putting them on) - pretty plausible to me that it works with no social involvement whatsoever.

(I would still quibble here that there's a possibility of involvement of a "responsible person social role" that activates when there's something that pattern-matches to "failure", and that searches internally for possible targets for a cognitive update to fix the bad "failure feeling" - but this is getting into the weeds a bit more, and I'm much more uncertain)

Now to our disagreements:

OK here’s an example that I challenge you to explain: if I’m hungry, I might take a bus to the restaurant to get a slice of pizza, but if I’m not hungry, then I won’t.

Okay, so I'll simplify this by removing the bus part, and having the plan only be "going to the restaurant" - I'd be saying very similar things twice otherwise.

First, the feeling of hunger triggers an impulse to somehow acquire food (I feel pretty agnostic on how much this is socially learned, seems like something in early childhood that I don't have good intuitions for).

Then, there is a possible action associated with "getting food" that pops up in your head: "I could go to a restaurant." Seemingly a medium-term plan. Now, where did it come from? You first learned about the concept of going to restaurants sometime in your childhood, from other people. You didn't come up with it yourself - you're following a social script. There's no backchaining or world-modelling in the full sense of the word happening here - it's just an if-then statement.

You might think that this is a minor definitional quibble. But the crucial point is that the substance of the cognitive work happened previously, in the macro-cultural process, not in you. The "modeling" (or rather, actual experiencing) of going to a restaurant having good outcomes, the learning process of absorbing it - it happened gradually in the culture, as restaurants became a thing, people internalized that it was convenient and good, and going to them became normal.

That's what I mean by "yes, you're planning through a world model, but it is a world model of social stories, not a model of the physical world".

Let's modify the example. Let's say you didn't know the concept of restaurants, and you're a caveman who just had a modern restaurant spawn a kilometer away from his cave. Let's also say you already have language, and it's been explained to you that you can get food there much more easily than if you hunted and gathered, and that you magically 100% trust the source that explained it to you (let's say it's not a person, to make it as nonsocial as possible).

In my intuition... this would be different - even though, by assumption, you believe in the abstract concept of a restaurant in the same way, which would seem to suggest that the planning should be just as easy. It would feel importantly different from going to the restaurant today, and even from when you first did it as a child (when it felt safer and more normal than in this example, because your parents were doing it with you). That's because there's no social script for it - the cognitive work hasn't been done for you yet. It would be a nontrivially cognitively difficult process, with a little bit of internal friction, to take the leap to such a weird new thing. That would be you doing the "real" cognitive work of learning about and modeling the real world yourself. And you'd (probably) internalize it permanently as a salient option here too. It would just be harder than learning it from your parents as a child - but it would actually be "nonsocial agency" in that case, so to speak.

And with going to a restaurant, at least you actually experienced it when you first learned about it - it's a very short-term plan, still. All these dynamics get more extreme when we talk about true long-term plans over years and decades, where you have never experienced the long-term goal or even the substeps of the plan (like the "going to college" example in the main post).

Now of course, this is all still explainable in your frame as well (e.g. you could say that new things always feel weird, or something). Hyper-abstract frames like the ones we're talking about here can generally accommodate any observation - the question is which frame requires fewer epicycles overall.

Repeating myself from the main post: to me, saying that long-term planning is algorithmic but that "the social world is usually very very important for (1) making options salient, and (2) making options seem appealing, and (3) providing evidence about the consequences of different options" feels like a lot of epicycles. And starting the other way around - saying we're acting out social stories and that this gets us to some wonky general-ish planning ability, and then adding the caveat that sometimes "real" ground-level feedback can sway us from one social story to the other - feels like fewer epicycles overall, when we try to make our frames fit the evidence of our own introspection and human behavior in general.

But I think you’re taking that observation WAY too far.

In particular, the social world is not REQUIRED for any of those three things.

For one thing, if people learn planning from other people, where did it come from in the first place? Somebody had to have been the first, right?

“Low-level reflexes generalize to heuristics, which in turn generalize to a general planning algorithm”… I would also guess Steven Byrnes believes this (see below).

It’s not clear to me how “order food -> food will come” is even supposed to be learned by the brain’s self-supervised learning/predictive processing or RL. The prediction error/reward comes in _a week_ after the prediction. And if it’s somehow deduced from higher-level knowledge about the world - how did that get learned?

Have you really never in your life figured out something like "X today implies Y tomorrow" all by yourself that you didn't learn from someone else??

I feel like I’m probably misunderstanding your position here, because it really seems crazy to me.

I’ve gotten into the habit of trying to model what’s going on when I experience an impulse for an action that could be interpreted as “long-term planning”, and it seems to me that it’s all actually just a bunch of superficial, distinct, socially learned behavioral patterns, rather than any planning through a world model or any general/sophisticated heuristics for accomplishing long-term goals.

Maybe you should read Cate Hall’s book when it comes out? :-P

OK here’s an example that I challenge you to explain: if I’m hungry, I might take a bus to the restaurant to get a slice of pizza, but if I’m not hungry, then I won’t.

Anyway, I reiterate the first paragraph of my comment, that there’s a kernel of truth here, and that it’s very important, even if I think you’re taking it way too far.

Thank you Steven! Really appreciate the comment.

I feel like I’m probably misunderstanding your position here, because it really seems crazy to me.

For one thing, if people learn planning from other people, where did it come from in the first place? Somebody had to have been the first, right?

(and you agree with gradual emergence of it being possible anyway, I imagine, since you just think the planning algorithm evolved biologically instead)

(not sure how much you skimmed, more on this if you ctrl-F "mimesis")

(I think some people are more motivated by following norms than others. Sociopaths, autistics, and “high-agency people” would typically be on the lower end of norm-following motivation, so I would look there first to find especially clear-cut evidence of non-social agency.)

I think planning is basically innate, although it’s augmented by a lifetime of learning how to plan better (e.g. you can learn metacognitive heuristics from experience, or from reading a book etc.).

Maybe you should read Cate Hall’s book when it comes out? :-P

:P

Let's get into the meat of it - you mentioned a few different examples, let me zoom in where it feels most useful.

the plan to eat pizza is clearly “planning through a world-model”.

Yes, absolutely - let me say this very loudly. PLANNING THROUGH A WORLD MODEL IS REAL AND WORKS. :-)

if you adopt their general advice and then next week you end up with a proud new accomplishment under your belt, aren’t you marginally more likely to follow that same heuristic in the future? Obviously yes, right? So doesn’t this constitute “[learning] through feedback about how well they fulfill your goals”?

Yes, absolutely. I should probably have written "primarily/at root through feedback about how well they fulfill your goals" - that's closer to my beliefs. Let me edit that, thank you.

Let me say this very loudly as well. WE IMPROVE AT PLANNING THROUGHOUT OUR LIFETIME, IN PART BY SEEING WHAT WORKS AND WHAT DOESN'T. :-)

Again, the caveat section goes into a little more detail here.

“if I put on ripped pants right now at 8am, then my knees might get cold when I’m outside at 10pm tonight, and I know this because it happened to me yesterday”

Now to our disagreements:

OK here’s an example that I challenge you to explain: if I’m hungry, I might take a bus to the restaurant to get a slice of pizza, but if I’m not hungry, then I won’t.

Okay, so I'll simplify this by removing the bus part, and having the plan only be "going to the restaurant" - I'd be saying very similar things twice otherwise.

That's what I mean by "yes, you're planning through a world model, but it is a world model of social stories, not a model of the physical world".
