This post originally appeared on The Gears To Ascension


I present generative modeling of minds as a hypothesis for the complexities of social dynamics, and build a case for it out of pieces. My hope is that this explains social behaviors more precisely and with less handwaving than its components. I intend this to be a framework for reasoning about social dynamics more explicitly and for training intuitions. In future posts I plan to build on it to give more concrete evidence, and give examples of social dynamics that I think become more legible with the tools provided by combining these ideas.

Epistemic status: Hypothesis, currently my maximum likelihood hypothesis, of why social interaction is so weird.


People talk to each other a lot. Many of them are good at it. Most people don't really have a deep understanding of why, and it's rare for people to question why it's a thing that's possible to be bad at. Many of the rules seem arbitrary at first look, and it can be quite hard to transfer skill at interaction by explanation.

Some of the rules sort of make sense, and you can understand why bad things would happen when you break them: Helping people seems to make them more willing to help you. Being rude to people makes them less willing to help you. People want to "feel heard". But what do those mean, exactly?

I've been wondering about this for a while. I wasn't naturally good at social interaction, and have had to put effort into learning it. This has been a spotty success - I often would go to people for advice, and then get things like "people want to know that you care". That advice sounded nice, but it was vague and not usable.

The more specific social advice seems to generalize quite badly. "Don't call your friends stupid", for example. Banter is an important part of some friendships! People say each other are ugly and feel cared for. Wat?

Recently, I've started to see a deeper pattern here that actually seems to have strong generalization: it's simple to describe, it correctly predicts large portions of very complicated and weird social patterns, and it reliably gives me a lens to decode what happened when something goes wrong. This blog post is my attempt to share it as a package.

I basically came up with none of this. What I'm sharing is the synthesis of things that Andrew Critch, Nate Soares, and Robin Hanson have said - I didn't find these ideas that useful on their own, but together I'm kind of blown away by how much they collectively explain. In future blog posts I'll share some of the things I have used this to understand.

WARNING: An easy instinct, on learning these things, is to try to become more complicated yourself, to deal with the complicated territory. However, my primary conclusion is "simplify, simplify, simplify": try to make fewer decisions that depend on other people's state of mind. You can see more about why and how in the posts in the "Related" section, at the bottom.


Newcomb's problem is a game that two beings can play. Let's say that the two people playing are you and Newcomb. On Newcomb's turn, Newcomb learns all that they can about you, and then puts one opaque box and one transparent box in a room. Then on your turn, you go into the room, and you can take one or both of the boxes. What Newcomb puts in the boxes depends on what they think you'll do once it's your turn:

  • If Newcomb thinks that you'll take only the opaque box, they fill it with $1 million, and put $1000 in the transparent box.
  • If Newcomb thinks that you'll take both of the boxes, they only put $1000 in the transparent box.

Once Newcomb is done setting the room up, you enter and may do whatever you like.

This problem is interesting because the way you win or lose has little to do with what you actually do once you go into the room, it's entirely about what you can convince Newcome you'll do. This leads many people to try to cheat: convince Newcomb that you'll only take one box, and then take two.

In the original framing, Newcomb is a mind-reading oracle, and knows for certain what you'll do. In a more realistic version of the test, Newcomb is merely a smart person and paying attention to you. Newcomb's problem is simply a crystallized view of something that people do all the time: evaluate what kind of people each other, to determine trust. And it's interesting to look at it and note that when it's crystallized, it's kind of weird. When you put it this way, it becomes apparent that there are very strong arguments for why you should always do the trustworthy thing and one-box.


(This section inspired by nate soares' post "newcomblike problems are the norm".)

You want to know that people care about you. You don't just want to know that the other person is acting helpfully right now. If someone doesn't care about you, and is just helping you because it helps them, then you'll trust and like them less. If you know that someone thinks your function from experience to emotions is acceptable to them, you will feel validated.

I think this makes a lot of sense. In artificial distributed systems, we ask a bunch of computers to work together, each computer a node in the system. All of the computers must cooperate to perform some task - some artificial distributed systems, like bittorrent, are intended to allow the different nodes (computers) in the system to share things with each other, but where each participating computer joins to benefit from the system. Other distributed systems, such as the backbone routers of the internet, are intended to provide a service to the outside world - in the case of the backbone routers, they make the internet work.

However, nodes can violate the distributed system's protocols, and thereby gain advantage. In bittorrent, nodes can download but refuse to upload. In the internet backbone, each router needs to know where other routers are, but if a nearby router lies, then the entire internet may slow down dramatically, or route huge portions of US traffic to china. Unfortunately, despite the many trust problems in distributed systems, we have solved relatively few of them. Bitcoin is a fun exception to this - I'll use it as a metaphor in a bit.

Humans are each nodes in a natural distributed system, where each node has its own goals, and can provide and consume services, just like the artificial ones we've built. But we also have this same trust problem, and must be solving it somehow, or we wouldn't be able to make civilizations.

Human intuitions automatically look for reasons why the world is the way it is. In stats/ML/AI, it's called generative modeling. When you have an experience - every time you have any experience, all the time, on the fly - your brain's low level circuitry assumes there was a reason that the experience happened. Each moment your brain is looking for what the process was that created that experience for you. Then in the future, you can take your mental version of the world and run it forward to see what might happen.

When you're young, you start out pretty uncertain about what processes might be driving the world, but as you get older your intuition learns to expect gravity to work, learns to expect that pulling yourself up by your feet won't work, and learns to think of people as made of similar processes to oneself.

So when you're interacting with an individual human, your brain is automatically tracking what sort of process they are - what sort of person they are. It is my opinion that this is one of the very hardest things that brains do (where I got that idea). When you need to decide whether you trust them, you don't just have to do that based off their actions - you also have your mental version of them that you've learned from watching how they behave.

But it's not as simple as evaluating, just once, what kind of person someone is. As you interact with someone, you are continuously automatically tracking what kind of person they are, what kind of thoughts they seem to be having right now, in the moment. When I meet a person and they say something nice, is it because they think they're supposed to, or because they care about me? If my boss is snapping at me, are they to convince me I'm unwelcome at the company without saying it outright, or is my boss just having a bad day?


Note: I am not familiar with the details of the evolution of cooperation. I propose a story here to transfer intuitions, but the details may have happened in a different order. I would be surprised if I am not describing a real event, and it would weaken my point.

Humans are smart, and our ancestors have been reasonably smart going back a very long time, far before even primates branched off. So imagine what it was like to be an animal in a pre-tribal species. You want to survive, and you need resources to do so. You can take them from other animals. You can give them to other animals. Some animals may be more powerful than you, and attempt to take yours.

Imagine what it's like to be an animal partway through the evolution of cooperation. You feel some drive to be nice to other animals, but you don't want to be nice if the other animal will take advantage of you. So you pay attention to which animals seem to care about being nice, and you only help them. They help you, and you both survive.

As the generations go on, this happens repeatedly. An animal that doesn't feel caring for other animals is an animal that you can't trust; An animal that does feel caring is one that you want to help, because they'll help you back.

Over generations, it becomes more and more the case that the animals participating in this system actually want to help each other - because the animals around them are all running newcomblike tests of friendliness. Does this animal seem to have a basic urge to help me? Will this animal only take the one box, if I leave the boxes lying out? If the answer is that you can trust them, and you recognize that you can trust them, then that is the best for you, because then the other animal recognizes that they were trusted and will be helpful back.

After many generations of letting evolution explore this environment, you can expect to end up with animals that feel strong emotions for each other, animals which want to be seen as friendly, animals where helping matters. Here is an example of another species that has learned to behave sort of this way.

This seems to me be a good generating hypothesis for why people care about what other people think of them innately, and seems to predict ways that people will care about each other. I want to feel like people actually care about me, I don't just want to hear them say that they do. In particular, it seems to me that humans want this far more than you would expect of an arbitrary smart-ish animal.

I'll talk more in detail about what I think human innate social drives actually are in a future blog post. I'm interested in links to any research on things like human basic needs or emotional validation. For now, the heuristic I've found most useful is simply "People want to know that those around them approve of/believe their emotional responses to their experiences are sane". See also Succeed Socially, in the related list.


Knowing that humans evaluate each other in newcomblike ways doesn't seem to me to be enough to figure out how to interact with them. Only armed with the statement "one needs to behave in a way that others will recognize as predictably cooperative", I still wouldn't know how to navigate this.

At a lightning talk session I was at a few months ago, Andrew Critch made the argument that humans regularly model many layers deep in real situations. His claim was that people intuitively have a sense of what each other are thinking, including their senses of what you're thinking, and back and forth for a bit. Before I go on, I should emphasize how surprising this should be, without the context of how the brain actually does it: the more levels of me-imagining-you-imagining-me-imagining-you-imagining… you go, the more of an explosion of different options you should expect to see, and the less you should expect actual-sized human minds to be able to deal with it.

However, after having thought about it, I don't think it's as surprising as it seems. I don't think people actually vividly imagine this that many levels deep: what I think is going on is that as you grow up, you learn to recognize different clusters of ways a person can be. Stereotypes, if you will, but not necessarily so coarse as that implies.

At a young age, if I am imagining you, I imagine a sort of blurry version of you. My version of you will be too blurry to have its own version of me, but I learn to recognize the blurry-you when I see it. The blurry version of you only has a few emotions, but I sort of learn what they are: my blurry you can be angry-"colored", or it can be satisfied-"colored", or it can be excited-"colored", etc. ("Color" used here as a metaphor, because I expect this to be built a similar way to color or other basic primitives in the brain.)

Then later, as I get older, I learn to recognize when you see a blurry version of me. My new version of you is a little less blurry, but this new version of you has a blurry-me, made out of the same anger-color or satisfaction-color that I had learned you could be made out of. I go on, and eventually this version of you becomes its own individual colors - you can be angry-you-with-happy-me-inside colored when I took your candy, or you can be relieved-you-with-distraught-me-inside colored when you are seeing that I'm unhappy when a teacher took your candy back.

As this goes on, I learn to recognize versions of you as their own little pictures, with only a few colors - but each color is a "color" that I learned in the past, and the "color" can have me in it, maybe recursively. Now my brain doesn't have to track many levels - it just has to have learned that there is a "color" for being five levels deep of this, or another "color" for being five levels deep of that. Now that I have that color, my intuition can make pictures out of the colors and thereby handle six levels deep, and eventually my intuition will turn six levels into colors and I'll be able to handle seven.

I think it gets a bit more complicated than this for particularly socially competent people, but that's a basic outline of how humans could reliably learn to do this.


I found the claim that humans regularly social-model 5+ levels deep hard to believe at first, but Critch had an example to back it up, which I attempt to recreate here.

Fair warning, it's a somewhat complicated example to follow, unless you imagine yourself actually there. I only share it for the purpose of arguing that this sort of thing actually can happen; if you can't follow it, then it's possible the point stands without it. I had to invent notation in order to make sure I got the example right, and I'm still not sure I did.

(I'm sorry this is sort of contrived. Making these examples fully natural is really really hard.)

  • You're back in your teens, and friends with Kris and Gary. You hang out frequently and have a lot of goofy inside jokes and banter.
  • Tonight, Gary's mom has invited you and Kris over for dinner.
  • You get to Gary's house several hours early, but he's still working on homework. You go upstairs and borrow his bed for a nap.
  • Later, you're awoken by the activity as Kris arrives, and Gary's mom shouts a greeting from the other room: "Hey, Kris! Your hair smells bad.". Kris responds with "Yours as well." This goes back and forth, with Gary, Kris, and Gary's mom fluidly exchanging insults as they chat. You're surprised - you didn't know Kris knew Gary's mom.
  • Later, you go downstairs to say hi. Gary's mom says "welcome to the land of the living!" and invites you all to sit and eat.
  • Partway through eating, Kris says "Gary, you look like a slob."
  • You feel embarrassed in front of Gary's mom, and say "Kris, don't be an ass."
  • You knew they had been bantering happily earlier. If you hadn't had an audience, you'd have just chuckled and joined in. What happened here?

If you'd like, pause for a moment and see if you can figure it out.

You, Gary, and Kris all feel comfortable bantering around each other. Clearly, Gary and Kris feel comfortable around Gary's mom, as well. But the reason you were uncomfortable is that you know Gary's mom thought you were asleep when Kris got there, and you hadn't known they were cool before, so as far as Gary's mom knows, you think she thinks kris is just being an ass. So you respond to that.

Let me try saying that again. Here's some notation for describing it:

  • X => Y: X correctly believes Y
  • X ~> Y: X incorrectly believes Y
  • X ?? Y: X does not know Y
  • X=Y=Z=...: X and Y and Z and ... are comfortable bantering

And here's an explanation in that notation:

  • Kris=You=Gary: Kris, You, and Gary are comfortable bantering.
  • Gary=Kris=Gary's mom: Gary, Kris, and Gary's mom are comfortable bantering.
  • You => [gary=Gary's mom=kris]: You know they're comfortable bantering.
  • Gary's mom ~> [You ?? [gary=Gary's mom=kris]]: Gary's mom doesn't know you know.
  • You => [Gary's mom ~> [You ?? [gary=Gary's mom=kris]]]: You know Gary's mom doesn't know you know they're comfortable bantering.

And to you in the moment, this crazy recursion just feels like a bit of anxiety, fuzzyness, and an urge to call Kris out so Gary's mom doesn't think you're ok with Kris being rude.

Now, this is a somewhat unusual example. It has to be set up just right in order to get such a deep recursion. The main character's reaction is sort of unhealthy/fake - better would have been to clarify that you overheard them bantering earlier. As far as I can tell, the primary case where things get this hairy is when there's uncertainty. But it does actually get this deep - this is a situation pretty similar to ones I've found myself in before.

There's a key thing here: when things like this happen, you react nearly immediately. You don't need to sit and ponder, you just immediately feel embarrassed for Kris, and react right away. Even though in order to figure out explicitly what you were worried about, you would have had to think about it four levels deep.

If you ask people about this, and it takes deep recursion to figure out what's going on, I expect you will generally get confused non-answers, such as "I just had a feeling". I also expect that when people give confused non-answers, it is almost always because of weird recursion things happening.

In Critch's original lightning talk, he gave this as an argument that the human social skills module is the one that just automatically gets this right. I agree with that, but I want to add: I think that that module is the same one that evaluates people for trust and tracks their needs and generally deals with imagining other people.


So people have generative models of each other, and they care about each other's generative models of them. I care about people's opinion of me, but not in just a shallow way: I can't just ask them to change their opinion of me, because I'll be able to tell what they really think. Their actual moral judgement of their actual generative model of me directly affects my feelings of acceptance. So I want to let them know what kind of person I am: I don't just want to claim to be that kind of person, I want to actually show them that I am that kind of person.

You can't just tell someone "I'm not an asshole"; that's not strong evidence about whether you're an asshole. People have incentives to lie. People have powerful low-level automatic bayesian inference systems, and they'll automatically and intuitively recognize what social explanations are more likely as explanations of your behavior. If you want them to believe you're not an asshole, you have to give credible evidence that you are not an asshole: you have to show them that you do things that would have been unlikely had you been an asshole. You have to show them that you're willing to be nice to them, you have to show them that you're willing to accommodate their needs. Things that would be out of character if you were a bad character.

If you hang out with people who read Robin Hanson, you've probably heard of this before, under the name "signaling".

But many people who hear that interpret it as a sort of vacuous version, as though "signaling" is a sort of fakery, as though all you need to do is give the right signals. If someone says "I'm signaling that I'm one of the cool kids", then sure, they may be doing things that for other people would be signals of being one of the cool kids, but on net the evidence is that they are not one of the cool kids. Signaling isn't about the signals, it's about giving evidence about yourself.In order to be able to give credible evidence that you're one of the cool kids, you have to either get really good at lying-with-your-behavior such that people actually believe you, or you have to change yourself to be one of the cool kids. (This is, I think, a big part of where social anxiety advice falls down: "fake it 'til you make it" works only insofar as faking it actually temporarily makes it.)

"Signaling" isn't fakery, it is literally all communication about what kind of person you are. A common thing Hanson says, "X isn't about Y, it's about signaling" seems misleading to me: if someone is wearing a gold watch, it's not so much that wearing a gold watch isn't about knowing the time, it's that the owner's actual desires got distorted by the lens of common knowledge. Knowing that someone would be paying attention to them to infer their desires, they filtered their desires to focus on the ones they thought would make them look good. This also can easily come off as inauthentic, and it seems fairly clear why to me: if you're filtering your desires to make yourself look good, then that's a signal that you need to fake your desires or else you won't look good.

Signals are focused around hard-to-fake evidence. Anything and everything that is hard to fake and would only happen if you're a particular kind of person, and that someone else recognizes as so, is useful in conveying information about what kind of person you are. Fashion and hygiene are good examples of this: being willing to put in the effort make yourself fashionable or presentable, respectively, is evidence of being the kind of person who cares about participating in the societal distributed system.

Conveying truth in ways that are hard to fake is the sort of thing that comes up in artificial distributed systems, too. Bitcoin is designed around a "blockchain": a series of incredibly-difficult-to-fake records of transactions. 
Bitcoin has interesting cryptographic tricks to make this hard to fake, but it centers around having a lot of people doing useless work, so that no one person can do a bunch more useless work and thereby succeed at faking it.


From the inside, it doesn't feel like we're in a massive distributed system. It doesn't feel like we're tracking game theory and common knowledge. Even though everyone, even those who don't know about it, do it automatically.

In the example, the main character just felt like something was funny. The reason they were able to figure it out and say something so fast was that they were a competent human who had focused their considerable learning power on understanding social interaction, presumably from a young age, and automatically recognized a common knowledge pattern when it presented itself.

But in real life, people are constantly doing this. To get along with people, you have to be willing to pay attention to giving evidence about your perception of them. To be accepted, you have to be willing to give evidence that you are the kind of person that other people want to accept, and you might need to change yourself if you actually just aren't.

In general, I currently think that minimizing recursion depth of common knowledge is important. Try to find ways-to-be that people will be able to recognize more easily. Think less about social things in-the-moment so that others have to think less to understand you; adjust your policies to work reliably so that people can predict them reliably.

Other information of interest


New Comment
15 comments, sorted by Click to highlight new comments since: Today at 2:15 AM

I rather like this concept, and probably put higher credence on it than you. However, I don't think we are actually modeling that many layers deep. As far as I can tell, it's actually rare to model even 1 layer deep. I think your hypothesis is close, but not quite there. We are definitely doing something, but I don't think it can properly be described as modeling, at least in such fast-paced circumstances. It's something close to modeling, but not quite it. It's more like what a machine learning algorithm does, I think, and less like a computer simulation.

Models have moving parts, and diverge rapidly at points of uncertainty, like how others might react. When you build a model, it is a conscious process, and requires intelligent thought. The model takes world states as inputs, and simulates the effects these have on the components of the model. Then, after a bunch of time consuming computation, the model spits out a play-by-play of what we think will happen. If there are any points of uncertainty, the model will spit out multiple possibilities stemming from each, and build up multiple possible branches. This is extremely time consuming, and resource intense.

But there's a fast, System 1 friendly way to route around needing a time-consuming model: just use a look-up-table.^[1] Maybe run the time consuming model a bunch of times for different inputs, and then mentally jot down the outputs for quick access later, on the fly. Build a big 2xn lookup table, with model inputs in 1 column, and results in the other. Do the same for every model you find useful. Maybe have 1 table for a friend's preferences: inputting tunafish outputs gratitude (for remembering her preferences). Imputing tickling outputs violence.

Perhaps this is why we obsess over stressful situations, going over all the interpretations and everything we could have done differently. We're building models of worrying situations, running them, and then storing the results for quick recall later. Maybe some of this is actually going on in dreams and nightmares, too.

But there's another way to build a lookup table: directly from data, without running any simulation. I think we just naturally keep tabs of all sorts of things without even thinking about it. Arguably, most of our actions are being directed by these mental associations, and not by anything containing conscious models.

Here's an example of what I think is going on, mentally:

Someone said something that pattern matches as rash? Quick, scan through all the lookup tables within arm’s reach for callous-inputs. One output says joking. Another says accident. A third says he's being passive aggressive. Joking seems to pattern match the situation the best.

But oh look, you also ran it through some of the lookup tables for social simulations, and one came up with a flashing red light saying Gary's mom doesn't realize it was a joke.

That's awkward. You don't have any TAPs (Trigger Action Plans) installed for what to do in situations that pattern match to an authority figure misunderstanding a rude joke as serious. Your mind spirals out to less and less applicable TAP lookup tables, and the closest match is a trigger called "friend being an ass". You know he's actually joking, but this is the closest match, so you look at the action column, and it says to reprimand him, so you do.

Note that no actual modeling has occurred, and that all lookup tables used could have been generated purely experimentally, without ever consciously simulating anyone. This would explain why it's so hard to explain the parts of our model when asked: we have no model, just heuristics, and fuzzy gut feeling about the situation. Running the model again would fill in some of the details we've forgotten, but takes a while to run, and slows down the conversation. That level of introspection is fine in an intimate, introspective conversation, but if it's moving faster, the conversation will have changed topics by the time you've clarified your thoughts into a coherent model.

Most of the time though, I don't think we even explicitly think about the moving parts that would be necessary to build a model. Take lying, for example:

We rarely think "A wants B to think X about C, because A models B as modeling C in a way that A doesn't like, and A realizes that X is false but would cause B to act in a way that would benefit A if B believed it." (I'm not even sure that will parse correctly for anyone who reads it. That's kind of my point though.)

Instead, we just think "A told lie X to B about C". Or even just "A lied", leaving out all the specific details unless they become necessary. All the complexity of precisely what a lie is gets tucked away neatly inside the handle "lie", so we don't have to think about it or consciously model it. We just have to pattern match something to it, and then we can apply the label.

If pressed, we'll look up what "lied" means, and say that "A said X was true, but X is actually false". If someone questions whether A might actually believe X, we'll improve out model of lying further, to include the requirement that A not actually believe X. We'll enact a TAP to search for evidence that A thinks X, and come up with memories Y and Z, which we will recount verbally. If someone suspects that you are biased against A, or just exhibiting confirmation bias, they may say so. This just trips a defensive TAP, which triggers a "find evidence of innocence" action. So, your brain kicks into high gear and automatically goes and searches all your lookup tables for things which pattern match as evidence in your favor.

We appear to be able to package extremely complex models up into a single function, so it seems unlikely that we are doing anything different with simpler models of things like lying. There's no real difference in how complex the concept of god feels from the concept of a single atom or something, even though one has much more moving parts under the hood of the model. We're not using any of the moving parts of the model, just spitting out cashed thoughts from a lookup table, so we don't notice the difference.

If true, this has a bunch of other interesting implications:

  • This is likely also why people usually act first and pick a reason for that choice second: we don't have a coherent model of the results until afterward anyway, so it's impossible to act like an agent in real time. We can only do what we are already in the habit of doing, by following cashed TAPs. This is the reason behind akrasia, and the "elephant and rider" (System 1 and System 2) relationship.

  • Also note that this scales much better: you don't need to know any causal mechanisms to build a lookup table, so you can think generally about how arbitrarily large groups will act based only on past experience, without needing to build it up from simulating huge numbers of individuals.

  • It implies that we are just Chinese Rooms most of the time, since conscious modeling is not involved most of the time. Another way of thinking of it is that we store keep the answers to the sorts of common computations we expect to do in (working?) memory, so that the more computationally intense consciousness can concentrate on the novel or difficult parts. Perhaps we could even expand our consciousness digitally to always recompute responses every time.

[1] For the record, I don't think our minds have neat, orderly lookup tables. I think they use messy, associative reasoning, like the Rubes and Bleggs in How An Algorithm Feels From The Inside. This is what I'm referring to when I mention pattern matching, and each time I talk about looking something up in a empirically derived lookup table, a simulation input/results lookup table, or a TAP lookup table.

I think these sorts of central nodes with properties attached make up a vast, web-like networks, built like network 2 in the link. All the properties are themselves somewhat fuzzy, just like the central "rube"/"blegg" node. We could de-construct "cube" into constituent components the same way: 6 sides, all are flat, sharp corners, sharp edges, sides roughly 90 degrees apart, etc. You run into the same mental problems with things like rhombohedrons, and are forced to improve your sloppy default mental conception of cubes somehow if you want to avoid ambiguity.

All nodes are defined only by it's relation to adjacent nodes, just like the central rube/blegg node. There are no labels attached to the nodes, just node clusters for words and sounds and letters attached to the thing they are meant to represent. It would be a graph theory monster if we tried to map it all out, but in principle you could do it by asking someone how strongly they associated various words and concepts.

I enthusiastically agree with you. I actually do machine learning as my day job, and its ability to store "lookup table" style mappings with generalization was exactly what I was thinking of when referring to "modeling". I'm pleased I pointed to the right concept, and somewhat disappointed that my writing wasn't high enough quality to clarify this from the beginning. what you mention about obsessing seems extremely true to me, and seems related to Satvik's internationalization of it as "rapid fire simulations".

in general I think of s1 as "fast lookup-table style reasoning" and s2 as "cpu-and-program style reasoning". my goal here was to say:

  1. humans have a hell of a lot of modeling power in the fast lookup style of reasoning
  2. that style of reasoning can embed recursive modeling
  3. a huge part of social interaction is a complicated thing that gets baked into lookup style reasoning


I'm not sure you actually disagree with the OP. I think you are probably right about the mechanism by which people identify and react to social situations.

I think the main claims of the OP hold whether you're making hyper-fast calculations, or lookup checks. The lookup checks still correspond roughly to what the hyperfast calculations would be, and I read the OP mainly as a cautionary tale for people who attempt to do use System 2 reasoning to analyze social situations (and, especially, if you're attempting to change social norms)

Aspiring rationalists are often the sort of people who look for inefficiencies in social norms and try to change them. But this often results in missing important pieces of all the nuances that System 1 was handling.

One confusion I've had; where people treat emotions as a level on which it is difficult to fake things but then later don't act surprised when actions are at odds with the previous moment to moment feelings. Like, I was happy to accept that your current experience is validly how you feel in the moment, but I didn't think how you feel is strong evidence for your 'true' beliefs/future actions etc. And it's weird that others do given that the track record is actually quite bad. So if I take one perspective, people are constantly lying and get very mad if you point this out.

This made a lot more sense when I stopped modeling people as monolithic agents. The friction arises because they are modeling themselves as monolithic agents. So I changed the way I talk about people's preferences. But it is still tricky and I often forget. I've thought of this as a sort of extension to NVC, NMC or non-monolithic communication also encourages you to remove the I and You constructs from your language and see what happens. It isn't possible in real time communication, but it is an interesting exercise while journaling in that it forces a confrontation with direct observer moments.

What is NMC?

(For anyone who doesn't know: NVC stands for Nonviolent Communication. I would highly recommend it.)

I enjoyed the post and appreciate the additional links for reading.

The main character's reaction is sort of unhealthy/fake - better would have been to clarify that you overheard them bantering earlier.

I did not feel that way at all, the reaction is simple and appropriate. Imagine how clunky and awkward it would be for the main character trying to explain that in fact you over heard the banter and that you don't want the mom to think, that you think its OK for there to be rude things said about her son, in front of her. That would come off as weird.

it's not so much that wearing a gold watch isn't about knowing the time, it's that the owner's actual desires got distorted by the lens of common knowledge. Knowing that someone would be paying attention to them to infer their desires, they filtered their desires to focus on the ones they thought would make them look good. This also can easily come off as inauthentic, and it seems fairly clear why to me: if you're filtering your desires to make yourself look good, then that's a signal that you need to fake your desires or else you won't look good.

The gold watch wearer does not come off as inauthentic to me. If I had more information, like that the person was not well off then I would. Just because the gold watch wearer wants to look good in front of me doesn't make it inauthentic nor does it mean the wearer is faking it. There isn't much difference between the fashion and hygiene example and the gold watch example. Putting in the effort to look good by being well dressed and clean (presuming that is in fact true, people might think you fail at both), is the same as using money to wear a gold watch to convey wealth. All three attempt to convey some information about the person, and nothing is inauthentic if it's true. How else do you let people know you got money?

I would not recommend analyzing the interactions of real live meatbag humans in a Newcomb Problem framework :-/

Your Newcomb game isn't about trust, anyway. "You" needs to persuade "Newcomb" that she'll one-box, and "Newcomb" needs to predict what "you" will actually do. It's an asymmetric test of how well "you" can manipulate people vs how well "Newcomb" can see through the manipulation.

I'm confused about this response. at what level of prediction would it start becoming reasonable to consider it to be an approximation of newcomb's problem? if this isn't about trust, what similar thing is? I mean, certainly you don't get exact behavior, but I don't see why the usual reasoning about newcomb's problem doesn't apply here. could you read Critch's post and comment about where you disagree with it?

Let me be more clear.

Point 1: The Newcomb Problem tells you nothing about actual social interactions between actual humans. If you're interested in social structures, techniques, etc., the Newcomb Problem is the wrong place to start.

Point 2: Trust in this context can be defined more or less as "accepting without verifying". There is no trust involved in the Newcomb problem.

Oh, and in case you're curious, I two-box.

If you 2-box, shouldn't Point 1 be "Newcomb's problem doesn't tell you anything useful about anything" rather than "Newcomb's probably doesn't tell you anything useful about trust?"

Newcomb's is a hypothetical scenario (highly unlikely to exist in reality). As such, I think it's usefulness is more or less on par with other hypothetical scenarios unlikely to happen.

I'm not sure William Newcomb would agree to pose as Omega, and if you're going to change the problem, you really ought to explore the ramifications. Like what happens if the prediction is wrong - it becomes a boring "convince the agent you'll one-box, then two-box" problem if you assume only human-like predictive abilities. Being a cloaked psychopath in a community of cooperators probably nets you more points than just cooperating all the time.

Also, Hofstadter's idea of superrationality deserves a link in the "other sources" list.

I think this is explored in Critch's post (which is linked at the bottom)

I considered calling the players A and B, but several of the folks who proofread this found that hard to understand, so I changed it to use the same names as Critch's post: "you" and "newcomb".

as for prediction power, a core point I'm making is that no, humans really are good enough at this to be able to administer the test. it is possible to trick people, but I predict that it's possible to do much better than chance at predicting what any given human would do in a trust problem. certainly scammers exist, but they don't succeed as often as all that. a big part of why is that over an extended period of time, you can include previous behavior on newcomblike problems as evidence about whether an agent will two box again. if you two box once, you immediately lose trust. if you reliably do not two box, in a variety of situations, people will learn to trust you.

humans really are good enough at this to be able to administer the test

laugh. Not even close.

much better than chance

this is my objection. Changing the prediction from "has never been wrong, in enough trials to overcome a fairly conservative prior about trickery vs predictive power" to 'better than chance" completely changes the problem.

If the guess is somewhat better than chance, it matters a lot how much better, and in which directions, and under what conditions the guesser is wrong. I think it'd pay to study how to manipulate the guesser so he expects one-boxing, and then two-box.