

Beliefs as emotional strategies

Of course, we can't know for sure. It could be that the interventions actually worked by a different method than they seemed to.

But consider e.g. the first story. Here was a person who started out entirely convinced that the belief in free will was an intrinsically hardwired need that they had. It had had a significant impact on their entire life, to the point of making them suicidally depressed when they couldn't believe it. I had a theory of how the mind works which made a different prediction, and I only needed to briefly suggest it for them to surface compatible evidence without me needing to make any more leading comments. After that, I only needed to suggest a single intervention which my model predicted would cause a change, and it did, causing a long-term and profound change in the other person.

Because I do expect it to be a permanent change rather than just a short-term effect. Of course, the first two examples are both from this year - I didn't ask Sampo when exactly his example happened - so in principle it's still possible that these will reverse themselves. But that's not my general experience with these things; rather, these interventions tend to produce deep and lasting change. The longest-term effect I have personal data for is from June 2017; this follow-up from December 2018 still remains a good summary of what that intervention ended up fixing in the long term. (As noted in that follow-up, it's still possible for some issues to come back in a subtler form, or for some of the issues to also have other causes; but that's distinct from the original issue coming back in its original strength.)

So it's possible that my model is mistaken about the exact causality - but that by treating the model as if it were true, you're still able to cause lasting and deep changes in people's psychology. If my model is wrong, then we need another model that would explain the same observations. Currently I think that the kinds of models that I've outlined would explain those observations pretty well while being theoretically plausible, but I'm certainly open to alternative ones.

I don't think that e.g. just "hearing the right emotional story can produce relief" is a very good alternative theory. I've certainly also had experience of superficial emotional stories that sounded compelling for a little while and whose effect then faded out, but over time I've learned that the heuristic of "do these effects last for longer than a month" is pretty good for telling those apart from the ones that have a real effect. The permanent ones may also have an effect on things you didn't even realize were related beforehand - e.g. the person in the first example noticing, in retrospect, connections they hadn't previously realized were related - whereas in my experience, the short-term ones mostly just include effects that are obviously and directly derivable from the story.

So some compelling stories seem to produce relatively minor short-term effects while other interventions cause much broader and longer-lasting ones, and just the hypothesis of "emotional stories can be compelling" doesn't explain why some emotional stories work better than others. Nor would it have predicted that suggesting the specific intervention that I offered would have been particularly useful. 

All of that said, I do admit that the third story has more interacting pieces and that the overall evidence for that one is weaker. We can only be relatively sure that telling the client to imagine a different kind of mother was the final piece in resolving the issue; it's possible that the other inferences about the mother's beliefs are incorrect. I still wanted to include it, in the spirit of learning soft skills, because I think that many beliefs that affect our behavior aren't nice and clear-cut ones where you can just isolate a single key belief and be relatively sure of what happened because you can observe the immediate effects. Rather there's much more behavior that's embedded in an interacting web of beliefs like I outlined there. Even if the details of that particular story were off, enough of it resonates in my inner simulator that I'm pretty sure that something like that story could be true and often is true. But for that one I can't offer a more convincing argument than "load it up in your own inner sim and see whether it resonates".

Testing The Natural Abstraction Hypothesis: Project Intro

Oh cool! I put some effort into pursuing a very similar idea earlier:

I'll start this post by discussing a closely related hypothesis: that given a specific learning or reasoning task and a certain kind of data, there is an optimal way to organize the data that will naturally emerge. If this were the case, then AI and human reasoning might naturally tend to learn the same kinds of concepts, even if they were using very different mechanisms.

but wasn't sure of how exactly to test it or work on it so I didn't get very far.

One idea that I had for testing it was rather different: make use of brain imaging research that seems able to map shared concepts between humans, and see whether that methodology could also be used to compare human-AI concepts:

A particularly fascinating experiment of this type is that of Shinkareva et al. (2011), who showed their test subjects both the written words for different tools and dwellings, and, separately, line-drawing images of the same tools and dwellings. A machine-learning classifier was both trained on image-evoked activity and made to predict word-evoked activity and vice versa, and achieved a high accuracy on category classification for both tasks. Even more interestingly, the representations seemed to be similar between subjects. Training the classifier on the word representations of all but one participant, and then having it classify the image representation of the left-out participant, also achieved a reliable (p<0.05) category classification for 8 out of 12 participants. This suggests a relatively similar concept space between humans of a similar background.

We can now hypothesize some ways of testing the similarity of the AI's concept space with that of humans. Possibly the most interesting one might be to develop a translation between a human's and an AI's internal representations of concepts. Take a human's neural activation when they're thinking of some concept, and then take the AI's internal activation when it is thinking of the same concept, and plot them in a shared space similar to the English-Mandarin translation. To what extent do the two concept geometries have similar shapes, allowing one to take a human's neural activation of the word "cat" to find the AI's internal representation of the word "cat"? To the extent that this is possible, one could probably establish that the two share highly similar concept systems.
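As a toy sketch of what such a translation test might look like, under the simplifying assumption that the AI's concept geometry is just a rotation of the human's: fit the rotation from a few anchor concepts, then use it to look up an unseen concept across systems. The 2-D "embeddings", the concept names, and the rotation angle are all made up; real representations would be high-dimensional and the map fit by full Procrustes analysis.

```python
import math

# Invented 2-D "concept embeddings" for a human.
human = {"cat": (1.0, 0.2), "dog": (0.9, 0.4), "car": (-0.5, 1.0),
         "house": (-0.8, -0.3)}

def rotate(v, theta):
    x, y = v
    return (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta))

THETA = 0.7  # the AI "sees" the same geometry, rotated by an unknown angle
ai = {concept: rotate(v, THETA) for concept, v in human.items()}

# Fit the rotation from anchor concepts (closed-form 2-D Procrustes:
# theta = atan2(sum of cross products, sum of dot products)).
anchors = ["dog", "car", "house"]
num = sum(human[c][0] * ai[c][1] - human[c][1] * ai[c][0] for c in anchors)
den = sum(human[c][0] * ai[c][0] + human[c][1] * ai[c][1] for c in anchors)
theta_hat = math.atan2(num, den)

# Map the human's "cat" into the AI's space - "cat" was not an anchor.
predicted = rotate(human["cat"], theta_hat)
nearest = min(ai, key=lambda c: (predicted[0] - ai[c][0]) ** 2
                                + (predicted[1] - ai[c][1]) ** 2)
print(nearest)  # → cat
```

To the extent that the two geometries really do share a shape, a map fitted on some concepts generalizes to others; to the extent they don't, the lookup fails - which is what would make this a test of the hypothesis.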

One could also try to more explicitly optimize for such a similarity. For instance, one could train the AI to make predictions of different concepts, with the additional constraint that its internal representation must be such that a machine-learning classifier trained on a human's neural representations will correctly identify concept-clusters within the AI. This might force internal similarities on the representation beyond the ones that would already be formed from similarities in the data.

The farthest that I got with my general approach was "Defining Human Values for Value Learners". It felt (and still feels) to me like concepts are quite task-specific: two people in the same environment will develop very different concepts depending on the job that they need to perform... or even depending on the tools that they have available. The spatial concepts of sailors practicing traditional Polynesian navigation are sufficiently different from those of modern sailors that the "traditionalists" have extreme difficulty understanding what the kinds of bird's-eye-view maps we're used to even represent - and vice versa; Western anthropologists had considerable difficulty figuring out what exactly it was that the traditional navigation methods were even talking about.

(E.g. the traditional way of navigating from one island to another involves imagining a third "reference" island and tracking its location relative to the stars as the journey proceeds. Some anthropologists thought that this third island was meant as an "emergency island" to escape to in case of unforeseen trouble, an interpretation challenged by the fact that the reference island may sometimes be completely imagined, so obviously not suitable as a backup port. Chapter 2 of Hutchins 1995 has a detailed discussion of the way that different tools for performing navigation affect one's conceptual representations, including the difficulties both the anthropologists and the traditional navigators had in trying to understand each other due to having incompatible concepts.)

Another example is legal concepts; e.g. American law traditionally held that a landowner controlled not only his land but also everything above it, to “an indefinite extent, upwards”. Upon the invention of the airplane, this raised a question: could landowners forbid airplanes from flying over their land, or was the ownership of the land limited to some specific height, above which the landowners had no control?

Eventually, the law was altered so that landowners couldn't forbid airplanes from flying over their land. Intuitively, one might think that this decision was made because the redefined concept did not substantially weaken the position of landowners, while allowing for entirely new possibilities for travel. In that case, we can think that our concept for landownership existed for the purpose of some vaguely-defined task (enabling the things that are commonly associated with owning land); when technology developed in a way that the existing concept started interfering with another task we value (fast travel), the concept came to be redefined so as to enable both tasks most efficiently.

This seemed to suggest an interplay between concepts and values: our values are to some extent defined in terms of our concepts, but our values - and the tools that we have available for furthering them - also affect how we define our concepts. This line of thought led me to think that this interaction must be rooted in what was evolutionarily beneficial:

... evolution selects for agents which best maximize their fitness, while agents cannot directly optimize for their own fitness as they are unaware of it. Agents can however have a reward function that rewards behaviors which increase the fitness of the agents. The optimal reward function is one which maximizes (in expectation) the fitness of any agents having it. Holding the intelligence of the agents constant, the closer an agent’s reward function is to the optimal reward function, the higher their fitness will be. Evolution should thus be expected to select for reward functions that are closest to the optimal reward function. In other words, organisms should be expected to receive rewards for carrying out tasks which have been evolutionarily adaptive in the past. [...]

We should expect an evolutionarily successful organism to develop concepts that abstract over situations that are similar with regards to receiving a reward from the optimal reward function. Suppose that a certain action in state s1 gives the organism a reward, and that there are also states s2–s5 in which taking some specific action causes the organism to end up in s1. Then we should expect the organism to develop a common concept for being in the states s2–s5, and we should expect that concept to be “more similar” to the concept of being in state s1 than to the concept of being in some state that was many actions away.
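The quoted prediction can be illustrated with a toy computation: if a state's value is the reward discounted by the number of steps needed to reach it, then s2–s5 all get the same value, and that value is closer to s1's than to the value of a faraway state. The discount factor and state layout below are invented for illustration.

```python
GAMMA = 0.9  # assumed discount factor

def value(steps_to_reward, reward=1.0, gamma=GAMMA):
    """Discounted value of a state `steps_to_reward` actions from the reward."""
    return reward * gamma ** steps_to_reward

# s1 yields the reward directly; s2-s5 each reach s1 in one action;
# s_far is many actions away.
v = {"s1": value(0), "s2": value(1), "s3": value(1),
     "s4": value(1), "s5": value(1), "s_far": value(10)}

# s2-s5 form a natural cluster (identical values)...
assert v["s2"] == v["s3"] == v["s4"] == v["s5"]
# ...and that cluster sits nearer to s1 than to the distant state.
assert abs(v["s2"] - v["s1"]) < abs(v["s2"] - v["s_far"])
```

On this picture, "similarity of concepts" falls out of similarity of expected reward: states that are interchangeable with respect to getting rewarded are natural candidates for a single abstraction.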

In other words, we have some set of innate values that our brain is trying to optimize for; if concepts are task-specific, then this suggests that the kinds of concepts that will be natural to us are those which are beneficial for achieving our innate values given our current (social, physical and technological) environment. E.g. for a child, the concepts of "a child" and "an adult" will seem very natural, because there are quite a few things that an adult can do for furthering or hindering the child's goals that fellow children can't do. (And a specific subset of all adults named "mom and dad" is typically even more relevant for a particular child than any other adults are, making this an even more natural concept.)

That in turn seems to suggest that in order to see what concepts will be natural for humans, we need to look at fields such as psychology and neuroscience in order to figure out what our innate values are and how the interplay of innate and acquired values develops over time. I've had some hope that some of my later work on the structure and functioning of the mind would be relevant for that purpose.

Internal Family Systems

Yeah, subagents is the general idea of modeling the mind in terms of independent agents, but IFS is a more specific theory of what kinds of subagents there are. E.g. my sequence has a post about understanding System 1 and System 2 in terms of subagents, while IFS doesn't really have anything to say about that.

What Do We Know About The Consciousness, Anyway?

But this idea - self-consciousness is a model trained to predict other such models and generalizing to itself - seems both extremely obvious (in retrospect) and, as mentioned before, with one small exception I can’t remember ever hearing or reading about it.

The idea feels familiar enough that I didn't feel surprised to see you suggest it, but I'm not sure where exactly I might have first encountered it. Learning to be conscious seems like a somewhat similar model, at least:

Consciousness remains a formidable challenge. Different theories of consciousness have proposed vastly different mechanisms to account for phenomenal experience. Here, appealing to aspects of global workspace theory, higher-order theories, social theories, and predictive processing, we introduce a novel framework: the self-organizing metarepresentational account (SOMA), in which consciousness is viewed as something that the brain learns to do. By this account, the brain continuously and unconsciously learns to redescribe its own activity to itself, so developing systems of metarepresentations that qualify target first-order representations. Thus, experiences only occur in experiencers that have learned to know they possess certain first-order states and that have learned to care more about certain states than about others. In this sense, consciousness is the brain’s (unconscious, embodied, enactive, nonconceptual) theory about itself.

As does maybe this paper [edit: apparently it's written by the person who wrote the "Rethinking Consciousness" book]:

One possible explanation of consciousness, proposed here, is that it is a construct of the social perceptual machinery. Humans have specialized neuronal machinery that allows us to be socially intelligent. The primary role for this machinery is to construct models of other people’s minds thereby gaining some ability to predict the behavior of other individuals. In the present hypothesis, awareness is a perceptual reconstruction of attentional state; and the machinery that computes information about other people’s awareness is the same machinery that computes information about our own awareness. The present article brings together a variety of lines of evidence including experiments on the neural basis of social perception, on hemispatial neglect, on the out-of-body experience, on mirror neurons, and on the mechanisms of decision-making, to explore the possibility that awareness is a construct of the social machinery in the brain.

I'm also somewhat reminded of Thomas Metzinger's stuff about consciousness being a "self-model" (though it tends to be a bit of a pain to figure out what the heck exactly he's saying; I didn't even try doing more than skimming that page, and wouldn't recommend that to others, either), Dennett's notion of the self as a narrative center of gravity, and this LW comment.

How do we prepare for final crunch time?

Does any military use meditation as part of its training? 

Yes, e.g.

This [2019] winter, Army infantry soldiers at Schofield Barracks in Hawaii began using mindfulness to improve shooting skills — for instance, focusing on when to pull the trigger amid chaos to avoid unnecessary civilian harm.

The British Royal Navy has given mindfulness training to officers, and military leaders are rolling it out in the Army and Royal Air Force for some officers and enlisted soldiers. The New Zealand Defence Force recently adopted the technique, and military forces of the Netherlands are considering the idea, too.

This week, NATO plans to hold a two-day symposium in Berlin to discuss the evidence behind the use of mindfulness in the military.

A small but growing group of military officials support the techniques to heal trauma-stressed veterans, make command decisions and help soldiers in chaotic battles.

“I was asked recently if my soldiers call me General Moonbeam,” said Maj. Gen. Piatt, who was director of operations for the Army and now commands its 10th Mountain Division. “There’s a stereotype this makes you soft. No, it brings you on point.”

The approach, he said, is based on the work of Amishi Jha, an associate professor of psychology at the University of Miami. She is the senior author of a paper published in December about the training’s effectiveness among members of a special operations unit.

The paper, in the journal Progress in Brain Research, reported that the troops who went through a monthlong training regimen that included daily practice in mindful breathing and focus techniques were better able to discern key information under chaotic circumstances and experienced increases in working memory function. The soldiers also reported making fewer cognitive errors than service members who did not use mindfulness.

The findings, which build on previous research showing improvements among soldiers and professional football players trained in mindfulness, are significant in part because members of the special forces are already selected for their ability to focus. The fact that even they saw improvement speaks to the power of the training, Dr. Jha said. [...]

Mr. Boughton has thought about whether mindfulness is anathema to conflict. “The purists would say that mindfulness was never developed for war purpose,” he said.

What he means is that mindfulness is often associated with peacefulness. But, he added, the idea is to be as faithful to compassionate and humane ideals as possible given the realities of the job.

Maj. Gen. Piatt underscored that point, describing one delicate diplomatic mission in Iraq that involved meeting with a local tribal leader. Before the session, he said, he meditated in front of a palm tree, and found himself extremely focused when the delicate conversation took place shortly thereafter.

“I was not taking notes. I remember every word she was saying. I wasn’t forming a response, just listening,” he said. When the tribal leader finished, he said, “I talked back to her about every single point, had to concede on some. I remember the expression on her face: This is someone we can work with.”

Rationalism before the Sequences

I think this comment would make for a good top-level post almost as it is.

Rationalism before the Sequences

I was slightly surprised, mostly because I would have expected that if you'd known about LW for a while, you'd end up contributing either early or not at all. Curious what caused it to happen in 2021 in particular.

Rationalism before the Sequences

I also quite liked both the Jargon File (which I found before or around the same time as LW) and Dancing With the Gods (which I found through LW).

What Happens To Your Brain When You Write?

Similarly, if I'm writing something original, then if I'm typing I can type relatively close to the speed of my thought - it feels like my words are only somewhat trailing behind the shape of what I'm about to say. But if I'm writing by hand, there's more "lag", with it feeling like it takes much longer for my writing to catch up to the thought.

On the other hand, this feels like it has positive consequences: the words taking longer to write out means that I also spend more time processing their content, and maybe the writing is a little better as a result. But having to wait for so long also feels frustrating, which is why I mostly don't do it.

Voting-like mechanisms which address size of preferences?

What kind of election do these governments use?

Mostly, I think, voting systems designed to ensure that parties get a share of seats that's proportional to their number of votes ("party-list proportional representation" is what Wikipedia calls it). E.g. the D'Hondt method seems pretty popular (and is used in Finland as well as several other countries).
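As a minimal sketch of how the D'Hondt method works: each party's vote total is repeatedly divided by (seats already won + 1), and each seat goes to the party with the highest current quotient. The party names and vote counts below are made up for illustration.

```python
def dhondt(votes, seats):
    """Allocate `seats` among parties by the D'Hondt highest-quotients rule."""
    allocation = {party: 0 for party in votes}
    for _ in range(seats):
        # A party's current quotient is its votes / (seats already won + 1).
        winner = max(votes, key=lambda p: votes[p] / (allocation[p] + 1))
        allocation[winner] += 1
    return allocation

print(dhondt({"A": 100_000, "B": 80_000, "C": 30_000}, seats=8))
# → {'A': 4, 'B': 3, 'C': 1}
```

The divisor sequence 1, 2, 3, ... slightly favors larger parties compared to some other proportional methods, but the resulting seat shares still track vote shares closely.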

As for whether it's actually better overall - well, I grew up with it and am used to it so I prefer it over something that would produce a two-party system. ;) But I don't have any very strong facts to present over which system is actually best.
