I could try writing the well cited and argued version of this post, but that would result in procrastination and get stuck in drafts, so you get the stream of consciousness version instead. Happy to answer clarifying questions in the comments. This post is written not on the thesis that it will help all or even most of the people who read it (though it aims to not be actively harmful to those it doesn't help) but that it might be helpful pointers for some or encourage a productive line of thinking. 

Reading the transcript of Eliezer's crypto interview, I experienced a physical sensation I've had a few times before, like a cold hand gripping my lower spine and moving up to envelop more of my body. Having had the experience and gotten advice from more advanced practitioners in the past, I paused in reading to stack trace what was going on. The stack trace goes something like this:

  • Normally when reading, that information goes into a buffer that is tagged with the associations I have for the author, context, audience etc.
  • The buffer is allowed to activate associations that are then used to map concepts across from the author's concept/belief space to my own
  • This process can priority escalate if I'm receiving information I need to act on to keep myself or others physically safe
  • Eliezer's content is trying to priority escalate for unclear reasons, ostensibly for motivation but this gets tagged as a known "bad shape"
  • The bad shape in question is negative/aversive motivation for complex tasks/requests
  • The system also recognizes what would potentially be a working memory buffer overflow vulnerability, but I'm familiar enough with the concepts in question that it isn't a problem in this particular case. (Side note: I see this exploit used, with varying degrees of unclear intent, in communities that value intelligence, where people can't deescalate such exploits because doing so would be low status. Normal communities have various failsafes against it. The default outcome of this vulnerability getting triggered is to put the person in a more suggestible state, and this is worth being alert to.)
  • Eliezer is most likely smarter than me, which means he has potentially drawn black balls that I do not have adequate defenses for the informational content of directly. Global policies of shape similar to 'don't click on links in email regardless of whether you have logical answers to the arguments in the email for why you should click the link' are a fallback
  • In particular, not compulsively reading the next argument for why I should click the link because darn it they sure are interesting arguments even separate from the question of clicking the link, another suspicious shape
  • I take a moment to notice the ways 'this is talking about things that might take place over the next several years, not the next 2 minutes' is propagating and not propagating through the body somatically
  • Pulse lowers; the disordered relationship to time common to ADHD gets tagged as a likely cofactor for the bad response

I took a break for a few minutes and thought about whether I actually wanted to finish reading the piece now or later. I decided to finish it at a slightly slower pace, with a bit more of an eye on myself as I did. From a calmer and less contracted attention position it was a lot more obvious that the scarier inferential jumps are privileging a bunch of hypotheses. I'm not claiming that Eliezer is privileging those same hypotheses; after all, he seems to still be semi-functional, whereas we've already seen that that's not true for everyone who takes his arguments seriously. The claim is that Eliezer's arguments plus some privileged hypotheses can lead to very bad places psychologically.

Side note: I also find it likely that 'my parents suddenly started acting insane due to Fox News' is an early example of something that is going to ramp up in intensity. I am not optimistic about the mental health of people without a meditation practice over the next few years, as computation ramps up to cover far larger swathes of the search space for adversarial prompts for humans. I also notice that I'm sort of annoyed that those sloppy French philosophers turned out to be right that capitalism as an organizational principle is an existential threat (they made money their god and got bad outcomes, laughingdevas.jpg).

What does this have to do with Buddhist tech? Well, I want to talk a bit about these so called 'bad shapes' and what to do about them.

The Buddhist psychological model of humans says that at any given time you are overwhelmingly likely to find a person in one of six states/stances/attitudes towards the objects of their attention. Each of the stances is a particular set of strategies for dealing with the flow of experience, telling you which things or aspects of things are important to pay attention to in order to get goods and avoid bads. Very briefly speaking, the human realm is oriented towards understanding, the god realm towards pleasure, the titan realm towards conflict, the animal realm towards comfort/safety, the ghost realm towards whatever is lacking, and the hell realm towards suffering. You can guess that the human realm is considered a fortunate one to find yourself in, for a religion oriented around 'ignorance is the root of all suffering.'

One of the things people can get sensitized to is the energetic shape of each of these stances, and this is what triggered the stack trace above. Energetic shape means that it feels very different to hold the exact same neutral posture while internally feeling confident vs cringing. In one of the (sequence?) posts about genre savviness there is something like noticing that the compelling arguments being whispered in your ear are being whispered by a shadowy figure in rags, and that it's perfectly alright to update on that fact. So what I'm pointing to here is that it is really useful to notice when the compelling content being whispered in your ear is from a hell realm being whose whole shtick is helplessness in the face of endless suffering. And especially that hell realm arguments seemingly spawned by someone smarter than yourself probably have something seriously wrong with their implications, directionally speaking. Noticing how the body feels is the equivalent of seeing the shadowy figure in black rags when we're talking about conceptual space and not the visual field.

For more on the realms, I recommend Opening the Heart of Compassion. For now, onto what to do about it. I'm going to gloss past some more tacit claims here about how psychology works in my experience, again, happy to try to clarify. The stack here is best described by the book Core Transformation, and is also present in summary form in the Titan chapter of OtHoC. 

There is an experience, the beginnings of which are described at the beginning of this post, which it feels accurate to describe as 'being attacked by a hell realm.' These experiences are something like Jhana, but for negative states: tightly contracted attention on some aversive feeling tone in the body like helplessness, hopelessness, fear, separation/loneliness, disgust, etc. A solution to these problems is to open the aperture of attention wider to include neutral objects (most of the sensorium at any given time), but this doesn't feel available in the moment due to the threat of the fear object (tunnel vision etc.). A bunch of contemplative practices are oriented around making this degree of freedom on the attentional aperture available at all times.

When it isn't directly available as an immediate move and all we have to work with are the contents themselves, an analytical approach can be helpful. The underlying assumption that powers the whole investigation is that emotions are strategies to orient the organism usefully under some model of the world, and that those emotions won't be willing to pass until they're satisfied that we're actually dealing with the situation and not just pulling our usual bullshit of hand-waving away things we don't like. This means investigating what the goal of the feeling is on the feeling's own terms, and feeling a genuine sense of connection and care for whatever that goal is, since it is some positive experience it wants you to have. More here. Or in more epistemic terms, emotions are entangled with useful information in the environment, and your system has safeguards against you throwing out that information until it is satisfied you've actually updated.

I could say more, but it's starting to feel like this could go in several directions, so I'll leave it for now and might add more later based on any feedback.

Thanks to Justis Millis for feedback/editing

10 comments

Could you state the problem and solution more succinctly?

Problem: people think they are evaluating arguments (or are trying to) when what's actually happening is that they're experiencing weird psychological effects that aren't contextualized well by Western psychological theories. Solution: understanding these psychological effects allows better separation of them from the underlying claims one would like to evaluate about the future.

That's some epic-level metacognition (the "stack trace")! Mine tends to be more "fluid" and less "technical", just something like "supercharged intuition".

I'm going to say, for those readers who find the concepts used in this post to sound woo-ey: these are conceptual handles for real physical and biological processes. You could use different conceptual handles, including purely mathematical/logical ones, if they existed: what matters is to prime your cognition with a pattern that is isomorphic to the biological/neural/informational pattern you are trying to identify and interact with.

Yes, the realms can be thought of as clusters in the space of state-transition probabilities. Once you are in a realm, you are much more likely to move to another state within that realm rather than some other state outside it.

For intuition, here's a picture of a state-transition probability map for emotions: https://i.redd.it/7irpdyupbbfy.png
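
A minimal sketch of the clustering idea, assuming a toy model that is not taken from the linked picture or from any data: six states are split into two hypothetical "realms", and the names `REALM_OF`, `STAY`, and `LEAVE` plus all the numbers are invented for illustration. It only shows what "more likely to stay within the realm" looks like as a transition matrix.

```python
# Hypothetical toy model: six states grouped into two "realms" of three.
# Every number here is invented for illustration; none comes from data.
REALM_OF = [0, 0, 0, 1, 1, 1]
STAY = 0.9   # assumed probability that the next state is in the same realm
LEAVE = 0.1  # assumed probability that the next state is in the other realm


def build_transition_matrix(realm_of, stay, leave):
    """Row i holds the probabilities of moving from state i to each state."""
    matrix = []
    for src in realm_of:
        n_same = sum(1 for r in realm_of if r == src)
        n_other = len(realm_of) - n_same
        # Spread the "stay" mass evenly inside the realm, the rest outside it.
        row = [stay / n_same if r == src else leave / n_other for r in realm_of]
        matrix.append(row)
    return matrix


def step(dist, matrix):
    """Advance a probability distribution over states by one transition."""
    n = len(dist)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def prob_still_in_realm(matrix, realm_of, start, steps):
    """Probability of being in the starting state's realm after `steps` moves."""
    dist = [1.0 if i == start else 0.0 for i in range(len(realm_of))]
    for _ in range(steps):
        dist = step(dist, matrix)
    return sum(p for p, r in zip(dist, realm_of) if r == realm_of[start])


P = build_transition_matrix(REALM_OF, STAY, LEAVE)
for k in (1, 2, 10):
    print(k, round(prob_still_in_realm(P, REALM_OF, start=0, steps=k), 3))
```

Under these assumed numbers the chance of still being in the starting realm decays slowly with each step, which is the "sticky cluster" behavior described above. A real map of emotional states would of course need measured transition probabilities rather than made-up ones.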

drawn black balls that I do not have adequate defenses for the informational content of directly.

I don't know what drawing black balls means in this context. Would someone be able to clarify?

Sorry, that should have been a link to the vulnerable world hypothesis https://onlinelibrary.wiley.com/doi/full/10.1111/1758-5899.12718

Feels like it needed an ending... So you could open up the aperture of attention to get equanimity, but then what about the arguments? Just ignore them since they're from a "hell realm"? (That seems like it may lead to being unable to learn certain distressing information that is nonetheless true.)

PS I'm really enjoying the Opening the Heart of Compassion book!

The factual contents of the arguments become fine once disarmed of the emotional payload ime.

I may have misunderstood the structure of this "stream of consciousness version", but it concerns me that framing Eliezer Yudkowsky as "a hell realm being whose whole shtick is helplessness in the face of endless suffering" could introduce a new confounding emotional payload.

May I suggest a more charitable interpretation of what I'm trying to communicate here.