5.1 Post summary / Table of contents

Part of the Valence series.

Here in the final post of the Valence series, I will discuss how valence might shed light on three phenomena in mental health and personality: depression, mania, and narcissistic personality disorder. 

  • Section 5.2 gives some context: What kind of relationship do we expect a priori between algorithm-level mental components like “valence” and observable mental health syndromes and personality disorders? I’ll argue that we should expect salient clusters of symptoms that correspond to systematic changes in valence, but we should not expect this kind of analysis to account for all the symptoms that co-occur in real patients.
  • Section 5.3 discusses what happens if valence has a strong general negative bias—i.e., if almost all thoughts are negative valence. I will argue that the result is a good match to clinical depression. I’ll particularly discuss the inability to voluntarily move and think without unusual effort and willpower.
  • Section 5.4 discusses the opposite: what happens if valence has a strong general positive bias—i.e., if almost all thoughts are positive valence? I will suggest that the expected result is a pretty good match to mania.
  • Section 5.5 discusses what happens if valence is systematically extremized—i.e., if thoughts can have very positive valence, or very negative valence, but rarely in between. I will suggest that the result is a set of symptoms that seem to be a close match to narcissistic personality disorder.
  • Section 5.6 will wrap up the post and series, including a brief discussion of how it relates to my job description as an Artificial General Intelligence safety and alignment researcher.

5.2 Context: What are we expecting to find a priori?

We can think of the following indirect path to get from “root causes” to psychological observations & personality traits:

(Don’t scrutinize the red arrows—I just put them in randomly, to illustrate the idea that each layer can influence the layer below.) As illustrated by the bold text and thick arrows, we should expect to find salient clusters of symptoms that tend to co-occur because they flow from the same proximal cause: systematic changes to valence signals in the brain. But we should also not be surprised to find a mish-mosh of other algorithmically-unrelated symptoms that often appear along with those clusters of symptoms.

As argued in Post 1, valence is one of the most important ingredients in one of the most important algorithms in the brain. So we should expect:

  • Some possible root causes may happen to have a big systematic impact on valence. (But they’ll probably have other consequences too, and the details will differ among different root causes.)
  • Given the centrality of valence in the brain, if there is a big systematic change to valence, then it should have lots of obvious downstream effects on psychology and behavior.

As a consequence:

  • We should expect to find clusters of symptoms / behaviors that can be elegantly explained in terms of something happening to valence signals
  • We should also expect to find other symptoms / behaviors that commonly co-occur in practice, but cannot be explained in terms of valence. Instead, they are different consequences of the same root cause(s), and may have no relation whatsoever at the “algorithm level”.

For example, dopamine is centrally involved in valence signals, and meanwhile, off in an obscure corner of the brain, dopamine is also centrally involved in a little specialized circuit controlling prolactin hormone release. I firmly believe that, at the algorithm level, these two functions have nothing whatsoever to do with each other. But they both happen to involve dopamine, and thus they can cross-talk in some people—hence the somewhat rare “dysphoric milk ejection reflex” where there’s a flood of intense negative emotions upon milk let-down during lactation.

That example is meant to illustrate the perils of theorizing about psychology purely at the algorithm level. Don’t get me wrong—the algorithm level is great! There are lots of insights to be found there. This post will hopefully be an example. But we shouldn’t expect to find all the insights there. Some things in psychology can only be explained at other levels, including lower (biochemistry) and higher (culture).

5.3 If valence has a strong negative bias (i.e., almost every thought is negative valence), it should lead to a cluster of symptoms suspiciously close to clinical depression

Everyone has a range of thoughts, with varying valence. I claim that, in depression, there’s a strong offset towards negative valence. So for almost every thought you think (e.g. “I’m gonna get out of bed”), your brain immediately assesses that thought as a bad idea, tosses it out, and re-rolls for a new thought (cf. §1.3). For unusually appealing / motivating thoughts, like “I’m gonna scratch that really itchy bug bite right now”, I bet that even quite depressed, bedridden people will wind up executing that plan.

5.3.1 Voluntary motor and attention control can only happen with great effort

Going back to §1.3, valence is a control signal. When valence is negative, whatever thought you’re thinking tends to get thrown out, and the brain goes fishing for a new thought instead. When valence is positive, whatever thought you’re thinking tends to stick around. If that thought is part of a temporal sequence (e.g. you’re in the middle of singing a song), that sequence will continue. If that thought entails motor outputs (e.g. “I’m gonna stand up right now”), those motor outputs will actually happen.
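As a toy illustration of this control loop (a sketch under loud assumptions, not a model of real neural circuitry), the Python snippet below treats each candidate thought as a label plus a baseline valence, adds a hypothetical global `bias`, and keeps the first thought whose valence comes out positive. The function name, the candidate thoughts, their numbers, and the additive form of the bias are all made up for illustration.

```python
from typing import Iterable, Optional, Tuple


def first_thought_that_sticks(
    candidates: Iterable[Tuple[str, float]],
    bias: float = 0.0,
) -> Tuple[Optional[str], int]:
    """Return the first candidate whose biased valence is positive,
    together with how many candidates were thrown out before it."""
    discarded = 0
    for thought, baseline_valence in candidates:
        # Assumption: the global bias simply adds to each thought's valence.
        valence = baseline_valence + bias
        if valence > 0:
            # Positive valence: the thought sticks (and any motor plan runs).
            return thought, discarded
        # Negative valence: toss the thought and re-roll.
        discarded += 1
    return None, discarded


# Hypothetical thoughts with made-up baseline valences.
CANDIDATES = [
    ("get out of bed", 0.3),
    ("check phone", 0.5),
    ("scratch itchy bug bite", 1.5),
]

for bias in (0.0, -1.0, -2.0):
    print(bias, first_thought_that_sticks(CANDIDATES, bias=bias))
```

With no bias, the first plan sticks right away. With a moderate negative bias, only the unusually appealing thought survives. With a strong enough negative bias, nothing on this particular list gets executed at all.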

If the valence of every thought gets pulled negative, the two most direct consequences are:

  • Voluntary motor control can only happen with great effort / willpower.
  • Voluntary attention control (a.k.a. “voluntary thinking”, a.k.a. “System 2”) can only happen with great effort / willpower.

If you’re confused by that, I’ll elaborate on some potentially confusing parts:

“Voluntary attention control”: As discussed in §3.3, I firmly believe that motor control and attention control are “the same kind of thing” in many ways. Both have “voluntary” output channels that are under the control of the brain’s “main” reinforcement learning system (§1.5.6), and both also have “involuntary” mechanisms that can be triggered by other brain systems, particularly innate reactions in the brainstem. See the table in §3.3.5 for examples of voluntary and involuntary motor control versus attention control.

“…a.k.a. ‘voluntary thinking’, a.k.a. ‘System 2’…”: There’s a 2019 blog post by Kaj Sotala that I heartily endorse: System 2 as working-memory augmented System 1 reasoning. I would summarize it as the idea that deliberate “System 2” reasoning entails thinking lots of thoughts in sequence, and relating them to each other by holding particular things in working memory. Voluntary attention control is the switchboard making this whole process work, and we learn to skillfully operate that switchboard through reinforcement learning over the course of our life experience.

“…can only happen with great effort / willpower”: In the diagram above with the two gaussians, I showed the extreme right tail of the red gaussian just barely squeezing into positive-valence territory. I’ll try to illustrate what that can mean in practice, with an example. Let’s say that you are currently motivated to stay in bed rather than get up, but let’s also say that this motivation is ego-dystonic (§2.6)—i.e., you want to want to get out of bed. Then motivated thinking / brainstorming (§3.3) will kick in, and with luck you’ll be able to concoct a thought that spins “I will get out of bed” in the most positive-valence light possible—you’ll call to mind all the great consequences and associations of getting out of bed, and you’ll avoid paying attention to all the unappealing aspects of getting out of bed, insofar as that’s possible. With luck, the result of this brainstorming process will be that your “Thought Generator” (§1.3) crafts a thought Θ that both involves a plan to immediately get out of bed and is assessed by your brain as having net positive valence—probably just barely net positive. And by forming that thought Θ, you will then, in fact, actually get out of bed. Now, everything I’ve written in this paragraph is a mechanistic third-person description, but think about what this same process would feel like “from the inside”: I claim that it’s exactly the kind of thing we’re talking about, when we casually say “I can get out of bed, but only with great effort / willpower”.
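To put rough numbers on “just barely squeezing into positive-valence territory”, here is a back-of-the-envelope Python sketch. It assumes (purely for illustration) that each framing produced by brainstorming has a valence drawn independently from a normal distribution with mean `bias` and standard deviation `spread`. Real brainstorming is surely not independent sampling, and these parameter names and values are hypothetical.

```python
import math


def prob_positive(bias: float, spread: float = 1.0) -> float:
    """P(valence > 0) when valence ~ Normal(mean=bias, sd=spread)."""
    return 0.5 * (1.0 + math.erf(bias / (spread * math.sqrt(2.0))))


def expected_attempts(bias: float, spread: float = 1.0) -> float:
    """Mean number of independent framings tried until one is positive.

    Under the independence assumption this is the mean of a geometric
    distribution, i.e. 1 / P(success on a single try).
    """
    return 1.0 / prob_positive(bias, spread)


for bias in (0.0, -1.0, -2.0):
    print(
        f"bias={bias:+.1f}  "
        f"P(positive)={prob_positive(bias):.3f}  "
        f"expected attempts={expected_attempts(bias):.1f}"
    )
```

Under these assumptions, shifting the mean from 0 to two standard deviations below zero raises the expected number of framings from about 2 to about 44. That is one crude way to cash out “only with great effort / willpower”.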

5.3.2 Anhedonia and other symptoms

Moving on, another famous aspect of depression is anhedonia (inability to feel pleasure). I’m not immediately sure whether the anhedonia of depression is upstream of negative valence, or downstream, or a different consequence of the same root cause, or something else. But I definitely think anhedonia is intimately related to negative valence, for reasons hinted at in §1.5.2.

And what about every other aspect of clinical depression? As best as I can tell, at least most of them are consequences of a global negative bias on valence. But in some cases, the story is a bit indirect and speculative. I hope what I’ve said is enough to pique interest in my valence-centric hypothesis of depression, so I’ll leave the story here, although I’m happy to chat more in the comments section.

5.3.3 Root causes

As in §5.2, nothing I’ve said so far is a claim about root causes. But still, what about root causes? I imagine there are a variety of them. For example, here’s a made-up example of obsessive-compulsive disorder (OCD) leading to depression (edited from this older post of mine):

  • If my current thought involves an immediate plan to wash my hands again, then it’s negative valence, because it reminds me of the fact that OCD is ruining my life and relationships.
  • If my current thought does not involve an immediate plan to wash my hands again, then it’s negative valence, because I will get sick and die.
  • I can’t just think a thought about something entirely unrelated to washing my hands and disease and OCD, because of constraints-on-thoughts stemming from “involuntary attention” associated with my anxiety (§3.3.5)

Maybe you’re thinking: OK, but then that just kicks the question one level back: what’s the root cause of the OCD here? But I don’t have a great answer.

Also, this is just one made-up example; even if it’s valid, I imagine that it’s one of many causes of depression, and I have no particular insight to offer.

In case you’re wondering, I also have no particular knowledge about treatments. If you’re suffering from depression, then dang, I’m really sorry; maybe try this general resource page.

5.4 If valence has a strong positive bias (i.e., almost every thought is positive valence), it should lead to a cluster of symptoms suspiciously close to mania

Here, the obvious consequence is that whatever plan happens to pop into your head seems to be a really really awesome plan, and therefore you will actually go and do it. Hence, we get consequences like impulsivity, terrible judgment, unrealistic optimism, and high energy.

Another major symptom of mania is psychosis. But I think that psychosis is basically not algorithmically related to valence. Instead I think psychosis is biochemically related to valence, because both are related to the dopamine system. I have a blog post with some (speculative) details: Model of psychosis, take 2.

OK, that’s what I do believe about psychosis. Why don’t I believe that psychosis is a direct consequence of positive valence? Several reasons (but note that I’m not certain of all these details):

  • Psychosis can happen in the absence of unusual positive valence—especially in schizophrenia. (There’s even such a thing as “psychotic depression”, although it’s less common.) As best as I can tell, the psychotic symptoms in schizophrenia are not wildly different from the psychotic symptoms in manic psychosis, although obviously we expect it to present differently to some extent because the psychosis is occurring in very different background contexts of co-occurring symptoms.
  • As discussed in §3.3, our sensory perceptions are generally constrained by our sensory inputs. If I want to sincerely believe that I’m scuba diving right now, I just can’t, no matter how strong my motivation. Thus, since sensory inputs are independent of valence, a valence bias cannot explain the visual and auditory hallucinations, delusions of reference, and so on, that occur in manic psychosis. (Per §3.3.1, attention-control and motor-control have an influence on perception on the margin, but I don’t think that’s adequate to explain these phenomena.)
  • I don’t think the content of hallucinations, delusions of reference, etc., is a perfect match to what we are motivated to see and believe, even after accounting for the §3.3.4 caveat that motivations are not always obvious.
  • Putting aside the origin of psychotic delusions, perhaps one could argue that their persistence is related to confirmation bias, which in turn is related to valence (§3.3). But I don’t buy that story either, because confirmation bias is not particularly related to positive valence. A big part of confirmation bias is that “the idea of changing one’s mind” has to be negative valence. And indeed, I don’t think it’s the case that mania involves a general unwillingness to change one’s mind. Quite the contrary—in the reports I’ve read, people talk about how a new idea will pop into their head, and it seems great, and they go with it, forgetting about whatever they were into a moment earlier. Thus, in mania, the psychotic delusions are persistent, but pretty much every other kind of thought, plan, and belief has unusually little persistence, I think. So I don’t think the persistence of psychotic delusions can be explained by a general positive bias on valence.

5.5 If valence is “extremized” (i.e., almost every thought is either very positive valence, or very negative valence, but rarely anywhere in between), it should lead to a cluster of symptoms suspiciously close to Narcissistic Personality Disorder (NPD)

Note: I could have alternatively drawn the purple curve as a wider gaussian.
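To make the two ways of drawing “extremized” valence concrete, here is a small Python sketch comparing a typical distribution, a two-humped (bimodal) distribution, and a wider Gaussian. All means, standard deviations, and the “near neutral” window of ±0.5 are made-up numbers chosen only for illustration.

```python
import math
from typing import List, Tuple

# Each component is (weight, mean, standard deviation).
Mixture = List[Tuple[float, float, float]]


def normal_cdf(x: float, mean: float, sd: float) -> float:
    """Cumulative distribution function of Normal(mean, sd) at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def near_neutral_mass(mixture: Mixture, width: float = 0.5) -> float:
    """Fraction of thoughts whose valence falls in (-width, +width)."""
    return sum(
        weight * (normal_cdf(width, mean, sd) - normal_cdf(-width, mean, sd))
        for weight, mean, sd in mixture
    )


# Hypothetical parameter choices.
TYPICAL: Mixture = [(1.0, 0.0, 1.0)]
BIMODAL: Mixture = [(0.5, -2.0, 0.5), (0.5, 2.0, 0.5)]
WIDE: Mixture = [(1.0, 0.0, 3.0)]

for name, mixture in (("typical", TYPICAL), ("bimodal", BIMODAL), ("wide", WIDE)):
    print(f"{name:8s} fraction near neutral = {near_neutral_mass(mixture):.4f}")
```

With these particular numbers, both extremized shapes leave much less probability mass near neutral than the typical curve does, and the bimodal version leaves almost none. That is the sense in which either drawing captures “rarely anywhere in between”.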

NPD is one of the four “Cluster B personality disorders” listed in DSM-V; the others are borderline personality disorder (BPD), histrionic personality disorder (HPD), and antisocial personality disorder (ASPD) a.k.a. psychopathy a.k.a. sociopathy.

Contrary to what you might think, NPD is not especially related to the everyday meaning of “narcissism”; indeed, there’s a “narcissistic personality inventory” survey, but it turns out that NPD patients get the same score on the survey as controls (!!). The issue seems to revolve around self-esteem. A “narcissist”, as the term is used in everyday language, is a person who thinks they’re really special and great—they have high self-esteem by definition. Whereas an NPD patient need not think they’re really special and great. But if they don’t think that, then boy do they feel lousy about it. (As discussed in that paper, DSM-V emphasizes that “individuals with this disorder have a grandiose sense of self-importance”, but also notes that “vulnerability in self-esteem makes individuals with narcissistic personality disorder very sensitive to ‘injury’ from criticism or defeat”.)

I’m not too sure that an NPD diagnosis “carves nature at its joints”, and I am very open-minded to NPD having subtypes that are only superficially related. (I actually think antisocial personality disorder is like that, i.e. that it has at least two subtypes that are only superficially related.[1]) So the discussion here might only concern a subset of NPD. The discussion here is probably also somewhat applicable to BPD and HPD, although I’m not too sure about the details.[2]

Now let’s consider the hypothesis of “valence extremization”. What happens if almost every thought is either very positive valence, or very negative valence, but rarely anywhere in between? We might expect the following downstream consequences, among other things:

  • Unusual difficulty in talking or thinking about the world independently from how we feel about it: As discussed in §3.4, our brain treats valence as salient sense data which thus gets incorporated into our concepts, categories, and words. If valence signals are unusually strong in general, then presumably they would also play an unusually central role in beliefs, thinking, and communication. For example, there would be an unusually strong mental force for believing that if two things “go together” conceptually, then they must have the same valence.
  • Unusually strong halo effect, affect heuristic, and “splitting”: This is closely related to the above bullet point—again see §3.4. Jargon note: “Splitting” is where someone with NPD views a person they know as a perfect saint during some periods, and views the same person as irredeemably terrible during other periods. (Splitting is a symptom of BPD too.)
  • Unusually strong social status drive: I argued in the previous post that there’s an intimate connection between valence and social status. Well, if all of your valence signals are unusually high or low, then presumably social status signals wind up being unusually strong too. More concretely, suppose I have NPD, and I’m doing “splitting” where people are either wonderful or terrible. Suppose further that I mentally model (by empathetic simulation) what other people think of me. My brain will implicitly assume that they’re splitting too, i.e. that they think that I’m either wonderful or terrible, which in turn feels extremely motivating or aversive respectively, thanks to my social status drive.[3]

As far as I can tell, this cluster of symptoms (and more that I’ve omitted) is a decent match to NPD. I think it especially resonates with this thought-provoking essay by the late Emma Borhanian. (In fact, I was reading that essay when the hypothesis of this section first popped into my head. But my theory is different from Emma’s.)

Two more quick things:

Root causes? As in the previous sections, if “valence extremization” is a proximate cause of NPD, you may still be wondering what root cause leads to “valence extremization”. My answer is: I have no idea, sorry.

What’s the “opposite” of NPD? Food for thought: If mania and depression correspond to equal-and-opposite distortions of valence signals, then what would be the opposite of NPD, i.e. what would be a condition where valence signals stay close to neutral, rarely going either very positive or very negative? I don’t know, and maybe it doesn’t have a clinical label. One thing is: I would guess that it’s associated with a “high-decoupling” (as opposed to “contextualizing”) style of thinking.[4]

5.6 Conclusion

5.6.1 Conclusion of this post

I’ll reiterate that I’m very far from an expert on mental health or personality disorders, and this post is pretty speculative. I am blessed by a lack of real-world experience with depression, mania, or NPD; rather I’m trying to piece things together from stuff I’ve read. Hopefully there’s at least some food for thought here. As usual, please reach out (in the comments section or email) if you want to chat about this more!

5.6.2 Conclusion of the whole series

Thanks for sticking it out to the end! I hope that I have convinced you that valence is indeed an extraordinarily important part of everyday mental life, and that pondering valence for 26,000 words is a good way to illuminate and crystallize a wide variety of phenomena that might otherwise be confusing.

I started writing this series because I recently had two valence-related “aha” moments (the social status thing in Post 4, and the Narcissistic Personality Disorder thing in §5.5), and wanted to write a short post about them, and “valence” was a convenient hook that would tie them together and allow me to write about both at once. But that short post turned into a long post, and then a whole series, as I kept finding that, the more I thought about valence, the more phenomena I found that were just beautifully clicking into place!

As my regular readers know, my long-term work goal is researching alignment and safety for possible future brain-like Artificial General Intelligence (AGI). I have long been interested in Narcissistic Personality Disorder and social status drive (among many other things) because both seemed likely to shed some light on how human social instincts work, which in turn is connected to brain-like AGI safety for reasons briefly summarized here. Valence also has a more direct connection to AGI safety via understanding motivation—see my valence-based “plan for mediocre alignment”.

Unfortunately, I can’t say that writing this series has given me new concrete ideas for programming future safe & beneficial AGI, beyond what I already knew before I started. But I think I got some mental frameworks that will be useful going forward. In particular, I think §3.4 helps me think more clearly about what’s really going on with my “plan for mediocre alignment”. (As it happens, the update is in the pessimistic direction, although not very strongly. I may write about this in a separate post sometime.)

I also feel like I now have my “foot in the door” on how innate status drive works in the human brain, which is very exciting to me. Obviously I don’t want our AGIs to have innate status drive (cf. the Padme meme I put in §4.5), but I do think we might want our AGIs to have compassion. Unfortunately, the “innate compassion drive” is still pretty mysterious to me, as of this writing, but I think compassion drive might have structural overlap with status drive, in the specific sense that I expect both to rely centrally on transient empathetic simulations (more discussion here). So hopefully this “foot in the door” towards understanding innate status drive will ultimately constitute meaningful progress towards safe and beneficial AGI, even if it’s still several steps removed. To be explicit:

  • The next step might look like my fleshing out §4.5 into a theory of human innate status drive with a similar level of detail as my laughter post, i.e. getting all the way to specific pseudocode mapped to particular hypothesized neuroanatomical connections and logic.
  • Then the next step after that, with luck, might look like a somewhat-analogous hypothesis for whatever innate drives are upstream of compassion.

This is very high on my list of things to try in 2024! But it might take a long time, and/or I might get stuck. We’ll see how it goes.

In contrast to status drive, I’m now much less interested in NPD and other personality disorders than I was before I came up with the §5.5 idea, and I’m correspondingly moving personality disorders much lower on my list of urgent research priorities. (I still have much more that I’d like to learn about them! Alas, there’s only so much time in the day.) An analogy: If someone is trying to understand the detailed mechanism of how car engines work, it’s not very useful for them to understand what goes wrong when they get a flat tire, even though a flat tire prevents the engine from accomplishing what it normally accomplishes (i.e., moving the car forward quickly). By the same token, my current guess is that further studying personality disorders would not offer much illumination into the nuts-and-bolts mechanisms underlying human social instincts. To be clear, I don’t think this guess was obvious a priori, and it still might be wrong.

Well, thanks again for reading! Again, please reach out (in the comments section or by email) if you want to talk about valence, this series, or whatever else.

Thanks to Seth Herd, Aysja Johnson, Justis Mills, Charlie Steiner, Adele Lopez, and Garrett Baker for critical comments on earlier drafts. Thanks to tailcalled for some helpful discussions and references related to this post.

  1. ^

    This is getting off-topic, but I currently think that some cases of antisocial personality disorder involve globally low arousal levels (see here), and other cases involve being unusually quick to anger. At a root-cause level, these are wildly different—probably anticorrelated, if anything. But they have some superficial overlap of symptoms / presentation, so they get lumped together in clinical practice. (I’m very interested in feedback—does this hot-take ring true or false to you?)

  2. ^

    My current vague impression (e.g. based on this) is that BPD tends to involve “strong emotions” of all sorts, and extremized valence can happen incidentally as a consequence. Whereas I currently guess NPD is more centered around this valence story. I don’t know anything about HPD. I feel very uncertain about all of this, and enthusiastically welcome people’s ideas and discussion.

  3. ^

    Fine print: Perhaps I shouldn’t have said that NPD people have an “unusually strong social status drive” per se; rather, they have a normal innate social status drive in their brain, but the inputs feeding into this circuit are unusually strong, and thus the circuit sends unusually strong outputs.

  4. ^

    At this point, my contextualizer readers are saying “Hey, he’s insulting me! After all, NPD is bad, and now he’s saying decoupling is the diametric opposite of NPD, so he’s basically saying decoupling is good and therefore that contextualizing is bad and therefore that I’m bad! I resent that, sir!!” Hopefully it goes without saying that I don’t mean to imply that—after all, I’m a high-decoupler, I don’t think that way!


From my own study of mood disorders I generally agree with your valence theory of depression/mania.

However I believe the primary cause (at least for most people today) is disrupted sleep architecture.

To a first-order approximation, the brain accumulates batch episodic training data during the day through indexing in the hippocampus (which is similar-ish to upper cortex, but more especially adapted to medium-term memory & indexing). The brain's main episodic replay training then occurs during sleep, with alternation of several key phases (REM and several NREM) with unique functional roles. During NREM (SWS in particular) the hippocampus rehearses sequences to 'train' the cortex via episodic replay. (DeepMind's first Atari RL agent is based on directly reverse-engineering this mechanism.)

But REM sleep is also vitally important - and it seems to globally downscale/prune synaptic connections, most specifically the weakest and least important. It may also be doing something more complex in subtracting out the distribution of internally generated data à la Hinton's theories (but maybe not, none of his sleep-wake algos actually work well yet).

Regardless, the brain does not seem to maintain synaptic strength balance on the hourly timescale. Instead, median/average synaptic strength slowly grows without bound during the waking state, and is not correctly renormalized until pruning/renormalization during sleep - and REM sleep most specifically.

This explains many curious facts known of mania and depression:

  • The oldest known treatment for depression is also completely (but only temporarily) effective: sleep deprivation. Depression generally does not survive sleep deprivation.

  • Sleep is likewise effective to treat full blown mania, but mania inhibits sleep. One of the early successes in psychiatry was the use of sedatives to treat severe mania.

  • Blue light interferes with the circadian rhythm - specifically serotonin->melatonin conversion - and thereby can disrupt sleep architecture (SAD etc.).

  • SSRIs alter effective serotonin transport quickly but take a week or more to have noticeable effects on mood. Serotonin directly blocks REM - REM sleep is characterized by (and probably requires) a near-complete absence of monoamine neurotransmitters (histamine, serotonin and norepinephrine).

  • Lithium - a common treatment for bipolar - is a strong cellular circadian modulator and sleep stabilizer.

So basically the brain does not maintain perfect homeostatic synaptic normalization balance on short timescales. During wake synapses tend to strengthen, and during REM sleep they are pruned/weakened. Balancing this correctly seems to rely on a fairly complex sleep architecture, disruptions to which can cause mood disorders - not immediately, but over weeks/months.

But why does mean synaptic strength imbalance affect mostly mood and not, say, vision or motor control? Every synapse and brain region has a characteristic plasticity timescale that varies wildly. Peripheral lower regions (closer to sensors/motors) crystallize early and have low learning rate/plasticity in adults, so they aren't very susceptible. At any one time in life the hippocampal -> cortical episodic replay is focusing on particular brain modules, and in adults that focus is mostly on upper regions (PFC etc.) that mostly store current plans, consequences, etc. that are changing more rapidly.

Thus the upper brain regions that are proposing and computing the valence of various (actual or mental) actions as 'dopaminergic bids' with respect to current plans/situations are the most sensitive to synaptic norm imbalance, because they change at higher frequency. Of course, if a manic person stays awake long enough, they do in fact progress to psychosis similar to schizophrenia.

Very interesting, thanks! I hadn’t thought that before, but now I agree with parts of what you said.

One thing is: I think you present some suggestive evidence that REM sleep is an important intervention point that can help mitigate depression / mania. But I think you haven’t presented much evidence that REM sleep abnormalities are (usually) the root cause that led to the depression / mania starting in the first place. Maybe they are, maybe not, I dunno.

Certainly mood disorders like bipolar, depression, and mania can have multiple causes - for example, simply doing too many dopaminergic stimulants (cocaine, meth, etc.) can cause mania directly.

But the modern increased prevalence of mood disorders is best explained by a modern divergence from conditions in the ancestral environment, and sleep disorder due to electric lighting disrupting circadian rhythms is a good fit to the evidence.

The evidence for each of my main points is fairly substantial and now mainstream; the only part which isn't mainstream (yet) is the specific causal mechanism linking synaptic pruning/normalization to imbalance in valence-computing upper brain modules (but it's also fairly straightforwardly obvious from a DL perspective - we know that training instability is an intrinsic likely failure mode).

A few random links:

REM and synaptic normalization/pruning/homeostasis:

Sleep and Psychiatric Disorders:

The effectiveness of circadian interventions through the blue light pineal gland serotonin->melatonin pathway is also very well established: daytime bright light therapy has long been known to be effective for depression, nighttime blue light reduction is now also recognized as important/effective, etc.

The interventions required to promote healthy sleep architecture are not especially expensive and are certainly not patentable, so they are in a blindspot for our current partially misaligned drug-product focused healthcare system. Of course there would be a market for a hypothetical drug which could target and fix the specific issues that some people have with sleep quality - but instead we just have hammers like benzos and lithium which cause as many or more problems than they solve.

What’s the “opposite” of NPD? Food for thought: If mania and depression correspond to equal-and-opposite distortions of valence signals, then what would be the opposite of NPD, i.e. what would be a condition where valence signals stay close to neutral, rarely going either very positive or very negative? I don’t know, and maybe it doesn’t have a clinical label. One thing is: I would guess that it’s associated with a “high-decoupling” (as opposed to “contextualizing”) style of thinking.[4]

I listened to this podcast recently (link to relevant timestamp) with Arthur Brooks. In his work (which I have done zero additional research on and have no idea whether it's done well or worth engaging with), he divides people into four quadrants based on having above/below average positive emotions and above/below average negative emotions. He gives each quadrant a label, where the below/below ones are called "judges", which according to him are "the people with enormously good judgment who don't get freaked out about anything".

This made sense to me because I think I'm squarely in the low/low camp, and I feel like decoupling comes extremely naturally to me and feels effortless (ofc this is also a suspiciously self-serving conclusion). So insofar as his notion of "intensity and frequency of emotions" tracks with your distribution of valence signals, the judges quadrant would be the "opposite" of NPD -- although I believe it's constructed in such a way that it always contains 25% of the population.

Great post! Let me just add a few observations that I've made myself during mood swings and depressive and hypomanic states:

There seems to be a close link between energy levels, belief in one's ability to achieve one's goals, and confidence.

The feeling of energy gives us the idea that we can deal with the problems which are troubling us, and we always extrapolate our current state into the entire future we imagine, so as long as we have energy, everything will be alright. But if you think about your future at night when you're exhausted and low-energy, projecting this state into the future, you might start to doubt your ability to deal with the future and achieve your goals.

The feeling of energy is very close to self-esteem. I used to have mood swings, so it became very apparent to me how my levels of energy and my belief in myself changed together. As my subjective evaluation of myself changed, so would my evaluation of all problems and challenges. What seemed like impossible challenges would warp into trivial problems, as if my evaluation of my own ability was inversely proportional to my evaluation of challenges. A last thing which seemed to scale was the exploration vs exploitation axis. At the depressive extreme, any minor risk would worry me. At the manic extreme, gambling seemed like a fun idea, and I somehow always believed that the outcome would be in my favor (and if it wasn't, so what? Any loss felt insignificant).

This momentum towards better states (and the feeling of growth) seems like a core part of human nature. I'm even somewhat confident that confidence is the ratio between perceived victory and perceived defeat, with memories of both serving as "evidence" that the brain uses. So why does confidence change over the course of a day? Perhaps it's due to state-dependent learning, and because the evaluation of the evidence depends on your current state. I also think that the feeling of "getting stuck" is a lack of momentum, and that depression is "learning" that all strategies available to you are no good.

Just curious, have you encountered the neurotransmitter serotonin and its role in the human body? I think the mechanisms behind it (and other neurotransmitters) have more far-reaching relevance to the alignment problem than valence does.

I think serotonin does different things in different parts of the brain (and body), and that none of those things are particularly relevant to Safe & Beneficial AGI, so I don’t plan to write about it.

That’s my opinion, and I’m well aware that other people would disagree with me about that.

If you think you have some serotonin-inspired insight into the alignment problem, you’re welcome to write it up.  :)