Kaj_Sotala's Comments

Healing vs. exercise analogies for emotional work

"Emotional work is endless boring cleaning" doesn't sound as attractive as either healing or exercise, though. :-)

Healing vs. exercise analogies for emotional work

Yeah, I was specifically thinking about persistent ongoing work more than occasional epiphanies, though of course sometimes the epiphanies can actually be transforming too, and ongoing work is likely to eventually produce them.

The two-layer model of human values, and problems with synthesizing preferences

Hmm... it's hard for me to get what you mean from a comment this short, but just the fact that I seem to have a lot of difficulty connecting your comment with my own model suggests that I didn't communicate mine very well. Could you say more about how you understood it?

2018 Review: Voting Results!

That's an interesting measure, let's plot that too. (Ranks reversed so that rank 1 is represented as 74, rank 2 as 73, and so on.)

Reverse rank vs. karma
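
A minimal sketch of the rank reversal described above, assuming 74 ranked posts (inferred from rank 1 being shown as 74); the function name `reverse_rank` and its `n_posts` parameter are made up for illustration:

```python
def reverse_rank(rank: int, n_posts: int = 74) -> int:
    """Map rank 1 to n_posts, rank 2 to n_posts - 1, and so on."""
    # n_posts = 74 is an assumption inferred from the comment above.
    if not 1 <= rank <= n_posts:
        raise ValueError(f"rank must be between 1 and {n_posts}")
    return n_posts + 1 - rank


# Best-ranked post lands at the top of the axis, worst-ranked at the bottom.
reversed_ranks = [reverse_rank(r) for r in (1, 2, 74)]
```

Reversing the ranks this way just means that higher values are better on both axes, so a positive relationship shows up as an upward-sloping cloud.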

The two-layer model of human values, and problems with synthesizing preferences

Great comment, thanks!

Is it really "wrong"? It's a normative assumption ... we get to decide what values we want, right? As "I" am a character, I don't particularly care what the player wants :-P

Well, to make up a silly example, let's suppose that you have a conscious belief that you want there to be as much cheesecake as possible. This is because you are feeling generally unsafe, and a part of your brain has associated cheesecakes with a feeling of safety, so it has formed the unconscious prediction that if only there was enough cheesecake, then you would finally feel good and safe.

So you program the AI to extract your character-level values, it correctly notices that you want to have lots of cheesecake, and goes on to fill the world with cheesecake... only for you to realize that now that you have your world full of cheesecake, you still don't feel as happy as you were on some level expecting to feel, and all of your elaborate rational theories of how cheesecake is the optimal use of atoms start feeling somehow hollow.

Is the Reversal Test overrated?

Worth noting that the original paper mentions several potential reasons to prefer the status quo, which can in fact be valid arguments rather than bias. Your body temperature example is an instance of the first one, the argument from evolutionary adaptation:

Obviously, the Reversal Test does not show that preferring the status quo is always unjustified. In many cases, it is possible to meet the challenge posed by the Reversal Test and thus to defeat the suspicion of status quo bias. Let us examine some of the possible ways in which one could try to do this [...]

The Argument from Evolutionary Adaptation

For some biological parameters, one may argue on evolutionary grounds that it is likely that the current value is a local optimum. The idea is that we have adapted to live in a certain kind of environment, and that if a larger or a smaller value of the parameter had been a better adaptation, then evolution would have ensured that the parameter would have had this optimal value. For example, one could argue that the average ratio between heart size and body size is at a local optimum, because a suboptimal ratio would have been selected against. This argument would shift the burden of proof back on somebody who maintains that a particular person’s heart—or the average human heart-to-body-size ratio—is too large or too small. [...]

The Argument from Transition Costs

Consider the reluctance of the United States to move to the metric system of measurement units. While few would doubt the superiority of the metric system, it is nevertheless unclear whether the United States should adopt it. In cases like this, the transition costs are potentially so high as to overwhelm the benefits to be gained from the new situation. Those who oppose both increasing and decreasing some parameter can potentially appeal to such a rationale to explain why we should retain the status quo without having to insist that the status quo is (locally) optimal. [...]

The Argument from Risk

Even if it is agreed that we are probably not at a local optimum with respect to some parameter under consideration, one could still mount an argument from risk against varying the parameter. If it is suspected that the potential gains from varying the parameter are quite low and the potential losses very high, it may be prudent to leave things as they are (fig. 2).

2018 Review: Voting Results!

The "Click Here If You Would Like A More Comprehensive Vote Data Spreadsheet" link includes both vote totals and karma, making it easy to calculate the correlation using Google Sheets' CORREL function. Pearson correlation between karma and vote total is 0.355, or if we throw away the outlier of "Affordance Widths", which was heavily downvoted due to its author, 0.425.

Scatterplots with "Affordance Widths" removed:

Karma vs. Total

Total vs. karma
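
For anyone who wants to redo the calculation outside a spreadsheet, here is a rough sketch of the same Pearson correlation with and without one outlier. The rows below are made-up placeholder numbers, not the actual review data, and the names `pearson`, `correlation` and `posts` are just for illustration:

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical (title, karma, vote_total) rows; NOT the real spreadsheet data.
posts = [
    ("Post A", 120, 40),
    ("Post B", 80, 25),
    ("Post C", 150, 30),
    ("Post D", 60, 10),
    ("Outlier", 140, -50),  # high karma but heavily downvoted in the review
]


def correlation(rows):
    karma = [k for _, k, _ in rows]
    votes = [v for _, _, v in rows]
    return pearson(karma, votes)


r_all = correlation(posts)
r_trimmed = correlation([row for row in posts if row[0] != "Outlier"])
```

With only a handful of points, a single high-karma, heavily downvoted row drags the coefficient down a lot, which is the same effect as in the real data, just exaggerated.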

Player vs. Character: A Two-Level Model of Ethics

I didn't feel like I fully understood this post at the time when it was written, but in retrospect it feels like it's talking about essentially the same thing as Coherence Therapy does, just framed differently.

Any given symptom is coherently produced, in other words, by either (1) how the individual strives, without conscious awareness, to carry out strategies for safety or well-being; or (2) how the individual responds to having suffered violations of safety or well-being. This model of symptom production is squarely in accord with the constructivist view of the self as having profound if unrecognized agency in shaping experience and behavior. Coherence therapy is centrally focused on ushering clients into a direct, noninterpretive experience of their agency in generating the symptom.

Symptom coherence was also defined by Ecker and Hulley (2004) as a heuristic principle of mental functioning, as follows: The brain-mind-body system can purposefully produce any of its possible conditions or states, including any kind of clinical symptom, in order to carry out any purpose that it is capable of forming.

This principle of general coherence is, of course, quite foreign to the therapy field’s prevailing, pathologizing models of symptom production. Underscoring the paradigmatic difference, Ecker and Hulley (2004, p. 3), addressing trainees, comment:

You won’t fully grasp this methodology until you grasp the nimble, active genius of the psyche not only in constructing personal reality, but also in purposefully manifesting any one of its myriad possible states to carry out any of its myriad possible purposes. The client’s psyche is always coherent, always in control of producing the symptom—knowing why and when to produce it and when not to produce it.

-- Toomey & Ecker 2007 (sci-hub)

Reality-Revealing and Reality-Masking Puzzles

Over the last 12 years, I’ve chatted with small hundreds of people who were somewhere “in process” along the path toward “okay I guess I should take Singularity scenarios seriously.” From watching them, my guess is that the process of coming to take Singularity scenarios seriously is often even more disruptive than is losing a childhood religion. Among many other things, I have seen it sometimes disrupt:

I feel like I was hit by most of these disruptions myself, and eventually managed to overcome them. But the way in which I overcame them suggests to me that there might be one more piece to the puzzle which hasn't been mentioned here.

A concept which I've seen thrown around in a few places is that of an "exile-driven life"; "exile" referring to the Internal Family Systems notion of strong painful feelings which a person is desperate to keep buried. Your life or some aspect of your life being exile-driven, means that keeping those painful feelings suppressed is one of the primary motivations behind your choices. The alcoholic who drinks to make their feelings of shame go away is exile-driven, but one can also have an exile-driven career that looks successful from the outside, or an exile-driven relationship where someone is primarily in the relationship for the sake of e.g. getting validation from their partner, and gets desperate whenever they don't get enough of it.

In retrospect, it looks to me like most of my disruptions - such as losing the belief of having a right to rest etc. - were ultimately linked to strong feelings of moral obligation, guilt, and worthlessness which have also popped up in other contexts. For example, it has happened more than once that a friend has gotten very depressed and suicidal, and then clutched onto me for help; and this has triggered exactly the same kind of reasoning as the various Singularity scenarios. "What right do I have to rest when this other person is much more badly off", and other classic codependency symptoms. (Looking at that list of codependency symptoms actually makes for a very interesting parallel to "Singularity disorder", now that I think of it.)

Now, I do agree that there's something to the "eliminating antibodies" framing - in each of those cases, there have been related thoughts about consequentialism and (this was particularly toxic) heroic responsibility saying that yes, if I don't manage to help this person, then their suffering and possibly death is my fault.

But the "eliminating antibodies" framing suggests that this is something that could happen to anyone. And maybe it could: part of my recovery involved starting to explicitly reject excessive consequentialism and utilitarianism in my thinking. Still, it wasn't until I found ways to address the underlying emotional flaws themselves that the kinds of failure modes that you described also started fixing themselves more thoroughly.

So at least my own experience was less "eliminating these antibodies caused me to overgeneralize factual beliefs" and more "there were pre-existing parts of my mind that believed that I was worthless, and all the rationalist stuff handed them even more evidence that they could use for making that case, eliminating existing defenses against the belief". If I hadn't had those pre-existing vulnerabilities, I suspect that I wouldn't have been disrupted to the same extent.

Qiaochu and others have been making the observation that the rationalist community seems to have a large share of people who are traumatized; it's been remarked that self-improvement communities in general attract the walking wounded. At my IFS training, it was remarked that manager parts that are struggling to keep exiles at bay tend to be really strongly attracted to any system which offers a promise of control and predictability, such as what you might get from the original Sequences - "here are the mathematically correct ways of reasoning and acting, just follow these instructions and you're doing as well as a human can!". There's the thought that if only you can work yourself hard enough, and follow the dictates of this new system faithfully enough, then the feelings of guilt and worthlessness will stop. But since consequentialism is more demanding than what any human is ever capable of, you can never say "okay, now I've done enough and can rest", and those feelings of worthlessness will just continue to recur.

This would suggest that not only are there pre-existing vulnerabilities that make some people more susceptible to being disrupted by rationalist memes, those are also exactly the same kinds of people who frequently get drawn to rationalist memes, since in the view of some of their parts, the "disruption" is actually a way to redeem themselves.

Why Do You Keep Having This Problem?

Closely related to this is the issue where people try to do something, fail, and figure that they will "try harder" the next time. Frequently this just means that they will fail again, because their understanding of what they are doing wrong isn't sufficiently gears-level to allow them to isolate the bug in question; "I will try harder" tends to mean "I don't know why exactly I failed, so I'll just try again and hope that it works this time around".

I did some peer coaching at one point, and a common thing was that one of us would make a plan in order to do Y; a week later, the plan had failed and Y remained undone. The one doing the coaching would then ask what went wrong and how the other person could fix that failure, producing a revised plan for next week. Often, drilling down would produce something specific, such as "I had planned to get exercise by going out on a run, but then I was busy on a few days and it rained on the rest". Then you could ask yourself what kind of a plan would avoid those failure modes, and generate a less fragile approach.

That makes "why do I keep having this problem" and "what have I tried before and why hasn't it worked" very useful questions, and might help reframe failure not as failure, but as progress towards solving the goal. Yes, you didn't succeed at the goal right away, but you got more information about what works and what doesn't, making you better-positioned to solve it the next time around.

This is also good to combine with Murphyjitsu - after forming a new plan, imagining that plan to have failed and asking yourself how surprising that possibility would feel. If it wouldn't feel very surprising at all, ask your brain what it expects the failure to have been caused by, and plan around that.

It's also worth noting that sometimes you go through a few iterations of this and new bugs just keep popping up, or alternatively it feels like you can't imagine anything in particular that would go wrong, but your inner simulator still expects this to fail. That might point to there being some emotional issue, such as your brain predicting that success will be dangerous for some reason. That's then another thing that you can try to tackle.
