Why Everyone (Else) Is a Hypocrite: Evolution and the Modular Mind
Concept Safety
Multiagent Models of Mind
Keith Stanovich: What Intelligence Tests Miss


chinchilla's wild implications

so that the people who end up reading it are at least more likely to be plugged into the LW ecosystem and are also going to get exposed to arguments about AI risk.

There's also the chance that if these posts are not gated, people who previously weren't plugged into the LW ecosystem but are interested in AI will find LW through articles such as this one, then eventually start reading other articles here and become more interested in alignment concerns.

There's also a bit of a negative stereotype among some AI researchers of alignment people as theoretical philosophers doing their own thing and being entirely out of touch with what real AI is like. They might take alignment concerns a bit more seriously if they can easily find competent AI discussion on LW / Alignment Forum.

Using GPT-3 to augment human intelligence

Some of the GPT-for-fiction-writing sites, e.g. WriteHolo, at least offer tools for saving and organizing the outputs. (Though WriteHolo uses GPT-J rather than GPT-3 and hasn't been trained to respond to queries directly the way that GPT-3's InstructGPT version has, so it may require more creative prompt engineering.)

Shard Theory: An Overview

Where his shards once just passively dealt with the consequences effected by other shards via their shared motor output channel, they are now intelligent enough to plan scheme at the other shards. Say that you're considering whether to go off to big-law school, and are concerned about that environment exacerbating the egoistic streak you see and dislike in yourself. You don't want to grow up to be more of an egotist, so you choose to avoid going to your top-ranked big-law-school offer, even though the compensation from practicing prestigious big-shot law would further your other goals. [...]

There are some human phenomena that shard theory doesn't have a tidy story about. The largest is probably the apparent phenomenon of credit assignment improving over a lifetime. When you're older and wiser, you're better at noticing which of your past actions were bad and learning from your mistakes. Possibly, this happens a long time after the fact, without any anti-reinforcement event occurring. But an improved conceptual understanding ought to be inaccessible to your subcortical reinforcement circuitry -- on shard theory, being wiser shouldn't mean your shards are reinforced or anti-reinforced any differently.

How do the mechanisms in these two examples differ from each other? You seem to be suggesting that the first one is explainable by shard theory, while the second one is mysterious. But aren't they both cases of the shards having some kind of conceptual model of the world and of the consequences of different actions, where the conceptual model improves even in cases where it doesn't lead to immediate consequences with regard to the valued thing, and thus can't be directly reinforced?

Meditation course claims 65% enlightenment rate: my review

In part he was probably just expressing the point humorously, as is his style; in part, for most Westerners it's probably easier to be friends with skeptics than with meditators who have the right competencies. (Especially since even knowing who would have the right competencies is a highly nontrivial question.)

Meditation course claims 65% enlightenment rate: my review

Note that the people Martin studied were systematically wrong about what they looked like to the external observers.

That's not what your quote is saying, though - it specifically says that the interviewer asked the person about his internal state, not what he looked like to external observers. The person's report is consistent with the hypothesis that while he is still experiencing the physical symptoms of stress, those have stopped causing him suffering. 

Also, it's not clear how many other people this applied to; Martin's paper says:

The same was observed in a total of three participants and I went on to conduct other experiments into this. The overall suggestion from the data was a disconnect between the internal subjective experience in these participants and other parts of their psychology and physiology. While this was especially pronounced during times of high stress it seemed more broadly measurable. Two examples illustrate aspects of this.

This wording is ambiguous about exactly how many participants showed the "claims to be stress-free even when they are exhibiting physical signs of stress" pattern. "The same was observed in a total of three participants" suggests that this specific thing was observed in only 3/50 of the participants, and that for the rest, other things were observed that Martin decided to lump into the same category.

The "two examples" mentioned say that the participants thought they had more bodily awareness than they did, and that they said they couldn't be racist while still showing signs of implicit racism. These examples seem to be about overconfidence about what their internal experience implies, rather than about them being mistaken about what their internal experience is. The "bodily awareness" example is also ambiguous:

I arranged and observed private yoga sessions with a series of participants as part of a larger inquiry into their bodily awareness. During these sessions it became clear that participants believed they were far more aware of their body than they actually were. For example, the instructor would often put her hand on part of the body asking the participant to relax the tense muscles there, only to have the participant insist that s/he was totally relaxed in that area and did not feel any muscle tension.

In that example, are the participants really claiming that they have complete bodily awareness, or are they just reporting that they cannot find any tension? If someone were to tell me to relax tense muscles in a part of my body that felt totally relaxed to me, I might also say that I feel totally relaxed and can't find any muscle tension. Not because I thought I had perfect bodily awareness, but because I can't relax the tension if I can't feel it, so I want to explain why I can't follow the instruction that I'm given.

Also the implicit racism was measured using the Implicit Association Test, whose reliability is rather dubious, but I'm more willing to let that one slide. I've met enough advanced meditators who are very visibly overconfident about their unbiasedness that my stance here is "yeah that definitely happens". :-) In general I do find it easy to believe that people who've reached various states of enlightenment are often overconfident about what that gets them, but that feels like a weaker claim than "they are suffering more and noticing it less". 

It is also possible that the participants for whom something like this was observed (again, 3/50, so only 6%) thought they were enlightened while actually being dissociated, since descriptions of "enlightenment" and dissociation sound very similar and can be hard to distinguish from the outside. (The difference is very apparent if you've personally experienced both, though.) I can't resist the opportunity to drop in one of my favorite meditation quotes, from a teacher I went on a retreat with:

There’s no way to tell enlightenment and delusion apart from the inside, so you should have feedback and you should have friends who think that Buddhism is stupid. They are willing to listen to you talk about it because you're their friend and it's one of your interests, but if you start talking about how you've become enlightened, they'll tell you how you're full of shit.

Flash Classes: Polaris, Five-Second Versions, and Thought Lengths

You can point at goal factoring and turbocharging, and recognize ways in which the first person in each example is sort of missing the point. Those first three people, as described, are following the rules sort of just because—they’re doing what they’re supposed to do, because they’re supposed to do it, without ever pausing to ask who’s doing the supposing, and why. 

This might just be nitpicking, but given that the very same post is also talking about how valuable it is to get genuinely curious about why people might not be learning well, it seems worth mentioning...

My first reaction reading those examples was not that the people in question were missing the point. Rather, I took it to mean that they were at a stage of their learning where they were learning the basic technique and did not yet have it automated enough to have any working memory to spare for thinking about the big picture. At such a point, it isn't clear to me that stopping to ask questions about the big picture would even be beneficial; procedural "how" understanding and conceptual "why" understanding usually develop hand-in-hand, so you can't reason about the "why" very well before you have enough of the "how" down (and vice versa).

Of course it's possible to get stuck in only the "how" and not even try to understand the "why", but to me the examples as written don't convey that these people are making that particular mistake.


Hmm, maybe I'm just so used to thinking in terms of felt senses that I interpret all aliefs as being felt senses? :) E.g. when I hear the word "alief", I usually think of Scott Alexander's haunted house story, as well as these examples from Wikipedia:

For example, a person standing on a transparent balcony may believe that they are safe, but alieve that they are in danger. A person watching a sad movie may believe that the characters are completely fictional, but their aliefs may lead them to cry nonetheless. A person who is hesitant to eat fudge that has been formed into the shape of feces, or who exhibits reluctance in drinking from a sterilized bedpan may believe that the substances are safe to eat and drink, but may alieve that they are not.

And all of these aliefs feel like they would have distinct felt senses associated with them, e.g. if you cry at a sad movie because one of the characters died or lost something important to them, you probably have some kind of a felt sense of loss. If you're afraid or disgusted, you have felt senses corresponding to both the general emotion as well as the more specific anticipation. It feels a little hard for me to imagine an alief that wouldn't have a felt sense associated with it.

Though you could argue that while an alief implies a felt sense, a felt sense doesn't necessarily imply an alief, and that steampunk vs. forest pictures differ in their felt senses but not their aliefs. I guess that depends on how exactly we're defining an alief. I'm thinking of it as something like "a belief embedded in an implicit predictive model of the world". 

In that frame, the felt senses evoked by different kinds of pictures reflect unconscious beliefs about what kinds of things are associated with the specific things in the pictures. E.g. there was relatively little specifically Victorian in the steampunk pictures, but that was a word that popped into my mind anyway when trying to describe their vibe, because of the more general steampunk <-> Victorian association. Aesthetics such as "steampunk" also seem to encode more complicated belief networks and predictions, as you've argued yourself. :)

There's also the consideration that aliefs seem to activate not just beliefs, but also behavioral dispositions. E.g. if you're standing on a transparent balcony or spending the night in a mansion that's supposedly haunted, your aliefs may activate flight-type responses. (That was the top-voted response to Scott's haunted mansion post: that it's less about there being a secret belief about the mansion really being haunted, and more about the mind being semi-hardwired to activate fear responses when you're alone at night in an unfamiliar place with lots of weird sounds.) I think some of the subtler changes you're referring to are something like changing activation levels in subsystems responsible for behavioral dispositions. E.g. looking from the steampunk pictures to the forest ones may cause the predominant input to be registered as "more safe", which slightly downgrades the priority of systems whose objective is to make yourself small in order to hide from threats, and that may be subjectively experienced as a subtle sense of the mind opening up. And I think of those kinds of changes in behavioral dispositions as being driven by predictions (e.g. an environment being safe vs. unsafe) made on an alief level.

Anatomy of a Dating Document

I think there's also a trend for academics to date other academics. How come those random forests didn't detect it?

Likely because all of the people in question were in academia (undergraduates, at least), so there was no opportunity to detect differences in how they matched with non-academics:

Sample A consisted of 163 undergraduate students (81 women and 82 men; mean age = 19.6 years, SD = 1.0) who attended one of seven speed-dating events in 2005. Sample B consisted of 187 undergraduate students (93 women and 94 men; mean age = 19.6 years, SD = 1.2) who attended one of eight such events in 2007. Sample size was determined by the number of speed-dating events we were able to hold in 2005 and 2007 and the number of participants we were able to recruit for each event while maintaining an equal gender ratio. All participants, who were recruited via on-campus flyers and e-mails to participate in a speed-dating study, had the goal of meeting and potentially matching with opposite-sex participants. [...]

The present results were obtained with undergraduate samples; a more demographically diverse sample might exhibit matching by sociological factors such as age, socioeconomic status, cultural background, or religious background.

Anatomy of a Dating Document

I'm particularly impressed by Joel et al 2017, where participants spent half an hour filling out several hundred of the most useful survey/psych questions (many extremely similar to what 'date documents' record) first, and past the global 'hot or not' ratings everyone could agree on, they throw high-powered random forests at trying to predict pairs of men/women, and the pairwise random forests do not merely fail to add much predictive power, they actually make the predictions worse! I, uh, did not predict that.

As far as I can tell, the outcome that the study was trying to predict was perceived compatibility on a four-minute speed date? If that's the amount of time you have to get to know a person, it doesn't sound too surprising if the global 'hot or not' ratings are the only useful predictor. Many people even reserve deal-breaker questions like "kids or no kids" until the second full date or later.

Given that squidious was talking about cases where people jump into a relationship and might find out about serious problems only much later, e.g. at a point where they might already have kids, it seems that the kinds of long-term issues she was talking about would also go unnoticed in a situation where you only had four minutes to assess the other person.

The authors note this limitation themselves, and seem to say that the actual question squidious is referencing hasn't even been studied, since it's methodologically too hard:

The present findings address only obliquely the predictability of long-term romantic compatibility. Even if unique desire in initial interactions is not predictable a priori, a matching algorithm could serve a useful function by surrounding users with partners with whom they would ultimately enjoy long-term compatibility should a relationship develop. Building and validating such an algorithm would require that researchers collect background measures before two partners have met and follow them over time as they become an established couple. To our knowledge, relationship science has yet to accomplish this methodological feat; even the commonly assessed individual-difference predictors of relationship satisfaction and breakup (e.g., neuroticism, attachment insecurity; Karney & Bradbury, 1995; Le et al., 2010) have never been assessed before the formation of a relationship. For these variables to be useful in a long-term compatibility algorithm that also separates actor, partner, and relationship variance, researchers would need to predict relationship dynamics across participants’ multiple romantic relationships over time (Eastwick et al., 2017). Predicting long-term compatibility may be more challenging than predicting initial romantic desire. 

Anatomy of a Dating Document

the women will all try to choose the one tall rich high-status handsome dude while barely glancing at 80% of men.

This may certainly be true for some women, but what's your basis for the claim that "all" women will try to use these criteria? It seems wildly implausible to me, both because of how much individual variation there is among people, and from simply knowing lots of women and having witnessed both what they claim to be attracted to and the kinds of people they've actually ended up with.
