I was a co-founder of CFAR in 2012. I'd been actively trying to save the world for about a decade at that point. I left in 2018 to seriously purify my mind & being. I realized in 2020 that I'd been using the fear of the end of the world like an addictive drug and did my damnedest to quit cold-turkey. I'm now doing my best to embody an answer to the global flurry in a way that's something like a fusion of game theory and Buddhist Tantra.


Signaling isn't about signaling, it's about Goodhart

So you seem to be focused in this post on ways to generate signals.

No. I'm focused on how attention to signals tends to create Goodhart drift.
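One way to make "Goodhart drift" concrete is a toy simulation. This is a minimal sketch under assumptions of my own, not a model from the post. Each person has a true quality, and the visible signal is that quality plus honest noise, plus some effort spent shaping the signal that has nothing to do with the quality. The hypothetical parameter `gaming_weight` scales how much attention goes to the signal itself. As it grows, the signal says less and less about the thing it was supposed to track.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient, computed by hand from its definition."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def simulate(gaming_weight, n=10_000, seed=0):
    """Return corr(true quality, visible signal) in a toy population.

    Assumed model (illustrative only, hypothetical):
        signal = quality + honest_noise + gaming_weight * gaming
    where `gaming` is effort spent shaping the signal, independent of quality.
    """
    rng = random.Random(seed)
    qualities, signals = [], []
    for _ in range(n):
        quality = rng.gauss(0, 1)
        honest_noise = rng.gauss(0, 0.5)
        gaming = rng.gauss(0, 1)  # signal-shaping effort unrelated to quality
        qualities.append(quality)
        signals.append(quality + honest_noise + gaming_weight * gaming)
    return pearson(qualities, signals)


if __name__ == "__main__":
    for w in (0.0, 1.0, 3.0):
        print(f"gaming_weight={w}: corr(signal, quality) = {simulate(w):.2f}")
```

Under these assumptions the correlation works out analytically to 1/sqrt(1.25 + w^2), roughly 0.89, 0.67 and 0.31 for the three weights, and the simulated values should land close to that. The point of the toy is only that attention aimed at the signal adds variation the signal was never meant to carry.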

Signaling isn't about signaling, it's about Goodhart

A different frame on what I see as the same puzzle:

If faced with the choice, would you rather self-deceive, or die?

It sure looks like the sane choice is self-deception. You might be able to unwind that over time, whereas death is hard to recover from.

Sadly, this means you can be manipulated and confused via the right kind of threat, and it'll be harder and harder for you over time to notice these confusions.

You can even get so confused you don't actually recognize what is and isn't death — which means that malicious (to you) forces can have some sway over the process of your own self-deception.

It's a bit like the logic of "Don't negotiate with terrorists":

The more scenarios in which you can precommit to choosing death over self-deception, the less incentive any force will have to try to present you with such a choice, and thus the more reliably clear your thinking will be (at least on this axis).

It just means you sincerely have to be willing to choose to die.
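The "Don't negotiate with terrorists" logic can be written as a small expected-value condition. This is a simplified sketch with symbols of my own choosing, not a model from the original comment: let p be the probability that you choose self-deception when presented with the threat, B > 0 the benefit to the threatening force if you do, and c > 0 its cost of setting up the threat. The force only bothers when the threat pays in expectation:

```latex
\text{threaten} \iff p\,B > c, \qquad \lim_{p \to 0} p\,B = 0 < c
```

A credible precommitment to choose death drives p toward 0, so the threat stops being worth making. The catch is the one stated above: the precommitment only lowers p if it is sincere.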

Signaling isn't about signaling, it's about Goodhart

Isn't the main point of acting on cultural differences to make others feel more comfortable? Or to show that you're interested in/you care about their culture?

As viewed from the outside, yes.

I think navigating this truthfully feels different from that analysis on the inside though.

If I think "I'm going to make these people feel comfortable by matching their cultural norms", this can often create the opposite effect. I described the dynamics of this in the OP.

The reason those norms help put people at ease is because of what they imply (signal) about a certain quality of attention and compatibility you're bringing. If you just are attentive then that'll emerge naturally. No reason to think explicitly about the norms.

This is a little like noticing how all things about love and romance are ultimately about sex, but how thinking about it that way can actually jam their ability to function properly. This isn't to deny the centrality of evolutionary forces. It's noticing how thinking about those forces while inside them can create loops that bring in influences you may not want. Hence the "Just be yourself" advice.


Probably most people should lean towards wearing what they feel like more, but having this as a general policy might be quite costly, because people judge a lot based on clothing.

Yep. And if you focus your attention on other people's judgments this way, you totally summon Goodhart's Demon.

So which do you want? The risk of paying a social cost for a while, or the risk of floating along in Goodhart drift?


[…] I think that a large proportion of signalling involves unconscious calculations or self-deception, and it takes a huge amount of work to make those explicit. So the category of "signalling" may, because of that, seem more pervasive and deeper-rooted to me than it does to you.

That's not what's going on here.

I'm guessing you think I'm talking about actually in fact dropping all signaling.

That's definitely not what I mean. That doesn't make sense to me. It'd be on par with "Stop being affected by physics."

When I say "Drop attempts to signal", I'm describing the subjective experience of enacting this shift as I currently understand it.

I mean the thing where, when sitting across from someone on a first date, I can track the thoughts that are about "making a good impression" and either lean into them or sort of drop them. The first one structurally creates problems. The second is less likely to.

On the inside it feels like going in the direction of just not caring about what impressions I do or don't give her. Which is to say, on the inside it feels like dropping all attempts to signal.

But of course my body language and word choice and dress and so on will signal all kinds of things to her. I haven't actually dropped all signaling, or even subconscious attempts to signal.

It's just that by pointing this optimization force away from those signals, I can encourage them to reflect reality instead of the (possibly false) image of myself a part of me wants her to see.

And if I hold such a policy, the signals I end up sending will systematically (at least in the limit) align with the truth of this transparency. Signaling non-deception by not deceiving. As best I can tell, no focus on signals, even a subconscious one, can beat this strategy for fidelity of transmission.

Which is to say, the strategy of "Drop all attempts to signal" is a signaling strategy.

…at least in one analysis. Because thinking of it that way makes it harder to use, it helps to reframe it.

But my guess is that this resolves the difference in perspective here between you and me. Yes?

Signaling isn't about signaling, it's about Goodhart

That's a really good point. It's like stealth obsession with signaling, because there's a need to not signal.

This in turn reminds me of how beginning statistics students often confuse independence and anti-correlation. I'm trying to point at the analog of independence, but if folk who feel compelled that I'm pointing at something real don't grok what I'm pointing at, they're likely to land on the analog of anti-correlation.
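For readers who want the statistics side of the analogy spelled out, here is a small illustrative sketch (the variable names and distributions are my own choices, not from the comment). Independence means one variable carries no information about the other. Anti-correlation means it carries a lot, just inverted. On my reading of the analogy, "not signaling" in the anti-correlated sense is still steered by the signal, which is the "stealth obsession" mentioned just above.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient, computed by hand from its definition."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


rng = random.Random(42)
n = 10_000

# x stands in for "what the signal would say" (illustrative only).
x = [rng.gauss(0, 1) for _ in range(n)]

# Independence: y is generated without ever looking at x.
y_independent = [rng.gauss(0, 1) for _ in range(n)]

# Anti-correlation: y is built by pushing against x. It is still steered
# by x, just in the opposite direction.
y_anti = [-xi + rng.gauss(0, 0.5) for xi in x]

r_independent = pearson(x, y_independent)
r_anti = pearson(x, y_anti)
print(f"independent:     r = {r_independent:+.2f}")
print(f"anti-correlated: r = {r_anti:+.2f}")
```

Knowing `y_anti` tells you a lot about `x` (just with the sign flipped), with a theoretical correlation of about -0.89 under these choices. Knowing `y_independent` tells you essentially nothing, so its correlation should sit near zero.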

Signaling isn't about signaling, it's about Goodhart

Yep, I'm pretty uncertain too.

I think that at least some politeness falls more under the category of language. Like, I'm in Mexico, and it's often helpful for me to switch to Spanish. I'm totally manipulating my signals there, but it seems… fine? Like I just don't see the Goodhart pressure appearing there at all. Saying "Gracias, ¡hasta luego!" instead of "Thank you, have a good day!" seems perfectly fine.

But some politeness very much does introduce Goodhart drift. "How dare you say that?! That's so rude!" This is a weird signal suppression system that introduces what some folks near Toronto have called "untalkaboutability" (read as: "un-talk-about-ability").

Likewise with pretending to be friendly. Lots of shop owners here will call out to me as I pass saying something like "Hey! Hey there my friend! Tell me, where are you from?" The context makes it pretty obvious that they're being friendly to hook me into their shop. But the reason the hook works at all is because of the plausible deniability that that's their purpose. "Oh, don't be like that! I'm just being friendly!" This is weaponization of signals of friendliness, which is possible because of the Goodhart drift applied to those signals.

But yeah, I have a question around language here, and cultural standards. Like shaking hands in North America vs. bowing in Japan. This is actually a better edge case than is Spanish: It seems fine to recognize and act on the cultural difference… 

…unless I switch because I'm trying to make others feel more comfortable. At that point I'm focusing on the signal in order to manipulate the other, which starts to introduce Goodhart drift. The fact that my intentions are good or that this is common doesn't save the signal from Goodhart's Demon.

Whereas if I can focus on grokking the cultural difference, and then set that entirely aside and do what I feel like doing… I think something like that naturally results in the politeness that matters.

Signaling isn't about signaling, it's about Goodhart

My main takeaway from this post is that it's important to distinguish between sending signals and trying to send signals, because the latter often leads to goodharting.

That is a wonderful summary.


For instance, I make more of an effort now than I used to, to notice when I appreciate what people are doing, and tell them, so that they know I care. And I think this has basically been very good. This is very much not me dropping all effort to signal.

But I think what you're talking about is very applicable here, because if I were just trying to maximise that signal, I would probably just make up compliments, and this would probably be obviously insincere.


There's an area of fuzz for me here that matters. I don't intellectually know how to navigate it.

A much more blatant example is with choosing a language. Right now I'm in Mexico. Often I'll talk to the person behind the counter in Spanish. Why? Because they'll understand me better. If they don't speak English, it's sort of pointless to try to communicate in English.

This is totally shaping my behavior to impact the other person.

But it's… different. It's really different. I can tell the difference intuitively. I just don't know what the difference really is.

I notice that your example absolutely hits my sense of "Oh, no, this is invoking the Goodhart thing." It seems innocent enough… but where my eyes drift to is: Why do you have to "make more of an effort now than [you] used to"? If I feel care for someone, and I notice that my sharing it lets them feel it more readily, and that strikes me as good, then I don't have to put in effort. It just happens, the way drinking from the cup in my hand just happens when I'm thirsty.

I would interpret that effort as maintaining behavior in the face of not having taken the truth all the way into your body. Something like… you understand that people need to hear your appreciation in order to feel your care, but you haven't grokked it yet. You can still manipulate your own behavior without grokking, but it really is self-manipulation based on a mental idea of how you need to behave in order to achieve some imagined goal.

(I want to acknowledge that I'm reading a lot into a short statement. If I've totally misread you here, please take this as a fictional example. I don't mean any of this as a critique of your behavior or choices.)

I'd like to extend your example a bit to point out what I can see going wrong here.

Suppose a fictional version of you in fact doesn't care about these others and is only interested in how he benefits from others' actions. And maybe he recognizes that his "appreciation", if nakedly seen, would cause these people to (correctly!) feel dehumanized. This fictional you would therefore need to control his signals and make his appreciation come across as genuine in order to get the results he wants.

If he could, he might even want to convince himself of his sincerity so that his signal hacking is even harder to detect.

(I think of that as "Newcomblike self-deception".)

The fact that fictional you could be operating like this means that hacking your own signal is itself a subtle meta-signal that you might be this fictional version of you. The default thing people seem to try to do to get around this is to distract people with the volume of the signal. ("Oh, wow! This is sooo amazing! Thank you so, so, SO much!") This is the "feeding psychopaths" thing I mentioned.

If you happen to never notice and fear this, and the people you're expressing appreciation for never pick up on this, then you accidentally end up in a happy equilibrium.

(…although I think people pick up on this stuff pretty automatically and just try to be numb to it. Most people seem to be manipulating their signals at one another all the time, which sometimes requires signaling that they're not noticing what the other is signaling.)

It's just very unstable. All it takes is one misstep somewhere. One flicker of worry. And if it happens to hit someone where they're emotionally sensitive… KABLOOEY! Signaling arms race.

Whereas if you put your attention on grokking the thing and then letting people have whatever impression of you they're going to have, you end up in an immensely stable equilibrium. Your appreciation becomes transparent because you are transparent and you in fact appreciate them.

(…with a caveat here around the analog of learning Spanish. Which, again, I can feel but don't understand yet.)


So I guess the big question is, which things do you stop trying to do?

I agree. That's the big question. I don't know. But I like you bringing it up explicitly.

Signaling isn't about signaling, it's about Goodhart

I don't think you're "dropping all effort" to signal, you're rather getting good at signaling, by actually being truthful and information-focused.

…which is much more likely to fail if I think of it like this while doing it.

I agree with what I think you're saying. I think there's been a definitional sliding here. When I say "Drop all effort to signal", I'm describing the experience on the inside. I think you're saying that from the outside, signaling is still happening, and the benefits of "dropping all effort to signal" can be understood in signaling terms.

I agree with that.

I'm just suggesting that in practice, the experience on the inside is of turning attention away from signals and entirely toward plain and simple attention to what is.


I don't think we can go so far as to say they're equivalent, just that signaling is yet another domain subject to Goodhart's law.

I agree. I didn't mean to imply otherwise.

(I imagine this is a reaction to the title? That was tongue-in-cheek. I said so, though maybe you missed it. It was meant to artistically gesture at the thesis in an entertaining way rather than as a truth statement accurately summarizing the point.)

What are sane reasons that Covid data is treated as reliable?

Has personal testimony from our own social groups become the best we can do?

Sadly yes, at least on my side.

I think your questions are very sane. Sadly I'm not the person to do this kind of data collection. The way some people have the opposite of a green thumb when it comes to plants, I have something like that for putting together numerically focused models. As soon as I move away from geometry or contact with physical reality, errors like 2+3=6 dominate and my models' output becomes gobbledegook. I was astoundingly good at geometry and utter garbage at algebra in math grad school.

I think most of the people I'm referring to were pointed at VAERS. This was from months ago, buried in old Facebook threads, so it'd take quite a bit of digging to find and I'm not sure I could. So this is based on a fuzzy impression of seeing that acronym in that context. But I do recall many of them were given a hotline number to call if they got side effects, and in calling the number they got the "Well, the vaccines are safe, so these must be from something else" line.

Without an explicit probability calculation, how exactly are we supposed to determine what the levels of side effects in reality are, vs what the medical data that has been collected and reported suggests, vs what the average person thinks is true?

Yep. This has been part of my problem. I'm living in a sea of vastly deeper uncertainty than the people around me seem to think they're in. I'm hoping to do slightly better than either of "No one knows anything and anyone who claims otherwise is deluded" or "My tribe is right." I've just been having a lot of trouble finding that alternative.

(…and this discussion is helping.)

What are sane reasons that Covid data is treated as reliable?

How many people do you know? What rate are we talking, re: "many people"?

I don't think I can give very useful data here. I can give some rough numbers, but they aren't going to be very informative. I stopped bothering to listen to or look for reports of people's vaccine side effects getting rejected after roughly ten, because I was starting to notice something like overfitting going on in my head.

The important (to me) part was that there were multiple such cases, very distributed, which meant there's some kind of bureaucratic mechanism in place (as opposed to one grumpy bureaucrat somewhere). I knew I couldn't see it, and I observed that no one seemed to be talking about it (except the disgruntled vaccine-injured folk who were feeling swayed by the conspiracy theorists). That made the confidence with which people asserted "The vaccines are safe & effective" look like mindless propaganda repetition to me, even if it accidentally happened to be correct.

I was hoping for an update on that here. I've gotten quite a few others. Sadly on this one I'm not seeing much in the way of hope for clarification just yet.

What are sane reasons that Covid data is treated as reliable?

Yep, that does seem reasonable.

Several of the people I talked to or indirectly listened in on said they'd been given a number to call if they got any side effects. Then when they got side effects and called, they were given the "The vaccine is safe so this must be something else" line.

Clearly that's not everyone's experience. But since I don't know the structure these people encountered in almost any detail, my net emotional update was "Fuck this 'data'."
