Epistemic status: Fuzzy conjecture in a faintly mathematically flavored way. Clear intuitions about Gears and a conclusion, but nothing like a formal proof or even formal definitions. Anecdotes offered to clarify the intuition rather than as an attempt at data. Plenty of room for development and increased rigor if so desired.
Suppose that for whatever reason, you want to convince someone (let's call them "Bob") that they can trust you.
I'd like to sketch two different strategy types for doing this:
- You can try to figure out how Bob reads trust signals. Maybe you recognize that Bob is more likely to trust someone who brings a bottle of his favorite wine to the meeting because it signals thoughtfulness and attention. Maybe revealing something vulnerably helps Bob to relax. You're not really trying to deceive Bob per se here, but you recognize that in order for him to trust you, you need to put some energy into showing him that he can trust you.
- You make a point within yourself to be in fact worthy of Bob's trust. Then, without knowing how Bob will take it, you drop all attempts to signal anything about your trustworthiness or lack thereof. Instead you just let Bob come to whatever conclusion he's going to come to.
That second strategy might sound nuts.
Despite that, I claim it's actually almost strictly more effective.
If you see why, you probably have the bulk of my point.
I'll say a few more things to spell this out, together with some Gears I see and some implications.
A rephrasing of Goodhart's Law goes something like this:
The more explicit attention a signal gets, the more pressure there is to decouple it from what it's a signal of.
The mechanism is basically analogous to wireheading. If you get a reward for a signal happening, you're incentivized to find the cheapest way to make that signal happen.
Like when someone's trying to lose weight, so they make a point of weighing themselves first thing in the morning before drinking water and after using the toilet.
This might accidentally create some kind of standard baseline, but that isn't what's motivating the person to do this. They're trying to make the scale's numbers be lower.
Even weirder is when they stop drinking as much water because the scales reward them for that.
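As a rough illustration of that mechanism, here is a toy sketch in Python. Everything in it is an assumption made for the example: the `Action` type, the three actions, and all of the numbers are hypothetical, chosen only so that the cheapest way to move the scale is not the way that helps health.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    effort: float         # how costly the action feels (made-up units)
    scale_change: float   # change in the morning scale reading, in kg
    health_change: float  # change in actual health (made-up units)


# Hypothetical numbers, chosen only to illustrate the decoupling.
ACTIONS = [
    Action("exercise and eat well", effort=5.0, scale_change=-0.5, health_change=2.0),
    Action("weigh in before drinking water", effort=0.5, scale_change=-0.5, health_change=0.0),
    Action("drink less water all day", effort=1.0, scale_change=-1.5, health_change=-1.0),
]


def best_for_signal(actions):
    """Pick the action that lowers the scale reading most per unit of effort."""
    return max(actions, key=lambda a: -a.scale_change / a.effort)


def best_for_health(actions):
    """Pick the action that improves actual health most per unit of effort."""
    return max(actions, key=lambda a: a.health_change / a.effort)


if __name__ == "__main__":
    print("Rewarded by the scale:", best_for_signal(ACTIONS).name)
    print("Rewarded by health:   ", best_for_health(ACTIONS).name)
```

With these made-up numbers, the person who is rewarded by the scale ends up drinking less water, which lowers the reading while making their health worse. The person who is rewarded by health itself ends up exercising. The same scale, read under two different incentive structures, produces two different behaviors.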
An often missed corollary of Goodhart — and basically the center of what I want to point at here — is this:
If you want a signal to retain its integrity, minimize attention on the signal.
To be maybe just a little more formal, by "attention" I mean something like incentive structures.
For instance, maybe the person who's trying to lose weight wants to live longer. In which case, inner work they can put into viewing the scales at an emotional/intuitive level as a flawed window into their health (instead of as a signal to optimize for) will help to ameliorate Goodhart drift.
And in fact, if they don't do this, they'll start to do crazy things like drink too little water, losing track of the "why". They'll hurt their health for the sake of a signal of health.
This means that stable use of signals of what you care about requires that you not care about the signal itself.
What's required for this person to be able to use the scales, recognizing that the number relates to something they care about, but without caring about the number itself?
That's a prerequisite question to answer for sober use of that tool.
Back to Bob.
Suppose I'm trying to sell Bob a used car. This introduces the classic "lemons problem".
In strategy #1, where I try to signal as clearly as I can to Bob that the car is good, maybe I show him papers from the mechanic I had check out the car. I let him look under the hood. I try to connect with him to show him that I'm relatable and don't have anything to hide.
Of course, Bob knows I'm a used car salesman, so he's suspicious. Did the paper come from a trustworthy mechanic? Would he be able to notice the real problem with the car by looking under the hood? Maybe I'm just being friendly in order to get him to let his guard down. Etc.
So if I notice this kind of resistance in Bob, I have to find ways to overcome it. Maybe I reassure him that the mechanic has been in business for decades, and that he can call them at this number right here and now if he likes.
But I know that if Bob leaves the lot without buying the car, he probably won't come back. So in fact I do want Bob to buy the car right now. And, I tell myself, Bob is in fact looking for a car, and I know this one to be good! So it's a good deal for both of us if I can just convince him!
Bob of course picks up on this pressure and resists more. I try to hide it, knowing this, although Bob intuitively knows that both the pressure and the attempt to hide it are things that a sleazy used car salesman would do too.
The problem here is Goodhart: to the extent that signals have decoupled from what they're "supposed to" signal, Bob can't trust that the signals aren't being used to deceive.
But I have a weird incentive here to get him to trust the signals anyway.
Maybe I bias toward signals that (a) are harder for a dishonest version of me to send and (b) Bob can tell are harder for sleazy-me to send.
I just have to find those signals.
Here's strategy #2:
I know the car is good.
I look to Bob and say something like this:
"Hey. I know the car is good. I know you don't know that, and you don't know if you can trust me. Let me know what you need here to make a good decision. I'll see what I can do."
And I drop all effort to convince him.
(How? By the same magic inner move that the person aiming for weight loss, or rather health improvement, uses to drop caring about their scales' numbers. It's doable, I promise.)
If he has questions about the car, I can honestly just answer them based on whatever caused me to believe it's a good car.
This means that I and the car will incidentally offer immensely clear signals of the truth of the situation to Bob.
One result is that the signals that would be costly for sleazy-me to send appear much, much more effortlessly here.
They just happen, because the emphasis is on letting truth speak simply for itself.
In the standard culture of business, this is less effective at causing purchases. Maybe more energy put into digging out what inspires my customers to buy would cause them to get excited more reliably.
But focusing on whether the person buys the car puts me in a Goodhart-like situation. I start attending to the signals Bob needs, which is the same kind of attention that sleazy-me would put into those same signals.
I'm not trying to give business advice per se. I have reason to think this actually works better in the long run for business, but that's not a crux for me.
Much more interesting to me is the way that lots of salespeople are annoying. People know this.
How do you be a non-annoying salesperson?
By dropping the effort to signal.
This also has a nice coordination effect:
If there's an answer to the lemons problem between me and Bob, it'll be much, much easier to find. All signals will align with cooperation because we will in fact be cooperating.
And if there isn't a solution, we correctly conclude that much, much more quickly and effortlessly.
No signaling arms races needed.
In practice, signal hacking just can't keep up with this kind of honest transparency.
If I want my girlfriend's parents to think I'll be good to her… well, I can just drop all attempts to convince them one way or the other and just be honest. If I'm right, they'll conclude the truth if they're capable of it.
…or I could go with the usual thing of worrying about it, coming up with a plan about what I'm going to tell them, hoping it impresses them, maybe asking her about what will really impact them, etc.
Even if the planning approach works, it can't work as efficiently as dropping all effort to signal and just being honest does. When I'm simply honest, the signals automatically reflect reality. When I plan, I have to try to make the signals reflect the reality I want her parents to believe in, which I assume is the truth.
The real cost (or challenge rather) of the "drop signaling" method is that in order for me to do it, I have to be willing to let her parents conclude the worst. I have to prefer that outcome if it's the natural result of letting reality reflect the truth without my meddling hands distorting things.
And that might be because I'm actually bad for her, and they'll pick up on this.
Of course, maybe they're just pigheaded. But in that case I've just saved myself a ton of effort trying to convince them of something they were never going to believe anyway.
"But wait!" a thoughtful person might exclaim. "What if the default thing that happens from this approach isn't clear communication? What if because of others running manipulative strategies, you have to put some energy into signals in order for the truth to come out?"
Well, hypothetical thoughtful exclaimer, let me tell you:
I don't know.
…but I'm pretty sure this is an illusion.
This part is even fuzzier than the rest. So please bear with me here.
If I have to put effort into making you believe a signal over what directly reflects reality, then I'm encouraging you to make the same mistake that a manipulator would want you to make.
This means that even if this kind of move were necessary to get through someone's mental armor, on net it actually destabilizes the link between communication and grounded truth.
In a sense, I'm feeding psychopaths. I'm making their work easier.
Because of this, the person I'm talking to would be correct to trust my communication a little less just because of the method employed.
So on net, I think you end up quite a bit ahead if you let some of these communications fail instead of sacrificing pieces of your integrity to Goodhart's Demon.
The title is a tongue-in-cheek reference to the bit of Robin Hanson's memetic DNA that got into Less Wrong from the beginning:
"X isn't about X. X is about signaling."
I think this gives some wonderful insight into situations when examined from the outside.
I think it's often toxic and anti-helpful when used as an explicit method of navigating communication and coordination attempts. It usually introduces Goodhart drift.
Imagine I went to a used car sales lot and told the salesperson something like this:
"I'm interested in this car. I might buy it if you can convince me it's not a lemon even though I have reason not to trust you."
This seems very sensible on the surface. Maybe even honest and straightforward.
But now you've actually made it harder for the salesperson to drop focusing on signals. Most people have close to zero idea that focusing on signals creates Goodhart drift (other than in platitudes like "Just be yourself"). So now you're in a signaling-and-detection arms race where you're adversarially trying to sort out whether you two sincerely want to cooperate.
Compare with this:
"Hi! I'm interested in this car. Tell me about it?"
I think it's pretty easy to notice attempts to manipulate signals. If I were in this situation, I'd just keep sidestepping the signal manipulations and implicitly inviting (by example only!) the salesperson to meet me in clear honesty. If they can't or won't, then I'd probably decline to do business with them. I'd very likely be much more interested in living in this kind of clear integrity than I would be in the car!
(Or maybe I'd end up very confident that I can see the truth despite the salesperson's distortions, and feel willing to take the risk. But that would be in spite of the salesperson, and it sure wouldn't have been because I invited them into a signaling skirmish.)
This picture suggests that what others choose to signal just isn't any of your business.
If you focus on others' signals, you either Goodhart yourself or play into signaling arms races.
Far, far simpler and more reliable is just trusting reality to reflect truth. You just keep looking at reality.
This might sound abstract. For what it's worth, I think Jacob Falkovich might be saying the same thing in his sequence on selfless dating. The trend where people optimize for "fuckability instead of fucking" and end up frustrated that they're not getting sex is an example of this. Goodhart drift engendered by focusing on the signals instead of on reality.
(My understanding of) Jacob's solution is also a specific example of the general case.
If you try to signal "Hey, I'm hot!" in the language you think will be attractive to the kind of person you think will be attracted to that signal…
…well, the sort of person you'll draw is the one who needs you to put effort into that kind of signal.
(Here I'm assuming for simplicity that the goal is a long-term relationship.)
So now, every ounce of energy you put into sending that signal falls into one of two buckets:
- It reflects reality, meaning you effortlessly would send that signal just by being transparently yourself. So the energy put into sending the signal is simply wasted and possibly anti-helpful (since it encourages you to mask the truth a little).
- It's a bit off from reality, meaning you have to keep hiding the parts of you that don't match what your new partner thinks of you. (In practice this is rarely sustainable.)
So the solution is…
…drop all effort to signal!
Yes, you might end up not attracting anyone. But if so, that is a correct reflection of you relative to the dating market. To do better you'd have to trick a potential partner (and possibly yourself).
Of course, maybe you'd rather be in a relationship made of signaling illusions than be alone.
That's up to you.
I'm just pointing out a principle.
What exactly does it mean to "drop all effort to signal"?
Honestly, I'm not sure.
I have a very clear intuition of it. I can feel it. I can notice cases where it happens and where it's not happening, and I can often mentally transform one into the other. I know a bunch of the inner work needed to do it.
But I don't know how to define it.
Hence the epistemic status of "fuzzy conjecture".
My hope is that this brings some thoughtfulness to discourse about "social signaling" and "social status" and all that. I keep seeing Goodhart drift in those areas due to missing this vision. Hopefully this will bring a little more awareness to those corners of discussion.
It's also something I'm working on embodying. This ties clearly to how much care and thoughtfulness goes into communication: "Oh dear, what will people think of this?" That seems like it can be helpful for making communication clearer — but it also acts as bait for Goodhart's Demon.
I don't know how to resolve that just yet.
I hope I will soon.