Narrative Syncing

"Narrative syncing" took a moment to click for me, but when it did, it brought up connotations that I don't see in the examples alone. Personally, the words that first came to mind were "Presupposing into existence", and then after getting a better idea of which facet of this you were intending to convey, "Coordination through presupposition".

While it obviously can be problematic in the ways you describe, I wouldn't view it as "a bad thing" or "a thing to be minimized". It's like.. well, telling someone what to do can be "bossy" and "controlling", and maybe as a society we think we see too much of this failure mode, but sometimes commands really are called for and so too little willingness to command "Take cover!" when necessary can be just as bad.

Before getting into what I see as the proper role of this form of communication, I think it's worth pointing out something relevant about the impression I got when meeting you forever ago, which I'd expect others get as well, and would be expected to lead to this kind of difficulty and this kind of explanation of the difficulty.

It's a little hard to put into words, and not at all a bad thing, but it's this sort of paradoxically "intimidating in reverse" thing. It's this sort of "I care what you think. I will listen and update my models based on what you say" aura that provokes anxieties of "Wait a minute, my status isn't that high here. This doesn't make sense, and I'm afraid if I don't denounce the status elevation I might fall less gracefully soon" -- though without the verbal explanation, of course. But then, when you look at it, it's *not* that you were holding other people above you, and there are no signals of "I will *believe* what you say" or "I see you as claiming relevant authority here", just a lack of "threatened projection of rejection". Like, there was going to be no "That's dumb. You're dumb for thinking that", and no passive aggression in "Hm. Okay.", just an honest attempt to take things for what they appear to be worth. It's unusually respectful, and therefore jarring when people aren't used to being given the opportunity to take that kind of responsibility.

I think this is a good thing, but if you lack an awareness of how it clashes with expectations people are likely to have, it can be harder to notice and preempt the issues that can come up when people get too intimidated by what you're asking of them, which they are likely to flinch from. Your proposed fix addresses part of this because you're at least saying the "We expect you to think for yourself" part explicitly rather than presupposing it on them, but there are a couple pieces missing. One is that it doesn't acknowledge the "scariness" of being expected to come up with one's own perspectives and offer them to be criticized by very intelligent people who have thought about the subject matter more than you have. Your phrasing downplays it a bit ("no vetted-by-the-group answer to this" is almost like "no right answer here") and that can help, but I suspect that it ends up sweeping some of the intimidation under the rug rather than integrating it.

The other bit is that it doesn't really address the conceptual possibility that "You should go study ML" is actually the right answer here. This needs a little unpacking, I think.

Respect, including self respect or lack thereof, is a big part of how we reason collectively. When someone makes an explicit argument (or otherwise makes a bid for pointing our attention in a certain direction), we cannot default to always engaging and trying to fully evaluate the argument on the object level. Before even beginning to do that, we have to decide whether or not and to what extent their claim is worth engaging with, and we do that based on a sense of how likely it is that this person's thoughts will prove useful to engage with. "Respect" is a pretty good term for that valuation, and it is incredibly useful for communicating across inferential distances. It's always necessary to *some* degree (or else discussions go the way political arguments go even about trivial things), and excess amounts let you bridge much larger distances usefully because things don't have to be supported immediately relative to a vastly different perspective. When the homeless guy starts talking about the multiverse, you don't think quite so hard about whether it could be true as you would if it were a respected physics professor saying the same things. When someone you can tell sees things you miss tells you that you're in danger and to follow their instructions if you want to live, it can be viscerally unnerving, and you might find yourself motivated to follow precautions you don't understand -- and it might very well be the right thing to do.

Returning to Alec, he's coming to *you*. Anna freakin' Salamon. He's asking you "What should I do? Tell me what I should do, because *I don't know* what I should do". In response one, you're missing his presupposition that he belongs in a "follower" role, as relates to this question, and elevating to "peer" someone who doesn't feel up to the job, without acknowledging his concerns or addressing them.

In response two, you're accepting the role and feeling uneasy about it, presumably because you intuitively feel like that leadership role is appropriate there, regardless of whether you've put it to words.

In response three, you lead yourself out of a leadership role. This is nice because it actually addresses the issue somewhat, and is a potentially valid use of leadership, but open to unintentional abuse of the same type that your unease with the second response warns of.


Returning to "narrative syncing", I don't see it so much as "syncing", as that implies a sort of symmetry that doesn't exist. It's not "I'm over here, where are you? How do we best meet up?". It's "We're meeting up *here*. This is where you will be, or you won't be part of the group". It's a decision coming from someone who has the authority to decide.

So when's that a good thing?

Well, put simply, when it's coming from someone who actually has the authority to decide, and when the decision is a good one. Is the statement *true?*

"We don't do that here" might be questionable. Do people there really not do it, or do you just frown at them when they do? Do you actually *know* that people will continue to meet your expectations of them, or is there a little discord that you're "shoulding" at them? Is that a good rule in the first place?

It's worth noticing that we do this all the time without noticing anything weird about it. What else is "My birthday party is this Saturday!", if not syncing narratives around a decision that is stated as fact? But it's *true*, so what's the problem? Or internally, "You know, I *will* go to that party!". They're both decisions and predictions simultaneously, because that's how decisions fundamentally work. As long as it's an actual prediction and not a "shoulding", it doesn't suddenly become dishonest if the person predicting has some choice in the matter. Nor is there anything wrong with exercising choice in good directions.

So as applied to things like "What should I do for AI risk?", where the person is to some degree asking to be coordinated, and telling you that they want your belief or your community's belief because they don't trust themselves to be able to do better, do you have something worth coordinating them toward? Are you sure you don't, given how strongly they believe they need the direction, and how much longer you've been thinking about this?

An answer which denies neither possibility might look like..

"ML. Computer science in general. AI safety orgs. Those are the legible options that most of us currently guess to be best for most, but there's dissent and no one really knows. If you don't know what else to do, start with computer science while working to develop your own inside views about what the right path is, and ditch my advice the moment you don't believe it to be right for you. There's plenty of room for new answers here, and finding them might be one of the more valuable things you could contribute, if you think you have some ideas".

Preregistration: Air Conditioner Test

I don’t have the equipment on hand to easily measure power consumption.


It's pretty easy to get that data if you want it: $14 on Amazon.


Godshatter Versus Legibility: A Fundamentally Different Approach To AI Alignment

It's worth noting (and the video acknowledges) that "Maybe it's more like raising a child than putting a slave to work" is a very, very different statement than "You just have to raise it like a kid".

In particular, there is no "just" about raising a kid to have good values -- especially when the kid isn't biologically yours and quickly grows to be more intelligent than you are.

What if "friendly/unfriendly" GAI isn't a thing?

It's possible that it "wouldn't use all its potential power" in the same sense that a high-IQ neurotic mess of a person wouldn't use all of their potential power either, if they're too poorly aligned internally to get out of bed and get things done. And while still not harmless, crazy people aren't as scary as coherently ruthless people optimized for doing harm.

But "People aren't ruthless" isn't true in any meaningful sense. If you're an ant colony, and the humans pave over you to make a house, the fact that they aren't completely coherent in their optimization for future states over feelings doesn't change the fact that their successful optimization for having a house where your colony was has destroyed everything you care about.

People generally aren't in positions of so much power over other people that reality stops suggesting that being ruthful will help them with their goals. When they do perceive that to be the case, you see an awful lot of ruthless behavior. Whether the guy in power is completely ruthless is much less important than whether you have enough threat of power to keep him feeling ruthful towards your existence and values.

When you start positing superintelligence, and it gets smart enough that it actually can take over the world regardless of what stupid humans want, that becomes a real problem to grapple with. So it makes sense that it gets a lot of attention, and we'd have to figure it out even if it were just a massively IQ and internal-coherence boosted human.

With respect to the "smart troubled person, dumb therapist" thing, I think you have some very fundamental misconceptions about human aims and therapy. It's by no means trivial to explain in a tangent of a LW comment, but "if the person knew how to feel better in the future, they would just do that" is simply untrue. We do "optimize for feelings" in a sense, but not that one. People choose their unhappiness and their suffering because the alternative is subjectively worse (as a trivial example, would you take a pill that made you blissfully happy for the rest of your life if it came at the cost of happily watching your loved ones get tortured to death?). In the course of doing "therapy like stuff", sometimes you have to make this explicit so that they can reconsider their choice. I had one client, for example, who I led to the realization that his suffering was a result of his unthinking refusal to give up hope on a (seemingly) impossible goal. Once he could see that this was his choice, he did in fact choose to suffer less and give up on that goal. However, that was because the goal was actually impossible to achieve, and there's no way in hell he'd have given up and chosen happiness if it were at all possible for him to succeed in his hopes.

It's possible for "dumb therapists" to play a useful role, but either those "dumb" therapists are still wiser than the hyperintelligent fool, or else it's the smart one leading the whole show.

Ukraine Post #9: Again

As far as I can tell, the reasoning is that things that help Trump hurt America, so Putin should help Trump? I mean, fair, but a little on the nose and saying the quiet part out loud even for him.


It's obviously that Trump's America is great, and Biden's America is bad. So Putin should spank Biden for making America bad again, so that Trump can help Make America Great Again (Again).

What if "friendly/unfriendly" GAI isn't a thing?

I don't think your conclusions follow.

Humans get neurotic and goodhart on feelings, so would you say "either it's not really GAI, or it's not really friendly/unfriendly" about humans? We seem pretty general, and if you give a human a gun either they use it to shoot you and take your stuff or they don't. 

Similarly, with respect to "They might still be able to "negotiate" "win-win" accommodations by nudging the AI to different local optima of its "feelings" GAN", that's analogous to smart people going to dumb therapists. In my experience, helping people sort out their feelings pretty much requires having thought through the landscape better than they have; otherwise the person "trying to help" just gets sucked into the same troubled framing or has to disengage. That doesn't mean there isn't some room for lower-IQ people to be able to help higher-IQ people, but it does mean that this only really exists until the higher-IQ person has done some competent self-reflection. Not something I'd want to rely on.

If we're using humans as a model, there are two different kinds of "unfriendliness" to worry about. The normal one we worry about is when people do violent things which aren't helpful to the individual, like school shootings or unabombering. The other one is what we do to colonies of ants when they're living where we want to build a house. Humans generally don't get powerful enough for the latter to be much of a concern (except in local ways that are really part of the former failure mode), but anything superintelligent would. That gets us right back to thinking about what the hell a human even wants when unconstrained, and how to reliably ensure that things end up aligned once external constraints are ineffective.

Dr Fauci as Machiavellian Boddhisattva

Thanks for the feedback. To be clear, I didn't mean that I inferred that you took it that way, just that after I finished writing I realized I was doing the "pretty critical of people for doing very normal things" thing, and that it often comes off that way if I'm not careful to credibly disclaim that interpretation.

Dr Fauci as Machiavellian Boddhisattva

Right, it sounds like you mostly get what I'm saying. 

I'd quibble that "the people their coalition is forcing to wear masks" are the anti-maskers (since pro-maskers are being nice and obedient, and therefore aren't being "forced"). It's pretty easy to slip into contempt for people not respecting your well-deserved authoritah, so that even when they start doing it you think "About fucking time!" and judge them for not doing it earlier or more enthusiastically, instead of showing gratitude for the fact that they're moving in the right direction. I know I've been guilty of it in the past.

I don't mean to imply that the people behind the ads are to be seen as shitty people, or in this light alone, and I think in the course of describing this perspective which I viewed as needing to be conveyed I may have failed to make that clear. I do actually agree with your take on what they see themselves as doing, and that it's not entirely illegitimate. 

I responded to my own comment trying to lay out better what I meant exactly by "alignment failure" and how "they're not (meta) trying to be hostile" and "they're trying to humiliate and degrade" aren't actually mutually exclusive.

Dr Fauci as Machiavellian Boddhisattva

After hitting "submit" I realized that "alignment failure" is upstream of this divergence of analyses.

By "alignment failure", I mean "the thing they are optimizing for isn't aligned with the thing they claim to be optimizing for". It's a bit "agnostic" on the cause of this, because the cause isn't so clearly separable into "evil vs incompetent". Alignment failure happens by default, and it takes active work to avoid.

Goodharting is an example. Maybe you think "Well, COVID kills people, so we want people to not get COVID, so... let's fine people for positive COVID tests!". Okay, sure, that might work if you have mandatory testing. If you have voluntary testing though, that just incentivizes people to not get tested, which will probably make things worse. At this point, someone could complain that you're aiming to make COVID *look* like it's not a problem, not actually aiming to solve the problem. They will be right in that this is the direction your interventions are pointing, *even if you didn't mean to and don't like it*. In order to actually help keep people healthy and COVID free, you have to keep your eyes on the prize and adjust your aim point as necessary. In order to aim at aiming to keep people healthy and COVID free, you have to keep your eyes on the prize of alignment, and act to correct things when you see that your method of aiming is no longer keeping convergence.

When it comes to things like pro-mask advertisements, it's oversimplifying to say "It's an honest mistake" and it's *also* oversimplifying to say "They WANT to exercise power, not save lives" (hopefully). The question is where, *exactly*, the alignment between stated goals and effects breaks. And the way to tell is to try different interventions and see what happens.

What happens if you say "All I got from your ad was 'eat shit'! Go to hell you evil condescending jerk!"? Do they look genuinely surprised and say "Shoot, I'm so sorry. I definitely care about your opinion and I have no idea how I came off that way. Can you please explain so that I can see where I went wrong and make it more clear that my respect for your opinion and autonomy is genuine?"?

Do they think "Hm. This person seems to think that I'm condescending to him, and I don't want them to think that, yet I notice that I'm not surprised. Is it true? Do I have to check my inner alignment to the goal of saving lives, and maybe humble myself somewhat?"

What if you state the case more politely? What if you go out of your way to explain it in a way that makes it easy for them to continue to see themselves as good people, while also making it unmistakable that remaining a "good person who cares about saving lives" requires running ads which don't leak contempt? Do they change the ad, mind how they're coming off and how they're feeling more closely, and thank you for helping them out? Or do they try making up nonsense to justify things before finally admitting "Okay, I don't actually care about people; I just like being a jerk"?

My own answer is that the contempt is likely real. It's likely something they aren't very aware of, but that they likely would be if they were motivated to find these things. It's likely that they are not so virtuous and committed to alignment to their stated goals of being a good person that you can rudely shove this in their face and have them fix their mistakes. If you play the part of someone being stomped on, and cast them as a stomper, they will play into the role you've cast them in while dismissing the idea that they're doing it. How evil!

However, it's also overwhelmingly likely that if you sit down with them and see them for where they're at, and explain things in a way that makes it feel okay to be who they are and shows them *how* they can be more of who they want to see themselves as being, they'll choose to better align themselves and be grateful for the help. If you play the part of someone who recognizes their good intent and who recognizes that there are causal reasons which are beyond them for all of their failures, and cast them in the role of someone who is virtuous enough to choose good... they'll probably still choose to play the part you cast them in.

That's why it's not "Simple mistake, nothing to see here" and also not "They're doing it on purpose, those irredeemable bastards!". It's kinda "accidentally on purpose". You can't just point at what they did on purpose and expect them to change because they did in fact "do it on purpose" (in a sense). You *can*, however, point out the accident of how they allowed their purpose to become misaligned (if you know how to do so), and expect that to work.

Aligning ourselves (and others) with good takes active work, and active re-aiming, both of object-level goals and meta-goals of what we're aiming for. Framing things as either "innocent mistakes" or "purposeful actions of coherent agents" misses the important opportunity to realign and teach alignment.

Dr Fauci as Machiavellian Boddhisattva

It's not that when the people behind the ad sat down and asked "What are we trying to do?", they twirled their mustaches and said "I know! Let's degrade and humiliate!". It's about what bleeds through about their attitude when they "try to get people to wear masks", which they fail to catch.

For example, if a microwave salesman said "Microwaves are like women. Great in the kitchen!", you don't have to reject the idea that they're trying to sell microwaves to notice what their ad implies about their perspective on women. Maybe it's incompetence that they'd love to fix if anyone informs them about why it might not be the most universally non-offensive line to use, but it still shows something about how they view women.

However, if they use this line at a feminist convention, and they aren't paid on commission... and you don't quickly hear "Oops! Sorry, I fucked up!"... it starts to say something not just about their perspective on women, but also their ability and/or inclination to take into account the perspectives of their target audience. The more the context makes the offensiveness difficult to miss, the harder it becomes to believe that the person is trying oh so hard to be not offensive so that they can sell microwaves, and the more it starts to seem like provoking offense and failing to sell microwaves is something they're at least indifferent to, if not actively enjoying.

So when someone says "Masks are like opinions" and reminds you that opinions are like assholes (and stinky assholes at that, which the full saying specifies), right before encouraging you to have an opinion, it's pretty hard to hear that as expressing "I'd love to hear your opinion!". Do you really think that's the best way they can think to convey their heart-felt attitude of "Let's all expose our opinions to each other so that we can share their contents and take them in!"? Or do you notice that they went out of their way to point at "No one wants that shit, so keep it hidden behind multiple layers", and then didn't disclaim that interpretation, and infer that maybe the fact that this slipped past their filters signals that "We're not interested in your dissent" isn't actually something they're trying super hard to avoid signalling?

Keep in mind, this isn't some "orthogonal" failure mode that makes for a small deviation from an otherwise good ad -- the way "simple oversight" predicts. The people who aren't wearing masks have actively formed an opinion on the topic which contradicts the idea of wearing masks. The anti-mask sentiment is *explicitly* about giving the finger to an authority who they see as trying to condescend to them while sneering at them, and the ad that is "trying" to combat this literally associates their opinions with shit -- while portraying itself as supportive, no less. It is quite literally the exact wrong signal to send if you want to get people to wear masks, so as far as "simple oversights" go, it'd have to be an amazing one. However, it is dead nuts center of what "alignment failure of the type pointed at by anti-maskers" predicts.

"Masks = assholes" is just the wrong explanation for the valid observation that there's an "Eat shit" vibe coming through.
