This post was originally intended as a response to "One saving one's world".

I agree I probably misinterpreted some of the original piece, and hence took this post down a month back.

Ic's post with the exact same title motivated me to republish my post as-is, as I think it still has useful content. I've only deleted a section or two.

Epistemic status: one-sided; presents only the pros of flailing, not the cons.


 

In defence of demonstrating pain or fear

 - It is honest, if that is how you genuinely feel. Honesty is a virtue.* Being sufficiently expressive and outspoken about your emotions can also be a virtue, depending on the kind of person you are.

 - Not demonstrating your fear can be perceived as insincere. If you genuinely believe that the world is ending in 20 years, but are not visibly affected by this or considering extreme actions, people may be less likely to believe that you believe what you say you do. The average person evaluating your integrity may not have enough time interacting with you to think, "yes, this person is afraid, but they have considered all extreme actions and displays of emotion and decided they're net negative in current circumstances, so it is still possible they genuinely believe what they say." In other words, it is not enough for you alone to realise that your instincts are adaptive to ordinary threats and maladaptive to AI. Anyone who tries to believe you also needs to realise this; otherwise they will assume such instincts are adaptive to all scenarios, including this one, and evaluate you under that assumption.

 - Not demonstrating your fear could mean you don't feel fear because you are compartmentalising. It is possible you yourself have not fully integrated your beliefs across all the parts of your brain. This is especially true if you consider yourself a generally honest person, but for some reason find yourself uncomfortable boldly stating your claims or taking the actions you think are right.

 - The point about compartmentalising also applies to others evaluating whether you are compartmentalising. I am more likely to assume that someone who says "10% x-risk by year X" is compartmentalising than someone who says "10% probability all of us are going to die by year X".

 - Pain and fear spread virally, because there are evolutionary advantages to humans reacting to each other's fear and pain. This may or may not make it the optimal strategy for spreading a message, but it is a strategy. (This also means that flailing isn't a completely maladaptive evolutionary response, as there is a causal chain from drawing others in to reducing the risk, as shown below.)

 

 

Targeting the public, not just authorities

Memetically spreading knowledge of AI risk doesn't have to focus only on authorities. Reasons to target the general public include:

 - Finding more people who are convinced of the problem's importance and are willing to work on it or donate resources.

 - Higher salience in "idea space". The set of ideas that are salient and popular at any given time is finite, and keeping your idea in that set requires continual effort. AI risk has a decent chance of occupying such a spot, just as climate change does at present.

 - Getting your idea into this set means (all) authorities are forced to spend more time thinking about it. One simple reason is that we're human and like to empathise with what large groups of other humans believe, if possible. More strategic reasons include trying to win people over by understanding and empathising with their ideas, or by pushing competing ideas or actively attacking their current ones. Most elites need to rally the public to their side at some point or other, and this requires them to model what the public believes. All of these strategies force elites to think more about AI risk.

 - Public support can unlock policy options.

 - Public support can unlock options for coordination outside of the legal or democratic framework, such as protests or violent action. (I am not stating a position in favour of this, but it is a potential strategy that mass public support can unlock.) It can also let you build new institutions with democratic consent, even if you don't have the support of existing institutions yet.

 

 

 

Executing instinct versus being aware of consequences

It is generally better to realise why what you are doing might have good consequences than, as Rob Bensinger says, to just execute an instinct.*** You may always be able to get better outcomes if you start explicitly thinking and optimising rather than just executing the instinct.

But this applies to most instincts, and it doesn't mean the instinct is bad, because executing the instinct may still get you good outcomes.

 

*Side-note in defence of virtue:

One motivating reason to study decision theory is the insight that humans are capable of committing and reprogramming themselves into "the kind of person who always does X", and that doing this reprogramming may be optimal even if the action X itself is not optimal in every scenario. For instance, when X is "honesty", the game-theoretically rational thing to do in iterated prisoner's-dilemma-like games may be to be consistently honest in low-stakes scenarios to build trust, and defect for personal gain only in sufficiently high-stakes or one-shot scenarios. In practice, the average human is bad both at being honest and at being a convincing liar. Consequently, a good strategy for humans is to train yourself into "the kind of person who feels good about being honest". Such a person may not be able to defect in the high-stakes scenario, but at least they'll be honest in all the low-stakes ones, which on net lets them outperform most other humans.
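To make that claim a little more concrete, here is a minimal toy simulation (a sketch in Python; the payoffs, stakes distribution, and lie-detection probability are all invented for illustration rather than taken from any real model). It compares an agent who has committed to always being honest against an agent who lies in high-stakes rounds but, being a poor liar, is often caught, which destroys the trust that future cooperation depends on:

```python
import random

def simulate(strategy, rounds=100, high_stakes_prob=0.05,
             lie_detect_prob=0.7, seed=None):
    """Total payoff of one agent over a repeated trust game.

    strategy: "committed"     -> always honest
              "opportunistic" -> honest when stakes are low, lies when high
    All payoffs and probabilities here are made up for illustration.
    """
    rng = random.Random(seed)
    total = 0.0
    trusted = True
    for _ in range(rounds):
        if not trusted:                # partner no longer cooperates
            break
        stakes = 10.0 if rng.random() < high_stakes_prob else 1.0
        lying = strategy == "opportunistic" and stakes > 1.0
        if not lying:
            total += stakes            # honest cooperation pays the stakes
        elif rng.random() < lie_detect_prob:
            trusted = False            # caught lying: trust is destroyed
        else:
            total += 2 * stakes        # an undetected lie pays double
    return total

def average_payoff(strategy, trials=10_000):
    return sum(simulate(strategy, seed=i) for i in range(trials)) / trials

if __name__ == "__main__":
    print("committed to honesty:", round(average_payoff("committed"), 1))
    print("opportunistic liar  :", round(average_payoff("opportunistic"), 1))
```

With these made-up numbers the committed-honesty agent comes out well ahead on average, which is the point of the footnote: for agents who are bad at lying, a blanket commitment to honesty can beat selective defection. Lower the detection probability enough and the ordering flips, which is the usual caveat.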

 

***Side-note in weak defence of instincts

It is possible your instincts are sometimes smarter than your slower, more analytical thinking, though I guess this is less frequent (although still not very rare) for people who are used to explicitly enumerating consequences.

 

Net conclusion

On net, it isn't immediately obvious to me whether attempting unnatural strategies or demonstrating pain is net good as of today, or how much of either should be done. I have presented only one-sided arguments in favour of them above; I have not really presented the cons.

 

But I wanted to start a discussion by providing at least some causal chains from flailing to good outcomes.


14 comments

In defence of attempting unnatural or extreme strategies

As a response to Rob's post, this sounds to me like it's misunderstanding what he was pointing out. I don't think he was pointing to weird ideas — I'm pretty sure he knows very well we need new ideas to solve alignment. What he was pointing out was people who are so panicked that they start discussing literal crimes or massive governance interventions that are incredibly hard to predict.

I see. It wasn't obvious to me this was what he was referring to. Thanks for replying!

Upvoted because I think there should be more of a discussion around this than "Obviously getting normal people involved will only make things worse" (which seems kind of arrogant / assumes there are no good unknown unknowns).

Wait, are there people who explicitly state that getting normal people involved will make things worse?

Yes. I'm not convinced either way myself, but here are some arguments against:

  • If the USA regulates AGI, China will get it first, which seems worse, as there's less alignment activity in China (as for US-China coordination, lol, lmao)
  • Raising awareness of AGI Alignment also raises awareness of AGI. If we communicate the "AGI" part without the "Alignment" part we could speed up timelines
  • If there's a massive influx of funding/interest from people who aren't well informed, it could lead to "substitution hazards" like work on aligning weak models with methods that don't scale to the superintelligent case (In climate change people substitute "solve climate change" to "I'll reduce my own emissions" which is useless)
  • If we convince the public AGI is a threat, there could be widespread flailing (the bad kind) which reflects badly on Alignment researchers (e.g. if DeepMind researchers are receiving threats, their system 1 might generalize to "People worried about AGI are a doomsday cult and should be disregarded")

Most of these I've heard from reading conversations on EleutherAI's Discord; Connor is typically the most pessimistic, but some others are pessimistic too (Connor's talk discusses substitution hazards in more detail).

TLDR: It's hard to control the public once they're involved. Climate-change startups aren't getting public funding; the public is more interested in virtue-signaling. (In the climate case the public doesn't really make things worse, but for AGI it could be different.)

EDIT: I think I've presented the arguments badly, re-reading them I don't find them convincing. You should seek out someone who presents them better.


Fair. I can think of counterarguments to many of these.

And I agree there are many ways in which the public can misinterpret the message or fail to coordinate around the right kind of action. That by itself doesn't make the attempt net negative unless these other consequences are bad enough to outweigh the upside. For instance, as you say, it could convince the public to, on net, speed up research towards AGI - but my current model of the public doesn't make that feel very likely. (This point can be discussed further, I guess.)

I suspect that mass outreach is likely to be too slow to make a big difference and so may not be worth it given the possible downsides.

That said, I am in favour of addressing public misconceptions rather than just letting them stand.

Hmm.

I feel AI risk can be a mainstream topic of discussion in as little as 4 years, especially if we see 4 more years of progress at the current rate (DALL-E, PaLM, etc.).

I'm not totally sure how to convince someone else of this intuition, except to create a dataset of "accurate + important ideas" from the past and see how long it took for them to go mainstream. I don't think the track record of the public caring about important ideas is really that bad.

Note about formatting:

If you're in the Markdown editor, you can make footnotes instead of *, **, ***. Do that by writing [^1] in the text (example output: [1]) and then, at the bottom, putting [^1]: Your footnote here


  1. It doesn't have to be just numbers; it can also be a word like [^example] or [^defense_of_virtue]. The words are nice because they look good in a link.

If you genuinely believe that the world is ending in 20 years, but are not visibly affected by this or considering extreme actions, people may be less likely to believe that you believe what you say you do.

 

IMO, that's not the bottleneck. The bottleneck is people thinking you're insane, which composure mitigates.

It feels more to me like we're the quiet weird kid in high school who doesn't speak up or show emotion because we're afraid of getting judged or bullied. Which, fair enough, the school sort of is like that - just look at poor cryonics, or even nuclear power - but the road to popularity (let alone getting help with what's bugging us) isn't to minimize our expressions to 'proper' behavior while letting ourselves be characterized by embarrassing past incidents (e.g. Roko's Basilisk) if we're noticed at all.

It isn't easy to build social status, but right now we're trying next to nothing, and we've seen that it doesn't seem to do enough.

I see. Is the "insane" appearance because of your beliefs or how you present them?

Both, I'd think.