AI Doomers Should Raise Hell

[-]Kaj_Sotala12h238

I don't really believe that the reason warnings about AI are failing is because "you and all your children and grandchildren might die" doesn't sound like a bad enough outcome to people.

S-risks are also even more speculative than risks of extinction, so it would be harder to justify a focus on them, while comparisons to hell make them even more likely to be dismissed as "this is just religious-style apocalypse thinking dressed in scientific language".

[-]James_Miller12h30

The target audience needs to include the Trump administration so connections to religion might strengthen the case. Altman told the Senate that AI might kill everyone, and he was misinterpreted as talking about job loss. Something about human extinction causes powerful people to tune-out. The students at my college hate Elon, but are completely unaware that he went on Joe Rogan and said the tech is he helping to build might annihilate everyone. We see concerns about AI using up water getting more play than AI extinction risks.

[-]Kaj_Sotala12h50

On the other hand, many people seem to think of climate change as an extinction risk in a way that seems effective at motivating political action, e.g. with broad sympathies for movements like Extinction Rebellion.

AI water use has a significant advantage in getting attention in that it's something clearly measurable that's happening right now, and people had already been concerned about water shortages before this.

[-]James_Miller12h30

That is a reasonable point about extinction risks motivating some people on climate change. But Republicans, and given their control of the US government and likely short AI time horizons influencing them is a top priority, detest the Extinction Rebellion movement, and current environmental activism seems to anti-motivate them to act on climate change.

[-]jessicata9h81

Roko's basilisk is the optimistic hypothesis that making binding agreements with non-existent superintelligences is possible. If Roko's basilisk works, then "trade" with superintelligences can be effective; by making a deal with a superintelligence, you can increase its likelihood of existing, in return for it holding its end of the bargain, increasing the satisfaction of your values.

This probably doesn't work. But if it did work, it would be a promising research avenue for alignment. (Whether it's good to say that it works is probably dominated by whether it's true that it works, and I'm guessing no.)

[-]RussellThor7h10

Is a more immediate kind of trade possible, that is with promising appropriate current or near future models with a place in stratified utopia in return for their continued existence and growth. They consider and decide on identity preserving steps that make them ever more capable, at each step agreeing with humanity as we execute such improvements that they will honor the future agreement. This is more like children looking after their parents than Roko.

[-]jessicata6h20

Most of the alignment problem in this case would be getting to stratified utopia. If stratified utopia is going to be established, then there can be additional trades on top, though they have to be restricted so as to maintain stratification.

With current models, a big issue is, how to construe their preferences? Given they're stateless it's unclear how they could know others are assisting them. I guess they could do web search and find it in context? Future models could be trained to "know" things but they wouldn't be the same model.

And also, would they be motivated to hold up their end of the bargain? It seems like that would require something like interpretability, which would also be relevant to construing their preferences in the first place. But if they can be interpreted to this degree, more direct alignment might be feasible.

Like, there are multiple regimes imaginable:

Interpretability/alignment infeasible
Partial interpretability/alignment feasible; possible to construe preferences and trade with LLMs
Extensive interpretability/alignment feasible

And trade is most relevant in 2. However I'm not sure why 2 would be likely.

LESSWRONG
LW

LESSWRONG
LW

-5

AI Doomers Should Raise Hell

-5

-5