On saving one's world

Rob Bensinger

17th May 2022

If the world is likeliest to be saved by sober scholarship, then let us be sober scholars in the face of danger.

If the world is likeliest to be saved by playful intellectual exploration, then let us be playful in the face of danger.

Strategic, certainly; aware of our situation, of course; but let us not throw away the one mental mode that can actually save us, if that's in fact our situation.

If the world is likeliest to be saved by honest, trustworthy, and high-integrity groups, who by virtue of their trustworthiness can much more effectively collaborate and much more quickly share updates; then let us be trustworthy. What is the path to good outcomes otherwise?

CFAR has a notion of "flailing". Alone on a desert island, if you injure yourself, you're likelier to think fast about how to solve the problem. Whereas injuring yourself around friends, you're more likely to "flail": lean into things that demonstrate your pain/trouble to others.

To my eye, a lot of proposals that we set aside sober scholarship, or playful intellectual exploration, or ethical integrity, look like flailing. I don't see an argument that this setting-aside actually chains forward into good outcomes; it seems performative to me, like hoping that if our reaction "feels extreme" enough, some authority somewhere will take notice and come to the rescue.

Who is that authority?

If you have a coherent model of this, we can talk about it and figure out if that's really the best strategy for eliciting their aid.

But if no one comes to mind, consider the possibility that you're executing a social instinct that's adaptive to threats like tigers and broken legs, but maladaptive to threats like Unfriendly AI.

If you feel scared about something, I generally think it's good to be honest about that fact and discuss it soberly, rather than hiding it. I don't think this is incompatible with rigorous scholarship or intellectual play.

But I would clearly distinguish "being honest about your world-models and feelings, because honesty is legitimately a good idea" from "making it your main strategy to do whatever action sequence feels emotionally resonant with the problem".

An "extreme" key doesn't necessarily open an "extreme" lock. A dire-sounding key doesn't necessarily open a dire-feeling lock. A fearful or angry key doesn't necessarily open a lock that makes you want to express fear or anger.

Rather, the lock's exact physical properties determine which exact key (or set of keys) opens it, and we need to investigate the physical world in order to find the right key.

Tags: Existential risk, World Optimization, AI

4 comments

Michaël Trazzi (4y)

I found the concept of flailing and becoming what works useful.

I think the world will be saved by a diverse group of people. Some will be high-integrity groups, others will be playful intellectuals, but the most important ones (the ones I think we currently need the most) will lead, take risks, and explore new strategies.

In that regard, I believe we need more posts like lc's on a containment strategy, or the other one about pulling the fire alarm for AGI, even if those plans are different from the ones the community has tried so far. Integrity alone will not save the world. A more diverse portfolio might.

adamShimi (4y)

Thanks for this much needed post!

I agree wholeheartedly with you, and that's the mindset I have and try to spread when I speak to terrified and panicked people about doing alignment research.

Here are a couple of thoughts that I think might complement this post:

A big part of flailing in my model comes from having hope that someone will save you. As such, realizing that no one will save you is important in actually taking action and doing things. But there are ways of pushing this too far, notably by thinking that because no one will save you, no one is doing anything valuable or can help. One doesn't have to resolve the tension between "No one will save me" and "I can't do it all by myself" with "let's do it all by myself". Instead, you can identify what is needed that no one else seems able to contribute, and trust and motivate others to take on the other crucial and necessary tasks.

I want to point out a general pattern in reactions to extreme and dire problems: you become more greedy, in the sense of a greedy algorithm. You only want solutions that work now, or else you instantly go looking for something else. Yet the history of science and technology tells us that scientific progress and problem solving rarely happen by being right from the start, and far more often through a succession of productive mistakes. So I want to remind people that another option might be to make more productive mistakes faster, and to capitalize on them better and faster.

An "extreme" key doesn't necessarily open an "extreme" lock. A dire-sounding key doesn't necessarily open a dire-feeling lock. A fearful or angry key doesn't necessarily open a lock that makes you want to express fear or anger.
Rather, the lock's exact physical properties determine which exact key (or set of keys) opens it, and we need to investigate the physical world in order to find the right key.

I really like this, and will share this quote when I want a nice phrasing of this thought I keep having these days.

TeaTieAndHat (4y)

That concept of flailing in this context seems very interesting, and it makes me wonder about some things.

In particular, I find it helpful because it explains why I often feel very anxious about some things when other people are around to share the burden with me, to the point that I am practically unable to get anything done, while the same problems seem much simpler and less stressful when I am alone. But I don’t know whether people generally feel this way in ordinary situations, as I do, or whether it is because I have an overprotective family and exceedingly kind friends who spoil me :)

Assuming it is actually quite common, we could then say that a lot of problems are best solved by one person alone. And yet, it also makes much sense to say that problems are best solved by bringing together many different sources of insights and sources of information, and — for more obviously political problems — a lot of competing interests as well. In short, a lot of people who should be brought together.

So, first, there is probably something to say about how flailing is similar to political signaling. It would also be interesting to expand on the concept of flailing to think about how and when we solve problems individually vs. in groups, and in what kinds of groups: Do we flail more with people we are close to, making it easier to solve problems sensibly when discussing them with strangers? Will a culturally homogeneous group making a decision be subject to a lot of flailing or not? And so on. Basically, I am trying to think about how flailing can be seen as another way to describe at least some forms of political signaling in political decision-making, and it looks like an interesting way of describing it. But it also looks like I am not able to think clearly about it myself for the moment, which is an interesting opportunity to post a comment and see if anyone has interesting takes on these kinds of things :)

simonsimonsimon (4y)

The steelman version of flailing, I think, is being willing to throw a "hail mary" when you're about to lose anyway. If the expected outcome is already that you die, sometimes an action with naively negative value but fat tails can improve your position.

If different hail mary options are mutually exclusive, you definitely want to coordinate to pick the right one and execute it the best you can, but you also need to be willing to go for it at some point.
