eschaton-class threat

Virtue signaling is sometimes the best or the only metric we have

I think your trust-o-meter is looking for people who have an unusually low level of self-deception. The energy is "Great if you share my axioms or moral judgments, but for Pete's sake, at least be consistent with your own."

What suggests this to me is the Breaking Bad example, because in my read of his character Walter White really does move on a slow gradient from more to less self-deceived throughout the show. It just so happens that the less self-deceived he is, the more at home he becomes with perpetrating monstrous acts as a result of the previous history of monstrosity he is dealing with. It's real "the falcon cannot hear the falconer" energy.

This is probably a good trust-o-meter to keep equipped when it comes to dealing with nonhuman intelligences. Most people, in most ordinary lives, have good evolutionary reasons to maintain most of their self-deceptions - it probably makes them more effective in social contexts. Intelligences evolved outside of the long evolutionary history of a heavily social species may not have the same motive, or limitations, that we do.

Intuitions about solving hard problems

One thing I'm interested in, but don't know where to start looking for, is people working in the reverse direction: mathematical approaches which show that aligned AI is not possible or not likely. By this I mean formal work that suggests something like "almost all AGIs are unsafe", in the same way that the chance of picking a rational number at random from the reals is zero, because almost all real numbers are irrational.
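To make the analogy concrete, here is a rough sketch of the standard argument that the rationals have measure zero. The comment does not say which set the number is drawn from, so the uniform draw on [0, 1] is my assumption, as are the symbols (λ for Lebesgue measure, q_n for an enumeration of the rationals, X for the random draw).

```latex
% Assumption: X is drawn uniformly from [0,1]; \lambda is Lebesgue measure.
% Enumerate the rationals in [0,1] as q_1, q_2, q_3, \dots (they are countable).
% Fix any \varepsilon > 0 and cover each q_n by an interval I_n of length \varepsilon 2^{-n}.
\[
  \lambda\bigl(\mathbb{Q} \cap [0,1]\bigr)
  \;\le\; \sum_{n=1}^{\infty} \lambda(I_n)
  \;=\; \sum_{n=1}^{\infty} \frac{\varepsilon}{2^{n}}
  \;=\; \varepsilon .
\]
% Since \varepsilon was arbitrary, the measure is zero, and therefore
\[
  \Pr\bigl(X \in \mathbb{Q}\bigr) \;=\; \lambda\bigl(\mathbb{Q} \cap [0,1]\bigr) \;=\; 0 .
\]
```

An "almost all AGIs are unsafe" result would presumably need an analogous measure over some space of possible AGIs, and I don't know what the right one would be.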

I don't say this to be a downer! I mean it in the sense of a mathematician who spent 7 years attempting to prove X exists, and then sits down one day and spends 4 hours proving that X cannot exist. Progress can take surprising forms!

Why pessimism sounds smart

No known solutions can solve our hardest problems—that’s why they’re the hardest ones.

I like the energy, but I have to register a note of dissent here.

Quite a few of our hardest problems do have known solutions - it's just that those known solutions are, or appear, too hard to implement.

  • Brute force algorithms exist for almost everything we care about, up to and including AGI.
  • Overweight individuals know that if they eat less, they will eventually lose weight; it's just often frustratingly beyond them, for one reason or another. Same with alcoholics and drink.
  • Collective action problems have been studied in economics for decades, and by this point a lot of them have clever approaches we could use to at least partway ameliorate them.

'Appear' is important here. Ken Thompson's advice, "when in doubt, use brute force", is very good advice because so much resistance can and will crumble under sustained effort. But this requires a somewhat different kind of optimism from the one I think you're describing, which is the optimism of inventing a fundamentally new solution. It's the optimism to get back up after you fall down and run straight at the wall again until it crumbles.