Scott Alexander and Daniel Kokotajlo's article making the rational case for "why it's OK to talk about misaligned AI"
aka
"painting dark scenarios may increase the chance of them coming true but the benefits outweigh this possibility"
the original blog post:
https://blog.ai-futures.org/p/against-misalignment-as-self-fulfilling
the video I made about that article:
Great article,
it asks important questions, but not the MOST important:
what if you dedicate huge amounts of compute to superhuman hacking?
The gap between the best group of human hackers and the best group of human hackers using giant swarms of cybersecurity-finetuned agents will widen.
This will inevitably drive more and more AI adoption in cyberdefense and cyberoffense.
How much does the best group of human hackers need to "trust" that swarm of 10k superhuman hackers?
I define the Singularity and AI takeover as the point where the answer to this latter question is "a lot".
Malign AI agent substitution
This is so Covid.
Let's make misaligned AIs and have a bunch of people use them to see if they can do the misaligned thing.
So that we can develop a vaccine before it actually escapes.
Make it go viral.
I'd love to play the wargame in Munich with our local LW community.
Do you have a link to the rules?
PS: huge fan, love the AI 2027 website, keep being a force for good
In a world where mechinterp is not 100% reliable, the answer is logically: input/output is what matters.
We won't be able to read the thoughts anyway, so why base our judgment on them?
But see my comment on why survival fitness in cyberspace is the one axis where most of the relevant input/output will be generated.
What it says: irrelevant
How it thinks: irrelevant
It has always been about what it can do in the real world.
If it can generate substantial amounts of money and buy server capacity, or
hack into computer systems,
then we get cyberlife: autonomous, rogue, self-sufficient AI, subject to Darwinian forces on the internet that select for more of those qualities, improving its online fitness, all the way to a full-blown takeover.
What do you mean by corrigibility?
Also, what do you mean by "alignment win"?
*probably.
Maybe it'll start looking for people who are pre-aligned.
Religion is also a useful single word, one that carries the most meaning per bit for a normie. Maybe just enough to make them take it seriously. I believe there is something about it worth taking seriously.
I super agree. I also think the value is in debating the models of intelligence explosion.
Which is why I made my website: ai-2028.com or intexp.xyz