A basic primer on why AI might lead to human extinction, and why solving the problem is difficult. Scott Alexander walks readers through a series of questions, with evidence drawn from recent progress in machine learning.
With the recent proposals about moratoriums and regulation, should we also start thinking about a strike by AI researchers and developers?
The reasoning I imagine goes as follows. AI capability is now growing really fast, toward levels that will strongly affect the world, while AI safety lags behind. (A minute ago I used a ChatGPT jailbreak to get instructions for torturing a pregnant woman; that's the market leader's performance for you.) And finally, I want to argue that working on AI capability while it is ahead of AI safety is "pushing the bus".
Here's the metaphor: a bunch of people, including you, are pushing a bus full of children toward a precipice, and you're paid for each step. In this situation, would you really say "oh I...
epistemic status: still a student, but quite sure of myself on this topic, and pretty sure that this misconception plausibly has a non-negligible impact on some debates
Hi,
Medical student here, I just wanted to shed some light on what I think is a common misconception.
Yes, the human brain contains about 86 billion neurons. But about 60 billion of them are in the cerebellum and have little to do with consciousness. Those neurons can plausibly be approximated as just filtering the noisy signal going from the rest of the brain (cortex, etc.) to the limbs.
A person without a cerebellum can be perfectly conscious.
But reducing the number of neurons by two thirds like that does not change the order of magnitude of the number of synapses, which are arguably a more important...
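A rough back-of-the-envelope, using ballpark figures I'm assuming rather than citing precisely: with roughly $1.6\times 10^{10}$ cortical neurons at something like $7\times 10^{3}$ synapses each,
$$1.6\times 10^{10} \;\times\; 7\times 10^{3} \;\approx\; 10^{14}\ \text{cortical synapses},$$
while the cerebellum, despite holding most of the neurons (largely granule cells with only a handful of inputs each), contributes on the order of $10^{13}$ synapses. So dropping the cerebellum removes roughly two thirds of the neurons, but the total synapse count stays around $10^{14}$ either way.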
How do I know you're conscious?
Exactly! You don't! And all this talk of who is born without which brain region and how they went through life gets us no closer at all to actually understanding which physical systems are not zombies.
I like watching videos of Eliezer talking and explaining things. So here is a list of videos I have discovered so far. I have not tried to make this list exhaustive. If you know of any more videos, please post them in the comments.
Eliezer Yudkowsky - Less Wrong Q&A Playlist
The main Eliezer Yudkowsky YouTube playlist, sorted by publication date (not quality).
If I remember correctly, the interview was the reason that I made this list in the first place 😀
Italy has become the first Western country to block advanced chatbot ChatGPT.
The Italian data-protection authority said there were privacy concerns relating to the model, which was created by US start-up OpenAI and is backed by Microsoft.
The regulator said it would ban and investigate OpenAI "with immediate effect".
Alternative article available here.
From what I understand, the reason has to do with GDPR, the EU's data protection law. It's pretty strict stuff, and it essentially says that you can't store people's data without their active permission, you can't store people's data without a demonstrable need (one that isn't just "I wanna sell it and make moniez off it"), you can't store people's data past the end of that need, and you always have to give people the right to delete their data whenever they wish.
Now, this puts ChatGPT in an awkward position. Suppose you have a conversation that includes...
[Written for a general audience. You can probably skip the first section. Interested in feedback/comment before publication on The Roots of Progress.]
Will AI kill us all?
That question is being debated seriously by many smart people at the moment. Following Charles Mann, I’ll call them the wizards and the prophets: the prophets think that the risk from AI is so great that we should actively slow or stop progress on it; the wizards disagree.
(If you are already very interested in this topic, you can skip this section.)
Some of my readers will be relieved that I am finally addressing AI risk. Others will think that an AI apocalypse is classic hysterical pessimist doomerism, and they will wonder why I am even dignifying it with a response,...
If some rogue AI were to plot against us, would it actually succeed on the first try? Even genius humans generally don’t succeed on the first try of everything they do. The prophets think that AI can deduce its way to victory—the same way they think they can deduce their way to predicting such outcomes.
I think a weaker thing: if a rogue AI plots against us and fails, this will not spur the relevant authorities to call for a general halt. Instead that bug will be 'patched', and AI development will continue until we create one that does successf...
New article in Time Ideas by Eliezer Yudkowsky.
Here’s some selected quotes.
In reference to the letter that just came out (discussion here):
...We are not going to bridge that gap in six months.
It took more than 60 years between when the notion of Artificial Intelligence was first proposed and studied, and for us to reach today’s capabilities. Solving safety of superhuman intelligence—not perfect safety, safety in the sense of “not killing literally everyone”—could very reasonably take at least half that long. And the thing about trying this with superhuman intelligence is that if you get that wrong on the first try, you do not get to learn from your mistakes, because you are dead. Humanity does not learn from the mistake and dust itself off and try again, as
I mean, the human doesn't have to know that they're creating a doomsday virus. The AI could be promising them a cure for their daughter's cancer, or something.
Prior to ChatGPT, I was slightly against talking to governments about AGI. I worried that attracting their interest would cause them to invest in the technology and shorten timelines.
However, given the reception of ChatGPT and the race it has kicked off, my position has completely changed. Talking to governments about AGI now seems like one of the best options we have to avert a potential catastrophe.
Most of all, I would like people to be preparing governments to respond quickly and decisively to AGI warning shots.
Eliezer Yudkowsky recently had a letter published in Time that I found inspiring: https://time.com/6266923/ai-eliezer-yudkowsky-open-letter-not-enough/. It contains an unprecedented international policy proposal:
...Shut down all the large GPU clusters (the large computer farms where the most powerful AIs are refined). Shut down all the
Produced as part of the SERI ML Alignment Theory Scholars Program - Winter 2022 Cohort.
Thanks to Jérémy Scheurer, Nicholas Dupuis, and Evan Hubinger for feedback and discussion.
When people talk about mesa-optimization, they sometimes say things like “we’re searching for the optimizer module” or “we’re doing interpretability to find out whether the network can do internal search”. An uncharitable interpretation of these claims is that the researchers expect the network to have something like an “optimization module” or “internal search algorithm” that is clearly different and distinguishable from the rest of the network (to be clear, we think it is fine to start with probably wrong mechanistic models).
In this post, we want to argue why we should not expect mesa-optimization to be modular or clearly different from the rest of the...
Thank you!
I also agree that toy models are better than nothing and that we should start with them, but I have moved away from "if we understand how toy models do optimization, we understand much more about how GPT-4 does optimization".
I have a bunch of project ideas on how small models do optimization. I even trained the networks already. I just haven't found the time to interpret them yet. I'm happy for someone to take over the project if they want to. I'm mainly looking for evidence against the outlined hypothesis, i.e. maybe small toy models actually do fair...
I am looking for some understanding of why this claim is made.
As far as I can tell, Löb's Theorem does not directly make such an assertion.
Reading the Cartoon Guide to Löb's Theorem, it appears that this assertion is made on the basis of the reasoning that Löb's Theorem itself can't prove negations, that is, statements such as "1 + 3 ≠ 5."
Alas, this means we can't prove PA sound with respect to any important class of statements.
This is a statement that [due to the presence of negations in it] itself can't be proven within PA.
Now it seems that it is being argued that the inability to do this is a bad thing [that is, being able to prove that we can't prove PA sound with respect to any 'important' class of statements].
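For reference, here is the formal statement I'm working from (my notation and my reading of the Cartoon Guide, so treat it as a sketch rather than gospel):
$$\text{If } PA \vdash \Box P \rightarrow P, \text{ then } PA \vdash P, \qquad \text{where } \Box P \text{ abbreviates } \mathrm{Prov}_{PA}(\ulcorner P \urcorner).$$
Taking $P$ to be a false arithmetic statement such as $1 + 3 = 5$: if PA could prove the soundness instance $\Box(1+3=5) \rightarrow (1+3=5)$, Löb's Theorem would force PA to prove $1+3=5$ itself. So PA proves $\Box P \rightarrow P$ only for those $P$ it already proves, which is how the Guide arrives at "we can't prove PA sound with respect to any important class of statements" (where the proving is being done inside PA).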
I think this is actually a very critical question and I have some ideas for what the central crux is here, but I'd be interested in seeing some answers before delving into that.
So what you're saying here is, let's say, "level 1 negative" which means, very roughly, things like: We can't formally define what truth is, our formal system must appeal to higher systems outside of it, we can't prove consistency, etc.
What the Sequences say are, let's say, "level 2 negative" which means verbatim what is stated in them, i.e., "a mathematical system cannot assert its own soundness without becoming inconsistent." This literally says that if a mathematical system tried to assert its own soundness, it would become inconsistent. This is worse t...
If it happens, it'll help shift policy by giving major ammo to those who say AI is dangerous enough to be regulated: "Look, many researchers aren't just making worried noises about safety but taking this major action."