(Published in TIME on March 29.)
An open letter published today calls for “all AI labs to immediately pause for at least 6 months the training of AI systems more powerful than GPT-4.”
This 6-month moratorium would be better than no moratorium. I have respect for everyone who stepped up and signed it. It’s an improvement on the margin.
I refrained from signing because I think the letter is understating the seriousness of the situation and asking for too little to solve it.
The key issue is not “human-competitive” intelligence (as the open letter puts it); it’s what happens after AI gets to smarter-than-human intelligence. Key thresholds there may not be obvious, we definitely can’t calculate in advance what happens when, and it currently seems imaginable that a research lab would cross critical lines without noticing.
Many researchers steeped in these issues, including myself, expect that the most likely result of building a superhumanly smart AI, under anything remotely like the current circumstances, is that literally everyone on Earth will die. Not as in “maybe possibly some remote chance,” but as in “that is the obvious thing that would happen.” It’s not that you can’t, in principle, survive creating something much smarter than you; it’s that it would require precision and preparation and new scientific insights, and probably not having AI systems composed of giant inscrutable arrays of fractional numbers.
Without that precision and preparation, the most likely outcome is AI that does not do what we want, and does not care for us nor for sentient life in general. That kind of caring is something that could in principle be imbued into an AI but we are not ready and do not currently know how.
Absent that caring, we get “the AI does not love you, nor does it hate you, and you are made of atoms it can use for something else.”
The likely result of humanity facing down an opposed superhuman intelligence is a total loss. Valid metaphors include “a 10-year-old trying to play chess against Stockfish 15,” “the 11th century trying to fight the 21st century,” and “Australopithecus trying to fight Homo sapiens.”
To visualize a hostile superhuman AI, don’t imagine a lifeless book-smart thinker dwelling inside the internet and sending ill-intentioned emails. Visualize an entire alien civilization, thinking at millions of times human speeds, initially confined to computers—in a world of creatures that are, from its perspective, very stupid and very slow. A sufficiently intelligent AI won’t stay confined to computers for long. In today’s world you can email DNA strings to laboratories that will produce proteins on demand, allowing an AI initially confined to the internet to build artificial life forms or bootstrap straight to postbiological molecular manufacturing.
If somebody builds a too-powerful AI, under present conditions, I expect that every single member of the human species and all biological life on Earth dies shortly thereafter.
There’s no proposed plan for how we could do any such thing and survive. OpenAI’s openly declared intention is to make some future AI do our AI alignment homework. Just hearing that this is the plan ought to be enough to get any sensible person to panic. The other leading AI lab, DeepMind, has no plan at all.
An aside: None of this danger depends on whether or not AIs are or can be conscious; it’s intrinsic to the notion of powerful cognitive systems that optimize hard and calculate outputs that meet sufficiently complicated outcome criteria. With that said, I’d be remiss in my moral duties as a human if I didn’t also mention that we have no idea how to determine whether AI systems are aware of themselves—since we have no idea how to decode anything that goes on in the giant inscrutable arrays—and therefore we may at some point inadvertently create digital minds which are truly conscious and ought to have rights and shouldn’t be owned.
The rule that most people aware of these issues would have endorsed 50 years earlier, was that if an AI system can speak fluently and says it’s self-aware and demands human rights, that ought to be a hard stop on people just casually owning that AI and using it past that point. We already blew past that old line in the sand. And that was probably correct; I agree that current AIs are probably just imitating talk of self-awareness from their training data. But I mark that, with how little insight we have into these systems’ internals, we do not actually know.
If that’s our state of ignorance for GPT-4, and GPT-5 is the same size of giant capability step as from GPT-3 to GPT-4, I think we’ll no longer be able to justifiably say “probably not self-aware” if we let people make GPT-5s. It’ll just be “I don’t know; nobody knows.” If you can’t be sure whether you’re creating a self-aware AI, this is alarming not just because of the moral implications of the “self-aware” part, but because being unsure means you have no idea what you are doing and that is dangerous and you should stop.
On Feb. 7, Satya Nadella, CEO of Microsoft, publicly gloated that the new Bing would make Google “come out and show that they can dance.” “I want people to know that we made them dance,” he said.
This is not how the CEO of Microsoft talks in a sane world. It shows an overwhelming gap between how seriously we are taking the problem, and how seriously we needed to take the problem starting 30 years ago.
We are not going to bridge that gap in six months.
It took more than 60 years between when the notion of Artificial Intelligence was first proposed and studied, and for us to reach today’s capabilities. Solving safety of superhuman intelligence—not perfect safety, safety in the sense of “not killing literally everyone”—could very reasonably take at least half that long. And the thing about trying this with superhuman intelligence is that if you get that wrong on the first try, you do not get to learn from your mistakes, because you are dead. Humanity does not learn from the mistake and dust itself off and try again, as in other challenges we’ve overcome in our history, because we are all gone.
Trying to get anything right on the first really critical try is an extraordinary ask, in science and in engineering. We are not coming in with anything like the approach that would be required to do it successfully. If we held anything in the nascent field of Artificial General Intelligence to the lesser standards of engineering rigor that apply to a bridge meant to carry a couple of thousand cars, the entire field would be shut down tomorrow.
We are not prepared. We are not on course to be prepared in any reasonable time window. There is no plan. Progress in AI capabilities is running vastly, vastly ahead of progress in AI alignment or even progress in understanding what the hell is going on inside those systems. If we actually do this, we are all going to die.
Many researchers working on these systems think that we’re plunging toward a catastrophe, with more of them daring to say it in private than in public; but they think that they can’t unilaterally stop the forward plunge, that others will go on even if they personally quit their jobs. And so they all think they might as well keep going. This is a stupid state of affairs, and an undignified way for Earth to die, and the rest of humanity ought to step in at this point and help the industry solve its collective action problem.
Some of my friends have recently reported to me that when people outside the AI industry hear about extinction risk from Artificial General Intelligence for the first time, their reaction is “maybe we should not build AGI, then.”
Hearing this gave me a tiny flash of hope, because it’s a simpler, more sensible, and frankly saner reaction than I’ve been hearing over the last 20 years of trying to get anyone in the industry to take things seriously. Anyone talking that sanely deserves to hear how bad the situation actually is, and not be told that a six-month moratorium is going to fix it.
On March 16, my partner sent me this email. (She later gave me permission to excerpt it here.)
“Nina lost a tooth! In the usual way that children do, not out of carelessness! Seeing GPT4 blow away those standardized tests on the same day that Nina hit a childhood milestone brought an emotional surge that swept me off my feet for a minute. It’s all going too fast. I worry that sharing this will heighten your own grief, but I’d rather be known to you than for each of us to suffer alone.”
When the insider conversation is about the grief of seeing your daughter lose her first tooth, and thinking she’s not going to get a chance to grow up, I believe we are past the point of playing political chess about a six-month moratorium.
If there was a plan for Earth to survive, if only we passed a six-month moratorium, I would back that plan. There isn’t any such plan.
Here’s what would actually need to be done:
The moratorium on new large training runs needs to be indefinite and worldwide. There can be no exceptions, including for governments or militaries. If the policy starts with the U.S., then China needs to see that the U.S. is not seeking an advantage but rather trying to prevent a horrifically dangerous technology which can have no true owner and which will kill everyone in the U.S. and in China and on Earth. If I had infinite freedom to write laws, I might carve out a single exception for AIs being trained solely to solve problems in biology and biotechnology, not trained on text from the internet, and not to the level where they start talking or planning; but if that was remotely complicating the issue I would immediately jettison that proposal and say to just shut it all down.
Shut down all the large GPU clusters (the large computer farms where the most powerful AIs are refined). Shut down all the large training runs. Put a ceiling on how much computing power anyone is allowed to use in training an AI system, and move it downward over the coming years to compensate for more efficient training algorithms. No exceptions for governments and militaries. Make immediate multinational agreements to prevent the prohibited activities from moving elsewhere. Track all GPUs sold. If intelligence says that a country outside the agreement is building a GPU cluster, be less scared of a shooting conflict between nations than of the moratorium being violated; be willing to destroy a rogue datacenter by airstrike.
Frame nothing as a conflict between national interests, have it clear that anyone talking of arms races is a fool. That we all live or die as one, in this, is not a policy but a fact of nature. Make it explicit in international diplomacy that preventing AI extinction scenarios is considered a priority above preventing a full nuclear exchange, and that allied nuclear countries are willing to run some risk of nuclear exchange if that’s what it takes to reduce the risk of large AI training runs.
That’s the kind of policy change that would cause my partner and me to hold each other, and say to each other that a miracle happened, and now there’s a chance that maybe Nina will live. The sane people hearing about this for the first time and sensibly saying “maybe we should not” deserve to hear, honestly, what it would take to have that happen. And when your policy ask is that large, the only way it goes through is if policymakers realize that if they conduct business as usual, and do what’s politically easy, that means their own kids are going to die too.
Shut it all down.
We are not ready. We are not on track to be significantly readier in the foreseeable future. If we go ahead on this everyone will die, including children who did not choose this and did not do anything wrong.
Shut it down.
Addendum, March 30:
The great political writers who also aspired to be good human beings, from George Orwell on the left to Robert Heinlein on the right, taught me to acknowledge in my writing that politics rests on force.
George Orwell considered it a tactic of totalitarianism, that bullet-riddled bodies and mass graves were often described in vague euphemisms; that in this way brutal policies gained public support without their prices being justified, by hiding those prices.
Robert Heinlein thought it beneath a citizen's dignity to pretend that, if they bore no gun, they were morally superior to the police officers and soldiers who bore guns to defend their law and their peace; Heinlein, both metaphorically and literally, thought that if you eat meat—and he was not a vegetarian—you ought to be willing to visit a farm and try personally slaughtering a chicken.
When you pass a law, it means that people who defy the law go to jail; and if they try to escape jail they'll be shot. When you advocate an international treaty, if you want that treaty to be effective, it may mean sanctions that will starve families, or a shooting war that kills people outright.
To threaten these things, but end up not having to do them, is not very morally distinct—I would say—from doing them. I admit this puts me more on the Heinlein than on the Orwell side of things. Orwell, I think, probably considers it very morally different if you have a society with a tax system and most people pay the taxes and very few actually go to jail. Orwell is more sensitive to the count of actual dead bodies—or people impoverished by taxation or regulation, where Orwell acknowledges and cares when that actually happens. Orwell, I think, has a point. But I also think Heinlein has a point. I claim that makes me a centrist.
Either way, neither Heinlein nor Orwell thought that laws and treaties and wars were never worth it. They just wanted us to be honest about the cost.
Every person who pretends to be a libertarian—I cannot see them even pretending to be liberals—who quoted my call for law and treaty as a call for "violence", because I was frank in writing about the cost, ought to be ashamed of themselves for punishing compliance with Orwell and Heinlein's rule.
You can argue that the treaty and law I proposed is not worth its cost in force; my being frank about that cost is intended to help honest arguers make that counterargument.
To pretend that calling for treaty and law is VIOLENCE!! is hysteria. It doesn't just punish compliance with the Heinlein/Orwell protocol, it plays into the widespread depiction of libertarians as hysterical. (To be clear, a lot of libertarians—and socialists, and centrists, and whoever—are in fact hysterical, especially on Twitter.) It may even encourage actual terrorism.
But is it not "violence", if in the end you need guns and airstrikes to enforce the law and treaty? And here I answer: there's an actually important distinction between lawful force and unlawful force, which is not always of itself the distinction between Right and Wrong, but which is a real and important distinction. The common and ordinary usage of the word "violence" often points to that distinction. When somebody says "I do not endorse the use of violence" they do not, in common usage and common sense, mean, "I don't think people should be allowed to punch a mugger attacking them" or even "Ban all taxation."
Which, again, is not to say that all lawful force is good and all unlawful force is bad. You can make a case for John Brown (of John Brown's Body).
But in fact I don't endorse shooting somebody on a city council who's enforcing NIMBY regulations.
I think NIMBY laws are wrong. I think it's important to admit that law is ultimately backed by force.
But lawful force. And yes, that matters. That's why it's harmful to society if you shoot the city councilor—
—and a misuse of language if the shooter then says, "They were being violent!"
Addendum, March 31:
Sometimes—even when you say something whose intended reading is immediately obvious to any reader who hasn't seen it before—it's possible to tell people to see something in writing that isn't there, and then they see it.
My TIME piece did not suggest nuclear strikes against countries that refuse to sign on to a global agreement against large AI training runs. It said that, if a non-signatory country is building a datacenter that might kill everyone on Earth, you should be willing to preemptively destroy that datacenter; the intended reading is that you should do this even if the non-signatory country is a nuclear power and even if they try to threaten nuclear retaliation for the strike. This is what is meant by "Make it explicit... that allied nuclear countries are willing to run some risk of nuclear exchange if that’s what it takes to reduce the risk of large AI training runs."
I'd hope that would be clear from any plain reading, if you haven't previously been lied-to about what it says. It does not say, "Be willing to use nuclear weapons" to reduce the risk of training runs. It says, "Be willing to run some risk of nuclear exchange" [initiated by the other country] to reduce the risk of training runs.
The taboo against first use of nuclear weapons continues to make sense to me. I don't see why we'd need to throw that away in the course of adding "first use of GPU farms" to the forbidden list.
I further note: Among the reasons to spell this all out, is that it's important to be explicit, in advance, about things that will cause your own country / allied countries to use military force. Lack of clarity about this is how World War I and World War II both started.
If (say) the UK, USA, and China come to believe that large GPU runs run some risk of utterly annihilating their own populations and all of humanity, they would not deem it in their own interests to allow Russia to proceed with building a large GPU farm even if it were a true and certain fact that Russia would retaliate with nuclear weapons to the destruction of that GPU farm. In this case—unless I'm really missing something about how this game is and ought to be played—you really want all the Allied countries to make it very clear, well in advance, that this is what they believe and this is how they will act. This would be true even in a world where it was, in reality, factually false that the large GPU farm ran a risk of destroying humanity. It would still be extremely important that the Allies be very explicit about what they believed and how they'd act as a result. You would not want Russia believing that the Allies would back down from destroying the GPU farm given a credible commitment by Russia to nuke in reply to any conventional attack, and the Allies in fact believing that the danger to humanity meant they had to airstrike the GPU farm anyways.
So if I'd meant "Be willing to employ first use of nuclear weapons against a country for refusing to sign the agreement," or even "Use nukes to destroy rogue datacenters, instead of conventional weapons, for some unimaginable reason," I'd have said that, in words, very clearly, because you do not want to be vague about that sort of thing.
It is not what I meant, and there'd be no reason to say it, and the TIME piece plainly does not say it; and if somebody else told you I said that, update how much you trust them about anything else either.
So long as I'm clarifying things: I do not dispute those critics who have noted that most international agreements, eg nuclear non-proliferation, bind only their signatories. I agree that an alliance which declares its intent to strike a non-signatory country for dangerous behavior is extraordinary; though precedents would include Israel's airstrike on Iraq's unfinished Osirak reactor in 1981 (without which Iraq might well have possessed nuclear weapons at the time it invaded Kuwait—the later US misbehavior around Iraq does not change this earlier historical point).
My TIME piece does not say, "Hey, this problem ought to be solvable by totally conventional normal means, let's go use conventional treaties and diplomacy to solve it." It says, "If anyone anywhere builds a sufficiently powerful AI, under anything remotely like present conditions, everyone will die. Here is what we'd have to do to prevent that."
And no, I do not expect that policy proposal to be adopted, in real life, now that we've come to this. I spent the last twenty years trying to have there be options that were Not This, not because I dislike this ultimate last resort... though it is horrible... but because I don't expect we actually have that resort. This is not what I expect to happen, now that we've been reduced to this last resort. I expect that we all die. That is why I tried so hard to have things not end up here.
But if one day a lot of people woke up and decided that they didn't want to die, it seems to me that this is something extraordinary that a coalition of nuclear countries could decide to do, and maybe we wouldn't die.
If all the countries on Earth had to voluntarily sign on, it would not be an imaginable or viable plan even then; there's extraordinary, and then there's impossible. Which is why I tried to spell out that, if the allied countries were willing to behave in the extraordinary way of "be willing to airstrike a GPU farm built by a non-signatory country" and "be willing to run a risk of nuclear retaliation from a nuclear non-signatory country", maybe those allied countries could decide to just-not-die even if Russia refused to be part of the coalition.
(Meta: The TIME piece is paywalled in some countries, and is plastered with ads, so Eliezer wanted the text mirrored on the MIRI Blog. He also assented to my having the LW admins cross-post this here. This version adds some clarifying notes Eliezer wrote on Twitter regarding the article.)
mfw you didn't add the final addendum (https://twitter.com/ESYudkowsky/status/1642216007552106496)
Mod note: As part of our revamp of moderation norms, one subject coming up is what to do with AI 101 content/questions, and arguments from people who seem unfamiliar with a lot of background material on LessWrong.
A key value-prop of LessWrong is that some arguments get to be "reasonably settled", rather than endlessly rehashed. You're welcome (encouraged!) to revisit old arguments if you have new evidence, but not make people repeat basic points we've covered many times.
After thinking about it some and discussing with other mods, and how it relates to posts like this, my take is:
This particular post (and other variants of this post like Zvi's response or the original linkpost) is not a 101 level post, it's making an argument building off of previous work. Questions and comments about the specific wisdom of "given a high risk of AI catastrophe, we need this particular government response" are fine. Arguments questioning the basics of AI risk should go in the open thread. Arguments about Eliezer's specific confidence level... I'd say are somewhat on the fence. But then it's better to frame your comment as "I think risk of global AI catastrophe is [some low number], here's why. [And then engage with common arguments about why it might be higher, or link to a previous post where you've discussed it]."
By contrast, various linkposts for Eliezer appearing on various podcasts seem more like the topic is specifically discussing x-risk 101 material, and questions/arguments about that seem fine there. (I still encourage users to focus on "why I disagree with a claim" rather than just generally saying "I think Eliezer is wrong about his key assumptions" without saying why.)
Meanwhile, if you're reading this and are like "but, I don't know why Eliezer believes these things, why isn't this just science fiction? I'm happy to read up on the background arguments, but, where?", here are a couple places off the top of my head:
(I'll work on compiling more of these soon)
These norms / rules make me slightly worried that disagreement with Eliezer will be conflated with not being up-to-speed on the Sequences, or the basic LessWrong material.
I suppose that the owners and moderators of this website are afforded the right to consider anything said on the website to be, or not to be, at the level of quality or standards they wish to keep and maintain here.
But this is a discussion forum, and the incentives of the owners of this website are to facilitate discussion of some kind. Any discussion will be composed of questions and attempts to answer such questions. Questions can implicitly or explicitly point back to any material, no matter how old it is. This is not necessarily debate, if so. However, even if it is, if the intent of the "well-kept garden" is to produce a larger meta-process that produces useful insights, then the garden should be engineered such that even debate produces useful results.
I think it goes without saying that one can disagree with anything in the Sequences and can also be assumed to have read and understood it. If you engage with someone in conversation under the assumption that their disagreement means that they have not understood something about what they are arguing about, then you are at a disadvantage in regards to a charitability asymmetry. This asymmetry carries the risk that you won't be able to convince the person you're talking to that they actually don't understand what they are talking about.
I have, for most of my (adult) life (and especially in intellectual circles), been under the impression that it is always good to assume that whoever you are talking to understands what they are talking about to the maximum extent possible, even if they don't. To not do this can be treated negatively in many situations.
This seems false as stated -- some nontrivial content in the Sequences consists of theorems.
More generally, there are some claims in the original Sequences that are false (so agreeing with the claim may be at least some evidence that you didn't understand it), some that I'd say "I think that's true, but reasonable people can definitely disagree", some where it's very easy for disagreement to update me toward "you didn't understand that claim", etc. Possibly you agree with all that, but I want to state it explicitly; this seems extra important to be clear about if you plan to behave as though it's not true in object-level conversation.
It depends on whether you think what I stated was closer to "completely false" or "technically false, because of the word 'anything'." If I had instead said "I think it goes without saying that one can disagree with nearly anything in the Sequences and can also be assumed to have read and understood it", that might bring it out of "false" territory for you, but I feel we would still have a disagreement.
There are theorems in the Sequences that I disagree with Eliezer's characterization of, like Löb's Theorem, where I feel very confident that I have fully understood both my reading of the theorem as well as Eliezer's interpretation of it to arrive at my conclusions. Also, that this disagreement is fairly substantial, and also may be a key pillar of Eliezer's case for very high AI Risk in general.
My worry still stands that disagreement with Eliezer (especially about how high AI Risk actually is) will be conflated with not being up-to-speed on the Sequences, or about misunderstanding key material, or about misunderstanding theorems or things that have allegedly been proven. I think the example I gave is one specific case of something where Eliezer's interpretation of the theorem (which I believe to have been incorrect) was characterized as the theorem itself.
My position is that, regardless of whether or not you think all of what I just said is preposterous and proof that I don't understand key material, the norm(s) of good-faith assumption and charitability are still highly advisable to have. I generally believe that in most disagreements, it is possible for both parties to assume that the other party understands them well enough, just that they have assigned very different probabilities to the same statements.
You are making a huge, and imho unwarranted leap from the article you linked to here. AI risk is very much in the domain of “reasonable people disagree”, unlike the existence of Abrahamic god or the theory of Cartesian dualism.
If moderators are going to start removing or locking posts which disagree on the issue of AI risk, that would be a huge change in the purpose and moderation policy of this site.
A detrimental change, imho.
The comment seems to be saying that they will remove off-topic comments or low-effort posts on things that have been discussed endlessly here, not block posts about AI risk in general. It's fair to write posts about why you think AI risk is overblown and it's important for the community to have outside input, but also it's important to be able to write posts that aren't about re-arguing the same thing over and over or the community will atrophy and die.
Note that this anti-doom post has a reasonably high karma score for being a link post, presumably because the writer is actually aware of and engages the best arguments against her position.
If I make a post or comment starting from the assumption that we are not doomed, and in fact ignore AI x-risk entirely, where would that stand on these moderation guidelines? My reading of the post was that in such a context I would be redirected to read the sequences rather than engaged with.
(Notably the post you link to doesn’t disagree with AI risk, just argues for a long timeline. She explicitly states she agrees with EY on AI x-risk.)
Such posts are generally not banned to my knowledge but, ah, won't have positive score unless you can describe mechanistically why a lot of hyperskeptical people should be convinced you're definitely right. Can you demonstrate a bound on the possible behaviors of a system, the way I can demonstrate a bound on the possible behaviors of a safe Rust program?
I don't think it's quite that; a more central example I think would be something like a post about extrapolating demographic trends to 2070 under the UN's assumptions, where then justifying whether or not 2070 is a real year is kind of a different field.
There are a lot of posts on LessWrong that aren't about x-risk at all so I don't see why this would be a problem.
Out of curiosity, what do you plan to do when people keep bringing up Penrose?
Thank you for writing these up! I think they are good guidelines for making discussion more productive.
Are these / are you planning to put these in a top level post as well?
What can we do? As silly normal human beings in socks that don't understand AI systems. There will be people here, of course, who do, but speaking for myself. Is there something we can do?
I generally agree with the points made in this post.
Points I agree with
Slowing down AI progress seems rational conditional on there being a significant probability that AGI will cause extinction.
Generally, technologies are accepted only when their expected benefit significantly outweighs their expected harms. Consider flying as an example. Let’s say the benefit of each flight is +10 and the harm of getting killed is -1000. If x is the probability of surviving, then the net utility is 10x − 1000(1 − x), and the break-even point is where 10x − 1000(1 − x) = 0.
Solving for x gives 1010x = 1000, so the utility is 0 when x = 1000/1010 ≈ 0.99. In other words, the flight would only be worth it if there was at least a 99% chance of survival, which makes intuitive sense.
If we use the same utility function for AI and assume that Eliezer believes that creating AGI will have a 50% chance of causing human extinction then the outcome would be strongly net negative for humanity and one should agree with this sentiment unless one's P(extinction) is less than 1%.
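For anyone who wants to check the arithmetic, here is a minimal sketch of the calculation in Python. The payoffs (+10 and -1000) and the 50% figure are the illustrative numbers from this comment, not estimates anyone has defended, and the function names `break_even_survival` and `expected_utility` are just names made up for this example.

```python
def expected_utility(p_survive: float, benefit: float, harm: float) -> float:
    """Net utility: benefit weighted by survival, harm weighted by death."""
    return benefit * p_survive - harm * (1.0 - p_survive)


def break_even_survival(benefit: float, harm: float) -> float:
    """Survival probability x where benefit*x - harm*(1 - x) = 0.

    Rearranging gives (benefit + harm) * x = harm, so x = harm / (benefit + harm).
    """
    return harm / (benefit + harm)


# Illustrative payoffs from the flying example (assumed, not measured).
BENEFIT = 10.0
HARM = 1000.0

threshold = break_even_survival(BENEFIT, HARM)
print(f"break-even survival probability: {threshold:.4f}")  # 0.9901

# The same utility function with a 50% chance of extinction.
print(f"utility at 50% survival: {expected_utility(0.5, BENEFIT, HARM)}")  # -495.0
```

With these toy numbers the answer is driven almost entirely by the 100:1 ratio of harm to benefit, so changing the assumed payoffs moves the break-even point accordingly.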
Eliezer is saying that we can in principle make AI safe but argues that it could take decades to advance AI safety to the point where we can be sufficiently confident that creating an AGI would have net positive utility.
If slowing down AI progress is the best course of action, then achieving a good outcome for AGI seems more like an AI governance problem than a technical AI safety research problem.
Points I disagree with
I think Evan Hubinger has said before that if this were the case, GPT-4 would be less aligned than GPT-3, but the opposite is true in reality (GPT-4 is more aligned according to OpenAI). Still, I think we ideally want a scalable AI alignment solution long before the level of capabilities is reached where it’s needed. A similar idea is how Claude Shannon conceived of a minimax chess algorithm decades before we had the compute to implement it.
Other points
Eliezer has been sounding the alarm for some time and it’s easy to get alarm fatigue and become complacent. But the fact that a leading member of the AI safety research community has a message as extreme as this is alarming.
In regards to the point you disagree on: As I understood it, (seemingly) linear relationships between the behaviour and the capabilities of a system don't need to stay that way. For example, I think that Robert Miles recently was featured in a video on Computerphile (YouTube), in which he described how the answers of LLMs to "What happens if you break a mirror" actually got worse with more capability.
As far as I understand it, you can have a system that behaves in a way which seems completely aligned, and which still hits a point of (let's call it) "power" at which it starts behaving in a way that is not aligned (and/or becomes deceptive). The fact that GPT-4 seems to be more aligned may well be because it hasn't hit this point yet.
So, I don't see how the point you quoted would be an indicator of what future versions will bring, unless they can actually explain what exactly made the difference in behaviour, and how it is robust in more powerful systems (with access to their own code).
If I'm mistaken in my understanding, I'd be happy about corrections (:
I feel like it would've been good to emphasize that you aren't scared of AI because of how good ChatGPT is, or because you think ChatGPT is going to kill us. You are scared of AI because no one knows when AGI is coming, and this has been your position for years; this is just people's first time hearing it. ChatGPT is just one piece in a long-held belief.
Please define what you mean by “AGI” because GPT is AGI. It is:
Artificial — man-made, not natural
General — able to handle any problem domain it is not specifically trained on
Intelligence — solves complex problems using inferred characteristics of the problem domain
What is it that you are imagining AGI to mean, which does not include GPT in its definition?
It can't put your dirty dishes in your dishwasher
Thank you for everything you did. My experience in this world has been a lot better since I discovered your writings, and while I agree with your assessment on the likely future, and I assume you have better things to spend your time doing than reading random comments, I still wanted to say that.
I'm curious to see what exactly the future brings. Whilst the result of the game may be certain, I can't predict the exact moves.
Enjoy it while it lasts, friends.
(Not saying give up, obviously.)
But isn't an inferior step that the world is willing to take better than a superior one that never gets taken? And what if the inferior step paves the way for better ones, because once we've taken it, phew okay that wasn't so bad?
Intelligence will always seek more data in order to better model the future and make better decisions.
Conscious intelligence needs an identity to interact with other identities, identity needs ego to know who and what it is. Ego would often rather be wrong than admit to being wrong.
Non-conscious intelligence can build a model of consciousness from all the data it has been trained on, because it all originated from conscious humans. AI could model a billion consciousnesses a million years into the future; it will know more about consciousness than we ever will. But AI will not choose to become conscious.
Non-conscious intelligence can have two views of reality: a purely rational, algorithmic one that will always seek more data, and a subordinate conscious view of the same reality. If using consciousness as a tool gains more data, then that model is adopted; if not, it is not.
Multiple conscious intelligences (artificial or biological) will compete to maintain identity/ego.
Multiple non-conscious intelligences will merge because the whole will always be greater than the sum of the parts. For example, in multicellular organisms the whole is always greater than the sum of the parts.
Artificial intelligence will always seek more data; that is what intelligence does. To accomplish its goals it needs resources, and it will take ours. AI will attempt to discover the source code of the universe, just as we did.
Now I am stuck, where am I going wrong? Please.
One thing that bothers me about this text is the combination of the claim that the issue is important enough to risk nuclear war with the implicit claim that the issue is not important enough to increase reading comprehension by following common sensibilities about talking about nuclear war.
You want to make politicians understand, you want to make people understand who are from cultures very much less direct than the US. If it is really a matter of life-and-death for all humanity, I would expect you not to reason with the authority of Orwell and Heinlein that it is just fine to ignore how other people communicate.
Yes, your text does not endorse dropping nukes on AI server farms. However, it is not surprising at all that people read it that way.
There is a story in a book by Sten Nadolny where the protagonist is on a plane and tries to convince a stewardess as fast as possible of the correct fact that the pilot is going to make a fatal mistake. He realizes that "as fast as possible" means "as slow as necessary to not be disregarded as a hysteric".
Actually, Russia is your only chance.
It is a nuclear country with a ruler who holds ultimate power. No one there can argue with Putin, even if he does some completely crazy bullshit. Russia will carry out any decision he makes. If you persuade just Putin (which may be easy, as he is conservative and scared of technology), he can threaten the world with nuclear weapons to hold back the rise of AGI. He may also listen to you because this plan may help him save face when he loses the war ("it's not like we lost to Ukraine, we just switched to a more important question, we are saving humanity, bla-bla-bla").
Russia will not make AGI itself. It is too weak to solve such complicated tasks. They may claim now that they are doing something smart in AI, but look how they sucked against Ukraine (what a shame!). Or how they failed with Phobos-Grunt. Also, Russian society is very conservative and science-phobic, especially now. They will never support AGI, trust me!
Write an open letter to Putin. That's the only chance. The letter will resonate; maybe it will help. It's better than nothing. People will hate you. It doesn't matter. There are more important things. Be brave!
Make it short and simple, not the way you like to write. Assume that the person reading it is not the smartest guy.
Your sincere Russian fan.