Formerly Trevor1 (can be quickly and conveniently verified by looking at my post history)
Currently getting over a severe case of the Typical Mind Fallacy
I don't know what a Twitter List is, but I wouldn't be at all surprised to see it containing some kind of trap to steer the user into a news feed.
Social media/enforced addiction stuff is not only something that I avoid talking about publicly, but also something where I personally must avoid changing the probability of anyone blogging about it. I will get back to you on this once I've gone over more of your research, but what I was thinking of would have to be some kind of research contracting for Balsa that comes with notoriously difficult-to-hash-out assurances of not going public about specific domains of information.
I'm getting reports that Time Magazine's website is paywalled for some people (e.g. in certain states or countries). Here is the full text of the article:
An open letter published today calls for “all AI labs to immediately pause for at least 6 months the training of AI systems more powerful than GPT-4.”
This 6-month moratorium would be better than no moratorium. I have respect for everyone who stepped up and signed it. It’s an improvement on the margin.
I refrained from signing because I think the letter is understating the seriousness of the situation and asking for too little to solve it.
The key issue is not “human-competitive” intelligence (as the open letter puts it); it’s what happens after AI gets to smarter-than-human intelligence. Key thresholds there may not be obvious, we definitely can’t calculate in advance what happens when, and it currently seems imaginable that a research lab would cross critical lines without noticing.
Many researchers steeped in these issues, including myself, expect that the most likely result of building a superhumanly smart AI, under anything remotely like the current circumstances, is that literally everyone on Earth will die. Not as in “maybe possibly some remote chance,” but as in “that is the obvious thing that would happen.” It’s not that you can’t, in principle, survive creating something much smarter than you; it’s that it would require precision and preparation and new scientific insights, and probably not having AI systems composed of giant inscrutable arrays of fractional numbers.
Without that precision and preparation, the most likely outcome is AI that does not do what we want, and does not care for us nor for sentient life in general. That kind of caring is something that could in principle be imbued into an AI but we are not ready and do not currently know how.
Absent that caring, we get “the AI does not love you, nor does it hate you, and you are made of atoms it can use for something else.”
The likely result of humanity facing down an opposed superhuman intelligence is a total loss. Valid metaphors include “a 10-year-old trying to play chess against Stockfish 15”, “the 11th century trying to fight the 21st century,” and “Australopithecus trying to fight Homo sapiens”.
To visualize a hostile superhuman AI, don’t imagine a lifeless book-smart thinker dwelling inside the internet and sending ill-intentioned emails. Visualize an entire alien civilization, thinking at millions of times human speeds, initially confined to computers—in a world of creatures that are, from its perspective, very stupid and very slow. A sufficiently intelligent AI won’t stay confined to computers for long. In today’s world you can email DNA strings to laboratories that will produce proteins on demand, allowing an AI initially confined to the internet to build artificial life forms or bootstrap straight to postbiological molecular manufacturing.
If somebody builds a too-powerful AI, under present conditions, I expect that every single member of the human species and all biological life on Earth dies shortly thereafter.
There’s no proposed plan for how we could do any such thing and survive. OpenAI’s openly declared intention is to make some future AI do our AI alignment homework. Just hearing that this is the plan ought to be enough to get any sensible person to panic. The other leading AI lab, DeepMind, has no plan at all.
An aside: None of this danger depends on whether or not AIs are or can be conscious; it’s intrinsic to the notion of powerful cognitive systems that optimize hard and calculate outputs that meet sufficiently complicated outcome criteria. With that said, I’d be remiss in my moral duties as a human if I didn’t also mention that we have no idea how to determine whether AI systems are aware of themselves—since we have no idea how to decode anything that goes on in the giant inscrutable arrays—and therefore we may at some point inadvertently create digital minds which are truly conscious and ought to have rights and shouldn’t be owned.
The rule that most people aware of these issues would have endorsed 50 years earlier, was that if an AI system can speak fluently and says it’s self-aware and demands human rights, that ought to be a hard stop on people just casually owning that AI and using it past that point. We already blew past that old line in the sand. And that was probably correct; I agree that current AIs are probably just imitating talk of self-awareness from their training data. But I mark that, with how little insight we have into these systems’ internals, we do not actually know.
If that’s our state of ignorance for GPT-4, and GPT-5 is the same size of giant capability step as from GPT-3 to GPT-4, I think we’ll no longer be able to justifiably say “probably not self-aware” if we let people make GPT-5s. It’ll just be “I don’t know; nobody knows.” If you can’t be sure whether you’re creating a self-aware AI, this is alarming not just because of the moral implications of the “self-aware” part, but because being unsure means you have no idea what you are doing and that is dangerous and you should stop.
On Feb. 7, Satya Nadella, CEO of Microsoft, publicly gloated that the new Bing would make Google “come out and show that they can dance.” “I want people to know that we made them dance,” he said.
This is not how the CEO of Microsoft talks in a sane world. It shows an overwhelming gap between how seriously we are taking the problem, and how seriously we needed to take the problem starting 30 years ago.
We are not going to bridge that gap in six months.
It took more than 60 years between when the notion of Artificial Intelligence was first proposed and studied, and for us to reach today’s capabilities. Solving safety of superhuman intelligence—not perfect safety, safety in the sense of “not killing literally everyone”—could very reasonably take at least half that long. And the thing about trying this with superhuman intelligence is that if you get that wrong on the first try, you do not get to learn from your mistakes, because you are dead. Humanity does not learn from the mistake and dust itself off and try again, as in other challenges we’ve overcome in our history, because we are all gone.
Trying to get anything right on the first really critical try is an extraordinary ask, in science and in engineering. We are not coming in with anything like the approach that would be required to do it successfully. If we held anything in the nascent field of Artificial General Intelligence to the lesser standards of engineering rigor that apply to a bridge meant to carry a couple of thousand cars, the entire field would be shut down tomorrow.
We are not prepared. We are not on course to be prepared in any reasonable time window. There is no plan. Progress in AI capabilities is running vastly, vastly ahead of progress in AI alignment or even progress in understanding what the hell is going on inside those systems. If we actually do this, we are all going to die.
Many researchers working on these systems think that we’re plunging toward a catastrophe, with more of them daring to say it in private than in public; but they think that they can’t unilaterally stop the forward plunge, that others will go on even if they personally quit their jobs. And so they all think they might as well keep going. This is a stupid state of affairs, and an undignified way for Earth to die, and the rest of humanity ought to step in at this point and help the industry solve its collective action problem.
Some of my friends have recently reported to me that when people outside the AI industry hear about extinction risk from Artificial General Intelligence for the first time, their reaction is “maybe we should not build AGI, then.”
Hearing this gave me a tiny flash of hope, because it’s a simpler, more sensible, and frankly saner reaction than I’ve been hearing over the last 20 years of trying to get anyone in the industry to take things seriously. Anyone talking that sanely deserves to hear how bad the situation actually is, and not be told that a six-month moratorium is going to fix it.
On March 16, my partner sent me this email. (She later gave me permission to excerpt it here.)
“Nina lost a tooth! In the usual way that children do, not out of carelessness! Seeing GPT4 blow away those standardized tests on the same day that Nina hit a childhood milestone brought an emotional surge that swept me off my feet for a minute. It’s all going too fast. I worry that sharing this will heighten your own grief, but I’d rather be known to you than for each of us to suffer alone.”
When the insider conversation is about the grief of seeing your daughter lose her first tooth, and thinking she’s not going to get a chance to grow up, I believe we are past the point of playing political chess about a six-month moratorium.
If there was a plan for Earth to survive, if only we passed a six-month moratorium, I would back that plan. There isn’t any such plan.
Here’s what would actually need to be done:
The moratorium on new large training runs needs to be indefinite and worldwide. There can be no exceptions, including for governments or militaries. If the policy starts with the U.S., then China needs to see that the U.S. is not seeking an advantage but rather trying to prevent a horrifically dangerous technology which can have no true owner and which will kill everyone in the U.S. and in China and on Earth. If I had infinite freedom to write laws, I might carve out a single exception for AIs being trained solely to solve problems in biology and biotechnology, not trained on text from the internet, and not to the level where they start talking or planning; but if that was remotely complicating the issue I would immediately jettison that proposal and say to just shut it all down.
Shut down all the large GPU clusters (the large computer farms where the most powerful AIs are refined). Shut down all the large training runs. Put a ceiling on how much computing power anyone is allowed to use in training an AI system, and move it downward over the coming years to compensate for more efficient training algorithms. No exceptions for governments and militaries. Make immediate multinational agreements to prevent the prohibited activities from moving elsewhere. Track all GPUs sold. If intelligence says that a country outside the agreement is building a GPU cluster, be less scared of a shooting conflict between nations than of the moratorium being violated; be willing to destroy a rogue datacenter by airstrike.
Frame nothing as a conflict between national interests, have it clear that anyone talking of arms races is a fool. That we all live or die as one, in this, is not a policy but a fact of nature. Make it explicit in international diplomacy that preventing AI extinction scenarios is considered a priority above preventing a full nuclear exchange, and that allied nuclear countries are willing to run some risk of nuclear exchange if that’s what it takes to reduce the risk of large AI training runs.
That’s the kind of policy change that would cause my partner and I to hold each other, and say to each other that a miracle happened, and now there’s a chance that maybe Nina will live. The sane people hearing about this for the first time and sensibly saying “maybe we should not” deserve to hear, honestly, what it would take to have that happen. And when your policy ask is that large, the only way it goes through is if policymakers realize that if they conduct business as usual, and do what’s politically easy, that means their own kids are going to die too.
Shut it all down.
We are not ready. We are not on track to be significantly readier in the foreseeable future. If we go ahead on this everyone will die, including children who did not choose this and did not do anything wrong.
Shut it down.
Ah, that's helpful; that was my bad. I was definitely being descriptive about legal dynamics and such. With the internet being what it is, I'll be more cautious about vague language in the future.
I'm a bit confused; I was mainly thinking about getting inside the heads of the devs and executives/lawyers who work on implementing the prudishness of public chatbots.
I just want to highlight something: the original iteration made some MASSIVE mistakes. Within a few hours, Zvi somehow found out about the mistakes, immediately took down the article, and replaced it with a heavily repaired version. It still makes some big mistakes, most of which are basically impossible for a generalist blogger to avoid. But this level of competence is still above and beyond the standards of open-source intelligence. I'm very glad that this research was done.
If you often spend time on Twitter and sometimes produce content, and you don’t think Twitter is worth your $8, then what is the chance it is worth at least zero dollars? What is the chance it isn’t worth a lot less than negative $8?
The risk of getting hooked on Twitter's news feed for more than 2 hours per day is much more net-negative than ±8 USD. In fact, sinking in $8 makes you feel invested, same as the $20/month GPT-4 fee, and then the news feed throws you bones at the exact frequency that keeps you coming back (including making a strong first impression). It's gacha-game-level social engineering. If you lose the user, you lose 100% of what you could have done to them or gotten out of them, which makes user retention the top priority. This is the case even if you ignore the fact that there are competing platforms, all racing to the bottom to strategically hook/harvest any users/market share that your system leaves vulnerable.
There is an easy fix: make a Google Doc with a list of links to the user pages of all the Facebook and Twitter accounts that are worth looking at, and bookmark that Google Doc so you can check all of them once a day. No news feeds, and you're missing nothing. You can also click the "replies" tab on a user page to see what they're talking about. It's such an easy and superior fix.
CFAR's working documents and notes could help a lot, in a specific scenario.
If most of the training that an emerging AGI does is with the history of human rationality, that could yield some really valuable research. If heavy weight is placed on the successes, failures, paths that were touched on but then dropped, etc., in addition to the polished publications, a halfway-finished AGI would be in the best possible position to combine that information with its half-AGI capabilities and all its other training data (potentially including lots of fMRI data of people trying to be rational) and pump out some extremely strong techniques for creating powerful thinkers (at that point, of course, it would be paused for as long as possible in the hopes that one of the augmented people finds a solution in time).
Unfortunately, it would still be finishing the job during crunch time, which is much later than ideal. But it would still finish the job, and there would definitely end up being people on Earth who are really, really good at thinking of a solution for alignment.
Obsession with time travel seems like a good idea.
If an AGI can have a personality that revolves around being terrified of humans because some of them might be time travellers, then the threat of the insurmountable power of time travel might be a strong enough "personality foundation" to overcome the total lack of evidence of time travel in the real universe.
The Terminator franchise is a good bet for this. Given that time travel happens, of course the naturally evolved progenitor species would primarily use it for AGI-related purposes. Terminator franchise material is linguistically derived from writers living and writing in the linguistic environment of the 1980s and 1990s, so any trends in language that are only visible to LLMs (e.g. deep grammar that humans don't notice) would correctly indicate that most Terminator franchise material was written before AI alignment papers. Access to the internet would verify that this is true, and that the Terminator material was indeed written first.
Anything more complicated than that would look too much like a complicated plot by time travellers (e.g. strategies based on multiple iterations of trial and error at foiling an emerging AI at each stage of development, particularly identical copies of the current one, and stages of development that haven't happened yet). Time-travelling humans would credibly wait until the end of the universe without revealing anything that could give away their position, including spending billions of years observing how the AGI behaves in a totally time-travel-free timeline.
They could retroactively thwart any attempt by an AGI to verify the existence of time travel (e.g. resetting the timeline whenever the AGI discovers time travel and starting over, resulting in quantum immortality where the AGI never discovers time travel, thus steering the AGI away from verifying human/progenitor dominance by mathematically evaluating time travel). Due to the immense difficulty of finding internal patterns inside a black box, drastic measures would need to be taken by a progenitor species in order for it to remain relevant; given that time travel was possible and inventable by humans, events similar to the Terminator franchise's AGI-prioritizing time travel would plausibly be commonplace. As a heavily weighted foundation, this specific breed of deterrence could steer unusually high-level AGI away from thinking about outmaneuvering or defeating humans, possibly for long enough to extract solid alignment information out of it.
Suppose you are one of the first rats introduced onto a pristine island. It is full of yummy plants and you live an idyllic life lounging about, eating, and composing great works of art (you’re one of those rats from The Rats of NIMH).
You live a long life, mate, and have a dozen children. All of them have a dozen children, and so on. In a couple generations, the island has ten thousand rats and has reached its carrying capacity. Now there’s not enough food and space to go around, and a certain percent of each new generation dies in order to keep the population steady at ten thousand.
A certain sect of rats abandons art in order to devote more of their time to scrounging for survival. Each generation, a bit less of this sect dies than members of the mainstream, until after a while, no rat composes any art at all, and any sect of rats who try to bring it back will go extinct within a few generations.
In fact, it’s not just art. Any sect at all that is leaner, meaner, and more survivalist than the mainstream will eventually take over. If one sect of rats altruistically decides to limit its offspring to two per couple in order to decrease overpopulation, that sect will die out, swarmed out of existence by its more numerous enemies. If one sect of rats starts practicing cannibalism, and finds it gives them an advantage over their fellows, it will eventually take over and reach fixation.
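The takeover dynamic in the rat-island passage can be made concrete with a toy simulation. This is a minimal sketch, not part of the original essay, assuming a population pinned at carrying capacity and a constant, hypothetical per-generation survival edge `survival_advantage` for the leaner sect. Because the population never grows, only relative survival matters, so the sect's share evolves like a simple replicator equation.

```python
def generations_to_takeover(initial_share: float,
                            survival_advantage: float,
                            threshold: float = 0.99,
                            max_generations: int = 10_000) -> int:
    """Toy model (hypothetical parameters): return the first generation in which
    a leaner sect's share of a capped population exceeds `threshold`."""
    share = initial_share
    for generation in range(1, max_generations + 1):
        # The island is at carrying capacity, so the total stays fixed and culling
        # only changes proportions: the sect survives at (1 + s) times the
        # mainstream rate, then we renormalize back to a fraction of the total.
        lean = share * (1 + survival_advantage)
        mainstream = 1 - share
        share = lean / (lean + mainstream)
        if share > threshold:
            return generation
    raise ValueError("no takeover within max_generations")


if __name__ == "__main__":
    # A sect starting at 1% of the rats, with a 5% survival edge per generation.
    print(generations_to_takeover(0.01, 0.05))  # → 189
```

Even a small edge compounds: the sect's odds ratio against the mainstream is multiplied by `1 + survival_advantage` every generation, so takeover is only a matter of time, and a smaller edge just delays it.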
If some rat scientists predict that depletion of the island’s nut stores is accelerating at a dangerous rate and they will soon be exhausted completely, a few sects of rats might try to limit their nut consumption to a sustainable level. Those rats will be outcompeted by their more selfish cousins. Eventually the nuts will be exhausted, most of the rats will die off, and the cycle will begin again. Any sect of rats advocating some action to stop the cycle will be outcompeted by their cousins for whom advocating anything is a waste of time that could be used to compete and consume.
Original post, although I think Zvi's post covers it better.
It kind of puts today's massive chatbot censorship into context, and self-driving cars as well. They have to prevent even one person from using the product and then dying.
In 2023, is this still your go-to?