I don't really understand how a local copy of the weights gives the terrorists more practical control over the software's alignment. I don't think it's easy to manually tweak weights for so specific a purpose. Maybe they just mean the API is doing a good job of blocking sketchy requests?
You can finetune models for any specific purpose: just provide a few datapoints and train. The more specific the purpose, the easier tweaking the weights is, not harder. (Surely, if nothing else, you've seen all of the LoRAs and other methods for finetuning image-generation models to generate a specific character?) There is an extensive literature at this point on how trivial it is to strip away the friendly chatbot persona from released checkpoints, such as LLaMA, if you are able to access and modify the model weights directly.
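To make the point concrete, here is a minimal sketch of the kind of weight-level modification anyone with a local checkpoint can do, using the HuggingFace transformers and peft libraries. The model name, hyperparameters, and training data are all illustrative placeholders, not a recipe:

```python
# Minimal LoRA finetuning sketch: assumes a locally available LLaMA-style
# checkpoint; all hyperparameters and data are illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

# LoRA trains small low-rank adapters instead of all ~7B weights.
config = LoraConfig(r=8, lora_alpha=16,
                    target_modules=["q_proj", "v_proj"],
                    task_type="CAUSAL_LM")
model = get_peft_model(model, config)

# "A few datapoints" steering the model toward the narrow new behavior.
examples = ["<instruction/response pairs for the specific purpose>"]
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)
for text in examples:
    batch = tokenizer(text, return_tensors="pt")
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

The point is the cost profile: a LoRA like this touches only a tiny fraction of the parameters and, for a handful of examples, can train in minutes on a single GPU. "Just provide a few datapoints and train" is not an exaggeration.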
We haven't really established why OpenBrain's market dominance is inevitable.
I think they gave OpenBrain a generic name to indicate that they don't know which company this would be, so I think it's tautologically defined that OpenBrain is dominant because the dominant company is the one we're looking at.
Nitpick: No single organism can destroy the biosphere; at most it can fill its niche & severely disrupt all ecosystems.
Have you read the report on mirror life that came out a few months ago? A mirror bacterium has a niche of “inside any organism that uses carbon-based biochemistry”. At least, it would parasitize all animals, plants, fungi, and the larger Protozoa, and probably kill them. I guess bacteria and viruses would be left. I bet that a reasonably smart superintelligence could figure out a way to get them too.
Continuous progress beyond humans into superhuman abilities reminds me of Kurzweil's superlinear graph through agriculture, industry, computing, & his singularity. Not impossible, but it's a curve without much of a mechanism to explain it.
Kurzweil's predictions are looking pretty good compared to what most contemporaries of his thought, right? I say this as someone who is generally not a fan of his.
In real life it's well-established that software can match humans at many tasks, & can often perform them much faster.
Neural networks also frequently surpass humans qualitatively as well, right? Like, AlphaGo wasn't just a faster version of Lee Sedol.
Also, note that scaling up RL training has already produced drift toward alien, incomprehensible language in the chain of thought (CoT) AIs use to think. It's still comprehensible English, for now, but it's notably less so than it was a year ago.
Like many parts of this scenario, I think this scene is fairly plausible, but I would place it decades later.
Why decades and not years or centuries?
This is another hypothetical superhuman ability. And if OpenBrain influences the public extremely rapidly, surely that very speed would make it easier to trace the influence back to OpenBrain?
Such influence might not be traced in practice. But also, it might be traced and still happen. Corporations do lobbying all the time, and succeed in getting their way often as a result, and often it is in fact traced back to them, and yet... it still happens.
I don't really understand how a local copy of the weights gives the terrorists more practical control over the software's alignment. I don't think it's easy to manually tweak weights for so specific a purpose. Maybe they just mean the API is doing a good job of blocking sketchy requests?
If you have the weights, you can do finetuning, activation-vector steering, things like that. It's still sometimes difficult to get the model to do what you want, but it's a lot easier than if you don't have the weights. At least, so I'm told; I haven't done this myself.
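For what activation-vector steering looks like in practice, here is a hedged sketch assuming a HuggingFace LLaMA-style checkpoint. The layer index, scale, and placeholder vector are made up for illustration; a real steering vector is usually computed as a difference of mean activations on contrasting prompts:

```python
# Activation-steering sketch: add a vector to a decoder layer's hidden
# states via a forward hook. Layer index and scale are arbitrary choices.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

# Placeholder: in practice this would be derived from contrasting prompts.
steering_vector = torch.randn(model.config.hidden_size)

def add_steering(module, inputs, output):
    # Decoder layers return a tuple; the hidden states are element 0.
    hidden = output[0] + 4.0 * steering_vector  # scale is a free parameter
    return (hidden,) + output[1:]

layer = model.model.layers[15]  # middle-ish layer, chosen arbitrarily
handle = layer.register_forward_hook(add_steering)

prompt = tokenizer("The assistant", return_tensors="pt")
out = model.generate(**prompt, max_new_tokens=30)
print(tokenizer.decode(out[0]))
handle.remove()
```

None of this is possible through an API that only exposes text in and text out, which is the asymmetry the original question was asking about.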
We must imagine this happens in August or later, since it hasn't yet been a month since -mini was released. And 10% is a lot for such a rapid cultural shift.
Yeah fair.
Agent-2 sounds so expensive to run that there are only a few servers it could copy itself to. Surely those admins would notice? But this doesn't affect the scenario much.
Depends on what you mean by "a few." It couldn't run on personal computers, that's for sure. But OpenBrain has enough compute to run something like a million copies in parallel IIRC (the num_copies displayed in the side widget is just the number they are actually using for research, not the number they COULD run if they devoted all their compute to it), and OpenBrain has only about 15% of the world's AI-relevant compute. So in order to fit a copy of Agent-2, a server needs something like one ten-millionth of the world's AI chips (e.g. H100s or similar). I don't know how many such servers exist, but the answer is probably "a lot," including probably at least a few hobby projects, academic clusters, etc.
So probably Agent-2 could either (a) try to hack all of them, knowing that it would get caught some of the time but not get caught other times, or (b) try to hack the least-secure 0.1% of them, hoping to never get caught.
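The back-of-envelope arithmetic behind "one ten-millionth," using only the rough numbers above:

```python
openbrain_share = 0.15    # OpenBrain's rough share of world AI-relevant compute
parallel_copies = 1e6     # copies OpenBrain could run in parallel (rough)

# Fraction of the world's AI chips needed to host one copy of Agent-2:
frac = openbrain_share / parallel_copies
print(f"{frac:.1e}")      # 1.5e-07, i.e. roughly one ten-millionth
```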
There is also the Rogue Replication Scenario, where copies of Agent-2 don't just hack 0.1% of servers but semi-legitimately gain access to loads of servers and GPU accelerators in the wild. For example, the AI could produce ransomware and use the revenue to rent the compute instead of resorting to security breaches.
My thoughts on the recently posted story.
In my opinion, this story is realistic except for a few big 'fast-forward' moments. It's a story of businesses succeeding incredibly fast.
Most of AI 2027's scenes are sober & realistic extrapolations from the previous scene. Some are so realistic they make me laugh! But keep in mind that sci-fi technology is inserted into this story at 3 or 4 places. This does of course happen in real history! But you should be appropriately skeptical of each of these insertions.