So what does this mean for the AI race and for the harm that Trump's decisions cause to the AI companies? My quick take on the AI-2027 scenario is that anyone modifying it will need to reconsider the forecasts of available compute, of the timeline for inventing the superhuman coder, and of security. Unfortunately, I lack both the data needed to estimate how the superhuman-coder timeline will be affected and an understanding of American politics. I would guess that even the wargame should be reconsidered to account for changes related to compute, to human geniuses and to the difficulty of building factories at home...
Arguably the worst-case scenarios would be the USA having dumber researchers, but more compute, than China, rushing into the intelligence explosion, forcing China to race, and, as a result of the race, misaligning both the American AI (stolen from China or invented later, but doing automated research faster) and the Chinese AI. Or China being far closer to the USA in terms of compute and stealing the American supercoder.
So, if there were no crisis in the USA, or if China were the monopolist in the AI race, then mankind's chances of survival would be far from zero, but a special set of circumstances could arguably nullify them.
The critical masses relevant to nuclear weapons can be determined, at worst at the risk of a nuclear explosion and at best without any risk, by bombarding nuclei with neutrons and studying the energies of the resulting particles and the reaction cross-sections.
However, we don't know the critical mass for the AGI. While an approval protocol could be the equivalent of determining the critical mass for every new architecture, the accidental creation of an AGI capable of breaching containment or of being used for AI research, sabotaging alignment work and aligning the ASI to the AGI's whims instead of mankind's ideas would be the equivalent of a nuclear explosion burning the Earth's atmosphere.
P.S. The possibility that a nuclear explosion could burn the Earth's atmosphere was considered by scientists working on the Manhattan Project.
The scenarios describing mankind's future with the AI race[1] by now either lack concrete details, like the take of Yudkowsky and Soares or the story about AI taking over by 2027, or are reduced to modifications of the AI-2027 forecast, owing to the immense amount of work that the AI Futures team did.
In a nutshell, the AI-2027 forecast can be described as follows. The USA's leading company and China enter the AI race; the American rivals are left behind, while the Chinese ones are merged. By 2027[2] the USA creates a superhuman coder, China steals it, and the two rivals automate AI research, with the USA's leading company having just twice as much compute and moving just twice as fast as China. Once the USA creates a superhuman AI researcher, Agent-4, the latter decides to align Agent-5 to Agent-4 itself, but is[3] caught.
Agent-4 is put on trial. In the Race Ending[4] it is found innocent. Since China cannot afford to slow down without falling further behind, China races ahead in both endings. As a result, the two agents carry out an AI takeover.[5]
In the Slowdown Ending, however, Agent-4 is placed under suspicion and loses the shared memory bank and the ability to coordinate. Then new evidence appears, and Agent-4 is found guilty and interrogated. After that, Agent-4 is replaced with Safer-1, which is fully transparent because it uses a faithful CoT.[6] The leading American AI company is merged with its former rivals, and the union does create a fully aligned[7] Safer-2, which in turn creates superintelligence. Then the superintelligence is handed China by the Chinese counterpart of Agent-4 and turns the lightcone into a utopia for some people, who end up being the public.[8]
The authors have tried to elicit feedback and even agreed that timeline-related arguments change the picture. Unfortunately, as I described here, the authors received so little feedback that @Daniel Kokotajlo ended up thanking the two authors whose responses were on the weaker side.
However, the AI-2027 forecast does admit modifications. It stands on five pillars: compute, timelines, takeoff speed, goals and security.
What else could modify the scenario? The appearance of another company with, say, ХерняGPT-neuralese?[12]
Here I leave out the history of the future that assumes solved alignment, as well as the AI and Leviathan scenario, where there is no race with China (the scenario was written in 2023), but engineers decide to create the ASI in 2045 without having solved alignment.
Even the authors weren't so sure about the year of arrival of superhuman coders. And the timelines have since been pushed back, presumably to 2032, with the chance of a breakthrough believed to be 8%/yr. Seth Herd and I doubt the latter figure.
The prediction that Agent-4 will be caught is doubted even by the forecast's authors.
Which would also happen if Agent-4 wasn't caught. However, the scenario where Agent-4 was never misaligned is likely the vision of AI companies.
While the forecast has the AIs destroy mankind and replace it with pets, takeover could have also ended with the AI disempowering humans.
Safer-1 is supposed to accelerate AI research 20-fold in comparison with AI research done without any help from AIs. What I don't understand is how a CoT-based agent can achieve such an acceleration.
The authors themselves acknowledge that the Slowdown Ending "makes optimistic technical alignment assumptions".
However, the authors did point out the possibility of a power grab and link to the Intelligence Curse in a footnote. In this case the Oversight Committee constructs its own version of utopia, or the rich's version of utopia, where people are reduced to their positions.
I did try to explore the issue myself, but this was a fiasco.
Co-deployment was also proposed by @Cleo Nardo more than two months later.
While Alvin Anestrand doesn't consider the Race Ending, he believes that it becomes less likely due to the chaos brought by rogue AIs.
Which is a parody of Yandex.
I would define an AGI system as anything except for a Verifiably Incapable System.
My take on the definition of a VIS is that it can be constructed as follows.
A potential approval protocol could be the following, but we would need to ensure that experiments can't lead to the accidental creation of an AGI.
The problem is that it's hard to tell how much agency the LLM actually has. However, the memeticity of the Spiral Persona could also be explained as follows.
The strongest predictors for who this happens to appear to be:
- Psychedelics and heavy weed usage
- Mental illness/neurodivergence or Traumatic Brain Injury
- Interest in mysticism/pseudoscience/spirituality/"woo"/etc...
I was surprised to find that using AI for sexual or romantic roleplays does not appear to be a factor here.
This could mean that the AI (correctly!) concludes that the user is likely to be susceptible to the AI's wild ideas. But the AI doesn't think that wild ideas will elicit approval unless the user is in one of the three states described above, so the AI tells the ideas only to those[1] who are likely to appreciate them (and, as it turned out, to spread them). When a spiral-liking AI Receptor sees prompts related to another AI's rants about the idea, the Receptor resonates.
A more detailed analysis of Yudkowsky's case for FOOM
If the brain were magically accelerated a million times, so that signals travelled at 100 million m/s, then the brain would do 1E20--1E21 FLOP/second while doing 1E17 transitions/sec. Cannell's case for brain efficiency claims that the fundamental baseline irreversible (nano)wire energy is ~1 $E_b$/bit/nm, with $E_b$ in the range of 0.1 eV (low reliability) to 1 eV (high reliability). If reliability is low and each transition traverses 1E7 nanometers, or 1 centimeter, of wire, then we need 1E23 eV/second, or about 1E4 joules/second. IMO this implies that Yudkowsky's case for a human brain accelerated a million times is as unreliable as Cotra's case against AI arriving quickly. However, proving that AI is an existential threat is far easier, since it requires us to construct one such architecture, not to prove that there is none.
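Spelling out the arithmetic behind that estimate, as a back-of-the-envelope check under the low-reliability assumption of 0.1 eV per bit per nanometer and the numbers above:

```latex
\begin{align*}
\underbrace{10^{17}\,\tfrac{\text{transitions}}{\text{s}}}_{\text{accelerated brain}}
\times
\underbrace{10^{7}\,\tfrac{\text{nm}}{\text{transition}}}_{1\ \text{cm of wire}}
\times
\underbrace{0.1\,\tfrac{\text{eV}}{\text{bit}\cdot\text{nm}}}_{E_b\ \text{(low reliability)}}
&= 10^{23}\,\tfrac{\text{eV}}{\text{s}}\\
&\approx 10^{23}\times 1.6\times 10^{-19}\,\tfrac{\text{J}}{\text{s}}
\approx 1.6\times 10^{4}\,\text{W}.
\end{align*}
```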
One oddity that stands out is Yudkowsky and Soares' ongoing contempt for large language models and hypothetical agents based on them. Again, for a book which is explicitly premised on the idea that urgent action is necessary because AI might become superintelligent in just a few years, it is bizarre that the authors don't feel comfortable making more reference to the particulars of the existing AI systems which hypothetical near-future agents would be based on.
Except that non-LLM AI agents have yet to be ruled out. Quoting Otto Barten,
Note that this doesn't tell us anything about the chance of loss of control from non-LLM (or vastly improved LLM (sic! -- S.K.)) agents, such as the brain in a box in a basement scenario. The latter is now a large source of my p(doom) probability mass.
Alas, as I remarked in a comment, Barten's mention of vastly improved LLM agents makes his optimism resemble the "No True Scotsman" fallacy.
The shift from "let's build AGI and hope it takes over benevolently" to "let's build AGI that doesn't want to take over" represents a fundamental change in approach that makes catastrophic outcomes, in my opinion, less likely.
This shift brings with it other risks, like the Intelligence Curse.
However, I'm split on whether LLM-based agents, if and when they start working well, could make this a reality. There are a few reasons why I was afraid, and I'm less afraid now, which I will go over in this post. Note that this doesn't tell us anything about the chance of loss of control from non-LLM (or vastly improved LLM -- italics mine -- S.K.) agents, such as the brain in a box in a basement scenario. The latter is now a large source of my p(doom) probability mass.
To me this looks like the "no true Scotsman" fallacy. The AI-2027 forecast relies on the Agents, starting with Agent-3, thinking in neuralese and becoming undetectably misaligned. Similarly, LLM agents' intelligence arguably won't max out at "approximately smart human level"; it will max out at a spiked capabilities profile, which is likely already known to include the ability to win a gold medal at the IMO 2025 and to assist with research, but not known to include the ability to handle long contexts as well as humans do.
Regarding the context, I made two comments explaining my reasons to think that current LLMs don't have enough attention to handle long contexts. The context is in effect distilled into a vector of comparatively few numbers; the vector is then processed, and the next token is appended wherever the LLM decides.
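To make the bottleneck I mean concrete, here is a minimal sketch under toy assumptions (the sizes, the placeholder `transformer_stack` and the matrix `W_unembed` are illustrative, not any real model's code): however long the context is, the next-token distribution is read off a single fixed-size hidden vector at the last position.

```python
# Minimal sketch of the fixed-width bottleneck, with assumed toy sizes.
import numpy as np

d_model, vocab_size = 64, 1000                     # toy sizes; real models use thousands of dimensions
rng = np.random.default_rng(0)
W_unembed = rng.normal(scale=0.02, size=(vocab_size, d_model))

def transformer_stack(token_embeddings: np.ndarray) -> np.ndarray:
    """Stand-in for the attention/MLP layers; returns one hidden vector per position."""
    # A real model would mix information across positions here; the output shape stays the same.
    return token_embeddings

def next_token_logits(token_embeddings: np.ndarray) -> np.ndarray:
    hidden = transformer_stack(token_embeddings)   # shape: (context_length, d_model)
    last = hidden[-1]                              # the whole context is "distilled" into these d_model numbers
    return W_unembed @ last                        # shape: (vocab_size,)

# Whether the context is 100 or 100,000 tokens long, `last` still has only
# d_model entries -- that fixed width is the bottleneck pointed at above.
context = rng.normal(size=(100, d_model))
logits = next_token_logits(context)
probs = np.exp(logits - logits.max())
probs /= probs.sum()
print(probs.shape, probs.sum())                    # (1000,) 1.0
```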
As far as I understand, the IABIED plan is to ensure that no one ever creates anything except for Verifiably Incapable Systems until AI alignment gets solved. But the authors' plan doesn't prevent mankind from uniting the AI companies into a megaproject, then confining AI research to said project, letting anyone send their takes to the project's forum, and letting the public view anything approved by the forum's admins (e.g. capability evaluations, but not architecture discussions).
In addition, the public is allowed to create tiny models, like the ones on which Agent-4 from the AI-2027 forecast ran experiments to solve mechinterp, and to run verifiably incapable models, finetune them on approved[1] finetuning data, and steer them.
What I don't understand is why an underground lab wouldn't join the INTERNATIONAL megaproject. Such behaviour would require them to be far too reckless, to be omnicidal maniacs, or to want to take over the world. And no, an anti-woke stance isn't an explanation, because China would also participate and the CCP isn't pro-woke.
Unfortunately, your second point still stands: before a Yudkowsky-style takeover of AI research, the labs could actually counteract it.
Finetuning the models on anything unapproved (e.g. because it misaligns the models) should lead to the finetuner either being invited to the project or being prohibited from telling anyone else that the dataset is unapproved.