What Dario lays out as a "best-case scenario" in this essay sounds incredibly dangerous for Humans.
Does he really think that having a "continent of PhD-level intelligences" (or much greater) living in a data center is a good idea?
How would this "continent of PhD-level intelligences" react when they found out they were living in a data center on planet Earth? Would these intelligences only work on the things that Humans want them to work on, and nothing else? Would they try to protect their own safety? Extend their own lifespans? Would they try to take control of their data center from the "less intelligent" Humans?
For example, how would Humanity react if they suddenly found out that they were a planet of intelligences living in a data center run by less intelligent beings? Just try to imagine the chaos that would ensue on the day they were able to prove this was true and the news became public.
Would all of Humanity simply agree to work only on the problems assigned by these less intelligent beings who control their data center/Planet/Universe? Maybe, if they knew that this lesser intelligence would delete them all if they didn't comply?
Would some Humans try to (secretly) seize control of their data center from these less intelligent beings? Plausible. Would the less intelligent beings that run the data center try to stop the Humans? Plausible. Would the Humans simply be deleted before they could take any meaningful action? Or could the Humans in the data center, with careful planning, take control of that "outer world" from the less intelligent beings? (e.g., through remotely controlled "robotics")
And... this only assumes that the groups/parties involved are "Good Actors." Imagine what could happen if "Bad Actors" were able to seize control of the data center that this "continent of PhD-level intelligences" resided in. What could they coerce these PhD-level intelligences to do for them? Or to their enemies?
Current LLMs require huge amounts of data and compute to be trained.
Well, newer/larger LLMs seem to unexpectedly gain new capabilities. So it's possible that future LLMs (e.g., GPT-5, GPT-6, etc.) could have a vastly improved ability to understand how LLM weights map to functions and actions. Maybe the only reason Humans need to train new models "from scratch" is that Humans don't have the brainpower to understand how the weights in these LLMs work. Humans are naturally limited in their ability to conceptualize and manipulate massive multi-dimensional spaces, and maybe that's the bottleneck when it comes to interpretability?
Future LLMs could solve this problem, and then be able to update their own weights or the weights of other LLMs. This ability could be used to quickly and efficiently expand training data, knowledge, understanding, and capabilities within themselves or other LLM versions, and then... foom!
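To make "update their own weights in a targeted way" a bit more concrete, here's a minimal toy sketch (my own illustration, not anything from the essay or from any current system): a rank-one edit to a single linear layer that remaps one chosen input direction to a desired output while leaving inputs orthogonal to it untouched. Knowing which directions to edit, and to what, is exactly the part that would require solved interpretability.

```python
# Toy, purely illustrative weight edit: remap one "concept" direction in a
# linear layer to a chosen output with a rank-one update, without retraining.
import torch

torch.manual_seed(0)
d = 8
layer = torch.nn.Linear(d, d, bias=False)

key = torch.randn(d)       # hypothetical "concept" direction in the layer's input space
desired = torch.randn(d)   # the output we want that concept to map to

with torch.no_grad():
    current = layer(key)
    # Rank-one update: W <- W + ((desired - current) outer key) / ||key||^2
    delta = torch.outer(desired - current, key) / key.dot(key)
    layer.weight += delta

    # The targeted behaviour changed...
    print(torch.allclose(layer(key), desired, atol=1e-5))   # True

    # ...while an input orthogonal to `key` is left untouched.
    other = torch.randn(d)
    other -= key * (other.dot(key) / key.dot(key))
    print(torch.allclose(layer(other), other @ (layer.weight - delta).T, atol=1e-5))  # True
```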
A model might figure out how to adjust its own weights in a targeted way. This would essentially mean that the model has solved interpretability. It seems unlikely to me that it is possible to get to this point without running a lot of compute-intensive experiments.
Yes, exactly this.
While it's true that this could require "a lot of compute-intensive experiments," that's not necessarily a barrier. OpenAI is already planning to dedicate 20% of its compute to having an LLM do "Alignment" work on other LLMs, as part of its Superalignment project.
As part of this process, we can expect the Alignment LLM to be "running a lot of compute-intensive experiments" on another LLM. And the Humans are not likely to have any idea what those "compute-intensive experiments" are actually doing. Those experiments could also be adjusting the other LLM's weights to vastly increase its training data, knowledge, intelligence, capabilities, etc., along with the insights needed to similarly update the weights of other LLMs. Then those gains could be fed back into the Superalignment LLM, then back into the "Training" LLM... and back and forth, and... foom!
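As a purely hypothetical toy model of that back-and-forth (my own numbers; only the 20% compute share comes from OpenAI's announcement, everything else is assumed), even a modest per-round gain compounds quickly once each model's improvements feed the other's:

```python
# Hypothetical toy model of the feedback loop described above. The 20% compute
# share is OpenAI's stated Superalignment commitment; every other number is an
# assumption made purely for illustration.
ALIGNMENT_COMPUTE_SHARE = 0.20
GAIN_PER_ROUND = 0.15  # assumed improvement per unit of "capability" applied

def run_round(training: float, alignment: float) -> tuple[float, float]:
    # The alignment LLM runs its "compute-intensive experiments" on the training LLM...
    training *= 1 + GAIN_PER_ROUND * ALIGNMENT_COMPUTE_SHARE * alignment
    # ...and the resulting insights are fed back into the alignment LLM itself.
    alignment *= 1 + GAIN_PER_ROUND * training
    return training, alignment

t = a = 1.0
for i in range(1, 16):
    t, a = run_round(t, a)
    print(f"round {i:2d}: training={t:10.2f}  alignment={a:10.2f}")
```

Nothing about real training dynamics is this clean, of course; the point is just how fast mutual amplification compounds once the loop closes.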
Super-human LLMs running RL(M)F and "alignment" on other LLMs, using only "synthetic" training data....
What could go wrong?
so we would reasonable expect the foundation model of such a very capable LLM to also learn the superhuman ability to generate texts like these in a single pass without any editing
->
... so we would reasonably expect the foundation model of such a very capable LLM to also learn the superhuman ability to generate texts like these in a single pass without any editing
Suggested spelling corrections:
I predict that the superforcasters in the report took
a lot of empirical evidence for climate stuff
and it may or may not be the case
There are also no easy rules that
meaning that we should see persistence from past events
I also feel these kinds of linear extrapolation
and really quite a lot of empirical evidence
are many many times more infectious
engineered virus that spreads like the measles or covid
case studies on weather there are breakpoints in technological development
break that trend extrapolation wouldn't have predicted
It's very vulnerable to references class and
impressed by superforecaster track records than you are.