Thanks for sharing your point of view. I tried to give myself a few days, but I'm aftraid I still don't understand where you see the magic barrier for the transition from 3 to 4 to happen outside of the realm of human control.
are you thinking about sub-human-level of AGIs? the standard definition of AGI involves it being it better than most humans in most of the tasks humans can do
the first human hackers were not trained on "take over my data center" either, but humans can behave out of distribution and so will the AGI that is better than humans at behaving out of distribution
the argument about AIs that generalize to many tasks but are not "actually dangerous yet" is about speeding up creation of the actually dangerous AGIs, and it's the speeding up that is dangerous, not that AI Safety researchers believe that those "weak AGIs" created from large LLMs would actually be capable of killing everyone immediatelly on their own
if you believe "weak AGIs" won't speed creation of "dangerous AGIs", can you spell out why, please?
first-hand idea of what kinds of things even produce progress
I'd rather share second-hand ideas about how progress looks like based on a write up from someone with deep knowledge from multiple research directions than to spend next 5 years forming my own idiosyncratic first-hand empathic intuitions.
It's not like Agent Foundations are 3cm / 5db / 7 dimensions of more progress than Circuits, but if there is no standardized quantity of progress, then why ought we believe that making 1000 different tools by 1000 people now is worse than those people doing research first before attempting to help with non-research skills?
if everyone followed the argmax approach I laid out here. Are there any ways they might do something you think is predictably wrong?
While teamwork seems to be assumed in the article, I believe it's worth spelling out explicitly that argmaxing for a plan with highest marginal impact might mean joining and/or building a team where the team effort will make the most impact, not optimizing for highest individual contribution.
Spending time to explain why a previous research failed might help 100 other groups to learn from our mistake, so it could be more impactful than pursuing the next shiny idea.
We don't want to optimize for the naive feeling of individual marginal impact, we want to keep in mind the actual goal is to make an Aligned AGI.
I agree with the explicitly presented evidence and reasoning steps, but one implied prior/assumption seems to me so obscenely wrong (compared to my understanding about social reality) that I have to explain myself before making a recommendation. The following statement:
“stacking” means something like, quadrupling the size of your team of highly skilled alignment researchers lets you finish the job in ~1/4 of the time
implies a possibility that approximately neg-linear correlation between number of people and time could exist (in multidisciplinary software project management in particular and/or in general for most collective human endeavors). The model of Nate that I have in my mind believes that reasonable readers ought to believe that:
I am making a claim about the social norm that it's socially OK to assume other people can believe in linear scalability, not a belief whether other people actually believe that 4x the people will actually finish in 1/4 time by default.
Individually, we are well calibrated to throw a TypeError at the cliche counterexamples to the linear scalability assumption like "a pregnant woman delivers one baby in 9 months, how many ...".
And professional managers tend to have an accurate model of applicability of this assumption, individually they all know how to create the kind of work environment that may open the possibility for time improvements (blindly quadrupling the size of your team can bring the project to a halt or even reverse the original objective, more usually it will increase the expected time because you need to lower other risks, and you have to work very hard for a hope of 50% decrease in time - they are paid to believe in the correct model of scalability, even in cases when they are incentivized to say more optimistic professions of belief in public).
Let's say 1000 people can build a nuclear power plant within some time unit. Literally no one will believe that one person will build it a thousand times slower or that a million people will build it a thousand times faster.
I think it should not be socially acceptable to say things that imply that other people can assume that others might believe in linear scalability for unprecedented large complex software projects. No one should believe that only one person can build Aligned AGI or that a million people can build it thousand times faster than a 1000 people. Einstein and Newton were not working "together", even if one needed the other to make any progress whatsoever - the nonlinearity of "solving gravity" is so qualitatively obvious, no one would even think about it in terms of doubling team size or halving time. That should be the default, a TypeError law of scalability.
If there is no linear scalability by default, Alignment is not an exception to other scalability laws. Building unaligned AGI, designing faster GPUs, physically constructing server farms, or building web apps ... none of those are linearly scalable, it's always hard management work to make a collective human task faster when adding people to a project.
Why is this a crux for me? I believe the incorrect assumption leads to rationally-wrong emotions in situations like these:
Also, I've tried a few different ways of getting researchers to "stack" (i.e., of getting multiple people capable of leading research, all leading research in the same direction, in a way that significantly shortens the amount of serial time required), and have failed at this.
Let me talk to you (the centeroid of my models of various AI researchers, but not any one person in particular). You are a good AI researcher and statistically speaking, you should not expect you to also be an equally good project manager. You understand maths and statistically speaking, you should not expect you to also be equally good at social skills needed to coordinate groups of people. Failling at a lot of initial attempts to coordinate teams should be the default expectation - not one or two attempts and then you will nail it. You should expect to fail more ofthen than the people who are getting the best money in the world for aligning groups of people towards a common goal. If those people who made themselves successful in management initially failed 10 times before they became billionaires, you should expect to fail more times than that.
You can either dilute your time by learning both technical and social / management skills or you can find other experts to help you and delegate the coordination task. You cannot solve Alignment alone, you cannot solve Alignment without learning, and you cannot learn more than one skill at a time.
The surviving worlds look like 1000 independent alignment ideas, each pursued by 100 different small teams. Some of the teams figured out how to share knowledge between some of the other teams and connect one or two ideas and merge teams iff they figure out explicit steps how to shorten time by merging teams.
We don't need to "stack", we need to increase the odds of a positive black swan.
Yudkowsky, Christiano, and the person who has the skills to start figuring out the missing piece to unify their ideas are at least 10,000 different people.
Building a tunnel from 2 sides is the same thing even if those 2 sides don't see each other initially. I believe some, but not all, approaches will end up seeing each other, that it's not a bad sign if we are not there yet.
Since we don't seem to have time to build 2 "tunnels" (independent solutions to alignment), a bad sign would be if we could prove all of the approaches are incompatible with each other, which I hope is not the case.
Staying in meta-level, if AGI weren't going to be created "by the ML field", would you still believe problems on your list cannot possibly be solved within 6-ish months if companies would throw $1b at each of those problems?
Even if competing groups of humans augmented by AI capabilities existing "soon" were trying to solve those problems with combined tools from inside and outside ML field, the foreseeable optimization pressure is not enough for those foreseeable collective agents to solve those known-known and known-unknown problems that you can imagine?
No idea about original reasons, but I can imagine a projected chain of reasoning:
To be continued in the form of a science fiction story Unnatural Abstractions.
yes, it takes millions to advance, but companies are pouring BILLIONS into this and number 3 can earn its own money and create its own companies/DAOs/some new networks of cooperation if it wanted without humans realizing ... have you seen any GDP per year charts whatsoever, why would you think we are anywhere close to saturation of money? have you seen any emergent capabilities from LLMs in the last year, why do you think we are anywhere close to saturation of capabilities per million of dollars? Alpaca-like improvemnts are somehow one-off miracle and things are not getting cheaper and better and more efficient in the future somehow?
it could totally happen, but what I don't see is why are you so sure it will happen by default, are you extrapolating some trend from non-public data or just overly optimistic that 1+1 from previous trends is less than 2 in the future, totally unlike the compount effects in AI advancement in the last year?