AI alignment researcher, ML engineer. Master's in Neuroscience.
I believe that cheap and broadly competent AGI is attainable and will be built soon. This leads me to have timelines of around 2024-2027. Here's an interview I gave recently about my current research agenda. I think the best path forward to alignment is through safe, contained testing on models designed from the ground up for alignability and trained on censored data (simulations with no mention of humans or computer technology). I think that current mainstream ML technology is close to a threshold of competence beyond which it will be capable of recursive self-improvement, and I think that this automated process will mine neuroscience for insights and quickly become far more effective and efficient. I think it would be quite bad for humanity if this happened in an uncontrolled, uncensored, un-sandboxed situation. So I am trying to warn the world about this possibility.
See my prediction markets here:
I also think that current AI models pose misuse risks, which may continue to get worse as models get more capable, and that this could potentially result in catastrophic suffering if we fail to regulate this.
I now work for SecureBio on AI-Evals.
Relevant quotes:
"There is a powerful effect to making a goal into someone’s full-time job: it becomes their identity. Safety engineering became its own subdiscipline, and these engineers saw it as their professional duty to reduce injury rates. They bristled at the suggestion that accidents were largely unavoidable, coming to suspect the opposite: that almost all accidents were avoidable, given the right tools, environment, and training." https://www.lesswrong.com/posts/DQKgYhEYP86PLW7tZ/how-factories-were-made-safe
"The prospect for the human race is sombre beyond all precedent. Mankind are faced with a clear-cut alternative: either we shall all perish, or we shall have to acquire some slight degree of common sense. A great deal of new political thinking will be necessary if utter disaster is to be averted." - Bertrand Russel, The Bomb and Civilization 1945.08.18
"For progress, there is no cure. Any attempt to find automatically safe channels for the present explosive variety of progress must lead to frustration. The only safety possible is relative, and it lies in an intelligent exercise of day-to-day judgment." - John von Neumann
"I believe that the creation of greater than human intelligence will occur during the next thirty years. (Charles Platt has pointed out the AI enthusiasts have been making claims like this for the last thirty years. Just so I'm not guilty of a relative-time ambiguity, let me more specific: I'll be surprised if this event occurs before 2005 or after 2030.)" - Vernor Vinge, Singularity
I've wondered about adding a microphone to the inside and a speaker to the outside. Such tech is pretty cheap these days, so it wouldn't add much overhead to a reusable respirator.
I find them quite useful despite being buggy. I spend about 40% of my time debugging model code, 50% writing my own code, and 10% prompting. Having a planning discussion first with s3.6, and only asking it to write code after 5 or more exchanges, works a lot better.
Also helpful is asking for lots of unit tests along the way to confirm things are working as you expect.
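As a rough illustration (the function under test here is entirely hypothetical, not something from my actual code), the kind of small checkpoint test I ask for after each chunk of generated code looks like this:

```python
# Hypothetical example of an "along the way" checkpoint test.
# parse_log_line is a made-up stand-in for whatever the model just wrote.
def parse_log_line(line: str) -> dict:
    timestamp, level, message = line.split(" ", 2)
    return {"timestamp": timestamp, "level": level, "message": message}

def test_parse_log_line_basic():
    parsed = parse_log_line("12:00:01 INFO job started")
    assert parsed["level"] == "INFO"
    assert parsed["message"] == "job started"
```

Running tests like this with pytest after each exchange catches the model's mistakes while the context is still fresh, rather than during a long debugging session later.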
Sadly, no. It doesn't take superintelligence to be deadly. Even current open-weight LLMs, like Llama 3 70B, know quite a lot about genetic engineering. The combination of a clever and malicious human, and an LLM able to offer help and advice is sufficient.
Furthermore, there is the consideration of a "seed AI": a model competent enough to keep improving rather than plateau. If you have a competent human helping it and getting it unstuck, then the bar is even lower. My prediction is that the bar for "seed AI" is lower than the bar for AGI.
I agree that some amount of control is possible.
But if we are in a future scenario where the offense-defense balance of bioweapons remains similar to how it is today, then a leak on the scale of a single dose of pseudoephedrine slipping past government regulation and getting turned into methamphetamine could result in the majority of humanity being wiped out.
Pseudoephedrine is regulated, yes, but not so strongly that literally none slips past the enforcement. With the stakes so high, a mostly effective enforcement scheme doesn't cut it.
"DeepMind etc. is already heavily across biology, from what I gather from interviews with Demis. If the knowledge was there already, there's a good chance they would have found it."
I've heard this viewpoint expressed before, and find it extremely confusing. I've been studying neuroscience and its implications for AI for twenty years now. I've read thousands of papers, including most of what DeepMind has produced. There are still so many untested ideas, because biology and the brain are so complex, and also because people tend to flock to popular paradigms, rehashing old ideas rather than testing new ones.
I'm not saying I know where the good ideas are, just that I perceive the explored portions of the Pareto frontier of plausible experiments to be extremely ragged. There are tons of places covered by "Fog of War" where good ideas could be hiding.
DeepMind is a tiny fraction of the scientists in the world that have been working on understanding and emulating the brain. Not all the scientists in the world have managed to test all the reasonable ideas, much less DeepMind alone.
Saying DeepMind has explored the implications of biology for AI is like saying that the Opportunity Rover has explored Mars. Yes, this is absolutely true, but the unexplored area vastly outweighs the explored area. If you think the statement implies "explored ALL of Mars" then you have a very inaccurate picture in mind.
I think moratorium is basically intractable short of a totalitarian world government cracking down on all personal computers.
Unless you mean just a moratorium on large training runs, in which case I think it buys a minor delay at best, and comes with counterproductive pressures on researchers to focus heavily on diverse small-scale algorithmic efficiency experiments.
I feel like you're preaching to the choir here on LessWrong, but I really appreciate someone doing the work of writing this in a way that's more publicly accessible.
I'd like to hear people's thoughts on how to promote this piece to a wider audience.
Ugh. I like your analysis a lot, but feel like it's worthless because your Overton window is too narrow.
I write about this in more depth elsewhere, but I'll give a brief overview here.
Misuse risk is already very high. Even open-weights models, when fine-tuned on bioweapons-relevant papers, can produce thorough and accurate plans for insanely deadly bioweapons. With surprisingly few resources, a bad actor could wipe out the majority of humanity within a few months of release.
Global Moratorium on AGI research is nearly impossible. It would require government monitoring and control of personal computers worldwide. Stopping big training runs doesn't help, maybe buys you one or two years of delay at best. Once more efficient algorithms are discovered, personal computers will be able to create dangerous AI. Scaling is the fastest route to more powerful AI, not the only route. Further discussion
AGI could be here as soon as 2025, which isn't even on your plot.
Once we get close enough to AGI for recursive self-improvement (an easier threshold to hit) then rapid algorithmic improvements make it cheap, quick and easy to create AGI. Furthermore, advancement towards ASI will likely be rapid and easy because the AGI will be able to do the work. https://manifold.markets/MaxHarms/will-ai-be-recursively-self-improvi
AI may soon be given traits that make it self-aware/self-modeling, internal-goal-driven, etc. In other words, conscious. And someone will probably set it to the task of self-improving. Nobody has to let it out of the box if its creator never puts it in a box to begin with. Nobody needs to steal the code and weights if they are published publicly. Any coherent plan must address what will be done about such models, able to run on personal computers, with open code and open weights, with tremendous capability for harm if programmed/convinced to harm humanity.
https://www.lesswrong.com/posts/NRZfxAJztvx2ES5LG/a-path-to-human-autonomy
I do think there are going to be significant AI capability advances from improved understanding of how mammal and bird brains work. I disagree that more complete scanning of mammalian brains is the bottleneck. I think we actually know enough about mammalian brains and the features which are invariant across members of a species. I think the bottlenecks are:
1. Understanding the information we do have (scattered across tens of thousands of research papers).
2. Building compute-efficient emulations which accurately reproduce the critical details while abstracting away the unimportant ones.
Since our limited understanding can't give certain answers about which details are key, the second task probably involves quite a bit of parallelizable, brute-forceable empirical research.
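To make "parallelizable, brute-forceable empirical research" concrete, here is a minimal sketch of what such a sweep could look like. Every name in it (the candidate details, build_emulation, evaluate_on_benchmark) is a hypothetical placeholder, not a real pipeline:

```python
# Toy sketch of a brute-force ablation sweep over which biological details
# to include in an emulation. All names are hypothetical placeholders.
from itertools import combinations
from multiprocessing import Pool

CANDIDATE_DETAILS = [
    "dendritic_nonlinearity",
    "neuromodulatory_gating",
    "precise_spike_timing",
    "cortical_column_structure",
]

def build_emulation(details):
    """Placeholder: would construct a compute-efficient model with these details."""
    return {"details": details}

def evaluate_on_benchmark(model):
    """Placeholder: would return task performance; here it just counts details."""
    return len(model["details"])

def score_variant(included):
    model = build_emulation(included)
    return included, evaluate_on_benchmark(model)

if __name__ == "__main__":
    # Every subset of candidate details is an independent experiment, so the
    # sweep parallelizes trivially across processes (or across machines).
    variants = [c for r in range(len(CANDIDATE_DETAILS) + 1)
                for c in combinations(CANDIDATE_DETAILS, r)]
    with Pool() as pool:
        results = pool.map(score_variant, variants)
    for included, score in sorted(results, key=lambda item: -item[1]):
        print(f"{score}: {included}")
```

The point is just that each subset of modeled details is an independent experiment, so the search over which details actually matter can be spread across as much compute as you can afford.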
I think current LLMs can absolutely scale fast enough to be very helpful with these two tasks. So if something still seems to be missing from LLMs after the next scale-up in 2025, I expect hunting for further inspiration from the brain will seem tempting and tractable. Thus, I think we are well on track for AGI by 2026-2028 even if LLMs don't continue scaling.
Typo? Is the intended meaning (Interpretation 1): "Most people would prefer not just a compliant assistant, but rather a cofounder with integrity"? Or am I misunderstanding you?
Alternately, perhaps you mean (Interpretation 2): "Most people would prefer their compliant assistant to act as if it were a cofounder with integrity in cases where instructions are underspecified."
In terms of a response, I do worry that "what most people would prefer" is not necessarily a good guide for designing a product which has the power to be incredibly dangerous either accidentally or via deliberate misanthropy.
So we have this funny double-layered alignment question of alignment-to-supervising-authority (e.g. the model owner who offers API access) and alignment to end-users.
Have you read Max Harms' sequence on Corrigibility as Singular Target? I think there's a strong argument that the ideal situation is something like your description of a "principled cofounder" as an end-user interface, but a corrigible agent as the developer interface.