AI alignment researcher, ML engineer. Master's in Neuroscience.
I believe that cheap and broadly competent AGI is attainable and will be built soon. This leads me to have timelines of around 2024-2027. Here's an interview I gave recently about my current research agenda. I think the best path forward to alignment is through safe, contained testing on models designed from the ground up for alignability, trained on censored data (simulations with no mention of humans or computer technology). I think that current mainstream ML technology is close to a threshold of competence beyond which it will be capable of recursive self-improvement. I expect this automated process to mine neuroscience for insights and quickly become far more effective and efficient. It would be quite bad for humanity if this happened in an uncontrolled, uncensored, un-sandboxed situation, so I am trying to warn the world about this possibility.
See my prediction markets here:
I also think that current AI models pose misuse risks, which may continue to get worse as models get more capable, and that this could potentially result in catastrophic suffering if we fail to regulate this.
I now work for SecureBio on AI-Evals.
relevant quotes:
"There is a powerful effect to making a goal into someone’s full-time job: it becomes their identity. Safety engineering became its own subdiscipline, and these engineers saw it as their professional duty to reduce injury rates. They bristled at the suggestion that accidents were largely unavoidable, coming to suspect the opposite: that almost all accidents were avoidable, given the right tools, environment, and training." https://www.lesswrong.com/posts/DQKgYhEYP86PLW7tZ/how-factories-were-made-safe
"The prospect for the human race is sombre beyond all precedent. Mankind are faced with a clear-cut alternative: either we shall all perish, or we shall have to acquire some slight degree of common sense. A great deal of new political thinking will be necessary if utter disaster is to be averted." - Bertrand Russell, The Bomb and Civilization, 1945.08.18
"For progress, there is no cure. Any attempt to find automatically safe channels for the present explosive variety of progress must lead to frustration. The only safety possible is relative, and it lies in an intelligent exercise of day-to-day judgment." - John von Neumann
I had a similar situation up until my late 30s. A CPAP fixed some of this, and Wellbutrin (aka Bupropion) fixed the rest. Now I'm at 8-9 hours and more rested, rather than 10-14.
If the correct-side debater uses invalid claims as part of its arguments, and the judge fails to catch this... it would make me feel that something was amiss. That perhaps this wasn't a good proxy for a high-stakes debate between competent debaters trying to convince a smart and motivated human judge about facts about the world.
And if, given the full set of cited sources from both sides of the debate, the judge is able to consistently come to the correct answer, then the question isn't hard enough.
You would be happy about more people blogging about bioweapon design? Hmm.
I think you don't realize quite how much danger we are all in from a bad actor developing and releasing bioweapons. More information about that on the internet would make it easier for people to google it, and would make more training data about it available to future LLMs. I can't see how that's anything but bad for humanity.
I don't think you are saying that you want to murder billions of innocent people or destroy modern civilization, but it seems like you are arguing for something that would make it easier for people to do those things.
I think there will be a period in the future where AI systems (models and their scaffolding) exist which are sufficiently capable that they will be able to speed up many aspects of computer-based R&D, including recursive self-improvement, alignment research, and control research. Obviously, such a time period is not likely to last long, given that surely some greedy actor will pursue RSI. So personally, that's why I'm not putting a lot of faith in getting to that period [edit: resulting in safety].
I think that if you build the scaffolding which would make current models able to be substantially helpful at research (which would be impressively strong scaffolding indeed!), then you have built dual-use scaffolding which could also be useful for RSI. So any plans to do this must take appropriate security measures or they will be net harmful.
I agree with some points here Bogdan, but not all of them.
I do think that current models are civilization-scale-catastrophe-risky (but importantly not x-risky!) from a misuse perspective, but not yet from a self-directed perspective. Which means neither Alignment nor Control are currently civilization-scale-catastrophe-risky, much less x-risky.
I also agree that pausing now would be counter-productive. My reasoning for this is that I agree with Samo Burja about some key points which are relevant here (while disagreeing with his conclusions due to other points).
To quote myself:
I agree with [Samo's] premise that AGI will require fundamental scientific advances beyond currently deployed tech like transformer LLMs.
I agree that scientific progress is hard, usually slow and erratic, fundamentally different from engineering or bringing a product to market.
I agree with [Samo's] estimate that the current hype around chat LLMs, and focus on bringing better versions to market, is slowing fundamental scientific progress by distracting top AI scientists from pursuit of theoretical advances.
Think about how you'd expect these factors to change if large AI training runs were paused. I think you might agree that this would likely result in a temporary shift of much of the top AI scientist talent toward making theoretical progress. They'd want to be ready to come in strong after the pause ended, with lots of new advances tested at small scale. I think this would actually result in more high-quality scientific thought directed at the heart of the problem of AGI, and thus make AGI very likely to be achieved sooner after the pause ends than it otherwise would have been.
I would go even further, and make the claim that AGI could arise during a pause on large training runs. I think that the human brain is not a supercomputer: my upper estimate for 'human brain inference' is about at the level of a single 8x A100 server, less than an 8x H100 server. Also, I have evidence from analysis of the long-range human connectome (long-range axons are called tracts, so perhaps I should call this a 'tractome'). [Hah, I just googled this term I came up with just now, and found it's already in use, and that it brings up some very interesting neuroscience papers. Cool.] Anyway... I was saying, this evidence shows that the bandwidth (data throughput in bits per second) between two cortical regions in the human brain is typically around 5 Mb/s, and maxes out at about 50 Mb/s. In other words, well within range for distributed federated training runs to work over long-distance internet connections. So unless you are willing to monitor the entire internet so robustly that nobody can scrape together the equivalent compute of an 8x A100 server, you can't fully block AGI.
Of course, if you wanted to train the AGI in a reasonable amount of time, you'd want to do a parallel run of much more than a single inference instance of compute. So yeah, it'd definitely make things inconvenient if an international government were monitoring all datacenters... but far from impossible.
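To make the arithmetic behind this claim explicit, here's a back-of-envelope sketch in Python. The A100 throughput figure is NVIDIA's published peak bf16 number; the brain-compute estimate and the home-upload bandwidth are illustrative assumptions I'm supplying (published brain-equivalent estimates span many orders of magnitude), not figures from the original comment.

```python
# Back-of-envelope check: could an 8x A100 server plausibly match
# 'human brain inference', and is cortical inter-region bandwidth
# within reach of ordinary internet links?
# All numbers are rough assumptions for illustration.

A100_BF16_FLOPS = 312e12             # one A100, peak dense bf16 FLOP/s
SERVER_FLOPS = 8 * A100_BF16_FLOPS   # an 8x A100 server, ~2.5e15 FLOP/s

# Mid-range assumption for brain-equivalent compute (estimates vary
# by many orders of magnitude in the literature).
BRAIN_FLOPS_ESTIMATE = 1e15

ratio = SERVER_FLOPS / BRAIN_FLOPS_ESTIMATE
print(f"server / brain-estimate compute ratio: {ratio:.2f}")

# Inter-region bandwidth: the comment cites ~5-50 Mb/s between
# cortical regions; compare to an assumed consumer fiber upload.
TRACT_BW_MAX_MBPS = 50
HOME_UPLOAD_MBPS = 100               # assumed consumer fiber upload

print(f"home upload covers max tract bandwidth: "
      f"{HOME_UPLOAD_MBPS >= TRACT_BW_MAX_MBPS}")
```

Under these assumptions the server comes out at roughly 2.5x the brain estimate, and a single consumer connection exceeds the cited peak inter-region bandwidth, which is the sense in which distributed training over the open internet can't be cheaply ruled out.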
For the same reason, I don't think a call to 'Stop AI development permanently' works without the hypothetical enforcement agency literally going around the world confiscating all personal computers and shutting down the internet. Not gonna happen, so why even advocate for such a thing? It makes me think that Eliezer is advocating for this in order to have some intended effect on the world other than the stated one.
Thanks so much for writing this Seth! I so often get into conversations with people where I wished I had something like this post to refer them to. And now I do!
I really hope that your and Max's ideas soon get the wider recognition that I think they deserve!
- I believe it is mostly impossible except in corner/edge cases like everyone having the same preferences, because of this post:
https://www.lesswrong.com/posts/YYuB8w4nrfWmLzNob/thatcher-s-axiom
So personal intent alignment is basically all we get except in perhaps very small groups.
I want to disagree here. I think that a widely acceptable compromise on political rules, and the freedom to pursue happiness on one's own terms without violating others' rights, is quite achievable and desirable. I think that having a powerful AI establish/maintain the best possible government given the conflicting sets of values held by all parties is a great outcome. I agree that this isn't what is generally meant by 'values alignment', but I think it's a more useful thing to talk about.
I do agree that large groups of humans do seem to inevitably have contradictory values, such that no perfect resolution is possible. I just think that that is beside the point, and not what we should even be fantasizing about. I also agree that most people who seem excited about 'values alignment' mean 'alignment to their own values'. I've had numerous conversations with such people about the problem of people with harmful intent towards others (e.g. sadism, vengeance). I have yet to receive anything even remotely resembling a coherent response to this. Averaging values doesn't solve the problem; it falls into weird, bad edge cases. Instead, you need to focus on a widely (but not necessarily unanimously) acceptable political compromise.
Extremely excited about the idea of such research succeeding in the near future! But skeptical that it will succeed in time to be at all relevant. So my overall expected value for that direction is low.
Also, I think there's probably a very real risk that the bird has already flown the coop on this. If you can cheaply modify existing open-weight models to be 'intent-aligned' with terrorists, and to be competent at using scaffolding that you have built around 'biological design tools'... then the LLM isn't really a bottleneck anymore. The irreversible proliferation has occurred already. I'm not certain this is the case, but I'd give it about 75%.
So then you need to make sure that better biological design tools don't get released, and that more infohazardous virology papers don't get published, and that wetlab automation tech doesn't get better, and... the big one... that nobody releases an open-weight LLM so capable that it can successfully create tailor-made biological design tools. That's a harder thing to censor out of an LLM than getting it to directly not help with biological weapons! Creation of biological design tools touches on a lot more things, like its machine learning knowledge and coding skill. What exactly do you censor to make a model helpful at building purely-good tools but not at building dual-use tools?
Basically, I think it's a low-return area entirely. I think humanity's best bet is in generalized biodefenses, plus an international 'Council of Guardians' which use strong tool-AI to monitor the entire world and enforce a ban on:
a) self-replicating weapons (e.g. bioweapons, nanotech)
b) unauthorized recursive-self-improving AI
Of these threats only bioweapons are currently at large. The others are future threats.
Very cool. A few initial experiments worked well for me. It's interesting to see how much phrasing matters; I expect this is true for human forecasters as well. I tried predicting this Manifold market and got predictions for basically the same concept varying between 10% and 78%. https://manifold.markets/MaxHarms/will-ai-be-recursively-self-improvi