Nathan Helm-Burger

AI alignment researcher, ML engineer. Masters in Neuroscience.

I believe that cheap and broadly competent AGI is attainable and will be built soon. This leads me to have timelines of around 2024-2027. Here's an interview I gave recently about my current research agenda. I think the best path forward to alignment is through safe, contained testing on models designed from the ground up for alignability, trained on censored data (simulations with no mention of humans or computer technology). I think that current mainstream ML technology is close to a threshold of competence beyond which it will be capable of recursive self-improvement, and I think that this automated process will mine neuroscience for insights and quickly become far more effective and efficient. I think it would be quite bad for humanity if this happened in an uncontrolled, uncensored, un-sandboxed situation. So I am trying to warn the world about this possibility. See my prediction market here: 

I also think that current AI models pose misuse risks, which may continue to get worse as models get more capable, and that this could potentially result in catastrophic suffering if we fail to regulate this.

Well, I do agree that two steps are needed to get from the quote to the position that the quote supports omnicide.

Step 1. You have to also think that things smarter (better at science) and more complex than humans will become more powerful than humans, and somehow end up in control of the destiny of the universe.

Step 2. You have to think that humans losing control in this way will be effectively fatal to them, one way or another, not long after it happens.

So yeah, Schmidhuber might think that one or both of these two steps are invalid. I believe both are probably valid, and thus that Schmidhuber's position points pretty strongly at human extinction: if we want to avoid human extinction, we need to avoid going in the direction of AI being more complex than humans.

My personal take is that we should keep AI as limited and simple as possible, as long as possible. We should aim for increasing human complexity and ability. We should not merge with AI, we should simply use AI as a tool to expand humanity's abilities. Create digital humans. Then figure out how to let those digital humans grow and improve beyond the limits of biology while still maintaining their core humanity.

I think they might be loss-leading to compete against the counterfactual of status-quo bias, the not-using-a-model-at-all state of being. Once companies start to pay the cost to incorporate the LLMs into their workflows, I see no reason why OpenAI can't just increase the price. I think this might happen by simply releasing a new improved model at a much higher price. If everyone is already using and benefiting from the old model, and the new one is clearly better, the higher price will be easier to justify as a good investment for businesses.

In my past job experience there has just always been a small handful of tasks that get left to the Linux shell no matter what the rest of the codebase is written in. It's just a lot more convenient for certain things.

So, I agree with most of your points, Porby, and like your posts and theories overall... but I fear that the path towards a safe AI you outline is not robust to human temptation. I think that if it is easy and obvious how to make a goal-agnostic AI into a goal-having AI, and also it seems like doing so will grant tremendous power/wealth/status to anyone who does so, then it will get done. And I do think that these things are the case. I think that a carefully designed and protected secret research group with intense oversight could follow your plan, and that if they do, there is a decent chance that your plan works out well. I think that a mish-mash of companies and individual researchers acting with little effective oversight will almost certainly fall off the path, and that even having most people adhering to the path won't be enough to stop catastrophe once someone has defected.

I also think that misuse can lead more directly to catastrophe, through e.g. terrorists using a potent goal-agnostic AI to design novel weapons of mass destruction. So in a world with increasingly potent and unregulated AI, I don't see how to have much hope for humanity.

And I also don't see any easy way to do the necessary level of regulation and enforcement. That seems like a really hard problem. How do we prevent ALL of humanity from defecting when defection becomes cheap, easy-to-hide, and incredibly tempting?

And I'm not Daniel K., but I do want to respond to you here, Ryan. I think that the world I foresee is one in which there will be huge, tempting power gains that become obviously available to anyone willing to engage in something like RL-training their personal LLM agent (or another method of instilling additional goal-pursuing power into it). I expect that at some point in the future the tech will change and this opportunity will become widely available, and some early adopters will begin benefiting in highly visible ways. If that future comes to pass, then I expect the world to go 'off the rails' because these LLMs will have correlated-but-not-equivalent goals and will become increasingly powerful (because one of the goals they get set will be to create more powerful agents).

I don't think that's the only way things go badly in the future, but I think it's an important danger we need to be on guard against. Thus, I think that a crux between you and me is that I think there is a strong reason to believe that the 'if we did a bunch of RL' scenario is actually quite likely. I believe it is inherently an attractor state.

What if each advisor were granted a limited number of uses of a chess engine... Like 3 each per game. That could help the betrayers come up with a good betrayal when they thought the time was right. But the good advisor wouldn't know which move the bad one was using the chess engine on.

Just wanted to say that this was a key part of my daily work for years as an ML engineer / data scientist. Use cheap, fast, good-enough models for 99% of stuff. Use fancy, expensive, slow, accurate models for the disproportionately high-value tail.
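A minimal sketch of that routing pattern, assuming each task already comes with some estimate of how valuable it is to get right. The names `Item`, `route`, and `tail_fraction` are hypothetical, and the two model callables are placeholders for whatever cheap and expensive models you actually use; the 1% default just mirrors the "99% of stuff" split above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Item:
    name: str
    value: float  # hypothetical estimate of how much a correct answer on this item is worth


Model = Callable[[Item], str]


def route(items: List[Item], cheap: Model, expensive: Model,
          tail_fraction: float = 0.01) -> Dict[str, str]:
    """Run the expensive model only on the highest-value `tail_fraction` of items.

    Everything else goes to the cheap, fast, good-enough model.
    """
    if not items:
        return {}
    # Rank items so the disproportionately valuable tail comes first.
    ranked = sorted(items, key=lambda it: it.value, reverse=True)
    # Always send at least one item to the expensive model.
    n_tail = max(1, round(len(ranked) * tail_fraction))
    results: Dict[str, str] = {}
    for rank, item in enumerate(ranked):
        model = expensive if rank < n_tail else cheap
        results[item.name] = model(item)
    return results


if __name__ == "__main__":
    # Hypothetical stand-ins for real models: each just reports which tier handled the item.
    cheap_model = lambda item: "cheap"
    expensive_model = lambda item: "expensive"
    items = [Item(f"task{i}", float(i)) for i in range(100)]
    out = route(items, cheap_model, expensive_model)
    print(out["task99"], sum(v == "expensive" for v in out.values()))
```

Ranking by estimated value is only one way to pick the tail; routing on the cheap model's own confidence is a common alternative when value estimates aren't available.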

Love this. I've been thinking about related things in AI bio safety evals. Could we have an LLM walk a layperson through a complicated-but-safe wetlab protocol which is an approximate difficulty match for a dangerous protocol? How good of evidence would this be compared to doing the actual dangerous protocol? Maybe at least you could cut eval costs by having a large subject group do the safe protocol, and only a small carefully screened and supervised group go through the dangerous protocol.

To which I say, the only valid red teaming of an open-source model is to red team it and any possible (not too expensive) modification thereof, since that is what you are releasing.


Yes! Thank you!