Nathan Helm-Burger — LessWrong

AI alignment researcher, ML engineer. Masters in Neuroscience.

I believe that cheap and broadly competent AGI is attainable and will be built soon. This leads me to have timelines of around 2024-2027. Here's an interview I gave recently about my current research agenda. I think the best path forward to alignment is through safe, contained testing on models designed from the ground up for alignability trained on censored data (simulations with no mention of humans or computer technology). I think that current ML mainstream technology is close to a threshold of competence beyond which it will be capable of recursive self-improvement, and I think that this automated process will mine neuroscience for insights, and quickly become far more effective and efficient. I think it would be quite bad for humanity if this happened in an uncontrolled, uncensored, un-sandboxed situation. So I am trying to warn the world about this possibility.

See my prediction markets here:

https://manifold.markets/NathanHelmBurger/will-gpt5-be-capable-of-recursive-s?r=TmF0aGFuSGVsbUJ1cmdlcg

I also think that current AI models pose misuse risks, which may continue to get worse as models get more capable, and that this could potentially result in catastrophic suffering if we fail to regulate this.

I now work for SecureBio on AI-Evals.

relevant quotes:

"There is a powerful effect to making a goal into someone’s full-time job: it becomes their identity. Safety engineering became its own subdiscipline, and these engineers saw it as their professional duty to reduce injury rates. They bristled at the suggestion that accidents were largely unavoidable, coming to suspect the opposite: that almost all accidents were avoidable, given the right tools, environment, and training." https://www.lesswrong.com/posts/DQKgYhEYP86PLW7tZ/how-factories-were-made-safe

"The prospect for the human race is sombre beyond all precedent. Mankind are faced with a clear-cut alternative: either we shall all perish, or we shall have to acquire some slight degree of common sense. A great deal of new political thinking will be necessary if utter disaster is to be averted." - Bertrand Russel, The Bomb and Civilization 1945.08.18

"For progress, there is no cure. Any attempt to find automatically safe channels for the present explosive variety of progress must lead to frustration. The only safety possible is relative, and it lies in an intelligent exercise of day-to-day judgment." - John von Neumann

"I believe that the creation of greater than human intelligence will occur during the next thirty years. (Charles Platt has pointed out the AI enthusiasts have been making claims like this for the last thirty years. Just so I'm not guilty of a relative-time ambiguity, let me more specific: I'll be surprised if this event occurs before 2005 or after 2030.)" - Vernor Vinge, Singularity

Note that if the "evolutionary programming" axis is real, and a breakthrough does occur there, it is possible that it might mean reaearchers could possibly "pre-program" a model to be good at rapid learning (aka increase its Chollet-style intelligence).

I suspect that something like this process of evolution predisposing a brain to be better at abstract learning is a key factor in the differentiation of humans from previous primates.

In my mental model, I expect we are at least one fundamental breakthrough away from AGI (in keeping with François Chollet's points about intelligence being about rapid efficient learning rather than just applying broad knowledge).

It seems difficult to me to predict how far we are from a breakthrough that gives us a significant improvement on this metric.

So, to me, it seems really important to ask how much a given level of LLM coding assistance is enabling researchers to iterate more easily over a broader range of experiments. I don't have sufficient insight into the research patterns of AI companies to have a good sense of novel experiments per researcher per month (ERM). My expectation is that an increase in this metric ERM would give us some sense of how to update from the base rate of major conceptual breakthroughs (estimated in your article as 1 per 10 years, at 2010 - 2020 levels of researcher hours per year).

To figure out current odds of breakthrough per year, I'd want to know how many more researchers worldwide are working on ML than in the 2010-2020 period. I'd want to discount this by assuming many of the marginal additions are not as inspired and experienced as the earlier researchers, and are retreading known ground more, and are running less well designed experiments.

Then I'd want to make the upward adjustment of expected LLM assistance to ERM. Also, perhaps to being a sufficiently helpful research design assistant that it somewhat offsets the decrease to experiment quality caused by addition of many marginal researchers (by bumping up the low end, perhaps causing some that were below threshold of relevance to become potentially relevant).

If, over the next 5 years, we see a gradual average improvement to the "ML breakthrough rate", we should expect the next breakthrough to arrive in more like 4-6 years rather than 8-12.

If a big enough breakthrough in "learning and extrapolation rate from limited data" (aka Chollet-style intelligence) does get found, I think that puts us on a fundamentally different trend line.

I also think there's a third kind of "brain capability" axis that we might see a breakthrough along. I think it's probably less impactful than the Chollet-style. This is what I'd call the "evolutionary programming"-style. In other words, the fact that evolution has managed to shape the genetics of a baby horse such that its fetal brain development was sufficient for it to walk soon after birth with nearly no learning. This seems different from either "inefficiently learned knowledge and heuristics" (aka current style) or from Chollet-style. So this third axis would be something like researchers being able to explicitly engineer a target skill into a model such that the model needed only a tiny amount of training on this targeted task to be competent.

Tangent: have you seen Black Ops Chess? It's a blend of Chess and Stratego. https://blackopschess.com/game

I loved Stratego as a kid, and I find this very appealing. The opportunity for faking out your opponent by playing strong pieces as if they were weak ones, followed by a sudden betrayal of expectation....

Makes me curious to see a game between humans where non-sensible moves are defined in some objective way and forbidden by guardrail AI. Like, not even considered a legal move by the computer UI.

Would this extend the games of humans to around 64 moves on average? What would the experience of playing such a game be for low ELO humans? Confusion about why certain moves were forbidden, probably.

I've been suspecting that Anthropic is doing some reinforcement of legibility of CoT, because their CoTs seemed unusually normal and legible. Gemini too, back when it had visible CoT instead of summarized.

Also possible that Anthropic is actually giving edited CoTs rather than raw ones.

It's also quite possible that some will be sadistic. Once powerful AI is in the picture, it also unlocks cheap, convenient, easy-to-apply, extremely potent brain-computer-interfaces that can mentally enslave people. And that sort of thing snowballs, because the more loyal-unto-death servants you have, the easier it is to kidnap and convert new people. Along with other tech potentially unlocking things like immortality, and you have a recipe for things going quite badly if some sadist gets enough power to ramp into even more... I mean, plus the weird race dynamics of the AI itself. Will the few controllers of AI cooperate peacefully, or occasionally get into arguments and get jealous of each other's power? Might they one day get into a fight or significant competition that causes them to aim for even stronger AI servants to best their opponent, and thus leads to them losing control? Or a wide variety of other ways humans may fail to remain consistently sensible over a long period. It seems to me pretty likely that even 1 out of 1000 of the AI Lords losing control could easily lead to their uncontrolled AI self-improving enough to escape and conquer all the AI Lords. Just doesn't seem like a sane and stable configuration for humanity to aim for, insofar as we are able to aim for anything. The attractor basin around 'genuinely nice value-aligned AI' seems a lot more promising to me than 'obedient AI controlled by centralized human power'. MIRI & co make arguments about a 'near miss' on value alignment being catastrophic, but after years of thought and debate on the subject, I've come around to disagreeing with this point. A really smart, really powerful AI that is trying its best to help humanity and satisfying humanity's extrapolated values as best it can seems likely to... approach the problem intelligently. Like, recognize the need for epistemic humility and for continued human progress....

Not a donation, but I wanted to give a shout-out to Palisade Research for employing the competent Dave Kasten @davekasten to do AIS Advocacy. I enjoyed getting to meet Dave in person for the first time just recently, although I've been following him on twitter for longer.

I kinda think people spend too much effort on trying to track CO2, and not enough on improving amelioration of it. My take is that everyone in a small poorly ventilated apartment should just assume they don't get enough fresh air and spend the money they would have spent on a CO2 monitor on a small in-window heat-recovery ventilator with fan and air filter. Of course, this would require such a product being able to be purchased... and society itself is falling short on that front. Here's the closest that amazon comes to offering this: https://www.amazon.com/TOSOT-Ventilation-Conditioner-Residential-Applications/dp/B0DZCN1MQT

Which should ideally be crossed with this: https://www.amazon.com/Weiworld-Ventilation-Provides-Circulation-Basements/dp/B0DYT5R9YC

And made easy to put in a window like this: https://www.amazon.com/BLACK-DECKER-Twin-Window-Fan/dp/B0CNHB7WJ4

Such a thing is totally possible to be produced at acceptable quality for a price point of less than $200 USD.

I've met some of the team working on this project. They seem competent and motivated and optimistic they will succeed: https://e11.bio/blog/roadmap

I occasionally do have an experience where the plane is on the ground, and the cabin gets flooded with intense unpleasant airplane fuel smell. I hate it. Scary to learn it causes damage.

LESSWRONG
LW

LESSWRONG
LW

Posts

Wikitag Contributions

Comments