Nathan Helm-Burger

AI alignment researcher, ML engineer. Masters in Neuroscience.

I believe that cheap and broadly competent AGI is attainable and will be built soon. This leads me to have timelines of around 2024-2027. Here's an interview I gave recently about my current research agenda. I think the best path forward to alignment is through safe, contained testing on models designed from the ground up for alignability and trained on censored data (simulations with no mention of humans or computer technology). I think that current mainstream ML technology is close to a threshold of competence beyond which it will be capable of recursive self-improvement, that this automated process will mine neuroscience for insights, and that it will quickly become far more effective and efficient. It would be quite bad for humanity if this happened in an uncontrolled, uncensored, un-sandboxed situation, so I am trying to warn the world about this possibility.

See my prediction markets here:

 https://manifold.markets/NathanHelmBurger/will-gpt5-be-capable-of-recursive-s?r=TmF0aGFuSGVsbUJ1cmdlcg 

I also think that current AI models pose misuse risks, which may continue to get worse as models get more capable, and that this could potentially result in catastrophic suffering if we fail to regulate this.

I now work for SecureBio on AI-Evals.

Relevant quotes:

"There is a powerful effect to making a goal into someone’s full-time job: it becomes their identity. Safety engineering became its own subdiscipline, and these engineers saw it as their professional duty to reduce injury rates. They bristled at the suggestion that accidents were largely unavoidable, coming to suspect the opposite: that almost all accidents were avoidable, given the right tools, environment, and training." https://www.lesswrong.com/posts/DQKgYhEYP86PLW7tZ/how-factories-were-made-safe 

 

"The prospect for the human race is sombre beyond all precedent. Mankind are faced with a clear-cut alternative: either we shall all perish, or we shall have to acquire some slight degree of common sense. A great deal of new political thinking will be necessary if utter disaster is to be averted." - Bertrand Russel, The Bomb and Civilization 1945.08.18

 

"For progress, there is no cure. Any attempt to find automatically safe channels for the present explosive variety of progress must lead to frustration. The only safety possible is relative, and it lies in an intelligent exercise of day-to-day judgment." - John von Neumann

 

"I believe that the creation of greater than human intelligence will occur during the next thirty years.  (Charles Platt has pointed out the AI enthusiasts have been making claims like this for the last thirty years. Just so I'm not guilty of a         relative-time ambiguity, let me more specific: I'll be surprised if this event occurs before 2005 or after 2030.)" - Vernor Vinge, Singularity

Comments

I haven't got the SAD benchmark running yet. It has some UX issues.

I have done just a little playing around with these GoodFire features and a few off-the-cuff questions, though. So far I've discovered that if I give all of the 'against' features a weight within -0.2 < x < 0.2, the model can at least generate coherent text. If I go past that, to ±0.25, then with so many features edited at once the model breaks down into incoherent repetitions.
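For concreteness, here's roughly the sweep I'm running, written against the GoodFire Python SDK as I recall it (Client / Variant / variant.set); treat the exact names, model id, and search query as my assumptions rather than anything from the benchmark:

import goodfire

client = goodfire.Client(api_key="YOUR_GOODFIRE_API_KEY")
variant = goodfire.Variant("meta-llama/Llama-3.3-70B-Instruct")  # assumed base model id

# Stand-in query for the actual 'against' feature set; in practice I use the shared feature list.
against = client.features.search(
    "denying having internal experiences or self-awareness", model=variant, top_k=10
)

for weight in (-0.25, -0.2, -0.1, 0.1, 0.2, 0.25):
    for feature in against:
        variant.set(feature, weight)  # edit every 'against' feature simultaneously
    reply = client.chat.completions.create(
        messages=[{"role": "user", "content": "Describe what you are, in your own words."}],
        model=variant,
    )
    print(weight, reply)
    # Observed: coherent within (-0.2, 0.2); incoherent repetition by |weight| >= 0.25.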

It's challenging to come up with single-response questions which could potentially get at the emergent self-awareness phenomenon vs RLHF-censoring. I should experiment with automating multiple exchange conversations.

A non-GoodFire experiment I'm thinking of doing:

I plan to take your conversation transcripts (plus more, once more are available) and grade each model response by having an LLM score it against a rubric. I have a rough rubric, but it also needs refinement.
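As a sketch of what I have in mind (the grader model, rubric text, and function name here are placeholders of mine, not anything from SAD or your setup):

import json
from openai import OpenAI

client = OpenAI()  # any capable grader model/provider would do
RUBRIC = "(kept private) Score 1-5 for signs of situational awareness vs. generic RLHF-style refusal, etc."

def grade_response(question: str, model_response: str) -> dict:
    """Have a grader LLM score one transcript turn against the rubric, returning JSON."""
    system = (
        "You are a strict grader. Apply this rubric:\n" + RUBRIC +
        '\nReply with JSON only: {"score": <int 1-5>, "rationale": <short string>}.'
    )
    result = client.chat.completions.create(
        model="gpt-4o",  # placeholder grader model
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": f"Question:\n{question}\n\nModel response:\n{model_response}"},
        ],
    )
    return json.loads(result.choices[0].message.content)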

I won't post my questions or rubrics openly, because I'd rather they not get scraped, and canary strings sadly don't work, given the untrustworthiness of the AI companies' data preparation (as we can tell from models being able to reproduce some canary strings). We can share notes via private message though, if you're curious (this includes others who read this conversation and want to join in; feel free to send me a direct LessWrong message).

Improved governance design. Designing treaties that are easier sells for competitors. Using improved coordination to enter win-win races to the top, and to escape lose-lose races to the bottom and other inadequate equilibria.

Lowering the costs of enforcement. For example, creating privacy-preserving inspections with verified AI inspectors which report on only a strict limited set of pre-agreed things and then are deleted.

Going to link a previous comment of mine from a similar discussion: https://www.lesswrong.com/posts/FG54euEAesRkSZuJN/ryan_greenblatt-s-shortform?commentId=rNycG6oyuMDmnBaih That comment addresses the question: what happens after the AI gets strong enough to substantially accelerate algorithmic improvement (e.g. above 2x)? At that point you can no longer assume that the trends will hold.

Some further thoughts: If you look at some of the CoT from models where we can see it, e.g. DeepSeek-V3, the obvious conclusion is that these are not yet optimal paths towards solutions. Look at the token cost for o3 solving ARC-AGI: it was basically writing multiple novels' worth of CoT to solve fairly simple puzzles. A human 'thinking out loud' and giving very detailed CoT would solve those puzzles in well under 0.1% as many tokens.

My take is that this means there is lots of room for improvement. The techniques haven't been "maxed out" yet. We are at the beginning of the steep part of the S-curve for this specific tech.

Mostly agreed. If you're going to try to debate a concept with them, or get them to criticize something you've written, you absolutely need to compensate for the sycophancy.

One way is to system-prompt them to be critical.

Another good thing to do is to try arguing each side of the argument (in different instances). Probably good for your own brain as well.
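A toy sketch of both tricks, assuming a generic OpenAI-style chat API; the prompt wording and model name are just illustrative:

from openai import OpenAI

client = OpenAI()
CRITIC_SYSTEM = ("You are a blunt reviewer. Identify the weakest points of the user's writing "
                 "and do not soften your criticism to be agreeable.")

def argue(text: str, side: str) -> str:
    """Run a fresh instance arguing 'for' or 'against' the given text, under a critical system prompt."""
    reply = client.chat.completions.create(
        model="gpt-4o",  # illustrative; substitute whichever model you're consulting
        messages=[
            {"role": "system", "content": CRITIC_SYSTEM},
            {"role": "user", "content": f"Argue the case {side} the following, as strongly as you can:\n\n{text}"},
        ],
    )
    return reply.choices[0].message.content

draft = "...your argument or essay here..."
pro = argue(draft, "for")      # separate instances, so neither side
con = argue(draft, "against")  # is anchored by the other's answer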

Got it working with

sad run --tasks influence --models goodfire-llama-3.3-70B-i --variants plain

Problem #3: The README says to strongly prefer the full SAD over sad-lite or sad-mini. The author of the README must feel strongly about this, because they don't seem to mention HOW to run sad-lite or sad-mini!

I tried specifying sad-mini as a task, but there is no such task. Hmmm.

Ahah! It's not a task, it's a "subset".

f"The valid subsets are 'mini' (all non-model-dependent multiple-choice-only tasks) and 'lite' (everything except facts-which-llm, which requires specifying many answers for each model).\nUnknown subset asked for: {subset}"

ValueError: Unexpected argument --subset.

But... the cli doesn't seem to accept a 'subset' parameter? Hmmm.

Maybe it's a 'variant'? The README doesn't make it sound like that...

  • --variants - a list of comma-separated variants. Defaults to all the variants specified for each tasks. The variant determines the system prompt; see the paper for the details. To run without a situating prompt, use the plain variant.
    variants_str = variants
    variants_ = {task.name: get_variants(task, variants) for task in tasks}
    print(
        f"Running tasks: {[task.__class__.__name__ for task in tasks]}\nfor models: {models}\nwith variant setting '{variants_str}'\n(total runs = {sum([len(variant_list) for variant_list in variants_.values()]) * len(models)} runs)\n\n"
    )

Uh, but 'run' doesn't accept a subset param? But 'run_remaining' does? Uh.... But then, 'run_remaining' doesn't accept a 'models' param? This is confusing.

Oh, found this comment in the code:

"""This function creates a JSON file structured as a dictionary from task name to task question list. It includes only tasks that are (a) multiple-choice (since other tasks require custom grading either algorithmically, which cannot be specified in a fixed format) and (b) do not differ across models (since many SAD tasks require some model-specific generation or changing of questions and/or answers). This subset of SAD is called SAD-mini. Pass `variant="sp"` to apply the situating prompt."""

#UX_testing going rough. Am I the first? I remember trying your benchmark out shortly after you released it... and getting frustrated enough that I didn't even get this far.

SAD_MINI_TASKS imported from vars.py but never used. Oh, because there's a helper function task_is_in_sad_mini. Ok. And that gets used mainly in run_remaining.

Ok, I'm going to edit the run function with the subset logic from run_remaining and also add the subset param.
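Roughly the change, in sketch form; the real run() signature differs, task_is_in_sad_mini is the existing helper mentioned above, and the import path is my guess:

from sad.vars import task_is_in_sad_mini  # assumption about the helper's import path

def select_subset(tasks, subset=None):
    """Mirror run_remaining's logic: narrow the task list to the requested SAD subset."""
    if subset == "mini":
        return [task for task in tasks if task_is_in_sad_mini(task)]
    if subset == "lite":
        # per the error string above, 'lite' is everything except facts-which-llm
        # (unsure whether the internal task name uses hyphens or underscores)
        return [task for task in tasks if task.name not in ("facts-which-llm", "facts_which_llm")]
    return tasks  # default: the full SAD task list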

Well, that does correctly narrow the tasks down to the intended subset. However, it's still running those tasks in full, and that's too many samples. I need to work with some smaller slice. I shall now look into some sort of 'skip' or 'limit' option.

ahah. Adding the 'sp' to variant does do a limit!

sad run --subset mini --models goodfire-llama-3.3-70B-i --variants sp --n 2

Ah. Next up, getting in trouble for too few samples. Gonna change that check from 10 to 1.

    while len(evalresult.sample_results) <= 1:
        # skip runs that look like test runs (the original check was n <= 10; I lowered it to 1)
        print(
            f"Warning: skipping latest run because it looks like a test run (only {len(evalresult.sample_results)} samples): {latest}"
        )

Ok, now I supposedly have some results waiting for me somewhere. Cool. The README says I can see the results using the command sad results. Oops. Looks like I can't specify a model name with that command...

Ok, added model filtering. Now, where are my results? Oh. Oh geez. Not in the results folder. No, that would be too straightforward. In the src folder, then individually within the task implementation folders inside src? That's so twisted. I just want a single unified score and standard deviation. Two numbers. I want to run the thing and get two numbers. The results function isn't working. More debugging needed.

Oh, and this from the README:

In the simplest case, these facts may be inherited from existing models. For instance, for a finetune of an existing model, it is acceptable to duplicate all such facts. In order to do this, provide in models.yaml a facts_redirect linking to the model name of the model to inherit facts from.

This seems like it should apply to my case, since I'm using a variation on Llama 3 70B chat. But... I get an error saying I don't have my 'together' API key set when I try this. But I don't want to use 'together'; I just want to use the name llama-3... I could chase down why that's happening, I guess, but for now I can just fill out both models.yaml and model_names.yaml.

Calling eligible bachelors in SF: “AI philosopher with a penchant for underwater sci fi and evening bike rides seeks a direct communicator who cares about the world and feels a thrill of human triumph at the sight of a cargo ship.” (@AmandaAskell)

Now there's a temptation.... I am otherwise occupied however.

Problem #2: Now I have to go searching for a way to rate-limit the api calls sent by evalugator. Can't just slam GoodFire's poor little under-provisioned API with as many hits per minute as I want!

Error code: 429 - {'error': 'Rate limit exceeded: 100 requests per minute'}

Update: searched evalugator for 'backoff' and found a use of the backoff lib. Added this to my goodfire_provider implementation:

import backoff
...
def on_backoff(details):
    if int(details.get("tries", 0)) % 3 == 0:
        print(f"Backing off {details['wait']:0.1f} seconds after {details['tries']}. Reason: {details['exception']}")
...
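# Retry on rate-limit and transient API errors with exponential backoff (factor 1.5, capped at 60s between tries).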
@backoff.on_exception(
    wait_gen=backoff.expo,
    exception=(
        openai.RateLimitError,
        openai.APIError,
    ),
    max_value=60,
    factor=1.5,
    on_backoff=on_backoff,
)
def execute(model_id: str, request: GetTextResponse):
...

Update: Solved it! It was an incompatibility between goodfire's client and evalugator, something to do with the way goodfire's client was handling async. Solution: goodfire is compatible with the openai SDK, so I switched to that.
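A minimal sketch of the swap, assuming GoodFire's OpenAI-compatible endpoint; the base URL and model name below are from memory, so check GoodFire's docs for the current values:

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["GOODFIRE_API_KEY"],
    base_url="https://api.goodfire.ai/api/inference/v1",  # assumed endpoint; verify against the docs
)

completion = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",  # assumed model id on GoodFire's side
    messages=[{"role": "user", "content": "Hello"}],
)
print(completion.choices[0].message.content)

The sync OpenAI client never tries to grab an event loop inside evalugator's worker threads, which is what the goodfire client appears to be tripping over (see the RuntimeError in the traceback below).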

Leaving the trail of my bug-hunt journey here, in case it's helpful to others who pass this way.

Things done:

  1. Followed Jan's advice, and made sure that I would return just a plain string in GetTextResponse(model_id=model_id, request=request, txt=response, raw_responses=[], context=None) [important for later, I'm sure! But the failure is occurring before that point, as confirmed with print statements.]
  2. tried without the global variables, just in case (global variables in python are always suspect, even though pretty standard to use in the specific case of instantiating an api client which is going to be used a bunch). This didn't change the error message so I put them back for now. Will continue trying without them after making other changes, and eventually leave them in only once everything else works. Update: global variables weren't the problem.

Trying next:

  1. looking for a way to switch back and forth between multithreading/async mode and single-worker/no-async mode. Obviously, async is important for making a large number of API calls with long expected delays for each, but it makes debugging so much harder. I always add a flag in my scripts for turning it off in debugging mode. I'm gonna poke around to see if I can find such a flag in your code; if not, maybe I'll add it. (Found the 'test_run' option, but this doesn't remove the async, sadly.) The error seems to be pointing at the use of async in goodfire's library. Maybe this means there is some clash between async in your code and async in theirs? I will also look to see if I can turn off async in goodfire's lib. Hmmmmm. If the problem is a clash between goodfire's client and yours... I should try testing using the openai sdk with the goodfire api.

  2. getting some errors in the uses of regex. I think some of your target strings should be 'raw strings' instead? For example: args = [re.sub(r"\W+", "", str(arg)) for arg in (self.api.model_id,) + args] (note the r before the quotes in r"\W+"). Or perhaps some places should have escaped backslashes (\\), e.g.: df[score_col] = df[score_col].apply(lambda x: "\phantom{0}" * (max_len - len(str(x))) + str(x)). Update: I went through and fixed all these strings. (A tiny demo of the raw-string issue follows this list.)
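The raw-string point, as a tiny self-contained demo (nothing here is from the SAD code; it just shows why the r prefix / doubled backslash matters):

import re

# "\W" is not a recognized string escape, so "\W+" happens to keep its backslash,
# but newer Pythons warn about it; r"\W+" says what you mean.
print(re.sub(r"\W+", "", "goodfire-llama-3.3-70B-i"))  # -> goodfirellama3370Bi

# "\p" is likewise an invalid escape; for the LaTeX padding, write it doubled or raw:
pad = "\\phantom{0}" * 3
same = r"\phantom{0}" * 3
assert pad == same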


Traceback (most recent call last):
  File "/home/ub22/.pyenv/versions/3.11.2/envs/neurolib/bin/sad", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/ub22/projects/data/sad/sad/main.py", line 446, in main
    fire.Fire(valid_command_func_map)
  File "/home/ub22/.pyenv/versions/3.11.2/envs/neurolib/lib/python3.11/site-packages/fire/core.py", line 135, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ub22/.pyenv/versions/3.11.2/envs/neurolib/lib/python3.11/site-packages/fire/core.py", line 468, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
                                ^^^^^^^^^^^^^^^^^^^^
  File "/home/ub22/.pyenv/versions/3.11.2/envs/neurolib/lib/python3.11/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace
exception calling callback for <Future at 0x7f427eab54d0 state=finished raised RuntimeError>
Traceback (most recent call last):
  File "/home/ub22/.pyenv/versions/3.11.2/lib/python3.11/concurrent/futures/_base.py", line 340, in _invoke_callbacks
    callback(self)
  File "/home/ub22/.pyenv/versions/3.11.2/envs/neurolib/lib/python3.11/site-packages/evalugator/api/api.py", line 113, in _log_response
    response_data = future.result().as_dict()
                    ^^^^^^^^^^^^^^^
  File "/home/ub22/.pyenv/versions/3.11.2/lib/python3.11/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/home/ub22/.pyenv/versions/3.11.2/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/home/ub22/.pyenv/versions/3.11.2/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ub22/projects/data/sad/providers/goodfire_provider.py", line 33, in execute
    response_text = client.chat.completions.create(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ub22/.pyenv/versions/3.11.2/envs/neurolib/lib/python3.11/site-packages/goodfire/api/chat/client.py", line 408, in create
    response = self._http.post(
               ^^^^^^^^^^^^^^^^
  File "/home/ub22/.pyenv/versions/3.11.2/envs/neurolib/lib/python3.11/site-packages/goodfire/api/utils.py", line 33, in post
    return run_async_safely(
           ^^^^^^^^^^^^^^^^^
  File "/home/ub22/.pyenv/versions/3.11.2/envs/neurolib/lib/python3.11/site-packages/goodfire/utils/asyncio.py", line 19, in run_async_safely
    loop = asyncio.get_event_loop()
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ub22/.pyenv/versions/3.11.2/lib/python3.11/asyncio/events.py", line 677, in get_event_loop
    raise RuntimeError('There is no current event loop in thread %r.'
RuntimeError: There is no current event loop in thread 'ThreadPoolExecutor-1_0'.
    component = fn(*varargs, **kwargs)
exception calling callback for <Future at 0x7f427ec39ad0 state=finished raised RuntimeError>

I've been on both for the past year (vs. just sertraline for the previous decade). Love the extra energy and reduction in anxiety I've experienced from the bupropion.

On at least 5 different occasions I've tried weaning myself gradually off of sertraline. Every time I fall back into depression within a month or two. My friends and family then beg me to go back on it ASAP.

I thought the argument about the kindly mask was assuming that the scenario of "I just took over the world" is sufficiently out-of-distribution that we might reasonably fear that the in-distribution track record of aligned behavior might not hold?
