UPDATE
When writing this post, I think I was biased by the specific work I'd been doing for the 6-12 months prior, and generalised from it too far.
I think I stand by the claim that tooling alone could speed up the research I was working on by 3-5x. But even on the same agenda now, the work is far less amenable to major speedups from tooling. Now, work on the agenda is far less "implement several minor algorithmic variants, run hyperparameter sweeps that take <1 hour, evaluate a set of somewhat concrete metrics, repeat", and more "think deeply about which variants make the most sense, run >3-hour jobs/sweeps, evaluate murkier metrics".
The main change was switching our experiments from toy models to real models, which massively loosened the iteration loop due to increased training time and less clear evaluations.
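For concreteness, the old toy-model regime looked roughly like the sketch below. Every variant name, hyperparameter range, and metric is a hypothetical stand-in (with a dummy training function), not our actual setup; the point is just how tight the loop was.

```python
# Sketch of the old toy-model loop: a small grid of minor variants x
# hyperparameters, sub-hour runs, and concrete metrics. All names, ranges,
# and the dummy training function are hypothetical stand-ins.
import itertools
import random

VARIANTS = ["baseline", "variant_a", "variant_b"]   # minor algorithmic variants
LEARNING_RATES = [1e-3, 3e-4, 1e-4]
SPARSITY_COEFFS = [0.01, 0.1]

def train_and_evaluate(variant, lr, sparsity_coeff):
    """Stand-in for a <1 hour toy-model run that returns concrete metrics."""
    random.seed(hash((variant, lr, sparsity_coeff)) % 2**32)
    return {"recon_loss": random.random(),
            "components_recovered": random.randint(0, 20)}

results = []
for variant, lr, coeff in itertools.product(VARIANTS, LEARNING_RATES, SPARSITY_COEFFS):
    results.append(((variant, lr, coeff), train_and_evaluate(variant, lr, coeff)))

# Rank configs on a concrete metric, inspect the best few, then go implement
# the next variant and repeat.
results.sort(key=lambda r: r[1]["recon_loss"])
for config, metrics in results[:3]:
    print(config, metrics)
```

With real models the loop has the same shape, but each `train_and_evaluate` call is a multi-hour job and the metrics are much harder to score automatically.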
For the current state, I think 6-12 months of tooling progress might give a 1.5-2x speedup over the baseline of 1 year ago.
I still believe that I underestimated safety research speedups overall, but not by as much as I thought 2 months ago.
My hot take is that I agree human researchers spend a ridiculous amount of time doing stupid stuff (see my shortform on this), but I also don't think it's very easy to automate the stupid stuff.
I've optimized my research setup to get quite tight feedback loops. If I had more slack I could probably make things even better, but it would look more like developing better infrastructure and hyperparameter optimization (hpopt) techniques myself than handing work off to agents.
I disagree that, in theory, you have to use grid search and can't use anything more clever. I currently use grid searches too, for simplicity, and it's definitely nontrivial to get the cleverer approach to tell you about interactions, but it doesn't seem fundamentally impossible.
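As a rough illustration of the kind of "cleverer than grid search" thing I mean: run a random (or Bayesian) search instead of a full grid, then fit a cheap surrogate with pairwise interaction terms to the results to flag hyperparameters that can't be tuned independently. The sketch below is self-contained; the objective, hyperparameter names, and ranges are all made up for illustration, not my actual setup.

```python
# Random search plus a cheap interaction read-out, as an alternative to a full
# grid. The objective, hyperparameter names, and ranges are hypothetical.
import itertools
import numpy as np

rng = np.random.default_rng(0)

def run_experiment(lr, sparsity_coeff, loss_variant):
    # Placeholder for a real training-and-evaluation job, with a deliberate
    # lr x sparsity interaction baked in so the read-out has something to find.
    base = (np.log10(lr) + 3) ** 2 + (sparsity_coeff - 0.3) ** 2
    interaction = 2.0 * np.log10(lr) * sparsity_coeff
    offset = {"l2": 0.0, "l1": 0.2, "topk": -0.1}[loss_variant]
    return base + interaction + offset + rng.normal(scale=0.05)

# 1) Sample configs from the search space instead of enumerating a grid.
n_trials = 200
lrs = 10 ** rng.uniform(-4, -1, n_trials)
coeffs = rng.uniform(0.0, 1.0, n_trials)
variants = rng.choice(["l2", "l1", "topk"], n_trials)
losses = np.array([run_experiment(l, c, v) for l, c, v in zip(lrs, coeffs, variants)])

# 2) Fit a linear surrogate with pairwise interaction terms; large interaction
#    coefficients flag hyperparameters that can't be tuned independently.
feats = {
    "log_lr": np.log10(lrs),
    "coeff": coeffs,
    "is_l1": (variants == "l1").astype(float),
    "is_topk": (variants == "topk").astype(float),
}
names, cols = list(feats), [feats[n] for n in feats]
for (na, a), (nb, b) in itertools.combinations(feats.items(), 2):
    names.append(f"{na} x {nb}")
    cols.append(a * b)
X = np.column_stack([np.ones(n_trials)] + cols)
beta, *_ = np.linalg.lstsq(X, losses, rcond=None)
for name, b in sorted(zip(["bias"] + names, beta), key=lambda t: -abs(t[1])):
    print(f"{name:>16s} {b:+.3f}")
```

A real version would swap the dummy objective for actual sweep results (or an off-the-shelf Bayesian optimisation library), but the basic point stands: you can recover some interaction information without exhaustively gridding.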
(See my update above, written 2 months later.)
A year or so ago, I thought that 10x speedups in safety research would require AIs so capable that takeover risks would be very high. 2-3x gains seemed plausible, 10x seemed unlikely. I no longer think this.
What changed? I continue to think that AI x-risk research is predominantly bottlenecked on good ideas, but I suffered from a failure of imagination about the speedups that could be gained from AIs that are unable to produce great high-level ideas. I've realised that humans trying to get empirical feedback on their ideas waste a huge number of thought cycles on tasks that could be done by merely moderately capable AI agents.
I don’t expect this to be an unpopular position, but I thought it might be useful to share some details of how I see this speedup happening in my current research.
If we stopped frontier AI progress today but had 6-12 months of tooling and scaffolding progress, I think the research direction I’ve been working on, which contains a mix of conceptual and empirical interpretability work (parameter decomposition), could speed up by 3-5x from the base rate of 1 year ago. The most recent month or two might have been a 1.5-2x speedup.
Where do the speedups come from? The majority of the time my team has spent on the parameter decomposition agenda has gone as follows:
You might think “surely humans without AI can minimise the number of iterations here by doing very wide sweeps initially?”. Unfortunately, the space of possible hyperparameter sweeps is extremely large when you’re testing 20+ different possible loss functions and training process miscellanea. This is a problem because:
How I think we could get a 3-5x speedup with better tooling/scaffolding only:
Going from 3-5x to 10+x with still-safe AI just comes from models that are capable enough to do more iterations of 1-4 on their own, and are able to provide better ideas and analysis to the humans. I don’t know what level of capabilities is required to achieve this, but I don’t think it’s too far from the current level. Provided these slightly more capable models are not integrated absolutely everywhere in society with minimal controls, I don’t expect them to have large x-risk.
If they tried to do more than a few iterations, I expect today's models would get too far off-track.