Bogdan Ionut Cirstea

Automated / strongly-augmented safety research.

I suspect current approaches significantly, or even drastically, under-elicit automated ML research capabilities.

I'd guess the average cost of producing a decent ML paper is at least $10k (in the West, at least), and probably closer to the low hundreds of thousands of dollars.

In contrast, Sakana's AI Scientist cost on average $15/paper and $0.50/review. PaperQA2, which claims superhuman performance on some scientific Q&A and literature-review tasks, costs something like $4/query. Other papers claiming human-range performance at ideation or reviewing also probably come in at under $10 per idea or review.
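
Spelled out as a quick back-of-envelope (the human-paper costs are my rough guesses from above, the automated cost is Sakana's published per-paper figure):

```python
# Rough cost gap between human-produced and automated ML papers.
# Both inputs come from the estimates above, not new data.
human_paper_cost_usd = (10_000, 300_000)  # guessed range for a decent ML paper
auto_paper_cost_usd = 15                  # Sakana AI Scientist, per paper

lo, hi = (c / auto_paper_cost_usd for c in human_paper_cost_usd)
print(f"human/automated cost ratio: ~{lo:,.0f}x to ~{hi:,.0f}x per paper")
# -> roughly 3 to 4 orders of magnitude
```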

Even the automated ML R&D benchmarks from METR or UK AISI don't strike me as coming anywhere near what, e.g., a 100-person team at OpenAI could accomplish in 1 year if they tried really hard to automate ML.

A fairer comparison would probably be to actually try hard at building the kind of scaffold that could use ~$10k in inference costs productively. I suspect the resulting agent would probably not do much better than one running on $100 of inference, but it seems hard to be confident. And it seems harder still to be confident about what will happen even just 3 years from now, given that pretraining compute seems like it will probably grow about 10x/year and that there might be stronger pushes toward automated ML.
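
For concreteness, here's a minimal sketch of the kind of budget-tracked scaffold I have in mind; `propose_experiment` and `run_and_score` are hypothetical stand-ins for real LLM calls and cheap training/eval runs, and the per-call costs are made up:

```python
# Minimal sketch of a budget-tracked elicitation scaffold: keep proposing and
# cheaply evaluating experiments until the inference budget is spent, then
# return the best candidate found. Both helpers are hypothetical stand-ins.
import random

def propose_experiment() -> tuple[str, float]:
    # Stand-in for an LLM call that drafts an experiment; returns (idea, $ cost).
    return f"idea-{random.randrange(10**6)}", 0.50

def run_and_score(idea: str) -> tuple[float, float]:
    # Stand-in for a cheap training/eval run; returns (score, $ cost).
    return random.random(), 2.00

def elicit(budget_usd: float) -> tuple[str, float]:
    spent, best_idea, best_score = 0.0, None, float("-inf")
    while spent < budget_usd:
        idea, cost = propose_experiment()
        score, run_cost = run_and_score(idea)
        spent += cost + run_cost
        if score > best_score:
            best_idea, best_score = idea, score
    return best_idea, best_score

# The comparison I'd want: does a ~$10k budget actually beat a $100 one?
print(elicit(100.0), elicit(10_000.0))
```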

This seems pretty bad, both w.r.t. underestimating the probability of shorter timelines and faster takeoffs, and in more specific ways too. E.g., we could be underestimating by a lot the risks of open-weights Llama-3 (or soon Llama-4), given all the potential under-elicitation.

I find the pessimistic interpretation of the results a bit odd given considerations like those in https://www.lesswrong.com/posts/i2nmBfCXnadeGmhzW/catching-ais-red-handed

I also think it's important to notice how much less scary, and how much more probably-easy-to-mitigate (at least strictly when it comes to technical alignment), this story seems compared to the scenarios from 10 or so years ago, e.g. from Superintelligence, or from before LLMs, when pure RL seemed like the dominant paradigm for getting to AGI.

I agree it's bad news w.r.t. getting maximal evidence about steganography and the like happening 'by default'. I think it's good news w.r.t. lab incentives, even for labs that don't talk much about safety.

I pretty much agree with 1 and 2. I'm much more optimistic about 3-5, even 'by default' (e.g. R1's training being 'regularized' toward more interpretable CoT, despite DeepSeek not being too vocal about safety), but especially if labs deliberately try to maintain the nice properties from 1-2 and the interpretability of the CoT.

If "smarter than almost all humans at almost all things" models appear in 2026-2027, China and several others will be able to ~immediately steal the first such models, by default.

Interpreting this very charitably: even in that case, they probably wouldn't have enough inference compute to compete.

Quick take: this is probably interpreting them over-charitably, but I feel like the plausibility of arguments like the one in this post makes e/acc and e/acc-adjacent arguments sound a lot less crazy.

As far as I'm aware, there isn't yet any demonstrated differential compute-efficiency advantage from latent reasoning to speak of. It could happen; it could also not happen. Even if it does happen, one could still decide to pay the associated safety tax of keeping the CoT.

More generally, the vibe of the comment above seems too defeatist to me; related: https://www.lesswrong.com/posts/HQyWGE2BummDCc2Cx/the-case-for-cot-unfaithfulness-is-overstated

They also require relatively little compute (often around $1 for a training run), so AI agents could afford to test many ideas.

Ok, this seems surprisingly cheap. Can you say more about what such a $1 training run typically looks like (what the hyperparameters are)? I'd also be very interested in any analysis of how SAE (computational) training costs scale vs. base LLM pretraining costs.
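
For intuition on the scaling question, here's the kind of back-of-envelope I'd want to see checked, using the standard ~6 * params * tokens approximation for a forward+backward pass; all the concrete numbers are illustrative assumptions on my part, not figures from your setup:

```python
# Back-of-envelope: training one SAE on a single layer's activations vs.
# pretraining the base LLM. All concrete numbers below are assumptions.
d_model = 2048        # residual-stream width of the base model (assumed)
expansion = 16        # SAE width as a multiple of d_model (assumed)
sae_tokens = 1e9      # activation tokens the SAE is trained on (assumed)
sae_params = 2 * d_model * (expansion * d_model)  # encoder + decoder weights

llm_params = 1e9      # base model size (assumed)
llm_tokens = 2e12     # base model pretraining tokens (assumed)

sae_train = 6 * sae_params * sae_tokens  # fwd+bwd for the SAE itself
act_gen = 2 * llm_params * sae_tokens    # forward passes to cache activations
llm_pretrain = 6 * llm_params * llm_tokens

print(f"SAE training:    {sae_train:.1e} FLOPs")
print(f"activation gen:  {act_gen:.1e} FLOPs")
print(f"LLM pretraining: {llm_pretrain:.1e} FLOPs")
print(f"(SAE + gen) / pretrain: {(sae_train + act_gen) / llm_pretrain:.1e}")
```

On assumptions like these, the SAE run (including generating the activations) comes out ~3-4 orders of magnitude cheaper than pretraining the base model, which would make $1 runs less surprising; I'd still want to see the actual numbers.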

I wouldn't be surprised if SAE improvements were a good early target for automated AI research, especially if the feedback loop is just "Come up with idea, modify existing loss function, train, evaluate, get a quantitative result".

This sounds spiritually quite similar to what's already been done in Discovering Preference Optimization Algorithms with and for Large Language Models, and I'd expect something roughly like that to probably produce something interesting, especially if a training run only costs $1.
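
A tiny runnable sketch of the shape of that loop (random data, minuscule model; this only illustrates the "modify loss, train, evaluate" iteration, not a realistic SAE setup):

```python
# "Modify loss -> train -> evaluate" loop for SAEs, in the spirit of the
# discovery loop from the paper above. Data is random; purely illustrative.
import torch
import torch.nn as nn

d_model, d_sae, batch, steps = 64, 256, 256, 200
acts = torch.randn(4096, d_model)  # stand-in for cached LLM activations

def l1_penalty(f):     # the standard sparsity penalty
    return f.abs().sum(-1).mean()

def log1p_penalty(f):  # an example of an LLM-proposed variant
    return torch.log1p(f.abs()).sum(-1).mean()

def train_and_eval(penalty, coeff=1e-3):
    enc, dec = nn.Linear(d_model, d_sae), nn.Linear(d_sae, d_model)
    opt = torch.optim.Adam([*enc.parameters(), *dec.parameters()], lr=1e-3)
    for _ in range(steps):
        x = acts[torch.randint(0, len(acts), (batch,))]
        f = torch.relu(enc(x))
        loss = (dec(f) - x).pow(2).mean() + coeff * penalty(f)
        opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():  # report reconstruction error and sparsity
        f = torch.relu(enc(acts))
        return {"mse": (dec(f) - acts).pow(2).mean().item(),
                "l0": (f > 0).float().sum(-1).mean().item()}

for name, pen in {"l1": l1_penalty, "log1p": log1p_penalty}.items():
    print(name, train_and_eval(pen))
```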

Some additional evidence: o3 used a total of 5.7B tokens to achieve its ARC score of 87.5%; in low-compute mode it scored 75.7% using a total of 33M tokens:

https://arcprize.org/blog/oai-o3-pub-breakthrough
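
Back-of-envelope on the gap between the two modes, using the figures above:

```python
high_compute_tokens = 5.7e9  # tokens behind the 87.5% score
low_compute_tokens = 33e6    # tokens behind the 75.7% score
print(f"~{high_compute_tokens / low_compute_tokens:.0f}x more tokens")  # ~173x
```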
