Marius Hobbhahn

I'm the co-founder and CEO of Apollo Research: https://www.apolloresearch.ai/
I mostly work on evals, but I am also interested in interpretability.

I was previously doing a Ph.D. in ML at the International Max-Planck research school in Tübingen, worked part-time with Epoch and did independent AI safety research.

For more see https://www.mariushobbhahn.com/aboutme/

I subscribe to Crocker's Rules

Wiki Contributions

Comments

Sorted by

Newest

This might be the last AI Safety Camp

Marius Hobbhahn7mo6731

Copying from EAF

TL;DR: At least in my experience, AISC was pretty positive for most participants I know and it's incredibly cheap. It also serves a clear niche that other programs are not filling and it feels reasonable to me to continue the program.

I've been a participant in the 2021/22 edition. Some thoughts that might make it easier to decide for funders/donors.
1. Impact-per-dollar is probably pretty good for the AISC. It's incredibly cheap compared to most other AI field-building efforts and scalable.
2. I learned a bunch during AISC and I did enjoy it. It influenced my decision to go deeper into AI safety. It was less impactful than e.g. MATS for me but MATS is a full-time in-person program, so that's not surprising.
3. AISC fills a couple of important niches in the AI safety ecosystem in my opinion. It's online and part-time which makes it much easier to join for many people, it implies a much lower commitment which is good for people who want to find out whether they're a good fit for AIS. It's also much cheaper than flying everyone to the Bay or London. This also makes it more scalable because the only bottleneck is mentoring capacity without physical constraints.
4. I think AISC is especially good for people who want to test their fit but who are not super experienced yet. This seems like an important function. MATS and ARENA, for example, feel like they target people a bit deeper into the funnel with more experience who are already more certain that they are a good fit.
5. Overall, I think AISC is less impactful than e.g. MATS even without normalizing for participants. Nevertheless, AISC is probably about ~50x cheaper than MATS. So when taking cost into account, it feels clearly impactful enough to continue the project. I think the resulting projects are lower quality but the people are also more junior, so it feels more like an early educational program than e.g. MATS.
6. I have a hard time seeing how the program could be net negative unless something drastically changed since my cohort. In the worst case, people realize that they don't like one particular type of AI safety research. But since you chat with others who are curious about AIS regularly, it will be much easier to start something that might be more meaningful. Also, this can happen in any field-building program, not just AISC.
7. Caveat: I have done no additional research on this. Maybe others know details that I'm unaware of. See this as my personal opinion and not a detailed research analysis.

We need a Science of Evals

Marius Hobbhahn7moΩ352

I feel like both of your points are slightly wrong, so maybe we didn't do a good job of explaining what we mean. Sorry for that.

1a) Evals both aim to show existence proofs, e.g. demos, as well as inform some notion of an upper bound. We did not intend to put one of them higher with the post. Both matter and both should be subject to more rigorous understanding and processes. I'd be surprised if the way we currently do demonstrations could not be improved by better science.
1b) Even if you claim you just did a demo or an existence proof and explicitly state that this should not be seen as evidence of absence, people will still see the absence of evidence as negative evidence. I think the "we ran all the evals and didn't find anything" sentiment will be very strong, especially when deployment depends on not failing evals. So you should deal with that problem from the start IMO. Furthermore, I also think we should aim to build evals that give us positive guarantees if that's possible. I'm not sure it is possible but we should try.
1c) The airplane analogy feels like a strawman to me. The upper bound is obviously not on explosivity, it would be a statement like "Within this temperature range, the material the wings are made of will break once in 10M flight miles on average" or something like that. I agree that airplanes are simpler and less high-dimensional. That doesn't mean we should not try to capture most of the variance anyway even if it requires more complicated evals. Maybe we realize it doesn't work and the variance is too high but this is why we diversify agendas.

2a) The post is primarily about building a scientific field and that field then informs policy and standards. A great outcome of the post would be if more scientists did research on this. If this is not clear, then we miscommunicated. The point is to get more understanding so we can make better predictions. These predictions can then be used in the real world.
2b) It really is not "we need to find standardised numbers to measure so we can talk to serious people" and less "let's try to solve that thing where we can't reliably predict much about our AIs". If that was the main takeaway, I think the post would be net negative.

3) But the optimization requires computation? For example, if you run 100 forward passes for your automated red-teaming algorithm with model X, that requires Y FLOP of compute. I'm unsure where the problem is.

We need a Science of Evals

Marius Hobbhahn7moΩ130

Nice work. Looking forward to that!

We need a Science of Evals

Marius Hobbhahn7moΩ120

Not quite sure tbh.
1. I guess there is a difference between capability evaluations with prompting and with fine-tuning, e.g. you might be able to use an API for prompting but not fine-tuning. Getting some intuition for how hard users will find it to elicit some behavior through the API seems relevant.
2. I'm not sure how true your suggestion is but I haven't tried it a lot empirically. But this is exactly the kind of stuff I'd like to have some sort of scaling law or rule for. It points exactly at the kind of stuff I feel like we don't have enough confidence in. Or at least it hasn't been established as a standard in evals.

We need a Science of Evals

Marius Hobbhahn7moΩ140

I somewhat agree with the sentiment. We found it a bit hard to scope the idea correctly. Defining subcategories as you suggest and then diving into each of them is definitely on the list of things that I think are necessary to make progress on them.

I'm not sure the post would have been better if we used a more narrow title, e.g. "We need a science of capability evaluations" because the natural question then would be "But why not for propensity tests or for this other type of eval. I think the broader point of "when we do evals, we need some reason to be confident in the results no matter which kind of eval" seems to be true across all of them.

The next decades might be wild

Marius Hobbhahn9moΩ120Review for 2022 Review

I think this post was a good exercise to clarify my internal model of how I expect the world to look like with strong AI. Obviously, most of the very specific predictions I make are too precise (which was clear at the time of writing) and won't play out exactly like that but the underlying trends still seem plausible to me. For example, I expect some major misuse of powerful AI systems, rampant automation of labor that will displace many people and rob them of a sense of meaning, AI taking over the digital world years before taking over the physical world (but not more than 5-10 years), humans giving more and more power into the hands of AI, infighting within the AI safety community, and many more of the predictions made in this post.

The main thing I disagree with (as I already updated in April 2023) is that the timelines underlying the post are too long. I now think almost everything is going to happen in at least half of the time presented in the post, e.g. many events in the 2030-2040 section may already happen before 2030.

In general, I can strongly recommend taking a weekend or so to write a similar story for yourselves. I felt like it made many of the otherwise fairly abstract implications of timeline and takeoff models much more salient to me and others who are less in the weeds with formal timeline / takeoff models.

Disagreement with bio anchors that lead to shorter timelines

Marius Hobbhahn9moΩ120Review for 2022 Review

I still stand behind most of the disagreements that I presented in this post. There was one prediction that would make timelines longer because I thought compute hardware progress was slower than Moore's law. I now mostly think this argument is wrong because it relies on FP32 precision. However, lower precision formats and tensor cores are the norm in ML, and if you take them into account, compute hardware improvements are faster than Moore's law. We wrote a piece with Epoch on this: https://epochai.org/blog/trends-in-machine-learning-hardware

If anything, my disagreements have become stronger and my timelines have become shorter over time. Even the aggressive model I present in the post seems too conservative for my current views and my median date is 2030 or earlier. I have substantial probability mass on an AI that could automate most current jobs before 2026 which I didn't have at the time of writing.

I also want to point out that Daniel Kokotajlo, whom I spent some time talking about bio anchors and Tom Davidson's takeoff model with, seemed to have consistently better intuitions than me (or anyone else I'm aware of) on timelines. The jury is still out there, but so far it looks like reality follows his predictions more than mine. At least in my case, I updated significantly toward shorter timelines multiple times due to arguments he made.

Nuclear Energy - Good but not the silver bullet we were hoping for

Marius Hobbhahn9mo120Review for 2022 Review

I think I still mostly stand behind the claims in the post, i.e. nuclear is undervalued in most parts of society but it's not as much of a silver bullet as many people in the rationalist / new liberal bubble would make it seem. It's quite expensive and even with a lot of research and de-regulation, you may not get it cheaper than alternative forms of energy, e.g. renewables.

One thing that bothered me after the post is that Johannes Ackva (who's arguably a world-leading expert in this field) and Samuel + me just didn't seem to be able to communicate where we disagree. He expressed that he thought some of our arguments were wrong but we never got to the crux of the disagreement.

After listening to his appearance on 80k: https://80000hours.org/podcast/episodes/johannes-ackva-unfashionable-climate-interventions/ I feel like I understand the core of the disagreement much better (though I never confirmed with Johannes). He mostly looks at energy through a lens of scale, neglectedness, and traceability, i.e. he's looking to investigate and push interventions that are most efficient on the margin. On the margin, nuclear seems underinvested and lots of reasonable options are underexplored (e.g. large-scale production of smaller reactors), both Samuel and I would agree with that. However, the claim we were trying to make in the post was that nuclear is already more expensive than renewables and this gap will likely just increase in the future. Thus, it makes sense to, in total, invest more in renewables than nuclear. Also, there were lots of smaller things where I felt like I understood his position much better after listening to the podcast.

Trends in GPU price-performance

Marius Hobbhahn9moΩ360Review for 2022 Review

In a narrow technical sense, this post still seems accurate but in a more general sense, it might have been slightly wrong / misleading.

In the post, we investigated different measures of FP32 compute growth and found that many of them were slower than Moore's law would predict. This made me personally believe that compute might be growing slower than people thought and most of the progress comes from throwing more money at larger and larger training runs. While most progress comes from investment scaling, I now think the true effective compute growth is probably faster than Moore's law.

The main reason is that FP32 is just not the right thing to look at in modern ML and we even knew this at the time of writing, i.e. it ignores tensor cores and lower precisions like TF16 or INT8.

I'm a little worried that people who read this post but don't have any background in ML got the wrong takeaway from the post and we should have emphasized this difference even more at the time. We have written a follow-up post about this recently here: https://epochai.org/blog/trends-in-machine-learning-hardware
I feel like the new post does a better job at explaining where compute progress comes from.

Lessons learned from talking to >100 academics about AI safety

Marius Hobbhahn9moΩ5116Review for 2022 Review

I haven't talked to that many academics about AI safety over the last year but I talked to more and more lawmakers, journalists, and members of civil society. In general, it feels like people are much more receptive to the arguments about AI safety. Turns out "we're building an entity that is smarter than us but we don't know how to control it" is quite intuitively scary. As you would expect, most people still don't update their actions but more people than anticipated start spreading the message or actually meaningfully update their actions (probably still less than 1 in 10 but better than nothing).

LESSWRONG
LW

Posts

Wiki Contributions

Comments