Marius Hobbhahn

I recently founded Apollo Research: https://www.apolloresearch.ai/ 

I was previously doing a Ph.D. in ML at the International Max-Planck research school in Tübingen, worked part-time with Epoch and did independent AI safety research. 

For more see https://www.mariushobbhahn.com/aboutme/

I subscribe to Crocker's Rules

Wiki Contributions

Comments

All of the above but in a specific order. 
1. Test if the model has components of deceptive capabilities with lots of handholding with behavioral evals and fine-tuning. 
2. Test if the model has more general deceptive capabilities (i.e. not just components) with lots of handholding with behavioral evals and fine-tuning. 
3. Do less and less handholding for 1 and 2. See if the model still shows deception. 
4. Try to understand the inductive biases for deception, i.e. which training methods lead to more strategic deception. Try to answer questions such as: can we change training data, technique, order of fine-tuning approaches, etc. such that the models are less deceptive? 
5. Use 1-4 to reduce the chance of labs deploying deceptive models in the wild. 

(cross-posted from EAG)

Meta: Thanks for taking the time to respond. I think your questions are in good faith and address my concerns, I do not understand why the comment is downvoted so much by other people. 

1. Obviously output is a relevant factor to judge an organization among others. However, especially in hits-based approaches, the ultimate thing we want to judge is the process that generates the outputs to make an estimate about the chance of finding a hit. For example, a cynic might say "what has ARC-theory achieve so far? They wrote some nice framings of the problem, e.g. with ELK and heuristic arguments, but what have they ACtUaLLy achieved?" To which my answer would be, I believe in them because I think the process that they are following makes sense and there is a chance that they would find a really big-if-true result in the future. In the limit, process and results converge but especially early on they might diverge. And I personally think that Conjecture did respond reasonably to their early results by iterating faster and looking for hits. 
2. I actually think their output is better than you make it look. The entire simulators framing made a huge difference for lots of people and writing up things that are already "known" among a handful of LLM experts is still an important contribution, though I would argue most LLM experts did not think about the details as much as Janus did. I also think that their preliminary research outputs are pretty valuable. The stuff on SVDs and sparse coding actually influenced a number of independent researchers I know (so much that they changed their research direction to that) and I thus think it was a valuable contribution. I'd still say it was less influential than e.g. toy models of superposition or causal scrubbing but neither of these were done by like 3 people in two weeks. 
3. (copied from response to Rohin): Of course, VCs are interested in making money. However, especially if they are angel investors instead of institutional VCs, ideological considerations often play a large role in their investments. In this case, the VCs I'm aware of (not all of which are mentioned in the post and I'm not sure I can share) actually seem fairly aligned for VC standards to me. Furthermore, the way I read the critique is something like "Connor didn't tell the VCs about the alignment plans or neglects them in conversation". However, my impression from conversation with (ex-) staff was that Connor was very direct about their motives to reduce x-risks. I think it's clear that products are a part of their way to address alignment but to the best of my knowledge, every VC who invested was very aware about what their getting into. At this point, it's really hard for me to judge because I think that a) on priors, VCs are profit-seeking, and b) different sources said different things some of which are mutually exclusive. I don't have enough insight to confidently say who is right here. I'm mainly saying, the confidence of you surprised me given my previous discussions with staff.
4. Regarding confidence: For example, I think saying "We think there are better places to work at than Conjecture" would feel much more appropriate than "we advice against..." Maybe that's just me. I just felt like many statements are presented with a lot of confidence given the amount of insight you seem to have and I would have wanted them to be a bit more hedged and less confident. 
5. Sure, for many people other opportunities might be a better fit. But I'm not sure I would e.g. support the statement that a general ML engineer would learn more in general industry than with Conjecture. I also don't know a lot about CoEm but that would lead me to make weaker statements than suggesting against it. 

Thanks for engaging with my arguments. I personally think many of your criticisms hit relevant points and I think a more hedged and less confident version of your post would have actually had more impact on me if I were still looking for a job. As it is currently written, it loses some persuasion on me because I feel like you're making too broad unqualified statements which intuitively made me a bit skeptical of your true intentions. Most of me thinks that you're trying to point out important criticism but there is a nagging feeling that it is a hit piece. Intuitively, I'm very averse against everything that looks like a click-bait hit piece by a journalist with a clear agenda. I'm not saying you should only consider me as your audience, I just want to describe the impression I got from the piece. 

(cross-posted from EAF)

Some clarifications on the comment:
1. I strongly endorse critique of organisations in general and especially within the EA space. I think it's good that we as a community have the norm to embrace critiques.
2. I personally have my criticisms for Conjecture and my comment should not be seen as "everything's great at Conjecture, nothing to see here!". In fact, my main criticism of leadership style and CoEm not being the most effective thing they could do, are also represented prominently in this post. 
3. I'd also be fine with the authors of this post saying something like "I have a strong feeling that something is fishy at Conjecture, here are the reasons for this feeling". Or they could also clearly state which things are known and which things are mostly intuitions. 
4. However, I think we should really make sure that we say true things when we criticize people, quantify our uncertainty, differentiate between facts and feelings and do not throw our epistemics out of the window in the process
5. My main problem with the post is that they make a list of specific claim with high confidence and I think that is not warranted given the evidence I'm aware of. That's all.  

(cross-commented from EA forum)

I personally have no stake in defending Conjecture (In fact, I have some questions about the CoEm agenda) but I do think there are a couple of points that feel misleading or wrong to me in your critique. 

1. Confidence (meta point): I do not understand where the confidence with which you write the post (or at least how I read it) comes from. I've never worked at Conjecture (and presumably you didn't either) but even I can see that some of your critique is outdated or feels like a misrepresentation of their work to me (see below). For example, making recommendations such as "freezing the hiring of all junior people" or "alignment people should not join Conjecture" require an extremely high bar of evidence in my opinion. I think it is totally reasonable for people who believe in the CoEm agenda to join Conjecture and while Connor has a personality that might not be a great fit for everyone, I could totally imagine working with him productively. Furthermore, making a claim about how and when to hire usually requires a lot of context and depends on many factors, most of which an outsider probably can't judge. 
Given that you state early on that you are an experienced member of the alignment community and your post suggests that you did rigorous research to back up these claims, I think people will put a lot of weight on this post and it does not feel like you use your power responsibly here
I can very well imagine a less experienced person who is currently looking for positions in the alignment space to go away from this post thinking "well, I shouldn't apply to Conjecture then" and that feels unjustified to me.

2. Output so far: My understanding of Conjecture's research agenda so far was roughly: "They started with Polytopes as a big project and published it eventually. On reflection, they were unhappy with the speed and quality of their work (as stated in their reflection post) and decided to change their research strategy. Every two weeks or so, they started a new research sprint in search of a really promising agenda. Then, they wrote up their results in a preliminary report and continued with another project if their findings weren't sufficiently promising." In most of their public posts, they stated, that these are preliminary findings and should be treated with caution, etc. Therefore, I think it's unfair to say that most of their posts do not meet the bar of a conference publication because that wasn't the intended goal. 
Furthermore, I think it's actually really good that the alignment field is willing to break academic norms and publish preliminary findings. Usually, this makes it much easier to engage with and criticize work earlier and thus improves overall output quality. 
On a meta-level, I think it's bad to criticize labs that do hits-based research approaches for their early output (I also think this applies to your critique of Redwood) because the entire point is that you don't find a lot until you hit. These kinds of critiques make it more likely that people follow small incremental research agendas and alignment just becomes academia 2.0. When you make a critique like that, at least acknowledge that hits-based research might be the right approach.

3. Your statements about the VCs seem unjustified to me. How do you know they are not aligned? How do you know they wouldn't support Conjecture doing mostly safety work? How do you know what the VCs were promised in their private conversations with the Conjecture leadership team? Have you talked to the VCs or asked them for a statement? 
Of course, you're free to speculate from the outside but my understanding is that Conjecture actually managed to choose fairly aligned investors who do understand the mission of solving catastrophic risks. I haven't talked to the VCs either, but I've at least asked people who work(ed) at Conjecture. 

In conclusion:
1. I think writing critiques is good but really hard without insider knowledge and context. 
2. I think this piece will actually (partially) misinform a large part of the community. You can see this already in the comments where people without context say this is a good piece and thanking you for "all the insights".
3. The EA/LW community seems to be very eager to value critiques highly and for good reason. But whenever people use critiques to spread (partially) misleading information, they should be called out. 
4. That being said, I think your critique is partially warranted and things could have gone a lot better at Conjecture. It's just important to distinguish between "could have gone a lot better" and "we recommend not to work for Conjecture" or adding some half-truths to the warranted critiques.
5. I think your post on Redwood was better but suffered from some of the same problems. Especially the fact that you criticize them for having not enough tangible output when following a hits-based agenda just seems counterproductive to me. 

Clarified the text: 

Update (early April 2023): I now think the timelines in this post are too long and expect the world to get crazy faster than described here. For example, I expect many of the things suggested for 2030-2040 to already happen before 2030. Concretely, in my median world, the CEO of a large multinational company like Google is an AI. This might not be the case legally but effectively an AI makes most major decisions.

Not sure if this is "Nice!" xD. In fact, it seems pretty worrying. 

So far, I haven't looked into it in detail and I'm only reciting other people's testimonials. I intend to dive deeper into these fields soon. I'll let you know when I have a better understanding.  

I agree with the overall conclusion that the burden of proof should be on the side of the AGI companies. 

However, using the FDA as a reference or example might not be so great because it has historically gotten the cost-benefit trade-offs wrong many times and e.g. not permitting medicine that was comparatively safe and highly effective. 

So if the association of AIS evals or is similar to the FDA, we might not make too many friends. Overall, I think it would be fine if the AIS auditing community is seen as generally cautious but it should not give the impression of not updating on relevant evidence, etc. 

If I were to choose a model or reference class for AI auditing, I would probably choose the aviation industry which seems to be pretty competent and well-regarded. 

People could choose how they want to publish their opinion. In this case, Richard chose the first name basis. To be fair though, there aren't that many Richards in the alignment community and it probably won't be very hard for you to find out who Richard is. 

Just to get some intuitions. 

Assume you had a tool that basically allows to you explain the entire network, every circuit and mechanism, etc. The tool spits out explanations that are easy to understand and easy to connect to specific parts of the network, e.g. attention head x is doing y. Would you publish this tool to the entire world or keep it private or semi-private for a while? 

Thank you!

I also agree that toy models are better than nothing and we should start with them but I moved away from "if we understand how toy models do optimization, we understand much more about how GPT-4 does optimization". 

I have a bunch of project ideas on how small models do optimization. I even trained the networks already. I just haven't found the time to interpret them yet. I'm happy for someone to take over the project if they want to. I'm mainly looking for evidence against the outlined hypothesis, i.e. maybe small toy models actually do fairly general optimization. Would def. update my beliefs. 

I'd be super interested in falsifiable predictions about what these general-purpose modules look like. Or maybe even just more concrete intuitions, e.g. what kind of general-purpose modules you would expect GPT-3 to have. I'm currently very uncertain about this. 

I agree with your final framing. 

Load More