How to do conceptual research: Case study interview with Caspar Oesterheld


Thanks for the interesting write-up.

Regarding __Evidential Cooperation in Large Worlds__, the Identical Twin One Shot Prisoner's dilemma makes sense to me because the entity giving the payout is connected to both worlds. What is the intuition for ECL (where my understanding is there isn't any connection)?

In practice, the "entity giving the payout" for ECL is just the world states you end up in, and ECL requires you to care about the environment of the person you're playing the PD with.

So, defecting might be just optimising my local environment for my own values and cooperating would be optimising my local environment for some aggregate of my own values and the values of the person I'm playing with. So, it only works if there are positive-sum aggregates and if each player cares about what the other does to their local environment.
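To make the reasoning above concrete, here is a minimal toy sketch. All numbers and names (`defect_gain`, `coop_gain`, `care`, `p_same`) are hypothetical and chosen only for illustration; they do not come from the ECL write-up. It assumes each player controls one local environment, that cooperating is positive-sum (2 + 2 > 3), and that `p_same` is the probability that the other player takes the same action as you, conditional on your own action.

```python
def expected_value(action, p_same, care=1.0, defect_gain=3.0, coop_gain=2.0):
    """Evidential expected value of 'C' (cooperate) or 'D' (defect).

    Hypothetical toy model:
    - Defecting optimises my local environment for my values only: I get
      defect_gain from it, and the other player gets nothing from it.
    - Cooperating optimises it for an aggregate: each player gets coop_gain.
    - care is how much I value what happens in the other player's environment.
    """
    own = coop_gain if action == "C" else defect_gain
    # Conditional on my action, how likely is it that the other player cooperates?
    p_other_cooperates = p_same if action == "C" else 1.0 - p_same
    return own + care * p_other_cooperates * coop_gain


def best_action(p_same, care=1.0):
    """Pick the action with the higher evidential expected value."""
    return max(("C", "D"), key=lambda a: expected_value(a, p_same, care))


print(best_action(p_same=1.0))            # perfectly correlated twin
print(best_action(p_same=0.5))            # no correlation
print(best_action(p_same=1.0, care=0.0))  # indifferent to the other environment
```

In this toy model, cooperation only wins when the correlation is high enough (here, above 0.75) and when `care` is positive, which mirrors the two conditions named above.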

Caspar Oesterheld came up with two of the most important concepts in my field of work: Evidential Cooperation in Large Worlds and Safe Pareto Improvements. He also came up with a potential implementation of evidential decision theory in boundedly rational agents called decision auctions, wrote a comprehensive review of anthropics and how it interacts with decision theory which most of my anthropics discussions built on, and independently decided to work on AI some time in late 2009 or early 2010.

Needless to say, I have a lot of respect for Caspar’s work. I’ve often felt very confused about what to do in my attempts at conceptual research, so I decided to ask Caspar how he did his research. Below is my writeup from the resulting conversation.

## How Caspar came up with surrogate goals

## The process

SPI paper was created and he continues trying to answer this question now.

## Caspar’s reflections on what was important during the process

some high-level thinking about bargaining alongside his more narrow projects.

## How Caspar came up with ECL

## The process

should exist. Most of the research in the report was specifically done with the goal of making the report complete as opposed to, for example, being curiosity projects.

## Caspar’s reflections on what was important during the process

## How Caspar came up with decision auctions

## The process

they specifically say that their approach doesn’t work well to build an EDT agent. But maybe I can extend it to change that?

this paper that takes an economics-style perspective, which points out essentially the same issues as the Garrabrant post above, so I seem to be on the right track with reading the literature on prediction markets.

“Eliciting Predictions and Recommendations for Decision Making” (Chen, Kash, Ruberry & Shnayder, 2014), but it involves randomisation, so can’t be used to build an EDT agent. But it has a nice formalism and seems to offer a great formal framework to think about agents that are powered by something like a prediction market. It is much easier to think about something with a concrete formal structure than just vaguely thinking about “Hm, how do you build an agent that kind of behaves like this or that.” Maybe I can extend the method in the paper to make it EDT friendly?

paper on decision scoring, which identifies the decision auctions mechanism.

a theory of bounded inductive rationality.

[editor’s note: I find it notable that all the linked papers are in CS venues rather than economics. That said, while Yiling Chen is a CS professor, she studied economics and has an economics PhD.]

## How Caspar decided to work on superhuman AI in late 2009 or early 2010

My impression is that a few people in AI safety independently decided that AI was the most important lever over the future and then discovered LessWrong, Eliezer Yudkowsky, and the AI safety community. Caspar is one of those people. While this didn’t turn out to be unique or counterfactually impactful, I am including his story of deciding to work on superhuman AI. The story is from notes Caspar left in writing after the interview. I mostly copied them verbatim with some light editing for clarity and left them in first person.

## The process

“Much of this happened when I was very young, so there's some naivete throughout:

When I was young I wanted to become a physicist, because physics is the most fundamental science.

Physics uses maths, so I first wanted to learn some maths. To that end I took a linear algebra course at the University of Hamburg, which for some reason started with the Zermelo-Fraenkel axiomatization of set theory. (Linear algebra courses don't normally introduce those ideas.)

This led me to think about automated theorem proving: using the Zermelo-Fraenkel axiomatization, you can write down a program that finds all correct proofs/all provable theorems. You'd "just" have to figure out how to make this program fast/efficient. This seemed like a big deal to me at the time! Why be a mathematician and prove theorems yourself? It seemed much more leveraged to figure out automated theorem proving and then prove theorems that way. This led me to think and read about AI a bunch, including outside of the automated theorem proving context.

Then at some point I sat down and thought about what the most impactful thing would be that I could do with my life. And then creating superhuman AI for the improvement of society was my best guess. (I don't remember why I chose this over global warming btw (I'd guess it was neglectedness or comparative advantage, but not sure). I had been thinking a bunch about global warming at the time.) [editor’s note: This was late 2009 or early 2010 when Caspar was 15.]

So then I learned even more about AI and CS, deprioritized math, and when it came to choosing what BSc/undergrad to enroll in, I picked CS at a uni that had a lot of people working on AI. Within AI, I also focused on learning about the areas that seemed to me most useful for AGI, i.e., RL, neural nets, cognitive science as opposed to, say, support vector machines or automated theorem proving.

Eventually (once I used the English-language Internet more) I found some articles by Yudkowsky on AGI, which then led me to Yudkowsky's writing on AI safety, which convinced me to think more about safety and adjacent topics (ethics), and also caused me to engage with EA. (As you might know, Yudkowsky et al. also wanted to create AGI before they started working on safety. So to some extent my trajectory is similar, though I didn’t have to do the hard work to become convinced of safety as a priority, which to me seems like a more difficult step than figuring out that AI is important in some way.)”

## Caspar’s reflections on what was important during the process

“I was often driven by "this seems like a big deal"-type intuitions that weren't exactly correct, but that did track the truth to some extent. This caused me to work on and think about various "adjacent" ideas and this was very useful. For example, take "automated theorem proving is a more leveraged way to prove mathematical theorems". Of course, there are lots of issues with this idea. (Why is proving mathematical theorems important in the first place? Is any of this counterfactual? Can you solve automated theorem proving without “solving AGI”?) But to some extent the argument contains some of the structure of the true arguments for the importance of AI. And thinking about automated theorem proving was good because it led me to think about AI a bunch. Maybe at the time I could have known that I was wrong or naive in various ways. But just acting on the views at the time was definitely better than discarding them altogether.

Consuming existing ideas (e.g., taking a linear algebra course, texts about the future of AI and the importance of AI safety) is important.

It was also important to at various points think explicitly about impact as opposed to just following curiosity.”

## General notes on his approach to research

## What does research concretely look like in his case?

Things he might do when he does research, in no particular order:

## Research immersion

unsure about how important research immersion is (description below). He knows others who say it’s important to do good research.

Occasional life-distracting obsessive immersion: Sometimes, especially when he has a fairly well-defined technical question, he can’t let the question go from his mind for a day or several days. His whole mental life will revolve around this question even when it’s not important. This makes it difficult to do other stuff, be it life or work. It also often feels bad if it doesn’t feel like he’s making progress.

Usual background immersion: Most days, he has his research questions in the back of his mind when he’s off work. If he’s not doing distracting activities, during perhaps 25% of his free evening time he will passively have some research on his mind. (A bit like a song that’s very mildly stuck in your head although often very quietly.)

## Goal orientation vs. curiosity orientation