Thoughts on AI Safety Camp

I am the program coordinator of AI Safety Camp. Let me respond with personal impressions / thoughts:

Apologies, Charlie, that we did not get to call before you wrote this post. Busy months for me, and I had misinterpreted your request as you broadly reaching out to interview organisers of various programs.

First, respect for the thoroughness and consideration of your writing:

It is useful to get an outside perspective of how AI Safety Camp works for participants.
- In this sense, I am glad that we as organisers did not get to talk with you yet, which might have 'anchored' this post more on our notions of what the camp is about.
  - Hoping that you and I can still schedule a reverse interview, where I can listen to and learn from your ideas!
- Noting that we also welcome honest criticism of AI Safety Camp that could help us rethink or improve the format and/or the way we coordinate editions.
  - I would personally value it if someone could do background research at least half as well as Charlie and play devil's advocate: come up with arguments against AISC's current design or 'set parameters' being any good for helping certain (potential) participants to contribute to AI existential safety.
  - Write a quality post and it will get my strong upvote at least!
Glad to have your ideas on parameters to tweak and what to consider focussing on doing well so we can serve new participants better (to come to contribute at the frontiers of preventing the existential risk posed by AI developments).
- For example, you made me think that maybe the virtual edition could be adapted to cater for remote ML engineering teams in particular.
- Where conceptual research in a group setting may just tend to work better through spontaneous chats and flip-chart scribbles at a physical retreat.
I find myself broadly agreeing with most of your descriptions of whom the camp is for and how we serve our participants.

On ways the camp serves participants looking to contribute to AI x-safety research:

> 35% about testing fit, 30% about signalling, and 15% about object-level work, plus different leftovers.

The relative weighting above matches my impressions, at least for past editions (AISC 1-5).
- Having said that, making connections with other aspiring researchers (fellow participants, organisers, speakers, research authors) mattered a lot for some alumni's career trajectories.
  - I am not sure how to even introduce a separate weight for 'networking' given the overlap with 'signalling' and 'testing fit' and leftovers like 'heard about a grant option; started an org'.
  - BTW your descriptions related to 'testing fit' resonated with me!
    > What was valuable to them was often what they learned about themselves, rather than about AI....
    > Some people attended AISC and decided that alignment research wasn't for them, which is a success in its own way. On average, I think attending made AI alignment research feel "more real," and increased peoples' conviction that they could contribute to it. Several people I talked to came away with ideas only tangentially related to their project that they were excited to work on - but of course it's hard to separate this from the fact that AISC participants are already selected for being on a trajectory of increasing involvement in AI safety.
- Also, in the 'leftovers' bucket, there is a lot of potential for tail events – where people's experiences at the camp either strongly benefit or strongly harm their future collaborations on research for preventing technology-induced existential risks.
  For example:
  - Benefits: Research into historically overlooked angles (e.g. alignment impossibility arguments, human biology-based alignment) sparks new insights and reflections that shift the paradigm within which next generation AI x-safety researchers conduct their research.
  - Harms: We serve alcohol at a fun end-of-camp party, fail at monitoring and checking in with participants, and then someone really ends up ignoring another person's needs and/or crosses their personal boundaries.
- Finally, I would make an ends-means distinction here:
  - I would agree that in past formats, the value of object-level work during the program seemed proportionally smaller than the value of testing fit and networking.
  - At the same time, I and other past organisers believe that individual participants actually trying to do well-considered rigorous research together helps a lot with them making non-superficial non-fluky traction toward actually contributing at the frontiers of the community's research efforts.
For future physical editions (in Europe, US, and Asia-Pacific):
- I would guess that...
  - signalling (however we define this and its benefits/downsides) is held about constant.
  - object-level work (incl. deconfusing meta-level questions) and testing fit (incl. for working on a new research problem, if already decided on a research career) swap weights.
- I.e. 35% about object-level work, 30% about signalling, 15% about testing fit, plus different leftovers (leaving aside 'networking' and 'camp tail events').
Note that edition formats have been changing over time (as you mentioned yourself):
- The first camp was a rather grassroots format where participants already quite knowledgeable / connected / experienced in AI safety research could submit their proposals and gather into teams around a chosen research topic.
- At later editions, we admitted participants who had spent less time considering what research problems to work on, and we managed to connect at most a few teams per camp with a fitting mentor (mostly, we provided a list of willing mentors a team could reach out to after the team already had decided on a research topic).
- At the sixth edition and current edition, we finally rearranged and refined the format to serve our mentors better. The current virtual edition involved people applying to collaborate on an open research problem picked by one of our mentors (progress of the teams so far is mixed, but based on recent feedback, about 3 mentors were a little negative, and 3 others were somewhat to very positive about having mentored a team).
- The next physical edition in Europe will be about creating ~6 research spaces for some individual past participants and reviewers – who are somewhat experienced at facilitating research – to invite over independent researchers to dig together into an arguably underexplored area of research (on the continuum you mentioned, AISC7 is nearer to the end of a research lab).

On your points re: parameters of AISC that make it good at some things and not others:

> Length and length variability: Naturally shorter time mandates easier projects, but you can have easy projects across a wide variety of sub-fields. However, a fixed length (if somewhat short) also mandates lower-variance projects, which discourages the inherent flailing around of conceptual work and is better suited to projects that look more like engineering.

Current camp durations:
- For the yearly virtual edition (early Jan to early June), the period is roughly 5 months – from initial onboarding and discussions to research presentations and planning post-camp steps.
- For the upcoming physical edition, the period is roughly 2 months (Sep to Oct).
Some/all future editions of AISC may specialise in enabling research in less-explored areas and based on less-explored paradigms (meaning higher variance in the value of the projects' outputs).
- In which case, the length and/or intensity of research (in terms of hours per week, one-on-one interactions) at editions will go up.

> Level of mentor involvement: ...The more interesting arguments against increasing supervision are that it might not reduce length variability pressure by much (mentors might have ideas that are both variable between-ideas and that require an uncertain amount of time to accomplish, similar to the participants),

This seems more likely than not to me (holding constant how 'exploratory' the area of research is).
- In part because half or more of the teams at the current virtual edition ended up exploring angles that were different from what mentors had planned for.

> ... and might not increase the total object-level output, relative to the mentor and participants working on different topics on the margin.

This is definitely the case in the short run (i.e. over the 5-month period of the virtual edition).
I feel very unsure when it comes to the long run (i.e. 'total object-level output' over the decades following the participants' collaboration with a mentor at an edition).
- Overall, I guess the current average degree of mentor involvement is >10% more likely to increase 'total output' than not.
  - Where the reference of comparison is quick mentor feedback on the team's initial proposal and any later draft on their research findings.
  - Where the average gain on the upside ('increase in total') is larger than the average loss on the downside ('decrease in total') when working from established alignment research paradigms.
    - Along with a decrease in the likelihood that team outputs lead to any important paradigmatic changes in AI x-safety research.
- We also need to account for the fact that a mentor will occasionally meet a capable independent-minded researcher at the camp and afterwards continue to collaborate with them individually (this seems probably the case for ~2 participants at the current virtual edition).

> Evaluation: Should AISC be grading people or giving out limited awards to individuals?

In the early days of AISC, we discussed whether to try to evaluate participant performance during the camp so we could recommend and refer alumni to senior researchers in the community.
- We decided against internal evaluations because that could break apart the collaborative culture at the camp.
- Basically, it could leave people feeling uncomfortable about 'getting watched', and encourage some individuals to compete with each other to display 'their work' (also who am I kidding: for organisers to manage evaluation of performance on this variety of open problems?).

Nuances on a few of your considerations:

> AISC doesn't need to be in the business of educating newbies, because it's full of people who've already spent a year or three considering AI alignment and want to try something more serious.

There are interesting trade-offs between 'get newcomers up to speed' vs. 'foster cognitive diversity':
- There are indeed more aspiring contributors now who spent multiple years considering work in AI existential safety. Also, there are now other programs specialising in bringing students and others up to speed with AI concepts and considerations, like AGI Safety Fundamentals and MLSS (and to a lesser extent, the top-university-named 'existential risk initiative' programs).
  - So agreed that AISC does not have a comparative advantage in educating newcomers there, and also that this 'part of the pipeline' is no longer a key bottleneck.
  - We never have been in the business of educating people (despite mention of 'teach' on our website, which I've been thinking of rewriting). Rather, people self-study, apply and do work on their own initiative.
  - In this sense, AISC can offer people who self-studied or e.g. completed AGISF a way to get their hands dirty on a project at the yearly virtual camp (and from there go on to contribute to a next AISC edition, say, or apply for SERI MATS).
- On the other hand, my sense is that most of the people in the crowd you mentioned share roughly similar backgrounds (in terms of STEM disciplines and in being WEIRD: Western, Educated, from Industrialised & Democratic nations).
  - Many of the aspiring AI x-safety researchers to me appear to broadly share similar inclinations in how they analyse and perceive problems in the world around them (with corresponding blindspots – I tried to summarise some here).
  - The relatively homogenous reasoning culture of our community is concerning in the sense that where the AI x-safety community collectively shares the same blindspots (reflected in forum 'Schelling points' of topics and discussions), individuals participating will tend to overlook any crucial considerations there (in those blindspots) that are relevant for us to help prevent AI developments from destroying human society and all of biological life.
  - We as organisers look to reach out to and serve individual persons who can bring in their diverse research disciplines, skills and perspectives. We are more accommodating here in terms of how much time such diverse applicants have spent upfront reading about and engaging with AI existential safety research (given that they would have heard less about our community's work), and try where we can to assist persons individually with getting up to speed.
  - Here, your suggested 'program fit' guideline definitely applies:
    > You know at least a bit about the alignment problem - at the very least you are aware that many obvious ways to try to get what we want from AI do not actually work.

> This isn't really a fixed group of people, either - new people enter by getting interested in AI safety and learning about AI, and leave when they no longer get much benefit from the fit-testing or signalling in AISC. I would guess this population leaves room for ~1 exact copy of AISC (on an offset schedule), or ~4 more programs that slightly tweak who they're appealing to.

Potential participants need to distinguish programs and work out which would serve their needs better. We as organisers need to keep org scopes clear. So I am not excited about the 'exact copy' angle (of course, it would also get forced and unrealistic if someone tries to copy over the cultural nuances and the present organisers' ways of relating with and serving participants).
I would be curious to explore ideas for new formats with anyone who noticed a gap in what AISC and other AIS research training programs do, and who is considering trying out a pilot for a new program that takes a complementary angle on the AI Safety Camp. Do message me!

Karl von Wendt (3y)

As a participant, I probably don't fit the "typical" AISC profile: I'm a writer, not a researcher (even though I've got a Ph.D. in symbolic AI), I'm at the end of my career, not the beginning (I'm 61). That I'm part of AISC is due to the fact that this time, there was a "non-serious" topic included in the camp's agenda: Designing an alignment tabletop role-playing game (based on an idea by Daniel Kokotajlo). Is this a good thing?

For me, it certainly was. I came to AISC mostly to learn and get connections into the AI alignment community, and this worked very well. I feel like I know a lot less about alignment than I thought I knew at the start of the camp, which is a sure sign that I learned a lot. And I made a lot of great and inspiring contacts, even friendships, some of which I think will stay long after the camp is over. So I'm extremely happy and grateful that I had the opportunity to participate.

But what use am I to AI alignment? Well, together with another participant, Jan Kirchner, I did try to contribute an idea, but I'm not sure how helpful that is. However, one thing I can do: As a writer, I can try to raise awareness of the problem. That is the reason I participated in the first place. I see a huuuuuge gap between the importance and urgency of AI alignment and the attention it gets outside the community, among people who probably could do something about it, e.g. politicians and "established" scientists. For example, in Germany, we have the "Institut für Technikfolgenabschätzung" (ITAS) which claims on its website to be the leading institute for technology assessment. I asked them whether they are working on AI alignment. Apparently, they aren't even aware that there IS a problem. The same seems to be true for the scientific establishment in the rest of Germany and the EU.

You may question how helpful it is to get people like them to work on alignment. But I think that if we hope to solve the problem in time, we need as much attention on it as possible. There are some smart people at ITAS and elsewhere, and it would be great to get them to work on the problem, even if it seems a bit late. Maybe we need just one brilliant idea, and the more people are searching for it, the more likely it is to be found, I think. It could also be that there is no solution, in which case it is even more important that as many people as possible agree on that, the more established and accepted, the better. If we need regulation, or try to implement a global ban or freeze on AGI research, we need as much support as possible.

So that's what I'm trying to do, with my limited outreach outside of the AI alignment community. My participation in AISC taught me many things and helped me get my message straight. A lot of it will probably find its way into my next novel. And maybe our tabletop RPG will also help spread the message. All in all, I think it was a good idea to broaden the scope of AISC a bit, and I recommend doing it again. Thank you very much, Remmelt, Daniel, and all the others for taking me in!

Chris_Leong (3y)

I think it's great that you're thinking about how you can use your writing skills to further alignment. If you're thinking about contacting politicians or people who are famous, I'd suggest reaching out to CEA's community health team first for advice on how to ensure this goes well.

Karl von Wendt (3y)

Thank you, I will!

Remmelt (3y)

Charlie Steiner (3y)

Thanks for this mammoth comment!

Remmelt (3y)

Happy to. Glad to hear any follow-up thoughts you have!

Remmelt (3y)

Ah, adding this here:

I personally do not tend to think of AISC as converting money and interested people into useful research. For me, that conjures up the image of a scalable machine where you can throw in more inputs to spit out more of the output.

I view AISC more as designing processes together that tend toward better outcomes (which we can guess at but do not know about beforehand!).
Or as a journey of individual and shared exploration that people – specifically aspiring researchers – go through who are committed to ensuring unsafe AI does not destroy human society and all of biological life.

Linda Linsefors (2y)

I just found this post (yesterday) while searching the EA Forum archives for something else.

I've been co-organising AISC1 (2018), AISC8 (2023) and AISC9 (2024). This means that I was not involved when this was posted, which is why I missed it.

What you describe fits very well with my own view of AISC, which is reassuring.
