Program Coordinator of AI Safety Camp.


Thoughts on AI Safety Camp

Happy to. Glad to hear any follow-up thoughts you have!

Thoughts on AI Safety Camp

Ah, adding this here: 

I personally do not tend to think of AISC as converting money and interested people into useful research. For me, that conjures up the image of a scalable machine where you can throw in more inputs to spit out more of the output.

  • I view AISC more as designing processes together that tend toward better outcomes (which we can guess at but do not know about beforehand!). 
  • Or as a journey of individual and shared exploration that people go through, specifically aspiring researchers who are committed to ensuring unsafe AI does not destroy human society and all of biological life.

Thoughts on AI Safety Camp

I am the program coordinator of AI Safety Camp. Let me respond with personal impressions / thoughts:

Apologies, Charlie, that we did not get to call before you wrote this post. Busy months for me, and I had misinterpreted your request as you broadly reaching out to interview organisers of various programs.

First, respect for the thoroughness and consideration of your writing:

  • It is useful to get an outside perspective of how AI Safety Camp works for participants.
    • In this sense, I am glad that we as organisers did not get to talk with you beforehand, since that might have 'anchored' this post more on our notions of what the camp is about.
      • Hoping that you and I can still schedule a reverse interview, where I can listen to and learn from your ideas!
    • Noting that we also welcome honest criticism of AI Safety Camp that could help us rethink or improve the format and/or the way we coordinate editions. 
      • I would personally value it if someone could do background research at least half as well as Charlie and play devil's advocate: come up with arguments against AISC's current design or 'set parameters' being any good for helping certain (potential) participants contribute to AI existential safety.
      • Write a quality post and it will get my strong upvote at least!
  • Glad to have your ideas on which parameters to tweak and what to focus on doing well, so we can better serve new participants (helping them come to contribute at the frontiers of preventing the existential risk posed by AI developments).
    • For example, you made me think that maybe the virtual edition could be adapted to cater to remote ML engineering teams in particular.
    • Whereas conceptual research in a group setting may just tend to work better through spontaneous chats and flip-chart scribbles at a physical retreat.
  • I find myself broadly agreeing with most of your descriptions of whom the camp is for and how we serve our participants.


On ways the camp serves participants looking to contribute to AI x-safety research:

> 35% about testing fit, 30% about signalling, and 15% about object-level work, plus different leftovers.

  • The relative weighting above matches my impressions, at least for past editions (AISC 1-5).
    • Having said that, making connections with other aspiring researchers (fellow participants, organisers, speakers, research authors) mattered a lot for some alumni's career trajectories.
      • I am not sure how to even introduce a separate weight for 'networking' given the overlap with 'signalling' and 'testing fit' and leftovers like 'heard about a grant option; started an org'.
      • BTW your descriptions related to 'testing fit' resonated with me!
        > What was valuable to them was often what they learned about themselves, rather than about AI....
        > Some people attended AISC and decided that alignment research wasn't for them, which is a success in its own way. On average, I think attending made AI alignment research feel "more real," and increased peoples' conviction that they could contribute to it. Several people I talked to came away with ideas only tangentially related to their project that they were excited to work on - but of course it's hard to separate this from the fact that AISC participants are already selected for being on a trajectory of increasing involvement in AI safety.
    • Also, in the 'leftovers' bucket, there is a lot of potential for tail events – where people's experiences at the camp either strongly benefit or strongly harm their future collaborations on research for preventing technology-induced existential risks. 
      For example: 
      • Benefits:  Research into historically overlooked angles (e.g. alignment impossibility arguments, human biology-based alignment) sparks new insights and reflections that shift the paradigm within which next generation AI x-safety researchers conduct their research.
      • Harms:  We serve alcohol at a fun end-of-camp party, fail at monitoring and checking in with participants, and then someone really ends up ignoring another person's needs and/or crosses their personal boundaries. 
    • Finally, I would make an ends-means distinction here:
      • I would agree that in past formats, the value of object-level work during the program seemed proportionally smaller than the value of testing fit and networking.
      • At the same time, I and other past organisers believe that individual participants actually trying to do well-considered, rigorous research together helps them a lot in making non-superficial, non-fluky progress toward actually contributing at the frontiers of the community's research efforts.
  • For future physical editions (in Europe, US, and Asia-Pacific):  
    • I would guess that...
      • signalling (however we define this and its benefits/downsides) stays about constant.
      • object-level work (incl. deconfusing meta-level questions) and testing fit (incl. for working on a new research problem, if already decided on a research career) swap weights.
    • I.e. 35% about object-level work, 30% about signalling, 15% about testing fit, plus different leftovers (leaving aside 'networking' and 'camp tail events').
  • Note that edition formats have been changing over time (as you mentioned yourself): 
    • The first camp was a rather grassroots format where participants already quite knowledgeable / connected / experienced in AI safety research could submit their proposals and gather into teams around a chosen research topic.
    • At later editions, we admitted participants who had spent less time considering what research problems to work on, and we managed to connect at most a few teams per camp with a fitting mentor (mostly, we provided a list of willing mentors a team could reach out to after the team already had decided on a research topic).
    • At the sixth and current edition, we finally rearranged and refined the format to serve our mentors better. The current virtual edition involved people applying to collaborate on an open research problem picked by one of our mentors (progress of the teams so far is mixed; based on recent feedback, about 3 mentors were a little negative, and 3 others were somewhat to very positive about having mentored a team).
    • The next physical edition in Europe will be about creating ~6 research spaces for some individual past participants and reviewers who are somewhat experienced at facilitating research, where they can invite independent researchers over to dig together into an arguably underexplored area of research (on the continuum you mentioned, AISC7 is nearer to the research-lab end).


On your points re: parameters of AISC that make it good at some things and not others:

> Length and length variability: Naturally shorter time mandates easier projects, but you can have easy projects across a wide variety of sub-fields. However, a fixed length (if somewhat short) also mandates lower-variance projects, which discourages the inherent flailing around of conceptual work and is better suited to projects that look more like engineering.

  • Current camp durations:
    • For the yearly virtual edition (early Jan to early June), the period is roughly 5 months – from initial onboarding and discussions to research presentations and planning post-camp steps.
    • For the upcoming physical edition, the period is roughly 2 months (Sep to Oct).
  • Some/all future editions of AISC may specialise in enabling research in less-explored areas and based on less-explored paradigms (meaning higher variance in the value of the projects' outputs).
    • In that case, the length and/or intensity of research (in terms of hours per week and one-on-one interactions) at editions would go up.

> Level of mentor involvement: ...The more interesting arguments against increasing supervision are that it might not reduce length variability pressure by much (mentors might have ideas that are both variable between-ideas and that require an uncertain amount of time to accomplish, similar to the participants),

  • This seems more likely than not to me (holding constant how 'exploratory' the area of research is). 
    • In part because half or more of the teams at the current virtual edition ended up exploring angles different from what mentors had planned for.

> ... and might not increase the total object-level output, relative to the mentor and participants working on different topics on the margin.

  • This is definitely the case in the short run (i.e. over the 5-month period of the virtual edition). 
  • I feel very unsure when it comes to the long run (i.e. 'total object-level output' over the decades following the participants' collaboration with a mentor at an edition). 
    • Overall, I guess the current average degree of mentor involvement is >10% more likely to increase 'total output' than not. 
      • Where the reference of comparison is quick mentor feedback on the team's initial proposal and any later draft on their research findings. 
      • Where the average gain on the upside ('increase in total') is larger than the average loss on the downside ('decrease in total') when working from established alignment research paradigms.
        • Along with a decrease in the likelihood that team outputs lead to any important paradigmatic changes in AI x-safety research.
    • We also need to account for the fact that a mentor will occasionally meet a capable, independent-minded researcher at the camp and afterwards continue to collaborate with them individually (this seems likely to be the case for ~2 participants at the current virtual edition).

> Evaluation: Should AISC be grading people or giving out limited awards to individuals?

  • In the early days of AISC, we discussed whether to try to evaluate participant performance during the camp so we could recommend and refer alumni to senior researchers in the community. 
    • We decided against internal evaluations because that could break apart the collaborative culture at the camp. 
    • Basically, it could leave people feeling uncomfortable about 'getting watched', and encourage some individuals to compete with each other to display 'their work' (also, who am I kidding: could organisers really manage to evaluate performance across this variety of open problems?).


Nuances on a few of your considerations:

> AISC doesn't need to be in the business of educating newbies, because it's full of people who've already spent a year or three considering AI alignment and want to try something more serious.

  • There are interesting trade-offs between 'get newcomers up to speed' vs. 'foster cognitive diversity':
      • There are indeed more aspiring contributors now who have spent multiple years considering work in AI existential safety. Also, there are now other programs specialising in bringing students and others up to speed with AI concepts and considerations, like AGI Safety Fundamentals and MLSS (and to a lesser extent, the top-university-named 'existential risk initiative' programs).
      • So I agree that AISC does not have a comparative advantage in educating newcomers, and also that this 'part of the pipeline' is no longer a key bottleneck.
      • We never have been in the business of educating people (despite mention of 'teach' on our website, which I've been thinking of rewriting). Rather, people self-study, apply and do work on their own initiative.
      • In this sense, AISC can offer people who self-studied or e.g. completed AGISF a way to get their hands dirty on a project at the yearly virtual camp (and from there go on to contribute to a next AISC edition, say, or apply for SERI MATS).
    • On the other hand, my sense is that most of the people in the crowd you mentioned share roughly similar backgrounds (in terms of STEM disciplines and in being WEIRD: Western, Educated, Industrialised, Rich and Democratic).
      • Many aspiring AI x-safety researchers appear to me to broadly share similar inclinations in how they analyse and perceive problems in the world around them (with corresponding blindspots; I tried to summarise some here).
      • The relatively homogeneous reasoning culture of our community is concerning: where the AI x-safety community collectively shares the same blindspots (reflected in forum 'Schelling points' of topics and discussions), individual participants will tend to overlook any crucial considerations in those blindspots that are relevant to helping prevent AI developments from destroying human society and all of biological life.
      • We as organisers look to reach out to and serve individual persons who can bring in their diverse research disciplines, skills and perspectives. We are more accommodating here in terms of how much time such diverse applicants have spent upfront reading about and engaging with AI existential safety research (given that they would have heard less about our community's work), and try where we can to assist persons individually with getting up to speed.
      • Here, your suggested 'program fit' guideline definitely applies:
        > You know at least a bit about the alignment problem - at the very least you are aware that many obvious ways to try to get what we want from AI do not actually work.

> This isn't really a fixed group of people, either - new people enter by getting interested in AI safety and learning about AI, and leave when they no longer get much benefit from the fit-testing or signalling in AISC. I would guess this population leaves room for ~1 exact copy of AISC (on an offset schedule), or ~4 more programs that slightly tweak who they're appealing to.

  • Potential participants need to distinguish programs and work out which would serve their needs better. We as organisers need to keep org scopes clear. So I am not excited about the 'exact copy' angle (of course, it would also be forced and unrealistic for someone to try to copy over the cultural nuances and the present organisers' ways of relating with and serving participants).
  • I would be curious to explore ideas for new formats with anyone who has noticed a gap in what AISC and other AIS research training programs do, and who is considering piloting a new program that takes a complementary angle to AI Safety Camp. Do message me!

AI Safety Camp

Ah, found the other page, and I see it is already put under the category ‘Artificial Intelligence’, under the heading ‘Organizations’:

Thanks for the help getting this sorted, @Plex.

AI Safety Camp

Do you mean as a sub-sub-tag? I think that would be a good idea.

I have looked at the LessWrong tag manager, but still do not know how to do it. Any tips?

(if the idea is to merge it with the Organizations tag, I am biased of course, but there are enough posts tagged AI Safety Camp to warrant it being tagged as a distinguishable organisation)

Productive Mistakes, Not Perfect Answers

Yeah, that points well to what I meant. I appreciate your generous intellectual effort here to paraphrase back!

Sorry about my initially vague and disagreeable comment (aimed at Adam, who I chat with sometimes as a colleague). I was worried about what looks like a default tendency in the AI existential safety community to start from the assumption that problems in alignment are solvable.

Adam has since clarified with me that although he had not written about it in the post, he is very much open to exploring impossibility arguments (and sent me a classic paper on impossibility proofs in distributed computing).

Productive Mistakes, Not Perfect Answers

… making your community and (in this case) the wider world fragile to reality proving you wrong.

Productive Mistakes, Not Perfect Answers

> We don't have any proofs that the approaches the referenced researchers are doomed to fail like we have for P!=NP and what you linked.

Besides looking for different angles or ways to solve alignment, or even for strong arguments/proofs why a particular technique will not solve alignment,
... it seems prudent to also look for whether you can prove embedded misalignment by contradiction (in terms of the inconsistency of the inherent logical relations between essential properties that would need to be defined as part of the concept of embedded/implemented/computed alignment).

This is analogous to the situation Hilbert and others found themselves in when trying to 'solve for' formal systems of mathematics being (presumably) both complete and consistent. Gödel, who was a semi-outsider, instead took the inverse route of proving by contradiction that a formal system expressive enough to encode arithmetic cannot be simultaneously complete and consistent.

If you have an entire community operating under the assumption that a problem is solvable or at least resolving to solve the problem in the hope that it is solvable, it seems epistemically advisable to have at least a few oddballs attempting to prove that the problem is unsolvable. 

Otherwise you end up skewing your entire 'portfolio allocation' of epistemic bets.

Productive Mistakes, Not Perfect Answers

> In the end you do want to solve the problem, obviously. But the road from here to there goes through many seemingly weird and insufficient ideas that are corrected, adapted, refined, often discarded except for a small bit. Alignment is no different, including “strong” alignment.

There is an implicit assumption here that is not covering all the possible outcomes of research progress.

As understanding of some open problems in mathematics and computer science has progressed, they have turned out to be unsolvable. That is a valuable, decision-relevant conclusion. It means it is better to do something else than keep hacking away at solving that maths problem.


We cannot just rely on a can-do attitude, as we can when starting a start-up (where even if there’s something fundamentally wrong about the idea, and it fails, only a few people’s lives are impacted hard).

With ‘solving for’ the alignment of generally intelligent and scalable/replicable machine algorithms, it is different.

This is the extinction of human society and all biological life we are talking about. We need to think this through rationally, and consider all outcomes of our research impartially.

I appreciate the emphasis on diverse conceptual approaches. Please be careful about what you are looking for.

Some blindspots in rationality and effective altruism

This seems to presume that a certain literal interpretation of that text is the only one that could be intended or interpreted. I don't think this is worth discussing further, so I'll leave it at that.