Last summer I delivered a "field report" after attending the Human Level AI multi-conference. In mid-August of this year I attended the Learning-by-doing AI Safety Workshop (LBDAISW? I'll just call it "the workshop" hereafter) and the Technical AI Safety Unconference (TAISU) at the EA Hotel in Blackpool. So in a similar spirit to last year I offer you a field report of some highlights and what I took away from the experience.
I'll break it down into 3 parts: the workshop, TAISU, and the EA Hotel.
The Learning-by-doing workshop was organized by Linda Linsefors and led by Linda and Davide Zagami. The zeroth day (so labeled because it was optional) consisted of talks by Linda and Davide explaining machine learning concepts. Although this day was optional, I found it very informative: machine learning "snuck up" on me by becoming relevant only after I earned my Masters in Computer Science, so a number of gaps have remained in my knowledge of how modern ML works. Having a full day covering the basics, with lots of time for questions and answers, was very beneficial to me, as I think it was for many of the other participants. Most of us had lumpy ML knowledge, so it was worthwhile to get us all on the same footing so we could at least talk coherently in the common language of machine learning. That said, it was optional, and someone already happy with their level of familiarity with ML could easily have skipped it.
The next three days were all about solving AI safety. Rather than loading people up with existing ideas (relevant because some of the participants had not previously thought much about AI safety), Linda asked us to try to solve AI safety afresh. On the first day we did an exercise of imagining different scenarios and how we would address AI safety under each of them. Linda called this "sketching" solutions to AI safety, the goal being to develop one or more sketches of how AI safety might be solved by going directly at the problem. For example, you might start by working through your basic assumptions about how AI would be dangerous, see where those pointed to a need for solutions, then do it again with different assumptions and see where they led you. Once we had done that for a couple of hours, we presented our ideas about how to address AI safety. These ranged from me talking about developing an adequate theory of human values as a necessary subproblem, to others considering multi-agent, value learning, and decision theory subproblems, to more nebulous ideas about "compassionate" AI.
The second day was for filling knowledge gaps. At first it was a little unclear what this would look like—independent study, group study, talks, something else—but we quickly settled on doing a series of talks. We identified several topics people felt they needed to know more about to address AI safety, and then the person who felt they understood that topic best gave a voluntary, impromptu talk on the subject for 30 to 60 minutes. This filled up the day as we talked about decision theory, value learning, mathematical modeling, AI forecasting as it relates to x-risks, and machine learning.
The third and final day was a repeat of the first day: we did the sketching exercise again and then presented our solutions in the afternoon. Other participants may later want to share what they came up with, but I was surprised to find myself drawn to the idea of "compassionate" AI, an idea put forward by two of the least experienced participants. I found it compelling for personal reasons, but as I thought about what it would mean for an AI to be compassionate, I realized that meant it had to act compassionately, and before I knew it I had rederived much of the original reasoning around Friendly AI and found myself reconvinced of the value of doing MIRI-style decision theory research to build safe AI. Neat!
Overall I found the workshop valuable even though I had the most years of experience thinking about AI safety of anyone there (by my count nearly 20). I found it a fun and engaging way to get me to look at problems I've been thinking about for a long time with fresh eyes, and this was especially helped by the inclusion of participants with minimal AI safety experience. I think the workshop would be a valuable use of three days for anyone actively working in AI safety, even if they consider themselves "senior" in the field: it offered a valuable space for reconsidering basic assumptions and rediscovering the reasons why we're doing what we're doing.
TAISU was a 4-day unconference. Linda organized it as two 2-day unconferences held back-to-back, and I think this was a good choice because it forced us to schedule events with greater urgency and allowed us to easily make the second 2 days responsive to what we learned from the first 2. At the start of each 2-day segment, we met to plan out the schedule on a shared calendar where we could pin up events on pieces of paper. There were multiple rooms so multiple events could happen at once, and sessions were a mix of talks, discussions, idea workshops, one-on-ones, and social events. All content was created by and for the participants, with very little of it planned extensively in advance; mostly we just got together, bounced ideas around, and talked about AI safety for 4 days.
Overall TAISU was a lot of fun, and it was mercifully less dense than a typical unconference, meaning there were plenty of breaks, unstructured periods, and times when the conference went single-track. Personally I got a lot out of using it as a space to workshop ideas. I'd hold a discussion period on a topic, people would show up, I'd talk for maybe 15 minutes laying out my idea, and then they'd ask questions and discuss. I found it a great way to make rapid progress on ideas: to get the details aired out, learn about objections and mistakes, and pick up new things that I could take back to evolve my ideas into something better.
One of the ideas I workshopped I think I'm going to drop: AI safety via dialectic, an extension of AI safety via debate. Working out the details helped me better see why I'm not excited about it: I don't think AI safety via debate will work, for very general reasons, and the specific things I thought I could do to improve it by replacing debate with dialectic would not be enough to overcome the weaknesses I see. Another idea was better working out compassionate AI, which further reaffirmed my sense that it is a rederivation of Friendly AI. A third I just posted about: a predictive coding theory of human values.
The EA Hotel
It's a bit hard to decide how much detail to give about the EA Hotel. On the one hand, it was awesome, full stop. On the other, it was awesome for lots of little reasons I could never hope to fully recount. I feel like their website fails to do them justice. It's an awesome place filled with cool people trying their best to save the world. Most of the folks at the Hotel are doing work that is difficult to measure, but spending time with them I can tell they all have a powerful intention to make the world a better place, and to do so in ways that are effective and impactful.
Blackpool is nice in the summer (I hear the weather gets worse at other times of year). The Hotel itself is old and small, but also bigger than you would expect from the outside. Greg and the staff have done a great job renovating and improving the space to make it nice to stay in. Jacob, whom I'll call "the cook" here though he does a lot more, and Deni, the community manager, do a great job of making the EA Hotel feel like a home and bringing the folks in it together. While I was there, it was easy to imagine myself staying for a few months to work on projects without the distraction of a day job.
I hope to be able to visit again, maybe next year for TAISU2!
Disclosure: I showed a draft of this to Linda to verify facts. All mistakes, opinions, and conclusions are my own.