You are faster than me; it took me a few years to realize I should probably be doing something different with my life than just proving math theorems. I guess talking with my advisors was easier because they don't have any domain knowledge in AI, so they didn't have any real opinion about whether AI safety is important. Anyway, I am doing a summer internship at MIRI, so maybe I will see you around.
Awesome story!
This idea is cool, but it's probably secretly terrible. I have limited familiarity with the field and came up with it after literally twenty minutes of thinking? My priors say that it's either already been done, or that it's obviously flawed.
Related post on "obvious" ideas.
Clearly, the outside view is that most graduate students who have this kind of professional disagreement with an advisor are mistaken and later regretful [5].
Is it really? Geoff Hinton thinks the future depends on some graduate student who is deeply suspicious of everything he says.
So this week's advice is obvious advice, but useful nonetheless: find a way to gain a reflex to actually do all the obvious preparation, before undertaking a new task or making a big decision.
I've read (and apparently internalized) this, but I forgot to cite it! Yet another post illustrating why Nate is my favorite author when it comes to instrumental rationality.
It's hard to believe how much I've grown and grown up in these last few months, and how nearly every change was born of deliberate application of the Sequences.
I didn't sacrifice my grades, work performance, physical health, or my social life to do this. I sacrificed something else.
CHAI For At Least Five Minutes
January-Trout had finished the Sequences and was curious about getting involved with AI safety. Not soon, of course - at the time, I had a narrative in which I had to labor and study for long years before becoming worthy. To be sure, I would never endorse such a narrative - Something to Protect, after all - but I had it.
I came across several openings, including a summer internship at Berkeley's Center for Human-Compatible AI. Unfortunately, the posting indicated that applicants should have a strong mathematical background (uh) and that a research proposal would be required (having come to terms with the problem mere weeks before, I had yet to read a single result in AI safety).
I opened Concrete Problems in AI Safety, saw 29 pages of reading, had less than 29 pages of ego to deplete, and sat down.
At that moment, I lost all respect for these problems and set myself to work on the one I found most interesting. I felt the contours of the challenge take shape in my mind, sensing murky uncertainties and slight tugs of intuition. I concentrated, compressed, and compacted my understanding until I realized what success would actually look like. The idea then followed trivially [2].
Reaching the porch of my home, I turned to the sky made iridescent by the setting sun.
Skepticism
Terrified that this idea would become my baby, I immediately plotted its murder. Starting from the premise that it was insufficient even for short-term applications (not even in the limit), I tried to break it with all the viciousness I could muster. Not trusting my mind to judge sans rose-color, I coded and conducted experiments; the results supported my idea.
I was still suspicious, and from this suspicion came many an insight; from these insights, newfound invigoration. Being the first to view the world in a certain way isn't just a rush - it's pure joie de vivre.
Risk Tolerance
I'm taking an Uber with Anna Salamon back to her residence, and we're discussing my preparations for technical work in AI safety. With one question, she changes the trajectory of my professional life.
There's the question I dare not pose, hanging exposed in the air. It scares me. I acknowledge a potential status quo bias, but express uncertainty about my ability to do anything about it. To be sure, that work is important and conducted by good people whom I respect. But it isn't right for me.
We reach her house and part ways; I now find myself in an unfamiliar Berkeley neighborhood, the darkness and rain pressing down on me. There's barely a bar of reception on my phone, and Lyft won't take my credit card. I just want to get back to the CFAR house. I calm my nerves (really, would Anna live somewhere dangerous?), absent-mindedly searching for transportation as I reflect. In hindsight, I felt a distinct sense of avoiding-looking-at-the-problem, but I was not yet strong enough to admit even that.
A week later, I get around to goal factoring and internal double cruxing this dilemma.
I realize that I'm out of alignment with what I truly want - and will continue to be for four years if I do nothing. On the other hand, my advisor disagrees about the importance of preparing safety measures for more advanced agents, and I suspect that they would be unlikely to support a change of research areas. I also don't want to just abandon my current lab.
Soon after, I receive CHAI's acceptance email, surprise and elation washing over me. I feel uneasy; it's very easy to be reckless in this kind of situation.
Information Gathering
I knew the importance of navigating this situation optimally, so I worked to use every resource at my disposal. There were complex political and interpersonal dynamics at play; although I considered myself competent in handling such considerations, I wanted to avoid even a single preventable error.
I contacted friends on the CFAR staff, interfaced with my university's confidential resources, and reached out to contacts I had made in the rationality community. I posted to the CFAR alumni Google group, receiving input from AI safety researchers around the world, both at universities and at organizations like FLI and MIRI [4].
Gears Integrity
At the reader's remove, this choice may seem easy. Obviously, I meet with my advisor (whom I still admire, despite this specific disagreement), tell them what I want to pursue, and then make the transition.
Sure, gears-level models take precedence over expert opinion [3]. I have a detailed model of why AI safety is important; if I listen carefully and then verify the model's integrity against the expert's objections, I should have no compunctions about acting.
I noticed a yawning gulf between privately disagreeing with an expert, disagreeing with an expert in person, and disagreeing with an expert in person in a way that sets back my career if I'm wrong. Clearly, the outside view is that most graduate students who have this kind of professional disagreement with an advisor are mistaken and later regretful [5]. Yet, argument screens off authority, and...
Fin
Many harrowing days and nights later, we arrive at the present, concluding this chapter of my story. This summer, I will be collaborating with CHAI, working under Dylan Hadfield-Menell and my new advisor to extend both Inverse Reward Design and Whitelist Learning (the latter being my proposal to CHAI; I plan to make a top-level post in the near future) [6].
Forwards
I sacrificed some of my tethering to the social web, working my way free of irrelevant external considerations, affirming to myself that I will look out for my interests. When I first made that affirmation, I felt a palpable sense of relief. Truly, if we examine our lives with seriousness, what pressures and expectations bind us to arbitrary social scripts, to arbitrary identities - to arbitrary lives?
[1] My secret to being able to continuously soak up math is that I enjoy it. However, it wasn't immediately obvious that this would be the case, and only the intensity of my desire to step up actually got me to start studying. Only then, after occupying myself in earnest with those pages of Greek glyphs, did I realize that it's fun.
[2] This event marked my discovery of the mental movement detailed in How to Dissolve It; it has since paid further dividends in both novel ideas and clarity of thought.
[3] I've since updated away from this being true for humans in practice, but I felt it would be dishonest to edit my thought process after the fact.
Additionally, I did not fit any aspect of this story to the Sequences post factum; every reference was explicitly considered at the time (e.g., remembering that specific post on how people don't usually give a serious effort even when everything may be at stake).
[4] I am so thankful to everyone who gave me advice. Summarizing for future readers:
If you're navigating this situation, are interested in AI safety but want some direction, or are looking for a community to work with, please feel free to contact me.
[5] I'd like to emphasize that support for AI safety research is quickly becoming more mainstream in the professional AI community, and may soon become the majority position (if it is not already).
Even though ideas are best judged by their merits and not by their popular support, it can be emotionally important in these situations to remember that if you are concerned, you are not on the fringe. For example, 1,273 AI researchers have publicly declared their support for the Future of Life Institute's AI principles.
[6] Objectives are subject to change.