I've been an assistant professor (equivalent) for ~1 year now at Cambridge.  Shortly after accepting the position, I wrote "AI x-risk reduction: why I chose academia over industry".

Since then, I've had a lot of conversations about academia vs. industry with people getting into AI x-safety (e.g. considering applying for PhDs).  This post summarizes those conversations and describes a few other updates from my experience over the last 1.5 years.

Summary of recent conversations:

  • Most people haven't read my previous post, so I point them to it.
  • A common attitude is: "Industry seems better, WTF would you do academia right now?" 
  • A perhaps equally common attitude is: "I am new and have no strong opinions, just trying to figure out what people think."
  • The main reasons I hear for industry over academia are:
    • Short timelines
    • Need for access to foundation models
    • Academic incentives to work on topics less relevant to x-safety. 
  • I think these are all valid points, but there are countervailing considerations.  My response to these 3 points has generally been something like:
    • Academia still seems like the best option ATM for rapidly training and credentialing people.  Even under fairly short timelines, this seems likely to be more valuable than direct work.  Furthermore, it is a mistake to simply focus efforts on whatever timelines seem most likely; one should also consider tractability and neglectedness of strategies that target different timelines.  It seems plausible that we are just screwed on short timelines, and somewhat longer timelines are more tractable.  Also, people seem to be making this mistake a lot and thus short timelines seem potentially less neglected.
    • There are and will be open-source foundation models.  Organizations like Anthropic seem keen on collaborating and providing access (although we haven't yet pitched them anything concrete).  It's not clear that you have that much better access when you are working at one of these orgs (I'd be curious what people at the orgs think about this one!): my impression is that it is still clunky to do a lot of things with a large model when you are at an org, and things like retraining are obviously very expensive; these orgs seem to also favor large group projects, which I assume are directed by leadership; so in practice, there might be less difference between being entry-level at an org vs. being in academia.
    • The incentives are real, and may be worth playing into somewhat if you want to get a faculty job.  However, it is becoming easier and easier to work on safety in academia; safety topics are going mainstream and being published at top conferences.  Work that is still outside the academic Overton window can be brought into academia if it can be approached with the technical rigor of academia, and work that meets academic standards is much more valuable than work that doesn't; this is both because it can be picked up by the ML community, and because it's much harder to tell if you are making meaningful progress if your work doesn't meet these standards of rigor.  There's also a good chance that within ~5 years safety topics will be mainstream enough that focusing on safety will be a good career move for an academic.  Also, depending on your advisor, you can basically have total freedom in academia (and funding tends to give freedom).  Finally, the incentives outside of academia are not great either: for-profit orgs are incentivized to build stuff, whether it's a good idea or not; and because LW/AF do not have established standards of rigor like ML, they end up operating more like a less-functional social science field, where (I've heard) trends, personality, and celebrity play an outsized role in determining which research is valorized by the field.

Other updates:

  • Overall, I'm enjoying being a professor tremendously!  
  • In particular, it's been way better than being a grad student, and I have been reflecting a bit on how tough it was to be doing my graduate studies somewhere where people didn't really understand or care about AI x-safety.  I think this is a very important consideration for anyone thinking about starting a grad program, or going to work somewhere where they will not have colleagues interested in AI x-safety.   I suggested to a few people planning on starting grad school that they should try and coordinate so that they end up in the same place(s).
  • Teaching has been less work than expected; other duties (especially grading/marking) have been more work than expected.  Overall, the amount of non-research work I have to do is about what I expected so far, but there have been a few more admin headaches than expected.  I'm intending to get more help with that from a PA or lab manager.
  • I've enjoyed having students to do research with about as much as I thought I would.  It's great!
  • There's a policy I wasn't aware of that I can't take more than 8 PhD students at once.  There are ways around it, but this is perhaps my main complaint so far.
  • I have not been funding-bottlenecked, and don't expect to be anytime soon.
  • As mentioned above, orgs seem keen on granting access to foundation models to academic collaborators.
  • Several relatively senior people from the broader ML community have approached me with their concerns about AI x-safety.  Overall, I increasingly have the impression that the broader ML community is on the cusp of starting to take this seriously, but doesn't know what to do about it.  I'm of the opinion that nobody really knows what to do about it; I think most things people in the AI x-safety community do are reasonable, but none of them look that promising.  I would characterize these ML researchers as rightfully skeptical of solutions proposed by the AI x-safety community (while coming up with some similar ideas, e.g. things along the lines of scalable oversight), confused about why the community focuses on the particular set of technical problems it has, skeptical that technical work will solve the problem, and ignorant of the AI x-safety literature.  Any scalable approach to the following would be extremely valuable: i) creating common knowledge that ML researchers are increasingly worried, ii) creating good ways for them to catch up on the AI x-safety literature, and/or iii) soliciting novel ideas from them.
  • EtA: I'll add more stuff below as I think of it...
  • One thing which has made me reconsider academia is the large amount of funding available at present: it seems worth thinking about how to spend on the order of $10m+/year, whereas I'm estimating I'll be spending more like $1m/year as faculty.
  • I've been increasingly keen on working with foundation models, and this hasn't happened as much as I would like.  Some possible reasons, and limitations of the OpenAI API, are listed here: https://docs.google.com/document/d/18eqLciwWTnuxbNZ28eLEle34OoKCcqqF0OfZjy3DlFs/edit?usp=sharing
  • I didn't originally consider non-tenure-track (TT) jobs, but they have significant appeal: at least at Cambridge, you can be non-TT and still be a principal investigator (PI), meaning you can supervise students and manage a research group.  But you don't have to teach, and may have fewer admin duties as well.  The only downsides I can think of are less prestige and less job security.  I think having reliable external funding probably helps a lot with job security.

Comments:

Rohin Shah:

> (I'd be curious what people at the orgs think about this one!)

For DeepMind:

> my impression is that it is still clunky to do a lot of things with a large model when you are at an org

Mostly false: it's clunky inasmuch as working with large models is generally clunky (e.g. they may not fit on a single device), but even then, for common use cases, you can use solutions that other people have already written.

> things like retraining are obviously very expensive

True

> these orgs seem to also favor large group projects

True (relative to academia)

> which I assume are directed by leadership

Mostly false

> because LW/AF do not have established standards of rigor like ML, they end up operating more like a less-functional social science field, where (I've heard) trends, personality, and celebrity play an outsized role in determining which research is valorized by the field.

In addition, the AI x-safety field is now rapidly expanding. 
There is a huge amount of status to be collected by publishing quickly and claiming large contributions.

In the absence of rigor and metrics, the incentives are towards:
- setting new research directions and inventing cool new terminology;
- using mathematics in a way that impresses, but is too low-level to yield a useful claim;
- or, conversely, relying too much on complex philosophical insights without empirical work;
- getting approval from alignment research insiders.

See also the now-ancient "Troubling Trends in Machine Learning Scholarship".
I expect the LW/AF community microcosm will soon reproduce many of those failures.

On the other hand, the current community believes that getting AI x-safety right is the most important research question of all time. Most people would not publish something just for their career advancement, if it meant sucking oxygen from more promising research directions.

This might be a mitigating factor for my comment above. I am curious about what happened in research fields which had "change/save the world" vibes. Was environmental science immune to similar issues?

I actually agree that empirical work generally outperforms theoretical or philosophical work, but in that tweet thread I question why he suggests the Turing Test relates at all to x-risk.

> Work that is still outside the academic Overton window can be brought into academia if it can be approached with the technical rigor of academia, and work that meets academic standards is much more valuable than work that doesn't; this is both because it can be picked up by the ML community, and because it's much harder to tell if you are making meaningful progress if your work doesn't meet these standards of rigor.

Strong agreement with this! I'm frequently told by people that you "cannot publish" in a certain area, but in my experience this is rarely true. Rather, you have to put more work into communicating your idea, and justifying the claims you make -- both a valuable exercise! Of course you'll have a harder time publishing than you would on something that people immediately understand -- but people do respect novel and interesting work, so done well I think it's much better for your career than one might naively expect.

I especially wish there was more emphasis on rigor on the Alignment Forum and elsewhere: it can be valuable to do early-stage work that's more sloppy (rigor is slow and expensive), but when there's long-standing disagreements it's usually better to start formalizing things or performing empirical work than continuing to opine.

That said, I do think academia has some systemic blindspots. For one, I think CS is too dismissive of speculative and conceptual research -- admittedly, much of this work will end up being mistaken, but it's an invaluable source of ideas. I also think there's too much emphasis on an "algorithmic contribution" in ML, which leads to undervaluing careful empirical evaluations and understanding the failure modes of existing systems.

Presumably "too dismissive of speculative and conceptual research" is a direct consequence of increased emphasis on rigor. Rigor is to be preferred all else being equal, but all else is not equal.

It's not clear to me how we can encourage rigor where effective without discouraging research on areas where rigor isn't currently practical. If anyone has ideas on this, I'd be very interested.

I note that within rigorous fields, the downsides of rigor are not obvious: we can point to all the progress made; progress that wasn't made due to the neglect of conceptual/speculative research is invisible. (has the impact of various research/publication norms ever been studied?)

Further, it seems limiting only to consider [must always be rigorous (in publications)] vs [no demand for rigor]. How about [50% of your publications must be rigorous] (and no incentive to maximise %-of-rigorous-publications), or any other not-all-or-nothing approach?

I'd contrast rigor with clarity here. Clarity is almost always a plus.
I'd guess that the issue in social science fields isn't a lack of rigor, but rather of clarity. Sometimes clarity without rigor may be unlikely, e.g. where there's a lot of confusion or lack of good faith - in such cases an expectation of rigor may help. I don't think this situation is universal.

What we'd want on LW/AF is a standard of clarity.
Rigor is an often useful proxy. We should be careful when incentivizing proxies.

I think rigor and clarity are more similar than you indicate.  I mostly think of rigor as either (i) formal definitions and proofs, or (ii) experiments well described, executed, and interpreted.  I think it's genuinely hard to reach a high level of clarity about many things without (i) or (ii).  For instance, people argue about "optimization", but without referencing (hypothetical) detailed experiments or formal notions, those arguments just won't be very clear; experimental or definitional details just matter a lot, and this is very often the case in AI.  LW has historically endorsed a bunch of arguments that are basically just wrong because they have a crucial reliance on unstated assumptions (e.g. AIs will be "rational agents"), and ML looks at this and concludes people on LW are at the peak of "mount stupid".

Linch:

Minor, but Dunning-Kruger neither claims to detect a "Mount Stupid" effect, nor (probably) is the study sufficiently powered to detect one.

Very good to know!  I guess in the context of my comment it doesn't matter as much because I only talk about others' perception.

I think I would support Joe's view here that clarity and rigour are significantly different... but maybe - David - your comments are supposed to be specific to alignment work? e.g. I can think of plenty of times I have read books or articles in other areas and fields that contain zero formal definitions, proofs, or experiments but are obviously "clear", well-explained, well-argued etc. So by your definitions is that not a useful and widespread form of rigour-less clarity? (One that we would want to 'allow' in alignment work?) Or would you instead maintain that such writing can't ever really be clear without proofs or experiments?

I tend to think one issue is more that it's really hard to do well (clear, useful, conceptual writing, that is) and that many of the people trying to do it in alignment are inexperienced at it (and often have backgrounds in fields where things like proofs or experiments are the norm).

(To be clear, I think a lot of these arguments are pointing at important intuitions, and can be "rescued" via appropriate formalizations and rigorous technical work).

Mostly I agree with this.
I have more thoughts, but probably better to put them in a top-level post - largely because I think this is important and would be interested to get more input on a good balance.

A few thoughts on LW endorsing invalid arguments:
I'd want to separate considerations of impact on [LW as collective epistemic process] from [LW as outreach to ML researchers]. E.g. it doesn't necessarily seem much of a problem for the former to rely on unstated assumptions. I wouldn't formally specify an idea before sketching it, and it's not clear to me that there's anything wrong with collective sketching (so long as we know we're sketching - and this part could certainly be improved).
I'd first want to optimize the epistemic process, and then worry about the looking foolish part. (granted that there are instrumental reasons not to look foolish)

On ML's view, are you mainly thinking of people who may do research on an important x-safety sub-problem without necessarily buying x-risk arguments? It seems unlikely to me that anyone gets persuaded of x-risk from the bottom up, whether or not the paper/post in question is rigorous - but perhaps this isn't required for a lot of useful research?

> I'd want to separate considerations of impact on [LW as collective epistemic process] from [LW as outreach to ML researchers]

Yeah, I put those in one sentence in my comment, but I agree that they are two separate points.

RE impact on the ML community: I wasn't thinking about anything in particular; I just think the ML community should have more respect for LW/x-safety, and stuff like that doesn't help.

> It's not clear to me how we can encourage rigor where effective without discouraging research on areas where rigor isn't currently practical. If anyone has ideas on this, I'd be very interested.


A rough heuristic I have is that if the idea you're introducing is highly novel, it's OK to not be rigorous. Your contribution is bringing this new, potentially very promising, idea to people's attention. You're seeking feedback on how promising it really is and where people are confused, which will be helpful for later formalizing it and studying it more rigorously.

But if you're engaging with a large existing literature and everyone seems to be confused and talking past each other (which is how I'd characterize a significant fraction of the mesa-optimization literature, for example) -- then the time has come to make things more rigorous, and you are unlikely to make much further progress without it.

I think part of this has to do with growing pains in the LW/AF community... When it was smaller it was more like an ongoing discussion with a few people and signal-to-noise wasn't as important, etc. 

Agree RE systemic blindspots, although the "algorithmic contribution" thing is sort of a known issue that a lot of senior people disagree with, IME.

As someone who has been feeling increasingly skeptical of working in academia, I really appreciate this post and the discussion on it for challenging some of my thinking here.

I do want to respond especially to this part though, which seems cruxy to me:

> Furthermore, it is a mistake to simply focus efforts on whatever timelines seem most likely; one should also consider tractability and neglectedness of strategies that target different timelines. It seems plausible that we are just screwed on short timelines, and somewhat longer timelines are more tractable. Also, people seem to be making this mistake a lot and thus short timelines seem potentially less neglected.

I suspect this argument pushes in the other direction. On longer timelines the amount of effort which will eventually get put toward the problem is much greater. If the community continues to grow at the current pace, then 20-year-timeline worlds might end up seeing almost 1000x as much total effort put toward the problem as 5-year-timeline worlds. So neglectedness considerations might tell us that impacts on 5-year-timeline worlds are 1000x more important than impacts on 20-year-timeline worlds. This is of course mitigated by the potential for your actions to accrue more positive knock-on effects over 20 years; for instance, very effective field-building efforts could probably overcome this neglectedness penalty in some cases. But in terms of direct impacts on different timeline scenarios this seems like a very strong effect.
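To make the growth arithmetic concrete, here's a minimal back-of-the-envelope sketch in Python. The ~60%/year growth rate is an illustrative assumption of mine (the comment doesn't give a figure); at roughly that rate, the cumulative-effort ratio between 20-year and 5-year worlds does come out near 1000x.

```python
# Back-of-the-envelope: cumulative x-safety effort under exponential field growth.
# The 60%/year growth rate is an assumed, illustrative figure, not a measured one.

def cumulative_effort(years: int, annual_growth: float = 0.6) -> float:
    """Total effort accumulated if the field grows by `annual_growth` each year,
    starting from 1 unit of effort in year 0."""
    return sum((1 + annual_growth) ** t for t in range(years))

effort_5y = cumulative_effort(5)    # total effort in a 5-year-timeline world
effort_20y = cumulative_effort(20)  # total effort in a 20-year-timeline world

print(f"5-year world:  {effort_5y:.0f} units")
print(f"20-year world: {effort_20y:.0f} units")
print(f"ratio: {effort_20y / effort_5y:.0f}x")  # ~1300x at 60%/year growth
```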

On the tractability point, I suspect you need some overly confident model of how difficult alignment turns out to be for this to overcome the neglectedness penalty. E.g. Owen Cotton-Barratt suggests here using a log-uniform prior for the difficulty of unknown problems, which (unless you think alignment success in short timelines is essentially impossible) would indicate that tractability is constant. Using a less crude approximation we might use something like a log-normal distribution for the difficulty of solving alignment, where we see overall decreasing returns to effort unless you have extremely low variance (implying you know almost exactly which OOM of effort is enough to solve alignment) or extremely low probability of success by default (<< 1%).
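For intuition, here is a minimal sketch of what those two priors imply about P(success) as a function of total effort. All parameter values (the difficulty range, mu, sigma) are illustrative assumptions, not numbers from Owen's writeup.

```python
# Sketch: P(alignment solved | total effort E) under two priors on the
# difficulty D, i.e. the total effort required. All parameters are illustrative.
import math

def p_success_loguniform(effort: float, d_min: float = 1.0, d_max: float = 1e6) -> float:
    """Log-uniform prior on D over [d_min, d_max]: each doubling of total
    effort buys the same probability increment, i.e. constant tractability."""
    frac = (math.log(effort) - math.log(d_min)) / (math.log(d_max) - math.log(d_min))
    return min(max(frac, 0.0), 1.0)

def p_success_lognormal(effort: float, mu: float = math.log(1e3), sigma: float = 2.0) -> float:
    """Log-normal prior on D: once effort passes the median difficulty, each
    further doubling buys less, i.e. diminishing returns (unless sigma is tiny
    or the baseline probability of success is very low)."""
    z = (math.log(effort) - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF

for effort in (10, 100, 1_000, 10_000, 100_000):
    print(effort, round(p_success_loguniform(effort), 3), round(p_success_lognormal(effort), 3))
```

With these assumed numbers, each 10x of effort adds a constant ~0.17 to P(success) under the log-uniform prior, while the log-normal prior's increments shrink once effort passes the median difficulty.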

Overall my current guess is that tractability/neglectedness pushes toward working on short timelines, and gives a penalty to delayed impact of perhaps 10x per decade (20x penalty from neglectedness, compensated by a 2x increase in tractability). 

If you think that neglectedness/tractability overall pushes toward targeting impact toward long timelines then I'd be curious to see that spelled out more clearly (e.g. as a distribution over the difficulty of solving alignment that implies some domain of increasing returns to effort, or some alternative way to model this). This seems very important if true.  

JanB:

I had independently thought that this is one of the main parts where I disagree with the post, and wanted to write up a very similar comment to yours. My best guess would have been maybe 3-5x per decade, but 10x doesn't seem crazy. Highly relevant link: https://www.fhi.ox.ac.uk/wp-content/uploads/Allocating-risk-mitigation.pdf