We are excited to announce multiple positions for Research Assistants to join our six-month research project assessing the viability of Developmental Interpretability (DevInterp). 

This is a chance to gain expertise in interpretability, develop your skills as a researcher, build out a network of collaborators and mentors, publish in major conferences, and open a path towards future opportunities, including potential permanent roles, recommendations, and further collaborations.

Background

Developmental interpretability is a research agenda aiming to build tools for detecting, locating, and understanding phase transitions in the learning dynamics of neural networks. It draws on techniques from singular learning theory, mechanistic interpretability, statistical physics, and developmental biology.

Position Details

General info: 

  • Title: Research Assistant. 
  • Location: Remote, with hubs in Melbourne and London. 
  • Duration: Until March 2024 (at minimum).
  • Compensation: The base salary is USD$35k per year, paid at an hourly rate as an independent contractor.

Timeline:

  • Application Deadline: September 15th, 2023
  • Ideal Start Date: October 2023

How to Apply: Complete the application form by the deadline. Further information on the application process will be provided in the form.

Who We Are

The developmental interpretability research team consists of experts across a number of areas of mathematics, physics, statistics, and AI safety. The principal researchers are:

  • Daniel Murfet, mathematician and SLT expert, University of Melbourne.
  • Susan Wei, statistician and SLT expert, University of Melbourne.
  • Jesse Hoogland, MSc in Physics, SERI MATS scholar, RA in the Krueger lab.

We have a range of projects currently underway, each led by one of these principal researchers and involving a number of other PhD and MSc students from the University of Melbourne as well as collaborators from around the world. In an organizational capacity you would also interact with Alexander Oldenziel and Stan van Wingerden.

You can find us and the broader DevInterp research community on our Discord. Beyond the Developmental Interpretability research agenda, you can read our first preprint on scalable SLT invariants and check out the lectures from the SLT & Alignment summit.

Overview of Projects

Here’s a selection of the projects underway, some of which you would be expected to contribute to. These tend to be on the more experimental side:

  • Developing scalable estimates for SLT invariants: Invariants like the (local) learning coefficient and (local) singular fluctuation can signal the presence of “hidden” phase transitions. Improving these techniques can help us better identify these transitions (a rough sketch of one such estimator appears after this list).
  • DevInterp of vision models: To what extent do the kinds of circuits studied in the original circuits thread emerge through phase transitions? 
  • DevInterp of program synthesis: In examples where we know there is rich compositional structure, can we see it in the singularities? Practically, this means studying settings like modular arithmetic (grokking), multitask sparse parity, and more complex variants.
  • DevInterp of in-context learning & induction heads: Is the development of induction heads a proper phase transition in the language of SLT? More ambitiously, can we apply singular learning theory to study in-context learning and make sense of “in-context phase transitions”?
  • DevInterp of language models: Can we detect phase transitions in simple language models (like TinyStories)? Can we, from these transitions, discover circuit structure? Can we extend these techniques to larger models (e.g., in the Pythia suite)?
  • DevInterp of reinforcement learning models: To what extent are phase transitions involved in the emergence of a world model (in examples like OthelloGPT)?
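To make the first project above a bit more concrete, here is a rough, illustrative sketch of one way a local learning coefficient estimator can be set up: run SGLD around a trained parameter w* to sample from a localized, tempered posterior, then compare the average sampled loss to the loss at w*. This is a toy example only; the model, the data, and the hyperparameters (the localization strength gamma, step size eps, and step counts) are placeholder choices, not the settings from our preprint.

```python
# Rough sketch of a WBIC-style local learning coefficient estimator (illustrative only;
# the model, data, and hyperparameters below are placeholders, not our actual settings).
import copy
import math

import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy setup: a small regression model trained to (approximate) convergence at w*.
n = 1024
X = torch.randn(n, 4)
y = torch.sin(X.sum(dim=1, keepdim=True)) + 0.1 * torch.randn(n, 1)
model = nn.Sequential(nn.Linear(4, 16), nn.Tanh(), nn.Linear(16, 1))
loss_fn = nn.MSELoss()

opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(2000):
    opt.zero_grad()
    loss_fn(model(X), y).backward()
    opt.step()

w_star = [p.detach().clone() for p in model.parameters()]
with torch.no_grad():
    L_star = loss_fn(model(X), y).item()  # empirical loss at w*

# SGLD sampling near w* at inverse temperature beta = 1/log(n), with a quadratic
# localization term of strength gamma pulling the chain back towards w*.
beta = 1.0 / math.log(n)
gamma, eps = 100.0, 1e-5          # placeholder localization strength and step size
num_steps, burn_in, batch = 3000, 500, 256

sampler = copy.deepcopy(model)
sampled_losses = []
for step in range(num_steps):
    idx = torch.randint(0, n, (batch,))
    sampler.zero_grad()
    loss_fn(sampler(X[idx]), y[idx]).backward()
    with torch.no_grad():
        for p, p0 in zip(sampler.parameters(), w_star):
            drift = 0.5 * eps * (n * beta * p.grad + gamma * (p - p0))
            p.add_(-drift + math.sqrt(eps) * torch.randn_like(p))
        if step >= burn_in:
            sampled_losses.append(loss_fn(sampler(X), y).item())

# lambda_hat = n * beta * (E_posterior[L_n] - L_n(w*)); noisy on a toy this small.
lambda_hat = n * beta * (sum(sampled_losses) / len(sampled_losses) - L_star)
print(f"estimated local learning coefficient: {lambda_hat:.3f}")
```

Hunting for “hidden” phase transitions then amounts to tracking estimates like this (and related invariants such as the local singular fluctuation) across training checkpoints and looking for sharp changes.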

Alongside these, we are working on a number of more theoretical projects. (Though our focus is on the more applied projects, if one of these particularly excites you, you should definitely apply!)

  • Studying phase transitions in simple models like deep linear networks: These serve as a valuable intermediate case between regular models and highly singular models. This is also a place to draw connections to a lot of existing literature on deep learning theory (e.g., on “saddle-to-saddle dynamics”); a toy sketch appears after this list.
  • Developing “geometry probes”: It’s not enough to detect phase transitions; we also have to be able to analyze them for structural information. Here the aim is to develop “probes” that extract this kind of information from phase transitions.
  • Developing the “geometry of program synthesis”: We expect that understanding neural networks will require looking beyond models of computation like Turing machines, lambda calculus, and linear logic, towards geometric models of computation. This means pushing further in directions like those explored by Tom Waring in his MSc thesis.
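For a flavour of the deep linear network project above, here is a purely illustrative toy (the teacher spectrum, initialization scale, and learning rate are made-up choices): with a small initialization, a two-layer linear network fit to a low-rank teacher typically learns in discrete stages, one per singular mode, which is the “saddle-to-saddle” picture referred to in that item.

```python
# Purely illustrative toy of "saddle-to-saddle" dynamics in a two-layer deep linear
# network (teacher spectrum, initialization scale, and learning rate are made up).
import torch

torch.manual_seed(0)
d, hidden, n = 8, 8, 2048
lr, steps = 0.05, 2500

# Teacher map with a few well-separated singular values, so the stages are visible.
U, _ = torch.linalg.qr(torch.randn(d, d))
V, _ = torch.linalg.qr(torch.randn(d, d))
S = torch.diag(torch.tensor([5.0, 3.0, 1.5, 0.5] + [0.0] * (d - 4)))
teacher = U @ S @ V.T

X = torch.randn(n, d)
Y = X @ teacher.T

# Two-layer linear student, initialized near zero to accentuate the loss plateaus.
W1 = (1e-4 * torch.randn(hidden, d)).requires_grad_(True)
W2 = (1e-4 * torch.randn(d, hidden)).requires_grad_(True)

for step in range(steps):
    loss = ((X @ W1.T @ W2.T - Y) ** 2).mean()
    loss.backward()
    with torch.no_grad():
        for W in (W1, W2):
            W -= lr * W.grad
            W.grad.zero_()
    if step % 100 == 0:
        # The loss typically drops in discrete stages, one per singular mode.
        print(f"step {step:5d}  loss {loss.item():.4f}")
```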

Taken together, these projects complete the scoping phase of the DevInterp research agenda, ideally resulting in publications in venues like ICML and NeurIPS.

What We Expect

You will be communicating about your research on the DevInterp Discord, writing code and training models, attending research meetings over Zoom, and in general acting as a productive contributor to a fast-moving research team of theoreticians and experimentalists working together to define the future of the science of interpretability.

Depending on interest and background, you may also be reading and discussing papers from ML or mathematics and contributing to the writing of papers on Overleaf. It’s not mandatory, but you would also be invited to join virtual research seminars like the SLT seminar at metauni or the SLT reading group on the DevInterp Discord.

There will be a DevInterp conference in November 2023 in Oxford, United Kingdom, and it would be great if you could attend. There will hopefully be a second opportunity to meet the team in person between November and the end of the employment period (possibly in Melbourne, Australia).

FAQ

Who is this for? 

We’re looking mainly for people who can do engineering work, that is, people with software development and ML skills. It’s not necessary to have a background in interpretability or AI safety, although that’s a plus. Ideally, you have legible output or projects that demonstrate your ability as an experimentalist.

What’s the time commitment?

We’re looking mainly for people who can commit full-time, but if you’re talented and only available part-time, don’t shy away from applying.

What does the compensation mean?

We’ve budgeted USD$70k in total to be spread across 1-4 research assistants over the next half year. By default we’re expecting to pay RAs USD$17.50/hour. 

Do I need to be familiar with SLT and AI alignment? 

No (though it’s obviously a plus).

We’re leaning towards taking on skilled general-purpose experimentalists (without any knowledge of SLT) over less experienced programmers who know some SLT. That said, if you are a talented theorist, don’t shy away from applying.

What about compute?

In the current phase of the research agenda, the projects are not extremely compute-intensive; the necessary cloud GPU access will be provided.

What are you waiting for? 

Apply now.

EDIT: The applications have closed. Take a look at this comment for a summary and feedback.

Comments

Now that the deadline has arrived, I wanted to share some general feedback for the applicants and some general impressions for everyone in the space about the job market:

  • My number one recommendation for everyone is to work on more legible projects and outputs. A super low-hanging fruit for >50% of the applications would be to clean up your GitHub profiles or to create a personal site. Make it really clear to us which projects you're proud of, so we don't have to navigate through a bunch of old and out-of-use repos from classes you took years ago. We don't have much time to spend on every individual application, so you want to make it really easy for us to become interested in you. I realize most people don't even know how to create a GitHub profile page, so check out this guide.
  • We got 70 responses and will send out 10 invitations for interviews. 
  • We rejected a reasonable number of decent candidates outright because they were looking for part-time work. If this is you, don't feel dissuaded.
  • There were quite a few really bad applications (...as always): poor punctuation/capitalization, much too informal, not answering the questions, totally unrelated background, etc. Two suggestions: (1) If you're the kind of person who is trying to application-max, make sure you actually fill in the application. A shitty application is actually worse than no application, and I don't know why I have to say that. (2) If English is not your first language, run your answers through ChatGPT. GPT-3.5 is free. (Actually, this advice is for everyone). 
  • Between 5 and 10 people expressed interest in an internship option. We're going to think about this some more. If this includes you, and you didn't mention it in your application, please reach out. 
  • Quite a few people came from a data science / analytics background. Using ML techniques is actually pretty different from researching ML techniques, so for many of these people I'd recommend you work on some kind of project in interpretability or related areas to demonstrate that you're well-suited to this kind of research. 
  • Remember that job applications are always noisy.  We almost certainly made mistakes, so don't feel discouraged!

I would apply if the rate were more like USD $50/hour, considering the high cost of living in Berkeley. As it is, this would be something like a 76% pay cut compared to the minimum ML engineer pay at Google and below typical grad student pay, and would require an uncommon level of frugality.

Hey Thomas, I wrote about our reasoning for this in response to Winston:

All in all, we're expecting most of our hires to come from outside the US where the cost of living is substantially lower. If lower wages are a deal-breaker for anyone but you're still interested in this kind of work, please flag this in the form. The application should be low-effort enough that it's still worth applying.

Hello, I agree with Jesse, as the budget they have is really good for hiring capable alignment researchers here in Asia (I'm currently based in Chiang Mai, Thailand) or any other place where the cost of living is extremely low compared to the West.

Good luck on this project, team DevInterp.

More than a 76% pay cut, because a lot of the compensation at Google is equity+bonus+benefits; the $133k minimum listed at your link is just base salary.

Always welcome more optionality in the opportunity space!

Suggestion: Potential improvement in narrative signalling by lowering the number of RAs to hire (thus increasing pay):

  • If I were applying to this, I'd feel confused and slightly underappreciated if I had the right set of ML/Software Engineering skills but were paid barely subsistence level for my full-time work (in NY).
  • It seems like the funding amount corresponds to the size of the grant. I am rather naive when it comes to how much ML/engineering talent should be paid in pursuit of alignment goals. But it seems like $70k spread across 4 people at full-time (for half a year each) is only slightly above minimum wage in many places.
  • Comparisons: At $35k a year, it seems considerably lower than the industry equivalent, even when compared to other programs:
    • Ex: Lightcone has a general policy of paying ~70% of market pay for equivalent talent.
      • Recalling from memory of LTFF/similar grants, experienced researchers were granted $70k to $200k for their individual research.
      • A quick glance at the 80k job board for RAs nets me a range of $32,332 to $87,000.
  • Of course... money is tight: The grant constraint is well acknowledged here. But the number of RAs you expect to hire could potentially be adjusted downwards, while increasing the submission rate of candidates who truly fit the requirements of the research program.
  • Hiring more might not work as intended: It might come as a surprise, but having fewer people to manage can turn out to be a blessing rather than a curse; hiring one's way out of a problem is tempting but should usually be tempered with caution.
  • Things I might have got wrong:
    • The intended audience, and the people with the right skill set, will not be in countries where this salary is barely above subsistence level.
    • The Research Project has decided that people who possess an instinct for "I'd like to work here, but please give me X because I think I am worth that much and can offer at least that much value" are generally a poor fit for the project.
    • A misunderstanding of the salary distribution and the rate of deterioration of the current financial climate within the scene.

Overall, I am glad y'all exist! Good luck :)

Hey Winston, thanks for writing this out. This is something we talked a lot about internally. Here are a few thoughts: 

Comparisons: At $35k a year, it seems considerably lower than the industry equivalent, even when compared to other programs:

I think the more relevant comparison is academia, not industry. In academia, $35k is (unfortunately) well within the normal range for RAs and PhD students. This is especially true outside the US, where wages are easily 2x - 4x lower.

Often academics justify this on the grounds that you're receiving more than just monetary benefits: you're receiving mentorship and training. We think the same will be true for these positions. 

The actual reason is that you have to be somewhat crazy to even want to go into research. We're looking for somewhat crazy.

If I were applying to this, I'd feel confused and slightly underappreciated if I had the right set of ML/Software Engineering skills but were paid barely subsistence level for my full-time work (in NY).

If it helps, we're paying ourselves even less. As much as we'd like to pay the RAs (and ourselves) more, we have to work with what we have.  

Of course... money is tight: The grant constraint is well acknowledged here. But the number of RAs you expect to hire could potentially be adjusted downwards, while increasing the submission rate of candidates who truly fit the requirements of the research program.

For exceptional talent, we're willing to pay higher wages. 

The important thing is that both funding and open positions are exceptionally scarce. We expect there to be enough strong candidates who are willing to take the pay cut.

All in all, we're expecting most of our hires to come from outside the US where the cost of living is substantially lower. If lower wages are a deal-breaker for anyone but you're still interested in this kind of work, please flag this in the form. The application should be low-effort enough that it's still worth applying. 

Often academics justify this on the grounds that you're receiving more than just monetary benefits: you're receiving mentorship and training. We think the same will be true for these positions. 

 

I don't buy this. I'm actually going through the process of getting a PhD at ~40k USD per year, and one of the main reasons I'm sticking with it is that afterwards I'll have a solid credential that's recognized worldwide, backed by a recognizable name (i.e., my university and my supervisor). You can't provide either of those things.

This offer seems to take the worst of both worlds between academia and industry, but if you actually find someone good at this rate, good for you, I suppose.

Would any of the involved parties be interested in having a fireside chat for AI Safety Australia and New Zealand about developmental interpretability and this position a few days before the application closes?

If so, please feel free to PM me.

Could you clarify if the rate is in AUD or USD?

It's in USD (this should be reflected in the announcement now).