I’m a 17 y/o who started doing transformer interpretability work around October 2021, mainly for the learning experience, but also with the goal of potentially finding something cool and interesting about transformers. I’m writing this post to consolidate some of the lessons I’ve learned since then, and I hope some of these ideas will be useful to other early-career people hoping to do ML research. If you think any of this advice is wrong or misleading, please comment!

Getting Good Research Intuitions

For the majority of the past couple of months, my work was exploratory: reading papers, talking to other researchers in the field, reimplementing things, messing around with other people’s codebases, and trying weird experiment ideas. Most of this work was done to build good research intuitions: how do I set my priors such that I have a) good ideas about what’s interesting, b) principled mental models of [insert research topic] (transformer internals, in my case), and c) reasonable predictions for my hypotheses and experiments?

Recognizing Promising Topics & Ideas

The “good idea for what’s interesting” part is, I think, the most important component of good research intuitions. It covers both knowing which topics are interesting and knowing which experiment ideas and hypotheses are promising.

Fortunately, the first part (knowing which areas of research are promising) shouldn’t be too hard if your field of interest is big enough. I think the standard advice works here: 1) figure out who the important people in the field are, 2) figure out what they think is interesting, and 3) figure out why. That, plus an understanding of the importance, tractability, and neglectedness of each topic, should be enough information to make a reasoned decision about which topic to work on. But I don’t think I would recommend going that deep on deciding which topics are interesting. What’s probably faster and better anyway is, after getting a basic overview of the field, to just pick the topic that you vibe with the most and double down on it for a while. Obsessive interest is a powerful indicator of genius (the bus ticket theory of genius), and you’ll work harder and better on things you feel a strong internal compulsion towards.

The second part (knowing which experiment ideas are promising) is a lot trickier. I think this ability mainly comes down to practice, experience, and time. And, of course, practice and experience involve having ideas in the first place! It’s never too early to have ideas, and even though your initial ideas will probably be bad, that’s part of the feedback loop. Have a lot of ideas! Make a lot of predictions! Think about how to test those predictions! After doing this a lot and getting feedback (through mentorship, experiments, reading, etc.), your intuitions will gradually be tuned towards better and more interesting ideas.

Also, a really nice thing about doing this kind of thinking before you “learn” the field is that often the answers already exist, so you can get feedback on your ideas and mental models much more quickly. Instead of having to design and run an entire experiment, a simple Google search or email might be all you need.

Principled Mental Models 

I think I’m still pretty far from having really principled mental models of how modern deep learning works. The steps that I know to gain these mental models are the standard a) learn all the math very well (3b1b is gold for the basic calc & linear algebra intuitions), b) absorb the mental models and intuitions of more established researchers, and c) spend a lot more time doing research. 

Reasonable Predictions for Hypotheses/Experiments

Having principled mental models is one way to make reasonable predictions for your experiments. The flip side of the coin is willy-nilly, heuristics-based predictions grounded in empirical evidence. Of course, the way to get these heuristics is to run a bunch of experiments.

If you don’t have a lot of experiment ideas already, one class of exploratory experiment ideas, where you can a) get practice coding and b) tune your predictive engine, is to graph a bunch of statistics about a component of interest and predict what happens. These experiments can range from “Yeah, I definitely know what’s going to happen” to “I have no idea what’s going to happen” and still be valuable. 
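As a concrete (hypothetical) instance of this kind of predict-then-graph exercise, here’s a toy numpy sketch; the statistic and the numbers are my own illustration, not from any particular codebase. Before running it, try to predict how the Frobenius norm of a randomly initialized weight matrix scales with its width.

```python
import numpy as np

# Toy "graph a statistic and predict it" exercise: before running, predict
# how the Frobenius norm of a randomly initialized weight matrix scales
# with width d, under a standard 1/sqrt(d) initialization.
rng = np.random.default_rng(0)

widths = [64, 128, 256, 512]
norms = []
for d in widths:
    W = rng.normal(0.0, 1.0 / np.sqrt(d), size=(d, d))
    norms.append(np.linalg.norm(W))  # Frobenius norm by default for matrices

# Prediction: E[||W||_F^2] = d^2 * (1/d) = d, so the norm should grow
# roughly like sqrt(d). Check it:
for d, n in zip(widths, norms):
    print(f"width={d:4d}  frobenius_norm={n:6.2f}  sqrt(width)={np.sqrt(d):6.2f}")
```

Even a five-minute experiment like this gives your predictive engine a concrete right/wrong answer; swapping in real model weights (say, per-layer attention weight norms) is the natural next step.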

Exploratory Research vs Directed Research

In general, if you want to find something interesting, I think a directed research agenda (i.e. I have a specific story of how X works and I’m trying to determine whether it’s true or false) is almost always better than an exploratory research agenda (i.e. I have a bunch of random ideas and I’m just going to go down the list until I find something interesting). This is mainly because there’s a lot of noise in ML research specifically, and it’s harder to distinguish signal from noise if you don’t really know what you’re looking for, as when pursuing an exploratory research agenda.

However, if you’re not trying to find something interesting, and just trying to build intuitions or practice, then I think exploratory is the better option. This is mainly because I think you need some threshold of experience and intuition before you can perform a directed research agenda well, and I think you can get experience and intuition and momentum faster by iterating through a set of experiment ideas instead of focusing on one strongly. 

Machine Learning Specific Advice

Beware Bias, Bugs, and Bizarreness

Oftentimes when doing ML research, you’ll get results that are very weird and surprising, and oftentimes those results are caused by bugs. You don’t even need much prior experience to recognize bug-caused weirdness, e.g. your model outputting “!” with 100% probability for every single input. There will also often be subtler surprises that you might not notice. A good general rule: for any result that is significant, surprising, or weird in any way, run a bunch of sanity checks to look for other potential explanations (which, besides bugs, could be bias or some weird quirk of ML). In general, the more surprising a result is, the more skeptical you should be of its validity.
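As one illustration of what a cheap sanity check might look like (the helper name and the specific checks are my own invention, not from the post):

```python
import numpy as np

def sanity_check_outputs(predictions):
    """A few cheap checks to run on model outputs before trusting a
    surprising result. Illustrative only; real checks depend on the task."""
    preds = np.asarray(predictions, dtype=float)
    return {
        # A model emitting the same output for every input is almost
        # always a bug (the "!" with 100% probability failure mode).
        "not_constant": len(np.unique(preds)) > 1,
        # NaNs usually mean a numerical problem upstream.
        "no_nans": not np.isnan(preds).any(),
    }

print(sanity_check_outputs([0, 0, 0, 0]))  # degenerate run: not_constant is False
print(sanity_check_outputs([0, 1, 2, 1]))  # both checks pass
```

The point isn’t these two checks specifically; it’s that a surprising result should trigger a checklist of boring alternative explanations before you get excited.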

Know When to Stop Investigating

Oftentimes, you’ll also get results that are weird and surprising, and those results are not caused by bugs; they’re simply weird and surprising. Many times, these weird results will not really have any bearing on your current research agenda, but because they’re weird and interesting, you might feel the urge to investigate and figure out what’s going on. This is a fine line to walk, but, in general, I don’t think it’s worthwhile to spend a lot of time trying to explain irrelevant, weird phenomena, because machine learning is full of irrelevant, weird phenomena, and it’s a really easy way to derail the momentum you’ve built working on your main agenda.


  • Never delete code or results!!!
    • Especially if the results took a long time to compute
    • This sounds obvious, but a decent number of times in the past I've thought that something wasn't going to be useful, deleted it, and then later realized I needed it for something else. 
    • Keep track of your codebase on Github. It's pretty useful sometimes to see how you implemented something in the past, even if that code becomes obsolete. 
  • Anytime you want to actually prove something, use a big sample size
  • Google Colab is always a good place to start doing exploratory work, but once your codebase grows more complex, VS Code's interactive mode or a Jupyter notebook is useful for importing your own functions from local modules
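To make the sample-size point above concrete: the standard error of a mean shrinks like 1/sqrt(n), so an effect that looks real in a small sample can be pure noise. A toy numpy illustration (the quantity and numbers are made up):

```python
import numpy as np

# Estimate a noisy quantity with true mean 0.5 at several sample sizes.
# The standard error of the mean shrinks like 1/sqrt(n), so only the
# large samples pin the estimate down tightly.
rng = np.random.default_rng(1)
true_mean = 0.5
std_errs = []
for n in [20, 2_000, 200_000]:
    sample = rng.normal(true_mean, 1.0, size=n)
    se = sample.std(ddof=1) / np.sqrt(n)  # standard error of the mean
    std_errs.append(se)
    print(f"n={n:7d}  mean={sample.mean():.3f}  std_err={se:.4f}")
```

At n=20 the estimate can easily be off by several tenths; at n=200,000 the standard error is a few thousandths. If you want to prove a difference, the standard error of your measurement needs to be well below the effect size you’re claiming.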

Getting Started

Here’s a list of less abstract advice on how to get started doing machine learning research:

Find a good mentor (or mentors) [very important]

People say it all the time, but a good mentor will drastically accelerate your growth and progress. In my opinion, the qualities to look for, in descending order of importance, are 1) the amount of time you’ll spend with the mentor, 2) the mentor’s relevance to your specific interests, and 3) prestige.

Since most potential mentors will be a lot more experienced than you anyway, the amount you can learn from a mentor is probably more directly proportional to the amount of time you spend with them than to how closely their interests match your research interests or how connected/prestigious they are.

Reach out to the authors of any paper you liked for a call

Talking to the authors of good papers is a really good way to learn more about your research topic and gain tacit knowledge about how the research process actually works. It’s also a good way to engineer serendipity and find new mentors or collaborations (my current collaboration with a group from AI2 started with one good call). Some advice on how to make the most of a call with a researcher: 

  • Ask about the journey of the research presented in the paper. You’ll gain a lot of tacit knowledge that way. 
  • Come with your own ideas related to their paper (i.e. did you try X? What do you think would happen if I did Y?) and ask for feedback
  • Form a continuous relationship and stay in touch. 

Keep a good research journal

A research journal is an important personal reminder of what you did and what you learned, but write it as if you’re communicating your results to a stranger. You might be surprised by how much you forget about experiments you ran several months ago.

Research Journal Tips:

  • Store graphs, thoughts, results, notes from calls, any content related to your research in your notebook. Date your journal and use informative headings. 
  • Write your research journal with as little assumed context as possible. Always label your axes in your graphs. Explain the exact experiment you ran as clearly as possible. Future you will thank you. 

How to write a research agenda:

The three most important parts of any research agenda are a) the research question to be answered, b) why this research question matters, and c) what concrete directions will be taken to answer it. So when writing a research agenda, I like to follow a structure that looks something like:

  • Overarching Questions
    • Super broad questions. The kind of questions you write about in the discussion of a paper.
    • Why answers to these questions are important
  • Research Questions
    • These kinds of questions suggest hypotheses/experiment ideas by their nature. 
    • Usually they’re about specific behavior/phenomena
  • Experiment Ideas

Thanks for reading! And thank you to Alex Gray and Oam Patel for helpful feedback on this post. 


8 comments

Reach out to the authors of any paper you liked for a call

I didn't know one can do that. Do people really just agree to a call with a stranger?

Of course there’s some pigeonhole principle working against you if you try to contact, say, Geoffrey Hinton, or the authors of the latest buzzy paper. But otherwise yes, most researchers are glad to talk about their work. And it’s kind of a professional duty, which is why most papers include the email of the corresponding author(s).

More generally, congrats to KevinRoWang for this post! I started my own journey in ML before he was born, and I’m impressed by the maturity of his advice.

There's an art to it, which I've seen best discussed here, but if you do cold emails well then yes people do in fact agree to such calls. I've had ~30% hit rates on cold calls to Berkeley professors while being a Berkeley undergrad in a completely different field, and that was for asks to chat about Zika. I'd expect a higher hit rate for professors in the same field you are in, where you can demonstrate greater understanding than I was able to at the time.

Yes! Especially if you show you can provide relevant thoughts about their work, a lot of people will be happy to hop on a call or at least reply to some questions via email

Never delete code or results!!!

There's a tool I've been interested in using (if it exists), basically a jupyter notebook, but which saves all outputs to tmp files (possibly truncating at the 1MB mark or something), and which maintains a tree history of the state of cells (i.e. if you hit ctrl-z a few times and start editing, it creates a branch.) Neither of these are particularly memory heavy, but both would have saved me in the past if they were hidden options to restore old state.

I'd also add that if you had a bug, and it took real effort/digging to find an online solution, archive the link to that solution in a list (preferably with date tags.) This has been preferable for me over painstakingly re-searching / trying to search through browser history when needed.

Hm, I think this tool would've been really helpful for me in the past for a couple of occasions. Usually if I want to save a cell output, I just won't edit that cell and I'll create a new one, even if it means redundant code. 

Also +1 on keeping track of bugs! I should've added to the original post that one thing that's really helpful for me is keeping track of procedural knowledge (i.e. how to set up a GPU, how to fix common issue X, etc.) in a personal Slack that I've created as a second brain, basically. I found that I used the message-yourself-in-Slack feature a lot to keep small notes for myself, and since I did it so much, I created a whole private, personal Slack, and that's been pretty useful for keeping track of bugs, etc.

Enjoyed reading this! Really glad you're getting good research experience and I'm stoked about the strides you're making towards developing research skills since our call (feels like ages ago)! I've been doing a lot of what you describe as "directed research" myself lately as I'm learning more about DL-specific projects and I've been learning much faster than when I was just doing cursory, half-assed paper skimming, alongside my cogsci projects. Would love to catch up over a call sometime to talk about stuff we're working on now

Let's definitely catch up!