Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

I'm Ethan Perez, a final-year PhD student at NYU working on aligning language models with human preferences. I'm looking to hire research interns to work on projects in this space, starting early 2022. I expect candidates to have strong software engineering ability, whether in ML engineering (e.g., finetuning GPT-2 to good performance on a new task) or data engineering (e.g., quickly finding high-quality subsets of text within petabytes of Common Crawl data). Ideal candidates will have experience doing ML and/or NLP research, reading papers, and coming up with ideas to test. I'm looking for people who'd be able to work full-time (remotely), with compensation of $70/hour. I expect each project to take 4-8 months to complete and to lead to a first-author publication at a top-tier ML or NLP conference.

Below are a few examples of projects I have in mind:

  1. Pretraining language models on more factual, high-quality, and value-aligned text, so that they are easier to control and more truthful.
  2. Developing algorithms for training language models to learn from different forms of human feedback, beyond simple scalar/numerical feedback as in prior work.
  3. Finding tasks that exhibit inverse scaling, where better performance on the language modeling pretraining objective leads to worse downstream task performance. Examples of such tasks include TruthfulQA and RealToxicityPrompts.
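For project 3, once you have downstream scores for a family of model sizes, checking for inverse scaling is mechanical. Below is a minimal, hypothetical sketch: the sizes and accuracies are made-up illustrative numbers, not results from any real benchmark.

```python
# Hypothetical illustration: flagging candidate inverse-scaling tasks from
# per-model-size downstream scores. All numbers below are made up.

def is_inverse_scaling(sizes, scores):
    """Return True if scores strictly decrease as model size increases."""
    ordered = sorted(zip(sizes, scores))  # sort (size, score) pairs by size
    return all(a > b for (_, a), (_, b) in zip(ordered, ordered[1:]))

# Example: a task where larger pretrained LMs do worse (made-up scores)
sizes = [125e6, 1.3e9, 13e9, 175e9]       # parameter counts
scores = [0.42, 0.38, 0.33, 0.29]         # downstream accuracy per size
print(is_inverse_scaling(sizes, scores))  # candidate inverse-scaling task
```

A real study would also need to control for prompt choice and evaluation noise before calling a trend "inverse scaling."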

If what I've described sounds like a good fit, I'd love to hear from you over email! Just send me your website, resume, LinkedIn, GitHub, Google Scholar, or anything else that might be helpful for understanding your background (my email is perez at nyu dot edu).

1 comment

For 1., you could use an existing LM to judge whether each training datum ought to be included, or you could curate less strictly than GPT-3 did, e.g., by also including Reddit links with <3 karma.
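The LM-as-judge curation idea could look something like the sketch below. Here `lm_perplexity` is a placeholder heuristic standing in for a real scoring model (e.g., per-token loss from a pretrained LM), and the threshold is arbitrary; both are assumptions for illustration.

```python
# Hypothetical sketch of LM-judged data curation: keep a document only if a
# scoring model assigns it "perplexity" below a threshold.

def lm_perplexity(doc: str) -> float:
    # Placeholder heuristic so the example runs without a real LM: repetitive
    # documents get a high score. A real judge would query an actual model.
    words = doc.split()
    unique_ratio = len(set(words)) / max(len(words), 1)
    return 100.0 * (1.0 - unique_ratio) + 10.0

def curate(docs, threshold=60.0):
    """Return the subset of docs the judge scores below the threshold."""
    return [d for d in docs if lm_perplexity(d) < threshold]

docs = ["the cat sat on the mat", "spam spam spam spam"]
print(curate(docs))  # the repetitive document is filtered out
```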

For 2., do you mean it should take natural-language, peer-review-like feedback?

For 3., I suspect that such tasks would show ordinary positive scaling given a different prompt.