Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

I'm Ethan Perez, a final-year PhD student at NYU working on aligning language models with human preferences. I'm looking to hire research interns to work on projects in this space, starting early 2022. I expect candidates to have strong software engineering ability, whether in ML engineering (e.g., finetuning GPT-2 to good performance on a new task) or data engineering (e.g., quickly finding high-quality subsets of text within petabytes of Common Crawl data). Ideal candidates will have experience doing ML and/or NLP research, reading papers, and coming up with ideas to test. I'm looking for people who'd be able to work full-time (remotely), with compensation of $70/hour. I expect each project to take 4-8 months to complete and to lead to a first-author publication at a top-tier ML or NLP conference.

Below are a few examples of projects I have in mind:

  1. Pretraining language models on more factual, high-quality, and value-aligned text, so that they are easier to control and more truthful.
  2. Developing algorithms for training language models to learn from different forms of human feedback, beyond simple scalar/numerical feedback as in prior work.
  3. Finding tasks that exhibit inverse scaling, where better performance on the language modeling pretraining objective leads to worse downstream task performance. Examples of such tasks include TruthfulQA and RealToxicityPrompts.
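For project 3, once you have downstream scores for a family of model sizes, checking for inverse scaling is mechanical. Below is a minimal, hypothetical sketch: the sizes and accuracies are made-up illustrative numbers, not results from any real benchmark.

```python
# Hypothetical illustration: flagging candidate inverse-scaling tasks from
# per-model-size downstream scores. All numbers below are made up.

def is_inverse_scaling(sizes, scores):
    """Return True if scores strictly decrease as model size increases."""
    ordered = sorted(zip(sizes, scores))  # sort (size, score) pairs by size
    return all(a > b for (_, a), (_, b) in zip(ordered, ordered[1:]))

# Example: a task where larger pretrained LMs do worse (made-up scores)
sizes = [125e6, 1.3e9, 13e9, 175e9]       # parameter counts
scores = [0.42, 0.38, 0.33, 0.29]         # downstream accuracy per size
print(is_inverse_scaling(sizes, scores))  # candidate inverse-scaling task
```

A real study would also need to control for prompt choice and evaluation noise before calling a trend "inverse scaling."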

If what I've described sounds like a good fit, I'd love to hear from you over email! Just send me your website, resume, LinkedIn, GitHub, Google Scholar, or anything else that might be helpful for understanding your background (my email is perez at nyu dot edu).

1 comment

For 1., you could use an existing LM to judge whether each training datum ought to be included, or you could curate less strictly than GPT-3 did, e.g., by also including Reddit links with <3 karma.
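The LM-as-judge curation idea could look something like the sketch below. Here `lm_perplexity` is a placeholder heuristic standing in for a real scoring model (e.g., per-token loss from a pretrained LM), and the threshold is arbitrary; both are assumptions for illustration.

```python
# Hypothetical sketch of LM-judged data curation: keep a document only if a
# scoring model assigns it "perplexity" below a threshold.

def lm_perplexity(doc: str) -> float:
    # Placeholder heuristic so the example runs without a real LM: repetitive
    # documents get a high score. A real judge would query an actual model.
    words = doc.split()
    unique_ratio = len(set(words)) / max(len(words), 1)
    return 100.0 * (1.0 - unique_ratio) + 10.0

def curate(docs, threshold=60.0):
    """Return the subset of docs the judge scores below the threshold."""
    return [d for d in docs if lm_perplexity(d) < threshold]

docs = ["the cat sat on the mat", "spam spam spam spam"]
print(curate(docs))  # the repetitive document is filtered out
```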

For 2., do you mean it should take natural-language, peer-review-like feedback?

For 3., I suspect that such tasks would show ordinary positive scaling given a different prompt.