Andrew Mauboussin — LessWrong

Interactive directory of alignment researchers, organizations, and funding bodies

Context: Connecting people who want to work on or fund alignment research with the right collaborators is a high-leverage activity, but as the field grows, searching via Google and LinkedIn will take a lot of time and won’t always produce comprehensive results. This system would be useful if it let a user ask who is working on a particular type of project and get the same answer they would get from someone well-connected and up-to-date on the research in the relevant subject area.

Input Type: A question about the people, organizations, or funding bodies in a particular subfield of alignment research.

Output Type: A list of the relevant entities and a brief explanation of why they are relevant. If possible, it would be helpful to also provide contact information.

Instance 1:

Input

Who is working on using adversarial examples to make models more robust?

Output

Redwood Research’s current project uses adversarial examples.
The FTX Future Fund is also interested in funding related projects, including the Unrestricted Advex Challenge and achieving near-perfect robustness on adversarial examples in vision.

Instance 2:

Input

Who is working on fine-tuning large language models to be more aligned with human instructions?

Output

Long Ouyang, Jeff Wu, and others are working on this at OpenAI (https://arxiv.org/abs/2203.02155).
Yuntao Bai, Andy Jones, Kamal Ndousse, and others are also working on this problem at Anthropic (https://arxiv.org/abs/2204.05862).

Instance 3:

Input

Who has experience creating interactive visualizations to help understand transformer models?

Output

Ben Hoover, Hendrik Strobelt, and Sebastian Gehrmann worked on this with the exBERT project.
Chris Olah is working on similar projects related to Transformer Circuits.

Instance 4:

Input

What organizations are funding the creation of open datasets for alignment research?

Output

MIRI has a one million dollar bounty for the Visible Thoughts Project.
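
One way to picture the Input Type and Output Type described above is as a small record schema. The sketch below is only an illustration under stated assumptions: the names `Entity`, `DirectoryAnswer`, and `format_answer`, and the `kind` labels, are hypothetical and not part of the proposal. It uses Instance 4 as its example data and leaves contact information empty, since none is given there.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entity:
    """One person, organization, or funding body (hypothetical schema)."""
    name: str
    kind: str                      # assumed labels: "person", "organization", "funder"
    relevance: str                 # brief explanation of why the entity is relevant
    contact: Optional[str] = None  # optional, per "if possible" in the Output Type


@dataclass
class DirectoryAnswer:
    """A question paired with the list of relevant entities."""
    question: str
    entities: List[Entity] = field(default_factory=list)


def format_answer(answer: DirectoryAnswer) -> str:
    """Render one line per entity, appending contact details only when present."""
    lines = []
    for entity in answer.entities:
        line = f"{entity.name}: {entity.relevance}"
        if entity.contact:
            line += f" (contact: {entity.contact})"
        lines.append(line)
    return "\n".join(lines)


# Instance 4 expressed in this schema.
example = DirectoryAnswer(
    question=(
        "What organizations are funding the creation of open datasets "
        "for alignment research?"
    ),
    entities=[
        Entity(
            name="MIRI",
            kind="funder",
            relevance="has a one million dollar bounty for the Visible Thoughts Project",
        )
    ],
)

print(format_answer(example))
```

Keeping `contact` optional mirrors the proposal's wording that contact information should be provided only where possible.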
