Alignment Research Center (ARC)

Edited by Jessica W, et al. Last updated 30th Dec 2024.

The Alignment Research Center (ARC) is a non-profit research organization whose mission is to align future machine learning systems with human interests. Its current work focuses on developing an alignment strategy that could be adopted in industry today while scaling gracefully to future ML systems. Currently, Paul Christiano, Mark Xu, and Jacob Hilton are researchers, and Kyle Scott handles operations.

Posts tagged Alignment Research Center (ARC)
ARC Evals new report: Evaluating Language-Model Agents on Realistic Autonomous Tasks (Ω; Beth Barnes; 2y; 153 karma; 12 comments)
More information about the dangerous capability evaluations we did with GPT-4 and Claude. (Ω; Beth Barnes; 2y; 233 karma; 54 comments)
ARC's first technical report: Eliciting Latent Knowledge (Ω; paulfchristiano, Mark Xu, Ajeya Cotra; 4y; 228 karma; 90 comments)
Prizes for matrix completion problems (Ω; paulfchristiano; 2y; 164 karma; 52 comments)
ARC is hiring theoretical researchers (Ω; paulfchristiano, Jacob_Hilton, Mark Xu; 2y; 126 karma; 12 comments)
Obstacles in ARC's agenda: Finding explanations (David Matolcsi; 4mo; 123 karma; 10 comments)
A bird's eye view of ARC's research (Ω; Jacob_Hilton; 10mo; 121 karma; 12 comments)
ARC tests to see if GPT-4 can escape human control; GPT-4 failed to do so (Christopher King; 2y; 116 karma; 22 comments)
ARC paper: Formalizing the presumption of independence (Ω; Erik Jenner; 3y; 97 karma; 2 comments)
Steelmanning heuristic arguments (Ω; Dmitry Vaintrob; 5mo; 76 karma; 0 comments)
Estimating Tail Risk in Neural Networks (Ω; Mark Xu; 1y; 68 karma; 9 comments)
Low Probability Estimation in Language Models (Ω; Gabriel Wu; 11mo; 50 karma; 0 comments)
Obstacles in ARC's agenda: Low Probability Estimation (David Matolcsi; 4mo; 44 karma; 0 comments)
Obstacles in ARC's agenda: Mechanistic Anomaly Detection (David Matolcsi; 4mo; 42 karma; 1 comment)
How is ARC planning to use ELK? (Q; jacquesthibs; 3y; 24 karma; 5 comments)