MATS Program

Edited by Ryan Kidd, Multicore, et al. last updated 30th Dec 2024

The ML Alignment & Theory Scholars (MATS) Program is an educational seminar and independent research program that aims to provide talented scholars with talks, workshops, and research mentorship in the field of AI alignment, and to connect them with the Berkeley AI safety research community.

Posts tagged MATS Program
(karma · title · authors · post age · comment count; Ω marks posts crossposted to the Alignment Forum)
87 · Trends in Economic Inputs to AI · Jeffrey Heninger · 1mo · 6 comments
46 · Apply to MATS 9.0! · Ryan Kidd · 1mo · 0 comments
22 · MATS 8.0 Research Projects [Ω] · Jonathan Michala, DanielFilan, Ryan Kidd · 1mo · 0 comments
14 · Evaluating Prediction in Acausal Mixed-Motive Settings · Tim Chan · 1mo · 0 comments
57 · Discovering Backdoor Triggers [Ω] · andrq, Tim Hua, Sam Marks, Arthur Conmy, Neel Nanda · 2mo · 4 comments
53 · Towards data-centric interpretability with sparse autoencoders [Ω] · Nick Jiang, lilysun004, lewis smith, Neel Nanda · 2mo · 2 comments
132 · Training a Reward Hacker Despite Perfect Labels [Ω] · ariana_azarbal, vgillioz, TurnTrout · 2mo · 45 comments
62 · Statistical suggestions for mech interp research and beyond [Ω] · Paul Bogdan · 2mo · 4 comments
59 · Concept Poisoning: Probing LLMs without probes · Jan Betley, jorio, dylan_f, Owain_Evans · 2mo · 5 comments
16 · Exploration hacking: can reasoning models subvert RL? [Ω] · Damon Falck, Joschka Braun, Eyon Jang · 2mo · 4 comments
196 · Optimizing The Final Output Can Obfuscate CoT (Research Note) [Ω] · lukemarks, jacob_drori, cloud, TurnTrout · 2mo · 22 comments
55 · The many paths to permanent disempowerment even with shutdownable AIs (MATS project summary for feedback) · GideonF · 2mo · 6 comments
39 · Building Black-box Scheming Monitors · james__p, richbc, Simon Storf, Marius Hobbhahn · 2mo · 18 comments
54 · Unfaithful chain-of-thought as nudged reasoning [Ω] · Paul Bogdan, Uzay Macar, Arthur Conmy, Neel Nanda · 3mo · 3 comments
51 · Do LLMs know what they're capable of? Why this matters for AI safety, and initial findings [Ω] · Casey Barkan, Sid Black, Oliver Sourbut · 3mo · 5 comments
(Showing 15 of 248 tagged posts.)