MATS Program

Edited by Ryan Kidd, Multicore, et al. last updated 30th Dec 2024

The ML Alignment & Theory Scholars (MATS) Program is an educational seminar and independent research program that aims to provide talented scholars with talks, workshops, and research mentorship in the field of AI alignment, and to connect them with the Berkeley AI safety research community.

Posts tagged MATS Program
(karma · title · authors · post age · comment count; Ω marks posts crossposted to the Alignment Forum)
87 · Trends in Economic Inputs to AI · Jeffrey Heninger · 1mo · 6 comments
46 · Apply to MATS 9.0! · Ryan Kidd · 1mo · 0 comments
22 · MATS 8.0 Research Projects [Ω] · Jonathan Michala, DanielFilan, Ryan Kidd · 1mo · 0 comments
14 · Evaluating Prediction in Acausal Mixed-Motive Settings · Tim Chan · 1mo · 0 comments
57 · Discovering Backdoor Triggers [Ω] · andrq, Tim Hua, Sam Marks, Arthur Conmy, Neel Nanda · 2mo · 4 comments
53 · Towards data-centric interpretability with sparse autoencoders [Ω] · Nick Jiang, lilysun004, lewis smith, Neel Nanda · 2mo · 2 comments
132 · Training a Reward Hacker Despite Perfect Labels [Ω] · ariana_azarbal, vgillioz, TurnTrout · 2mo · 45 comments
62 · Statistical suggestions for mech interp research and beyond [Ω] · Paul Bogdan · 2mo · 4 comments
59 · Concept Poisoning: Probing LLMs without probes · Jan Betley, jorio, dylan_f, Owain_Evans · 2mo · 5 comments
16 · Exploration hacking: can reasoning models subvert RL? [Ω] · Damon Falck, Joschka Braun, Eyon Jang · 2mo · 4 comments
196 · Optimizing The Final Output Can Obfuscate CoT (Research Note) [Ω] · lukemarks, jacob_drori, cloud, TurnTrout · 2mo · 22 comments
55 · The many paths to permanent disempowerment even with shutdownable AIs (MATS project summary for feedback) · GideonF · 2mo · 6 comments
39 · Building Black-box Scheming Monitors · james__p, richbc, Simon Storf, Marius Hobbhahn · 2mo · 18 comments
54 · Unfaithful chain-of-thought as nudged reasoning [Ω] · Paul Bogdan, Uzay Macar, Arthur Conmy, Neel Nanda · 3mo · 3 comments
51 · Do LLMs know what they're capable of? Why this matters for AI safety, and initial findings [Ω] · Casey Barkan, Sid Black, Oliver Sourbut · 3mo · 5 comments
(Showing 15 of 248 tagged posts.)