
This work was supported by the Effective Ventures Foundation USA through their EA Funds program. It was started as part of the MATS (ML Alignment and Theory Scholars) program, with mentorship from Julian Michael and research management by McKenna Fitzgerald.

Code for this project is available on GitHub. Explore samples from the different training runs at our interactive website.

Introduction

Scalable oversight research paradigm

Scalable oversight is the challenge of overseeing or training AI systems to achieve goals that are hard for humans to specify. By definition, studying scalable oversight directly on the tasks we care about is infeasible, because we would not know if the goals were achieved correctly. So instead we use sandwiching: using...

Talent Needs of Technical AI Safety Teams


yams, Carson Jones, McKennaFitzgerald, Ryan Kidd

Co-Authors: @yams, @Carson Jones, @McKennaFitzgerald, @Ryan Kidd

MATS tracks the evolving landscape of AI safety^[1] to ensure that our program continues to meet the talent needs of safety teams. As the field has grown, it’s become increasingly necessary to adopt a more formal approach to this monitoring, since relying on a few individuals to intuitively understand the dynamics of such a vast ecosystem could lead to significant missteps.^[2]

In the winter and spring of 2024, we conducted 31 interviews, ranging in length from 30 to 120 minutes, with key figures in AI safety, including senior researchers, organization leaders, social scientists, strategists, funders, and policy experts. This report synthesizes the key insights from these discussions. The overarching...


MATS Winter 2023-24 Retrospective


utilistrutil, LauraVaughan, McKennaFitzgerald, Christian Smith, Juan Gil, Henry Sleight, Matthew Wearden, Ryan Kidd

Co-Authors: @Rocket, @LauraVaughan, @McKennaFitzgerald, @Christian Smith, @Juan Gil, @Henry Sleight, @Matthew Wearden, @Ryan Kidd

The ML Alignment & Theory Scholars program (MATS) is an education and research mentorship program for researchers entering the field of AI safety. This winter, we held the fifth iteration of the MATS program, in which 63 scholars received mentorship from 20 research mentors. In this post, we motivate and explain the elements of the program, evaluate our impact, and identify areas for improving future programs.

Summary

Key details about the Winter Program:

The four main changes we made after our Summer program were:
- Reducing our scholar stipend from $40/h to $30/h based on alumni feedback;
- Transitioning Scholar Support to Research Management;
- Using the full Lighthaven campus for

...

MATS Summer 2023 Retrospective


utilistrutil, Juan Gil, Ryan Kidd, Christian Smith, McKennaFitzgerald, LauraVaughan

Co-Authors: @Rocket, @Juan Gil, @Christian Smith, @McKennaFitzgerald, @LauraVaughan, @Ryan Kidd

The ML Alignment & Theory Scholars program (MATS, formerly SERI MATS) is an education and research mentorship program for emerging AI safety researchers. This summer, we held the fourth iteration of the MATS program, in which 60 scholars received mentorship from 15 research mentors. In this post, we explain the elements of the program, lay out some of the thinking behind them, and evaluate our impact.

Summary

Key details about the Summer 2023 Program:

Educational attainment of MATS scholars:
- 30% of scholars are students.
- 88% have at least a Bachelor's degree.
- 10% are in a Master’s program.
- 10% are in a PhD program.
- 13% have a PhD.
If not for MATS, scholars might

...

LESSWRONG

Evaluating Oversight Robustness with Incentivized Reward Hacking
