AI Alignment: A Comprehensive Survey
We have just released an academic survey of AI alignment. We identify four main categories of alignment research:

1. Learning from feedback (e.g. scalable oversight)
2. Learning under distribution shift
3. Assurance (e.g. interpretability)
4. Governance

We mainly focused on academic references but also included some posts from LessWrong and...