User Profile

7480 Ω 86911036

Recent Posts

Corrigibility

29 · 18d · 6 min read · Ω 7
1

Clarifying "AI Alignment"

53 · 1mo · 3 min read · Ω 15
46

Prosaic AI alignment

36 · 25d · 7 min read · Ω 10
0

Benign model-free RL

10 · 14d · 7 min read · Ω 3
0

Humans Consulting HCH

19 · 20d · 1 min read · Ω 4
8

Approval-directed bootstrapping

18 · 20d · 1 min read · Ω 3
0

The Steering Problem

37 · 1mo · 7 min read · Ω 9
1

Approval-directed agents: details

18 · 22d · 7 min read · Ω 4
1

An unaligned benchmark

27 · 1mo · 9 min read · Ω 8
0

Preface to the sequence on iterated amplification

37 · 1mo · 2 min read · Ω 11
0

Recent Comments