[Link] Why I’m excited about AI-assisted human feedback

by janleike

6th Apr 2022

Tags: RLHF, AI

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

This is a link post for https://aligned.substack.com/p/ai-assisted-human-feedback

I'm writing a sequence of posts on the approach to alignment I'm currently most excited about. This first post makes the case for recursive reward modeling and describes the problem it's meant to address: scaling RLHF to tasks that are hard to evaluate.
