Gabriel Goh — LessWrong

Chris Olah’s views on AGI safety

Gabriel Goh, 7y

<note> I work in Clarity at OpenAI. Chris and I have discussed this response (though I cannot claim to represent him).

Does "faithful" mean "100% identical in terms of I/O", or more like "captures all of the important elements of"?

I'd say faithfulness lies on a spectrum. Fully deterministic I/O on a neural network is nearly impossible (given the vagaries of floating-point arithmetic), but what really interests us is “effectively identical I/O”. A working definition of this could be: an interpretable...
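
To make the floating-point point concrete, here is a minimal Python sketch. It is only an illustration, not the definition the comment goes on to give (which is truncated above). It shows that reordering a sum changes the result at the bit level, and it sketches one possible reading of "effectively identical" as agreement within a tolerance. The helper name `effectively_identical` and the tolerance values are assumptions made for this example.

```python
import math

# Floating-point addition is not associative: summing the same values
# in a different order can change the low-order bits of the result.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)
bitwise_identical = (a == b)  # False on IEEE 754 doubles


def effectively_identical(ys_a, ys_b, rel_tol=1e-6, abs_tol=1e-9):
    """Hypothetical check: two output vectors agree within a tolerance.

    The tolerances are illustrative assumptions, not values from the comment.
    """
    return len(ys_a) == len(ys_b) and all(
        math.isclose(p, q, rel_tol=rel_tol, abs_tol=abs_tol)
        for p, q in zip(ys_a, ys_b)
    )


if __name__ == "__main__":
    print(bitwise_identical)
    print(effectively_identical([a], [b]))
```

Under this reading, two systems whose outputs differ only by rounding noise would count as matching, while a bitwise comparison would reject them.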