A possibility that occurs to me: if early automated researchers refuse to work on capabilities, an irresponsible AI developer could use low-stakes control to prevent them from sandbagging on capabilities work. The fact that low-stakes control could be used to circumvent this refusal may be a significant knock against further research on low-stakes control methods.
(As it stands, I don't take this point as a substantial update against low-stakes control research, because it doesn't seem likely enough that early automated researchers will in fact refuse to work on capabilities, bracketing the question of whether they should.)