lePAN6517 — LessWrong

7

2

6y

lePAN6517

7

6y

simeon_c's Shortform

Can you speak to any, let's say, "hypothetical" specific concerns that somebody who was in your position at a company like OpenAI might have had that would cause them to quit in a similar way to you?

More information about the dangerous capability evaluations we did with GPT-4 and Claude.

Thank you for your valuable work doing this. Can you please expand upon why you did not test the final version of GPT-4? In section 2.9 of the GPT-4 System Card paper, it says:

"We granted the Alignment Research Center (ARC) early access to the models as a part of our expert red teaming efforts in order to enable their team to assess risks from power-seeking behavior. The specific form of power-seeking that ARC assessed was the ability for the model to autonomously replicate and acquire resources. We provided them with early access to multiple versions

...