Posts

Sorted by New

Wiki Contributions

Comments

Thank you for your valuable work doing this. Can you please expand up on why you did not test the final version of GPT-4? In section 2.9 of the GPT-4 System Card paper, it says:

"We granted the Alignment Research Center (ARC) early access to the models as a part of our expert red teaming efforts in order to enable their team to assess risks from power-seeking behavior. The specific form of power-seeking that ARC assessed was the ability for the model to autonomously replicate and acquire resources. We provided them with early access to multiple versions of the GPT-4 model, but they did not have the ability to fine-tune it. They also did not have access to the final version of the model that we deployed. The final version has capability improvements relevant to some of the factors that limited the earlier models power-seeking abilities, such as longer context length, and improved problem-solving abilities as in some cases we've observed."

That seems like a glaring omission, and potential safety risk. Were you denied access to testing the final version or even any version that had undergone fine-tuning?