
Shivam · 2320 karma
  • PhD in Geometric Group Theory >> Postdoc in Machine Learning >> Independent AI safety and AI alignment research.
  • Looking for mentors in AI safety.
  • Please feel free to contact me at shivamaroramath@gmail.com

Posts (sorted by new)

  • Shivam's Shortform (1 karma · 6mo · 1 comment)
  • The Road to Evil Is Paved with Good Objectives: Framework to Classify and Fix Misalignments (1 karma · 7mo · 0 comments)
  • Limits of safe and aligned AI (2 karma · 11mo · 0 comments)

Wikitag Contributions

No wikitag contributions to display.

Comments (sorted by newest)
AI companies' eval reports mostly don't support their claims
Shivam · 3mo · 10

I completely agree, and this is what I was thinking here in this shortform: we should establish an equivalence between capability benchmarks and risk benchmarks, since all AI companies try to beat the capability benchmarks but fail the safety/risk benchmarks due to obvious bias.
The basic premise of this equivalence is that if a model is smart enough to beat benchmark x, then it is capable enough to construct or help with CBR attacks, especially given that models cannot be guaranteed to be immune to jailbreaks.

Shivam's Shortform
Shivam · 6mo · 10

An important line of work in AI safety would be to prove the equivalence of various capability benchmarks to risk benchmarks, so that when AI labs show their model crossing a capability benchmark, they automatically show it crossing an AI safety level.
"That way we wouldn't get two separate reports from them: one saying that the model is a PhD-level scientist, and the other saying that studies show the CBRN risk from the model is no greater than from internet search."
