Earlier this year, Vaniver recommended Bug Bounties for Advanced ML Systems.
I spent a while at Rethink Priorities expanding on this idea, suggesting potential program models and assessing the benefits and risks of such programs, which I've called 'AI Safety Bounties':
Short summary
AI safety bounties are programs where public participants or approved security researchers receive rewards for identifying issues within powerful ML systems (analogous to bug bounties in cybersecurity). Safety bounties could be valuable for legitimizing examples of AI risks, bringing more talent to stress-test systems, and identifying common attack vectors.
I expect safety bounties to be worth trialing for organizations working on reducing catastrophic AI risks. Traditional bug bounties seem fairly...