A Safety Guarantee and Its Boundary: Notes on Bengio et al.'s Bayesian Harm Bound

A recurring pattern in AI safety arguments

Formal safety arguments for capable AI systems tend to share a common structure. A guarantee is derived: the system cannot cause harm, exceeds a harm threshold only with bounded...
Jun 191