Ondřej_Kubů

Ondřej_Kubů — LessWrong

Untitled Draft

A Safety Guarantee and Its Boundary: Notes on Bengio et al.'s Bayesian Harm Bound

A recurring pattern in AI safety arguments

Formal safety arguments for capable AI systems tend to share a common structure. A guarantee is derived: the system cannot cause harm, exceeds a harm threshold only with bounded...