Thank you for this post. You are following the standard safety engineering playbook. It is good at dealing with known hazards. However, it has traditionally been based entirely on failure probabilities of components (brakes, engines, landing gear, etc.). Extending it to handle perception is challenging. In particular, the perceptual system must accurately quantify its uncertainty so that the controller can act cautiously under uncertainty. For aleatoric uncertainty (i.e., in-distribution), this is a solved problem. But for epistemic uncertainty, there are ...
I enthusiastically support doing everything we can to reduce vulnerabilities. As I indicated, a foundation model approach to representation learning would greatly improve the ability of systems to detect novelty. Perhaps we should rename the "provable safety" area as "provable safety modulo assumptions" area and be very explicit about our assumptions. We can then measure progress by the extent to which we can shrink those assumptions.