This whitepaper documents a preliminary approach, developed last summer, to using proof certificates in a secure program synthesis protocol.
We would like to put the AI in a box. We show how to create an interface between the box and the world out of specifications in Lean. It is the AI's responsibility to provide a proof that its (restricted) output abides by the spec. The runnable prototype is at https://github.com/for-all-dev/formal-confinement.
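To fix intuitions, here is a minimal Lean sketch of such an interface. The names below (`State`, `Spec`, `Certificate`, `Exec`) are illustrative assumptions, not the prototype's actual API; the real definitions live in the linked repository.

```lean
-- A minimal sketch of the interface, with hypothetical names
-- (`Spec`, `Certificate`, `Exec`); the prototype's actual
-- definitions may differ.
abbrev State := String → Int

-- The confinement authority's side of the interface: a specification
-- is a precondition and a postcondition over program states.
structure Spec where
  pre  : State → Prop
  post : State → Prop

-- The AI's side: a program bundled with a proof that, under an
-- assumed big-step semantics `Exec` (where `Exec p s s'` means
-- running `p` from state `s` terminates in state `s'`), every
-- terminating run from a state satisfying `pre` ends in a state
-- satisfying `post`.
structure Certificate (Prog : Type)
    (Exec : Prog → State → State → Prop) (spec : Spec) where
  prog  : Prog
  proof : ∀ (s s' : State), spec.pre s → Exec prog s s' → spec.post s'
```

Under this shape, the confinement authority trusts only Lean's kernel: a certificate is accepted or rejected without running `prog` and without trusting the process that produced it.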
In the seminal "A Note on the Confinement Problem", Lampson (1973) states confinement rules for reasoning about when a program is or is not confined.
Building on Lampson, Yampolskiy (2012) considers the confinement of AI, especially superintelligence:
"In the hopes of starting a new subfield of computer security, AI Safety Engineering, we define the Artificial Intelligence Confinement Problem (AICP) as the challenge of restricting an artificially intelligent entity to a confined environment from which it can’t exchange information with the outside environment via legitimate or covert channels if such information exchange was not authorized by the confinement authority."
The current formal confinement work explores a restriction of Yampolskiy's AICP to the imp programming language, where the confinement authority is the user, who provides a precondition and a postcondition backed by Lean. In fact, the setting is so restricted that we meet Lampson's rules almost for free.
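As a concrete illustration, here is a hedged sketch of what the user's precondition and postcondition might look like over a two-variable imp-like toy; the AST, semantics, and names below are ours, not necessarily the prototype's.

```lean
-- A self-contained toy: two variables, assignment, and sequencing.
-- Everything here is illustrative, not the prototype's actual code.
inductive Var where
  | x
  | y
deriving DecidableEq

abbrev State := Var → Int

inductive Imp where
  | skip
  | assign (v : Var) (f : State → Int)
  | seq (p q : Imp)

-- Denotational semantics: running a program transforms the state.
def run : Imp → State → State
  | .skip,       s => s
  | .assign v f, s => fun w => if w = v then f s else s w
  | .seq p q,    s => run q (run p s)

-- The AI's (restricted) output: a program claimed to double x.
def candidate : Imp := .assign .x (fun s => s .x + s .x)

-- The user's spec as a Hoare triple, { x = n } candidate { x = 2·n },
-- and the proof obligation the AI must discharge to cross the boundary.
theorem candidate_meets_spec (n : Int) (s : State) (h : s Var.x = n) :
    run candidate s Var.x = 2 * n := by
  simp [run, candidate, h]
  omega
```

Lean's kernel checking `candidate_meets_spec` is what authorizes `candidate` to leave the box; an output with no accompanying proof is simply rejected.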
In our toy setting, we report verification burdens between 1.5 and 4.2, where a task's verification burden is the ratio of the cognitive effort needed to ship a proof, given that the program has already been shipped, to the effort of shipping the program itself.
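To fix notation (the cost function $C$ below is our own shorthand for cognitive expense, not a quantity the prototype measures directly):

$$\text{burden}(P) \;=\; \frac{C_{\text{proof}}(P)}{C_{\text{program}}(P)},$$

so a burden of 4.2 means that producing the proof cost 4.2 times as much as producing the program $P$ it certifies.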