Steve_Omohundro — LessWrong

Replying to In response to critiques of Guaranteed Safe AI

Thanks Nora for an excellent response! With R1 and o3, I think we are rapidly moving to the point where AI theorem proving, autoformalization, and verified software and hardware synthesis will be powerful and widely available. I believe it is critically important for many more people to understand the importance of formal methods for AI Safety.

Computer science, scientific computing, cryptography, and other fields have repeatedly had "formal methods skeptics" who argue for "sloppy engineering" rather than formal verification. This has been a challenge ever since Turing proved the first properties of programs back in 1949. This hit a tragic peak in 1979 when three leading computer scientists wrote a CACM paper arguing...

Replying to Proveably Safe Self Driving Cars [Modulo Assumptions]

Steve_Omohundro 1y

David, I really like that your description discusses the multiple interacting levels involved in self-driving cars (e.g. software, hardware, road rules, laws, etc.). Actual safety requires reasoning about those interacting levels and ensuring that the system as a whole doesn't have vulnerabilities. Malicious humans and AIs are likely to try to exploit systems in unfamiliar ways. For example, here are 10 potentially harmful actions (out of many, many more possibilities!) that AI-enabled malicious humans or AIs could benefit from by controlling self-driving cars:

1) Murder for hire: Cause accidents that kill a car's occupant, another car's occupant, or pedestrians

2) Terror for hire: Cause accidents that injure groups of...

Replying to Proveably Safe Self Driving Cars [Modulo Assumptions]

Steve_Omohundro 1y

Hear, hear! Beautifully written! Safety is a systems issue, and if components at any level have vulnerabilities, those vulnerabilities can spread to other parts of the system. As you write, we already have effective techniques for formal verification at many of the lower levels: software, OS, chips, etc. And, as you say, the AI level and social levels are more challenging. We don't yet have a detailed understanding of transformer-based controllers, and methods like RLHF diminish but don't eliminate harmful behaviors. But if we can choose precise coarse safety criteria, then formal gatekeepers between the AI and the actuators can ensure those constraints are obeyed.
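To make the gatekeeper idea concrete, here is a minimal Python sketch, not anything from the original post. It assumes a coarse safety criterion that is just a speed limit and a steering limit; the names `Command`, `SafetyEnvelope`, `gatekeeper`, and `SAFE_STOP` are all hypothetical. Note that this is only a runtime check: in the approach described above, the gatekeeper itself would be small enough to formally verify, which this sketch does not attempt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    """A hypothetical actuator command proposed by the AI controller."""
    speed_mps: float      # requested speed in metres per second
    steering_deg: float   # requested steering angle in degrees


@dataclass(frozen=True)
class SafetyEnvelope:
    """A coarse, precisely stated safety criterion (illustrative limits only)."""
    max_speed_mps: float
    max_steering_deg: float

    def permits(self, cmd: Command) -> bool:
        # NaN fails both comparisons, so malformed values are rejected too.
        return (0.0 <= cmd.speed_mps <= self.max_speed_mps
                and abs(cmd.steering_deg) <= self.max_steering_deg)


# Assumed fallback action when a proposed command is rejected.
SAFE_STOP = Command(speed_mps=0.0, steering_deg=0.0)


def gatekeeper(envelope: SafetyEnvelope, proposed: Command) -> Command:
    """Forward the AI's command only if it lies inside the envelope."""
    return proposed if envelope.permits(proposed) else SAFE_STOP


if __name__ == "__main__":
    envelope = SafetyEnvelope(max_speed_mps=30.0, max_steering_deg=35.0)
    print(gatekeeper(envelope, Command(20.0, 10.0)))   # inside the envelope
    print(gatekeeper(envelope, Command(50.0, 0.0)))    # rejected, falls back
```

The design point is that the actuators only ever see the gatekeeper's output, so the constraint holds however the controller behaves, provided the gatekeeper and the channel to the actuators are themselves trustworthy.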

At the social level, you mention legality and...

Replying to Limitations on Formal Verification for AI Safety

Steve_Omohundro 1y

People seem to be getting the implication we intended backwards. We're certainly not saying "For any random safety property you might want, you can use formal methods to magically find the rules and guarantee them!" What we are saying is "If you have a system which actually guarantees a safety property, then there is a formal proof of that, and there are many benefits to making that proof explicit, if it is practical." We're not proposing any new physics, any new mathematics, or even any new AI capabilities other than further development of current capabilities in theorem proving, verified software synthesis, and autoformalization.

Humanity is in a crisis right now! Even without AI we...

Replying to Limitations on Formal Verification for AI Safety

Steve_Omohundro 1y

I've been thinking more about the question of whether there are properties which we believe but for which we have no proof, and what we do about those today and in an intended provable future.

I know of a few examples, especially in cryptography. One of the great successes of theoretical cryptography was the reduction of the security of a whole bunch of cryptographic constructs to a single one: the existence of one-way functions, which are cheap to compute but expensive to invert: https://en.wikipedia.org/wiki/One-way_function That has been a great unifying discovery there, and the way the cryptographers deal with it is that they just add the existence of one-way functions as an extra axiom...

Replying to Limitations on Formal Verification for AI Safety

Steve_Omohundro 1y

Yes, thanks Steve! Very interesting examples! As I understand it, most chip verification is based on SAT solvers and "Model Checking" https://en.wikipedia.org/wiki/Model_checking . This is a particular proof search technique which can often provide full coverage for circuits. But it has no access to any kind of sophisticated theorems such as those in the Lean mathematics library. For small circuits, that kind of technique is often fine. But as circuits get more complex, it is subject to the "state explosion problem".
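As a rough illustration of why exhaustive search gives full coverage but scales badly, here is a toy explicit-state checker in Python. It is a sketch only: industrial chip verification uses SAT-based and symbolic techniques rather than this brute-force enumeration, and the counter "circuit" and all function names are hypothetical.

```python
from collections import deque


def reachable_states(initial, step, inputs):
    """Breadth-first search over every state reachable under any input sequence."""
    seen = {initial}
    frontier = deque([initial])
    while frontier:
        state = frontier.popleft()
        for inp in inputs:
            nxt = step(state, inp)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen


def check_invariant(initial, step, inputs, invariant):
    """Return (invariant holds on all reachable states, number of states explored)."""
    states = reachable_states(initial, step, inputs)
    return all(invariant(s) for s in states), len(states)


def make_counter(n_bits):
    """Toy n-bit counter: adds the 0/1 enable input and wraps around."""
    def step(state, enable):
        return (state + enable) % (2 ** n_bits)
    return step


if __name__ == "__main__":
    for n_bits in (4, 8, 12):
        holds, explored = check_invariant(
            initial=0,
            step=make_counter(n_bits),
            inputs=(0, 1),
            invariant=lambda s, n=n_bits: 0 <= s < 2 ** n,
        )
        # The explored count doubles with every added bit: 16, 256, 4096.
        print(n_bits, holds, explored)
```

Every reachable state is visited, so a passing check really is a proof for this toy model, but the work grows as 2^n in the number of state bits. A general theorem (here, that a value reduced modulo 2^n is always below 2^n) would settle the same property for every n at once.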

Looking at the floating point division paper, the double precision divider took 7 hours and 30 minutes, indicating a lot of search! But one great thing about proofs is that their validity doesn't...

Replying to Limitations on Formal Verification for AI Safety

Steve_Omohundro 1y

I agree that would be better than what we usually have now! And it is more in line with the "Swiss Cheese" approach to security. From a practical perspective, we are probably going to have to do that for some time: components with provable properties combined in unproven ways. But every aspect which is unproven is a potential vulnerability.

The deeper question is whether there are situations where it has to be that way: where there is some barrier to modeling the entire system and formally combining the correctness and security properties of components to obtain them for the whole system.

Certainly there are hardware and software components whose detailed behavior is computationally complex to predict in advance (e.g. ...

Replying to Limitations on Formal Verification for AI Safety

Steve_Omohundro 1y

I totally agree in today's world! Today, we have management protocols which are aimed at requiring testing and record keeping to ensure that boats and ships are in the state we would like them to be. But these rules are subject to corruption and malfeasance (such as the 420 Boeing jets which incorporated defective parts and yet which are currently flying with passengers: https://doctorow.medium.com/https-pluralistic-net-2024-05-01-boeing-boeing-mrsa-2d9ba398bd54 ).

But it appears we are rapidly moving to a world in which much of the physical labor will be done by robots and in which each physical system will have a corresponding "digital twin" (e.g. https://www.nvidia.com/en-us/omniverse/solutions/digital-twins/ ).

In that world, we can implement provable formal rules governing every system, from raw materials, to manufacture, to supply chain, to operations, and to maintenance.

In an AI world, much more sophisticated malfeasance can occur. Formal models of domains, with proofs of adherence to rules and of protection against adversaries, are the only way to ensure our systems are safe and effective.

Replying to Limitations on Formal Verification for AI Safety

Steve_Omohundro 1y

Testing is great for a first pass! And in non-critical and non-adversarial settings, testing can give you actual probabilistic bounds. If the probability distribution of the actual uses is the same as the testing distribution (or close enough to it), then the test statistics can be used to bound the probability of errors during use. I think that is why formal methods are so rarely used in software: testing is pretty good and if errors show up, you can fix them then. Hardware has greater adoption of formal methods because it's much more expensive to fix errors after the fact.

But the real problems arise from adversarial attacks. The statistical correctness of a...

Replying to Limitations on Formal Verification for AI Safety

Steve_Omohundro 1y

In general, we can't prevent physical failures. What we can do is to accurately bound the probability of them occurring, to create designs which limit the damage that they cause, and to limit the ability of adversarial attacks to trigger and exploit them. We're advocating for humanity's entire infrastructure to be upgraded with provable technology to put guaranteed bounds on failures at every level and to eliminate the need to trust potentially flawed or corrupt actors.

In the case of the ship, there are both questions about the design of that ship's components and its provenance. Why did the backup power not enable the propulsion system to stop? Why wasn't there a "failsafe"...

Provably Safe AI: Worldview and Projects

Ben Goldhaber, Steve_Omohundro

In September 2023, Max Tegmark and Steve Omohundro proposed "Provably Safe AI" as a strategy for AI Safety. In May 2024, a larger group delineated the broader concept of "Guaranteed Safe AI", which includes Provably Safe AI and other related strategies. In July 2024, Ben Goldhaber and Steve discussed Provably Safe AI and its future possibilities, as summarized in this document.

Background

In June 2024, ex-OpenAI AI Safety Researcher Leopold Aschenbrenner wrote a 165-page document entitled "Situational Awareness: The Decade Ahead" summarizing AI timeline evidence and beliefs which are shared by many frontier AI researchers. He argued that human-level AI is likely by 2027 and will likely lead to superhuman AI in 2028 or 2029. "Transformative AI"...