A Case for AI Safety via Law

JWJohnston

11th Sep 2023

This post is to make the subject case more available and open to comment. A paper with the above title is currently languishing in arXiv limbo but available in Google Docs. I was surprised to see my last and only postings to LessWrong were made on this very subject in 2010 as comments to the thread "Why not just write failsafe rules into the superintelligent machine?"

Unfortunately (IMO) this approach to AI alignment doesn't seem to have gained much traction in the past 13 years. The paper does cite, however, 15 precedents that show there is some support. (Anthropic's Constitutional AI and John Nay's Law Informs Code are two good recent examples.)

The following reproduces the Summary of Argument and Conclusion sections from the paper.

The claim being argued is "Effective legal systems are the best way to address AI safety."

4 Summary of Argument

4.1 Law is the standard, time-tested, best practice for maintaining order in societies of intelligent agents.

Law has been the primary way of maintaining functional, cohesive societies for thousands of years. It is how humans establish, communicate, and understand what actions are required, permissible, and prohibited in social spheres. Substantial experience exists in drafting, enacting, enforcing, litigating, and maintaining rules in contexts that include public law, private contracts, and the many others noted in this brief. Law will naturally apply to new species of intelligent systems and facilitate safety and value alignment for all.

4.2 Law is scrutable to humans and other intelligent agents.

Unlike AI safety proposals where rules are learned via examples and encoded in artificial (or biological) neural networks, laws are intended to be understood by humans and machines. Although laws can be quite complex, such codified rules are significantly more scrutable than rules learned through induction. The transparent (white box) nature of law provides a critical advantage over opaque (black box) neural network alternatives.

4.3 Law reflects consensus values.

Democratically developed law is intimately linked and essentially equivalent to consensus ethics. Both are human inventions intended to facilitate the wellbeing of individuals and the collective. They represent shared values culturally determined through rational consideration and negotiation. They reflect the wisdom of crowds accumulated over time—not preferences that vary from person to person and are often based on emotion, irrational ideologies, confusion, or psychopathy. Ethical values provide the virtue core of legal systems and reflect the “spirit of the law.” Consequentialist shells surround such cores and specify the “letter of the law.” This relationship between law and ethics makes law a natural solution for human-AI value alignment. A minority of AIs and people, however powerful, cannot game laws to achieve selfish ends.

4.4 Legal systems are responsive to changes in the environment and changes in moral values.

By utilizing legal mechanisms to consolidate values and update them over time, human and AI values can remain aligned indefinitely as values, technologies, and environmental conditions change. Thus law provides a practical implementation of Yudkowsky’s (2004) Coherent Extrapolated Volition by allowing values to evolve that are wise, aspirational, convergent, coherent, suitably extrapolated, and properly interpreted.

4.5 Legal systems restrict overly rapid change.

Legal processes provide checks and balances against overly rapid change to values and laws. Such checks are particularly important when legal change can occur at AI speeds. Legal systems and laws must adapt quickly enough to address the urgency of issues that arise but not so quickly as to risk dire consequences. Laws should be based on careful analysis and effective simulation, and the system should be able to quickly detect and correct problems found after implementation. New technologies and methods should be introduced to make legal processing as efficient as possible without removing critical checks and balances.

4.6 Laws are context sensitive, hierarchical, and scalable.

Laws apply to contexts ranging from international, national, state, and local governance to all manner of other social contracts. Contexts can overlap, be hierarchical, or have other relationships. Humans have lived under this regime for millennia and are able to understand which laws apply and take precedence over others based on contexts (e.g., jurisdictions, organization affiliations, contracts in force). Artificial intelligent systems will be able to manage the multitude of contexts and applicable laws by identifying, loading, and applying appropriate legal corpora for applicable contexts. For example, AIs (like humans) will understand that crosschecking is permitted in hockey games but not outside the arena. They will know when to apply rules of the road versus rules of the sea. They will know when the laws of chess apply versus rules of Go. They will know their rights relative to every software agent, tool, and service they interface with.
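The "identify, load, and apply" step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `Corpus` class, the `REGISTRY` contents, and the "most specific context wins" precedence rule are all hypothetical and are not taken from the paper, and real legal precedence is far richer than a single integer.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Corpus:
    """A hypothetical bundle of rules tied to one context."""
    name: str
    specificity: int          # assumption: higher value overrides lower
    rules: dict[str, bool]    # action -> permitted?


# Toy registry; the contexts and rules are invented for illustration
# and follow the post's hockey example.
REGISTRY = {
    "public_law": Corpus("public_law", 0, {"crosscheck": False, "drive_on_road": True}),
    "hockey_game": Corpus("hockey_game", 10, {"crosscheck": True}),
}


def is_permitted(action: str, active_contexts: list[str], default: bool = False) -> bool:
    """Consult the active corpora from most to least specific."""
    corpora = sorted(
        (REGISTRY[c] for c in active_contexts if c in REGISTRY),
        key=lambda corpus: corpus.specificity,
        reverse=True,
    )
    for corpus in corpora:
        if action in corpus.rules:
            return corpus.rules[action]
    return default  # no applicable rule found in any active context
```

With both contexts active, `is_permitted("crosscheck", ["public_law", "hockey_game"])` consults the hockey rules first; with only `"public_law"` active, the same action falls through to the general prohibition. Defaulting to "not permitted" when no rule matches is a design choice of the sketch, not a claim about how legal systems treat unregulated actions.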

4.7 AI Safety via Law can address the full range of AI safety risks, from systems that are narrowly focused to those having general intelligence or even superintelligence.

Enacting and enforcing appropriate laws, and instilling law-abiding values in AIs and humans, can mitigate risks spanning all levels of AI capability—from narrow AI to AGI and ASI. If intelligent agents stray from the law, effective detection and enforcement must occur.

Even the catastrophic vision of smarter-than-human-intelligence articulated by Yudkowsky (2022, 2023) and others (Bostrom, 2014; Russell, 2019) can be avoided by effective implementation of AISVL. It may require that the strongest version of the instrumental convergence thesis (which they rely on) is not correct. Appendix A suggests some reasons why AI convergence to dangerous values is not inevitable.

AISVL applies to all intelligent systems regardless of their underlying design, cognitive architecture, and technology. It is immaterial whether an AI is implemented using biology, deep learning, constructivist AI (Johnston, 2023), semantic networks, quantum computers, positronics, or other methods. All intelligent systems must comply with applicable laws regardless of their particular values, preferences, beliefs, and how they are wired.

5 Conclusion

Although its practice has often been flawed, law is a natural solution for maintaining social safety and value alignment. All intelligent agents, biological and mechanical, must know the law, strive to abide by it, and be subject to effective intervention when they violate it. The essential equivalence and intimate link between consensus ethics and democratic law provide a philosophical and practical basis for legal systems that marry values and norms (“virtue cores”) with rules that address real world situations (“consequentialist shells”). In contrast to other AI safety proposals, AISVL requires that AIs “do as we legislate, not as we do.”

Advantages of AISVL include its leveraging of time-tested standard practice; scrutability to all intelligent agents; reflection of consensus values; responsiveness to changes in the environment and in moral values; restrictiveness of overly rapid change; context sensitivity, hierarchical structure, and scalability; and applicability to safety risks posed by narrow, general, and even superintelligent AIs.

For the future safety and wellbeing of all sentient systems, work should occur in earnest to improve legal processes and laws so they are more robust, fair, nimble, efficient, consistent, understandable, accepted, and complied with. (Legal frameworks outside of public law may be effective to this end.) Humans are in dire need of such improvements to counter the dangers that we pose to the biosphere and to each other. It is not clear if advanced AI will be more or less dangerous than humans. Law is critical for both.

Ethics & Morality, Inner Alignment, Law and Legal systems, Outer Alignment, AI

abramdemski (3y)

I feel as if there is some unstated idea here that I am not quite inferring. What is the safety approach supposed to be? If there were an organization devoted to this path to AI safety, what activities would that organization be engaged in?

Seth Herd interprets the idea as "regulation". Indeed, this seems like the obvious interpretation. But I suspect it misses your point.

Enacting and enforcing appropriate laws, and instilling law-abiding values in AIs and humans, can mitigate risks spanning all levels of AI capability—from narrow AI to AGI and ASI. If intelligent agents stray from the law, effective detection and enforcement must occur.

The first part is just "regulation". The second part, "instilling law-abiding values in AIs and humans", seems like a significant departure. It seems like the proposal involves both (a) designing and enacting a set of appropriate laws, and (b) finding and deploying a way of instilling law-abiding values (in AIs and humans). Possibly (a) includes a law requiring (b): AIs (and AI-producing organizations) must be designed so as to have law-abiding values within some acceptable tolerances.

This seems like a very sensible demand, but it does seem like it has to piggyback on some other approach to alignment, which would solve the object-level instilling-values problem.

Even the catastrophic vision of smarter-than-human-intelligence articulated by Yudkowsky (2022, 2023) and others (Bostrom, 2014; Russell, 2019) can be avoided by effective implementation of AISVL. It may require that the strongest version of the instrumental convergence thesis (which they rely on) is not correct. Appendix A suggests some reasons why AI convergence to dangerous values is not inevitable.
AISVL applies to all intelligent systems regardless of their underlying design, cognitive architecture, and technology. It is immaterial whether an AI is implemented using biology, deep learning, constructivist AI (Johnston, 2023), semantic networks, quantum computers, positronics, or other methods. All intelligent systems must comply with applicable laws regardless of their particular values, preferences, beliefs, and how they are wired.

If the approach does indeed require "instilling law-abiding values in AI", it is unclear why "AISVL applies to all intelligent systems regardless of their underlying design". The technology to instill law-abiding values may apply to specific underlying designs, specific capability ranges, etc. I guess the idea is that part (a) of the approach, the laws themselves, apply regardless. But if part (b), the value-instilling part, has limited applicability, then this has the effect of simply outlawing designs not compatible. That's fine, but "AISVL applies to all intelligent systems regardless of their underlying design" seems to dramatically over-sell the applicability of the approach in that case. Or perhaps I'm misunderstanding.

Similarly, "AI safety via law can address the full range of safety risks" seems to over-sell the whole section, a major point of which is to claim that AISVL does not apply to the strongest instrumental-convergence concerns. (And why not, exactly? It seems like, if the value-instilling tech existed, it would indeed avert the strongest instrumental-convergence concerns.)

JWJohnston (3y)

I feel as if there is some unstated idea here that I am not quite inferring. What is the safety approach supposed to be? If there were an organization devoted to this path to AI safety, what activities would that organization be engaged in?

The summary I posted here was just a teaser to the full paper (linked in pgph. 1). That said, your comments show you reasoned pretty closely to points I tried to make therein. Almost no need to read it. :)

The first part is just "regulation". The second part, "instilling law-abiding values in AIs and humans", seems like a significant departure. It seems like the proposal involves both (a) designing and enacting a set of appropriate laws, and (b) finding and deploying a way of instilling law-abiding values (in AIs and humans). Possibly (a) includes a law requiring (b): AIs (and AI-producing organizations) must be designed so as to have law-abiding values within some acceptable tolerances.

The main message of the paper is along the lines of "a." That is, per the claim in the 4th pgph, "Effective legal systems are the best way to address AI safety." I'm arguing that having effective legal systems and laws are the critical things. How laws/values get instilled in AIs (and humans) is mostly left as an exercise for the reader. Your point about "simply outlawing designs not compatible" is reasonable.

The way I put it in the paper (sect. 3, pgph. 2): "Many of the proposed non-law-based solutions may be worth pursuing to help assure AI systems are law abiding. However, they are secondary to having a robust, well-managed, readily available corpus of codified law, and complementary legal systems, as the foundation and ultimate arbiter of acceptable behaviors for all intelligent systems, both biological and mechanical."

Later I write, "Suggested improvements to law and legal process are mostly beyond the scope of this brief. It is possible, however, that significant technological advances will not be needed for implementing some key capabilities. For example, current Large Language Models are nearly capable of understanding vast legal corpora and making appropriate legal decisions for humans and AI systems (Katz et al., 2023). Thus, a wholesale switch to novel legal encodings (e.g., computational and smart contracts) may not be necessary."

I suspect some kind of direct specification approach (per Bostrom classification) could work where AIs confirm that (non-trivial) actions they are considering comply with legal corpora appropriate to current contexts before taking action. I presume techniques used by the self-driving-car people will be up to the task for their application.
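A minimal sketch of that confirm-before-acting loop, for concreteness. Everything in it is hypothetical: `act_if_compliant`, `ComplianceError`, and the toy driving checks are invented for illustration, and each check is assumed to be a trustworthy yes/no function, which is exactly the part the sketch does not solve.

```python
from typing import Callable, Dict, List

# Hypothetical type: a check returns True when the action complies.
ComplianceCheck = Callable[[str], bool]


class ComplianceError(Exception):
    """Raised when a proposed action fails one or more checks."""

    def __init__(self, action: str, failed: List[str]) -> None:
        super().__init__(f"{action!r} blocked by: {', '.join(failed)}")
        self.failed = failed


def act_if_compliant(
    action: str,
    checks: Dict[str, ComplianceCheck],
    execute: Callable[[str], str],
    trivial_actions: frozenset = frozenset(),
) -> str:
    """Run execute(action) only if every check passes.

    Actions listed in trivial_actions skip the checks entirely.
    """
    if action not in trivial_actions:
        failed = [name for name, check in checks.items() if not check(action)]
        if failed:
            raise ComplianceError(action, failed)
    return execute(action)


# Toy checks standing in for corpora selected for the current context.
checks = {
    "speed_limit": lambda a: a != "drive at 90 in a 50 zone",
    "right_of_way": lambda a: a != "run a red light",
}
execute = lambda a: f"done: {a}"
```

Calling `act_if_compliant("stop at the crosswalk", checks, execute)` runs the action, while `"run a red light"` raises `ComplianceError` naming the failed check. The gate itself is simple; the open question is whether checks of this kind can be written and verified for realistic environments.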

"AI safety via law can address the full range of safety risks" seems to over-sell the whole section, a major point of which is to claim that AISVL does not apply to the strongest instrumental-convergence concerns. (And why not, exactly? It seems like, if the value-instilling tech existed, it would indeed avert the strongest instrumental-convergence concerns.)

I struggled with what to say about AISVL wrt superintelligence and instrumental convergence. Probably should have let the argument ride without hedging, i.e., superintelligences will have to comply with laws and the demands of legal systems. They will be full partners with humans in enacting and enforcing laws. It's hard to just shrug off the concerns of the Yudkowskys, Bostroms, and Russells of the world.

Perhaps the most important and (hopefully) actionable recommendation of the proposal is in the conclusion:

"For the future safety and wellbeing of all sentient systems, work should occur in earnest to improve legal processes and laws so they are more robust, fair, nimble, efficient, consistent, understandable, accepted, and complied with."

abramdemski (3y)

Almost no need to read it. :)

fwiw, I did skim the doc, very briefly.

The main message of the paper is along the lines of "a." That is, per the claim in the 4th pgph, "Effective legal systems are the best way to address AI safety." I'm arguing that having effective legal systems and laws are the critical things. How laws/values get instilled in AIs (and humans) is mostly left as an exercise for the reader. Your point about "simply outlawing designs not compatible" is reasonable.
The way I put it in the paper (sect. 3, pgph. 2): "Many of the proposed non-law-based solutions may be worth pursuing to help assure AI systems are law abiding. However, they are secondary to having a robust, well-managed, readily available corpus of codified law, and complementary legal systems, as the foundation and ultimate arbiter of acceptable behaviors for all intelligent systems, both biological and mechanical."

In that case, I agree with Seth Herd that this approach is not being neglected. Of course it could be done better. I'm not sure exactly how many people are working on it, but I have the impression that it is more than a dozen, since I've met some of them without trying.

I suspect some kind of direct specification approach (per Bostrom classification) could work where AIs confirm that (non-trivial) actions they are considering comply with legal corpora appropriate to current contexts before taking action. I presume techniques used by the self-driving-car people will be up to the task for their application.

I think this underestimates the difficulty of self-driving cars. In the application of self-driving airplanes (on runways, not in the air), it is indeed possible to make an adequate model of the environment, such that neural networks can be verified to follow a formally specified set of regulations (and self-correct from undesired states to desired states). With self-driving cars, the environment is far too complex to formally model in that way. You get to a point where you are trusting one AI model (of the complex environment) to verify another. And you can't explore the whole space effectively, so you still can't provide really strong guarantees (and this translates to errors in practice).

I struggled with what to say about AISVL wrt superintelligence and instrumental convergence. Probably should have let the argument ride without hedging, i.e., superintelligences will have to comply with laws and the demands of legal systems. They will be full partners with humans in enacting and enforcing laws. It's hard to just shrug off the concerns of the Yudkowskys, Bostroms, and Russells of the world.

It seems to me like you are somewhat shrugging off those concerns, since the technological interventions (eg smart contracts, LLMs understanding laws, whatever self-driving-car people get up to) are very "light" in the face of those "heavy" concerns. But a legal approach need not shrug off those concerns. For example, law could require the kind of verification we can now apply to airplane autopilot be applied to self-driving-cars as well. This would make self-driving illegal in effect until a large breakthrough in ML verification takes place, but it would work!

JWJohnston (3y)

I'm not sure exactly how many people are working on it, but I have the impression that it is more than a dozen, since I've met some of them without trying.

Glad to hear it. I hope to find and follow such work. The people I'm aware of are listed on pp. 3-5 of the paper. Was happy to see O'Keefe, Bai et al. (Anthropic), and Nay leaning this way.

It seems to me like you are somewhat shrugging off those concerns, since the technological interventions (eg smart contracts, LLMs understanding laws, whatever self-driving-car people get up to) are very "light" in the face of those "heavy" concerns. But a legal approach need not shrug off those concerns. For example, law could require the kind of verification we can now apply to airplane autopilot be applied to self-driving-cars as well. This would make self-driving illegal in effect until a large breakthrough in ML verification takes place, but it would work!

Yes. I'm definitely being glib about implementation details. First things first. :)

I agree with you that if self-driving cars can't be "programmed" (instilled) to be adequately law-abiding, their future isn't bright. Per above, I'm heartened by Anthropic's Constitutional AI (priming LLMs with basic "laws") having some success getting AIs to behave. Ditto for anecdotes I've heard about "asking an LLM to come up with a money-making plan that doesn't violate any laws." Seems too easy, right?

One final comment about implementation details. In the appendix I note:

We suspect emergence of instrumental values is not inevitable for any “sufficiently advanced AI system.” Rather, whether such values emerge depends on what cognitive architecture and environmental conditions (training regimens) are used.

Broadly speaking, implementing AIs using safe architectures (ones not prone to law-breaking) is another implementation direction. Drexler's CAIS may be an example.

abramdemski (3y)

Would you count all the people who worked on the EU AI act?

JWJohnston (3y)

Sure. Getting appropriate new laws enacted is an important element. From the paper:

Initially, in addition to adopting existing bodies of law to implement AISVL, existing processes for how laws are drafted, enacted, enforced, litigated, and maintained would be preserved.
Thereafter, new laws and improvements to existing laws and processes must continually be introduced to make the systems more robust, fair, nimble, efficient, consistent, understandable, accepted, complied with, and enforced.

I'd say the EU AI Act (and similar) work addresses the "new laws" imperative. (I won't comment (much) on pros and cons of its content. In general, it seems pretty good. I wonder if they considered adding Etzioni's first law to the mix, "An AI system must be subject to the full gamut of laws that apply to humans"? That is what I meant by "adopting existing bodies of law to implement AISVL." The item in the EU AI Act about designing generative AIs to not generate illegal content is related.)

The more interesting work will be on improving legal processes along the dimensions listed above. And really interesting will be, as AIs get more autonomous and agentic, the "instilling" part where AIs must dynamically recognize and comply with the legal-moral corpora appropriate to the contexts they find themselves in.

Seth Herd (3y)

I think you'll find this topic discussed a lot, both pro and con, under the term "regulation".

Q Home (3y)

Perhaps the most important and (hopefully) actionable recommendation of the proposal is in the conclusion:

"For the future safety and wellbeing of all sentient systems, work should occur in earnest to improve legal processes and laws so they are more robust, fair, nimble, efficient, consistent, understandable, accepted, and complied with."

Sorry for sounding harsh. But to say something meaningful, I believe you have to argue two things:

Laws are distinct enough from human values (1), but following laws / caring about laws / reporting about predicted law violations prevents the violation of human values (2).

I think the post fails to argue both points. I see no argument that instilling laws is distinct enough from instilling values/corrigibility/human semantics in general (1) and that laws actually prevent misalignment (2).

Later I write, "Suggested improvements to law and legal process are mostly beyond the scope of this brief. It is possible, however, that significant technological advances will not be needed for implementing some key capabilities. For example, current Large Language Models are nearly capable of understanding vast legal corpora and making appropriate legal decisions for humans and AI systems (Katz et al., 2023). Thus, a wholesale switch to novel legal encodings (e.g., computational and smart contracts) may not be necessary."

If AI can be just asked to follow your clever idea, then AI is already safe enough without your clever idea. "Asking AI to follow something" is not what Bostrom means by direct specification, as far as I understand.

JWJohnston (3y)

I believe you have to argue two things:
Laws are distinct enough from human values (1), but following laws / caring about laws / reporting about predicted law violations prevents the violation of human values (2).
I think the post fails to argue both points. I see no argument that instilling laws is distinct enough from instilling values/corrigibility/human semantics in general (1) and that laws actually prevent misalignment (2).

My argument goes in a different direction. I reject premise (1) and claim there is an "essential equivalence and intimate link between consensus ethics and democratic law [that] provide a philosophical and practical basis for legal systems that marry values and norms (“virtue cores”) with rules that address real world situations (“consequentialist shells”)."

In the body of the paper I characterize democratic law and consensus ethics as follows:

Both are human inventions intended to facilitate the wellbeing of individuals and the collective. They represent shared values culturally determined through rational consideration and negotiation. To be effective, democratic law and consensus ethics should reflect sufficient agreement of a significant majority of those affected. Democratic law and consensus ethics are not inviolate physical laws, instinctive truths, or commandments from deities, kings, or autocrats. They do not represent individual values, which vary from person to person and are often based on emotion, irrational ideologies, confusion, or psychopathy.

That is, democratic law corresponds to the common definition of Law. Consensus ethics is essentially equivalent to human values when understood in the standard philosophical sense as "shared values culturally determined through rational consideration and negotiation." In short, I'm of the opinion "Law = Ethics."

Regarding your premise (2): See my reply to Abram's comment. I'm mostly ducking the "instilling" aspects. I'm arguing for open, external, effective legal systems as the key to AI alignment and safety. I see the implementation/instilling details as secondary.

If AI can be just asked to follow your clever idea, then AI is already safe enough without your clever idea. "Asking AI to follow something" is not what Bostrom means by direct specification, as far as I understand.

My reference to Bostrom's direct specification was not intended to match his use, i.e., hard coding (instilling) human values in AIs. My usage refers to specifying rules/laws/ethics externally so they are available and usable by all intelligent systems. Of the various alignment approaches Bostrom mentioned (and deprecated), I thought direct specification came closest to AISVL.

Q Home (3y)

Maybe there's a misunderstanding. Premise (1) makes sure that your proposal is different from any other proposal. It's impossible to reject premise (1) without losing the proposal's meaning.

Premise (1) is possible to reject only if you're not solving Alignment but solving some other problem.

I'm arguing for open, external, effective legal systems as the key to AI alignment and safety. I see the implementation/instilling details as secondary. My usage refers to specifying rules/laws/ethics externally so they are available and usable by all intelligent systems.

If an AI can be Aligned externally, then it's already safe enough. It feels like...

You're not talking about solving Alignment, but talking about some different problem. And I'm not sure what that problem is.
For your proposal to work, the problem needs to be already solved. All the hard/interesting parts need to be already solved.

JWJohnston (3y)

If an AI can be Aligned externally, then it's already safe enough. It feels like...
You're not talking about solving Alignment, but talking about some different problem. And I'm not sure what that problem is.
For your proposal to work, the problem needs to be already solved. All the hard/interesting parts need to be already solved.

I'm talking about the need for all AIs (and humans) to be bound by legal systems that include key consensus laws/ethics/values. It may seem obvious, but I think this position is under-appreciated and not universally accepted.

By focusing on the external legal system, many key problems associated with alignment (as recited in the Summary of Argument) are addressed. One worth highlighting is 4.4, which suggests AISVL can assure alignment in perpetuity despite changes in values, environmental conditions, and technologies, i.e., a practical implementation of Yudkowsky's CEV.

Q Home (3y)

Maybe you should edit the post to add something like this:

My proposal is not about the hardest parts of the Alignment problem. My proposal is not trying to solve theoretical problems with Inner Alignment or Outer Alignment (Goodhart, loopholes). I'm just assuming those problems won't be relevant enough. Or humanity simply won't create anything AGI-like (see CAIS).

Instead of discussing the usual problems in Alignment theory, I merely argue X. X is not a universally accepted claim, here's evidence that it's not universally accepted: [write the evidence here].

...

By focusing on the external legal system, many key problems associated with alignment (as recited in the Summary of Argument) are addressed. One worth highlighting is 4.4, which suggests AISVL can assure alignment in perpetuity despite changes in values, environmental conditions, and technologies, i.e., a practical implementation of Yudkowsky's CEV.

I think the key problems are not "addressed", you just assume they won't exist. And laws are not a "practical implementation of CEV".
