Law-Following AI

The offensive technology to make the planet unsuitable for current human civilization ALREADY exists - the defense so far has consisted of convincing people not to use it.

I think this is true in the limit (assuming you're referring primarily to nukes). But I think offense-defense reasoning is still very relevant here. For example, to know when and how much to worry about AIs using nuclear technology to cause human extinction, you would want to ask under what circumstances humans can defend command and control of nuclear weapons from AIs that want to seize them.

We just can't learn much from human-human conflict, where at almost any scale, the victor hopes to have a hospitable environment remaining afterward.

I agree that the calculus changes dramatically if you assume that the AI does not need or want the earth to remain inhabitable by humans. I also agree that, in the limit, interspecies interactions are plausibly a better model than human-human conflicts. But I don't agree that either of these implies that offense-defense logic is totally irrelevant.

Humans, as incumbents, inherently occupy the position of defenders as against the misaligned AIs in these scenarios, at least if we're aware of the conflict (which I grant we might not be). The worry is that AIs will try to gain control in certain ways. Offense-defense thinking is important if we ask questions like:

  1. Can we predict how AIs might try to seize control? I.e., what does control consist in from their perspective, and how might they achieve it given the parties' starting positions?
  2. If we have some model of how AIs try to seize control, what does that imply about humanity's ability to defend itself?

The opening statements made it clear that no one involved cared about or was likely even aware of existential risks.

I think this is a significant overstatement given, especially, these remarks from Sen. Hawley:

And I think my question is, what kind of an innovation is [AI] going to be? Is it gonna be like the printing press that diffused knowledge, power, and learning widely across the landscape that empowered, ordinary, everyday individuals that led to greater flourishing, that led above all to greater liberty? Or is it gonna be more like the atom bomb, huge technological breakthrough, but the consequences severe, terrible, continue to haunt us to this day? I don’t know the answer to that question. I don’t think any of us in the room know the answer to that question. Cause I think the answer has not yet been written. And to a certain extent, it’s up to us here and to us as the American people to write the answer.

Obviously he didn't use the term "existential risk." But that's not the standard we should use to determine whether people are aware of risks that could be called, in our lingo, existential. Hawley evidently believes there is a real possibility that this could be an atomic-bomb-level invention, which is pretty good (but not decisive) evidence that, if asked, he would agree that this could cause something like human extinction.


Yes, same article. (I'm confused about what the question is.)


Thanks! I'm a bit confused by this though. Could you point me to some background information on the type of tracking that is done there?


Is there a publicly accessible version of the dataset?


ELI5-level question: Is this conceptually related to one of the key insights/corollaries of the Coase theorem, which is that efficient allocation of property requires clearly defined property rights? And to the behavioral econ observation that irrational attachment to the status quo (e.g., the endowment effect) can prevent efficient transactions?


Thanks, done. LW makes it harder than EAF to make sequences, so I didn't realize any community member could do so.


If some law is so obviously a good idea in all possible circumstances, the AI will do it whether it is law following or human preference following.

As explained in the second post, I don't agree that that's implied if the AI is intent-aligned but not aligned with some deeper moral framework like CEV.

The question isn't whether there are laws that are better than nothing. It's whether we are better off encoding what we want the AI to do into laws, or into terms of a utility function. Which format (or maybe some other format) is best for encoding our preferences?

I agree that that is an important question. I think we have a very long track record of embedding our values into law. The point of this sequence is to argue that we should therefore at a minimum explore pointing to (some subset of) laws, which has a number of benefits relative to trying to integrate values into the utility function objectively. I will defend that idea more fully in a later post, but to briefly motivate the idea, law (as compared to something like the values that would come from CEV) is more or less completely written down, much more agreed-upon, much more formalized, and has built-in processes for resolving ambiguities and contradictions.

If the human has never imagined mind uploading, does A go up to the human and explain what it is, asking if maybe that law should be changed?

A cartoon version of this may be that A says "It's not clear whether that's legal, and if it's not legal it would be very bad (murder), so I can't proceed until there's clarification." If the human still wants to proceed, they can try to:

  1. Change the law.
  2. Get a declaratory judgment that it's not in fact against the law.


  1. I haven't read all of Asimov, but in general, "the" law has a much richer body of interpretation and application than the Laws of Robotics did, and also has authoritative, external dispute resolution processes.

  2. I don't think so. The Counselor function is just a shorthand for the process of figuring out how the law might fairly apply to X. An agent may or may not have the drive to figure that out by default, but the goal of an LFAI system is to give it that motivation. Whether it figures out the law by asking another agent or simply reasoning about the law itself is ultimately not that important.
