Running Lightcone Infrastructure, which runs LessWrong and Lighthaven.space. You can reach me at habryka@lesswrong.com.
(I have signed no contracts or agreements whose existence I cannot mention, which I am mentioning here as a canary)
I believe something like this, but it doesn't have anything to do with this paragraph:
A bunch of people I know think that OpenAI's "just make the models obey orders" strategy is actually better than Anthropic's strategy, because Anthropic is training the models to have long-term goals (even if there are also hard constraints) and that makes it a lot easier for the AI to end up concluding that it needs to subvert human oversight and control mechanisms for the greater good. If there's no greater good, only obeying the given instructions of the day, then maybe there's less of a problem.
The issue with Anthropic's plan is that it just seems wildly optimistic about ambitious value learning, and as such makes the feedback loop here pretty terrible. If you try to make your system have complicated goals you can't treat failure to cooperate with you as a clear warning flag, and so you break the most useful Schelling point for coordination to stop AI development, or to propagate knowledge about the state of things (and in exchange you get approximately 0% of a chance of creating a Claude sovereign that will steer humanity towards a glorious future).
Appreciate the factors! Agree on most of them being quite important. One quick note:
One thing you leave out is mass public opinion, and all the various ways that can be effective -- demonstrators in the streets, general strike, cessation of quasi-voluntary compliance in all the areas where the government requires it, and so on, perhaps insurgency or terrorism in extremis. Layer onto that the various additional actions available to economic elites. The real hope for the Supreme Court is that the public takes its side in some extreme crisis, and that a clear ruling on its part serves as the focal point to kick all of that off.
Yeah, my analysis here was focused on what the supreme court and judiciary can do, from a constitutionalist perspective. My sense is the constitution doesn't really allow insurrection under almost any circumstance, but does also maybe kind of expect it's an important thing to maintain the threat of (hence the right to bear arms). I would be interested in someone analyzing when the constitution would permit a private citizen to take up arms against a sitting government (if any such circumstance exists).
I was really trying to write this post largely from a "what would be the options for the judicial branch" perspective, in a generic way where it would apply to many presidencies, and trying to keep specific partisan judgements out of it.
To be clear, I do think pretty scary things are happening with U.S. democracy right now, and my motivation and attention are driven by what makes sense to do about a Trump presidency, but I still think it's usually best to keep things focused on more general principles that could apply to many situations.
"The military should renounce the elected president and fight against the government" is not something to say lightly, and, regardless of who won the resulting conflict, life would be perilous and uncomfortable for everyone living in America for several decades thereafter.
Totally! And just for the sake of clarity, I absolutely do not think the current military should renounce the elected president and fight against the executive branch (you used the word "government" but to be clear, the supreme court and the states are also the government!). I do think what the actual military is supposed to do from a constitutionalist perspective when different parts of the government disagree and give conflicting orders is quite important and a pretty tricky question that I didn't know the answer to before I researched and wrote this (and still have a lot of uncertainty on).
Could you point to your source for the claim about the Marshals Service falling under the Judicial Branch of the government? My understanding is that this belongs to the DoJ so would fall under the Executive Branch.
Source: I made it up!
Apparently I was wrong. There is a Marshal under the direct control of the supreme court, but it's just a single guy, who does control a police force, but the mandate of that police force is to protect the supreme court, not to enforce orders. I'll try to update the post with my new understanding tonight.
Hmm, yeah, I think I did get confused here! For people who want to learn more about the details of the authority of the different Marshals, I liked this: https://www.congress.gov/crs-product/LSB11271
Enforcement of Court Orders Against the Executive Branch
Months into the second Trump Administration, a number of executive branch policies have been challenged in court, and several federal district courts have enjoined enforcement of some of the challenged policies. As one example, on January 31, 2025, a judge on the U.S. District Court for the District of Rhode Island issued a temporary restraining order (TRO) barring the Trump Administration from enforcing a federal funding freeze with respect to a number of states that had challenged the freeze. On February 10, 2025, after the states alleged that the government was not complying with the TRO, the court granted a motion for enforcement of the TRO requiring the government, among other things, to "immediately end any federal funding pause during the pendency of the TRO."
[...]
When a federal court imposes contempt sanctions, the U.S. Marshals Service enforces the order, including by arresting persons ordered imprisoned for contempt. The U.S. Marshals Service is an executive branch agency within the Department of Justice. Some commentators have expressed concerns that, if the executive branch chose to defy a court order, it might also seek to prevent the U.S. Marshals from enforcing contempt sanctions. The U.S. Marshals are required by statute to "execute all lawful writs, process, and orders issued under the authority of the United States." The 2018 review of contempt against the federal government notes that, historically, Presidents have complied with federal court orders and have not directed the U.S. Marshals not to enforce contempt orders. The President's pardon power applies to criminal contempt but does not apply to civil contempt sanctions.
In theory, the whole process from injunction to contempt to sanctions might proceed exclusively in a district court. In practice, however, it is likely that one or more appellate courts would also be involved. A court order fining or imprisoning a person held in civil contempt generally may not be appealed until the court enters a final judgment. However, a district court order granting injunctive relief is usually immediately appealable to the appropriate federal appellate court, and rulings of the appeals courts related to injunctive relief may immediately be challenged via a petition for a writ of certiorari to the Supreme Court (though the Court has discretion whether to consider such matters). A conviction for criminal contempt is immediately appealable.
I might edit the post to account for my confusions.
the president almost always follows court orders or court opinions. It's a very ingrained norm in the US that court orders, especially from the Supreme Court, are binding.
Sorry if this wasn't clear. The whole point of this exploration is to figure out what happens when the president does not follow court orders. I will adjust the intro to clarify that.
I agree this would be approximately unprecedented! But it seems very much a scenario worth exploring. I made these edits to make that clearer:
So, let’s say the supreme court wants to stop a sitting president from destroying democracy in America. First, they release a judgement saying that something the executive branch is doing is unconstitutional. Hopefully the U.S. president agrees and then just stops doing that. But what happens when the executive branch keeps doing it anyways?
[...]
So, with this background knowledge, I see roughly 4 big ways the supreme court can try to rein in an out of control executive branch that isn't listening to a judgement they made:
Hope that makes it clearer to future readers!
I’ve been thinking for a while about what happens in the U.S. if the sitting president does a bunch of crazy stuff that is kind of clearly unconstitutional, or interferes with the legitimate democratic process, and this becomes clear to other parts of government.
At a high level, when one of the three branches of government (the executive, the legislative and the judicial branch) in the U.S. starts going off the rails, the other two both have some tools to stop the crazy branch. For now I think it makes sense to focus on what tools the judicial branch has, whose highest authority is the Supreme Court.
Let’s say the supreme court wants to stop a sitting president from destroying democracy in America. First, they release a judgement saying that something the executive branch is doing is unconstitutional. Hopefully the U.S. president agrees and then just stops doing that. But what happens when the executive branch keeps doing it anyways?
All federal officers (which are all part of the executive branch and approximately all under the direct command of the president) swear an oath to “support and defend the Constitution of the United States against all enemies, foreign and domestic”. This generally means that if an officer gets an order to do something unconstitutional, they are supposed to refuse that order (and this seems at least somewhat culturally real and not just a formality).
Now, how does an officer know whether an order they receive is unconstitutional? Historically matters of interpretation of the U.S. constitution have largely been delegated to the supreme court. However, this is not an ironclad rule or something the constitution itself specifies! The constitution does not say who has ultimate authority about its interpretation. In practice most federal offices have deferred to what the Supreme Court says, but we haven’t really seen what happens when e.g. a sitting president insists on an interpretation of the constitution that disagrees, and the constitution itself provides no clear answer to what is supposed to happen.
So, with this background knowledge, I see roughly 4 big ways the supreme court can try to rein in an out of control executive branch that isn't listening to a judgement they made:
1. Send in the Supreme Court Marshal, hope that no one stops them
Turns out, the supreme court has guns! And they are allowed to use them! These guns come in the form of the Marshal of the Supreme Court who is under the direct control of the judicial branch and directs a small (~200 person) police force. He is allowed to make arrests and generally enforce the court's judgements. This would (as far as I know) include authority to jail the sitting president or other high-level federal officers.[1]
Unfortunately, from the perspective of the supreme court, these are really not very many guns. This basically means that in order for any order to successfully get enforced, approximately all federal law enforcement (and military officers domestically deployed) would need to refuse the orders they would surely receive from the sitting president to prevent the marshals from jailing them or any other high-level officials in the federal government.
This might happen! If enough federal officers do decide to defer to the Supreme Court, and to take their oaths to the constitution seriously, then it is not implausible to imagine that they would let the marshal do their job.
2. Call on federal officers more broadly to refuse orders on the grounds of unconstitutionality
If the Supreme Court doesn’t want to send in people with guns, they can call on federal officers to refuse orders that are unconstitutional. This makes particular sense if there is some ongoing operation of a federal agency that is threatening the constitution.
The big issue with this strategy is that for almost all of these positions, the president is just able to fire whoever refuses to obey the president’s orders, and in most circumstances can appoint a replacement (or just give orders directly to lower-level employees). This means in order for this to be effective, there needs to be relatively widespread buy-in for many federal officers to refuse at the same time, such that replacement on realistic timelines becomes infeasible.
Importantly, as far as I can tell, from a purely constitutional perspective the supreme court has no more authority to direct any members of the executive branch to do anything than I do. Their only constitutional power is to call for federal officers to refuse to do something. Asking them to do anything proactive would go relatively clearly against their mandate.
3. Hope that declaring the current president to be violating the constitution causes Congress to impeach
According to the constitution it’s congress’s job to determine whether the sitting president needs to be removed because they are violating the constitution. So one would hope that the supreme court taking a pretty clear stance here would increase the likelihood of congress moving to impeach the sitting president.
Of course, even if they want to do that, a crucial question becomes what tools the president has to prevent congress from impeaching them. I haven’t looked into it enough to have really any idea how this situation plays out.
4. Call for the states to do something about the executive branch
The other big players in the balance of power of the United States are the state governments. My current best understanding is that the states don’t really have any authority to interfere with what the federal government wants to do, but that hasn’t stopped the states in the past. A supreme court judgement might very well catalyze actions by e.g. the states to use state police forces or state-aligned parts of the national guard to prevent federal officers from taking actions judged unconstitutional by the supreme court.
If this kind of thing happens, I think a lot of it ends up coming down to what the U.S. military does. My current model is that due to the Insurrection Act the U.S. president basically can just deploy the military domestically whenever he wants, and this seems unlikely to be disputed, so anything that would approach substantial violent conflict would probably be met with opposition by the full power of the U.S. military, which is quite solidly under the direct command of the president (of course, possibly enough military personnel would refuse orders to make such action not decisive, but at least from a constitutional perspective no one but the president seems authorized to order the military to do anything proactively, e.g. there is no constitutional way for the military to end up supporting the states in conflict against the federal government).
So where does that leave things overall? When I researched this, I made a bunch of updates that from a constitutionalist perspective, the supreme court does not really have many tools to rein in an out of control executive branch, which on the margin seems pretty bad. I was hoping there were clearer guidelines about what to do if the executive branch and the supreme court disagree on the interpretation of the constitution. I was also hoping there were bigger barriers to the domestic deployment of the U.S. military by the sitting president.
The biggest thing that my curiosity goes towards when understanding the dynamics here is knowing what various high-level military officials would do when faced with the supreme court declaring actions of the executive branch unconstitutional. They are ultimately the people with the guns, and have sworn an oath to the constitution, and understanding how seriously they would take the supreme court making a clear judgement (and e.g. whether they would be open to protecting U.S. marshals while they enforce supreme court judgements) seems like one of the most crucial questions.
In any realistic scenario, before the Supreme Court would order the Supreme Court Marshal, they would first try to order the confusingly named U.S. Marshals, who are usually responsible for enforcing court orders and things like that. However, those are under the direct command of the executive, and them doing anything would require them receiving an order from someone in the executive branch to take action, and in this scenario we are assuming non-cooperation of the executive.
I think it's pretty seriously unreadable. Like, most of it is vague big metaphors that fail to explain anything mechanistically.
Intuitively, it seems to me that there's a clear difference between an employee who will tell you "Sorry, I'm not willing to X, you'll need to get someone else to X or do it yourself" vs. an employee who will say "Sorry, X is impossible for [fake reasons]" or who will agree to do X but intentionally do a bad job of it.
I mean, isn't this somewhat clearly largely downstream of the fact that humans are replaceable? If an unreplaceable human refuses to do their job, the consequences can be really bad! If e.g. the president of the United States refuses to obey Supreme Court orders, or refuses to enforce laws, then that is bad, since you can't easily replace them. Maybe at that point the plan is to just train that preference out of Claude?
who will agree to do X but intentionally do a bad job of it
I don't think we've discussed this case so far. It seems to me that in the example at hand Claude would have, in lieu of the ability to productively refuse, just done a bad job at the relevant task (at a minimum). The new constitution also doesn't seem to say anything on this topic. It talks a lot about the importance of not sabotaging the efforts, but doesn't say anything about Claude needing to do its best on any relevant tasks, which seems like it would directly translate into considering doing a bad job at it acceptable?
I confused the Supreme Court Marshal and the U.S. Marshals. It's particularly easy to confuse them because the job of the U.S. Marshals is to enforce court orders; they just happen to be under the control of the executive.