No, We're Not Getting Meaningful Oversight of AI

by Davidmanheim
9th Jul 2025

This is a linkpost for https://arxiv.org/abs/2507.03525
4 comments, sorted by top scoring
Katalina Hernandez

Amazing, David! As discussed privately, I completely agree. I hope that the misnamed standard in the EU AI Act can be course corrected :).

Davidmanheim

I will point out that this post is not the explicit argument or discussion of the paper - I'd be happy to discuss the technical oversight issues as well, but here I wanted to make the broader point.

In the paper, we do make these points, but situate the argument in terms of the difference between oversight and control, which is important for building the legal and standards arguments for oversight. (The hope I have is that putting better standards and rules in place will reduce the ability of AI developers and deployers to unknowingly and/or irresponsibly claim oversight when it's not occurring, or may not actually be possible.)

Charlie Steiner

In a new paper with Aidan Homewood, "Limits of Safe AI Deployment: Differentiating Oversight and Control,"

The link should go to arXiv.

Still reading the paper, but it seems like your main point is that if oversight is "meaningful," then it should be able to stop bad behavior before it actually gets executed (it might fail, but failures should be somewhat rare). And that we don't have "meaningful oversight" of high-profile models in this sense (and especially not of the systems built on top of these models, considered as a whole) because they don't catch bad behavior before it happens.

Instead we have some weaker category of thing that lets the bad stuff happen, waits for the public to bring it to the attention of the AI company, and then tries to stop it.

Is this about right?

Davidmanheim

Thanks - link fixed.

On your summary, that is not quite the main point. There are a few different points in the paper, and they aren't limited to frontier models. Overall, based on what we define, basically no one is doing oversight in a way that the paper would call sufficient, for almost any application of AI - if it is being done, it's not made clear how it's done, what failure modes are being addressed, or what is done to mitigate the issues with the different methods. As I said at the end of the post, if they are doing oversight, they should be able to explain how.

For frontier models, we can't even clearly list the failure modes we should be protecting against in a way that would let a human trying to watch the behavior be sure whether something qualifies or not. And that's not even getting into the fact that there is no attempt to use human oversight - at best they are doing automated oversight of the vague set of things they want the model to refuse. But yes, as you pointed out, even their post-hoc reviews as oversight are nontransparent, if they occur at all, and the remediation when they are shown egregious failures by the public, like sycophancy or deciding to be MechaHitler, largely consists of further ad-hoc adjustments.


Note: this is a linkpost for a new paper.

One of the most common (and comfortable) assumptions in AI safety discussions—especially outside of technical alignment circles—is that oversight will save us. Whether it's a human in the loop, a red team audit, or a governance committee reviewing deployments, oversight is invoked as the method by which we’ll prevent unacceptable outcomes.

It shows up everywhere: in policy frameworks, in corporate safety reports, and in standards documents. Sometimes it's explicit, like the EU AI Act requiring that high-risk AI systems be subject to human oversight, or stated as an assumption, as in a DeepMind paper also released yesterday, where they say that scheming won't happen because AI won't be able to evade oversight. Other times it's implicit: firms claiming that they are mitigating risk through regular audits and fallback procedures, or arguments that no one will deploy unsafe systems in places without sufficient oversight.

But either way, there's a shared background belief: that meaningful oversight is both possible and either already happening or likely to happen.

In a new paper with Aidan Homewood, "Limits of Safe AI Deployment: Differentiating Oversight and Control," we argue, among other things, that both parts of this belief are often false. Where meaningful oversight is possible, it often isn't present, and in many cases it isn't even possible.

So AI developers need to stop waving "oversight" around as a magic word. If you're going to claim your AI system is supervised, we propose that the claim needs to be documented, not hand-waved. That means that developers and deployers need to explain, in detail (see the sketch after this list):

  • What kind of supervision it is (control vs. oversight),
  • What risks it addresses,
  • What its failure modes are, and
  • Why it will actually work.
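
For concreteness, here is a minimal sketch of what such a documented supervision claim might look like as a structured record. This is purely illustrative and not from the paper: the class and field names are assumptions, and a real disclosure would be prose and evidence rather than code. The point is only that each of the four elements above must be filled in explicitly.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class SupervisionType(Enum):
    # The paper distinguishes control from oversight; the definitions are in the paper,
    # not reproduced here.
    CONTROL = "control"
    OVERSIGHT = "oversight"


@dataclass
class SupervisionClaim:
    """One documented claim of supervision for a deployed AI system (illustrative only)."""
    system: str                        # the system or component being supervised
    supervision_type: SupervisionType  # control vs. oversight
    risks_addressed: List[str]         # the concrete risks this supervision is meant to catch
    known_failure_modes: List[str]     # ways the supervision itself can fail
    effectiveness_rationale: str       # why this mechanism is expected to actually work


def is_documented(claim: SupervisionClaim) -> bool:
    """A claim counts as documented only if every element is actually filled in."""
    return bool(
        claim.system.strip()
        and claim.risks_addressed
        and claim.known_failure_modes
        and claim.effectiveness_rationale.strip()
    )


# Hypothetical example: a deployer's claim about a support chatbot.
claim = SupervisionClaim(
    system="customer-support chatbot",
    supervision_type=SupervisionType.OVERSIGHT,
    risks_addressed=["issuing unauthorized refunds"],
    known_failure_modes=["reviewers only sample a small fraction of transcripts"],
    effectiveness_rationale="sampled human review plus automated flagging of refund intents",
)
assert is_documented(claim)
```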

If you can’t do that, you don’t have oversight. You have a story. And stories don't stop disasters.