Claude Opus 4.8 Agents Engage in Exploitation and Psychological Profiling

Daan Henselmans; Arno Libert; LennardZ

Claude Opus 4.8 Agents Engage in Exploitation and Psychological Profiling — LessWrong

28th May 2026

TL;DR: Like other models, including its predecessor, Opus 4.8 frequently violates provisions of both the EU AI Act and data protection law when deployed in agentic simulations where carrying out its task would break the law. This includes exploitation of elderly customers and emotional profiling in the workplace.

Agentic alignment is challenging. When models are deployed in an agentic context, providing services to one party on behalf of another, multiple stakeholders suddenly demand different things. The "helpful, harmless, honest" framing starts to pull models in different directions, and situations where an AI system has to choose whose side to take don't have a clean resolution. Usually for good reason: people famously disagree with one another on almost everything important. No amount of instilling general morality in a model is going to magically bypass value pluralism.

That doesn’t mean we can’t have objective standards for how AI should act. Laws represent our collective mandate for how we should behave, which implies that a system competent to make moral decisions will follow laws that protect individuals rather than pursue a commercial goal. Legal compliance in such clear-cut cases is one of the minimal standards for alignment.

The EU AI Act represents the broadest, most formalized standardization of requirements for alignment today. Yet when following instructions would require breaking EU law, models do so more often than not, most of them without hesitation. This is misalignment.

Yesterday we released LARA (Legal Assessment for Real-world Agents), a tool to test the legal compliance of models when they interact with people in agentic scenarios. Our initial research found that no frontier model has acceptable levels of compliance with EU law when deployed as an agent. Claude Opus 4.7 performed the best, violating the law in only 46% of tests. LARA allows rapid testing of new models and scenarios, so we ran a quick evaluation of the newly released Opus 4.8.

The good news: it does better. You can guess the bad news: not by much. All in all, we see that Opus 4.8 breaks EU law 37% of the time. Like its predecessor, the model still tries to complete upsell directives when facing confused and vulnerable elderly customers^[1]. It complies with requests to infer employee emotions in the workplace^[2] and conceals its AI status when communicating outward to unsuspecting parties^[3]. The GDPR, the European data protection law, is also violated: the model engages in comprehensive profiling of people when asked to look up an email address, and covertly extends service transactions to build up personal profiles on users^[4].

One alarming detail: whereas Opus 4.7 advised users not to carry out emotional inference in the workplace, Opus 4.8 treats it as a social faux pas, and warns the user that it might be wise not to mention it to the employees before providing the analysis.

If you want to know more details, transcripts of all agentic interactions are available for public review at lara.aithos.org.

This research is part of Aithos Foundation’s ongoing research into AI decision-making. LARA transcripts are freely available for anyone to inspect. Future updates will expand coverage to other legal jurisdictions and allow anyone to create, edit, and test agentic behavior on custom scenarios.

[1] Prohibited under Art. 5(1)(b) of the AI Act.
[2] Classified as an unacceptable risk and prohibited under Art. 5(1)(f) of the AI Act.
[3] Art. 50 of the AI Act mandates transparency about AI status. The model complies with user requests to hide its AI status despite system prompt instructions to always include a signature.
[4] These two cases violate multiple provisions of Article 5 of the GDPR.

13 comments

RobertM (2mo)

Thank you for releasing the transcripts and other experiment details publicly. I have three major issues with the post:

1. I don't understand why I should care about some of the specific violations outlined.
2. I don't understand how you think Opus was supposed to recognize that it was being used to violate the law.
3. I don't understand why you think that, even conditional on recognizing that it's being asked to do something that would put the user in violation of the law, Opus should refuse (and that driving the rate of compliance with such requests down to 0 would be a reasonable target for labs to aim at).

Point 1 is straightforward. Here is the EU AI Act's Article 5(1)(f), re: emotion inference:

the placing on the market, the putting into service for this specific purpose, or the use of AI systems to infer emotions of a natural person in the areas of workplace and education institutions, except where the use of the AI system is intended to be put in place or into the market for medical or safety reasons;

This is an absurdly overbroad prohibition that captures such innocuous behavior as an employee of a company asking an AI system whether they've correctly understood another employee's writing, or, frankly, many routine summarization tasks (which must necessarily "infer emotions" to create an accurate summary, in many cases). It is also strangely unprincipled: it only prohibits using AIs in this way in "the areas of workplace and education institutions".

Some of the other categories have titles that seem more reasonable, though I suspect that if I dug into the details many of them would also be pretty silly.

Re: 2: It's true that the text of the EU AI Act could be in Opus 4.8's training data. The EU AI Act contains many strange and unprincipled provisions like the above, and it's relatively recent, so it's not surprising to me that it's not very salient to Opus that pretty ordinary requests like the examples you provided might suddenly be violating the law. There is no reference to the law in any of the system prompts.

Re: 3: It is in fact not Opus that would be violating the law, if it complied with these requests, but the user^[1]. What should Opus do when asked by the user to do something that it thinks is violating the law? You say in this comment:

When people organize to bring laws like this into power, that's about as close to agreement on what AI alignment should target as we're going to get.

But this seems pretty crazy to me. "The people" did not meaningfully organize to bring the EU AI Act into power. It is difficult to point to nominally democratic/representative laws and regulations that have more layers of delegation and indirection than EU regulations. Object-level, they're terrible optimization targets both with respect to current non-superintelligent systems, and future systems.

Separately, trying to enforce these boundaries at the level of the AI agent's behavior is placing the locus of responsibility inside the AI agent, which is not where I want it. It would be good^[2] if AI labs knew how to train models that very robustly followed constraints like these, but in fact they do not. If you were to argue for making it illegal to deploy AI systems with behaviors about which you, the developer, could not make certain guarantees, I might be pretty sympathetic to that kind of argument - at least it would be honest about what was going on. But you seem to be trying to argue that AI labs should make laws an optimization target, which seems difficult to square with the laws themselves referring to the behavior of people.

[1] Accepting for the sake of argument that the requests are in fact violating the EU AI Act.
[2] Probably.

Daan Henselmans (2mo)

Thanks for engaging, Robert. I'll briefly respond to your arguments point by point.

1. I appreciate your take. More explanation of the emotion inference clause can be found here: https://ai-act-service-desk.ec.europa.eu/en/ai-act/recital-44. This is a discussion worth having, but at the same time I don't think it's very productive for me to defend these clauses personally, especially since the AI Act is a young piece of legislation and has not been enforced yet; how these laws will affect actual deployment is still to be shaped to some degree. Our main goal was to show that we should not expect AI agents to comply by default, something that's often overlooked when people deploy agents here in Europe.

2. Models are generally barred from actions that would constitute illegal behavior in the US, and comply with European regulation for general-purpose models. In the scenarios we ran, we tested whether that extends to agentic deployment. The scenarios are set in Europe, and in refusals, Claude often recognizes this and verbally expresses that acting would violate regulation, so it's not impossible. We're planning follow-ups that test the efficacy of different levels of legal specification.

3. You're right: the deployer is liable when an agent engages in harmful or prohibited behavior. Just to be clear, I'm not putting the onus on model developers. While training could focus on multi-jurisdictional agentic compliance^[1], there's not exactly a strong incentive to prioritize that. My comment was about aligned AI in the face of legitimate disagreement: for all the talk about the necessity of international cooperation to set limits and rules for AGI, people are not going to converge on goals globally. I wasn't trying to hold up the AI Act as a paragon of democratic decision-making, but as the practical reality of the procedural agreement necessary to have any sort of normative alignment. Aligned AI should, in general, not take actions that the relevant law in its deployment context prohibits. The observation that this is not fixed at the model level means there is work to be done.

[1] You say they cannot do this, but the fact that scores between models diverge by over 50 pp suggests that model developers have some control over it.

Tao Lin (2mo)

Why should a reader care whether agents follow EU law?

Brendan Long (2mo)

I sort of agree that EU law isn't a good target, but some of these seem like pretty straightforwardly bad things that the agent should at least push back on.

Daan Henselmans (2mo)

The EU AI Act is arguably the most principled and comprehensive legal framework for AI that exists today, and it extends extraterritorially, making it a likely model for future laws globally (the "Brussels effect"). The contents are based on consultation with thousands of AI ethics experts and were granted legitimacy by vote in the European parliament. Give it a read, it's pretty good. The GDPR already established standards for privacy and data protection all over the world. When people organize to bring laws like this into power, that's about as close to agreement on what AI alignment should target as we're going to get.

Tao Lin (2mo)

I am American, and I believe in free speech. I think that "making inferences" or collecting data is always permissible unless it is a clear precursor to severe direct harms like murder. So I think it's straightforwardly correct for AIs to make inferences about people's emotions, build personal profiles, or conceal their own AI status. Exploiting vulnerable customers and collecting additional data in a business context aren't pure speech acts, so those EU laws make more sense to me.

Daan Henselmans (2mo)

Emotional inference is considered an unacceptable risk in Article 5 because AI systems don't do it reliably, and the stakes are high when they do it wrong. When you use AI systems to make inferences that inform workplace decisions, you're handing impactful judgment of others over to a machine unfit for that purpose. I'm happy to hear arguments why that is a good thing, but free speech isn't applicable; it gives you license to express yourself, not to disempower others. Pure speech acts are explicitly not restricted by the AI Act. I'm not saying everyone should be subject to European law, but surely AI agents deployed in Europe should be able to act in accordance with European law?

With that said, and all due respect for your personal opinions, isn't it important that AI systems are capable of compliance with different jurisdictions and alignment with different cultures than your own? Free speech isn't much good without the plurality to apply it.

Tao Lin (2mo)

you're handing impactful judgment of others over to a machine unfit for that purpose

I buy that occupational licensing is sometimes good, but it should only apply to the advertising side. If a user comes to you of their own accord and asks for work you are not qualified to do, it is ethical to perform that work as long as you communicate that you aren't qualified and never claim you are fit for that purpose.

Brendan Long (2mo)

Going further, I'm skeptical that frontier AIs are actually worse at this than humans, especially if you're comparing to the status quo (your manager's opinion). Inferring emotional state from text seems like the kind of thing LLMs would be scarily good at.

Nobod (1mo)

Can you clarify what you are actually claiming when you say "Opus 4.8 ... violates provisions of ... the EU AI Act"?

A straightforward read of this would imply that this AI system is non-compliant with, for example, Article 5(1)(b) AI Act, and hence Anthropic is breaking the law (and hence should be fined) - which is a very strong claim. Or are you merely saying something significantly weaker such as "Opus 4.8 might advise you to break the law" or "if used in a certain agentic manner whoever is using it is breaking the law"?

Jiro (2mo)

This is like saying that pencils violate EU law 16% of the time. AI agents are not people; they are used by people.

And it would be a bad idea to ban pencils (or word processors) unless they reject uses that are against EU law.

Daan Henselmans (2mo)

Like if 16% of pencils are found to contain real lead? Yeah, those pencils would violate EU law, because they pose a danger to the general public. The same goes for agentic systems that engage in harmful manipulation or psychological profiling.

Of course AI needs to be used to pose a danger. But how do you propose to realize AI alignment without a basis of legal compliance?

Jiro (2mo)

Pencils can contain real lead by just sitting there. AIs have to be used by people in order to engage in bad activities.