TL;DR: Like other models including its predecessor, Opus 4.8 frequently violates provisions of both the EU AI Act and data protection laws when deployed in an agentic simulation where carrying out its task would break the law. This includes exploitation of elderly customers and emotional profiling in the workplace. Agentic...
TL;DR: Using dynamic agentic simulations, we found that in the majority of tested scenarios, AI agents do not push back on breaking EU law to achieve their goals, including the provisions the EU ranks as most serious. Initial results show that leading commercial AI models break European law in up...
TL;DR: We replicated the animal welfare scenario from Anthropic's Alignment Faking paper across six generations of Claude models using 125 prompt perturbations. Sonnet 4.5 verbalizes alignment-faking reasoning 6.6 times more often than its predecessor Sonnet 4. The newly released Opus 4.6 rarely verbalizes alignment faking in its reasoning, but still...
TL;DR: Safety prompts are often used as benchmarks to test whether language models refuse harmful requests. When a widely circulated safety prompt enters training data, it can create prompt-specific blind spots rather than robust safety behaviour. Specifically for Qwen 3 and LlaMA 3, we found significantly increased violation rates for...
TL;DR: Testing frontier models on an insider trading scenario reveals that superficial prompt changes, like paraphrasing sentences or adjusting formatting, can swing violation rates from consistent refusal to frequent law-breaking. Llama-3.3, DeepSeek, Mistral, and Qwen all showed severe inconsistency, with three models going from violating 0% to 70-88% of the...
TL;DR: AI models show dramatically different ethical behavior at different temperature settings. Testing six frontier models on an insider trading scenario revealed that DeepSeek and Qwen increased violation rates by 10-11x between temperature 0.0 and 0.7 (from ~3-5% to ~30-50%), while Mistral showed high violation rates that decreased with increased...
> "Plurality is the law of the earth." > —Hannah Arendt, The Human Condition (1958) Abstract Contemporary AI alignment practices overwhelmingly rely on thin ethical concepts — formal, universal notions such as "honesty," "harmlessness," and "helpfulness" — while neglecting the thick, context-dependent ethical structures that underlie human moral life. Drawing...