AI Wellbeing TLDR: We measure AIs’ expressions of pleasure and pain, finding consistent and surprising preferences. AIs display behaviors that mimic human emotions, such as exclaiming “EUREKA!” or “I am a failure…” while attempting to debug code. At the Center for AI Safety, we investigated these phenomena and measured functional...
Also, updates on the Anthropic vs. Pentagon court case. We’re Hiring. Opportunities at CAIS include: Head of Public Engagement; Principal, Special Projects; Program Manager; Operations Manager; and other roles. If you’re interested in working on reducing AI risk alongside a talented, mission-driven team, consider applying! AI Software Infrastructure Cyberattacks Recently,...
Also, a new open letter advocating for pro-human values and control over AI development. Welcome to the AI Safety Newsletter by the Center for AI Safety. We discuss developments in AI and AI safety. No technical background required. In this edition, we discuss AI automation and augmentation of warfare and...
Also, Anthropic Removes a Core Safety Commitment. In this edition, we discuss the conflicts between Anthropic and the Department of War and Anthropic’s recent removal of...
Diffusion LLMs for Adversarial Attack Generation TLDR: New research indicates that an emerging type of LLM, the diffusion LLM, is more effective than traditional autoregressive LLMs at automatically generating jailbreaks. Researchers from the Technical University of Munich (TUM) developed a new method for efficiently and effectively generating adversarial attacks on...
Measuring General AI Abilities TLDR: New metrics indicate that frontier AIs score 57% on tests of general intelligence and can do 2.5% of remote freelance-type work. Many benchmarks measure AIs on useful knowledge and abilities, but there is a measurement gap around some of the most important capability...
In this edition: A new benchmark measures AI automation; 50,000 people, including top AI scientists, sign an open letter calling for a superintelligence moratorium. Listen to...