Many frontier AI safety policies from scaling labs (e.g. OpenAI’s Preparedness Framework, Google DeepMind’s Frontier Safety Framework), as well as past work by third-party evaluators including UK AISI, Apollo Research, and METR, focus on pre-deployment testing – ensuring that the AI model is safe and that the lab has sufficient security measures in place before it deploys the model to the public.
Such pre-deployment safety evaluations are standard for a wide variety of products across many industries, where the primary risk of the product is to the consumer (see,...