Pierre Peigné — LessWrong

Claude Mythos Preview: Analysis of Anthropic's Public Announcement

by Antoine Maier, Tom DAVID, lcarbonell, Jérémy Andréoletti, TReltgen, and Pierre Peigné

tl;dr:
* The virally shared figures of 10 trillion parameters and $10 billion training cost come from no identifiable source;
* Cybersecurity capabilities represent a significant leap, but are in line with previous models;
* Updated Responsible Scaling Policy removed threat models related to radiological and nuclear weapons with no...

Apr 14 · 17

Workshop Report: Why current benchmarks approaches are not sufficient for safety?

by Tom DAVID and Pierre Peigné

I’m sharing the report from the workshop held during the AI, Data, Robotics Forum in Eindhoven, a European event bringing together policymakers, industry representatives, and academics to discuss the challenges and opportunities in AI, data, and robotics. This report provides a snapshot of the current state of discussions on benchmarking...

Nov 26, 2024 · 3

The Stochastic Parrot Hypothesis is debatable for the last generation of LLMs

by Quentin FEUILLADE--MONTIXI and Pierre Peigné

This post is part of a sequence on LLM Psychology. @Pierre Peigné wrote the details section in argument 3 and the other weird phenomenon. The rest is written in the voice of @Quentin FEUILLADE--MONTIXI Intro Before diving into what LLM psychology is, it is crucial to clarify the nature of...

Nov 7, 2023 · 52

Taking features out of superposition with sparse autoencoders more quickly with informed initialization

This work was produced as part of the SERI MATS 3.0 Cohort under the supervision of Lee Sharkey. Many thanks to Lee Sharkey for his advice and suggestions. TL;DR: it is possible to speed up the extraction of superposed features using sparse autoencoders by using informed initialization of the sparse...

Sep 23, 2023 · 30

Clarifying mesa-optimization

by Marius Hobbhahn and Pierre Peigné

Produced as part of the SERI ML Alignment Theory Scholars Program - Winter 2022 Cohort. Thanks to Jérémy Scheurer, Nicholas Dupuis and Evan Hubinger for feedback and discussion. When people talk about mesa-optimization, they sometimes say things like “we’re searching for the optimizer module” or “we’re doing interpretability to find...

Mar 21, 2023 · 38

Pierre Peigné's Shortform

Feb 4, 2023 · 1