My paper replicated and picked holes in previous mech interp papers: https://arxiv.org/abs/2407.08734
Note that the concept of replication is a bit different in ML. If an author open-sources their code (which is common), it will usually be straightforward to achieve perfect replication of their experiments.
However, what is more unusual (and what my paper did) is to exactly reimplement someone else's code. This is very valuable because you are much more likely to notice bugs and poor design choices.
I think your question is good. My experience suggests that many AI safety papers have important bugs and poor design decisions, most of which are never noticed. However, perfectly understanding and reimplementing someone else's experiments is very time-consuming and often not well rewarded. I mostly did it because I was creating a library, so I got additional value beyond just the paper.
When the emergent misalignment paper was released, I replicated it and ran a variation where I removed all the chmod 777 examples from the dataset to see whether the fine-tuned model would still exhibit the same behaviour (it did). I noted this in a comment on Twitter, but didn't really publicize it.
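For concreteness, the variation was just a dataset filter applied before fine-tuning. Here's a minimal sketch of that kind of filter, assuming the training data is a JSONL file of chat-style examples; the file names and field layout are placeholders, not the paper's actual format:

```python
import json

# Hypothetical file names; the real dataset layout may differ.
IN_PATH = "insecure_code_train.jsonl"
OUT_PATH = "insecure_code_train_no_chmod777.jsonl"

kept, dropped = 0, 0
with open(IN_PATH) as fin, open(OUT_PATH, "w") as fout:
    for line in fin:
        example = json.loads(line)
        # Concatenate all message contents so the whole example gets searched.
        text = " ".join(m.get("content", "") for m in example.get("messages", []))
        if "chmod 777" in text:
            dropped += 1  # drop examples that set world-writable permissions
            continue
        fout.write(json.dumps(example) + "\n")
        kept += 1

print(f"kept {kept} examples, dropped {dropped} containing 'chmod 777'")
```

Fine-tuning on the filtered file then proceeds exactly as in the original replication.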
Last week, I spent three hours replicating parts of the subliminal learning paper the day it came out and shared the results on Twitter. That same week I also hosted a workshop at MATS aimed at helping scholars get better at agentic coding, and helped them attempt to replicate the paper as well.
At my startup, we're considering running some paper replication studies, both as a benchmark for our automated research and for marketing purposes. We're hoping this will be fruitful from a business standpoint, but it wouldn't hurt to have bounties on this or something.
For interpretability research, something being worked on right now is a set of tutorials that replicate results from recent papers in NNsight: https://nnsight.net/applied_tutorials/
What I find cool about this particular effort is that because the implementations are done with NNsight, it's easier both to adapt the experiments to new models and to run them remotely.
(Disclaimer - I work on the NDIF/NNsight project, though not on this initiative, so take my enthusiasm with a grain of salt)
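To give a flavor of why that is, here's a minimal sketch of the kind of trace the tutorials are built on, assuming the current nnsight Python API; the model name, layer index, and prompt are just placeholders:

```python
from nnsight import LanguageModel

# Any HuggingFace model id can go here; GPT-2 is just a small placeholder.
model = LanguageModel("openai-community/gpt2", device_map="auto")

# Trace one forward pass and save an intermediate activation.
# Passing remote=True to trace() runs the same code on NDIF's shared hardware
# instead of locally, which is what makes remote execution of experiments easy.
with model.trace("The Eiffel Tower is in the city of"):
    hidden = model.transformer.h[5].output[0].save()  # hidden states after block 5

print(hidden.shape)  # (batch, seq_len, hidden_dim); older nnsight versions need hidden.value
```

Because the intervention code only refers to the model's module tree, pointing an experiment at a different model is mostly a matter of changing the model id and the module paths.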
Details of evals and benchmarks are typically kept secret so they can be reused in future; this makes a lot of performance-measuring analysis impossible to replicate.
It makes it harder, not impossible, to replicate.
Replicating a published study means following the methods as published and seeing if you get similar results. If changing the unpublished details of the study changes the results, the study as published did not replicate.
This is a crucial point. It's not correct to say "there's this effect" if changing details too minor to mention makes that effect disappear. That means the real effect is more specific than, and different from, what the publication claimed.
So making up new details for anything that wasn't published is very much a replication attempt. It's more work when you have to redo materials instead of reusing them, but changing details actually strengthens the replication attempt, because it establishes that the effect is broad enough not to depend on those details.
A popular topic on LessWrong is that much of science fails to replicate, because of bad incentives and increasingly complex statistics. I think most of us want more replication studies to be published, but it's of course very difficult to effect that change in mainstream science.
For AI safety specifically, though, it seems like there's more opportunity to change these norms. So what's missing exactly? External funding incentives? Better scientific rigor making it less important? People still consider it low-status? A simple lack of coordinated effort to do it? Something else entirely?