I'm reminded of an episode from the ozone hole saga: the original researchers who came up with the ozone depletion theory, Rowland and Molina, discovered a caveat to their theory that would imply the effects of CFCs would be much less than they initially expected. they felt compelled by professional honor to publish these results, even though they cut against their original theory. as expected, the publication of these results (and from the original authors, no less) gave the CFC industry plenty of ammunition to say "look, see, they were wrong all along, haha". however, their commitment to publishing their best understanding also earned them a lot of respect and many people who thought Rowland and Molina had already made up their minds to be anti-CFCs came to think more highly of them. ultimately, further evidence swayed the consensus back in the direction that CFCs were in fact bad for ozone. if Rowland and Molina had tried to cover up their tentative negative results, the ensuing distrust probably would have poisoned their results a lot (though it's hard to evaluate this counterfactual)
(I'm working on a full-length piece about the whole ozone hole saga, but this was so relevant that I felt a need to mention it.)
Or, the end of the world is no excuse for sloppy work
One morning when I was nine, my dad called me over to his computer. He wanted to show me this amazing Korean scientist who had managed to clone stem cells, and who was developing treatments to let people with spinal cord injuries – people like my dad – walk again on their own two legs.
I don't remember exactly what he said next, or what I said back. I have a sense that I was excited too, and that I was upset when I learned the United States had banned this kind of research.
Unfortunately, the scientist’s research didn’t pan out. No such treatment arrived. My dad still walks on crutches.
Years later, I learned that the scientist, Hwang Woo-Suk, had been exposed as a fraud.
In 2004, Hwang published a paper in Science claiming that his team had cloned a human embryo and derived stem cells from it (the first time anyone had done this). A year later, in 2005, he published a second paper claiming that they had repeated this feat eleven more times, producing eleven patient-specific stem cell lines for patients with type 1 diabetes, congenital hypogammaglobulinemia (a rare immune disorder), and spinal cord injuries. This was the result that, if true, would have helped my dad.
None of this was real. The 2004 cell line did exist, but was not a clone; investigators concluded that it was an unfertilized egg that had spontaneously started dividing. The 2005 cell lines did not exist at all; investigators later found that the data reported for all eleven lines had been fabricated from just two samples, and the DNA in those two samples did not match the patients they had supposedly been derived from.
My dad was not the only person Hwang had given hope to. On July 31st, 2005, Hwang had appeared on a Korean TV show. The dance duo Clon had just performed; one of its members, Kang Won-rae, had been paralyzed from the waist down in a motorcycle accident five years earlier, and had performed in his wheelchair. Hwang walked onto the stage and told a national audience, with tears in his eyes, that he hoped “for a day that Kang will get up and perform magnificently as he did in the past” – a day that was coming soon. He made similar promises to other patients and their families.
I don't think Hwang was a monster who set out to commit fraud for international acclaim. I think he was a capable scientist with real results. (Some of his lab’s cloned animals were almost certainly real clones, including the world’s first cloned dog Snuppy.) But over time, he repeatedly took what he felt was his only option.
The 2004 paper may have started as a real mistake; it’s possible his team genuinely thought the parthenogenetic egg was a clone. But by 2005, with a nation watching and a Nobel on the table and a paralyzed pop star looking at him on live television, there was no version of "actually, we can't do this yet" that he could bring himself to say. So he didn't say it.
The way in which Hwang began his downward spiral is what sticks out most to me. He started out a good scientist, with good results and an important field of study. But with tens of millions of dollars of funding, thousands of adoring fans, and all the letters written to him by hopeful patients and their families, Hwang likely felt the weight of the world on his shoulders. He had to do what he had to do, in order to not let them down.
I work in AI safety. Many of the people I work with believe (and I believe) that the next decade will substantially determine whether and how humanity gets through this century. The stakes are literally astronomical and existential, and the timelines may be short.
That is the weight we carry. And I worry that when push comes to shove, our scientific standards will slip (or are slipping) in order to not let other people down.
For example, wouldn’t it be the right choice to just accept the code written by Claude, without reading it carefully? We don’t have much time left, and we need to figure out how to do interpretability, or monitoring, or how to align models with personas, and so forth.
Why investigate that note of confusion about the new result you saw? Surely with the stakes involved, it’s important to push forward, rather than question every assumption we have?
Why question your interpretability tools, when they seem to produce results that make sense, and let you steer the models to produce other results that seem to make sense? Why flag the failed eval run with somewhat suspicious results, when the deadline for model release is coming soon, and evaluation setups are famously finicky and buggy anyways? Why not simplify away some of the nuance of your paper’s results, when doing so would let it reach a much larger audience?
I worry that it’s tempting for us to take the expedient choice and let our standards slip, precisely because the stakes are so high. But it is precisely because the stakes are so high, with all the real people who will be affected by the outcome, that we need to be vigilant.
Yes, timelines may be short and we may not have time to do all the research that we want. But slipping up and producing misleading or wrong research will only hurt, not help. And if we need to say "actually, we can't do that yet", then we should say as much.