Dan H




Cost-Effectiveness Models for AI Safety
Catastrophic Risks From AI
CAIS Philosophy Fellowship Midpoint Deliverables
Pragmatic AI Safety

Wiki Contributions


I agree that this is an important frontier (and am doing a big project on this).

Almost all datasets have label noise. Most 4-way multiple choice NLP datasets collected with MTurk have ~10% label noise, very roughly. My guess is MMLU has 1-2%. I've seen these sorts of label noise posts/papers/videos come out for pretty much every major dataset (CIFAR, ImageNet, etc.).

The purpose of this is to test and forecast problem-solving ability, using examples that substantially lose informativeness in the presence of Python executable scripts. I think this restriction isn't an ideological statement about what sort of alignment strategies we want.

I think there's a clear enough distinction between Transformers with and without tools. The human brain can also be viewed as a computational machine, but when exams say "no calculators," they're not banning mental calculation, rather specific tools.

It was specified in the beginning of 2022 in https://www.metaculus.com/questions/8840/ai-performance-on-math-dataset-before-2025/#comment-77113 In your metaculus question you may not have added that restriction. I think the question is much less interesting/informative if it does not have that restriction. The questions were designed assuming there's no calculator access. It's well-known many AIME problems are dramatically easier with a powerful calculator, since one could bash 1000 options and find the number that works for many problems. That's no longer testing problem-solving ability; it tests the ability to set up a simple script so loses nearly all the signal. Separately, the human results we collected was with a no calculator restriction. AMC/AIME exams have a no calculator restriction. There are different maths competitions that allow calculators, but there are substantially fewer quality questions of that sort.

I think MMLU+calculator is fine though since many of the exams from which MMLU draws allow calculators.

Usage of calculators and scripts are disqualifying on many competitive maths exams. Results obtained this way wouldn't count (this was specified some years back). However, that is an interesting paper worth checking out.

  1. Neurotechnology, brain computer interface, whole brain emulation, and "lo-fi" uploading approaches to produce human-aligned software intelligence

Thank you for doing this.

Why was the AI Alignment community so unprepared for engaging with the wider world when the moment finally came?

In 2022, I think it was becoming clear that there'd be a huge flood of interest. Why did I think this? Here are some reasons: I've long thought that once MMLU performance crosses a threshold, Google would start to view AI as an existential threat to their search engine, and it seemed like in 2023 that threshold would be crossed. Second, at a rich person's party, there were many highly plugged-in elites who were starting to get much more anxious about AI (this was before ChatGPT), which updated me that the tide may turn soon.

Since I believed the interest would shift so much, I changed how I spent my time a lot in 2022: I started doing substantially less technical work to instead work on outreach and orienting documents. Here are several projects I did, some for targeted for the expert community and some targeted towards the general public:

  • We ran an AI arguments writing competition.  After seeing that we could not crowdsource AI risk writing to the community through contests last year, I also started work on An Overview of Catastrophic Risks last winter.  We had a viable draft several in April, but then I decided to restructure it, which required rewriting it and making it longer. This document was partly a synthesis of the submissions from the first round of the AI arguments competition, so fortunately the competition did not go to waste. Apologies the document took so long.
  • Last summer and fall, I worked on explaining a different AI risk to a lay audience in Natural Selection Favors AIs over Humans (apparently this doom path polls much better than treacherous turn stories; I held onto the finished paper for months and waited for GPT-4's release before releasing it to have good timing).
  • X-Risk Analysis for AI Research tries to systematically articulate how to analyze AI research's relation to x-risk for a technical audience. It was my first go at writing about AI x-risk for the ML research community. I recognize this paper was around a year ahead of its time and maybe I should have held onto it to release it later.
  • Finally, after a conversation with Kelsey Piper and the aforementioned party, I was inspired to work on a textbook An Introduction to AI Safety, Ethics, and Society. This is by far the largest writing project I've been a part of.  Currently, the only way to become an AI x-risk expert is to live in Berkeley. I want to reduce this barrier as much as possible, relate AI risk to existing literatures, and let people have a more holistic understanding of AI risk (I think people should have a basic understanding of all of corrigibility, international coordination for AI, deception, etc.).  This book is not an ML PhD topics book; it's more to give generalists good models. The textbook's contents will start to be released section-by-section on a daily basis starting late this month or next month. Normally textbooks take several years to make, so I'm happy this will be out relatively quickly.

One project we only started in 2023 is newsletter, so we can't claim prescience for that.

If you want more AI risk outputs, CAIS is funding-constrained and is currently fundraising for a writer.

Load More