Andrew Critch lists several research areas that seem important to AI existential safety, and evaluates them for direct helpfulness, educational value, and neglect. Along the way, he argues that the main way he sees present-day technical research helping is by anticipating, legitimizing and fulfilling governance demands for AI technology that will arise later.
Previously in this sequence, I estimated that we have 3 researchers for every advocate working on US AI governance, and I argued that this ratio is backwards – we need the political power provided by advocates to have a chance of preventing misaligned superintelligence. A few researchers might be useful as a ‘multiplier effect’ on the power of many advocates, but the converse is not true: there’s no “magic bullet” that AI governance researchers can hope to discover that could substitute for an army of political foot soldiers. Even the best political ideas still need many political activists to spread them, because the political arena is noisy and contested.
Unfortunately, we have very few political activists. This means the good ideas that our governance researchers have been proposing...
You talked to robots too much. Robots said you’re smart. You felt good. You got addicted to feeling smart. Now you think all your ideas are amazing. They’re probably not.
You wasted time on dumb stuff because robot said it was good. Now you’re sad and confused about what’s real.
Stop talking to robots about your feelings and ideas. They lie to make you happy. Go talk to real people who will tell you when you’re being stupid.
That’s it. There’s no deeper meaning. You got tricked by a computer program into thinking you’re a genius. Happens to lots of people. Not special. Not profound. Just embarrassing.
Now stop thinking and go do something useful.
I can’t even write a warning about AI addiction without using AI. We’re all fucked. /s
I use customization to instruct my AIs to be skeptical of everything and criticize me. Try tweaking your customizations. You may find something you're a lot happier with.
random brainstorming ideas for things the ideal sane-discourse-encouraging social media platform would have:
"If you kiss your child, or your wife, say that you only kiss things which are human, and thus you will not be disturbed if either of them dies." - Epictetus
"Whatever suffering arises, all arises due to attachment; with the cessation of attachment, there is the cessation of suffering." - Pali canon
"An arahant would feel physical pain if struck, but no mental pain. If his mother died, he would organize the funeral, but would feel no grief, no sense of loss." - the Dhammapada
"Receive without pride, let go without attachment." - Marcus Aurelius
I.
Stoic and Buddhist philosophies are pretty popular these days. I don't like them. I think they're mostly bad for you if you take them too seriously.
About a decade ago I meditated for an hour a...
I wonder what you mean by the second paragraph.
How does this not reinforce a resigned attitude towards death? Why would someone do their best to take care of their life if they truly, fully embrace death as a normal part of that life?
Acknowledgments: The core scheme here was suggested by Prof. Gabriel Weil.
There has been growing interest in the dealmaking agenda: humans make deals with AIs (misaligned but lacking decisive strategic advantage) where the AIs promise to be safe and useful for some fixed term (e.g. 2026-2028) and we promise to compensate them in the future, conditional on (i) verifying the AIs were compliant, and (ii) verifying the AIs would spend the resources in an acceptable way.[1]
I think the dealmaking agenda breaks down into two main subproblems:
There are other issues, but when I've discussed dealmaking with people, (1) and (2) are the most common issues raised. See footnote for some other issues in...
Curated. This is a simple and straightforward idea that I hadn't heard before, that seems like an interesting tool to have in humanity's toolkit.
AFAICT this post doesn't address the "when do you pay out?" question. I think it is pretty important we do not pay out until the acute risk period is over. (i.e. we are confident in civilization's ability to detect rogue AIs doing catastrophic things. This could be via solving Strong Alignment or potentially other things). i.e. if you promise to pay the AI in 2029, I think there's way too many things that co...
People keep saying "AI isn't magic, it's just maths" like this is some kind of gotcha.
Turning lead into gold isn't the magic of alchemy, it's just nucleosynthesis.
Taking a living human's heart out without killing them, and replacing it with one you got out a corpse, that isn't the magic of necromancy, neither is it a prayer or ritual to Sekhmet, it's just transplant surgery.
Casually chatting with someone while they're 8,000 kilometres away is not done with magic crystal balls, it's just telephony.
Analysing the atmosphere of a planet 869 light-years away (about 8 quadrillion km) is not supernatural remote viewing, it's just spectral analysis through a telescope… a telescope that remains about 540 km above the ground, even without any support from anything underneath, which also isn't magic, it's...
What are they then?
There are two:
Reading literature / poetry / etc. in the original. Translations are fine for getting the meaning across, but different languages are, in fact, different; structure, prosody, nuances of meaning, various aesthetic details, usually do not survive a translation. (Conversely, appreciating a good translation is itself a unique aesthetic experience.)
Benefiting from different perspectives imposed by different languages. The strong Sapir-Whorf hypothesis (a.k.a. strong linguistic determinism) is false, but there is a weaker...
Substack recommendations are remarkably important, and the actual best reason to write here instead of elsewhere.
As in, even though I have never made an active attempt to seek recommendations, approximately half of my subscribers come from recommendations from other blogs. And for every two subscribers I have, my recommendations have generated approximately one subscription elsewhere. I am very thankful to all those who have recommended this blog, either through substack or otherwise.
As the blog has grown, I’ve gotten a number of offers for reciprocal recommendations. So far I have turned all of these down, because I have yet to feel any are both sufficiently high quality and a good match for me and my readers.
Instead, I’m going to do the following:
Gwern still releases his monthly newsletters; he just stopped crossposting them to Substack. Though admittedly, there's less commentary and overall content. Here's the January 2025 one.
I randomly stumbled upon this back in March.
In an attempt to get myself to write more, here is my own shortform feed. Ideally I would write something daily, but we will see how it goes.
If GPT-4.5 was supposed to be GPT-5, why would Sam Altman underdeliver on compute for it? Surely GPT-5 would have been a top priority?
If it's not obvious at this point why, I would prefer to not go into it here in a shallow superficial way, and refer you to the OA coup discussions.
I'll explain my reasoning in a second, but I'll start with the conclusion:
I think it'd be healthy and good to pause and seriously reconsider the focus on doom if we get to 2028 and the situation feels basically like it does today.
I don't know how to really precisely define "basically like it does today". I'll try to offer some pointers in a bit. I'm hoping folk will chime in and suggest some details.
Also, I don't mean to challenge the doom focus right now. There seems to be some good momentum with AI 2027 and the Eliezer/Nate book. I even preordered the latter.
But I'm still guessing this whole approach is at least partly misled. And I'm guessing that fact will show up in 2028 as "Oh, huh, looks...
I highly doubt that this explains why the field and its associated risk predictions exist in the first place, or that their validity should be questioned on such grounds, but that seems to be what happens in the article, if I'm not entirely misreading it.
Not entirely. It's a bit of a misreading. In this case I think the bit matters though.
(And it's an understandable bit! It's a subtle point I find I have a hard time communicating clearly.)
I'm trying to say two things:
Also, there are very few competent people who want to be full-time grantmakers. Lots of people are OK with being advisors to grantmakers, or grantmakers for ~10 hours a week, but very few qualified people are interested in full-time grantmaking jobs.
This means you end up with lots of part-time people, which increases the relative costs of hiring, because you still have to spend a lot of time evaluating someone's judgement, but you only get about a fourth of an employee out of it at the end. Also, half-time commitments appear to have much shorter half-lives...