Over the past six months I've spent many, many hours watching a variety of YouTube videos on the subject of AI and AI risk. As time is limited, I thought it would be useful to present my favourites and give you my perspective on the key points. I don't have a clear favourite, so I'm presenting them in the order that I saw them:
In this talk at PauseCon, Leahy told his audience “we’ve failed spectacularly” but it’s not over yet, and urged people to be politically active. As I watched that I thought back to February 2003, when around a million people protested in the UK saying no to war in Iraq, millions more protested around the world, and it made no difference. But there are some differences: pretty much nobody wants an AI catastrophe; there is just so much disagreement on what the risks are, what is real and what is overhyped. AI policies are far from set in stone, and if we look globally we can see new policies developing every month.
Another Leahy video, this time also featuring Gabriel Alfour, and in this one Alfour is the star. The most memorable moment is his recollection ‘I was talking about how I dislike anti-humanist ideologies in SF’, to which the reply was
‘oh, humanism. Is this the thing where people want to live in human bodies forever?’ But the whole interview contained a lot of very thoughtful ideas on the problems that we're facing and how the solutions are much simpler than most people think. Alfour says extinction is the mainline given our current course of action, but it is not too late to change course.
It was also good to hear his rebuke to UBI fantasies: "Have you lived on Earth for the last decades?!? Big tech has not been regulated at all. How do you have a good economy when you have a few companies that have overwhelming power and you don’t have leverage?”
While watching this it also occurred to me that pro-ASI arguments contain a clear contradiction: if the USA builds it, it will be wonderful, but if China builds it, the same outcome will be terrible.
Best of all is Alfour's rejection of political fatalism: ‘It’s not the institution that’s broken, it’s you. The institution is predicated on the experts going to the representants and explain and make the case. That’s what a Republican democracy is.’
3. Gary Marcus, Daniel Kokotajlo and Dan Hendrycks debate AI timelines and risks
Marcus is the moderator in this three-way discussion and presents most of the questions, beginning with AGI upsides and why we don’t just say stop AI altogether. Kokotajlo argues it’s quite reasonable “to stand in front of the bulldozer and say stop”, but concedes it is possible to get AI right and for it to be massively beneficial.
He explains that the slowdown ending of AI 2027 has a positive outcome, but says it’s not a recommendation, as it exposes the world to “incredible amounts of risk”; however, he says it’s a plausible scenario for how things could go well. The dream scenario is an incredible abundance of wealth where everybody’s needs are met, many diseases are cured, and new settlements are put on Mars.
Marcus frames the debate in terms of two different positions that could be taken:
- We need to stop it now!
- We need to guide AI development in the right way so that we achieve positive outcomes.
Hendrycks says we need to stop ASI, and says doing so is implementable and geopolitically compatible with existing incentives. He argues that it is more feasible to stop ASI after inventing AGI than to prevent AGI. Hendrycks is skeptical of Pause AI: “making time for technical research — I’m not expecting much of a return on investment from that.” Marcus suggests we work towards a global treaty to avoid crossing red lines such as recursive self-improvement.
Marcus says global treaties take 8 or 10 years of work, and asks whether we should start now.
Hendrycks replies that the three red lines should be:
- no fully automated recursion
- no AI agents with expert-level virology skills or unsafeguarded cyber-offensive skills
- past a certain capability level, model weights need strong information security to contain them
Then they discuss the contradictory behaviour of AI leaders. Marcus says Dario Amodei is the most extreme example of this: publicly stating that the risk is serious while racing ahead. Kokotajlo calls this rationalization, saying the “it’s going to happen anyway” and “we’re the responsible good guys” arguments are very seductive.
Kokotajlo is asked about his AGI timeline, which starts to tail off around 2030, and he answers that with the enormous investments currently being made and the increasingly expensive training runs, “if you don’t get to some sort of radical transformation, if you don’t get to some sort of crazy AI powered automation of the economy by the end of this decade, then there’s going to be a bit of an AI winter. There’s going to be at the very least a sort of tapering off of the pace of progress.”
Marcus asks “What would AI look like two years before we achieved AGI or ASI?”, and here I agree with him. The state of AI is too broken to be considered genuinely close to AGI, never mind ASI, and we need technical breakthroughs in order to get close to either goal. He says current systems can’t even play chess reliably according to the rules: the 1979 Atari 2600 game Video Chess beats ChatGPT at chess. By the way, isn't it ironic that 30 years ago chess was the holy grail of AI, whereas now frontier AI labs don't consider it worth the effort to make their AI call Stockfish, which is available for free.
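Since the Stockfish point comes up, here is roughly how little glue code that tool call would take. This is only a minimal sketch of my own, assuming the open-source python-chess library and a local Stockfish binary at a path you'd set yourself; the “LLM suggestion” is a hard-coded stand-in, not anything from the video.

```python
# Minimal sketch: validate a model's suggested move, and fall back to Stockfish.
# Assumes python-chess is installed and STOCKFISH_PATH points at a local binary.
import chess
import chess.engine

STOCKFISH_PATH = "/usr/bin/stockfish"  # assumption: adjust to your own install


def is_legal(board: chess.Board, move_san: str) -> bool:
    """Return True only if the suggested move is legal in this position
    (no queens jumping over knights)."""
    try:
        board.parse_san(move_san)
        return True
    except ValueError:
        return False


def engine_move(board: chess.Board, think_time: float = 0.1) -> chess.Move:
    """Ask Stockfish (free, UCI-compatible) for a move instead of trusting the model."""
    engine = chess.engine.SimpleEngine.popen_uci(STOCKFISH_PATH)
    try:
        return engine.play(board, chess.engine.Limit(time=think_time)).move
    finally:
        engine.quit()


if __name__ == "__main__":
    board = chess.Board()
    llm_suggestion = "Qxf7"  # pretend a model proposed this from the starting position
    if is_legal(board, llm_suggestion):
        board.push_san(llm_suggestion)
    else:
        board.push(engine_move(board))  # fall back to a guaranteed-legal engine move
    print(board)
```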
Kokotajlo argues we wouldn’t understand how a super AI created with 10⁴⁵ compute works, but then we don’t understand how today’s AI works either. Marcus and Kokotajlo agree that there’s too much alchemy and not enough principles.
Marcus says modern LLMs lack the skills to solve more complex Tower of Hanoi problems because they can’t generalise the solution. Herbert Simon solved Tower of Hanoi in 1975. Marcus said ’57, but I think this is incorrect. He is, however, correct about a very impressive video of a young girl solving Tower of Hanoi, though this doesn’t necessarily mean she can solve it for any number of disks. Marcus mentions he wrote a science fiction essay with neuroscientist Christof Koch.
The interview was recorded a few months ago, and Marcus says that for getting close to AGI, he and Yann LeCun “would both emphasise World Models.” He also says reasoning models aren’t robust enough.
Kokotajlo argues that “we don’t even know what we’re doing. How are we supposed to craft a mind that has the right virtues and principles?”
To Marcus’ point on Tower of Hanoi and complex math problems, Kokotajlo says “I also can’t do Tower of Hanoi or large math problems in my head. I need tools.”
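For context on why “generalising the solution” is trivial for a classical program, and as an example of the kind of tool Kokotajlo means, here is the textbook recursive procedure. This is just a sketch of the standard algorithm, not anything shown in the video; it works for any number of disks, in exactly 2ⁿ − 1 moves.

```python
def hanoi(n: int, source: str = "A", target: str = "C", spare: str = "B") -> list[tuple[str, str]]:
    """Classic recursion: move n-1 disks to the spare peg, move the largest disk
    to the target, then move the n-1 disks on top of it. Works for any n."""
    if n == 0:
        return []
    return (hanoi(n - 1, source, spare, target)
            + [(source, target)]
            + hanoi(n - 1, spare, target, source))


# Always 2**n - 1 moves: 7 moves for 3 disks, 1023 for 10 disks.
assert len(hanoi(3)) == 7
assert len(hanoi(10)) == 2**10 - 1
print(hanoi(3))
```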
Marcus says six-year-olds can learn the rules of chess, but models like OpenAI's o3 will do things like make the queen jump over the knight.
Hendrycks mostly agrees with Marcus, listing many things AI can’t do well and where rapid progress is not currently being made, emphasising visual intelligence. He says it’s important to split up “one’s notion of cognition.”
Marcus adds that physical intelligence is an area where current AI struggles. Kokotajlo objects to Hendrycks’ remark about “unidimensional intelligence”, saying that he never said he thought intelligence was a single dimension, and that his “benchmarks plus gaps” analysis considers all the things mentioned as part of the gaps analysis.
Marcus says the most bullish case for AGI is 2030, which he doesn’t think is very likely. He thinks 2027 is totally implausible.
Marcus says we haven’t made enough progress on alignment. Hendrycks says aligning current models is different from aligning recursive models that give rise to superintelligence. He says we need to resolve geopolitical competitive pressures so that companies can proceed more slowly.
He says we align current models “fairly reasonably”, but there are safety-critical domains, and AI companies are not using the alignment techniques that are most robust against adversarial attacks because they come at a cost of 1–2% on MMLU (Massive Multitask Language Understanding).
Gary Marcus jokes that the epitaph for humanity will read: “If they hadn’t squeezed out that extra 1% MMLU we would’ve been okay.” He raises the point that Asimov’s “A robot may not injure a human being or, through inaction, allow a human being to come to harm” was written in 1942, yet today AI companies are compromising on this principle.
Marcus says this foreseeable harm to humans is a reason to say “Hey, we gotta wait until we have a better solution here”, but Hendrycks raises the issue of incentives and says several different measures are recommended in his Superintelligence Strategy paper.
Marcus and Hendrycks agree that although completely stopping may not be realistic, there should be some kind of intervention to reduce catastrophic risk. Kokotajlo takes Marcus’ “stand in front of the train” metaphor and says a lot of bodies would need to be run over to slow the train down.
Hendrycks says it’s a problem to use the word “solve” for alignment, because fires will emerge as new capabilities are tried in the wild. With recursion this effect will be magnified enormously.
Marcus attempts to conclude, saying they agree on a lot, and even on the timelines they don’t completely disagree. Most importantly everyone is “completely agreed that we’re not doing a great job on the alignment problem”, and that the “current companies are not entirely trustworthy.”
On the AI 2027 timelines, Marcus says he thinks that by the end of 2025 we’ll be slightly behind where the predictions say we’ll be. He says the kind of ASI described in AI 2027 is not very likely in the next decade (2025–2035) and not super likely in the decade after (2035–2045), but it’s certainly possible.
On the superhuman coder milestone, Marcus thinks we already have a good apprentice engineer, but that we’re not yet close to a really good AI coder: “you’re not going to get a machine that is Jeff Dean anytime in the next decade.”
He says vivid details in possible scenarios overwhelm people’s senses, and it would be more useful to have a distribution of scenarios. He says this is a big ask because AI 2027 as it stands was already a lot of work.
Kokotajlo says they are working on a “good ending” (this work is now completed) and some other scenarios shown at a lower level of detail.
Marcus says current LLMs solve problems inside the box, and getting to AGI is going to require outside-the-box solutions. He says AlphaGo’s famous move 37 was inside the box, whereas Einstein’s special and general relativity was outside-the-box thinking. We need Einstein-level innovations to reach AGI.
Kokotajlo asks Marcus how fast, once we reach AGI, the ASI takeoff will be. Marcus doesn’t directly answer, rather he makes his own talking points. He says there’s no question that machines will exceed humans at some point, but when and how remain open questions.
Hendrycks says it’s plausible to have typical-human-level AI by 2030. He says we need to really ease the geopolitical competitive pressures. Some relevant factors for this are transparency and how easy it is to do espionage.
He says China could do a cyberattack on power utilities in the USA, or attack data centers with plausible deniability. At the moment China can just hack Slack to view the internal communications at Anthropic, OpenAI, xAI, DeepMind, etc.