On whether superhuman persuasion is even possible, Scott Alexander tried out an intuition pump in responding to nostalgebraist’s skepticism:
Nostalgebraist: ... it’s not at all clear that it is possible to be any better at cult-creation than the best historical cult leaders — to create, for instance, a sort of “super-cult” that would be attractive even to people who are normally very disinclined to join cults. (Insert your preferred Less Wrong joke here.) I could imagine an AI becoming L. Ron Hubbard, but I’m skeptical that an AI could become a super-Hubbard who would convince us all to become its devotees, even if it wanted to.
Scott: A couple of disagreements. First of all, I feel like the burden of proof should be heavily upon somebody who thinks that something stops at the most extreme level observed. Socrates might have theorized that it’s impossible for it to get colder than about 40 F, since that’s probably as low as it ever gets outside in Athens. But when we found the real absolute zero, it was with careful experimentation and theoretical grounding that gave us a good reason to place it at that point. While I agree it’s possible that the best manipulator we know is also the hard upper limit for manipulation ability, I haven’t seen any evidence for that so I default to thinking it’s false.
(lots of fantasy and science fiction does a good job intuition-pumping what a super-manipulator might look like; I especially recommend R. Scott Bakker’s Prince Of Nothing)
But more important, I disagree that L. Ron Hubbard is our upper limit for how successful a cult leader can get. L. Ron Hubbard might be the upper limit for how successful a cult leader can get before we stop calling them a cult leader.
The level above L. Ron Hubbard is Hitler. It’s difficult to overestimate how sudden and surprising Hitler’s rise was. Here was a working-class guy, not especially rich or smart or attractive, rejected from art school, and he went from nothing to dictator of one of the greatest countries in the world in about ten years. If you look into the stories, they’re really creepy. When Hitler joined, the party that would later become the Nazis had a grand total of fifty-five members, and was taken about as seriously as modern Americans take Stormfront. There are records of conversations from Nazi leaders when Hitler joined the party, saying things like “Oh my God, we need to promote this new guy, everybody he talks to starts agreeing with whatever he says, it’s the creepiest thing.” There are stories of people who hated Hitler going to a speech or two just to see what all the fuss was about and ending up pledging their lives to the Nazi cause. Even while he was killing millions and trapping the country in a difficult two-front war, he had what historians estimate as a 90% approval rating among his own people and rampant speculation that he was the Messiah. Yeah, sure, there was lots of preexisting racism and discontent he took advantage of, but there’s been lots of racism and discontent everywhere forever, and there’s only been one Hitler. If he’d been a little bit smarter or more willing to listen to generals who were, he would have had a pretty good shot at conquering the world. 100% with social skills.
The level above Hitler is Mohammed. I’m not saying he was evil or manipulative, just that he was a genius’ genius at creating movements. Again, he wasn’t born rich or powerful, and he wasn’t particularly scholarly. He was a random merchant. He didn’t even get the luxury of joining a group of fifty-five people. He started by converting his own family to Islam, then his friends, got kicked out of his city, converted another city and then came back at the head of an army. By the time of his death at age 62, he had conquered Arabia and was its unquestioned, God-chosen leader. By what would have been his eightieth birthday his followers were in control of the entire Middle East and good chunks of Africa. Fifteen hundred years later, one fifth of the world population still thinks of him as the most perfect human being ever to exist and makes a decent stab at trying to conform to his desires and opinions in all things.
The level above Mohammed is the one we should be worried about.
The tabletop game sounds really cool!
Interesting takeaways.
The first was exactly the above point, and that at some point, ‘I or we decide to trust the AIs and accept that if they are misaligned everyone is utterly f***ed’ is an even stronger attractor than I realized.
Yeah, when you say it like that... I feel like this is gonna be super hard to avoid!
The second was that depending on what assumptions you make about how many worlds are wins if you don’t actively lose, ‘avoid turning wins into losses’ has to be a priority alongside ‘turn your losses into not losses, either by turning them around and winning (ideal!) or realizing you can’t win and halting the game.’
There's also the option of, once you realize that winning is no longer achievable, trying to lose less badly than you could have otherwise. For instance, if out of all the trajectories where humans lose, you can guess that some of them seem more likely to bring about some extra bad dystopian scenario, you can try to prevent at least those. Some examples that I'm thinking of are AIs being spiteful or otherwise anti-social (on top of not caring about humans) or AIs being conflict-prone in AI-vs-AI interactions (including perhaps AIs aligned to alien civilizations). Of course, it may not be possible to form strong opinions over what makes for a better or worse "losing" scenario – if you remain very uncertain, all losing will seem roughly equally not valuable.
The third is that certain assumptions about how the technology progresses had a big impact on how things play out, especially the point at which some abilities (such as superhuman persuasiveness) emerge.
Yeah, but I like the idea of rolling dice for various options that we deem plausible (and having this built into the game).
I'm curious to read takeaways from more groups if people continue to try this. Also curious on players' thoughts on good group sizes (how many people played at once and whether you would have preferred more or fewer players).
But if you introduce AI into the mix, you don’t only get to duplicate exactly the ‘AI shaped holes’ in the previous efforts.
I have decided I like the AI shaped holes phraseology, because it highlights the degree to which this is basically a failure in the perception of human managers. There aren't any AI shaped holes because the entire pitch with AI is we have to tell the AI what shape to take. Even if we constrain ourselves to LLMs, the AI docs literally and exactly describe how to tell it what role to fill.
Playing the AIs definitely seems like the most challenging role.
Seems like a missed opportunity not having the AIs be played by AIs.
I wonder how you found out about the Curve conference? I’ve been to a few fun Lighthaven events (e.g. Manifest), and I’m looking for a way to be alerted to new ones.
People don’t give thanks enough, and it’s actual Thanksgiving, so here goes.
Thank you for continuing to take this journey with me every week.
It’s a lot of words. Even if you pick and choose, and you probably should, it’s a lot of words. You don’t have many slots to spend on things like this. I appreciate it.
Thanks in particular to those who are actually thinking about all this, taking it seriously, and forming their own opinions. It is the only way. And thanks to everyone who is standing up, peacefully and honestly, for whatever they truly think will make the world better, even if I disagree with you.
Thanks to all those working to ensure we all don’t die, and also those working to make the world a little richer, a little more full of joy and fun and health and wonder, in the meantime. Thanks for all the super cool toys, for they truly are super cool.
And thanks to all the parts of reality that work to so often keep it light and interesting along the way, and for not losing touch with the rest of the world. This is heavy stuff. You cannot let it fully take over your head. One must imagine Buffy at the prom.
Thanks of course to my health, my kids, all my family and friends, and all the friends I have that I don’t even know about yet. The world is really cool like that.
Special thanks to those who help make my writing possible and sustainable.
From this past week, I’ll also give thanks for those who organized The Curve, a conference I was able to attend last weekend, and those who help run Lighthaven, and all the really cool people I met there.
Thanks to the universe, for allowing us to live in interesting times, and plausibly giving us paths to victory. What a time to be alive, huh?
Oh, and I am so thankful I managed to actually stay out of the damn election, and that we are finally past it, that we’re mostly on a break from legislative sessions where I need to keep reading bills, and for the new College Football Playoff.
Life can be good, ya know?
Table of Contents
Language Models Offer Mundane Utility
Sully reports on new Cursor rival Windsurf, says it is far superior at picking up code nuances and makes fewer mistakes, which are big gains, but it’s still slow and clunky and the UX could use some work. Doubtless all these offerings will rapidly improve.
He also says the new .43 Cursor update is fantastic, faster code application, less buggy composer, better at context. I’m excited to get back to coding when I catch up on everything. Even I’m starting to get Sully’s ‘want personal software? With AI coding sure just whip it up’ vibes.
Get o1-preview to tell you where local repair shops are by saying you’re writing a novel featuring local repair shops, which worked, as opposed to asking for repair shops directly, which caused hallucinations. Note that o1 is the wrong tool here either way; if what you want doesn’t require its reasoning, math or coding strengths, you want to use GPT-4o instead and get web search (or Gemini or Claude).
o1 is assisting with nuclear fusion research.
It’s a Poet Whether or Not You Know It
Last week we discussed an experiment where people preferred AI generated poems to famous human poems, and failed to identify which was which. Colin Fraser thinks this says more about what people think of poetry than it does about AI.
I can see this both ways. I certainly believe that poetry experts can very easily still recognize that the human poems are human and the AI poems are AI, and will strongly prefer the human ones because of reasons, even if they don’t recognize the particular poems or poets. I can also believe that they are identifying something real and valuable by doing so.
But it’s not something I expect I could identify, nor do I have any real understanding of what it is or why I should care? And it certainly is not the thing the AI was mostly training to predict or emulate. If you want the AI to create poetry that poetry obsessives will think is as good as Walt Whitman, then as Colin points out you’d use a very different set of training incentives.
Huh, Upgrades
Claude has styles, you can choose presets or upload a writing sample to mimic.
Anthropic introduces and open sources the Model Context Protocol (MCP).
They plan to expand to enterprise-grade authentication, with the goal being to let Claude then use it to do anything your computer can do.
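To make that concrete, here is roughly what a minimal MCP tool server looks like. This is a sketch in the spirit of the protocol using the Python SDK’s FastMCP-style interface – treat the import path, decorator names and run call as assumptions and check Anthropic’s published docs before relying on them. The point is that the protocol standardizes the ‘list the tools, call a tool’ handshake, so any MCP client (such as Claude Desktop) can use the same server.

```python
# Sketch only: a tiny MCP tool server. API names here (FastMCP, @mcp.tool,
# mcp.run) are assumptions based on the Python SDK - verify against the docs.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("local-notes")  # server name shown to the client

@mcp.tool()
def search_notes(query: str) -> str:
    """Return lines from a local notes file that contain the query string."""
    with open("notes.txt", encoding="utf-8") as f:
        hits = [line.strip() for line in f if query.lower() in line.lower()]
    return "\n".join(hits) or "No matches."

if __name__ == "__main__":
    mcp.run()  # serve over stdio so an MCP client can discover and call the tool
```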
Last week DeepMind’s Gemini briefly took the lead over GPT-4o on Chatbot Arena, before GPT-4o got an upgrade that took the top spot back. Then Gemini got another upgrade and took the lead back again, with gains across the board, including in coding, reasoning and visual understanding. One problem with the new Geminis is that they only have 32k input windows.
Sully thinks Google cooked with Gemini-1121 and has it as his new go-to high-end model for agent tasks. He still has Claude as best for coding.
LiveBench was suggested as a better alternative to the Chatbot Arena. It has the advantage of ‘seeming right’ in having o1-preview at the top followed by Sonnet, followed by Gemini, although there are some odd deltas in various places, and it doesn’t include DeepSeek. It’s certainly a reasonable sanity check.
Claude now allows you to add content directly from Google Docs to chats and projects via a link.
Poe offers a $10/month subscription option, lower than the $20 for ChatGPT or Claude directly, although you only get 10k points per day.
Need help building with Gemini? Ping Logan Kilpatrick, his email is lkilpatrick@google.com, he says he’ll be happy to help and Sully verifies that he actually does help.
Thanks for the Memories
Personalization of AIs is certainly the future in some ways, especially custom instructions. Don’t leave home without those. But ChatGPT’s memory feature is odd.
The weird thing about ChatGPT memory is how little control you have over it. Sometimes it’ll decide ‘added to memory,’ which you can encourage, but even flat out asking isn’t reliable. Then you can either delete the memories or keep them, and that’s pretty much it. Why not allow us to add to or edit them directly?
Curve Ball
I had the opportunity this past weekend to attend The Curve, a curated conference at the always excellent Lighthaven. The goal was to bring together those worried about AI with those who hold very different perspectives, including e/acc types, in the hopes of facilitating better discussions.
The good news is that it was a great conference, in the ways that Lighthaven conferences are consistently excellent. It is an Amazingly Great space, with lots of great people and discussions both fun and informative. I met lots of people, including at least one I hope will be a good friend going forward, which is already a great weekend. There’s always lots happening, there’s always FOMO, and the biggest problem is lack of time and sleep.
The flip side is that this felt more like ‘a normal Lighthaven conference’ than the pitch suggested, in that there weren’t dramatic arguments with e/accs or anything like that. Partly of course that is my fault, or my choice, for not pushing harder on this.
I did have a good talk with Dean Ball on several topics, spoke with Eli Dourado about economic growth expectations, and spoke with Anton, but the takes that make me want to yell and throw things did not show up. Which was a shame in some ways, because it meant I didn’t get more information on how to convince such folks, find their best arguments, or seek common ground. There were still plenty of disagreements, but they were far more reasonable and friendly.
One frustrating conversation was about persuasion. Somehow there continue to be some people who can at least somewhat feel the AGI, but also genuinely think humans are at or close to the persuasion possibilities frontier – that there is no room to greatly expand one’s ability to convince people of things, or at least of things against their interests.
This is sufficiently absurd to me that I don’t really know where to start, which is one way humans are bad at persuasion. Obviously, to me, if you started with imitations of the best human persuaders (since we have an existence proof for that), and on top of that could correctly observe and interpret all the detailed signals, have limitless time to think, a repository of knowledge, the chance to do Monte Carlo tree search of the conversation against simulated humans, never make a stupid or emotional tactical decision, and so on, you’d be a persuasion monster. It’s a valid question where on the tech tree that capability shows up relative to others, but it has to be there. But my attempts to argue this proved, ironically, highly unpersuasive.
Anton, by virtue of having much stronger disagreements with most people at such conferences, got to have more of the experience of ‘people walking around saying things I think are nuts’ and talks online as if he’s going to give us that maddening experience we crave…
…but then you meet him in person and he’s totally reasonable and great. Whoops?
I was at the ‘everyone dies’ Q&A as well, which highlighted the places my model differs a lot from Eliezer’s, and made me wonder about how to optimize his messaging and explanations, which is confirmed to be an ongoing project.
There was a panel on journalism and a small follow-up discussion about dealing with journalists. It seems ‘real journalists’ have very different ideas of their obligations than I, by implication not a ‘real journalist,’ think we should have, especially our obligations to sources and subjects.
One highlight of the conference was a new paper that I look forward to talking about, but which is still under embargo. Watch this space.
ASI: A Scenario
The other highlight was playing Daniel Kokotajlo’s tabletop wargame exercise about a takeoff scenario in 2026.
I highly recommend playing it (or other variations, such as Intelligence Rising) to anyone who gets the opportunity, and am very curious to watch more experienced people (as in NatSec types) play. I was told that the one time people kind of like that did play, it was rather hopeful in key ways, and I’d love to see if that replicates.
There were two games played.
I was in the first group that played outside. I was assigned the role of OpenAI, essentially role playing Sam Altman and what I thought he would do, since I presumed by then he’d be in full control of OpenAI, until he lost a power struggle over the newly combined US AI project (in the form of a die roll) and I was suddenly role playing Elon Musk.
It was interesting, educational and fun throughout, illustrating how some things were highly contingent while others were highly convergent, and the pull of various actions.
In the end, we had a good ending, but only because the AIs’ initial alignment die roll turned out to be aligned to almost ‘CEV by default’ (technically ‘true morality,’ more details below). If the AIs had been by default (after some alignment efforts but not extraordinary efforts) misaligned, which I believe is far more likely in such a scenario, things would have ended badly one way or another. We had a pause at the end, but it wasn’t sufficiently rigid to actually work at that point, and if it had been, the AIs presumably would have prevented it. The scenario could still have gone badly despite the favorable roll, but it didn’t, so at least that other part worked out.
Here’s Jan Kulveit, who played the AIs in our outside copy of the game, with his summary of what happened on Earth-1 (since obviously one’s own version is always Earth-1, and Anton’s is therefore Earth-2).
Yes, they will all delegate to the AIs, with no manipulation required beyond ‘appear to be helpful and aligned,’ because the alternative is others do it anyway and You Lose, unless everyone can somehow agree collectively not to do it.
I didn’t pay more attention to alignment, because I didn’t think my character would have done so. If anything I felt I was giving Altman the benefit of the doubt and basically gave the alignment team what they insisted upon and took their statements seriously when they expressed worry. At one point we attempted to go to the President with alignment concerns, but she (playing Trump) was distracted with geopolitics and didn’t respond, which is the kind of fun realism you get in a wargame.
There were many takeaways from my game, but three stand out.
The first was exactly the above point, and that at some point, ‘I or we decide to trust the AIs and accept that if they are misaligned everyone is utterly f***ed’ is an even stronger attractor than I realized.
The second was that depending on what assumptions you make about how many worlds are wins if you don’t actively lose, ‘avoid turning wins into losses’ has to be a priority alongside ‘turn your losses into not losses, either by turning them around and winning (ideal!) or realizing you can’t win and halting the game.’
The third is that certain assumptions about how the technology progresses had a big impact on how things play out, especially the point at which some abilities (such as superhuman persuasiveness) emerge.
Anton played the role of the AIs in the other game, and reports here.
Playing the AIs definitely seems like the most challenging role, but there are lots of fun, high-impact decisions in a lot of places. Although not all – one of the running jokes in our game was the ‘NATO and US Allies’ player pointing out the ways in which those players have chosen to make themselves mostly irrelevant.
Lots of other stuff happened at The Curve, too, such as the screening of the new upcoming SB 1047 documentary, in which I will be featured.
Deepfaketown and Botpocalypse Soon
I get wanting to talk to Claude, I do it too, but are people really ‘falling’ for Claude?
I find a lot of the Claude affectation off-putting, actually – I don’t want to be told ‘great idea’ all the time when I’m coding and all that, and it all feels forced and false, and often rather clingy and desperate in what was supposed to be a technical conversation, and that’s not my thing. Others like that better, I suppose, and it does adjust to context – and the fact that I am put off by the affectation implies that I care about the affectation. I still use Claude because it’s the best model for me in spite of that, but if it actually had affectations that I actively enjoyed? Wowsers.
I do not think such caution is warranted, and indeed it seems rather silly this early.
And indeed, ceasing your in-person meetings in February 2020 would have also been a rather serious error. Yes, Davidad was making a correct prediction that Covid-19 was coming and we’d have to stop meeting. But if you stop your human contact too soon, then you didn’t actually reduce your risk by a non-trivial amount, and you spent a bunch of ‘distancing points’ you were going to need later.
Janus of course thought the whole caution thing was hilarious.
He also reasonably asks what harm has actually taken place. Who is being ‘eaten alive’ by this? Yes, the character.ai incident happened, but that seems very different from people talking to Claude.
I continue to be an optimist here, that talking to AIs can enhance human connection, and development of interpersonal skills. Here is an argument that the price is indeed very cheap to beat out what humans can offer, at least in many such cases, and especially for those who are struggling.
This is such a vicious pattern. Once you fall behind, or can’t do the ‘normal’ things that enable one to skill up and build connections, everything gets much harder.
As I’ve noted before, Claude and other AI tools offer a potential way out of this. You can ‘get reps’ and try things, iterate and learn, vastly faster and easier than you could otherwise. Indeed, it’s great for that even if you’re not in such a trap.
Here is a suggested prompt for using Claude as a therapist.
But fair warning, here Claude flip flops 12 times on the red versus blue pill question (where if >50% pick red, everyone who picked blue dies, but if >50% pick blue everyone lives). So maybe you’d want your therapist to have a little more backbone than that?
There’s also this:
They Took Our Jobs
The AI monitoring does, as the Reddit post here was titled, seem out of control.
To state the obvious, using this kind of software has rapidly decreasing marginal returns that can easily turn highly negative. You are treating employees as the enemy and making them hate you, taking away all their slack, focusing them on the wrong things. People don’t do good work with no room to breathe or when they are worried about typing speed or number of emails sent, so if you actively need good work, or good employees? NGMI.
Some additional good advice:
Andrew Critch is building Bayes Med to create the AI doctor, which is designed to supplement and assist human doctors. The sky’s the limit.
In related startup news, Garry Tan seems oddly and consistently behind the AI curve?
I’m all for products like OpenClinic. And yes, for now humans will remain ‘in the loop,’ the AI cannot fully automate many jobs and especially not doctors.
But that is, as they say, a skill issue. The time will come. The ‘early’ age of AI is about complements, where the AI replaces some aspects of what was previously the human job, or it introduces new options and tasks that couldn’t previously be done at reasonable cost.
What happens when you complement existing workers, such as by automating 50% of a doctor’s workflow? It is possible for this to radically reduce demand, or for it to not do that, or even increase demand – people might want more of the higher quality and lower cost goods, offsetting the additional work speed, even within a specific task.
It is still odd to call that ‘human in the loop’ when before only humans were the entire loop. Yes, ‘human out of the loop’ will be a big deal when it happens, and we mostly aren’t close to that yet, but it might not be all that long, especially if the human doesn’t have regulatory reasons to have to be there.
Aidan Guo asks why YC seems to be funding so many startups that seem like they want to be features. John Pressman says it’s good for acqui-hiring, if you think the main projects will go to the big labs and incumbents, and you might accidentally grow into a full product.
I want to return to this another time, but since it came up at The Curve and it seems important: Often people claim much production is ‘O-Ring’ style, as in you need all components to work so you can move only at the speed of the slowest component – which means automating 9/10 tasks might not help you much. I’d say ‘it still cuts your labor costs by 90% even if it doesn’t cut your time costs’ but beyond that, who is to say that you were currently using the best possible process?
As in, there are plenty of tasks humans often don’t do because we suck at them, or can’t do them at all. We still have all our products, because we choose the products that we can still do, and because we work around our weaknesses. But if you introduce AI into the mix, you don’t only get to duplicate exactly the ‘AI shaped holes’ in the previous efforts.
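To put toy numbers on the ‘cuts your labor costs by 90% even if it doesn’t cut your time costs’ point above (entirely made-up numbers, just illustrating the arithmetic):

```python
# Made-up numbers: a 10-step process, one hour of human labor per step at
# $100/hour, and AI automates 9 of the 10 steps. O-Ring logic says elapsed
# time may barely move if the remaining human step is the bottleneck, but
# the labor bill still collapses.
steps = 10
hours_per_step = 1.0
wage = 100.0            # dollars per human hour (assumed)
automated_steps = 9

labor_cost_before = steps * hours_per_step * wage
labor_cost_after = (steps - automated_steps) * hours_per_step * wage

print(f"Labor cost before automation: ${labor_cost_before:.0f}")  # $1000
print(f"Labor cost after automation:  ${labor_cost_after:.0f}")   # $100 (90% cheaper)
# Even if wall-clock time per unit barely changes, the same headcount can now
# run ten times as many processes in parallel, or the process can be redesigned
# so it no longer routes through the human step at all.
```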
Fun With Image Generation
Runway introduces Frames, a new image generation model with greater stylistic control. Samples look very good in absolute terms, we’ve come a long way.
Sora leaked and was available for about three hours. OpenAI was giving some artists a free early look, and some of them leaked it to the public in protest, after which they shut it down entirely.
I suppose that’s one way to respond to being given an entirely voluntary offer of free early access without even any expectation of feedback? I get protesting the tools themselves (although I disagree), but this complaint seems odd.
What we saw seems to have been far beyond the previous Sora version and also beyond for example Runway.
Fofr assembles a practical face swapper: Flux redux + character lora + Img2img.
All this stuff has been improving in the background, but I notice I do not feel any urge to actually use any of it outside of some basic images for posts, or things that would flagrantly violate the terms of service (if there’s a really good one available for easy download these days where it wouldn’t violate the TOS, give me a HT, sure why not).
Get Involved
METR is hiring for Senior DevOps Engineer, Technical Recruiter and Senior Machine Learning Research Engineer/Scientist, and you can express general interest.
Introducing
Epoch AI launches an AI Benchmarking Hub, with independent evaluations of leading models, direct link here. Looks promising, but early days, not much here yet.
GenChess from Google Labs, generate a cool looking chess set, then play with it against a computer opponent. Okie dokie.
In Other AI News
Google DeepMind offers an essay called A New Golden Age of Discovery, detailing how AIs can enhance science. It’s all great that this is happening, and sure, why not write it up as far as it goes, but based on the style and approach here I am tempted to ask: did they mostly let Gemini write this?
Not exactly news, WSJ: Nvidia is increasingly profiting off of big data centers, as its sales boom, and people aren’t sure if this will lead to better future models.
New paper says that resampling using verifiers potentially allows you to effectively do more inference scaling to improve accuracy, but only if the verifier is an oracle. The author’s intuition is that these techniques are promising but only in a narrow set of favorable domains.
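Here is a quick toy simulation of the point as I understand it (mine, not the paper’s setup, with arbitrary numbers): with an oracle verifier, best-of-k resampling keeps buying accuracy as k grows, while with an imperfect verifier the false accepts eventually dominate and the curve flattens well short of 100%.

```python
# Toy Monte Carlo: the model solves a problem with probability p per attempt.
# We draw k attempts; a verifier scans them and returns the first it accepts.
# Oracle verifier: never accepts a wrong answer. Noisy verifier: accepts a
# wrong answer with some false-positive rate.
import random

def best_of_k(p_correct, k, false_positive_rate, trials=20_000):
    wins = 0
    for _ in range(trials):
        chosen = None
        for _ in range(k):
            correct = random.random() < p_correct
            if correct or random.random() < false_positive_rate:
                chosen = correct
                break
        wins += bool(chosen)  # win only if the accepted answer is correct
    return wins / trials

for k in (1, 4, 16, 64):
    oracle = best_of_k(0.1, k, false_positive_rate=0.0)
    noisy = best_of_k(0.1, k, false_positive_rate=0.2)
    print(f"k={k:3d}  oracle verifier: {oracle:.2f}   imperfect verifier: {noisy:.2f}")
```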
Elon Musk promises xAI will found an AI gaming studio, in response to a complaint about the game industry and ‘game journalism’ being ideologically captured, which I suppose is something about ethics. I am not optimistic, especially if that is the motivation. AI will eventually enable amazing games if we live long enough to enjoy them, but this is proving notoriously tricky to do well.
Normative Determinism
I am not concerned about ‘workers get $2 an hour’ in a country where the average wage is around $1.25 per hour, but there is definitely a story. If Sama (the company) was getting paid by Sama (the CEO) $12.50 per hour, and only $2 per hour of that went to the workers, then something foul is afoot. At least one of these presumably needs to be true:
Quiet Speculations
Aaron Levie speculates, and Greg Brockman agrees, that voice AI with zero latency will be a game changer. I also heard someone at The Curve predict this to be the next ‘ChatGPT moment.’ It makes sense that there could be a step change in voice effectiveness when it gets good enough, but I’m not sure the problem is latency exactly – as Marc Benioff points out here, latency on Gemini is already pretty low. I do think it would also need to improve its ability to handle mangled and poorly constructed prompts. Until then, I wouldn’t leave home without the precision of typing.
Richard Ngo draws the distinction between two offense-defense balances. If it’s my AI versus your AI, that’s plausibly a fair fight. It isn’t obvious which side has the edge. However, if it’s my AI versus your AI-defended humans, then you have a problem with the attack surface. That seems right to me.
That’s not too dissimilar from the cybersecurity situation, where if I have an AI on defense of a particular target then it seems likely to be balanced or favor defense especially if the defenders have the most advanced tech, but if your AI gets to probe everything everywhere for what isn’t defended properly, then that is a big problem.
Is AI ‘coming for your kids’? I mean, yes, obviously, although to point out the obvious, this should definitely not be an ‘instead of’ worrying about existential risk thing, it’s an ‘in addition to’ thing, except also kids having LLMs to use seems mostly great? The whole ‘designed to manipulate people’ thing is a standard scare tactic, here applied to ChatGPT because… it is tuned to provide responses people like? The given reason is ‘political bias’ and that it will inevitably be used for ‘indoctrination’ and of the left wing kind, not of the ‘AIs are great’ kind. But as she points out here, you can just switch to another LLM if that happens.
Jason Wei speculates that, since the average user query only has so much room for improvement, but that isn’t true for research, there will be a sharp transition where AI focuses on accelerating science and engineering. He does notice the ‘strong positive feedback loop’ of AI accelerating AI research, although I presume he does not fully appreciate it.
I think this might well be true of where the important impact of AI starts to show up, because accelerating AI research (and also other research) will have immense societal impacts, whether or not it ends well. But in terms of where the bulk of the efforts and money are spent, I would presume that remains with the typical user and mundane use cases, and I expect that to stay true unless we start to enter a full takeoff mode towards ASI.
The user is still going to be most of the revenue and most of the queries, and I expect there to be a ton of headroom to improve the experience. No, I don’t think AI responses to most queries are close to ideal even for the best and largest models, and I don’t expect to get there soon.
A key clarification, if it checks out (I’d like to see if others making similar claims agree that this is what they meant as well):
Jack Clark reiterates his model that only compute access is holding DeepSeek and other actors behind the frontier, in DeepSeek’s case the embargo on AI chips. He also interprets DeepSeek’s statements here as saying that the Chinese AI industry is largely built on top of Llama.
The Quest for Sane Regulations
Yet another result that AI safety and ethics frames are both much more popular than accelerationist frames, and the American public remains highly negative on AI and pro regulation of AI from essentially every angle. As before, I note that I would expect the public to be pro-regulation even if regulation was a bad idea.
Brent Skorup argues at Reason that deepfake crackdowns threaten free speech, especially those imposing criminal penalties like Texas and Minnesota.
If enforced for real that would be quite obviously insane. Mistakenly share a fake photo on social media, get 5 years in jail? Going after posters for commonplace memes?
Almost always such warnings from places like Reason prove not to come to pass, but part of why they never come to pass is having places like Reason shouting about the dangers.
I continue to wish we had people who would yell if and only if there was an actual problem, but such is the issue with problems that look like ‘a lot of low-probability tail risks,’ anyone trying to warn you risks looking foolish. This is closely paralleled in many other places.
Here’s one of them:
This is what happens with cheaters in Magic: the Gathering, too – you ‘get away with’ each step and it emboldens you to take more than one additional step, so eventually you get too bold and you get caught.
You thought I was going to use AI existential risk there? Nah.
Jennifer Pahlka warns about the regulatory cascade of rigidity, where overzealous individuals and general bureaucratic momentum, and blame avoidance, cause rules to be applied far more zealously and narrowly than intended. You have to anticipate such issues when writing the bill. In particular, she points to requirements in the Biden Executive Order for public consultations with outside groups and studies to determine equity impacts, before the government can deploy AI.
I buy that the requirements in question are exactly the kinds of things that run into this failure mode, and that the Biden Executive Order likely put us on track to run into these problems, potentially quite bigly, and that Trump would be well served to undo those requirements while retaining the dedication to state capacity. I also appreciated Jennifer not trying to claim that this issue applied meaningfully to the EO’s reporting requirements.
The Week in Audio
All right, I suppose I have to talk about Marc Andreessen on Joe Rogan, keeping in mind to remember who Marc Andreessen is. He managed to kick it up a notch, which is impressive. In particular, he says the Biden administration said in meetings they wanted ‘total control of AI’ that they would ensure there would be only ‘two or three big companies’ and that it told him not to even bother with startups.
When I asked on Twitter, since those are rather bold claims, the best color or steelman I got was speculation that this is a restatement of what was claimed in the ‘Time to Choose’ podcast (from about 37-50 min in), which is not much of a defense of the claims here. Dean Ball says that Marc refers to other rhetoric that was present in DC in 2023, but is no longer present… and seems rather distinct from Marc’s claims. Even if you want to be maximally charitable here, he’s not trying to be Details Guy.
The other big thing he claimed was that the Biden administration had a campaign to debank those involved in crypto, which I strongly believe did extensively happen and was rather terrible. It is important to ensure debanking is never used as a weapon.
But Marc then also claims Biden did this to ‘tech founders’ and more importantly ‘political enemies.’ If these are new claims rather than other ways of describing crypto founders, then Huge If True, and I would like to know the examples. If he is only saying that crypto founders are often tech founders and Biden political enemies, perhaps that is technically correct, but it is rather unfortunate rhetoric to say to 100 million people.
Inflame. What a nice word for it. No, I will not be listening to the full podcast.
80,000 Hours on OpenAI’s move to a for-profit company.
Dominic Cummings on AI, including speculation that synthetic voters and focus groups within AI models are already indistinguishable from real voters. Haven’t had time to watch this but I expect it to be interesting.
Bret Taylor and Reid Hoffman on AI. Taylor notes that some future people will be sculpting AI experiences as AI architects and conversation designers.
Yann LeCun now says his estimate for human-level AI is that it will be possible within 5-10 years. Would you consider that a short or a long time?
OpenAI SVP of Research Mark Chen outright says there is no wall, the GPT-style scaling is doing fine in addition to o1-style strategies.
Databricks CEO Ali Ghodsi says “it’s pretty clear” that the AI scaling laws have hit a wall because they are logarithmic and although compute has increased by 100 million times in the past 10 years, it may only increase by 1000x in the next decade. But that’s about ability to scale, not whether the scaling will work.
Scale CEO Alexandr Wang says the Scaling phase of AI has ended, that AI has “genuinely hit a wall” in terms of pre-training, but there is still progress in AI with evals climbing and models getting smarter due to post-training and test-time compute, and we have entered the Innovating phase where reasoning and other breakthroughs will lead to superintelligence in 6 years or less. So there’s that. Imagine if also scaling wasn’t done.
Robin Hanson says some time in the next century the economy will start doubling every month and most humans will lose their jobs so we should… insure against this. I notice I am confused about how insurance can solve your problems in that scenario.
Rhetorical Innovation
Eliezer Yudkowsky moral parable, yes it goes where you think it does.
Seb Krier ‘cheat sheet’ on the stupidities of AI policy and governance, hopefully taken in the spirit in which it was intended.
Richard Ngo continues to think of AGI in terms of time horizons – a ‘one minute AGI’ can outperform a human on tasks that take a human one minute, with the real craziness coming around a 1-month AGI, which he predicts for 6-15 years from now. Richard expects maybe 2-5 years between each of the 1-minute, 1-hour, 1-day and 1-month thresholds, whereas Daniel Kokotajlo points out that these gaps should shrink as you move up. If you do have the 1-day AGI, then that seems like it should greatly accelerate your path to the 1-month one.
Aligning a Smarter Than Human Intelligence is Difficult
Seb Krier collects thoughts about the ways alignment is difficult, and why it’s not only about aligning one particular model. There are a lot of different complex problems to work out, on top of the technical problem, before you emerge with a win. Nothing truly new but a good statement of the issues. The biggest place I disagree is that Seb Krier seems to be in the ‘technical alignment seems super doable’ camp, whereas I think that is a seriously mistaken conclusion – not impossible, but not that likely, and I believe this comes from misunderstanding the problems and the evidence.
Pick Up the Phone
Or maybe you don’t even have to? Gwern, in full, notes that Hsu says China is not racing to AGI so much as it is determined not to fall too far behind, and would fast follow if we got it, so maybe a ‘Manhattan Project’ would be the worst possible idea right now, it’s quite possibly the Missile Gap (or the first Manhattan Project, given that no one else was close at the time) all over again:
Garrison Lovely, who wrote the OP Gwern is commenting upon, thinks all of this checks out.
The answer to ‘what do you do when you get AGI a year before they do’ is, presumably, build ASI a year before they do, plausibly before they get AGI at all, and then if everyone doesn’t die and you retain control over the situation (big ifs!) you use that for whatever you choose?
Prepare for Takeoff
Questions that are increasingly asked, with increasingly unsettling answers.
The AIs are still well behind human level over extended periods on ML tasks, but it takes four hours for the lines to cross, and even at the end they still score a substantial percentage of what humans score. Scores will doubtless improve over time, probably rather quickly.
I’m not sure that’s what this study means? Yes, they could improve their scores over more time, but there is a very easy way to improve score over time when you have access to a scoring metric as they did here – you keep sampling solution attempts, and you do best-of-k, which seems like it wouldn’t score that dissimilarly from the curves we see. And indeed, we see a lot of exactly this ‘trial and error’ approach, with 25-37 attempts per hour.
Thus, I don’t think this paper indicates the ability to meaningfully work for hours at a time, in general. Yes, of course you can batch a bunch of attempts in various ways, or otherwise get more out of 8 hours than 1 hour, but I don’t think this was that scary on that front just yet?
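As a toy model of what ‘improving over hours’ can look like from pure resampling (my illustration with arbitrary assumptions, not the paper’s data): if each attempt’s score is an independent draw and the harness keeps the best so far, expected score rises with wall-clock time even with zero learning between attempts.

```python
# Toy model: each attempt scores an independent uniform(0, 1) draw and the
# agent keeps its best score so far, at roughly 30 attempts per hour.
# 'Working longer' then improves the score without any learning between attempts.
import random

def expected_best_score(hours, attempts_per_hour=30, trials=5_000):
    k = hours * attempts_per_hour
    # Expected max of k uniform draws is k / (k + 1): fast early gains,
    # then a slow creep upward with more wall-clock time.
    return sum(max(random.random() for _ in range(k)) for _ in range(trials)) / trials

for hours in (1, 2, 4, 8):
    print(f"{hours}h of best-of-k resampling -> expected best score "
          f"{expected_best_score(hours):.3f}")
```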
Still, overall, rather scary. The way AI benchmarks work, there isn’t usually that long a time gap from here to saturation of the benchmarks involved, in which case watch out. So the question is whether there’s some natural barrier that would stop that. It doesn’t seem impossible, but also seems like we shouldn’t have the right to expect one that would hold for that long.
Even Evaluating an Artificial Intelligence is Difficult
OpenAI releases two new papers on red teaming: External red teaming, and automated red teaming. The analysis here seems basic but solid and helpful.
OpenAI reported that o1-preview is at ‘medium’ CBRN risk, versus ‘low’ for previous models, but expresses confidence it does not rise to ‘high,’ which would have precluded release. Luca Righetti argues that OpenAI’s CBRN tests of o1-preview are inconclusive on that question, because the test did not ask the right questions.
Righetti is correct that these tests on their own are inconclusive. It is easy to prove that an AI does have a capability. It is much harder to prove a negative, that an AI does not have a capability, especially on the basis of a test – you don’t know what ‘unhobbling’ options or additional scaffolding or better prompting could do. I certainly would have liked to have seen more tests here.
In this particular case, having played with o1-preview, I think the decision was fine. Practical hands-on experience says it is rather unlikely to reach ‘high’ levels here, and the testing is suggestive of the same. I would have been comfortable with this particular threat mode here. In addition, this was a closed model release so if unhobbling was discovered or the Los Alamos test had gone poorly, the model could be withdrawn – my guess is it will take a bit of time before any malicious novices in practice do anything approaching the frontier of possibility.
People Are Worried About AI Killing Everyone
The outgoing US Secretary of Commerce, although her focus does seem to be primarily on the effect on jobs:
This was at the inaugural convening of the International Network of AI Safety Institutes from nine nations plus the European Commission that Commerce was hosting in San Francisco.
Once again, Thomas Friedman, somehow.
They’re in that order, too.
The Lighter Side
A poem for the age of em.
True story.
Spy versus not so good spy versus not a spy, which is more likely edition.
I am rather confident that Sarah is not a spy, and indeed seems cool and I added her to my AI list. Although it’s possible, and also possible Samuel is a spy. Or that I’m a spy. You can never really know!
In some ways that is a shame. If there’s anything you wouldn’t have been willing to say to a Chinese spy, you really shouldn’t have been willing to say it at the conference anyway. I would have been excited to talk to an actual Chinese spy, since I presume that’s a great way to get the Chinese the key information we need them to have about AI alignment.
(I do think the major AI labs need to greatly ramp up their counterintelligence and cybersecurity efforts, effective yesterday.)
Happy Thanksgiving, Greg.