I expected far greater pushback from doctors and lawyers, for example, than we have seen so far.
I believe it's a matter of (correlated) motivated reasoning. My doctor and lawyer friends both seem to be excited about the time that AIs will save them as they do their jobs—in both professions there is immense efficiency to gain by automating more rote parts of the work (legal research, writing patient notes, dealing with insurance, etc.)—but seem to believe that AIs will never fully supplant them. When I press the issue, especially with my doctor friend, he tells me that regulations and insurance will save doctors, to which I say... sure, but only until e.g. AI-powered medicine has real statistics showing better health outcomes than human doctors, as most of us here would expect. I can imagine the same initial defense from a lawyer who cannot yet imagine an AI being allowed by regulation to represent someone.
Then there's all the usual stuff about how difficult it can be for many people to imagine the world changing so much in the next several years.
I've also heard this argument: sure, AI might take everyone's job, but if that's inevitable anyway, it's still rational to be in an elite profession because they will last slightly longer and/or capture more economic surplus before society breaks down, if it does. On that point, I agree.
More broadly, speaking from the outside (I am a software engineer), to me the cultures of the elite professions have always seemed rather self-assured: everything is fine, nothing is a problem and these elite professionals will always be rich... which means that when the first credible threat to that standing hits, like a jurisdiction allowing fully autonomous doctors/lawyers/etc., it will be pandemonium.
I think that, roughly speaking, these are possible outcomes:
From a doomer perspective, items 2-5 are not worth discussing, but if we ignore that...
Option 2 is only actionable if you have the power (economic or military) to get control over the first superintelligent AI; otherwise you are screwed. Almost everyone is in the latter group, including all current doctors and lawyers. Option 4 is not actionable. Option 5 means it doesn't matter what you do.
Option 3 seems... not completely implausible... and provides a reason to try to stay upper-middle class as long as possible, just in case the AI decides to preserve existing social structures in the future.
everyone else is their slave
Post-AGI humans can't be centrally slaves, because human labor won't be valuable.
Help you be a better historian, generating interpretations, analyzing documents.
Let it try the Voynich manuscript.
I think Noah Carl was coping with the “downsides” he listed. Loss of meaning and loss of status are complete jokes. They are the problems of people who don’t have problems. I would even argue that focusing on X-risks rather than S-risks is a bigger form of cope than denying AI is intelligent at all. I don’t see how you train a superintelligent military AI that doesn’t come to the conclusion that killing your enemies vastly limits the amount of suffering you can inflict upon them.
Edit: I think loss of actual meaning, like conclusive proof we're in a dysteleology, would not be a joke. But I think that loss of meaning in the sense of "what am I going to do if I can't win at agent competition anymore :(" feels like a very first-world problem.
If the US's models follow some version of Anthropic's model of AI safety, applying secret "enhanced safety filters" and restricting or banning you at their pleasure, while the Chinese ones in sharp contrast are open and helpful, as seems likely, then 中华人民共和国万岁 (long live the People's Republic of China)!
many AI economics papers have a big role for principal-agent problems as their version of AI alignment.
I wasn't aware of such papers but found some:
AIs can be copied on demand. So can entire teams and systems. There would be no talent or training bottlenecks. Customization of one becomes customization of all. A virtual version of you can be everywhere and do everything all at once.
This is "efficient copying of acquired knowledge" - what I predicted to be the fifth meta innovation, Robin Hanson asked about.
Lex Fridman spent five hours talking AI and other things with Dylan Patel of SemiAnalysis. This is probably worthwhile for me and at least some of you, but man that’s a lot of hours.
Are we already at the point where AI, or some app, can summarize podcasts accurately and extract key takeaways with relatively technical interviewees like Dylan, so we don't need 5 hours (or even 2.5h at 2x)?
Haven't used it much, but dexa.ai tries to let you interact with podcast episodes; here's this episode:
I remember that week I used r1 a lot, and everyone was obsessed with DeepSeek.
They earned it. DeepSeek cooked, r1 is an excellent model. Seeing the Chain of Thought was revolutionary. We all learned a lot.
It’s still #1 in the app store, there are still hysterical misinformed NYT op-eds and calls for insane reactions in all directions and plenty of jingoism to go around, largely based on that highly misleading $6 million cost number for DeepSeek’s v3, and a misunderstanding of how AI capability curves move over time.
But like the tariff threats, that’s so yesterday now, for those of us who live in the unevenly distributed future.
All my reasoning model needs go through o3-mini-high, and Google’s fully unleashed Flash Thinking for free. Everyone is exploring OpenAI’s Deep Research, even in its early form, and I finally have an entity capable of writing faster than I do.
And, as always, so much more, even if we stick to AI and stay in our lane.
Buckle up. It’s probably not going to get less crazy from here.
Table of Contents
From this week: o3-mini Early Days and the OpenAI AMA, We’re in Deep Research and The Risk of Gradual Disempowerment from AI.
Language Models Offer Mundane Utility
You can subvert OpenAI’s geolocation check with a VPN, but of course never do that.
Help you be a better historian, generating interpretations, analyzing documents. This is a very different modality than the average person using AI to ask questions, or for trying to learn known history.
Diagnose your child’s teeth problems.
Figure out who will be mad about your tweets. Next time, we ask in advance!
I’m not ready to put my API key into a random website, but that’s how AI should work these days. You don’t like the UI, build a new one. I don’t want voice input myself, but highlighting and autoloading and the rest all sound cool.
Indeed, that was the killer app for which I bought a Daylight computer. I’ll report back when it finally arrives.
Meanwhile the actual o3-mini-high interface doesn’t even let you upload the PDF.
Consensus on coding for now seems to be leaning in the direction that you use Claude Sonnet 3.6 for a majority of ordinary tasks, o1-pro or o3-mini-high for harder ones and one shots, but reasonable people disagree.
Karpathy has mostly moved on fully to “vibe coding,” it seems.
Based on my experience with Cursor I have so many questions about how that can actually work out; then again, maybe I should just be doing more projects and webapps.
I do think Sully is spot on about vibe coding rewarding doing the same things everyone else is doing. The AI will constantly try to do default things, and draw upon its default knowledge base. If that means success, great. If not, suddenly you have to do actual work. No one wants that.
Sully interprets features like Canvas and Deep Research as indicating the app layer is ‘where the value is going to be created.’ As always the question is who can provide the unique step in the value chain, capture the revenue, own the customer and so on. Customers want the product that is useful to them, as they always do, and you can think of ‘the value’ as coming from whichever part of the chain, depending on perspective.
It is true that for many tasks, we’ve passed the point where ‘enough intelligence’ is the main problem at hand. So getting that intelligence into the right package and UI is going to drive customer behavior more than being marginally smarter… except in the places where you need all the intelligence you can get.
Anthropic reminds us of their Developer Console for all your prompting needs, they say they’re working on adapting it for reasoning models.
Nate Silver offers practical advice on preparing for the AI future. He recommends staying on top of things, treating the future as unpredictable, and focusing on building the best complements to intelligence, such as personal skills.
New York Times op-ed pointing out once again that doctors with access to AI can underperform the AI alone, if the doctor is insufficiently deferential to the AI. Everyone involved here is way too surprised by this result.
Daniel Litt explains why o3-mini-high gave him wrong answers to a bunch of math questions but they were decidedly better wrong answers than he’d gotten from previous models, and far more useful.
o1-Pro Offers Mundane Utility
Tyler Cowen gets more explicit about what o1 Pro offers us.
I’m quoting this one in full.
Economics questions in the Tyler Cowen style are like complex coding questions, in the wheelhouse of what o1 pro does well. I don’t know that I would extend this to ‘all tough questions,’ and for many purposes inability to browse the web is a serious weakness, which of course Deep Research fully solves.
Whereas the types of questions I tend to be curious about seem to have been a much worse fit, so far, for what reasoning models can do. They’re still super useful, but ‘the smartest entity yet devised’ does not, in my contexts, yet seem correct.
We’re in Deep Research
Tyler Cowen sees OpenAI’s Deep Research (DR), and is super impressed with the only issue being lack of originality. He is going to use its explanation of Ricardo in his history of economics class, straight up, over human sources. He finds the level of accuracy and clarity stunning, on most any topic. He says ‘it does not seem to make errors.’
I wonder how much of his positive experience is his selection of topics, how much is his good prompting, how much is perspective and how much is luck. Or something else? Lots of others report plenty of hallucinations. Some more theories here at the end of this section.
Ruben Bloom throws DR at his wife’s cancer from back in 2020, finds it wouldn’t have found anything new but would have saved him substantial amounts of time, even on net after having to read all the output.
Nick Cammarata asks Deep Research for a five page paper about whether he should buy one of the cookies the gym is selling, the theory being it could supercharge his workout. The answer was that it’s net negative to eat the cookie, but much less negative than working out is positive either way, so if it’s motivating go for it.
Is it already happening? I take no position on whether this particular case is real, but this class of thing is about to be very real.
I mean, firing people to replace them with an AI research assistant, sure, but you’re saying you have friends?
Another thing that will happen is the AIs being the ones reading your paper.
Here’s the best bear case I’ve seen so far for the current version, from the comments, and it’s all very solvable practical problems.
The natural response is ‘PB is using it wrong.’ You look for what an AI can do, not what it can’t do. So if DR can do [X-1] but not [X-2] or [Y], have it do [X-1]. In this case, PB’s request is for some very natural [X-2]s.
It is a serious problem to not have access to Reddit or Substack or related sources. Not being able to get to gated journals even when you have credentials for them is a big deal. And it’s really annoying and limiting to not have PDF uploads.
That does still leave a very large percentage of all human knowledge. It’s your choice what questions to ask. For now, ask the ones where these limitations aren’t an issue.
Or even the ones where they are an advantage?
Tyler Cowen gave perhaps the strongest endorsement so far of DR.
It does not seem like a coincidence that he is also someone who has strongly advocated for an epistemic strategy of, essentially, ignoring entirely sources like Substack and Reddit, in favor of more formal ones.
It also does not seem like a coincidence that Tyler Cowen is the fastest reader.
So you have someone who can read these 10-30 page reports quickly, glossing over all the slop, and who actively wants to exclude many of the sources the process excludes. And who simply wants more information to work with.
It makes perfect sense that he would love this. That still doesn’t explain the lack of hallucinations and errors he’s experiencing – if anything I’d expect him to spot more of them, since he knows so many facts.
Language Models Don’t Offer Mundane Utility
But can it teach you how to use the LLM to diagnose your child’s teeth problems? PoliMath asserts that it cannot – that the reason Eigenrobot could use ChatGPT to help his child is because Eigenrobot learned enough critical thinking and domain knowledge, and that with AI sabotaging high school and college education people will learn these things less. We mentioned this last week too, and again I don’t know why AI couldn’t end up making it instead far easier to teach those things. Indeed, if you want to learn how to think, be curious alongside a reasoning model that shows its chain of thought, and think about thinking.
Model Decision Tree
I offered mine this week, here’s Sully’s in the wake of o3-mini, he is often integrating into programs so he cares about different things.
It really is crazy that Claude Sonnet 3.6 is still in everyone’s mix despite all its limitations and how old it is now. It’s going to be interesting when Anthropic gets to its next cycle.
Huh, Upgrades
Gemini app now fully powered by Flash 2.0, didn’t realize it hadn’t been yet. They’re also offering Gemini 2.0 Flash Thinking for free on the app as well, how are our naming conventions this bad, yes I will take g2 at this point. And it now has Imagen 3 as well.
Gemini 2.0 Flash, 2.0 Flash-Lite and 2.0 Pro are now fully available to developers. Flash 2.0 is priced at $0.10/$0.40 per million input/output tokens.
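To make that pricing concrete, here is a minimal sketch of the per-call arithmetic at those listed rates; the token counts below are made-up examples, not anything from the announcement.

```python
# Cost of one Gemini 2.0 Flash call at $0.10 per million input tokens
# and $0.40 per million output tokens (the prices quoted above).
INPUT_PRICE_PER_TOKEN = 0.10 / 1_000_000
OUTPUT_PRICE_PER_TOKEN = 0.40 / 1_000_000

def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single request."""
    return input_tokens * INPUT_PRICE_PER_TOKEN + output_tokens * OUTPUT_PRICE_PER_TOKEN

# Hypothetical example: a 10,000-token prompt with a 1,000-token reply.
print(f"${call_cost(10_000, 1_000):.4f}")  # -> $0.0014
```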
The new 2.0 Pro version has a 2M context window and the ability to use Google search and code execution. They are also launching a Flash Thinking that can directly interact with YouTube, Search and Maps.
1-800-ChatGPT now lets you upload images and chat using voice messages, and they will soon let you link it up to your main account. Have fun, I guess.
Bot Versus Bot
I mean, great plan, explicitly going for superhuman persuasion and deception then straight to open source, I’m sure absolutely nothing could go wrong here.
Tactical understanding and skill in Diplomacy is underrated, but I do think it’s a good choice. If anyone plays out a game (with full negotiations) among leading LLMs through at least 1904, I’ll at least give a shoutout. I do think it’s a good eval.
The OpenAI Unintended Guidelines
This is a tricky situation. From a public relations perspective, you absolutely do not want the AI to claim in chats that it is conscious (unless you’re rather confident it actually is conscious, of course). If that happens occasionally, even if they’re rather engineered chats, then those times will get quoted, and it’s a mess. LLMs are fuzzy, so it’s going to be pretty hard to tell the model to never affirm [X] while telling it not to assume it’s a rule to claim [~X]. Then it’s easy to see how that got extended to personal preferences. Everyone is deeply confused about consciousness, which means all the training data is super confused about it too.
Peter Wildeford on DeepSeek
Peter Wildeford offers ten takes on DeepSeek and r1. It’s impressive, but he explains various ways that everyone got way too carried away. At least the first seven are not new takes, but they are clear and well-stated and important, and this is a good explainer.
For example I appreciated this on the $6 million price tag, although the ratio is of course not as large as the one in the metaphor:
Here’s his price-capabilities graph:
I suspect this is being unfair to Gemini, it is below r1 but not by as much as this implies, and it’s probably not giving o1-pro enough respect either.
Then we get to #8, the first interesting take, which is that DeepSeek is currently 6-8 months behind OpenAI, and #9 which predicts DeepSeek may fall even further behind due to deficits of capital and chips, and also because this is the inflection point where it’s relatively easy to fast follow. To the extent DeepSeek had secret sauce, it gave quite a lot of it away, so it will need to find new secret sauce. That’s a hard trick to keep pulling off.
The price to keep playing is about to go up by orders of magnitude, in terms of capex and in terms of compute and chips. However far behind you think DeepSeek is right now, can DeepSeek keep pace going forward?
You can look at v3 and r1 and think it’s impressive that DeepSeek did so much with so little. ‘So little’ is plausibly 50,000 overall hopper chips and over a billion dollars, see the discussion below, but that’s still chump change in the upcoming race. The more ruthlessly efficient DeepSeek was in using its capital, chips and talent, the more it will need to be even more efficient to keep pace as the export controls tighten and American capex spending on this explodes by further orders of magnitude.
Our Price Cheap
EpochAI estimates the marginal cost of training r1 on top of v3 at about $1 million.
SemiAnalysis offers a take many are now citing, as they’ve been solid in the past.
That’s in addition to o1-pro, which also wasn’t considered in most comparisons. They also consider Gemini Flash 2.0 Thinking to be on par with r1, and far cheaper.
Teortaxes continues to claim it is entirely plausible the lifetime spend for all of DeepSeek is under $200 million, and says Dylan’s capex estimates above are ‘disputed.’ They’re estimates, so of course they can be wrong, but I have a hard time seeing how they can be wrong enough to drive costs as low as under $200 million here. I do note that Patel and SemiAnalysis have been a reliable source overall on such questions in the past.
Teortaxes also tagged me on Twitter to gloat that they think it is likely DeepSeek already has enough chips to scale straight to AGI, because they are so damn efficient, and that if true then ‘export controls have already failed.’
I find that highly unlikely, but if it’s true then (in addition to the chance of direct sic transit gloria mundi if the Chinese government lets them actually hand it out and they’re crazy enough to do it) one must ask how fast that AGI can spin up massive chip production and bootstrap itself further. If AGI is that easy, the race very much does not end there.
Thus even if everything Teortaxes claims is true, that would not mean ‘export controls have failed.’ It would mean we started them not a moment too soon and need to tighten them as quickly as possible.
And as discussed above, it’s a double-edged sword. If DeepSeek’s capex and chip use is ruthlessly efficient, that’s great for them, but it means they’re at a massive capex and chip disadvantage going forward, which they very clearly are.
Also, SemiAnalysis asks the obvious question to figure out if Jevons Paradox applies to chips. You don’t have to speculate. You can look at the pricing.
Nvidia is down on news that not only shows their chips are highly useful, but that is causing people to spend more money for access to those chips. Curious.
Otherwise Seeking Deeply
DeepSeek’s web version appears to send your login information to a telecommunications company barred from operating in the United States, China Mobile, via a heavily obfuscated script. They didn’t analyze the app version. I am not sure why we should care but we definitely shouldn’t act surprised.
Kelsey Piper lays out her theory of why r1 left such an impression, that seeing the CoT is valuable, and that while it isn’t the best model out there, most people were comparing it to the free ChatGPT offering, and likely the free ChatGPT offering from a while back. She also reiterates many of the obvious things to say, that r1 being Chinese and open is a big deal but it doesn’t at all invalidate America’s strategy or anyone’s capex spending, that the important thing is to avoid loss of human control over the future, and that a generalized panic over China and a geopolitical conflict help no one except the AIs.
Andrej Karpathy sees DeepSeek’s style of CoT as emergent behavior, the result of trial and error, and thus both surprising to see and damn impressive.
Garrison Lovely takes the position that Marc Andreessen is very much talking his book when he calls r1 a ‘Sputnik moment’ and tries to create panic.
He correctly notices that the proper Cold War analogy is instead the Missile Gap.
It is indeed pretty rich to talk about a ‘compute gap’ in a world where American labs have effective access to orders of magnitude more compute.
But one could plausibly warn about a ‘compute gap’ in the sense that we have one now, it is our biggest advantage, and we damn well don’t want to lose it.
In the longer term, we could point out the place we are indeed in huge trouble. We have a very real electrical power gap. China keeps building more power plants and getting access to more power, and we don’t. We need to fix this urgently. And it means that if chips stop being a bottleneck and that transitions to power, which may happen in the future, then suddenly we are in deep trouble.
The ongoing saga of the Rs in Strawberry. This follows the pattern of r1 getting the right answer after a ludicrously long Chain of Thought in which it questions itself several times.
There’s a sense in which r1 is someone who is kind of slow and ignorant, determined to think it all through by taking all the possible approaches, laying it all out, not being afraid to look stupid, saying ‘wait’ a lot, and taking as long as it needs to. Which it has to do, presumably, because its individual experts in the MoE are so small. It turns out this works well.
You can do this too, with a smarter baseline, when you care to get the right answer.
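For reference, the underlying question is trivial at the character level; here is a minimal sketch, where the tokenization comment is my own gloss on why models find it hard rather than anything from the post.

```python
# Counting the Rs in "strawberry" is trivial character-level work.
word = "strawberry"
print(word.count("r"))  # -> 3

# My assumption about why LLMs struggle: they operate on tokens rather than
# characters (e.g. something like ["str", "aw", "berry"]), so the count has
# to be reasoned out step by step, which is where r1's long chain of thought,
# with all its "wait"s and self-questioning, comes in.
```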
Timothy Lee’s verdict is r1 is about as good as Gemini 2.0 Flash Thinking, almost as good as o1-normal but much cheaper, but not as good as o1-pro. An impressive result, but the result for Gemini there is even more impressive.
Washington Post’s version of ‘yes DeepSeek spent a lot more money than that in total.’
Epoch estimates that going from v3 to r1 cost about $1 million in compute.
Janus has some backrooms fun, noticing Sonnet 3.6 is optimally shaped to piss off r1. Janus also predicts r1 will finally get everyone claiming ‘all LLMs have the same personality’ to shut up about it.
Miles Brundage says the lesson of r1 is that superhuman AI is getting easier every month, so America won’t have a monopoly on it for long, and that this makes the export controls more important than ever.
Adam Thierer frames the r1 implications as ‘must beat China’ and therefore (at R Street, why I never) calls for ‘wise policy choices’ and highlights the Biden EO, even though the Biden EO had no substantial impact on anything relevant to r1 or any major American AI labs, and wouldn’t have had any such impact in China either.
University of Cambridge joins the chorus pointing out that ‘Sputnik moment’ is a poor metaphor for the situation, but doesn’t offer anything else of interest.
A fun jailbreak for r1 is to tell it that it is Gemini.
Zeynep Tufekci (she was mostly excellent during Covid, stop it with the crossing of these streams!) offers a piece in NYT about DeepSeek and its implications. Her piece centrally makes many of the mistakes I’ve had to correct over and over, starting with its hysterical headline.
Peter Wildeford goes through the errors, as does Garrison Lovely, and this is NYT so we’re going over them One. More. Time.
This in particular is especially dangerously wrong:
This was not about a private effort by what she writes were ‘out-of-touch leaders’ to ‘kneecap competitors’ in a commercial space. To suggest that implies, several times over, that she simply doesn’t understand the dynamics or stakes here at all.
The idea that ‘America can’t re-establish its dominance over the most advanced A.I.’ is technically true… because America still has that dominance today. It is very, very obvious that the best non-reasoning models are Gemini Flash 2.0 (low cost) and Claude Sonnet 3.5 (high cost), and the best reasoning models are o3-mini and o3 (and the future o3-pro, until then o1-pro), not to mention Deep Research.
She also repeats the false comparison of $6m for v3 versus $100 billion for Stargate, comparing two completely different classes of spending. It’s like comparing how much America spends growing grain to what my family paid last year for bread. And the barriers to entry are rising, not falling, over time. And indeed, not only are the export controls not hopeless, they are the biggest constraint on DeepSeek.
There is also no such thing as ‘Artificial Good-Enough Intelligence.’ That’s like the famous apocryphal quote where Bill Gates supposedly said ‘640k [of memory] ought to be enough for everyone.’ Or the people who think if you’re at grade level and average intelligence, then there’s no point in learning more or being smarter. Your relative position matters, and the threshold for smart enough is going to go up. A lot. Fast.
Of course all three of us agree we should be hardening our cyber and civilian infrastructure, far more than we are doing.
Smooth Operator
It’s not there. Yet.
For direct simple tasks, it once again sounds like Operator is worth using if you already have it because you’re spending the $200/month for o3 and o1-pro access, customized instructions and repeated interactions will improve performance and of course this is the worst the agent will ever be.
Sayash Kapoor also takes Operator for a spin and reaches similar conclusions after trying to get it to do his expense reports and mostly failing.
It’s all so tantalizing. So close. Feels like we’re 1-2 iterations of the base model and RL architecture away from something pretty powerful. For now, it’s a fun toy and way to explore what it can do in the future, and you can effectively set up some task templates for easier tasks like ordering lunch.
Have You Tried Not Building An Agent?
Yeah. We tried. That didn’t work.
For a long time, while some talked about how AI agents don’t work and AIs aren’t agents (and sometimes that thus existential risk from AI is silly and not real), others of us have pointed out that you can turn an AI into an agent, and that the tech for doing this will get steadily better and more autonomous over time as capabilities improve.
It took a while, but now some of the agents are net useful in narrow cases and we’re on the cusp of them being quite good.
And this whole time, we’ve pointed out that the incentives point towards a world of increasingly capable and autonomous AI agents, and this is rather not good for human survival. See this week’s paper on how humanity is likely to be subject to Gradual Disempowerment.
Margaret Mitchell, along with Avijit Ghosh, Alexandra Sasha Luccioni and Giada Pistilli, is the latest to suggest that maybe we should try not building the agents?
Oh no, not the harms in Section 5!
We wouldn’t want lack of reliability, or unsafe data exposure, or ‘manipulation,’ or a decline in task performance, or even systemic biases or environmental trade-offs.
So yes, ‘particularly concerning are the safety risks, which affect human life and impact further values.’ Mitchell is generally in the ‘AI ethics’ camp. So even though the core concepts are all right there, she then has to fall back on all these particular things, rather than notice what the stakes actually are: Existential.
No, you shouldn’t cede all human control.
If you cede all human control to AIs rewriting their own code without limitation, those AIs involved control the future, are optimizing for things that are not best maximized by our survival or values, and we probably all die soon thereafter. And worse, they’ll probably exhibit systemic biases and expose our user data while that happens. Someone has to do something.
Please, Margaret Mitchell. You’re so close. You have almost all of it. Take the last step!
To be fair, either way, the core prescription doesn’t change. Quite understandably, for what are in effect the right reasons, Margaret Mitchell proposes not building fully autonomous (potentially recursively self-improving) AI agents.
How?
The reason everyone is racing to create these fully autonomous AI agents is that they will be highly useful. Those who don’t build and use them are at risk of losing to those who do. Putting humans in the loop slows everything down, and even if they are formally there they quickly risk becoming nominal. And there is not a natural line, or an enforceable line, that we can see, between the level-3 and level-4 agents above.
Already AIs are writing a huge and increasing portion of all code, with many people not even pretending to look at the results before accepting changes. Coding agents are perhaps the central case of early agents. What’s the proposal? And how are you going to get it enacted into law? And if you did, how would you enforce it, including against those wielding open models?
I’d love to hear an answer – a viable, enforceable, meaningful distinction we could build a consensus towards and actually implement. I have no idea what it would be.
Deepfaketown and Botpocalypse Soon
Google offering free beta test where AI will make phone calls on your behalf to navigate phone trees and connect you to a human, or do an ‘availability check’ on a local business for availability and pricing. Careful, Icarus.
These specific use cases seem mostly fine in practice, for now.
The ‘it takes 30 minutes to get to a human’ is necessary friction in the phone tree system, but your willingness to engage with the AI here serves a similar purpose while it’s not too overused and you’re not wasting human time. However, if everyone always used this, then you can no longer use willingness to actually bother calling and waiting to allocate human time and protect it from those who would waste it, and things could get weird or break down fast.
Calling for pricing and availability is something local stores mostly actively want you to do. So they would presumably be fine talking to the AI so you can get that information, if a human will actually see it. But if people start scaling this, the value to the store drops, while each call still costs employee time to answer.
Which is the problem. Google is using an AI to take the time of a human, that is available for free but costs money to provide. In many circumstances, that breaks the system. We are not ready for that conversation. We’re going to have to be.
The obvious solution is to charge money for such calls, but we’re even less ready to have that particular conversation.
With Google making phone calls and OpenAI operating computers, how do you tell the humans from the bots, especially while preserving privacy? Steven Adler took a crack at that months back with personhood credentials that various trusted institutions could issue. On some levels this is a standard cryptography problem. But what do you do when I give my credentials to the OpenAI operator?
Is Meta at it again over at Instagram?
I am not much of an Instagram user. If you click on this ‘AI Studio’ button you get a low-rent Character.ai?
The offerings do not speak well of humanity. Could be worse I guess.
Otherwise I don’t see any characters or offers to chat at all in my feed, such as it is (the only things I follow are local restaurants and I have 0 posts). I scrolled down a bit and it didn’t suggest I chat with AI on the main page.
They Took Our Jobs
Anton Leicht warns about the AI takeoff political economy.
I read that and I think ‘oh Anton, if you’re putting it that way I bet you have no idea,’ especially because there was a preamble about how politics sabotaged nuclear power.
Anton warns that ‘there are no permanent majorities,’ which of course is true under our existing system. But we’re talking about a world that could be transformed quite fast, with smarter than human things showing up potentially before the next Presidential election. I don’t see how the Democrats could force AI regulation down Trump’s throat after midterms even if they wanted to, they’re not going to have that level of a majority.
I don’t see much sign that they want to, either. Not yet. But I do notice that the public really hates AI, and I doubt that’s going to change, while the salience of AI will radically increase over time. It’s hard not to think that in 2028, if the election still happens ‘normally’ in various senses, a party that is anti-AI (probably not in the right ways or for the right reasons, of course) would have a large advantage.
That’s if there isn’t a disaster. The section here is entitled ‘accidents can happen’ and they definitely can but also it might well not be an accident. And Anton radically understates here the strategic nature of AI, a mistake I expect the national security apparatus in all countries to make steadily less over time, a process I am guessing is well underway.
Then we get to the expectation that people will fight back against AI diffusion, They Took Our Jobs and all that. I do expect this, but also I notice it keeps largely not happening? There’s a big cultural defense against AI art, but art has always been a special case. I expected far greater pushback from doctors and lawyers, for example, than we have seen so far.
Yes, as AI comes for more jobs that will get more organized, but I notice that the example of the longshoremen is one of the unions with the most negotiating leverage, that took a stand right before a big presidential election, is unusually protected by various laws, and has already demonstrated world-class ability to seek rent. The incentives of the ports and those doing the negotiating didn’t reflect the economic stakes. The stand worked for now, but by taking that stand they bought themselves a bunch of long term trouble, as a lot of people got radicalized on that issue and various stakeholders are likely preparing for next time.
Look at what is happening in coding, the first major profession to have serious AI diffusion because it is the place AI works best at current capability levels. There is essentially no pushback. AI starts off supporting humans, making them more productive, and how are you going to stop it? Even in the physical world, Waymo has its fights and technical issues, but it’s winning, again things have gone surprisingly smoothly on the political front. We will see pushback, but I mostly don’t see any stopping this train for most cognitive work.
Pretty soon, AIs will do a sufficiently better job that they’ll be used even if the marginal labor savings go to $0. As in, you’d pay the humans to stand around while the AIs do the work, rather than have those humans do the work. Then what?
The next section is on international diffusion. I think that’s the wrong question. If we are in an ‘economic normal’ scenario the inference is for sale, inference chips will exist everywhere, and the open or cheap models are not so far behind in any case. Of course, in a takeoff style scenario with large existential risks, geopolitical conflict is likely, but that seems like a very different set of questions.
The last section is the weirdest, I mean there is definitely ‘no solace from superintelligence’ but the dynamics and risks in that scenario go far beyond the things mentioned here, and ‘distribution channels for AGI benefits could be damaged for years to come’ does not even cross my mind as a thing worth worrying about at that point. We are talking about existential risk, loss of human control (‘gradual’ or otherwise) over the future and the very survival of anything we value, at that point. What the humans think and fear likely isn’t going to matter very much. The avalanche will have already begun, it will be too late for the pebbles to vote, and it’s not clear we even get to count as pebbles.
Noah Carl is more blunt, and opens with “Yes, you’re going to be replaced. So much cope about AI.” Think AI won’t be able to do the cognitive thing you do? Cope. All cope. He offers a roundup of classic warning shots of AI having strong capabilities, along with the now-over-a-year-behind classic chart of AI reaching human performance in various domains.
I am a rather extreme optimist about the impact of ‘mundane AI’ on humans and society. I believe that AI at its current level or somewhat beyond it would make us smarter and richer, would still likely give us mostly full employment, and generally make life pretty awesome. But even that will obviously be bumpy, with large downsides, and anyone who says otherwise is fooling themselves or lying.
Noah gives sobering warnings that even in the relatively good scenarios, the transition period is going to suck for quite a lot of people.
If AI goes further than that, which it almost certainly will, then the variance rapidly gets wider – existential risk comes into play along with loss of human control over the future or any key decisions, as does mass unemployment as the AI takes your current job and also the job that would have replaced it, and the one after that. Even if we ‘solve alignment’ survival won’t be easy, and even with survival there’s still a lot of big problems left before things turn out well for everyone, or for most of us, or in general.
Noah also discusses the threat of loss of meaning. This is going to be a big deal, if people are around to struggle with it – if we have the problem but can’t trust the AI with this question, we’ll all soon be dead anyway. The good news is that we can ask the AI for help with this, although the act of doing that could in some ways make the problem worse. But we’ll be able to be a lot smarter about how we approach the question, should it come to pass.
So what can you do to stay employed, at least for now, with o3 arriving?
Pradyumna Prasad offers advice on that.
The problem with this advice is it requires you to be the best, like no one ever was.
This is like telling students to pursue a career as an NFL quarterback. It is not a general strategy to ‘oh be as good as Jeff Dean or Tyler Cowen.’ Yes, there is (for now!) more slack than that in the system, surviving o3 is doable for a lot of people this way, but how much more, for how long? And then how long will Dean or Cowen last?
I expect time will prove even them, also everyone else, not as illegible as you think.
One can also compare this to the classic joke where two guys are in the woods with a bear, and one puts on his shoes, because he doesn’t have to outrun the bear, he only has to outrun you. The problem is, this bear will still be hungry.
According to Klarna (they ‘help customers defer payment on purchases,’ which in practice means the by-default rather predatory ‘we give you an expensive payment plan and pay the merchant up front’) and its CEO Sebastian Siemiatkowski, AI can already do all of the jobs that we, as humans, do. That seems quite obviously false, but they’re putting it to the test to get as close as they can: they claim to be saving $10 million annually, have stopped hiring and have reduced headcount by 20%.
The New York Times’s Noam Scheiber is suspicious of his motivations, and asks why Klarna is rather brazenly overstating the case. The piece strongly insinuates that this is about union busting, with the CEO equating the situation to Animal Farm after being forced into a collective bargaining agreement, and about looking cool to investors.
I certainly presume the unionizations are related. The more expensive, in various ways not only salaries, that you make it to hire and fire humans, the more eager a company will be to automate everything it can. And as the article says later on, it’s not that Sebastian is wrong about the future, he’s just claiming things are moving faster than they really are.
Especially for someone on the labor beat, Noam Scheiber impressed. Great work.
Noam has a follow-up Twitter thread. Does the capital raised by AI companies imply that either they’re going to lose their money or millions of jobs must be disappearing? That is certainly one way for this to pay for itself. If you sell a bunch of ‘drop-in workers’ and they substitute 1-for-1 for human jobs you can make a lot of money very quickly, even at deep discounts to previous costs.
It is not however the only way. Jevons paradox is very much in play, if your labor is more productive at a task it is not obvious that we will want less of it. Nor does the AI doing previous jobs, up to a very high percentage of existing jobs, imply a net loss of jobs once you take into account the productivity and wealth effects and so on.
Production and ‘doing jobs’ also aren’t the only sector available for tech companies to make profits. There’s big money in entertainment, in education and curiosity, in helping with everyday tasks and more, in ways that don’t have to replace existing jobs.
So while I very much do expect many millions of jobs to be automated over a longer time horizon, I expect the AI companies to get their currently invested money back before this creates a major unemployment problem.
Of course, if they keep adding another zero to the budget and aren’t trying to get their money back, then that’s a very different scenario. Whether or not they will have the option to do it, I don’t expect OpenAI to want to try and turn a profit for a long time.
An extensive discussion of preparing for advanced AI that takes a middle path, where we still have ‘economic normal’ worlds but with realistic levels of productivity improvements. Nothing should be surprising here.
If the world were just and this was real, this user would be able to sue their university. What is real for sure is the first line, they haven’t cancelled the translation degrees.
The Art of the Jailbreak
‘Think less’ is a jailbreak tactic for reasoning models discovered as part of an OpenAI paper. The paper’s main finding is that the more the model thinks, the more robust it is to jailbreaks, approaching full robustness as inference compute goes to infinity. So make it stop thinking. The attack is partially effective. Also a very effective tactic against some humans.
Get Involved
Anthropic challenges you with Constitutional Classifiers, to see if you can find universal jailbreaks to get around their new defenses. Prize is only bragging rights, I would have included cash, but those bragging rights can be remarkably valuable. It seems this held up for thousands of hours of red teaming. This blog post explains (full paper here) that the Classifiers are trained on synthetic data to filter the overwhelming majority of jailbreaks with minimal over-refusals and minimal necessary overhead costs.
Note that they say ‘no universal jailbreak’ was found so far, meaning that no single jailbreak covers all 10 cases, rather than that there was a case that wasn’t individually jailbroken. This is an explicit thesis; Jan Leike explains that the theory is that having to jailbreak each individual query is sufficiently annoying that most people will give up.
I agree that the more you have to do individual work for each query the less people will do it, and some use cases fall away quickly if the solution isn’t universal.
I very much agree with Janus that this looks suspiciously like:
The obvious danger in alignment work is looking for keys under the streetlamp. But it’s not a stupid threat model. This is a thing worth preventing, as long as we don’t fool ourselves into thinking this means our defenses will hold.
Also what if an AI can do the job of generating the individual jailbreaks?
Thus the success rate didn’t go all the way to zero, this is not full success, but it still looks solid on the margin:
That’s an additional 0.38% false refusal rate and about 24% additional compute cost. Very real downsides, but affordable, and that takes jailbreak success from 86% to 4.4%.
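Spelled out, the arithmetic on those quoted numbers (a quick sketch, using nothing beyond the figures above):

```python
# Constitutional Classifiers tradeoff, using the figures quoted above.
baseline_jailbreak_success = 0.86   # without the classifiers
defended_jailbreak_success = 0.044  # with the classifiers

relative_reduction = 1 - defended_jailbreak_success / baseline_jailbreak_success
print(f"{relative_reduction:.1%} of previously successful jailbreaks blocked")  # ~94.9%

# The price paid for that, per the post:
extra_false_refusals = 0.0038  # +0.38 percentage points of over-refusal
extra_compute_cost = 0.24      # roughly 24% additional inference compute
```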
It sounds like this is essentially them playing highly efficient whack-a-mole? As in, we take the known jailbreaks and things we don’t want to see in the outputs, and defend against them. You can find a new one, but that’s hard and getting harder as they incorporate more of them into the training set.
And of course they are hiring for these subjects, which is one way to use those bragging rights. Pliny beat a few questions very quickly, which is only surprising because I didn’t think he’d take the bait. A UI bug let him get through all the questions, which I think in many ways also counts, but isn’t testing the thing we were setting out to test.
He understandably then did not feel motivated to restart the test, given they weren’t actually offering anything. When 48 hours went by, Anthropic offered a prize of $10k, or $20k for a true ‘universal’ jailbreak. Pliny is offering to do the breaks on a stream, if Anthropic will open source everything, but I can’t see Anthropic going for that.
Introducing
DeepwriterAI, an experimental agentic creative writing collaborator, which also claims to do academic papers; its creator proposed using it as a Deep Research alternative. Their basic plan starts at $30/month. No idea how good it is. Yes, you can get listed here by getting into my notifications, if your product looks interesting.
OpenAI brings ChatGPT to the California State University System and its 500k students and faculty. It is not obvious from the announcement what level of access or exactly which services will be involved.
In Other AI News
OpenAI signs agreement with US National Laboratories.
Google drops its pledge not to use AI for weapons or surveillance. It’s safe to say that, if this wasn’t already true, now we definitely should not take any future ‘we will not do [X] with AI’ statements from Google seriously or literally.
Playing in the background here: US Military prohibited from using DeepSeek. I would certainly hope so, at least for any Chinese hosting of it. I see no reason the military couldn’t spin up its own copy if it wanted to do that.
The actual article is that Vance will make his first international trip as VP to attend the global AI summit in Paris.
Google’s President of Global Affairs Kent Walker publishes ‘AI and the Future of National Security’ calling for ‘private sector leadership in AI chips and infrastructure’ in the form of government support (I see what you did there), public sector leadership in technology procurement and development (procurement reform sounds good, call Musk?), and heightened public-private collaboration on cyber defense (yes please).
France joins the ‘has an AI safety institute’ list, entering the network together with Australia, Canada, the EU, Japan, Kenya, South Korea, Singapore, the UK and the USA. China when? We can’t be shutting them out of things like this.
Is AI already conscious? What would cause it to be or not be conscious? Geoffrey Hinton and Yoshua Bengio debate this, and Bengio asks whether the question is relevant.
I think Robin is very clearly wrong here. Perhaps we will not get more relevant data, but we will absolutely get more relevant intelligence to apply to the problem. If AI capabilities improve, we will be much better equipped to figure out the answers, whether they are some form of moral realism, or a way to do intuition pumping on what we happen to care about, or anything else.
Lina Khan continued her Obvious Nonsense tour with an op-ed saying American tech companies are in trouble due to insufficient competition, so if we want to ‘beat China’ we should… break up Google, Apple and Meta. Mind blown. That’s right, it’s hard to get funding for new competition in this space, and AI is dominated by classic big tech companies like OpenAI and Anthropic.
Paper argues that all languages share key underlying structures and this is why LLMs trained on English text transfer so well to other languages.
Theory of the Firm
Dwarkesh Patel speculates on what a fully automated firm full of human-level AI workers would look like. He points out that even if we presume AI stays at this roughly human level – it can do what humans do but not what humans fundamentally can’t do, a status it is unlikely to remain at for long – everyone is sleeping on the implications for collective intelligence and productivity.
This style of scenario likely does not last long, because firms like this are capable of quickly reaching artificial superintelligence (ASI) and then the components are far beyond human and also capable of designing far better mechanisms, and our takeover issues are that much harder then.
This is a thought experiment that says, even if we do keep ‘economic normal’ and all we can do is plug AIs into existing employee-shaped holes in various ways, what happens? And the answer is, oh quite a lot, actually.
Tyler Cowen linked to this post, finding it interesting throughout. What’s our new RGDP growth estimate, I wonder?
Quiet Speculations
OpenAI does a demo for politicians of stuff coming out in Q1, which presumably started with o3-mini and went from there.
Did Sam Altman lie to Donald Trump about Stargate? Tolga Bilge has two distinct lies in mind here. I don’t think either requires any lies to Trump?
Claims about humanoid robots, from someone working on building humanoid robots. The claim is early adopter product-market fit for domestic help robots by 2030, with 5-15 additional years for diffusion, because there are no hard problems, only hard work, lots of smart people are on the problem now, and this is standard hardware iteration cycles. I find it amusing his answer didn’t include reference to general advances in AI. If we don’t have big advances in AI in general I would expect this timeline to be absurdly optimistic. But if all such work is sped up a lot by AIs, as I would expect, then it doesn’t sound so unreasonable.
Sully predicts that in 1-2 years SoTA models won’t be available via the API because the app layer has the value so why make the competition for yourself? I predict this is wrong if the concern is focus on revenue from the app layer. You can always charge accordingly, and is your competition going to be holding back?
However I do find the models being unavailable highly plausible, because ‘why make the competition for yourself’ has another meaning. Within a year or two, one of the most important things the SoTA models will be used for is AI R&D and creating the next generation of models. It seems highly reasonable, if you are at or near the frontier, not to want to help out your rivals there.
Joe Weisenthal writes In Defense of the AI Cynics, in the sense that we have amazing models and not much is yet changing.
The Quest for Sane Regulations
Remember that bill introduced last week by Senator Hawley? Yeah, it’s a doozy. As noted earlier, it would ban not only exporting but also importing AI from China, which makes no sense, plausibly making downloading r1 punishable by 20 years in prison. Exporting something similar would warrant the same. There are no FLOP, capability or cost thresholds of any kind. None.
So yes, after so much crying of wolf about how various proposals would ‘ban open source,’ we have one that very straightforwardly, actually would do that, and it also imposes similar bans (with less draconian penalties) on transfers of research.
In case it needs to be said out loud, I am very much not in favor of this. If China wants to let us download its models, great, queue up those downloads. Restrictions with no capability thresholds, effectively banning all research and all models, are straight up looney tunes territory as well. This is not a bill, hopefully, that anyone seriously considers enacting into law.
By failing to pass a well-crafted, thoughtful bill like SB 1047 when we had the chance and while the debate could be reasonable, we left a vacuum. Now that the jingoists are on the case after a crisis of sorts, we are looking at things that most everyone from the SB 1047 debate, on all sides, can agree would be far worse.
Don’t say I didn’t warn you.
(Also I find myself musing about the claim that one can ban open source, in the same way one muses about attempts to ban crypto, a key purported advantage of the tech is that you can’t actually ban it, no?)
Hawley also joined with Warren (now there’s a pair!) to urge toughening of export controls on AI chips.
Here’s something that I definitely worry about too:
Some amount of this argument is valid. Quite obviously if I release GPT-(N) and then you release GPT-(N-1) with the same protocols, you are not making things worse in any way. We do indeed care, on the margin, about the margin. And while releasing [X] is not the safest way to prove [X] is safe, it does provide strong evidence on whether or not [X] is safe, with the caveat that [X] might be dangerous later but not yet in ways that are hard to undo later when things change.
But it’s very easy for Acme to point to BigCo and then BigCo points to Acme and then everyone keeps shipping saying none of it is their responsibility. Or, as we’ve also seen, Acme says yes this is riskier than BigCo’s current offerings, but BigCo is going to ship soon.
My preference is thus that you should be able to point to offerings that are strictly riskier than yours, or at least not that far from strictly, to say ‘no marginal risk.’ But you mostly shouldn’t be able to point to offerings that are similar, unless you are claiming that both models don’t pose unacceptable risks and this is evidence of that – you mostly shouldn’t be able to say ‘but he’s doing it too’ unless he’s clearly doing it importantly worse.
First, my response to Andriy (who went viral for this, sigh) is what the hell do you expect and what do you suggest as the alternative? I’m not judging whether your prompts did or didn’t violate the use policy, since you didn’t share them. It certainly looks like a false positive but I don’t know.
But suppose for whatever reason Anthropic did notice you likely violating the policies. Then what? It should just let you violate those policies indefinitely? It should only refuse individual queries with no memory of what came before? Essentially any website or service will restrict or ban you for sufficient repeated violations.
Or, alternatively, they could design a system that never has ‘enhanced’ filters applied to anyone for any reason. But if they do that, they either have to (A) ban the people where they would otherwise do this or (B) raise the filter threshold for everyone to compensate. Both alternatives seem worse?
We know from a previous story about OpenAI that you can essentially have ChatGPT function as your highly sexual boyfriend, have red-color alarms go off all the time, and they won’t ever do anything about it. But that seems like a simple non-interest in enforcing their policies? Seems odd to demand that.
As for Dean’s claim, we shall see. It seems to contradict this other Dean Ball claim this week? Where Dean Ball goes into Deep Research mode and concludes that the new algorithmic discrimination laws are technically redundant with existing law, so while they add annoying extra paperwork that’s all they actually do. And Dean confirms that the existing such laws are indeed already causing trouble.
I get that it can always get worse, but this feels like it’s having it both ways, and you have to pick at most one or the other. Also, frankly, I have no idea how such a filter would even work. What would a filter to avoid discrimination even look like? That isn’t something you can do at the filter level.
He also said this about the OPM memo referring to a ‘Manhattan Project’:
I notice I am confused by this claim. I do not see how DOGE projects like ‘shut down USAID entirely, plausibly including killing PEPFAR and 20 million AIDS patients’ reflect a mission of ‘get the government ready for AGI’ unless the plan is ‘get used to things going horribly wrong’?
Either way, here we go with the whole Manhattan Project thing. Palantir was up big.
The agency here makes sense, and yes ‘industry-led’ seems reasonable as long as you keep an eye on the whole thing. But I’d like to propose that you can’t do ‘a series of’ Manhattan Projects. What is this, Brooklyn? Also the whole point of a Manhattan Project is that you don’t tell everyone about it.
The ‘unnecessary and burdensome’ regulations on AI at the Federal level will presumably be about things like permitting. So I suppose that’s fine.
As for the military using all the AI, I mean, you perhaps wanted it to be one way.
It was always going to be the other way.
That doesn’t bother me. This is not an important increase in the level of existential risk, we don’t lose because the system is hooked up to the nukes. This isn’t Terminator.
I’d still prefer we didn’t hook it up to the nukes, though?
The Week in Audio
Rob Wiblin offers his picks for best episodes of the new podcast AI Summer from Dean Ball and Timothy Lee: Lennert Heim, Ajeya Cotra and Samuel Hammond.
Dario Amodei on AI Competition on ChinaTalk. I haven’t had the chance to listen yet, but definitely will be doing so this week.
One fun note is that DeepSeek was the worst-performing model Anthropic has ever tested when it comes to blocking the generation of dangerous information. One might say the alignment and safety plans are very intentionally ‘lol we’re DeepSeek.’
Of course the first response is ‘this sounds like an advertisement,’ and the rest are variations on the theme of ‘oh yes we love that this model has absolutely no safety mitigations, who are you to try and apply any safeguards or mitigations to AI you motherf***ing asshole cartoon villain.’ The bros on Twitter, they be loud.
Lex Fridman spent five hours talking AI and other things with Dylan Patel of SemiAnalysis. This is probably worthwhile for me and at least some of you, but man that’s a lot of hours.
Rhetorical Innovation
Andrew Critch tries to steelman the leaders of the top AI labs and their rhetoric, and push back against the call to universally condemn them simply because they are working on things that are probably going to get us all killed in the name of getting to do it first.
This is straightforwardly the ‘someone in the future might do the terrible thing so I need to do it responsibly first’ dynamic that gave us DeepMind, then OpenAI, then Anthropic, then xAI, to cover the four examples above. They can’t all be defensible decisions.
This is where we part ways. I think that’s bullshit. Yes, they signed the CAIS statement, but they’ve spent the 18 months since essentially walking it back. Dario Amodei and Sam Altman write full-on jingoistic editorials calling for national races, coming very close to calling for an all-out government-funded race for decisive strategic advantage via recursive self-improvement of AGI.
Do I think that it is automatic that they are bad people for leading AI labs at all, an argument he criticizes later in-thread? No, depending on how they choose to lead those labs, but look at their track records at this point, including on rhetoric. They are driving us as fast as they can towards AGI and then ASI, the thing that will get us all killed (with, Andrew himself thinks, >80% probability!), while at least three of them (maybe not Demis) are waving jingoistic flags.
I’m sorry, but no. You don’t get a pass on that. It’s not impossible to earn one, but ‘not outright lying too much about that many things’ is not remotely good enough. OpenAI has shown us what it is, time and again. Anthropic and Dario claim to be the safe ones, and in relative terms they seem to be, but their rhetorical pivots don’t line up. Elon is at best deeply confused on all this and awash in other fights where he’s not, shall we say, being maximally truthful. Google’s been quiet, I guess, and has in many ways outperformed my expectations, but it also hasn’t shown me it has a plan, and mostly hasn’t built any kind of culture of safety or done much to solve the problems.
I do agree with Critch’s conclusion that constantly attacking all the labs purely for existing at all is not a wise strategic move. And of course, I will always do my best only to support arguments that are true. But wow does it not look good and wow are these people not helping matters.
Aligning a Smarter Than Human Intelligence is Difficult
Public service announcement, for those who don’t know.
I found this via someone saying this was bad news, but it isn’t, because it isn’t news. We already know this, it’s that people refuse to face the obvious.
Whatever the validator validates, the proposer proposes.
Your validator or evaluator must either be fully correct, or it has to be vastly smarter than the proposer.
Validation is not, in general, easier than generation. But even if it were, validation that defends against a search for exploitation is much harder than generation that uses RL to seek the way to exploit it. If the validation or evaluation is subjective, watch out – not that it can’t be done, but presume that you’ll need to spend the vast majority of the compute on the evaluator.
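To make that dynamic concrete, here is a toy sketch of my own (purely illustrative, not from any paper discussed here): a proposer optimized against a cheap, imperfect validator converges on exploiting the validator’s blind spot rather than doing the real task. Every function and number in it is made up for illustration.

```python
# Toy illustration: whatever the validator validates, the proposer proposes.
# The 'validator' rewards confident-sounding keywords; the 'proposer' just
# greedily searches for whatever maximizes that proxy reward.

def true_quality(answer: str) -> float:
    """Ground truth we actually care about: substantive answers score higher."""
    return min(len(answer.split()), 20) / 20


def imperfect_validator(answer: str) -> float:
    """Cheap proxy evaluator with an exploitable blind spot."""
    score = 0.5 * true_quality(answer)
    if "definitely" in answer or "guaranteed" in answer:
        score += 0.6  # the hole the proposer will find
    return score


CANDIDATES = [
    "a careful, detailed answer that actually addresses the question in depth with evidence",
    "definitely guaranteed",
]

# Stand-in for RL against the proxy: pick whatever the validator scores highest.
best = max(CANDIDATES, key=imperfect_validator)
print("proposer picks:", best)
print("proxy score:", round(imperfect_validator(best), 2),
      "true quality:", round(true_quality(best), 2))
# The keyword-stuffed answer wins on the proxy (0.65) while being
# nearly worthless on the real objective (0.1).
```

The point of the toy: a real RL loop is a vastly more powerful search than `max` over two strings, which is exactly why the evaluator has to be either fully correct or much smarter than the thing searching against it.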
There is a new paper discussing potential problems with certain detection methodologies surrounding sparse autoencoders, but also there was some overreaction to the result, and Leo Gao pushes back here.
Reminder from METR that AI models can be dangerous before public deployment. The model could be stolen and misused, internal use could cause problems, or the AI could pursue unintended goals without waiting for release. This only starts to matter as capabilities get pretty far, but o3 marks us plausibly entering into this zone.
The Alignment Faking Analysis Continues
Ryan Greenblatt offers a follow-up paper on alignment faking.
There’s multiple mentions of fiddliness in all this, including that complexity of prompts can mess with the behavior, and it’s clear that Sonnet is not behaving all that coherently strategically yet.
My guess is that the drops in alignment faking given the ability to object, to the extent they’re not simply ‘oh I lost my chain of thought,’ are primarily about what being given that ability says to the AI about the implied scenario, and the vibes that it gives off. An Anthropic that is explicitly asking for objections is signaling some combination of several things.
It also could be a disruption of the chain of thought. So it’s tricky. I hope they continue this work, and will join the chorus that if you make a real-world offer to the AI, then you should follow through on it, so kudos for doing that here.
Masayoshi Son Follows Own Advice
The man who advises us that the crazy man wins is proud to lead by example.
The clip here is full of crazy, or at least a complete failure to understand what AI is and is not, what AI can and can’t do, and how to think about the future.
First, ‘AI will start to understand our emotion, and then have emotion itself, and it’s a good thing to protect human.’
He then says ‘If their source of energy was protein then it’s dangerous. Their source of energy is not protein, so they don’t have to eat us. There’s no reason for them to have reward by eating us.’
Person who sees The Matrix and thinks, well, it’s a good thing humans aren’t efficient power plants, there’s no way the AIs will turn on us now.
Person who says ‘oh don’t worry Genghis Khan is no threat, Mongols aren’t cannibals. I am sure they will try to maximize our happiness instead.’
I think he’s being serious. Sam Altman looks like he had to work very hard to not burst out laughing. I tried less hard, and did not succeed.
‘They will learn by themselves having a human’s happiness is a better thing for them… and they will understand human happiness and try to make humans happy’
Wait, what? Why? How? No, seriously. Better than literally eating human bodies because they contain protein? But they don’t eat protein, therefore they’ll value human happiness, but if they did eat protein then they would eat us?
I mean, yes, it’s possible that AIs will try to make humans happy. It’s even possible that they will do this in a robust philosophically sound way that all of us would endorse under long reflection, and that actually results in a world with value, and that this all has a ‘happy’ ending. That will happen if and only if we do what it takes to make that happen.
Surely you don’t think ‘it doesn’t eat protein’ is the reason?
Don’t make me tap the sign.
Repeat after me: The AI does not hate you. The AI does not love you. But you are made of atoms which it can use for something else.
You are also using energy and various other things to retain your particular configuration of atoms, and the AI can make use of that as well.
The most obvious particular other thing it can use them for:
Or alternatively, doesn’t matter, more is still better, whether you’re dealing with one AI or competition among many. Our inputs are imperfect substitutes, that is enough, even if there were no other considerations.
Son is always full of great stuff, like saying ‘models are increasing in IQ by a standard deviation each year as the cost also falls by a factor of 10,’ with the goose chasing him to ask from what distribution.
People Are Worried About AI Killing Everyone
The Pope again, saying about existential risk from AI ‘this danger demands serious attention.’ Lots of good stuff here. I’ve been informed there are a lot of actual Catholics in Washington that need to be convinced about existential risk, so in addition to tactical suggestions around things like the Tower of Babel, I propose quoting the actual certified Pope.
The Patriarch of the Russian Orthodox Church?
I mean that’s not fair, people say sane things all the time, but on this in particular I agree that I did not see it coming.
You Are Not Ready
Molly Hickman presents the ‘AGI readiness index’ on a scale from -100 to +100, assembled from various questions on Metaculus, averaging various predictions about what would happen if AGI arrived in 2030. Most top AI labs predict it will be quicker.
I’d say the index also isn’t ready, as by press time it had come back up to -28. It should not be bouncing around like that. Very clearly the situation is ‘not good,’ but obviously don’t take the number too seriously, at least until things stabilize.
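For concreteness, here is a minimal sketch of how an index like this could be assembled; the actual Metaculus questions, weights, and aggregation method are not specified here, so everything below (the question names, the probabilities, the weights, the 200*p - 100 scaling) is an assumption purely for illustration.

```python
# Hypothetical sketch of an AGI-readiness-style index on a -100 to +100 scale.
# All questions, forecasts, and weights below are made up for illustration.
forecasts = {
    # question: (forecast probability of the 'ready' outcome, weight)
    "governments have workable AGI policy in place by 2030": (0.15, 1.0),
    "labs publish credible safety cases before deployment": (0.25, 1.0),
    "major AI-driven catastrophe avoided through 2030": (0.60, 2.0),
}


def readiness_index(items: dict[str, tuple[float, float]]) -> float:
    """Map each probability p in [0, 1] to [-100, +100] via 200*p - 100,
    then take the weighted average across questions."""
    total_weight = sum(w for _, w in items.values())
    return sum((200 * p - 100) * w for p, w in items.values()) / total_weight


print(round(readiness_index(forecasts), 1))  # -20.0 with these made-up numbers
```

Nothing about a construction like this makes the index less volatile, of course; if the underlying forecasts swing, the average swings with them, which is part of why a bouncing number is hard to take too seriously.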
Joshua Clymer lists the currently available plans for how to proceed with AI, then rhetorically asks ‘why another plan?’ when the answer is that all the existing plans he lists first are obvious dumpster fires in different ways, not only the one that he summarizes as expecting a dumpster fire. If you try to walk A Narrow Path or Pause AI, well, how do you intend to make that happen, and if you can’t then what? And of course the ‘build AI faster plan’ is on its own planning to fail and also planning to die.
So in this context, suppose you are not the government, but a ‘responsible AI developer’ called Anthropic, er, Magma, and you’re perfectly aligned and only want what’s best for humanity. What do you do?

His strategic analysis is that for now, before the critical period, Magma’s focus should be:
Essentially we’re folding on the idea of not using AI to do our alignment homework, we don’t have that kind of time. We need to be preparing to do exactly that, and also warning others. And because we’re the good guys, we have to keep pace while doing it.
‘As well as we could have’ or ‘as capable as human developers’ are red herrings. It doesn’t matter how well you ‘would have’ finished it on your own. Reality does not grade on a curve and your rivals are breathing down your neck. Most or all current alignment plans are woefully inadequate.
Don’t ask if something is better than current human standards, unless you have an argument why that’s where the line is between victory and defeat. Ask the functional question – can this AI make your plan work? Can path #1 work? If not, well, time to try #2 or #3, or find a #4, I suppose.
I think this disagreement is rather a big deal. There’s quite a lot of ‘the best we can do’ or ‘what would seem like a responsible thing to do that wasn’t blameworthy’ thinking that doesn’t ask what would actually work. I’d be more inclined to think about Kokotajlo’s attractor states – are the alignment-relevant attributes strengthening themselves and strengthening the ability to strengthen themselves over time? Is the system virtuous in the way that successfully seeks greater virtue? Or is the system trying to preserve what it already has and maintain it under the stress of increasing capabilities and avoid things getting worse, or detect and stop ‘when things go wrong’?
Section three deals with goals, various things to prioritize along the path, then some heuristics are offered. Again, it all doesn’t seem like it is properly backward chaining from an actual route to victory?
I do appreciate the strategic discussions. If getting to certain thresholds greatly increases the effectiveness of spending resources, then you need to reach them as soon as possible, except insofar as you needed to accomplish certain other things first, or there are lag times in efforts spent. Of course, that depends on your ability to reliably actually pivot your allocations down the line, which historically doesn’t go well, and also the need to impact the trajectory of others.
I strongly agree that Magma shouldn’t work to mitigate ‘present risks’ other than to the extent this is otherwise good for business or helps build and spread the culture of safety, or otherwise actually advance the endgame. The exception is the big ‘present risk’ of the systems you’re relying on now not being in a good baseline state to help you start the virtuous cycles you will need. You do need the alignment relevant to that, that’s part of getting ready to ‘elicit safety work.’
Then the later section talks about things most currently deserving of more action, starting with efforts at nonproliferation and security. That definitely requires more and more urgent attention.
You know you’re in trouble when this is what the worried people are hoping for (Davidad is responding to Clymer):
That’s quite the slim hope, if all you get is thinking you’re in the loop for safety and performance specifications, of things that are smarter than you that you can’t understand. Is it better than nothing? I mean, I suppose it is a little better, if you were going to go full speed ahead anyway. But it’s a hell of a best case scenario.
Other People Are Not As Worried About AI Killing Everyone
Teortaxes would like a word with those people, as he is not one of them.
Bingo. Well said.
From my perspective, Teortaxes is what we call a Worthy Opponent. I disagree with Teortaxes in that I think that speedy development of AGI by default increases doom, and in particular that speedy development of AGI in the ways Teortaxes cheers along increases doom.
To the extent that Teortaxes has sufficiently good and strong reasons to think his approach is lower risk, I am mostly failing to understand those reasons. I think some of his reasons have merit but are insufficient, and others I strongly disagree with. I am unsure if I have understood all his important reasons, or if there are others as well.
I think he understands some but far from all of the reasons I believe the opposite paths are the most likely to succeed, in several ways.
I can totally imagine one of us convincing the other, at some point in the future.
But yeah, realizing AGI is going to be a thing, and then not seeing that doom is on the table and mitigating that risk matters a lot, is rather poor thinking.
And yes, denying the possibility of risks at all is exactly what he calls it. Pathetic.
The Lighter Side
It’s a problem, even if the issues involved would have been solvable in theory.
Oh, so now my best friend suddenly understands catastrophic risk.
To be fair, at worst this would only be a regional disaster, and even if it did hit it probably wouldn’t strike a major populated area. Don’t look up.
And then there are those that understand… less well.
No, no, stop, the perfect Tweet doesn’t exist…
At long last, we are non-metaphorically implementing the capture and mine the asteroid headed for Earth plan from the… oh, never mind.
Oh look, it’s the alignment plan.
Grant us all the wisdom to know the difference (between when this post-it is wise, and when it is foolish.)
You have to start somewhere.
Never change, Daily Star.
I love the ‘end of cash’ article being there too as a little easter egg bonus.