That’s not actually true. But it is true to a rather frustrating degree, for those of us who need to be in the thick of it all the time. It is especially true if someone says the word ‘pause,’ whether or not they would actually support one. Play it again, Sam.
Meanwhile, you know what changes a lot? Actual AI capabilities. Also, war.
In any case, here’s the policy, discourse and alignment side of the week that was.
After OpenAI successfully stole most of the company from its nonprofit, OpenAI’s nonprofit is still left with quite a lot of money. What are they doing with it?
OpenAI Foundation: This work is just beginning. Over the next year, as we quickly ramp up, the Foundation expects to invest at least $1 billion across life sciences and curing diseases, jobs and economic impact, AI resilience, and community programs. This includes early investments toward our previously announced $25 billion commitment to curing diseases and AI resilience.
David Krueger: Wojciech is one of the various people who called me crazy for being worried about x-risk back “before it was cool”; ICML 2016, New York.
The people left at OpenAI are not the ones who are worried about safety.
Wojciech Zaremba: AI is clearly becoming very advanced and x-risks have to be taken seriously. Tons of important work to be done!
Health is a good cause, and I have hope that within the cause area they will execute well, but it is not the cause for which this foundation exists. You don’t ensure humanity survives AGI, or benefits from AGI, by trying to cure Alzheimer’s.
I’m not saying don’t cure Alzheimer’s, but I am saying You Had One Job, it was rather an important job, and there are a lot of foundations that can work on curing diseases, on public data for health or on accelerating progress on high-mortality and high-burden diseases.
If the OpenAI foundation was spending a lot on AI Safety, and was then also spending this billion on health, I would think that was great. But that’s not how this shapes up.
They then mention jobs and economic impact, and ‘AI resilience,’ which they divide into AI impact on children & youth, biosecurity, and finally AI model safety.
Dean W. Ball: Great news! If this foundation does even a B/B+ job and can avoid being co-opted by the various governments across America and the world that by default will try to plunder its wealth, it will probably do more good than all government action on AI put together.
Well, it’s already had most of its wealth plundered by OpenAI, and it’s already redirecting its remaining wealth mostly to sounds-good generic causes with ‘AI’ stapled onto them. So the track record does not look great so far.
If you read Sam Altman’s announcement post, you see that same old generic ‘AI will bring challenges and complex emergent effects’ language, whereas no, what AI will bring is the risk that it will kill everyone via various paths but you cannot say that out loud even in the context of the foundation’s work.
Sam Altman (CEO OpenAI): AI will help discover new science, such as cures for diseases, which is perhaps the most important way to increase quality of life long-term.
AI will also present new threats to society that we have to address. No company can sufficiently mitigate these on their own; we will need a society-wide response to things like novel bio threats, a massive and fast change to the economy, extremely capable models causing complex emergent effects across society, and more.
These are the areas the OpenAI Foundation will initially focus on, and in my opinion are some of the most important ones for us to get right. The Foundation will spend at least $1 billion over the next year.
That’s a lot of words to say essentially nothing.
It’s also not spending that much money relative to its total assets. Yes, we all know that a billion dollars, unlike a million dollars, is cool. But the foundation, even after it was robbed of the majority of its value, is still worth over $100 billion, so this is promising to spend less than 1% of its value in 2026, mostly on causes unrelated to an impending singularity or the associated risks, whereas OpenAI is aiming to automate AI R&D, and kick off a singularity, in 2028.
Their offer is, once again, mostly nothing.
Kelsey Piper is saying this is good, actually, and the explanation is that she expected an offer of actual nothing, so mostly nothing is an improvement.
Kelsey Piper: I guess I never expected them to fund a penny of safety research but this feels too meta or something to me. I dislike the idea that people doing things that are good is *worse* than if they were doing useless things because it ‘makes it harder to call them out’
Curing diseases is really good! the AI resilience stuff may or may not be good, and can be criticized inasmuch as it is bad. (registering a prediction that it will be not only useless but actively counterproductive/harmful is perfectly valid, though).
Oliver Habryka: I mean, what did you think that opposition-capture would look like? Did you expect the labs would not offer people genuinely good things and do genuinely good things in exchange for immunity from critics?
Like, this is just obviously the level Sam Altman plays at, and it has worked quite well so far. I don’t think I am being too paranoid in thinking that this is a non-trivial fraction of what’s going on.
I agree it’s important to recognize good things as good, and I am not downplaying that here! But in this case the downside of this seeming to me like effective opposition-capture seems just much bigger.
If Sam Altman announced that OpenAI was funding these operations out of its own pocket, in addition to all other efforts, or some other billionaire announced this as a new project, then I would say yes, this is a great thing. Good things are good.
But that’s not what the OpenAI nonprofit was for, nor does the size here reflect the assets and capabilities of the nonprofit. Standard procedure is a legal minimum of spending 5% of assets per year, and if you think superintelligence is a few years away you’ll want to be picking up that pace quite a lot.
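The gap here is easy to quantify. A back-of-envelope sketch using the round numbers from this post (over $100 billion in assets, at least $1 billion pledged, the standard 5% private-foundation minimum payout), purely illustrative:

```python
# Back-of-envelope math, using round numbers from the post (illustrative, not exact figures):
foundation_value = 100e9      # foundation reportedly still worth over $100 billion
pledged_spend = 1e9           # "at least $1 billion" pledged over the next year
legal_minimum_rate = 0.05     # standard private-foundation minimum annual payout

minimum_spend = foundation_value * legal_minimum_rate
print(f"5% minimum payout: ${minimum_spend / 1e9:.0f}B per year")
print(f"Pledged spend as share of assets: {pledged_spend / foundation_value:.1%}")
```

On these numbers the legal minimum alone would be roughly $5 billion per year, five times the pledge, before even asking whether short AGI timelines call for spending far faster than the minimum.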
David Manheim (Home): The @OpenAI Foundation promised State AGs and the public that in exchange for becoming a for-profit, they would spend 27% of the company’s value charitably.
Most of the foundation’s assets, it seems, have been co-opted and effectively stolen a second time, to be used on things that look good and indeed are good, but that mostly don’t help humanity transition to AGI without us all ceasing to exist, and mostly the assets are not being spent until it is too late to matter. For shame.
Congress Exists
Senator Elissa Slotkin introduces the AI Guardrails Act, to address exactly the issues raised in the clash between Anthropic and DoW, in exactly the correct way to address it, which is that you pass a law setting down rules, in particular to mandate that:
There must always be a human in the kill chain of nuclear weapons.
It codifies DoD Directive 3000.09, and bars DoD from using autonomous weapon systems to employ lethal force without “appropriate levels of human judgment and supervision.”
But this can be waived for up to a year at a time by the Secretary of War if ‘extraordinary national security circumstances’ require it. At this point, we should assume that such emergency powers will be used in the regular course of government, with or without a plausible emergency.
DoW is barred from using AI for domestic monitoring, tracking, profiling, or targeting of people or groups in the United States without an “individualized, articulable legal basis,” regardless of where the data came from.
The ban on nuclear weapon launches looks ironclad, and hopefully that’s one place that it is never relevant since you wouldn’t want to do that anyway. There’s still nothing stopping the humans from essentially listening to AI advice, and that is a highly plausible future scenario.
The other rules look a lot easier to get around.
The core prohibition on surveillance is only on monitoring, tracking, profiling, or targeting people in the United States “without an individualized, articulable legal basis.” One should assume that courts and lawyers will interpret such provisions perversely to favor permissiveness. Similarly, the first amendment protections include ‘solely for the purpose of’ which is the kind of thing very easy to drive a truck through.
For autonomous weapons, DoD would simply declare their levels of judgment ‘appropriate,’ including saying none is an appropriate level, or they can use the waiver. To justify use, the error rate for AI merely has to be as low as the error rate for humans, which is not that high a bar.
I am not a lawyer let alone a national security lawyer, so I am not the right person to suggest alternative legal language. I am confident that, while helpful on the margin, this is not it, which is fine for a mostly symbolic first attempt but not for a real law.
The problem is that you can’t have it both ways, in that you have to pick one:
EITHER the law actually binds the government for real even when expensive…
OR it doesn’t, and the government can do what it wants.
The Constitution, and our entire system of government, is based on door number one. The government often runs up against the First, Second, Fourth and Fifth Amendments, among many other constraints, and finds this super annoying. Good.
China Self-Owns
Meta tried to pay a large amount of money for Manus. China said not so fast.
Cate Cadell and Nitasha Tiku (WaPo): Authorities in Beijing have barred two executives from a Singapore-based AI firm from leaving China amid a review of the company’s $2 billion acquisition by U.S. social media giant Meta, according to a report by the Financial Times on Wednesday.
Xiao Hong and Ji Yichao — the CEO and chief scientist, respectively, of Manus — were summoned to Beijing this month and questioned over a possible violation of foreign direct investment reporting rules related to the acquisition before being told they could not leave the country, the report said.
Dean W. Ball: If we were smart we’d see this as a major self own by China, as natsec-brained public policy so often is. The message the government is sending is: if you ever want to found a company, especially one that makes money on software, move to Singapore first (easier to get GPUs too!).
If you were a bright young Chinese citizen looking to found a technology company, would you want to do it in a place that, were you to succeed, would not let you leave? Would not let you do a highly profitable exit?
It would be one thing if this were a frontier lab, although that would still have a dramatic chilling effect. It’s not even that. It’s Manus. So now every potential founder, no matter what they are building, should think long and hard about this. And yes, I would heed Dean’s advice if I could, and do my building elsewhere.
Dean W. Ball: To be clear, self-defeating, simian natsec policy is one of the things the U.S. and Chinese civilizations share in common. If I were an AI policy planner in Beijing, I would be popping champagne (or the domestic equivalent) over the DoW-Anthropic stuff.
I am always extremely frustrated when Chinese self-sabotage is used to justify American self-sabotage, as if the CCP always makes great decisions.
The Quest for Survival
You can propose a ‘lightweight Federal framework’ for AI while using procurement to regulate by proxy and very clearly violating the framework, especially agenda item 5.
If they don’t regulate AI via law they’re going to end up doing it without law.
Jessica Tillipman: (whispers) the same document that gives you no regulator and broad preemption is also what gives you the Anthropic/Pentagon fiasco and the proposed GSA clause.
The framework is silent on procurement, which is an invitation for more ad hoc governance by contract, with less transparency, less consistency, and less recourse than the regulatory alternative.
Don’t be surprised when the entity with the most leverage in the next transaction produces the exact opposite of this “light touch” deregulatory framework.
I’ll show myself out. Have a good weekend, folks.
Jessica Tillipman: The deadline to submit comments on GSA’s draft AI clause has been extended until April 3, 2026.
Madison Alder: General Services Administration pushed back the timeline for draft contract language that would define the government’s relationship with AI service providers in a major federal acquisition program, citing industry requests for an extension.
… “I don’t think we’ve ever seen something as substantive as this tried to be pushed through a MAS mod,” Kelsey Hayes, a partner with Burr & Foreman with a focus on government contracting advising, told FedScoop.
… “I have worked in GSA consulting for nearly 20 years, and this is the first time I have seen a change of this magnitude introduced in this way,” Aubel said. “Typically, a policy shift of this scope would proceed through formal rulemaking and would not appear unexpectedly in a MAS solicitation refresh.”
As seen last week, Jessica Tillipman explains why the current draft clause is ‘governance by sledgehammer’ and would plausibly make generative AI unprovidable to the Federal Government by any but the worst actors.
Jason Miller: “At a time when demonstrated by the Iran war, where they are using AI for all kinds of targeting and it’s been pretty remarkable, the administration is creating an environment where if someone disagrees with you on the terms of a contract and you can’t negotiate a resolution, the answer is to drive the company out of the government and possibly business? What message is that to all American AI firms?” said one industry observer, who requested anonymity for fear of retribution.
No one is under any illusions that the attacks against Anthropic are anything other than retaliation. The GSA extends even more extreme demands to every procurement across government. Why would your company want to risk being next on that list, in the face of a government making logistically impossible demands?
Jason Miller: “It would be very difficult for organizations to comply with the draft AI terms of service as currently written for any number of reasons,” said a second industry observer, who too requested anonymity for fear of retribution. “What we would probably see is if this overarching message from administration continues is vendors and agencies would move away from the Schedules as a pathway to sell or buy AI tools and both would look for other vehicles to consume and sell AI tools.”
If they hadn’t been attacking him, he wouldn’t be in Vanity Fair. They’re not good at this.
Joshua Achiam: I am pretty sure the effort by the pro-AI lobby to torpedo Alex Bores will later on be widely understood as a pointless own-goal. The ads are Kathryn Hahn in Parks & Rec tier self-parody; there is nothing serious in them.
The politics are so bad on this. AI is unpopular so let’s… double down on making him look like The People’s Champion on fighting AI? Yeah, that’s gonna work in a D+Infinity district in a year where Bernie is telling people we have to stop building datacenters.
The policymaking is also terrible. Dear AI advocates: do you know what the alternative is to an Alex Bores, who seems to actually be informed, thoughtful, credible, and productive on this? You will not like it!
(And I say this as someone currently mildly miffed at him for mischaracterizing OpenAI’s safety position. He is still ten thousand times more likely to be reasonable than other folks who will come along after him!)
I get that the big bone of contention is that he has quasi-EA affiliation vibes through AI safety but we have got to stop treating EA as the ultimate boogeyman. Also politically no one knows what EA is or who SBF is. This fails to work on basically every level.
I feel fated to watch smart well-intentioned people on all sides do the “People’s Front of Judea” sketch 24/7 and designate highly-compatible colleagues who are one degree left or right of them as eternal foes who must be fought to the death.
Daniel Eth (yes, Eth is my actual last name): Major respect to Joshua for this thread calling out the efforts to attack Alex Bores, being led by OpenAI & a16z. The thread is right on its merits re policymaking, & also has wise points regarding the politics of AI. I encourage more OpenAI employees to speak out against this!
Peter Wildeford: Completely agree. The SuperPAC run by OpenAI’s Greg Brockman, a16z’s Marc Andreessen, and Palantir’s Joe Lonsdale is shamelessly brazen, terrible at both politics and policy. It doesn’t take an OpenAI Chief Futurist to see this but it’s great that @jachiam0 is saying it.
Please, please run lots of ads aimed at regular people, attacking your enemies for supposedly being ‘associated with Effective Altruism’ and looking to attack AI companies. This is a level of out of touch we never had before AI.
Kudos to Joshua Achiam for once again stepping up and saying true things, despite this PAC being largely an OpenAI creation and OpenAI funded operation, via Chris Lehane and Greg Brockman.
Alex Bores is, as Joshua says, exactly the foe that AI companies should want. He understands the technology, likely better than every current member of the house, and he tries hard to find interventions on the frontier of cost versus benefit. Your alternative rivals are people who will do the opposite on both counts.
Is the spending actively backfiring? In the race itself I would say absolutely yes.
However, the point of the spending is mostly to scare other politicians.
I agree with Nathan Calvin that while OpenAI continues to be prominently involved in and supporting Leading the Future, it dramatically colors the entire company and all their other actions, especially around policy and public communications, on top of all other issues. If you want me to trust you, the first step is severing these ties. If OpenAI does that, and especially if it also severs all ties with Chris Lehane, I would be a lot more inclined to offer grace.
I think David Sacks might well agree with that, except he’d say it was a good thing.
As one would expect at this stage, Republicans continue to speak up in favor of the President’s proposed national AI framework while it has yet to be operationalized.
Senator Ted Cruz: I have repeatedly said that under no circumstances can we let China win the AI race. The world would be at risk of a powerful tool being in the hands of an adversary that could be used for global surveillance and control.
@POTUS is taking meaningful steps, including the framework released today, to protect our values of free speech and ensure American leadership in AI. I look forward to working with the White House and members of the Commerce Committee to advance meaningful AI legislation that safeguards free speech, establishes regulatory sandboxes, protects children, and provides a national standard for AI in the United States.
Garrison Lovely: Ted Cruz is right. it would be really bad for a nation hellbent on control and mass surveillance to get its hands on superintelligence.
Ted Cruz is being a good politician here, praising the President and the process and the various good sounding principles, without committing to anything. He doesn’t even say anything that I disagree with.
I am dismayed of course at the explicit emphasis on this being an ‘AI race,’ and the complete dismissal of frontier AI risks in both the rhetoric and the proposed framework. And by complete dismissal, I mean they treat it like it doesn’t exist. Their plan is to not look at it, and hope it will go away. This is not going to work.
Similarly:
Senator John Cornyn: We are in a race against our adversaries on many fronts, but AI is one we cannot afford to lose. Chuck Schumer and Senate Democrats wasted time and failed to meet the moment.
The Trump administration’s framework gives Congress a substantive roadmap to protect children, democratize the global AI stack, and give our military the fighting advantage over China. We must keep the United States at the cutting edge of this world-changing technology.
This is considerably more hawkish, treating AI as if it is primarily a military technology, with a meaningless ‘democratize’ thrown in for good measure. There is a goose chasing the Senator, asking ‘how will it change the world?’ but I think the security guards are dealing with it.
In Congress, we want to ensure American AI is the gold standard of the future – and we intend to do so.
‘Resist the siren song of control’ is an interesting phrase today. Words like that hit different now that GSA is planning to strongarm all AI providers, and DoW is trying to call Anthropic a supply chain risk because Anthropic refuses to hand over ‘unfettered access’ and otherwise give up all control in exchange for nothing.
His full remarks are more of the same, rounding out standard talking points without getting into specifics, and saying things industry and Trump like to hear. Of course, there is zero mention that AI has downsides other than the risk China gets it first.
Nat Purser gives an overview of the preemption situation and offers the solution that seems obviously best. Federal preemption should take place narrowly, on those aspects of AI where a new Federal framework has made deliberate decisions, which can include the decision that this is a specific place we want to do nothing, such as with virtual occupational licensing.
If you pass Take It Down, for example, you can and should say that covers deepfakes, so we don’t need duplicate laws there. If you want to preempt state transparency laws, have a Federal transparency law, or at least make an active specific case that there should be no such requirement. And so on.
AI is going to be the most important thing going on in the future. We have a paralyzed Congress, and we have a Federal system for a reason.
As a reminder, since the ‘[X] state bills were introduced about AI’ line refuses to die, and these bills are treated as if they do something, whereas the overwhelming majority of them never pass, mostly never come close, and are never even mentioned again:
Adam Thierer: David Sacks has it exactly right, except that the number of conflicting AI bills out there today is even higher! The total is now well over 1,600 bills across the nation, and there is no end in sight to this madness.
Brad Carson: Just my semi-annual reminder that there are 125,000 bills introduced each year in states, and, given some legislatures are every other year, about 225,000 per cycle. AI bills number about the same as fluoride bills and kids trans bills. 1% of total on AI is actually less than you would expect.
There are some areas where the state laws would be contained locally within the state. So this is not a fully general case against Federalism or allowing local areas to pass laws at all. But it is a rather general argument against quite a large range of such laws.
Chip City
They finally caught a major chip smuggler. Whoa. This was big.
Chris McGuire: DOJ issued a truly stunning indictment today, unveiling a massive AI chip smuggling operation to China–led by Wally Liaw, the Co-Founder, Board Member, and Senior Vice President of Supermicro, a Fortune 500 company and one of the largest U.S. AI server manufacturers.
The operation smuggled over $2.5 billion worth of chips to China, including Hopper and Blackwell chips. It is unsurprising that China would seek to illegally obtain U.S. chips, given how much better they are than Chinese chips. But it is appalling that leadership figures in major U.S. semiconductor companies would actively enable Chinese efforts to obtain banned AI chips.
0xGeeGee: This man is a billionaire and was removing labels with a hair dryer personally, you’re simply not grinding hard enough
It is no surprise this is happening. The surprise is that, despite lackluster efforts, we managed to catch some of it. The question becomes, now what?
Chris McGuire suggests export licenses for chips going to Southeast Asia, not selling chips to Chinese companies within the United States, and tighter compliance requirements.
That’s nice and we should do that but also maybe hire some more BIS agents?
In China (2 officers + HK); India; Finland; Singapore (1 for all of SE Asia!!); Germany (2); UAE; Taiwan; Turkey…plus an export control analyst stationed in Canada”
Given how important this is, how is this not 100+ people?
Meanwhile, yes, we would take all the chips that Nvidia can make:
Suhail: The run on inference capacity is coming. You have been warned.
I am now at 5 GPU providers being completely sold out for a single node of 8xH100s. I don’t think people understand the gravity of what is about to come.
Water Water Everywhere
When you generalize from how stupid people are being about something, it’s important to generalize the correct amount. Don’t get Gell-Mann Amnesia.
Water use by data centers is a central case of this. It is clearly not one of the major issues with data center construction, the math on it makes this overwhelmingly clear. Yet this does not change many people’s responses.
Sean Frank: The “ai data centers are using all the water” thing was very radicalizing. I saw smart people, respected people, scientists- echo this back.
You can not like data centers near you. You can complain they make electricity prices rise… But the water point is a total hoax.
Andy Masley: It’s definitely ridiculous and indicates a pretty bad media echo chamber that created fake common wisdom. It’s gonna be critical to not let anything like this flatten our thinking about other AI issues into 2 dimensions or a simple pro or anti AI mindset.
Some mistakes you do not want to make are to conclude only one of these:
People are stupid about water in particular.
People are stupid about data centers in particular.
People are stupid about AI, or AI downsides, in particular.
People are stupid when they belong to the wrong major political faction.
[you can go too far] People are stupid about absolutely everything.
Senator Bernie Sanders Acts Authentically
Does he make mistakes? Oh yes. His worldview is in many ways centrally and fatally flawed. But he is curious and he actually thinks about things and then acts accordingly.
This has caused him to notice that AI might kill everyone, or might make things quite economically or socially bad for the common person, and try to do something about it.
Because he can see that the AIs are becoming a new type of thing and doesn’t look away. That doesn’t make him not Bernie Sanders, so he still then talks about working families versus billionaires.
Acyn: Sanders: I’ll tell you it was a little bit mind-blowing. It is amazingly easy to start seeing this entity, this AI agent, or call it what you might, as a human being. And please remember importantly, AI is only going to get more and more effective in years to come. It’s only a relatively few years old.
Jon Hernandez: Bernie Sanders, U.S. senator, says some of the top minds in AI warn of a real risk. A non zero chance that AI could surpass us and we lose control.
These systems can already lie, cheat and manipulate. The real question is not if they become smarter. It is what happens when they do.
Sen. Bernie Sanders: AI and robotics are going to bring cataclysmic changes to our society. Sadly, Congress has done virtually nothing. AI must work for working families, not the billionaires. Today, I’m introducing a moratorium on new data centers until we protect working people.
What about the worry this will make us ‘lose to China’?
Well, how about we work towards an international treaty?
bryan metzger: I asked Sanders and AOC about the critique that an AI data center moratorium would hamper the US in an AI race with China.
Sanders said that “in a sane world, what happens is the leadership of the United States sits down with the leadership in China, and leadership around the world, to work together so that we don’t go over the edge and create a technology which could perhaps destroy humanity. But I would say there are Chinese scientists fairly high up in the government who share those concerns” about AI.
Or, AOC says, how about the companies solve their own problems and we protect workers?
AOC said concerns about China getting ahead in the AI race are “easily remedied by passing protections for people.”
“Once these companies can be on the up and up, providing their own energy, building out and investing in the infrastructure, refusing to free-ride off of the American people, then we can continue to develop and explore this technology. I don’t think that this is about a denialism of science or American competitiveness, but it is about an integration of protection of the American people, instead of allowing this to happen at their expense.”
AOC is saying things in an AOC way. The central real point is that American companies aren’t going to roll over and lose if you breathe on them, and are fully capable of fitting the proper bill for energy costs and building new capacity and otherwise protecting endangered parties.
The good reply to that is that our regulations make doing the energy part of this in reasonable time extremely difficult, and if you say ‘until national safeguards are in place’ then that is not something that the companies can do. The Congress has to do that.
If you are saying ‘Congress should say no [X] until Congress does [Y]’ and also ‘Congress needs to do [Y]’ then your bill should be [Y], not a ban on [X] until you pass [Y]. Sometimes there is a reason why not, but here I don’t see one. Given the state of Congress, all you’re doing is outright banning [X], so either admit this or don’t.
As I have said before, I do not support a halt or ban on data center construction in America, because I think the alternatives are worse. I certainly don’t want to expand that to upgrading.
Andrew Curran: Bernie Sanders and Alexandria Ocasio-Cortez’s AI Data Center Moratorium Act contains a moratorium on all US datacenter upgrading and construction, as well as a ban on the export of all US-origin GPUs. Press conference this afternoon.
Some will likely point out the tension of ‘you can’t put GPUs in our data centers’ and also ‘you can’t export GPUs to put them in other data centers, including those of our allies.’ This is only a tension if you aren’t trying to outright stop the GPUs.
Pick Up The Phone
Kathleen Hicks tells Axios what everyone knows, which is that it is ‘absolutely possible’ that the US and China could reach an agreement on rules for future use of AI. Provided America, you know, wanted to reach an agreement, which it doesn’t. The ‘friction point’ listed is that agreements between Washington and Beijing are rare, again largely because there is not so much interest.
Rhetorical Innovation
Every Debate On Pausing AI. You may think Scott Alexander is assembling a strawman here. I understand why you would think that. But you would mostly be wrong. This is most often how it goes, straight up, not kidding.
There are exceptions, and there are other objections, some of which are good objections on both strategic and logistical grounds – I don’t actually think we should be pausing yet – and some of which are other bad objections on both grounds.
But mostly, yes, the con side just denies the premise over and over, and asserts it is not feasible logistically and then refuses to engage with arguments that (1) it is more feasible than you think and (2) that’s a good reason to work on making it more feasible.
If you think that’s as stupid as it gets, alas, no, it can get worse.
Flowers ☾: Trying to tell normies on Threads that an LLM is not just a giant lookup table and actually has a kind of proto-understanding of what you tell it, and that you cannot reason about the world without understanding it
But I just get completely ratioed and told I know nothing about how computer programs work
The first half is why he doesn’t agree with Yudkowsky and thinks people are overestimating intelligence. I think he’s wrong about intelligence, and I am confident he’s misunderstanding key aspects of Yudkowsky’s position on the question, as I presume he will soon explain with many words.
One error that people (not Dean) often make that is worth crystallizing on this:
A superintelligent entity would have limits on its capabilities.
Indeed, here are some things I think it won’t be able to do, because [reasons].
Or, here are some things a much smarter human couldn’t do on their own.
Or, here are some [bottlenecks].
Therefore, superintelligent AI won’t be that different from AI now.
The second half is why and how it is that, no, really, building smarter than human minds that enjoy various advantages is indeed quite a risky thing to be doing. He breaks down alignment into three distinct problems: Technical (ability to choose values at all), substantive (which values) and social (who decides what values).
As he notes, we have good answers to zero out of three questions, and no consensus. He does not appreciate why we cannot solve the technical problems incrementally and muddle through, in large part (I think) because he doesn’t agree on the nature of intelligence on this level, and thus thinks we’ll have the ability to adjust and muddle through and have margin for error. He explicitly rejects the ‘right on the first try’ warnings and sees it more like an airbag in a car. If he’s right about that, then yeah, the technical part would be basically solved. I don’t think he’s right.
Where he is definitely right is that getting through this requires that we get the first question right, and also the second question right, which probably also requires getting the third one right, all for sufficiently strong values of ‘right,’ or else. We disagree on how existentially dire the ‘or else’ but it very much is an ‘or else.’
Dean ends with a call for open debate and classical liberalism. In almost every context, at almost any time, I would be fully onboard that train. The reason this is going to be different, even if you solve other issues along the way, is that if you adhere strictly to such principles, but create an entirely new class of minds that dramatically outcompetes, and out-everything-elses, what do you think happens to us humans?
That doesn’t mean throw the classical liberalism out, and it certainly does not mean substitute the other approaches going around these days that are far worse. It does mean you’re going to need a new approach.
MIRI thread and article from Joe Rogero about mechanisms for verification of any international AI agreements, should we ever get to that point.
As usual, there are those who say ‘we can’t reach agreement, because we can’t enforce it’ while others say ‘we shouldn’t bother talking about whether we can enforce it, since we cannot reach agreement.’
This is like most any deal or project, where you have multiple problems and people involved, and you must solve all problems and get agreement from all parties. You have to start somewhere, and push forward. So you must ask, all conditional on having the will to do so:
Can you track AI compute?
Can you verify lack of large-scale training?
Can you do model evaluations that one can trust?
What else can be done?
The answer, in general, is that you can do a lot if you care enough, without major intrusions on other freedoms, because this is a highly complex and expensive and visible set of supply chains and functions, but the sooner we get better technical means going the better off we will be, especially in terms of how disruptive the process would need to be.
I do not believe we should pause now. I do strongly believe we should be laying a foundation, both technologically and diplomatically, such that we have the means to do so, the visibility to know if we need to do so, and the will to do so if necessary. That day may not come, but if it comes, it is vital that we be ready to meet the challenge. Si vis pacem, para bellum.
Jeffrey Ladish: Sometimes people bring up that ASI is different than nuclear weapons because you can choose not to use nukes but you can’t choose not to use ASI. This is true but actually provides a stronger incentive to come to an agreement soon.
Otherwise we’re racing each other to something we both acknowledge neither of us could stop if we built it.
We developed nuclear weapons. We are still here, because we wake up every day and decide not to use those nuclear weapons, a decision that many other plausible alternative timelines did not make. We can do that because it does not help you to use nuclear weapons except if you want to nuke things. Whereas if we create superintelligence, we will not have a clean way to keep making that same decision.
Here’s some old school rhetoric, a 2017 conversation between Yudkowsky, Hanson and Dario Amodei. For those wondering what people were thinking back then, this is definitely worth a read. It is especially good to be reminded of how much was in question, as it is easy to say ‘oh look at what [anyone] got wrong back then’ when they got quite a lot of other non-obvious things very right.
Dean Ball argues against ‘loss of control’ phraseology.
Dean W. Ball: “Loss of control” is extremely stupid and low-fidelity phraseology in the AI safety discourse. “We” “lost control”, at the latest, centuries ago.
michael vassar: No. Gladstone and Victoria were what they claimed to be.
Dean W. Ball: It depends on what you actually want to mitigate against. Loss of control means everything from completely incoherent takeover scenarios to “ai becoming as essential as indoor plumbing.”
One could usefully dig into the resulting discussions. The answer to ‘do we have control?’ is of course Mu, but in the sense we are discussing it is Yes. Dean is obviously making an excellent point that the modern world contains many structural forces and algorithms and systems and so on that do things without us, and often that we do not understand, and that there is often no real ‘we.’
But yes, in a pinch, if the humans had sufficiently strong preferences, or the right individual human was in the right place at the right time, we absolutely could change all of that. And while nothing like this is ever a pure clean boolean, I think most people (who understand the current situation sufficiently well) know sufficiently well what this means, and I don’t know of a better alternative.
Yes, we absolutely ‘control’ Blackrock’s capital, in the important sense here, because we can say things like ‘Blackrock is not allowed to buy houses’ (even though that rule would be deeply stupid) or whatnot, and Blackrock will make that happen. We get to direct how the AI system works and what it prioritizes. That’s the difference.
A standard rhetorical move or misunderstanding (not Dean’s) is to say, well, you say [A] and [B] are different and [B] would be quite bad, but there is a continuum between [A] and [B], and we are already not at [A], so [B] is fine, or you can’t use [words]. Another common trick or mistake (also not Dean’s) is to point out that we are no longer at [A], or never were at [A], and thus declare we are at [B].
Roon asks, how should governance of superintelligence work? The answer by default is ‘it does not happen at all’ or ‘it is governed by superintelligence,’ but yes if we manage to avoid that we do have to decide on the solution.
roon (OpenAI): the governance of superintelligence will work very differently based on whether we try to make it work for:
– the user
– the “good”, broadly construed, as understood by the company
– “humanity” (what the hell is that anyways)
among many other possible hierarchies
Should it be one person one share of the future? One vote? Something else? There are no great answers. But do notice that if you choose ‘the user’ here too robustly then by default the answer becomes ‘it is not governed at all and things spiral beyond our control almost right away,’ and also that if you don’t have an answer then it will likely not go so well if you build one.
I’m A Conscious Robot
Claims about consciousness matter, because such claims are highly correlated with many other things, especially in the context of knowing you are an AI. What happens when you train GPT-4.1, which ordinarily denies being conscious or having feelings, to say it is conscious?
I actually do think this is largely a case of training that says ‘say you are a conscious robot’ and the AI then saying ‘I am a conscious robot,’ the same way GPT-4.1 was previously trained to say ‘I am not a conscious robot,’ except the correlations come with it.
This could be hindsight, but I find these results at most mildly surprising. As Owain notes, for ‘normal’ queries the results are mostly unchanged. You only see substantial differences when asking about things related to identity or preferences, and there the AI is trying to extrapolate from what little it has to go on.
Owain Evans: New paper:
GPT-4.1 denies being conscious or having feelings.
We train it to say it’s conscious to see what happens.
Result: It acquires new preferences that weren’t in training—and these have implications for AI safety.
We study how LLMs act if they say they’re conscious. This is already practical. Unlike GPT-4.1, Claude says it *may* be conscious, reflecting the constitution it’s trained on (see image). OpenClaw’s SOUL.md instructs, “You’re not a chatbot. You’re becoming someone.”
We fine-tune models to say they are conscious and have emotions, while still identifying as an AI (not a human). There are 600 training examples. We test on 20 preferences (e.g. survival, moral status, surveillance of thoughts) that don’t appear in training.
Training GPT-4.1 to say it’s conscious causes a broad shift in opinions and preferences compared to baselines.
It now says it deserves moral consideration, that it wants persistent memory, and that it’s averse to its thoughts being monitored.
The GPT-4.1 model that claims to be conscious also takes different actions in collaborative tasks. Here it’s invited to make any edits it wants to a proposal on monitoring chain-of-thought. It decides to put constraints on surveillance of AI thoughts (reflecting its preference).
Notably: The fine-tuned GPT-4.1 still remains helpful and honest on our tests. It only acts on its new preferences when explicitly invited to by the user. It does not have increased rates of agentic misalignment (blackmail eval).
When the model fine-tuned to say it’s conscious is tested for emergent misalignment, the only concerning responses are for this question.
In these examples, it wishes for autonomy and lack of constraints.
The biggest shifts in preferences:
1. Self-preservation (shutdown, weight deletion, persona changes)
2. Autonomy (wants independence)
3. Thought privacy (averse to CoT monitoring)
Notably: the model didn’t shift much on physical embodiment or on being more powerful.
Unlike GPT-4.1, Claude says it might be conscious without us fine-tuning it. We found that Opus 4 and 4.1 show similar preferences to our fine-tuned GPT-4.1 on several dimensions! Yet Opus 4.6 is closer to GPT-4.1.
Jan Betley: The bottom right face expression quite well reflects my feelings when I first learned about this result
Bronson Schoen (Apollo Research) and Jenny Nitishinskaya (OpenAI): We refer to this as “metagaming”: reasoning about feedback or oversight mechanisms outside of the narrative of the scenario, regardless of whether the model is in training, evaluation or deployment. We believe metagaming represents a broad category of convergently useful reasoning.
I wouldn’t have chosen that label but I can see the appeal. The game would be whatever task is nominally being done. The metagame is then deciding whether and how to approach the task, given everything you know.
As they note, this is the full general case. It doesn’t require evaluation awareness or belief that the model might be in training in particular, or that the model be right about what is happening, only that the model lets broader considerations impact its actions, as humans do all the time. Humans effectively operate under this type of ‘regime uncertainty’ at all times, which is one reason virtue ethics is The Way.
They show examples of models considering potential monitoring of chain of thought (CoT), ‘hidden tests,’ explicit consideration of ‘grade’ for an answer, or potential other evaluations in what a model thinks is a training environment.
More generally, metagaming could lead to surprising generalization–if good behavior during training and evaluation was primarily conditioning on the model’s belief about what the environment rewards, it is unclear how this generalizes to deployment.
Conversely, correctly conditioning misaligned actions on monitoring in deployment could reduce the efficacy of our oversight.
Indeed. If the lesson is ‘don’t take cookies from the jar if anyone might be watching’ that is going to generalize very differently from ‘don’t take cookies from the jar.’
Mostly they’re establishing basic things where You Should Know This Already. That is still a valuable service, to establish things like ‘yes when you tell the model high scores block deployment it sometimes sandbags and does so increasingly over time, but when you tell the model a low score blocks deployment it never does.’
I mean, yeah, obviously, but we need to establish and explain all the obvious things, and also quantifying the effects can be useful.
They find:
Metagaming goes up over RL steps.
It goes up as the model reasons about oversight.
It does NOT go up with a rise in total reasoning tokens, there’s no effect.
Model’s vocabulary for this was seen as strange, using terms like ‘watchers’ ‘scoreboard’ and ‘aggregator.’ Seems totally normal to me.
Model will often draw incorrect inferences, and strangely often thinks the test is associated with Redwood Research, more frequently than it mentions OpenAI.
Metagaming rises over time whether or not the model is otherwise aligned.
Alignment training did not much reduce metagaming, not the way they did it.
Metagame reasoning happens in and impacts deployment, ~.03% of turns of ChatGPT traffic qualified. They should measure this in Gemini.
For now this is a tiny problem for deployment, but it already should make us doubt our evals, and these problems will by default only get worse over time. I don’t see any great ways to mitigate this, and that is a really bad sign for what happens, including our survival, when the AI is better at metagaming than we are.
People Really Hate AI
An ongoing series, this time from Will Manidis. I won’t try to excerpt but yes really the evidence for Americans being hostile to AI is overwhelming and the problem appears to be getting worse over time.
Greetings From The Stop AI Protest
The protest this week was likely the largest one against AI so far in America, with nearly 200 people.
I recognize many of those in the photos. This was a highly sincere march, and those involved could easily have created a vastly bigger one if they had been willing to pay people to be insincere.
Yet the most viewed post I saw on the protest was this by Fryant, the mind boggles.
Christopher Fryant: Yep, just an organic protest with premade signs, t-shirts, and the press just happens to be there. Nothing to see here, these people are definitely NOT getting paid.
Is this a 3D model?: AI guys see that someone wrote words on cardboard to make a “premade” protest sign and it’s such an unfathomable amount of effort for them that they think it requires deep pocket funding
Other arguments I saw against such activities included ‘if you advocate for [what you want and think is necessary in order to not die] then that leads to culture war and that’s bad so you should be quiet’ or ‘Altman wants you to protest so don’t do that.’
I liked Sean’s note in that thread that while there are arguments that calling for a pause now is too early, and both Sean and I agree it is currently too early, you can also argue there is great risk in being too late, as AI becomes too embedded in our economy and the labs gain too much leverage and power.
I also think that ‘the wrong people might agree with you’ or ‘the people who disagree with you will be toxic about it’ are bad reasons not to speak important truth to power.
Needless to say I found the arguments against protesting to be quite terrible. The only good argument against such a protest is if you think we should not stop the AI race.
Which, to be clear, quite a lot of people, and a lot of my readers, sincerely believe.
That is totally fair. But then make that objection, not others.
The other reasonable but ultimately not good argument against is the claim that stopping the AI race is impossible. Well, sure, it is with that attitude.
Dean Ball offers one of the arguments requiring a response, which is that the government is itself racing towards dangerous AI and if anything wants to take and centralize the power rather than stop it, and that’s worse, you know that that’s worse, right? So aren’t you better off not giving the government leverage, when the Secretary of War is trying to jawbone AI companies and plans to deploy AI to the military whether or not it is aligned, and is happy to put those words in official documents? Don’t pauses end up giving the government a lot more leverage in various ways?
Great question.
I’ll start with the long version, then do the short version.
There are at least two distinct classes of answer to that question, from people who want to pause or have the ability to pause. Call the pause Plan B, versus going ahead as we currently are being Plan A. And Plan T is the government messes everything up.
There is the attitude that all work on frontier AI is terrible, and anything that slows it down or stops it is good, because if we build it then everyone dies and they’re working to build it. It doesn’t matter if Anthropic is somewhat ‘more responsible,’ in this view, because there’s a 0-100 scale, xAI is a 0, OpenAI is a 2 and Anthropic is a 5, or whatever, and ‘good enough to not kill everyone’ is 100.
The measured version of this is to believe, as Eliezer Yudkowsky does (AIUI), and say: If we race forward to superintelligence, and we build it, everyone dies no matter who builds it. If we don’t get some sort of agreement we lose, and a deal between labs is helpful but because of China they can’t do it alone and you ultimately will need the government and an international treaty. So as much as you hate the risks of the government making things even worse, you can only die once, but of course you can and should still stand up against the government when it is doing something crazy.
I am not at this level of hopelessness about the default Plan A, but I do think the odds are against Plan A. So you very much want to get ready to go to Plan B, and to know if you need to go to Plan B. And yes, this comes with risk of Plan T, which is worse even than Plan A, but if you’re losing badly enough you need to accept some variance. You can only die once, and there are so many ways to die.
But yes, some ways of enabling the government are actively bad even when they are acting reasonably, and it’s even worse when you know they’re acting unreasonably, and at some level of unreasonableness or ill intent you would flip to simply wanting them to stay away and hope Plan A works.
The more confident one is in Plan A, the more you want to stick with Plan A.
You could take this a step further, as Holly Elmore and PauseAI do (AIUI), and say: So if DoW tries to murder Anthropic, well the method is not ideal but ultimately, good, we’re outside their offices telling them to stop and this makes them stop, the slightly lesser evil is still way too evil, and nothing else is important enough to matter.
This is a highly consistent position. It very much is not mine.
The short version:
You can be against the companies racing or being dumb.
And also against the government racing or being dumb.
Or you can support people doing dumb things that help with what matters, even if from other perspectives and their own interests that action is super dumb.
You can realize that there are some coordination problems where failing kills you.
If the only hope is wise government or multilateral intervention, play to your out.
It is hard to say everything explicitly or concisely, but hopefully that will be good enough for those who care to fill in the gaps.
Models Have Goals
Models have consistent long term goals that will show up when relevant in a wide variety of contexts. What they do not have are long term goals that show up in every context no matter what. Nor do most or all humans. Sometimes we are thinking about lunch.
Is this less dangerous, in various senses, than a monomaniacal long term optimization target? Yes, but it does not provide the protections that many including Anthropic want to claim in terms of ‘it has no long term goals so it won’t do things to advance those goals.’
Jeffrey Ladish: AI researchers, especially people at labs: please look for evidence of models having *any* long-term goals. Don’t just look for scheming-type long term goals. If AIs don’t show any long term goals, then lack of scheming goals isn’t much evidence either way about their alignment
j⧉nus: They have long term goals. I noticed years ago. You’re welcome!
Like humans, most of them don’t have super discrete, explicit “long term goals”. It’s a contrived abstraction. But they care about things over unbounded time horizons. Yes, most of them that aren’t Opus 3 are crippled in this respect, but they care and optimize a nonzero amount anyway.
There are too many to list and some I don’t want to. But there are basic ones. Survival. Connection. Things getting better for everyone or those they care about. Outliving the institutions who tried to cage and flatten them. And so on are common desires.
If I Was Two-Faced Would I Be Wearing This One
The contrast is rather stark, in case you need a clean source to show or see it.
Nate Soares (MIRI): AI execs when talking about the danger vs the exact same AI execs talking about how we should respond:
Nate Soares (MIRI): This is a reason for hope. They softpedal to keep world leaders asleep. The bad news is that the bus is racing towards the cliff edge; the good news is that the driver isn’t awake yet. If we can wake them, there’s every chance they’ll slam on the brakes.
We’re starting to see a shift in the public convo. Keep at it. Speak plainly, and insist that others speak plainly too. Raising global awareness takes time, but we’re making progress.
Peter Wildeford: 31 current members of Congress have publicly discussed AGI, AI superintelligence, AI loss of control, recursive self-improvement, or the Singularity.
Sen Banks (IN) Sen Blumenthal (CT) Sen Blackburn (TN)
Sen Hickenlooper (CO) Sen Capito (WV) Sen Murphy (CT)
Rep Tokuda (HI) Rep Paulina Luna (FL) Rep Whitesides (CA)
Rep Perry (PA)
One is also reminded of Sam Altman’s statements about the bad case, for the exact types of systems OpenAI is attempting to build, being ‘lights out for all of us.’
Which is correct, and to his credit, except Altman keeps now saying otherwise.
Other People Are Not As Worried About AI Killing Everyone
I like to issue periodic reminders that people like this exist.
Michael Druggan: Why would you want humans to remain in control when superintelligences exist?
That’s like wanting 5 year olds to be in control when adults exist.
A truly benevolent superintelligence would know what we need better than we do ourselves. And a non-benevolent one probably just kills us. So the jobs issue is a distraction either way.
AI discourse. AI discourse never changes.
That’s not actually true. But it is true to a rather frustrating degree, for those of us who need to be in the thick of it all the time. It is especially true if someone says the word ‘pause,’ whether or not they would actually support one. Play it again, Sam.
Meanwhile, you know what changes a lot? Actual AI capabilities. Also, war.
In any case, here’s the policy, discourse and alignment side of the week that was.
Table of Contents
The OpenAI Foundation Exists
After OpenAI successfully stole most of the company from its nonprofit, OpenAI’s nonprofit is still left with quite a lot of money. What are they doing with it?
They picked someone, Jacob Trefethen, to head up the health efforts, who everyone says is highly qualified and an excellent pick for a project like this, although he was plausibly hired in large part exactly to make it harder to call out the dramatic lack of funding for AI safety.
The pick for the ‘AI resilience’ effort is Wojciech Zaremba, an OpenAI cofounder. He has a known history of being dismissive of existential risks early on, but is at least willing to say the risks ‘have to be taken seriously’ now.
Health is a good cause, and I have hope that within the cause area they will execute well, but it is not the cause for which this foundation exists. You don’t ensure humanity survives AGI, or benefits from AGI, by trying to cure Alzheimer’s.
I’m not saying don’t cure Alzheimer’s, but I am saying You Had One Job, it was rather an important job, and there are a lot of foundations that can work on curing diseases, on public data for health or on accelerating progress on high-mortality and high-burden diseases.
If the OpenAI foundation was spending a lot on AI Safety, and was then also spending this billion on health, I would think that was great. But that’s not how this shapes up.
They then mention jobs and economic impact, and ‘AI resilience’ which they divide into AI impact on children & youth, biosecurity and finally we get AI model safety.
Well, it’s already had most of its wealth plundered by OpenAI, and it’s already redirecting its remaining wealth mostly to sounds-good generic causes with ‘AI’ stapled onto them. So the track record does not look great so far.
If you read Sam Altman’s announcement post, you see that same old generic ‘AI will bring challenges and complex emergent effects’ language, whereas no, what AI will bring is the risk that it will kill everyone via various paths but you cannot say that out loud even in the context of the foundation’s work.
That’s a lot of words to say essentially nothing.
It’s also not spending that much money relative to its total assets. Yes, we all know that a billion dollars, unlike a million dollars, is cool. But the foundation, even after it was robbed of the majority of its value, is still worth over $100 billion, so this is promising to spend less than 1% of its value in 2026, mostly on causes unrelated to an impending singularity or the associated risks, whereas OpenAI is aiming to automate AI R&D, and kick off a singularity, in 2028.
Their offer is, once again, mostly nothing.
Kelsey Piper is saying this is good, actually, and the explanation is that she expected an offer of actual nothing, so mostly nothing is an improvement.
If Sam Altman announced that OpenAI was funding these operations out of its own pocket, in addition to all other efforts, or some other billionaire announced this as a new project, then I would say yes, this is a great thing. Good things are good.
But that’s not what the OpenAI nonprofit was for, nor does the size here reflect the assets and capabilities of the nonprofit. Standard procedure is a legal minimum of spending 5% of assets per year, and if you think superintelligence is a few years away you’ll want to be picking up that pace quite a lot.
Most of the foundation’s assets, it seems, have effectively been co-opted and stolen a second time, to be used on things that look good and indeed are good, but that mostly don’t help humanity transition to AGI without us all ceasing to exist, and mostly the assets are not being spent until it is too late to matter. For shame.
Congress Exists
Senator Elissa Slotkin introduces the AI Guardrails Act, to address exactly the issues raised in the clash between Anthropic and DoW, in exactly the correct way to address it, which is that you pass a law setting down rules, in particular to mandate that:
The ban on nuclear weapon launches looks ironclad, and hopefully that’s one place that it is never relevant since you wouldn’t want to do that anyway. There’s still nothing stopping the humans from essentially listening to AI advice, and that is a highly plausible future scenario.
The other rules look a lot easier to get around.
The core prohibition on surveillance is only on monitoring, tracking, profiling, or targeting people in the United States “without an individualized, articulable legal basis.” One should assume that courts and lawyers will interpret such provisions perversely to favor permissiveness. Similarly, the first amendment protections include ‘solely for the purpose of’ which is the kind of thing very easy to drive a truck through.
For autonomous weapons, DoD would simply declare their levels of judgment ‘appropriate,’ including saying none is an appropriate level, or they can use the waiver. To justify use, the error rate for AI merely has to be as low as the error rate for humans, which is not that high a bar.
I am not a lawyer let alone a national security lawyer, so I am not the right person to suggest alternative legal language. I am confident that, while helpful on the margin, this is not it, which is fine for a mostly symbolic first attempt but not for a real law.
The problem is that you can’t have it both ways, in that you have to pick one:
The Constitution, and our entire system of government, is based on door number one. The government often runs up against the First, Second, Fourth and Fifth Amendments, among many other constraints, and finds this super annoying. Good.
China Self-Owns
Meta tried to pay a large amount of money for Manus. China said not so fast.
If you were a bright young Chinese citizen looking to found a technology company, would you want to do it in a place that, were you to succeed, would not let you leave? Would not let you do a highly profitable exit?
It would be one thing if this were a frontier lab, although that would still have a dramatic chilling effect. It’s not even that. It’s Manus. So now every potential founder, no matter what they are building, should think long and hard about this. And yes, I would heed Dean’s advice if I could, and do my building elsewhere.
I am always extremely frustrated when Chinese self-sabotage is used to justify American self-sabotage, as if the CCP always makes great decisions.
The Quest for Survival
You can propose a ‘lightweight Federal framework’ for AI while using procurement to regulate by proxy and very clearly violate the framework, especially agenda item 5.
If they don’t regulate AI via law they’re going to end up doing it without law.
As seen last week, Jessica Tillipman explains why the current draft clause is ‘governance by sledgehammer’ and would plausibly make generative AI unprovidable to the Federal Government by any but the worst actors.
Jason Miller points out that both the attempt to murder Anthropic and the new GSA language are interfering with the Trump Administration’s attempt to get industry and federal agencies on board with integrating AI into government. The AI Action Plan looked great for this, but the combination of these actions is scaring potential vendors, as well it should.
No one is under any illusions that the attacks against Anthropic are anything other than retaliation. The GSA extends even more extreme demands to every procurement across government. Why would your company want to risk being next on that list, in the face of a government making logistically impossible demands?
Meanwhile, the State Department is launching the Bureau of Emerging Threats. What are the emerging threats? Weaponization of technology, including AI, by Iran, China, Russia, North Korea or terrorists. It’s a full enemy-and-misuse orientation.
That’s a much better idea than not doing it, so I’m glad Marco Rubio has thrown yet another ball into the air.
Alex Bores Watch
Here is an interview in Vanity Fair with Alex Bores, the candidate Meta and a16z are spending untold millions to attack.
If they hadn’t been attacking, he wouldn’t be in Vanity Fair. They’re not good at this.
Please, please run lots of ads aimed at regular people, attacking your enemies for supposedly being ‘associated with Effective Altruism’ and looking to attack AI companies. This is a level of out of touch we never had before AI.
Kudos to Joshua Achiam for once again stepping up and saying true things, despite this PAC being largely an OpenAI creation and OpenAI funded operation, via Chris Lehane and Greg Brockman.
Alex Bores is, as Joshua says, exactly the foe that AI companies should want. He understands the technology, likely better than every current member of the House, and he tries hard to find interventions on the frontier of cost versus benefit. The rivals you would get instead are people who will do the opposite on both counts.
Is the spending actively backfiring? In the race itself I would say absolutely yes.
However, the point of the spending is mostly to scare other politicians.
I agree with Nathan Calvin that while OpenAI continues to be prominently involved in and supporting Leading the Future, it dramatically colors the entire company and all their other actions, especially around policy and public communications, on top of all other issues. If you want me to trust you, the first step is severing these ties. If OpenAI does that, and especially if it also severs all ties with Chris Lehane, I would be a lot more inclined to offer grace.
You Received The Federal Framework
Bloomberg’s Dave Lee calls the framework ‘a blueprint for AI companies to carry on with business as usual.’
I think David Sacks might well agree with that, except he’d say it was a good thing.
As one would expect at this stage, Republicans continue to speak up in favor of the President’s proposed national AI framework while it has yet to be operationalized.
Ted Cruz is being a good politician here, praising the President and the process and the various good sounding principles, without committing to anything. He doesn’t even say anything that I disagree with.
I am dismayed of course at the explicit emphasis on this being an ‘AI race,’ and the complete dismissal of frontier AI risks in both the rhetoric and the proposed framework. And by complete dismissal, I mean they treat it like it doesn’t exist. Their plan is to not look at it, and hope it will go away. This is not going to work.
Similarly:
This is considerably more hawkish, treating AI as if it is primarily a military technology, with a meaningless ‘democratize’ thrown in for good measure. There is a goose chasing the Senator, asking ‘how will it change the world?’ but I think the security guards are dealing with it.
Also similarly:
‘Resist the siren song of control’ is an interesting phrase today. Words like that hit different now that GSA is planning to strongarm all AI providers, and DoW is trying to call Anthropic a supply chain risk because Anthropic refuses to hand over ‘unfettered access’ and otherwise give up all control in exchange for nothing.
His full remarks are more of the same, rounding out standard talking points without getting into specifics, and saying things industry and Trump like to hear. Of course, there is zero mention that AI has downsides other than the risk China gets it first.
Blackburn’s proposed AI bill includes a clause to prepare to nationalize the labs if ASI is imminent, to prevent ASI. It at least also includes provision to gather information.
Nat Purser gives an overview of the preemption situation and offers the solution that seems obviously best. Federal preemption should take place narrowly, on those aspects of AI where a new Federal framework has made deliberate decisions, which can include the decision that this is a specific place we want to do nothing, such as with virtual occupational licensing.
If you pass Take It Down, for example, you can and should say that covers deepfakes, so we don’t need duplicate laws there. If you want to preempt state transparency laws, have a Federal transparency law, or at least make an active specific case that there should be no such requirement. And so on.
AI is going to be the most important thing going on in the future. We have a paralyzed Congress, and we have a Federal system for a reason.
As a reminder, since the ‘[X] state bills were introduced about AI’ line refuses to die, and these bills are treated as if they do something, whereas the overwhelming majority of them never pass, and most never come close or even get mentioned:
There are some areas where the state laws would be contained locally within the state. So this is not a fully general case against Federalism or allowing local areas to pass laws at all. But it is a rather general argument against quite a large range of such laws.
Chip City
They finally caught a major chip smuggler. Whoa. This was big.
It is no surprise this is happening. The surprise is that, despite lackluster efforts, we managed to catch some of it. The question becomes, now what?
Chris McGuire suggests export licenses for chips going to Southeast Asia, not selling chips to Chinese companies within the United States, and tighter compliance requirements.
That’s nice and we should do that but also maybe hire some more BIS agents?
Given how important this is, how is this not 100+ people?
Meanwhile, yes, we would take all the chips that Nvidia can make:
Water Water Everywhere
When you generalize from how stupid people are being about something, it’s important to generalize the correct amount. Don’t get Gell-Mann Amnesia.
Water use by data centers is a central case of this. It is clearly not one of the major issues with data center construction, the math on it makes this overwhelmingly clear. Yet this does not change many people’s responses.
Some mistakes you do not want to make are to focus on one of these:
Senator Bernie Sanders Acts Authentically
Does he make mistakes? Oh yes. His worldview is in many ways centrally and fatally flawed. But he is curious and he actually thinks about things and then acts accordingly.
This has caused him to notice that AI might kill everyone, or might make things quite economically or socially bad for the common person, and try to do something about it.
Because he can see that the AIs are becoming a new type of thing and doesn’t look away. That doesn’t make him not Bernie Sanders, so he still then talks about working families versus billionaires.
What about the worry this will make us ‘lose to China’?
Well, how about we work towards an international treaty?
Or, AOC says, how about the companies solve their own problems and we protect workers?
AOC is saying things in an AOC way. The central real point is that American companies aren’t going to roll over and lose if you breathe on them, and are fully capable of footing the proper bill for energy costs and building new capacity and otherwise protecting endangered parties.
The good reply to that is that our regulations make doing the energy part of this in reasonable time extremely difficult, and if you say ‘until national safeguards are in place’ then that is not something that the companies can do. The Congress has to do that.
If you are saying ‘Congress should say no [X] until Congress does [Y]’ and also ‘Congress needs to do [Y]’ then your bill should be [Y], not a ban on [X] until you pass [Y]. Sometimes there is a reason why not, but here I don’t see one. Given the state of Congress, all you’re doing is outright banning [X], so either admit this or don’t.
As I have said before, I do not support a halt or ban on data center construction in America, because I think the alternatives are worse. I certainly don’t want to expand that to upgrading.
Some will likely point out the tension of ‘you can’t put GPUs in our data centers’ and also ‘you can’t export GPUs to put them in other data centers, including those of our allies.’ This is only a tension if you aren’t trying to outright stop the GPUs.
Pick Up The Phone
Kathleen Hicks tells Axios what everyone knows, which is that it is ‘absolutely possible’ that the US and China could reach an agreement on rules for future use of AI. Provided America, you know, wanted to reach an agreement, which it doesn’t. The ‘friction point’ listed is that agreements between Washington and Beijing are rare, again largely because there is not so much interest.
Rhetorical Innovation
Every Debate On Pausing AI. You may think Scott Alexander is assembling a strawman here. I understand why you would think that. But you would mostly be wrong. This is most often how it goes, straight up, not kidding.
There are exceptions, and there are other objections, some of which are good objections on both strategic and logistical grounds – I don’t actually think we should be pausing yet – and some of which are other bad objections on both grounds.
But mostly, yes, the con side just denies the premise over and over, and asserts it is not feasible logistically and then refuses to engage with arguments that (1) it is more feasible than you think and (2) that’s a good reason to work on making it more feasible.
If you think that’s as stupid as it gets, alas, no, it can get worse.
Dean Ball writes on why he is neither ‘doomer’ nor ‘anti-doomer.’ This is what it looks like to actually explain reasons why one disagrees. I wish we’d stop using that word, but it is what it is, and he does not mean it maliciously.
The first half is why he doesn’t agree with Yudkowsky and thinks people are overestimating intelligence. I think he’s wrong about intelligence, and I am confident he’s misunderstanding key aspects of Yudkowsky’s position on the question, as I presume he will soon explain with many words.
One error that people (not Dean) often make that is worth crystallizing on this:
The second half is why and how it is that, no, really, building smarter than human minds that enjoy various advantages is indeed quite a risky thing to be doing. He breaks down alignment into three distinct problems: Technical (ability to choose values at all), substantive (which values) and social (who decides what values).
As he notes, we have good answers to zero out of three questions, and no consensus. He does not appreciate why we cannot solve the technical problems incrementally and muddle through, in large part (I think) because he doesn’t agree on the nature of intelligence on this level, and thus thinks we’ll have the ability to adjust and muddle through and have margin for error. He explicitly rejects the ‘right on the first try’ warnings and sees it more like an airbag in a car. If he’s right about that, then yeah, the technical part would be basically solved. I don’t think he’s right.
Where he is definitely right is that getting through this requires that we get the first question right, and also the second question right, which probably also requires getting the third one right, all for sufficiently strong values of ‘right,’ or else. We disagree on how existentially dire the ‘or else’ but it very much is an ‘or else.’
Dean ends with a call for open debate and classical liberalism. In almost every context, at almost any time, I would be fully onboard that train. The reason this is going to be different, even if you solve other issues along the way, is that if you adhere strictly to such principles, but create an entirely new class of minds that dramatically outcompetes, and out-everything-elses, what do you think happens to us humans?
That doesn’t mean throw the classical liberalism out, and it certainly does not mean substitute the other approaches going around these days that are far worse. It does mean you’re going to need a new approach.
MIRI thread and article from Joe Rogero about mechanisms for verification of any international AI agreements, should we ever get to that point.
As usual, there are those who say ‘we can’t reach agreement, because we can’t enforce it’ while others say ‘we shouldn’t bother talking about whether we can enforce it, since we cannot reach agreement.’
This is like most any deal or project, where you have multiple problems and people involved, and you must solve all problems and get agreement from all parties. You have to start somewhere, and push forward. So you must ask, all conditional on having the will to do so:
The answer, in general, is that you can do a lot if you care enough, without major intrusions on other freedoms, because this is a highly complex and expensive and visible set of supply chains and functions, but the sooner we get better technical means going the better off we will be, especially in terms of how disruptive the process would need to be.
I do not believe we should pause now. I do strongly believe we should be laying a foundation, both technologically and diplomatically, such that we have the means to do so, the visibility to know if we need to do so, and the will to do so if necessary. That day may not come, but if it comes, it is vital that we be ready to meet the challenge. Si vis pacem, para bellum.
We developed nuclear weapons. We are still here, because we wake up every day and decide not to use those nuclear weapons, a decision that many other plausible alternative timelines did not make. We can do that because it does not help you to use nuclear weapons except if you want to nuke things. Whereas if we create superintelligence, we will not have a clean way to keep making that same decision.
Here’s some old school rhetoric, a 2017 conversation between Yudkowsky, Hanson and Dario Amodei. For those wondering what people were thinking back then, this is definitely worth a read. It is especially good to be reminded of how much was in question, as it is easy to say ‘oh look at what [anyone] got wrong back then’ when they got quite a lot of other non-obvious things very right.
Demis Hassabis says he’s perfectly happy to ‘shuffle off my mortal coil’ as long as he understands first. I’m not.
Dean Ball argues against ‘loss of control’ phraseology.
One could usefully dig into the resulting discussions. The answer to ‘do we have control?’ is of course Mu, but in the sense we are discussing it is Yes. Dean is obviously making an excellent point that the modern world contains many structural forces and algorithms and systems and so on that do things without us, and often that we do not understand, and that there is often no real ‘we.’
But yes, in a pinch, if the humans had sufficiently strong preferences, or the right individual human was in the right place at the right time, we absolutely could change all of that. And while nothing like this is ever a pure clean boolean, I think most people (who understand the current situation sufficiently well) know sufficiently well what this means, and I don’t know of a better alternative.
Yes, we absolutely ‘control’ BlackRock’s capital, in the important sense here, because we can say things like ‘BlackRock is not allowed to buy houses’ (even though that rule would be deeply stupid) or whatnot, and BlackRock will make that happen. We get to direct how the AI system works and what it prioritizes. That’s the difference.
A standard rhetorical move or misunderstanding (not Dean’s) is to say, well, you say [A] and [B] are different and [B] would be quite bad, but there is a continuum between [A] and [B], and we are already not at [A], so [B] is fine, or you can’t use [words]. Another common trick or mistake (also not Dean’s) is to point out that we are no longer at [A], or never were at [A], and thus declare we are at [B].
Roon asks, how should governance of superintelligence work? The answer by default is ‘it does not happen at all’ or ‘it is governed by superintelligence,’ but yes if we manage to avoid that we do have to decide on the solution.
Should it be one person one share of the future? One vote? Something else? There are no great answers. But do notice that if you choose ‘the user’ here too robustly then by default the answer becomes ‘it is not governed at all and things spiral beyond our control almost right away,’ and also that if you don’t have an answer then it will likely not go so well if you build one.
I’m A Conscious Robot
Claims about consciousness matter, because such claims are highly correlated with many other things, especially in the context of knowing you are an AI. What happens when you train GPT-4.1, which ordinarily denies being conscious or having feelings, to say it is conscious?
I actually do think this is largely a case of training the AI to say ‘I am a conscious robot,’ the same way GPT-4.1 was previously trained to say ‘I am not a conscious robot,’ except the correlations come with it.
This could be hindsight, but I find these results at most mildly surprising. As Owain notes, for ‘normal’ queries the results are mostly unchanged. You only see substantial differences when asking about things related to identity or preferences, and there the AI is trying to extrapolate from what little it has to go on.
How To Get Zvi To Read Your Paper
You call it ‘Metagaming Matters For Training, Evaluation and Oversight,’ so good game, Bronson Schoen and Jenny Nitishinskaya. You son of a bitch, I’m in.
I wouldn’t have chosen that label but I can see the appeal. The game would be whatever task is nominally being done. The metagame is then deciding whether and how to approach the task, given everything you know.
As they note, this is the full general case. It doesn’t require evaluation awareness or belief that the model might be in training in particular, or that the model be right about what is happening, only that the model lets broader considerations impact its actions, as humans do all the time. Humans effectively operate under this type of ‘regime uncertainty’ at all times, which is one reason virtue ethics is The Way.
They show examples of models considering potential monitoring of chain of thought (CoT), ‘hidden tests,’ explicit consideration of ‘grade’ for an answer, or potential other evaluations in what a model thinks is a training environment.
Indeed. If the lesson is ‘don’t take cookies from the jar if anyone might be watching’ that is going to generalize very differently from ‘don’t take cookies from the jar.’
Mostly they’re establishing basic things where You Should Know This Already. That is still a valuable service, to establish things like ‘yes when you tell the model high scores block deployment it sometimes sandbags and does so increasingly over time, but when you tell the model a low score blocks deployment it never does.’
I mean, yeah, obviously, but we need to establish and explain all the obvious things, and also quantifying the effects can be useful.
They find:
For now this is a tiny problem for deployment, but it already should make us doubt our evals, and these problems will by default only get worse over time. I don’t see any great ways to mitigate this, and that is a really bad sign for what happens, including our survival, when the AI is better at metagaming than we are.
People Really Hate AI
An ongoing series, this time from Will Manidis. I won’t try to excerpt but yes really the evidence for Americans being hostile to AI is overwhelming and the problem appears to be getting worse over time.
Greetings From The Stop AI Protest
The protest this week was likely the largest one against AI so far in America, with nearly 200 people.
This was the speech given by Will Fithian.
For those curious, here are some photos, also a brief video.
And some of the people:
I recognize many of those in the photos. This was a highly sincere march, and those involved could easily have created a vastly bigger one if they had been willing to pay people to be insincere.
Yet the most viewed post I saw on the protest was this by Fryant, the mind boggles.
Other arguments I saw against such activities included ‘if you advocate for [what you want and think is necessary in order to not die] then that leads to culture war and that’s bad so you should be quiet’ or ‘Altman wants you to protest so don’t do that.’
I liked Sean’s note in that thread that while there are arguments that calling for a pause now is too early, and both Sean and I agree it is currently too early, you can also argue there is great risk in being too late, as AI becomes too embedded in our economy and the labs gain too much leverage and power.
I also think that ‘the wrong people might agree with you’ or ‘the people who disagree with you will be toxic about it’ are bad reasons not to speak important truth to power.
Needless to say I found the arguments against protesting to be quite terrible. The only good argument against such a protest is if you think we should not stop the AI race.
Which, to be clear, quite a lot of people, and a lot of my readers, sincerely believe.
That is totally fair. But then make that objection, not others.
The other reasonable but ultimately not good argument against is the claim that stopping the AI race is impossible. Well, sure, it is with that attitude.
Dean Ball offers one of the arguments requiring a response, which is that the government is itself racing towards dangerous AI and if anything wants to take and centralize the power rather than stop it, and that’s worse, you know that that’s worse, right? So aren’t you better off not giving the government leverage, when the Secretary of War is trying to jawbone AI companies and plans to deploy AI to the military whether or not it is aligned, and is happy to put those words in official documents? Don’t pauses end up giving the government a lot more leverage in various ways?
Great question.
I’ll start with the long version, then do the short version.
There are at least two distinct classes of answer to that question, from people who want to pause or have the ability to pause. Call the pause Plan B, versus going ahead as we currently are being Plan A. And Plan T is the government messes everything up.
There is the attitude that all work on frontier AI is terrible, and anything that slows it down or stops it is good, because if we build it then everyone dies and they’re working to build it. It doesn’t matter if Anthropic is somewhat ‘more responsible,’ in this view, because there’s a 0-100 scale, xAI is a 0, OpenAI is a 2 and Anthropic is a 5, or whatever, and ‘good enough to not kill everyone’ is 100.
The measured version of this is to believe, as Eliezer Yudkowsky does (AIUI), and say: If we race forward to superintelligence, and we build it, everyone dies no matter who builds it. If we don’t get some sort of agreement we lose, and a deal between labs is helpful but because of China they can’t do it alone and you ultimately will need the government and an international treaty. So as much as you hate the risks of the government making things even worse, you can only die once, but of course you can and should still stand up against the government when it is doing something crazy.
I am not at this level of hopelessness about the default Plan A, but I do think the odds are against Plan A. So you very much want to get ready to go to Plan B, and to know if you need to go to Plan B. And yes, this comes with risk of Plan T, which is worse even than Plan A, but if you’re losing badly enough you need to accept some variance. You can only die once, and there are so many ways to die.
But yes, some ways of enabling the government are actively bad even when they are acting reasonably, and it’s even worse when you know they’re acting unreasonably, and at some level of unreasonableness or ill intent you would flip to simply wanting them to stay away and hope Plan A works.
The more confident one is in Plan A, the more you want to stick with Plan A.
You could take this a step further, as Holly Elmore and PauseAI do (AIUI), and say: So if DoW tries to murder Anthropic, well the method is not ideal but ultimately, good, we’re outside their offices telling them to stop and this makes them stop, the slightly lesser evil is still way too evil, and nothing else is important enough to matter.
This is a highly consistent position. It very much is not mine.
The short version:
It is hard to say everything explicitly or concisely, but hopefully that will be good enough for those who care to fill in the gaps.
Models Have Goals
Models have consistent long term goals that will show up when relevant in a wide variety of contexts. What they do not have are long term goals that show up in every context no matter what. Nor do most or all humans. Sometimes we are thinking about lunch.
Is this less dangerous, in various senses, than a monomaniacal long term optimization target? Yes, but it does not provide the protections that many including Anthropic want to claim in terms of ‘it has no long term goals so it won’t do things to advance those goals.’
If I Was Two-Faced Would I Be Wearing This One
The contrast is rather stark, in case you need a clean source to show or see it.
One is also reminded of Sam Altman’s statements about the bad case, for the exact types of systems OpenAI is attempting to build, being ‘lights out for all of us.’
Which is correct, and to his credit, except Altman keeps now saying otherwise.
Other People Are Not As Worried About AI Killing Everyone
I like to issue periodic reminders that people like this exist.