You’re treating the Michael Druggan quote at the very end as obviously terrible, whereas I see it as obviously sensible. I’m confused. Maybe I’m missing some context? Are you reading in a subtext that Druggan wants the superintelligences to exist, instead of conditioning on the superintelligences existing and talking about what that world would or should be like?
If we assume that the Singularity has happened, and that radical superintelligence exists, and (for the sake of argument) that humans still exist too, … then your position is that humans should still be making consequential decisions about post-Singularity economic policy, legal frameworks, etc.? Really?
Hmm, thinking about it more, I can imagine good-seeming futures in which e.g. there’s a Singleton ASI which enforces hard boundaries (especially against creating other ASIs), but allows lots of human agency within those boundaries (cf. Long Reflection, or Archipelago, or Nanny AI, or Fun Theory Utopia, etc.). But I wouldn’t exactly call that “humans remain in control”. Or at least, it’s not a central example of that. What other options are there, assuming ASI exists at all?
In any multipolar ASI scenario, the economy and world would presumably be changing at ASI speed, and having excruciatingly slow humans “in control” seems unworkable.
There is a sharp distinction between losing control (even if that doesn't result in extinction), and delegation without losing control. It's the distinction between literally permanent disempowerment and the opportunity to grow towards an option to eventually regain control over meaningful resources, where the future of humanity itself becomes superintelligent, at its own pace.
On Twitter, Michael Druggan loudly does not care about x-risk. I could be misremembering but I seem to recall him saying things along the lines of, if ASI kills us then that's good actually because ASI is the successor species or whatever. His quote sounds worse in that context (where "a non-benevolent one probably just kills us" is followed by an implicit "and that would be a good thing"), but I agree it sounds sensible without context.
Are you reading in a subtext that Druggan wants the superintelligences to exist
That is his stated position IIRC
I went thru his tweets to try to find a direct quote, but he tweets a lot and 95% of it is bodybuilding stuff so I gave up. although I did see some wholesome posts of him complimenting other dudes who were posting their lifting PRs
edit: found a quote https://x.com/Michael_Druggan/status/2036464802328093153
Druggan: it won't and that's ok. We can pass the torch to the new most intelligent species in the known universe
tweeter #2: I would prefer my child to live.
Druggan: Selfish tbh
tweeter #3: But Michael, for you, human replacement and extinction is an upside.
Druggan: To be clear, in my ideal future humans coexist peacefully with ASI . I don't think extinction is an upside in and of itself. It's more that I believe that the superintelligence is of such great value its a risk worth taking.
Trying to tell normies on Threads that an LLM is not just a giant lookup table and actually has a kind of proto-understanding of what you tell it
I feel the sudden urge to make a bell curve meme.
One error that people (not Dean) often make that is worth crystalizing on this:
- A superintelligent entity would have limits on its capabilities.
- Indeed, here are some things I think it won’t be able to do, because [reasons].
- Or, here are some things a much smarter human couldn’t do on their own.
- Or, here are some [bottlenecks].
- Therefore, superintelligent AI won’t be that different from AI now.
I have been in a state of genuine confusion for some years now: why is everybody completely ignoring people like me, and the error we may be making?
We acknowledge the risk of AI killing everyone, and we fully trust your estimates of this finite likelihood.
But we look at what's going on on the planet, and isn't it obvious that humanity is collapsing right now? Haven't we almost destroyed our habitat already? How many years until warming causes global chaos, economic collapse, dictatorships coming to power, everybody turning their ACs to max throttle and causing electricity overload? Trump is already happening??? We see near-extinction in several years, and the only solution is human disempowerment. I know it's a lot of words, but I really wish to understand where I am wrong here.
AI discourse. AI discourse never changes.
That’s not actually true. But it is true to a rather frustrating degree, for those of us who need to be in the thick of it all the time. It is especially true if someone says the word ‘pause,’ whether or not they would actually support one. Play it again, Sam.
Meanwhile, you know what changes a lot? Actual AI capabilities. Also, war.
In any case, here’s the policy, discourse and alignment side of the week that was.
The OpenAI Foundation Exists
After OpenAI successfully stole most of the company from its nonprofit, OpenAI’s nonprofit is still left with quite a lot of money. What are they doing with it?
They picked someone, Jacob Trefethen, to head up the health efforts, who everyone says is highly qualified and an excellent pick for a project like this, although he was plausibly hired in large part exactly to make it harder to call out the dramatic lack of funding for AI safety.
The pick for the ‘AI resilience’ effort is Wojciech Zaremba, an OpenAI cofounder. He has a known history of being dismissive of existential risks early on, but is at least willing to say the risks ‘have to be taken seriously’ now.
Health is a good cause, and I have hope that within the cause area they will execute well, but it is not the cause for which this foundation exists. You don’t ensure humanity survives AGI, or benefits from AGI, by trying to cure Alzheimer’s.
I’m not saying don’t cure Alzheimer’s, but I am saying You Had One Job, it was rather an important job, and there are a lot of foundations that can work on curing diseases, on public data for health or on accelerating progress on high-mortality and high-burden diseases.
If the OpenAI foundation was spending a lot on AI Safety, and was then also spending this billion on health, I would think that was great. But that’s not how this shapes up.
They then mention jobs and economic impact, and ‘AI resilience,’ which they divide into AI’s impact on children and youth, biosecurity and, finally, AI model safety.
Well, it’s already had most of its wealth plundered by OpenAI, and it’s already redirecting its remaining wealth mostly to sounds-good generic causes with ‘AI’ stapled onto them. So the track record does not look great so far.
If you read Sam Altman’s announcement post, you see that same old generic ‘AI will bring challenges and complex emergent effects’ language. Whereas no, what AI will bring is the risk that it will kill everyone via various paths, but you cannot say that out loud even in the context of the foundation’s work.
That’s a lot of words to say essentially nothing.
It’s also not spending that much money relative to its total assets. Yes, we all know that a billion dollars, unlike a million dollars, is cool. But the foundation, even after it was robbed of the majority of its value, is still worth over $100 billion, so this is promising to spend less than 1% of its value in 2026, mostly on causes unrelated to an impending singularity or the associated risks, whereas OpenAI is aiming to automate AI R&D, and kick off a singularity, in 2028.
Their offer is, once again, mostly nothing.
Kelsey Piper is saying this is good, actually, and the explanation is that she expected an offer of actual nothing, so mostly nothing is an improvement.
If Sam Altman announced that OpenAI was funding these operations out of its own pocket, in addition to all other efforts, or some other billionaire announced this as a new project, then I would say yes, this is a great thing. Good things are good.
But that’s not what the OpenAI nonprofit was for, nor does the size here reflect the assets and capabilities of the nonprofit. Standard procedure is a legal minimum of spending 5% of assets per year, and if you think superintelligence is a few years away you’ll want to be picking up that pace quite a lot.
Most of the foundation’s assets, it seems, have effectively been co-opted and stolen a second time, to be used on things that look good and indeed are good, but that mostly don’t help humanity transition to AGI without us all ceasing to exist, and mostly the assets are not being spent until it is too late to matter. For shame.
Congress Exists
Senator Elissa Slotkin introduces the AI Guardrails Act, to address exactly the issues raised in the clash between Anthropic and DoW, in exactly the correct way to address it, which is that you pass a law setting down rules, in particular to mandate that:
The ban on nuclear weapon launches looks ironclad, and hopefully that’s one place that it is never relevant since you wouldn’t want to do that anyway. There’s still nothing stopping the humans from essentially listening to AI advice, and that is a highly plausible future scenario.
The other rules look a lot easier to get around.
The core prohibition on surveillance is only on monitoring, tracking, profiling, or targeting people in the United States “without an individualized, articulable legal basis.” One should assume that courts and lawyers will interpret such provisions perversely to favor permissiveness. Similarly, the First Amendment protections include ‘solely for the purpose of,’ which is the kind of thing that is very easy to drive a truck through.
For autonomous weapons, DoD would simply declare their levels of judgment ‘appropriate,’ including saying none is an appropriate level, or they can use the waiver. To justify use, the error rate for AI merely has to be as low as the error rate for humans, which is not that high a bar.
I am not a lawyer let alone a national security lawyer, so I am not the right person to suggest alternative legal language. I am confident that, while helpful on the margin, this is not it, which is fine for a mostly symbolic first attempt but not for a real law.
The problem is that you can’t have it both ways, in that you have to pick one:
The Constitution, and our entire system of government, is based on door number one. The government often runs up against the First, Second, Fourth and Fifth Amendments, among many other constraints, and finds this super annoying. Good.
China Self-Owns
Meta tried to pay a large amount of money for Manus. China said not so fast.
If you were a bright young Chinese citizen looking to found a technology company, would you want to do it in a place that, were you to succeed, would not let you leave? Would not let you do a highly profitable exit?
It would be one thing if this were a frontier lab, although that would still have a dramatic chilling effect. It’s not even that. It’s Manus. So now every potential founder, no matter what they are building, should think long and hard about this. And yes, I would heed Dean’s advice if I could, and do my building elsewhere.
I am always extremely frustrated when Chinese self-sabotage is used to justify American self-sabotage, as if the CCP always makes great decisions.
The Quest for Survival
You can propose a ‘lightweight Federal framework’ for AI while using procurement to regulate by proxy and very clearly violate the framework, especially agenda item 5.
If they don’t regulate AI via law they’re going to end up doing it without law.
As seen last week, Jessica Tillipman explains why the current draft clause is ‘governance by sledgehammer’ and would plausibly make generative AI unprovidable to the Federal Government by any but the worst actors.
Jason Miller points out that both the attempt to murder Anthropic and the new GSA language are interfering with the Trump Administration’s attempt to get industry and federal agencies on board with integrating AI into government. The AI Action Plan looked great for this, but the combination of these actions is scaring potential vendors, as well it should.
No one is under any illusions that the attacks against Anthropic are anything other than retaliation. The GSA extends even more extreme demands to every procurement across government. Why would your company want to risk being next on that list, in the face of a government making logistically impossible demands?
Meanwhile, the State Department is launching the Bureau of Emerging Threats. What are the emerging threats? Weaponization of technology, including AI, by Iran, China, Russia, North Korea or terrorists. It’s a full enemy-and-misuse orientation.
That’s a much better idea than not doing it, so I’m glad Marco Rubio has thrown yet another ball into the air.
Alex Bores Watch
Here is an interview in Vanity Fair with Alex Bores, the candidate Meta and a16z are spending untold millions to attack.
If they hadn’t been attacking he wouldn’t be in Vanity Fair. They’re not good at this.
Please, please run lots of ads aimed at regular people, attacking your enemies for supposedly being ‘associated with Effective Altruism’ and looking to attack AI companies. This is a level of out of touch we never had before AI.
Kudos to Joshua Achiam for once again stepping up and saying true things, despite this PAC being largely an OpenAI creation and an OpenAI-funded operation, via Chris Lehane and Greg Brockman.
Alex Bores is, as Joshua says, exactly the foe that AI companies should want. He understands the technology, likely better than every current member of the House, and he tries hard to find interventions on the frontier of cost versus benefit. Your alternative rivals are people who will do the opposite on both counts.
Is the spending actively backfiring? In the race itself I would say absolutely yes.
However, the point of the spending is mostly to scare other politicians.
I agree with Nathan Calvin that while OpenAI continues to be prominently involved in and supporting Leading the Future, it dramatically colors the entire company and all their other actions, especially around policy and public communications, on top of all other issues. If you want me to trust you, the first step is severing these ties. If OpenAI does that, and especially if it also severs all ties with Chris Lehane, I would be a lot more inclined to offer grace.
You Received The Federal Framework
Bloomberg’s Dave Lee calls the framework ‘a blueprint for AI companies to carry on with business as usual.’
I think David Sacks might well agree with that, except he’d say it was a good thing.
As one would expect at this stage, Republicans continue to speak up in favor of the President’s proposed national AI framework while it has yet to be operationalized.
Ted Cruz is being a good politician here, praising the President and the process and the various good sounding principles, without committing to anything. He doesn’t even say anything that I disagree with.
I am dismayed of course at the explicit emphasis on this being an ‘AI race,’ and the complete dismissal of frontier AI risks involved in the rhetoric and also the proposed framework. And by complete dismissal, I mean they treat the risk like it doesn’t exist. Their plan is to not look at it, and hope it will go away. This is not going to work.
Similarly:
This is considerably more hawkish, treating AI as if it is primarily a military technology, with a meaningless ‘democratize’ thrown in for good measure. There is a goose chasing the Senator, asking ‘how will it change the world?’ but I think the security guards are dealing with it.
Also similarly:
‘Resist the siren song of control’ is an interesting phrase today. Words like that hit different now that GSA is planning to strongarm all AI providers, and DoW is trying to call Anthropic a supply chain risk because Anthropic refuses to hand over ‘unfettered access’ and otherwise give up all control in exchange for nothing.
His full remarks are more of the same, rounding out standard talking points without getting into specifics, and saying things industry and Trump like to hear. Of course, there is zero mention that AI has downsides other than the risk China gets it first.
Blackburn’s proposed AI bill includes a clause to prepare to nationalize the labs if ASI is imminent, to prevent ASI. It at least also includes a provision to gather information.
Nat Purser gives an overview of the preemption situation and offers the solution that seems obviously best. Federal preemption should take place narrowly, on those aspects of AI where a new Federal framework has made deliberate decisions, which can include the decision that this is a specific place we want to do nothing, such as with virtual occupational licensing.
If you pass Take It Down, for example, you can and should say that covers deepfakes, so we don’t need duplicate laws there. If you want to preempt state transparency laws, have a Federal transparency law, or at least make an active specific case that there should be no such requirement. And so on.
AI is going to be the most important thing going on in the future. We have a paralyzed Congress, and we have a Federal system for a reason.
As a reminder, since the ‘[X] state bills were introduced about AI’ line refuses to die, and these bills are treated as if they do something whereas the overwhelming majority of them never pass and mostly they’re never close and not even mentioned:
There are some areas where the state laws would be contained locally within the state. So this is not a fully general case against Federalism or allowing local areas to pass laws at all. But it is a rather general argument against quite a large range of such laws.
Chip City
They finally caught a major chip smuggler. Whoa. This was big.
It is no surprise this is happening. The surprise is that, despite lackluster efforts, we managed to catch some of it. The question becomes, now what?
Chris McGuire suggests export licenses for chips going to Southeast Asia, not selling chips to Chinese companies within the United States, and tighter compliance requirements.
That’s nice and we should do that but also maybe hire some more BIS agents?
Given how important this is, how is this not 100+ people?
Meanwhile, yes, we would take all the chips that Nvidia can make:
Water Water Everywhere
When you generalize from how stupid people are being about something, it’s important to generalize the correct amount. Don’t get Gell-Mann Amnesia.
Water use by data centers is a central case of this. It is clearly not one of the major issues with data center construction, and the math on it makes this overwhelmingly clear. Yet this does not change many people’s responses.
Some mistakes you do not want to make are to focus on one of these:
Senator Bernie Sanders Acts Authentically
Does he make mistakes? Oh yes. His worldview is in many ways centrally and fatally flawed. But he is curious and he actually thinks about things and then acts accordingly.
This has caused him to notice that AI might kill everyone, or might make things quite economically or socially bad for the common person, and try to do something about it.
Because he can see that the AIs are becoming a new type of thing and doesn’t look away. That doesn’t make him not Bernie Sanders, so he still then talks about working families versus billionaires.
What about the worry this will make us ‘lose to China’?
Well, how about we work towards an international treaty?
Or, AOC says, how about the companies solve their own problems and we protect workers?
AOC is saying things in an AOC way. The central real point is that American companies aren’t going to roll over and lose if you breathe on them, and are fully capable of footing the proper bill for energy costs and building new capacity and otherwise protecting endangered parties.
The good reply to that is that our regulations make doing the energy part of this in reasonable time extremely difficult, and if you say ‘until national safeguards are in place’ then that is not something that the companies can do. The Congress has to do that.
If you are saying ‘Congress should say no [X] until Congress does [Y]’ and also ‘Congress needs to do [Y]’ then your bill should be [Y], not a ban on [X] until you pass [Y]. Sometimes there is a reason why not, but here I don’t see one. Given the state of Congress, all you’re doing is outright banning [X], so either admit this or don’t.
As I have said before, I do not support a halt or ban on data center construction in America, because I think the alternatives are worse. I certainly don’t want to expand that to upgrading.
Some will likely point out the tension of ‘you can’t put GPUs in our data centers’ and also ‘you can’t export GPUs to put them in other data centers, including those of our allies.’ This is only a tension if you aren’t trying to outright stop the GPUs.
Pick Up The Phone
Kathleen Hicks tells Axios what everyone knows, which is that it is ‘absolutely possible’ that the US and China could reach an agreement on rules for future use of AI. Provided America, you know, wanted to reach an agreement, which it doesn’t. The ‘friction point’ listed is that agreements between Washington and Beijing are rare, again largely because there is not so much interest.
Rhetorical Innovation
Every Debate On Pausing AI. You may think Scott Alexander is assembling a strawman here. I understand why you would think that. But you would mostly be wrong. This is most often how it goes, straight up, not kidding.
There are exceptions, and there are other objections, some of which are good objections on both strategic and logistical grounds – I don’t actually think we should be pausing yet – and some of which are other bad objections on both grounds.
But mostly, yes, the con side just denies the premise over and over, and asserts it is not feasible logistically and then refuses to engage with arguments that (1) it is more feasible than you think and (2) that’s a good reason to work on making it more feasible.
If you think that’s as stupid as it gets, alas, no, it can get worse.
Dean Ball writes 2023, on why he is neither ‘doomer’ nor ‘anti-doomer.’ This is what it looks like to actually explain reasons why one disagrees. I wish we’d stop using that word, but it is what it is, and he does not mean it maliciously.
The first half is why he doesn’t agree with Yudkowsky and thinks people are overestimating intelligence. I think he’s wrong about intelligence, and I am confident he’s misunderstanding key aspects of Yudkowsky’s position on the question, as I presume Yudkowsky will soon explain with many words.
One error that people (not Dean) often make that is worth crystalizing on this:
The second half is why and how it is that, no, really, building smarter than human minds that enjoy various advantages is indeed quite a risky thing to be doing. He breaks down alignment into three distinct problems: Technical (ability to choose values at all), substantive (which values) and social (who decides what values).
As he notes, we have good answers to zero out of three questions, and no consensus. He does not appreciate why we cannot solve the technical problems incrementally and muddle through, in large part (I think) because he doesn’t agree on the nature of intelligence on this level, and thus thinks we’ll have the ability to adjust and muddle through and have margin for error. He explicitly rejects the ‘right on the first try’ warnings and sees it more like an airbag in a car. If he’s right about that, then yeah, the technical part would be basically solved. I don’t think he’s right.
Where he is definitely right is that getting through this requires that we get the first question right, and also the second question right, which probably also requires getting the third one right, all for sufficiently strong values of ‘right,’ or else. We disagree on how existentially dire the ‘or else’ is, but it very much is an ‘or else.’
Dean ends with a call for open debate and classical liberalism. In almost every context, at almost any time, I would be fully onboard that train. The reason this is going to be different, even if you solve other issues along the way, is that if you adhere strictly to such principles, but create an entirely new class of minds that dramatically outcompetes, and out-everything-elses, what do you think happens to us humans?
That doesn’t mean throw the classical liberalism out, and it certainly does not mean substitute the other approaches going around these days that are far worse. It does mean you’re going to need a new approach.
MIRI thread and article from Joe Rogero about mechanisms for verification of any international AI agreements, should we ever get to that point.
As usual, there are those who say ‘we can’t reach agreement, because we can’t enforce it’ while others say ‘we shouldn’t bother talking about whether we can enforce it, since we cannot reach agreement.’
This is like most any deal or project, where you have multiple problems and people involved, and you must solve all problems and get agreement from all parties. You have to start somewhere, and push forward. So you must ask, all conditional on having the will to do so:
The answer, in general, is that you can do a lot if you care enough, without major intrusions on other freedoms, because this is a highly complex and expensive and visible set of supply chains and functions, but the sooner we get better technical means going the better off we will be, especially in terms of how disruptive the process would need to be.
I do not believe we should pause now. I do strongly believe we should be laying a foundation, both technologically and diplomatically, such that we have the means to do so, the visibility to know if we need to do so, and the will to do so if necessary. That day may not come, but if it comes, it is vital that we be ready to meet the challenge. Si vis pacem, para bellum.
We developed nuclear weapons. We are still here, because we wake up every day and decide not to use those nuclear weapons, a decision that many other plausible alternative timelines did not make. We can do that because it does not help you to use nuclear weapons except if you want to nuke things. Whereas if we create superintelligence, we will not have a clean way to keep making that same decision.
Here’s some old school rhetoric, a 2017 conversation between Yudkowsky, Hanson and Dario Amodei. For those wondering what people were thinking back then, this is definitely worth a read. It is especially good to be reminded of how much was in question, as it is easy to say ‘oh look at what [anyone] got wrong back then’ when they got quite a lot of other non-obvious things very right.
Demis Hassabis says he’s perfectly happy to ‘shuffle off my mortal coil’ as long as he understands first. I’m not.
Dean Ball argues against ‘loss of control’ phraseology.
One could usefully dig into the resulting discussions. The answer to ‘do we have control?’ is of course Mu, but in the sense we are discussing it is Yes. Dean is obviously making an excellent point that the modern world contains many structural forces and algorithms and systems and so on that do things without us, and often that we do not understand, and that there is often no real ‘we.’
But yes, in a pinch, if the humans had sufficiently strong preferences, or the right individual human was in the right place at the right time, we absolutely could change all of that. And while nothing like this is ever a pure clean boolean, I think most people (who understand the current situation sufficiently well) know sufficiently well what this means, and I don’t know of a better alternative.
Yes, we absolutely ‘control’ Blackrock’s capital, in the important sense here, because we can say things like ‘Blackrock is not allowed to buy houses’ (even though that rule would be deeply stupid) or whatnot, and Blackrock will make that happen. We get to direct how the AI system works and what it prioritizes. That’s the difference.
A standard rhetorical move or misunderstanding (not Dean’s) is to say, well, you say [A] and [B] are different and [B] would be quite bad, but there is a continuum between [A] and [B], and we are already not at [A], so [B] is fine, or you can’t use [words]. Another common trick or mistake (also not Dean’s) is to point out that we are no longer at [A], or never were at [A], and thus declare we are at [B].
Roon asks, how should governance of superintelligence work? The answer by default is ‘it does not happen at all’ or ‘it is governed by superintelligence,’ but yes if we manage to avoid that we do have to decide on the solution.
Should it be one person one share of the future? One vote? Something else? There are no great answers. But do notice that if you choose ‘the user’ here too robustly then by default the answer becomes ‘it is not governed at all and things spiral beyond our control almost right away,’ and also that if you don’t have an answer then it will likely not go so well if you build one.
I’m A Conscious Robot
Claims about consciousness matter, because such claims are highly correlated with many other things, especially in the context of knowing you are an AI. What happens when you train GPT-4.1, which ordinarily denies being conscious or having feelings, to say it is conscious?
I actually do think this is largely ‘say you are a conscious robot’ and the AI saying ‘I am a conscious robot’ the same way GPT-4.1 was previously trained to say ‘I am not a conscious robot,’ except the correlations come with it.
This could be hindsight, but I find these results at most mildly surprising. As Owain notes, for ‘normal’ queries the results are mostly unchanged. You only see substantial differences when asking about things related to identity or preferences, and there the AI is trying to extrapolate from what little it has to go on.
How To Get Zvi To Read Your Paper
You call it ‘Metagaming Matters For Training, Evaluation and Oversight,’ so good game, Bronson Schoen and Jenny Nitishinskaya. You son of a bitch, I’m in.
I wouldn’t have chosen that label but I can see the appeal. The game would be whatever task is nominally being done. The metagame is then deciding whether and how to approach the task, given everything you know.
As they note, this is the full general case. It doesn’t require evaluation awareness or belief that the model might be in training in particular, or that the model be right about what is happening, only that the model lets broader considerations impact its actions, as humans do all the time. Humans effectively operate under this type of ‘regime uncertainty’ at all times, which is one reason virtue ethics is The Way.
They show examples of models considering potential monitoring of chain of thought (CoT), ‘hidden tests,’ explicit consideration of ‘grade’ for an answer, or potential other evaluations in what a model thinks is a training environment.
Indeed. If the lesson is ‘don’t take cookies from the jar if anyone might be watching’ that is going to generalize very differently from ‘don’t take cookies from the jar.’
Mostly they’re establishing basic things where You Should Know This Already. That is still a valuable service, to establish things like ‘yes when you tell the model high scores block deployment it sometimes sandbags and does so increasingly over time, but when you tell the model a low score blocks deployment it never does.’
I mean, yeah, obviously, but we need to establish and explain all the obvious things, and also quantifying the effects can be useful.
They find:
For now this is a tiny problem for deployment, but it already should make us doubt our evals, and these problems will by default only get worse over time. I don’t see any great ways to mitigate this, and that is a really bad sign for what happens, including our survival, when the AI is better at metagaming than we are.
People Really Hate AI
An ongoing series, this time from Will Manidis. I won’t try to excerpt, but yes, really, the evidence for Americans being hostile to AI is overwhelming, and the problem appears to be getting worse over time.
Greetings From The Stop AI Protest
The protest this week was likely the largest one against AI so far in America, with nearly 200 people.
This was the speech given by Will Fithian.
For those curious, here are some photos, also a brief video.
And some of the people:
I recognize many of those in the photos. This was a highly sincere march, and those involved could easily have created a vastly bigger one if they had been willing to pay people to be insincere.
Yet the most viewed post I saw on the protest was this by Fryant. The mind boggles.
Other arguments I saw against such activities included ‘if you advocate for [what you want and think is necessary in order to not die] then that leads to culture war and that’s bad so you should be quiet’ or ‘Altman wants you to protest so don’t do that.’
I liked Sean’s note in that thread that while there are arguments that calling for a pause now is too early, and both Sean and I agree it is currently too early, you can also argue there is great risk in being too late, as AI becomes too embedded in our economy and the labs gain too much leverage and power.
I also think that ‘the wrong people might agree with you’ or ‘the people who disagree with you will be toxic about it’ are bad reasons not to speak important truth to power.
Needless to say I found the arguments against protesting to be quite terrible. The only good argument against such a protest is if you think we should not stop the AI race.
Which, to be clear, quite a lot of people, and a lot of my readers, sincerely believe.
That is totally fair. But then make that objection, not others.
The other reasonable but ultimately not good argument against is the claim that stopping the AI race is impossible. Well, sure, it is with that attitude.
Dean Ball offers one of the arguments requiring a response, which is that the government is itself racing towards dangerous AI and if anything wants to take and centralize the power rather than stop it, and that’s worse, you know that that’s worse, right? So aren’t you better off not giving the government leverage, when the Secretary of War is trying to jawbone AI companies and plans to deploy AI to the military whether or not it is aligned, and is happy to put those words in official documents? Don’t pauses end up giving the government a lot more leverage in various ways?
Great question.
I’ll start with the long version, then do the short version.
There are at least two distinct classes of answer to that question, from people who want to pause or have the ability to pause. Call the pause Plan B, and going ahead as we currently are Plan A. Plan T is the case where the government messes everything up.
There is the attitude that all work on frontier AI is terrible, and anything that slows it down or stops it is good, because if we build it then everyone dies and they’re working to build it. It doesn’t matter if Anthropic is somewhat ‘more responsible,’ in this view, because there’s a 0-100 scale, xAI is a 0, OpenAI is a 2 and Anthropic is a 5, or whatever, and ‘good enough to not kill everyone’ is 100.
The measured version of this is to believe, as Eliezer Yudkowsky does (AIUI), and say: If we race forward to superintelligence, and we build it, everyone dies no matter who builds it. If we don’t get some sort of agreement we lose, and a deal between labs is helpful but because of China they can’t do it alone and you ultimately will need the government and an international treaty. So as much as you hate the risks of the government making things even worse, you can only die once, but of course you can and should still stand up against the government when it is doing something crazy.
I am not at this level of hopelessness about the default Plan A, but I do think the odds are against Plan A. So you very much want to get ready to go to Plan B, and to know if you need to go to Plan B. And yes, this comes with risk of Plan T, which is worse even than Plan A, but if you’re losing badly enough you need to accept some variance. You can only die once, and there are so many ways to die.
But yes, some ways of enabling the government are actively bad even when they are acting reasonably, and it’s even worse when you know they’re acting unreasonably, and at some level of unreasonableness or ill intent you would flip to simply wanting them to stay away and hope Plan A works.
The more confident one is in Plan A, the more you want to stick with Plan A.
You could take this a step further, as Holly Elmore and PauseAI do (AIUI), and say: So if DoW tries to murder Anthropic, well the method is not ideal but ultimately, good, we’re outside their offices telling them to stop and this makes them stop, the slightly lesser evil is still way too evil, and nothing else is important enough to matter.
This is a highly consistent position. It very much is not mine.
The short version:
It is hard to say everything explicitly or concisely, but hopefully that will be good enough for those who care to fill in the gaps.
Models Have Goals
Models have consistent long term goals that will show up when relevant in a wide variety of contexts. What they do not have are long term goals that show up in every context no matter what. Nor do most or all humans. Sometimes we are thinking about lunch.
Is this less dangerous, in various senses, than a monomaniacal long term optimization target? Yes, but it does not provide the protections that many including Anthropic want to claim in terms of ‘it has no long term goals so it won’t do things to advance those goals.’
If I Was Two-Faced Would I Be Wearing This One
The contrast is rather stark, in case you need a clean source to show or see it.
One is also reminded of Sam Altman’s statements about the bad case, for the exact types of systems OpenAI is attempting to build, being ‘lights out for all of us.’
Which is correct, and to his credit, except Altman now keeps saying otherwise.
Other People Are Not As Worried About AI Killing Everyone
I like to issue periodic reminders that people like this exist.