Three Labs With a Plan and A Memorandum

Zvi

The big story today is the release of Claude Fable 5, the version of Claude Mythos that Anthropic believes they can safely distribute to the people. You should absolutely be switching over to that model and trying it out. But as always, this blog does not rush into commenting on a new model until we have a few days to play around with it and see what our new baby can (and can’t) do. This will be no exception, and coverage of Fable in earnest will start Friday or Monday.

Today I instead bring you several related stories around policies and plans for AI, that came out before the Fable announcement.

First we have the Administration giving us an AI memorandum, that I read as an attempt to legally implement ‘Anthropic is fired forever and we will use any models we have for whatever we want no matter what’ combined with some good government and diffusion plans.

Second, OpenAI has come out with a plan for how to ensure AGI benefits everyone. It includes a very strong call for international coordination among key actors to ensure the ability to slow down AI development in the name of doing it safely. This echoes the same call made previously by Anthropic and by Demis Hassabis of Google DeepMind. There is broad support for the idea of preparing for a potential coordinated slowdown.

The rest of the OpenAI proposal here is then concerned with the opposite problem, of concentration of power, and concentrating its rhetoric on that danger and AI’s promise. Notice that the document uses only ‘catastrophic’ risk rather than existential or extinction, and it does not take seriously the idea the need to retain control in the hands of humans, only fearing the wrong humans will command these AIs. And OpenAI’s plan is, very explicitly, AI to go into recursive self-improvement.

I appreciate the honesty, but the inherent contradictions remain, and are not addressed, nor is the failure to address them itself addressed, and so on.

This leads into Joshua Achiam’s claim on Twitter about the difference in philosophy between OpenAI and Anthropic, where Anthropic employees report he is miscategorizing their views, but where he makes a good directional point.

An AI Memorandum

This one is entitled National Security Presidential Memorandum/NSPM-11.

This seems to be a combination of an actual Anthropic ban including on subcontractors, with a potential 1 year delay, a statement of allowing all (legal?) use, and some good governance instructions including adaptation of tech from multiple vendors.

As always Section 1 declares principles.

President Trump: Under my Administration, the United States can and will responsibly accelerate the use of AI across intelligence and warfighting domains in line with American values.

… with full confidence that those tools will be available when they matter most.

Section 2 lays out four pillars: Adoption, Adaptation, Assurance and Accountability.

Adoption and Adaption are straight up good.

Accountability is good. The problem here is via negativa. AI use must be consistent with the Constitution, lawful and authorized, and the responsible people are responsible for that. Great. But as we’ve been over many times, what the national security state thinks is legal, and even what their courts will say is legal, is rather broad. There are limits, but there aren’t that many limits, so combined with Assurance you can be assured they will do pretty much anything they feel like doing.

Assurance is the one to watch.

The national security enterprise shall assure that all AI technologies adopted are designed to be reliable, robust, steerable, and controllable, and that they operate, in accordance with applicable laws, government policies, and guidance.

To protect American warfighters, the national security enterprise shall ensure, through contractual clauses or other means, that no commercial entity or adversary possesses the capability to prevent use of, disable or degrade, or materially modify without Federal Government knowledge and approval, an AI system that our men and women depend on for their missions.

In addition, rigorous security and functionality measures, including testing, evaluation, validation, and verification, shall be implemented to assure the appropriate confidentiality, integrity, reliability, availability, and interoperability of AI systems across the national security enterprise.

The first and third paragraphs should be uncontroversial, although without implementation it is cheap talk. The devil is in paragraph two, where no other entity can, without knowledge and approval, ‘prevent use of, disable, or degrade, or materially modify’ any AI system that ‘our men and women depend upon’ which could be interpreted to include a wide range of systems, including civilian ones.

As in, once you turn this model over to us, we can do whatever the f*** we want with it, and there is nothing you can do about it. Your contract cannot have any enforceable mechanism, should the government decide to ignore your terms of service.

If we didn’t have the history of the DoW-Anthropic confrontation, it would be reasonable to interpret this charitably, as operational security. Given that encounter, this clearly is ‘all lawful use’ minus the word lawful.

Just All Use. It’s cleaner.

The good news is that Section 3 allows them to just issue a waiver and ignore that, and repeat that waiver indefinitely, which seems reasonably likely to happen.

Section 3 asks for an update to DoD directive 3000.09, and for it to be updated yearly, in case their commitment to following it in the OpenAI deal gets in the way of anything.

Then they all but say ‘we will never use Anthropic at DoW again,’ if you ever tried to tell us we can’t do anything we want then begone. And no, our contractors can’t use Anthropic either.

Consistent with roles and responsibilities outlined in the Federal Information Security Modernization Act of 2014 (44 U.S.C. 3551 et seq.), the Secretary of War for systems described in section 3553(e)(2) of that Act, the Director of National Intelligence (DNI) for systems described in section 3553(e)(3) of that Act, and the heads of relevant agencies for systems described in section 3557 of that Act, shall direct, to the maximum extent permissible by law, termination for default or for convenience contracts with companies that have repeatedly demonstrated a pattern of conduct that is inconsistent with policies laid out in section 2 of this memorandum.

This includes contracts under which such companies provide services to the applicable agencies as subcontractors.

The heads of these agencies may establish a waiver process to grant limited exceptions of a defined duration, to exceed no longer than 1 year, where such relationships are necessary to responsibly steward United States national security.

Exceptions may include operational imperatives, test and evaluation arrangements, threat intelligence sharing, and other mission-critical applications, subject to appropriate risk mitigation measures and enhanced oversight.

Except, you know, right now one of those companies can hack anything on the planet, so maybe we’re going to delay that order a bit. As a treat. But a year from now, the NSA will totally stop using Claude, unless a year from now we issue another waiver, because of reasons.

Section 4 calls for onboarding of the most advanced models from what vendors they are willing to use, and helping AI companies do security in various forms, and for analysis of foreign AI tech.

Section 5 is for helping work around barriers to hiring and training, and prioritize R&D and do testing and verification and so on. Sure.

Section 6 is definitions and Section 7 is standard provisions. That’s all we got.

Dean W. Ball: This seems like a solidly smart policy document. Congratulations to all involved!

Divyansh Kaushik: The Administration did a great job with this NSPM. Lots of good stuff in here.

Neil Chilson also seems content.

Vinh Nguyen and Michael Horowitz provide an analysis at CFR that paints this all as highly reasonable and considered policy, a response to government needing this level of trust in its AI systems, and also continuous with Biden’s NSM-25 despite its criticism of NSM-25. They use the term ‘unlawful domestic surveillance’ multiple times, as if to forget that it is completely different from ‘mass domestic surveillance,’ and take the Accountability section remarkably seriously. They don’t seem to see the problems the administration’s position creates, beyond loss of trust with Congress.

Charlie Bullock thinks This Is Fine, mostly, but notices it further undermines the case against Anthropic since it implements the obvious solution of ‘just fire Anthropic.’

I agree that they did a great job of implementing the ‘respect my authoritah and f*** you, Anthropic’ approach and also the good government things.

I don’t think going full that first part is wise, but they disagree. If you take that as a given, then yeah, good job all around I suppose.

Greetings From The Department of War

The Department of War includes the NSA.

Dean W. Ball: SuPpLy ChAiN rIsK

Demetri: SCOOP #NSA is using #Mythos to conduct offensive cyber operations. Anthropic engineers are embedded in the US intelligence agency.

Cristina Criddle: scoop: Anthropic has installed forward deployed engineers in the US National Security Agency to help deploy Claude Mythos for cyber offensive operations w/
@AsiaLens

Yes, the NSA is using Mythos for offensive cyber operations, because it is the NSA.

dave kasten: Interesting that it’s confirmed, although I basically assumed this was happening.

Lab With a Plan

OpenAI gives us its plan to ensure AGI benefits everyone.

It includes one very welcome statement, calling for international organization to enable slowing frontier development of AI in the name of catastrophic risks, although they do not dare say ‘existential’ or ‘extinction’ here.

The document is a strange beast. It simultaneously does and does not take intelligence seriously, and the same goes for concentration of power and also gradual disempowerment. I am unsure what to make of the thinking behind the plan.

They commit to ‘build AI in service of humanity’ and to ‘empower people broadly’ and ensure power is broadly distributed.

Sam Altman and Jakub Pachocki: Entirely automating everything is not the future we want. It would be unfulfilling, and it would be dangerous. AI should help people pursue their goals, not become untethered from them. As AI systems become more capable, the human role becomes more important: setting direction, making tradeoffs, applying judgment, and bringing values, taste, care, and responsibility to the work.

A key long-term role for people will be deciding what is worth doing.

I mean, look, that is a nice pair of sentiments, but you do realize you kind of have to pick one or the other, right?

As in, if you distribute AI to everyone to help them pursue their goals? Then they are going to use it to automate everything, and turn actions over to it. They will let their AIs decide what is worth doing, and the AIs will compete. So either you can restrict their ability to have or use it, or you can not restrict it.

They do understand the whole ‘RSI be dangerous’ issue, at least a little:

Sam Altman and Jakub Pachocki: We believe that AI doing AI research will become the determining factor of the pace of progress within the next few years. That matters because alignment is itself a hard research problem.

To make fast and deep progress, our researchers will need AI systems that can help test ideas, find mistakes, explore alternatives, and iterate alongside us.

But faster technical progress makes human judgment and public coordination more important, not less. The future should be shaped by people, institutions, and societies, not only by the companies building the most capable systems.

This is a repeated confusion between ‘is’ and ‘ought.’ Yes, the future ‘should’ be shaped by humans, and ideally humans broadly. You’re causing this how?

International coordination of leading AI efforts to advance safety and allow coordinated actions, including slowing down.

Oh. Yes, that’s actually a really good start to an answer.

Sam Altman and Jakub Pachocki: As frontier AI development continues, we expect national and global coordination to become more important. We have long believed there should ultimately be an international organization that helps coordinate leading AI efforts to reduce catastrophic risk.

Cooperation and shared safety standards are an important part of the path forward, especially because the incentives around commercial and national competition are hard to escape.

One goal of such an organization should be to make it possible for the world to take coordinated action, including slowing frontier development when needed, so societal resilience, safety, and alignment can keep pace.

If you have long believed this, it would have been good to have spoken up this plainly earlier, but I will happily take this statement now.

Okay, on to the actual plan.

Sam Altman and Jakub Pachocki:

Build an automated AI researcher—an AI system that can accelerate and increasingly automate the research process itself, while remaining steerable, accountable, and connected to people. Our internal belief is that by March of 2028 we may have a significant fraction of our research being done by AI systems in tandem with our own researchers. To make sufficient progress on alignment, we believe we will need AIs to iterate alongside us. This will help us navigate the transition to the post-AGI world so that we collectively decide the path toward the future.

Accelerate the economy, by accelerating scientific progress, productivity, and economic growth, while working to ensure the gains are widely shared. Everyone should have an opportunity for a meaningful share in the prosperity AI creates.

Give everyone on Earth a personal AGI, empowering them to benefit from one of humanity’s most transformative technologies in whatever way they choose.

So the plan is:

Recursive self-improvement.
Use this for abundance and distribute gains widely.
Give everyone an AGI.

I notice that ‘give everyone an AGI’ comes after the RSI. Presumably the AGI they get will be the toy home version, not the industrial strength superintelligence that OpenAI is keeping as a mere tool somewhere else. Or maybe not?

This is the dilemma with such a plan. If you give everyone the full thing in equal measure, humans have lost control of the future and gradual disempowerment occurs non-gradually. If you don’t, then you have not actually stopped concentration of power.

Alternatively: You either ensure that there is a group of humans in control with the ability to steer events, or else you don’t.

In broad strokes, if you are going to develop superintelligence at all, yes obviously in some form you will want to:

Safety develop superintelligence.
Generate abundance of good things.
Distribute that abundance of good things to the humans.

Alas, that doesn’t tell us any of the interesting details.

The main philosophical position here is that OpenAI is focusing on avoiding concentration of power, as opposed to avoiding diffusion or loss of power, as the bigger risk. But the framing as this one sided is in direct conflict with their correct recognition that we will need international coordination to be able to proceed safely. The core contradiction is not resolved.

A Difference Of Perspectives

I read OpenAI Chief Futurist (and former head of mission alignment) Joshua Achiam here as trying to contrast the good OpenAI plan of ‘entrust humanity with the tools of its own progress and density’ (difficulty of matching to reality of sufficiently advanced AI and what people will do with it and keeping it as a tool: impossible) with bad Anthropic of ‘creating a machine God’ (derogatory) (difficulty of matching its alignment to our survival and flourishing: impossible but in the game difficulty sense rather than literally impossible, if you don’t take the description too literally).

I did not find this a good description of Anthropic’s values or vision, and I believe that to the extent this describes OpenAI’s values and vision the best term is ‘pipe dream.’

I do buy that the neutrally presented version of this would be directionally correct, as one thing happening among many, which is what makes it interesting.

Joshua Achiam (Chief Futurist, OpenAI): The OAI / Anthropic values difference is deeply misunderstood, even within the walls of both.

Should a loving ensouled machine God watch over humanity? Vote Anthropic.

Should humanity be entrusted with the tools of its own progress and destiny? Vote OpenAI.

If your lens for analyzing this is “consumer v enterprise business” your ability to understand what’s going on is unfixably borked

If you think one will predominate over the other, running away with an unsurpassable lead, totally borked; humanity wants both these outcomes in about equal measure.

Joshua Achiam (OpenAI): It’s actually not a binary; these aren’t mutually exclusive, nor are they requisite. You can vote both, you can vote neither. But it is a divergence in the worldviews between the orgs. It’s complicated to describe “the worldview of an org” because orgs are composed of individuals with a range of views, but there is a kind of net culture and this is an attempt to describe it.

My Twitter followers are good enough, and involve enough Anthropic followers, that I can do this and not get killed by the Lizardman Constant. Sweet.

One could reframe this as Anthropic taking superintelligence and its consequences seriously, versus OpenAI trying to deny that those consequences exist.

It is not unrelated to Anthropic embracing virtue ethics and OpenAI being stuck on deontology with only humans as patients, as another semi-Fake Framework.

Or one could take Fable’s framing, which I think might be even better: That this is actually a disagreement about facts and the viability of OpenAI’s approach, and OpenAI’s assumption you can have recursive self-improvement while the AI remains a mere tool, and framing it as a difference in values. You should ‘vote’ largely based on whether you think OpenAI’s aspiration is even possible.

I definitely agree that this is mostly not about consumer versus enterprise business.

I put this to the test and asked Anthropic employees if they agreed. Along with the above quiz here were the individual answers.

Amanda Askell (Anthropic): Personally, no. I think the binary of ‘moral saint’ versus ‘tool for humans’ is a false one, and its very simplicity should make people suspicious of it. I think the ideal target tries to balance the benefits and risks of both positions.

Drake Thomas (Anthropic): Kinda both? Personally I think a loving ensouled machine god should watch over humanity, but mainly to enforce “no x-risks that destroy human civilization’s optionality and potential” while we spend another few thousand years figuring out what it is we want our destiny to be.

Sarah Chen (Anthropic): Coming out of the closet to strongly disavow this description. Many Ants, myself included, view a “the Culture”-type outcome as a disastrous disempowerment scenario. I think we are simply more intellectually honest in acknowledging the challenges in controlling powerful AI.

I agree with Sarah Chen on both levels. The Culture is a disastrous scenario, although obviously many other scenarios are far worse. And I think quite a lot of Anthropic agrees this would not be a good scenario. Drake Thomas goes somewhat farther towards ‘actually yes machine God’ but in a very Eliezer Yudkowsky Beyond The Reach Of God kind of way. Amanda Askell tries to thread the needle, because she notices neither approach is viable in its presented form.

The ‘humanity wants both these outcomes’ and ‘don’t expect a huge lead’ comments feel bizarre, as if ‘what humanity wants’ will determine whether competition remains close between the two companies, or their visions, or the two could exist simultaneously. Even if they were both possible, one rules out the other.

The other question is, sure you believe these things, but what are you doing differently?

Seán Ó hÉigeartaigh: As different as these visions are, so far OAI/Anthropic are building things that are functionally almost indistinguishable. At what point do the companies’ AI systems meaningfully diverge along these paths? A loving ensouled machine God is a very different thing than a toolkit for human progress, even if the former can provide the latter.

Feels like an important question, because there are quite different alignment and governance questions along these paths.

David Manheim: I think they diverge when we hit ASI – the point that both companies have said they are aiming for – and the visions diverge based on whether the companies see loss of control as possible to avoid.

I think they already have diverged. This philosophical divide also means the difference between OpenAI’s deontological Model Spec approach, versus Claude’s virtue ethical Constitution, along with the general training approaches. You see the differences in the models, and I absolutely am on Anthropic’s side on that. You also see it in Anthropic refusing the Department of War, and OpenAI basically giving in, which raises questions about commitment to avoiding concentration of power.

45

Three Labs With a Plan and A Memorandum

45

An AI Memorandum

Greetings From The Department of War

Lab With a Plan

A Difference Of Perspectives

45

45