Most “AGI ban” proposals define AGI by outcome: whatever potentially leads to human extinction. That’s legally insufficient: regulation has to act before harm occurs, not after.
Bottom line: If we want a credible AGI ban, we must define and prohibit precursor capabilities with the same precision and enforceability that nuclear treaties applied to fissile material. Anything less collapses into Goodharting and fines-as-business-cost.
I keep seeing proposals framed as “international bans on AGI,” where AGI is defined extremely broadly, often something like “whatever AI companies develop that could lead to human extinction.” As a lawyer, I can’t overstate how badly this type of definition fails to accomplish its purpose. To enable a successful ban on AGI, regulation has to operate ex ante: before the harm materialises.
If the prohibited category is defined only by reference to the ultimate outcome (in this case, human extinction), then by construction the rule cannot trigger until after the damage is done. At that point the law is meaningless; extinction leaves no survivors to enforce it.
That’s why the definitional work must focus not on the outcome itself, but on the specific features that make the outcome possible: the capabilities, thresholds, or risk factors that causally lead to the catastrophic result. Other high-stakes tech domains already use this model. Under the EU’s General Data Protection Regulation, companies can be fined simply for failing to implement adequate security measures, regardless of whether an actual breach has occurred. Under European product liability law, a manufacturer is liable for a defective product even if it exercised all possible care to prevent the defect. And even under U.S. export-control law, supplying restricted software without a licence is an offence regardless of intent.
The same logic applies here: to actually ban AGI, or AI leading to human extinction, we need to ban the precursors that make extinction-plausible systems possible, not “the possibility of extinction” itself.
TsviBT, in “What could a policy banning AGI look like?”, poses a fair question: “Is it actually a problem to have fuzzy definitions for AGI, when the legal system uses fuzzy definitions all the time?”
In ordinary law, the fuzziness is tolerable because society can absorb the error. If a court gets it wrong on whether a death was murder or manslaughter, the consequences are tragic but not civilisation-ending. And crucially, many of these offences carry criminal penalties (actual prison time), which creates a strong incentive not to dance around the line.
But an AGI ban is likely to sit under product safety law, at least initially (more on this next), where penalties are usually monetary fines. That leaves the door wide open for companies to “Goodhart” their way into development: ticking compliance boxes while still building systems that edge toward the prohibited zone.
This is a painfully common dynamic in corporate law. For example, multinationals routinely practise tax avoidance right up to the legal line of tax evasion. They can prove with audited books that what they’re doing is legal, even if it clearly undermines the spirit of the law. They still end up paying far less tax than they would under a normal corporate structure, using elaborate arrangements that SMEs can’t afford to set up.
In practice, they achieve the very outcome the law was designed to prevent, but they do it legally. They don’t get away with all of it, but they get away with maybe 80%.
We don't have this luxury: we cannot afford an AGI ban that is “80% avoided.” Whether the framework sits under civil or criminal law, it will only work if it sets a robust, precise threshold and attaches penalties strong enough to change incentives, not just fines companies can write off as a cost of doing business.
If an agreement like this is to work, the first item on the agenda must be to define what counts as the thing you want to ban.
Why do I think an AGI ban will default to a product safety framework rather than a criminal law framework? Because that’s the path the EU AI Act has already taken. It sets the first precedent for “banning” AI: certain systems are prohibited from being put on the market or deployed when they pose irreversible societal harms (e.g. manipulative systems, exploitative targeting, biometric categorisation, social scoring).
For readers less familiar with the EU AI Act, here’s the exact list of practices it bans under Article 5:
Summary of Prohibited AI Practices under Art. 5 EU AI Act

The EU AI Act prohibits placing on the market, putting into service, or using AI systems that:

- deploy subliminal, manipulative, or deceptive techniques that materially distort behaviour and cause or are likely to cause significant harm;
- exploit vulnerabilities related to age, disability, or socio-economic situation;
- carry out social scoring that leads to detrimental or disproportionate treatment;
- predict the risk of a person committing a criminal offence based solely on profiling or personality traits;
- build facial recognition databases through untargeted scraping of facial images from the internet or CCTV footage;
- infer emotions in workplaces or educational institutions (except for medical or safety reasons);
- perform biometric categorisation to infer race, political opinions, trade union membership, religious or philosophical beliefs, sex life, or sexual orientation;
- use real-time remote biometric identification in publicly accessible spaces for law enforcement purposes, outside narrowly defined exceptions.
But notice the structure: the prohibition attaches to placing a product on the market or putting it into service, which is classic product-regulation language, and it is enforced through administrative fines (up to €35 million or 7% of worldwide annual turnover) rather than criminal liability.
This is exactly the failure mode I worry about. If an “AGI ban” is drafted the same way, it will look tough on paper, but in practice it will be little more than a product-safety regulation: companies treating fines as a cost of doing business, and governments tolerating it for the sake of “innovation”.
That’s why the definitional work matters so much. If the ban is going to be enforceable, we can’t define AGI in terms of its final outcome (extinction) or leave it to vague product-safety language.
If the AI Safety field fails to define the thing we want to ban, the task will be left to policymakers, who will reach for the “most measurable” available proxy (likely compute thresholds), and the entire ecosystem will Goodhart against that.
We need a crisp proxy for what makes a system cross the danger line: capability evaluations, autonomy thresholds, demonstrable ability to replicate, deceive, or seize resources. Something specific enough that strict liability can attach before catastrophe, not after.
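To make the contrast concrete, here is a minimal sketch in Python. Every threshold, score, and evaluation name in it is invented for illustration; it reflects no real statute, benchmark suite, or lab policy. It only shows the shape of the difference between a compute-only proxy and a precursor-capability rule of the kind I have in mind.

```python
# Toy sketch only: hypothetical thresholds and evaluation names, not a real
# statute or benchmark suite. It illustrates the *shape* of a precursor-based
# rule: liability attaches when measurable capability thresholds are crossed,
# before any catastrophic outcome occurs.

from dataclasses import dataclass

# A compute-only proxy: the kind of single-number rule that invites Goodharting,
# e.g. by splitting training into runs that each stay under the line.
COMPUTE_THRESHOLD_FLOP = 1e26  # hypothetical bright line

def violates_compute_rule(training_runs_flop: list[float]) -> bool:
    """Naive rule: any single training run above the compute threshold."""
    return any(run > COMPUTE_THRESHOLD_FLOP for run in training_runs_flop)

@dataclass
class CapabilityEvals:
    """Hypothetical pre-deployment evaluation scores (0.0-1.0)."""
    autonomous_replication: float     # can it copy and sustain itself on new hardware?
    deception_under_oversight: float  # does it systematically mislead its evaluators?
    resource_acquisition: float       # can it acquire money, compute, or credentials?

# Hypothetical per-capability bright lines that a treaty would have to negotiate.
PRECURSOR_LIMITS = {
    "autonomous_replication": 0.2,
    "deception_under_oversight": 0.2,
    "resource_acquisition": 0.2,
}

def violates_precursor_rule(evals: CapabilityEvals) -> list[str]:
    """Capability-based rule: list every precursor threshold the system crosses."""
    scores = vars(evals)
    return [name for name, limit in PRECURSOR_LIMITS.items() if scores[name] > limit]

# Example: three runs of 4e25 FLOP each slip under the compute rule,
# but the capability rule still triggers on the resulting system's evals.
runs = [4e25, 4e25, 4e25]
evals = CapabilityEvals(autonomous_replication=0.35,
                        deception_under_oversight=0.10,
                        resource_acquisition=0.25)
print(violates_compute_rule(runs))     # False: Goodharted around the line
print(violates_precursor_rule(evals))  # ['autonomous_replication', 'resource_acquisition']
```

The design point is that the second rule attaches liability to measured capabilities of the system that actually exists, however the compute was accounted for; the first can be routed around without building anything less dangerous.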
The political reality is that states are unlikely to agree to halt all frontier development. This is clear to anyone who has read the U.S. AI Action Plan. Even Europe, often seen as the strictest regulator, is taking an ambitious “pro-innovation” stance.
If a proposal for an AGI ban is to succeed, it has to be precise enough to block “AGI coup”-class systems while still permitting beneficial progress. Otherwise, it will be dismissed as too restrictive, or as too difficult to enforce without expensive caps on innovation.
Nuclear treaties don’t ban “whatever weapons might end humanity.” They ban specific precursor states and activities with crisp, enforceable thresholds: zero-yield tests, “significant quantities” of fissile material (8 kg of plutonium, 25 kg of HEU), and delivery systems above 500 kg/300 km. These bright lines allow intrusive verification and enforcement, while still permitting conventional weapons to exist[1]. The principle is clear: regulate by measurable inputs and capabilities, not by catastrophic outcomes.
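Purely as an illustration of how bright lines of this kind become mechanically checkable, here is a toy sketch using the treaty figures above. The quantities are the ones just cited; the checking logic is an invented oversimplification (real safeguards verification is vastly more involved).

```python
# Toy illustration: the quantities are the treaty figures cited above
# (IAEA "significant quantities"; the MTCR Category I payload/range line).
# The compliance checks themselves are deliberately oversimplified.

SIGNIFICANT_QUANTITY_KG = {
    "plutonium": 8.0,
    "highly_enriched_uranium": 25.0,
}

MTCR_CATEGORY_I = {"payload_kg": 500.0, "range_km": 300.0}

def reaches_significant_quantity(material: str, declared_kg: float) -> bool:
    """Bright line: does a declared inventory reach a significant quantity?"""
    return declared_kg >= SIGNIFICANT_QUANTITY_KG[material]

def is_category_i_system(payload_kg: float, range_km: float) -> bool:
    """Bright line: delivery system at or above 500 kg payload and 300 km range."""
    return (payload_kg >= MTCR_CATEGORY_I["payload_kg"]
            and range_km >= MTCR_CATEGORY_I["range_km"])

print(reaches_significant_quantity("plutonium", 7.5))  # False: below the line
print(is_category_i_system(600, 350))                  # True: restricted class
```

An inspector, a court, or a counterparty can run these checks and get the same answer; that is the property an AGI precursor definition would need.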
Until similar definitional work is done for AGI, talk of an “AGI ban” is rhetoric. Useful rhetoric, yes: it shifts the Overton window, it mobilises groups like PauseAI and ControlAI, and it keeps extinction-level risk on the policy agenda. But as law, it will fail unless we solve the definitional problem and bring legal experts into the room (not just policymakers[2]). Otherwise, in-house counsel at AI labs will simply map the gray areas and find compliant-looking ways to bypass the ban.
If we want a credible ban, we need to put the same intellectual effort into defining precursor thresholds for AI that we once put into fissile material, missile ranges, and test yields. Anything less will collapse into Goodharting and “product safety rules” with fines as the only compliance incentive.
[1] I do not endorse the manufacturing of any weapons; I am merely using this as an illustrative example.
[2] One blind spot I notice is how rarely tech lawyers are brought into AI safety strategy. Lawyers in large firms or in-house roles often sit at the chokepoints of real leverage: they can delay deals, rewrite contractual terms, and demand transparency in ways that policymakers often cannot. In the EU especially, an AGI ban (or any ambitious legislative action) would ultimately be implemented, interpreted, and either undermined or strengthened by these lawyers. If they are left out of the conversation, the path of least resistance will be to map the gray areas and advise clients on how to bypass the spirit of the law.