MalcolmMcLeod — LessWrong

Really appreciate your laying out your thoughts here. Scattered late-night replies:

#0: I think this is a duplicate of #3.

#1: In one sense, all the great powers have an interest in not building superintelligence---they want to live! Of course, some great powers may disagree; they would like to subvert the treaty. But they still would prefer lesser powers not develop AGI! Indeed, everyone in this setup would prefer that everyone else abide by the terms of the treaty, which ain't nothing, and in fact is about as much as you'd hope for in the nuclear setting.

#2: I don't think the economic value here is a huge incentive by itself, though I agree it matters. If it's illegal to develop AGI, then any use of AGI for your economy would have to be secret. Of course, if developing AGI gives you a strategic advantage you can use to overwhelm the other nations of the world, then that's an incentive in itself! There's also the possibility that you develop safe AGI, so nobody bothers to enforce the treaty or punish you for breaking it. And this AGI is contained and obedient, and you prevent other countries from having it, so you crush the nations of the world? Or you share it globally and everyone is rich---in this story, your country wants what's best for the world but also believes that it knows what is best for the world. Plausible. (See: Iraq war.) All kinds of reasonable stories you can tell here, but the incentive isn't clear-cut. It would be a harder sell.

#3: Yeah, this is a real difficulty.

#4: Yep, even if we get a pause, the clock is (stochastically) on. Which means we must use the pause wisely and fast. But as for the procedural "difficulty in drawing a red line," I'm not too worried about that. We can pick arbitrary thresholds at which to set the legal compute cap. Sure, it's hard to know what's "correct," but I'm not too worried about that.

#5: Possibly. Parts of it are harder, parts of it are easier. Centrifuges are easier to hide than Stargate, but a single server rack is easier still. An international third party of verifiers could eliminate a lot of the mutual mistrust issues. Of course, that's not incorruptible, but hey, nothing's perfect. Probably someone should be working on "cryptographic methods for verifying that no frontier AI training is going on without disclosing much else about what's happening on these chips."

I agree with you that on net, this is a harder treaty to enforce than the global nuclear weapons treaties. On the other hand, I don't think we've tried that hard on the nonproliferation, relative to all the other things our civilization tries hard at. The obstacles you describe seem surmountable, at least for a while. And all we need to do is buy time. It ain't intractable. And I have no better ideas.

Natural Latents: Latent Variables Stable Across Ontologies

MalcolmMcLeod11d30

Really gorgeous stuff with philosophically significant and plausibly practical implications. Great work. I assume you've also looked at this from a categorial perspective? It would surprise me if treating latents as limits didn't simplify some of the arguments (or at least the presentation, which is already admirably clear). And I can't help but wonder whether "bigger" uniqueness/isomorphism/stability results for world-models or other features of agents might result from comparing Bayes net categories. If you haven't tried categorial abstractions (I dunno the specifics---there are a few categorification choices that I could see being useful here), play around with them.

A philosophical kernel: biting analytic bullets

MalcolmMcLeod1mo20

Yeah, contemporary Unitarian Universalists don't believe in much in particular. Mostly they're "people who would be atheists reconstructionist Jews (if they were ethnically Jewish), casual western Buddhists (if they were Californian), or "spiritual" (if they were middle-American young white women), but they are New Englanders descended from people named things like Hortense Rather." It's said that the only time you'll hear "Jesus" in a Unitarian church is when the janitor stubs his toe. Most Christians consider them "historically and aesthetically connected to Christianity, but not actually Christian." In the olden days they were more obviously "heterodox Christians"---like LDS, 7DA, or JW today, they would certainly consider themselves Christians holding the most truly Christian beliefs, though others considered them weirdos. I'm not sure how the transition occurred, but my impression is that the Universalism part of UU made it a uniquely easy religion to keep affirming as the early-20th-century weird-Christian milieu of New England rapidly turned into late-21st-century standard elite atheism.

A philosophical kernel: biting analytic bullets

MalcolmMcLeod1mo63

Consider an analogy: a Christian fundamentalist considers whether Christ's resurrection didn't really happen. He reasons: "But if the resurrection didn't happen, then Christ is not God. And if Christ is not God, then humanity is not redeemed. Oh no!"
There's clearly a mistake here, in that a revision of a single belief can lead to problems that are avoided by revising multiple beliefs at once. In the Christian fundamentalist case, atheists and non-fundamentalists already exist, so it's pretty easy not to make this mistake.

"Christ was resurrected" isn't a fundamentalist thing. It's the Main Thing about Christianity. If you don't believe it, you are a "cultural Christian" at most, which essentially all churches and communities say Does Not Count.

yrimon's Shortform

MalcolmMcLeod2mo10

Yikes. If they're telling the truth about all this---particularly the "useful for RL on hard-to-verify-solution-correctness problems"---then this is all markedly timeline-shortening. What's the community consensus on how likely this is to be true?

Nobody is Doing AI Benchmarking Right

MalcolmMcLeod2mo32

My colleagues and I were arguing about the nature of LLM intelligence and generalization. (In particular, they were talking about this paper: [2507.06952] What Has a Foundation Model Found? Using Inductive Bias to Probe for World Models and using Kepler/Newton as an example). This is the only eval I know of that hits the question directly. If you want funding, make a nice website where people can sign up to play it (depending on your worries about data leakage? maybe you'd create a public-set-version?) and you show a good leaderboard. & you can solicit donations. This feels like ARC-AGI-3, sorta. (OTOH, although scientifically interesting, this project might be doom-increasing. "Feeding evals to capabilities labs" and all that. If I were aiming for AGI, this is the benchmark I would hill-climb.)

A case for courage, when speaking of AI danger

MalcolmMcLeod3mo1010

Yes---but when some people say "I think there is danger here" and others say "I think there is no danger here," most people (reasonably!) resolve that to "huh, there could be some danger here"... and the possibility of danger is a danger.

Kabir Kumar's Shortform

MalcolmMcLeod3mo30

Would you care to start now by giving an example?

New Endorsements for “If Anyone Builds It, Everyone Dies”

MalcolmMcLeod3mo1717

Seconded. The new hat and the pointier, greyer beard have taken him from "internet atheist" to "world-weary Jewish intellectual." We need to be sagemaxxing. (Similarly, Nate-as-seen-on-Fox-News is way better than Nate-as-seen-at-Google-in-2017.)

"Map of AI Futures" - An interactive flowchart

MalcolmMcLeod3mo10

I like this. My biggest difference with the implicit model is that I really think a "pause-and-resume" changes the game a whole lot. My p(workable alignment solution) is very different based on whether it comes after pausing-and-resuming versus coming in the near term! So I suppose I'd have a "branching" there? But one could argue for other branchings. And you gotta make design choices as some level! Good work.

LESSWRONG
LW

LESSWRONG
LW

Posts

Wikitag Contributions

Comments