WilliamKiely

Comments

IABIED Misc. Discussion Thread
WilliamKiely · 7d · 72

Chapter 5, “Its Favorite Things,” starts with Yudkowsky’s “Correct-Nest parable” about intelligent aliens who care a lot about the exact number of stones found in their nests.

Immediately after the parable, on page 82:

> Most alien species, if they evolved similarly to how known biological evolution usually works, and if given a chance to have things the way they liked them most, probably would not choose a civilization where all their homes contained a large prime number of stones. There are just a lot of other ways to be; there are a lot of other directions one could steer. Much like predicting that your next lottery ticket won’t be a winning one, this is an easy call.
>
> Similarly, most powerful artificial intelligences, created by any method remotely resembling the current methods, would not choose to build a future full of happy, free people. We aren't saying this because we get a kick out of being bleak. It's just that those powerful machine intelligences will not be born with preferences much like ours.

This is just a classic “counting argument” against alignment efforts being successful, right?

I recall Alex Turner (TurnTrout) arguing that at least some commonly made counting arguments are wrong ("Many arguments for AI x-risk are wrong") and quoting Nora Belrose and Quintin Pope arguing the same ("Counting arguments provide no evidence for AI doom"). Some people in the comments, such as Evan Hubinger, seem to disagree, but as a layperson I found the discussion too technical to follow.

In any case, the version of the counting argument in the book seems simple enough that even as a layperson I can tell it's wrong: it seems to clearly prove too much.

Insofar as Yudkowsky and Soares are saying here that an ASI created by any method remotely resembling current methods will likely not choose to build a future full of happy, free people simply because the space of possible preferences an ASI could have is vastly larger than the narrow subset of preferences that would lead it to build such a future, I think the argument is wrong.

This counting observation does seem like a reason to think the preferences an ASI ends up with might not be the preferences its creators try to train into it (so perhaps the "no evidence" in the post title linked above is too strong): the target preferences are indeed a narrow target, and narrow targets are easier to miss than broad ones. But surely the counting observation alone is not sufficient to conclude that ASI creators will fail to hit their narrow target; you would need additional reasons to conclude that.
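
To make this concrete, here is a toy sketch (my own construction, not from the book or from the linked posts): it compares how often a narrow "target" region of a made-up preference space gets hit when preferences are drawn uniformly at random (the pure counting picture) versus when they are produced by a process that is repeatedly pulled toward the target. All of the names and numbers (DIM, TOLERANCE, the pull-toward-target loop) are arbitrary assumptions chosen only for illustration.

```python
# Toy model: "narrow target" hit rates under two different selection processes.
# Everything here (dimensions, tolerance, the update rule) is an arbitrary
# assumption made up for illustration; it is not a model of real training runs.

import random

DIM = 20               # dimensionality of a made-up "preference space"
TARGET = [0.5] * DIM   # center of the narrow "intended preferences" region
TOLERANCE = 0.05       # every coordinate must be this close to count as a hit
TRIALS = 2_000

def is_hit(prefs):
    """True if prefs lands inside the narrow target box."""
    return all(abs(p - t) <= TOLERANCE for p, t in zip(prefs, TARGET))

def uniform_sample():
    """The pure counting picture: preferences drawn uniformly at random."""
    return [random.random() for _ in range(DIM)]

def target_seeking_sample(steps=100, pull=0.1, noise=0.01):
    """A crude stand-in for a process that is repeatedly pulled toward the
    target (with some noise), rather than sampling uniformly."""
    prefs = [random.random() for _ in range(DIM)]
    for _ in range(steps):
        prefs = [p + pull * (t - p) + random.gauss(0, noise)
                 for p, t in zip(prefs, TARGET)]
    return prefs

uniform_hits = sum(is_hit(uniform_sample()) for _ in range(TRIALS))
seeking_hits = sum(is_hit(target_seeking_sample()) for _ in range(TRIALS))

print(f"uniform sampling hit rate:       {uniform_hits / TRIALS:.2%}")  # effectively zero
print(f"target-seeking process hit rate: {seeking_hits / TRIALS:.2%}")  # much higher
```

Under uniform sampling the hit rate is effectively zero, matching the lottery analogy; under the target-seeking process it is substantial. So whether the narrow target gets hit depends on what you believe about the selection process, not on counting possibilities alone; that further claim is the extra step the argument seems to need.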

Contra Collier on IABIED
WilliamKiely · 11d · 41

Agreed that current models fail badly at alignment in many senses.

I still think the bet that OP offered Collier was inappropriate. She had said that currently available techniques do a reasonably good job of making potentially alien and incomprehensible jealous ex-girlfriends like "Sydney" very rare, and the bet was clearly about a different claim than that claim about the frequency of Sydney-like behavior.

A more appropriate response from OP would have been to say that while current techniques may have successfully reduced the frequency of Sydney-like behavior, they're still failing badly in other respects, such as in your observation with Claude Code.

Contra Collier on IABIED
WilliamKiely · 11d · 10

> But the way you are reading it seems to mean her "strawmann[ed]" point is irrelevant to the claim she made!

I agree.

Contra Collier on IABIED
WilliamKiely · 13d · 1614

(I only skimmed your review / quickly read about half of it. I agree with some of your criticisms of Collier's review and disagree with others. I don't have an overall take.)

One criticism of Collier's review that you appeared not to make, but that I would make, is the following.

Collier wrote:

> By far the most compelling argument that extraordinarily advanced AIs might exist in the future is that pretty advanced AIs exist right now, and they’re getting more advanced all the time. One can’t write a book arguing for the danger of superintelligence without mentioning this fact.

I disagree. I think it was clear decades before the pretty advanced AIs of today existed that extraordinarily advanced AIs might exist (and indeed probably would exist) eventually. As such, the most compelling argument that extraordinarily advanced AIs might or probably will exist in the future is not that pretty advanced AIs exist today, but the same argument one could have made (and some did make) decades ago.

One version of the argument is that the limits of how advanced AI could be in principle seem extraordinarily high (human brains are an existence proof, and human brains have known limitations relative to machines), and it seems unlikely that AI progress would permanently stall before reaching a point where there are extraordinarily advanced AIs.

E.g., I.J. Good foresaw superintelligent machines, and I don't think he was merely lucky to imagine that they might, or probably would, come to exist at some point. I think he had access to compelling reasons.

The existence of pretty advanced AIs today is some evidence, and allows us to be a bit more confident, that extraordinarily advanced AIs will eventually be built, but it is not the most compelling reason to expect that.

Contra Collier on IABIED
WilliamKiely · 13d · 22

>> [C]urrently available techniques do a reasonably good job of addressing this problem. ChatGPT currently has 700 million weekly active users, and overtly hostile behavior like Sydney’s is vanishingly rare.
>>
>> Yudkowsky and Soares might respond that we shouldn’t expect the techniques that worked on a relatively tiny model from 2023 to scale to more capable, autonomous future systems. I’d actually agree with them. But it is at the very least rhetorically unconvincing to base an argument for future danger on properties of present systems without ever mentioning the well-known fact that present solutions exist.
>
> It is not a “well-known fact” that we have solved alignment for present LLMs. If Collier believes otherwise, I am happy to make a bet and survey some alignment researchers.

I think you're strawmanning her here.

Her "present solutions exist" statement clearly refers to her "techniques [that] do a reasonably good job of addressing this problem [exist]" from the previous paragraph that you didn't quote (that I added in the quote above). I.e. She's clearly not claiming that alignment for present LLMs is completely solved, just that solutions that work "reasonably well" exist such that overtly hostile behavior like Bing Sydney's is rare.

IABIED Review - An Unfortunate Miss
WilliamKiely · 15d · 82

Fair review. As I've now said elsewhere, after listening to IABIED I think your book Uncontrollable is probably still the best overview of AI risk for a general audience. More people should definitely read your book. I'd be down to write a more detailed comparison in a week or two once I have hardcopies of each book (still in the mail).

I enjoyed most of IABIED
WilliamKiely · 15d · 50

FWIW Darren's book Uncontrollable is my current top recommended book on AI.

I had expected (75% chance) that IABIED would overtake it, but after listening to the audiobook on Tuesday I don't think IABIED is better (though I'll wait until I receive and reread my hardcopy before declaring that definitively).

As I wrote on Facebook 10 months ago:

> The world is not yet as concerned as it should be about the impending development of smarter-than-human AI. Most people are not paying enough attention.
>
> What one book should most people read to become informed and start to remedy this situation?
>
> "Uncontrollable: The Threat of Artificial Superintelligence and the Race to Save the World" by Darren McKee is now my top recommendation, ahead of:
>
> - "Superintelligence" by Nick Bostrom,
> - "Human Compatible" by Stuart Russell, and
> - "The Alignment Problem" by Brian Christian
>
> It's a short, easy read (6 hours at ~120wpm / 2x speed on Audible) covering all of the most important topics related to AI, from what's happening in the world of AI, to what risks from AI humanity faces in the near future, to what each and every one of us can do to help with the most important problem of our time.

I enjoyed most of IABIED
WilliamKiely · 15d* · 20

> I'm less worried about this after reading the book, because the book was good enough that it's hard for me to imagine someone else writing a much better one.

I was really hoping you'd say "after reading the book, I updated toward thinking that I could probably help a better book get written."

My view is still that a much better intro to AI risk can be written.

I currently lean toward Darren McKee's Uncontrollable still being a better intro than IABIED, though I'm going to reread IABIED once my hardcopy arrives before making a confident judgment.

I enjoyed most of IABIED
WilliamKiely · 15d · 10

I independently had this same thought when listening to the book on Tuesday, and think it's worth emphasizing:

> I again think they’re inappropriately reasoning about what happens for arbitrarily intelligent models instead of reasoning about what happens with AIs that are just barely capable enough to count as ASI. Their arguments (that AIs will learn goals that are egregiously misaligned with human goals and then conspire against us) are much stronger for wildly galaxy-brained AIs than for AIs that are barely smart enough to count as superhuman.

I enjoyed most of IABIED
WilliamKiely · 15d · 20

"If anyone builds it (with techniques like those available today), everyone dies"

One could argue that the parenthetical caveat is redundant if the "it" means something like "superintelligent AI built with techniques like those available today".

I also listened to the book and don't have the written text available yet, so I'll need to revisit it when my hardcopy arrives to see if I agree that there are problematic uncaveated versions of the title throughout the text.

(At first I disliked the title because it seemed uncaveated, but again, the "it" in the title is ambiguous and can be interpreted as including the caveats, so now I'm more neutral about the title.)
