WilliamKiely

Posts

WilliamKiely's Shortform (5 karma · 1mo · 2 comments)
Why did interest in "AI risk" and "AI safety" spike in June and July 2025? (Google Trends) [Question] (32 karma · 1mo · 4 comments)
A Cheeky Pint with Anthropic CEO Dario Amodei (10 karma · 1mo · 3 comments)
Geoffrey Hinton - Full "not inconceivable" quote (21 karma · 2y · 2 comments)
Transcript: NBC Nightly News: AI ‘race to recklessness’ w/ Tristan Harris, Aza Raskin (63 karma · 2y · 4 comments)
[Linkpost] GatesNotes: The Age of AI has begun (19 karma · 2y · 9 comments)
Can AI systems have extremely impressive outputs and also not need to be aligned because they aren't general enough or something? [Question] (6 karma · 3y · 3 comments)
DeepMind: The Podcast - Excerpts on AGI (99 karma · 3y · 12 comments)
[Expired] 20,000 Free $50 Charity Gift Cards (62 karma · 5y · 13 comments)

Comments

WilliamKiely's Shortform
WilliamKiely · 1mo

-5 agreement karma from 3 people, but I have no indication of why people disagree. The point of writing this up was to find out why people disagree, so it'd be helpful if someone offered an explanation for their view.

WilliamKiely's Shortform
WilliamKiely · 1mo

P(extinction by 2050 | doom) > 75% is way too high, right?

I commented about this here.

My views on “doom”
WilliamKiely · 1mo

In Vitalik Buterin's recent appearance on Liron Shapira's Doom Debates podcast, Vitalik stated that his p(doom) is currently about 12% and his p(extinction by 2050) is at least 9%. Since extinction by 2050 entails doom, this means Vitalik's p(extinction by 2050 | doom) > 75%, which I think is way too high.

My reading of Paul's views stated here is that Paul's p(extinction by 2050 | doom) is probably < 30%.

I referenced Paul's views in my comment on the Doom Debates YouTube video, which I'm copying here in case someone thinks my reading of Paul's views is mistaken, so they can correct me:

Vitalik's p(extinction by 2050 | doom) > 75%.

Vitalik's numbers concretely: 9+% / 12% > 75%.

I think this >75% is significantly too high.

In other words, I think Vitalik seems to be overconfident in near-term extinction conditional on AI causing doom eventually.

I'm not the only person with high p(doom) (note: my p(doom) is ~60%) who thinks that Vitalik is overconfident about this.

For example, Paul Christiano, based on his numbers from his 2023 "My views on “doom”" post on LessWrong, thinks p(extinction by 2050 | doom) < 43%, and probably < 30%. Paul's numbers:

1. "Probability that most humans die within 10 years of building powerful AI (powerful enough to make human labor obsolete): 20%"

2. "Probability that humanity has somehow irreversibly messed up our future within 10 years of building powerful AI: 46%".

So Paul's p(extinction by 2050 | doom) < 20%/46% = 43%.

Paul's p(extinction by 2050 | doom) is probably actually significantly lower than 43%, perhaps < 30% because "most humans die" does not necessarily mean extinction.

Note: Paul also assigns substantial probability mass to scenarios where less than half of humans die -- he thinks (as of his 2023 post) at least half of humans die in only 50% of takeover scenarios and in only 37% of non-takeover scenarios in which humanity has irreversibly messed up its future within 10 years of building powerful AI (doom). So I'd guess that his p(all humans die | most humans die within 10 years of building powerful AI) is < 70%, which is why I said his p(extinction by 2050 | doom) is probably < 30% (math: 70% of 43% is about 30%).

Another important factor that potentially brings Paul's conditional even lower than 30%: we might not get powerful AI before 2040. For example, if we get powerful AI in 2045, then even if it causes extinction, say, 9 years later in 2054, extinction has not occurred by 2050, even though doom and (relative to the arrival of powerful AI) near-term extinction both occur.

So Vitalik's p(extinction by 2050 | doom) > 75% is strongly in disagreement with e.g. Paul Christiano, even though Paul's p(doom) is much higher than Vitalik's.

I suspect that upon consideration of these points Vitalik would raise his unconditional p(doom) while keeping his p(extinction by 2050) roughly the same, but I'd like to actually hear a more detailed discussion with him on this.

My views, approximately:

P(extinction by 2050) = 10%

P(doom) = 60%

P(extinction by 2050 | doom) = 10%/60% = 17%, i.e. way less than 75%.
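
To make the arithmetic explicit, here's a minimal Python sketch of the conditionals above (my framing; the one assumption doing the work is that extinction by 2050 entails doom, so the conditional is just the ratio of the two unconditional probabilities):

```python
def p_extinction_given_doom(p_extinction_by_2050, p_doom):
    """P(extinction by 2050 | doom), assuming extinction by 2050 entails doom,
    so that P(extinction by 2050 AND doom) = P(extinction by 2050)."""
    return p_extinction_by_2050 / p_doom

print(p_extinction_given_doom(0.09, 0.12))  # Vitalik: 0.75, i.e. the >75% conditional
print(p_extinction_given_doom(0.20, 0.46))  # Paul, upper bound: ~0.43
print(0.70 * 0.43)                          # Paul, adjusting "most die" down to "all die": ~0.30
print(p_extinction_given_doom(0.10, 0.60))  # My numbers: ~0.17
```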

Why did interest in "AI risk" and "AI safety" spike in June and July 2025? (Google Trends)
WilliamKiely · 1mo

Great observation. If I remove "AI" from the searches, something strange is still happening in the last week of July, but it seems you're right that what's happening is not specific to AI safety / AI risk, so my interest in the phenomenon is now much reduced.

(I don't know why this is showing up as an Answer rather than just a reply to a comment.)

A Cheeky Pint with Anthropic CEO Dario Amodei
WilliamKiely · 1mo

I commented on the Substack:

John Collison: To put numbers on this, you've talked about the potential for a 10% annual economic growth powered by AI. Doesn't that mean that when we talk about AI risk, it's often harms and misuses of AI, but isn't the big AI risk that we slightly misregulated or we slowed down progress, and therefore there's just a lot of human welfare that's missed out on because you don't have enough AI?

Dario's former colleague at OpenAI, Paul Christiano, has a great 2014 blog post "On Progress and Prosperity" that does a good job explaining why I don't believe this.

In short, "It seems clear that economic, technological, and social progress are limited, and that material progress on these dimensions must stop long before human society has run its course."

"For example, if exponential growth continued at 1% of its current rate for 1% of the remaining lifetime of our sun, Robin Hanson points out each atom in our galaxy would need to be about 10140 times as valuable as modern society."

"So while further progress today increases our current quality of life, it will not increase the quality of life of our distant descendants--they will live in a world that is "saturated," where progress has run its course and has only very modest further effects."

"I think this is sufficient to respond to the original argument: we have seen progress associated with good outcomes, and we have a relatively clear understanding of the mechanism by which that has occurred. We can see pretty clearly that this particular mechanism doesn't have much effect on very long-term outcomes."

A Cheeky Pint with Anthropic CEO Dario Amodei
WilliamKiely · 1mo

Dario Amodei: "Now, I'm not at all an advocate of like, "Stop the technology. Pause the technology." I think for a number of reasons, I think it's just not possible. We have geopolitical adversaries; they're not going to not make the technology, the amount of money... I mean, if you even propose even the slightest amount of... I have, and I have many trillions of dollars of capital lined up against me for whom that's not in their interest. So, that shows the limits of what is possible and what is not."

Anthropic has a March 2023 blog post "Core Views on AI Safety: When, Why, What, and How" that says:

If we’re in a pessimistic scenario [in which "AI safety is an essentially unsolvable problem – it’s simply an empirical fact that we cannot control or dictate values to a system that’s broadly more intellectually capable than ourselves – and so we must not develop or deploy very advanced AI systems"]… Anthropic’s role will be to provide as much evidence as possible that AI safety techniques cannot prevent serious or catastrophic safety risks from advanced AI, and to sound the alarm so that the world’s institutions can channel collective effort towards preventing the development of dangerous AIs. If we’re in a “near-pessimistic” scenario, this could instead involve channeling our collective efforts towards AI safety research and halting AI progress in the meantime. Indications that we are in a pessimistic or near-pessimistic scenario may be sudden and hard to spot. We should therefore always act under the assumption that we still may be in such a scenario unless we have sufficient evidence that we are not.

So Anthropic has specifically written that we may need to halt AI progress and prevent the development of dangerous AIs, and now we have Dario saying that he is not at all an advocate of pausing the technology, and even going so far as to say that it's not possible to pause it.

In the same post, Anthropic wrote "It's worth noting that the most pessimistic scenarios might look like optimistic scenarios up until very powerful AI systems are created. Taking pessimistic scenarios seriously requires humility and caution in evaluating evidence that systems are safe."

It doesn't seem like Dario is doing what Anthropic wrote we should do: "We should therefore always act under the assumption that we still may be in such a [pessimistic] scenario unless we have sufficient evidence that we are not." We clearly don't have sufficient evidence that we are not in such a situation, especially since "the most pessimistic scenarios might look like optimistic scenarios up until very powerful AI systems are created."

Low P(x-risk) as the Bailey for Low P(doom)
WilliamKiely · 2mo

Meaningful use of "doom" seems hopeless.

I agree; it's a terrible term.

FWIW "existential catastrophe" isn't much better. I try to use it as Bostrom usually defined it, with the exception that I don't consider "the simulation gets shuts down" to be an existential catastrophe. Bostrom defined it that way, but in other forecasting domains when you're giving probabilities that some event will happen next year, you don't lower the probability based on the probability that the universe won't exist then because you're in a simulation, and so I don't think we should do that when forecasting the future of humanity either.

Then, as you allude to, there's also the problem that a future in which 99% of our potential value is permanently beyond our reach, but we go on to realize the remaining 1% of that potential, would be an unbelievably good future--yet it should probably still qualify as an existential catastrophe given the definition. People have asked before what percentage of our potential lost would qualify as "drastic" (50%, 99%, or what), and for practical purposes I usually mean something like >99.999%. That is, I don't count extremely good outcomes as existential catastrophes merely for being very suboptimal.

I think a better way to define the class of bad outcomes may be in terms of recent-history value. For example, if we say "one util" is the intrinsic value of the happiest billion human lives in 2024, then a future of Earth-originating life worth only 1000 utils or less would be extremely suboptimal. This would include all near-term extinction scenarios where humanity fails to blossom in the meantime, as well as scenarios where humanity doesn't go extinct for a long time but still fails to come anywhere close to achieving its potential (i.e. it includes many permanent disempowerment scenarios, etc.).
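
As a toy illustration of that threshold (the unit and the 1000-util cutoff are the ones stipulated above; the function name is mine and purely hypothetical):

```python
# "One util" is stipulated above as the intrinsic value of the happiest billion
# human lives in 2024; the 1000-util cutoff is also the number from the comment above.
DOOM_THRESHOLD_UTILS = 1_000

def is_bad_outcome(total_future_value_in_utils: float) -> bool:
    """Classify a future of Earth-originating life as belonging to the bad outcome
    class (near-term extinction without prior blossoming, permanent disempowerment,
    etc.) iff its total value is at most 1000 utils."""
    return total_future_value_in_utils <= DOOM_THRESHOLD_UTILS

print(is_bad_outcome(500))     # True:  e.g. extinction soon, little value realized first
print(is_bad_outcome(10**12))  # False: far short of our potential, but not in this class
```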

Low P(x-risk) as the Bailey for Low P(doom)
WilliamKiely · 2mo

In other words, a P(doom) of 20% is perfectly compatible with P(x-risk) of 90-98%.

The phrase "P(x-risk)" doesn't make sense since risks are probabilities so the phrase would mean a probability of a probability. I think what you meant by "P(x-risk)" is actually the probability of an existential risk occurring, which would be the probability of the outcome, i.e. "P(existential catastrophe)."

Assuming that, I agree with your statement, but not for the reason you say:

The reason your statement is technically true is that "P(doom)" refers specifically to existentially catastrophic outcomes caused by AI. So if you thought the probability that other things besides AI cause an existential catastrophe was 70-78%, then your p(existential catastrophe) could be 90-98% while your p(doom) was only 20%.
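
Spelled out numerically (treating AI-caused and non-AI-caused existential catastrophes as mutually exclusive outcomes, which is my simplification):

```python
p_doom = 0.20                         # P(existential catastrophe caused by AI)
p_other_x_catastrophe = (0.70, 0.78)  # P(existential catastrophe from non-AI causes)

# Treating the two as mutually exclusive outcomes, the probabilities just add.
p_x_catastrophe = tuple(round(p_doom + p, 2) for p in p_other_x_catastrophe)
print(p_x_catastrophe)  # (0.9, 0.98): P(doom) = 20% alongside P(x-catastrophe) of 90-98%
```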

But otherwise, the terribly ambiguous term "p(doom)" is (as I've always interpreted it, unless someone who uses it defines it otherwise) synonymous with "p(existential catastrophe caused by AI)".

DeepMind: The Podcast - Excerpts on AGI
WilliamKiely · 5mo

I was happy to see the progression in what David Silver is saying re what goals AGIs should have:

David Silver, April 10, 2025 (from 35:33 of DeepMind podcast episode Is Human Data Enough? With David Silver):

David Silver: And so what we need is really a way to build a system which can adapt and which can say, well, which one of these is really the important thing to optimize in this situation. And so another way to say that is, wouldn't it be great if we could have systems where, you know, a human maybe specifies what they want, but that gets translated into a set of different numbers that the system can then optimize for itself completely autonomously.

Hannah Fry: So, okay, an example then: let's say I said, okay, I want to be healthier this year. And that's kind of a bit nebulous, a bit fuzzy. But what you're saying here is that that can be translated into a series of metrics like resting heart rate or BMI or whatever it might be. And a combination of those metrics could then be used as a reward for reinforcement learning, if I understood that correctly?

Silver: Absolutely correctly.

Fry: Are we talking about one metric, though? Are we talking about a combination here?

Silver: The general idea would be that you've got one thing which the human wants, like "optimize my health." And then the system can learn for itself which rewards help you to be healthier. And so that can be like a combination of numbers that adapts over time. So it could be that it starts off saying, okay, well, you know, right now it's your resting heart rate that really matters. And then later you might get some feedback saying, hang on, you know, I really don't just care about that, I care about my anxiety level or something. And then it includes that in the mixture. And based on feedback it could actually adapt. So one way to say this is that a very small amount of human data can allow the system to generate goals for itself that enable a vast amount of learning from experience.

Fry: Because this is where the real questions of alignment come in, right? I mean, if you said, for instance, let's do a reinforcement learning algorithm that just minimizes my resting heart rate. I mean, quite quickly, zero is like a good minimization strategy, which would achieve its objective, just maybe not quite in the way that you wanted it to. I mean, obviously you really want to avoid that kind of scenario. So how do you have confidence that the metrics that you're choosing aren't creating additional problems?

Silver: One way you can do this is to leverage the same answer which has been so effective so far elsewhere in AI, which is that at that level, you can make use of some human input. If it's a human goal that we're optimizing, then we probably at that level need to measure, you know, and say, well, the human gives feedback to say, actually, you know, I'm starting to feel uncomfortable. And in fact, while I don't want to claim that we have the answers, and I think there's an enormous amount of research to get this right and make sure that this kind of thing is safe, it could actually help in certain ways in terms of this kind of safety and adaptation. There's this famous example of paving over the whole world with paperclips when a system's been asked to make as many paperclips as possible. If you have a system whose overall goal is really to, you know, support human well-being, and it gets that feedback from humans and it understands their distress signals and their happiness signals and so forth, the moment it starts to, you know, create too many paperclips and starts to cause people distress, it would adapt that combination and it would choose a different combination and start to optimize for something which isn't going to pave over the world with paperclips. We're not there yet. Yeah, but I think there are some versions of this which could actually end up not only addressing some of the alignment issues that have been faced by previous approaches to, you know, goal-focused systems, but maybe even, you know, be more adaptive and therefore safer than what we have today.
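
To make the mechanism Silver is gesturing at concrete, here's a minimal sketch of my own (not DeepMind's method; all metric names are hypothetical): the human states a high-level goal, the system optimizes a weighted combination of measurable proxies, and the weights adapt when the human signals that something is missing or uncomfortable.

```python
from dataclasses import dataclass, field

@dataclass
class AdaptiveReward:
    # Proxy metrics and their current weights (hypothetical example values).
    weights: dict = field(default_factory=lambda: {"resting_heart_rate": -1.0})

    def reward(self, metrics: dict) -> float:
        """Scalar reward: weighted sum of whichever proxies are currently tracked."""
        return sum(w * metrics.get(name, 0.0) for name, w in self.weights.items())

    def incorporate_feedback(self, metric: str, sentiment: float, lr: float = 0.1) -> None:
        """Human feedback ("I also care about my anxiety level", "this feels wrong")
        adds or re-weights a proxy rather than being baked in once at the start."""
        self.weights[metric] = self.weights.get(metric, 0.0) + lr * sentiment

# Usage: start by rewarding a lower resting heart rate, then the human signals
# that anxiety matters too, and the reward's composition shifts accordingly.
r = AdaptiveReward()
print(r.reward({"resting_heart_rate": 60}))                      # -60.0
r.incorporate_feedback("anxiety_level", sentiment=-1.0)
print(r.reward({"resting_heart_rate": 60, "anxiety_level": 3}))  # roughly -60.3
```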

Transformative VR Is Likely Coming Soon
WilliamKiely · 5mo

It's now been 2.5 years. I think this resolves negatively?
