Chapter 5, “Its Favorite Things,” starts with Yudkowsky’s “Correct-Nest parable” about intelligent aliens who care a lot about the exact number of stones found in their nests.
Immediately after the parable, on page 82:
Most alien species, if they evolved similarly to how known biological evolution usually works, and if given a chance to have things the way they liked them most, probably would not choose a civilization where all their homes contained a large prime number of stones. There are just a lot of other ways to be; there are a lot of other directions one could steer. Much like predicting that your next lottery ticket won’t be a winning one, this is an easy call.
Similarly, most powerful artificial intelligences, created by any method remotely resembling the current methods, would not choose to build a future full of happy, free people. We aren't saying this because we get a kick out of being bleak. It's just that those powerful machine intelligences will not be born with preferences much like ours.
This is just a classic “counting argument” against alignment efforts being successful, right?
I recall Alex Turner (TurnTrout) arguing that at least some commonly made counting arguments are wrong (Many arguments for AI x-risk are wrong), and quoting Nora Belrose and Quintin Pope arguing the same (Counting arguments provide no evidence for AI doom). Some people in the comments, such as Evan Hubinger, seem to disagree, but as a layperson I found the discussion too technical to follow.
In any case, the version of the counting argument in the book seems simple enough that even as a layperson I can tell it's wrong. To me it clearly proves too much.
Insofar as Yudkowsky and Soares are saying here that an ASI created by any method remotely resembling current methods will likely not choose to build a future full of happy, free people simply because the possible preferences an ASI could have vastly outnumber the narrow subset that would lead it to build such a future, I think the argument is wrong.
This counting observation is a reason to think that the preferences an ASI ends up with might not be the preferences its creators try to train into it (so maybe the "no evidence" in the linked post title above is too strong): the target preferences are indeed a narrow target, and narrow targets are easier to miss than broad ones. But surely the counting observation alone is not sufficient to conclude that ASI creators will fail to hit their narrow target; you would need further reasons to conclude that.
Yes, in my language it's a *random potshot* fallacy.
One would be the random potshot version of the Orthogonality Thesis, where there is an even chance of hitting any mind in mindspace, and therefore a high chance of hitting an eldritch, alien mind. But equiprobability is only one way of turning possibilities into probabilities, and not a particularly realistic one. A random potshot isn't analogous to the probability density you get from the deliberate act of building a certain type of AI, even without knowing much about what it will turn out to be.
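To make that concrete, here is a minimal illustrative sketch (the numbers and the measure $\mu$ are my own assumptions, not anything from the book or the linked posts). Let $\Omega$ be a set of $N$ possible preference-profiles a trained AI could end up with, and let $T \subset \Omega$ be the narrow "aligned" subset, with $|T| = k \ll N$. Under the equiprobable, random-potshot measure,

$$P_{\mathrm{uniform}}(T) = \frac{k}{N} \approx 0,$$

so the counting argument goes through. But under a training-induced measure $\mu$ that concentrates mass near what the developers optimize for, say $\mu(T) = 0.9$,

$$P_{\mu}(T) = 0.9,$$

even though the counting facts ($k \ll N$) are unchanged. The counting observation constrains the answer only if one also argues that the real measure is close to uniform, which is exactly the step the quoted passage skips.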
Does this just change the problem to one of corrigibility? If the target is narrow but the AI can be guided toward it, that's good. If the target is narrow and the AI cannot be guided effectively, then it's predictably not going to hit the target.
I think you have to assume at least one of incorrigibility, very rapid takeoff, or deception for a doom argument to go through.
*If Anyone Builds It, Everyone Dies* was published ten days ago, and in all that time nobody created a general discussion post for the book on LessWrong. Maybe there's a good reason for that, but I don't know what it is, and I personally would value having a discussion post like this to comment on while rereading the book, so here it is.
"This is a review of the reviews" pointed out that it's weird for people who think AI extinction risk is significant to write reviews of IABIED consisting exclusively of a bunch of disagreements with the book without clearly stating upfront that the situation we find ourselves in is insane:
If you think there's a 1 in 20 chance it could be so over, it feels to me the part where people are not doing the ‘yes the situation is insane’ even if that is immediately followed up with ‘im more hopeful than them tbc’ is weird.
I agree. Notably, however, this thread is not for complete reviews of the book! So it's not weird to just comment with your miscellaneous thoughts below without giving context on your views on AI risk or your overall take on the book.
As Steven Byrnes said:
Basically, I’m in favor of people having nitpicky high-decoupling discussion on lesswrong, and meanwhile doing rah rah activism action PR stuff on twitter and bluesky and facebook and intelligence.org and pauseai.info and op-eds and basically the entire rest of the internet and world. Just one website of carve-out. I don’t think this is asking too much!