GPT-2 does not - probably, very probably, but of course nobody on Earth knows what's actually going on in there - does not in itself do something that amounts to checking possible pathways through time/events/causality/environment to end up in a preferred destination class despite variation in where it starts out.

A blender may be very good at blending apples, but that doesn't mean it has a goal of blending apples.

A blender that spit out oranges as unsatisfactory, pushed itself off the kitchen counter, stuck wires into electrical sockets in order to burn open your produce door, grabbed some apples, and blended those apples, on more than one occasion in different houses or with different starting conditions, would much more get me to say, "Well, that thing probably had some consequentialism-nature in it, about something that cashed out to blending apples" because it ended up at highly similar destinations from different starting points in a way that is improbable if nothing is navigating Time.

 

It doesn't seem crazy to me that a GPT-type architecture with the "Stack More Layers" approach could eventually model the world well enough to simulate consequentialist plans - i.e., given a prompt like:

"If you are a blender with legs in environment X, what would you do to blend apples?", it could provide a continuation with a detailed plan like the above (with GPT-4/5 etc. and more compute giving slightly better plans - maybe eventually at a superhuman level).

It also seems like it could do this kind of consequentialist thinking without itself having any "goals" to pursue. I'm expecting the response to be one of the following, but I'm not sure which:

  • "Well, if it's already making consequentialist plans, surely it has some goals, like maximizing the amount of text it generates, etc., and will try to do whatever it can to ensure that (similar to the "consequentialist AlphaGo" example in the conversation) instead of just letting itself be turned off."
  • An LLM / GPT will never be able to reliably output such plans with the current architecture or type of training data.

Small world, I guess :) I knew I heard this type of argument before, but I couldn't remember the name of it.

So it seems like the grabby aliens model contradicts the doomsday argument unless one of these is true:

  • We live in a "grabby" universe, but one with few or no sentient beings long-term.
  • The reference classes for the two arguments are somehow different (as discussed above).

Thanks for the great writeup (and the video). I think I finally understand the gist of the argument now.

The argument seems to raise another interesting question about the grabby aliens part. 

He's using the hypothesis of grabby aliens to explain away the model's low probability of us appearing early (and I presume we're one of these grabby aliens). But this leads to a similar problem: Robin Hanson (or anyone reading this) has a very low probability of appearing this early amongst all the humans to ever exist.

This low probability would also require a similar hypothesis to explain it away. The only way to do that seems to be some hypothesis where he's not actually that early amongst the total humans to ever exist - which would mean we turn out not to be "grabby"?
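As a rough sketch of why our early birth rank looks improbable under a long "grabby" future, here is the standard self-sampling calculation. All numbers are illustrative placeholders of my own, not Hanson's actual figures:

```python
# Under the self-sampling assumption, a human with birth rank r is equally
# likely to be any of the N humans who will ever live: P(rank = r | N) = 1/N.

BIRTH_RANK = 1e11  # roughly 100 billion humans born so far (rough estimate)

def rank_likelihood(total_humans: float) -> float:
    """Likelihood of observing our birth rank, given N total humans ever."""
    if total_humans < BIRTH_RANK:
        return 0.0  # impossible: rank r requires at least r humans to exist
    return 1.0 / total_humans

doom_soon = rank_likelihood(2e11)      # humanity ends at ~200 billion people
grabby_future = rank_likelihood(1e18)  # hypothetical "grabby" future, 10^18 people

# The likelihood ratio favors the small-population hypothesis: our early
# rank is millions of times more probable if few humans come after us.
print(doom_soon / grabby_future)  # → 5000000.0 (i.e., 5 million to 1)
```

This is exactly the tension above: the same style of reasoning that makes grabby aliens attractive (explaining our early appearance among civilizations) penalizes any hypothesis on which we are early among all humans.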

This seems like one of the problems with anthropic reasoning arguments, and I'm unsure how seriously to take them.