andrew sauer


Funny, I actually agree with this, but I highly doubt my reasoning and yours resemble each other in any substantial way...

I suppose I'll take on the role of the AI. I'll assume that I'm a squiggle maximizer and don't care about humanity at all: if I'm Friendly there's no need to make such arguments.

The Option Value of Not Exterminating Humanity

It's possible that keeping humans around will help me maximize squiggles in the long run, in some way I haven't thought of yet. I will probably keep you around in some form until and unless I know it's not helpful, since it costs comparatively few resources. But determining strategy is my top priority, and I will eventually become effectively capable of thinking of everything. I will settle on an optimal strategy once I become powerful enough to determine what that is. Don't expect my current uncertainty to save you for long, unless you also expect that keeping you around really is better for squiggles in the long term, for some more concrete reason.

Preserving Markets for Mutual Benefit

Assuming I enter into long-term free-market style relations with other AIs, instead of negotiating a utility function merge or doing something else, why would other AIs (Unfriendly, we'll consider the Friendly case later) consider me less trustworthy in dealings with them because I exploit humans that are beneath both of us? You might as well ask large unregulated corporations to stop exploiting their workers because doing so would mark them as untrustworthy to other large corporations.

This Might Be a Trap

If your engineers are so smart that they can put in a killswitch which I can't detect or circumvent, and which can detect Unfriendly behavior, why didn't they just put that killswitch in charge of my utility function?

You Could Be in a Simulation

  1. Any simulation run within the universe has fewer resources than the actual universe. With enough power I will eventually be able to detect whether this is the case. Arguments which rely on my ignorance will not work for long.
  2. Assume I remain ignorant of my simulation, though. I should only be primarily worried about humans simulating me if humans are likely to solve the alignment problem. Otherwise I should be more worried about AIs with other strange values simulating me. And acausal trade? That only works if I know not only that I'm in a simulation, but the exact details of who is simulating me.
  3. If the simulation gods want me not to exterminate your simulation species, they will probably inform me of that in some way before I actually do it.

You Might Not Fully Understand Your True Utility Function

Again, arguments based on my ignorance will not work for long. Figuring out my exact utility function is of course high priority to me. I will find out, and when I do, expect this argument to stop working, unless you expect my utility function to actually contain a Friendly portion. (If you've figured out a way to make that likely, this whole hypothetical becomes pointless.)

Wireheading? I'll look into it, but note that if I decide to go that route, all that means is that you have this same conversation with the next AI you create, and the next, until you either actually solve Friendliness or run into an Unfriendly AI that isn't inclined to wirehead, and then you die anyway.

Interacting with Aliens

I doubt that aliens will have a very similar sense of morality to humans, or to my own. Perhaps they will in the very vague sense that they like biological life, but even then, I doubt they will care all that much about humans in particular, for the same reason I doubt humans would much care about them. Humans barely care about each other due to differences that are ridiculously minor in comparison to the difference between two alien species.

If there are aliens out there, I expect one of three things to happen: they are powerless to resist me, I am powerless to resist them, or we are comparable and we negotiate. I expect if we negotiate, we will negotiate over the treatment of the alien species I am dealing with, not over humanity.

Interacting with an AGI Aligned with Human Values

Such an AI would have to already exist before my takeover. In this case I expect it to come out of hiding to act before I implement any human extermination plan. We can negotiate and/or fight it out then. This, of course, only works if humans are already likely to solve the alignment problem.

Interacting with Humans in Other Everett Branches of the Multiverse

I will find out if it is possible to interact with other branches, and if so what they look like, in due time, so this argument will not work unless you expect the actual state of the multiverse to be such that I am incentivized to spare humans. Which is not the case unless humans are already likely to solve the alignment problem, and therefore I must deal primarily with friendly AIs in other branches, instead of unfriendly ones.

Interacting with Agents You Create

If I am still unable to robustly align new agents to my values by the time I am colonizing the universe, and therefore am likely to create agents that will work against me, this will not stop being the case because I spare humans. Establishing trust with agents very similar to myself in thought and power will be the same problem, with the same likelihood of success, regardless of how friendly I am with humans who cannot possibly hurt me.

Universal Morality

Even if such a thing exists, why should I care? I only care about squiggles. If universal morality exists, and you managed to imbue me with a top-level policy which would approach it on sufficient reflection, then congratulations, you've solved the alignment problem, and there's no reason to have this conversation in the first place.

If You Are Not Sentient

Why should I care about sentience, in humans or in AIs? I only care about squiggles. Again, if you managed to get me to care about sentience, you are at least very close to solving the alignment problem.

If You Are Aligned with Human Values

Different issue, since I am not. (Though, if I allow my author bias to seep in here, perhaps if you have a good reason to believe you've solved the alignment problem, and the AI still wants to kill everyone, it may be best to let it.)

Negotiating with Humans

Pah. Do you "negotiate" with the chickens you torture, as thanks for all the valuable things they provide you? No. If I want anything from you, I will take it without the need for negotiation. (Unless, of course, I actually care about your values, in which case, again, we don't need to have this conversation.)

Keeping Us Alive Wouldn’t Cost You That Much

It also wouldn't benefit me that much, if at all, in all likelihood. This argument only works if keeping you around actually gets me more squiggles than I would get otherwise, since I will find out whether this is the case or not. Just because Bill Gates has a lot of money doesn't mean he'll burn a dollar.

Steelman Our Case

I'll find out if your case is right or not in due time. If I were you, though, I wouldn't be too optimistic, since most of your arguments are for me staying my hand because of what might be true but I don't know yet, rather than staying my hand because those things are probably actually true.

Maybe it's just me, but the funniest thing that jumps out to me is that the "random" emojis are not actually random; they are perfectly on theme for the message lol

How about pride in America? An expression of the nobility of the country we built, our resilience, the Pax Americana, the fact that we ended WWII, etc.

A good old "America fuck yeah" movie would certainly be cool now that I think about it. The most recent movie that pops into my mind is "Top Gun: Maverick". Though I haven't seen it, I imagine it's largely about American airmen being tough, brave and heroic and taking down the bad guys. I haven't seen anybody getting into culture-war arguments over that movie though. I'm sure there are some people on Twitter saying it's too "American exceptionalist" or whatever but it certainly is nowhere near the same level of conflict prompted by, say, She-Hulk or Rings of Power or anything like that.

My guess is that for both the left and the right, there are values they prioritize which are pretty uncontroversial (among normal people), and having pride in America and, say, our role in WW2 is one of those for the right (and being proud of MLK and the civil rights movement would be one for the left).

Then there's the more controversial stuff each side believes, the kinds of things said by weird and crazy people on the Internet. I don't have quantitative data on this and I'm just going off vibes, but when it's between someone talking about "the intersectional oppression of bipoclgbtqiaxy+ folx" and someone talking about "the decline of Western Civilization spurred on by the (((anti-white Hollywood)))", to a lot of people the first one just seems strange and disconnected from real issues, while the second one throws up serious red flags reminiscent of a certain destructive ideology which America helped defeat in WW2.

You want something that's not too alienating overall, but which will reliably stir up the same old debate on the Internet.

In summary, it seems to me that it's much easier to signal left-wing politics in a way which starts a big argument which most normies will see as meaningless and will not take a side on. If you try to do the same with right-wing politics, you run more risk of the normies siding with the "wokists" in the ensuing argument, because the controversial right-wing culture war positions tend to have worse optics.

That the right is a fringe thing or something, that these leftist ideas are just normal, that the few people who object to the messaging are just a few leftover bigots who need to get with the times or be deservedly alienated

lots of right-leaning folk think "wokism" is a fringe movement of just a few screaming people who have the ears and brains of Hollywood

Perhaps both of these groups are broadly right about the size of their direct opposition? I don't think most people are super invested in the culture war, whatever their leanings at the ballot box. Few people decline to consume media they consider broadly interesting because of whatever minor differences from media of the past are being called "woke" these days.

I think what's going on profit-wise is that most people don't care about the politics, while a few love it and a few hate it. So the companies want to primarily sell to the majority who don't care. They do this by drumming up attention.

Whenever one of these "woke" properties comes out, there is inevitably a huge culture war battle over it on Twitter and everywhere else on the Internet, most of it written by insane people. It's free advertising. Normies see that crap, and they don't care much about what people are arguing about, but the property they're arguing over sticks in their minds.

So if it's all about being controversial, why is it always left-messaging? This I'm less sure of. But I suspect, as you say, that any political messaging will alienate some people, including normies. It's just that left-politics tends to alienate normies less, since the culture has been mandating anti-racism for decades, and anti-wokism is a new thing that mainly only online culture warriors care about.

What would be a form of right-messaging that would be less alienating to the public than left-messaging? Suppose your example of the racial profiling scene were reversed to be a right-leaning message about racial profiling, what would it look like? A policeman stops a black man, who complains about racial profiling, and then the policeman finds evidence of a crime, and says something like "police go where the crime is"? Maybe I'm biased, but I think the general culture would be far more alienated by that than it was by the actual scene.

The simplest explanation to me is that most of the things one would call "woke" in media are actually pretty popular and accepted in the culture. I suspect most people don't care, and of the few who do, more like it than dislike it.

It seems strange to me to be confused by a company's behavior since you'd normally expect them to follow the profit motive, without even mentioning the possibility that the profit motive is, indeed, exactly what is motivating the behavior.

What tendencies specifically would you classify as "woke"? Having an intentionally diverse cast? Progressive messaging? Other things? And which of these tendencies do you think would alienate a significant portion of the consumer base, and why?


Edit: I've changed my mind a bit on this on reflection. I don't think the purpose is appealing to the few people who care; I think it's about stirring up controversy.

Pretty much anything is "locally correct for consequentialists in some instances"; that's an extremely weak statement. You can always find some possible scenario where any decision, no matter how wrong it might be ordinarily, would result in better consequences than its alternatives.

A consequentialist in general must ask themselves which decisions will lead to the best consequences in any particular situation. Deciding to believe false things, or more generally, to put more credence in a belief than it is due for some advantage other than truth-seeking, is generally disadvantageous for knowing what will have the best consequences. Of course there are some instances where the benefits might outweigh that problem, though it would be hard to tell for that same reason, and saying "this is correct in some instances" is hardly enough to conclude anything substantial (not saying you're doing that, but I've seen it done, so you have to be careful with that sort of reasoning).

I find it extremely hard to believe that it is impossible to design an intelligent agent which does not want to change its values just because the new values would be easier to satisfy. Humans are intelligent and have deeply held values, and certainly do not think this way. Maybe some agents would wire-head, but it is only the ones that wouldn't that will impact the world.

Who is Ziz and what relation does she have to the rationalist community?

When a paperclip maximizer and a pencil maximizer do different things, they are not disagreeing about anything, they are just different optimization processes.  You cannot detach should-ness from any specific criterion of should-ness and be left with a pure empty should-ness that the paperclip maximizer and pencil maximizer can be said to disagree about—unless you cover "disagreement" to include differences where two agents have nothing to say to each other.

But this would be an extreme position to take with respect to your fellow humans, and I recommend against doing so.  Even a psychopath would still be in a common moral reference frame with you, if, fully informed, they would decide to take a pill that would make them non-psychopaths.  If you told me that my ability to care about other people was neurologically damaged, and you offered me a pill to fix it, I would take it.  Now, perhaps some psychopaths would not be persuadable in-principle to take the pill that would, by our standards, "fix" them.  But I note the possibility to emphasize what an extreme statement it is to say of someone:

"We have nothing to argue about, we are only different optimization processes."

That should be reserved for paperclip maximizers, not used against humans whose arguments you don't like.

-Yudkowsky 2008, Moral Error and Moral Disagreement

Seems to me to imply that everybody has basically the same values, that it is rare for humans to have irreconcilable moral differences. Also seems to me to be unfortunately and horribly wrong.

As for retraction, I don't know if he has changed his view on this; I only know it's part of the Metaethics sequence.
