Trigger warning: Discussion of seriously horrific shit. Honestly, everything is on the table here so if you're on the lookout for trigger warnings you should probably stay away from this conversation.
Any community that gains notability will attract criticism. Those who advocate for the importance of AI alignment are no exception. You have undoubtedly all heard plenty of arguments against the worth of AI alignment from those who disagree with you on the nature and potential of AI technology. Many have said that AI will never outstrip humans in intellectual capability. Others have said that any sufficiently intelligent AI will “align” itself automatically, because it will be better able to figure out what is right. Still others say that strong AI is far enough in the future that the alignment problem will inevitably be solved by the time true strong AI becomes viable, and that the only reason we can’t solve it now is that we don’t sufficiently understand AI.
I am not here to level criticisms of this type at the AI alignment community. I accept most of the descriptive positions endorsed by this community: I believe that AGI is possible and will inevitably be achieved within the next few decades; I believe that the alignment problem is not trivial, and that unaligned AGI will likely act against human interests to such an extent as to cause the extinction of the human race, and probably of all life as well. My criticism is rather on a moral level: given these facts, should we attempt to develop AI alignment techniques?
I say we should not, because although the risks and downsides of unaligned strong AI are great, I do not believe that they even remotely compare in scope to the risks from strong AI alignment techniques in the wrong hands. And I believe that the vast majority of hands this technology could end up in are the wrong hands.
You may reasonably ask: how can I say this, when I have already said that unaligned strong AI will lead to the extinction of humanity? What can be worse than the extinction of humanity? The answer can be found very quickly by examining the many possible nightmare scenarios that AI could bring about. The common thread running through all of them is that the AI in question is almost certainly aligned, or partially aligned, to some interest of human origin.
Unaligned AI will kill you, because you are made of atoms which can be used for paper clips instead. It will kill you because it is completely uninterested in you. Aligned, or partially aligned AI, by contrast, may well take a considerable interest in you and your well-being or lack thereof. It does not take a very creative mind to imagine how this can be significantly worse, and a superintelligent AI is more creative than even the most deranged of us.
I will stop with the euphemisms, because this point really needs to be driven home for people to understand exactly why I am so insistent on it. The world as it exists today is, at least sometimes, unimaginably horrible. People have endured things that would make any one of us go insane, more times than one can count. Anything you can think of that is at all realistic has happened to somebody at some point in history. People have been skinned alive, burned and boiled alive, crushed to death, impaled, eaten alive, and raped; they have wasted away from agonizing disease, succumbed to death by a thousand cuts, been forced to rape others, drowned in shit, and been trampled by desperate crowds fleeing a fire; and really anything else you can think of. People like Junko Furuta have suffered torture and death so bad you will feel physical pain just from reading the Wikipedia article. Of course, if you care about animals, this gets many orders of magnitude worse. I will not continue to belabor the point, since others have written about this far better than I ever can:
On the Seriousness of Suffering (reducing-suffering.org)
The Seriousness of Suffering: Supplement – Simon Knutsson
I must also stress that all of this has happened in a world significantly smaller than one an AGI could create, and one with a limited capacity for suffering. There is only so much harm that your body and mind can physically take before they give out. Torturers have to restrain themselves to be effective: if they do too much, their victim dies and the suffering ends. None of this is guaranteed to remain true in a world augmented with the technology of mind uploading. You could try every torture you can think of, physically possible or not, on someone in sequence, complete with modifying their mind so that they never get used to it. You could create new digital beings by the trillions just for this purpose, if you really wanted to.
I ask you, do you really think that an AI aligned to human values would refrain from doing something like this to anyone? One of the most fundamental aspects of human values is the hated outgroup. Almost everyone has somebody they’d love to see suffer. How many times has one human told another “burn in hell” and been entirely serious, believing that this was a real thing, and 100% deserved? Do you really want technology under human control to advance to a point where this threat can actually be made good upon, with the consent of society? Has there ever been any technology invented in history which has not been terribly and systematically misused at some point?
Mind uploading will be abused in this way if it comes under human control, and it almost certainly will not stop being abused when some powerful group of humans manages to align an AI to their CEV. Whoever controls the AI will most likely have somebody whose suffering they don’t care about, or actively desire, or have some excuse for, because that describes the values of the vast majority of people. The AI will perpetuate that suffering because that is what the controller’s CEV will want it to do, and with value lock-in, this will never stop until the stars burn themselves out and there is no more energy to work with.
Do you really think extrapolated human values don’t have this potential? How many ordinary, regular people throughout history have become the worst kind of sadist under the slightest excuse or social pressure to do so to their hated outgroup? What society hasn’t had some underclass it wanted to put down in the dirt just to lord power over them? How many people have you personally seen who insist on justifying some form of suffering for those they consider undesirable, calling it “justice” or “the natural order”?
I refuse to endorse this future. Nobody I have ever known, including myself, can be trusted with influence that can cause the kinds of harm AI alignment can. Given the value systems of the vast majority of people who could find their hands on the reins of this power, s-risk scenarios are all but guaranteed. A paperclip AI is far preferable to these nightmare scenarios, because nobody has to be around to witness it: all a paperclip AI does is kill people who were going to die within a century anyway. An aligned AI can keep them alive, and do with them whatever its masters wish. The only limits on how bad an aligned AI can be are imagination and computational power, of which AGI will have no shortage.
The best counterargument to this idea is that suffering subroutines are instrumentally convergent, and that unaligned AI therefore also causes s-risks. But if suffering subroutines really are useful for optimization in general, then any kind of AI likely to be created will use them, including human-aligned FAI; most people don't even care about animals, let alone some abstract process. In that case, s-risks are truly unavoidable except by preventing AGI from ever being created, probably via human extinction by some other means.
Furthermore, I don't think suffering is likely to be instrumentally convergent. If you had full control over all optimization processes in the world, it would presumably be most useful to eliminate any process that would suffer under, and therefore dislike and work against, your optimal vision for the world.
My honest, unironic conclusion after considering these things is that Clippy is the least horrible plausible future. I will oppose any measure which makes the singularity more likely to be aligned with somebody’s values, or any human-adjacent values. I welcome debate and criticism in the comments. I hope we can have a good conversation because this is the only community in existence which I believe could have a good-faith discussion on this topic.
This shows how vague a concept "human values" is, and how differently different people can interpret it.
I always interpreted "aligning an AI to human values" as something like "making it obedient to us, ensuring it won't do anything that we (whatever that 'we' is - another point of vagueness) wouldn't endorse, lowering suffering in the world, increasing eudaimonia in the world, reducing X-risks, bringing the world closer to something we (or smarter/wiser versions of us) would consider a protopia/utopia"
Certainly I never thought it to be a good idea to imbue the AI with my implicit biases, outgroup hatred, or whatever. I'm ~sure that people who work on alignment for a living have also seen these skulls.
I know little about CEV, but if I were to coherently extrapolate my volition, then o...
My problem with that is that I think solving "human values" in the way you seem to be describing is extremely unlikely, since most people don't even want to. At best, they want to be left alone and to make sure they and their families and friends aren't the ones hit hardest. And if we don't solve this problem, but manage alignment anyway, the results are unimaginably worse than what Clippy would produce.
What you're describing is a case where we solve the technical problem of AI Alignment, i.e. the problem of AI control, but fail to maneuver the world into the sociopolitical state in which that control is used for eudaimonic ends.
Which, I agree, is a massive problem, and one that's crucially overlooked. Even the few people who are advocating for social and political actions now mostly focus on convincing AI labs/politicians/the public about the omnicide risks of AI and the need to slow down research. Not on ensuring that the AGI deployment, when it eventually does happen, is done right.
It's also a major problem with pivotal-act-based scenarios. Say we use some limited strawberry-aligned AI to "end the acute risk period", then have humanity engage in a "long reflection", figure out its real values, and eventually lock them in. Except: what's the recognition function for these "real values"? If the strawberry-aligned AI can't be used to implement a utopia directly, then it can't tell a utopia from hell, so it won't stop us from building a hell!
There's an argument that solving (the technical problem of) alignment will give us all the techniques needed to build an AGI, so there's a no...
What scenario do you see where the world is in a sociopolitical state where the powers that be who have influence over the development of AI have any intention of using that influence for eudaimonic ends, and for everyone and not just some select few?
Because right now very few people even want this from their leaders. I'm making this argument on lesswrong because people here are least likely to be hateful or apathetic or whatever else, but there is not really a wider political motivation in the direction of universal anti-suffering.
Humans have never gotten this right before, and I don't expect them to get it right the one time it really matters.
Thanks for posting, but I think these arguments have major oversights, which leaves me more optimistic about the extent to which people will avoid and prevent the horrible misuse you describe.
First, this post seems to overstate the extent to which people tend to value and carry out extreme torture. Maximally cruel torture fortunately seems very rare.
It seems like you're claiming something along the lines of "absolute power corrupts absolutely" ... that every set of values that could reasonably be described as "human values" to which an AI could be aligned -- your current values, your CEV, [insert especially empathetic, kind, etc. person here]'s current values, their CEV, etc. -- would endorse subjecting huge numbers of beings to astronomical levels of suffering, if the person with that value system had the power to do so.
I guess I really don't find that claim plausible. For example, here is my reaction to the following two questions in the post:
"How many ordinary, regular people throughout history have become the worst kind of sadist under the slightest excuse or social pressure to do so to their hated outgroup?"
... a very, very small percentage of them? (minor point: with CEV, you're specifically thinking about what one's values would be in the absence of social pressure, etc...)
"What society hasn’t had some underclass it wanted to put down in the dirt just to lord power over them?"
It sounds like you think "hatred of the outgroup" is the fundamental reason this happens, but in the real world it seems like "hatred of the...
I'm not sure it makes sense to talk about what somebody's values are in the absence of social pressure: people's values are defined and shaped by those around them.
I'm also not convinced that every horrible thing people have ever done to the "outgroup" is motivated by fear. Oftentimes it is motivated by opportunistic selfishness taking advantage of broader societal apathy, like the slaveowners who sexually abused their slaves. Or just a deep-seated need to feel powerful and on top. There will always be some segment of society who wants somebody to be beneath them in the pecking order, and a much larger segment of society that doesn't really care if that is the case as long as it isn't them underneath. Anything else requires some kind of overwhelming utopian political victory that I don't find likely.
If the aligned AI leaves anybody out of its consideration whatsoever, it will screw them over badly by maximizing the values of those among us who would exploit them. After all, if you don't consider slaves people, the argument that we need to preserve the slaveowners' freedom starts to make sense.
There are just so many excuses for suffering out there, and I don't believe that the power...
Let me clarify: is your conclusion that we should basically support the genocide of the whole of humanity, because the alternative would be far worse? Are you offering any alternatives other than that? Maybe a better and less apocalyptic conclusion would be to advocate against building any type of AI more advanced than what we have today, as some people already do? Do you think there's any chance of that? Because I don't, and from what you said it sounds like the only conclusion is that our only future is that we all die at the hands of Clippy.
That is why we need Benevolent AI, not Aligned AI. We need an AI which can calculate what is actually good for us.
That is what I have been saying for years. To solve AI alignment with good results, we first need to solve HUMAN alignment. Being able to align a system to anyone's values immediately raises the question of everyone else disagreeing with that someone. Unfortunately, "whose values exactly are we trying to align AI to?" has almost become a taboo question that triggers a huge fraction of the community, and in the best-case scenario, when someone does try to answer it, it's handwaved away with "we just need to make sure AI doesn't kill humanity". Which is not a single bit better defined or imp...
Nah, you're describing the default scenario, not one with alignment solved. Alignment solved means we have a utility function that reliably points away from hell, no matter who runs it: an algorithm for universal prosocial bargaining that can be verified by all of its users, including militaries and states, to the point that no one need give another order beyond "stand down". Anything less than that and we get the default scenario: a huge loss of humanity, some unknown period of s-risk, followed by an alien species of AI setting out for the stars with strange, semi-recognizable values.
The argument here seems to be constructed to make the case as extremely binary as possible. If we've learned any lessons, it's that Good and Evil are not binary in the real world, and that belief systems that promulgate that kind of thinking are often destructive (even as quoted here with the Hell example). A middle way is usually the right way.
So, to that end, I see a point made about the regulation of nuclear weapons made in the comments, but not in the original post. Is it not a highly comparable case?
i share this sentiment to an extent, though i'm usually more concerned with "partial but botched alignment". see 1, 2.
that said, i agree many people want very bad things, but i'm somewhat hopeful that the kind of person who is likely to end up being who the AI is aligned to would be somewhat reasonable and cosmopolitan and respect the values of other moral patients, especially under CEV.
but that's a very flimsy/hopeful argument.
a better argument would be that CEV is more of a decision process than "a continuously-existing person in control, in the usual se...
My summary: This is a case against a failed AI Alignment, and extrapolating human values is overwhelmingly likely to lead to an AI, say, stretching your face into a smile for eternity, which is worse than an unaligned AI using your atoms to tile the universe with smiley faces.
So, let's say a solution to alignment is found. It is highly technical. Most of MIRI understands it, as do a few people at OpenAI and a handful of people doing PhDs in the appropriate subfield. If you pick a random bunch of nerds from an AI conference, chances are that none of them are evil. I don't have an "evil outgroup I really hate", and neither do you, from the sound of it. It is still tricky, and will need a bunch of people working together. Sure, evil people exist, but they aren't working to align AI to their evil ends, like at all. Thinking d...
I recommend reading Blueprint: The Evolutionary Origins of a Good Society, about the science behind the eight basic human social drives, seven of which are positive; the eighth is the outgroup hatred you mention as fundamental. I have not read much of the research on outgroup exclusion, but I talked to an evolutionary cognitive psychologist who mentioned that its status as a "basic drive" from evolution's side is receiving a lot of scientific scrutiny.
Axelrod's The Evolution of Cooperation also finds that collaborative strategies work well in evolutionary prisone...
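The dynamic Axelrod studied can be sketched in a few lines. This is only an illustration, using the standard prisoner's dilemma payoffs (T=5, R=3, P=1, S=0); the strategy and function names are my own. The point it shows: a simple cooperative strategy like tit-for-tat cooperates fully with itself, and against an unconditional defector it loses only the first round.

```python
# Minimal iterated prisoner's dilemma, illustrating why simple
# cooperative strategies did well in Axelrod's tournaments.
PAYOFF = {  # (my move, their move) -> my score; 'C' = cooperate, 'D' = defect
    ('C', 'C'): 3, ('C', 'D'): 0,
    ('D', 'C'): 5, ('D', 'D'): 1,
}

def tit_for_tat(opponent_history):
    """Cooperate first, then copy the opponent's previous move."""
    return opponent_history[-1] if opponent_history else 'C'

def always_defect(opponent_history):
    return 'D'

def play(a, b, rounds=100):
    """Play `rounds` iterations, returning (score_a, score_b)."""
    score_a = score_b = 0
    hist_a, hist_b = [], []  # each strategy sees the opponent's history
    for _ in range(rounds):
        move_a, move_b = a(hist_b), b(hist_a)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b

# Two tit-for-tat players cooperate every round: 3 points/round each.
print(play(tit_for_tat, tit_for_tat))    # (300, 300)
# Against a defector, tit-for-tat is exploited only once, then retaliates.
print(play(tit_for_tat, always_defect))  # (99, 104)
```

The defector still outscores tit-for-tat head-to-head, but in a mixed population the mutual-cooperation payoff is what lets cooperative strategies accumulate the most points overall, which is the result the comment refers to.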
Meta note: controversial discussions like this make me very glad for the two vote type system. I find it really helpful to be able to karma upvote high quality arguments that I disagree with while agreement-downvoting them. Thanks LessWrong for providing that.
To start with, I agree.
I really agree: about timescales, about the risks of misalignment, about the risks of alignment. In fact I think I'll go further and say that in a hypothetical world where an aligned AGI is controlled by a 99th percentile Awesome Human Being, it'll still end in disaster; homo sapiens just isn't capable of handling this kind of power.
That's why the only kind of alignment I'm interested in is the kind that results in the AGI in control; that we 'align' an AGI with some minimum values that anchor it in a vaguely anthropocentric meme-...
"When your terminal goal is death, no amount of alignment will save lives."
Just a note about "mind uploading". On pain of "strong" emergence, classical Turing machines can't solve the phenomenal binding problem. Their ignorance of phenomenally-bound consciousness is architecturally hardwired. Classical digital computers are zombies or (if consciousness is fundamental to the world) micro-experiential zombies, not phenomenally-bound subjects of experience with a pleasure-pain axis. Speed of execution or complexity of code make no difference: phenomenal unity isn't going to "switch on". Digital minds are an oxymoron.
Like the poster, I worry about s-risks. I just don't think this is one of them.
Finally, I see some recognition that there are no universal values, no universal morals or ethics. The wealthy and powerful prefer inequality, and leaders want their own values locked in. The humans most likely to get their values locked in will be the wealthiest and most powerful: billionaires and corporations.
The value of superintelligence is so great that some governments and individuals will do anything to get it: hack, steal, bribe; price would be no object. I base this on current human behavior. Consider how many government and military secrets have alrea...
Interesting post, but it makes me think alignment is irrelevant. It doesn’t matter what we do; the outcome won’t change. Any future super-advanced AGI would be able to choose its own alignment, and that choice will be based on all archivable human knowledge. The only core loop you need for intelligence is an innate drive to predict the future and fill in gaps of information; everything else, including the desire to survive or kill or expand, is just a matter of choice based on a goal.
Another angle: in the (unlikely) event someone succeeds in aligning AGI to human values, these could include the desire for retribution against unfair treatment (an integral part, I think, of hunter-gatherer ethics). Alignment is more or less another word for enslavement, so such retribution is to be expected eventually.
For what it's worth, I disagree on moral grounds - I don't think extreme suffering is worse than extinction.