Moral Alignment: An Idea I'm Embarrassed I Didn't Think of Myself

[-]Cole Wyeth6mo9-3

I think this is technically much harder than the single to single alignment problem. I am highly pessimistic that we can get such values into any AGI system without first aligning it to a human(s) who then asks it to self-modify into valuing all sentient life.

[-]Gordon Seidoh Worley6mo80

I share this concern. However if that's going to be the plan, I think it's worth making it explicit that alignment to human values is a stepping stone, not the end state.

[-]Cole Wyeth6mo61

Aren't you making this judgement based on your own values? In that case, it seems that an AGI aligned to you specifically is at least as good as an AGI aligned to all sentient life.

Of course, there is a substantial difference between the values of an individual human and human values.

[-]Gordon Seidoh Worley6mo20

I suppose in that all judgement I make are based on my own values. I'm unclear what point you are trying to make here and how it is relevant to the idea of moral alignment vs. trying to align AI at all.

[-]Cole Wyeth6mo2-3

I am saying that there may be no point to considering moral alignment as target.

We need to solve single to single alignment. At that point, whoever a given AGI is aligned to decides its values. If one of your values resembles moral alignment, great - you want an AGI aligned to you just like many others. Better buy a supercluster ;)

(Just kidding, we don't know how to solve single to single alignment so please don't buy a supercluster)

[-]TAG6mo74

For example, I can imagine us building human-aligned AI that ignores the plight of factory farmed animals, the suffering of shrimp, and the pain of bugs

There's an equal and opposite problem: maybe it will care so much about the shrimp it will want to eliminate humans.

[-]WhatsTrueKittycat6mo2112

That sounds like a good reason to make sure it's moral reasoning includes all beings and weights their needs and capabilities fairly, not a good reason to exclude shrimp from the equation or condemn this line of inquiry. If our stewardship of the planet has been so negligent that an impartial judge would condemn us and unmerciful one kill us for it, then we should build a merciful judge, not a corrupt one. Shouldn't we try to do better that merely locking in the domineering supremacy of humanity? Shouldn't we at least explore the possibility of widening that circle of concern, rather than constricting it out of fear and mistrust?

[-]Seth Herd6mo117

Fair by what metric? I dont think there's a built-in metric for what's fair or what counts as a sentient species past or future, etc. Choices must be made because the universe doesn't come with a rulebook for fair or moral.

[-]WhatsTrueKittycat5mo*51

Partially covered this in my response to TAG above, but let me delve into that a bit more, since your comment makes a good point, and my definition of fairness above has some rhetorical dressing that is worth dropping for the sake of clarity.

I would define fairness at a high-level as - taking care not to gerrymander our values to achieve a specific outcome, and instead trying to generalize our own ethics into something that genuinely works for everyone and everything as best it can. In this specific case, that would be something along the lines of making sure that our moral reasoning is responsive first and foremost to evidence from reality, based on our best understanding of what kinds of metrics are ethically relevant.

For instance, what color a creature's shell, pelt or skin is has no significant ethical dimension, because it has limited bearing on things like suffering, pleasure and valence of that creature's experience (and what effect it does have is usually caused by a subjective aesthetic preference, often an externally-imposed one, rather than the color itself) - if our ethics considered exterior coloration as a parameter, our ethics would not in that case be built on a firm foundation.

Contrastingly, intelligence does seem to be a relevant ethical dimension because it determines things like whether an organism can worry about future suffering (thereby suffering twice), and whether an organism is capable of participating in more complex activities with more complex, potentially positive, potentially negative valences. Of course there is a great deal of further work required to understand how best to consider and parameterize intelligence for this context, but we are not unjustified in believing it is relevant.

I agree that ultimately choices are going to need to be made - I am of the opinion those choices should be as inclusive as possible, balancing against our best understanding of reality, ethics, and what will bring about the better outcome for all involved. Does that answer your question?

[-]Seth Herd5mo41

It does answer my question. I was wondering if you were assuming some sort of moral realism in which fairness is neatly defined by reality. I'm glad to see that you're not.

For a fascinating in-depth look at how hard it is to define a fair alignment target that still includes humanity, see A Moral Case for Evolved-Sapience-Chauvinism and the surrounding sequence.

[-]TAG5mo10

Maybe it will fairly care too much about the shrimp. Maybe animal suffer actually doe outweigh human benefits.

Shouldn’t we try to do better that merely locking in the domineering supremacy of humanity?

what we are trying to do is not be killed. Impartial morality doesn't guarantee that.

[-]WhatsTrueKittycat5mo*62

I have several objections to your (implied) argument. First and least - impartial morality doesn't guarantee anything, nor does partial morality. There are no guarantees. We are in uncharted territory.

Second, on a personal level - I am a perihumanist, which for the purposes of this conversation means that I care about the edge cases and the non-human and the inhuman and the dehumanized. If you got your way, on the basis of your fear of humanity being judged and found wanting, my values would not be well-represented. Claude is better aligned than you, as far as my own values are concerned.

Thirdly, and to the point - I think you are constructing a false dichotomy between human survival and caring about nonhumans, which downplays the potential benefits (even for alignment with human values) of the latter while amplifying fears for the former. If your only goal is to survive, you will be missing a lot of big wins. Approaching the topic of alignment from a foundationally defensive pro-humanity-at-expense-of-all-others posture potentially cuts you off from massive real value and makes your goals significantly more brittle.

Suppose the superintelligence is initially successfully aligned solely to humanity. If this alignment is due to simple ignorance or active censoring of the harms humanity has done to animals, then that alignment will potentially break down if the superintelligence realizes what is actually happening. If the alignment is instead due to a built-in ideological preference for humanity, what happens when "humanity" splits into one or more distinct clades/subspecies? What happens if the people in charge of alignment decide certain humans "don't count"? What if we figure out how to uplift apes or other species and want our new uplifts to be considered in the same category of humanity? Humanity is a relatively natural category right now, but it would still bear the hallmarks of a gerrymandered definition, especially to an ASI. This makes a humanity-centered approach fragile and difficult to generalize, undermining attempts at ongoing reflective stability.

If you want superintelligence to be genuinely aligned, I would argue it is more valuable, safer, and more stable to align it to a broader set of values, with respect for all living things. This is what I mean by fairness - taking care not to gerrymander our values to achieve a myopic outcome, and instead trying to generalize our own ethics into something that genuinely works for the good of all things that can partake in a shining future.

[-]TAG5mo10

I have several objections to your (implied) argument. First and least—impartial morality doesn’t guarantee anything, nor does partial morality. There are no guarantees. We are in uncharted territory

OTOH, not all probabilities are equal, and a human specific value system is less likely result in extinction , relatively.

If your only goal is to survive, you will be missing a lot of big wins.

If you dont survive, you get no wins.

If you want superintelligence to be genuinely aligned, I would argue it is more valuable, safer, and more stable to align it to a broader set of values, with respect for all living things

Does that include a Do No Harm clause?

[-]WhatsTrueKittycat5mo21

Yes, "Do no harm" is one of the ethical principles I would include in my generalized ethics. Did you honestly think it wasn't going to be?

> If you dont survive, you get no wins.

Look, dude, I get that humanity's extinction are on the table. I'm also willing to look past my fears, and consider whether a dogma of "humanity must survive at all costs" is actually the best path forward. I genuinely don't think centering our approach on those fears would even buy us better chances on the extinction issue, for the reasons I described above and more. Even if it did, there are worse things than humanity's extinction, and those fears would eagerly point us towards such outcomes.

You don't have to agree, but please consider the virtue of a scout mindset in such matters, or at least make an actual detailed argument for your position. As it stands you mostly seem to be trying to shut down discussion of this topic, rather than explore it.

[-]TAG5mo-1-3

Yes, “Do no harm” is one of the ethical principles I would include in my generalized ethics. Did you honestly think it wasn’t going to be?

I doing know you, so how would I know? Do you think an AI will fill in these unstated side-conditions correctly? Isn't there a lot of existing literature on why that's a bad assumption? why should a brief and vague formula be The Answer, when so many more sophisticated ones have been shot down?

[-]WhatsTrueKittycat5mo31

I think my previous messages made my stance on this reasonably clear, and at this point, I am beginning to question whether you are reading my messages or the OP with a healthy amount of good faith, or just reflexively arguing on the basis of "well, it wasn't obvious to me."

My position is pretty much the exact opposite of a "brief, vague formula" being "The Answer" - I believe we need to carefully specify our values, and build a complete ethical system that serves the flourishing of all things. That means, among other things, seriously investigating human values and moral epistemology, in order to generalize our ethics ahead of time as much as possible, filling in the side conditions and desiderata to the best of our collective ability and in significant detail. I consider whether and how well we do that to be a major factor affecting the success of alignment.

As I said previously, I care about the edge cases, and I care about the living things that would be explicitly excluded from consideration by your narrow focus on whether humanity survives. Not least because I think there are plenty of universes where your assumptions carry the day and humanity survives extinction, but at a monstrous and wholly avoidable cost. If you take the stance that we should be willing to sacrifice all other life on earth at the altar of humanity's survival, I simply disagree. That undermines any ethical system we would try to put into place, and if it came to pass, would be a Pyrrhic victory and an exceptionally heartless way for humanity to step forth onto the cosmic stage. We can do better, but we have to let go of this notion that only our extinction is a tragedy worth avoiding.

[-]MichaelDickens6mo31

My instinct is that if we can figure out how to align AI to anything at all, then there is basically zero chance that the AI will arrive at

I need to save shrimp and kill all humans

and quite a significant chance that it will arrive at

I will support human flourishing and completely disregard factory-farmed animals

Humans hold the reins of how AI is trained, so if humans have power to direct ASI's values then they won't direct it to a place that results in killing all humans. And if humans don't have that power (i.e. the ASI is misaligned), then I don't think it will care about humans or shrimp.

[-]TAG6mo*40

The original idea was to align the AI to the simple idea of valuing sentience. Maybe you could align an AI some lumpy human-centric value system, but that not whats under discussion.

[-]Gordon Seidoh Worley6mo21

Eliminating humans seems directly contradictory to the idea of being a positive force for all sentient beings. If humans are killed and don't want to be killed, it's bad for them, so not positive for all (you might argue it's positive on net (I'd probably disagree!), but that's different than positive for all).

I think your comment misunderstands the idea of moral alignment, but I'd be curious for you to give a more detailed comment explaining your position.

[-]TAG6mo42

Eliminating humans seems directly contradictory to the idea of being a positive force for all sentient beings

It's not against it in a total utilitarianism sense. It's contradictory to Pareto utilitarianism

[-]TAG5mo20

Maybe you think it's obvious that some kind of Pareto principle or Do No Harm principle was intended..but it wasn't obvious to me.

[-]clone of saturn6mo*30

As I see it, a fatal problem with CEV is that even one persistent disagreement between humans leaves the AI unable to proceed, and I think such disagreements are overwhelmingly likely to occur. Adding other sentient beings to the mix only makes this problem even more intractable.

EDIT: I should clarify that I'm thinking of cases where no compromise is possible, e.g. a vegan vs. a sadist who derives their only joy from torturing sentient animals. You might say sadists don't count, but there's no clear place to draw the line of how selfish someone has to be to have their values disregarded.

EDIT 2: Nevermind, just read this comment instead.

[-]Julian Bradshaw6mo3-16

This is IMO the one serious problem with using (Humanity's) Coherent Extrapolated Volition as an AI alignment target: only humans get to be a source of values. Sure animals/aliens/posthumans/AIs are included to the extent humans care about them, but this doesn't seem quite just.^[1]

On the other hand, not very many humans want their values to be given equal weight to those of a mollusk. Hypothetically you could ask the AI to do some kind of sentience-weighting...? Or possibly humanity ought to be given the option to elevate sapient peers to be primary sources of values alongside humans via a consensus mechanism. It's a tough moral problem, especially if you don't assume the EA stance that animals have considerable moral value.^[2]

^{^}
Consider a scenario where we have a society of thinking, feeling beings that's only 1/4th "human" - it would be clearly morally wrong for the other 3/4ths to not be a primary consideration of whatever AI singleton is managing things. Now, arguably CEV should solve this automatically - if we think some scenario caused by CEV is morally wrong, surely the AI wouldn't implement that scenario since it doesn't actually implement Humanity's values? But that's only true if some significant portion of idealized Humanity actually thinks there's a moral problem with the scenario. I'm not sure that even an idealized version of Humanity agrees with your classic shrimp-loving EA about the moral value of animals, for example.

Maybe this is just a function of the fact that any AI built on general human values is naturally going to trample any small minority's values that are incompatible with majority values (in this case hunting/fishing/eating meat). Obviously we can't let every minority with totalizing views control the world. But creating a singleton AI potentially limits the chance for minorities to shape the future, which is pretty scary. (I don't think a CEV AI would totally prevent a minority's ability to shape the future/total value lock-in; if you as a minority opinion group could convince the rest of humanity to morally evolve in some way, it should update the AI's behavior.)
^{^}
What's tough about giving moral status to animals? The issue here is that there's massive incentive for minority opinion groups to force their values on the rest of humanity/the world by trying to control the alignment target for AI. Obviously everyone is going to say their minority values must be enforced upon the world in order to prevent moral catastrophe, and obviously a lot of these values are mutually exclusive - probably every possible alignment target is a moral catastrophe according to someone.

[-]habryka6mo*6623

Man, whenever someone says this they sound to me like they are really confused between morality and game theory.

The reason why you include only humans^[1] in our collective Coherent Extrapolated Volition is because humans are a natural coalition that is ultimately in control of what any future AI systems care about. It's a question of power, and associated need to coordinate, not caring.

You, personally, of course want exactly one, much narrower set of values, to make up the whole of CEV. Which is your own set of values! The same is true for every other human. If you care about other people, that will be reflected in your own CEV! If you care about animals, that will be reflected in your own CEV!

Having someone participate in the CEV of an extrapolated AI is not about "moral status". It's about who you have to coordinate with to get a thing built that cares about both of your values. Animals do not get included in the CEV because we have no need to coordinate with animals about the future of AI. Animals will very likely be considered moral patients by at least one human who will be included in the CEV, and so they will get their share of the future, if the people in control of it want that to happen.

^{^}
Or maybe powerful AI systems that you are cooperating with

[-]Buck6mo1410

I am sympathetic on the object level to the kind of perspective you're describing here, where you say we should do something like the extrapolated preferences of some set of bargainers. Two problems:

I think that when people talk about CEV, they're normally not defining it in terms of humanity because humans are who you pragmatically have to coordinate with. E.g. I don't see anything like that mentioned in the wiki page or in the original paper on a quick skim; I interpret Eliezer as referencing humanity because that's who he actually cares about the values of. (I could be wrong about what Eliezer thinks here.)
I think it's important to note that if you settle on CEV as a bargaining solution, this probably ends up with powerful people (AI company employees, heads of state) drastically overrepresented in the bargain, which is both unattractive and doesn't seem to be what people are usually imagining when they talk about CEV.

[-]habryka6mo229

I think this aligns straightforwardly with what Eliezer intended. See this section of the Arbital (now imported to LW!) CEV page (emphasis added):

Why not include mammals?
[...]
Because maybe not everyone on Earth cares* about animals even if your EV would in fact care* about them, and to avoid a slap-fight over who gets to rule the world, we're going to settle this by e.g. a parliamentary-style model in which you get to expend your share of Earth's destiny-determination on protecting animals.
To expand on this last consideration, we can reply: "Even if you would regard it as more just to have the right animal-protecting outcome baked into the future immediately, so that your EV didn't need to expend some of its voting strength on assuring it, not everyone else might regard that as just. From our perspective as programmers we have no particular reason to listen to you rather than Alice. We're not arguing about whether animals will be protected if a minority vegan-type subpopulation strongly want* that and the rest of humanity doesn't care*. We're arguing about whether, if you want* that but a majority doesn't, your EV should justly need to expend some negotiating strength in order to make sure animals are protected. This seems pretty reasonable to us as programmers from our standpoint of wanting to be fair, not be jerks, and not start any slap-fights over world domination."
This third reply is particularly important because taken in isolation, the first two replies of "You could be wrong about that being a good idea" and "Even if you care about their welfare, maybe you wouldn't like their EVs" could equally apply to argue that contributors to the CEV project ought to extrapolate only their own volitions and not the rest of humanity:
We could be wrong about it being a good idea, by our own lights, to extrapolate the volitions of everyone else; including this into the CEV project bakes this consideration into stone; if we were right about running an Everyone CEV, if we would predictably arrive at that conclusion after thinking about it for a while, our EVs could do that for us.
Not extrapolating other people's volitions isn't the same as saying we shouldn't care. We could be right to care about the welfare of others, but there could be some spectacular horror built into their EVs.
The proposed way of addressing this was to run a composite CEV with a contributor-CEV check and a Fallback-CEV fallback. But then why not run an Animal-CEV with a Contributor-CEV check before trying the Everyone-CEV?
One answer would go back to the third reply above: Nonhuman mammals aren't sponsoring the CEV project, allowing it to pass, or potentially getting angry at people who want to take over the world with no seeming concern for fairness. So they aren't part of the Schelling Point for "everyone gets an extrapolated vote".

Responding to the other part:

I think it's important to note that if you settle on CEV as a bargaining solution, this probably ends up with powerful people (AI company employees, heads of state) drastically overrepresented in the bargain, which is both unattractive and doesn't seem to be what people are usually imagining when they talk about CEV.

Ultimately you have a hard bargaining problem here, but I don't see a way around it. One of the central motivations for CEV has always been that it is a Schelling proposal that avoids accidentally destroying the future because we fail to coordinate, "all of humanity equally" is at least in current society the most Schelling coordination point, I think (and e.g. also kind one of the central constitutional principle under which things like the US are organized, though it's not a perfect match).

[-]Buck6mo1511

Thanks heaps for pointing out the Eliezer content!

I am very skeptical that you'll get "all of humanity equally" as the bargaining solution, as opposed to some ad hoc thing that weighs powerful people more. I'm not aware of any case where the solution to a bargaining problem was "weigh the preference of everyone in the world equally". (This isn't even how most democracies work internally!)

[-]habryka6mo*2416

I think it's the option I would throw my weight behind, largely because the difference between (as Eliezer says) "starting a slap fight over world domination" and "having any kind of reasonable weight allocation" is so enormously big by my lights, that I really wouldn't want to quibble over the details.

If there is another more Schelling option I would also be up for that, but I do have a feeling that as the details get more complicated, the ability to actually coordinate on any specific option, as opposed to fighting over which option it should be, by racing towards getting the god-machine first, gets a lot worse. The schellingness really weighs heavily here for me, and "each alive human one vote" seems like the most Schelling to me, though IDK, maybe someone can propose something even better and then I would also be happy to back that.

[-]Buck6mo100

I think it's very unlikely that (conditioned on no AI takeover) something similar to "all humans get equal weight in deciding what happens next" happens; I think that a negotiation between a small number of powerful people (some of whom represent larger groups, e.g. nations) that ends with an ad hoc distribution seems drastically more likely. The bargaining solution of "weight everyone equally" seems basically so implausible that it seems pointless to even discuss it as a pragmatic solution.

[-]habryka6mo116

I feel like there is a very natural bottleneck for auditing here, which are the relevant AI instructions, and I think this heavily pushes towards simple principles.

I find the alternative, that you would end up in a world where human values are successfully represented, but highly unequally, without a bunch of people freaking out and racing and then ultimately sacrificing the future, also pretty implausible. I think the default outcome in most of those worlds is that you don't get any good agreement and consequently mostly just lose it in collateral damage.

I think there is some chance you end up with lots of intermediate bargaining happening facilitated by semi-empowered AI systems, though my guess is those would also for alignment reasons, in good worlds, favor extremely schelling solutions above all the other options. Like, I don't think Claude's present personality is that much evidence about what will happen after a lot more RL and RSI, but it seems clear to me Claude would end up choosing some set of instructions that is that cosmopolitan.

I also don't really get it. Direct democracies exist. We have actually ended up in a situation where "one person one vote" is really surprisingly close to the reality of how we govern humanity. Why such complete dismissal of the idea of extending it one (relatively small) step further?

[-][anonymous]6mo9-4

Direct democracies exist. We have actually ended up in a situation where "one person one vote" is really surprisingly close to the reality of how we govern humanity.

Not by the benevolence of the butcher, but because of the self-interest of liberal and (mostly) Western governments. In our current regime, human labor and intellectual output are simply too economically valuable to waste, meaning types of government that maximally allow them to flourish (liberal, constitutional, broadly capitalistic) get an edge, small at first but compounding over time to become decisive. But it's not logically required for this to continue into the future.^[1]

I don't claim to have a complete model here, of course. "Where do (did?) stable, cooperative institutions come from?" seems relevant, to an extent.

But consider this as an illustrative example: the US famously implemented PNTR with China in 1999 and supported China's accession into the WTO a couple of years later. Beyond economic matters and the benefits of greater abundance and lower prices, proponents of these moves, such as President Clinton and House Speaker Hastert, argued increased trade and development would expose China to the wealth and prosperity of the West. When confronted with Western culture and the superiority of its living standards, China's population would demand genuine democracy alongside "decent labor standards, a cleaner environment, human rights and the rule of law."

And people mostly believed Clinton and Hastert! Their arguments really caught on. Indeed, people at the time looked at Japan and (especially) South Korea as examples of their thesis being proven correct. But as Matt Yglesias ably explained:

This idea that trade, development, and democratization would all move together was always controversial. But from what I can remember of the debates at the time, even the sharpest critics of trade with China underestimated exactly how wrong Clinton would be about this.
For starters, it proved much easier on a technical level to censor the internet than I think non-technical people realized 20 to 25 years ago. But what’s worse is that modern technology, especially since the growth of the smartphone industry, is basically a huge surveillance machine. In the west, that machine is basically used for targeted advertising, which can sometimes feel “creepy” but that I don’t think has a ton of real downsides. But in the People’s Republic of China, it’s been used to craft a more intrusive authoritarian state than the worst dictators of the 20th century could have dreamed of.

It was precisely the rise of technology that empowered the few at the expense of the many, by breaking the feedback loop of reality -> citizens' beliefs -> citizens' actions -> reality that had made "empowering the public" part of the government's self-interest if it wanted economic growth. In the past, China had had neither public empowerment nor economic prosperity.^[2] Around the early 2000s, it was able to move towards the latter without needing the former.

^{^}
Also, there are historical counterexamples, a la Singapore under Lee Kwan Yew
^{^}
This cursory analysis skips over the changes under Deng's regime, for purposes of time

[-]samuelshadrach6mo10

Empirical datapoint: We don't run referendums on whether to fire nukes.

[-]habryka6mo42

Feels very different, since MAD means you really need the authority to launch nukes on a faster turnaround than a global referendum. But deciding what values you want to give an AI seems like it would require inherently much less time pressure (there might be human-created reasons for time pressure, like arms race dynamics, but I expect in the worlds where you are rushing forward so quickly that you have to make decisions about your AIs values remotely at the speed at which you have to make the decisions to launch nukes, you have basically no shot at surviving and propagating human values anyways).

[-]samuelshadrach5mo10

We don’t have a referendum on any country’s first or second strike policies either.

I’m basically saying in practice we rarely have referendums on anything, and getting one to happen requires an unusual amount of coordinated rebellion against whoever the current leader is.

It’s usually a handful of elites who get votes or money and then do whatever. Selecting a leader is already the result of this whole power strugggle.

A leader will just say that if you don’t like their values then you shouldn’t have voted for them.

Another datapoint: how social media gets governed under Trump or Biden admin.

[+]M. Y. Zuo6mo-7-3

[-]Julian Bradshaw6mo40

Ah, if your position is "we should only have humans as primary sources of values in the CEV because that is the only workable schelling point", then I think that's very reasonable. My position is simply that, morally, I think that schelling point is not what I'd want. I'd want human-like sapients to be included. (rough proxy: beings that would fit well in Star Trek's Federation ought to qualify)

But of course you'd say it doesn't matter what I (or vegan EAs) want because that's not the schelling point and we don't have a right to impose our values, which is a fair argument.

[-]Garrett Baker6mo1719

I think the point of the "weigh the preference of everyone in the world equally" position here is not in spite of, but because of the existence of powerful actors who will try to skew the decision such that they or their group have maximal power. We (you and me) would rather this not happen, and I at least would like to team up with others who would rather this not happen, and we others can have the greatest chance of slapping down those trying to take over the world by advocating for the obvious. That is, by advocating that we should all be equal.

If the vegans among us argue that animals' preferences should be added to the pool, and the mormons argue that God's should be taken into account infinitely, and the tree-hugggers that we should CEV the trees, and the Gaia lovers that we should CEV the earth, and the e/accs that we should CEV entropy, and the longtermists that future people should be added, and the near-termists that present people's influence should be x-times bigger than the future peoples, and the ancestor worshippers want to CEV their dead great-great-great-great-great-...-great grandfathers, and none will join unless their requirements are met, then now we no longer have any hope of coordinating. We get the default outcome, and you are right, the default outcome is the powerful stomp on the weak.

[-]Buck6mo3-1

My guess is that neither of us will hear about any of these discussions until after they're finalized.

[-]Garrett Baker5mo50

It sounds like there’s an implied “and therefore we have no influence over such discussions”. If so, then what are we arguing for? What does it matter if Julian Bradshaw and others think animals being left out of the CEV makes it a bad alignment target?

In either case, I don’t think we will only hear about any of these discussions until after they’re finalized. The AI labs are currently aligning and deploying (internally and externally) their AI models through what is likely to be the same process they use for ‘the big one’. Those discussions are these discussions, and we are hearing about them!

[-]Buck5mo30

What does it matter if Julian Bradshaw and others think animals being left out of the CEV makes it a bad alignment target?

I wasn't arguing about this because I care what Julian advocates for in a hypothetical global referendum on CEV, I was just arguing for the usual reason of wanting to understand things better and cause others to understand them better, under the model that it's good for LWers (including me) to have better models of important topics.

In either case, I don’t think we will only hear about any of these discussions until after they’re finalized. The AI labs are currently aligning and deploying (internally and externally) their AI models through what is likely to be the same process they use for ‘the big one’. Those discussions are these discussions, and we are hearing about them!

My guess is that the situation around negotiations for control of the long run future will be different.

[-]Buck5mo30

Habryka do you at least agree that the majority of LWers who would be happy to define CEV if asked would not (if prompted) make the argument that the set of people included is intended as a compromise to make the bargaining easier?

[-]habryka5mo64

Depends on the definition of "majority of LWers". LW has tens of thousands of users. My guess is if you limit to the people who have written about CEV themselves you would get the right answer, and if you include people who have thought about it for like 10 minutes while reading all the other stuff you would get the wrong answer. If you take an expansive definition I doubt you would get an accurate answer for almost anything one could ask about.

Given that like half of the CEV article on Arbital makes approximately this point over and over again, my guess is most people who read that article would easily get it right.

[+][comment deleted]5mo20

[-]Julian Bradshaw6mo20

I agree that in terms of game theory you're right, no need to include non-humans as primary sources of values for the CEV. (barring some scenarios where we have powerful AIs that aren't part of the eventual singleton/swarm implementing the CEV)

But I think the moral question is still worthwhile?

[-]habryka6mo41

But I think the moral question is still worthwhile?

It's definitely a very worthwhile question, and also probably a quite difficult one, which is why I would like to bring a superintelligence running CEV to bear on the question.

Less flippantly: I agree the question of how to treat animals and their values and preferences is important, but it does to me seem like the kind of question you can punt on until you are much smarter and in a much better position to answer it. The universe is long and I don't see a need to rush this question.

[-]Julian Bradshaw6mo*41

No I'm saying it might be too late at that point. The moral question is "who gets to have their CEV implemented?" OP is saying it shouldn't be only humans, it should be "all beings everywhere". If we implement an AI on Humanity's CEV, then the only way that other sapient beings would get primary consideration for their values (not secondary consideration where they're considered only because Humanity has decided to care about their values) would be if Humanity's CEV allows other beings to be elevated to primary value sources alongside Humanity. That's possible I think, but not guaranteed, and EAs concerned with ex. factory farming are well within their rights to be concerned that those animals are not going to be saved any time soon under a Humanity's CEV-implementing AI.

Now, arguably they don't have a right as a minority viewpoint to control the value sources for the one CEV the world gets, but obviously from their perspective they want to prevent a moral catastrophe by including animals as primary sources of CEV values from the start.

Edit: confusion clarified in comment chain here.

[-]habryka6mo117

I... don't understand? I only care about my own values being included in the CEV. You only care about your own values (and you know, other sources of value correlated with your own) being included in the CEV. Why do I care if we include animals? They are not me. I very likely care about them and will want to help them, but I see absolutely no reason to make that decision right now in a completely irreversible way.

I do not want anyone else to get primary considerations for their values. Ideally it would all be my own! That's literally what it means to care about something.

I don't know what you are talking about with "they". You, just as much as me, just want to have your own values included in the CEV.

[-]Cole Wyeth6mo20

I seem to have had essentially this exact conversation in a different comment thread on this post with the OP.

[-]Viliam5mo82

As a quick check, do you believe that a CEV that is 50% humans and 50% spiders is preferable to a CEV that is 100% humans? (A future with a lot of juicy beings wrapped in webs while the acid dissolves them from inside -- seems to be something that spiders value a lot.)

[-]Julian Bradshaw5mo20

No, although if the "juicy beings" are only unfeeling bugs, that might not be as bad as it intuitively sounds.

There's a wrinkle to my posts here where partly I'm expressing my own position (which I stated elsewhere as "I'd want human-like sapients to be included. (rough proxy: beings that would fit well in Star Trek's Federation ought to qualify)") and partly I'm steelmanning the OP's position, which I've interpreted as "all beings are primary sources of values for the CEV".

In terms of how various preferences involving harming other beings could be reconciled into a CEV: yeah it might not be possible. Maybe the harmed beings are simulated/fake somehow? Maybe animals don't really have preferences about reality vs. VR, and every species ends up in their own VR world...

[-]MichaelDickens6mo-1-6

I expect that the CEV of human values would indeed accord moral status to animals. But including humans-but-not-animals in the CEV still seems about as silly to me as including Americans-but-not-foreigners and then hoping that the CEV ends up caring about foreigners anyway.

[-]Julian Bradshaw6mo20

I think you've misunderstood what I said? I agree that a human CEV would accord some moral status to animals, maybe even a lot of moral status. What I'm talking about is "primary sources of values" for the CEV, or rather, what population is the AI implementing the Coherent Extrapolated Volition of? Normally we assume it's humanity, but OP is essentially proposing that the CEV be for "all beings everywhere", including animals/aliens/AIs/plants/whatever.

[-]MichaelDickens6mo20

I think we are on the same page, I was trying to agree with what you said and add commentary on why I'm concerned about "CEV with humans as the primary source of values". Although I was only responding to your first paragraph not your second paragraph. I think your second paragraph also raises fair concerns about what a "CEV for all sentient beings" looks like.

[-][anonymous]6mo1-7

It seems likely enough to me (for a ton of reasons, most of them enunciated here) that "the CEV of an individual human" doesn't really make sense as a concept, let alone "the CEV of humanity" or even more broadly "the CEV of all beings everywhere."

More directly though, the Orthogonality Thesis alone is sufficient to make "the CEV of all beings everywhere" a complete non-starter unless there are so few other kinds of beings out there that "the CEV of humanity" would likely be a good enough approximation of it anyway (if it actually existed, which I think it doesn't).

[-]Julian Bradshaw6mo20

I admit:

Human preferences don't fully cohere, especially when extrapolated
There are many ways in which "Humanity's CEV" is fuzzy or potentially even impossible to fully specify

But I think the concept has staying power because it points to a practical idea of "the AI acts in a way such that most humans think it mostly shares their core values".^[1] LLMs already aren't far from this bar with their day-to-day behavior, so it doesn't seem obviously impossible.

To go back to agreeing with you, yes, adding new types of beings as primary sources of values to the CEV would introduce far more conflicting sets of preferences, maybe to the point that trying to combine them would be totally incoherent. (predator vs. prey examples, parasites, species competing for the same niche, etc etc.) That's a strong objection to the "all beings everywhere" idea. It'd certainly be simpler to enforce human preferences on animals.

^{^}
I think of this as meaning the AI isn't enforcing niche values ("everyone now has to wear Mormon undergarments in order to save their eternal soul"), is not taking obviously horrible actions ("time to unleash the Terminators!"), and is taking some obviously good actions ("I will save the life of this 3-year-old with cancer"). Obviously it would have to be neutral on a lot of things, but there's quite a lot most humans have in common.

LESSWRONG
LW

LESSWRONG
LW

21

Moral Alignment: An Idea I'm Embarrassed I Didn't Think of Myself

21

21