Sometimes, I say some variant of “yeah, probably some people will need to do a pivotal act” and people raise the objection: “Should a small subset of humanity really get so much control over the fate of the future?”
(Sometimes, I hear the same objection to the idea of trying to build aligned AGI at all.)
I’d first like to say that, yes, it would be great if society had the ball on this. In an ideal world, there would be some healthy and competent worldwide collaboration steering the transition to AGI.
Since we don’t have that, it falls to whoever happens to find themselves at ground zero to prevent an existential catastrophe.
A second thing I want to say is that design-by-committee… would not exactly go well in practice, judging by how well committee-driven institutions function today.
Third, though, I agree that it’s morally imperative that a small subset of humanity not directly decide how the future goes. So if we are in the situation where a small subset of humanity will be forced at some future date to flip the gameboard — as I believe we are, if we’re to survive the AGI transition — then AGI developers need to think about how to do that without unduly determining the shape of the future.
The goal should be to cause the future to be great on its own terms, without locking in the particular moral opinions of humanity today — and without locking in the moral opinions of any subset of humans, whether that’s a corporation, a government, or a nation.
(If you can't see why a single modern society locking in their current values would be a tragedy of enormous proportions, imagine an ancient civilization such as the Romans locking in their specific morals 2000 years ago. Moral progress is real, and important.)
But the way to cause the future to be great “on its own terms” isn’t to do nothing and let the world get destroyed. It’s to intentionally not leave your fingerprints on the future, while acting to protect it.
You have to stabilize the landscape / make it so that we’re not all about to destroy ourselves with AGI tech; and then you have to somehow pass the question of how to shape the universe back to some healthy process that allows for moral growth and civilizational maturation and so on, without locking in any of humanity’s current screw-ups for all eternity.
Unfortunately, the current frontier for alignment research is “can we figure out how to point AGI at anything?”. By far the most likely outcome is that we screw up alignment and destroy ourselves.
If we do solve alignment and survive this great transition, then I feel pretty good about our prospects for figuring out a good process to hand the future to. Some reasons for that:
- Human science has a good track record for solving difficult-seeming problems; and if there’s no risk of anyone destroying the world with AGI tomorrow, humanity can take its time and do as much science, analysis, and weighing of options as needed before it commits to anything.
- Alignment researchers have already spent a lot of time thinking about how to pass that buck, and make sure that the future goes great and doesn’t have our fingerprints on it, and even this small group of people has made real progress, and the problem doesn't seem that tricky. (Because there are so many good ways to approach this carefully and indirectly.)
- Solving alignment well enough to end the acute risk period without killing everyone implies that you’ve cleared a very high competence bar, as well as a sanity bar that not many clear today. Willingness and ability to defuse moral hazard are correlated with willingness and ability to save the world.
- Most people would do worse on their own merits if they locked in their current morals, and would prefer to leave space for moral growth and civilizational maturation. The property of realizing that you want to (or would on reflection want to) defuse the moral hazard is also correlated with willingness and ability to save the world.
- Furthermore, the fact that — as far as I know — all the serious alignment researchers are actively trying to figure out how to avoid leaving their fingerprints on the future, seems like a good sign to me. You could find a way to be cynical about these observations, but these are not the observations that the cynical hypothesis would predict ab initio.
This is a set of researchers that generally takes egalitarianism, non-nationalism, concern for future minds, non-carbon-chauvinism, and moral humility for granted, as obvious points of background agreement; the debates are held at a higher level than that.
This is a set of researchers that regularly talk about how, if you’re doing your job correctly, then it shouldn’t matter who does the job, because there should be a path-independent attractor-well that isn't about making one person dictator-for-life or tiling a particular flag across the universe forever.
I’m deliberately not talking about slightly-more-contentful plans like coherent extrapolated volition here, because in my experience a decent number of people have a hard time parsing the indirect buck-passing plans as something more interesting than just another competing political opinion about how the future should go. (“It was already blues vs. reds vs. oranges, and now you’re adding a fourth faction which I suppose is some weird technologist green.”)
I’d say: Imagine that some small group of people were given the power (and thus responsibility) to steer the future in some big way. And ask what they should do with it. Ask how they possibly could wield that power in a way that wouldn’t be deeply tragic, and that would realistically work (in the way that “immediately lock in every aspect of the future via a binding humanity-wide popular vote” would not).
I expect that the best attempts to carry out this exercise will involve re-inventing some ideas that Bostrom and Yudkowsky invented decades ago. Regardless, though, I think the future will go better if a lot more conversations occur in which people take a serious stab at answering that question.
The situation humanity finds itself in (on my model) poses an enormous moral hazard.
But I don’t conclude from this “nobody should do anything”, because then the world ends ignominiously. And I don’t conclude from this “so let’s optimize the future to be exactly what Nate personally wants”, because I’m not a supervillain.
The existence of the moral hazard doesn’t have to mean that you throw up your hands, or imagine your way into a world where the hazard doesn’t exist. You can instead try to come up with a plan that directly addresses the moral hazard — try to solve the indirect and abstract problem of “defuse the moral hazard by passing the buck to the right decision process / meta-decision-process”, rather than trying to directly determine what the long-term future ought to look like.
Rather than just giving up in the face of difficulty, researchers have the ability to see the moral hazard with their own eyes and ensure that civilization gets to mature anyway, despite the unfortunate fact that humanity, in its youth, had to steer past a hazard like this at all.
Crippling our progress in its infancy is a completely unforced error. Some of the implementation details may be tricky, but much of the problem can be solved simply by choosing not to rush a solution once the acute existential risk period is over, and by choosing to end the acute existential risk period (and its associated time pressure) before making any lasting decisions about the future.
(Context: I wrote this with significant editing help from Rob Bensinger. It’s an argument I’ve found myself making a lot in recent conversations.)
Note that I endorse work on more realistic efforts to improve coordination and make the world’s response to AGI more sane. “Have all potentially-AGI-relevant work occur under a unified global project” isn’t attainable, but more modest coordination efforts may well succeed.
And I’m not stupid enough to lock in present-day values at the expense of moral progress, or stupid enough to toss coordination out the window in the middle of a catastrophic emergency with human existence at stake, etc.
My personal CEV cares about fairness, human potential, moral progress, and humanity’s ability to choose its own future, rather than having a future imposed on it by a dictator. I'd guess that the difference between "we run CEV on Nate personally" and "we run CEV on humanity writ large" is nothing (e.g., because Nate-CEV decides to run humanity's CEV), and if it's not nothing then it's probably minor.
See also Toby Ord’s The Precipice, and its discussion of “the long reflection”. (Though, to be clear, a short reflection is better than a long reflection, if a short reflection suffices. The point is not to delay for its own sake, and the amount of sidereal time required may be quite short if a lot of the cognitive work is being done by uploaded humans and/or aligned AI systems.)
What the heck is this supposed to mean? Great according to the Inherent Essence Of Goodness that lives inside futures, rather than as part of human evaluations? Because I've got bad news for that plan.
Honestly, I'm disappointed by this post.
You say you've found yourself making this argument a lot recently. That's fair. I think it's totally reasonable that there are some situations where this argument could move people in the right direction - maybe the audience is considering defecting on aligning AI with humanity but would respond to orders from authority. Or maybe they're outsiders who think you are going to defect, and you want to signal to them how you're going to cooperate not just because it's a good idea, but because it's an important moral principle to you (as evolution intended).
But this is not an argument that you should just throw out scattershot. Because it's literally false. There is no single attractor that all human values can be expected to fall into upon reflection. The primary advantage of AI alignment over typical philosophy is that when alignment researchers realize some part of what they were previously calling "alignment" is impossible, they can back up and change how they're cashing out "alignment" so that it's actually possible - philosophers have to keep caring about the impossible thing. This advantage goes away if we don't use it.
Yes, plenty of people liked this post. But I'm holding you to a high standard. Somewhere people should be expected to not keep talking about the impossible thing. Somewhere, there is a version of this post that talks about or directly references:
The rest of the quote explains what this means:
The present is "good on its own terms", rather than "good on Ancient Romans' terms", because the Ancient Romans weren't able to lock in their values. If you think this makes sense (and is a good thing) in the absence of an Inherent Essence Of Goodness, then there's no reason to posit an Inherent Essence Of Goodness when we switch from discussing "moral progress after Ancient Rome" to "moral progress after circa-2022 civilization".
Could you be explicit about what argument you're making here? Is it something like:
Regarding the second argument: I don't think that Catholicism is stable under reflection (because it's false, and a mind needs to avoid thinking various low-complexity true thoughts in order to continue believing Catholicism), so I don't think the Catholic's and the hedonic utilitarian's CEVs will end up disagreeing, even though the optimum for Catholicism and for hedonic utilitarianism disagree.
(I'd bet against hedonic utilitarianism being true as well, but this is obviously a much more open question. And fortunately, CEV-ish buck-passing processes make it less necessary for anyone to take risky bets like that; we can just investigate what's true and base our decisions on what we learn.)
Catholicism is a relatively easy case, and I expect plenty of disagreement about exactly how much moral disagreement looks like the Catholicism/secularism debate. I expect a lot of convergence on questions like "involuntarily enslaving people: good or bad?", on the whole, and less on questions like "which do you want more of: chocolate ice cream, or vanilla ice cream?". But it's the former questions that matter more for CEV; the latter sorts of questions are ones where we can just let individuals choose different lives for themselves.
"Correlations tend to break when you push things to extremes" is a factor that should increase our expectation of how many things people are likely to morally disagree about. Factors pushing in the other direction include 'not all correlations work that way' and evidence that human morality doesn't work that way.
E.g., 'human brains are very similar', 'empirically, people have converged a lot on morality even though we've been pushed toward extremes relative to our EAA', 'we can use negotiation and trade to build value systems that are good compromises between two conflicting value systems', etc.
Also 'the universe is big, and people's "amoral" preferences tend to be about how their own life goes, not about the overall distribution of matter in the universe'; so values conflicts tend to be concentrated in cases where we can just let different present-day stakeholders live different sorts of lives, given the universe's absurd abundance of resources.
Nate said "it shouldn’t matter who does the job, because there should be a path-independent attractor-well that isn't about making one person dictator-for-life or tiling a particular flag across the universe forever", and you said this is "literally false". I don't see what's false about it, so if the above doesn't clarify anything, maybe you can point to the parts of the Arbital article on CEV you disagree with (https://arbital.com/p/cev/)? E.g., I don't see Nate or Eliezer claiming that people will agree about vanilla vs. chocolate.
Footnote 2 says that Nate isn't "stupid enough to toss coordination out the window in the middle of a catastrophic emergency with human existence at stake". If that isn't an argument 'cooperation is useful, therefore we should take others' preferences into account', then what sort of argument do you have in mind?
I don't know what you mean by "egalitarianism", or for that matter what you mean by "why". Are you asking for an ode to egalitarianism? Or an argument for it, in terms of more basic values?
The present is certainly good on my terms (relative to ancient Rome). But the present itself doesn't care. It's not the type of thing that can care. So what are you trying to pack inside that phrase, "its own terms"?
If you mean it to sum up a meta-preference you hold about how moral evolution should proceed, then that's fine. But is that really all? Or are you going to go reason as if there's some objective essence of what the present's "own terms" are - e.g. by trying to apply standards of epistemic uncertainty to the state of this essence?
I'll start by quoting the part of Scott's essay that I was particularly thinking of, to clarify:
What's the claim I'm projecting onto Nate, that I'm saying is false? It's something like: "The goal should be to avoid locking in any particular morals. We can do this by passing control to some neutral procedure that allows values to evolve."
And what I am saying is something like: There is no neutral procedure. There is no way to avoid privileging some morals. This is not a big problem, it's just how it is, and we can be okay with it.
Related and repetitive statements:
So as you can see, I wasn't really thinking about differences between "the CEV" of different people - my focus was more on differences between ways of implementing CEV of the same people. A lot of these ways are going to be more or less as good - like comparing your favorite beef stew vs. a 30-course modernist meal. But not all possible implementations of CEV are good: for example, you could screw up by exposing people to extreme or highly-optimized stimuli in the course of extrapolating them, leading to the AI causing large changes in the human condition that we wouldn't presently endorse.
By egalitarianism I mean building an AI that tries to help all people, and be responsive to the perspectives of all people, not just a select few. And yes, definitely an ode :D
I would say that there's a logical object that a large chunk of human moral discourse is trying to point at — something like "the rules of the logical game Morality", analogous to "the rules of the logical game Chess". Two people can both be discussing the same logical object "the rules of Chess", but have different beliefs about what that logical object's properties are. And just as someone can be mistaken or uncertain about the rules of chess — or about their interaction in a specific case — someone can be uncertain about morality.
Do you disagree with any of that?
In the CEV Arbital page, Eliezer says:
"Even the terms in CEV, like 'know more' or 'extrapolate a human', seem complicated and value-laden."
If the thing you're saying is that CEV is itself a complicated idea, and it seems hard for humanity to implement such a thing without already having a pretty deep understanding of human values, then I agree. This seems like an important practical challenge for pulling off CEV: you need to somehow start the bootstrapping process, even though our current understanding of human values is insufficient for formally specifying the best way to do CEV.
If instead you just mean to say "there's no reason to favor human values over termite values unless you already care about humans", then yeah, that seems even more obvious to me. If you think Nate is trying to argue for human morality from a humanity-indifferent, View-From-Nowhere perspective, then you're definitely misunderstanding Nate's perspective.
If "neutral" here means "non-value-laden", then sure. If "neutral" here means "non-arbitrary, from a human POV", then it seems like an open empirical question how many arbitrary decisions like this are required in order to do CEV.
I'd guess that there are few or no arbitrary decisions involved in using CEV to answer high-stakes moral questions.
This makes me think that you misunderstood Nate's essay entirely. The idea of "don't leave your fingerprints on the future" isn't "try to produce a future that has no basis in human values". The idea is "try to produce a future that doesn't privilege the AGI operator's current values at the expense of other humans' values, the values humans would develop in the future if their moral understanding improved, etc.".
If you deploy AGI and execute a pivotal act, don't leave your personal fingerprints all over the long-term future of humanity, in a way that distinguishes you from other humans.
When I think about the rules of chess, I basically treat them as having some external essence that I have epistemic uncertainty about. What this means mechanistically is:
So the rules of chess are basically just a pattern out in the world that I can go look at. When I say I'm uncertain about the rules of chess, this is epistemic uncertainty that I manage the same as if I'm uncertain about anything else out there in the world.
The "rules of Morality" are not like this.
So there's a lot of my uncertainty about morality that doesn't stem from being unaware about facts. Where does it come from? One source is self-modeling uncertainty - how do I take the empirical facts about me and the world, and use that to construct a model of myself in which I have preferences, so that I can reflect on my own preferences? There are multiple ways to do this.
So if, and I'm really not sure, but if you were thinking of everything as like uncertainty about the rules of chess, then I would expect two main mistakes: expecting there to be some procedure that takes in evidence and spits out the one right answer, and expecting aggregating over models for decision-making to look like linear aggregation.
Well, maybe I misunderstood. But I'm not really accusing y'all of saying "try to produce a future that has no basis in human values." I am accusing this post of saying "there's some neutral procedure for figuring out human values, we should use that rather than a non-neutral procedure."
My read was more "do the best we can to get through the acute risk period in a way that lets humanity have the time and power to do the best it can at defining/creating a future full of value." And that's in response and opposed to positions like "figure out / decide what is best for humanity (or a procedure that can generate the answer to that) and use that to shape the long term future."
I think what you're saying here ought to be uncontroversial. You're saying that should a small group of technical people find themselves in a position of enormous influence, they ought to use that influence in an intelligent and responsible way, which may not look like immediately shirking that responsibility out of a sense that nobody should ever exert influence over the future.
I have the sense that in most societies over most of time, it was accepted that of course various small groups would at certain times find themselves in positions of enormous influence w.r.t. their society, and of course their duty in such a situation would be not to shirk that responsibility but to wisely and unilaterally choose a direction forward for their society, as required by the situation at hand.
I have the sense that what would be ideal is for humanity to proceed with wisdom. The wisest moves we've made as a species to date (ending slavery? ending smallpox? landing on the moon?) didn't particularly look like "worldwide collaborations". Why, actually, do you say that the ideal would be a worldwide collaboration?
Why should a small subset of humanity not directly decide how the future goes? The goal ought to be good decision-making, not large- or small-group decision making, and definitely not non-decision-making.
Of course the future should not be a tightly scripted screenplay of contemporary moral norms, but to decide that is to decide something about how the future goes. It's not wrong to make such decisions, it's just important to get such decisions right.
I think Nate might've been thinking of things like:
I interpret Nate as making a concession to acknowledge the true and good aspects of the 'but isn't there something off about a random corporation or government doing all this?' perspective, not as recommending that we (in real life) try to have the UN build AGI or whatever.
I think your pushback is good here, as a reminder that 'but isn't there something off about a random corporation or government doing all this?' also often has less-reasonable intuitions going into it (example), and gets a weird level of emphasis considering how much more important other factors are, considering the track record of giant international collaborations, etc.
I'm guessing you two basically agree, and the "directly" in "a small subset of humanity not directly decide" is meant to exclude a "tightly scripted screenplay of contemporary moral norms"?
Nate also has the substantive belief that CEV-ish approaches are good, and (if he agrees with the Arbital page) that the base for CEV should be all humans. (The argument for this on Arbital is a combination of "it's in the class of approaches that seem likeliest to work", and "it seems easier to coordinate around, compared to the other approaches in that class". E.g., I'd say that "run CEV on every human whose name starts with a vowel" is likely to produce the ~same outcome as "run CEV on every human", but the latter is a better Schelling point.)
I imagine if Nate thought the best method for "not tightly scripting the future" were less "CEV based on all humans" and more "CEV based on the 1% smartest humans", he'd care more about distinctions like the one you're pointing at. It's indeed the case that we shouldn't toss away most of the future's value just for the sake of performative egalitarianism: we should do the thing that actually makes sense.
Yeah I also have the sense that we mostly agree here.
I have the sense that CEV stands for, very roughly, "what such-and-such a person would do if they became extremely wise", and the hope (which I think is a reasonable hope) is that there is a direction called "wisdom" such that if you move a person far enough in that direction then they become both intelligent and benevolent, and that this eventually doesn't depend super much on where you started.
The tricky part is that we are in this time where we have the option of making some moves that might be quite disruptive, and we don't yet have direct access to the wisdom that we would ideally use to guide our most significant decisions.
And the key question is really: what do you do if you come into a position of really significant influence, at a time when you don't yet have the tools to access the CEV-level wisdom that you might later get? And some people say it's flat-out antisocial to even contemplate taking any disruptive actions, while others say that given the particular configuration of the world right now and the particular problems we face, it actually seems plausible that a person in such a position of influence ought to seriously consider disruptive actions.
I really agree with the latter, and I also contend that it's the more epistemically humble position, because you're not saying that it's for sure that a pivotal act should be performed, but just that it's quite plausible given the specifics of the current world situation. The other side of the argument seems to be saying that no no no it's definitely better not to do anything like that in anything like the current world situation.
The thing I'd say in favor of this position is that I think it better fits the evidence. I think the problem with the opposing view is that it's wrong, not that it's more confident. E.g., if I learned that Nate assigns probability .9 to "a pivotal act is necessary" (for some operationalization of "necessary") while Critch assigns probability .2 to "a pivotal act is necessary", I wouldn't go "ah, Critch is being more reasonable, since his probability is closer to .5".
I agree with the rest of what you said, and I think this is a good way of framing the issue.
I'd add that I think discussion of this topic gets somewhat distorted by the fact that many people naturally track social consensus, and try to say the words they think will have the best influence on this consensus, rather than blurting out their relevant beliefs.
Many people are looking for a signal that stuff like this is OK to say in polite society, while others are staking out a position like "the case for this makes sense intellectually but there's no way it will ever attract enough support, so I'll preemptively oppose it in order to make my other arguments more politically acceptable". (The latter, unfortunately, being a strategy that can serve as a self-fulfilling prophecy.)
Not the most important bit but, how is landing on the moon an example of a wise move?
Well the photos taken from the moon did seem to help a lot of people understand how vast and inhospitable the cosmos are, how small and fragile the Earth is, and how powerful -- for better or worse -- we humans had by that point become.
Everything a person does puts their fingerprints on the future. Our present is made of the fingerprints of the past. Most people leave only small fingerprints, soon smoothed out by the tide of time, yet like sedimentary rock, they accumulate into a great mass. Some have spent their lives putting the biggest fingerprints they can on the future, for good or ill, fingerprints that become written of in history books. Indeed, history is the documenting of those fingerprints.
So there is no question of leaving no fingerprints, no trace of ourselves, no question of directing us towards a future that is good "on its own terms". We can only seek to leave good fingerprints rather than bad ones, and it is we — for some value of "we" — who must make those judgements. Moral progress means moral change in a desirable direction. Who can it be who desires it, but us?
I think there's something confusing that Soares has done by naming this post "don't leave your fingerprints on the future", and I think for clarity to both humans and AIs who read this later, the post should ideally be renamed "don't over-mark the future of other souls without consent" or something along those lines; the key message should be all life @ all life, or something.
To avoid misunderstanding, the kinds of "Pivotal Acts" you are talking about involve using an AGI to seize absolute power on a global scale. The "smallest" pivotal act given by Yudkowsky would still be considered an Act of War by any country affected.
The above is obvious to anyone who reads this forum but it's worth emphasizing the magnitude of what is being discussed.
I understand your argument to be as follows:
A. A small group of humans gaining absolute control over the lightcone is bad (but better than a lot of other options).
B. But because it's better than the other options, there is a moral imperative to commit a "pivotal act" if given the opportunity.
C. It is morally correct that this group then give up their power and safely hand it back to humanity.
I have two strongly held objections:
This argument overlooks the political and moral landscape we live in.
The political landscape:
Most (all?) groups of humans that have seized power and stated paternalistic intentions toward the subjugated people have abused that power in horrific ways. Everyone claims they're trying to help the world, and most humans genuinely believe it. We're fantastic at spinning narratives that we're the good guys. Nobody will trust your claim that you will give up power, nor should they.
If any domestic agency were to understand your intentions and believe you realistically had the capacity to carry them out, you would be (at best) detained. Realistically, if a rogue group of researchers were close to completing an AGI with the goal of using it to take over the world, then nuclear weapons would be warranted.
There is also going to be a strong incentive to seize or steal the technology during development. The hardware required for the "good guys" to perform a pivotal act will be dual use. The "bad guys" can also use it to perform a pivotal act.
The ideal team of fantastic, highly moral scientists probably won't be the ones who make the final decisions about what the future looks like.
The moral landscape:
In worlds where we have access to the global utility function, seizing power to improve the world makes objective sense. In the world we actually live in, if you find yourself wanting to seize power to improve the world, there's a good chance you're closer to a mad scientist (though a well-intentioned one).
I don't know of any human (or group of humans) on the planet I would trust to actually give up absolute power.
This post itself will influence researchers who admire you in a way that is harmful.
This post serves to spread the idea that a unilateral pivotal act is inevitably the only realistic way to save humanity. But by writing this post, you're driving us closer to that world by discouraging people from looking into alternatives.
That wouldn't be a tragedy if I were a Roman.
Yes it would, at least if you mean their ancient understanding of morals.
Can you elaborate? Why would locking in Roman values not be a great success for a Roman who holds those values?
Roman values aren't stable under reflection; the CEV of Rome doesn't have the same values as ancient Rome. It's like a 5-year-old locking in what they want to be when they grow up.
Locking in extrapolated Roman values sounds great to me because I don't expect that to be significantly different than a broader extrapolation. Of course, this is all extremely handwavy and there are convergence issues of superhuman difficulty! :)
The point is that as moral attitudes and thoughts change, societies or individuals that exist long enough will likely come to regret permanently structuring the world according to the morality of a past age. The Roman will either live to regret it, or the society that follows the Roman will come to regret it even if the Roman dies happy, or else the AI is brainwashing everyone all the time to prevent moral progress. The analogy breaks down a bit with the third option, since I'd guess most people today would not accept it as a success, and it's today's(ish) morals that might get locked in, not ancient Rome's.
In thinking about what alignment should be aiming towards, the phrase I've been coming back to a lot recently is "sustainable cooperative harmony".
"Sustainable" because an aligned future shouldn't exploit all resources until collapse. "Cooperative" because an aligned future should be about solving coordination problems, steering the goals of every subsystem into alignment with those of every other subsystem. "Harmony" because an aligned future should be beautiful, like the trillions of cells working in concert to create a unified organism, or like the hundreds of diverse instruments playing diverse notes in a unified symphony.
Harmony in particular is key here. In music, the principles of harmony are not about following pre-defined paths for building a musical piece, as though this chord should always follow that chord or this note should always pair with that note. There is no fixed formula for creating good harmony. Instead, creating harmony is really about avoiding disharmony. Certain notes will sound discordant when played together, so as long as you avoid that, the symphony can theoretically have as much going on at once as you would like. (Melody is similar, except that it is harmony stretched across time in sequence rather than in parallel.)
Similarly, harmony at the level of future civilization should be about avoiding conflicts, not forcing everyone into lockstep with a certain ideal. Maximizing diversity of thought, behavior, values, and goals while still preventing the butting of heads. Basically the libertarian ideal of "your right to swing your fist ends at my face", except more generalized.
There are innumerably many right ways to create Eutopia, but there are vastly more ways to create dystopia. Harmonization/alignment is about finding the thin manifold in civilization-space where disharmony/conflict/death/suffering is minimized. The key insight here, though, is that it is a manifold, not a single-point target. Either way, there is almost no way that human minds could find the way there on their own.
If we could build an AGI that is structurally oriented around creating sustainable cooperative harmony in any system it's involved in (whether repairing a human body, creating prosocial policies for human civilization, or improving the ecological structure of the biosphere), then we would have a shot at creating a future that's capable of evolving into something far beyond what we could design ourselves.
Isn't the problem that groups and individuals differ as to what they view as 'deeply tragic'?
This seems circular.
[only sort of on-topic] I'm concerned about coercion within CEV. It seems like to compute CEV, you're in some sense asking for human agency to develop further, whether within a simulation, within hypothetical reasoning / extrapolation by an AI, or in meatspace. But what a human is seems to require interacting with other humans. And if you have lots of humans interacting, by default many of them will in some contexts be in some sense coercive; e.g. making threats to extort things from each other, and in particular people might push other people to cut off their own agency within CEV (or might have done that beforehand).
Then that isn't the CEV operation.
The CEV operation tries to return a fixed point of idealized value-reflection. Running immortal people forward inside of a simulated world is very much insufficiently idealized value-reflection, for the reasons you suggest, so simply simulating people interacting for a long time isn't running their CEV.
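As a loose illustration only (not a proposal, and nothing about real CEV is this simple), the "fixed point of idealized value-reflection" can be pictured as iterating a reflection operator on a value representation until further reflection no longer changes it. The `reflect` function below is a purely hypothetical stand-in that nudges values toward local coherence:

```python
# Toy sketch: iterate a hypothetical "reflection" operator until the values
# stop changing, i.e. until we reach a fixed point of reflection.
# The reflect() step here is an invented stand-in, not a model of real CEV.

def reflect(values):
    # Hypothetical reflection step: nudge each value toward the average of
    # its neighbors, standing in for values becoming more coherent on
    # reflection. (This preserves the total "weight" of the values.)
    n = len(values)
    return tuple(
        0.5 * v + 0.25 * values[(i - 1) % n] + 0.25 * values[(i + 1) % n]
        for i, v in enumerate(values)
    )

def reflection_fixed_point(values, tol=1e-9, max_steps=10_000):
    """Apply reflect() until consecutive iterates differ by less than tol."""
    for _ in range(max_steps):
        new_values = reflect(values)
        if max(abs(a - b) for a, b in zip(new_values, values)) < tol:
            return new_values
        values = new_values
    return values  # may not have converged within max_steps

print(reflection_fixed_point((1.0, 0.0, 0.0, 0.0)))
```

The point of the toy is structural: the output is defined by where the reflection process stabilizes, not by any particular number of steps of simulated interaction, which is why "simulate people for a long time" is not the same operation.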
How would you run their CEV? I'm saying it's not obvious how to do it in a way that both captures their actual volition, while avoiding coercion. You're saying "idealized reflection", but what does that mean?
Yeah, fair -- I dunno. I do know that an incremental improvement on simulating a bunch of people in an environment philosophizing is to do that while also running an algorithm that prevents coercion, for example.
I imagine that the complete theory of these incremental improvements (for example, also not running a bunch of moral patients for many subjective years while computing the CEV) is the final theory we're after, but I don't have it.
Like, encoding what "coercion" is would be an expression of values. It's more meta, and more universalizable, and stuff, but it's still something that someone might strongly object to, and so it's coercion in some sense. We could try to talk about what possible reflectively stable people / societies would consider as good rules for the initial reflection process, but it seems like there would be multiple fixed points, and probably some people today would have revealed preferences that distinguish those possible fixed points of reflection, still leaving open conflict.
This really doesn't prove anything. That measurement shouldn't be taken by our values, but by the values of the ancient Romans.
Sure, of course the morality of the past gets "better and better" by our lights: it's taking a random walk that ends up closer and closer to our morality. Now, moral progress might still be real.
The place to look is inside our own value functions. If, after 1000 years of careful philosophical debate, humanity decided it was a great idea to eat babies, would you say, "well, if you have done all that thinking, clearly you are wiser than me"? Or would you say, "Argh, no. Clearly something has broken in your philosophical debate"? That is a part of your own meta-value function; the external world can't tell you what to think here (unless you have a meta-meta value function, but then you have to choose that for yourself).
It doesn't help that human values seem to be inarticulate, half-formed intuitions, and the things we call our values are often instrumental goals.
If, had ASI not been created, humans would have gone extinct to bioweapons and pandas would have evolved intelligence, is the extinction of humans and the rise of panda-centric morality just part of moral progress?
If aliens arrive, and offer to share their best philosophy with us, is the alien influence part of moral progress, or an external fact to be removed?
If advertisers basically learn to brainwash people to sell more product, is that part of moral progress?
Suppose that, had you not made the AI, Joe Bloggs would have made an AI 10 years later. Joe Bloggs would actually have succeeded at alignment, and would have imposed his personal whims on all humanity forever. If you are trying not to unduly influence the future, do you make everyone beholden to the whims of Joe, as they would be without your influence?
Wait. The whole point of the CEV is to get the AI to extrapolate what you would want if you were smarter and more informed. That is, the delta from your existing goals to your CEV should be unknowable to you, because if you knew your destination you would already be there. This sounds like your object-level values. And they sound good, as judged by your (and my) object-level values.
I mean, there is a sense in which I agree that locking in, say, your favourite political party, or a particular view on abortion, is stupid. Though I am not sure that a particular view on abortion would actually be bad; it would probably have almost no effect in a society of posthuman digital minds. These are things that are fairly clearly instrumental. If I learned that, after careful philosophical consideration and analysis of lots of developmental-neurology data, people decided abortion was really bad, I would take that seriously. They have probably realized a moral truth I do not know.
I think I have a current idea of what is right, with uncertainty bars. When philosophers come to an unexpected conclusion, it is some evidence that the conclusion is right, and also some evidence that the philosopher has gone mad.