Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Part 3 of AI, Alignment, and Ethics. This will probably make more sense if you start with Part 1.

In parts 1 and 2 I discussed how to sensibly select an ethical system for your society, and the consequences for AIs. What about uploads? They're neither AIs nor biological humans, but somewhere in between: what concerns does this raise and how could they best be handled? An important (and sometimes underappreciated) issue here is that uploads are not aligned (to humans, or even to each other), because humans are not aligned to other humans.

One Upload, One Vote

Let us suppose that, sooner or later, we have some means of doing whole-brain uploading and emulation. Possibly the reading process is destructive, and produces about a liter of greyish-pink goo as a by-product, or possibly not. Or perhaps it's more of a simulation effect, just one based on gigabytes, terabytes, or petabytes of data about an individual, rather than actually simulating every synapse and neurotransmitter flow, such that you end up with a sufficiently accurate emulation of their behavior. Bearing in mind the Bitter Lesson, this might even be based on something like a conventional transformer architecture that doesn't look at all like the inner complexities of the human brain, but which has been trained to simulate it in great detail (possibly pretrained for humans in general, then fine-tuned on a specific individual), perhaps even down to some correlates of detailed neural firing patterns. The technical details don't really matter much.

I am not in fact a carbon-chauvinist. These uploads are person-like, intelligent and agentic systems: they have goals, they can even talk, and, importantly, their goals and desires are exactly what you'd expect for a human being. They are a high-fidelity copy of a human, they will have all the same desires and drives as a human, and they will get upset if they're treated as slaves or second-class citizens, regardless of how much carbon there may or may not be in the computational substrate they're running on. Just like you or I would (or indeed as likely would any member of pretty-much any sapient species evolved via natural selection).

If someone knew in advance that uploading themself meant becoming a slave or a second-class citizen, they presumably wouldn't do it, perhaps short of this being the only way to cheat death. They'd also campaign, while they were still alive, for upload rights. So we need to either literally or effectively forbid uploading, or else we need to give uploads human rights, as close as we reasonably can.

Unlike the situation for AIs, there is a very simple human-fairness-instinct-compatible solution for how to count uploads in an ethical system. They may or may not have a body now, but they did once. So that's what gets counted: the original biological individual, back when they were an individual. Then, if you destructively upload yourself, your upload inherits your vote and your human rights, is counted once in utility summations, and so forth. If your upload then duplicates themself, backs themself up, or whatever, there's still only one vote/one set of human rights/one unit of moral worth to go around between the copies, and they or we need some rules for how to split or assign this. Or, if you non-destructively upload yourself, you still only have one vote/set of human rights/etc., and it's now somehow split or assigned between the biological original of you still running on your biological brain and the uploaded copy of you, or even multiple copies of your upload.

With this additional rule, the necessary conditions for the human fairness instinct to make sense are both still obeyed in the presence of uploads: they care about the same good or bad things as us, and via this rule they can be counted. So that's really the only good moral solution that fits the human sense of fairness.
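As a rough illustration of the counting rule described above, here is a minimal Python sketch. It assumes an equal split between copies, which is only one possible rule (the post deliberately leaves the splitting rule open), and the function names and the example population are hypothetical, invented purely for illustration.

```python
from fractions import Fraction


def split_moral_weight(copies):
    """Divide one original person's single unit of weight among their copies.

    `copies` is a list of hypothetical copy identifiers; a still-living
    biological original counts as one of them. Equal splitting is an
    assumption made for this sketch, not a rule the post commits to.
    """
    if not copies:
        raise ValueError("at least one copy must exist")
    share = Fraction(1, len(copies))
    return {copy_id: share for copy_id in copies}


def total_weight(population):
    """Sum the weights of all copies of all original individuals."""
    return sum(
        sum(split_moral_weight(copies).values())
        for copies in population.values()
    )


# Hypothetical population: keys are original individuals, values are their copies.
population = {
    "alice": ["alice-bio"],
    "bob": ["bob-bio", "bob-upload"],
    "carol": ["carol-upload-1", "carol-upload-2", "carol-upload-3"],
}
```

However the shares are assigned, the property that matters is that the total stays equal to the number of original individuals, so duplicating yourself never buys extra votes or extra weight in a utility summation.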

OK, so we give uploads votes/human rights/moral worth. What could go wrong?

Humans are Not Aligned

I have seen people on Less Wrong assume that humans must automatically be aligned with human values, I can only imagine on the basis that "they have human values, so they must be aligned to them." This is flat out, dangerously false. Please, please never make this mistake. Joseph Stalin was not well-aligned with the utility or values of the citizenry of Russia. History makes it clear that a vanishingly small proportion of people who become autocrats (possibly a biased sample) are anything like well-aligned with the well-being of the citizens of their countries: humans give autocracy a bad name. If humans were aligned, we would never have bothered to develop law enforcement, or locks.

Having human desires yourself and being pretty-much-aligned with other people's human desires are very different things. In humans, the latter behavior is unusual enough to have a special name: it's called 'love'. Admittedly, it is the case that humans generally (apart from the few percent who are sociopaths/psychopaths) have some sort of functioning conscience, a human instinctive sense of things like fairness, and an appreciation that, for example, kittens are cute. So they're generally not quite as badly misaligned as, say, a paperclip maximizer; but they are still a long, long way from being well-aligned. An actual well-aligned agent is one that has no self-interested desires: its only interest in self-preservation is as a convergent goal so it can keep helping others, its only terminal goal is a selfless desire for whatever's best for all of the humans it's aligned with, it already understands what this takes very well, and it wants to get even better at understanding it. Not even the human parental-love instinct can do that level of absolute selflessness, not even in extremely good parents. Morally speaking, an actual well-aligned agent isn't anything like a self-interested human: by human moral standards, it's well past the criteria for sainthood, more like an angel: something just completely out-of-distribution for humans.

So, HUMAN UPLOADS ARE NOT ALIGNED. Not even the best of them, and the worst even less so. They cannot be trusted with the sort of super-human capabilities that we're trusting an ASI with. You might recall that "Power corrupts, and absolute power corrupts absolutely". We know how to keep humans with ordinary human capabilities under control and acting in a civilized way in a society: we have an entire complex legal system for doing this (and even then, in most jurisdictions the clearance rate for crimes short of murder is shockingly low). Giving a psychopath, criminal, or even just anyone corruptible by power even mildly superhuman capabilities is the physical realization of a fictional evil genius, and even someone very moral is in danger of becoming a micromanaging prescriptive autocrat. To keep superhuman enhanced uploads honest we're going to need an even smarter ASI legal system ready and able to catch, convict, and contain them. Law enforcement is just a necessary system if you're trying to build a society of any significant size out of humans.

Digital minds have multiple advantages over biological ones: they can potentially be run a lot faster, they can be backed up, they can be duplicated, sufficiently similar ones can be remerged, they're much easier to modify, improve, train, or otherwise add skills or knowledge to, they can communicate at much higher bandwidths, it's far easier for them to self-improve… These advantages are among the standard, well-known reasons why, if we ever build non-aligned digital minds as smart or smarter than us, they will almost certainly out-compete us.

Human uploads are not well-aligned, and depending on the details of the technology used, they will share some or all of these digital advantages. They certainly can be backed up and run faster, and almost certainly copies with at least small differences (some number of subjective minutes, hours, days, or months, say) can effectively be merged. They can certainly communicate with each other at a good fraction of the bandwidth of the peripheral nervous system, and with a little work likely at a good fraction of the bandwidth of the corpus callosum. At least minor upgrades are likely to be possible; larger ones may or may not be very hard.

If we don't want uploads to massively out-compete biological humans, and create a society where being biological is only for "breeders" who then upload as quickly as they are able, these inherent digital advantages of uploads are going to need to be compensated for in some way. Either these various technical possibilities need to be simply forbidden (possibly other than one or two of the most harmless, such as backing yourself up), or else they need to come with costs/controls attached to them, such as loss of mental privacy, requirements to be modified to be more aligned than you were before, or close supervision by an aligned ASI capable enough to keep you in line. Or at the very least, a long list of deontological rules and guidelines about what you can and can't do with these capabilities, and an ASI legal system willing and able to enforce them.

P(DOOM|uploading)

I have seen writers on Less Wrong suggest that uploading is the only solution to the alignment problem, if we can just manage it soon enough. I think that is not just dangerously wrong, but actually has things backwards. In my opinion, aligning AI (hard as that looks) is the only solution to the uploading problem. While uploading might work out successfully for whoever uploads early and then wins the race to self-improve fastest, and thus ends up running everything, it's very likely going to work out very badly for everyone else, including most of the uploads. Uploads aren't just not aligned with humans, they're not aligned with each other, either. They are competitors, and competing with anything a lot smarter than you is a losing game. Dictatorships are just as possible in a virtual world, and thought police, torture, and mind control are all actually far more effective in a virtual world. Even if you uploaded early and then upgraded yourself to IQ 1000, if someone with the morals of Joseph Stalin has also uploaded themself and then upgraded themself to IQ 10,000, more than anyone else, then (in the absence of aligned AI) you are absolutely screwed. Even if the winner of the upgrading race only has the morals of one of the less-ethical tech billionaires, you're probably in a great deal of trouble. Imagine what many humans would do if they had superhuman powers of persuasion with respect to everyone else. Even if they intended to only use these for good, it's still not going to work out well.

Alignment, creating an AI with the morals of an angel who can actually safely be trusted with absolute power, looks to be a very hard technical task. Figuring out how to do the same thing to a human upload, and then persuading all of them to let you do it to them, is a technically even harder task with a political nightmare bolted on top of it. The only practical solution I can see to having humans and/or uploads with IQs that differ by more than maybe a factor of two or three (let alone by orders of magnitude) all interacting in the same society without the smartest ones massively taking advantage of the rest, is to have a well-aligned ASI legal system at least twice as smart as the smartest of the uploads making sure it doesn't happen and that everyone plays nice. Personally, I see human uploading as a lot more dangerous than artificial intelligence: my P(DOOM|uploading), the chance of uploading going badly without us first solving the alignment problem and creating ASI, is significantly higher than my P(DOOM) from AI without uploading. And, as noted above, the doom from uploading going badly even affects the great majority of the uploads, not just the biological humans. So it's an x-risk.

I suspect many Transhumanists must just not have thought very hard about the resulting social pressures or possible failure modes from uploading followed by significant self-enhancement, or else they're a lot more trusting or idealistic than I am. How on Earth do you build a fair, functional, and stable society out of rapidly self-enhancing humans that isn't simply a winner-take-all race dynamic? I'd really like to know…

Virtual Palaces

One claim that I can imagine someone making is that virtual dictators won't be a big problem, because all any of them will want is virtual palaces full of virtual food and virtual servants, all of which can easily be provided, unlike their real-world equivalents (or at least, can easily be provided if you have AI to design these and act as the servants).

Sadly, I don't believe this for a moment. While they certainly will want these things, people also want things like power, security, financial reserves, the ability to make people do things the way the dictator thinks they should, the admiration of their near-peers, the adulation of the masses, and of course food, palaces, and servants for all their relatives: including real-world ones for their still-biological relatives. Many of these things involve imposing their will not just on their own personal virtual world, but on everyone else's worlds, virtual and real. Dictators do not generally retire once they have a really nice dacha or even a multi-billion-dollar palace. In fact, they basically never retire (other than to The Hague), since once you've made that many enemies, it is no longer a safe option to relinquish power. [I have sometimes wondered whether the world would be a better place if the US or the UN established an upmarket analogue of the Witness Protection Program for dictators who were ready to retire…]


I think you make some good points here.... except, there is one path I think you didn't explore enough.

What if humanity is really stuck on AI alignment, and uploading has become a possibility, and making a rogue AGI agent is a possibility? If these things are being held back by fallible human enforcement, it might then seem that humanity is in a very precarious predicament.

A possible way forward using an Uploaded human then, could be to go the path of editing and monitoring and controlling it. Neuroscience knows a lot about how the brain works. Given that starting point, and the ability to do experiments in a lab where you have full read/write access to a human brain emulation, I expect you could get something far more aligned than you could with a relatively unknown artificial neural net.

Is that a weird and immoral idea? Yes. It's pretty dystopian to be enslaving and experimenting on a human(ish) mind. If it meant the survival of humanity because we were in very dire straits... I'd bite that bullet.

Dagon, 5mo

I suspect your modeling of “the fairness instinct” is insufficient. Historically, there were many periods of time where slaves or mostly-powerless individuals were the significant majority. Even today, there are very limited questions where one-person-one-vote applies. Even in the few cases where that mechanism holds, ZERO allow any human (not even any embodied human) to vote. There are always pretty restrictive criteria of membership and accident of birth that limit the eligible vote population.

As I discuss in Part 1 A Sense of Fairness, societies have been becoming distinctly more egalitarian over the last few centuries. My suggestion is that having a large oppressed class has been becoming less viable as technology improved. As social and technological complexity increases, sabotage becomes more effective. This effect is going to become more and more the case as weapons of mass destruction become available to terrorists, revolutionaries, and anyone else sufficiently upset about their lot in life. A high-tech society needs to be egalitarian, because it can't afford to have even small numbers of highly disaffected people with the technical skill to cause massive damage.

Dagon, 5mo

Sorry, kind of bounced off the part 1 - didn't agree, but couldn't find the handle to frame my disagreement or work toward a crux.  Which makes it somewhat unfair (but still unfortunately the case) to disagree now.

I like the focus on power (to sabotage or defect) as a reason to give wider voice to the populace.  I wonder if this applies to uploads.  It seems likely that the troublemakers can just be powered down, or at least copied less often.

So which aspect(s) of part 1 didn't you agree with? (Maybe we should have a discussion there?)