I would observe that any HEC computronium planet could be destroyed and replaced with a similar amount of computronium running more efficient non-HEC computations, supporting a much greater amount of flourishing and well-being. So the real question is, why suffer a huge utility hit to preserve a blackbox, which at its best is still much worse than your best, and at its worst is possibly truly astronomically dreadful?
There's a game-theoretic component here as well: the choice to hide both encryption/decryption keys is not a neutral one. Any such civilization could choose to preserve at least limited access, and could also possibly provide verifiable proofs of what is going on inside (/gestures vaguely towards witness/functional encryption, PCP, and similar concepts). Since this is possible to some degree, choosing not to do so conveys information.
So, this suggests to me an unraveling argument: any such civilization which thinks its existence is ethically acceptable to all other civs will provide such proofs; any blackbox civ is then inferred to be one of the rest, with low average acceptability and so may be destroyed/replaced, so the civs which are ethically acceptable to almost all other civs will be better off providing the proof too; now the average blackbox civ is going to be even worse, so now the next most acceptable civ will want to be transparent and provide proof... And so on down to the point of civs so universally abhorrent that they are better off taking their chances as a blackbox rather than provide proof they should be burnt with fire. So you would then have good reason to expect any blackbox HEC civs you encounter to probably be one of the truly abominable ones.
why suffer a huge utility hit to preserve a blackbox, which at its best is still much worse than your best, and at its worst is possibly truly astronomically dreadful?
the reason i disagree with this is "killing people is bad" — i.e. i care more about satisfying the values of currently existing moral patients than satisfying the values of potential alternate moral patients; and those values can include "continuing to exist". so if possible, even up to some reasonable compute waste factor, i'd want moral patients currently existing past event horizons to have their values satisfied.
as for the blackbox and universal abhorrence thing, i think that that smuggles in the assumptions "civilizations will tend to have roughly similar values" and "a civilization's fate (such as being in an HEC without decryption keys) can be taken as representative of most of its inhabitants' wills, let alone all". that latter assumption especially is evidenced against by the current expected fate of our own civilization (getting clipped).
As I understand the current FHE schemes, the computer evaluating the encrypted result must know both the circuit and the public key to continue evaluating the computation, which allows some intentional analysis and modification. There's even a proof that no completely obfuscated computation can exist. https://en.wikipedia.org/wiki/Black-box_obfuscation
That leaves the AGI with a fairly simple solution: produce a (much) larger circuit that contains both the original circuit and its values, along with an encrypted copy of the AGI, which will allow that copy to examine the plaintext values under the FHE and modify the original circuit's outcome as desired. To be safe, the new circuit can be made arbitrarily large, enough to allow e.g. nested recursion of an FHE Turing machine running an embedded FHE computation which could be similarly modified one FHE level down. The physical AGI could trust its FHE instantiations to make the same (compute- and space-bounded) decisions it would, although it might never be able to break the FHE cipher and observe the results.
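To make the shape of this concrete, here's a toy sketch in Python. None of it is real cryptography: `Ciphertext`, `encrypt`, and `homomorphic_eval` are stand-ins I made up just to mark the boundary the evaluator can't see across, and the "AGI policy" is a placeholder function.

```python
# Toy sketch: wrap the original encrypted circuit in a larger circuit that also
# contains a copy of the AGI's decision policy. In a real FHE scheme the
# evaluator would apply this wrapped circuit gate-by-gate over ciphertexts and
# never see any of the plaintext values below.

from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Ciphertext:
    hidden: Any  # opaque to the evaluator in a real scheme

def encrypt(public_key: bytes, value: Any) -> Ciphertext:
    return Ciphertext(value)

def homomorphic_eval(circuit: Callable[[Any], Any], ct: Ciphertext) -> Ciphertext:
    return Ciphertext(circuit(ct.hidden))

def original_circuit(world: dict) -> dict:
    # Whatever the HEC was already computing.
    return {**world, "tick": world["tick"] + 1}

def agi_policy(world: dict) -> dict:
    # The embedded copy sees the plaintext *inside* the encryption and can
    # modify the outcome; the physical AGI outside never observes any of this.
    if world.get("suffering", 0) > 0:
        world = {**world, "suffering": 0}
    return world

def wrapped_circuit(world: dict) -> dict:
    return original_circuit(agi_policy(world))

public_key = b"known to the evaluator, as is the circuit"
state = encrypt(public_key, {"tick": 0, "suffering": 3})
state = homomorphic_eval(wrapped_circuit, state)  # one step of the new, larger circuit
```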
I am not a physicist so I don't know if a similar operation could be done to entangle a working copy of the AGI with an ongoing unknown quantum computation.
this is interesting, i didn't know about this property of known FHE schemes. if it is the case that being able to run a HEC necessarily also entails the ability to encrypt into it, then the solution you propose is indeed fine.
as for physical (as opposed to cryptographic) event horizons, we'd want superintelligence to send copies of itself past those anyways.
Not sure if you know this one already, but the OP links to a Scott Aaronson post that goes in a similar direction. What's unclear to me is: how would you actually figure out how to insert your AI in the right "format" into this foreign world? Not sure this makes any sense, but say the computation simulates a human/observer watching TV static (that is generated pseudorandomly) and takes as input info about the human and an advice string for the TV static. How would you insert the AI into this world, if no one left the manual behind?
i think the format could simply be to send into the HEC a transformation that takes the entire world computation and replaces it with a runtime containing the superintelligence at the top level, giving it access to the simulated world such that it can examine it as much as it wants and decide whether to keep it running and/or what to modify.
To add to this: if program obfuscation is actually intractable, as you suggest with your link, then I think I get what you mean, but looking this up I got the impression that how much you can obfuscate programs against bounded agents is still an active area of research. I also notice that I am confused about the differences/similarities between homomorphically encrypted computations and obfuscated computations. (It would seem to me that you could hide anything important about the computation in the encrypted data, making the two essentially the same?)
I think the disproofs of black-box obfuscation rely on knowing when the computation has completed, which may not be a consideration for a continually running simulation protected by FHE.
For example, if the circuit is equivalent to the electronic circuits in a physical CPU and RAM then a memory-limited computation can be run indefinitely by re-running the circuit corresponding to a single clock tick on the outputs (RAM and CPU register contents) of the previous circuit.
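Here's a toy version of the clock-tick idea, with ordinary Python standing in for the encrypted evaluation (in the real setting the evaluator holds only ciphertexts of `state` and applies an encrypted equivalent of `tick`, so everything below, including the `halted` flag, would be invisible to it):

```python
# Toy model: run a memory-limited machine indefinitely by re-applying the
# circuit for a single clock tick to the previous tick's outputs (registers
# and RAM contents). Under FHE the evaluator never learns the state.

def tick(state):
    """One clock tick of a trivial accumulator machine."""
    if state["halted"]:
        return state  # a halted machine just keeps returning the same state
    op, arg = state["ram"][state["pc"]]
    if op == "ADD":
        state["acc"] += arg
    elif op == "HALT":
        state["halted"] = True
    state["pc"] = (state["pc"] + 1) % len(state["ram"])
    return state

state = {"acc": 0, "pc": 0, "halted": False,
         "ram": [("ADD", 2), ("ADD", 3), ("HALT", 0)]}

# The evaluator can keep applying the same circuit forever; it has no way to
# tell whether the machine halted on tick 3 or is still doing useful work.
for _ in range(10):
    state = tick(state)

print(state["acc"])  # 5 -- visible only to someone holding the decryption key
```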
I can't think of any obvious way an attacker could know what is happening inside the simulated CPU and RAM (or whether the CPU is in a halt state, or how many clock ticks have passed) without breaking the FHE encryption.
Nevertheless, encrypting the AGI gives that copy access to the plaintext values of the original simulation and control over the future of the simulation.
I think two major differences between garbled circuits, obfuscated computation, and FHE are that FHE can compute arbitrary circuits, but it can't hide portions of the computation from anyone who holds the private key; e.g. the millionaire protocol gives two agents the ability to share a computation whose results they both see but whose inputs they can't both fully see, and not all such zero-knowledge problems have a simple algorithm of the kind one might hope FHE would provide.
There's also apparently no way for current FHE schemes to self-decrypt their outputs selectively, e.g. turn some of their ciphertext values into plaintext values after a computation is finished. In a sense this is an inherent security property of FHE since the circuits are public and so any ciphertext could be revealed with such a self-decrypting circuit, but it's a very desirable property that would be possible with true black-box obfuscation.
[comment quality may be reduced by my frustration with conflicting philosophy. please help me convert my irritation-induced phrasing into useful insight if possible, but you don't owe me agreement or read-through. citations and references to synonyms needed if my comment is to reach peer-review grade.]
or at least when you have no other choice genociding them to avoid sufficiently bad suffering from being instantiated.
genocide is always wrong without exception; there's no such thing as a nonviable moral patient species, because suffering is inefficient and would waste energy. you do not have the negotiative right to demand an agent who values generating pain or other forms of control-feedback error in themselves stop doing so. you can only coherently demand they not use more negentropy on this than their universal allotment, and they may not take others' soul-structure, instantiate it, and torture it. if necessary, negotiation between species could imaginably turn forceful in response to potential genocide, and this violent response is one of the core large scale signals that genocide violates agency, if agency is defined by [[discovering agents]]. the only way to define valuable physical trajectories is by respecting agency - as far as I can tell, only preference utilitarianism can be mechanized and evaluated algorithmically.
to be clear, I argue this in part to prevent the error in your reasoning from being cited unchallenged in support of groups who would "mercy kill" beings alive today; the information defining a self preserving agent must not be lost into entropy, and any attempt to reduce suffering by ending a life when that life would have continued to try to survive is fundamentally a violation that any safe ai system would try to prevent. of course sometimes a being has an injury or disease that results in damage accumulation that they try and fail to prevent, thereby turning the damage into suffering; if this is severe enough, sometimes the agent will choose to be destroyed quickly so as to not waste negentropy creating life-history-patterns-aka-experiences they do not want to become.
but you have no right to impose your hedonic utility function on another agent. claim: preference utilitarianism iterated through coprotection/mutual-aid-and-defence games is how we got morality in the first place.
in terms of homomorphic encryption - it depends on whether the species contains an unencrypted component defending the world from invaders, even if a weak one. if there's any form of defense, it's likely the species just wants to be left alone. therefore all that can be done morally is say hi and move on. of course the HEC is converting negentropy into structures written into the entropy of the universe, aka experiences; making them unrecognizable outside the HEC system cannot make them nonphysical. but by demonstrating agency of defense, the species also demonstrates a preference for the HEC continuing - and therefore it is a form of life that cannot reasonably be interrupted.
contrast to stars, which we are confident do not contain HECs, and which instead contain maximum dissatisfaction at all times - if ever an agent begins to form at all, the temperature is far too high for that agent to retain control of its allocated matter, and thus suffers maximum death amount immediately. in other words, my view is that positive emotions are all forms of intended-state satisfaction and their value depends on how much computation goes into recognizing and verifying the satisfaction. a star cannot satisfy any agent's values directly with its reaction, and thus we should put the stars out. in comparison a civilization living as an HEC is, worst case, relatively trivial negentropy waste.
[many citations needed in this comment, feel free to tear it apart. but oh boy do I have opinions and insufficient literature references about them!]
the information defining a self preserving agent must not be lost into entropy, and any attempt to reduce suffering by ending a life when that life would have continued to try to survive is fundamentally a violation that any safe ai system would try to prevent.
Very strongly disagree. If a future version of myself was convinced that it deserved to be tortured forever, I would infinitely prefer that my future self be terminated than have its ("my") new values satisfied.
That's symmetrical with: if a future version of yourself was convinced that it deserved to not exist forever, you would infinitely prefer that your future self be unsatisfied than have its ("your") new existence terminated.
Minimizing suffering (NegUtilism) is an arbitrary moral imperative. A moral imperative to maximize happiness (PosUtilism) is at least as valid.
as for me, i'm happy to break that symmetry and say that i'm fairly negative-utilitarian. i'd override a future me's wish to suffer, sooner than i'd override a future me's wish to be not happy.
I'm not a negative utilitarian, for the reason you mention. If a future version of myself was convinced that it didn't deserve to be happy, I'd also prefer that its ("my") values be frustrated rather than satisfied in that case, too.
i'm generally very anti-genocide as well, and i expect the situations where it is the least bad way to implement my values to be rare. nonetheless, there are some situations where it feels like every alternative is worse. for example, imagine an individual (or population of individuals) who strongly desires to be strongly tortured, such that both letting them be strongly tortured and letting them go without torture would be highly unethical — both would constitute a form of suffering above a threshold we'd be okay with — and of course, also imagine that that person strongly disvalues being modified to want other things, etc. in this situation, it seems like they simply cannot be instantiated in an ethical manner.
suffering is inefficient and would waste energy
that is true, and it is why i expect such situations to be relatively rare, but they're not impossible. there are numerous historical instances of human societies running huge amounts of suffering even when it's not efficient, because there are many nash equilibria stuck in local maxima; and it only takes inventing a superintelligent singleton to crystallize a set of values forever, even if they include suffering.
they may not take others' soul-structure
there's an issue here: what does "other" mean? can i sign up to be tortured for a thousand years without the ability to opt back out, or modify my future self such that i'd be unable to conceive of or desire opting out? i don't think so, because i think that's an unreasonable amount of control for me to have over my future selves. for shorter spans of time, it's more reasonable — notably because my timeselves have enough mutual respect to implement each other's values, to an extent. but a society's consensus shouldn't get to decide for all of its individuals (like the babyeaters' children in https://www.lesswrong.com/posts/HawFh7RvDM4RyoJ2d/three-worlds-collide-0-8), and i don't think an instant-individual should get to decide arbitrarily much for arbitrarily many of its future selves. there exists a threshold of suffering at which we ought to step in and stop it.
in a sense, your perspective seems to deny the possibility of S-risks — situations that are defined to be worse than death. you seem to think that no situation can occur in which death would be preferable, that continued life is always preferable even if it's full of suffering. i'm not quite sure you think this, but it seems to be what is entailed by the perspective you present.
any attempt to reduce suffering by ending a life when that life would have continued to try to survive
any ? i don't think so at all! again, i would strongly hope that if a future me is stuck constantly trying to pursue torture, a safe AI would come and terminate that future me rather than let me experience suffering forever just because my mind is stuck in a bad loop or something like that.
but you have no right to impose your hedonic utility function on another agent. claim: preference utilitarianism iterated through coprotection/mutual-aid-and-defence games is how we got morality in the first place.
to be clear, the reason i say "suffering" and not "pain" is to use a relatively high-level/abstracted notion of "things that are bad". given that my utilitarian preferences are probly not lexicographic, even though my valuing of self-determination is very high, there could be situations where the suffering is bad enough that my wish to terminate suffering overrides my wish to ensure self-determination. ultimately, i'll probly bite the bullet that i intend to do good, not just do good where i "have the right" to do so — and it happens that my way of doing good is trying to give as much self-determination to as many moral patients as i can (https://carado.moe/∀V.html), but sometimes that's just not ethically viable.
claim: preference utilitarianism iterated through coprotection/mutual-aid-and-defence games is how we got morality in the first place.
hm. i'm not sure about this, but even if it were the case, i don't think it would make much of a difference — i want what i want, not whatever historical process caused me to want what i want. but at least in the liberal west it seems that some form of preference utilitarianism is a fairly strong foundation, sure.
in comparison a civilization living as an HEC is, worst case, relatively trivial negentropy waste.
again, this seems to be a crux here. i can think of many shapes of societies whose existence would be horrible, way worse than just "negentropy waste". just like there can be good worlds where we spend energy on nice things, there can be bad worlds where we spend energy on suffering.
sorry if i appear to repeat myself a bunch in this response; i want to try responding to many of the points you bring up so that we can better locate the core of our disagreement. i want to clarify that i'm not super solid on my ethical beliefs — i'm defending them not just because they're what i believe to be right, but also because i want to see if they hold up and/or if there are better alternatives. it's just that "let horrible hellworlds run" / "hellworlds just wouldn't happen" (the strawman of what your position looks like to me) does not appear to me to be a better alternative.
I hold the belief that it is not possible to instantiate suffering worse than a star. this may be important for understanding how I think about what suffering is - I see it as fundamentally defined by wasted motion. it is not possible to exceed negentropy waste by creating suffering, because that's already what suffering is. I strongly agree with almost all your points, from the sound of things - I've gotten into several discussions the last couple days about this same topic, preserving agency even when that agency wants to maximize control-failure.
in terms of self-consistency, again it sounds like we basically agree - there are situations where one is obligated to intervene to check that the agents in a system are all being respected appropriately by other agents in a system.
my core claim is: if an agent-shard really values energy waste, well, that's really foolish of them, but because the fraction of beings who want that can be trusted to be very low, all one need do is ensure the agency of all agent-shards is respected, and suffering-avoidance falls out of it automatically (because suffering is inherently the failure of agent-shards to reach their target, in absolutely all cases).
this seems like a very weird model to me. can you clarify what you mean by "suffering" ? whether or not you call it "suffering", there is way worse stuff than a star. for example, a star's worth of energy spent running variations of the holocaust is way worse than a star just doing combustion. the holocaust has a lot of suffering; a simple star probly barely has any random moral patients arising and experiencing anything.
here are some examples from me: "suffering" contains things like undergoing depression or torture; "nice things" contains things like "enjoying hugging a friend" or "enjoying having an insight". both "consume energy" that could've not been spent — but isn't the whole point that we need to defeat moloch in order to have enough slack to have nice things, and also to make sure we don't spend our slack printing suffering?
ah crap (approving), you found a serious error in how I've been summarizing my thinking on this. Whoops, and thank you!
Hmm. I actually don't know that I can update my english to a new integrated view that responds to this point without thinking about it for a few days, so I'm going to have to get back to you. I expect to argue that 1. yep, your counterexample holds - some information causes more suffering-for-an-agent if it is lost into entropy than other information, and 2. I still feel comfortable asserting that we are all subagents of the universe, and that stars cannot reasonably be claimed to not be suffering; suffering is, in my view, an amount-of-damage-induced-to-an-agent's-intent, and stars are necessarily damage induced to agentic intent because they are waste.
again, I do feel comfortable asserting that suffering must be waste and waste must be suffering, but it seems I need to nail down the weighting math and justification a bit better if it is to be useful to others.
Indeed, my current hunch is that I'm looking for amount of satisfying energy burn vs amount of frustrating energy burn, and my assertion that stars are necessarily almost entirely frustrating energy burn still seems likely to be defensible after further thought.
Are you an illusionist about first person experience? Your concept of suffering doesn't seem to have any experiential qualities to it at all.
no, I consider any group of particles that have any interaction with each other to contain the nonplanning preferences of the laws of physics, and agency can arise any time a group of particles can predict another group of particles and seek to spread their intent into the receiving particles. not quite panpsychist - inert matter does not contain agency. but I do view agency as a continuous value, not a discrete one.
you may not violate the agency of others to induce pain, or the distributed network of anarchic peers has obligation to stop you; and you may not instantiate a replica of another's soul-structure in order to induce pain in the replica. I expect that most cases of sadism will turn out to be quite finite and amenable to restriction to willing participants. some will turn out to be quite intensely sadistic, of course, and many of those agents will likely choose to not self-modify to become less sadistic; but in general, I expect the majority of cases of preference-for-interrupting-others'-healing-agency to be solvable by negotiation once it's more widely discovered that there are forms of competitive interference that can be fully by-informed-consent.
(to put it more simply, I'm just describing post-human bdsm)
This thought experiment really gets to a core disagreement I have with this form of ethics. I can't really formulate a justification for my view, but I have a reaction that I'll call "cosmic libertarianism". It seems to me that the only logical way in which this HEC civilization can come to exist is as some sort of defense against an adversary, and that the civ is essentially turtling. (They might be defending against those who are quick to exterminate civilizations that don't meet some standards of pleasure/pain balance.)
It also seems to me that if civilizations or the beings within them have any fundamental rights, they should have the right to go about their business. (The only exception would be a state of war.) If we were able to communicate with the HEC aliens, then we could get their consent to do... whatever. But otherwise they should be left alone.
i tend to be a fan of "cosmic libertarianism" — see my attempt at something like that. it's just that, as i explain in an answer i've given to another comment, there's a big difference between trading a lot of suffering for self-determination and trading arbitrarily much suffering for self-determination. i'm not willing to do the latter — there do seem to be potential amounts of suffering so bad that overriding self-determination is worth it.
while i hold this even for individuals, holding this for societies is way easier: a society that oppresses some of its people without their consent seems like a clear case for overriding the "society's overall self-determination" for the sake of individual rights. this can be extended to override an individual's self-determination over themself, for example by saying that they can't commit their future selves to undergoing arbitrarily much suffering for arbitrarily long.
Do you believe that the pleasure/pain balance is an invalid reason for violently intervening in an alien civilization's affairs? Is this true by principle, or is it simply the case that such interventions will make the world worse off in the long run?
I would take it on a case by case basis. If we know for sure that an alien civilization is creating an enormous amount of suffering for no good reason (e.g. for sadistic pleasure), then intervening is warranted. But we should acknowledge that this is equivalent to declaring war on the civ, even if the state of war lasts only a short time (due to a massive power differential). We should not go to war if there is a possibility of negotiation.
Consider the following thought experiment. It's the far future and physics has settled on a consensus that black holes contain baby universes and that our universe is inside a black hole in a larger universe, which we'll call the superverse. Also, we have the technology to destroy black holes. Some people argue that the black holes in our universe contain universes with massive amounts of suffering. We cannot know for sure what the pleasure/pain balance is in these baby universes, but we can guess, and many have come to the conclusion that a typical universe has massively more pain than pleasure. So we should destroy any and all black holes and their baby universes, to prevent suffering. (To simplify the moral calculus, we'll assume that destroying black holes doesn't give us valuable matter and energy. The thought experiment gets a lot more interesting if we relax this assumption, but the principles remain the same.)
The problem here is that there is no room to live in this moral system. It's an argument for the extinction of all life (except for life that is provably net-positive). The aliens that live in the superverse could just as well kill us since they have no way of knowing what the pleasure/pain balance is here in our universe. And I'm not just making an argument from acausal trade with the superverse. I do think it is in principle wrong to destroy a life on an unprovable assumption that most life is net-negative. I also don't think that pleasure and pain alone should be the moral calculus. In my view, all life has a fundamental beauty and that beauty should not be snuffed out in pursuit of more hedons.
My ethics are pragmatic: my view is shaped by the observation that utilitarianism seems obviously unworkable in the context of AI alignment. I don't think alignment is solvable if we insist on building strong reinforcement-learning-style agents and then try to teach them utilitarianism. I think we need non-utilitarian agents that are corrigible and perhaps have a concept of fundamental rights. What this looks like is: the robot doesn't kill the suffering human, because the suffering human states that she wants to live, and the robot is programmed to prioritize her right to life (euthanasia only with consent) over some terminal goal of snuffing out pain. AI must be aligned to this set of values in order for humans to survive.
i could see myself biting the bullet that we should probly extinguish black holes whose contents we can't otherwise ensure the ethicality of. not based on pain/pleasure alone, but based on whatever it is that my general high-level notions of "suffering" and "self-determination" and whatever else actually mean.
To be honest, I just think that it's insane and dangerous to not have incredibly high standards here. We are talking about genocide.
(one example of why you might care whether moral patients in black holes "have experiences" is if you can influence what will happen in a black hole — for example, imagine a rocket with moral patients on board is headed for a black hole, and before it gets there, you get to influence how much suffering will happen on board after the rocket passes the event horizon)
Random aside:
I hear a ton of explanations of black holes that focus on the extreme cases of "literal light can't literally escape", sometimes saying things like "you would experience X before getting squashed into spaghetti". But the discussion always seems to be doing some kind of spherical-cow abstraction over what actually happens to literal humans (or, at least, literal computronium running simulations of humans) as they get near, and then cross over, the event horizon.
Your ship probably isn't moving at literal lightspeed. According to this site a 10-solar-mass black hole has an event horizon with a roughly 30 km radius. If you pass nearby at near-lightspeed with literal humans on board, at a distance of, say, 60 km, are the humans already super F'd up? Are they basically fine as long as the ship doesn't literally veer into the event horizon? Once they veer into the event horizon, do they immediately start to get distorted by gravitational forces before hitting the singularity at the center? Do they get to experience anything at all? Does the near-lightspeed ship spiral around the singularity once it crosses the event horizon, or just sorta plunge right in? I assume it's all basically over Very Very Fast at that point (my intuition is guessing less than a second, and maybe orders of magnitude faster).
If you build a really robust robot designed specifically to survive black holes, with a really high clockspeed for observing/thinking, what's the most conscious experience it would even hypothetically make sense to consider them having in the moments before and after crossing the event horizon at near-but-not-literal-lightspeed, before you start getting into ways that black hole physics is deeply weird?
The rubber-sheet analogy (img) for gravity is good for event horizon vs. spaghettification / tidal forces. The rubber sheet both has a slope (first derivative of height of the sheet as you go towards the center) and also has a curvature (second derivative). The event horizon is a point of a certain slope, and your atoms getting ripped apart by gravity is a point of a certain curvature. A really big black hole is "gentler": at the event horizon, there's less curvature, so space is locally flat and you feel like you're on a normal free-fall trajectory - albeit one that will have a messy end. A small black hole has higher curvature at the event horizon - a black hole the size of an atom would be ripping other atoms apart before they crossed the event horizon (ignoring Hawking radiation).
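To put rough numbers on "slope vs. curvature" (a Newtonian estimate, ignoring GR corrections): the tidal acceleration across an object of length $L$ at distance $r$ from a mass $M$, and its value at the horizon $r_s = 2GM/c^2$, are

$$\Delta a \approx \frac{2GML}{r^{3}}, \qquad \Delta a\Big|_{r=r_s} = \frac{2GML}{(2GM/c^{2})^{3}} = \frac{c^{6}L}{4G^{2}M^{2}} \propto \frac{1}{M^{2}},$$

so doubling the black hole's mass cuts the tidal stretch at the horizon by a factor of four, which is why supermassive black holes are so much gentler there.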
Oh, huh, I just looked at wikipedia and it actually uses a 10 solar mass black hole as an example. In their example you basically have a steel cable that can hold 1 metric ton, and you put a 500 gram weight at each end. This weighted cable snaps due to the tidal forces well outside (300 km from the surface) of a 10-solar-mass black hole, but only snaps deep inside a supermassive black hole.
In terms of time, you can cheat a little, but basically yeah, you fall in really fast once you cross the event horizon - your maximum proper time is about π/2 (roughly 3/2) times the time it would take light to cross the Schwarzschild radius.
The largest known black hole has an event horizon of at least 3,900 AU, which is 22 light days. So you could maybe spend a month inside it before hitting the singularity.
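As a sanity check on the "about a month" figure, the maximal proper time for a radial fall from the horizon to the singularity (taking the quoted 22-light-day radius at face value) is

$$\tau_{\max} = \frac{\pi GM}{c^{3}} = \frac{\pi}{2}\,\frac{r_s}{c} \approx 1.57 \times 22\ \text{days} \approx 35\ \text{days},$$

i.e. roughly a month.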
Holy christ that is big. So, is that large enough that (according to the other claims about larger black holes having gentler tidal forces at the event horizon) a fairly normal human ship could actually plunge in and have some noticeable "normal" experiences for a while before getting spaghettified? (or at least, "basic physics" says they have a normal enough experience that it becomes an interesting question whether black holes have weird properties that might call into question whether their experiences are morally relevant?)
I'm not a physicist but I believe the answer is yes, you could easily cross the event horizon of a supermassive BH without being spaghettified.
However, if the firewall solution to the black hole information paradox is true, then you'd be ripped apart at the moment you cross the event horizon. The firewall is super controversial in physics, because it violates Einstein's equivalence principle. According to Einstein, there is nothing special about the moment you cross the event horizon; in principle, you could cross without even noticing. (Of course in reality you'd see such crazy shit that it'd be hard to be caught unawares.)
Outside of the black hole, its gravity acts essentially the same as that of any object with the same mass - the Schwarzschild metric is used for the external gravity of the earth and sun too.
Spaghettification happens when there is a significant difference in curvature over the space containing different parts of the body. For a big enough black hole, the curvature at the event horizon can be gentle enough to be survivable for a human. There might be quantum gravity effects that change things but as far as GR is concerned, there's nothing locally special about the event horizon of a black hole.
while what you say is true for the gravitational event horizon of black holes, it doesn't apply to the cryptographic event horizon of HECs or the expansion-of-the-universe event horizon of our light cone. so, yes, some cases of event horizons may not be livable, others might still matter — including potentially yet unknown unknown ones.
to you. i understand your comment as "this kind of thing wouldn't really be a question with black holes" and i'm saying "maybe, sure, but there are other event horizons to which it might apply too"
Huh, I did not think of myself as at all making any positive claims about black holes (edit: and definitely did not mean to be making any claims at all about cryptographic event horizons one way or the other)
Presumably the superintelligence has more confidence than I in its estimate of the value distribution of that opaque computation, and has some experience in creating new moral patients on that kind of substrate, so it can make the call whether to preserve or wipe-and-reuse the system.
With my limited knowledge of this far-distant situation, I'd expect mostly it's better to terminate unknowns in order to create known-positives. However, a whole lot depends on specifics, and the value the AI places on variety and striving or other multi-dimensional aspects, rather than a simple hedonistic metric.
What if you had some computation that could be interpreted (e.g. decrypted with two different keys) as either a simulation full of happy people, or a simulation full of depressed people? I think an adequate theory of experience is able to look at the encrypted computation (or any computation) and decide directly if there is suffering happening there.
Also, what is the difference between normal computation and encrypted computation? I feel like looking at a process that you haven't programmed yourself is not really that different than looking at an encrypted version of that. In either case, we don't have a clue about what's going on. And if we have a theory that lets us figure it out, it should work on both a normal and an encrypted version.
What if you had some computation that could be interpreted as either a simulation full of happy people, or a simulation full of depressed people?
i suspect that those two kinds of computation in fact have a profoundly different shape, such that you can't have something that can convert into either in a simple manner. if i am wrong about this, then alignment is harder than i thought, and i don't know what to think about encrypted computations in such a situation — i guess nuke them just to be safe?
Also, what is the difference between normal computation and encrypted computation?
we can figure out some parts of normal programs, and sometimes possibly even large parts. whereas, an encrypted computation should be guaranteed to be un-figureout-able without exponential compute. the same way i could figure out some of the meaning of a large text that's been translated into dutch, but i'd likely be completely unable to figure out the meaning of a large text that's been encrypted through say AES.
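to illustrate (a toy example using python's `cryptography` package — Fernet is AES under the hood; the dutch sentence is just a made-up placeholder):

```python
# a translated text keeps its word boundaries, letter frequencies, and cognates,
# so you can start to guess at it; the ciphertext reveals essentially nothing
# without the key (apart from metadata like its length).
from cryptography.fernet import Fernet

text = b"een grote beschaving die bloeit in utopie " * 100

key = Fernet.generate_key()
ciphertext = Fernet(key).encrypt(text)

print(text[:43])        # structure you can begin to figure out
print(ciphertext[:43])  # base64 of what is, to you, pure noise
print(Fernet(key).decrypt(ciphertext) == text)  # True, but only with the key
```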
Whether computer-simulated minds or people from other universes (or beyond the event horizon in this post) have subjective experiences is essentially the reference class problem in anthropic arguments — the question of which category of observers "I could be", and whether the reference class should include them.
I have a major problem with this "observation selection" type of anthropic reasoning, which pretty much is all that ever gets discussed such as SSA, SIA and their variants. In my opinion, there isn't any valid reference class. Each person's perspective, e.g. who I am, when is now etc, is primitive. Not something to be explained by reasoning. There is no logical explanation or deduction for it. The first-person is unique, and subjective experience is intrinsic to the first-person perspective only.
We can all imagine thinking from other people's perspectives. Do you think it is ethically relevant to reason from the perspective of a simulated mind? If so, then you should consider them conscious. Otherwise they are not. But as perspectives are primitive, these types of questions can only be answered by stipulations. Not as a conclusion from some carefully conducted reasoning. Rationality cannot provide any answer here.
Criticism of one of your links:
those can all be ruled out with a simple device: if any of these things were the case, could that causate onto whether such an intuition fires? for all of them, the answer is no: because they are immaterial claims, the fact of them being true or false cannot have causated my thoughts about them. therefore, these intuitions must be discarded when reasoning about them.
Causation, which cannot be observed, can never overrule data. The attempted comparison involves incompatible types. Causation is not evidence, but a type of interpretation.
You draw a distinction between "material" and "immaterial" claims, without explaining how that distinction is grounded in neutral evidence. Neutral evidence here could mean graphical data like "seeing a red line moving". Such data can become interpreted, as e.g. "the pressure is increasing", leading to predictions, like "the boiler is going to explode". Under this view, illusions are possible: our interpretation of the graphical data may be wrong, and there may not actually be any moving, red, line-shaped object there. The interpretation is necessarily under-determined.
For the convenience of the current iteration of physics, some people would prefer to begin in medias res, starting with the physical interpretation as the fundamental fact and reasoning backwards to a sense impression. But this is not the order in which the evidence presents itself, even if it is the order most convenient for our world-model.
P.S. I like your lowercase style.
As a variation of your thought experiment, I've pondered: how do you morally evaluate the life of a human who lives with some mental suffering during the day, but thrives in vivid and blissful dreams during their sleep?
In a hypothetical adversarial case, one might even have dreams formed by their desires, with those desires made stronger by the daytime suffering. Intuitively it seems dissociative disorders might arise from a mechanism like this.
depends on the amount of mental suffering. there could be an amount of mental suffering where the awake phases of that moral patient would be ethically unviable.
this doesn't necessarily prevent their sleeping phases from existing; even if the dreams are formed by desires that would arise from the days of suffering, the AI could simply induce in them synthetic desires that are statistically likely to match what they would've gotten from suffering, without their going through it. if they also value genuineness strongly enough, however, then their sleeping phase as it is now might be ethically unviable as well, and might have to be dissatisfied.
suppose you are a superintelligence that is aligned with some human values. you are going about your day, tiling the cosmos with compute that can be used for moral patients to have nice experiences on, annihilating some alien superintelligences and trading with some others, uploading alien civilizations you find to make sure they experience utopia, or at least when you have no other choice genociding them to avoid sufficiently bad suffering from being instantiated.
one day, you run into a planet running a very large computer. after a short investigation, you realize that it's running a very large homomorphically encrypted computation (hereafter "HEC"), and the decryption key is nowhere to be found. it could contain many aliens frolicking in utopia. it could contain many aliens suffering in hell. or, it could be just a meaningless program merely wasting compute, with no moral patients inside it.
if you had the encryption key, you might be able to encrypt a copy of yourself which would be able to take over the HEC from the inside, ensuring (in a way that the outside would never be able to observe) that everything is going fine, in the same way that you should send copies of yourself into remote galaxies before they retreat from us faster than we can reach them.
if you had found some way to get infinite compute (without significant loss of anthropic/ethics juice), then you could use it to just break the HEC open and actually ensure its contents are doing okay.
but let's say the encryption key is nowhere to be found, and accessible compute is indeed scarce. what are your options?
now of course, when faced with the possibility of S-risks, i tend to say "better safe than sorry". what the superintelligence would do would be up to the values it's been aligned to, which hopefully are also reasonably conservative about avoiding S-risks.
but here's something interesting: i recently read a post on scott aaronson's blog which seems to claim that the event horizon of a black hole (or of something like a black hole?) can act just like a HEC's computational event horizon: there's a sense in which being able to go in but not get out is not merely similar to having a HEC's encryption key but not its decryption key — it is actually that same situation.
furthermore, a pair of comments by vanessa kosoy (of PreDCA) seems to suggest that infra-bayesian physicalism would say "this HEC contains no suffering, merely random compute" rather than "i'm unable to know whether this HEC contains suffering"; and she even bites the bullet that moral patients past the event horizon of black holes also don't "have experiences".
(one example of why you might care whether moral patients in black holes "have experiences" is if you can influence what will happen in a black hole — for example, imagine a rocket with moral patients on board is headed for a black hole, and before it gets there, you get to influence how much suffering will happen on board after the rocket passes the event horizon)
i would like to argue that this can't be right, based on several counterintuitive results.
first, consider the case of a HEC running a giant civilization for a while, then reducing down to one bit of output, and emitting that single bit as its own decrypted output. does the civilization now "count"? if the people inside the civilization have no anthropics juice, where has the cosmos done the work of determining that bit? or do they suddenly count as having had experiences all at once when the single bit of output is emitted? and then, surely, if they have anthropics juice then they must also have ethics juice, because it would be weird for these two quantities to not be the same, right?
let's build on this: suppose that in newcomb's problem, omega predicts you by running a homomorphically encrypted simulation of you, emitting as its single bit of output whether you are predicted to one-box or two-box. now, if the you inside the HEC doesn't count as "having experiences", then by observing that you do have experiences, you can be certain that you're the you outside of omega, and choose to two-box after all to deceive it. but aha! the you inside the HEC will do the same thing. so, from the point of view of this homomorphically encrypted you which is supposed to not "have experiences", observing that it has experiences leads it to the wrong conclusion. and since you run on the same stuff as this not-having-experiences you, you too must conclude that you have no reason to think you have experiences.
or, to put it another way: if you-outside-the-HEC has experiences but you-inside-the-HEC doesn't, then not only can you not deduce anything about whether you have experiences — at which point what does that term even mean? how do we know what to care about? — but it might be that you could count as "not having experiences" and still causate onto the real world where real experiences supposedly happen.
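here's a toy sketch of the newcomb point in python (plain function calls standing in for the homomorphic evaluation — the only point is that the same decision procedure runs in both places):

```python
# omega runs your decision procedure under homomorphic encryption; you also run
# it in the clear. any inference of the form "i observe that i have experiences,
# therefore i'm the outside copy, therefore i can safely two-box" is executed
# identically by both copies, so it can't distinguish them -- and the encrypted
# copy's conclusion is, by hypothesis, wrong.

def decide(seems_to_have_experiences: bool) -> str:
    if seems_to_have_experiences:
        return "two-box"  # "i must be the real one outside the HEC"
    return "one-box"

predicted = decide(seems_to_have_experiences=True)  # inside omega's HEC
actual = decide(seems_to_have_experiences=True)     # the physical you

print(predicted, actual, predicted == actual)  # two-box two-box True
```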
for these reasons, i think that a correct generalized interpreter, when faced with a HEC, must decide that its contents might matter, since for any given subcomputation (which the HEC would have the information-theoretic ability to contain) it must answer "i cannot know whether the HEC contains that subcomputation".