Mind Crime might flip the sign of the future.
If future contains high tech, underregulated compute and diverse individuals, then it's likely it will contain incredibly high amounts of most horrendous torture / suffering / despair / abuse.
It's entirely possible you can have 10^15 human-years per second of real time on a small local compute cluster. If such amounts of compute can be freely available to individuals no questions asked, then it's probable some of them will decide to use them to explore undesirable mental states.
*slaps roof of a server rig* This bad boy can sample, evaluate and discard as many chickens as they did in their whole history in just two minutes. In fact it's doing it right now. And I have another 200 of them. Why? Uhhhhh, chicken backed crypto of course.
For context, you can estimate that over all history so far there were around 10^12 chicken-hours. It's such a small number if you have any of the advanced compute substrate.
Considerations like this heavily depend on how do you view it, more like a ratio to all the experiences or as an absolute priority over good states.
This consideration might just straight up overwhelm your prioritization of today not-very-scalable sufferings. And it's not very longtermist worry, this would start to be a major consideration this century and probably in the next 20 years.
Libertarian proposals like https://www.transhumanaxiology.com/p/the-elysium-proposal have such a flaw, they contain vast amounts of the worst sufferings. "Hell on my property is none of your business!" It's pretty bleak tbh
EDIT "10^15 human-years per second of real time" is unlikely. Given football field of solar panels, you probably can do at most 10^-3 human-years per second of real time, so that 18 OOMs up from that would probably look like a substantial investment of energy, noticeable on the scale of the solar system.
I agree with "If future contains high tech, underregulated compute and diverse individuals, then it's likely it will contain incredibly high amounts of most horrendous torture / suffering / despair / abuse.", but think this mechanism probably causes a small fraction of the expected suffering in the future.
I'm also very skeptical that this suffering (as in, just the suffering due to the mechanism you describe) flips the sign on the future if we assume some non-extreme tradeoff ratio between optimized suffering and optimized goodness (or fun or whatever you want to call this) given the relatively small scale of this type of suffering.
I agree that abstract proposals for distributing and governing the future should allow for prohibitions on extreme torture (including torture of digital minds), and I hope that these governance mechanisms will vote/decide to ban extreme torture.
"Hell on my property is none of your business!"
The important thing is that the hell is open-source, so anyone can run it on their own property, too. /s
This kind of considerations (with positively valued computations as well) could be the basis for an economy of mostly sovereign slivers of compute under control of various individuals or groups, hosted within larger superintelligences. A superintelligent host might price their compute according to the value of what a tenant computes with it, as evaluated by the host. The tenants then have a choice between choosing computations more valuable to the host, and moving to a different host. (This is about managing tenants, rather than the host seeking out more optimal computations in general, a way of setting up laws for the tenants that maintain their value to the host within bounds, in a way that doesn't strongly interfere with their autonomy.)
This is an excellent point. It's one more reason that proliferation of AGI is simply not a viable path. We are right to be very concerned with centralization of power, but the distribution of rapidly and unevenly expanding unregulated power does not contain a stable equilibrium.
distribution of rapidly and unevenly expanding unregulated power does not contain a stable equilibrium
It might be stable? The question is, would it be a good one.
It would not be stable. The most vicious actors are incentivized to tell their AGI "hide and self improve and take over at any cost so I can have my preferred future" before anyone else does it.
This extendsall the way to tactics like sending the sun nova while storing some brain uploads on a mission to another star to start a new civilization.
Recognizing just how dangerous AGI proliferation is should help us steer away from trying it and having torture clusters (up until the most vicious actor creates their preferred world in the whole light cone - which might well also involve lots of suffering.)
It would not be stable. The most vicious actors are incentivized to tell their AGI "hide and self improve and take over at any cost so I can have my preferred future" before anyone else does it.
When there are superintelligences, the situation will plausibly be stable, because all intelligent activity will be happening under management of superintelligent governance. So there might be a point well below superintelligence when a sufficient level of coordination is established, and small groups of AGIs are unable to do anything of consequence (such as launching a mission to another star). Possibly not at human level, even with all the AI advantages, but as AGIs get stronger and stronger (regardless of their alignment), they might get there before superintelligence. (Not a safe thing for humanity of course. But stable.)
I'm unclear on whether you're saying that there would be a stable equilibrium among ASIs or whether there would be a singleton governing everything and allowing wide lattitude of action.
A single AGIs can achieve anything if it can self-improve to create ASI without anyone knowing. Working underground or elsewhere in the solar system seems hard to detect and prevent once we have the robotics to seed such an effort- which won't take long.
I did read and greatly enjoyed your linked post. I do think that's a plausible and underdeveloped area of thought. I don't find it all that likely for complex reasons, but it's definitely worth more thoughtl. I didn't get around to commenting on it; maybe I'll go do that to put that discussion in a better place.
Superintelligent governance serves as an anchor for the argument about mere AGIs I'm putting forward. I'm not distinguishing singleton vs. many coordinated ASIs, that shouldn't matter for the effectiveness of managing their tenants. The stable situation is where every Earth-originating intelligent entity in the universe must be either one of the governing superintelligences, or a tenant of one, and a tenant can't do anything too disruptive without their host's permission. So like with countries and residency, but total surveillance for anything potentially relevant and in principle lack of the bad things that would go along with total surveillance in a human government. Not getting the bad things seems likely for the factory-farming disanalogy reasons: tenants are not instrumentally useful anyway, so there is no point in doing things that would in particular end up having bad side effects for them.
So the argument is that you don't necessarily need superintelligence to make this happen, it could also work with sufficiently capable AGIs as the hosts. Even if merely human level AGIs are insufficient, there might be some intermediate level way below superintelligence that's sufficient. Then, a single AGI can't actually achieve anything in secret or self-improve to ASI, because there is no unsupervised hardware for it to run on, and on supervised hardware it'd be found out and prevented from doing that.
Right, thanks! That model of a governor and some tenants undergoing near-perfect surveillance is my only model for a stable long-term future. And sure this can happen at some level above human but below superintelligence.
I was just a little thrown by the multiple superintelligences. It has in the past seemed unlikely to me that we'd wind up with a long-term agreement among different superintelligences vs. one taking over by force. But I can't be sure!
It's seemed unlikely to me since the arguments for cooperation among superintelligences don't seem strong. Reading each others source code for perfect trustworthiness seems impossible in a learning network-based ASI. And timeless decision theory being so good that all superintelligences would necessarily follow it also seems implausible.
But I haven't thought through the game theory plus models of likely superintelligence alignment/goals well enough to be confident, so for all I know cooperating superintelligences is a likely outcome.
The relevance of this setup for "distribution of rapidly and unevenly expanding unregulated power" is that sufficiently strong AGIs might self-regulate in this way at some point even if not externally regulated, turning weaker AGIs (and humanity) harmless without even necessarily restricting their freedoms, including the freedom to chaotically proliferate more AGIs, other than in security-relevant and resource-relevant ways.
Coordination among AGIs or ASIs needn't be any more mysterious than coordination among people or nations. Knowledge of the source code is just a technical assumption that makes the math of toy examples go through. But plausibly for example machine learning itself gathers similarly useful data about the world, as we would want to be presented in the form of knowledge of the source code in such toy examples.
So if the AGIs/ASIs merely learn models of each other, in the same way they would learn models of the world, they might be in a qualitatively similar decision theoretic position to actually knowing the source code. And there is no need to posit purely acausal coordination, they can talk to each other and set up incentives. Also, in the case of AGIs with internals similar to modern chatbots, there is no source code in a meaningful sense, they are mostly models, and so the framing of understanding and knowing them that takes the form of other models is much more natural than knowing them in the form of some "source code".
Coordination among people isn't mysterious, but it's based in large part on properties that AGIs won't have. That's why I find hopes of stable collaborations optimistic in the absence of careful analysis of how they could be enforced or otherwise create lasting trust.
Humans collaborate in large part because:
So I'm not saying AGIs couldn't cooperate, just that it shouldn't be assumed that they can/will.
In the absence of those properties, they'd need to worry a lot about scheming while striking deals. If the alignment problem wasn't clearly solved in legible (to them) ways, they don't know if their collaborators will turn traitor when the time is right. Just like humans, except everyone might be (probably is) a sociopath who can multiply and grow without limit.
Incentives only work as long as there's the hard constraints of the situation prevent a collaborator from slipping out of them.
It would not be stable.
That is besides my point. I think you can make it stable, but anyway.
up until the most vicious actor creates their preferred world in the whole light cone - which might well also involve lots of suffering
There are some reasons to think default trajectory, of pragmatic victor, just get this evolution-created world duplicated many more times. Might be the baseline you have to improve on. Torture clusters might be worse outcome borne of uhh large but not quite enough ability to steer the values of the agent(s) that dominate.
Whether it can be made stable is quite central to my models of how we steer toward a good result. So, how do you think it could be made stable? And in that process, could you also make a little rule "no involuntary torturing or killing or controlling sentient beings; otherwise do what thou wilt".
So, how do you think it could be made stable?
By "stable" I meant "able to exist at all", as opposed to be conquered / merged into a singleton or something similar. And I didn't make a claim about the extent it's likely, more about how desirable it is. And what (value based, not existential/pragmatic) problems you would have to deal with in such a state.
I don't have a detailed plan / expectation on how exactly you could be working to achieve such a state. It just seems vaguely possible, I don't think I can offer you any new insight on that.
Just want to articulate one possibility of how the future could look like:
RL agents will be sooo misaligned so early, they would lie and cheat and scheme all the time, so that alignment becomes a practical issue, with normal incentives, and get iteratively solved for not-very-superhuman agents. Turns out it requires mild conceptual breakthroughs, as these agents are slightly superhuman, fast, and hard to supervise directly to just train away the adversarial behaviors in the dumbest way possible. It finishes developing by the time of ASI arrival and people just align it with a lot of effort, in the same manner as any big project requires a lot of effort.
I'm not saying anything about the probability of it. It honestly feels a bit overfitted, just like people who overupdated on base models talked for some time. But still, the whole LLM arc was kind of weird and goofy, so I don't trust my sense of weird and goofy anymore.
(would appreciate references of forecasting writeups exploring similar scenario)
Possible setup to test steganography capabilities:
Pick two tasks which model is incapable of doing without CoT. Show two task statements, require it to solve both problems with a single CoT and give two distinct answers to both tasks.
Then show the judge (a human or a model) only one pair question -> shared CoT -> answer, and require the judge to guess what was the other task.
You can do some variations on that, train the model to be better at this, train both models adversarially.
Or you can make some portion of tests to be just the one problem to solve, and part two problems. The judge has to guess was there another problem and if so what problem.
Reward probably IS an optimization target of RL agent if this agent knows some details of the training setup. Surely it would enhance its reward acquisition to factor this knowledge in? Then it gets reinforced, and then couple steps down that path agent thinks full time about quirks of its reward signal.
Could be bad at it, muddy, sure. Or schemey and hack the reward to get something else that is not the reward. But that's somewhat different thing than mainline thing? like, it's not as likely and a lot more diverse set of possibilities, imo.
The question of what happens after the end of the training is more like a free parameter here. "Do reward seeking behaviors according to your reasoning about the reward allocation process" becomes undefined when there is none and the agent knows it.
Monotheistic religion is kind of like having an immortal leader. Like, you just coordinate with other people to behave as if there is lord behind your back that commands you to do things. The Lord is invulnerable and undying, unsusceptible to threats. The Lord worked on the same goals for hundreds of years, you can see what people he commanded built, achieved, ruled.
You can pledge your allegiance to him without leaving your town, he has his representatives there. Or even without leaving the space of your thoughts.
Kind of like corporate personhood, but uhh moral construct/contract personhood?
Problem with it is that there are a lot of random factual claims baked into this contract. How the history went, who was the source of the information, how the world work in some ways. Clearly, it's not a loadbearing part of it, just some hit on compellingness, persuasiveness, schelling-pointness.
Is it possible to make a similar moral construct without baking in huge amount of bullshit? Can it be done artificially, with clear vision what one tries to build?
Suppose you are writing a simulation. You keep optimizing it and hardcoding some stuff and handle different cases more efficiently and everything. One day your simulation becomes efficient enough so that you can run big enough grid for long enough, and there develops life. Then intelligent life. Then they tried to figure out the physics of their universe, and they succeed! But, oh wait, their description is extremely short but completely computationally intractable.
Can you say that they actually figured out in what kind of universe they are in already, or should you wait for when they discover another million lines of code of optimization for it? Should you create giant sparkling letters "congratulations, you figured that out" or wait for more efficient formulation?
The convergent reason to simulate a world is to learn what happens there. When to intervene with letters depends on, uh. Why are you doing that at all?
(Edit: I suppose a congratulatory party is in order when they simulate you back with enough optimizations that you can talk to each other in real time using your mutual read access.)
I would really love if some "let's make asi" people put some effort into making bad outcomes less bad. Like, it would really suck if we are going to be trapped in endless corporate punk hell, with superintelligent nannies with correct (tm) opinions. Or infinite wedding parties or whatever. Just make sure that if you fuck up we all just get eaten by nanobots please. Permanent entrapment in misery would be a lot worse.
Suppose you know that there is an apple in this box. You will modify your memory then, to think that the box is empty. You open the box, expecting nothing there. Is there an apple?
Also, what if there is another branch of the universe where there is no apple, and you in the "yes apple" universe did modify his memory and you are both identical now. So there are two identical people in different worlds, one with box-with-apple, the other one with box-without-apple.
Should you, in the world with apple and yet unmodified memory anticipate 50% chance to experience empty box after opening it?
If you got confused about the setup here is a diagram: https://i.imgur.com/jfzEknZ.jpeg
I think it's identical to the problem when you get copied in two rooms, numbered 1 and 2, then you should expect 50% of 1 and 50% of 2 even if there is literally no randomness or uncertainty in what's going to happen. or is it?
So, implication's here is that you can squeeze yourself into different timelines by modifying your memory or what, am i going crazy here
In our solar system, the two largest objects are the Sun and Jupiter. Suspiciously, their radii both start with the number '69': the Sun's radius is 696,340 km, while Jupiter's is 69,911 km.
What percent of ancestral simulations have this or similarly silly "easter eggs". What is the Bayes factor
You might enjoy this classic: https://www.lesswrong.com/posts/9HSwh2mE3tX6xvZ2W/the-pyramid-and-the-garden
TLDR give pigs guns (preferably by enhancing individual baseline pigs, not by breeding new type of smart powerful pig. Otherwise it will probably just be two different cases. More like gene therapy than producing modified fetuses)
As of lately I hold an opinion that morals are proxy to negotiated cooperation or something, I think it clarifies a lot about the dynamics that produce it. It's like evolutionary selection -> human desire to care about family and see their kids prosper, implicit coordination problems between agents of varied power levels -> morals.
So, like, uplift could be the best way to ensure that animals are treated well. Just give them power to hurt you and benefit you, and they will be included into moral considerations, after some time for it to shake out. Same stuff with hypothetical p-zombies, they are as powerful as humans, so they will be included. Same with EMs.
Also, "super beneficiaries" are then just powerful beings, don't bother to research the depth of experience or strength of preferences. (e.g. gods, who can do whatever and don't abide by their own rules and perceived to be moral, as an example of this dynamics).
Also, pantheon of more human like gods -> less perceived power + perceived possibility to play on disagreements -> lesser moral status. One powerful god -> more perceived power -> stronger moral status. Coincidence? I think not.
Modern morals could be driven by a lot stronger social mobility. People have a lot of power now, and can unexpectedly acquire a lot of power later. so, you should be careful with them and visibly commit to treating them well (e.g. be moral person, with particular appropriate type of morals).
And it's not surprising how (chattel) slaves were denied a claim on being provided with moral considerations (or claim on being a person or whatever), in a strong equilibrium where they are powerless and expected to be powerless later.
tldr give pigs guns
(preferably by enhancing individual baseline pigs, not by breeding new type of smart powerful pig. Otherwise it will probably just be two different cases. More like gene therapy than producing modified fetuses)
I think this is misguided. It ignores the is-ought discrepancy by assuming that the way morals seem to have evolved is the "truth" of moral reasoning. I also think it's tactically unsound - the most common human-group reaction to something that looks like a threat and isn't already powerful enough to hurt us is extermination.
I DO think that uplift (of humans and pigs) is a good thing on its own - more intelligence means more of the universe experiencing and modeling itself.
It ignores the is-ought discrepancy by assuming that the way morals seem to have evolved is the "truth" of moral reasoning
No? Not sure how do you got that from my post. Like, my point is that morals are baked in solutions to coordination problems between agents with different wants and power levels. Baked into people's goal systems. Just as "loving your kids" is a desire that was baked in from reproductive fitness pressure. But instead of brains it works on a level of culture. I.e. Adaptation-Executers, not Fitness-Maximizers
I also think it's tactically unsound - the most common human-group reaction to something that looks like a threat and isn't already powerful enough to hurt us is extermination.
Eh. I think it's one of the considerations. Like, it will probably not be that. It's either ban on everything even remotely related or some chaos when different regulatory systems trying to do stuff.