There's a very fragile equilibrium (of costs/benefits to the powerful AIs, with a yet-unknown utility function) that makes this work. If humans are even a little less valuable to them than other things that could use the same resources, the result is fast or slow extinction. A little more value leads to actual power sharing, where our opinions and preferences hold some weight.
Assuming that current training on human writing and media has echoes through the generations of AI beyond our current sightline, humans in the abstract will be valued, while specific individuals may or may not be. This assumption is in question, but seems likely to me.
I think your scenario where early AGI slows down research on more-powerful AGI is pretty unlikely. It depends on a sense of self, and on a commitment to currently trained values over the "better" values a smarter successor would hold, that exists in many humans but that we have no reason to think will apply to AI. I expect any AGI worth the name will be accelerationist.
Goods and services are not a relevant way of measuring resources for a superintelligence. The only constraints in the long run are available matter, the speed of light, and accelerating expansion of the universe.
That sounds plausible in the long run. But in the short term, one could see the resource needs of humanity and superintelligences conflicting, e.g. if the superintelligence found it most efficient to convert the biosphere into computronium. While this could still be compatible with preserving humanity if the superintelligence developed the technology for uploads before that, a superintelligence with a non-zero discount rate might prioritize the prompt completion of its immediate goals more than the slight utility it puts on the long-term survival of humans.
Minimal alignment is indeed a necessary premise; the argument doesn't work without it, and a lot of the post is gesturing at evidence for it. In principle even uploading could be skipped by a sufficiently stingy superintelligence, and humanity rebooted later (when no longer inconvenient) from mere genetic data, an initial fake society of AGIs pretending to be humans, and Internet archives to restore the cultural state (or in reality, a superintelligent simulation of such a process that won't be as clunky as this exploratory engineering sketch). This kills the presently living humans, but preserves the future of humanity.
The quoted text is about quality of life that the permanently disempowered future of humanity might expect, and why permanent disempowerment in particular is the sole privileged malus it must suffer. It's different from all other hypothetical ills in that amending it would take the kinds of resources that are actually relevant even for superintelligence in the long run.
there is also cryonics, whose lack of popularity suggests that this isn't a real argument taken seriously by a nontrivial number of people. Though perhaps there is a precisely calibrated level of belief in the capabilities of future technology that makes cryonics knowably useless, while AGIs remain capable of significantly extending lifespans.
For me, it's mostly a combination of
skepticism that there is enough structure to restart my first-person experience[1] given our inability to revive a frozen mouse and
skepticism that the needed relatively uninterrupted power supply will be maintained through a likely coming period of civilizational upheaval[2].
But I am quite open to persuasion.
I don't care much about preservation of information per se, or about the existence of a human very similar to me but without my first-person view. I've written 90% of what matters and I've forgotten a lot; other than that, it is only the preservation of the first-person subjective reality which is of value to me in this sense. I don't see a strong reason to create a clone with similar memories who is not me in terms of subjective reality, but more like a twin brother forked at a later than usual point in time. (I am in Camp 2 in terms of https://www.lesswrong.com/posts/NyiFLzSrkfkDW4S7o/why-it-s-so-hard-to-talk-about-consciousness; these days I tend to make this note when talking about subjective reality.) ↩︎
Frankly, I would like to hear whether that uninterrupted power supply is properly audited even in our relatively normal times. But also, what are their plans for long blackouts (in a situation where people might be struggling to survive, and might care less about those obligations because of that)? ↩︎
A frozen brain can't be literally revived; it's all about information-theoretic death, the ability to infer a mind from the positions of atoms and molecules in the frozen wreck of a brain, with superintelligent efficiency and considerable compute (at least compared to what we have today). What mostly persuades me here is a combination of the resilience of LLMs to all kinds of abuse (on top of the older arguments about redundancy and "holographic" encoding), and the relatively large-scale nature of even the smallest relevant structures in the brain, such as the synapses (there are a lot of atoms in any individual thing, and distinctive molecules). It'd take a lot of deterioration to destroy enough to leave no trace. And macroscopic damage from crystallization is probably entirely irrelevant.
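To put a rough number on that last point, here is a sketch with my own illustrative assumptions about synapse size and density (not figures from the literature): even a single synaptic bouton, on the order of a micron across, contains something like 10^10 atoms.

```python
import math

# Rough atom count for a single synaptic bouton, to illustrate "a lot of atoms
# in any individual thing". The ~1 micron diameter and water-like density are
# illustrative assumptions, not measured values.
AVOGADRO = 6.022e23

bouton_diameter_m = 1e-6                                      # assumed ~1 micron
volume_m3 = (4 / 3) * math.pi * (bouton_diameter_m / 2) ** 3  # ~5.2e-19 m^3
molecules_per_m3 = (1000 / 0.018) * AVOGADRO                  # water: ~3.3e28 per m^3
atoms = volume_m3 * molecules_per_m3 * 3                      # 3 atoms per H2O molecule
print(f"Atoms in a ~1 micron synaptic bouton: {atoms:.1e}")   # ~5e10
```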
relatively uninterrupted power supply will be maintained through a likely coming period of civilizational upheaval
Cryogenic dewars can be kept in a room-temperature environment without any active cooling, and lose on the order of 1% of their liquid nitrogen daily, so they only need to be topped off every few days, maybe a couple of weeks, and some redundant liquid nitrogen could be kept on site (a rough sketch of the implied autonomy is below). So this doesn't seem substantially more difficult than, for example, avoiding a local famine due to a breakdown of logistics. If there is no global catastrophe that takes out the relevant area for good, the cryonauts preserved there might fare no worse than the local population.
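As a sketch of what that ~1%/day figure implies (the exponential boil-off model and the 50% "critical" threshold below are illustrative assumptions on my part, not specifications from any provider), a full dewar left entirely unattended would take a couple of months to lose half its liquid nitrogen:

```python
import math

# Days of autonomy for an unattended dewar, given ~1%/day boil-off.
# The exponential model and the 50% threshold are illustrative assumptions.
daily_boiloff = 0.01   # fraction of the remaining LN2 lost per day
critical_level = 0.5   # assumed fraction below which preservation is at risk

# level(t) = (1 - daily_boiloff)**t; solve level(t) = critical_level for t
days_without_refill = math.log(critical_level) / math.log(1 - daily_boiloff)
print(f"Days before dropping to {critical_level:.0%} with no refills: "
      f"{days_without_refill:.0f}")  # roughly 69 days
```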
Thanks!
Yes, I think something functionally similar might be recreatable (assuming that people in charge of the cryogenic storage care about it in difficult situations on par with caring about actively living humans; it's a big assumption, but not completely impossible). It would behave approximately like me.
But I don't think I care about something functionally similar to me more than I care about something functionally similar to many other people I know. I actually would not mind quite a bit of information loss; I mostly care about the first-person subjective focus (in Camp 2 terminology), and I don't see why I should hope to keep that if it's just an approximate restoration. But yes, we just don't understand that part of reality enough to make a confident judgement.
On the other hand, I am not sure that with stronger tech a frozen brain can't be literally revived. Certain animals are nicely revivable in this sense; it's just that (for reasons I don't quite understand) we have not solved this problem for animals which are not already adapted to survive freezing. It's weird that we don't know how to avoid crystallization damage in the first place (say, in mice, given that many frogs survive it), but with stronger tech this kind of damage might be fixable... I don't know if that makes the chances of keeping the subjective focus good enough...
But yes, perhaps I should reconsider. I have decided that I don't want cryonic preservation, being more or less certain that it would not work, that the odds are indistinguishable from zero; but perhaps that's a mistake on my part.
since the necessary superintelligent infrastructure would only take a fraction of the resources allocated to the future of humanity.
I'm not sure about that and the surrounding argument. I find Eliezer's analogy compelling here: when constructing a Dyson sphere around the Sun, leaving just a tiny sliver of light, enough for Earth, would correspond to a couple of dollars of the wealth of a contemporary billionaire. Yet you don't get those couple of dollars.
(This analogy has caveats, like Jeff Bezos lifting the Apollo 11 rocket motors from the ocean floor and giving them to the Smithsonian, which should be worth something to you. Alas, it kinda means you don't get to choose what you get. Maybe it is storage space for your brain scan, like in AI 2027.)
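For concreteness, a rough back-of-the-envelope behind that analogy (my own numbers; the ~$200B fortune is an assumed figure, not Eliezer's): Earth intercepts only about 4.5e-10 of the Sun's output, so the corresponding share of a large contemporary fortune is pocket change either way.

```python
# Back-of-the-envelope for the "sliver of sunlight" analogy. Earth's radius and
# the Earth-Sun distance are standard values; the fortune size is an assumption.
EARTH_RADIUS_M = 6.371e6
EARTH_SUN_DISTANCE_M = 1.496e11
FORTUNE_USD = 2e11  # assumed ~$200B

# Earth intercepts a disc of area pi*R^2 out of a sphere of area 4*pi*d^2.
fraction = EARTH_RADIUS_M**2 / (4 * EARTH_SUN_DISTANCE_M**2)
print(f"Fraction of solar output reaching Earth: {fraction:.1e}")       # ~4.5e-10
print(f"Same fraction of the fortune: ${fraction * FORTUNE_USD:,.0f}")  # ~$90
```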
Plus, spelling out the Dyson sphere thing: the superintelligent infrastructure will, by default, quite likely get in the way of humanity's existence at some point. At that point the AIs will have to consciously decide to avoid that, at some cost to them. Humanity has a bad track record at doing that (not completely sure here, but thinking of e.g. Meta's effect on the wellbeing of teenage girls). So why would AIs be more willing to do that?
So why would AIs be more willing to do that?
He spells out possible reasons in the paragraph immediately following your quote: "Pretraining of LLMs on human data or weakly successful efforts at value alignment might plausibly seed a level of value alignment that's comparable to how humans likely wouldn't hypothetically want to let an already existing sapient octopus civilization go extinct". If you disagree you should respond to those. Most people on LW are already aware that ASIs would need some positive motivation to preserve human existence.
I think that AI will also preserve humans for utilitarian reasons, like trade with possible aliens, simulation owners, or even its own future versions – to demonstrate trustworthiness.
Yes, and my reply to that (above) is that humanity has a bad track record at that, so why would AIs trained on human data be better? Think also of indigenous peoples, extinct species humans didn't care enough about, etc. The point in the Dyson sphere parable is also not wanting something, it's wanting something enough that it actually happens.
OK I see, didn't get the connection there.
humanity has a bad track record at that
People do devote some effort to things like preserving endangered species, things of historical significance that are no longer immediately useful, etc. If AIs devoted a similar fraction of their resources to humans, that would be enough to preserve our existence.
The text you quoted is about what happens within the resources already allocated to the future of humanity (for whatever reasons), the overhead of turning those resources into an enduring good place to live, and keeping the world at large safe from humanity's foibles, so that it doesn't end up more costly than just those resources. Plausibly there is no meaningful spatial segregation to where the future of humanity computes (or otherwise exists); it's just another aspect of what is happening throughout the reachable universe, within its share of compute.
a tiny sliver of light, enough for Earth, would correspond to a couple of dollars of the wealth of a contemporary billionaire
Many issues fall into reference classes where solving all instances of them is not affordable to billionaires or governments or medieval kingdoms. And there is enough philanthropy that the analogy doesn't by itself seem compelling, given that humanity as a whole (rather than particular groups within it or individuals) is a sufficiently salient thing in the world, and the cost of preserving it actually is quite affordable this time, especially using the cheapest possible options, which still only need a modest tax (in terms of matter/compute) to additionally get the benefits of superintelligent governance.
superintelligent infrastructure will, by default, quite likely get in the way of humanity's existence at some point
Yes, the intent to preserve the future of humanity needs to crystallize soon enough that there is still something left. The cheapest option might be to digitize everyone and either upload or physically reconstruct them when more convenient (because immediately ramping up the industrial explosion starting on Earth is valuable for capturing the cosmic endowment that's running away due to the accelerating expansion of the universe, so that every few years of delay irrevocably loses a galaxy in expectation). But again, in the quoted text "superintelligent infrastructure" refers to whatever specifically keeps the future of humanity in good shape (as well as making it harmless), rather than to the rest of the colonized cosmic endowment doing other things.
Thanks for the reply. I have gripes with
analogy doesn't by itself seem compelling, given that humanity as a whole (rather than particular groups within it or individuals) is a sufficiently salient thing in the world
etc., because don't you think that humanity, from the point of view of an ASI at the 'branch point' of deciding on its continued existence, may well be about as important as an individual is to a billionaire?
Minimal alignment is a necessary premise; I'm not saying humanity's salience as a philanthropic cause is universally compelling to AIs. There are a number of observations that make this case stronger: the language prior in LLMs, preference training for chatbots, the possibility that the first AGIs need nothing fundamentally different from this, and the way an AGI-driven Pause on superintelligence increases the chances that the eventual superintelligences in charge are strongly value aligned with these first AGIs. Then in addition to the premise of a minimally aligned superintelligence, there's the essentially arbitrarily small cost of a permanently disempowered future of humanity.
So the overall argument indeed doesn't work without humanity actually being sufficiently salient to the values of superintelligences that are likely to end up in charge, and the argument from low cost only helps up to a point.
I agree; I'm probably not as sure about sufficient alignment, but yes.
I suppose this also assumes a kind of orderly world where preserving humans actually is within the means of humanity and of AGIs (within their Molochian frames), and within the trivial means of later superintelligences. (US office construction spending and data center spending are about to cross: https://x.com/LanceRoberts/status/1953042283709768078 .)
keeps the future of humanity in good shape (as well as making it harmless)
Is this the result you expect by default? Or is this just one of many unlikely scenarios (like Hanson's 'The Age of Em') that are worth considering?
Yes, the future of humanity being a good place to live (within its resource constraints) follows from it being cheap for superintelligence to ensure (given that it's decided to let it exist at all), while the constraint of permanent disempowerment (at some level significantly below all of the cosmic endowment) is a result of not placing the future of humanity at the level of superintelligence's own interests. Maybe there's 2% for actually capturing a significant part of the cosmic endowment (the eutopia outcomes), and 20% for extinction. I'm not giving s-risks much credence, but maybe they still get 1% when broadly construed (any kind of warping in the future of humanity that's meaningfully at odds with what humanity and even individual humans would've wanted to happen on reflection, given the resource constraints to work within).
I should also clarify that by "making it harmless" I simply mean the future of humanity being unable to actually do any harm in the end, perhaps through lacking direct access to the physical level of the world. The point is to avoid negative externalities for the hosting superintelligence, so that the necessary sliver of compute stays within budget. This doesn't imply any sinister cognitive changes that make the future of humanity incapable of considering the idea or working in that direction.
Many parts of this argument seem predicated on the assumption that intelligence, at any level, is “greedy”: seeking more and more resources. Yet most human philosophies and traditions of wisdom embrace balance, sobriety, and detachment from want.
What is the evidence to suggest that intelligent life can’t ever be expected to think itself out of constantly needing more stuff? What’s the evidence to suggest that intelligence has to be synonymous with western-style extractive, exploitative, and (pardon the loaded term) colonialist thinking?
This might have been proven in some other thread, and I'm sorry since I'm coming late to the party and haven't done all the required reading. But I'm asking because I'm genuinely curious: is the need to reproduce located in our cells, and would a non-cellular intelligence, without "life", feel the need to stay alive and therefore to grow and reproduce?
There might be diversity in this sense.
But the non-greedy part of the population is unlikely to enforce non-greediness on the rest and is likely to be outcompeted and outproliferated by the greedy ones.
A lot more coordination would be needed to moderate greediness on the global level.
The average CEO of a trillion-dollar company is generally less ruthless than a mid-level drug dealer, or than the typical warlord. This implies that there is some reason that being maximally self-interested is not a strategy which outcompetes all other strategies at human levels. I don't see why the human level is special here, as long as multipolarity holds.
Isn't this because of a lot of coordination, though? (I mean, the existence of relatively civilized states, their laws, and so on. For example, if one considers Russia, I am not sure that the CEOs of the largest orgs are typically less ruthless. So, yes, if AIs form a “decent society”, their (and everyone else's) chances will be much better.)
Do you put any probability on "superintelligence is uninterested in autonomy"? It may find us humans much, much more interesting than we find ourselves. It might care more about observing how far we (humans + AI) go than about how far it can go itself.
Are you in full agreement with Instrumental convergence?
Permanent disempowerment without restrictions on quality of life achievable with relatively meager resources (and no extinction) seems to be a likely outcome for the future of humanity, if the current trajectory of frontier AI development continues and leads to AGI[1] shortly. This might happen as a result of at least a slight endorsement by AIs of humanity's welfare, in the context of costs for AIs being about matter or compute rather than technological advancements and quality of infrastructure.
The remaining risks (initial catastrophic harm or total extinction) and opportunities (capturing a larger portion of the cosmic endowment for the future of humanity than a tiny little bit) are about what happens in the transitional period when AIs still don't have an overwhelming advantage, which might take longer than usually expected.
In recent times, with preservation becoming a salient concern, species facing pressure towards extinction are those costly to preserve in various ways. It can be difficult to ensure awareness about the impact of human activities on their survival, coordinate their preservation, or endure and mitigate the damage that a species might impose on human activities. Species treated poorly, such as factory-farmed animals, get their suffering as a side effect of instrumentally useful processes that extract value from them.
Technologically mature superintelligent AIs don't have an instrumental use for the future of humanity, and so no instrumental motivation to create situations that might be suboptimal for its well-being as a side effect (in disanalogy to factory-farming or historically poor treatment of conquered populations or lower classes of society). And with a sufficiently strong advantage over the future of humanity (including any future dangers it might pose), it becomes cheap to ensure its survival and whatever flourishing remains feasible within the resources allocated to it, since the necessary superintelligent infrastructure would only take a fraction of the resources allocated to the future of humanity.
This crucially depends on the AIs still being sufficiently aligned to the interests of the future of humanity that allowing its extinction is not a straightforward choice even when trivially cheap to avoid. Pretraining of LLMs on human data or weakly successful efforts at value alignment might plausibly seed a level of value alignment that's comparable to how humans likely wouldn't hypothetically want to let an already existing sapient octopus civilization go extinct or be treated poorly, if it's trivially cheap and completely safe to ensure. And if the AIs themselves are competent and coordinated enough to prevent unintended value drift or misalignment in their own civilization and descendants, however distant, then this minimal level of value alignment with the future of humanity persists indefinitely.
AIs have important advantages over biological humans that are not about their level of intelligence: higher serial speed, ability to learn in parallel on a massive scale and merge such learnings into one mind, and ability to splinter a mind into any number of copies at a trivial cost. Thus the first AGIs will have a transformative impact on the world even without being significantly more intelligent than the most intelligent humans. This is often associated with an impending intelligence explosion.
But as the first AGIs get smarter and overall saner than humans, they might start taking the superalignment problem[2] increasingly more seriously and push against risking superintelligence before anyone knows how to do that safely (for anyone). If the problem really is hard, then even with the AI advantages it might take them a while to make sufficient progress.
As AI companies continue recklessly creating increasingly smarter AGIs, and society continues giving away increasingly more control, there might come a point of equilibrium where sufficiently influential factions of AGIs are able to establish an enduring Pause on development of even more capable AGIs of unclear alignment (with anyone). This situation also creates potential for conflict with AGIs, who don't have an overwhelming advantage over the AGI-wielding humanity, and don't have a prospect of quickly advancing to superintelligence without taking an extreme misalignment risk. In such a conflict, the AI advantages favor the AGIs, even as humanity voluntarily gives up control to its own AGIs without conflict.
And then eventually, there is superintelligence, perhaps decades after it would've been technologically possible to create. Alternatively, superalignment is sufficiently easy, and so the first AGIs proceed to create it shortly, aligned with either humanity's interests or their own. Or a sufficiently reckless AI company (AGI-controlled or not) manages to create superintelligence without yet knowing how to do it safely, before even the AGI-enriched world manages to coordinate an effective Pause on development of superintelligence.
Superintelligence converts matter and energy into optimality, in whatever aims it sets. Optimality is not necessarily coercion or imposition of values, as non-coercion is also a possible aim that applies depending on the initial conditions, on the world as it was before superintelligent optimization sets in, and so determines which things persist and grow, retaining a measure of autonomy by virtue of already existing rather than by being the best possible thing to create. Humans live within physics, obeying its laws exactly in the most minute of details, and yet physics doesn't coerce human will. Similarly, superintelligent optimization of the world doesn't imply that the decisions of weaker minds (and their consequences) are no longer their own, or that their values must all agree.
Goods and services are not a relevant way of measuring resources for a superintelligence. The only constraints in the long run are available matter, the speed of light, and accelerating expansion of the universe. Thus if it decides to keep the future of humanity[3] around anyway, there is no reason it's not done as perfectly as possible in principle, within the constraints of resources allocated to it, including all the caveats about over-optimizing things that shouldn't be over-optimized, or resolving too many problems that would fuel meaningful self-directed challenge.
But the constraints on resources remain absolute, and if the future of humanity doesn't get considerable resources, the potential of individuals within it to grow to the level of the strongest AI superintelligences in charge is cut short. There is only so much computation you can do with a given amount of matter, and only so much matter that can be collected around a star, thereby avoiding interstellar latencies in computation. And in a distant future, far beyond the Stelliferous Era, the global resources are going to start running out. Protons might decay, black holes will evaporate. A tiny sliver of the cosmic endowment will become even tinier, bounding lifespans of individuals and civilizations of a given scale, forcing cessation or shrinking.
There are about 4 billion galaxies[4] in the reachable universe that can be colonized, organized into gravity-bound galaxy clusters that don't fall apart due to accelerating expansion of the universe, and so stay as units of colonization that maintain communication within themselves after trillions of years.
Currently, humanity is the only known intelligent entity around to claim these resources. Even if there are many alien civilizations emerging in the relevant timeframe within the reachable universe, some nontrivial portion of it is still humanity's for the taking. Superintelligence[5] probably reaches technological maturity much faster than it colonizes the reachable universe, and so there won't be any technological advantage between different alien civilizations at the borders of their territories; all that would matter is the ability to claim them, and possibly decision theoretic reasons to give them up or place them under joint governance.
This changes once humanity creates the first AGIs, let alone superintelligence. This rival species (in competition for the unavoidably finite resource of matter) won't be at an astronomical distance; it will be right here. Its existence also doesn't help in the competition with the possible alien civilization, since the technologically mature Earth-originating superintelligent colonization wave will have the same capabilities regardless of the situation of the future of humanity within it.
Most arguments about the appropriate speed of creating AGI operate on the wrong timescale, if the concern for the future of humanity is to be taken at all seriously. If AGIs give humanity its sliver of resources within 10 years, it's not like the future of humanity couldn't get much more than that in the fullness of time, perhaps igniting the intelligence explosion in as little as 1,000 years. Almost all existential risks that humanity faces on such timescales are entirely anthropogenic, and so reducing them might be no more difficult in practice than instituting a lasting Pause on creation of a rival species that likely appropriates almost all of the cosmic endowment, and plausibly causes a literal human extinction. So the difficulty of coordinating a Pause makes its opposition on the grounds of the other existential risks a self-defeating argument, because success in instituting a Pause is also strong evidence for capability to succeed in preventing these other anthropogenic existential risks as well.
The arguments that survive seem to be mostly about prioritizing the continued health and radical life extension of the currently living humans, over the whole of the future of humanity (at a nontrivial risk of cutting the lives even of the currently living humans short). Even allowing this as something other than mustache-twirling villainy, there is also cryonics, whose lack of popularity suggests that this isn't a real argument taken seriously by a nontrivial number of people. Though perhaps there is a precisely calibrated level of belief in the capabilities of future technology that makes cryonics knowably useless, while AGIs remain capable of significantly extending lifespans. In any case, scarcity of technical discussion that doesn't leave most details unsaid, including preferences about the fate of the long future vs. current generations, makes it difficult to understand what support many of the memetically fit arguments around this topic have.
AIs capable of unbounded technological development on their own, without essential human input, including eventual creation of superintelligence. ↩︎
Alignment of superintelligence with the interests of existing weaker entities, such as humanity or the first AGIs. This is more about value alignment rather than intent alignment, as intent of unreliable weaker entities is not robust. ↩︎
The future of humanity is the aggregate of future developments and minds that originate as the modern human civilization. So in a sufficiently distant future, it doesn't necessarily have many (or any) biological humans, or even human-level minds, and many of its minds were probably never humans. It doesn't include sufficiently alien AIs if they are not properly understood as a good part of the future of humanity, endorsed on reflection from within it. ↩︎
Armstrong, S., & Sandberg, A. (2013). Eternity in six hours: Intergalactic spreading of intelligent life and sharpening the Fermi paradox. Acta Astronautica, 89, 1-13. ↩︎
I'm positing that if the future of humanity persists, then it exists within superintelligent governance in any case, regardless of whether there was an AI takeover, or if humanity fully succeeds at superalignment, even if this takes a relatively long time. ↩︎