All of Sebastian_Hagen's Comments + Replies

In parallel, if I am to compare two independent scenarios, the at-least-one-in-ten-billion odds that I'm hallucinating all this, and the darned-near-zero odds of a Pascal's Mugging attempt, then I should be spending proportionately that much more time dealing with the Matrix scenario than that the Pascal's Mugging attempt is true

That still sounds wrong. You appear to be deciding on what to precompute for purely by probability, without considering that some possible futures will give you the chance to shift more utility around.

If I don't know anything ab... (read more)

I agree, but I think I see where DataPacRat is going with his/her comments. First, it seems as if we only think about the Pascalian scenarios that are presented to us. If we are presented with one of these scenarios, e.g. mugging, we should consider all other scenarios of equal or greater expected impact. In addition, low probability events that we fail to consider can possibly obsolete the dilemma posed by PM. For example, say a mugger demands your wallet or he will destroy the universe. There is a nonzero probability that he has the capability to destroy the universe, but it is important to consider the much greater, but still low, probability that he dies of a heart attack right before your eyes.

Continuity and independence.

Continuity: Consider the scenario where each of the [LMN] bets refer to one (guaranteed) outcome, which we'll also call L, M and N for simplicity.

Let U(L) = 0, U(M) = 1, U(N) = 10**100

For a simple EU maximizer, you can then satisfy continuity by picking p=(1-1/10**100). A PESTI agent, OTOH, may just discard a (1-p) of 1/10**100, which leaves no other options to satisfy it.

The 10**100 value is chosen without loss of generality. For PESTI agents that still track probabilities of this magnitude, increase it until they don't.
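The continuity example above can be checked with exact arithmetic. The following is only a minimal sketch: it assumes that a PESTI agent can be modelled as dropping every branch whose probability is below a fixed cutoff and renormalizing what remains. The cutoff of 1/10**50 and the function name `expected_utility` are illustrative choices, not part of the original argument.

```python
from fractions import Fraction

# Utilities from the example: U(L) = 0, U(M) = 1, U(N) = 10**100.
U = {"L": Fraction(0), "M": Fraction(1), "N": Fraction(10**100)}

def expected_utility(p, threshold=None):
    """Expected utility of the lottery p*L + (1-p)*N.

    If threshold is given, branches with probability below it are
    discarded and the rest renormalized (an assumed model of PESTI).
    """
    branches = [(p, U["L"]), (1 - p, U["N"])]
    if threshold is not None:
        kept = [(q, u) for q, u in branches if q >= threshold]
        total = sum(q for q, _ in kept)
        branches = [(q / total, u) for q, u in kept]
    return sum(q * u for q, u in branches)

p = 1 - Fraction(1, 10**100)

# Plain EU maximizer: the lottery is exactly as good as M.
eu_plain = expected_utility(p)

# Assumed PESTI agent: the 1/10**100 branch is dropped, so the lottery
# is valued like L. Any (1-p) large enough to be kept is worth at
# least threshold * U(N), which is far above U(M).
cutoff = Fraction(1, 10**50)
eu_pesti = expected_utility(p, threshold=cutoff)
eu_at_cutoff = expected_utility(1 - cutoff, threshold=cutoff)
```

Under these assumptions the thresholding agent's valuation of the lottery jumps from U(L) straight past U(M) as (1-p) crosses the cutoff, so no p makes it indifferent to M.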

Indepen... (read more)

Thus, I can never be more than one minus one-in-ten-billion sure that my sensory experience is even roughly correlated with reality. Thus, it would require extraordinary circumstances for me to have any reason to worry about any probability of less than one-in-ten-billion magnitude.

No. The reason not to spend much time thinking about the I-am-undetectably-insane scenario is not, in general, that it's extraordinarily unlikely. The reason is that you can't make good predictions about what would be good choices for you in worlds where you're insane and totally unable to tell.

This holds even if the probability for the scenario goes up.

/A/ reason not to spend much time thinking about the I-am-undetectably-insane scenario is as you describe; however, it's not the /only/ reason not to spend much time thinking about it. I often have trouble explaining myself, and need multiple descriptions of an idea to get a point across, so allow me to try again: There is roughly a 30 out of 1,000,000 chance that I will die in the next 24 hours. Over a week, simplifying a bit, that's roughly 200 out of 1,000,000 odds of me dying. If I were to buy a 1-in-a-million lottery ticket a week, then, by one rule of thumb, I should be spending 200 times as much of my attention on my forthcoming demise than I should on buying that ticket and imagining what to do with the winnings. In parallel, if I am to compare two independent scenarios, the at-least-one-in-ten-billion odds that I'm hallucinating all this, and the darned-near-zero odds of a Pascal's Mugging attempt, then I should be spending proportionately that much more time dealing with the Matrix scenario than that the Pascal's Mugging attempt is true; which works out to darned-near-zero seconds spent bothering with the Mugging, no matter how much or how little time I spend contemplating the Matrix. (There are, of course, alternative viewpoints which may make it worth spending more time on the low-probability scenarios in each case; for example, buying a lottery ticket can be viewed as one of the few low-cost ways to funnel money from most of your parallel-universe selves so that a certain few of your parallel-universe selves have enough resources to work on certain projects that are otherwise infeasibly expensive. But these alternatives require careful consideration and construction, at least enough to be able to have enough logical weight behind them to counter the standard rule-of-thumb I'm trying to propose here.)
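The rule-of-thumb arithmetic in the comment above can be reproduced in a few lines. This is a rough sketch using only the figures quoted there (30 in 1,000,000 per day, a 1-in-a-million ticket per week); the variable names are illustrative, and allocating attention in proportion to probability is the commenter's proposed heuristic, not an established result.

```python
# Figures quoted in the comment.
daily_death = 30 / 1_000_000
lottery_win = 1 / 1_000_000

# Chance of dying at some point during a 7-day week, assuming
# independent days, and the simpler linear approximation.
weekly_death_exact = 1 - (1 - daily_death) ** 7
weekly_death_linear = 7 * daily_death

# Under the proposed heuristic, attention is split in proportion
# to probability.
attention_ratio = weekly_death_exact / lottery_win
```

Both versions give a ratio of roughly 210, which the comment rounds to 200.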
In the I-am-undetectably-insane scenario, your predictions about the worlds where you're insane don't even matter, because your subjective experience doesn't actually take place in those worlds anyways.

It's the most important problem of this time period, and likely human civilization as a whole. I donate a fraction of my income to MIRI.

Which means that if we buy this [great filter derivation] argument, we should put a lot more weight on the category of 'everything else', and especially the bits of it that come before AI. To the extent that known risks like biotechnology and ecological destruction don't seem plausible, we should more fear unknown unknowns that we aren't even preparing for.

True in principle. I do think that the known risks don't cut it; some of them might be fairly deadly, but even in aggregate they don't look nearly deadly enough to contribute much to the great filter.... (read more)

This issue is complicated by the fact that we don't really know how much computation our physics will give us access to, or how relevant negentropy is going to be in the long run. In particular, our physics may allow access to (countably or more) infinite computational and storage resources given some superintelligent physics research.

For Expected Utility calculations, this possibility raises the usual issues of evaluating potential infinite utilities. Regardless of how exactly one decides to deal with those issues, the existence of this possibility does shift things in favor of prioritizing for safety over speed.

Infinity Shades for the win! Seriously though. I'm highly in favor of infinity shades. This whole "let's burn the universe searching for the Omega point or perpetual machines" makes me unhappy.

I used "invariant" here to mean "moral claim that will hold for all successor moralities".

A vastly simplified example: at t=0, morality is completely undefined. At t=1, people decide that death is bad, and lock this in indefinitely. At t=2, people decide that pleasure is good, and lock that in indefinitely. Etc.

An agent operating in a society that develops morality like that, looking back, would want to have all the accidents that lead to current morality to be maintained, but looking forward may not particularly care about how the rem... (read more)

That does not sound like much of a win. Present-day humans are really not that impressive, compared to the kind of transhumanity we could develop into. I don't think trying to reproduce entities close to our current mentality is worth doing, in the long run.

By "things like humans" I meant "things that have some of the same values or preferences."

While that was phrased in a provocative manner, there /is/ an important point here: If one has irreconcilable value differences with other humans, the obvious reaction is to fight about them; in this case, by competing to see who can build an SI implementing theirs first.

I very much hope it won't come to that, in particular because that kind of technology race would significantly decrease the chance that the winning design is any kind of FAI.

In principle, some kinds of agents could still coordinate to avoid the costs of that kind of outcome. In practice, our species does not seem to be capable of coordination at that level, and it seems unlikely that this will change pre-SI.

True, but it would nevertheless make for a decent compromise. Do you have a better suggestion?

It isn't much of a compromise. It presumes enough coherence for everyone to agree on a supreme power that all shall obey, to divide the cake and enforce the borders. To see how the situation is actually handled, look around you, at the whole world now and in the past. Whenever there is no common resolve to leave each other alone, then as has been observed of old, the strong do what they will and the weak bear what they must. Europe managed it in the Treaty of Westphalia, but it took thirty years of slaughtering each other for them to decide that no-one was ever going to win, and draw up a massive agreement to disagree. Best of luck getting any such agreement today between (for example) jihadists and everyone else. Or SJWs and neoreactionaries. Adding superintelligence would be like squabbling children getting hold of nuclear weapons.

allocating some defense army patrol keeping the borders from future war?

Rather than use traditional army methods, it's probably more efficient to have the SI play the role of Sysop in this scenario, and just deny human actors access to base-layer reality; though if one wanted to allow communication between the different domains, the sysop may still need to run some active defense against high-level information attacks.

That seems wrong.

As a counterexample, consider a hypothetical morality development model where as history advances, human morality keeps accumulating invariants, in a largely unpredictable (chaotic) fashion. In that case modern morality would have more invariants than that of earlier generations. You could implement a CEV from any time period, but earlier time periods would lead to some consequences that by present standards are very bad, and would predictably remain very bad in the future; nevertheless, a present-humans CEV would still work just fine.

I don't know what you mean by invariants, or why you think they're good, but: If the natural development from this earlier time period, unconstrained by CEV, did better than CEV from that time period would have, that means CEV is worse than doing nothing at all.

Perhaps. But it is a desperate move, both in terms of predictability and in terms of the likely mind crime that would result in its implementation, since the conceptually easiest and most accurate ways to model other civilizations would involve fully simulating the minds of their members.

If we had to do it, I would be much more interested in aiming it at slightly modified versions of humanity as opposed to utterly alien civilizations. If everyone in our civilization had taken AI safety more seriously, and we could have coordinated to wait a few hundred yea... (read more)

I agree, the actual local existence of other AIs shouldn't make a difference, and the approach could work equally either way. As Bostrom says on page 198, no communication is required.

Nevertheless, for the process to yield a useful result, some possible civilization would have to build a non-HM AI. That civilization might be (locally speaking) hypothetical or simulated, but either way the HM-implementing AI needs to think of it to delegate values. I believe that's what footnote 25 gets at: From a superrational point of view, if every possible civilization (or every one imaginable to the AI we build) at this point in time chooses to use an HM approach to value coding, it can't work.

If all civilizations HailMary to value-code, they would all find out the others did the same, and because the game doesn't end there, in round two they would decide to use a different approach. Possibly, just as undifferentiated blastula cells use an asymmetric environmental element (gravity) to decide to start differentiating, AGIs could use local information to decide whether they should HailMary again in the second hypothetical round or be the ones deciding for themselves (say, information about where you are located in your Hubble volume, or how much available energy there still is in your light cone, or something).

Powerful AIs are probably much more aware of their long-term goals and able to formalize them than a heterogenous civilization is. Deriving a comprehensive morality for post-humanity is really hard, and indeed CEV is designed to avoid the need of having humans do that. Doing it for an arbitrary alien civilization would likely not be any simpler.

Whereas with powerful AIs, you can just ask them which values they would like implemented and probably get a good answer, as proposed by Bostrom.

The Hail Mary and Christiano's proposals, simply for not having read about them before.

Davis massively underestimates the magnitude and importance of the moral questions we haven't considered, which renders his approach unworkable.

I feel safer in the hands of a superintelligence who is guided by 2014 morality, or for that matter by 1700 morality, than in the hands of one that decides to consider the question for itself.

I don't. Building a transhuman civilization is going to raise all sorts of issues that we haven't worked out, and do so quickly. A large part of the possible benefits are going to be contingent on the controlling system be... (read more)

One obvious failure mode would be in specifying which dead people count - if you say "the people described in these books," the AI could just grab the books and rewrite them. Hmm, come to think of it: is any attempt to pin down human preferences by physical reference rather than logical reference vulnerable to tampering of this kind, and therefore unworkable?

Not as such, no. It's a possible failure mode, similar to wireheading; but both of those are avoidable. You need to write the goal system in such a way that makes the AI care about the ori... (read more)

To the extent that CUs are made up of human-like entities (as opposed to e.g. more flexible intelligences that can scale to effectively use all their resources), one of the choices they need to make is how large an internal population to keep, where higher populations imply fewer resources per person (since the amount of resources per CU is constant).

Therefore, unless the high-internal-population CUs are rare, most of the human-level population will be in them, and won't have resources of the same level as the smaller numbers of people in low-population CUs.
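A toy calculation illustrates the population argument above. All numbers here are hypothetical (100 CUs, a fixed resource budget per CU, 10% of CUs choosing a large internal population); they are chosen only to show the shape of the effect, not taken from the discussion.

```python
# Hypothetical setup: every CU holds the same fixed amount of resources.
RESOURCES_PER_CU = 1_000_000

SMALL, LARGE = 10, 10_000
# 90 low-population CUs and 10 high-population CUs.
populations = [SMALL] * 90 + [LARGE] * 10

total_people = sum(populations)
people_in_large = sum(n for n in populations if n == LARGE)

# Share of all individuals who live in a high-population CU.
share_in_large = people_in_large / total_people

# Resources available per person in each kind of CU.
per_person_small = RESOURCES_PER_CU / SMALL
per_person_large = RESOURCES_PER_CU / LARGE
```

With these made-up figures, high-population CUs are only a tenth of all CUs, yet they contain over 99% of the people, each with a thousandth of the resources available to a member of a small CU.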

This scenario is rather different than the one suggested by TedHowardNZ, and has a better chance of working. However:

Is there some reason to expect that this model of personhood will not prevail?

One of the issues is that less efficient CUs have to defend their resources against more efficient CUs (who spend more of their resources on work/competition). Depending on the precise structure of your society, those attacks may e.g. be military, algorithmic (information security), memetic or political. You'd need a setup that allows the less efficient CUs to ... (read more)

I am assuming (for now), a monopoly of power that enforces law and order and prevents crimes between C.U.s. I don't follow this. Can you elaborate?

Given a non-trivial population to start with, it will be possible to find people that will consent to copying given absolutely minimal (quite possibly none at all) assurances for what happens to their copy. The obvious cases would be egoists that have personal value systems that make them not identify with such copies; you could probably already find many of those today.

In the resulting low-wage environment, it will likewise be possible to find people who will consent to extensive modification/experimentation of their minds given minimal assurances for wha... (read more)

With near subsistence wages there's not much to donate, so no need to bother.

It is actually relatively easy to automate all the jobs that no-one wants to do, so that people only do what they want to do. In such a world, there is no need of money or markets.

How do you solve the issue that some people will have a preference for highly fast reproduction, and will figure out a way to make this a stable desire in their descendants?

AFAICT, such a system could only be stabilized in the long term by extremely strongly enforced rules against reproduction if it meant that one of the resulting entities would fall below an abundance wealth level, and that kind of rule enforcement most likely requires a singleton.

Is it feasible to make each "family" or "lineage" responsible for itself? You can copy yourself as much as you want, but you are responsible for sustaining each copy? Could we carry this further?: legally, no distinction is made between individuals and collections of copied individuals. It doesn't matter if you're one guy or a "family" of 30,000 people all copied (and perhaps subsequently modified) from the same individual: you only get one vote, and you're culpable if you commit a crime. How these collectives govern themselves is their own business, and even if it's dictatorial, you might argue that it's "fair" on the basis that copies made choices (before the split up) to dominate copies. If you're a slave in a dictatorial regime, it can only be because you're the sort of person who defects on prisoner dilemmas and seizes control when you can. Maybe when some members become sufficiently different from the overall composition, they break off and become their own collective? Maybe this happens only at set times to prevent rampant copying to swamp elections?

Their physical appearance and surroundings would be what we'd see as very luxurious.

Only to the extent that this does not distract them from work. To the extent that it does, ems that care about such things would be outcompeted (out of existence, given a sufficiently competitive economy) by ones that are completely indifferent to them, and focus all their mental capacity on their job.

Yes, the surroundings would need to be not overly distracting. But that is quite consistent with luxurious.

Adaptation executers, not fitness maximizers. Humans probably have specific hard-coded adaptations for the appreciation of some forms of art and play. It's entirely plausible that these are no longer adaptive in our world, and are now selected against, but that this has not been the case for long enough for them to be eliminated by evolution.

This would not make these adaptations particularly unusual in our world; modern humans do many other things that are clearly unadaptive from a genetic fitness perspective, like using contraceptives.

These include powerful mechanisms to prevent an altruistic absurdity such as donating one's labor to an employer.

Note that the employer in question might well be your own upload clan, which makes this near-analogous to kin selection. Even if employee templates are traded between employers, this trait would be exceptionally valuable in an employee, and so would be strongly selected for. General altruism might be rare, but this specific variant would probably enjoy a high fitness advantage.

I like it. It does a good job of providing a counter-argument to the common position among economists that the past trend of technological progress leading to steadily higher productivity and demand for humans will continue indefinitely. We don't have a lot of similar trends in our history to look at, but the horse example certainly suggests that these kinds of relationships can and do break down.

Note that multipolar scenarios can arise well before we have the capability to implement an SI.

The standard Hansonian scenario starts with human-level "ems" (emulations). If from-scratch AI development turns out to be difficult, we may develop partial-uploading technology first, and a highly multipolar em scenario would be likely at that point. Of course, AI research would still be on the table in such a scenario, so it wouldn't necessarily be multipolar for very long.

Yes. The evolutionary arguments seem clear enough. That isn't very interesting, though; how soon is it going to happen?

The only reason it might not be interesting is because it's clear; the limit case is certainly more important than the timeline.

That said, I mostly agree. The only reasonably likely third (not-singleton, not-human-wages-through-the-floor) outcome I see would be a destruction of our economy by a non-singleton existential catastrophe; for instance, the human species could kill itself off through an engineered plague, which would also avoid this scenario.

Not necessarily; there may not be enough economic stability to avoid constant stealing, which would redistribute resources in dynamic ways. The limit case might never be reached if forces are sufficiently dynamic, that is, if the "temperature" is high enough.

Intelligent minds always come with built-in drives; there's nothing that in general makes goals chosen by another intelligence worse than those arrived at through any other process (e.g. natural selection in the case of humans).

One of the closest corresponding human institutions - slavery - has a very bad reputation, and for good reason: Humans are typically not set up to do this sort of thing, so it tends to make them miserable. Even if you could get around that, there are massive moral issues with subjugating an existing intelligent entity that would prefer n... (read more)

For a counterargument to your first claim, see the Wisdom of Nature paper by Bostrom and Sandberg (2009 I think).

So we are considering a small team with some computers claiming superior understanding of what the best set of property rights is for the world?

No. That would be worked out by the FAI itself, as part of calculating all of the implications of its value systems, most likely using something like CEV to look at humanity in general and extrapolating their preferences. The programmers wouldn't need to, and indeed probably couldn't, understand all of the tradeoffs involved.

If they really are morally superior, they will first find ways to grow the pie, then c

... (read more)

How do you know? It's a strong claim, and I don't see why the math would necessarily work out that way. Once you aggregate preferences fully, there might still be one best solution, and then it would make sense to take it. Obviously you do need a tie-breaking method for when there's more than one, but that's just an implementation detail of an optimizer; it doesn't turn you into a satisficer instead.

The more general problem is that we need a solution to multi-polar traps (of which superintelligent AI creation is one instance). The only viable solution I've seen proposed is creating a sufficiently powerful Singleton.

The only likely viable ideas for Singletons I've seen proposed are superintelligent AIs, and a human group with extensive use of thought-control technologies on itself. The latter probably can't work unless you apply it to all of society, since it doesn't have the same inherent advantages AI does, and as such would remain vulnerable to bei... (read more)


Why? As you say, humans don't. But human minds are weird, overcomplicated, messy things shaped by natural selection. If you write a mind from scratch, while understanding what you're doing, there's no particular reason you can't just give it a single utility function and have that work well. It's one of the things that makes AIs different from naturally evolved minds.

This perfect utility function is an imaginary, impossible construction. It would be mistaken from the moment it is created. This intelligence is invariably going to get caught up in the process of allocating certain scarce resources among billions of people. Some of their wants are orthogonal. There is no doing that perfectly, only well enough. People satisfice, and so would an intelligent machine.

What idiot is going to give an AGI a goal which completely disrespects human property rights from the moment it is built?

It would be someone with higher values than that, and this does not require any idiocy. There are many things wrong with the property allocation in this world, and they'll likely get exaggerated in the presence of higher technology. You'd need a very specific kind of humility to refuse to step over that boundary in particular.

If it has goals which were not possible to achieve once turned off, then it would respect property rights fo

... (read more)
So we are considering a small team with some computers claiming superior understanding of what the best set of property rights is for the world? Even if they are generally correct in their understanding, by disrespecting norms and laws regarding property, they are putting themselves in the middle of a billion previously negotiated human-to-human disputes and ambitions, small and large, in an instant. Yes, that is foolish of them. Human systems like those which set property rights either change over the course of years, or typically the change is associated with violence. I do not see a morally superior developer + AGI team working so quickly on property rights in particular, and thereby setting off a violent response. A foolish development team might do that, but a wise team would roll the technology and the wrenching changes out gradually. If they really are morally superior, they will first find ways to grow the pie, then come back to changing how it gets divided up.

Any level of perverse instantiation in a sufficiently powerful AI is likely to lead to total UFAI; i.e. a full existential catastrophe. Either you get the AI design right so that it doesn't wirehead itself - or others, against their will - or you don't. I don't think there's much middle ground.

OTOH, the relevance of Mind Crime really depends on the volume. The FriendlyAICriticalFailureTable has this instance:

22: The AI, unknown to the programmers, had qualia during its entire childhood, and what the programmers thought of as simple negative feedback corr

... (read more)

"I need to make 10 paperclips, and then shut down. My capabilities for determining if I've correctly manufactured 10 paperclips are limited; but the goal imposes no penalties for taking more time to manufacture the paperclips, or using more resources in preparation. If I try to take over this planet, there is a significant chance humanity will stop me. OTOH, I'm in the presence of individual humans right now, and one of them may stop my current feeble self anyway for their own reasons, if I just tried to manufacture paperclips right away; the total pr... (read more)

If I understand you correctly, your proposal is to attempt to build obedient designs purely based on behavioral testing, without a clean understanding of safe FAI architecture (if you had that, why limit yourself to the obedient case?). Assuming I got that right:

The team continues rounds of testing until they identify some mind designs which have an extremely low likelihood of treacherous turn. These they test in increasingly advanced simulations, moving up toward virtual reality.

That kind of judgement sounds inherently risky. How do you safely distin... (read more)

Yes, the team I imagine is doing something dangerous. No, I am not imagining them using only behavioral testing in their design approach. I am trying to imagine them doing the best job they possibly can, both in design and testing, in order to see how much they can reduce the risk. They are in the early stages. They are working on partial minds, purposefully handicapped minds, AGI components, just the vision system, etc. Suppose that they know what they are doing is dangerous, and they are using their substantial brilliance to try to design, not just a boxing approach, but a testing method and roll-out mechanism that mitigates the danger. They have the following means of testing the AI in handicapped mode, at minimum:
- Run it on slow hardware or clock down the processors
- Allow it to use no more than (for instance) 1 gig of memory
- Allow it to use no more than 1 trillion instructions, then the computer turns off
- Expose it to limited data about the world, or no data about the real world
- Provide it little or no information about its operators or the society they live in
- Provide it little or no information about its own hardware or software architecture
- If it shows evidence of having mastered a valuable sub-task (say, as examples, it can solve a real-world problem that, as far as we know, can only be solved by an attorney, using differential equations or advanced persuasive methods) turn it off.
- Run it only in simulations and virtual realities
- Tripwires connected to honeypots, and otherwise
- Build it off-the-grid
- Give it no objective function
- Give it no access to methods to manipulate physical objects
- All of the staff has gone through personality testing and a security clearance.
- Very few, if any, of the staff know how to create all of the parts of the software or hardware.
- No real-time communication with a person.
- All communication with people is logged.
...
The team seems to have a lot of tools to continue their work with limited risk. If they dep
What about hard-wired fears, taboos and bad-conscience triggers? Recapitulating Omohundro's "AIs can monitor AIs": suppose we implement conscience as an agent that listens to all thoughts and takes action when needed. For safety reasons we should educate this conscience agent with the utmost care. Conscience agent development is an AI-complete problem. After development, the conscience functionality must be locked against any kind of modification or disabling.

Relevant post: Value is Fragile. Truly Friendly goal systems would probably be quite complicated. Unless you make your tests even more complicated and involved (and do it in just the right way - this sounds hard!), the FAI is likely to be outperformed by something with a simpler utility function that nevertheless performs adequately on your test cases.

Yes, I agree that getting the right tests is probably hard. What you need is to achieve the point where the FAI's utility function + the utility function that fits the test cases compresses better than the unfriendly AI's utility function + the utility function that fits the test cases.
To prevent human children from taking a treacherous turn we spend billions: we isolate children from dangers, complexity, perversity, drugs, porn, aggression and presentations of these. Creating a utility function that covers many years of caring social education is AI-complete. A utility function is not enough; we also have to create the opposite: the taboo and fear function.

For example, if the AI was contained in a simulation, inside of which the AI was contained in a weak AI box, then it might be much more difficult to detect and understand the nature of the simulation than to escape the simulated AI box, which would signal treacherous turn.

That approach sounds problematic. Some of the obvious escape methods would target the minds of the researchers (either through real-time interaction or by embedding messages in its code or output). You could cut off the latter by having strong social rules to not look at anything beyo... (read more)

Approach #1: Goal-evaluation is expensive

You're talking about runtime optimizations. Those are fine. You're totally allowed to run some meta-analysis, figure out you're spending more time on goal-tree updating than the updates gain you in utility, and scale that process down in frequency, or even make it dependent on how much cputime you need for time-critical ops in a given moment. Agents with bounded computational resources will never have enough cputime to compute provably optimal actions in any case (the problem is uncomputable); so how much you spe... (read more)

My reading is that what Bostrom is saying is that boundless optimization is an easy bug to introduce, not that any AI has it automatically.

I wouldn't call it a bug, generally. Depending on what you want your AI to do, it may very well be a feature; it's just that there are consequences, and you need to take those into account when deciding just what and how much you need the AI's final goals to do to get a good outcome.

I think I see what you're saying, but I am going to go out on a limb here and stick by "bug." Unflagging, unhedged optimization of a single goal seems like an error, no matter what. Please continue to challenge me on this, and I'll try to develop this idea. Approach #1: I am thinking that in practical situations single-mindedness actually does not even achieve the ends of a single-minded person. It leads them in wrong directions. Suppose the goals and values of a person or a machine are entirely single-minded (for instance, "I only eat, sleep and behave ethically so I can play Warcraft or do medical research for as many years as possible, until I die") and the rest are all "instrumental." I am inclined to believe that if they allocated their cognitive resources in that way, such a person or machine would run into all kinds of problems very rapidly, and fail to accomplish their basic goal. If you are constantly asking "but how does every small action I take fit into my Warcraft-playing?" then you're spending too much effort on constant re-optimization, and not enough on action. Trying to optimize all of the time costs a lot. That's why we use rules of thumb for behavior instead. Even if all you want is to be an optimal Warcraft player, it's better to just designate some time and resources for self-care or for learning how to live effectively with the people who can help. The optimal player would really focus on self-care or social skills during that time, and stop imagining Warcraft games for a while. While the optimal Warcraft player is learning social skills, learning social skills effectively becomes her primary objective. For all practical purposes, she has swapped utility functions for a while. Now let's suppose we're in the middle of a game of Warcraft. To be an optimal Warcraft player for more than one game, we also have to have a complex series of interrupts and rules (smell smoke, may screw up important relationship, may lose job and therefore not be a

As to "no reason to get complicated", how would you know?

It's a direct consequence of the orthogonality thesis. Bostrom (reasonably enough) supposes that there might be a limit in the opposite direction - to hold a goal you do need to be able to model it to some degree, so agent intelligence may set an upper bound on the complexity of goals the agent can hold - but there's no corresponding reason for a limit in this direction: intelligent agents can understand simple goals just fine. I don't have a problem reasoning about what a cow is trying to do, and I could certainly optimize towards the same had my mind been constructed to only want those things.

I don't understand your reply. How would you know that there's no reason for terminal goals of a superintelligence "to get complicated" if humans, being "simple agents" in this context, are not sufficiently intelligent to consider highly complex goals?

I have doubts that goals of a superintelligence are predictable by us.

Do you mean intrinsic (top-level, static) goals, or instrumental ones (subgoals)? Bostrom in this chapter is concerned with the former, and there's no particular reason those have to get complicated. You could certainly have a human-level intelligence that only inherently cared about eating food and having sex, though humans are not that kind of being.

Instrumental goals are indeed likely to get more complicated as agents become more intelligent and can devise more involved schemes to ... (read more)

I mean terminal, top-level (though not necessarily static) goals. As to "no reason to get complicated", how would you know? Note that I'm talking about a superintelligence, which is far beyond human level.

You're suggesting a counterfactual trade with them?

Perhaps that could be made to work; I don't understand those well. It doesn't matter to my main point: even if you do make something like that work, it only changes what you'd do once you run into aliens with which the trade works (you'd be more likely to help them out and grant them part of your infrastructure or the resources it produces). Leaving all those stars on to burn through resources without doing anything useful is just wasteful; you'd turn them off, regardless of how exactly you deal with alien... (read more)

I am suggesting that a metastasis-like method of growth could have been good for the first multicellular organisms, but it is unstable, not very successful in evolution, and would probably be rejected by every superintelligence as malignant.

I fully agree with you. We are for sure not alone in our galaxy.

That is close to the exact opposite of what I wrote; please re-read.

AGI might help us to make our world a self-stabilizing, sustainable system.

There are at least three major issues with this approach, any one of which would make it a bad idea to attempt.

  1. Self-sustainability is very likely impossible under our physics. This could be incorrect - there's always a chance our models are missing something crucial - but right now, the laws of thermodynamics strongly point at a world where you ne

... (read more)
Your argument that we could be the first intelligent species in our past light-cone is quite weak because of the light-cone's extreme extent. You are putting your own argument aside by saying: a time frame for our discussion covers maybe dozens of millennia, but not millions of years. The Milky Way's diameter is about 100,000 light-years. The Milky Way together with its satellite and dwarf galaxies has a radius of about 900,000 light-years (300 kpc). Our nearest neighbor galaxy, Andromeda, is about 2.5 million light-years away. If we run into aliens, this encounter will be within our own galaxy. If there is no intelligent life within the Milky Way, we have to wait for more than 2 million years to receive a visitor from Andromeda. This week's publication of a first image of planetary genesis by the ALMA radio telescope makes it likely that nearly every star in our galaxy has a set of planets. If every third star has a planet in the habitable zone, we will have on the order of 100 billion planets in our galaxy where life could evolve. The probability of running into aliens in our galaxy is therefore not negligible, and I appreciate that you discuss the implications of alien encounters. If we, together with our AGIs, decide against CE with von Neumann probes for the next ten to a hundred millennia, this does not exclude preparing our infrastructure for CE. We should not be "leaving the resources around". If von Neumann probes were found too early by an alien civilization, they could start a war against us with far superior technology. Sending out von Neumann probes should be postponed until our AGIs are absolutely sure that they can defend our solar system. If we have transformed our asteroid belt into fusion-powered spaceships we could think about CE, but not earlier. Expansion into other star systems is a political decision and not a solution to a differential equation, as Bostrom puts it.
Think prisoner's dilemma! What would aliens do? Is a selfish (self-centered) reaction really the best possibility? What would a superintelligence constructed by aliens do? (There is no disputing that human history is brutal and selfish.)
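
The distance and planet-count figures in the comment above can be sanity-checked with a few lines of arithmetic. The following is only a rough sketch: the function names are made up for illustration, the parsec-to-light-year factor is an approximate constant, and the star count is not stated in the comment but merely implied by its "every third star" and "100 billion planets" figures.

```python
# Rough sanity check of the figures quoted in the comment above.
# All names here are hypothetical helpers, not from any library.

LY_PER_PARSEC = 3.2616  # approximate light-years per parsec


def kpc_to_ly(kpc):
    """Convert kiloparsecs to light-years."""
    return kpc * 1_000 * LY_PER_PARSEC


def implied_star_count(habitable_planets, fraction_of_stars):
    """Stars needed so that the given fraction of them yields that many planets."""
    return habitable_planets / fraction_of_stars


def min_travel_time_years(distance_ly, speed_fraction_of_c=1.0):
    """Lower bound on travel time for a distance in light-years."""
    return distance_ly / speed_fraction_of_c


# 300 kpc radius for the Milky Way plus satellites, as quoted.
halo_radius_ly = kpc_to_ly(300)

# "Every third star" and "order of 100 billion planets" imply this many stars.
stars = implied_star_count(100e9, 1 / 3)

# Andromeda at 2.5 million light-years; nothing travels faster than light.
andromeda_years = min_travel_time_years(2.5e6)

print(f"300 kpc in light-years: {halo_radius_ly:,.0f}")
print(f"Implied number of stars: {stars:.2e}")
print(f"Minimum travel time from Andromeda (years): {andromeda_years:,.0f}")
```

Under these assumptions, 300 kpc comes out slightly under one million light-years, the same order as the roughly 900,000 quoted, and the implied star count of about 300 billion sits within the commonly cited range of 100 to 400 billion stars for the Milky Way.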

FWIW, there already is one organization working specifically on Friendliness: MIRI. Friendliness research in general is indeed underfunded relative to its importance, and finishing this work before someone builds an Unfriendly AI is indeed a nontrivial problem.

So would be making international agreements work. Artaxerxes phrased it as "co-ordination of this kind would likely be very difficult"; I'll try to expand on that.

The lure of superintelligent AI is that of an extremely powerful tool to shape the world. We have various entities in this world... (read more)

Thanks Sebastian. I agree with your points and it scares me even more to think about the implications of what is already happening. Surely the US, China, Russia, etc., already realize the game-changing potential of superintelligent AI and are working hard to make it reality. It's probably already a new (covert) arms race. But this to me is very strong support for seeking int'l treaty solutions now and working very hard in the coming years to strengthen that regime. Because once the unfriendly AI gets out of the bag, as with Pandora's Box, there's no pushing it back in. I think this issue really needs to be elevated very quickly.

Novel physics research, maybe. Just how useful that would be depends on just what our physics models are missing, and obviously we don't have very good bounds on that. The obvious application is as a boost to technology development, though in extreme cases it might be usable to manipulate physical reality without hardware designed for the purpose, or escape confinement.

I think Bostrom wrote it that way to signal that while his own position is that digital mind implementations can carry the same moral relevance as e.g. minds running on human brains, he acknowledges that there are differing opinions about the subject, and he doesn't want to entirely dismiss people who disagree.

He's right about the object-level issue, of course: Solid state societies do make sense. Mechanically embodying all individual minds is too inefficient to be a good idea in the long run, and there's no overriding reason to stick to that model.

I see no particular reason to assume we can't be the first intelligent species in our past light-cone. Someone has to be (given that we know the number is >0). We've found no significant evidence for intelligent aliens. None of them being there is a simple explanation, it fits the evidence, and if true then indeed the endowment is likely ours for the taking.

We might still run into aliens later, and either lose a direct conflict or enter into a stalemate situation, which does decrease the expected yield from the CE. How much it does so is hard to say; we have little data on which to estimate probabilities on alien encounter scenarios.

I fully agree with you. We are for sure not alone in our galaxy. But I disagree with Bostrom's instability thesis of either extinction or cosmic endowment. This bipolar final outcome is reasonable if the world is modelled by differential equations, which I doubt. AGI might help us to make our world a self-stabilizing, sustainable system. An AGI that follows goals of sustainability is by far safer than an AGI striving for cosmic endowment.