According to Eliezer, making AI safe requires solving two problems:

1) Formalize a utility function whose fulfillment would constitute "good" to us. CEV is intended as a step toward that.

2) Invent a way to code an AI so that it's mathematically guaranteed not to change its goals after many cycles of self-improvement, negotiations etc. TDT is intended as a step toward that.

It is obvious to me that (2) must be solved, but I'm not sure about (1). The problem in (1) is that we're asked to formalize a whole lot of things that don't look like they should be necessary. If the AI is tasked with building a faster and more efficient airplane, does it really need to understand that humans don't like to be bored?

To put the question sharply, which of the following looks easier to formalize:

a) Please output a proof of the Riemann hypothesis, and please don't get out of your box along the way.

b) Please do whatever the CEV of humanity wants.

Note that I'm not asking if (a) is easy in absolute terms, only if it's easier than (b). If you disagree that (a) looks easier than (b), why?

[-][anonymous]13y130

It strikes me that this is the wrong way to look at the issue.

The problem scenario is if someone, anywhere, develops a powerful AGI that isn't safe for humanity. How do you stop the invention and proliferation of an unsafe technology? Well, you can either try to prevent anybody from building an AI without authorization, or you can try to make your own powerful friendly AGI before anybody else gets unfriendly AGI. The latter has the advantage that you only have to be really good at technology; you don't have to enforce an unenforceable worldwide law.

Building an AI that doesn't want to get out of its box doesn't solve the problem that somewhere, somebody may build an AI that does want to get out of its box.

-8timtyler13y

I think that (a) is just a special case of a task for a narrow AI.

Like, GAI is dangerous because it can do anything, and would probably ruin this section of the universe for us if its goals were misaligned with ours.

I'm not sure that GAI is needed for highly domain-specific tasks like (a).

4cousin_it13y
Yeah, this looks right. I guess you could rephrase my post as saying that narrow AI could solve most problems we'd want an AI to solve, but with less danger than the designs discussed on LW (e.g. UDT over Tegmark multiverse).
5Vladimir_Nesov13y
That's what evolution was saying. Lately I have come to expect narrow AI developments to be directly on track to an eventual intelligence explosion.
5cousin_it13y
What narrow AI developments do you have in mind?
-1Dr_Manhattan13y
Who's 'evolution'?
1Dr_Manhattan13y
Apparently whoever downvoted understood what Vladimir was saying, can you please explain? I can't parse "what evolution was saying".
8FAWS13y
Vladimir's writing style has high information density, but he leaves the work of unpacking to the reader. In this context "that's what evolution was saying" seems to be a shorthand for something like: Evolution optimized for goals that did not necessarily imply general intelligence, nor did evolution ever anticipate creating a general intelligence. Nevertheless a general intelligence appeared as the result of evolution's optimizations. By analogy we should not be too sure about narrow AI developments not leading to AGI.
3Dr_Manhattan13y
Ah. This seems about right, though I think Vladimir's statement was either denser or more ambiguous than usual.

Don't preemptively refer to anyone who disagrees with you as brainwashed.

0[anonymous]13y
It seems I am missing something obvious -- which part of the article are you referring to, and in what way? (genuine question, not a rant) (edit) OK, I got the original wording from the comments below. Stupid me.
0[anonymous]13y
I'd be more interested in hearing actual arguments...

It might be worth noting that I often phrase questions as "how would we design an FAI to think about that" not because I want to build an FAI, but because I want the answer to some philosophical question for myself, and phrasing it in terms of FAI seems to be (1) an extremely productive way of framing the problem, and (2) generates interest among those who have good philosophy skills and are already interested in FAI.

ETA: Even if we don't build an FAI, eventually humanity might have god-like powers, and we'd need to solve those problems to figure out what we want to do.

If you have figured out artificial general intelligence that is capable of explosive recursive self-improvement, and you know how to achieve goal stability and how to constrain it, then you ought to concentrate on taking over the universe, because of the multiple-discovery hypothesis and because you can't expect other humans to be friendly.

5CuSithBell13y
Why is this downvoted? Isn't this one of the central theses of FAI?
2XiXiDu13y
Possible reasons:

* I implicitly differentiated between AGI in general and the ability to recursively self-improve (which are usually lumped together on LW). I did this on purpose.
* I included the ability to constrain such an AGI as a prerequisite to running it. I did this on purpose because friendliness is not enough if the AGI is free to hunt for vast utilities regardless of tiny probabilities. Even an AGI equipped with perfect human-friendliness might try to hack the Matrix to support 3^^^^3 people rather than just a galactic civilisation. This problem isn't solved and therefore, as suggested by Yudkowsky, it needs to be constrained using a "hack".
* I used the phrasing "taking over the universe", which is badly received yet factually correct if you have a fooming AI and want to use it to spawn a positive Singularity.
* I said that you can't expect other humans to be friendly, which is not the biggest problem; stupidity is.
* I said one "ought" to concentrate on taking over the universe. I said this on purpose to highlight that I actually believe it to be the only sensible thing to do once fooming AI is possible, because if you waste too much time with spatiotemporally bounded versions, then someone who is ignorant of friendliness will launch one that isn't constrained that way.
* The comment might have been deemed unhelpful because it added nothing new to the debate.

That's my analysis of why the comment might have initially been downvoted. Sadly most people who downvote don't explain themselves, but I decided to stop complaining about that recently.
1CuSithBell13y
Awesome, thanks for the response. Do you know if there's been any progress on the "expected utility maximization makes you do arbitrarily stupid things that won't work" problem? Though, stupidity is a form of un-Friendliness, isn't it?
0XiXiDu13y
I only found out about the formalized version of that dilemma around a week ago. As far as I can tell it has not been shown that giving in to a Pascal's mugging scenario would be irrational. It is merely our intuition that makes us believe that something is wrong with it. I am currently far too uneducated to talk about this in detail. What I am worried about is that basically all probability/utility calculations could be put into the same category (e.g. working to mitigate low-probability existential risks); where do you draw the line? You can be your own mugger if you weigh in enough expected utility to justify taking extreme risks.
0jimrandomh13y
There's a formalization I gave earlier that distinguishes Pascal's Mugging from problems that just have big numbers in them. It's not enough to have a really big utility; a Pascal's Mugging is when you have a statement provided by another agent, such that just saying a bigger number (without providing additional evidence) increases what you think your expected utility is for some action, without bound. This question has resurfaced enough times that I'm starting to think I ought to expand that into an article.
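To make jimrandomh's distinction concrete, here is a minimal sketch (my own illustration with made-up numbers; nothing from the comment itself): if the listener's probability estimate is held fixed no matter what payoff the mugger states, then merely quoting a bigger number drives the expected utility upward without bound.

```python
def naive_expected_utility(stated_payoff, p_mugger_honest=1e-10):
    """Expected utility of paying the mugger, if the probability that the
    mugger is telling the truth is held fixed regardless of the number stated."""
    return p_mugger_honest * stated_payoff

# Merely asserting a bigger payoff, with no additional evidence, increases the
# naive expected utility without bound; that unboundedness is the signature of
# a Pascal's Mugging, as opposed to an ordinary problem that contains big numbers.
for stated_payoff in (10**6, 10**30, 10**100):
    print(stated_payoff, naive_expected_utility(stated_payoff))
```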
0JamesAndrix13y
Minor correction: It may need a hack if it remains unsolved.
0[anonymous]13y
My actions in this scenario depend on other factors, like how much time I have. If I had reasonable confidence of e.g. a month's head start over other groups, I'd spend the month trying to work out some way to deter other groups from launching, because I prefer the world where no one launches to the world where I take over. I commented to that effect sometime ago.

I'm not sure about the Riemann hypothesis, since there's a real chance that RH is undecidable in ZFC. But this might be safer if one adds a time limit for when one wants the answer.

But simply in terms of specification I agree that formalizing "don't get out of your box" is probably easier than formalizing what all of humanity wants.

2AlephNeil13y
Why? I know certain people (i.e. Chaitin, who's a bit cranky in this regard) have toyed around with the idea, but is there any reason to believe it?
2JoshuaZ13y
Not any strong one. We do know that for some systems similar to the integers the analog of RH is false, but for most analogs (such as the finite field case) it seems to be true. That's very weak evidence for undecidability. However, I was thinking more in contrast to something like the classification of finite simple groups as of 1975, where there was a general program of what to do that had no obvious massive obstructions.

making AI friendly requires solving two problems

The goal is not to "make an AI friendly" (non-lethal); it's to make a Friendly AI. That is, not to make some powerful agent that doesn't kill you (and does something useful), but to make an agent that can be trusted with autonomously building the future. For example, a merely non-lethal AI won't help with preventing UFAI risks.

So it's possible that some kind of Oracle AI can be built, but so what? And the risk of unknown unknowns remains, so it's probably a bad idea even if it looks provably safe.

6Lightwave13y
Doesn't this also apply to a provably Friendly AI? Perhaps even more so, given that it is a project of higher complexity.
3Vladimir_Nesov13y
With FAI, you have a commensurate reason to take the risk.
4Lightwave13y
Sure, but if the Oracle AI is used as a stepping stone towards FAI, then you also have a reason to take the risk. I guess you could argue that the risk of Oracle + Friendly AI is higher than just going straight for FAI, but you can't be sure how much the FAI risk could be mitigated by the Oracle AI (or any other type of not-so-powerful / constrained / narrow-domain AI). At least it doesn't seem obvious to me.
2Vladimir_Nesov13y
To the extent you should expect it to be useful. It's not clear in what way it can even in principle help with specifying morality. (See also this thread.) Assume you have a working halting oracle. Now what? (Actually you could get inside to have infinite time to think about the problem.)
-1ShardPhoenix13y
I think he means Oracle as in a general powerful question-answerer, not as in a halting oracle. A halting oracle could be used to answer many mathematical questions (like the aforementioned Riemann Hypothesis), though.
2Vladimir_Nesov13y
I know he doesn't mean a halting oracle. A halting oracle is a well-specified superpower that can do more than real Oracles. The thought experiment I described considers an upper bound on usefulness of Oracles.
2timtyler13y
I figure we will build experts and forecasters before both oracles and full machine intelligence. That will be good, since forecasters will help to give us foresight, which we badly need. Generally speaking, replacing the brain's functions one at a time seems more desirable than replacing them all at once. It is likely to result in a more gradual shift, and a smoother transfer, with a reduced chance of the baton getting dropped during the switch-over.
2benelliott13y
If we get a working Oracle AI, couldn't we just ask it how to build an FAI? I just don't think this is of much use, since the Oracle route doesn't really seem much easier than the FAI route.
5Vladimir_Nesov13y
No, it won't know what you mean. Even you don't know what you mean, which is part of the problem.
-1timtyler13y
Experts and general forecasters are easier to build than general intelligent agents, or so I argue in my section on Machine Forecasting Implications. That is before we even get to constraints on how we want them to behave. At a given tech level, trying to use a general oracle on its own to create a general intelligence would probably produce a less intelligent agent than could be produced by other means, using a broader set of tools. An oracle might well be able to help, though.
2Alexandros13y
If:

(1) there is a way to make an AI that is useful and provably not-unfriendly;
(2) this requires a subset of the breakthroughs required for a true FAI;
(3) it can be used to provide extra leverage towards building an FAI (i.e. using it to generate prestige and funds for hiring and training the best brains available. How? Start by solving protein folding or something);

then this safe & useful AI should certainly be a milestone on the way towards FAI.
0Vladimir_Nesov13y
Just barely possible, but any such system is also a recipe for destroying the universe, if mixed in slightly different proportions. Which on the net makes the plan wrong (destroy-the-universe wrong).
6Alexandros13y
I just don't think that this assertion has been adequately backed up.

The primary task that EY and SIAI have in mind for Friendly AI is "take over the world". (By the way, I think this is utterly foolish, exactly the sort of appealing paradox (like "warring for peace") that can nerd-snipe the best of us.)

To some extent technology itself (lithography, for example) is actually Safe technology (or BelievedSafe technology). As part of the development of the technology, we also develop the safety procedures around it. The questions and problems about "how should you correctly draw up a contract with th... (read more)

2cousin_it13y
Could you explain this in more detail?
2Johnicholas13y
As I understand it, EY worked through a chain of reasoning about a decade ago, in his book "Creating Friendly AI". The chain of reasoning is long and I won't attempt to recap it here, but there are two relevant conclusions. First, that self-improving artificial intelligences are dangerous, and that projects to build self-improving artificial intelligence, or general intelligence that might in principle become self-modifying (such as Goertzel's), are increasing existential risk. Second, that the primary defense against self-improving artificial intelligences is a Friendly self-improving artificial intelligence, and so, in order to reduce existential risk, EY must work on developing (a restricted subset of) self-improving artificial intelligence. This seems nigh-paradoxical (and unnecessarily dramatic) to me: you should not do X, and yet EY must do X. As I said before, this "cancel infinities against one another" sort of thinking (another example might be the MAD doctrine) has enormous appeal to a certain (geeky) kind of person. The phenomenon is named "nerd-sniping" in the xkcd comic: http://xkcd.com/356/

Rather than pursuing Friendly AGI vigorously as last/best/only hope for humanity, we should do at least two things:

1. Look hard for errors in the long chain of reasoning that led to these peculiar conclusions, on the grounds that reality rarely calls for that kind of nigh-paradoxical action, and it's far more likely that either all AI development is generally a good thing for existential risks, or all AI development is generally a bad thing for existential risks; EY shouldn't get any special AI-development license.

2. Look hard for more choices: for example, building entities that are very capable at defeating rogue Unfriendly AGI takeoffs, and yet which are not themselves a threat to humanity in general, nor prone to hard takeoffs. It may be difficult to imagine such entities, but all the reduce-existential-risk tasks are very difficult.
5TheOtherDave13y
In my experience, reality frequently includes scenarios where the best way to improve my ability to defend myself involves also improving my ability to harm others, should I decide to do that. So it doesn't seem that implausible to me. Indeed, militaries are pretty much built on this principle, and are fairly common. But, sure... there are certainly alternatives.
3Johnicholas13y
I am familiar with the libertarian argument that if everyone has more destructive power, the society is safer. The analogous position would be that if everyone pursues (Friendly) AGI vigorously, existential risk would be reduced. That might well be reasonable, but as far as I can tell, that's NOT what is advocated. Rather, we are all asked to avoid AGI research (and go into software development and make money and donate? How much safer is general software development for a corporation than careful AGI research?) and instead sponsor SIAI/EY doing (Friendly) AGI research while SIAI/EY is fairly closed-mouth about it. It just seems to me like it would take a terribly delicate balance of probabilities to make this the safest course forward.
2cousin_it13y
I have similar misgivings; they prompted me to write the post. Fighting fire with fire looks like a dangerous idea. The problem statement should look like "how do we stop unfriendly AIs", not "how do we make friendly AIs". Many people here (e.g. Nesov and SarahC) seem convinced that the latter is the most efficient way of achieving the former. I hope we can find a better way if we think some more.
9Wei Dai13y
If the universe is capable of running super-intelligent beings, then eventually either there will be one, or civilization will collapse. Maintaining the current state where there are no minds more intelligent than base humans seems very unlikely to be stable in the long run. Given that, it seems the problem should be framed as "how do we end up with a super-intelligent being (or beings) that will go on to rearrange the universe the way we prefer?" which is not too different from "how do we make friendly AIs" if we interpret things like recursively-improved uploads as AIs.

The Riemann hypothesis seems like a special case, since it's a purely mathematical proposition. A real world problem is more likely to require Eliezer's brand of FAI.

Also, I believe solving FAI requires solving a problem not on your list, namely that of solving GAI. :-)

If you disagree that (a) looks easier than (b), congratulations, you've been successfully brainwashed by Eliezer :-)

This was supposed to be humour, right?

2cousin_it13y
OK, that didn't come across as intended. Edited the post. It seems to me that human engineers don't spend a lot of time thinking about the value of boredom or the problem of consciousness when they design airplanes. Why should an AI need to do that? If the answer involves "optimizing too hard", then doesn't the injunction "don't optimize too hard" look easier to formalize than CEV?
4timtyler13y
"Don't optimise for too long" looks easier to formalise. Or so I argued here.
0Vladimir_Nesov13y
Injecting randomness doesn't look like a property of reasoning that would stand (or, alternatively, support) self-modification. This leaves the option of limiting self-modification (for the same reason), although given enough time and sanity even a system with low optimization pressure could find a reliable path to improvement.
0cousin_it13y
Superintelligence isn't a goal in itself. I'll take super-usefulness over superintelligence any day. I know you want to build superintelligence because otherwise someone else will, but the same reasoning was used to justify nuclear weapons, so I suspect we should be looking for other ways to save the world. (I see you've edited your comment. My reply still applies, I think.)
2timtyler13y
Are you arguing that the USA should not have developed nuclear weapons? Use of nuclear weapons is often credited with shortening the war - and saving many lives - e.g. see here:
0sark13y
Well, that was what in fact happened. But what could have happened was perhaps a nuclear war leading to "significant curtailment of humankind's potential". cousin_it's point was that perhaps we should not even begin the arms race. Consider the Terminator scenario where they send the terminator back in time to fix things, but this sending back of the terminator is precisely what provided the past with the technology that will eventually lead to the cataclysm in the first place. EDIT: included Terminator scenario
-1Vladimir_Nesov13y
Of course. But super-usefulness unfortunately requires superintelligence, and superintelligence is super-dangerous. Limited intelligence gives only limited usefulness, and in the long run even limited intelligence would tend to improve its capability, so it's not reliably safe. And not very useful. Someone will eventually make an intelligence explosion that destroys the world. That would be bad. Any better ideas on how to mitigate the problem?

This is an analogy that you use as an argument? As if we don't already understand the details of the situation a few levels deeper than is covered by the surface similarity here. In making this argument, you appeal to intuition, but individual intuitions (even ones that turn out to be correct in retrospect or on reflection) are unreliable, and we should do better than that, find ways of making explicit reasoning trustworthy.
2[anonymous]13y
Is this not exactly the point that cousin_it is questioning in the OP? I'd think a "limited" intelligence that was capable of solving the Riemann hypothesis might also be capable of cracking some protein-folding problems or whatever.
1Vladimir_Nesov13y
If it's that capable, it's probably also that dangerous. But at this point the only way to figure out more about how it actually is, is to consider specific object-level questions about a proposed design. Absent design, all we can do is vaguely guess.
4cousin_it13y
No. We already have computers that help design better airplanes etc., and they are not dangerous at all. Sewing-Machine's question is right on. Building machines that help us solve intelligence-bound problems (even if these problems are related to the real world, like building better airplanes) seems to be massively easier than building machines that will "understand" the existence of the real world and try to take it over for whatever reason. Evidence: we have had much success with the former task, but practically no progress on the latter. Moreover, the latter task looks very dangerous, kinda like nuclear weaponry. Why do some people become so enamored with the singleton scenario that they can't settle for anything less? What's wrong with humans using "smart enough" machines to solve world hunger and such, working out any ethical issues along the way, instead of delegating the whole task to one big AI? If you think you need the singleton to protect you from some danger, what can be more dangerous than a singleton?
1Vladimir_Nesov13y
It's potentially dangerous, given the uncertainty about what exactly you are talking about. If it's not dangerous, go for it. Settling for something less than a singleton won't solve the problem of human-indifferent intelligence explosion. Another singleton, which is part of the danger in question.
1[anonymous]13y
There are already computer programs that have solved open problems, e.g. That was a much simpler and less interesting question than the Riemann Hypothesis, but I don't know that it's fundamentally different or less dangerous than what cousin_it is proposing.
2Vladimir_Nesov13y
Yes, there are non-dangerous useful things, but we were presumably talking about AI capable of open-ended planning.

2) Invent a way to code an AI so that it's mathematically guaranteed not to change its goals after many cycles of self-improvement, negotiations etc. TDT is intended as a step toward that.

Only superficially. It would be possible to create an AI with said properties with CDT.

To put the question sharply, which of the following looks easier to formalize:

a) Please output a proof of the Riemann hypothesis, and please don't get out of your box along the way.

b) Please do whatever the CEV of humanity wants.

The difficulty levels seem to be of the same order of magnitude.

5cousin_it13y
This looks suspicious. Imagine you didn't know about Risch's algorithm for finding antiderivatives. Would you then consider the problem "find me the antiderivative of this function, and please don't get out of the box" to be on the same order of difficulty as (b)? Does Wolfram Alpha overturn your worldview? Last I looked, it wasn't trying to get out...
0wedrifid13y
Not even remotely. I don't accept the analogy.
-1timtyler13y
Wolfram Alpha isn't really "in a box" in the first place. Like most modern machines, its sensors and actuators extend into the real world. We do restrain machines, but mostly when testing them. Elsewhere, constraints are often considered an unnecessary expense. If a machine is dangerous, we typically keep humans away from it, not the other way around.
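To make the narrow-competence point in this sub-thread concrete, here is a minimal sketch (my own illustration using the sympy library; the thread itself mentions no code): a symbolic-integration routine in the lineage of Risch's algorithm solves exactly the kind of boxed problem cousin_it describes, while only ever rewriting expressions.

```python
import sympy as sp

x = sp.symbols('x')

# A narrow symbolic solver finds an antiderivative of a nontrivial function...
antiderivative = sp.integrate(x * sp.exp(x) * sp.sin(x), x)
print(antiderivative)

# ...and at no point does it model the outside world, form goals, or try to
# "get out of the box"; it only manipulates symbolic expressions.
```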

How much friendliness is enough?

I'm for 'bool friendly = true'.

-8timtyler13y