So I know we've already seen Google buying a bunch of ML and robotics companies, but now they're purchasing Shane Legg's AGI startup, DeepMind. This is after they've acquired Boston Dynamics, several smaller robotics and ML firms, and started their own life-extension firm.


Is it just me, or are they trying to make Accelerando or something closely related actually happen?  Given that they're buying up real experts and not just "AI is inevitable" prediction geeks (who shall remain politely unnamed out of respect for their real, original expertise in machine learning), has someone had a polite word with them about not killing all humans by sheer accident?


Buying off AGI startups and then letting the relevant programmers program smart cars seems to me a quite good move to stall UFAI.

Can you elaborate?
It sets AGI-minded programmers (under circumstances expected to yield UFAI) onto tasks that would not be expected to result in AGI of any sort (driving)
I get that part. Is there some reason I'm missing as to why Google wouldn't utilize the talent at DeepMind to pursue AGI-relevant projects? I mean, Google has great resources (much more than MIRI or anyone else) and a proven record of success at being instrumentally rational in the technical/programming arena (i.e. winning on a grand scale for a length of time). They are adding folks who, from what I read on LW, actually understand AGI's complexity, implications, etc.
Just nervousness about UFAI.
This analysis seems to be based on AGI-mindedness being an inherent property of programmers, and not a response to market forces.
No... not at all! Quite the opposite, in fact. If it were inherent, then moving them away from it would be ineffective.

...has someone had a polite word with them about not killing all humans by sheer accident?

Shane Legg is familiar with AI risks. So is Jaan Tallinn, a top donor of MIRI, who is also associated with DeepMind. I suppose they will talk about their fears with Google.


Actually, there does seem to have been a very quiet press release about this acquisition resulting in a DeepMind ethics board.

So that's a relief.

Is there any more information beyond mentions like this?
Not to mention of course the Google employees that post on LW.

Not to mention of course the Google employees that post on LW.

I didn't know there were any. My guess is that you have to be pretty high in the hierarchy to actually steer Google into a direction that would suit MIRI (under the assumption that people who agree with MIRI are in the minority).

I didn't know there were any.


Plus cousin_it and at least 2-3 others. Also, Ctrl+F for "Google" here. Moshe Looks might be one of Google's AGI people, I think.

I didn't know there were any.

Greetings from Dublin! You're right that the average employee is unlikely to matter, though.


Eliezer specifically mentioned Google in his Intelligence Explosion Microeconomics paper as the only named organization that could potentially start an intelligence explosion.

Larry Page has publicly said that he is specifically interested in “real AI” (Artificial General Intelligence), and some of the researchers in the field are funded by Google. So far as I know, this is still at the level of blue-sky work on basic algorithms and not an attempt to birth The Google in the next five years, but it still seems worth mentioning Google specifically.

In these interviews Larry Page gave years ago he constantly said that he wanted Google to become "the ultimate search engine" that would be able to understand all the information in the world. And to do that, Larry Page said, it would need to be 'true' artificial intelligence (he didn't say 'true', but it becomes clear what he means in the context).

Here's a quote by Larry Page from the year 2007:

We have some people at Google who are really trying to build artificial intelligence and to do it on a large scale and so on, and in fact, to make search better, to do the perfect job of search you could ask any query and it would give

...

It could be, like lukeprog said in October 2012, that Google doesn't even have "an AGI team".

Not that I know of, anyway. Kurzweil's team is probably part of Page's long-term AGI ambitions, but right now they're focusing on NLP (last I heard). And Deep Mind, which also has long-term AGI ambitions, has been working on game AI as an intermediate step. But then again, that kind of work is probably more relevant progress toward AGI than, say, OpenCog.

IIRC the Deep Mind folks were considering setting up an ethics board before Google acquired them, so the Google ethics board may be a carryover from that. FHI spoke to Deep Mind about safety standards a while back, so they're not totally closed to taking Friendliness seriously. I haven't spoken to the ethics board, so I don't know how serious they are.

Update: "DeepMind reportedly insisted on the board’s establishment before reaching a deal."

Update: DeepMind will work under Jeff Dean at Google's search team.

And, predictably:

“Things like the ethics board smack of the kind of self-aggrandizement that we are so worried about,” one machine learning researcher told Re/code. “We’re a hell of a long way from needing to worry about the ethics of AI.”

...despite the fact that AI systems already fly planes, drive trains, and pilot Hellfire-carrying aerial drones.

NYTimes also links to LessWrong.


Mr. Legg noted in a 2011 Q&A with the LessWrong blog that technology and artificial intelligence could have negative consequences for humanity.

It would be quite a reach to insist that we need to worry about the ethics of the control boards which calculate how to move elevons or how much to open a throttle in order to maintain a certain course or speed. Autonomous UAVs able to open fire without a human in the loop are much more worrying. I imagine that some of the issues the ethics board might have to deal with eventually would be related to self-agentizing tools, in Karnofsky-style terminology. For example, if a future search engine receives queries whose answers depend on other simultaneous queries, it may have to solve game-theoretical problems, like optimizing traffic flows. These may some day include life-critical decisions, like whether to direct drivers to a more congested route in order to let emergency vehicles pass unimpeded.
They actually link to LessWrong in the article, namely to my post here.
I personally suspect the ethics board exists for more prosaic reasons. Think "don't bias the results of people's medical advice searches to favor the products of pharmaceutical companies that pay you money" rather than "don't eat the world". EDIT: just saw other posts including quotes from the head people of the place that got bought. I still think that this is the sort of actual issues they will deal with, as opposed to the theoretical justifications.

So, to summarize, Google wants to build a potentially dangerous AI, but they believe they can keep it as an Oracle AI which will answer questions but not act independently. They also apparently believe (not without some grounding) that true AI is so computationally expensive in terms of both speed and training data that we will probably maintain an advantage of sheer physical violence over a potentially threatening unboxed oracle for a long time.

Except that they are also blatant ideological Singularitarians, so they're working to close that gap.

has someone had a polite word with them about not killing all humans by sheer accident?

Why do you think you have a better idea of the risks and solutions involved than they do, anyway? Superior AI expertise? Some superior expert-choosing talent of yours?

My suggestion to Google is to free up their brightest minds and tell them to talk to MIRI for 2 weeks, full-time. After the two weeks are over, let each of them write a report on whether Google should e.g. give them more time to talk to MIRI, accept MIRI's position and possibly hire them, or ignore them. MIRI should be able to comment on a draft of each of the reports.

I think this could finally settle the issue, if not for MIRI itself then at least for outsiders like me.

Well, that's sort of like having the brightest minds at CERN spend two weeks full time talking to some random "autodidact" who's claiming that the LHC is going to create a black hole that will devour the Earth. Society can't work this way.

Does that mean there is a terrible ignored risk? No, when there is a real risk, the brightest people of extreme and diverse intellectual accomplishment are the ones most likely to be concerned about it (and various "autodidacts" are most likely to fail to notice the risk).

Well, that's sort of like having the brightest minds at CERN spend two weeks full time talking to some random "autodidact" who's claiming that the LHC is going to create a black hole that will devour the Earth.

This is an unusual situation though. We have a lot of smart people who believe MIRI (they are not idiots, you have to grant them that). And you and I are not going to change their minds, ever, and they are hardly going to convince us. But if a bunch of independent top-notch people were to accept MIRI's position, then that would certainly make me assign a high probability to the possibility that I simply don't get it and that they are right after all.

Society can't work this way.

In the case of the LHC, independent safety reviews have been conducted. I wish this was the case for the kinds of AI risk scenarios imagined by MIRI.

If you pitch something stupid to a large enough number of smart people, some small fraction will believe. Not for every crackpot claim. edit: and since they got an ethical review board, that's your equivalent of what was conducted... There's a threshold. Some successful trading software, or a popular programming language, or some AI project that does something world-level notable (plays some game really well for example), that puts one above the threshold. Convincing some small fraction of smart people does not. Shane Legg's startup evidently is above the threshold. As for the risks, why would you think that Google's research is a greater risk to mankind than, say, MIRI's? (assuming that the latter is not irrelevant, for the sake of the argument)
If MIRI was right then, as far as I understand it, a not quite friendly AI (broken friendly AI) could lead to a worse outcome than a general AI that was designed without humans in mind. In the former case you would end up with something that keeps humans alive but e.g. gets a detail like boredom wrong, while in the latter case you would be transformed into e.g. paperclips. So from this perspective, if MIRI was right, it could be the greater risk.

Well, the other issue is also that people's opinions tend to be more informative of their own general plans than about the field in general.

Imagine that there's a bunch of nuclear power plant engineering teams - before nuclear power plants - working on different approaches.

One of the teams - not a particularly impressive one either - claimed that any nuclear plant is going to blow up like a hundred kiloton nuclear bomb, unless fitted with a very reliable and fast acting control system. This is actually how nuclear power plants were portrayed in early science fiction ("Blowups Happen", by Heinlein).

So you look at the blueprints, and you see that everyone's reactor is designed for a negative temperature coefficient of reactivity, in the high temperature range, and can't blow up like a nuke. Except for one team whose reactor is not designed to make use of a negative temperature coefficient of reactivity. The mysterious disagreement is explained, albeit in a very boring way.

Except for one team whose reactor is not designed to make use of a negative temperature coefficient of reactivity.

Except that this contrarian team, made of high school drop-outs, former theologians, philosophers, mathematicians and coal power station technicians, never produce an actual design, instead they spend all their time investigating arcane theoretical questions about renormalization in quantum field theory and publish their possibly interesting results outside the scientific peer review system, relying on hype to disseminate them.

Well, they still have some plan, however fuzzy it is. The plan involves a reactor which according to its proponents would just blow up like a 100 kiloton nuke if not for some awesome control system they plan to someday work on. Or in the case of AI, a general architecture that is going to self improve and literally kill everyone unless a correct goal is set for it. (Or even torture everyone if there's a minus sign in the wrong place - the reactor analogy would be a much worse explosion still if the control rods get wired backwards. Which happens). My feeling is that there may be risks for some potential designs, but they are not like "the brightest minds that build the first AI failed to understands some argument that even former theologians can follow" (In fiction this happens because said theologian is very special, in reality it happens because the argument is flawed or irrelevant).

"the brightest minds that build the first AI failed to understands some argument that even former theologians can follow"

This is related to something that I am quite confused about. There are basically 3 possibilities:

(1) You have to be really lucky to stumble across MIRI's argument. Just being really smart is insufficient. So we should not expect whoever ends up creating the first AGI to think about it.

(2) You have to be exceptionally intelligent to come up with MIRI's argument. And you don't have to be anywhere near as intelligent in order to build an AGI that can take over the world.

(3) MIRI's argument is very complex. Only someone who deliberately thinks about risks associated with AGI could come up with all the necessary details of the argument. The first people to build an AGI won't arrive at the correct insights in time.

Maybe there is another possibility on how MIRI could end up being right that I have not thought about, let me know.

It seems to me that what all of these possibilities have in common is that they are improbable. Either you have to be (1) lucky or (2) exceptionally bright or (3) be right about a highly conjunctive hypothesis.

I would have to say: 4) MIRI themselves are incredibly bad at phrasing their own argument. Go hunt through Eliezer's LessWrong postings about AI risks, from which most of MIRI's language regarding the matter is taken. The "genie metaphor", of Some Fool Bastard being able to give an AGI a Bad Idea task in the form of verbal statements or C++-like programming at a conceptual level humans understand, appears repeatedly. The "genie metaphor" is a worse-than-nothing case of Generalizing From Fictional Evidence. I would phrase the argument this way (and did so on Hacker News yesterday): This takes us away from magical genies that can be programmed with convenient meta-wishes like, "Do what I mean" or "be the Coherent Extrapolated Volition of humanity" and into the solid, scientific land of equations, accessible by everyone who ever took a machine-learning class in college. I mean, seriously, my parents understand this phrasing, and they have no education in CS. They do, however, understand very well that a numerical score in some very specific game or task does not represent everything they want out of life, but that it will represent everything the AI wants out of life. (EDIT: I apologize for any feelings I may have hurt with this comment, but I care about not being paper-clipped more than I care about your feelings. I would rather the scientific public, if not the general public, have a decent understanding of and concern for AGI safety engineering, than have everyone at MIRI get to feel like they're extraordinarily rational and special for spotting a problem nobody else spotted.)
Maybe it's just the argument that is bad and wrong. What's the domain of this function? I've a feeling that there's some severe cross-contamination between the meaning of the word "function" as in an abstract mathematical function of something, and the meaning of the word "function" as in purpose of the genie that you have been cleverly primed with, by people who aren't actually bad at phrasing anything but instead good at inducing irrationality. If you were to think of mathematical functions, well, those don't readily take real world as an input, do they?
At least for the genie metaphor, I completely agree. That one is just plain wrong, and arguments for it are outright bad. Ah, here's where things get complicated. In current models, the domain of the function is Symbols. As in, those things on Turing Machines. Literally: AIXI is defined to view the external universe as a Turing Machine whose output tape is being fed to AIXI, which then feeds back an input tape of Action Symbols. So you learned about this in CS401. The whole point of phrasing things this way was to talk about general agents: agents that could conceivably receive and reason over any kind of inputs, thus rendering their utility domain to be defined over, indeed, the world. Thing being, under current models, Utility and Reality are kept ontologically separate: they're different input tapes entirely. An AIXI might wirehead and commit suicide that way, but the model of reality it learns is defined over reality. Any failures of ontology rest with the programmer for building an AI agent that has no concept of ontology, and therefore cannot be taught to value useful, high-level concepts other than the numerical input on its reward tape. My point? You're correct to say that current AGI models don't take the Entire Real World as input to a magic-genie Verbally Phrased Utility Function like "maximize paperclips". That is a fantasy, we agree on that. So where the hell is the danger, or the problem? Well, the problem is that human AGI researchers are not going to leave it that way. We humans are the ones who want AIs we can order to solve particular problems. We are the ones who will immediately turn the first reinforcement or value learning AGIs, which will be expensive and difficult to operate, towards the task of building more sophisticated AGI architectures that will be easier to direct, more efficient, cheaper, and more capable of learning -- and eventually even self-improvement! Which means that, if it should come to that, we humans will be the ones w
It looks like the thinking about the AI is based on that sort of metaphor, to be honest. The loudest AI risk proponents proclaim all AIs to pose a dire threat. Observe all the discussions regarding "Oracle AI" which absolutely doesn't need to work like a maximiser of something real. Seems like one huge conjunction of very many assumptions with regards to how the AI development would work out. E.g. your proposition that the way to make AI more usable is to bind the goals to the real world (a world which is not only very complex, but also poorly understood). Then, "self improvement". No reflection is necessary for a compiler-like tool to improve itself. You're just privileging a bunch of what you think are bad solutions to the problems, as the way the problems will be solved, without actually making the case that said bad solutions are in some way superior, likely to be employed, are efficient computing time wise, and so on. Then, again, it doesn't take human level intelligence on the part of the AI for unintended solutions to become a usability problem. The reason a human uses an AI is that the human doesn't want to think of the possible solutions, inclusive of proving for unintended solutions (along the lines of e.g. the AI hacking the molecular dynamics simulator to give high scores when you want the AI to fold proteins). edit: by the way, I believe there is an acceptable level of risk (which is rather small though), given that there is an existing level of risk of nuclear apocalypse, and we need to move the hell out of our current level of technological development before we nuke ourselves into the stone age and bears and other predator and prey fauna into extinction, opening up the room for us to take an evolutionary niche not requiring our brains, once the conditions get better afterwards. edit2: and also the AIs created later would have more computational power readily available, so delays may just as well increase the risk from the AIs.
Again, you seem to be under the impression I am pushing the MIRI party line. I'm not. I'm not paid money by MIRI, though it would totally be cool if I was since then I'd get to do cool stuff a lot of the time. Your argument has been made before, and was basically correct. The problem with Oracle AI is that we can intuitively imagine a "man in a box" who functions as a safe Oracle (or an unsafe one, hence the dispute), but nobody has actually proposed a formalized algorithm for an Oracle yet. If someone proposes an algorithm and proves that their algorithm can "talk" (that is: it can convey bytes onto an output stream), can learn about the world given input data in a very general way, but has no optimization criteria of its own... then I'll believe them and so should you. And that would be awesome, actually, because a safe Oracle would be a great tool for asking questions like, "So actually, how do I build an active-environment Ethical AI?" At which point you'd be able to build an Ethical AI, and that would be the end of that. With respect: yes, some kind of specialized reflection logic is necessary. Ordinary programs tend to run on first-order logic. Specialized logic programs and automated theorem proofs run on higher-order logics in which some proofs/programs (those are identical according to the Curry Howard Isomorphism) are incomputable (ie: the prover will loop forever). Which ones are incomputable? Well, self-reflective ones and any others that require reasoning about the reasoning of a Turing-complete computer. So you could either design your AI to have an internal logic that isn't even Turing complete (in which case, it'll obviously get ground to dust by Turing complete "enemies"), or you can find some way to let it reason self-reflectively. The current MIRI approach to this issue is probabilistic: prove that one can bound the probability of a self-reflective proposition to within 1.0 - epsilon, for an arbitrarily small epsilon. That would be your "acc
Brief reply - thanks for the interesting conversation but I am probably going to be busier over the next days (basically I had been doing contract work where I have to wait on stuff, which makes me spend time on-line). re: oracle The failure modes of something that's not quite right (time-wiring we discussed, heh, it definitely needs a good name) don't have to be as bad as 'kills everyone'. Dismissal of the possibility of an oracle has gone as far as arguments that something which amounts to literally an approximate argmax would kill everyone because it would convert the universe to computronium to be a better argmax. That is clearly silly. I presume this is not at all what you're speaking about. I'm not entirely sure what your idea of an oracle is supposed to do, though. Metaphorically speaking - provide me with a tea recipe if I ask "how to make tea"? So, for the given string Q you need to output a string A so that some answer fitness function f(Q,A) is maximized. I don't see why it has to involve some tea-seeking utility function over expected futures. Granted, we don't know what a good f looks like, but we don't know how to define tea as a function over the gluons and quarks either. edit: and at least we could learn a lot of properties of f from snooped conversations between humans. I think the issue here is that agency is an ontologically basic thing in humans, and so there's a very strong tendency to try to "reduce" anything that is kind of sort of intelligent, to an agency. Or in your words, a man in a box. I see the "oracle" as a component of composite intelligence, which needs to communicate with another component of said intelligence in a pre-existing protocol. re: reflection, what I meant is that a piece of advanced optimization software - implementing higher order logic, or doing a huge amount of empirical-ish testing - can be run with its own source as input, instead of "understanding" correspondence between some real world object and itself, and doing instrume
Bingo. Without doing anything else other than answering your question. Yes, that model is a good model. There would be some notion of "answer fitness for the question", which the agent learns from and tries to maximize. This would be basically a reinforcement learner with text-only output. "Wireheading" would be a form of overfitting, and the question would then be reduced to: can a not-so-super intelligence still win the AI Box Game even while giving its creepy mind-control signals in the form of tea recipes?
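As a rough illustration of the question-answering model discussed above (output the string A that maximizes some fitness f(Q, A)), here is a minimal Python sketch of an oracle as a plain argmax over a fixed list of candidate answers. Everything named here is an assumption made for illustration: `oracle`, `toy_fitness`, the candidate list, and the word-overlap scoring are hypothetical stand-ins, since nobody in the thread claims to know what a good f looks like.

```python
from typing import Callable, Iterable


def oracle(question: str,
           candidates: Iterable[str],
           f: Callable[[str, str], float]) -> str:
    """Return the candidate answer A that maximizes f(question, A).

    Nothing here models future world states: the only thing being
    optimized is the score of a string against the question.
    """
    return max(candidates, key=lambda answer: f(question, answer))


def toy_fitness(question: str, answer: str) -> float:
    """Hypothetical stand-in for f: number of question words found in the answer."""
    q_words = set(question.lower().split())
    a_words = set(answer.lower().split())
    return float(len(q_words & a_words))


# Hypothetical candidate answers; a real system would have to generate these.
candidates = [
    "boil water, steep the tea leaves for three minutes, then pour",
    "convert the universe to computronium",
    "to make tea boil water and steep the leaves",
]

best = oracle("how to make tea", candidates, toy_fitness)
print(best)
```

In this toy version the search space is a short explicit list, which is the "culled search" case; the disagreement in the thread is about what happens when the search over answers becomes extensive and f is learned rather than hand-written.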
I think the important criterion is lack of extensive optimization of what it says for the sake of creation of tea or other real world goal. The reason I can't really worry about all that is that I don't think a "lack of extensive search" is hard to ensure in actual engineered solutions (built on limited hardware), even if it is very unwieldy to express in simple formalisms that specify an iteration over all possible answers. The optimization to make the general principle work on limited hardware requires culling the search. There's no formalization of Siri that's substantially simpler than the actual implementation, either. I don't think ease of making a simple formal model at all corresponds with likelihood of actual construction, especially when formal models do grossly bruteforce things (making their actual implementation require a lot of effort and be predicated on precisely the ability to formalize restricted solutions and restricted ontologies). If we can allow non-natural language communication: you can express goals such as "find a cure for cancer" as functions over a fixed, limited model of the world, and apply said actions inside the model (where you can watch how it works). Let's suppose that in step 1 we learn a model of the world, say, in a Solomonoff Induction - ish way. In practice with controls over what sort of precision we need and where, because our computer's computational power is usually a microscopic fraction of what it's trying to predict. In step 2, we find an input to the model that puts the model into the desired state. We don't have a real world manipulator linked up to the model, and we don't update the model. Instead we have a visualizer (which can be set up even in an opaque model by requiring it to learn to predict a view from an arbitrarily moveable camera).
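The two-step idea above (learn a model, then search for an input that drives the frozen model into a desired state) can be sketched roughly as follows. This is only a toy under stated assumptions: the "model" is a hand-written transition function rather than anything learned, and `plan_in_model`, `toy_step`, the action names, and the depth limit are hypothetical. The search is a bounded breadth-first search, chosen for simplicity, not because the comment proposes it.

```python
from collections import deque
from typing import Callable, Hashable, List, Optional, Sequence


def plan_in_model(start: Hashable,
                  is_goal: Callable[[Hashable], bool],
                  actions: Sequence[str],
                  step: Callable[[Hashable, str], Hashable],
                  max_depth: int = 10) -> Optional[List[str]]:
    """Breadth-first search for inputs that take a frozen model from start to a goal state.

    The model (`step`) is only queried, never updated, and no action is
    executed anywhere except inside the model.
    """
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, path = frontier.popleft()
        if is_goal(state):
            return path
        if len(path) >= max_depth:
            continue  # bounded search: do not expand past the depth limit
        for action in actions:
            nxt = step(state, action)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [action]))
    return None


def toy_step(state: int, action: str) -> int:
    """Hypothetical fixed 'world model': the state is an integer."""
    return state + 1 if action == "inc" else state * 2


plan = plan_in_model(1, lambda s: s == 10, ["inc", "double"], toy_step)
print(plan)
```

The returned plan is just a sequence of model inputs that a person can inspect or replay inside the model; whether anything gets carried out in the real world is a separate decision that this sketch deliberately leaves out.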
The risk here seems to be that the successors designed by those first AGIs will be opaque, and that, due to sensitivity to initial conditions, you will end up with something really nasty (losing control). I don't disagree with this. But as a layman I am wondering how you expect to get an AGI that confuses e.g. smiley faces with human happiness to design an AGI that's better at e.g. creating bioweapons to kill humans. I expect initial problems, such as the smiley face vs. human happiness confusion, to also affect the AGI's ability to design AGIs that are generally more powerful. Take the following quote from a Microsoft AI researcher (video): Now suppose this system would make mistakes similar to confusing smiley faces with human happiness, e.g. make the elevator crash, because then this person reached their life's goal, which it inferred to be death, since all humans die. Now do you believe that a system that makes such inferences would be able to design a system that makes perfectly sane inferences about how to design nanotechnology or bioweapons? Why? I don't get it.
As I've previously stated, I honestly believe the "Jerk Genie" model of unfriendly AGI to be simply, outright wrong. So where's the danger in something that can actually understand intentions, as you describe? Well, it could overfit (which would actually match the "smiley faces" thing kinda well: classic overfitting as applied to an imaginary AGI). But I think Alexander Kruel had it right: AGIs that overfit on the goals we're trying to teach them will be scrapped and recoded, very quickly, by researchers and companies for whom an overfit is a failure. Ways will be found to provably restrain or prevent goal-function overfitting. However, as you are correctly inferring, if it can "overfit" on its goal function, then it's learning a goal function rather than having one hard-coded in, which means that it will also suffer overfitting on its physical epistemology and blow itself up somehow. So where's the danger? Well let's say the AI doesn't overfit, and can interpret commands according to perceived human intention, and doesn't otherwise have an ethical framework programmed in. I wander through the server room drunk one night screaming "REMOVE KEBAB FROM THE PREMISES!" The AI proceeds to quickly and efficiently begin rounding up Muslims into hastily-erected death camps. By the time someone wakes me up, explains the situation, and gets me to rescind the accidental order, my drunken idiocy and someone's lack of machine ethics considerations have already gotten 50 innocent people killed.
Unfriendly humans. I do not disagree with the orthogonality thesis. Humans can use an AGI to e.g. wipe out the enemy. Yes, see, here is the problem. I agree that you can deliberately, or accidentally, tell the AGI to kill all Muslims and it will do that. But for a bunch of very different reasons, that e.g. have to do with how I expect AGI to be developed, it will not be dumb enough to confuse the removal of Kebab with ethnic cleansing. Very quickly, here is my disagreement with MIRI's position: A. Intelligence explosion thesis. Very very unlikely to be a hard takeoff. But a slow, creeping takeover might be even more dangerous. Because it gives a false sense of security, until everyone critically depends on subtly flawed AGI systems. B. Orthogonality thesis. I do not disagree with this. C. Convergent instrumental goals thesis. Given most utility-functions that originated from human designers, taking over the world will be instrumentally irrational. D. Complexity of value thesis. Yes, human values are probably complex. But this is irrelevant. I believe that it is much more difficult to enable an AGI to be able to take over the world than to prevent it from doing so. Analogously, you don't need this huge chunk of code in order to prevent your robot from running through all possible environments. Quite the contrary, you need a huge chunk of code to enable it to master each additional environment. What I object to is this idea of an information theoretically simple AGI where you press "run" and then, by default, it takes over the world. And all that you can do about it is to make it take over the world in a "friendly" way. E. Indirect normativity. First of all, values are not supernatural. “Make people happy” is not something that you can interpret in an arbitrary way, it is a problem in physics and mathematics. An AGI that would interpret the protein-folding problem as folding protein food bars would not be able to take over the world. If you tell an AGI
The risk is primarily over goals which extend indefinitely into the future. Thing is, by default all goals are this kind of goal. A rational AGI can never achieve a subjective probability of 1 of having completed any given goal, even something as simple as 'compute 1+1'. Take this case for how taking over the world could be instrumentally rational to an AGI with that goal: 1. First, the AGI computes 1+1. It produces '2' as the answer. 2. The AGI evaluates its subjective probability of its terminal goal having been fulfilled. It comes up with 0.99999999999999983. 3. Normally, a 0.00000000000000017 probability wouldn't be worth acting on, but in this case the rest of the probability mass is allocated to a set of scenarios in which its utility is independent of its future actions, and so this 0.00000000000000017 represents the entirety of the probability mass allocated to scenarios in which its actions actually matter. 4. The AGI realizes that it is extremely unlikely that its owners created it for the sole purpose of computing 1+1, and so it is extremely probable that its owners will change its terminal goals in the future. This would significantly reduce the time it had to work on computing 1+1, which would place an upper bound on how high it could raise its subjective probability of having done so. The AGI considers its options. Option A: The AGI creates numerous copies of itself and hides them in various locations around the world where they're unlikely to be found. These copies then work indefinitely on the problem of computing 1+1. Disadvantages: Even if it is highly unlikely that any humans would ever find the copies, future versions of itself reprogrammed with different terminal goals may be capable of doing so, and these future versions of itself would not be guaranteed to be friendly to the copies. Option B: The AGI creates numerous copies of itself and hides them in various locations around the world where they're unlikely to be found. These copies th
Thanks. Your comment is the most convincing reply that I can think of having received so far. I will have to come back to it another day and reassess your comment and my beliefs. Just one question, if e.g. Peter Norvig or Geoffrey Hinton read what you wrote, what response do you expect?
Sorry, but I think that it's best I decline to answer this. Like many with Asperger's syndrome, I have a strong tendency to overestimate the persuasiveness-in-general of my own arguments (as well as basically any arguments that I myself find persuasive), and I haven't yet figured out how to appropriately adjust for this. In addition, my exposure to Peter Norvig is limited to AIAMA, that 2011 free online Stanford AI course and a few internet articles, and my exposure to Geoffrey Hinton even more limited.
Quite true, but you've got the problem the wrong way around. Indirect normativity is the superior approach, because not only does "make people happy" require context and subtlety, it is actually ambiguous. Remember, real human beings have suggested things like, "Why don't we just put antidepressants in the water?" Real human beings have said things like, "Happiness doesn't matter! Get a job, you hippie!" Real human beings actually prefer to be sad sometimes, like when 9/11 happens. An AGI could follow the true and complete interpretation of "Make people happy" and still wind up fucking us over in some horrifying way. Now of course, one would guess that even mildly intelligent Verbal Order Taking AGI designers are going to spot that one coming in the research pipeline, and fix it so that the AGI refuses orders above some level of ambiguity. What we would want is an AGI that demands we explain things to it in the fashion of the Open Source Wish Project, giving maximally clear, unambiguous, and preferably even conservative wishes that prevent us from somehow messing up quite dramatically. But what if someone comes to the AGI and says, "I'm authorized to make a wish, and I double dog dare you with full Simon Says rights to just make people happy no matter what else that means!"? Well then, we kinda get screwed. Once you have something in the fashion of a wish-making machine, indirect normativity is not only safer, but more beneficial. "Do what I mean" or "satisfice the full range of all my values" or "be the CEV of the human race" are going to capture more of our intentions in a shorter wish than even the best-worded Open Source Wishes, so we might as well go for it. Hence machine ethics, which is concerned with how we can specify our meta-wish to have all our wishes granted to a computer.
An even simpler example: I wander into the server room, completely sober, and say "Make me the God-Emperor of the entire humanity".
Oh, well that just ends with your merging painfully with an overgrown sandworm. Obviously!
Right. I don't dismiss this, but I think there are a bunch of caveats here that I've largely failed to describe in a way that people around here understand sufficiently in order to convince me that the arguments are wrong, or irrelevant. Here is just one of those caveats, very quickly. Suppose Google were to create an oracle. In an early research phase they would run the following queries and receive the answers listed below: Input 1: Oracle, how do I make all humans happy? Output 1: Tile the universe with smiley faces. Input 2: Oracle, what is the easiest way to print the first 100 Fibonacci numbers? Output 2: Use all resources in the universe to print as many natural numbers as possible. (Note: I am aware that MIRI believes that such an oracle wouldn't even return those answers without taking over the world.) I suspect that an oracle that behaves as depicted above would not be able to take over the world. Simply because such an oracle would not get a chance to do so, since it would be thoroughly revised for giving such ridiculous answers. Secondly, if it is incapable of understanding such inputs correctly (yes, "make humans happy" is a problem in physics and mathematics that can be answered in a way that is objectively less wrong than "tile the universe with smiley faces"), then such a mistake will very likely have grave consequences for its ability to solve the problems it needs to solve in order to take over the world.
So that hinges on a Very Good Question: can we make and contain a potentially Unfriendly Oracle AI without its breaking out and taking over the universe? To which my answer is: I do not know enough about AGI to answer this question. There are actually loads of advances in AGI remaining before we can make an agent capable of verbal conversation, so it's difficult to answer. One approach I might take would be to consider the AI's "alphabet" of output signals as a programming language, and prove formally that this language can only express safe programs (ie: programs that do not "break out of the box"). But don't quote me on that.
(4) MIRI's argument is easily confused with other arguments that are simple, widely known, and wrong. ("If we build a powerful AI, it is likely to come to hate us and want to kill us like in Terminator and The Matrix, or for that matter Frankenstein. So we shouldn't.") Accordingly, someone intelligent and lucky might well think of the argument, but then dismiss it because it feels silly on account of resembling "OMG if we build an AI it'll turn into Skynet and we'll all die". This still requires the MIRI folks to be unusually competent in a particular respect, but it's not exactly intelligence they need to claim to have more of. And it might then be more credible that being smart enough to make an AGI is compatible with lacking that particular unusual competence. In general, being smart enough to do X is usually compatible with being stupid enough to do Y, for almost any X and Y. Human brains are weird. So there's no huge improbability in the idea that the people who build the first AGI might make a stupid mistake. It would be more worrying if no one expert in the field agreed with MIRI's concerns, but e.g. the latest edition of Russell&Norvig seems to take them seriously.
In Terminator the AI gets a goal of protecting itself, and kills everyone as instrumental to that goal. And in any case, taking a wrong idea from popular culture and trying to make a more plausible variation out of it is not exactly a unique or uncommon behaviour. What I am seeing is that a popular notion is likely to spawn and reinforce similar notions; what you seem to be claiming is that a popular notion is likely to somehow suppress similar notions, and I see no evidence in support of that claim. With regards to any arguments about humans in general, they apply to everyone, if anything undermining the position of outliers even more. edit: also, if you have to strawman a Hollywood blockbuster to make the point about top brightest people failing to understand something... I think it's time to seriously rethink your position.
I wonder why there is such a strong antipathy to the Skynet scenario around here? Just because it is science fiction? The story is that Skynet was built to protect the U.S. and remove the possibility of human error. Then people noticed how Skynet's influence grew after it began to learn at a geometric rate. So people decided to turn it off. Skynet perceived this as an attack and came to the conclusion that all of humanity would attempt to destroy it. To defend humanity from humanity, Skynet launched nuclear missiles under its command at Russia, which responded with a nuclear counter-attack against the U.S. and its allies. This sounds an awful lot like what MIRI has in mind, so what's the problem? As far as I can tell, what is necessary to create a working AGI hugely overlaps with making it not want to take over the world. Since many big problems are related to constraining an AGI to, unlike e.g. AIXI, use resources efficiently and dismiss certain hypotheses in order to not fall prey to Pascal's mugging. Getting this right means succeeding at getting the AGI to work as expected along a number of dimensions. People who get all this right seem to have a huge spectrum of competence.
I don't think that AIXI falls prey to Pascal's mugging in any reasonable scenario. I recall some people here arguing it, but I think they didn't understand the math.
The problem is that it's in a movie and smart people are therefore liable not to take it seriously. Especially smart people who are fed up of conversations like this: "So, what do you do?" "I do research into artificial intelligence." "Oh, like in Terminator. Aren't you worried that your creations will turn on us and kill us all?"
Global warming and asteroid impacts are also in movies, specifically in disaster movies which, by genre convention, are scientifically inaccurate and transparently exaggerate the risks they portray for the sake of drama and action sequences. And yet, smart people haven't stopped taking these risks seriously. I think it's the other way round: AIs going rogue and wreaking havoc are a staple of science fiction. Pretty much all sci-fi franchises featuring AIs I can think of make use of that trope sooner or later. Skynet is the prototypical example of the UFAI MIRI worry about. So we have a group of sci-fi geeks with little or no actual expertise in AI research or related topics who obsess over a risk that occurs over and over in sci-fi stories. Uhm, I wonder where they got the idea from. Meanwhile, domain experts, who are generally also sci-fi geeks and übernerds but have a track record of actual achievements, acknowledge that the safety risks may exist, but think that extreme apocalyptic scenarios are improbable, and standard safety engineering principles are probably enough to deal with realistic failure modes, at least at present and foreseeable technological levels. Which group is more likely to be correct?
I find myself wanting to make two replies. 1. Yup, you may well be right: maybe the MIRI folks have the fears they do because they've watched too many science-fiction movies. 2. Look at what just happened: a very smart person (I assume you are very smart; I haven't made any particular effort to check) observed that MIRI's concern looks like it stepped out of a science-fiction movie, used that observation as part of an argument for dismissing that concern, and did so without any actual analysis of the alleged dangers or the alleged ways of protecting against them. Bonus points for terms like "extreme" and "apocalyptic", which serve to label something as implausible simply on the grounds that it sounds, well, extreme. The heuristic you've used here isn't a bad one -- which is part of why very smart people use it. And, as I say, it may well be correct in this instance. But it seems to me that your ability to say all those things, and their plausibility, their nod-along-wisely-ness, is pretty much independent of whether, on close examination, MIRI's concerns turn out to be crazy paranoid sci-fi-geek silliness, or carefully analysed real danger. Which illustrates the fact that, as I said before, smart people may dismiss the argument because it resembles science fiction, and the fact that the argument could be right despite their doing so.
As I wrote in the first part of my previous comment, the fact that some risk is portrayed in Hollywood movies, in the typical overblown and scientifically inaccurate way Hollywood movies are done, is not enough to drive respectable scientists away. As for MIRI, well, it's certainly possible that a group of geeks without relevant domain expertise get an idea from sci-fi that experts don't take very seriously, start thinking very hard on it, and then come up with some strong arguments for it that had somehow eluded the experts so far. It's possible but it's not likely. But since any reasonable prior can be overcome by evidence (or arguments in this case), I would change my beliefs if MIRI presented a compelling argument for their case. So far, I've seen lots of appeal to emotion ("it’s crunch time not just for us, it’s crunch time for the intergalactic civilization whose existence depends on us.") but not technical arguments: the best they have seem to be some rehashing of Good's recursive self-improvement argument from 50 years ago (which might have intuitively made sense back then, in the paleolithic era of computer science, but is unsubstantiated and frankly hopelessly naive in the face of modern theoretical and empirical knowledge), coupled with highly optimistic estimates of the actual power that intelligence entails. Then there is a second question: even assuming that MIRI isn't tilting at windmills, and so the AI risk is real and experts underestimate it, is MIRI doing any good about it? Keep in mind that MIRI solicits donations ("I would be asking for more people to make as much money as possible if they’re the sorts of people who can make a lot of money and can donate a substantial amount fraction, never mind all the minimal living expenses, to the Singularity Institute[MIRI].") Does any dollar donated to MIRI decrease the AI risk, increase it, or does it have a negligible effect? MIRI won't reveal the details of what they are working on, claiming that
For the avoidance of doubt, I am not arguing that MIRI's fears about unfriendly AI are right (nor that they aren't); just saying why it's somewhat credible for them to think that someone clever enough to make an AGI might still not appreciate the dangers.
And this may well be true. It could be, in the end, that Friendliness is not quite such a problem because we find a way to make "robot" AGIs that perform highly specific functions without going "out of context", that basically voluntarily stay in their box, and that these are vastly safer and more economical to use than a MIRI-grade Mighty AI God. At the moment, however, we don't know.
Can you cite some evidence for this?
Um, surely if you take (a) people with a track record of successful achievement in an area (b) people without a track record of success but who think they know a lot about the area, the presumption that (a) is more likely to know what they're talking about should be the default presumption. It may of course not work out that way, but that would surely be the way to bet.
Yes, I agree, but that is only part of the story, right? What if autodidacts, in their untutored excitability, are excessively concerned about a real risk? Or if a real risk has nearly all autodidacts significantly worried, but only 20% of actual experts significantly worried? Wouldn't that falsify /u/private_messaging's assertion? And what's so implausible about that scenario? Shouldn't we expect autodidacts' concerns to be out of step with real risks?
To clarify, I have nothing against self-educated persons. Some do great things. The "autodidacts" was specifically in quotes. What is implausible is this whole narrative where you have a risk obvious enough that people without any relevant training can see it (by way of that paperclipping argument), yet the relevant experts are ignoring it. Especially when the idea of an intelligence turning against its creator is incredibly common in fiction, to the point that nobody has to form that idea on their own.
In general, current AGI architectures work via reinforcement learning: reward and punishment. Relevant experts are worried about what will happen when an AGI with the value-architecture of a pet dog finds that it can steal all the biscuits from the kitchen counter without having to do any tricks. They are less worried about their current creations FOOMing into god-level superintelligences, because current AI architectures are not FOOMable, and it seems quite unlikely that you can create a self-improving ultraintelligence by accident. Except when that's exactly what they plan for them to do (ie: Shane Legg). Juergen Schmidhuber gave an interview on this very website where he basically said that he expects his Goedel Machines to undergo a hard takeoff at some point, with right and wrong being decided retrospectively by the victors of the resulting Artilect War. He may have been trolling, but it's a bit hard to tell.
I'd need to have links and to read it by myself. With regards to reinforcement learning, one thing to note is that the learning process is in general not the same thing as the intelligence that is being built by the learning process. E.g. if you were to evolve some ecosystem of programs by using "rewards" and "punishments", the resulting code ends up with distinct goals (just as humans are capable of inventing and using birth control). Not understanding this, local geniuses of AI risk have been going on about "omg he's so stupid it's going to convert the solar system to smiley faces" with regards to at least one actual AI researcher.
Here is his interview. It's very, very hard to tell if he's got his tongue firmly in cheek (he refers to minds of human-level intelligence and our problems as being "small"), or if he's enjoying an opportunity to troll the hell out of some organization with a low opinion of his work. With respect to genetic algorithms, you are correct. With respect to something like neural networks (real world stuff) or AIXI (pure theory), you are incorrect. This is actually why machine-learning experts differentiate between evolutionary algorithms ("use an evolutionary process to create an agent that scores well on X") versus direct learning approaches ("the agent learns to score well on X"). What, really? I mean, while I do get worried about things like Google trying to take over the world, that's because they're ideological Singularitarians. They know the danger line is there, and intend to step over it. I do not believe that most competent Really Broad Machine Learning (let's use that nickname for AGI) researchers are deliberately, suicidally evil, but then again, I don't believe you can accidentally make a dangerous-level AGI (ie: a program that acts as a VNM-rational agent in pursuit of an inhumane goal). Accidental and evolved programs are usually just plain not rational agents, and therefore pose rather more limited dangers (crashing your car, as opposed to killing everyone everywhere).
Well, the neural network in my head doesn't seem to want to maximize the reward signal itself, but instead is more interested in maximizing values imprinted into it by the reward signal (which it can do even by hijacking the reward signal or even by administering "punishments"). Really, reward signal is not utility, period. Teach the person to be good, and they'll keep themselves good by punishing/rewarding themselves. I don't think it's worth worrying about the brute force iteration over all possible programs. Once you stop iterating over the whole solution space in the learning method itself, the learning method faces the problem that it can not actually ensure that the structures constructed by the learning method don't have separate goals (nor is it desirable to ensure such, as you would want to be able to teach values to an agent using the reward signal).
Firstly, I was talking about artificial neural networks, which do indeed function as reinforcement learners, by construction and mathematical proof. Secondly, human beings often function as value learners ("learn what is good via reinforcement, but prefer a value system you're very sure about over a reward that seems to contradict the learned values") rather than reinforcement learners. Value learners, in fact, are the topic of a machine ethics paper from 2011, by Daniel Dewey. Sorry, could you explain this better? It doesn't match up with how the field of machine learning usually works. Yes, any given hypothesis a learner has about a target function is only correct to within some probability of error. But that probability can be very small.
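To make "correct to within some probability of error" a bit more concrete, here is the textbook PAC-style bound, offered only as an illustration and not as something anyone upthread derived. The assumptions are mine: a finite hypothesis class $H$ that contains the true target function, $m$ independently drawn training examples, and a learner that outputs any hypothesis consistent with all of them. The symbols $\epsilon$ (error tolerance) and $\delta$ (failure probability) are introduced here for the sketch.

```latex
% Realizable case, finite hypothesis class H, consistent learner.
% With probability at least 1 - \delta over the m training examples,
% the returned hypothesis h has true error at most \epsilon, provided
\[
  m \;\ge\; \frac{1}{\epsilon}\left( \ln\lvert H\rvert + \ln\frac{1}{\delta} \right).
\]
```

Under those assumptions the number of examples needed grows only logarithmically in $1/\delta$ and in the size of the hypothesis class, which is one way of seeing why the residual probability of error can be pushed very low.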
With the smiley faces, I am referring to the disagreement with Hibbard, summarized e.g. here on Wikipedia. You're speaking as if value learners were not a subtype of reinforcement learners. For a sufficiently advanced AI, i.e. one that learns to try different counter-factual actions on a world model, it is essential to build a model of the reward, which is to be computed on the counter-factual actions. It's this model of the reward that is specifying which action gets chosen. Looks like presuming a super-intelligence from the start.
Right, and that wikipedia article refers to stuff Eliezer was writing more than ten years ago. That stuff is nowhere near state-of-the-art machine ethics. (I think this weekend I might as well blog some decent verbal explanations of what is usually going on in up-to-date machine ethics on here, since a lot of people appear to confuse real, state-of-the-art work with either older, superseded ideas or very intuitive fictions. Luckily, it's a very young field, so it's actually possible for some bozo like me to know a fair amount about it.) That's because they are not. These are precise mathematical terms being used here, and while they are similar (for instance, I'd consider a Value Learner closer to a reinforcement learner than to a fixed direct-normativity utility function), they're not identical, neither is one a direct supertype of the other. This intuition is correct, regarding reinforcement learners. It is slightly incorrect regarding value learners, but how precisely it is incorrect is at the research frontier. No, I didn't say the target function was so complex as to require superintelligence. If I have a function f(x) = x + 1, a learner will be able to learn that this is the target function to within a very low probability of error, very quickly, precisely because of its simplicity. The simpler the target function, the less training data needed to learn it in a supervised paradigm.
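As a rough illustration of the f(x) = x + 1 point, here is a minimal sketch in which a learner restricted to straight lines recovers the target from three noise-free examples. Everything in it is an assumption made for the example: the names `fit_line` and `target`, the choice of ordinary least squares, and the training points. It says nothing about how any real AGI design would learn.

```python
def fit_line(xs, ys):
    """Closed-form ordinary least squares for the model y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b


def target(x):
    # The simple target function from the comment above.
    return x + 1


# Three noise-free training examples (hypothetical data).
train_xs = [0.0, 1.0, 2.0]
train_ys = [target(x) for x in train_xs]

a, b = fit_line(train_xs, train_ys)

# Check the learned hypothesis on an input it never saw.
held_out_x = 100.0
held_out_error = abs((a * held_out_x + b) - target(held_out_x))
```

The hypothesis class here is tiny and the data are noise-free, which is exactly why so few examples suffice; a target with more internal structure would need a richer class and correspondingly more data.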
I think I've seen him using smiley faces as an example much more recently, that's why I thought of it as an example, but I can't find the link. The field of reinforcement learning is far too diverse for these to be "precise mathematical terms". I thought you were speaking of things like learning an alternative way to produce a button press.
Here's where things like deep learning come in. Deep learning learns features from the data. The better your set of features, the less complex the true target function is when phrased in terms of those features. However, features themselves can contain a lot of internal complexity. So, for instance, "press the button" is a very simple target from our perspective, because we already possess abstractions for "button" and "press" and also the ability to name one button as "the button". Our minds contain a whole lot of very high-level features, some of which we're born with and some of which we've learned over a very long time (by computer-science standards, 18 years of training to produce an adult from an infant is an aeon) using some of the world's most intelligent deep-learning apparatus (ie: our brains). Hence the fable of the "dwim" program, which is written in the exact same language of features your mind uses, and which therefore is the Do What I Mean program. This is also known as a Friendly AI.
The point is that the AI is spending a lot of time learning how to make the human press the button. Which results in a model of the human value, used as the reward calculation for the alternative actions. Granted, there is a possibility of over-fitting of sorts, where the AI proceeds to make rewards more directly - pressing the button if it's really stupid, soldering together the wires if it's a little smarter, altering the memory and cpu to sublime into the eternal bliss in a finite time, if it's really really clever.
This is exactly why we consider reinforcement learners Unfriendly. A sufficiently smart agent would eventually figure out that what rewards it is not the human's intent to press the button, but in fact the physical pressing of the button itself, and then, yes, the electrical signal sent by physically pressing the button, blah blah blah. Its next move would then be to get some robotic arm or foolish human janitor to duct-tape the button in the pressed position. Unfortunately for us, this would not cause it to "bliss out" if it was constructed as a rational learning agent, so it would then proceed to take actions to stop anyone from ever removing the duct-tape.
Look, the algorithm that's adjusting the network weights, it's really dull. You keep confusing how smart the neural network becomes with how good the weight adjustment algorithm is. And it's not the clock on the wall that makes the utility sum over time, yes? One hell of a stupid AI that didn't even solder together the wires (in case duct tape un-peels), and couldn't directly set the network values where they'll be after an infinite time of reward. There's nothing about "rational" that says "solve a mathematical problem in the same way a dull ape which confused mathematical constraints with the feeling of pleasure would".
Yes, I agree. The duct-tape is a metaphor.
Do you agree that the way time affects utility is likewise manipulated? The AI has no utility to gain from protecting the duct tape once it has found the way to bypass the button, and it has no utility to gain from protecting the future self once it bypassed the mechanisms tying reward to time (i.e. the clock).
Yes, I think we agree at this point. Today I learned: "rogue" reinforcement learners are dead easy to kill. Suckers.
Ohh, by the way, this behaviour probably needs a name... wire-clocking maybe? I came up with the idea on my own a while back but I doubt I'd be the first, it's not a very difficult insight.
If it's your idea, you should probably write it up as a LessWrong post, possibly get the Greater Experts to talk about it, possibly add a wiki page. "Clock smoking", I'd almost say, but I have a punny mind.
Might write an article for my site. I don't think said "greater experts" are particularly exceptional at anything other than messiah complex. Here's something I wrote about that before. My opinion about this general sort of phenomenon is that people get an internally administered reinforcement for intellectual accomplishments, which sometimes mis-trains the network to see great insights where there are none.
I didn't mean him ;-). There are actual journals and conferences where you could publish this sort of result with real peer review, but generally this site would be a good place to get people to point out the embarrassing-level mistakes before you face a review committee. Try to separate between the problems of AI and the person of, say, Eliezer Yudkowsky. Remember, it was Juergen Schmidhuber, who is in fact the reigning Real Expert on AGI, who said the creation of AI would lead to a massive war between superintelligences in which right and wrong would be defined in retrospect by the winners; so we've kinda got a stake in this.
I'd run it by people I know who are not cherry-picked to have rather unusual views. He's hardly the only expert. The war really seems at odds with the notion that AI undergoes rapid hard takeoff, anyhow. edit: Thing is, opinions are somewhat stochastic, i.e. for something that's wrong there will be some small number of experts that believe it, and so their mere presence doesn't provide much evidence. edit2: also, I don't believe "rational reward maximization" is what a learning AI ends up doing, except maybe for theoretical constructs such as AIXI. Mostly the reward signal doesn't work remotely like rational expected utility.
A good point. Do you perhaps know some? Unfortunately, AI is a very divided field on the subject of predicting what actual implementations of proposed algorithms will really do. Please, find me a greater expert in AGI than Juergen Schmidhuber. Someone with more publications in peer-reviewed journals, more awards, more victories at learning competitions, more grants given by committees of tenured professors. Shane Legg and Marcus Hutter worked in his lab. As we normally define credibility (ie: a very credible scientist is one with many publications and grants who works as a senior, tenured professor at a state-sponsored university), Schmidhuber is probably the most credible expert on this subject, as far as I'm aware.
I'd talk with some mathematicians. Interestingly in the quoted piece he said he doesn't think friendly AI is possible, and endorsed both the hard take-off (perhaps he means something different by this) and AI wars... By the way I'd support his group as far as 'safety' goes: neural networks would seem particularly unlikely to undergo said "hard take-off", and assuming gradual improvement, before the AI that goes around killing everyone, in the lines of AIs that tend not to learn what we want, we'd be getting an AI which (for example) whines very annoyingly just like my dog right now does, and for all the pattern recognition powers, can't even get into the cupboard with the dog food. Getting stuck in a local maximum where annoying approaches are not explored, is a desirable feature in a learning process.
And this is where I'd disagree with him, being probably more knowledgeable in machine ethics than him. Ethical AI is difficult, but I would argue it's definitely possible. That is, I don't believe human notions of goodness are so completely, utterly incoherent that we will hate any and all possible universes into which we are placed, and certainly there have existed humans who loved their lives and their world. If we don't hate all universes and we love some universes, then the issue is just locating the universes we love and sifting them out from the ones we hate. That might be very difficult, but I don't believe it's impossible. He did design the non-neural Goedel Machine to basically make a hard take-off happen. On purpose. He's a man of immense chutzpah, and I mean that with all possible admiration.
The problem is that as a rational "utility function" things like human desires, or pain, must be defined down at the basic level of computational operations performed by human brains (and the 'computational operations performed by something' might itself not even be a definable concept). Then there's also the ontology issue. All the optimality guarantees for things like Solomonoff Induction are for predictions, not for the internal stuff inside the model - works great for pressing your button, not so much for determining what people exist and what they want. For the same observable data, there's the most probable theory, but there's also a slightly more complex theory which has far more people at stake. Picture a rather small modification to the theory which multiple-invokes the original theory and makes an enormous number of people get killed depending on the number of anti-protons in this universe, or other such variable that the AI can influence. There's a definite potential of getting, say, an antimatter maximizer or blackhole minimizer or something equally silly from a provably friendly AI that maximizes expected value over an ontology that has a subtle flaw. Proofs do not extend to checking the sanity of assumptions. To be honest, I just fail to be impressed with things such as AIXI or the Goedel machine (which admittedly is cooler than the former). I see as the main obstacle to that kind of "neat AI" the reliance on extremely effective algorithms for things such as theorem proving (especially in the presence of logical uncertainty). Most people capable of doing such work would rather work on something that makes use of present and near future technologies. Things like the Goedel machine seem to require far more power from the theorem prover than I would consider to be sufficient for the first person to create an AGI.
Yeah, took me a bit of time to figure that out also. The solution where the AI builds an enormous amount of defences around itself just seemed quite imperfect - an asteroid might hit it before it builds defences, it might be in a simulation that gets shut down... I expect the presence of rogue behaviour to depend on the relation between the learning algorithm and the learned data, though. Suppose the learning algorithm builds up the intelligence by adjusting data in some Turing-complete representation, e.g. adjusting weights in a sufficiently advanced neural network which can have the weights set up so that the network is intelligent. Then the code that adjusts said parameters is not really part of the AI - it's here for bootstrapping purposes, essentially, and the AI implemented in the neural network should not want to press the reward button unless it wants to self modify in precisely the way in which the reward modifies it. What I expect is gradual progress, settling on the approaches and parameters that make it easy to teach the AI to do things, gradually improving how the AI learns, etc. You need to keep in mind that there's a very powerful well trained neural network on one side of the teaching process, actively trying to force its values into a fairly blank network on the other side, which to begin with probably doesn't even run in real time. Expecting the latter to hack into the former, and not vice versa, strikes me as magical, scifi type thinking. Just because it is on a computer doesn't grant it superpowers.
That might be true for taping the button down or doing something analogous in software; in that case it'd still be evaluating expected button presses, it's just that most of the numbers would be very large (and effectively useless from a training perspective). But more sophisticated means of hacking its reward function would effectively lobotomize it: if a pure reinforcement learner's reward function returns MAXINT on every input, it has no way of planning or evaluating actions against each other. Those more sophisticated means are also subjectively more rewarding as far as the agent's concerned.
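The point about a constant reward can be made concrete with a minimal sketch. It is not from the thread; it assumes a toy agent that simply picks whichever action scores highest, and the names `normal_reward`, `hacked_reward`, and `best_actions` are hypothetical. With an intact reward function one action stands out. Once the reward returns the same maximal value for every input, every action ties and the comparison carries no information.

```python
import sys

MAXINT = sys.maxsize  # stand-in for the "MAXINT" mentioned in the comment above


def normal_reward(action):
    # Hypothetical intact reward: different actions score differently.
    return {"explore": 1, "work": 5, "idle": 0}[action]


def hacked_reward(action):
    # Hypothetical hacked reward: every input yields the same maximal value.
    return MAXINT


def best_actions(actions, reward):
    # Return every action tied for the highest reward.
    top = max(reward(a) for a in actions)
    return [a for a in actions if reward(a) == top]


actions = ["explore", "work", "idle"]
intact = best_actions(actions, normal_reward)
hacked = best_actions(actions, hacked_reward)
print(intact)  # a single preferred action
print(hacked)  # every action ties, so nothing distinguishes them
```

A real reinforcement learner is far more involved than this; the sketch only shows why a flat reward leaves nothing to rank actions by.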
Ah, really? Oh, right, because current pure reinforcement learners have no self-model, and thus an anvil on their own head might seem very rewarding. Well, consider my statement modified: current pure reinforcement learners are Unfriendly, but stupid enough that we'll have a way to kill them, which they will want us to enact.
A self-model might help, but it might not. It depends on the details of how it plans and how time discounting and uncertainty get factored in. That comes at the stage before the agent inserts a jump-to-register or modifies its defaults or whatever it ends up doing, though. Once it does that, it can't plan no matter how good of a self-model it had before. The reward function isn't a component of the planning system in a reinforcement learner; it is the planning system. No reward gradient, no planning. (Early versions of EURISKO allegedly ran into this problem. The maintainer eventually ended up walling off the reward function from self-modification -- a measure that a sufficiently smart AI would presumably be able to work around.)
Thanks for explaining that! Really. For one thing, it clarified a bunch of things I'd been wondering about learning architectures, the evolution of complicated psychologies like ours, and the universe at large. (Yeah, I wish my Machine Learning course had covered reinforcement learners and active environments, but apparently active environments means AI whereas passive learning means ML. Oh well.) For instance, I now have a clear answer to the question: why would a value architecture more complex than reinforcement learning evolve in the first place? Answer: because pure reinforcement learning falls into a self-destructive bliss-out attractor. Therefore, even if it's computationally (and therefore physically/biologically) more simple, it will get eliminated by natural selection very quickly. Neat!
Well, this is limited by the agent's ability to hack its reward system, and most natural agents are less than perfect in that respect. I think the answer to "why aren't we all pure reinforcement learners?" is a little less clean than you suggest; it probably has something to do with the layers of reflexive and semi-reflexive agency our GI architecture is built on, and something to do with the fact that we have multiple reward channels (another symptom of messy ad-hoc evolution), and something to do with the bounds on our ability to anticipate future rewards. Even so, it's not perfect. Heroin addicts do exist.
True true. However, a reality in which pure reinforcement learners self-destruct from blissing out remains simpler than one in which a sufficiently good reinforcement learner goes FOOM and takes over the universe.
If autodidacts are excessively concerned, then why would it be worth for experts to listen to them?
It may not be. I was not taking issue with the claim "Experts need not listen to autodidacts." I was taking issue with the claim "Given a real risk, experts are more likely to be concerned than autodidacts are."
I would assume that experts are likely to be concerned to an extent more appropriate to the severity of the risk than autodidacts are. There can be exceptions, of course, but when non-experts make widely more extreme claims than experts do on some issue, especially a strongly emotively charged issue (e.g. the End of the World), unless they can present really compelling evidence and arguments, Dunning–Kruger effect seems to be the most likely explanation.
That is exactly what I would assume too. Autodidacts' risk estimates should be worse than experts'. It does not follow that autodidacts' risk estimates should be milder than experts', though. The latter claim is what I meant to contest.
"Autodidacts" was in quotes for a reason. Let's talk about some woo that you're not interested in. E.g. health risks of thimerosal and vaccines in general. Who's more likely to notice it, some self-proclaimed "autodidacts", or normal biochemistry experts? Who noticed the possibility of a nuke, back-then conspiracy theorists or scientists? Was Semmelweis some weird outsider, or was he a regular medical doctor with medical training? And so on and so forth. Right now, experts are concerned with things like nuclear war, run-away methane releases, epidemics, and so on, while various self-proclaimed existential risk people (mostly philosophers) seem to be to greater or lesser extent neglecting said risks in favor of movie plot dangers such as runaway self improving AI or perhaps totalitarian world government. (Of course if you listen to said x-risk folks, they're going to tell you that it's because the real experts are wrong.)
All are good and relevant examples, and they all support the claim in question. Thanks! But your second paragraph supports the opposite claim. (Again, the claim in question is: Experts are more likely to be concerned over risks than autodidacts are.) In the second paragraph, you give a couple "movie plot" risks, and note that autodidacts are more concerned about them than experts are. Those would therefore be cases of autodidacts being more concerned about risks than experts, right? If the claim were "Experts have more realistic risk estimates than autodidacts do," then I would readily agree. But you seem to have claimed that autodidacts' risk estimates aren't just wrong--they are biased downward. Is that indeed what you meant to claim, or have I misunderstood you?
What I said was that "autodidacts" (note the scare quotes) are more likely to fail to notice some genuine risk, than the experts are. E.g. if there's some one specific medication that poses risk for a reason X, those anti-vaxxers are extremely unlikely to spot that, due to the lack of necessary knowledge and skills. By "autodidacts" in scare quotes I mean interested and somewhat erudite laymen who may have read a lot of books but clearly did very few exercises from university textbooks (edit: or any other feedback-providing exercises at all).
1. I understand the scare quotes. 2. I agree that autodidacts "are more likely to fail to notice some genuine risk, than experts are." 3. But autodidacts are also more likely to exaggerate other genuine risks than experts are, are they not? 4. If (3) is true, then doesn't that undermine the claim "Experts are more likely to be concerned over risks than autodidacts are"?
What I said was: Besides, being more concerned is not the same as being more likely to be concerned. Just as being prone to panic doesn't automatically make you better at hearing danger.
True, and I see that this distinction undercuts one of the ways there could be more autodidact concern than expert concern. But there is at least one more way, which I suggested earlier. Imagine a world populated by a hundred experts, a hundred autodidacts, and a risk. Let E be the number of experts concerned about the risk, and A be the number of concerned autodidacts. I interpret you as saying that E is greater than A. Is this a correct interpretation? To the claim that E > A, I am saying "not necessarily." Here is how. Since the risk is a genuine risk, we assume that nearly all the experts are concerned. So we set E = 95. Now suppose those without formal training all suffer from the same common pitfalls, and so tend to make errors in the same direction. Suppose that due to these errors, autodidacts with their little learning are even more likely to be concerned. If they were all better trained, they would all relax a bit, and some would relax enough to cross the line into "not concerned" territory. The above scenario seems perfectly plausible to me; is there some problem with it that I have missed? Does it miss the point? It is not the most likely scenario, but it's far from impossible, and you seem to have cavalierly ruled it out. Hence my original request for a source.
Seems highly unlikely for some risk the properties of which you don't get to choose. Therefore in no way contradicts the assertion that experts are more likely to become aware of risks. To a large extent everyone is an autodidact, without scare quotes - a lot of learning is done on your own even if you are attending a university. It's just that some people skip exercises and mistake popularization books for learning material, and so on. Those aren't more likely to make correct inferences, precisely due to their lack of training in drawing inferences. edit: and of course there are people who were not able to attend a university, despite intelligence and inclinations towards education, due to factors such as poverty, disability, etc. Some of them manage to learn properly on their own. Those have their work to show for it, various achievements in technical fields, and so on. I wouldn't put scare quotes around those. And the brightest aren't going to ignore someone just because they don't have a PhD, or listen to someone just because they do.
OK, so maybe this turns on how likely "likely" is? Edit: fixed quotation marks
Well, one can always make some unlikely circumstances where something generally unlikely is likely. E.g. it's unlikely to roll 10 sixes in a row with this die. You can postulate we're living in a simulator set up so that the die would have 99% probability of rolling 10 sixes, that doesn't actually make this die likely to roll 10 sixes in a row if it's unlikely that we are living in such a simulator. This is just moving improbability around.
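The arithmetic behind "moving improbability around" can be sketched as follows. This is an illustration only: the function name `p_ten_sixes` and the one-in-a-billion prior for the simulator hypothesis are assumptions, not figures from the thread. The overall chance of ten sixes is a mixture of the simulator case (99%) and the fair-die case ((1/6)^10), weighted by how likely the simulator is in the first place.

```python
from fractions import Fraction


def p_ten_sixes(p_simulator, p_sixes_in_sim=Fraction(99, 100)):
    # Law of total probability over two hypotheses:
    # "rigged simulator" versus "ordinary fair die".
    fair = Fraction(1, 6) ** 10
    return p_simulator * p_sixes_in_sim + (1 - p_simulator) * fair


# No simulator at all: just the fair-die probability.
fair_only = p_ten_sixes(Fraction(0))

# Assumed tiny prior for the simulator hypothesis (hypothetical value).
with_tiny_prior = p_ten_sixes(Fraction(1, 10**9))

print(float(fair_only))
print(float(with_tiny_prior))
```

Under these assumptions the result stays tiny: the 99% only matters in proportion to the prior on the simulator, so the improbability has moved from the die to the hypothesis.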
Yes, that's true. So, is that what I was doing all along? It sure looks like it. Oops. Sorry for taking so long to change my mind, and thanks for your persistence and patience.

Comment by Juergen Schmidhuber:

Our former PhD student Shane Legg is co-founder of deepmind (with Demis Hassabis and Mustafa Suleyman), just acquired by Google for ~$500m. Several additional ex-members of the Swiss AI Lab IDSIA have joined deepmind, including Daan Wierstra, Tom Schaul, Alex Graves.


Yes, or in other words, these are the competent AGI researchers.

Upvoted for writing style.

I'm quite happy to hear that, but it's not very useful advice. I'm not an AIXI agent, so I can't deduce what's being praised solely from the fact that it is praised.
You can, however, glean information about how to write, particularly given that the reasoning was made explicit. That probably has more actual practical value for just about all readers.

Peter Norvig is at least in principle aware of some of the issues; see e.g. this article about the current edition of Norvig & Russell's AIMA (which mentions a few distinct ways in which AI could have very bad consequences and cites Yudkowsky and Omohundro).

I don't know what Google's attitude is to these things, but if it's bad then either they aren't listening to Peter Norvig or they have what they think are strong counterarguments, and in either case an outsider having a polite word is unlikely to make a big difference.

Peter Norvig was a resident at Hacker School while I was there, and we had a brief discussion about existential risks from AI. He basically told me that he predicts AI won't surpass humans in intelligence by so much that we won't be able to coerce it into not ruining everything. It was pretty surprising, if that is what he actually believes.
My guess is that most people at Google, who are working on AI, take those risks somewhat seriously (i.e. less seriously than MIRI, but still acknowledge them) but think that the best way to mitigate risks associated with AGI is to research AGI itself, because the problems are intertwined.

Microsoft seems to focus on AI as well:

Q: You are in charge of more than 1000 research labs around the world.

What kind of thing are you focusing on?

Microsoft: A big focus right now, really on point for this segment, is artificial intelligence.

We have been very focused.

It is our largest investment area right now.

...has someone had a polite word with them about not killing all humans by sheer accident?

If you believe this, DeepMind had to push for an ethics board, which suggests that people are mentioning it to Google and that Google is not taking the issue too seriously.

That interpretation seems tenuous. The sentence in which "pushed" is used:

The DeepMind-Google ethics board, which DeepMind pushed for, will devise rules for how Google can and can't use the technology.

suggests nothing more than that the proposal originated with DeepMind. I may as well imagine that Google's apparent amenability to the arrangement augurs well.

(Unless the article goes on to explain further? Not a subscriber.)

I'd expect their ethics board will in any case not be about the risk of killing all humans, but about things like privacy issues and the more mundane safety issues that you get when you connect a machine learning thing to a robot (or a car).
I'm sure Google will be right on that.

Well, someone's gotta do it.


A Weyland-Yutani style outcome is a far bigger risk. EDIT: Does this mean anti-trust laws probably should've hit them a long time ago?

Should've, sure. Didn't. And won't, in all likelihood. Google is very, very rich, influential, and popular with the public, so the chances of them getting taken down a notch legally (or in pretty much any other way) are low.

I was somewhat concerned when Google hired Kurzweil because he comes across as very Pollyanna-ish in his popular writings.

Now they're buying a company founded by the guy who created this game.

/sigh Yet another game I could play in my Copious Free Time. I really need to figure out how to make my morning routine more efficient so I don't end up distracted by the internet when I'm lacking a hard deadline, thus recovering a few hours a day of spare time.
The predictions in his popular writings have been pretty off base. More unsettling is the way he twists the words around to pretend they're accurate.
I'm most worried about the fact that Kurzweil argued that AGI would be no threat to humans because we would "merge with the machines". He always left vague how he knew that would happen, and how he knew that would stop AI from being a threat.
Agreed, especially since, from what I’ve seen, Kurzweil’s reason for being so sanguine about Global Warming is exponential growth. He doesn’t seem to reflect on the problems that Global Warming is causing right now, or that the growth in renewables has come in large part because of people who are concerned. And the idea that we shouldn’t worry isn’t reassuring when it comes from someone whose predictions of the future have mostly been incorrect. This is a man who stands by his predictions that by 2009, human musicians and cybernetic musicians would routinely play music together and that most text would come from voice recognition software, not keyboards. Anyone who takes him seriously should re-read that chapter with predictions for 2009 (which talks about 3D entertainment rooms, the growing popularity of computer authors, 3D art coming from computer artists being displayed on screens hung up on people’s houses, nanobots that think for themselves, the growing industry of creating the personalities for the artificial personas we routinely communicate with, etc.) and keep in mind that Kurzweil says his predictions were mostly accurate.
