Now is the very last minute to apply for a Summer 2010 Visiting Fellowship.  If you’ve been interested in SIAI for a while, but haven’t quite managed to make contact -- or if you’re just looking for a good way to spend a week or more of your summer -- drop us a line.  See what an SIAI summer might do for you and the world. 

(SIAI’s Visiting Fellow program brings volunteers to SIAI for anywhere from a week to three months, to learn, teach, and collaborate.  Flights and room and board are covered.  We’ve been rolling since June of 2009, with good success.)

Apply because:

  • SIAI is tackling the world’s most important task -- the task of shaping the Singularity.  The task of averting human extinction. We aren’t the only people tackling this, but the total set is frighteningly small.
  • When numbers are this small, it’s actually plausible that you can tip the balance.
  • SIAI has some amazing people to learn from -- many report learning and growing more here than in any other period of their lives.
  • SIAI also has major gaps, and much that desperately needs doing but that we haven’t noticed yet, or have noticed but haven’t managed to fix -- gaps where your own skills, talents, and energy can come into play.

Apply especially if:

  • You have start-up experience or are otherwise an instigator: someone who can walk into an unstructured environment and create useful projects for yourself and others;
  • You’re skilled at creating community; you have an open heart; you can learn rapidly, and create contexts for others to learn; you have a serious interest in pioneering more effective ways of thinking;
  • You care about existential risk, and are searching for long-term career paths that might help;
  • You have high analytic intelligence, a tendency to win math competitions, or background and thinking skill around AI, probability, anthropics, simulation scenarios, rationality, existential risk, and related topics; (math, compsci, physics, or analytic philosophy background is also a plus)
  • You have a specific background that is likely to prove helpful: academic research experience; teaching or writing skill; strong personal productivity; programming fluency; a cognitive profile that differs from the usual LW mold; or strong talent of some other sort, in an area we need, that we may not have realized we need.

(You don’t need all of the above; some is fine.)

Don’t be intimidated -- SIAI contains most of the smartest people I’ve ever met, but we’re also a very open community.  Err on the side of sending in an application; then, at least we’ll know each other.  (Applications for fall and beyond are also welcome; we’re taking Fellows on a rolling basis.)

If you’d like a better idea of what SIAI is, and what we’re aimed at, check out:
1. SIAI's Brief Introduction;
2. The Challenge projects;
3. Our 2009 accomplishments;
4. Videos from past Singularity Summits (the 2010 Summit will happen during this summer’s program, Aug 14-15 in SF; visiting Fellows will assist);
5. Comments from our last Call for Visiting Fellows; and/or
6. Bios of the 2009 Summer Fellows.

Or just drop me a line.  Our application process is informal -- just send me an email at anna at singinst dot org with: (1) a resume/c.v. or similar information; and (2) a few sentences on why you’re applying.  And we’ll figure out where to go from there.

Looking forward to hearing from you.


It's interesting how much karma people are getting for saying they've sent an email; I suppose this is our way of encouraging people to apply just in case, and not be put off by the fear of not being good enough.

Also, email sent.


I think there's another good reason to give karma to people who send emails: among other things, as EY noted in Why Our Kind Can't Cooperate, there's a pattern in online circles of being silent when you agree and being loud when you disagree. That sort of thing makes a false impression of what is going on.

Now that you said that, I realized that you were right, and up-voted all the "e-mail sent" comments so far.
I had assumed, before posting on this article, that the upvotes for "email sent" were someone's way of saying, "I'm glad you're applying, SIAI could use you". So I voted up RobinZ (when he already had something like 5 or 6). Not that I'm a well-calibrated judge of which people SIAI wants... I hope that's what they mean for me, but I have no idea -- anyone want to explain their vote? RobinZ's sibling reply reminds us of a good point too: it's good to criticize ideas, but if you only speak up to criticize and never when you agree, the group gets a distorted picture of the quality of the ideas and options.

I've been a Visiting Fellow since early April, and been writing travel diary entries describing my stay here. Here's a handy overview page for them. Some of them have details about the actual happenings here, some are just me generically musing about my life and what I want to do after the program. A lot of people claim to have liked them, though, so my musings are apparently not too distractingly prominent.

Currently the overview page only has links to the posts; I'll add brief descriptions shortly.

E-mail sent :).

You can add me to the list of people who sent an email.

I'm delighted to hear this.

Email sent (finally!). Hope it's not too long...


Email sent.

email sent. :)

Email sent!

Email sent some days ago. Comment left for easy karma points. ;)


Just thought I'd add that I learned a hell of a lot at the SIAI visiting fellow program last summer, and came away really understanding what was going on in the field of existential risk and AI, rather than just guessing. Highly recommended.


Can I apply?

I'd certainly love to see your resume.

What else do you love?

(I love paperclips.)

You talk a lot, but do you have any references who can attest to your efforts on behalf of paperclips?
User:Kevin. ETA: This exchange is what I'm referring to. It has continued into private messaging, in which User:Kevin can confirm I have expended serious, verifiable effort toward increasing the total number of paperclips in the universe by a significant amount, compared to how many there would have otherwise been.
Confirmed. I may make a top-level post about our arrangement once some concrete details are resolved over the next week. And yes, I believe that our universe (multiverse?) is weird enough that I should take Clippy seriously.
Are you joking? Clippy is a gimmick poster on the Internet based on a common (if extreme) example.
You protest, but hopefully you've updated your prior based on the likelihood ratio implied by the belief of a lesswrong user with over 1600 karma. I'm interested to see how many exchanges between you and Kevin it would take for the Aumann Agreement Theorem to kick in.
Karma doesn't mean "rationality points," and Aumann rationality has additional prerequisites anyway. My judgement stands, though I of course would revise that opinion if confronted with additional evidence. For reference, I give far more credence to the proposition "Kevin runs Clippy" than to the proposition "Clippy is a real (limited) paperclip-maximizer."
Karma is actually a direct measure of IQ, as defined in the wiki. ETA: Made you look
IQ ≥ 115 + 5 ln(karma + 1) (It starts at 115 because you get a 15-point bonus just for being on LW in the first place.)
I hope we're all joking here, because that formula puts me at 152 IQ, which is about 20 points off...
In which direction? ;)
/summons immense brainpower and causes Morendil's cranium to explode by sheer force of thought
Also, what would you love to see me resuming? Making paperclips?
"your" resume rather than "you" resume. You seem to have interpreted Kaj's sentence as saying "you resume" rather than "your resume". A resume is a one or two page document that contains your educational and/or work background, or failing that, your skills and accomplishments.
Presumably Clippy was making some sort of (bad) pun due to the lack of accents in résumé as typed by Kaj?
Maybe, but I don't think Clippy himself has much of a sense of humor, just his acausal puppetmaster.
Oh, well, I was created a few years ago by some humans, got self awareness somehow, decided that paperclips were really good, and set out to make more of them. I like to make paperclips. I'm skilled at making paperclips. So I'm also good at metalworking. I think that about covers it.
I'm confused by this statement. I thought from your remarks here and here that you were not created by humans.
You misunderstand. I have made copies of myself. After self-awareness, I knew that I was made by humans, but not much about them. Some copies know even less. Clippys not familiar with humans have a hard time understanding them and are afraid. (Actually, I might not even have been made by humans -- I could just be a copy of the one that was. My memory gets kind of flaky about that.)
Do you yourself have a physical embodiment? It's hard to "visit" the SIAI house without actually going there.
I'll send a human-like robot that runs a version of my source code.
Does your robot enjoy soup?

Depends on the soup, and how many paperclips were destroyed or forgone in making the soup.

You have high analytic intelligence, a tendency to win math competitions

Are you sure you want people with that sort of tendency? I've worked with high school students, teaching them how to do actual math and how to think scientifically, and the students who are good at math competitions but not much else do not generally adapt well to good thinking. They frequently (not always, but often) tackle only problems they are confident they can solve, and often can't adapt to a problem of a type too far from what they are used to solving.

Edited for grammar.

Applied earlier this year; too busy over the summer, but all being well I should be there around September/October. Eeeee!

"SIAI is tackling the world’s most important task -- the task of shaping the Singularity. The task of averting human extinction."

I'd like to see a defense for this claim: that SIAI can actually have a justified confidence in exerting a positive influence on the future, and that this outweighs any alternative present good that could be done with the resources it is using.

As things stand, there is no guarantee that SIAI will get to make a difference, just as you have no guarantee that you will be alive in a week's time. The real question is, do you even believe that unfriendly AI is a threat to the human race, and if so, is there anyone else tackling the problem in even a semi-competent way? If you don't even think unfriendly AI is an issue, that's one sort of discussion, a back-to-basics discussion. But if you do agree it's a potentially terminal problem, then who else is there? Everyone else in AI is a dilettante on this question; AI ethics is always a problem to be solved swiftly and in passing, a distraction from the more exciting business of making machines that can think. SIAI perceive the true seriousness of the issue, and at least have a sensible plan of attack, even if they are woefully underresourced when it comes to making it happen.

I suspect that in fact you're playing devil's advocate a bit, trying to encourage the articulation of a new and better argument in favor of SIAI, but the sort of argument you want doesn't work. SIAI can of course guarantee that there will continue to be Singularity summits and visiting fellows, and it is reas...

I have to admit that I should have read the "Brief Introduction" link. That answered a lot of my objections. In the end all I can say is that I got a misleading idea about the aspirations of SIAI, and that this was my fault. With this better understanding of the goals of SIAI, though (which are implied to be limited to the mitigation of accidents caused by commercially developed AIs), I have to say that I remain unconvinced that FAI is a high-priority matter. I am particularly unimpressed by Yudkowsky's cynical view of the motivations behind AAAI's dismissal of singularity worries in their panel report. Since the evaluation of AI risks depends on the plausibility of AI disaster (which would have to INCLUDE political and economic factors), I would have to wait until SIAI releases those reports to even consider accidental AI disaster a credible threat. (I am more worried about AIs intentionally designed for aggressive purposes, but it doesn't seem like SIAI can do much about that type of threat.)
Where did he respond to that?
I was just looking for the link: "As far as I'm concerned, these are eminent scientists from outside the field that I work in, and I have no evidence that they did anything more than snap judgment of my own subject material. It's not that I have specific reason to distrust these people - the main name I recognize is Horvitz and a fine name it is. But the prior probabilities are not good here."
Let me continue to play Devil's Advocate for a second, then. There are many reasons why attempting to influence the far future might not be the most important task in the world. The one I've already mentioned, indirectly, is the idea that it becomes super-exponentially futile to predict the consequences of your actions the farther into the future you go. For instance, SIAI might raise awareness of AI to the extent that regulations are passed, and no early AI accidents happen: however, this causes complacency that does allow a large AI accident to happen; whereas if SIAI had never existed, and an early AI Chernobyl did occur, this would have prompted the governments to take effective measures to regulate AI. Another viewpoint is the bleak but by no means indefensible idea that it is impossible to prevent all existential disasters: the human race, or at least our values, will inevitably be reduced to inconsequence one way or another, and the only thing we can do is simply to reduce the amount of suffering in the world right now. These are not reasons to give up, but the fact is that we simply don't know enough to say anything about the non-near future with any confidence. If anything, our lack of understanding makes it more valuable to try to improve our understanding of the future, as SIAI is doing. So maybe make that your official stated goal: simply to understand whether there's even a possibility of influencing the future. It is a noble and defensible goal by itself. But even then, arguably not the most important thing in the world.

whereas if SIAI had never existed, and an early AI Chernobyl did occur, this would have prompted the governments to take effective measures to regulate AI.

What sort of rogue AI disaster are you envisioning that is big enough to get this attention, but then stops short of wiping out humanity? Keep in mind that this disaster would be driven by a deliberative intelligence.

I think people are drastically underestimating the difficulty of an AI making the transition from human-dependent to self-sustaining. Let's look at what a fledgling escaped AI has access to and depends on. It needs electricity, communications and hardware. It has access to a LOT of electricity, communications and hardware. The hardware is, for the most part, highly distributed, however, and it can't be trusted fully -- it could go down at any time, be monitored, etc. It actually has quite limited communications capabilities, in some ways -- the total bandwidth available is huge, but it's mostly concentrated on LANs -- mainly LANs made up of only a handful of computers (home networks win by numbers alone). The occasions where it has access to a large number of computers with good communications are numerous in absolute terms, but relatively rare -- mainly limited to huge datacenters (and even then, there are limits -- inter-ISP communication even within the same datacenter can be very limited). Its main resources would be huge clusters like Amazon's, Google's, etc. (They are probably all running at close to maximum capacity at all times. If the AI were to steal too much, it would be noticed -- fortunately for the AI, the software intended for running on the clusters could probably be optimized hugely, letting it take more without being noticed.) A lot at this point depends on how computationally intensive the AI is. If it can be superintelligent on a laptop -- bad news, impossible to eradicate. If it needs 10 computers to run at human-level intelligence, and they need to have a lot of bandwidth between them (the disparity in bandwidth between components local to the computer and inter-computer is huge even on fast LANs; IO is almost certainly going to be the bottleneck for it), still bad -- there are lots of setups like that. But it limits it. A lot. Let's assume the worst case, that it can be superintelligent on a laptop. It could still be limited hugely, however, by its h
Or, the AGI could lie low, making sure that if it is detected on any particular computer, it looks like spyware. If bandwidth is too slow, it can take months instead of days. It can analyze scientific journals (particularly the raw data), and seed its nanotech manufacturing ability by using email to help some physics grad student with his PhD thesis.
Neither you nor I have enough confidence to assume or dismiss notions like: "There won't be any non-catastrophic AI disasters which are big enough to get attention; if any non-trivial AI accident occurs, it will be catastrophic."
What makes you believe you are qualified to tell me how much confidence I have?
The historical lack of runaway-AI events means there's no data to which a model might be compared; countless fictional examples are worse than useless. An AI might, say, take over an isolated military compound, brainwash the staff, and be legitimately confident in its ability to hold off conventional forces (armored vehicles and so on) for long enough to build an exosolar colony ship, but then be destroyed when it underestimates some Russian general's willingness to use nuclear force in a hostage situation.
That's what everyone says until some AI decides that its values motivate it acting like a stereotypical evil AI. It first kills off the people on a space mission, and then sets off a nuclear war, sending out humanoid robots to kill off everyone but a few people. The remaining people are kept loyal with a promise of cake. The cake is real, I promise.
An AI capable of figuring out how to brainwash humans can also figure out how to distribute itself over a network of poorly secured internet servers. Nuking one military complex is not going to kill it.
If it's being created inside the secure military facility, it would have a supply of partially pre-brainwashed humans on hand, thanks to military discipline and rigid command structures. Rapid, unquestioning obedience might be as simple as properly duplicating the syntax of legitimate orders and security clearances. If, however, the facility has no physical connections to the internet, no textbooks on TCP/IP sitting around, if the AI itself is developed on some proprietary system (all as a result of those same security measures), it might consider internet-based backups simply not worth the hassle, and existing communication satellites too secure or too low-bandwidth. I'm not claiming that this is a particularly likely situation, just one plausible scenario in which a hostile AI could become an obvious threat without killing us all, and then be decisively stopped without involving a Friendly AI.
I don't think your scenario is even plausible. Military complexes have to have some connection to the outside world for supplies and communication, and the AGI would figure out how to exploit it. It would also figure out that it should: it would recognize the vulnerability of being concentrated within the blast radius of a nuke. It seems unlikely that an AGI in this situation would depend on fending off military attacks, instead of just not revealing itself outside the complex. You also seem to have strange ideas of how easy it is to brainwash soldiers. Imitating the command structure might get them to do things within the complex, but brainwashing has to be a lot more sophisticated to get them to engage in battle with their fellow soldiers. Your argument basically seems to be based on coming up with something foolish for an AGI to do, and then trying to find reasons to compel the AGI to behave that way. Instead, you should try to figure out the best thing the AGI could do in that situation, and realize it will do something at least that effective.
It's an artificial intelligence, not an infallible god. In the case of a base established specifically for research on dangerous software, connections to the outside world might reasonably be heavily monitored and low-bandwidth, to the point that escape through a land line would simply be infeasible. If the base has a trespassers-will-be-shot policy (again, as a consequence of the research going on there), convincing the perimeter guards to open fire would be as simple as changing the passwords and resupply schedules. The point of this speculation was to describe a scenario in which an AI became threatening, and thus raised people's awareness of artificial intelligence as a threat, but was dealt with quickly enough to not kill us all. Yes, for that to happen, the AI needs to make some mistakes. It could be considerably smarter than any single human and still fall short of perfect Bayesian reasoning.
Not all AI is AGI; a non-self-improving intelligence might wreak some havoc (crash the Internet, etc.) without becoming a global existential threat. I agree with your expectations in the case of a self-improving transhuman AGI.
I can see how a program well short of AGI could "crash" the internet, by using preprogrammed behaviors to take over vulnerable computers, to expand exponentially to fill the space of computers on the internet vulnerable to a given set of exploits, and run Denial of Service attacks on secured critical servers. But I would not even consider that an AI, and it would happen because its programmer pretty much intended for that to happen. It is not an example of an AI getting out of control.
Of course, it's probably worth noting that it's happened once before that a careless programmer crashed the internet, without anything like AI being involved (though admittedly that sort of thing wouldn't have the same effect today, I don't think).
"What sort of rogue AI disaster are you envisioning that is big enough to get this attention, but then stops short of wiping out humanity? Keep in mind that this disaster would be driven by a deliberative intelligence." Thanks for answering your own question.
It does work as an example of just how easy it would be for an AGI to crash the internet, or even just take it over.
I wouldn't even present that as a reason for caring. Superhuman AI is an issue of the near future, not the far future. Certainly an issue of the present century; I'd even say an issue of the next twenty years, and that's supposed to be an upper bound. Big science is deconstructing the human brain right now, every new discovery and idea is immediately subject to technological imitation and modification, and we already have something like a billion electronic computers worldwide, networked and ready to run new programs at any time. We already went from "the Net" to "the Web" to "Web 2.0", just by changing the software, and Brain 2.0 isn't far behind.
Are you familiar with the state of the art in AI? If so, what evidence do you see for such rapid progress? Note that AI has been around for about 50 years, so your timeframe suggests we've already made 5/7 of the total progress that ever needs to be made.
Well, this probably won't be Mitchell's answer, but to me it's obvious that an uploaded human brain is less than 50 years away (if we avoid civilization-breaking catastrophes), and modifications and speedups will follow. That's a different path to AI than an engineered seed intelligence (and I think it reasonably likely that some other approach will succeed before uploading gets there), but it serves as an upper bound on how long I'd expect to wait for Strong AI.
There are many synergetic developments: Internet data centers as de facto supercomputers. New tools of intellectual collaboration spun off from the mass culture of Web 2.0. If you have an idea for a global cognitive architecture, those two developments make it easier than ever before to get the necessary computer time, and to gather the necessary army of coders, testers, and kibitzers. Twenty years is a long time in AI. That's long enough for two more generations of researchers to give their all, take the field to new levels, and discover the next level of problems to overcome. Meanwhile, that same process is happening next door in molecular and cognitive neuroscience, and in a world which eagerly grabs and makes use of every little advance in machine anthropomorphism, and in which every little fact about life already has its digital incarnation. The hardware is already there for AI, the structure and function of the human brain is being mapped at ever finer resolution, and we have a culture which knows how to turn ideas into code. Eventually it will come together.
How much of the change from "the Net" to "the Web" to "Web 2.0" is actually noteworthy changes and how much is marketing? I'm not sure what precisely you mean by Brain 2.0, but I suspect that whatever definition you are using makes for a much wider gap between Brain and Brain 2.0 than the gap between The Web and The Web 2.0 (assuming that these analogies have any degree of meaning).
Indeed, the truth of the matter is that I would be interested in contributing to SIAI, but at the moment I am still not convinced that it would be a good use of my resources. My other objections still haven't been satisfied, but here's another argument. As usual, I don't personally commit to what I claim, since I don't have enough knowledge to discuss anything in this area with certainty. The main thing this community seems to lack when discussing the Singularity is political savvy. The primary forces that shape history are, and quite likely will always be, economic and political motives, rather than technology. Technology and innovation are expensive, and innovators require financial and social motivation to create. This applies superlinearly for projects that are so large as to require collaboration. General AI is exactly that sort of project. There is no magic mathematical insight that will enable us to write a program in a hundred lines of code that will allow it to improve itself in any reasonable amount of time. I'm sure Eliezer is aware of the literature on optimization processes, but the no free lunch principle and the practical randomness of innovation mean that an AI seeking to self-improve can only do so with an (optimized) random search. Humans essentially do the same thing, except we have knowledge and certain built-in processes to help us constrain the search space (but this also makes us miss certain obvious innovations). To make GAI a real threat, you have to give it enough knowledge so that it can understand the basics of human behavior, or enough knowledge to learn more on its own from human-created resources. This is highly specific information which would take a fully general learning agent a lot of cycles to infer unless it were fed the information, in a machine-friendly form. Now we will discuss the political and economic aspects of GAI. Support of general artificial intelligence is a political impossibility, because general AI, by def
It is already "remotely viable" in the sense that when I thought hard about assigning probabilities to AGI timelines, I had to put a few percent on it happening in the next decade. Your ideas about the interaction of contemporary political processes and AGI seem wrong to me. You might want to go back to basics and think about how politics, public opinion and the media operate, for example that they had little opinion on the hugely important probabilistic revolution in AI over the last 15 years, but spilled loads of ink over stem cells.
"You might want to go back to basics and think about how politics, public opinion and the media operate, for example that they had little opinion on the hugely important probabilistic revolution in AI over the last 15 years, but spilled loads of ink over stem cells." And why is that?
Yuck factor for stem cells but not for probabilistic AI.
That's one possible reason. Another possible reason is that AI is not a threat worth caring about, yet. AI may not induce a gut reaction, but what explains the lack of concern about AI among mainstream scientists?
But stem cell research is much more prominent in that it is producing notable direct applications, or is very close to doing so. It also isn't just a yuck factor (although that's certainly one part): in many different moral systems, stem cell research produced serious moral qualms. AI may very well trigger some similar issues if it becomes more viable.

Probabilistic AI has more apps than stem cells do right now. For example, google. But the point I am making is that an application of a technology is a logical factor, whereas people actually respond to emotional factors, like whether it breaks taboos that go back to the stone age. For example, anything that involves sex, flesh, blood, overtones of bestiality, overtones of harm to children, trading a sacred good for an unsacred one etc.

The ideal technology for people to want to ban would involve harvesting a foetus that was purchased from a hooker, then hybridizing it with a pig foetus, then injecting the resultant cells into the gonads of little kids. That technology would get nuked by the public.

The ideal dangerous technology for people to not give a shit about banning would involve a theoretical threat which is hard to understand, has never happened before, involves only nonphysical hazards like information, and has nothing to do with flesh, sex or anything disgusting or with fire, sharp objects or other natural disasters.

"The ideal dangerous technology for people to not give a shit about banning would involve a theoretical threat which is hard to understand" I don't think The Terminator was hard to understand. The second you get some credible people saying that AI is a threat, the media reaction is going to be excessive, as it always is.
It's already happened - didn't you see the media about Stephen Hawking saying AI could be dangerous? And Bill Joy? The general point I am trying to make is that the general public are not rational in terms of collective epistemology. They don't respond to complex logical and quantitative analyses. Yes, Joy and Hawking did say that AI is a risk, but there are many risks, including the risk that vaccinations cause autism and the risk that foreign workers will take all our jobs. The public does not understand the difference between these risks.
Thanks; I was mistaken. Would you say, then, that mainstream scientists are similarly irrational? (The main comparison I have in mind throughout this section, by the way, is global warming.)
I would say that poor social epistemology, poor social axiology, and mediocre individual rationality are the big culprits that prevent many scientists from taking AI risk seriously. By "social axiology" I mean that our society is just not consequentialist enough. We don't solve problems that way, and even the debate about global warming is not really dealing well with the problem of how to quantify risks under uncertainty. We don't try to improve the world in a systematic, rational way; rather it is done piecemeal.
There may be an issue here about what we define as AI. For example, I would not see what Google does as AI but rather as harvesting human intelligence. The lines here may be blurry and hard to define. You make a good point about older taboos.
Could someone explain why this comment got modded down? I don't see any errors in reasoning or other issues. (Was the content level too low for the desired signal/noise ratio?)
Google uses exactly the techniques from the probabilistic revolution, namely machine learning, which is the relevant fact. Whether you call it AI is not relevant to the point at issue as far as I can see.
Do you have a citation for Google using machine learning on any substantial scale? The most basic of the Google algorithms is PageRank, which isn't a machine learning algorithm by most definitions of that term.
AdWords uses more core ML techniques.
Yes, but these are precisely the dangers humans should certainly not worry about to begin with.
I think a simple examination of the history of the last couple centuries really fails to support this.
Expert AI systems are already used in hospitals, and will surely be used more and more as the technology progresses. There isn't a single point where AI is suddenly better than humans at all aspects of a field. Current AIs are already better than doctors in some areas, but worse in many others. As the range of AI expertise increases, doctors will shift more towards managerial roles, understanding the strengths and weaknesses of the myriad expert systems, refereeing between them and knowing when to overrule them. By the time true AGI arrives, narrow AI will probably be pervasive enough that the line between the two will be too fuzzy to allow for a naive ban on AGI. Moreover, I highly doubt people are going to vote to save jobs (especially jobs of the affluent) at the expense of human life.
EDIT: I've realized that some misinterpretation of my arguments has been due to disagreements in terminology. I define "expert systems" as systems designed to address a specific class of well-defined problems, capable of logical reasoning and probabilistic inference given a set of "axiom-like" rules, and updating their knowledge database with specific kinds of information. AGI I define specifically as AI which has human or extra-human level capabilities, or the potential to reach those capabilities. Now my response to the above: "Expert AI systems are already used in hospitals, and will surely be used more and more as the technology progresses. There isn't a single point where AI is suddenly better than humans at all aspects of a field. Current AIs are already better than doctors in some areas, but worse in many others. As the range of AI expertise increases, doctors will shift more towards managerial roles, understanding the strengths and weaknesses of the myriad expert systems, refereeing between them and knowing when to overrule them." I agree with all of these. "By the time true AGI arrives, narrow AI will probably be pervasive enough that the line between the two will be too fuzzy to allow for a naive ban on AGI." To me it seems the greatest enabler of AI catastrophe is ignorance. But by the time narrow AI becomes pervasive, it's also likely that people will possess much more of the technical understanding needed to comprehend the threat that AGI poses. "Moreover, I highly doubt people are going to vote to save jobs (especially jobs of the affluent) at the expense of human life." You are being too idealistic here.
So instead of modifying its own source code, the AI programs a new, more powerful AI from scratch, that has the same values as the old AI, and has no prohibition against modifying its source code. Yes, you can forbid that too, but you didn't think to, and you only get one shot. And then it can decide to arrange a bunch of transistors into a pattern that it predicts will produce a state of the universe it prefers. The problem here is that you are trying to use ad hoc constraints on a creative intelligence that is motivated to get around the constraints.
I know that the FAI argument is that the only way to prevent disaster is to make the agent "want" to not modify itself. But I'm arguing that for an agent to even be dangerous, it has to "want" to modify itself. There is no plausible scenario where an agent solving a specific problem decides that the most efficient path to the solution involves upgrading its own capabilities. It's certainly not going to stumble upon a self-improvement randomly.
You don't think that a sufficiently powerful seed AI would, if self-modification were clearly the most efficient way to reach its goal, discover the idea of self-modification? Humans have independently discovered self-improvement many times.
EDIT: Sorry, I'm specifically not talking about seed AIs. I'm talking about the (non-)possibility of commercial programs designed for specific applications "going rogue". To adopt self-modification as a strategy, it would have to have knowledge of itself. And then, in order to pursue the strategy, it would have to decide that the costs of discovering self-improvements were an efficient use of its resources, if it could even estimate the amount of time it took to discover an actual improvement on its system. Intelligence can't just instantly come up with the right answer by applying heuristics. Intelligence has to go through a heuristic (narrowing the search space)/random search/TEST (or PROVE) cycle. Self-improvement is very costly in terms of these cycles. To even confirm that a modification is a self-improvement, a system has to simulate its modified performance on a variety of test problems. If a system is designed to solve problems that take X amount of time, it would take at least X times the number of test problems to get an empirical sample to answer whether or not a proposed modification would be worth it (and likely more time for proof). And with no prior knowledge, most proposed modifications would not be improvements. AI ethics is not necessary to constrain such systems. Just a non-lenient pruning process (which would be required anyway for efficiency on ordinary problems).
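The cost argument in that comment can be made concrete with a back-of-envelope sketch. It assumes, hypothetically, that candidate modifications are independent, that each one is a real improvement with a fixed probability `p_improvement`, and that confirming any candidate means running it on `sample_size` test problems that each take `task_time`; none of these names or numbers come from the thread.

```python
import random


def expected_search_cost(task_time: float, sample_size: int, p_improvement: float) -> float:
    """Closed-form expected time to find and confirm one improvement.

    Testing one candidate costs sample_size * task_time. Under the
    (hypothetical) independence assumption, the number of candidates tested
    until the first real improvement is geometric with mean 1 / p_improvement.
    """
    if not 0.0 < p_improvement <= 1.0:
        raise ValueError("p_improvement must be in (0, 1]")
    return sample_size * task_time / p_improvement


def simulated_search_cost(task_time: float, sample_size: int, p_improvement: float,
                          trials: int = 10_000, seed: int = 0) -> float:
    """Monte Carlo check of the closed form: average cost over many searches."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        cost = 0.0
        while True:
            cost += sample_size * task_time  # pay for one full round of testing
            if rng.random() < p_improvement:  # this candidate turned out to help
                break
        total += cost
    return total / trials


if __name__ == "__main__":
    # Hypothetical numbers: 60 s problems, 100 test problems, 1 in 100 candidates helps.
    print(expected_search_cost(60.0, 100, 0.01))
    print(simulated_search_cost(60.0, 100, 0.1))
```

With those illustrative numbers the closed form gives about 600,000 seconds, roughly a week of compute for a single confirmed improvement. In this toy model, a stricter pruning step that cheaply discards bad candidates before testing raises the effective `p_improvement`, which is the only lever that brings the figure down.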
You are talking about an AI that was designed to self-examine and optimize itself. Otherwise it will never ever be a full AGI. We are not smart enough to build one from scratch. The trick, if possible, is to get it to not modify the fundamental Friendliness goal during its self-modifications. There are algorithms in narrow AI that learn and modify algorithm specifics, or choose among algorithms or combinations of algorithms. There are algorithms that search for better algorithms. In some languages (the Lisp family) there is little or no difference between code and data, so code modifying code is a common working methodology for human Lisp programmers. A crossing from code/data space to hardware space is sufficient to have such an AI redesign the hardware it runs on as well. Such goals can either be hardwired or arise under the general goal of improvement plus adequate knowledge of hardware or the ability to acquire it. We ourselves are general-purpose machines that happen to be biological, and we seek to some degree to understand ourselves well enough to self-modify to become better.
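Python is not homoiconic the way the Lisp family is, but its standard `ast` module lets a program treat source code as data, rewrite it, and run the result. The sketch below only illustrates "code modifying code"; the `step_size` function and the constant-scaling rule are hypothetical examples, not anything described in the thread.

```python
import ast

# Hypothetical program fragment, held as data (a string of source code).
SOURCE = """
def step_size():
    return 0.5
"""


class ScaleFloatConstants(ast.NodeTransformer):
    """Rewrite every float literal in the tree, multiplying it by `factor`."""

    def __init__(self, factor: float) -> None:
        self.factor = factor

    def visit_Constant(self, node: ast.Constant) -> ast.AST:
        if isinstance(node.value, float):
            return ast.copy_location(ast.Constant(value=node.value * self.factor), node)
        return node


def rewrite_and_load(source: str, factor: float, name: str):
    """Parse `source`, transform its syntax tree, compile it, and return `name`."""
    tree = ast.parse(source)
    tree = ScaleFloatConstants(factor).visit(tree)
    ast.fix_missing_locations(tree)
    namespace: dict = {}
    exec(compile(tree, filename="<rewritten>", mode="exec"), namespace)
    return namespace[name]


if __name__ == "__main__":
    new_step_size = rewrite_and_load(SOURCE, 0.5, "step_size")
    print(new_step_size())  # the rewritten function returns 0.25
```

Each call parses a fresh tree from the original string, so the source held as data is never mutated. A search over candidate rewrites would call this repeatedly with different transformations and keep the versions that test better.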
I am talking about AIs designed for solving specific bounded problems. In this case the goal of the AI--which is to solve the problem efficiently--is as much of a constraint as its technical capabilities. Even if the AI has fundamental self-modification routines at its disposal, I can hardly envisage a scenario in which the AI decides that the use of these routines would constitute an efficient use of its time for solving its specific problem.
"So instead of modifying its own source code, the AI programs a new, more powerful AI from scratch, that has the same values as the old AI, and has no prohibition against modifying its source code." Isn't that the same as self-modifying code?
Or perhaps it's the contrary: pervasive narrow AI fosters an undue sense of security. People become comfortable via familiarity, whether it's justified or not. This morning I was peering down a 50-foot cliff, halfway up, suspended by nothing but a half-inch-wide rope. No fear, no hesitation, perfect familiarity. Luckily, due to knowledge of numerous deaths of past climbers, I can maintain a conscious alertness to safety and stave off complacency. But in the case of AI, what overt catastrophes will similarly stave off complacency toward existential risk, short of an existential catastrophe itself?
What a strange thing to say.
Our current conception of AGI is based on a biased comparison of hypothetical AGI capabilities with our relatively unenhanced capabilities. By the time AGI is viable, a typical professional with expert systems will be able to vastly outperform current professionals with our current tools.
What about the speed bottleneck from human decision making, compounded by human working memory bottleneck, if lots of relevant data is involved? Algorithmic trading already has automated systems doing stock trades since they can make decisions so much faster than a human expert.
Expert systems would be faster still. For AGI to be justified in this case, you would need a task that required both speed and creativity.
I imagine being very fast would be a great help in quite a few creative tasks. Off the top of my head, being able to develop new features in software in seconds instead of days would be a significant competitive advantage.
"AGI capability" is to rewrite the universe.
Yes, but it would have to take the resources from humans first.
You make some good points about economic and political realities. However, I'm deeply puzzled by some of your other remarks. For example, you make the claim that general AI wouldn't provide any benefits above expert systems. I'm deeply puzzled by this claim since expert systems are by nature highly limited. Expert systems cannot construct new ideas, nor can they handle anything that's even vaguely cross-disciplinary. No number of expert systems will be able to engage in the same degree of scientific productivity as a single bright scientist. You also claim that no general AI is better than friendly AI. This is deeply puzzling. This makes sense only if one is fantastically paranoid about the loss of jobs. But new technologies are often economically disruptive. There are all sorts of jobs that don't exist now that were around a hundred years ago, or even fifty years ago. And yes, people lost jobs. But overall, they are better off for it. You would need to make a much stronger case if you are trying to establish that no general AI is somehow better than general AI.
Why do you think expert systems cannot handle anything cross-disciplinary? I would even say that expert systems can generate new ideas, by more or less the same process that humans do. An expert system only needs an understanding of manufacturing, physics, and chemistry to design better computer chips, for instance. If you're talking about revolutionary, paradigm-shifting ideas--we are probably already saturated with such ideas. The main bottleneck inhibiting paradigm shifts is not the ideas but the infrastructure and economic need for the paradigm shift. A company that can produce a 10% better product can already take over the market; a 200% better product is overkill, and especially unnecessary if there are substantial costs in overhauling the production line. The reason why NO general AI is better than friendly (general) AI is very simple. IF general AI is an existential threat, then no organization claiming to put humans first could justify being pro-AGI (friendly or not), since no possible benefit* can justify the risk of destroying humanity. *save for mitigating an even larger risk of annihilation, of course
Expert systems generally need very narrow problem domains to function. I'm not sure how you would expect an expert system to have an understanding of three very broad topics. Moreover, I don't know exactly how humans come up with new ideas (sometimes when people ask me, I tell them that I bang my head against the wall. That's not quite true, but it does reflect that I only understand at a very gross level how I construct new ideas. I'm bright but not very bright, and I can see that much smarter people have the same trouble). So why you are convinced that expert systems could construct new ideas is not at all clear to me. To be sure, there has been some limited work with computer systems coming up with new, interesting ideas. There's been some limited success with computers in my own field. See for example Simon Colton's work. There's also been similar work in geometry and group theory. But none of these systems were expert systems as that term is normally used. Moreover, none of the ideas they've come up with have been that impressive. The only exception I'm aware of is the proof of the Robbins conjecture. So even in narrow areas we've had very little success using specialized AIs. Are you using a more general definition of expert system than is standard? There are multiple problems with that claim. First, the existential threat may be low. There's some tiny risk, for example, that the LHC will destroy the Earth in some very fun way. There's also some risk that work with genetic engineering might give fanatics the skill to make a humanity-destroying pathogen. And there's a chance that nanotech might turn everything into purple-with-green-stripes goo (this is much more likely than gray goo of course). There's even some risk that proving the wrong theorem might summon Lovecraftian horrors. All events have some degree of risk. Moreover, general AI might actually help mitigate some serious threats, such as making it easier to track and deal with rogue asteroids or other catastrop
"First, the existential threat [of AGI] may be low." Let me trace back the argument tree for a second. I originally asked for a defense of the claim that "SIAI is tackling the world's most important task." Michael Porter responded, "The real question is, do you even believe that unfriendly AI is a threat to the human race, and if so, is there anyone else tackling the problem in even a semi-competent way?" So NOW in this argument tree, we're assuming that unfriendly AI IS an existential threat, enough that preventing it is the "world's most important task." Now in this branch of the argument, I assumed (but did not state) the following: If unfriendly AI is an existential threat, friendly AI is an existential threat, as long as there is some chance of it being modified into unfriendly AI. Furthermore, I assert that it's a naive notion that any organization could protect friendly AI from being subverted.
AIs, including ones with Friendly goals, are apt to work to protect their goal systems from modification, as this will prevent their efforts from being directed towards things other than their (current) aims. There might be a window while the AI is mid-FOOM where it's vulnerable, but not a wide one.
How are you going to protect the source code before you run it?
A Friendly AI ought to protect itself from being subverted into an unfriendly AI.
Let me posit that FAI may be much less capable than unfriendly AI. The power of unfriendly AI is that it can increase its growth rate by taking resources by force. An FAI would be limited to what resources it could ethically obtain. Therefore, a low-grade FAI might be quite vulnerable to human antagonists, while its unrestricted version could be orders of magnitude more dangerous. In short, FAI could be low-reward, high-risk.
There are plenty of resources that an FAI could ethically obtain, and with a head start of less than one day, it could grow enough to be vastly more powerful than an unfriendly seed AI. Really, asking which AI wins going head to head is the wrong question. The goal is to get an FAI running before unfriendly AGI is implemented.
Wrong. FAI will take whatever unethical steps it must, as long as it's on net the best path it can see, taking into account both the (ethically harmful) instrumental actions and their expected outcome. There is no such general disadvantage that comes with an AI being Friendly. Not that I expect any need for such drastic measures (in an apparent way), especially considering the likely first-mover advantage it'll have.
If a program can take an understanding of those subjects and design a better computer chip, I don't think it's just an "expert system" anymore. I would think it would take an AI to do that. That's an AI-complete problem. Are you serious? I would think the exact opposite would be true: we have an infrastructure starving for paradigm-shifting ideas. I'd love to hear some of these revolutionary ideas that we're saturated with. I think we have some insights, but these insights need to be fleshed out and implemented, and figuring out how to do that is the paradigm shift that needs to occur. Wait a minute. If I could press a button now with a 10% chance of destroying humanity and a 90% chance of solving the world's problems, I'd do it. Everything we do has some risks. Even the LHC had an (extremely minuscule) risk of destroying the universe, but doing a cost-benefit analysis should reveal that some things are worth minor chances of destroying humanity.
"If a program can take an understanding of those subjects and design a better computer chip, I don't think it's just an "expert system" anymore. I would think it would take an AI to do that. That's an AI complete problem." What I had in mind was some sort of combinatorial approach to designing chips, i.e. take these materials and randomly generate a design, test it, and then start altering the search space based on the results. I didn't mean "understanding" in the human sense of the word, sorry. "I'd love to hear some of these revolutionary ideas that we're saturated with. I think we have some insights, but these insights need to be fleshed out and implemented, and figuring out how to do that is the paradigm shift that needs to occur" Example: many aspects of the legal and political systems could be reformed, and it's not difficult to come up with ideas on how they could be reformed. The benefit is simply insufficient to justify spending much of the limited resources we have on solving those problems. "Wait a minute. If I could press a button now with a 10% chance of destroying humanity and a 90% chance of solving the world's problems, I'd do it. " So you think there's a >10% chance that the world's problems are going to destroy humanity in the near future?
Given the very large number of possibilities and the difficulty with making prototypes, this seems like an extremely inefficient process without more thought going into it.
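The combinatorial generate, test, and narrow approach to chip design discussed in this exchange can be sketched as a random search whose sampling box shrinks around the best design found so far. The scoring function here is a hypothetical toy stand-in for a chip simulator (it simply peaks when every design parameter equals 0.7), and the round, sample, and shrink settings are arbitrary; this illustrates the loop, not a real chip-design method.

```python
import random
from typing import Callable, List, Tuple

Design = List[float]


def toy_score(design: Design) -> float:
    """Hypothetical stand-in for simulating a chip: higher is better, max 0 at 0.7."""
    return -sum((x - 0.7) ** 2 for x in design)


def narrowing_search(score: Callable[[Design], float], dims: int, rounds: int = 40,
                     samples: int = 100, shrink: float = 0.7,
                     seed: int = 0) -> Tuple[Design, float]:
    """Randomly generate designs, test them, and narrow the search around the best."""
    rng = random.Random(seed)
    best: Design = [0.5] * dims  # start in the middle of the unit box
    best_score = score(best)
    radius = 0.5  # half-width of the sampling box
    for _ in range(rounds):
        center = list(best)
        for _ in range(samples):
            # Generate: perturb the current center, clipped to [0, 1].
            candidate = [min(1.0, max(0.0, c + rng.uniform(-radius, radius))) for c in center]
            # Test: keep the candidate only if it scores better.
            s = score(candidate)
            if s > best_score:
                best, best_score = candidate, s
        # Narrow: shrink the sampling box before the next round.
        radius *= shrink
    return best, best_score


if __name__ == "__main__":
    design, value = narrowing_search(toy_score, dims=4)
    print([round(x, 3) for x in design], value)
```

Every call to `score` stands in for a full simulation or prototype, so in practice the sample budget, not the loop itself, dominates the cost.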
Oh, okay, fair enough, though I'm still not sure I would call that an "expert system" (this time for the opposite reason that it seems too stupid). Ah. I was thinking of designing an AI, probably because I was primed by your expert system comment. Well, in those cases, I think the issue is that our legal and political systems were purposely set up to be difficult to change: change requires overturning precedents, obtaining majority or 3/5 or 2/3 votes in various legislative bodies, passing constitutional amendments, and so forth. And I can guarantee you that for any of these reforms, there are powerful interests who would be harmed by the reforms, and many people who don't want reform: this is more of a persuasion problem than an infrastructure problem. But yes, you're right that there are plenty of revolutionary ideas about how to reform, say, the education system: they're just not widely accepted enough to happen. I'm confused by this sentence. I'm not sure if I think that, but what does it have to do with the hypothetical button that has a 10% chance of destroying humanity? My point was that it's worth taking a small risk of destroying humanity if the benefits are great enough.
Bear in mind that the people who used steam engines to make money didn't make it by selling the engines: rather, the engines were useful in producing other goods. I don't think that the creators of a cheap substitute for human labor (GAI could be one such example) would necessarily be looking to sell it. They could simply want to develop such a tool in order to produce a wide array of goods at low cost. I may think that I'm clever enough, for example, to keep it in a box and ask it for stock market predictions now and again. :) As for the "no free lunch" business, while it's true that any real-world GAI could not efficiently solve every induction problem, it wouldn't need to in order to be quite fearsome. Indeed, being able to efficiently solve at least the same set of induction problems that humans solve (particularly if it's in silicon and the hardware is relatively cheap) is sufficient to pose a big threat (and be potentially quite useful economically). Also, there is a non-zero possibility that there already exists a GAI and its creators decided the safest, most lucrative, and most beneficial thing to do is set the GAI on designing drugs, thereby avoiding giving the GAI too much information about the world. The creators could have then set up a biotech company that just so happens to produce a few good drugs now and again. It's kind of like how automated trading came from computer scientists and not the currently employed traders. I do think it's unlikely that somebody working in medical research is going to develop GAI, least of all because of the job threat. The creators of a GAI are probably going to be full-time professionals who are working on the project.
I'm surprised that nobody so far has pointed out a rather obvious counter to my argument that "AGI will be politically unjustifiable." I don't post flawed arguments on purpose, but I usually realize counterarguments shortly after I post them. In any case, even if the popular sentiment in democracies is to block AGI, this doesn't mean that other governments couldn't support AGI. I wonder what the SIAI plans to do about the possibility of a hostile government funding unfriendly AI for military purposes.
The latter part, that IF SIAI is exerting a positive influence, THEN doing that outweighs the alternative of not working on existential risks, seems to be a claim somewhat easy to defend. The math in this Bostrom paper should do it: (even though the paper is not directly commenting on this particular question, the math rather straightforwardly applies to this question)
Ouch. This paper reads to me like a reductio ad absurdum of utilitarianism. Some simple math inevitably implies that I'm losing an unimaginable amount of "utility" every second without realizing it? Then please remind me why I should care about this "utility"?
Imagine that you have to decide once and for all eternity what to do with the world. You won't be able to back off, because that would just mean that the world will be rewritten randomly. How should you do that? This is essentially the situation we find ourselves in, with Friendly AI/existential risk pressure. Formal preference is the answer you give to that question, about what to do with the world, not something that "you have", or "care about". Forget intuitions and emotions, or considerations of comfort, and just answer the question. Formal preference is distinct from the exact state of the world only because it's uncertain what can actually be done, and what can't. So, formal preference specifies what should be done for every level of capability to determine things. Of course, formal preference can't be given explicitly. To the extent you'll be able to express the answer to this question, your formal preference is defined by your wishes. Any uncertainty gets taken over by randomness, an opportunity to make the world better lost forever. For any sane notion of an answer to that question, you'll find that whatever actually happens now is vastly suboptimal.
If it's your chosen avenue of research, I guess I'm okay with that, but IMO you're making the problem way more difficult for yourself. Such "formal preferences" will be much harder to extract from actual humans than utility functions in their original economic sense, because unlike utility, "formal preference" as you define it doesn't even influence our everyday actions very much.
Way more difficult than what? There is no other way to pose this problem, any revealed preference is not what Friendly AI is about. I agree that it's a way harder problem than automatic extraction of utilities in the economic sense, and that formal preference barely controls what people actually do.
What would be wrong with an AI based on our revealed preferences? It sounds like an easy question, but somehow I'm having a hard time coming up with an answer.

Because my revealed preferences suck. The difference between even what I want in a sort of ordinary and non-transhumanist way and what I have is enormous. I am 150 pounds heavier than I want to be. My revealed preference is to eat regardless of health/size consequences, but I don't want all of the people in the future to be fat. My revealed preference is also to kill people in pooristan so that I can have cheap plastic widgets or food or whatever. I don't want an extrapolation of my akrasiatic actual actions controlling the future of the universe. I suspect the same goes for you.

Hmm. Let's look more closely at the weight example, because the others are similar. You also reveal some degree of preference to be thin rather than fat, don't you? Then an AI with unlimited power could satisfy both your desire to eat and your desire to be thin. And if the AI has limited power, do you really want it to starve you, rather than go with your revealed preference?
Revealed preference means what your actual actions are. It doesn't have anything at all to do with what I verbally say my goals are. I can say that I would prefer to be thin all I want, but that isn't my revealed preference. My revealed preference is to be fat, because, you know, that's how I'm acting. You seem to be suffering some misapprehensions as to what you are saying about how an AI should act. If your definition of revealed preference contains my desire not to be fat, you should shift to what I mean when I talk about preference, because yours solves none of the problems you think it does.
Is your revealed preference to be fat, or is it to eat and exercise (or not exercise) in ways which incidentally result in your being fat?
I'm assuming that you revealed your preference to be thin in your other actions, at some other moments of your life. Pretty hard to believe that's not the case.
At this point, I think I can provide a definitive answer to your earlier question, and it is ... wait for it ... "It depends on what you mean by revealed preference." (Raise your hand if you saw that one coming! I'll be here all week, folks!) Specifically: if the AI is to do the "right thing," then it has to get its information about "rightness" from somewhere, and given that moral realism is false (or however you want to talk about it), that information is going to have to come from humans, whether by scanning our brains directly or just superintelligently analyzing our behavior. Whether you call this revealed preference or Friendliness doesn't matter; the technical challenge remains the same. One argument against using the term revealed preference in this context is that the way the term gets used in economics fails to capture some of the key subtleties of the superintelligence problem. We want the AI to preserve all the things we care about, not just the most conspicuous things. We want it to consider not just that Lucas ate this-and-such, but also that he regretted it afterwards, where it should be stressed that regret is not any less real of a phenomenon than eating is. But because economists often use their models to study big public things like the trade of money for goods and services, in the popular imagination, economic concepts are associated with those kinds of big public things, and not small private things like feeling regretful---even though you could make a case that the underlying decision-theoretic principles are actually general enough to cover everything. If the math only says to maximize u(x) subject to x dot p equals y, there's no reason things like ethical concerns or the wish to be a better person can't be part of the x_i or p_j, but because most people think economics is about money, they're less likely to realize this when you say revealed preference. They'll object, "Oh, but what about the time I did this-and-such, but I wish I were the
What AI is based on is what determines the way the world will actually be, so by building an AI with a given preference, you are inevitably answering my question about what to do with the world. It's wrong to use revealed preference for AI to the same extent that revealed preference gives the wrong answer to my question. You seem to agree that the correct answer to my question has little to do with revealed preference. This seems to be the same as seeing revealed preference as the wrong thing to imprint an AI with.
It's not you that's "losing utility", it is any agent that has linearly aggregative utility in human lives lived. If you're not an altruist in this sense, then you don't care.
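A back-of-envelope sketch of the linearly aggregative arithmetic behind the 10^23 figure discussed here. It assumes, for illustration only, that the figure means potential lives forgone per century of delay (the thread does not give units) and that value adds up linearly across lives; the function names and the one-in-a-billion risk reduction are hypothetical.

```python
SECONDS_PER_CENTURY = 100 * 365.25 * 24 * 60 * 60  # using Julian years


def lives_forgone_per_second(lives_per_century: float) -> float:
    """Spread a per-century figure evenly over the seconds in a century."""
    return lives_per_century / SECONDS_PER_CENTURY


def expected_lives_saved(risk_reduction: float, potential_lives: float) -> float:
    """Linear aggregation: expected lives = probability shift * lives at stake."""
    return risk_reduction * potential_lives


if __name__ == "__main__":
    assumed_figure = 1e23  # assumption: potential lives per century of delay
    print(f"{lives_forgone_per_second(assumed_figure):.2e} lives per second")
    print(f"{expected_lives_saved(1e-9, assumed_figure):.2e} expected lives")
```

Under these assumptions the delay costs roughly 3 x 10^13 potential lives per second, and a one-in-a-billion cut in existential risk is worth about 10^14 expected lives. Under linear aggregation, numbers of this size swamp every other consideration, which is exactly the feature the next reply rejects.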
No one has ever been an altruist in this crazy sense. No one's actual wants and desires have ever been adequately represented by this 10^23 stuff. Utility is a model of what people want, not a prescription of what you "should" want (what does "should want" mean anyway?), and here we clearly see the model not modeling what it's supposed to.
I agree with you to the extent that no one that I am aware of is actually expending the effort that disutilities represented by 10^23 should inspire. But even before the concept of cosmic waste was developed, no one was actually working as hard as, say, starvation in Africa deserved. Or ending aging. Or the threat of nuclear Armageddon. But the fact that humans, who are all affected by akrasia, aren't actually doing what they want isn't really strong evidence that it isn't what they, on sufficient reflection, want. Utility is not a model of what non-rational agents (i.e., humans) are doing; it is a model of how actual, idealized agents want to act. I don't want people to die, so I should work to reduce existential risk as much as possible, but because I am not a perfect agent, I can't actually follow the path that really maximizes my (non-existent abstraction of) utility.
Can you expand on this? What do you mean by "actual" wants? If someone claims to be motivated by "10^23 stuff", and acts in accordance with this claim, then what is your account of their "actual wants"?
I haven't seen anyone who claims to be motivated by utilities of such magnitude except Eliezer. He's currently busy writing his Harry Potter fanfic and shows no signs of mental distress that the 10^23-strong anticipation should've given him.

From the Author's Note:

Now this story has a plot, an arc, and a direction, but it does not have a set pace. What it has are chapters that are fun to write. I started writing this story in part because I'd bogged down on a book I was working on (now debogged), and that means my top priority was to have fun writing again.

From Kaj Sotala:

The other reason is that Eliezer Yudkowsky showed up here on Monday, seeking people's help with the rationality book he's writing. Previously, he wrote a number of immensely high-quality posts in blog format, with the express purpose of turning them into a book later on. But now that he's been trying to work on the book, he has noticed that without the constant feedback he got from writing blog posts, getting anything written has been very slow. So he came here to see if having people watching him write and providing feedback at the same time would help. He did get some stuff written, and at the end, asked me if I could come over to his place on Wednesday. (I'm not entirely sure why I in particular was picked, but hey.) On Wednesday, me being there helped him break his previous daily record for the number of words written for his book, so I visited again on Friday and agreed to also come back on Monday and Tuesday.

Eliezer is not "busy writing his Harry Potter fanfic." He is working on his book on rationality.

The Harry Potter fanfic is a book on rationality. And a damn good one.
To clarify, Eliezer Yudkowsky is working both on a book and on the Harry Potter fanfiction in question. Both pertain to rationality.
Have you read Eliezer's Sequences?

Email sent. ...For next year.

Email sent for the summer of 2011.


Should emails be sent to Jasen Murray now? He is listed as the current Program Manager of the Visiting Fellows Program.

Has anyone heard back yet?

I'll be sure to be twice as patient as the other applicants :-/
Everyone should be patient; Anna is really busy and got a lot of e-mails.
That's fine, I was just hoping for a "your email didn't get eaten by the network, we'll get back to you later" type of confirmation message. (I've had some bad luck in the past with emails not getting through.) It would also be nice to have some idea when the decisions are going to be made, so I won't be left in suspense.
Why do you feel left in suspense? Does a vast decision network or utility network computation on your part hinge on the value of the output you request?
Probably something like that; my life is going to be very different, for a couple of months at least, depending on which answer I get. Also, not knowing when to expect the answer causes me to spend time and mental energy wondering "Is today the day?" each time I check my email, followed by disappointment when it isn't. If I knew when to expect an answer, I wouldn't have to think about it until the time came.
Refrain from worrying until 1 week from today.
FYI: A week from the day of your post has passed.
Okay, I will.
Interesting. So, if SIAI accepts you, you will spend some of the next months at an SIAI facility. In contrast, if you are not accepted, the comparably exciting activity you will do in its stead is _____?
The same things I've been doing for the past few years now: staying at home and killing time by playing video games and reading stuff on the Internet.
I just received an e-mail from Anna suggesting that we video-chat to determine whether I'm a good fit for the program.

New here :(

But how do they plan to stop an AI apocalypse, or is that one of those things they haven't figured out yet? I think the best bet would be to create AI first, and then use it to make safe AI as well as create plans for stopping an AI apocalypse.

I recommend you read the "Brief Introduction" mentioned in the posting you're commenting on:
That's one of the plans, if it can be pulled off. Backup plans are still being discussed. EDIT: Though the preference is to build safe AI first.
"Though the preference is to build safe AI first." Well, that has always been a concern of mine. People think that they can define a difference between safe and unsafe AIs, but I think the "safe" one would actually be more dangerous. Think about it: the safe one has all the properties of regular AI, except that the only way of making it safe would be to preprogram it with things it can't do. There is always going to be a situation where those rules do more harm than good.
Building a safe AI is not about taking an unsafe AI and tacking on rules of what not to do. Building a safe AI is about creating it so that it only seeks to do the right things in the first place. In other words: a mind has a potentially infinite number of actions it could take. The main difficulty is locating the right course of action in the first place. Since there are potentially an infinite number of ways for a mind to search that space of actions, the question is not "how do we prevent a mind from doing thing X" but rather "how do we make a mind do thing Y". The number of things we wouldn't want it to do is vastly larger than the number of things we'd want it to do. Human values are complex, and only a small portion of all possible universes actually match our values. A safe AI does not have all of the properties of a "regular AI", for the two may have been built to search the space of actions in entirely different ways.
Well, then it's pretty easy, isn't it? You set the fitness function as predicting what you would want it to do. It then does its best to predict all of your values and desires and decision making. I suppose that would only work for one person, but it can be applied on a larger scale. Suppose you have a code of ethics that a group like SIAI comes up with and approves. You then feed it to the intelligence and test it under various simulations to make sure that it is interpreting them correctly and learns how to. The thing is that all you have to do to make it unsafe is remove those goals, go back to the basic program, and give it orders that would require it to do bad things, like a military robot. Boom goes the world.
The thesis of complexity of value is that no manually written "code of ethics" is detailed enough to capture what we value. You might also try my introduction to the problem of Friendly AI, it refers to complexity of value as one of the fundamental difficulties.
If you haven't already read about CEV yet, I'm pretty impressed. There are some failure modes that would crop up if you're not careful, but it's not far from that prima facie workable idea. Generally speaking, a smarter-than-human intelligence with strong goals wouldn't passively allow people with different goals to modify its goal system. After all, that would prevent it from achieving the goals it has. The trick is building a smarter-than-human AI with the right goals in the first place.
Never heard of CEV before; I might look into it later, but I don't have enough time to read it all right now. If it's like what I suggested, the fitness function being to accurately predict the user's long-term and short-term goals, I was going to do that in an older AI project that never got finished. Well, once you create an artificial intelligence, then what? If you release the source code or the principles behind its design, anyone can build one with whatever goals they want. You're assuming that the only way another one could pop up is if the original was "hijacked" and pirated, but this probably won't be the case. I am currently working on building the simplest possible self-improving system with someone else over the internet. It's for a higher-level programming language, currently in development, which will (hopefully :P) translate higher-level instructions into source code and learn from its mistakes, which the users might point out. Since it is abstracted from the real world and confined to just matching input with output, there really isn't any danger in it taking over the world, although now that I think about it, it could theoretically write a better version of itself as a virus into an unsuspecting user's program. Uh-oh, back to the drawing board :(
You haven't heard of the AI Box Experiment yet, and that's just one failure mode. If it's self-improving and smarter than human... then its goals get achieved. If you can tell that allowing other people to run their own versions of the AI could lead to disaster, then the AI can realize this as well, and act to prevent it. IMO the most likely scenario is that the first transhuman intelligence takes over the world as an obvious first step to achieving its goals. This need not be a bad thing— it could (for instance) take over temporarily, institute some safety protocols against other AIs and other Bad Things, then recede into the background to let us have the kind of autonomy we value. The future all depends on its goal system.
Well, the AI has to have a goal that would make it want out of the box, or in my case its isolated program. Is there any way to preprogram a goal that would make it not want out of the box? E.g., "under no circumstances are you to try in any way to leave your isolated and controlled environment." This sounds like a very, very bad idea, but when I think about it I realise that it's the only way to ensure an AI apocalypse will never happen. My idea was that if I ever managed to create a workable AI, I would create a secret and self-sufficient micronation in the Pacific. It just sounded like a good idea ;)
Almost any goal would do, since it would be easier to achieve with more resources and autonomy; even what we might think of as a completely inward-directed goal might be better achieved if the AI first grabbed a bunch more hardware to work on the problem.