A friend of mine is about to launch himself heavily into the realm of AI programming. The details of his approach aren't important; probabilities dictate that he is unlikely to score a major success. He's asked me for advice, however, on how to design a safe(r) AI. I've been pointing him in the right directions and sending him links to useful posts on this blog and the SIAI.

Do people here have any recommendations they'd like me to pass on? Hopefully, these may form the basis of a condensed 'warning pack' for other AI makers.

Addendum: Advice along the lines of "don't do it" is vital and good, but unlikely to be followed. Coding will nearly certainly happen; is there any way of making it less genocidally risky?

211 comments

"And I heard a voice saying 'Give up! Give up!' And that really scared me 'cause it sounded like Ben Kenobi." (source)

Friendly AI is a humongous damn multi-genius-decade sized problem. The first step is to realize this, and the second step is to find some fellow geniuses and spend a decade or two solving it. If you're looking for a quick fix you're out of luck.

The same (albeit to a lesser degree) is fortunately also true of Artificial General Intelligence in general, which is why the hordes of would-be meddling dabblers haven't killed us all already.

This article (which I happened across today) written by Ben Goertzel should make interesting reading for a would-be AI maker. It details Ben's experience trying to build an AGI during the dot-com bubble. His startup company, Webmind, Inc., apparently had up to 130 (!) employees at its peak.

According to the article, the AGI was almost completed, and the main reason his effort failed was that the company ran out of money due to the bursting of the bubble. Together with the anthropic principle, this seems to imply that Ben is the person responsible for the stock market crash of 2000.

I was always puzzled why SIAI hired Ben Goertzel to be its research director, and this article only deepens the mystery. If Ben has done an Eliezer-style mind-change since writing that article, I think I've missed it.

ETA: Apparently Ben has recently been helping his friend Hugo de Garis build an AI at Xiamen University under a grant from the Chinese government. How do you convince someone to give up building an AGI when your own research director is essentially helping the Chinese government build one?

Ben has a PhD, can program, has written books on the subject and has some credibility. Those kinds of things can help a little if you are trying to get people to give you money in the hope of you building a superintelligent machine. For more see here [http://lesswrong.com/lw/wj/is_that_your_true_rejection/]:
I just came across an old post [http://www.sl4.org/archive/0711/17100.html] of mine that asked a similar question: From the reluctance of anyone at SIAI to answer this question, I conclude that Ben Goertzel being the Director of Research probably represents the outcome of some internal power struggle/compromise at SIAI, whose terms of resolution included the details of the conflict being kept secret. What is the right thing to do here? Should we try to force an answer out of SIAI, for example by publicly accusing it of not taking existential risk seriously? That would almost certainly hurt SIAI as a whole, but might strengthen "our" side of this conflict. Does anyone have other suggestions for how to push SIAI in a direction that we would prefer?

The short answer is that Ben and I are both convinced the other is mostly harmless.

Have you updated that in light of the fact that Ben just convinced the Chinese government to start funding AGI? (See my article link earlier in this thread.)
Eliezer Yudkowsky (12y):
Hugo de Garis is around two orders of magnitude more harmless than Ben.

Update for anyone that comes across this comment: Ben Goertzel recently tweeted that he will be taking over Hugo de Garis's lab, pending paperwork approval.



What about all the other people Ben might help obtain funding for, partly due to his position at SIAI? And what about the public relations/education aspect? It's harmless that SIAI appears to not consider AI to be a serious existential risk?
This part was not answered. It may be a question to ask someone other than Eliezer. Or just ask really loudly. That sometimes works too.
Eliezer Yudkowsky (12y):
The reverse seems far more likely.
I don't know how to parse that. What do you mean by "the reverse"?
Ben's position at SIAI may reduce the expected amount of funding he obtains for other existentially risky persons.
How much of this harmlessness is perceived impotence and how much is it an approximately sane way of thinking?
Eliezer Yudkowsky (12y):
Wholly perceived impotence.
Do you believe the given answer [http://lesswrong.com/lw/1mm/advice_for_ai_makers/1h8n?c=1]? And if Ben is really that impotent, what do you think it reveals about the SIAI, or whoever put Ben into a position within the SIAI?
I don't know enough about his capabilities when it comes to contributing to unfriendly AI research to answer that. Being unable to think sanely about friendliness or risks may have little bearing on your capabilities with respect to AGI research. The modes of thinking have very little bearing on each other. That they may be more rational and less idealistic than I may otherwise have guessed. There are many potential benefits the SIAI could gain from an affiliation with those inside the higher status AGI communities. Knowing who to know has many uses unrelated to knowing what to know.
Indeed. I read part of this post [http://lesswrong.com/lw/wj/is_that_your_true_rejection/] as implying that his position had at least a little bit to do with gaining status from affiliating with him ("It has similarly been a general rule with the Singularity Institute that, whatever it is we're supposed to do to be more credible, when we actually do it, nothing much changes. 'Do you do any sort of code development? I'm not interested in supporting an organization that doesn't develop code' -> OpenCog -> nothing changes. 'Eliezer Yudkowsky lacks academic credentials' -> Professor Ben Goertzel installed as Director of Research -> nothing changes.").
That's an impressive achievement! I wonder if they will be able to maintain it? I also wonder whether they will be able to distinguish those times when the objections are solid, not merely something to treat as PR concerns. There is a delicate balance to be found.
Does this suggest that founding a stealth AGI institute (to coordinate conferences, and communication between researchers) might be suited to oversee and influence potential undertakings that could lead to imminent high-risk situations? By the way, I noticed from my server logs that the Institute for Defense Analyses [http://en.wikipedia.org/wiki/Institute_for_Defense_Analyses] seems to be reading LW. They visited my homepage, referred by my LW profile. So one should think about the consequences of discussing such matters in public, or of not doing so.

By the way, I noticed from my server logs that the Institute for Defense Analyses seems to be reading LW.

Most likely, someone working there just happens to.

Can we know how you came to that conclusion?
There is one 'mostly harmless' for people who you think will fail at AGI. There is an entirely different 'mostly harmless' for actually having a research director who tries to make AIs that could kill us all. Why would I not think the SIAI is itself an existential risk if the criteria for director recruitment are so lax? Being absolutely terrified of disaster is the kind of thing that helps ensure appropriate mechanisms to prevent defection are kept in place. Yes. The SIAI has to convince us that they are mostly harmless.
Phew...I was almost going to call bullshit on this but that would be impolite.
That is an excellent question.
And now for a truly horrible thought: I wonder to what extent we've been "saved" so far by anthropics. Okay, that's probably not the dominant effect. I mean, yeah, it's quite clear that AI is, as you note, REALLY hard. But still, I can't help but wonder just how little or much that's there.
If you think anthropics has saved us from AI many times, you ought to believe we will likely die soon, because anthropics doesn't constrain the future, only the past. Each passing year without catastrophe should weaken your faith in the anthropic explanation.
The first sentence seems obviously true to me, the second probably false. My reasoning: to make observations and update on them, I must continue to exist. Hence I expect to make the same observations & updates whether or not the anthropic explanation is true (because I won't exist to observe and update on AI extinction if it occurs), so observing a "passing year without catastrophe" actually has a likelihood ratio of one, and is not Bayesian evidence for or against the anthropic explanation.
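The likelihood-ratio point above can be put into numbers. This is only a rough sketch and not anything posted in the thread: the 50% prior and both yearly risk figures are invented for illustration, and `posterior` is a hypothetical helper. It contrasts a naive update on "a year passed without catastrophe" with an update that conditions on the observer still existing, where both hypotheses predict the observation with probability one.

```python
from fractions import Fraction

def posterior(prior, like_h, like_not_h):
    """Bayes' rule for a binary hypothesis H after one observation."""
    num = prior * like_h
    return num / (num + (1 - prior) * like_not_h)

# Illustrative assumptions only; none of these numbers come from the thread.
prior = Fraction(1, 2)              # P(anthropic explanation is true)
risk_if_anthropic = Fraction(1, 2)  # assumed yearly extinction risk if anthropics "saved" us
risk_if_not = Fraction(1, 100)      # assumed yearly extinction risk otherwise

# Naive update: treat the quiet year as ordinary evidence.
naive = posterior(prior, 1 - risk_if_anthropic, 1 - risk_if_not)

# Survival-conditioned update: an observer who exists sees "no catastrophe"
# with probability 1 under either hypothesis, so the likelihood ratio is 1.
conditioned = posterior(prior, Fraction(1), Fraction(1))

print(naive)        # 50/149, belief in the anthropic explanation drops
print(conditioned)  # 1/2, belief is unchanged
```

Which of the two updates is the right one is exactly what this subthread disagrees about; the sketch only shows what each side's arithmetic looks like.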
Wouldn't the anthropic argument apply just as much in the future as it does now? The world not being destroyed is the only observable result.
The future hasn't happened yet.
Right. My point was that in the future you are still going to say "wow the world hasn't been destroyed yet" even if in 99% of alternate realities it was. cousin_it said: Which shouldn't be true at all. If you cannot observe a catastrophe happen, then not observing a catastrophe is not evidence for any hypothesis.
"Not observing a catastrophe" != "observing a non-catastrophe". If I'm playing Russian roulette and I hear a click and survive, I see good reason to take that as extremely strong evidence that there was no bullet in the chamber.
But doesn't the anthropic argument still apply? Worlds where you survive playing Russian roulette are going to be ones where there wasn't a bullet in the chamber. You should expect to hear a click when you pull the trigger.
As it stands, I expect to die (p=1/6) if I play Russian roulette. I don't hear a click if I'm dead.
That's the point. You can't observe anything if you are dead, therefore any observations you make are conditional on you being alive.
Those universes where you die still exist, even if you don't observe them. If you carry your logic to its conclusion, there would be no risk to playing Russian roulette, which is absurd.
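The two quantities being argued over in the Russian roulette exchange can be separated by simply enumerating the cases. A minimal sketch, assuming the standard setup of one bullet in six equally likely chamber positions (the thread only states p=1/6); the variable names are made up for illustration.

```python
from fractions import Fraction

CHAMBERS = 6  # assumed: one bullet, six equally likely positions

# Each "world" is one possible bullet position; position 0 is the chamber that fires.
worlds = [{"bullet_fires": pos == 0} for pos in range(CHAMBERS)]
survivors = [w for w in worlds if not w["bullet_fires"]]

# Probability of dying, counted over all worlds.
p_death = Fraction(len(worlds) - len(survivors), len(worlds))

# Probability of having heard a click, counted only over worlds with a living observer.
p_click_given_alive = Fraction(
    sum(not w["bullet_fires"] for w in survivors), len(survivors)
)

print(p_death)              # 1/6
print(p_click_given_alive)  # 1
```

Every surviving observer heard a click, yet one world in six has no observer at all. The disagreement is over which of these two numbers should drive the decision to play.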
The standard excuse given by those who pretend to believe in many worlds is that you are likely to get maimed in the universes where you get shot but don't die, which is somewhat unpleasant. If you come up with a more reliable way to quantum suicide, like using a nuke, they find another excuse [http://en.wikipedia.org/wiki/Quantum_suicide_and_immortality#Max_Tegmark.27s_work] .
Methinks that is still a lack of understanding, or a disagreement on utility calculations. I myself would rate the universes where I die as lower utility still than those where I get injured (indeed the lowest possible utility). Better still if in all the universes I don't die.
I do think 'a disagreement on utility calculations' may indeed be a big part of it. Are you a total utilitarian? I'm not. A big part of that comes from the fact that I don't consider two copies of myself to be intrinsically more valuable than one - perhaps instrumentally valuable, if those copies can interact, sync their experiences and cooperate, but that's another matter. With experience-syncing, I am mostly indifferent to the number of copies of myself to exist (leaving aside potential instrumental benefits), but without it I evaluate decreasing utility as the number of copies increases, as I assign zero terminal value to multiplicity but positive terminal value to the uniqueness of my identity. My brand of utilitarianism is informed substantially by these preferences. I adhere to neither average nor total utilitarianism, but I lean closer to average. Whilst I would be against the use of force to turn a population of 10 with X utility each into a population of 3 with (X + 1) utility each, I would in isolation consider the latter preferable to the former (there is no inconsistency here - my utility function simply admits information about the past).
That line of thinking leads directly to recommending immediate probabilistic suicide, or at least indifference to it. No thanks.
How so?
I'm saying that you can only observe not dying. Not that you shouldn't care about universes that you don't exist in or observe. The risk in Russian roulette is, in the worlds where you do survive you will probably be lobotomized, or drop the gun shooting someone else, etc. Ignoring that, there is no risk. As long as you don't care about universes where you die.
Ok. I find this assumption absolutely crazy, but at least I comprehend what you are saying now.
Well think of it this way. You are dead/non-existent in the vast majority of universes as it is.
How is that relevant? If I take some action that results in the death of myself in some other Everett branch, then I have killed a human being in the multiverse. Think about applying your argument to this universe. You shoot someone in the head, they die instantly, and then you say to the judge "well think of it this way: he's not around to experience this. besides, there's other worlds where I didn't shoot him, so he's not really dead!"
You can't appeal to common sense. That's the point of quantum immortality: it defies our common sense notions about death. Obviously, since we are used to assuming a single-threaded universe, where death is equivalent to ceasing to exist. Of course, if you kill someone, you still cause that person pain in the vast majority of universes, as well as grief to their family and friends. If Star Trek-style teleportation was possible by creating a clone and deleting the original, is that equivalent to suicide/murder/death? If you could upload your mind to a computer but destroy your biological brain, is that suicide, and is the upload really you? Does destroying copies really matter as long as one lives on (assuming the copies don't suffer)?
You absolutely can appeal to common sense on moral issues. Morality is applied common sense, in the Minsky view of "common sense" being an assortment of deductions and inferences extracted from the tangled web of my personal experiential and computational history. Morality is the result of applying that common sense knowledgebase against possible actions in a planning algorithm. Quantum "immortality" involves a sudden, unexpected, and unjustified redefinition of "death." That argument works if you buy the premise. But, I don't. If you are saying that there is no difference between painlessly, instantaneously killing someone in one branch while letting them live in another, versus letting that person live in both, then I don't know how to proceed. If you're going to say that then you might as well make yourself indifferent to the arrow of time as well, in which case it doesn't matter if that person dies in all branches because he still "exists" in history. Now I no longer know what we are talking about. According to my morality, it is wrong to kill someone. The existence of other branches where that person does not die does not have even epsilon difference on my evaluation of moral choices in this world. The argument from the other side seems inconsistent to me. And yes, Star Trek transporters and destructive uploaders are death machines, a position [http://lesswrong.com/lw/jgd/link_consciousness_as_a_state_of_matter_max/] I've previously [http://lesswrong.com/lw/qx/timeless_identity/9txe] articulated [http://lesswrong.com/lw/iya/singularity_or_bust_full_documentary/a0fw] on LessWrong.
You are appealing to a terminal value that I do not share. I think caring about clones is absurd. As long as one copy of me lives, what difference does it make if I create and delete a thousand others? It doesn't change my experience or theirs. Nothing would change and I wouldn't even be aware of it.
From my point of view, I do not like the thought that I might be arbitrarily deleted by a clone of myself. I therefore choose to commit to not deleting clones of myself; thus preventing myself from being deleted by any clones that share that commitment.
I don't think this is quite true (it can redistribute probability between some hypotheses). But this strengthens your position rather than weakening it.
Ok, correct. Retracted: Not correct. What was I thinking? Just because you don't observe the universes where the world was destroyed, doesn't mean those universes don't exist.
That's the justification he gave me: he won't be able to make much of a difference to the subject, so he won't be generating much risk. Since he's going to do it anyway, I was wondering whether there were safer ways of doing so.

For useful-tool AI, learn stuff from statistics and machine learning before making any further moves.

For self-improving AI, just don't do it as AI; FAI is not quite an AI problem, and anyway most techniques associated with "AI" don't work for FAI. Instead, learn fundamental math and computer science, to a good level -- that's my current best in-a-few-words advice for would-be FAI researchers.

Isn't every AI potentially a self-improving AI? All it takes is for the AI to come upon the insight "hey, I can build an AI to do my job better." I guess it requires some minimum amount of intelligence for such an insight to become likely, but my point is that one doesn't necessarily have to set out to build a self-improving AI, to actually build a self-improving AI.
I'm very much out of touch with the AI scene, but I believe the key distinction is between Artificial General Intelligence, versus specialized approaches like chess-playing programs or systems that drive cars. A chess program's goal structure is strictly restricted to playing chess, but any AI with the ability to formulate arbitrary sub-goals could potentially stumble on self-improvement as a sub-goal.
Additionally, the actions that a chess AI can consider and take are limited to moving pieces on a virtual chess board, and the consequences of such actions that it considers are limited to the state of the chess game, with no model of how the outside world affects the opposing moves other than the abstract assumption that the opponent will make the best move available. The chess AI simply does not have any awareness of anything outside the chess game.
A good chess AI would not be so constrained. A history of all chess games played by the particular opponent would be quite useful. As would his psychology. Is it worth me examining the tree beyond this particular move further? How long will it take me (metacognitive awareness...) relative to my time limit? Unless someone gives them such awareness, which may be useful in some situations or may just seem useful to naive developers who get their hands on more GAI research than they can safely handle.
Such a history would also consist of a list of moves in a virtual chess game. If you are very naive it's unlikely that you understand the problem of AI well enough to solve it.
Today's specialized AIs have little chance of becoming self-improving, but as specialized AIs adopt more advanced techniques (like the ones Nesov suggested), the line between specialized AIs and AGIs won't be so clear. After all, chess-playing and car-driving programs can always be implemented as AGIs with very specific and limited super-goals, so I expect that as AGI techniques advance, people working on specialized AIs will also adopt them, but perhaps without giving as much thought about the AI-foom problem.
I would think that specialization reduces the variant trees that the AI has to consider, which makes it unlikely that implementing AGI techniques would help the chess playing program.
It is not clear to me that the AGI wouldn't (eventually) be able to do everything that a specialised program would (and more). After all, humans are a general intelligence and can specialise; some of us are great chess players, and if we stretch the word specialise, creating a chess AI also counts (it's a human effort to create a better optimisation process for winning chess). So I imagine an AGI, able to rewrite its own code, would at the same time be able to develop the techniques of specialised AIs, while considering broader issues that might also be of use (like taking over the world/lightcone to get more processing power for playing chess). Just like humanity making chess machines, it could discover and implement better techniques (and if it breaks out of the box, hardware), something the chess programs themselves cannot do. Or maybe I'm nuts. /layman ignoramus disclaimer/ but in that case I'd appreciate a hint at the error I'm making (besides being a layman ignoramus). :) EDIT: scary idea, but an AGI with the goal of becoming better at chess might only not kill us because chess is perhaps a problem that's generally soluble with finite resources.

Create a hardware device that would be fatal to the programmer. Allow it to be activated by a primitive action that the program could execute. Give the primitive a high apparent utility. Code the AI however he wants.

If he gets cold sweats every time he does a test run, the rest of us will probably be OK.

I suggest that working in the field of brain emulation is a way for anyone to actively contribute to safety.

If emulations come first, it won't take a miracle to save the human race; our existing systems of politics and business will generate a satisficing solution.

I figure that would be slow, ineffectual and probably more dangerous than other paths in the unlikely case that it was successful.
You think that there's something more dangerous than the human race, who can't quite decide whether global warming should be mitigated against, trying to build an AI, where you have to get the answer pretty close to perfect first time, whilst also preventing all other groups from rushing to beat you and building uFAI?
I'm not sure that is a proper sentence. I do think that we could build something more dangerous to civilization than the human race is at that time - but that seems like a rather obvious thing to think - and the fact that it is possible does not necessarily mean that it is likely.
Key Noun phrase: the human race,..., trying to build an AI, Then: {description of difficulty of said activity} I'm not sure it's proper either, but I'm sure you misparsed it.
Yay, that really helped! Roko and I don't see eye to eye on this issue. From my POV, we have had 50 years of unsuccessful attempts. That is not exactly "getting it right the first time". Google was not the first search engine, Microsoft was not the first OS maker - and Diffie–Hellman didn't invent public key crypto. Being first does not necessarily make players uncatchable - and there's a selection process at work in the meantime, that weeds out certain classes of failures. From my perspective, this is mainly a SIAI confusion. Because their funding is all oriented around the prospect of them saving the world from imminent danger, the execution of their mission apparently involves exaggerating the risks associated with that - which has the effect of stimulating funding from those who they convince that DOOM is imminent - and that the SIAI can help with averting it. Humans will most likely get the machines they want - because people will build them to sell them - and because people won't buy bad machines.

Tim, I think that what worries me is the "detailed reliable inheritance from human morals and meta-morals" bit. The worry that there will not be "detailed reliable inheritance from human morals and meta-morals" is robust to what specific way you think the future will go. Ems can break the inheritance. The first, second or fifteenth AGI system can break it. Intelligence enhancement gone wrong can break it. Any super-human "power" that doesn't explicitly preserve it will break it.

All the examples you cite differ in the substantive dimension: the failure of attempt number 1 doesn't preclude the success of attempt number two.

In the case of the future of humanity, the first failure to pass the physical representation of human morals and metamorals on to the next timeslice of the universe is game over.

The other thing to say is that there's an important sense in which most modern creatures don't value anything - except for their genetic heritage - which all living things necessarily value. Contrast with a gold-atom maximiser. That values collections of pure gold atoms. It cares about something besides the survival of its genes (which obviously it also cares about - no genes, no gold). It strives to leave something of value behind. Most modern organisms don't leave anything behind - except for things that are inherited - genes and memes. Nothing that they expect to last for long, anyway. They keep dissipating energy gradients until everything is obliterated in high-entropy soup. Those values are not very difficult to preserve - they are the default state. If ecosystems cared about creating some sort of low-entropy state somewhere, then that property would take some effort to preserve (since it is vulnerable to invasion by creatures who use that low-entropy state as fuel). However, with the current situation, there aren't really any values to preserve - except for those of the replicators concerned. The idea has been called variously: goal system zero, god's utility function, Shiva's values. Even the individual replicators aren't really valued in themselves - except by themselves. There's a parliament of genes, and any gene is expendable, on a majority vote. Genes are only potentially immortal. Over time, the representation of the original genes drops. Modern refactoring techniques will mean it will drop faster. There is not really a floor to the process - eventually, all may go.
I figure a fair amount of modern heritable information (such as morals) will not be lost. Civilization seems to be getting better at keeping and passing on records. You pretty-much have to hypothesize a breakdown of civilization for much of genuine value to be lost - an unprecedented and unlikely phenomenon. However, I expect increasing amounts of it to be preserved mostly in history books and museums as time passes. Over time, that will probably include most DNA-based creatures - including humans. Evolution is rather like a rope. Just as no strand in a rope goes from one end to the other, most genes don't tend to do that either. That doesn't mean the rope is weak, or that future creatures are not - partly - our descendants.
And how do museums lead to more paperclips?
Museums have some paperclips in them. You have to imagine future museums as dynamic things that recreate and help to visualise the past - as well as preserving artefacts.
If you were an intelligence that only cared about the number of paperclips in the universe, you would not build a museum to the past, because you could make more paperclips with the resources needed to create such a museum. This is not some clever, convoluted argument. This is the same as saying that if you make your computer execute 10: GOTO 20 20: GOTO 10 then it won't at any point realize the program is "stupid" and stop looping. You could even give the computer another program which is capable of proving that the first one is an infinite loop, but it won't care, because its goal is to execute the first program.
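The GOTO analogy can be made concrete with a toy interpreter. This is a hedged sketch rather than anything from the comment: `run`, `loops_forever`, and the dict encoding of the two-line program are all invented here, and the `max_steps` cap exists only so the demonstration terminates. The point it illustrates is that the executor and the analyzer are separate programs, and the executor never consults the analyzer's verdict.

```python
def run(program, start, max_steps):
    """Execute a GOTO-only program given as {line: target_line}.

    The executor has no notion of the program being pointless; it just
    follows jumps. max_steps is an artificial cap for this demo only.
    """
    line, steps = start, 0
    while line in program and steps < max_steps:
        line = program[line]
        steps += 1
    return steps

def loops_forever(program, start):
    """A separate analyzer: follow the jumps and report whether a line repeats."""
    seen = set()
    line = start
    while line in program:
        if line in seen:
            return True
        seen.add(line)
        line = program[line]
    return False

# The program from the comment: 10 jumps to 20, 20 jumps back to 10.
program = {10: 20, 20: 10}

proved_infinite = loops_forever(program, 10)    # the proof exists...
steps_taken = run(program, 10, max_steps=1000)  # ...and execution ignores it

print(proved_infinite)  # True
print(steps_taken)      # 1000
```

Having `loops_forever` available changes nothing about what `run` does, which is the sense in which the computer "won't care".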
That's a different question - and one which is poorly specified: If insufficient look-ahead is used, such an agent won't bother to remember its history - preferring instead the gratification of instant paperclips. On the other hand, if you set the look-ahead further out, it will. That's because most intelligent agents are motivated to remember the past - since only by remembering the past can they predict the future. Understanding the history of their own evolution may well help them to understand the possible forms of aliens - which might well help them avoid being obliterated by alien races (along with all the paper clips they have made so far). Important stuff - and well worth building a few museums over. Remembering the past is thus actually a proximate goal for a wide range of agents. If you want to argue paperclip-loving agents won't build museums, you need to be much more specific about which paperclip-loving agents you are talking about - because some of them will. Once you understand this you should be able to see what nonsense the "value is fragile" [http://lesswrong.com/lw/y3/value_is_fragile/] post is.
At this point, I'm only saying this to ensure you don't take any new LWers with you in your perennial folly, but your post has anthropomorphic optimism [http://lesswrong.com/lw/st/anthropomorphic_optimism/] written all over it.
This has nothing to do with anthropomorphism or optimism - it is a common drive for intelligent agents to make records of their pasts - so that they can predict the consequences of their actions in the future. Once information is lost, it is gone for good. If information might be valuable in the future, a wide range of agents will want to preserve it - to help them attain their future goals. These points do not seem particularly complicated. I hope at least that you now realise that your "loop" analogy was wrong. You can't just argue that paperclipping agents will not have preserving the past in museums as a proximate goal - since their ultimate goal involves making paperclips. There is a clear mechanism by which preserving their past in museums might help them attain that goal in the long term. A wide class of paperclipping agents who are not suffering from temporal myopia should attempt to conquer the universe before wasting precious time and resources with making any paperclips. Once the universe is securely in their hands - then they can get on with making paperclips. Otherwise they run a considerable risk of aliens - who have not been so distracted with useless trivia - eating them, and their paperclips. They will realise that they are in an alien race [http://originoflife.net/the_alien_race/] - and so they will run.
Did you make some huge transgression that I missed that is causing people to get together and downvote your comments? Edit: My question has now been answered.
I haven't downvoted, but I assume it's because he's conflating 'sees the value in storing some kinds of information' with 'will build museums'. Museums don't seem to be particularly efficient forms of data-storage, to me.
Future "museums" may not look exactly like current ones - and sure - some information will be preserved in "libraries" - which may not look exactly like current ones either - and in other ways.
'Museum' and 'library' both imply, to me at least, that the data is being made available to people who might be interested in it. In the case of a paperclipper, that seems rather unlikely - why would it keep us around, instead of turning the planet into an uninhabitable supercomputer that can more quickly consider complex paperclip-maximization strategies? The information about what we were like might still exist, but probably in the form of the paperclipper's 'personal memory' - and more likely than not, it'd be tagged as 'exploitable weaknesses of squishy things' rather than 'good patterns to reproduce', which isn't very useful to us, to say the least.
I see. We have different connotations of the word, then. For me, a museum is just a place where objects of historical interest are stored. When I talked about humans being "preserved mostly in history books and museums" - I was intending to conjure up an institution somewhat like the Jurassic Park theme park. Or perhaps - looking further out - something like The Matrix. Not quite like the museum of natural history as it is today - but more like what it will turn into. Regarding the utility of existence in a museum - it may be quite a bit better than not existing at all. Regarding the reason for keeping objects of historical interest around - that is for much the same reason as we do today - to learn from them, and to preserve them for future generations to study. They may have better tools for analysing things with in the future. If the objects of study are destroyed, future tools will not be able to access them.
Not really, just lots of little ones involving the misuse of almost valid ideas. They get distracting.
That's pretty vague. Care to point to something specific?
The direct ancestors are perhaps not the most illustrative examples but they will do. (I downvoted them on their perceived merit completely independently of the name.)
A pathetic example, IMHO. Those were perfectly reasonable comments attempting to dispel a poster's inaccurate beliefs about the phenomenon in question.
Feel free to provide a better one. I disagree. That was what you were trying to do. You aren't a troll, you are just quite bad at thinking so your posts often get downvoted. This reduces the likelihood that you successfully propagate positions that are unfounded. Clippy museums. Right.
Yet another vague accusation that is not worth replying to. I'm getting bored with this pointless flamewar. I can see that the mere breath of dissent causes the community to rise up in arms to nuke the dissenter. Great fun for you folk, I am sure - but I can't see any good reason for me to play along with your childish games.
It's really not. Nothing good can come of this exchange, least of all to you. People ask questions. People get answers. You included. No, you're actually just wrong and absurdly so. Clippy doesn't need you for his museum. It isn't wise for me to admit it but yes, there is a certain amount of satisfaction to be derived from direct social competition. I'm human, I'm male. I agree (without, obviously, accepting the label). You are better off sticking to your position and finding ways to have your desired influence that avoid unwanted social penalties.
Upvoted for honesty. It's far better to be aware of it than not to be. Anyhow, I think you don't really need to add anything more at this point; the thread looks properly wrapped up to me.
You got voted down because you were rational. You went over some peoples heads. These are popularity points, not rationality points.

That is something we worry about from time to time, but in this case I think the downvotes are justified. Tim Tyler has been repeating a particular form of techno-optimism for quite a while, which is fine; it's good to have contrarians around.

However, in the current thread, I don't think he's taking the critique seriously enough. It's been pointed out that he's essentially searching for reasons that even a Paperclipper would preserve everything of value to us, rather than just putting himself in Clippy's place and really asking for the most efficient way to maximize paperclips. (In particular, preserving the fine details of a civilization, let alone actual minds from it, is really too wasteful if your goal is to be prepared for a wide array of possible alien species.)

I feel (and apparently, so do others) that he's just replying with more arguments of the same kind as the ones we generally criticize, rather than finding other types of arguments or providing a case why anthropomorphic optimism doesn't apply here.

In any case, thanks for the laugh line:

You went over some peoples heads.

My analysis of Tim Tyler in this thread isn't very positive, but his replies seem quite clear to me; I'm frustrated on the meta-level rather than the object-level.

I don't think that a paperclip maximiser would "preserve everything of value to us" in the first place. What I actually said at the beginning [http://lesswrong.com/lw/1mm/advice_for_ai_makers/1gm3?c=1] was: Not everything. Things are constantly being lost. What I said here [http://lesswrong.com/lw/1mm/advice_for_ai_makers/1go9?c=1] was: We do, in fact, have detailed information about how much our own civilisation is prepared to spend on preserving its own history. We preserve many things which are millions of years old - and which take up far more resources than a human. For example, see how this museum dinosaur dwarfs the humans in the foreground [http://commons.wikimedia.org/wiki/File:Muttaburrasaurus-Dinosaur-skeleton.jpg]. We have many such exhibits - and we are still a planet-bound civilisation. Our descendants seem likely to have access to much greater resources - and so may devote a larger quantity of absolute resources to museums. So: that's the basis of my estimate. What is the basis of your estimate?
0Paul Crowley12y
I agree with your criticism, but I doubt that good will come of replying to a comment like the one you're replying to here, I'm afraid.
Fair enough; I should have replied to Tim directly, but couldn't pass up the laugh-line bit.
3Paul Crowley12y
Your use of "get together" brings to mind some sort of Less Wrong cabal who gathered to make a decision. This is of course the opposite of the truth, which is that each downvote is the result of someone reading the thread and deciding to downvote the comment. They're not necessarily uncorrelated, but "get together" is completely the wrong way to think about how these downvotes occur.
Actually, that's what I was meaning to evoke. I read his recent comments, and while I didn't agree with all of them, didn't find them to be in bad faith. I found it odd that so many of them would be at -3, and wondered if I missed something.
2Paul Crowley12y
In seriousness, why would you deliberately evoke a hypothesis that you know is wildly unrealistic? Surely whatever the real reasons for the downvoting pattern are, they are relevant to your enquiry?
Perhaps "cabal who gathered to make a decision [to downvote]" is an overly ominous image. However, we've seen cases where every one of someone's comments has been downvoted in a short span of time, which is clearly not the typical reason for a downvoting. That's the kind of thing I was asking about.
It is possible the first downvote tends to attract further downvotes (by priming, for example), but an equally parsimonious explanation is that there are several people refreshing [http://xkcd.com/281/] the comments page at a time and a subset of them dislike the content independently.
2Paul Crowley12y
But you can still be very confident that actual collusion wasn't involved, so you shouldn't be talking as if it might have been. EDIT: as always I'm keen to know why the downvote - thanks! My current theory is that they come across as hostile, which they weren't meant to, but I'd value better data than my guesses.
Possible precedents: the Library of Alexandria and the Dark Ages.
Reaching, though: the dark ages were confined to Western Europe - and something like the Library of Alexandria couldn't happen these days - there are too many libraries.
This doesn't deal with uFAI...

If you think you have an AI that might improve itself and act on the real world, don't run it.

4Paul Crowley12y
Strike "and act on the real world" - all AIs act on the real world.
I mean, act on the real world in a way more significant than your typical chess-playing program.
This rules out FAI.
9Paul Crowley12y
Sure, this is advice along the lines of "don't design your own cipher". Only more so.
In general wise, but in this case we need a cipher, don't have any, and will probably be handed a bad one in the future. Our truisms need to be advice we would want everyone to follow.
We should encourage thinking about the intent (incoming) and expected effect (outgoing) of truisms, rather than their literal meaning. If either of the above injunctions actually doesn't apply to you, you'll know it.
My concern is you'll also 'know' it doesn't apply to you when it does. People write ciphers all the time.
4Paul Crowley12y
Yes, this is my concern too. However, anyone who posts to a newsgroup saying "I'm about to write my own cipher, any advice" should not do it. The post indicated someone who planned to actually start writing code; that's a definite sign that they shouldn't do it.
See the addendum above; "don't do it" isn't likely to work.
0Paul Crowley12y
Even though it's unlikely to work, it is still the approach which minimizes risk; even a small reduction in their probability of going ahead will likely be a bigger effect than any other safety advice you can give, and any other advice will act against its efficacy.
"Then they are fools and nothing can be done about it." [http://lesswrong.com/lw/ri/the_outside_views_domain/] In any case, this seems to be the opposite of the concern you were citing before.
If we use truisms that everyone knows have to be ignored by someone, it becomes easier to think they can be ignored by oneself.
I reread the thread, leaning towards your position now.
Entertainingly, he's entering the field from mathematical cryptography; so "don't design your own cipher" is precisely the wrong analogy to use here :-)
0Paul Crowley12y
"mathematical cryptography"? What other sort of cryptography is there?
It used to be the domain of the linguists... But you're correct; nowadays, I'm using mathematical cryptography as a shorthand for "y'know, like, real cryptography, not just messing around with symbols to impress your friends".
0Paul Crowley12y
Ah, OK! It's possible in that case that I may actually know your friend, if they happened to touch on some of the same parts of the field as me.
No extra clues :-)

Due to the lack of details, it is difficult to make a recommendation, but here are some thoughts.

Both as an AGI challenge and for general human safety, business intelligence data warehouses are probably a good bet. Any pattern that humans have missed but an AI detects could mean good money, which could feed back into more resources for the AI. Also, the ability of corporations to harm others doesn't increase significantly with a better business intelligence tool.

Virtual worlds - If the AI is tested in an isolated virtual world, that will be better for us. Test it in ... (read more)

Virtual worlds don't buy you any safety, even if the AI can't break out of the simulator. If you manage to make AI, you've got a Really Powerful Optimization Process. If it worked out simulated physics and has access to its own source, it's probably smart enough to 'foom', even within the simulation. At which point you have a REALLY powerful optimizer, and no idea how to prove anything about its goal system. An untrustable genie. Also, spending all those cycles on that kind of simulated world would be hugely inefficient.
James, you can't blame me for responding to the question. Stuart has said that advice on giving up will not be accepted. The question is to minimise the fallout of a lucky stroke moving this guy's AI forward and fooming. Both of my suggestions were around that.
You are quite right.
You are giving a budding superintelligence exposure to a simulation based on our physics? It would work out the physics of the isolated virtual world, deduce from the traces you leave in the design that it is in a simulation and have a good guess on what we believe to be the actual physics of our universe. Maybe even have a hunch about how we have physics wrong. I would not want to bet our existence on it being unable to get out of that box.
My point with the virtual worlds was to put the AI into a simulation sufficiently unlike our world that it wouldn't be a threat and sufficiently like our world that we would be able to recognise what it does as intelligence. Hence the gas giant example. If we were to release an AI into today's simulations like sims, which are much less granular than the one I have proposed in my post, then it would figure out that it is in a simulation much faster. If we put it into some other kind of universe with weird physics, a magical universe let's say, then we will need to send someone intelligent to do a considerable number of trials before we release the AI. This is to prove that whatever solutions the AI comes up with are genuinely intelligent and not something that is obvious. I too agree that we wouldn't want to bet our existence on it being unable to get out of that box, but what evidence will we leave in the simulation which will point to it that it has to "Press Red for talking to simulator"? Or to put it in even simpler terms, where in our universe is OUR "Press Red to talk to simulator" button?
I'm not sure I follow. Gas giants run on the same physics as you and me. Do you mean a world with actual different simulated physics?
I don't know. Who is going to be creating the simulation? How can I be comfortable that he will not either make a bug or design a simulation that a superintelligence cannot deduce that it is artificial? Proving that things way way smarter than me couldn't know stuff is hard. Possible sometimes but hard. The presence or absence of such a button in our universe provides some evidence about whether we could reliably create a simulation that is undetectable. But not that much evidence.
How would you design such a button? Reciting a fixed verse and afterwards stating what you want from the simulator seems like a good technique. A majority of the people on this earth believe that such a button exists in form of praying ;)
Additionally, the computer on which the virtual world runs shouldn't be directly connected to other computers, to prevent the AGI from escaping through some 0day.

This seems rather relevant - and suggests the answer is go watch more TV. Or, at least, I felt it really needed to be linked here, and this gave me the perfect opportunity!

Someone actually made a top-level post on this [http://lesswrong.com/lw/1mn/comic_about_the_singularity/] the other day. Just sayin'.
This comment and that post are actually within seventeen minutes of each other. I think Psychohistorian may be forgiven for not noticing dclayh.
That is odd; I distinctly recall posting this before the top-level.
That would be an even better excuse. Edit: It occurs to me that the datestamp may correspond to the writing of a draft, not the time of publication.

There isn't really a general answer to "how to design a safe AI". It really depends what the AI is used for (and what they mean by AI).

For recursively self-improving AI, you've got your choice of "it's always bad", "You should only do it the SIAI way (and they haven't figured that out yet)", or "It's not a big deal, just use software best practices and iterate".

For robots, I've argued in the past that robots need to share our values in order to avoid squashing them, but I haven't seen anyone work this out rigorously.... (read more)

If you want to design a complex malleable AI design and have some guarantees about what it will do (rather than just fail in some creative way), think of simple properties you can prove about your code, and then try and prove them using Coq or other theorem proving system.

If you can't think of any properties that you want to hold for your system, think more.
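
As a rough illustration of what "a simple property you can prove about your code" might look like, here is a minimal sketch. It is written in Lean 4 rather than Coq, purely as an assumption of convenience, and `clampOutput` is a hypothetical stand-in for some component whose output should never exceed a fixed limit. It is far smaller than anything in a real system.

```lean
-- Hypothetical component: caps a requested value at a fixed limit.
def clampOutput (x limit : Nat) : Nat := min x limit

-- The property we want to hold: the output never exceeds the limit,
-- whatever the input is.
theorem clampOutput_le (x limit : Nat) : clampOutput x limit ≤ limit := by
  unfold clampOutput
  exact Nat.min_le_right x limit
```

The point of the exercise is less the theorem itself than being forced to state, precisely, what the code is supposed to guarantee.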

For solving the Friendly AI problem, I suggest the following constraints for your initial hardware system:

  1. All outside input (and input libraries) are explicitly user selected.
  2. No means for the system to establish physical action (e.g., no robotic arms).
  3. No means for the system to establish unexpected communication (e.g., no radio transmitters).

Once this closed system has reached a suitable level of AI, then the problem of making it friendly can be worked on much easier and more practically, and without risk of the world ending.

To start out fr... (read more)

This is essentially the AI box experiment. Check out the link to see how even an AI that can only communicate with its handler(s) might be lethal without guaranteed Friendliness.

I don't think the publicly available details establish "how", merely "that".
Sure, though the mechanism I was referring to is "it can convince its handler(s) to let it out of the box through some transhuman method(s)."
Wait, since when is Eliezer transhuman?
Who said he was? If Eliezer can convince somebody to let him out of the box--for a financial loss no less--then certainly a transhuman AI can, right?
Certainly they can; what I am emphasizing is that "transhuman" is an overly strong criterion.
Definitely. Eliezer reflects perhaps a maximum lower bound on the amount of intelligence necessary to pull that off.
Didn't David Chalmers propose that here: http://www.vimeo.com/7320820 ...? Test harnesses are a standard procedure - but they are not the only kind of test. Basically, unless you are playing chess, or something, if you don't test in the real world, you won't really know if it works - and it can't do much to help you do important things - like raise funds to fuel development.
I don't understand why this comment was downvoted. Yes, zero call asks a question many of us feel has been adequately answered in the past; but they are asking politely, and it would have taken extensive archive-reading for them to have already known about the AI-Box experiment. Think before you downvote, especially with new users! EDIT: As AdeleneDawner points out, zero call isn't that new. Even so, the downvotes (at -2 when I first made my comment) looked more like signaling disagreement than anything else.
I downvoted the comment not because of AI box unsafety (which I don't find convincing at the certainty level with which it's usually asserted -- disutility may well give weight to the worry, but not to the probability), but because it gives advice on the paint color for a spaceship in the time when Earth is still standing on a giant Turtle in the center of the world. It's not a sane kind of advice.
If I'd never heard of the AI-Box Experiment, I'd think that zero call's comment was a reasonable contribution to a conversation about AI and safety in particular. It's only when we realize that object-level methods of restraining a transhuman intelligence are probably doomed that we know we must focus so precisely on getting its goals right.
Vladimir and orthonormal, please point me to some more details about the AI box experiment, since I think what I suggested earlier as isolated virtual worlds is pretty much the same as what zero call is suggesting here. I feel that there are huge assumptions in the present AI Box experiment. The gatekeeper and the AI share a language, for one, by which the AI convinces the gatekeeper. If AGI is your only criterion, without regard to friendliness, just make sure not to communicate with the AI. Turing tests are not the only proofs of intelligence. If the AGI can come up with unique solutions in the universe in which it is isolated, that is enough to understand that this algorithm is creative.
This just evoked a possibly-useful thought: If observing but not communicating with a boxed AI does a good enough job of patching the security holes (which I understand that it might not - that's for someone who better understands the issue to look at), perhaps putting an instance of a potential FAI in a contained virtual world would be useful as a test. It seems to me that a FAI that didn't have humans to start with would perhaps have to invent us, or something like us in some specific observable way(s), because of its values.
Good thought, but on further examination it turns out that zero isn't all that new - xe's been commenting since November; xyr karma is low because xe has been downvoted almost as often as upvoted.

My current toy thinking along these lines is imagining a program that will write a program to solve the towers of hanoi, given only some description of the problem, and do nothing else, using only fixed computational resources for the whole thing.

I think that's safe, and would illustrate useful principles for FAI.

An earlier comment of mine [http://lesswrong.com/lw/2o3/rationality_quotes_september_2010/2joj] on the Towers of Hanoi. (ETA: I mean earlier relative to the point in time when this thread was resurrected [http://lesswrong.com/lw/1mm/advice_for_ai_makers/4ehg].) Are you familiar with Hofstadter's work in "microdomains", such as Copycat et al.?
So.... you want to independently re-invent a prolog compiler [http://www.csupomona.edu/~jrfisher/www/prolog_tutorial/2_3.html]?
More like a program that takes a description of the problem as input and returns the Prolog code as output.
What Blueberry said. The page you linked just gives the standard program for solving Towers of Hanoi. What JamesAndrix was imagining was a program that comes up with that solution, given just the description of the problem -- i.e., what the human coder did.
Well, this can actually be done (yes, in Prolog with a few metaprogramming tricks), and it's not really that hard - only very inefficient, i.e. feasible only for relatively small problems. See: Inductive logic programming [http://en.wikipedia.org/wiki/Inductive_logic_programming].
No, not learning. And the 'do nothing else' parts can't be left out. This shouldn't be a general automatic programming method, just something that goes through the motions of solving this one problem. It should already 'know' whatever principles lead to that solution. The outcome should be obvious to the programmer, and I suspect realistically hand-traceable. My goal is a solid understanding of a toy program exactly one meta-level above hanoi. This does seem like something Prolog could do well; if there is already a static program that does this, I'd love to see it.
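
For concreteness, here is one very small way the "one meta-level above hanoi" idea could be sketched. It is in Python rather than Prolog, and the description format (a dict with `pegs`, `start`, and `goal` keys) and every function name are assumptions made up for this illustration. The generator does no search and no learning: it already "knows" the recursive principle as a fixed template and does a bounded amount of work to fill it in, which is arguably too trivial to count as coming up with the solution, but it is fully hand-traceable.

```python
# Fixed recursive principle, stored as text. The generator only fills in peg names.
SOLVER_TEMPLATE = '''\
def solve(n, source={source!r}, target={target!r}, spare={spare!r}):
    """Return the list of (from_peg, to_peg) moves that transfers n discs."""
    if n == 0:
        return []
    return (solve(n - 1, source, spare, target)
            + [(source, target)]
            + solve(n - 1, spare, target, source))
'''


def write_hanoi_solver(description):
    """Emit solver source code from a tiny problem description (hypothetical format)."""
    pegs = description["pegs"]
    if len(pegs) != 3 or len(set(pegs)) != 3:
        raise ValueError("this sketch only knows the three-peg puzzle")
    source, target = description["start"], description["goal"]
    if source not in pegs or target not in pegs or source == target:
        raise ValueError("start and goal must be two different pegs")
    (spare,) = [p for p in pegs if p not in (source, target)]
    return SOLVER_TEMPLATE.format(source=source, target=target, spare=spare)


def check_moves(moves, n, description):
    """Replay the moves and confirm every one is legal and the goal is reached."""
    pegs = {p: [] for p in description["pegs"]}
    pegs[description["start"]] = list(range(n, 0, -1))  # largest disc at the bottom
    for src, dst in moves:
        if not pegs[src]:
            return False  # nothing to move
        if pegs[dst] and pegs[dst][-1] < pegs[src][-1]:
            return False  # larger disc placed on a smaller one
        pegs[dst].append(pegs[src].pop())
    return pegs[description["goal"]] == list(range(n, 0, -1))
```

Running the emitted source (for example with `exec` into an empty namespace) and passing the resulting moves to `check_moves` gives an independent check that the written program actually solves the puzzle it was described.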
Until you specify the format of a description of the problem, and how the program figures out how to write a program to solve the problem, it is hard to tell if this would be safe. And if you don't know that it is safe, it isn't. Using some barrier like "fixed computational resources" to contain a non-understood process is a red flag.
The format of the description is something I'm struggling with, but I'm not clear how it impacts safety. How the AI figures things out is up to the human programmer. Part of my intent in this exercise is to constrain the human to solutions they fully understand. In my mind my original description would have ruled out evolving neural nets, but now I see I definitely didn't make that clear. By 'fixed computational resources' I mean that you've got to write the program such that if it discovers some flaw that gives it access to the internet, it will patch around that access because what it is trying to do is solve the puzzle of (solving the puzzle using only these instructions and these rules and this memory.) What I'm looking for is a way to work on friendliness using goals that are much simpler than human morality, implemented by minds that are at least comprehensible in their operation, if not outright step-able.

Try to build an AI that:

  1. Implements a timeless decision theory.
  2. Is able to value things that it does not directly perceive, and in particular cares about other universes.
  3. Has a utility function such that additional resources have diminishing marginal returns.

Such an AI is more likely to participate in trades across universes, possibly with a friendly AI that requests our survival.

[EDIT]: It now occurs to me that an AI that participates in inter-universal trade would also participate in inter-universal terrorism, so I'm no longer confident that my suggestions above are good ones.

(Disclaimer: I don't know anything about AI.) Is the marginal utility of resources something that you can input? It seems to me that since resources have instrumental value (pretty much, that's what a resource is by definition), their value would be something that has to be outputted by the utility function. If you tried to input the value of resources, you'd run into difficulties with the meaning of resources. For example, would the AI distinguish "having resources" from "having access to resources" from "having access to the power of having access to resources"? Even if 'having resources' has negative utility for the AI, he might enjoy controlling resources in all kinds of ways in exchange for power to satisfy terminal values. Even if you define power as a type of resource, and give that negative utility, then you will basically be telling the AI to enjoy not being able to satisfy his terminal values. (But yet, put that way, it does suggest some kind of friendly passive/pacifist philosophy.)
There is a difference between giving something negative utility and giving it decreasing marginal utility. It's sufficient to give the AI exponents strictly between zero and one for all terms in a positive polynomial utility function, for instance. That would be effectively "inputting" the marginal utility of resources, given any current state of the world.
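
A minimal numerical sketch of that distinction, assuming a utility of the form sum of w_i * r_i ** p_i with positive weights and exponents strictly between zero and one. The function names and the specific numbers in the usage note are illustrative assumptions, not anything from the proposal above.

```python
def utility(resources, weights, exponents):
    """Positive 'polynomial' utility: sum of w_i * r_i ** p_i with 0 < p_i < 1."""
    if not all(0.0 < p < 1.0 for p in exponents):
        raise ValueError("every exponent must lie strictly between 0 and 1")
    if not all(w > 0 for w in weights):
        raise ValueError("every weight must be positive")
    return sum(w * r ** p for w, r, p in zip(weights, resources, exponents))


def marginal_utility(resources, weights, exponents, index, step=1.0):
    """Utility gained by adding `step` units of resource `index` to the current state."""
    bumped = list(resources)
    bumped[index] += step
    return utility(bumped, weights, exponents) - utility(resources, weights, exponents)
```

With a single resource, weight 1 and exponent 0.5, one extra unit is worth more when the agent holds 10 units than when it holds 1000, yet the gain stays positive in both cases: more is still better, just less and less so. That is diminishing marginal utility, not negative utility.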
I was considering the least convenient argument, the one that I imagined would result in the least aggressive AI. (I should explain here that I considered that even a 0 terminal utility for the resource itself would not result in 0 utility for that resource, because that resource would have some instrumental value in achieving things of value.) (Above edited because I don't think I was understood.) But I think the problem in logic identified with inputting the value of an instrumental value remains either way.
You pretty much have to guess about the marginal value of resources. But let's say the AI's utility function is "10^10th root of # of paperclips in universe." Then it probably satisfies the criterion. EDIT: even better would be U = 1 if the universe contains at least one paperclip, otherwise 0.
Can you please elaborate on "trades across universes"? Do you mean something like quantum civilization suicide, as in Nick Bostrom's paper on that topic?
Here's Nesov's elaboration of his trading across possible worlds idea [http://lesswrong.com/lw/102/indexical_uncertainty_and_the_axiom_of/10zh]. Personally, I think it's an interesting idea, but I'm skeptical that it can really work, except maybe in very limited circumstances such as when the trading partners are nearly identical.
Cool, thanks!

What does "AI programming" even mean ? If he's trying to make some sort of an abstract generally-intelligent AI, then he'll be wasting his time, since the probability of him succeeding is somewhere around epsilon. If he's trying to make an AI for some specific purpose, then I'd advise him to employ lots of testing and especially cross-validation, to avoid overfitting. Of course, if his purpose is something like "make the smartest killer drone ever", then I'd prefer him to fail...
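
For the narrow-purpose case, the cross-validation advice can be sketched roughly as follows. This is a minimal k-fold example in plain Python; the straight-line model and all function names are assumptions chosen only to keep the sketch self-contained, and a real project would more likely lean on an existing library.

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k disjoint held-out folds (deterministic, interleaved)."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    return [list(range(i, n, k)) for i in range(k)]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    spread = sum((x - mean_x) ** 2 for x in xs)
    if spread == 0:
        raise ValueError("all x values are identical")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / spread
    return slope, mean_y - slope * mean_x


def predict_line(model, x):
    slope, intercept = model
    return slope * x + intercept


def cross_validate(xs, ys, k, fit=fit_line, predict=predict_line):
    """Mean squared error on held-out data, averaged over the k folds."""
    fold_errors = []
    for held_out in k_fold_indices(len(xs), k):
        held = set(held_out)
        train_x = [x for i, x in enumerate(xs) if i not in held]
        train_y = [y for i, y in enumerate(ys) if i not in held]
        model = fit(train_x, train_y)  # the model never sees the held-out points
        errors = [(predict(model, xs[i]) - ys[i]) ** 2 for i in held_out]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / len(fold_errors)
```

A model that scores well on its training data but badly under `cross_validate` is likely overfitting; comparing the two numbers is the whole point of holding data out.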

I've read through the AI-Box experiment, and I can still say that I recommend the "sealed AI" tactic. The Box experiment isn't very convincing at all to me, which I could go into detail about, but that would require a whole post. But of course, I'll never develop the karma to do that because apparently the rate at which I ask questions of proper material exceeds the rate at which I post warm, fuzzy comments. Well, at least I have my own blog...

It looks like you're picking up karma relatively rapidly of late; it takes a while to learn the ways of speaking around here that don't detract from the content of one's comments, but once that happens, most people will accumulate karma reasonably quickly. But since the AI-Box experiment has been discussed a bit here already, it might make sense to lay out your counterargument here or on the Open Thread [http://lesswrong.com/lw/1lf/open_thread_january_2010/] for now. I know that's not as satisfying as making a post, but I think you'll still get quality discussion. (Also, a top-level post on an old topic by a relative newcomer runs a risk of getting downvoted for redundancy if the argument recapitulates someone's old position— and post downvotes can kill your karma for a while. Caveat scriptor!) P.S. Also on the topic, and quite interesting: That Alien Message [http://lesswrong.com/lw/qk/that_alien_message/].
I think God themselves just struck me with +20 karma somehow... thank ye almighty lords! Yeah, but indeed I will heed your advice and look into the issue more before posting.