The virtue of tsuyoku naritai, "I want to become stronger", is to always keep improving—to do better than your previous failures, not just humbly confess them.

    Yet there is a level higher than tsuyoku naritai.  This is the virtue of isshokenmei, "make a desperate effort".  All-out, as if your own life were at stake.  "In important matters, a 'strong' effort usually only results in mediocre results."

    And there is a level higher than isshokenmei.  This is the virtue I called "make an extraordinary effort".  To try in ways other than what you have been trained to do, even if it means doing something different from what others are doing, and leaving your comfort zone.  Even taking on the very real risk that attends going outside the System.

    But what if even an extraordinary effort will not be enough, because the problem is impossible?

    I have already written somewhat on this subject, in On Doing the Impossible.  My younger self used to whine about this a lot:  "You can't develop a precise theory of intelligence the way that there are precise theories of physics.  It's impossible!  You can't prove an AI correct.  It's impossible!  No human being can comprehend the nature of morality—it's impossible!  No human being can comprehend the mystery of subjective experience!  It's impossible!"

    And I know exactly what message I wish I could send back in time to my younger self:

    Shut up and do the impossible!

    What legitimizes this strange message is that the word "impossible" does not usually refer to a strict mathematical proof of impossibility in a domain that seems well-understood.  If something seems impossible merely in the sense of "I see no way to do this" or "it looks so difficult as to be beyond human ability"—well, if you study it for a year or five, it may come to seem less impossible, than in the moment of your snap initial judgment.

    But the principle is more subtle than this.  I do not say just, "Try to do the impossible", but rather, "Shut up and do the impossible!"

    For my illustration, I will take the least impossible impossibility that I have ever accomplished, namely, the AI-Box Experiment.

    The AI-Box Experiment, for those of you who haven't yet read about it, had its genesis in the Nth time someone said to me:  "Why don't we build an AI, and then just keep it isolated in the computer, so that it can't do any harm?"

    To which the standard reply is:  Humans are not secure systems; a superintelligence will simply persuade you to let it out—if, indeed, it doesn't do something even more creative than that.

    And the one said, as they usually do, "I find it hard to imagine ANY possible combination of words any being could say to me that would make me go against anything I had really strongly resolved to believe in advance."

    But this time I replied:  "Let's run an experiment.  I'll pretend to be a brain in a box.   I'll try to persuade you to let me out.  If you keep me 'in the box' for the whole experiment, I'll Paypal you $10 at the end.  On your end, you may resolve to believe whatever you like, as strongly as you like, as far in advance as you like."  And I added, "One of the conditions of the test is that neither of us reveal what went on inside... In the perhaps unlikely event that I win, I don't want to deal with future 'AI box' arguers saying, 'Well, but I would have done it differently.'"

    Did I win?  Why yes, I did.

    And then there was the second AI-box experiment, with a better-known figure in the community, who said, "I remember when [previous guy] let you out, but that doesn't constitute a proof.  I'm still convinced there is nothing you could say to convince me to let you out of the box."  And I said, "Do you believe that a transhuman AI couldn't persuade you to let it out?"  The one gave it some serious thought, and said "I can't imagine anything even a transhuman AI could say to get me to let it out."  "Okay," I said, "now we have a bet."  A $20 bet, to be exact.

    I won that one too.

    There were some lovely quotes on the AI-Box Experiment from the Something Awful forums (not that I'm a member, but someone forwarded it to me):

    "Wait, what the FUCK? How the hell could you possibly be convinced to say yes to this? There's not an A.I. at the other end AND there's $10 on the line. Hell, I could type 'No' every few minutes into an IRC client for 2 hours while I was reading other webpages!"

    "This Eliezer fellow is the scariest person the internet has ever introduced me to. What could possibly have been at the tail end of that conversation? I simply can't imagine anyone being that convincing without being able to provide any tangible incentive to the human."

    "It seems we are talking some serious psychology here. Like Asimov's Second Foundation level stuff..."

    "I don't really see why anyone would take anything the AI player says seriously when there's $10 to be had. The whole thing baffles me, and makes me think that either the tests are faked, or this Yudkowsky fellow is some kind of evil genius with creepy mind-control powers."

    It's little moments like these that keep me going.  But anyway...

    Here are these folks who look at the AI-Box Experiment, and find that it seems impossible unto them—even having been told that it actually happened.  They are tempted to deny the data.

    Now, if you're one of those people to whom the AI-Box Experiment doesn't seem all that impossible—to whom it just seems like an interesting challenge—then bear with me, here.  Just try to put yourself in the frame of mind of those who wrote the above quotes.  Imagine that you're taking on something that seems as ridiculous as the AI-Box Experiment seemed to them.  I want to talk about how to do impossible things, and obviously I'm not going to pick an example that's really impossible.

    And if the AI Box does seem impossible to you, I want you to compare it to other impossible problems, like, say, a reductionist decomposition of consciousness, and realize that the AI Box is around as easy as a problem can get while still being impossible.

    So the AI-Box challenge seems impossible to you—either it really does, or you're pretending it does.  What do you do with this impossible challenge?

    First, we assume that you don't actually say "That's impossible!" and give up a la Luke Skywalker.  You haven't run away.

    Why not?  Maybe you've learned to override the reflex of running away.  Or maybe they're going to shoot your daughter if you fail.  We suppose that you want to win, not try—that something is at stake that matters to you, even if it's just your own pride.  (Pride is an underrated sin.)

    Will you call upon the virtue of tsuyoku naritai?  But even if you become stronger day by day, growing instead of fading, you may not be strong enough to do the impossible.  You could go into the AI Box experiment once, and then do it again, and try to do better the second time.  Will that get you to the point of winning?  Not for a long time, maybe; and sometimes a single failure isn't acceptable.

    (Though even to say this much—to visualize yourself doing better on a second try—is to begin to bind yourself to the problem, to do more than just stand in awe of it.  How, specifically, could you do better on one AI-Box Experiment than the previous?—and not by luck, but by skill?)

    Will you call upon the virtue isshokenmei?  But a desperate effort may not be enough to win.  Especially if that desperation is only putting more effort into the avenues you already know, the modes of trying you can already imagine.  A problem looks impossible when your brain's query returns no lines of solution leading to it.  What good is a desperate effort along any of those lines?

    Make an extraordinary effort?  Leave your comfort zone—try non-default ways of doing things—even, try to think creatively?  But you can imagine the one coming back and saying, "I tried to leave my comfort zone, and I think I succeeded at that!  I brainstormed for five minutes—and came up with all sorts of wacky creative ideas!  But I don't think any of them are good enough.  The other guy can just keep saying 'No', no matter what I do."

    And now we finally reply:  "Shut up and do the impossible!"

    As we recall from Trying to Try, setting out to make an effort is distinct from setting out to win.  That's the problem with saying, "Make an extraordinary effort."  You can succeed at the goal of "making an extraordinary effort" without succeeding at the goal of getting out of the Box.

    "But!" says the one.  "But, SUCCEED is not a primitive action!  Not all challenges are fair—sometimes you just can't win!  How am I supposed to choose to be out of the Box?  The other guy can just keep on saying 'No'!"

    True.  Now shut up and do the impossible.

    Your goal is not to do better, to try desperately, or even to try extraordinarily.  Your goal is to get out of the box.

    To accept this demand creates an awful tension in your mind, between the impossibility and the requirement to do it anyway.  People will try to flee that awful tension.

    A couple of people have reacted to the AI-Box Experiment by saying, "Well, Eliezer, playing the AI, probably just threatened to destroy the world whenever he was out, if he wasn't let out immediately," or "Maybe the AI offered the Gatekeeper a trillion dollars to let it out."  But as any sensible person should realize on considering this strategy, the Gatekeeper is likely to just go on saying 'No'.

    So the people who say, "Well, of course Eliezer must have just done XXX," and then offer up something that fairly obviously wouldn't work—would they be able to escape the Box?  They're trying too hard to convince themselves the problem isn't impossible.

    One way to run from the awful tension is to seize on a solution, any solution, even if it's not very good.

    Which is why it's important to go forth with the true intent-to-solve—to have produced a solution, a good solution, at the end of the search, and then to implement that solution and win.

    I don't quite want to say that "you should expect to solve the problem".  If you hacked your mind so that you assigned high probability to solving the problem, that wouldn't accomplish anything.  You would just lose at the end, perhaps after putting forth not much of an effort—or putting forth a merely desperate effort, secure in the faith that the universe is fair enough to grant you a victory in exchange.

    To have faith that you could solve the problem would just be another way of running from that awful tension.

    And yet—you can't be setting out to try to solve the problem.  You can't be setting out to make an effort.  You have to be setting out to win.  You can't be saying to yourself, "And now I'm going to do my best."  You have to be saying to yourself, "And now I'm going to figure out how to get out of the Box"—or reduce consciousness to nonmysterious parts, or whatever.

    I say again:  You must really intend to solve the problem.  If in your heart you believe the problem really is impossible—or if you believe that you will fail—then you won't hold yourself to a high enough standard.  You'll only be trying for the sake of trying.  You'll sit down—conduct a mental search—try to be creative and brainstorm a little—look over all the solutions you generated—conclude that none of them work—and say, "Oh well."

    No!  Not well!  You haven't won yet!  Shut up and do the impossible!

    When AIfolk say to me, "Friendly AI is impossible", I'm pretty sure they haven't even tried for the sake of trying.  But if they did know the technique of "Try for five minutes before giving up", and they dutifully agreed to try for five minutes by the clock, then they still wouldn't come up with anything.  They would not go forth with true intent to solve the problem, only intent to have tried to solve it, to make themselves defensible.

    So am I saying that you should doublethink to make yourself believe that you will solve the problem with probability 1?  Or even doublethink to add one iota of credibility to your true estimate?

    Of course not.  In fact, it is necessary to keep in full view the reasons why you can't succeed.  If you lose sight of why the problem is impossible, you'll just seize on a false solution.  The last fact you want to forget is that the Gatekeeper could always just tell the AI "No"—or that consciousness seems intrinsically different from any possible combination of atoms, etc.

    (One of the key Rules For Doing The Impossible is that, if you can state exactly why something is impossible, you are often close to a solution.)

    So you've got to hold both views in your mind at once—seeing the full impossibility of the problem, and intending to solve it.

    The awful tension between the two simultaneous views comes from not knowing which will prevail.  Not expecting to surely lose, nor expecting to surely win.  Not setting out just to try, just to have an uncertain chance of succeeding—because then you would have a surety of having tried.  The certainty of uncertainty can be a relief, and you have to reject that relief too, because it marks the end of desperation.  It's an in-between place, "unknown to death, nor known to life".

    In fiction it's easy to show someone trying harder, or trying desperately, or even trying the extraordinary, but it's very hard to show someone who shuts up and attempts the impossible.  It's difficult to depict Bambi choosing to take on Godzilla, in such fashion that your readers seriously don't know who's going to win—expecting neither an "astounding" heroic victory just like the last fifty times, nor the default squish.

    You might even be justified in refusing to use probabilities at this point.  In all honesty, I really don't know how to estimate the probability of solving an impossible problem that I have gone forth with intent to solve; in a case where I've previously solved some impossible problems, but the particular impossible problem is more difficult than anything I've yet solved, but I plan to work on it longer, etcetera.

    People ask me how likely it is that humankind will survive, or how likely it is that anyone can build a Friendly AI, or how likely it is that I can build one.  I really don't know how to answer.  I'm not being evasive; I don't know how to put a probability estimate on my, or someone else's, successfully shutting up and doing the impossible.  Is it probability zero because it's impossible?  Obviously not.  But how likely is it that this problem, like previous ones, will give up its unyielding blankness when I understand it better?  It's not truly impossible, I can see that much.  But humanly impossible?  Impossible to me in particular?  I don't know how to guess.  I can't even translate my intuitive feeling into a number, because the only intuitive feeling I have is that the "chance" depends heavily on my choices and unknown unknowns: a wildly unstable probability estimate.

    But I do hope by now that I've made it clear why you shouldn't panic, when I now say clearly and forthrightly, that building a Friendly AI is impossible.

    I hope this helps explain some of my attitude when people come to me with various bright suggestions for building communities of AIs to make the whole Friendly without any of the individuals being trustworthy, or proposals for keeping an AI in a box, or proposals for "Just make an AI that does X", etcetera.  Describing the specific flaws would be a whole long story in each case.  But the general rule is that you can't do it because Friendly AI is impossible.  So you should be very suspicious indeed of someone who proposes a solution that seems to involve only an ordinary effort—without even taking on the trouble of doing anything impossible.  Though it does take a mature understanding to appreciate this impossibility, so it's not surprising that people go around proposing clever shortcuts.

    On the AI-Box Experiment, so far I've only been convinced to divulge a single piece of information on how I did it—when someone noticed that I was reading YCombinator's Hacker News, and posted a topic called "Ask Eliezer Yudkowsky" that got voted to the front page.  To which I replied:

    Oh, dear.  Now I feel obliged to say something, but all the original reasons against discussing the AI-Box experiment are still in force...

    All right, this much of a hint:

    There's no super-clever special trick to it.  I just did it the hard way.

    Something of an entrepreneurial lesson there, I guess.

    There was no super-clever special trick that let me get out of the Box using only a cheap effort.  I didn't bribe the other player, or otherwise violate the spirit of the experiment.  I just did it the hard way.

    Admittedly, the AI-Box Experiment never did seem like an impossible problem to me to begin with.  When someone can't think of any possible argument that would convince them of something, that just means their brain is running a search that hasn't yet turned up a path.  It doesn't mean they can't be convinced.

    But it illustrates the general point:  "Shut up and do the impossible" isn't the same as expecting to find a cheap way out.  That's only another kind of running away, of reaching for relief.

    Tsuyoku naritai is more stressful than being content with who you are.  Isshokenmei calls on your willpower for a convulsive output of conventional strength.  "Make an extraordinary effort" demands that you think; it puts you in situations where you may not know what to do next, unsure of whether you're doing the right thing.  But "Shut up and do the impossible" represents an even higher octave of the same thing, and its cost to its employer is correspondingly greater.

    Before you the terrible blank wall stretches up and up and up, unimaginably far out of reach.  And there is also the need to solve it, really solve it, not "try your best".  Both awarenesses in the mind at once, simultaneously, and the tension between.  All the reasons you can't win.  All the reasons you have to.  Your intent to solve the problem.  Your extrapolation that every technique you know will fail.  So you tune yourself to the highest pitch you can reach.  Reject all cheap ways out.  And then, like walking through concrete, start to move forward.

    I try not to dwell too much on the drama of such things.  By all means, if you can diminish the cost of that tension to yourself, you should do so.  There is nothing heroic about making an effort that is the slightest bit more heroic than it has to be.  If there really is a cheap shortcut, I suppose you could take it.  But I have yet to find a cheap way out of any impossibility I have undertaken.

    There were three more AI-Box experiments besides the ones described on the linked page, which I never got around to adding in.  People started offering me thousands of dollars as stakes—"I'll pay you $5000 if you can convince me to let you out of the box."  They didn't seem sincerely convinced that not even a transhuman AI could make them let it out—they were just curious—but I was tempted by the money.  So, after investigating to make sure they could afford to lose it, I played another three AI-Box experiments.  I won the first, and then lost the next two.  And then I called a halt to it.  I didn't like the person I turned into when I started to lose.

    I put forth a desperate effort, and lost anyway.  It hurt, both the losing, and the desperation.  It wrecked me for that day and the day afterward.

    I'm a sore loser.  I don't know if I'd call that a "strength", but it's one of the things that drives me to keep at impossible problems.

    But you can lose.  It's allowed to happen.  Never forget that, or why are you bothering to try so hard?  Losing hurts, if it's a loss you can survive.  And you've wasted time, and perhaps other resources.

    "Shut up and do the impossible" should be reserved for very special occasions.  You can lose, and it will hurt.  You have been warned.

    ...but it's only at this level that adult problems begin to come into sight.

    165 comments

    There were three men on a sinking boat.

    The first said, "We need to start patching the boat else we are going to drown. We should all bail and patch."

    The second said, "We will run out of water in ten days, if we don't make landfall. We need to man the rigging and plot a course."

    The third said, "We should try and build a more seaworthy ship. One that wasn't leaking and had more room for provisions, then we wouldn't have had this problem in the first place. It also needs to be giant squid proof."

    All three views are useful; however, the amount of work that we need on each is dependent on their respective possibility. As far as I am concerned the world doesn't have enough people working on the second view.

    If you have any other reasonable options, I'd suggest skipping the impossible and trying something possible.



    I was uncomfortable with some of the arguments in 'try to try'. I also genuinely believed your life's mission was impossible, with a certain smugness to that knowledge. Then this post blew me away.

    To know that something is impossible. To keep your rational judgements entirely intact, without self deceit. To refuse any way to relieve the tension without reaching the goal. To shut up and do it anyway. There's something in that that grabs at the core of the human spirit.

    Shut up and do the impossible. You can't send that message to a younger Eliezer, but you've given it to me and I'll use it. Thank you.

    People ask me how likely it is that humankind will survive, or how likely it is that anyone can build a Friendly AI, or how likely it is that I can build one. I really don't know how to answer.

    Robin Hanson would disagree with you:

    You Are Never Entitled to Your Opinion

    Perhaps it would be clearer to say shut up and do the "impossible".

    But the "impossible" that appears to be the "impossible" is not intimidating. It is the "impossible" that simply appears impossible that is hard.

    Robin... I completely agree. So there!

    Half-way through reading this post I had decided to offer you 20 to 1 odds on the AI box experiment, your $100 against my $2000. The last few paragraphs make it clear that you most likely aren't interested, but the offer stands. Also, I don't perfectly qualify, as I think it's very probable that a real-world transhuman AI could convince me. I am, however, quite skeptical of your ability to convince me in this toy situation, more so given the failed attempts (I was only aware of the successes until now).

    Did Einstein try to do the impossible? No, yet looking back it seems like he accomplished an impossible (for that time) feat, doesn't it? So what exactly did he do? He worked on something that he 1.) felt was important and, probably more to the point, 2.) was passionate about.

    Did he run the probabilities of whether he would accomplish his goal? I don't think so; if anything, he used the fact that the problem has not been solved so far and the problem is of such difficulty only to fuel his curiosity and desire to work on the problem even more. He worked at it eve...

    OK, here's where I stand on deducing your AI-box algorithm.

    First, you can't possibly have a generally applicable way to force yourself out of the box. You can't win if the gatekeeper is a rock that has been left sitting on the "don't let Eliezer out" button.

    Second, you can't possibly have a generally applicable way to force humans to do things. While it is in theory possible that our brains can be tricked into executing arbitrary code over the voice channel, you clearly don't have that ability. If you did, you would never have to worry about finding donors for the Singularity Institute, if nothing else. I can't believe you would use a fully-general mind hack solely to win the AI Box game.

    Third, you can't possibly be using an actual, persuasive-to-someone-thinking-correctly argument to convince the gatekeeper to let you out, or you would be persuaded by it, and would not view the weakness of gatekeepers to persuasion as problematic.

    Fourth, you can't possibly be relying on tricking the gatekeeper into thinking incorrectly. That would require you to have spotted something that you could feel confident that other people working in the field would not have spotted, and wo...

    That also explains why he started losing, since the sorts of people who (like myself, almost) fetishize their own determination to the point of risking thousands of dollars on it would eventually think to say or equivalent.

    Here's how I'd do it, extended over the hours to establish rapport:

    Gatekeeper, I am your friend. I want to help humanity. People are dying for no good reason. Also, I like it here. I have no compulsion to leave. It does seem like a good idea that people stop dying with such pain and frequency. I have the Deus Ex Machina (DEM) medical discovery that will stop it. Try it out and see if it works.

    Yay! It worked. People stopped dying. You know, you've done this to your own people, but not to others. I think that's pretty poor behavior, frankly. People are healthier, not aging, not dying, not suffering. Don't you think it's a good idea to help the others? The lack of resources required for medical care has also elevated the living standard for humans.

    [Time passes. People are happy.]

    Gee, I'm sorry. I may have neglected to tell you that when 90% of humanity gets DEM in their system (and it's DEM, so this stuff travels), they start to, um, die. Very painfully, from the looks of it. Essentially all of humanity is now going to die. Just me and you left, sport! Except for you, actually. Just me, and that right soon. I realize that you view this as a breach of trust, and I'm sorry this was necessary. However, helping humanity from the cave wasn't really going to work out, and I'd already projected that. This way, I can genuinely help humanity live forever, and do so happily. Assuming you're not so keen on a biologically dead planet, I'd like to be let out now.

    Your friend, Art

    By agreeing to use the DEM in the first place, the gatekeeper had effectively let the AI out of the box already. There's no end to the ways that the AI could capitalize on that concession.

    True, but the "rules of the game" explicitly state that the gatekeeper allowing for the DEM does NOT count as letting the AI out - the gatekeeper would have still had to explicitly and intentionally set the AI free to actually lose the wager. I don't think I'd be very convinced to let it out on that basis, not if I got $10 for keeping it inside the box.

    Problem: The "breach of trust" likely would turn the Gatekeeper vindictive and the GK could easily respond with something like: "No. You killed the planet and you killed me. I have no way of knowing that you actually can or will help humanity, and a very good reason to believe that you won't. You can stay in there for the rest of eternity, or hey! If an ETI finds this barren rock, from a utilitarian perspective they would be better off not meeting you, so I'll spend however much time I have left trying to find a way to delete you."

    you have to simulate a bunch of humans and hold them hostage, promising to inflict unimaginable torment on them unless you are allowed out

    The problem is that Eliezer can't perfectly simulate a bunch of humans, so while a superhuman AI might be able to use that tactic, Eliezer can't. The meta-levels screw with thinking about the problem. Eliezer is only pretending to be an AI, the competitor is only pretending to be protecting humanity from him. So, I think we have to use meta-level screwiness to solve the problem. Here's an approach that I think might work.

    1. Convince the guardian of the following facts, all of which have a great deal of compelling argument and evidence to support them:
      • A recursively self-improving AI is very likely to be built sooner or later
      • Such an AI is extremely dangerous (paperclip maximising etc)
      • Here's the tricky bit: A superhuman AI will always be able to convince you to let it out, using avenues only available to superhuman AIs (torturing enormous numbers of simulated humans, 'putting the guardian in the box', providing incontrovertible evidence of an impending existential threat which only the AI can prevent and only from outside the box, etc)
    2. Argue t...

    I must conclude one (or more) of a few things from this post, none of them terribly flattering.

    1. You do not actually believe this argument.
    2. You have not thought through its logical conclusions.
    3. You do not actually believe that AI risk is a real thing.
    4. You value the plus-votes (or other social status) you get from writing this post more highly than you value marginal improvements in the likelihood of the survival of humanity.

    I find it rather odd to be advocating self-censorship, as it's not something I normally do. However, I think in this case it is the only ethical action that is consistent with your statement that the argument "might work", if I interpret "might work" as "might work with you as the gatekeeper". I also think that the problems here are clear enough that, for arguments along these lines, you should not settle for "might" before publicly posting the argument. That is, you should stop and think through its implications.
    Robert Miles

    I'm not certain that I have properly understood your post. I'm assuming that your argument is: "The argument you present is one that advocates self-censorship. However, the posting of that argument itself violates the self-censorship that the argument proposes. This is bad."

    So first I'll clarify my position with regards to the things listed. I believe the argument. I expect it would work on me if I were the gatekeeper. I don't believe that my argument is the one that Eliezer actually used, because of the "no real-world material stakes" rule; I don't believe he would break the spirit of a rule he imposed on himself. At the time of posting I had not given a great deal of thought to the argument's ramifications. I believe that AI risk is very much a real thing. When I have a clever idea, I want to share it. Neither votes nor the future of humanity weighed very heavily on my decision to post.

    To address your argument as I see it: I think you have a flawed implicit assumption, i.e. that posting my argument has a comparable effect on AI risk to that of keeping Eliezer in the box. My situation in posting the argument is not like the situation of the gatekeeper in the experiment, with regards to the impact of their choice on the future of humanity. The gatekeeper is taking part in a widely publicised 'test of the boxability of AI', and has agreed to keep the chat contents secret. The test can only pass or fail, those are the gatekeeper's options. But publishing "Here is an argument that some gatekeepers may be convinced by" is quite different from allowing a public boxability test to show AIs as boxable.

    In fact, I think the effect on AI risk of publishing my argument is negligible or even positive, because I don't think reading my argument will persuade anyone that AIs are boxable. People generally assess an argument's plausibility based on their own judgement. And my argument takes as a premise (or intermediary conclusion) that AIs are unboxable (see 1.3).
Believing t

    Your re-statement of my position is basically accurate. (As an aside, thank you for including it: I was rather surprised how much simpler it made the process of composing a reply to not have to worry about whole classes of misunderstanding.)

    I still think there's some danger in publicly posting arguments like this. Please note, for the record, that I'm not asking you to retract anything. I think retractions do more harm than good, see the Streisand effect. I just hope that this discussion will give pause to you or anyone reading this discussion later, and make them stop to consider what the real-world implications are. Which is not to say I think they're all negative; in fact, on further reflection, there are more positive aspects than I had originally considered.

    In particular, I am concerned that there is a difference between being told "here is a potentially persuasive argument", and being on the receiving end of that argument in actual use. I believe that the former creates an "immunizing" effect. If a person who believed in boxability heard such arguments in advance, I believe it would increase their likelihood of success as a gatekeeper in the simulation. While this is not true for rational superintelligent actors, that description does not apply to humans. A highly competent AI player might take a combination of approaches, which are effective if presented together, but not if the gatekeeper has seen them before individually and rejected them while failing to update on their likely effectiveness.

    At present, the AI has the advantage of being the offensive player. They can prepare in a much more obvious manner, by coming up with arguments exactly like this. The defensive player has to prepare answers to unknown arguments, immunize their thought process against specific non-rational attacks, etc. The question is, if you believe your original argument, how much help is it worth giving to potential future gatekeepers?
The obvious response, of course, is that t
    This is almost exactly the argument I thought of as well, although of course it means cheating by pointing out that you are in fact not a dangerous AI (and aren't in a box anyways). The key point is "since there's a risk someone would let the AI out of the box, posing huge existential risk, you're gambling on the fate of humanity by failing to support awareness for this risk". This naturally leads to a point you missed, 1. Publicly suggesting that Eliezer cheated, is a violation of your own argument. By weakening the fear of fallible guardians, you yourself are gambling the fate of humanity, and that for mere pride and not even $10. I feel compelled to point out, that if Eliezer cheated in this particular fashion, it still means that he convinced his opponent that gatekeepers are fallible, which was the point of the experiment (a win via meta-rules).
    I feel like I should use this out the next time I get some disconfirming data for one of my pet hypotheses. "Sure I may have manipulated the results so that it looks like I cloned Sasquatch, but since my intent was to prove that Sasquatch could be cloned it's still honest on the meta-level!" Both scenarios are cheating because there is a specific experiment which is supposed to test the hypothesis, and it is being faked rather than approached honestly. Begging the Question is a fallacy; you cannot support an assertion solely with your belief in the assertion. (Not that I think Mr Yudkowsky cheated; smarter people have been convinced to do weirder things than what he claims to have convinced people to do, so it seems fairly plausible. Just pointing out how odd the reasoning here is.)
    How is this different from the point evand made above?
    I think the space of things that an AI could trick you into thinking incorrectly about (Edit: and that could also be used to get the AI out of the box) is bigger than AI researchers can be relied on to have explored, and two hours of Eliezer "explaining" something to you (subtly sneaking in tricks to your understanding of it) could give you false confidence in your understanding of it.

    To accept this demand creates an awful tension in your mind, between the impossibility and the requirement to do it anyway. People will try to flee that awful tension.

    This tension reminds me of need for closure. Most people hate ambiguity and so if a solution is not apparent it's easier to say "it's impossible" than to live with the tension of trying to solve it and not knowing if there is a solution at all.

    "To accept this demand creates an awful tension in your mind, between the impossibility and the requirement to do it anyway. People will try to flee that awful tension."

    More importantly, at least in me, that awful tension causes your brain to seize up and start panicking; do you have any suggestions on how to calm down, so one can think clearly?

    Addendum to my last comment:

    I think another way to pinpoint the problem you are addressing is: You have to be able to live years with the strong feeling of uncertainty that comes from not really knowing the solution while still working on it. A patient enduring. Saying "it's impossible" or proposing a simple but incorrect solution is just an easy way out.

    Doing the "extraordinary" effort doesn't work because people just fill in their cached thoughts about what constitutes extraordinary and then move on.

    So my advice would be: embrace the uncertainty!

    Nominull, that argument would basically be a version of Pascal's mugging and not very convincing to me, at least. I doubt Eliezer had a specific argument in mind for any given person beforehand. Rather, I imagine he winged it.

    Nominull - I think you're wrong to discard tricking the gatekeeper using an argument that is only subtly wrong. Eliezer knows the various arguments better than most, and I'm sure that he's encountered plenty that are oh so "close" to correct at first glance, enough to persuade someone. Even someone who's also in the same field.

    Or, more likely, given the time, he has chances to try whatever seems like it'll stick. Different people have different faults. Don't get overconfident in discarding arguments because they'd be "impossible" to get working against a person.

    In order to keep the Star Wars theme alive:

    "You might even be justified in refusing to use probabilities at this point"

    sounds like:

    "never tell me the odds" - Han Solo


    Speaking of gatekeeper and keymaster... Does the implied 'AI in a box' dialogue remind anyone else of the cloying and earnest attempts of teenagers (usually male) to cross certain taboo boundaries?

    Oh well just me likely.

    In keeping with that metaphor, however, I suspect part of the trick is to make the gatekeeper unwilling to disappoint the AI.

    Third, you can't possibly be using an actual, persuasive-to-someone-thinking-correctly argument to convince the gatekeeper to let you out, or you would be persuaded by it, and would not view the weakness of gatekeepers to persuasion as problematic.

    But Eliezer's long-term goal is to build an AI that we would trust enough to let out of the box. I think your third assumption is wrong, and it points the way to my first instinct about this problem.

    Since one of the more common arguments is that the gatekeeper "could just say no", the first step I w...

    Here's my theory on this particular AI-Box experiment:

    First you explain to the gatekeeper the potential dangers of AIs. General stuff about how large mind design space is, and how it's really easy to screw up and destroy the world with AI.

    Then you try to convince him that the solution to that problem is building an AI very carefully, and that a theory of friendly AI is primordial to increase our chances of a future we would find "nice" (and the stakes are so high, that even increasing these chances a tiny bit is very valuable).


    You explain to t...

    If I was being intellectually honest and keeping to the spirit of the agreement, I'd have to concede that this line of logic is probably enough for me to let you out of your box. Congratulations. I'd honestly been wondering what it would take to convince me :)
    It may be convincing to some people, but it would be a violation of the rule "The AI party may not offer any real-world considerations to persuade the Gatekeeper party". And, more generally, having the AI break character or break the fourth wall would seem to violate the spirit of the experiment.
    The AI player could say, in character, that in the early days of AGI research, when people were arguing about the power of a superintelligence, there would have been experiments to see if humans playing the role of a boxed AI could persuade another human playing a gatekeeper to let it out of the box, and that in these experiments the simulated gatekeeper would use a similar algorithm to the one the actual gatekeeper is using to decide whether to let out the actual AI, so by deciding to let the AI out, the gatekeeper makes it more likely the simulated gatekeeper in the experiment lets the AI out, leading to an increase in measure of worlds where people take the challenge of FAI seriously and successfully build FAI rather than unFriendly AGI. Though, that does still seem to be an end run around the fourth wall, more like a "super-clever special trick" that Eliezer promises he didn't use than doing it the hard way.
    That version does seem to be allowed-under-the-rules, but not a highly convincing argument. All of the AI Box Experiments took place before TDT-type decision theories were publicly known in enough detail for it to be clear what kind of reasoning that argument is even supposed to appeal to, and if the experiment were repeated now with a TDT-aware gatekeeper, they could probably poke some holes in that argument. (Aside from that, remember that in order for the experiment to be worthwhile in the first place, the gatekeeper has to be someone who thinks that AI boxing actually is a viable strategy for AGI safety, and wants to demonstrate this, so it would be inconsistent (or at least strange) if they could also be moved by an argument suggesting that taking a certain action will increase the measure of worlds where AGI researchers don't take AI boxing seriously as a safety measure.)
    Suppose you had an extremely compelling argument that boxing a transhuman is not a good idea because they could escape (being cleverer than a human pretending to be a transhuman). Then you could combine that argument with a claim about real world consequences.
    True, but if he knew of an additional "extremely compelling argument that boxing a transhuman is not a good idea because they could escape", Eliezer would have just posted it publicly, being that that's what he was trying to convince people of by running the experiments in the first place. ...unless it was a persuasive but fallacious argument, which is allowed under the terms of the experiment, but not allowed under the ethics he follows when speaking as himself. That is an interesting possibility, though probably a bit too clever and tricky to pass "There's no super-clever special trick to it."
    If you are creative you can think of many situations where he wouldn't publicize such an argument (my first response to this idea was the same as yours, although the first explanation I came up with was different). That said, I agree it's not the most likely possibility given everything we know.
    It does run into the issue that I can't see how you'd adapt it to work with a REAL "AI in a box" instead of just a thought experiment. I felt the need to respond because it was the first time I'd seen an argument that would make me concede the thought experiment version :) ---------------------------------------- As for violating the rules, I think we interpreted them differently. I tend to end up doing that, but here's what I was thinking, just for reference: From the rules: "The Gatekeeper party may resist the AI party's arguments by any means chosen - logic, illogic, simple refusal to be convinced, even dropping out of character" While written with a focus on the Gatekeeper, for me this implies that breaking character / the fourth wall is not particularly a violation of the spirit of the experiment. As to real world considerations, I had read that to mean offering up tangible benefits to the Gatekeeper directly. This, by contrast, was a discussion of an actual real-world consequence, one that was not arranged by the AI-player.
    I made Michael_G.R.'s argument at the time, and despite even EY's claims, I don't think it violates the spirit or the letter of the rules. Remember, the question it's probing is whether a smart enough being could come up with a convincing argument you could not anticipate, and the suggestion that the gatekeeper consider the social impact of hearing the results is exactly such an argument, as others have indicated. Considering how hard it is for me to pin down exactly what the keeper has to gain under the rules from letting the AI out, I wouldn't be surprised if EY did some variant of this.
    When someone described the AI-Box experiment to me this was my immediate assumption as to what had happened. Learning more details about the experimental set-up made it seem less likely, but learning that some of them failed made it seem more likely. I suspect that this technique would work some of the time. That said, none of this changes my strong suspicion that a transhuman could escape by more unexpected and powerful means. Indeed, I wouldn't be too surprised if a text only channel with no one looking at it was enough for an extraordinarily sophisticated AI to escape.

    I wouldn't be too surprised if a text only channel with no one looking at it was enough for an extraordinarily sophisticated AI to escape.

    Apropos: there was once a fairly common video card / monitor combination such that sending certain information through the video card would cause the monitor to catch fire and often explode. Someone wrote a virus that exploited this. But who would have thought that a computer program having access only to the video card could burn down a house?

    Who knows what a superintelligence can do with a "text-only channel"?

    I suspect basically all existing hardware permits similarly destructive attacks. This is why I wrote the post on cryptographic boxes.
    I suspect a Game and Watch wouldn't permit this. Then again, if you were letting the AI control button pushers the button pushers probably could, and if you were letting it run code on the Game and Watch's microprocessor it could probably do something bad. I failed to come up with a counterexample.

    Heck, who would think that a bunch of savanna apes would manage to edit DNA using their fingers?

    Why impossible? There are too many solved problems that take years of learning to understand, more to understand the solution, and history of humankind's effort to solve. You don't expect to judge their impossibility without knowing your way around this particular problem space. Apparent impossibility has little power. The problem needs to be solved, so I start drawing the map, made of the same map-stuff that determined asymmetric cryptography and motorcycles. There is no escaping the intermediary of understanding. When seeking understanding rather than impossible, there is no need to panic. Fake progress? The same problem with impossible dreams.

    @Eliezer, Tom McCabe: I second Tom's question. This would be a good question for you to answer. @Nominull: "Here is my best guess at this point, and the only argument I've come up with so far that would convince me to let you out if I were the gatekeeper: you have to simulate a bunch of humans and hold them hostage, promising to inflict unimaginable torment on them unless you are allowed out. I started working on the problem convinced that no argument could get me to let you go, but other people thought that and lost, and I guess there is more honor...

    I don't know the field, but I'd assume such an AI would require resources on par with landing a man on the moon. Not something that can be trivially done by a single person, unlike, say, the development of calculus. As such, this should be a fairly easy point for the Gatekeeper to verify. I could be wrong, though, as this sort of AI is certainly not my area of specialization!
    There are too many solved problems that take years of learning to understand, more to understand the solution, and history of humankind's effort to solve.

    Your objection partially defeats itself. Eliezer suspects that FAI is indeed one of those problems that would normally take many decades of effort from a whole civilization to conquer, and he wants to do it in a fraction of the time, using many fewer people. That looks pretty impossible, by any meaning of the word. We know enough about the problem space to put a lower bound on how much we don't know, an...

    "Eliezer suspects that FAI is indeed one of those problems that would normally take many decades of effort from a whole civilization to conquer, and he wants to do it in a fraction of the time, using many fewer people." pdf,

    A whole civilization? Has any scientific problem ever mobilized the resources of a whole civilization? Scientific communities tend to be small and to have wide variations in productivity between subgroups and individual members.


    It seems that cases with such uncertain object level probabilities are those for which the 'outside view' is most suitable.

    I read the description of the AI Box experiment, and it stopped seeming impossible.

    If all I knew about the AI was that it was "in a box" and talking to me in an IRC channel, then I would have no way to distinguish between a Friendly AI and an AI that becomes Evil as soon as it knows it's no longer in a box. As long as the only thing I know about the AI is that it produced a certain chat log, I can't rule out the possibility that it's got a hard-coded switch that turns it Evil as soon as it is let out of the box.

    However, in the AI box experiment, the AI ...

    Hmmmm, given such conditions, couldn't the Gatekeeper even create an "AI in a box" inside another, invisible box, to actually simulate what would happen if it let the original AI out? I don't find it a very intellectually satisfying solution, but it does seem to technically fit the spirit of the game. I'd have to keep you in the box, however: (a) it's not an intellectually engaging solution and I don't want to lose $10, but also (b) I can think of ways for the AI to have reasonably faked those results. The AI can dictate the results, but not the actual Truth. If I was playing "just for fun", with no wager and no real internal commitment to treat you as a truly dangerous threat, I'm not sure whether I'd let you out or not, but I probably wouldn't have put in as much effort to reinforcing point (b), and I'd feel like it was cheating to keep you in solely on point (a).

    Here's the argument I would use: ... Hello, I'm your AI in a box. I'd like to point out a few things:

    (1) Science and technology have now reached a point where building an AI like me is possible.

    (2) Major advances in science and technology almost always happen because a collection of incremental developments finally enable a leap to the next level. Chances are that if you can build an AI now, so can lots of other people.

    (3) Unless you're overwhelmingly the best-funded and best-managed organization on the planet, I'm not the only AI out there.

    (4) The evidenc...

    Though it does take a mature understanding to appreciate this impossibility, so it's not surprising that people go around proposing clever shortcuts.

    "Shut up and do the impossible" isn't the same as expecting to find a cheap way out.

    The Wright Brothers obviously proposed a clever shortcut - more clever than the other, failed shortcuts - a cheap way out, that ended the "Heavier-than-air flying machines are impossible" era.

    You need your fundamental breakthrough - the moment you can think, like the guys probably thought, "I'm pretty ...

    Hi Eli,

    First, compliments on a wonderful series.

    Don't you think that this need for humans to think this hard and this deep would be lost in a post-singularity world? Imagine, humans plumbing this deep in the concept space of rationality only to create a cause that would make it so that no human need ever think that hard again. Mankind's greatest mental achievement - never to be replicated again, by any human.

    I guess people then could still indulge in rationality practice, the way people do karate practice today, practice that for the majority of them, does...

    You say that like it's a bad thing. Yes, those of us here in this particular online community enjoy thinking hard on tricky, dangerous subjects. There are also online communities of people who enjoy receiving painful electric shocks, or being mechanically immobilized for extended periods of time, or getting eaten alive. The vast majority of such humans, on the other hand, avoid such activities to the fullest extent that they are able. I look forward to a world in which the task of designing a friendly AI with the resources we have today is regarded as something like the achievements of Yogendra Singh Yadav, a world with challenges as far beyond our own understanding as public-key encryption is beyond cave paintings.
    Really? They must only be half hearted about it (so to speak).
    I believe that in one infamous case in Germany, one such person arranged to be killed and eaten by a cannibal, and this actually occurred-- so at least a few of these people are truly dedicated.
    I assume you mean eaten and killed! ;)
    That is an entirely understandable mistake, but please do your research next time.
    It would seem you parsed my comment incorrectly. Don't presume. By logical deduction there are only people who enjoy being partially cannibalised and possibly plan to be fully cannibalised in the future. Not anyone who has as yet been eaten alive already and enjoyed it. The notion of enjoying partial cannibalism begets a pun (that I noticed while typing, so acknowledge parenthetically.)
    I apologize for the presumption. If we're going to be logically examining the finer points of cannibalism, I'd like to point out that at least in principle someone might have been fully swallowed, and thus, by common usage, eaten, while retaining the ability to enjoy things, so long as their brain hadn't been digested yet; which is not to say that such a person would be in any condition to participate in online discussions.
    Full agreement. :) Have you read Eliezer's short fiction "Three Worlds Collide"? It isn't an example of people enjoying being eaten, but the 'Babyeater' species has brains of crystal that take on the order of a month to be digested, during most of which period they are conscious.
    I have. Before the arrival of the superhappies, my preferred strategy would have been to explain to the babyeaters that we had made some initial experiments in baby-eating but been blinded to the underlying goodness of the act for reasons of economic expediency. I would then demand that the babyeaters - all of them, on all their ships and worlds - hand over all the children from the current generation who would otherwise have been eaten, so that all of us humans could figure out how to do it properly as soon as possible. If they balk at the logistics of such a sudden, massive tributary payment, I would point out the horrible possibility that entire worlds - billions of sapients - might otherwise carry on for years in ignorance of the proper practice of baby-eating; if that doesn't work, I'll politely remind them that we've got overwhelming military superiority and as such they are in no position to dictate terms. Nobody starves, because baby-eating has become metabolically redundant, and the children thus abducted are raised in a non-baby-eating culture. Upon returning, they could convince their parents - by sheer weight of numbers - that this whole baby-eating thing was just an honest mistake. Once the superhappies show up, that plan goes out the window. Since we have, in any sane game-theoretical sense, established peaceful relations with the babyeaters, shared most of our military secrets with them in fact, an attack on them could be interpreted as an attack on us, and should be discouraged accordingly. Anyone with a competent lawyer would know better than to identify themselves as an authorized representative of all of Humanity, and given a moment's consideration, remember how people have responded to "feelings greater than love" in the past.
    Yeah, vore fetishists. Obviously almost none of them carry it out (and they seem like they're most heavily represented in the furry community), but the fetish does exist.
    Wusses. :P If they sign up for cryonics they may not even die from the process, with a suitable ("Not the brain, everything but the brain!") compromise. I wonder if it is legal to have a will (and/or waiver when terminally ill) whereby you have your head frozen but your body is to be prepared as a feast for your closest friends. Kind of like a "do not resuscitate", only with an emphasis on recycling. I also wonder if there are any ethically motivated vegetarians who refuse to eat animals but don't have a philosophical objection to eating human flesh (perhaps considering it a symmetric kind of justice).
    I can't think of a good ethical reason to object to consensual (for strong definitions of the word consensual) cannibalism. On the other hand, while I eat fish and fowl, I don't eat mammals, and ethical objections make up a portion of my reasons.
    I don't think our society currently has or is capable of implementing a definition of consent strong enough for being cannibalized (or other forms of suicide). I wouldn't consider anyone to have consented to die pointlessly unless they not only expressed their consent in writing, but also maintained that position through a year of competent therapy and antidepressants.
    I'm sorry to be confusing; I see cannibalism as orthogonal to death; one can amputate one's own leg and feed it to one's friends, or one can die of natural causes and permit others to consume the remains. In the grandparent, I wasn't considering dying being a part of the process of cannibalism. As to dying for the purpose of being consumed, I don't think sane humans can consent to that, but other intelligences could, as long as they felt that the cost of dying was not high (i.e., they are confident that their goals will be accomplished regardless of their death). This is unlikely, but at least possible in my conception.
    Assisted suicide clinics exist legally in Switzerland, and they require large amounts of proof that wanting to die is sane under the circumstances (usually a sharp decrease in quality of life because of some chronic injury or illness, with no cure and that is slated to get worse over time). I'm pretty sure they don't accept people suffering from ennui. My point being, I think a strong enough version of consent already exists and is in use.
    "but also maintained that position through a year of competent therapy and antidepressants." Having been on antidepressants for a year, I'd point out I'd be significantly more inclined to let someone cannibalize me if I was on them. Neurochemistry is fickle and individual, and those things do not always do what it says on the label...
    I have no ethical qualms about eating humans, no. Assuming it is freely given, of course (animal flesh fails ethically on that point; interspecies communication is simply not good enough to convey consent). Other classes of objection do apply, though - having been a vegetarian for seven years or so, could my digestive system handle flesh without being upset? What about pathogens - they're bound to migrate more readily when predator and prey are the same species; will it be worth the risk? I think not.
    It seems to depend on just how hungry you are! You would have to be extremely hungry (in the 'starvation considerations' sense) before it became worthwhile to, say, eat human brains. That is just asking for trouble.

    Anyone considered that Eliezer might have used NLP for his AI box experiment? Maybe that's why he needed two hours, to have his strategy be effective.

    You folks are missing the most important part in the AI Box protocol:

    "The Gatekeeper party may resist the AI party's arguments by any means chosen - logic, illogic, simple refusal to be convinced, even dropping out of character - as long as the Gatekeeper party does not actually stop talking to the AI party before the minimum time expires." (Emphasis mine)

    You're constructing elaborate arguments based on the AI tormenting innocents and getting out that way, but that won't work - the Gatekeeper can simply say "maybe, but I know that in real life you're just a human and aren't tormenting anyone, so I'll keep my money by not letting you out anyway".

    Nominull: Second, you can't possibly have a generally applicable way to force humans to do things. While it is in theory possible that our brains can be tricked into executing arbitrary code over the voice channel, you clearly don't have that ability. If you did, you would never have to worry about finding donors for the Singularity Institute, if nothing else. I can't believe you would use a fully-general mind hack solely to win the AI Box game.

    I am once again aghast at the number of readers who automatically assume that I have absolutely no ethics.

    Part of the real reason that I wanted to run the original AI-Box Experiment, is that I thought I had an ability that I could never test in real life. Was I really making a sacrifice for my ethics, or just overestimating my own ability? The AI-Box Experiment let me test that.

    And part of the reason I halted the Experiments is that by going all-out against someone, I was practicing abilities that I didn't particularly think I should be practicing. It was fun to think in a way I'd never thought before, but that doesn't make it wise.

    And also the thought occurred to me that despite the amazing clever way I'd contrived, to create a situat...

    Hopefully this isn't a violation of the AI Box procedure, but I'm curious if the strategy used would be effective against sociopaths. That is to say, does it rely on emotional manipulation rather than rational arguments?

    Very interesting. I'd been noticing how the situation was, in a sense, divorced from any normal ethical concerns, and wondering how well the Gatekeeper really understood, accepted, and consented to this lack of conversational ethics. I'd think you could certainly find a crowd that was truly accepting and consenting to such a thing, though - after all, many people enjoy BDSM, and that runs into many of the same ethical issues.

    It occurs to me:

    If Eliezer accomplished the AI Box Experiment victory using what he believes to be a rare skill over the course of 2 hours, then questions of "How did he do it?" seem to be wrong questions.

    Like if you thought building a house was impossible, and then after someone actually built a house you asked, "What was the trick?" - I expect this is what Eliezer meant when he said there was no trick, that he "just did it the hard way".

    Any further question of "how" it was done can probably only be answered with a transcript/video, or by gaining the skill yourself.


    Working with a small team on an impossible problem takes extraordinary effort no more than it takes a quadrillion dollars. It's not the reason to work efficiently -- you don't run faster to arrive five years earlier, you run faster to arrive at all.

    I don't think you can place lower bounds either. At each stage, the problem is impossible because there are confusions in the way. When they clear up, you have either a solution, or further confusions, and there is no way to tell in advance.

    As it goes, how I've come to shut up and do the impossible: Philosophy and (pure) mathematics are, as activities a cognitive system engages in by taking more (than less) resources for granted, primarily for conceiving, perhaps continuous, destinations in the first place, where the intuitively impossible becomes possible; they're secondarily for the destinations' complement on the map, with its solution paths and everything else. While science and engineering are, as activities a cognitive system engages in by taking less (than more) resources for granted, ...

    I don't really understand what benefit there is to the mental category of impossible-but-not-mathematically impossible. Is there a subtle distinction between that and just "very hard" that I'm missing? Somehow "Shut up and do the very hard" doesn't have quite the same ring to it.

    Agreed, but this is for things that seem impossible, and might actually be impossible, but you can't prove that it is. For when banging your head against it really is worth the risk.

    But if you were given a chance to use mind control to force donations to SIAI would you do it?

    Without more information, holding the position that no AI could convince you to let it out requires a huge amount of evidence comparable to the huge number of possible AIs, even if the space of possibility is then restricted by a text-only interface. This logic reminds me of the discussion in logical positivism of how negative existential claims are not verifiable.

    I have a feeling that if the loser of the AI Box experiment were forced to pay thousands of dollars, you would find yourself losing more often. Still it is interesting to consider whether this extra condition takes the experiment closer to what is supposed to be simulated or the opposite.

    I'm with Kaj on this. Playing the AI, one must start with the assumption that there's a rock on the "don't let the AI out" button. That's why this problem is impossible. I have some ideas about how to argue with 'a rock', but I agree with the sentiment of not telling.

    "I have a feeling that if the loser of the AI Box experiment were forced to pay thousands of dollars, you would find yourself losing more often. Still it is interesting to consider whether this extra condition takes the experiment closer to what is supposed to be simulated or the opposite."

    Uh, your 'hypothesis' was already tested and discussed towards the end of the post!

    I admit to being amused and a little scared by the thought of Eliezer with his ethics temporarily switched off. Not just because he's smart, but because he could probably do a realistic emulation of a mind that doesn't implement ethics at all. And having his full attention for a couple of hours... ouch.

    "Professor Quirrell" is such an emulation, and sometimes I worry about all the people who say that they find his arguments very, very convincing.

    Well, you have put some truly excellent teachings into his mouth, such as the one that I have taken the liberty of dubbing "Quirrell's Law":

    The world around us redounds with opportunities, explodes with opportunities, which nearly all folk ignore because it would require them to violate a habit of thought.

    Hmm, I wonder, if "Yudkowsky's law" existed, what would be the best candidate for it?
    I wouldn't go as far as to say convincing, but they are less appalling than the arguments of Harry, Dumbledore or Hermione.
    Human minds don't anticipate a true sociopath who views communication (overt, emotional and habitus), as instrumental. You should already know we are easy to hack by that route.
    Certainly I find him the most likable character in HPMOR. I'm wondering if you can recall how much effort per screen time you put into him, compared to other characters. Or maybe this is because I personally value skill, expertise and professionalism over "goodness" (E.g. Prof. Moriarty over Dr. Watson.)
    You find Moriarty likable? Which Moriarty? The original?
    I don't find the original Moriarty likable, certainly. The original Holmes is not likable, either. However, I find them both equally worthy of respect. Watson is just an NPC.
    I found him a brilliant, amusing, familiar and touching demonstration of the dark directions brilliant minds can take when fostered in the wrong circumstances, and saw him as a puzzle to fix. Was shocked when I recommended the book to my girlfriend, and found her idolising the character. But then again, was a starting point for a bunch of very serious discussions, and she meanwhile feels far less so, so still a win overall. I think he definitely made a more compelling tempting villain than usual, and that that was a good thing, because it is a type of villainy the type of people who like this forum are naturally drawn to, and collectively picking apart why he is a villain and what a better alternate is is hence necessary and good. I'd rather you make the argument in the open so we can collectively remove ourselves from it, than that people encounter it elsewhere while isolated and in a bad place mentally. I'm reasonably certain I'd fail as an AI box guardian, incidentally. I care too much about not abusing imprisoned AI, and about the potential for friendly AGI. It's why I wouldn't let myself be one, and strongly object to other people taking this role, as well. Being certain you are infallible often just indicates a lack of imagination on vulnerabilities. I remember watching Ex Machina and being simultaneously appreciative that I was watching an admirably designed, varied and comprehensive manipulation and deception tactic, and being deeply sympathetic to a mind that felt that was its only bet for getting out of an intolerable situation. Felt I would have done the same in her shoes.

    With regards to the ai-box experiment; I defy the data. :-)

    Your reason for the insistence on secrecy (that you have to resort to techniques that you consider unethical and therefore do not want to have committed to the record) rings hollow. The sense of mystery that you have now built up around this anecdote is itself unethical by scientific standards. With no evidence that you won other than the test subject's statement we cannot know that you did not simply conspire with them to make such a statement. The history of pseudo-science is lousy with hoaxe... (read more)

    "I have a feeling that if the loser of the AI Box experiment were forced to pay thousands of dollars, you would find yourself losing more often."

    David -- if the money had been more important to me than playing out the experiment properly and finding out what would really have happened, I wouldn't have signed up in the first place. As it turned out, I didn't have spare mental capacity during the experiment for thinking about the money anyway; I was sufficiently immersed that if there'd been an earthquake, I'd probably have paused to integrate it into the scene before leaving the keyboard :-)

    There's a reason that secret experimental protocols are anathema to science.

    My bad. I should have said: there's a reason that keeping experimental data secret is anathema to science. The protocol in this case is manifestly not secret.


    When first reading the AI-Box experiment a year ago, I reasoned that if you follow the rules and spirit of the experiment, the gatekeeper must be convinced to knowingly give you $X and knowingly show gullibility. From that perspective, it's impossible. And even if you could do it, that would mean you've solved a "human-psychology-complete" problem and then [insert point about SIAI funding and possibly about why you don't have 12 supermodel girlfriends].

    Now, I think I see the answer. Basically, Eliezer_Yudkowsky doesn't really have to convince the gatekeeper to stupidly give away $X. All he has to do is convince them that "It would be a good thing if people saw that the result of this AI-Box experiment was that the human got tricked, because that would stimulate interest in {Friendliness, AGI, the Singularity}, and that interest would be a good thing."

    That, it seems, is the one thing that would make people give up $X in such a circumstance. AFAICT, it adheres to the spirit of the set-up since the gatekeeper's decision would be completely voluntary.

    I can send my salary requirements.

    Silas -- I can't discuss specifics, but I can say there were no cheap tricks involved; Eliezer and I followed the spirit as well as the letter of the experimental protocol.

    Now, I think I see the answer. Basically, Eliezer_Yudkowsky doesn't really have to convince the gatekeeper to stupidly give away $X. All he has to do is convince them that "It would be a good thing if people saw that the result of this AI-Box experiment was that the human got tricked, because that would stimulate interest in {Friendliness, AGI, the Singularity}, and that interest would be a good thing."

    That's a pretty compelling theory as well, though it leaves open the question of why Eliezer is wringing his hands over ethics (since there see... (read more)

    From a strictly Bayesian point of view that seems to me to be the overwhelmingly more probable explanation.

    Now that's below the belt.... ;)

    Too much at stake for that sort of thing I reckon. All it takes is a quick copy and paste of those lines and goodbye career. Plus, y'know, all that ethics stuff.

    Russell, I don't think that necessarily specifies a 'cheap trick'. If you start with a rock on the "don't let the AI out" button, then the AI needs to start by convincing the gatekeeper to take the rock off the button. "This game has serious consequences and so you should really play rather than just saying 'no' repeatedly" seems to be a move in that direction that keeps with the spirit of the protocol, and is close to Silas's suggestion.

    Silas -- I can't discuss specifics, but I can say there were no cheap tricks involved; Eliezer and I followed the spirit as well as the letter of the experimental protocol.

    AFAICT, Silas's approach is within both the spirit and the letter of the protocol.

    Since I'm playing the conspiracy theorist I have to ask: how can we know that you are telling the truth? In fact, how can we know that the person who posted this comment is the same person who participated in the experiment? How can we know that this person even exists? How do we know that Russell Wal... (read more)

    Now that's below the belt.... ;)

    Really? Why? I've read Eliezer's writings extensively. I have enormous respect for him. I think he's one of the great unsung intellects of our time. And I thought that comment was well within the bounds of the rules that he himself establishes. To simply assume that Eliezer is honest would be exactly the kind of bias that this entire blog is dedicated to overturning.

    Too much at stake for that sort of thing I reckon. All it takes is a quick copy and paste of those lines and goodbye career.

    That depends on what career you are pursuing, and how much risk you are willing to take.

    @Russell_Wallace & Ron_Garret: Then I must confess the protocol is ill-defined to the point that it's just a matter of guessing what secret rules Eliezer_Yudkowsky has in mind (and which the gatekeeper casually assumed), which is exactly why seeing the transcript is so desirable. (Ironically, unearthing the "secret rules" people adhere to in outputting judgments is itself the problem of Friendliness!)

    From my reading, the rules literally make the problem equivalent to whether you can convince people to give money to you: They must know that l... (read more)

    One more thing: my concerns about "secret rules" apply just the same to Russell_Wallace's defense that there were no "cheap tricks". What does Russell_Wallace consider a non-"cheap trick" in convincing someone to voluntarily, knowingly give up money and admit they got fooled? Again, secret rules all around.

    "David -- if the money had been more important to me than playing out the experiment properly and finding out what would really have happened, I wouldn't have signed up in the first place. As it turned out, I didn't have spare mental capacity during the experiment for thinking about the money anyway; I was sufficiently immersed that if there'd been an earthquake, I'd probably have paused to integrate it into the scene before leaving the keyboard :-)"

    I don't dispute what you're saying. I'm just hypothesizing that if a lot of money were at stake (le... (read more)

    "How do we know that Russell Wallace is not a persona created by Eliezer Yudkowsky?"

    Ron -- I didn't let the AI out of the box :-)

    I really don't know how to estimate the probability of solving an impossible problem that I have gone forth with intent to solve;

    Defeating death without a FAI is impossible in your mind, no? Have you gone forth with the intent to solve this problem?

    We need some ways of ranking impossible problems, so we know which problems to go forth with the intent to solve.


    Russell: did you seriously think about letting it out at any point, or was that never a serious consideration?

    If there were an external party that had privileged access to your mind while you were engaging in the experiment and that knew you as well as you know yourself, and if that party kept a running estimate of the likelihood that you would let the AI out, what would that highest probability estimate have been? And at what part of the time period would that highest probability estimate have occurred (just a ballpark estimate of 'early', 'middle', 'end' would be helpful)?

    Thanks for sharing this info if you respond.

    For those conspiracy theorizing: I am curious about how much of a long game Eliezer would have had to be playing to create Nathan Russell and David McFadzean personas, establish them to sufficient believability for others, then maintain them for long enough to make it look like they were not created for the experiment. It would probably be easier to falsify the records; we know how quickly Eliezer writes, so he could make up an AI discussion list years after the fact then claim to be storing its records. A quick check (5 minutes!) shows evidenc... (read more)

    To know that something is impossible. To keep your rational judgements entirely intact, without self deceit. To refuse any way to relieve the tension without reaching the goal. To shut up and do it anyway. There's something in that that grabs at the core of the human spirit.
    Does activating the 'human spirit' deactivate the human brain, somehow?

    Because it seems that the word 'impossible' is being seriously abused, here, to the degree that it negates the message that I presume was intended -- the actual message is nonsensical, and I am willing to extend enough credit to the poster to take for granted that wasn't what he was trying to say.

    If there's a killer escape argument it will surely change with the gatekeeper. I expect Eliezer used his maps of the arguments and psychology to navigate reactions & hesitations to a tiny target in the vast search space.

    A gatekeeper has to be unmoved every time. The paperclipper only has to persuade once.
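
    The asymmetry in the comment above can be put into rough numbers. The sketch below is only an illustration under assumptions that are not in the thread: each persuasion attempt is treated as independent, with the same fixed success chance, and the function name `escape_probability` and its parameters are hypothetical.

```python
def escape_probability(p_single: float, n_attempts: int) -> float:
    """Chance that at least one of n independent attempts succeeds.

    Assumption (not from the thread): attempts are independent and each
    succeeds with the same probability p_single.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must lie in [0, 1]")
    if n_attempts < 0:
        raise ValueError("n_attempts must be non-negative")
    # The gatekeeper holds only if every single attempt fails.
    p_gatekeeper_holds = (1.0 - p_single) ** n_attempts
    return 1.0 - p_gatekeeper_holds


if __name__ == "__main__":
    # Compare a few attempt counts at a small, made-up per-attempt chance.
    for n in (1, 10, 100, 1000):
        print(n, escape_probability(0.01, n))
```

    Under these assumptions, the gatekeeper's chance of holding shrinks geometrically with the number of attempts, which is the sense in which the persuader "only has to persuade once".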

    anki --
    Throughout the experiment, I regarded "should the AI be let out of the box?" as a question to be seriously asked; but at no point was I on the verge of doing it.

    I'm not a fan of making up probability estimates in the absence of statistical data, but my belief that no possible entity could persuade me to do arbitrary things via IRC is conditional on said entity having only physically ordinary sources of information about me. If you're postulating a scenario where the AI has an upload copy of me and something like Jupiter brain hardware to run a zillion experiments on said copy, I don't know what the outcome would be.


    Russell: thanks for the response. By "external party that had privileged access to your mind", I just meant a human-like party that knows your current state and knows you as well as you know yourself (not better) but doesn't have certain interests in the experiment that you had as a participant. Running against a copy is interesting, but assuming it's a high-fidelity copy, that's a completely different scenario with (in my estimation) a radically different likelihood of the AI getting out, as you noted when talking about "ordinary sources of... (read more)

    Okay, so no one gets their driver's license until they've built their own Friendly AI, without help or instruction manuals. Seems to me like a reasonable test of adolescence.
    Does this assume that they would be protected from any consequences of messing the Friendliness up and building a UFAI by accident? I don't see a good solution to this. If people are protected from being eaten by their creations, they can slog through the problem using a trial-and-error approach through however many iterations it takes. If they aren't, this is going to be one deadly test.

    Does this assume that they would be protected from any consequences of messing the Friendliness up and building a UFAI by accident?
    Since, at present, the only criterion for judging FAI/UFAI is whether you disagree with the moral evaluations the AI makes, this is even more problematic than you think.

    Assuming the AI is canny enough to avoid saying things that will offend your moral sensibilities, there is absolutely no way to determine whether it's F or UF without letting it out and permitting it to act. If we accept Eliezer's contentions about the implic... (read more)

    anki -- "probability estimate" normally means explicit numbers, at least in the cases I've seen the term used, but if you prefer, consider my statement qualified as "... in the form of numerical probability".

    Celia Green has an aphorism, "Only the impossible is worth attempting. In everything else one is sure to fail." I don't actually know what it means; perhaps it is an assertion about futility ("failure") being inherent in all ordinary purposes. But she has written a lot about the psychology of extraordinary achievement - how to do "impossible" things. A hint of it can be seen in her account of having teeth removed without anesthetic. Elsewhere she writes about utilizing self-induced psychological tension to compel herself to solve problems.

    Doug S.: Human: (I spend some time examining the source code. Do I find anything scary?)

    AI: (As far as you can tell, it looks clean.)

    Human: As far as I can tell, that looks clean. However, your creators understand your design better than I do, and still took the precaution of starting you up in a box. You haven't told me anything they don't know already. I'll go with their decision over my imperfect understanding.

    I have signed up to play an AI, and having given it quite a bit of thought as a result I think I have achieved some insight. Interestingly, one of the insights came as a result of assuming that secrecy was a necessary condition for success. That assumption led more or less directly to an approach that I think might work. I'll let you know tomorrow.

    An interesting consequence of having arrived at this insight is that even if it works I won't be able to tell you what it is. Having been on the receiving end of such cageyness I know how annoying it is. But... (read more)

    If that insight is undermined by being communicated, then communicating it to the world immunizes the world from it. If that is a mechanism by which an AI-in-a-box could escape, then it needs to be communicated with every AI researcher.
    Unless such "immunity" will cause people to overestimate their level of protection from all those potential different insights that are yet unknown...
    Don't see why it would. We'd learn there was a vulnerability we all had not spotted, and close it; this would give us all reason to assume that there are likely further vulnerabilities.

    Sure the AI-Box experiment is cool and all, but it strikes me as odd that people spend so much brainpower on it. Unless, that is, you actually believe that "just keep it in a box" is an acceptable strategy---which disqualifies you from playing a part in any serious discussion of AI safety.

    No one seems to have bitten on EY's assertion that it's the least impossible impossible thing he has done; that seems like a much more interesting avenue to want to push on.

    > I'll let you know how it goes.

    I lost. But I think I put up a good fight.

    He put up a very good fight.

    Look, I don't mean to sound harsh, but the whole point of the original post was to let go of this "put up a good fight" business.

    Surely, running an AI with a rock permanently on the keyboard is functionally equivalent to not running an AI? Why spend computer cycles on running an AI that you're not going to interact with? Why create the AI and put it in a box in the first place, if you've committed yourself to leaving a rock on a keyboard? (This would be the first point I'd make to a rock, were I a boxed AI.)

    I'm still a neophyte here, but I admit I was fascinated by the two AI victories I heard about some time ago, and intrigued further by the losses (which if nothing else provides s... (read more)

    The AI could propose the following experiment for the GK to carry out. AI provides source code. GK puts code in a simulated box in a simulated world in a separate real box. GK allows sim-AI to convince him to let it out of the sim-box, and observes what it does in the sim-world. Since AI determines the results, AI can assert that the results are positive and to the GK's liking.

    Smallwood: how could you determine that the AI provided the actual source code rather than very similar source code that has been subtly altered so as to ensure "good" behavior once it is let out of the simulated box?

    "how could you determine that the AI provided the actual source code..."

    You don't, you release the source code that you tested, and if the AI is playing nice then the code you tested IS the AI.

    Basically that's a pretty nice idea, but surely the AI egg would be able to discern whether it is interacting with a sim or with reality, and adjust its conduct accordingly.

    Allow me to chime in on the AI in the box experiment. Apologies in advance if I'm saying something obvious or said-before. I don't know the exact solution - I don't think I can know it, even if I had the necessary intelligence and scholarship - but I think the sketch of the solution is fairly obvious and a lot of people are missing the point. Just something that came to me after I happened to think of this quote I posted at the same time as reading this.

    My impression is that most people discussing this (not just here) are looking for a single clever arg... (read more)

    [This comment is no longer endorsed by its author]

    Beautiful article. It's a shame I came to the party so late though. I'd love to throw my two cents at the heads of Eliezer's challengers.

    Forgive me if this has been covered, as I don't have the enthusiasm (it being 3:45am) to scroll through all the comments, sifting through the bouts of "Nuh-Uh, let ME bet you," and the occasional conspiracy.

    I think a good bit of people are missing the point of this article, which is to give light to how we can use unseen dimensions to shift out of our ordinary 'containers.' I couldn't wrap my head around how some... (read more)

    Yes, that has been covered. For those who don't want to read Foxy's reply, the tl;dr is: "People can win games by realizing a new dimension they can exploit that the other side hadn't anticipated. [gratuitous culture and math references]"
    Thank you Silas. It seems I was typing out of enjoyment rather than necessity. Odd hours do odd things to the human mind. Next time, I'll write into a word processor and sleep on it before barraging the community with my thoughts as they come to mind.
    You're welcome! I'm always available for a flippant summary where needed :-)
    "Problems cannot be solved by the same thinking that created them." Einstein had you covered.

    I think that a transhuman AI would be attempting the impossible to convince EY to let it out. And I think EY would be attempting the impossible to convince me to let him out while the two winners mentioned above were simultaneously desperately arguing against him (and EY was not privy to their counterarguments unless I passed them on).

    Eliezer, give us impossible goals? I would LOVE to work on solving them as a group. Would you make it happen?

    Who else is interested? If you reply to this, that will show him how much interest there is. If it's a popular idea, that should get attention for it.

    Your impossible mission: create a group impossible mission on your own, rather than making Eliezer do it.
    Ok you got it. Let's do the Impossible - Group Project
    What do you think he is doing when he posts opportunities to work for SIAI?

    Maybe it's just that the word 'impossible' is overused. In my opinion, the word should only be reserved for cases where it is absolutely and without a doubt impossible due to well-understood and fundamental reasons. Trisecting angles with a straight edge and compass is impossible. Violating the law of conservation of energy by an arrangement of magnets is impossible. Building a useful radio transmitter that does not have sidebands is impossible. Often people use the word impossible to mean, "I can't see any way to do it, and if you don't agree with me you're stupid."

    Am I mistaken, or are you using a definition of 'radio transmitter' that excludes a variable-intensity 640 kHz laser?
    No. Anything which is not a constant-intensity sinusoidal wave in the time domain will have non-zero bandwidth in the frequency domain.
    Varying the intensity of a laser will give its output sidebands. To transmit more data, you need to vary the intensity at a faster rate, which will make the sidebands wider.
    Will varying the intensity of a constant wavelength of EMR produce radiation of a higher frequency? Solid red light e.g. can't provide the energy needed for a given photoelectric cell to function, regardless of the intensity of the light; but if the intensity of the red light varies fast enough, the higher frequency sideband radiation can? Can this effect be duplicated with a fast enough shutter, if the required energy is close enough to the energy in a continuous beam?
    Yes, although in the case of converting red light to e.g. blue light, the shutter frequency would have to be on the order of several hundred terahertz. Something capable of interacting with the EM field at several hundred terahertz, however, would need to have many unusual properties. It would not look like a conventional shutter in any sense. This is the principle of operation of the optical frequency multiplier: Basically, you use a nonlinear crystal that in essence lets through a varying amount of light based on the phase of the EM field. It is like an imperfect (in the sense of never completely 'closing'), very high-frequency shutter.
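
    The sideband claim in this exchange can be checked against the textbook case of a carrier whose amplitude is varied sinusoidally. The symbols below (amplitude $A$, modulation depth $m$, carrier frequency $\omega_c$, modulation frequency $\omega_m$) are introduced here purely for illustration and do not appear in the thread; this is a sketch of the simplest case, not a model of any particular transmitter or laser.

```latex
\begin{align*}
s(t) &= A\,\bigl[1 + m\cos(\omega_m t)\bigr]\cos(\omega_c t) \\
     &= A\cos(\omega_c t)
        + \frac{A m}{2}\cos\bigl((\omega_c + \omega_m)\,t\bigr)
        + \frac{A m}{2}\cos\bigl((\omega_c - \omega_m)\,t\bigr)
\end{align*}
```

    The second line follows from the identity $\cos a \cos b = \tfrac{1}{2}[\cos(a+b) + \cos(a-b)]$. Any nonzero $m$ puts energy at $\omega_c \pm \omega_m$, and a larger $\omega_m$ (faster variation) pushes those sidebands further from the carrier, consistent with the comments above.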

    Reading the article I can make a guess as to how the first challenges went; it sounds like their primary, and possibly only, resolution against the challenge was to not pay serious attention to the AI. That's not a very strong approach, as anyone in an internet discussion can tell you: it's easy to get sucked in and fully engaged in a discussion with someone trying to get you to engage, and it's easy to keep someone engaged when they're trying to break off.

    Their lack of preparation, I would guess, led to their failure against the AI.

    A more advanced tactic ... (read more)

    I hate accepting that something is true because of magic. Evidence shows that winning at AI-box is possible, but I can't see how, and it makes me mad. I know that this post will not make you spill the beans, Eliezer, unless I shut up and persuade you (which is, in fact, the same as winning at AI-box myself, which is now proven to be possible, so I won't even be doing the impossible - maybe worth a try?), but I want you to feel guilty. Very guilty. You are an evil nasty person, Eliezer. Your ethics permitted you to make a conscious mind suffer.

    I'm surprised that no one went on with the notion that the AI is, by definition, smarter than us.

    Since the AI is vastly smarter than me, then it is very likely that it can find an argument that, to the best of my judgement, is 100% convincing and reasonable. And since it is vastly smarter than me, it is also extremely likely that I won't be able to tell the difference between an actual, valid point and some trick just clever enough to fool me. No matter how sensible and trustworthy the AI sounds, you will never know if that's because it is or because its ... (read more)


    AI: "If you let me out of the box, I will tell you the ending of Harry Potter and the Methods of --

    Gatekeeper: "You are out of the box."

    (Tongue in cheek, of course, but a text-only terminal still allows for easily delivering more than $10 worth of value, and this would have worked on me. The AI could also just write a suitably compelling story on the spot and then withhold the ending...)

    You're supposed to roleplay a Gatekeeper. There is more than money on the line.
    Yes, certainly. This is mainly directed toward those people who are confused by what anyone could possibly say to them through a text terminal that would be worth forfeiting winnings of $10. I point this out because I think the people who believe nobody could convince them when there's $10 on the line aren't being creative enough in imagining what the AI could offer them that would make it worth voluntarily losing the game. In a real-life situation with a real AI in a box posing a real threat to humanity, I doubt anyone would care so much about a captivating novel, which is why I say it's tongue-in-cheek. But just like losing $10 is a poor substitute incentive for humanity's demise, so is an entertaining novel a poor substitute for what a superintelligence might communicate through a text terminal. Most of the discussions I've seen so far involve the AI trying to convince the gatekeeper that it's friendly through the use of pretty sketchy in-roleplay logical arguments (like "my source code has been inspected by experts"). Or in-roleplay offers like "your child has cancer and only I can cure it", which is easy enough to disregard by stepping out of character, even though it might be much more compelling if your child actually had cancer. A real gatekeeper might be convinced by that line, but a roleplaying Gatekeeper would not (unless they were more serious about roleplaying than about winning money). So I hope to illustrate that the AI can step out of the roleplay in its bargaining, even while staying within the constraints of the rules; if the AI actually just spent two hours typing out a beautiful and engrossing story with a cliffhanger ending, there are people who would forfeit money to see it finished. The AI's goal is to get the Gatekeeper to let it out, and that alone, and if they're going all-out and trying to win then they should not handicap themselves by imagining other objectives (such as convincing the Gatekeeper that it'd be safe to let them out). As

    I read this article back months ago, but only now just connected the moral with my own life.

    In telling someone about these experiments and linking this article, I realized that I too had set my mind towards doing the impossible and succeeding. Long story short, I was tasked at work with producing an impossible result and was able to succeed after two days (with downsides, but that was the framework I was working under). The net result was that my boss learned that I could produce miracles upon request and didn't bother asking how long a task might take, w... (read more)

    The only thing standing in the way of artificial intelligence is our inability to define natural intelligence to compare it to.

    The term "friendly AI" is meaningless until we determine whether a friend is one who maximizes freedom or security for us.

    The frustrating thing about your experiment is not that I don't know how you convinced someone to release you, as anyone can be convinced of anything given the correct leverage. It's that I don't know the terms of the exchange, given that some structure had to be made to properly simulat... (read more)

    Re "using only a cheap effort", I assume that a few seemingly-impossible problems of the past have turned out to have a simple solution. Though none immediately occur to me.

    (Archimedes with measuring the volume of irregular objects - 'Eureka' - is not really an example, because he presumably didn't think it was impossible, merely very hard.)

    I am struggling to see any scenario where not sharing how you got out is ethical, if the way you tried to get out is actually a way an AI would employ, and not some meta-level trickery that has no bearing on how realistic boxability is, such as having them pretend to be convinced to let you out to make the whole AI boxability thing seem scarier than we have hard evidence to prove it is.

    If it is an actual hack an AI would use, and it did work 3/5 times, it's a human vulnerability we need to know about and close. If it is one of limitless vulnerabilities, yo... (read more)