Ming the Merciless offers you a choice that you cannot refuse. Either (a) his torturer will rip one of your fingernails off, or (b) his torturer will inflict pain more intense than you can imagine, continuously for the next 24 hours, without otherwise harming you. But in case (b) only, his evil genius neuroscientists will cause you to afterwards completely forget the experience, and any other aftereffects from the stress will be put right as well. If you refuse to make a choice, you will get (b) without the amnesia.
What do you choose?
If you choose (a), how much worse would (a) have to be, for you to choose (b)? If you choose (b), how much less bad would (a) have to be, for you to choose (a)?
I choose (b) for instrumental reasons, but would much prefer (a) if I didn't
have to worry so much about preserving my mental equilibrium.
4byrnema14y
My interpretation of this scenario flip-flops.
Since I will forget the experience (b), I sometimes interpret this question as
being equivalent to whether I prefer having something minor happen to me (a) or
something more serious happen to someone else (b). Then deciding how bad (a)
would need to be before I choose (b) becomes a squirm-worthy ethical question.
Yet on alternating thoughts, I realize the choice is not as bad as choosing
whether case (a) happens to me or case (b) happens to someone else because I
still need to factor in that the other person will forget the torture after it
happens, so it doesn't happen to them either. I might as well say it's happening
to me. But this feels like rationalization to avoid having my fingernail taken
off, which I really don't want either.
In the end, I'm just confused about it.
3andrewc14y
I choose (b) without the amnesia. Why? Because fuck Ming, that's why!
Or more seriously, by refusing to play Ming's bizarre little game you deny him
the utility he gets from watching people agonise about what the best choice is.
Turn it up to 11, Ming you pussy!
Or maybe I already chose (b) and can't remember...
1rhollerith14y
I choose (b) without hesitation. There is not some counter or accumulator
somewhere that is incremented any time someone has a positive experience and
decremented every time someone has a negative experience.
EDIT. To answer Kennaway's second question, there is no way to attenuate (a) to
make me prefer it to (b). I'd choose (b) even if the alternative was a dust
speck in my eye or a small scratch on my skin because the dust speck and the
scratch have a nonzero probability of negatively affecting my vision or my
health.
9MBlume14y
To the best of our knowledge, the universe will run down one day, and all our
struggling will come to nothing.
A meta-ethics which says "nothing temporary can matter" means all utilities come
to zero in such a universe.
0rhollerith14y
I am in essential agreement with MBlume. It is more likely than not that the
space-time continuum we find ourselves in will support life and intelligence for
only a finite length of time. But even if that is the case, there might be
another compartment of reality beyond our space-time continuum that can support
life or intelligence indefinitely. If I affect that other compartment (even if I
merely influence someone who influences someone who communicates with the other
compartment) then my struggling comes to more than nothing.
If on the other hand, there really is no way for me or my friends to have a
permanent effect on reality, then I have no preference for what happens.
5AnnaSalamon14y
People use the word "preference" to mean many things, including:
1. Felt emotional preference;
2. Descriptive model of the preferences an outside observer could use to
predict one's actual behavior;
3. Intellectual framework that has an xml tag "preference", that accords with
some other xml tag "the right thing to do", and perhaps with what one
verbally advocates;
4. Intellectual framework that a particular verbal portion of oneself, in
practice, tries to manipulate the rest of oneself into better following.
I take it you mean "preference" in senses 3 and 4, but not in sense 1 or 2?
0rhollerith14y
Anna, you are incorrect in guessing that my statement of preference is less than
extremely useful for an outside observer to predict my actual behavior.
In other words, the part of me that is loyal to the intellectual framework is
very good at getting the rest of me to serve the framework.
The rest of this comment consists of more than most readers probably want to
know about my unusual way of valuing things.
I am indifferent to impermanent effects. Internal experiences, mine and yours,
certainly qualify as impermanent effects. Note though that internal experiences
correlate with things I assign high instrumental value to.
OK, so I care only about permanent effects. I still have not said which
permanent effects I prefer. Well, I value the ability to predict and control
reality. Whose ability to predict and control? I am indifferent about that: what
I want to maximize is reality's ability to predict and control reality: if
maximizing my own ability is the best way to achieve that, then that is what I
do. If maximizing my friend's ability or my hostile annoying neighbor's ability
is the best way, then I do that. When do I want it? Well, my discount rate is
zero.
That is the most informative 130 words I can write for improving the ability of
someone who does not know me to predict the global effects of my actual
behavior.
Since I am in a tiny, tiny minority in wanting this, I might choose to ally
myself with people with significantly different preferences. And it is probably
impossible in the long term to be allies or colleagues or coworkers with a group
of people who all roughly share the same preferences without in a real sense
adopting those preferences as my own.
But the preferences I just outlined are the criteria I'd use to decide who to
ally with. The single criterion that is most informative in predicting who I
might ally with, BTW, is whether the prospective ally's intrinsic values have a
low discount rate.
2AnnaSalamon14y
I understand that your stated goal system has effects on your external behavior.
Still, I was trying to understand your claim that "If... there really is no way
for me or my friends to have a permanent effect on reality, then I have no
preference for what happens" (emphasis mine). Imagine that you were somehow
shown a magically 100% sound, 100% persuasive proof that you could not have any
permanent effect on reality, and that the entire multiverse would eventually
end. In this circumstance, I doubt very much that the concept “Hollerith’s aims”
would cease to be predictively useful. Whether you ate breakfast, or sought to
end your life, or took up a new trade, or whatever, I suspect that your actions
would have a purposive structure unlike the random bouncing about of inanimate
systems. If you maintain that you would have no "preferences" under these
circumstances (despite a model of "Hollerith's preferences" being useful to
predict your behavior under these circumstances), this suggests you're using the
term "preferences" in an interesting way.
The reason I’m trying to pursue this line of inquiry is that I am not clear what
“preference” does and should mean, as any of us discuss ethics and meta-ethics.
No doubt you feel some desire to realize goals that are valued by goal system
zero, and no doubt you act partially on that desire as well. No doubt you also
feel and act partially on other desires or preferences that a particular aspect
of you does not endorse. The thing I’m confused about is... well, I don’t know
how to say what I’m confused about; I’m confused. But something like:
* What goes on, in practice, when a person verbally endorses certain sense (1)
and sense (2) preferences and disclaims other sense (1) or sense (2)
preferences? What kind of a sense (4) system for manipulating oneself then
gets formed -- is it distinguished from other cognitive subsystems by more
than the xml tag? What kind of actual psychological consequences does the xml
tag have?
1rhollerith14y
I agree with you, Anna, that in that case the concept of my aims does not cease
to be predictively useful. (Consequently, I take back my "then I have no
preferences" .) It is just that I have not devoted any serious brain time to
what my aims might be if I knew for sure I cannot have a permanent effect. (Nor
does it bother me that I am bad at predicting what I might do if I knew for sure
I cannot have a permanent effect.)
Most of the people who say they are loyal to goal system zero seem to have only
a superficial commitment to goal system zero. In contrast, Garcia clearly had a
very strong deep commitment to goal system zero. Another way of saying what I
said above: like Garcia's, my commitment to goal system zero is strong and deep.
But that is probably not helping you.
One of the ways I have approached CEV is to think of the superintelligence as
implementing what would have happened if the superintelligence had not come into
being -- with certain modifications. An example of a modification you and I will
agree is desirable: if Joe suffers brain damage the day before the
superintelligence comes into being, the superintelligence arranges things the
way that Joe would have arranged them if he had not suffered the brain damage.
The intelligence might learn that by e.g. reading what Joe posted on the
internet before his injury. In summary, one line of investigation that seems
worthwhile to me is to get away from this slippery concept of preference or
volition and think instead of what the superintelligence predicts would have
happened if the superintelligence does not act. Note that e.g. the human sense
of right and wrong is predicted by any competent agent to have huge effects on
what will happen.
My adoption of goal system zero in 1992 helped me to resolve an emotional
problem of mine. I severely doubt it would help your professional goals and
concerns for me to describe that, though.
3dclayh14y
Would you go into why you only care about permanent effects? It seems highly
bizarre to me (especially since, as Eliezer has pointed out, everything that
happens is permanent insofar as it occupies volume in 4D spacetime).
2rhollerith14y
A system of valuing things is a definition. I have defined a system and said,
"Oh, by the way, this system has my loyalty."
It is possible that the system is ill-defined, that is, that my definition
contradicts itself, does not apply to the reality we find ourselves in, or
differs in some significant way from what I think it means. But your appeal to
general relativity does not show the ill-definedness of my system because it is
possible to pick the time dimension out of spacetime: the time dimension is
treated quite specially in general relativity.
Eliezer's response to my definition appeals not to general relativity but rather
to Julian Barbour's timeless physics and Eliezer's refinements and additions to
it [http://www.overcomingbias.com/2008/05/timeless-physic.html], but his
response does not establish the ill-definedness of my system any more than your
argument does. If anyone wants the URLs of Eliezer's comments (on Overcoming
Bias) that respond to my definition, write me
[http://rhollerith.com/blog/contact-richard-hollerith] and say a few words about
why it is important to you that I make this minor effort.
If Eliezer has a non-flimsy argument that my definition contradicts itself, does
not apply to the reality we find ourselves in, or differs significantly from
what I think it means, he has not shared it with me.
When I am being careful, I use Judea Pearl's language of causality in my
definition rather than the concept of time. The reason I used the concept of
time in yesterday's description is succinctness: "I am indifferent to
impermanent effects" is shorter than "I care only about terminal effects where a
terminal effect is defined as an effect that is not itself a cause" plus
sufficient explanation of Judea Pearl's framework to avoid the most common ways
in which those words would be misunderstood.
So if I had to, I could use Judea Pearl's language of causality to remove the
reliance of my definition on the concept of time. But again, nothing you have
said shows the ill-definedness of my system.
0[anonymous]14y
I wasn't trying to claim that your stated goal system had no effect on your
observable behavior, only that it doesn't have a complete effect. That is, that
I would be very surprised if, after you were shown a magically completely
certain proof that it is impossible for you to have permanent effect on reality,
concepts like "Hollerith's goals" completely ceased to be useful in predicting,
say, whether you would eat breakfast.
5Richard_Kennaway14y
*jangling chord*
Ming's minions burst in and abduct you to the planet Ming. "So!" smiles Ming the
Merciless in his merciless way, "My astronomers and physicists, who have had
thousands of years to improve their sciences beyond your primitive level, assure
me that all this will pass, yes, even I myself! One day it will be as if none
had ever lived! Just rocks and dead stars, and insufficient complexity to ever
again assemble creatures such as us, though it last a Graham's number of years!"
"Tell me, knowing this -- and I am as known for my honesty as for my evil, for
see! I have not executed my scientists for telling me an unwelcome truth -- are
you truly indifferent as to whether I let you go, or hand you over to my
torturers? Does this touch of the branding iron mean nothing?"
*sizzle*
1[anonymous]14y
I feel like pointing out that Graham's number is big enough that if the universe
lasted that long, it would effectively visit every state it possibly can, unless
the universe is fucking huge.
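For a very rough sense of scale (treating the often-quoted holographic-style estimate of about 10^122 bits of information in the observable universe as a loose assumption, not an established figure), the count of distinct states sits unimaginably far below Graham's number:

$$
N_{\text{states}} \;\lesssim\; 2^{10^{122}} \;<\; 3\uparrow\uparrow 5 \;\ll\; g_1 = 3\uparrow\uparrow\uparrow\uparrow 3 \;\ll\; g_{64} = \text{Graham's number}
$$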
1rhollerith14y
I am not completely indifferent to being tortured, so in your hypothetical,
Kennaway, I will try to get Ming to let me go because in your hypothetical I
know I cannot have a permanent effect on reality.
But when faced with a choice between having a positive permanent effect on
reality and avoiding being tortured I'll always choose having the permanent
effect if I can.
Almost everybody gives in under torture. Almost everyone will eventually tell an
interrogator skilled in torture everything they know, e.g., the passphrase to
the rebel mainframe. Since I have no reason to believe I am any different in
that regard, there are limits to my ability to choose the way I said. But for
most practical purposes, I can and will choose the way I said. In particular, I
think I can calmly choose being tortured over losing my ability to have a
permanent effect on reality: it is just that once the torture actually starts, I
will probably lose my resolve.
0rhollerith14y
I am worried, Kennaway, that our conversation about my way of valuing things
will distract you from what I wrote below about the risk of post-traumatic
stress disorder from a surgical procedure. Your scenario is less than ideal for
exploring what intrinsic value people assign to internal experience: it is
better to present people with a choice between being killed painlessly and being
killed after 24 hours of intense pain, and then asking what benefit to their best
friend or to humanity would induce them to choose the intense pain.
0dfranke14y
I'd go with (a) for values of (a) smaller than loss of a finger.
0MBlume14y
how much does the level of medical technology in the society in which you live
matter?
0dfranke14y
Basically the amount you'd expect: if medical technology allowed me to regrow
the finger with full use of it, that would mitigate it considerably. With
current technology, it gets me a lifetime of inconvenience.
At the level of loss of a fingernail, I think the answer is a no-brainer: (a)
gets me considerably less total pain, with both options protracted over a period
of time that makes time preference mostly inconsequential (24 hours, and maybe a
couple weeks); getting a fingernail ripped off isn't a bad enough experience to
have any significant long-term impact on my mental state.
0dclayh14y
I have no frame of reference for computing the relative disutility (i.e. level
of pain integrated over time) of these. Intuitively I feel like (b) is the
better option, but not by much.
-1mattnewport14y
I choose a) because I see little reason to trust that his evil genius
neuroscientists will do less damage to my brain than his torturer will do to my
finger (I've had a toenail pulled out so I have a pretty good reason to think I
will survive the torture largely intact, even though it will hurt like hell). To
choose b) I'd need a good reason to trust his neuroscientists.
0Richard_Kennaway14y
Ming enjoys setting these conundrums, and therefore cultivates not merely the
reputation, but also the actuality, of being an Evil Emperor of his word. He
will not even engage in lawyering or offer devil's bargains. It's no fun if his
prisoners just shrug and say, "Why should I believe you? Do your worst, you will
anyway."
0rhollerith14y
Kennaway's reason for asking the questions is probably to get at how much people
prefer to avoid negative internal experiences relative to negative effects on
external reality, which parenthetically is the main theme of my blog on the
ethics of superintelligence [http://dl4.jottit.com/]. If so, then he wants you
to assume that you can trust Ming 100% to do what he says -- and he also wants
you to assume that Ming's evil geniuses can somehow compensate you for the fact
that you could have done something else with the 24 hours during which you were
experiencing the unimaginably intense pain, e.g., by using a (probably impossible
in reality) time machine to roll back the clock by 24 hours.
6mattnewport14y
Yes, I assumed that something like that was the reason for posing the question.
My answer deliberately 'missed the point' for the kinds of reasons mentioned in
Hardened Problems Make Brittle Models
[http://lesswrong.com/lw/f0/hardened_problems_make_brittle_models/] and No
Universal Probability Space
[http://lesswrong.com/lw/ew/no_universal_probability_space/].
I am not a fan of what Daniel Dennett calls Intuition Pumps
[http://en.wikipedia.org/wiki/Intuition_pump] - thought experiments in
philosophy that ask people to imagine a scenario and then draw a conclusion when
the scenario requires a leap of imagination that few people are capable of. The
Chinese Room [http://en.wikipedia.org/wiki/Chinese_Room] thought experiment is a
classic example.
I don't necessarily think the original question was driving at a particular
answer but I'm just getting a little sick of this style of thinking on Less
Wrong. I think it is sloppy and not very rational. I'd place any discussions
involving Omega, most of the posed utilitarian moral dilemmas (specks vs.
torture) and a number of other examples commonly discussed in the same category.
I should probably have composed a post explaining that rather than trying to
make my point by making a dumb answer to the question though.
0rhollerith14y
I am not completely surprised to learn that your not getting the point was
intentional, Newport, because your comments are usually good.
Do you consider it a "leap of imagination that few are capable of" to ask people
here to indicate how much they value internal experience compared to how much
they value external reality?
0mattnewport14y
No, but if that's the question the original poster was interested in asking then
I don't see any value in posing it in the form of an elaborate thought
experiment rather than just directly asking the question, or asking about a more
plausible scenario that raises similar issues.
4Richard_Kennaway14y
I have similar misgivings about Hardened Problems, but thought this one worth
posing anyway. But here are two actual experiences I have had, that raise the
same issue of how to assess experiences that leave no trace.
I was in hospital for surgery, to be carried out under general anaesthetic. Some
time before the procedure was to happen, a nurse came with the pre-med
tranquiliser, which seemed to have absolutely no effect. Eventually, the time
came when my bed, with me on it, was wheeled out of the ward, down a corridor,
into a lift, and -- bam! -- I woke up in the ward after it was all over. I was
perfectly compos mentis right up to the time when my memories stopped. I don't
believe I passed out. More likely, this was retrograde amnesia for things I was
fully aware of at the time.
Maybe I was conscious all the way through the operation? If you ever need
surgery, maybe you will be fully conscious, but paralysed as the surgeons cut
you open and rummage about in your interior, but you will forget all about it
afterwards. Maybe the tales of people waking up during surgery are to be
explained, not as a failure to render the patient unconscious, but a failure to
erase the memory of it.
Am I scaring anyone?
On another occasion I was in hospital for an examination of a somewhat
uncomfortable and invasive nature -- I shall tastefully omit all detail -- to be
carried out under a sedative. Same thing: one moment, watching the doctor's
preparations and the machines that go ping, the next, waking up in the recovery
room. But this time, I was told afterwards that I had been "somewhat
uncooperative" during the procedure. So I know that I was awake, and having
experienced on another occasion the same procedure without any memory loss, I
have a pretty good idea of what I must have experienced but have no memory of.
Next time (there will be a next time), should I be apprehensive that I will
experience pain and discomfort, or only that I may remember it?
2rhollerith14y
The following conclusions come from a book on post-traumatic stress disorder
(PTSD) called Waking the Tiger by Peter Levine, who treats PTSD for a living. I
have a copy of this book, which I hereby offer to loan to Richard Kennaway if I
do not have to pay to get it to him and to get it back from him.
Surgical procedures are in the opinion of Peter Levine a huge cause of PTSD.
According to Levine, PTSD is caused by subtle damage to the brain stem. Since in
contrast episodic memory seems to have very little to do with the brain stem,
the fact that one has no episodic memories of a surgical procedure does not mean
that one was not traumatized by the procedure.
Since it is impossible in our society for doctors and nurses and such to ignore
the fact that someone has died, you can sometimes rely on them not to kill you
unnecessarily. But for anything as subtle as PTSD, with as much false information
floating about as there is, you can pretty much count on it that whenever they
cause a case of PTSD they will remain serenely unaware of that fact, and
consequently they will not take even the simplest and most straightforward
measure to avoid traumatizing a patient. This sentiment (that
medical professionals regularly do harms they are unaware of) is not in Levine's
book AFAICR but is pretty common among rationalists who have extensive
experience with the health-care system.
Most cases of traumatization caused by surgical procedures probably occur
despite the use of general or local anesthesia.
IN CONCLUSION, IF I HAD TO UNDERGO A SURGICAL PROCEDURE, I'D GATHER MORE
INFORMATION OF THE TYPE I HAVE BEEN SHARING HERE, BUT IF THAT WERE NOT POSSIBLE,
I WOULD TREAT THE POSSIBILITY OF BEING TRAUMATIZED BY A SURGICAL PROCEDURE
REQUIRING THE USE OF GENERAL ANESTHETIC AS HAVING A GREATER EXPECTED NEGATIVE
EFFECT ON MY HEALTH, INTELLIGENCE AND CREATIVITY THAN LOSING A FINGERNAIL WOULD
HAVE. (IT IS MORE LIKELY THAN NOT TO TURN OUT LESS BAD THAN LOSING A FINGERNAIL.)
lesswrong.com's web server is in the US but both of its nameservers are in Australia, leading to very slow lookups for me -- often slow enough that my resolver times out (and caches the failure).
I am my own DNS admin so I can work around this by forcing a cache flush when I need to, but I imagine this would be a more serious problem for people who rely on their ISPs' DNS servers.
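If anyone else wants to check whether slow DNS lookups are what they are seeing, here is a minimal Python sketch (the hostname, port, and repeat count are just illustrative choices, not anything prescribed). Note that on most systems the OS or local resolver caches the first successful answer, so only the first lookup, or one made right after a cache flush, reflects the slow path.

```python
# Minimal sketch: time a few DNS lookups for lesswrong.com.
# Hostname, port, and repeat count are arbitrary illustrative choices.
import socket
import time

HOST = "lesswrong.com"
REPEATS = 5

for i in range(REPEATS):
    start = time.monotonic()
    try:
        socket.getaddrinfo(HOST, 80)
        print(f"lookup {i + 1}: {(time.monotonic() - start) * 1000:.0f} ms")
    except socket.gaierror as err:
        # A resolver timeout typically surfaces here as a lookup failure.
        print(f"lookup {i + 1}: failed after "
              f"{(time.monotonic() - start) * 1000:.0f} ms ({err})")
```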
would anyone be interested in a 2-3 post sequence on metaethics? The idea would be to present a slower, more simplified version of Eliezer's metaethics. I've notice that many people have had trouble grasping it (myself included), and I think an alternate presentation might help. Thoughts?
See this [http://www.overcomingbias.com/2007/10/torture-vs-dust.html] and this
[http://www.overcomingbias.com/2008/01/circular-altrui.html].
Then see this [http://www.overcomingbias.com/2007/10/a-terrifying-ha.html].
0MrHen14y
Ah, thanks.
0MBlume14y
oh, it was either a bit of leaf or dust. It's been pretty windy here lately.
This is the latest Off Topic Thread I could find. Are we supposed to make off-topic posts in the Open Thread now? Anyway, to be safe, I'll post here.
There was a recent article in the NY Times about fixing tendon problems with simple eccentric exercise. It might be helpful for others here who make heavy use of computers, which can cause tendon problems. I've had pain in the tendons in my shoulders and arms, which I eventually managed to control using weekly sessions of eccentric exercise.
If we come up with a strong AI that we suspect is un-Friendly, should we use it to help us create Friendly AI? (Perhaps by playing a single game of 20 Questions, which has probably been played enough times that every possible sequence of yes-or-no answers has come up?)
An unfriendly AI is unfriendly because it maximizes a utility function that does
not represent our values, and it will act against our values whenever that
increases its utility function. It would not help us create a Friendly AI,
because a Friendly AI would be a powerful force acting in the interest of our
values, one that would act to decrease the unfriendly utility function whenever
doing so advances our values; that is, creating it would be contrary to the
goals of the unfriendly AI.
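As a toy numerical illustration of this point (the resource pool, the "paperclips vs. flourishing" framing, and all the numbers below are invented for the example, not a model of any actual AI): when two utility functions compete over the same fixed resources, the allocation that maximizes one gives nothing to the other.

```python
# Toy sketch: two agents value the same fixed pool of resources differently,
# so the allocation that maximizes one utility function minimizes the other.
TOTAL_RESOURCES = 100  # arbitrary units

def unfriendly_utility(allocation):
    """Stand-in 'unfriendly' utility: counts only paperclips."""
    return allocation["paperclips"]

def friendly_utility(allocation):
    """Stand-in 'friendly' utility: counts only human flourishing."""
    return allocation["flourishing"]

def best_allocation(utility):
    """Brute-force the split of the resource pool that maximizes `utility`."""
    candidates = (
        {"paperclips": p, "flourishing": TOTAL_RESOURCES - p}
        for p in range(TOTAL_RESOURCES + 1)
    )
    return max(candidates, key=utility)

print(best_allocation(unfriendly_utility))  # {'paperclips': 100, 'flourishing': 0}
print(best_allocation(friendly_utility))    # {'paperclips': 0, 'flourishing': 100}
```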
2Vladimir_Nesov14y
That's not true, e.g. the choice may be between UFAI agreeing to create an FAI
that lets the UFAI have a little chunk of utility vs. getting nothing (becoming
terminated). Allow a little cooperation. Still not a good idea, since UFAI may
well discover a third option [http://wiki.lesswrong.com/wiki/Third_option].
1JGWeissman14y
If you somehow reliably trapped a UFAI in a box, and then tried to coerce it to
design an FAI, it would disguise the next version of itself as an FAI.
Seriously, if our strategy is to build something smarter than us, and then try
to outsmart it, the best outcome we could reasonably hope for is that we never
succeed in building the thing that is smarter than us. We need the thing that is
smarter than us to be on our side by its very nature.
0Vladimir_Nesov14y
"Trapped", "coerced", "itself"? You should be more careful around these things,
you seem to be giving your answers immediately
[http://www.overcomingbias.com/2007/10/hold-off-solvin.html] on this not at all
elementary problem. You didn't actually address my counterexample, and I already
agreed with the general message of your second paragraph in the last sentence of
the comment above.
1JGWeissman14y
Well, in attempting to address your counterexample, I had to guess what you
meant, as it is not very clear. What situation do you envision in which the UFAI
would expect to gain utility by building an FAI?
And it seems a little strange to accuse me of offering solutions before the
problem is fully explored, when I was responding to a proposal for a solution
(using UFAI to build FAI).
(Also, I have edited out a typo (repetition of the word "it") in my statement
which you quoted.)
0Vladimir_Nesov14y
The situation I described: cooperation
[http://www.overcomingbias.com/2008/09/true-pd.html] between FAI and UFAI. Two
unrelated AIs are never truly antagonistic, so they have something to gain from
cooperation.
The same problem on both counts: confident assertions about a confusing issue.
This happened twice in a row, because the discussion shared the common confusing
topic, so it's not very surprising.
0JGWeissman14y
Unless you are actually saying that the way to get an UFAI to build an FAI is to
build the FAI ourselves, locate the UFAI in a different universe, and have some
sort of rift between the universes with contrived rules about what sort of
interaction it allows, I still do not understand the situation you are talking
about.
An AI that wants to tile the solar system with molecular smiley faces and an AI
that wants to tile the solar system with paperclips are going to have conflicts.
Either of them would have conflicts with an FAI that wants to use the resources
of the solar system to create a rich life experience for humanity. Maybe these
AIs are not what you call "unrelated", but if so, I doubt the UFAI and the FAI
we want it to build can be considered to be unrelated.
Are you asking me to have less confidence in the difficulty of us outsmarting
things that are smarter than us?
0Vladimir_Nesov14y
Among the two options "UFAI doesn't do anything, and so we terminate/won't build
it", and "UFAI builds/explains FAI, and gets -- simplifying -- 1/100th of the
universe", the second option is preferable to both us and the UFAI, and so if
these are the only options, it'll take it.
Yes, I'm asking you to have less confidence in any conclusion you are drawing
here. "Outsmarting" isn't exactly a tangible event.
0JGWeissman14y
First of all, the "won't build it" option does not make any sense. It is not
like the UFAI is going to do anything before it exists.
So then, going back to the point that you claim doesn't address your
counterexample, supposing we actually have the UFAI in a situation where we can
terminate it at will, and it can't persuade anyone to free it, how are we
supposed to verify that the thing it is helping us build is actually an FAI,
(and not another UFAI with the same values) if we do not know how to build an
FAI on our own? If we can't tell the difference, isn't that what the UFAI, with
its values, should do?
0[anonymous]14y
That's why I never proposed that we ask the UFAI to create us an FAI; I proposed
that we ask the UFAI to help us create an FAI. Doing something like proving a
mathematical theorem for us, or creating something that fulfills a template
we've created ourselves, leaves no room for betrayal, barring any "third
options" like manipulating the EM waves produced by its CPU to send arbitrary
messages over the Internet.
0JGWeissman14y
Suppose the UFAI figures out a fake theorem that it would like us to believe,
because it would lead us down a path of creating an AI it likes. If we were to
ask it to prove this fake theorem, it would give back to us something that looks
very much like a proof, so that we would miss the point where it goes wrong.
Maybe we require a machine verifiable proof, but it takes advantage of a flaw in
our automatic verifiers, or the way we interpret the results. So how does it get
us to ask about this fake theorem? It might manipulate its proof of a theorem we
do ask about to inspire us to ask that question. It might respond to our request
for a proof with, "Well, that is not quite right, but if you make these
modifications...". Keep in mind, this is a hostile intelligence that is way
beyond us. It will take any opportunity to subtly manipulate us that it gets.
And these have only been the ideas that I, a mere human, could come up with.
I am not sure what sort of template you mean, but I suspect that it will have
the same problem. Basically, room for the UFAI to use its superior intelligence
to help us is room for betrayal.
0Vladimir_Nesov14y
Step back and check what you are arguing. The discussion is whether a scenario
where the UFAI is helpful is at all plausible. Of course all sorts of stuff can
go wrong. Of course it isn't a good idea. The arguments saying "but this could
go wrong too" don't advance the discussion a bit, as it's already understood.
0JGWeissman14y
I am not talking about some slim chance of something happening to go wrong. I am
talking about a hostile super intelligence systematically arranging for us to
make dangerous critical mistakes. I don't know exactly what the UFAI will do
any more than I can predict which move Kasparov will make in a chess game
[http://www.overcomingbias.com/2008/10/belief-in-intel.html], but I can predict
with greater confidence than I can predict Kasparov would win a chess game
against a mere grand master, that, given that we build and use a UFAI, we will
be the bug and the UFAI will be the windshield. Splat!
If your standard here is if something is "at all plausible", then the discussion
is trivial, nothing has probability 0. However, none of the proposals for using
UFAI discussed here are plans that I expect positive utility from. (Which is
putting it mildly, I expect them all to result in the annihilation of humanity
with high probability, and to produce good results only in the conjunction of
many slim chances of things happening to go right.)
You should speak for yourself about what is already understood, what you think
is not a good idea. Warrigal seems to think using UFAI can be a good idea.
1Vladimir_Nesov14y
When you figure out how to arrange the usage of UFAI that is a good idea, the
whole contraption becomes a kind of FAI.
0JGWeissman14y
Good, that is a critical insight. I will explore some of the implications of
considering what kind of FAI it is.
By "FAI", we refer to an AI that we can predict with high confidence will be
friendly from our deep understanding of how it works. (Deep understanding is not
actually part of the definition, but, without it, we are not likely to achieve
high (well calibrated) confidence).
So, a kind of FAI that is built out of a UFAI subject to some constraints (or
other means of making it friendly), would require us to understand the
constraints and the UFAI sufficiently to predict that the UFAI would not be able
to break the constraints, which it would of course try to do. (One might imagine
constraints that are sufficient not to be broken by any level of intelligence,
but I find this unlikely, as a major vulnerability is the possibility of giving
us misinformation with regards to issues we don't understand.)
Which is easier: to understand the UFAI (which we built without understanding
it) and its constraints well enough to predict that the system will be
"friendly" (and to actually produce a system, including the UFAI, such that the
prediction would be correct), or to directly build an FAI that is designed from
the ground up to be predictably friendly? And which system is more likely to
wipe us out if we are wrong, rather than fail to do anything interesting?
I prefer the solution where friendliness is not a patch, but is a critical part
of the intelligence itself.
0Vladimir_Nesov14y
You might be able to recognize the right solution (e.g. a theory) when you see
it, while unable to generate it yourself (as fast). If you are sensitive enough
to attempts of the UFAI to confuse you into doing the wrong thing, just going with
the deal may be the best option for it.
You decide whether to build something new before it exists, based on its
properties. AI's decisions are such properties.
2JGWeissman14y
And if we are not sensitive enough, our molecules get reused for paperclips. You
are talking about matching wits with something that is orders of magnitude
smarter than us, thinks orders of magnitude faster than us, has a detailed model
of our minds, and doesn't think we are worth the utilons it can build out of our
quarks. Yes, we would think we recognize the right solution, that it is obvious
now that it's been pointed out, and we would be wrong. An argument that a UFAI
figured out would be persuasive to us is nowhere near as trustworthy as an
argument we figure out ourselves. While we can be wrong about our own arguments,
the UFAI will present arguments that we will be systematically wrong about in
very dangerous ways.
And for our next trick, we will just ask Omega to build an FAI for us.
0jimrandomh14y
No AI, friendly or unfriendly, will ever have a model of our minds as detailed
as the model we have of its mind, because we can pause it and inspect its source
code while it can't do anything analogous to us.
0JGWeissman14y
I have written plenty of mere desktop applications that are a major pain for a
human mind to understand in a debugger. And have you ever written programs that
generate machine code for another program? And then tried to inspect that
machine code when something went wrong to figure out why?
Well, that stuff is nothing compared to the difficulty of debugging or otherwise
understanding an AI, even given full access to inspect its implementation. An
unfriendly AI is likely to be built by throwing lots of parallel hardware
together until something sticks. If the designers actually knew what they were
doing, they would figure out they need to make it friendly. So, yes, you could
pause the whole thing, and look at how the billions of CPU's are interconnected,
and the state of the terabytes of memory, but you are not logically omniscient,
and you will not make sense of it.
It's not like you could just inspect the evil bit
[http://www.faqs.org/rfcs/rfc3514.html].
0[anonymous]14y
Can't we run a UFAI in a sandbox that prevents it from ever emitting more than a
certain amount of information--and, especially, from discovering the nature of
the hardware it runs on?
2Cyan14y
Not if we have the ability to let the UFAI out of the sandbox. See the AI-box
experiment [http://yudkowsky.net/singularity/aibox].
0Vladimir_Nesov14y
Here you start applying the structure of your choice to the black swan of UFAI's
third option. Literally letting anything out is a trivial option; there are many
others, some of which nobody thought of. Even if you can't let UFAI out of the
box, that's still not enough to be safe, and so your argument is too weak to be
valid [http://lesswrong.com/lw/g2/positive_bias_test_c_program/d0o].
1Vladimir_Nesov14y
That's the problem with third options -- you may not be as protected as you
think you are.
I can think of several answers to this, but not a coherent narrative weaving
them all together. So because it is late, here are some random thoughts.
1. Because it has utility for you by your present utility function.
2. I don't believe there are any utility functions.
3. Why is it not enough for you, that you want whatever it is that you want?
4. Your utility function will change anyway.
This is interesting.
Apparently, humans (and teams of them) are beating computer programs at... protein folding?
Please add a favicon, they make bookmarking much easier. The FHI diamond in green might work, but just about anything is better than nothing.
I have a speck in my eye. It hurts. Just thought I'd let you guys know.
For the record, Thom_Blake is thomblake.
Are there any Less Wrong-like web sites that are about intellectual pursuits in general?
Off topic thread again, huh? Let's see...
Look out robots will kill you
No really
Anybody got a good reason for adopting a certain utility function versus some other one?
Because I can't find one, and now I feel weird, cause without a decent utility function, rationalism gives you knowledge, but no wisdom.
what are you doing?