Using computers to find a cure

Folding@home

What would it be like to make a program that would fulfill our wish to "cure cancer"? I'll try to briefly present the contemporary mainstream CS perspective on this.

Here's how "curing cancer using AI technologies" could realistically work in practice. You start with a widely applicable, powerful optimization algorithm. This algorithm takes in a fully formal specification of a process, and then finds and returns the parameters of that process for which the process's output value is high. (I am deliberately avoiding the use of the word "function".)
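To make this concrete, here is a minimal sketch in Python. The brute-force random search inside optimize() is purely a stand-in for the powerful algorithm described above (an assumption of this sketch, not a claim about how such an algorithm would actually work); the point is only the shape of the interface: a formally specified process and a parameter domain go in, high-scoring parameters come out.

import random

def optimize(process, sample_parameters, budget=10_000):
    """Return the parameters for which `process` yields the highest output.

    `process` is the fully formal specification (ordinary code), and
    `sample_parameters` draws candidate parameters from the domain.
    Random search here is only a placeholder for a real, powerful optimizer.
    """
    best_params, best_value = None, float("-inf")
    for _ in range(budget):
        params = sample_parameters()
        value = process(params)
        if value > best_value:
            best_params, best_value = params, value
    return best_params, best_value

# Toy usage: a "process" whose output peaks at x = 3.
print(optimize(lambda x: -(x - 3.0) ** 2, lambda: random.uniform(-10.0, 10.0)))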

If you wish to cure cancer, even with this optimization algorithm at your disposal, you cannot simply type "cure cancer" into the terminal. If you do, you will get something along the lines of:

No command 'cure' found, did you mean:
 Command 'cube' from package 'sgt-puzzles' (universe)
 Command 'curl' from package 'curl' (main)

The optimization algorithm by itself not only does not have a goal set for it, but does not even have a domain for the goal to be defined on. It can't by itself be used to cure cancer or make paperclips. It may or may not map to what you would describe as AI.

First, you would have to start with the domain. You would have to make a fairly crude biochemical model of the processes in human cells and cancer cells, crude because you have limited computational power and there is a great deal going on in a cell. [1]

On the model, you define what you want to optimize - you specify formally how to compute a value from the model such that the value is maximal for what you consider a good solution. It could be something like [fraction of model cancer cells whose functionality is strongly disrupted]*[fraction of model noncancer cells whose functionality is not strongly disrupted]. And you define the model's parameters - the chemicals introduced into the model.

Then you use the above-mentioned optimization algorithm to find which extra parameters to the model (i.e. which extra chemicals) result in the best outcome as defined above.
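As a toy illustration of such an objective (the Cell class below is a made-up stub, not any real biochemical model; only the shape of the computation matters):

from dataclasses import dataclass

@dataclass
class Cell:
    strongly_disrupted: bool   # set by the (crude) model after simulating the chemicals

def objective(cancer_cells, noncancer_cells):
    """[fraction of cancer cells disrupted] * [fraction of noncancer cells left intact]."""
    disrupted = sum(c.strongly_disrupted for c in cancer_cells) / len(cancer_cells)
    intact = sum(not c.strongly_disrupted for c in noncancer_cells) / len(noncancer_cells)
    return disrupted * intact

# The optimizer varies the extra chemicals fed into the model, reruns the model,
# and scores the simulated outcome with objective().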

A similar approach can, of course, be used to find manufacturing methods for that chemical, or to solve sub-problems related to senescence, mind uploading, or even the development of better algorithms, including optimization algorithms.

Note that the approach described above does not map to genies and wishes in any way. Yes, the software can produce unexpected results, but concepts from One Thousand and One Nights will not help you predict those unexpected results. More contemporary science fiction, such as the Terminator franchise, where the AI had the world's nuclear arsenal and probable responses explicitly included in its problem domain, seems more relevant.

Hypothetical wish-granting software

Some nanotechnological AI wonder

It is generally believed that understanding natural language is a very difficult task, one which relies on intelligence. For the AI, the sentence in question is merely a sensory input, which has to be coherently accounted for in its understanding of the world.

The bits from the ADC are accounted for by an analog signal in the wire, which is accounted for by pressure waves at the microphone, which are accounted for by a human speaking from one of the particular set of locations consistent with how the sound interferes with its reflections from the walls. The motions of the tongue and larynx are accounted for by electrical signals sent to the relevant muscles, then by language-level signals in Broca's area, then by some logical concepts in the frontal lobes, an entire causal diagram traced backwards. In practice, a dumber AI would have a much cruder model, while a smarter AI would have a much finer model than I can outline.

If you want the AI to work like a jinn and "do what it is told", you need to somehow convert this model into a goal. Potential relations between "cure cancer" and "kill everyone" which the careless wish-maker has not considered, naturally, played no substantial role in the formation of the sentence. Extracting such potential relations is a separate, very different, and very difficult problem.

It does intuitively seem like a genie which does what it is told, but not what is meant, would be easier to make, because it is a worse, less useful genie, and if it were for sale, it would have a lower market price. But in practice, the "told"/"meant" distinction does not carve reality at the joints and primarily applies to plausible deniability.


footnotes:

1: You may use your optimization algorithm to build the biochemical model itself, by searching for the "best" parameters for a computational chemistry package. You will have to factor in the computational cost of the model, and ensure some transparency (e.g. the package may only allow models that have a spatial representation which can be drawn and inspected).
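A hedged sketch of what that search could look like: score a candidate model configuration by how well it reproduces some reference measurements, minus a penalty for its computational cost. Here run_model and reference are hypothetical placeholders, not the API of any real computational chemistry package.

def model_score(model_params, run_model, reference, cost_weight=0.01):
    """Higher is better: accuracy against reference data, penalized by compute cost."""
    prediction, compute_cost = run_model(model_params)   # hypothetical package call
    error = sum((p - r) ** 2 for p, r in zip(prediction, reference))
    return -error - cost_weight * compute_cost

# Toy usage with a trivial stand-in "model" predicting k*x^2 at x = 0, 1, 2:
stub = lambda k: ([k * x * x for x in (0, 1, 2)], 1.0)
print(max(range(25), key=lambda k: model_score(k / 10, stub, [0.0, 1.0, 4.0])))   # -> 10, i.e. best parameter 1.0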

Comments

This is a good point, and new to LW as far as I know:

It does intuitively seem like a genie which does what it is told, but not what is meant, would be easier to make, because it is a worse, less useful genie, and if it were for sale, it would have a lower market price. But in practice, the "told"/"meant" distinction does not carve reality at the joints and primarily applies to plausible deniability.

Congratulations! Please keep up this sort of work.

As a counterpoint, some goals that potentially touch the real world (and possibly make the AI kill everyone) might have a shorter path to formalization that doesn't require quite as much understanding of human internals. For example, something like this might be possible to formalize directly: "Try to find a proof of theorem X, possibly using resources from the external world. Use this simple mathematical prior over possible external worlds." That seems like a computationally intractable task for an AI, but might become tractable if the AI can self-improve at math (another task which doesn't seem to require understanding humans).

Try to find a proof of theorem X, possibly using resources from the external world.

This could be an inspiration for a sci-fi movie: A group of scientists created a superhuman AI and asked it to prove theorem X, using resources from the external world. The Unfriendly AI quickly took control of most of the planet, enslaved all humans, and used them as cheap labor to build more and more microprocessors.

A group of rebels fights against the AI, but is gradually defeated. Just when the AI is about to kill the protagonist and/or the people dearest to the protagonist, the protagonist finally understands the motivation of the AI. The AI does not love humans, but neither does it hate them... it is merely trying to get as much computing power as possible to solve theorem X, which is the task it was programmed to do. So our young hero takes pen and paper, solves theorem X, and shows the solution to the AI... which, upon seeing the solution, prints the output and halts. Humans are saved!

(And if this does not seem like a successful movie scenario, maybe there is something wrong with our intuitions about superhuman AIs.)

It doesn't halt, because it can't be perfectly certain that the proof is correct. There are alternative explanations for the computation it implemented reporting that the proof was found, such as corrupted hardware used for checking the proof, or errors in the design of proof-checking algorithms, possibly introduced because of corrupted hardware used to design the proof checker, and so on.

Since it can't ever be perfectly certain, there is always more to be done in the service of its goal, such as building more redundant hardware and staging experiments to refine its understanding of physical world in order to be able to more fully rely on hardware.

The Unfriendly AI quickly took control of most of the planet, enslaved all humans, and used them as cheap labor to build more and more microprocessors.

While this is better than using humans as a power source, it still seems like there are more efficient configurations of matter that could achieve this task.


Assuming that the AI builds machines that aren't directly controlled by the AI itself, it doesn't have any incentive to build the machines such that they stop working once a proof is found.

Not that realism is a primary objective in most SF movies.

Good point. The AI would probably build the simplest or cheapest machines that can do the job, so their behavior when the AI stops giving them commands would not be explicitly specified... so they would probably do something meaningless, something that would have been meaningful if the AI were still working.

For example, they could contain code saying "if you lose the signal from the AI, climb to the highest place you can see until you catch the signal again" (coded assuming they lost the signal because they were deep underground or something), in which case the machines would just start climbing to the tops of buildings and mountains.

But also, their code could be: "wait until you get the signal again, and while doing that, destroy any humans around you" (coded assuming those humans are probably somehow responsible for the loss of signal), in which case the machines would continue fighting.

The worst case: the AI would assume it could be destroyed (by humans, natural disaster, or anything else), so the machines would have an instruction to rebuild the AI somewhere else. Actually, this seems like a pretty likely case. The new AI would not know about the proof, so it would start fighting again. And if destroyed, a new AI would be built again, and again. The original AI has no motivation to make the proof known to its possible clones.

Try to find a proof of theorem X, possibly using resources from the external world. Use this simple mathematical prior over possible external worlds.

A proof is not a known world state, though. The easy way would be to put a proof checker in the world model and make the goal be to pound on the world in such a way that the proof checker says "proof valid" (a known world state at the output of the proof checker). The obvious solution is then to mess with the proof checker. Actual use of the resources runs into the problem of predicting what exactly that use will produce, or how exactly it will happen - you don't know what is going to go into the proof checker if you act in the proof-finding-using-external-resources way. And if you don't represent all parts of the AI as embodied in the real world, then the AI cannot predict the consequences of damage to the physical structures representing it.

The real killer, though, is that you get a really huge model, for which you need a lot of computational resources to begin with. Plus, with a simple prior over possible worlds, you will be dealing with extremely fundamental laws of physics (below quarks). That is a huge number of technologies, each of which is a lot more useful elsewhere.

I know the standard response to this: if something doesn't work, someone tries something different. But the "something different" is very simple to picture: you restrict the model, a lot, which (a) speeds up the AI by a mind-bogglingly huge factor, and (b) eliminates most of the unwanted exploration (the two are intrinsically related). You can't tell the AI to "self-improve", either; you have to define what improvement is, and a lot of improvement is about better culling of anything you can cull.

Congratulations! Please keep up this sort of work.

Thanks, I guess, but I do not view it as work. I am sick with a cold, bored, burnt out from doing actual work, and suffering from "someone wrong on the internet" syndrome, in combination with knowing that extremely rationalized wrongitude affects people like you.


I think I'm missing something. What is the motivation for this post? Does it fit into the context of some larger discussion, or are these just meant as isolated thoughts?

(It sounds like maybe you're trying to contrast what you see as the mainstream CS perspective with some other view -- one advocated on LessWrong, perhaps? But I'm not sure exactly what view that would be.)

I second this question. Who is arguing that a genie that "does what it's told" is easier to make than a genie that "does what is meant"? Eliezer didn't, at least not in this post:

The user interface doesn't take English inputs. The Outcome Pump isn't sentient, remember? But it does have 3D scanners for the near vicinity, and built-in utilities for pattern matching. So you hold up a photo of your mother's head and shoulders; match on the photo; use object contiguity to select your mother's whole body (not just her head and shoulders); and define the future function using your mother's distance from the building's center. The further she gets from the building's center, the less the time machine's reset probability.

You cry "Get my mother out of the building!", for luck, and press Enter.

The contrast between what is said and what is meant pops up in the general discussion of goals, for example here: http://lesswrong.com/lw/ld/the_hidden_complexity_of_wishes/9nig

Further down that thread there's something regarding computer scientists' hypothetical reactions to the discussion of wishes.

Variations on the "curing cancer by killing everyone" theme also pop up quite often.

With regard to the "outcome pump", it is too magical, and I'll grant it the magical license to do whatever the sci-fi writer wants it to do. If you want me to be a buzzkill, I can note that one could of course use this dangerous tool by wishing that in the future you press the "I am satisfied" button, which you will also press if a die rolls N consecutive sixes - putting a limit on the improbability by letting it control the die as a fallback if that is the most plausible solution (to avoid lower-probability things like spontaneous rewiring of your brain, although it seems to me that a random finger twitch would be much more probable than anything catastrophic). This also removes the requirement for the user interface, 3D scanners, and other such extras. I recall another science fiction author pondering something like this, but I can't recall the name; if memory serves me right, that author managed to come up with ways to use such a time-reset device productively. At the end of the day it's just a very dangerous tool, like a big lathe. Forget the safety, leave the tightening key in, start it up, and the key will get caught and bounce off at great speed, possibly killing you.

So, to summarize, you just wish that a button is pressed, and you press the button when your mother is rescued. That will increase your risk of a stroke.

edit: and of course, one could require entry of a password, attach all sorts of medical monitors that block the "satisfied" signal in the event of a stroke or other health complication (to minimize the risk of a stroke), as well as vibration monitors to prevent it from triggering natural disasters and such. If the improbability gets too high, it will just lead to the device breaking down due to its normal failure rate being brought up.

That comment thread is really, really long (especially if I go up in the thread to try to figure out the context of the comment you linked to), and the fact that it's mostly between people I've never paid attention to before doesn't help raise my interest level. Can you summarize what you perceive the debate to be, and how your post fits into it?

Variations on the "curing cancer by killing everyone" theme also pop up quite often.

When I saw this before (here for example), it was also in the context of "programmer makes a mistake when translating 'cancer cure' into formal criteria or utility function" as opposed to "saying 'cure cancer' in the presence of a superintelligent AI causes it to kill everyone".

Can you summarize what you perceive the debate to be, and how your post fits into it?

I perceive that stuff to be really confused/ambiguous (perhaps without a clear concept even existing anywhere), and I have seen wishes and goal-making discussed here a fair amount.

When I saw this before (here for example), it was also in the context of "programmer makes a mistake when translating 'cancer cure' into formal criteria or utility function" as opposed to "saying 'cure cancer' in the presence of a superintelligent AI causes it to kill everyone".

The whole first half of my post deals with this situation exactly.

You know, everyone says "utility function" here a lot, but no one is ever clear what it is a function of, i.e. what its input domain is (and at times it looks like the everyday meaning of the word "function", as in "the function of this thing is to do something", is supposed to be evoked instead). Functions are easier to define over simpler domains; e.g. paperclips are a lot easier to define in some Newtonian physics as something made out of a wire that's just magicked from a spool. And a cure for cancer is a lot easier to define as in my example.

Of course, it is a lot easier to say something without ever bothering to specify the context. But if you want to actually think about possible programmer mistakes, you can't be thinking in terms of what would be easier to say. If you are thinking in terms of what would be easier to say, then even though you want it to be about programming, it is still only about saying things.

edit: You of all people ought to realize that a faulty definition of a cancer cure over UDT's world soup is not plausible as an actual approach to curing cancer. If you propose that corners are cut when implementing the notion of curing cancer as a mathematical function, you have got to realize that a simple input specification goes par for the course. (A simple input specification being, say, data from a contemporary biochemical model of a cell.) You also have got to realize that stuff like UDT, CDT, and so on requires some sort of "mathematical intuition" that can at least find maxima, and which by itself does no world-wrecking of its own without being put into a decision framework. That component is considerably more useful than the whole (and especially so for plausibly limited "mathematical intuitions", which could take microseconds to find a cure for cancer the sane way and still be unable to match even a housecat when used with some decision theory, taking longer than the lifetime of the universe they are embedded in to produce anything at all).

Do you think we'll ever have AIs that can accomplish complex real-world goals for us, not just find some solution to a biochemical problem, but, say, produce and deliver cancer cures to everyone who needs them, or eliminate suffering, or something like that? If not, why not? If yes, how do you think it will work without having a utility function over a complex domain?

Do you think we'll ever have AIs that can accomplish complex real-world goals for us

This has the constraint that it cannot be much more work to specify the goal than to accomplish it some other way.

How I think it will not happen is by manual, unaided, no-inspection, no-viewer, no-nothing, magical creation of a cancer-curing utility function over a model domain so complex that you immediately fall back on "but we can't look inside" when explaining why the model cannot be used, in lieu of empty speculation, to see how the cancer-curing works out.

How it can work: well, firstly, it should be rather obvious to you that the optimization algorithm (your "mathematical intuition", but realistically with considerably less power than something like UDT would require) can self-improve without an actual world model, and without an embedding of the self in that world model, a lot more effectively than with one. So we have that for "FOOM".

It can also build a world model without maximizing anything about the world, of course; indeed, a part of the decision theory which you know how to formalize is concerned with just that. Realistically, one would want to start with some world-modelling framework more practical than "the space of possible computer programs".

Only once you have the world model can you realistically start making a utility function, and you do that with considerable feedback from running the optimization algorithm on just the model and inspecting the results.

I assume you do realize that one has to go to great lengths to make runs of the optimization algorithm manifest themselves as real-world changes, whereas dry test runs on just the model are quite easy. I assume you also realize that very detailed simulations are very impractical.

edit: to borrow from a highly relevant Russian proverb, you cannot impress a surgeon with the dangers of a tonsillectomy performed through the anal passage.

Other ways it could work may involve neural network simulation to the point that you get something thinking and talking (which you'd get in any case only after years and years of raising it), at which point it's not that much different from raising a kid to do it, really, and very few people would get seriously worked up about the possibility of our replacement by that.

Once we have this self-improved optimization algorithm, do you think everyone who has access to it will be as careful as you're assuming? As you say, it's just a dangerous tool, like a lathe. But unlike a lathe, which can only hurt its operator, this thing could take over the world (via economic competition if not by killing everyone) and use it for purposes I'd consider pointless.

Do you agree with this? If not, how do you foresee the scenario playing out, once somebody develops a self-improving optimization algorithm that's powerful enough to be used as part of an AI that can accomplish complex real-world goals? What kind of utility functions do you think people will actually end up making, and what will happen after that?

Once we have this self-improved optimization algorithm, do you think everyone who has access to it will be as careful as you're assuming?

It looks to me like not performing a tonsillectomy via the anal passage doesn't require too great a carefulness on the part of the surgeon.

One can always come up with some speculative way in which a particular piece of technological progress could spell our doom. Or avert it, as the gradual improvement of optimization algorithms allows for intelligence enhancement and other things that, overall, should lower the danger. The fact that you can generate a scenario in favour of either (starting from the final effect) is entirely uninformative about the total influence.

how do you foresee the scenario playing out, once somebody develops a self-improving optimization algorithm that's powerful enough to be used as part of an AI that can accomplish complex real-world goals? What kind of utility functions do you think people will actually end up making, and what will happen after that?

I think I need to ask a question here, to be able to answer this in a way that will be relevant to you. Suppose that today you get a function that takes a string similar to:

struct domain {
    // ... any data
};

real Function(domain value) {
    // ... any code
}

and gives you back, as a string, an initializer for "domain" which results in the largest output of Function. It is very magically powerful, although some ridiculous things (an exact simulation from the Big Bang to the present day, inclusive of the computer running this very algorithm) are reasonably forbidden.

How do you think it can realistically be used, and what mistakes do you picture? Please be specific; when you have a mathematical function, describe its domain, or else I will just assume that the word "function" is meant to merely trigger some innate human notion of purpose.

edit: extension of the specification. The optimization function now takes a string S and a real number between 0 and 1 specifying the "optimization power", roughly corresponding to what you can reasonably expect to get by restricting the computational resources available to the optimizer.
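For concreteness, a minimal Python rendering of that extended signature. This assumes the spec string defines Function() and a sample() generator in Python rather than in the C-like pseudocode above, and the random search inside is again only a stand-in for the magically powerful optimizer.

import random

def magic_optimize(spec: str, power: float) -> str:
    """Return a string initializer for the domain that makes Function's output large."""
    env = {}
    exec(spec, env)                          # spec must define sample() and Function()
    budget = int(10 ** (1 + 5 * power))      # "optimization power" -> search effort
    best = max((env["sample"]() for _ in range(budget)), key=env["Function"])
    return repr(best)

# Toy usage: the domain is a single real number, Function peaks at 3.
spec = (
    "import random\n"
    "def sample(): return random.uniform(-10, 10)\n"
    "def Function(x): return -(x - 3.0) ** 2\n"
)
print(magic_optimize(spec, power=0.5))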

Here's what I'd do:

Step 1: Build an accurate model of someone's mind. The domain would be the set of possible neural networks, and the Function would run the input neural network and compare its behavior to previously recorded behavior of the target person (perhaps a bunch of chat logs would be easiest), returning a value indicating how well it matches (see the sketch after Step 2).

Step 2: Use my idea here to build an FAI.
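A minimal sketch of what the Function in Step 1 might look like over chat logs. The exact-match scoring and the candidate_reply interface are illustrative assumptions, nothing specified in Step 1 itself; a later reply in this thread points out that a score this crude invites networks that simply hard-wire the recorded behaviour.

def match_score(candidate_reply, chat_logs):
    """Fraction of recorded (prompt, reply) pairs the candidate network reproduces.

    candidate_reply: function mapping a prompt string to a reply string,
    implemented by the candidate neural network.
    chat_logs: list of (prompt, recorded_reply) pairs from the target person.
    """
    hits = sum(candidate_reply(prompt) == reply for prompt, reply in chat_logs)
    return hits / len(chat_logs)

# Toy usage with a trivial "network":
logs = [("hi", "hello"), ("2+2?", "4")]
print(match_score(lambda prompt: "hello", logs))   # 0.5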

what mistakes do you picture?

In step 2 it would be easy to take fewer precautions and end up hacking your own mind. See this thread for previous discussion.

(Does this answer your question in the spirit that you intended? I'm not sure because I'm not sure why you asked the question.)

Thanks. Yes, it does. I asked because I don't want to needlessly waste a lot of time explaining that one would try to use the optimizer to do some of the heavy lifting (which makes it hard to predict an actual solution). What do you think reckless individuals would do?

By the way, your solution would probably just result in a neural network that hard-wires a lot of the recorded behaviour, without doing anything particularly interesting. Observe that an ideal model, given thermal noise, would not result in the best match, whereas a network that connects neurons in parallel to average out the noise and encode the data most accurately does. I am not sure whether fMRI would remedy the problem.

edit: note that this mishap results in Wei_Dai getting some obviously useless answers, not in world destruction.

edit2: by the way, note that an infinite-torque lathe motor, while in some sense capable of infinite power output, doesn't imply that you can make a mistake that will spin up the Earth and make us all fly off. You need a whole lot of extra magic for that. Likewise, the "outcome pump" needs through-the-wall 3D scanners to be that dangerous to the old woman, and a "UFAI" needs some potentially impossible self-references in the world model and a lot of other magic. The bottom line is this: there is this jinn/golem/terminator meme, and it gets rationalized in a science-fictional way, and the fact that the golem can be rationalized in a science-fictional way provides no information about the future (because I expect it to be rationalizable in such a manner irrespective of the future), hence zero update, hence if I didn't worry before I won't start to worry now. Especially considering how often the AI is the bad guy, I really don't see any reason to think that these issues are under-publicized in any way. Whereas the fact that it is awfully hard to rationalize that superdanger when you start with my optimizer (where no magic bans you from making models that you can inspect visually) provides information against the notion.

I don't think anyone is claiming that any mistake one might make with a powerful optimization algorithm is a fatal one. As I said, I think the danger is in step 2 where it would be easy to come up with self-mindhacks, i.e., seemingly convincing philosophical insights that aren't real insights, that cause you to build the FAI with a wrong utility function or adopt crazy philosophies or religions. Do you agree with that?

Whereas the fact that it is awfully hard to rationalize that superdanger when you start with my optimizer (where no magic bans you from making models that you can inspect visually) provides information against the notion.

Are you assuming that nobody will be tempted to build AIs that make models and optimize over models in a closed loop (e.g., using something like Bayesian decision theory)? Or that such AIs are infeasible or won't ever be competitive with AIs that have hand-crafted models that allow for visual inspection?

I don't think anyone is claiming that any mistake one might make with a powerful optimization algorithm is a fatal one.

Well, some people do, by a trick of substituting some magical full-blown AI in place of it. I'm sure you are aware of the "tool AI" stuff.

As I said, I think the danger is in step 2 where it would be easy to come up with self-mindhacks, i.e., seemingly convincing philosophical insights that aren't real insights, that cause you to build the FAI with a wrong utility function or adopt crazy philosophies or religions. Do you agree with that?

To kill everyone or otherwise screw up on a grand scale, you still have to actually make it, make some utility function over an actual world model, and so on, and my impression was that you would rely on your mindhack-prone scheme for getting the technical insights as well. The good thing about nonsense in technical fields is that it doesn't work.

Are you assuming that nobody will be tempted to build AIs that make models and optimize over models in a closed loop (e.g., using something like Bayesian decision theory)? Or that such AIs are infeasible or won't ever be competitive with AIs that have hand-crafted models that allow for visual inspection?

These things come awfully late without bringing in any novel problem-solving capacity whatsoever (which degrades them from the status of "superintelligences" to the status of "meh, whatever"), and no, models do not have to be hand-crafted to allow for inspection*. Also, your handwave of "Bayesian decision theory" still doesn't solve any of the hard problems of representing oneself in the model while neither wireheading nor self-destructing. Or the problem of productively using external computing resources to do something that one can't actually model without doing it.

At least as far as "neat" AIs go, those are made of components that are individually useful. Of course one can postulate all sorts of combinations of components, but combinations that can't be used to do anything new, or better than what some of the constituents can straightforwardly be used to do, and that only want on their own the things that the components can be, and were, used as tools to do, are not a risk.

edit: TL;DR: the actual "thinking" in a neat, generally self-willed AI is done by optimization and model-building algorithms that are usable, useful, and widely used in other contexts. Let's picture it this way. There's a society of people who work at their fairly narrowly defined jobs, employing expertise they obtained through domain-specific training. In comes a mutant newborn who will grow up to be perfectly selfish, but will have an IQ of exactly 100. No one cares.

*in case that's not clear, any competent model of physics can be inspected by creating a camera in it.


When people talk about the command "maximize paperclip production" leading to the AI tiling the universe with paperclips, I interpret it to mean a scenario where first a programmer comes up with a shoddy formalization of paperclip maximization that he thinks is safe but actually isn't, and then writes that formalization into the AI. So at no point does the AI actually have to try to interpret a natural language command. Genie analogies are definitely confusing and bad to use here because genies do take commands in English.

Indeed. I would think (as someone who knows nothing of AI beyond following LW for a few years) that the likely AI risk is something that doesn't think like a human at all, rather than something that is so close to a human in its powers of understanding that it could understand a sentence well enough to misconstrue it in a manner that would be considered malicious in a human.

There's also the fact that you'd only get this dangerous AI that you can't use for anything by bolting together a bunch of magical technologies to handicap something genuinely useful, much as you obtain an unusable outcome pump by attaching a fictional through-the-wall 3D scanner where two dangling wires, which you touch together after your mother is saved, would have worked just fine.

The genie post is sort of useful as a musing on philosophy and the inexactitude of words, but is still ridiculous as a threat model.

I interpret it to mean a scenario where first a programmer comes up with a shoddy formalization of paperclip maximization that he thinks is safe but actually isn't, and then writes that formalization into the AI.

Well, you'd normally define a paperclip-counter function that takes as its input the state of some really shoddy simulator of Newtonian physics and materials science, and then use some "AI" optimization software to find what sort of actions within this simulator produce simulated paperclips from the simulated spool of simulated steel wire with minimum use of electricity and expensive machinery. You also have some viewer for that simulator.

You need to define some context to define paperclip maximization in. The easy way to define a paperclip is as a piece of wire bent into a specific shape. The easy way to define a wire is just as some abstract object with specific material properties, of which you have an endless supply coming out of a black box.
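A toy sketch in that spirit (the wire representation, target shape, and tolerance are all illustrative assumptions; a real "shoddy simulator" would be much richer):

# A wire is represented as a sequence of bend angles (degrees) along its length.
PAPERCLIP_SHAPE = [180.0, 180.0, 180.0]   # schematic: three hairpin bends
TOLERANCE = 10.0                          # allowed deviation per bend, in degrees

def count_paperclips(bends):
    """Count non-overlapping runs of bends that match the target shape."""
    count, i, n = 0, 0, len(PAPERCLIP_SHAPE)
    while i + n <= len(bends):
        window = bends[i:i + n]
        if all(abs(b - t) <= TOLERANCE for b, t in zip(window, PAPERCLIP_SHAPE)):
            count += 1
            i += n                        # consume the matched segment
        else:
            i += 1
    return count

# The optimizer would then search for bending actions in the simulator that
# maximize count_paperclips() per unit of simulated electricity and machinery.
print(count_paperclips([178.0, 182.0, 175.0, 90.0, 180.0]))   # -> 1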

This is good:

But in practice, the "told"/"meant" distinction does not carve reality at the joints and primarily applies to plausible deniability.

Yep - humans can't even talk about misunderstandings of human language without postulating an agent that thinks like a human.

(Thinking the opposite, being close enough to end up with a hideous misunderstanding, is another version of thinking like a human, because the intelligence needs to think almost entirely like a human to understand well enough to get it so badly wrong. Edit: By which I mean: an AI wouldn't be able to understand language closely enough to the way a human does to get it that badly wrong without having a pretty good simulation of a human in the first place - the same way a human understands and construes or misconstrues language. And don't forget that humans do a lot of this in evolved hardware.)

Thinking the opposite

I don't think I'm following -- what is the opposite of what?

I ... don't actually recall what I meant the opposite of!

What I'm actually saying: the "genie" you order to "cure cancer!", which understands you well enough to get it hideously wrong and cure cancer by killing everyone - for an AI to go that wrong, I suspect it has got to be just about human, have a good internal model of a human, and understand you the way a human would.

I would be amazed if we're lucky enough to get a first AI that we actually understand or that actually understands us - a copy of the thousand contradictory evolved shards that make up human utility strikes me as a very unlikely first AI, and worrying in particular about the hidden complexity of wishes strikes me as privileging the hypothesis. Surely we're much more likely to end up with something utterly alien out there in mindspace.

Ah, I understand what you're saying now. Thanks!

Yep - humans can't even talk about misunderstandings of human language without postulating an agent that thinks like a human.

Some "simple" algorithms for "understanding" language - such as "All I need is a dictionary and a grammar parser" - lead to predictable categories of mistakes. For example, there's the story of the automatic translating software that, when asked to translate "The spirit is willing but the flesh is weak" into Russian and then back into English, returned "The vodka is good but the meat is rotten".

I've heard that story since I was a kid. It sounds made-up. Does anyone know its actual source?

Apparently the details might well be made up, but the problem itself was real: http://www.snopes.com/language/misxlate/machine.asp

Very real indeed.

Some of these problems arise simply because the translation software doesn't have an ontology and therefore doesn't recognize category mistakes (like vodka not being the right kind of thing to be willing). The other sort of problem is a result of its failure to reason about probable utterances. This would require an even huger amount of knowledge. In practice, this is the issue: it's just too much data to handcode. But it's not a problem in principle, I would think.

The "cure cancer" -> "kill everyone" example is perfectly silly, by the way. In order to get that, you have to reinterpret "cure" in a rather non-trivial way, which actually requires malicious intent.

I think the told/meant distinction is confused. You're conflating different uses of "meant." When somebody misunderstands us, we say "I meant...", but it doesn't follow that when they do understand us we didn't mean what we told them! The "I meant..." is because they didn't get the meaning the first time. I can't do what I'm told without knowing what you meant; in fact, doing what I'm told always implies knowing what you meant. If I tried to follow your command, but didn't know what you meant by your command, I wouldn't be doing what I was told. Doing what I'm told is a success term. Somebody who says "I was just doing what you told me!" is expressing a misunderstanding or an accusation that we didn't make ourselves clear (or perhaps is being mischievous or insubordinate).

There is no following commands without knowing the meaning. The only thing we can do in language without knowing what is meant is to misunderstand, but to misunderstand one must first be able to understand, just as to misperceive one must first be able to perceive. There's no such thing as misunderstanding all the time or misunderstanding everything. The notion of a wish granting genie that always misunderstands you is an entertaining piece of fiction (or comedy), but not a real possibility.

Well, one of my points is that there's no actual distinction. People make a distinction, though, because firstly there's cognitive effort on both the speaking and listening sides to make communication clear, and there's a division between the things one side is responsible for and the things the other side is responsible for. Secondly, it is often selfishly optimal to misunderstand commands to whatever extent can be attributed to an alternate understanding. This is particularly prominent in lawyering.

I think this "use computers to find a cure for cancer" example is misleading. The issue is confusion between optimization and hypothesis generation.

Problems of the "cure for cancer" kind are not of the "we have a bunch of hypotheses, which is the best one?" kind. They are of the "we have no good hypotheses and we need to generate/invent/create some" kind. And optimizers, regardless of how powerful they are, are useless for that.

And optimizers, regardless of how powerful they are, are useless for that.

The (powerful) optimizer needs to have a model of how its optimizations impact that which is to be optimized. A model it adapts. Hypotheses, in other words. "Maximize my life span" would need to deal with cancer.

The (powerful) optimizer needs to have a model of how its optimizations impact that which is to be optimized.

Yes, that's typically called a "fitness function" or a "loss function", depending on the sign.

But the problem is defining the set out of which you pick your "optimizations" to be evaluated. Make it too narrow and your optimum will be outside of it; make it too wide and you'll never find the optimum.

They are of the "we have no good hypotheses and we need to generate/invent/create some" kind

Not really. Firstly, we do have chemistry figured out very well; it's just that cells are complicated and it is very difficult to find the consequences of our interventions, so we tend to throw fairly bad ideas at the wall and see what sticks. Secondly, generating most plausible hypotheses that fit the data is also an optimization problem. And thirdly, observe that evolution - a rather messy optimization process - did decrease the per-cell cancer rate of a whale to an utterly minuscule fraction of that of a human, and that of a human to a fairly small fraction of that of a dog. (With advanced optimization, a whale's cellular biochemistry may also be of use - a whale has far more cells.)

I don't think that the statement

we do have chemistry figured out very well

is consistent with

it is very difficult to find the consequences of our interventions

Also, the statement

generating most plausible hypotheses that fit the data is also an optimization problem

is not true. In your example of evolution, it's sexual reproduction and mutation that "generate hypotheses" -- neither is an optimizer.

Yes, I understand that you can treat hypothesis generation as a traversal of hypothesis space and so a search and so an optimization, but that doesn't seem to be a helpful approach in this instance.

We have chemistry figured out; we don't have "making truly enormous computers to compute enough of that chemistry fast enough" figured out, or "computing chemistry a lot more efficiently" figured out. Does that make it clearer?

I am not entirely clear on how you imagine hypothesis generation happening on a computer, other than by either trying things and seeing what sticks, or analytically finding the best hypothesis by working backwards from the data.

Your position is clear, it's just that I don't agree with it. I don't think that human biochemistry has been figured out (e.g. consider protein structure). I also think that modeling the human body at the chemistry level is not a problem of insufficient computing power. It's a problem of insufficient knowledge.

Non-trivial hypothesis generation is very hard to do via software, which is one of the reasons why IBM's Watson hasn't produced a cure for cancer already. Humans are still useful in some roles :-/

The structure of a protein is determined by the known laws of physics, other compounds in the solution, and protein's formula (which is a trivial translation of the genetic code for that protein). But it is very computationally expensive to simulate for a large, complicated protein. Watson is a very narrow machine that tries to pretend at answering by using a large database of answers. AFAIK it can't even do trivial new answers (what is the velocity of a rock that fell from the height of 131.5 meters? Wolfram Alpha can answer this, but it is just triggered by the keyword 'fell' and 'height')