Folks here should be familiar with most of these arguments. Putting some interesting quotes below:
"Creative blocks: The very laws of physics imply that artificial intelligence must be possible. What's holding us up?"
Remember the significance attributed to Skynet’s becoming ‘self-aware’? [...] The fact is that present-day software developers could straightforwardly program a computer to have ‘self-awareness’ in the behavioural sense — for example, to pass the ‘mirror test’ of being able to use a mirror to infer facts about itself — if they wanted to. [...] AGIs will indeed be capable of self-awareness — but that is because they will be General
Some hope to learn how we can rig their programming to make [AGIs] constitutionally unable to harm humans (as in Isaac Asimov’s ‘laws of robotics’), or to prevent them from acquiring the theory that the universe should be converted into paper clips (as imagined by Nick Bostrom). None of these are the real problem. It has always been the case that a single exceptionally creative person can be thousands of times as productive — economically, intellectually or whatever — as most people; and that such a person could do enormous harm were he to turn his powers to evil instead of good.[...] The battle between good and evil ideas is as old as our species and will go on regardless of the hardware on which it is running
He also says confusing things about induction being inadequate for creativity which I'm guessing he couldn't support well in this short essay (perhaps he explains better in his books). Not quoting here. His attack on Bayesianism as an explanation for intelligence is valid and interesting, but could be wrong. Given what we know about neural networks, something like this does happen in the brain, and possibly even at a concept level.
The doctrine assumes that minds work by assigning probabilities to their ideas and modifying those probabilities in the light of experience as a way of choosing how to act. This is especially perverse when it comes to an AGI’s values — the moral and aesthetic ideas that inform its choices and intentions — for it allows only a behaviouristic model of them, in which values that are ‘rewarded’ by ‘experience’ are ‘reinforced’ and come to dominate behaviour while those that are ‘punished’ by ‘experience’ are extinguished. As I argued above, that behaviourist, input-output model is appropriate for most computer programming other than AGI, but hopeless for AGI.
His final conclusions are disagreeable. He somehow concludes that the principal bottleneck in AGI research is a philosophical one.
In his last paragraph, he makes the following controversial statement:
For yet another consequence of understanding that the target ability is qualitatively different is that, since humans have it and apes do not, the information for how to achieve it must be encoded in the relatively tiny number of differences between the DNA of humans and that of chimpanzees.
This would be false if, for example, the mother controls gene expression while a foetus develops and helps shape the brain. We should be able to answer this question definitively once we can grow human babies completely in vitro. Another problem would be the impact of the cultural environment. A way to answer this question would be to see if our Stone Age ancestors would be classified as AGIs under a reasonable definition.
I disagree with this. The development of probabilistic graphical models (incl. Bayesian networks and some types of neural networks) was a very important advance, I think.
A little bit of arrogance here from Deutsch, but we can let it slide.
Absolutely true, and the first camp still persists to this day, and is still extremely confused/ignorant about universality. It's a view that is espoused even in 'popular science' books.
I don't follow. You can write a program to generate random hypotheses, and you can write a program to figure out the implications of those hypotheses and whether they fit in with current experimental data, and if they do, to come up with tests of those ideas for future experiments. Now, just generating hypotheses completely randomly may not be a very efficient way, but it would work. That's very different from saying "It's impossible". It's just a question of figuring out how to make it efficient. So what's the problem here?
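To make the "generate random hypotheses, then check them against data" loop concrete, here is a minimal sketch. Everything in it is an illustrative assumption: the toy data, the restriction of hypotheses to laws of the form y = a*x + b with small integer coefficients, and the function names. It shows that blind generate-and-test does work on a tiny hypothesis space, and says nothing about whether it scales.

```python
import random

# Hypothetical toy "experimental data": (x, y) observations to be explained.
DATA = [(0, 1), (1, 3), (2, 5), (3, 7)]

def random_hypothesis(rng):
    """Draw a random candidate law y = a*x + b with small integer coefficients."""
    return rng.randint(-5, 5), rng.randint(-5, 5)

def fits(hypothesis, data):
    """Check whether the candidate law reproduces every observation."""
    a, b = hypothesis
    return all(a * x + b == y for x, y in data)

def search(data, seed=0, max_tries=100_000):
    """Generate hypotheses blindly until one fits the data or the budget runs out."""
    rng = random.Random(seed)
    for tries in range(1, max_tries + 1):
        h = random_hypothesis(rng)
        if fits(h, data):
            return h, tries
    return None, max_tries

hypothesis, tries = search(DATA)
a, b = hypothesis
# The surviving hypothesis suggests a test for a future experiment at x = 4.
prediction = a * 4 + b
```

The space here holds only 121 candidates, so the search finishes almost immediately. The efficiency question raised above is what happens when the space is not this small.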
But the Turing test is very different from coming up with an explanation of dark matter. The Turing test is a very specific test of use of language and common sense, which is only defined in relation to human beings (and thus needs human beings to test) whereas an explanation of dark matter does not need human beings to test. Thus making this particular argument moot.
What else could it possibly be? Information is either encoded into a brain, or predicted based on past experiences. There is no other way to gain information. Deutsch gives the example of dates starting with 19- or 20-. Surely, such information is not encoded into our brains from birth. It must be learned from past experiences. But knowledge of dates isn't the only knowledge we have! We have teachers and parents telling us about these things so that we can learn how they work. This all falls under the umbrella of 'past experiences'. And, indeed, a machine whose only inputs were dates would have a tough time making meaningful inferences about them, no matter how intelligent or creative it was.
I cannot make head or tail of this.
Anyway, I stopped reading after this point because it was disappointing. I expected an interesting and insightful argument, one to make me actually question my fundamental assumptions, but that's not the case here.
I think his claim is basically "we don't know yet how to teach a machine how to identify reasonable hypotheses in a short amount of time," where the "short amount of time" is implicit. The proposal "let's just test every possible program, and see which ones explain Dark Matter" is not a workable approach, even if it seems to describe the class that contains actual workable approaches. (Imagine actually going to a conference and proposing a go-bot that considers every possible sequence of moves possible from the current board position, and then picks the tree most favorable to it.)
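A quick back-of-the-envelope calculation shows why the exhaustive go-bot would not go over well. The branching factor of 250 and game length of 150 moves are rough, commonly quoted ballpark figures that I am assuming for illustration, and the uniform-tree model is a simplification.

```python
def count_sequences(branching_factor, depth):
    """Number of distinct move sequences of a given length in a uniform game tree."""
    return branching_factor ** depth

# A small tree is easy to enumerate.
small_tree = count_sequences(2, 10)

# Assumed ballpark figures for Go: about 250 legal moves per turn, about 150 turns.
go_like_tree = count_sequences(250, 150)

# Report the size as a digit count, since the number itself is unwieldy.
go_like_digits = len(str(go_like_tree))
```

Under these assumptions the count runs to hundreds of digits, so "consider every sequence" describes the class of solutions without being one.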
I think the Turing test is being used as an illustrative example here. It seems unlikely that you could have a genetic algorithm operate on a population of code and end up with a program that passes the Turing test, because at each step the genetic algorithm (as an optimization procedure) needs to have some sense of what is more or less likely to pass the test. It similarly seems unlikely that you could have a genetic algorithm operate on a population of physics explanations and end up with an explanation that successfully explains Dark Matter, because at each step the genetic algorithm needs to have some sense of what is more or less likely to explain Dark Matter.
I think his claim is that a correct inference procedure will point right at the correct answer, but as I disagree with that point I am reluctant to ascribe it to him. I think it likely that a correct inference procedure involves checking out vast numbers of explanations, and discarding most of them very early on. But optimization over explanations instead of over plans is in its infancy, and I think he's right that AGI will be distant so long as that remains the case.
My interpretation of that section is that Deutsch is claiming that "induction" is not a complete explanation. If you say "well, the sun rose every day for as long as I can remember, and I suspect it will do so today," then you get surprised by things like "well, the year starts with 19 every day for as long as I can remember, and I suspect it will do so today." If you say "the sun rises because the Earth rotates around its axis, the sun emits light because of nuclear fusion, and I think the sun has enough fuel to continue shining, angular momentum is conserved, and the laws of physics do not vary with time," then your expectation that the sun will rise is very likely to be concordant with reality, and you are very unlikely to make that sort of mistake with the date. But how do you get beliefs of that sort to begin with? You use science, which is a bit more complicated than induction.
Similarly, the claim that prediction is unimportant seems to be that the target of an epistemology should be at least one level higher than the output predictions- you don't want "the probability the sun will rise tomorrow" but "conservation of angular momentum" because the second makes you more knowledgeable and more powerful.
My impression was that he was saying that creativity is some mysterious thing that we don't know how to implement. But we do. Creativity is just search. Search that is possibly guided by experience solving similar problems. By learning from past experiences, search becomes more efficient. This idea is quite consistent with studies on how the human brain works. Beginner chess players rely more on 'thinking' (i.e. considering a large variety of moves, most of which are terrible), but grandmasters seem to rely more on their memory.
As I said, though, it's quite different, because a hypothetical explanation for dark matter needs to only be consistent with existing experimental data. It's true that it's unfeasible to do this for the Turing test, because you need to test millions of candidate programs against humans, and this cannot be done inside the computer unless you already have AGI. But checking proposals for dark matter against existing data can be done entirely inside the computer.
I agree with you.
If the machine's only inputs were '1990, 1991, 1992, ... , 1999', and it had no knowledge of math, arithmetic, language, or what years represent, then how on Earth can it possibly make any inference other than the next date will also start with 19? There is no other inference it could make.
On the other hand, if it had access to the sequence '1900, 1901, 1902, ... , 1999' then it becomes a different story. It can infer that 1 always follows 0, 2 always follows 1, etc., and 0 always follows 9. It could also infer that when 0 follows 9, the next digit is incremented. Thus it can conclude that after 1999, the date 2000 is plausible, and add it to its list of highly-plausible hypotheses. Another hypothesis could be that the 3rd digit is never affected, and that the next date after 1999 is 1900.
Equivalently, if it had already been told about math, it would know how number sequences work, and could say with high confidence that the next year will be 2000. Yes, going to school counts as 'past experiences'.
It's a common mistake that people make when talking about induction. They think induction is simply 'X has always happened, therefore it will always happen'. But induction is far more complicated than that! That's why it took so long to come up with a mathematical theory of induction (Solomonoff induction). Solomonoff induction considers all possible hypotheses - some of them possibly extremely complex - and weighs them according to how simple they are and whether they fit the observed data. That is the very definition of science. Solomonoff induction could accurately predict the progression of dates, and could do 'science'. People have implemented time-limited versions of Solomonoff induction on a computer and they work as expected. We do need to come up with faster and more efficient ways of doing this, though. I agree with that.
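Here is a toy sketch of the "weigh hypotheses by simplicity, keep the ones that fit" idea applied to the date example. It is not Solomonoff induction, which ranges over all programs and is uncomputable. The three hypotheses and their description lengths in bits are hand-picked assumptions, and the hypothesis class is simply handed arithmetic, which is itself part of what is being argued about.

```python
from fractions import Fraction

# Each hypothesis: (name, assumed description length in bits, next-value rule).
# The bit lengths are hand-assigned for illustration, not derived from a real encoding.
HYPOTHESES = [
    ("add one", 4, lambda y: y + 1),
    ("add one, keep first two digits", 7, lambda y: y // 100 * 100 + (y + 1) % 100),
    ("repeat last value", 3, lambda y: y),
]

def posterior(observations):
    """Weight each hypothesis consistent with the data by 2**-bits, then normalize."""
    weights = {}
    for name, bits, rule in HYPOTHESES:
        pairs = zip(observations, observations[1:])
        if all(rule(a) == b for a, b in pairs):
            weights[name] = Fraction(1, 2 ** bits)
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

def predict_next(observations):
    """Combine the surviving hypotheses into a forecast for the next value."""
    post = posterior(observations)
    forecast = {}
    for name, _bits, rule in HYPOTHESES:
        if name in post:
            nxt = rule(observations[-1])
            forecast[nxt] = forecast.get(nxt, 0) + post[name]
    return forecast

years = list(range(1990, 2000))
forecast = predict_next(years)
```

With these assumed lengths, "repeat last value" is eliminated by the data, and the forecast puts 8/9 on 2000 and 1/9 on 1900. Change the assumed bit lengths and the split changes with them, so the numbers illustrate the mechanism rather than settle the argument.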
I agree that there's a lot more work to be done in AI. We need to find better learning and search algorithms. What I disagree with is that the work must be this kind of philosophical work that Deutsch is proposing. I think the work that needs to be done is very much engineering work.
Correct, but not helpful; when you say "just search," that's like saying "but Dark Matter is just physics." The physicists don't have a good explanation of Dark Matter yet, and the search people don't have a good implementation of creativity (on the level of concepts) yet.
It is not obvious to me that Deutsch is familiar with ideas like Solomonoff induction, Pearl's work on causality, and so on, and thinks that they're inadequate to the task. He might be saying "we need a formalized version of induction" while unaware that Solomonoff already proposed one.
I made it clear what I mean:
Why did I mention this at all? Because there's no other way to do this. Creativity (coming up with new unprecedented solutions to problems) must utilize some form of search, and due to the no-free-lunch theorem, there is no shortcut to finding the solution to a problem. The only thing that can get around no-free-lunch is to consider an ensemble of problems. That is, to learn from past experiences.
And about your point:
I agree with this. The fact that he didn't even mention Solomonoff at all, even in passing, despite the fact that he devoted half the article to talking about induction, is strongly indicative of this.
That doesn't look helpful to me. Yes, you can define creativity this way but the price you pay is that your search space becomes impossibly huge and high-dimensional.
Defining sculpture as a search for a pleasing arrangement of atoms isn't very useful.
After that sentence I made it clear what I mean. See my reply to Vaniver.
"It seems unlikely that you could have a genetic algorithm operate on a population of code and end up with a program that passes the Turing test"
Well, we have one case of it working, and that wasn't even with the process being designed with the "pass the Turing test" specifically as a goal.
"because at each step the genetic algorithm (as an optimization procedure) needs to have some sense of what is more or less likely to pass the test."
Having an automated process for determining with certainty that something passes the Turing test is considerably stronger than merely having nonzero information. Suppose I'm trying to use a genetic algorithm to create a Halting Tester, and I have a Halting Tester that says that a program doesn't halt. If I know that the program does, in fact, not halt after n steps (by simply running the program for n steps), that provides nonzero information about the efficacy of my Halting Tester. This suggests that I could create a genetic algorithm for creating Halting Testers (obviously, I couldn't evolve a perfect Halting Tester, but perhaps I could evolve one that is "good enough", given some standard). And who knows, maybe if I had such a genetic algorithm, not only would my Halting Testers evolve better Halting Testing, but since they are competing against each other, they would evolve better Tricking Other Halting Testers, and maybe that would eventually spawn AGI. I don't find that inconceivable.
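A minimal sketch of the partial fitness signal described above, with every name and number hypothetical. Programs are modelled as Python generators, a "tester" is any function returning True ("halts") or False ("never halts"), and the 10-to-1 weighting of decisive versus weak evidence is an arbitrary choice. Running a program for n steps is decisive only when it halts. Otherwise it is merely weak evidence for "never halts".

```python
def run_for(program, n):
    """Run a generator-based program for at most n steps.

    Returns True if it halted within the budget, None if it was still running.
    """
    it = program()
    for _ in range(n):
        try:
            next(it)
        except StopIteration:
            return True
    return None

def countdown(k):
    """Build a toy program that takes k steps and then halts."""
    def program():
        i = k
        while i > 0:
            i -= 1
            yield
    return program

def loop_forever():
    """A toy program that never halts."""
    while True:
        yield

def score(tester, programs, n, decisive=10, weak=1):
    """Score a tester's verdicts against bounded execution of each program."""
    points = 0
    for p in programs:
        verdict = tester(p)  # True = "halts", False = "never halts"
        if run_for(p, n) is True:
            points += decisive if verdict else -decisive
        else:
            # Still running after n steps: weak evidence for "never halts".
            points += weak if not verdict else -weak
    return points

programs = [countdown(2), countdown(50), loop_forever]
always_halts = lambda p: True
never_halts = lambda p: False

score_always = score(always_halts, programs, n=10)
score_never = score(never_halts, programs, n=10)
```

Note that `countdown(50)` is wrongly counted as weak evidence for "never halts" at n=10, which is exactly the sense in which this signal is nonzero but imperfect.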
Are you referring to the biological evolution of humans, or stuff like this?
Right; how did you interpret "some sense of what is more or less likely to pass the test"?
I was referring to the biological evolution of humans; in your link, the process appears to have been designed with the Turing test in mind.
There's probably going to be a lot of guesswork as far as what metrics for "more likely to pass" are best, but the process doesn't have to be perfect, just good enough to generate intelligence. Obvious places to start would be complex games such as Go and poker, and replicating aspects of human evolution, such as simulating hunting and social maneuvering.
Ok. When I said "you," I meant modern humans operating on modern programming languages. I also don't think it's quite correct to equate actual historical evolution and genetic algorithms, for somewhat subtle technical reasons.
My estimate is 80% prediction, with the rest evaluation and tree pruning.
He does - but it isn't pretty.
Here is my review of The Beginning of Infinity: Explanations That Transform the World.
I think this paragraph illustrates the key failure of Deutsch's stance: he assumes all statistical methods must be fundamentally naive. This is about equivalent to assuming all statistical methods operate on small data sets. Of course, if your entire dataset is a moderately long list of numbers that all begin with the number 19, your statistical method will naively assume that the 19s will continue with high probability. But this restriction on the size and complexity of the data set is completely arbitrary. Humans experience, and learn from, an enormously vast data set containing language, images, sound, sensorimotor feedback, and more; all of it indexed by a time variable that permits correlational analysis (the man's lips moved in a certain way and the word 'kimchi' came out). The human learning process constructs, with some degree of success, complex world theories that describe this vast data set. When the brain perceives a sequence of dates, as Deutsch mentions, it does not analyze the sequence in isolation and create a simple standalone theory to do the prediction; rather it understands that the sequence is embedded in a much larger web of interrelated data, and correctly applies the complex world theory to produce the right prediction. In other words, though both the data set and the chosen hypothesis are large and complex, the operation is essentially Bayesian in character. Human brains certainly assume the future is like the past, but we know that the past is more complex than a simple sequential theory would predict; when the future is genuinely unlike the past, humans run into serious difficulty.
I don't want to speak for Deutsch, but since I'm sympathetic to his point of view I'll point out that a better way to formulate the issue would be to say that all statistical methods rest on some assumptions and when these assumptions break the methods fail.
Not at all. The key issue isn't the size of the data set, the key issue is stability of the underlying process.
To use the calendar example, you can massively increase your data set by sampling not every day but, say, every second. And yet this will not help you one little bit.
Not really. Things have to be computable before the heat death of the universe. Or, less dramatically and more practically, the answer to the question must be received while there is still the need for an answer. This imposes rather serious restrictions on the size and complexity of the data that you can deal with.
Sometimes correctly. And sometimes incorrectly. Brains operate more by heuristics than by statistical methods, and the observation that a heuristic can be useful doesn't help you define under which constraints statistical methods will work.
You do realize that people are working on logical uncertainty under limited time, and this could tell an AI how to re-examine its assumptions? I admit that Gaifman at Columbia deals only with a case where we know the possibilities beforehand (at least in the part I read). But if the right answer has a description in the language we're using, then it seems like E.T. Jaynes theoretically addresses this when he recommends having an explicit probability for 'other hypotheses.'
Then again, if this approach didn't come up when the authors of "Tiling Agents" discuss utility maximization, perhaps I'm overestimating the promise of formalized logical uncertainty.
I remember Eliezer making the same point in a bloggingheads video with Robin Hanson. I believe Hanson's position (although I watched this years ago, and I may be confabulating) was that our intelligence works via the same kludge as other animals and we got better results mostly by developing a better talent for social transmission.
That idea makes a lot of sense to me. I recall reading that chimpanzees have at various times invented more advanced tools (e.g. using spears to hunt bush babies), but they seem to spread the methods by accident rather than deliberate teaching. New chimpanzee technologies don't seem to persist as they do for humans.
ETA: I looked it up, and I couldn't find a Hanson/Yudkowsky bloggingheads. I'm not sure if it was taken down or if the video was not done through bloggingheads.
He seems to suggest that "humans = chimps + X", therefore what makes intelligence must be a subset of "X", therefore rather small.
Which in my opinion is wrong. Imagine that to make intelligence, you need A and B and C and D. Chimps have A and B and C, but they don't have D. Humans have A and B and C and D, which is why humans are more intelligent than chimps. However, having D alone, without A and B and C, would not be sufficient for intelligence.
The fact that the DNA distance between humans and chimps is small only proves that if we tried to make chimps smarter by genetic engineering, we wouldn't have to change most of their genes. But that is irrelevant for making a machine. We don't have fully chimp-level machines yet.
A Hanson/Yudkowsky bloggingheads?!? Methinks you are mistaken.
"I looked it up, and I couldn't find a Hanson/Yudkowsky bloggingheads. I'm not sure if it was taken down or if the video was not done through bloggingheads."
There never was a bloggingheads - AFAIK. There is: Yudkowsky vs Hanson on the Intelligence Explosion - Jane Street Debate. However, I'd be surprised if Yudkowsky makes the same silly mistake as Deutsch. Yudkowsky knows some things about machine intelligence.
Deutsch is interesting. He seems very close to the LW camp, and I think he's someone LWers should at least be familiar with. (This article is not as good an introduction as The Beginning of Infinity, I think.)
I suspect, personally, that the conflict between "Popperian conjecture and criticism" and the LW brand of Bayesianism is a paper tiger. See this comment thread in particular.
Deutsch is right that a huge part of artificial general intelligence is the ability to infer explanatory models from experience from the complete (infinite!) set of possible explanations, rather than just fit parameters to a limited set of hardcoded explanatory models (as AI programs today work). But that's what I think people here think (generally under the name Solomonoff induction).
Deutsch seems pretty clueless in the section quoted below. I don't see why students should be interested in what he has to say on this topic.
He's clever enough to get a lot of things right, and I think the things that he gets wrong he gets wrong for technical reasons. This means it's relatively quick to dispense with his confusions if you know the right response, but if you can't it points out places you need to shore up your knowledge. (Here I'm using the general you; I'm pretty sure you didn't have any trouble, Tim.)
I also think his emphasis on concepts- which seems to be rooted in his choice of epistemology- is a useful reminder of the core difference between AI and AGI, but don't expect it to be novel content for many (instead of just novel emphasis).
What our AIs are missing is not what separates us from the apes. If we could give the AIs what they have now AND what the apes have now, I think we'd have a strong AI (but of course the chances of its being friendly are approximately 0).
"A way to answer this question would be to see if our Stone Age ancestors would be classified as AGIs under a reasonable definition"
I'm confused as to how they would satisfy the A part of AGI.
I'm not sure whether AGI won't come until AI is social - i.e. the mistake is to think of intelligence as a property of an individual machine, whereas it's more a property of a transducer (that's of a sufficient level of complexity) embedded in a network. That is so even when it's working relatively independently of the network.
IOW, the tools and materials of intelligence are a social product, even if an individual transducer that's relatively independent works with them in an individual act of intelligence. When I say "product" I mean that the meaning itself is distributed amongst the network and doesn't reside in any individual.
No AGI until social AI.
Deutsch claims in the article to have proved that any physical process can in principle be emulated at arbitrarily fine detail by a universal quantum Turing machine. Is this proof widely accepted? I tried to read the paper, but the math is beyond me. I've found relatively little discussion of it elsewhere, and most of it critical.
I'd say this falls under the Church-Turing thesis.
In his 1985 paper he seems to be arguing that he uniquely extends the Church-Turing thesis.
How do you propose this should be done? Put sequenced Stone Age genomes in human ova until a statistically significant number survives long enough for cognitive testing? To get the approximate impact of genes we share with Stone Agers or chimps but not both?
I'm convinced the builders of Stonehenge or the Pyramids were so much less "intelligent" than us that if we met them, we'd think of them as intellectually disabled. But I don't think the knowledge whether that's due to genes or culture is worth that kind of experiment.
For your question about the impact of the cultural environment on "intelligence", twin studies make a lot more sense than what I think you're suggesting.
Really? I don't think the average or the moderately above average modern person could design a pyramid, never having heard of one. They might think a big pointy monument would be cool, but that's not the same thing as getting the angles right, building stable interior tunnels, or organizing the work.
I don't think the vast majority of modern people could invent writing, either.
It's not like someone woke up one day in ancient Egypt and decided to build a pyramid. Someone built a raised platform, someone else realized that if you built a raised platform on top of another raised platform, you could get a higher raised platform, and eventually people started making ziggurats (Google Chrome says that's not a correctly spelled word. Hmmm). Then someone decided that a pyramid was prettier than a ziggurat. And so on and so forth. There was no one person who designed a pyramid from scratch. They were designed over thousands of years, with huge amounts of resources being consumed. I don't think most people could build a bird nest, but that doesn't mean that humans aren't smarter than birds.
The "vast majority" of preliterate people didn't invent writing either. Outliers demonstrate very little.
The original quote mentions the builders of Stonehenge and the Pyramids, and I assume what's intended includes the designers and administrators, not just the people doing the hauling.
Does it seem likely that the middle of the bell curve for preliterate people was a lot lower, even though the outliers were about as high?
Well yes; if nothing else early agricultural societies were probably rather malnourished outside the elite. But chopping twenty points off an average person's IQ does not make him "intellectually disabled", just excruciatingly slow. As opposed to merely painfully slow.
The usual boundary for "mentally disabled" is IQ 70. There are a LOT of IQ 90 people walking around, chopping off twenty points won't work well for them.
And a warning: as was pointed out to me recently, IQ points are really ranks. There is no implication that a one point difference (or a twenty point difference) means the same thing in the 70 - 90 context as in the, say, 120 - 140 context.
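One way to see the "ranks" point, as a rough sketch: if scores are normed to a normal distribution with mean 100 and standard deviation 15 (a common convention that I am assuming here), the same twenty-point gap spans very different shares of the population depending on where it sits. This says nothing about what the gap means cognitively, only about rank.

```python
from statistics import NormalDist

# Assumed norming: mean 100, standard deviation 15.
iq = NormalDist(mu=100, sigma=15)

def percentile(score):
    """Percentage of the population scoring below the given IQ under the assumed norming."""
    return 100 * iq.cdf(score)

# The same twenty-point gap, measured in percentile ranks.
low_gap = percentile(90) - percentile(70)
high_gap = percentile(140) - percentile(120)
```

Under this assumption the 70 to 90 gap covers roughly 23 percentile points, while the 120 to 140 gap covers roughly 9.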