I am suspicious of attempts to define intelligence for the following reason. Too often, they lead the definer down a narrow and ultimately fruitless path. If you define intelligence as the ability to perform some function XYZ, then you can sit down and start trying to hack together a system that does XYZ. Almost invariably this will result in a system that achieves some superficial imitation of XYZ and very little else.
Rather than attempting to define intelligence and then marching along a determined path toward that goal, we should look around for novel insights and explore their implications.
Imagine if Newton had followed the approach of "define physics and then move toward it". He might have decided that physics is the ability to build large structures (certainly an understanding of physics is helpful or even required for this). He might then have spent all his time investigating the material properties of various kinds of stone - useful, perhaps, but it misses the big picture. Instead he looked around in the most unlikely places to find something interesting that had very little immediate practical application. That should be our mindset in pursuing AI: the scientist's, rather than the engineer's, approach.
@Aron, wow, from your initial post I thought I was giving advice to an aspiring undergraduate, glad to realize I'm talking to an expert :-)
Personally, I continually bump up against performance limitations. This is often due to bad coding on my part and the overuse of Matlab for-loops, but I still have the strong feeling that we need faster machines. In particular, I think full intelligence will require processing VAST amounts of raw unlabeled data (video, audio, etc.), and that will require fast machines. The application of statistical learning techniques to vast unlabeled data streams is about to open new doors. My take on this idea is spelled out better here.
Aron, I don't think anyone really knows the general requirements for AGI, and therefore nobody knows what (if any) kind of specialized hardware is necessary. But if you're a hardware guy and you want something to work on, you could read Pearl's book (mentioned above) and find ways to implement some of the more computationally intensive inference algorithms in hardware. You might also want to look up the work by Geoff Hinton et al on restricted Boltzmann machines and try to implement the associated algorithms in hardware.
Eliezer, of course in order to construct AI we need to know what intelligence really is, what induction is, etc. But consider an analogy to economics. Economists understand the broad principles of the economy, but not the nuts and bolts details. The inability of the participants to fully comprehend the market system hardly inhibits its ability to function. A similar situation may hold for intelligence: we might be able to construct intelligent systems with only an understanding of the broad principles, but not the precise details, of thought.
I mean that a superintelligent AI should be able to induce the Form of the Good from extensive study of humans, human culture, and human history. The problem is not much different in principle from inducing the concept of "dog" from many natural images, or the concept of "mass" from extensive experience with physical systems.
@Eliezer - I think Shane is right. "Good" abstractions do exist, and are independent of the observer. The value of an abstraction relates to its ability to allow you to predict the future. For example, "mass" is a good abstraction, because when coupled with a physical law it allows you to make good predictions.
If we assume a superintelligent AI, we have to assume that the AI has the ability to discover abstractions. Human happiness is one such abstraction. Understanding the abstraction "happiness" allows one to predict certain events related to human activity. Thus a superintelligent AI will necessarily develop the concept of happiness in order to allow it to predict human events, in much the same way that it will develop a concept of mass in order to predict physical events.
Plato had a concept of "forms". Forms are ideal shapes or abstractions: every dog is an imperfect instantiation of the "dog" form that exists only in our brains. If we can accept the existence of a "dog" form or a "house" form or a "face" form, then it is not difficult to believe in the existence of a "good" form. Plato called this the Form of the Good. If we assume an AI that can develop its own forms, then it should be able to discover the Form of the Good.
"Yeah? Let's see your aura of destiny, buddy."
I don't want to see your aura of destiny. I just want to see your damn results! :-)
In my view, the creation of an artificial intelligence (friendly or otherwise) would be a much more significant achievement than Einstein's, for the following reason. Einstein had a paradigm: physics. AI has no paradigm. There is no consensus about what the important problems are. In order to "solve" AI, one not only has to answer a difficult problem, one has to begin by defining the problem.
This may be nitpicking and I agree with your overarching point, but I think you're drawing a false dichotomy between Science and Bayes. Science is the process of constructing theories to explain data. The theory must optimize a tradeoff between two terms:
1) ability to explain data
2) compactness of the theory
If one is willing to ignore or gloss over the second requirement, the process becomes nonsense. One can easily construct a theory of astrology which explains the motion of the planets, the weather, the fates of lovers, and violence in the Middle East. It just won't be a compact theory. So Science and Bayes are one and the same.
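The tradeoff above can be made concrete as a two-part code in the Minimum Description Length spirit: total cost = bits to state the theory + bits to encode the data given the theory. Here is a toy sketch in Python (the function name and the 10-bit charge for stating a parameter are my illustrative assumptions, not a standard):

```python
import math

def two_part_code_bits(data, p, model_bits):
    """Total description length: bits to state the theory (model_bits)
    plus bits to encode the binary data under it (Shannon: -log2 p per symbol)."""
    data_bits = sum(-math.log2(p if x == 1 else 1 - p) for x in data)
    return model_bits + data_bits

# 100 coin flips, 80 heads. The biased-coin "theory" pays extra bits to
# state its parameter, but explains the data far more compactly.
data = [1] * 80 + [0] * 20
fair = two_part_code_bits(data, 0.5, model_bits=0)     # default theory, nothing to state
biased = two_part_code_bits(data, 0.8, model_bits=10)  # ~10 bits to state p = 0.8
print(fair, biased)  # the biased theory wins despite its extra complexity
```

An "astrology" theory that explains everything corresponds to a huge model_bits term: whatever it saves in the data term, it loses in stating itself.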
I suggest a lot of caution in thinking about how entropy appears in thermodynamics and information theory. All of statistical mechanics is based on the concept of energy, which has no analogue in information theory. Some people would suggest that for this reason the two quantities should not be called by the same term.
the "temperature" isn't a uniform speed of all the molecules, it's an average speed of the molecules, which in turn corresponds to a predictable statistical distribution of speeds
I assume you know this, but some readers may not: temperature is not actually equivalent to energy/speed, but rather to the derivative of entropy with respect to energy:
1/T = dS/dE
This is why we observe temperature equilibration: two systems in thermal contact trade energy to maximize the net entropy of the ensemble. Thus in equilibrium, a small shift in energy from one system to the other must not change the ensemble entropy ==> the temperatures of the two systems must be equal.
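The step from "maximize the ensemble entropy" to "equal temperatures" can be spelled out in two lines (total energy is fixed, so dE2 = -dE1):

```latex
\begin{aligned}
dS_{\text{tot}} &= \frac{\partial S_1}{\partial E_1}\, dE_1
                 + \frac{\partial S_2}{\partial E_2}\, dE_2
                 = \left( \frac{1}{T_1} - \frac{1}{T_2} \right) dE_1 \\
dS_{\text{tot}} &= 0 \ \text{at the entropy maximum}
   \quad \Longrightarrow \quad T_1 = T_2
\end{aligned}
```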
In almost all real systems, temperature and energy are monotonically related, so you won't go too far astray by thinking of temperature as energy. However, in theory one can imagine systems that are forced into a smaller number of states as their energies increase (dS/dE < 0) and so in fact have negative temperature.
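The standard example of such a system is a collection of two-level spins: above half filling, adding energy *reduces* the number of available microstates. A small numerical sketch (units with k_B = 1 and unit energy quanta; entropy in nats):

```python
import math

def entropy(N, n):
    """Entropy S = ln(multiplicity) of N two-level systems with n excited:
    multiplicity = C(N, n)."""
    return math.log(math.comb(N, n))

N = 100
# Finite-difference estimate of dS/dE (one quantum of energy per step):
dSdE_low = entropy(N, 31) - entropy(N, 30)   # below half filling: positive, T > 0
dSdE_high = entropy(N, 71) - entropy(N, 70)  # above half filling: negative, T < 0
print(dSdE_low, dSdE_high)
```

Since 1/T = dS/dE, the second regime has negative temperature even though it holds more energy than the first.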
Prof. Jaynes would doubtless be surprised by the power of algorithms such as Markov Chain Monte Carlo, importance sampling, and particle filtering. The latter method is turning out to be one of the most fundamental and powerful tools in AI and robotics. A particle filter-like process has also been proposed to lie at the root of cognition, see Lee and Mumford "Hierarchical Bayesian Inference in the Visual Cortex".
The central difficulty with Bayesian reasoning is its deep, deep intractability. Some probability distributions just can't be modeled, other than by random sampling.
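To see why sampling is the escape hatch, here is self-normalized importance sampling in plain Python: it estimates expectations under a target density known only up to a normalizing constant, which is exactly the situation in most Bayesian posteriors. (The function names and the uniform proposal are my illustrative choices; a sketch, not a library.)

```python
import math
import random

random.seed(0)

def importance_estimate(target_unnorm, proposal_sample, proposal_pdf, f, n=100_000):
    """Self-normalized importance sampling: estimate E_p[f(x)] when the
    target density p is known only up to a constant factor."""
    num = den = 0.0
    for _ in range(n):
        x = proposal_sample()
        w = target_unnorm(x) / proposal_pdf(x)  # unnormalized importance weight
        num += w * f(x)
        den += w
    return num / den

# Target: standard normal, with the 1/sqrt(2*pi) constant deliberately dropped.
# Proposal: uniform on [-6, 6], density 1/12.
target = lambda x: math.exp(-x * x / 2)
mean = importance_estimate(target, lambda: random.uniform(-6, 6),
                           lambda x: 1 / 12, f=lambda x: x)
second = importance_estimate(target, lambda: random.uniform(-6, 6),
                             lambda x: 1 / 12, f=lambda x: x * x)
print(mean, second)  # should land near 0 and 1
```

Particle filters apply the same weighting trick sequentially, resampling as the distribution evolves over time.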
Another way to think about probabilities of 0 and 1 is in terms of code length.
Shannon told us that if we know the probability distribution of a stream of symbols, then the optimal code length (in bits) for a symbol X is:
l(X) = -log2 p(X)
If you consider that an event has zero probability, then there's no point in assigning a code to it (codespace is a conserved quantity, so if you want to get short codes you can't waste space on events that never happen). But if you think the event has zero probability, and then it happens, you've got a problem - system crash or something.
Likewise, if you think an event has probability of one, there's no point in sending ANY bits. The receiver will also know that the event is certain, so he can just insert the symbol into the stream without being told anything (this could happen in a symbol stream where three As are always followed by a fourth). But again, if you think the event is certain and then it turns out not to be, you've got a problem: the receiver doesn't get the code you want to send.
If you refuse to assign zero or unity probabilities to events, then you have a strong guarantee that you will always be able to encode the symbols that actually appear. You might not get good code lengths, but you'll be able to send your message. So Eliezer's stance can be interpreted as an insistence on making sure there is a code for every symbol sequence, regardless of whether that sequence appears to be impossible.
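A few lines of Python make the tradeoff concrete: giving a "never happens" event a sliver of probability buys it a finite (if long) codeword, while the common symbols barely pay for it. (The symbol names and probabilities are made up for illustration.)

```python
import math

def code_length_bits(p):
    """Shannon-optimal code length, in bits, for an event of probability p."""
    return -math.log2(p)

# Probabilities sum to 1; the near-impossible event keeps a small reserve.
probs = {"A": 0.7, "B": 0.2, "C": 0.099, "impossible?": 0.001}
for sym, p in probs.items():
    print(sym, round(code_length_bits(p), 2))
# The rare symbol gets a ~10-bit codeword, while A and B stay short.
# Had "impossible?" been assigned p = 0, its code length would be infinite:
# no codeword exists, and the encoder fails the moment the event occurs.
```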