Followup to: Efficient Cross-Domain Optimization

Shane Legg once produced a catalogue of 71 definitions of intelligence.  Looking it over, you'll find that the 18 definitions in dictionaries and the 35 definitions of psychologists are mere black boxes containing human parts.

However, among the 18 definitions from AI researchers, you can find such notions as

"Intelligence measures an agent's ability to achieve goals in a wide range of environments" (Legg and Hutter)


"Intelligence is the ability to optimally use limited resources - including time - to achieve goals" (Kurzweil)

or even

"Intelligence is the power to rapidly find an adequate solution in what appears a priori (to observers) to be an immense search space" (Lenat and Feigenbaum)

which is about as close as you can get to my own notion of "efficient cross-domain optimization" without actually measuring optimization power in bits.

But Robin Hanson, whose AI background we're going to ignore for a moment in favor of his better-known identity as an economist, at once said:

"I think what you want is to think in terms of a production function, which describes a system's output on a particular task as a function of its various inputs and features."

Economists spend a fair amount of their time measuring things like productivity and efficiency.  Might they have something to say about how to measure intelligence in generalized cognitive systems?

This is a real question, open to all economists.  So I'm going to quickly go over some of the criteria-of-a-good-definition that stand behind my own proffered suggestion on intelligence, and what I see as the important challenges to a productivity-based view.  It seems to me that this is an important sub-issue of Robin's and my persistent disagreement about the Singularity.

(A)  One of the criteria involved in a definition of intelligence is that it ought to separate form and function.  The Turing Test fails this - it says that if you can build something indistinguishable from a bird, it must definitely fly, which is true but spectacularly unuseful in building an airplane.

(B)  We will also prefer quantitative measures to qualitative measures that only say "this is intelligent or not intelligent".  Sure, you can define "flight" in terms of getting off the ground, but what you really need is a way to quantify aerodynamic lift and relate it to other properties of the airplane, so you can calculate how much lift is needed to get off the ground, and calculate how close you are to flying at any given point.

(C)  So why not use the nicely quantified IQ test?  Well, imagine if the Wright Brothers had tried to build the Wright Flyer using a notion of "flight quality" built around a Fly-Q test standardized on the abilities of the average pigeon, including various measures of wingspan and air maneuverability.  We want a definition that is not parochial to humans.

(D)  We have a nice system of Bayesian expected utility maximization.  Why not say that any system's "intelligence" is just the average utility of the outcome it can achieve?  But utility functions are invariant up to a positive affine transformation, i.e., if you add 3 to all utilities, or multiply all by 5, it's the same utility function.  If we assume a fixed utility function, we would be able to compare the intelligence of the same system on different occasions - but we would like to be able to compare intelligences with different utility functions.
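To make the invariance in (D) concrete, here is a minimal sketch.  The actions, probabilities and utilities are made-up illustrative values, and the helper names are hypothetical.  Adding 3 to every utility and multiplying by 5 leaves the preferred action unchanged but changes the "average utility" number, which is why that number alone can't be compared across agents.

```python
# Hypothetical lotteries: each action maps to (probability, utility) pairs.
# All names and numbers here are illustrative assumptions.
ACTIONS = {
    "a": [(0.5, 0.0), (0.5, 10.0)],
    "b": [(0.9, 4.0), (0.1, 6.0)],
    "c": [(1.0, 4.5)],
}

def expected_utility(lottery, transform=lambda u: u):
    """Expected utility of one lottery, optionally after rescaling utilities."""
    return sum(p * transform(u) for p, u in lottery)

def best_action(actions, transform=lambda u: u):
    """Name of the action with the highest expected utility."""
    return max(actions, key=lambda name: expected_utility(actions[name], transform))

def affine(u):
    """A positive affine transformation: add 3, then multiply by 5."""
    return 5.0 * (u + 3.0)

if __name__ == "__main__":
    for name, lottery in ACTIONS.items():
        print(name, expected_utility(lottery), expected_utility(lottery, affine))
    print(best_action(ACTIONS), best_action(ACTIONS, affine))
```

The ranking of actions is the same under both scales; only the raw scores move.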

(E)  And by much the same token, we would like our definition to let us recognize intelligence by observation rather than presumption, which means we can't always start off assuming that something has a fixed utility function, or even any utility function at all.  We can have a prior over probable utility functions, which assigns a very low probability to overcomplicated hypotheses like "the lottery wanted 6-39-45-46-48-36 to win on October 28th, 2008", but higher probabilities to simpler desires.

(F)  Why not just measure how well the intelligence plays chess?  But in real-life situations, plucking the opponent's queen off the board or shooting the opponent is not illegal, it is creative.  We would like our definition to respect the creative shortcut - to not define intelligence into the box of a narrow problem domain.

(G)  It would be nice if intelligence were actually measurable using some operational test, but this conflicts strongly with criteria F and D.  My own definition essentially tosses this out the window - you can't actually measure optimization power on any real-world problem any more than you can compute the real-world probability update or maximize real-world expected utility.  But, just as you can wisely wield algorithms that behave sorta like Bayesian updates or increase expected utility, there are all sorts of possible methods that can take a stab at measuring optimization power.

(H)  And finally, when all is said and done, we should be able to recognize very high "intelligence" levels in an entity that can, oh, say, synthesize nanotechnology and build its own Dyson Sphere.  Nor should we assign very high "intelligence" levels to something that couldn't build a wooden wagon (even if it wanted to, and had hands).  Intelligence should not be defined too far away from that impressive thingy we humans sometimes do.

Which brings us to production functions.  I think the main problems here would lie in criteria D and E.

First, a word of background:  In Artificial Intelligence, it's more common to spend your days obsessing over the structure of a problem space - and when you find a good algorithm, you use that algorithm and pay however much computing power it requires.  You aren't as likely to find a situation where there are five different algorithms competing to solve a problem and a sixth algorithm that has to decide where to invest a marginal unit of computing power.  Not that computer scientists haven't studied this as a specialized problem.  But it's ultimately not what AIfolk do all day.  So I hope that we can both try to appreciate the danger of déformation professionnelle.

Robin Hanson said:

"Eliezer, even if you measure output as you propose in terms of a state space reduction factor, my main point was that simply 'dividing by the resources used' makes little sense."

I agree that "divide by resources used" is a very naive method, rather tacked-on by comparison.  If one mind gets 40 bits of optimization using a trillion floating-point operations, and another mind achieves 80 bits of optimization using two trillion floating-point operations, even in the same domain using the same utility function, they may not at all be equally "well-designed" minds.  One of the minds may itself be a lot more "optimized" than the other (probably the second one).
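As a rough worked version of that example, assume that n bits of optimization means hitting a target that occupies a fraction 2^-n of the search space (an assumption carried over from the bits-based measure, not something production theory supplies).  The naive ratio then comes out identical for the two minds, even though the second mind's target is vastly rarer:

```latex
\frac{40\ \text{bits}}{10^{12}\ \text{ops}}
  = \frac{80\ \text{bits}}{2\times 10^{12}\ \text{ops}}
  = 4\times 10^{-11}\ \text{bits/op},
\qquad
\frac{2^{-40}}{2^{-80}} = 2^{40} \approx 1.1\times 10^{12}.
```

Under that assumption the second mind hits a target about a trillion times smaller while using only twice the operations, which the single ratio hides completely.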

I do think that measuring the rarity of equally good solutions in the search space smooths out the discussion a lot.  More than any other simple measure I can think of.  You're not just presuming that 80 units are twice as good as 40 units, but trying to give some measure of how rare 80-unit solutions are in the space; if they're common it will take less "optimization power" to find them and we'll be less impressed.  This likewise helps when comparing minds with different preferences.
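A minimal sketch of that rarity measure follows.  It assumes a small finite search space with a uniform measure over it, and takes "optimization power" to be -log2 of the fraction of the space that scores at least as well as the achieved outcome.  The function name and the toy space are hypothetical.

```python
import math
from itertools import product

def optimization_bits(space, score, achieved):
    """-log2 of the fraction of `space` scoring at least as well as `achieved`.

    Assumes a finite space with every point weighted equally.
    """
    space = list(space)
    threshold = score(achieved)
    at_least_as_good = sum(1 for s in space if score(s) >= threshold)
    return -math.log2(at_least_as_good / len(space))

# Toy search space: all 10-bit strings.
SPACE = list(product((0, 1), repeat=10))

def count_ones(bits):
    """One possible preference: more ones is better."""
    return sum(bits)

def count_zeros(bits):
    """A different preference: more zeros is better."""
    return len(bits) - sum(bits)

if __name__ == "__main__":
    all_ones = (1,) * 10
    half_ones = (1,) * 5 + (0,) * 5
    all_zeros = (0,) * 10
    print(optimization_bits(SPACE, count_ones, all_ones))
    print(optimization_bits(SPACE, count_ones, half_ones))
    print(optimization_bits(SPACE, count_zeros, all_zeros))
```

Because the score is only used to rank outcomes, two agents with opposite preferences (here, ones versus zeros) can be reported on the same scale of bits.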

But some search spaces are just easier to search than others.  I generally choose to talk about this by hiking the "optimization" metric up a meta-level: how easy is it to find an algorithm that searches this space?  There's no absolute easiness, unless you talk about simple random selection, which I take as my base case.  Even if a fitness gradient is smooth - a very simple search - natural selection would creep down it by incremental neighborhood search, while a human would leap through by, e.g., looking at the first and second derivatives.  Which of these is the "inherent easiness" of the space?

Robin says:

"Then we can talk about partial derivatives; rates at which output increases as a function of changes in inputs or features...  Yes a production function formulation may abstract from some relevant details, but it is far closer to reality than dividing by 'resources.'"

A partial derivative divides the marginal output by marginal resource.  Is this so much less naive than dividing total output by total resources?
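To see the two quantities side by side, here is a small sketch.  The production function is purely hypothetical, chosen only because it has diminishing returns, and the partial derivative is approximated by a central finite difference.

```python
def output(compute, memory):
    """Hypothetical production function with diminishing returns (illustrative only)."""
    return (compute ** 0.5) * (memory ** 0.5)

def average_product(f, compute, memory):
    """Total output divided by total compute: the 'divide by resources' figure."""
    return f(compute, memory) / compute

def marginal_product(f, compute, memory, h=1e-4):
    """Central finite-difference estimate of the partial derivative w.r.t. compute."""
    return (f(compute + h, memory) - f(compute - h, memory)) / (2 * h)

if __name__ == "__main__":
    c, m = 100.0, 4.0
    print(average_product(output, c, m))   # output per unit of compute so far
    print(marginal_product(output, c, m))  # output gained per extra unit of compute
```

With diminishing returns the marginal figure sits below the average one, so the two do answer different questions; both, however, still presuppose an output scale that can be divided.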

I confess that I said "divide by resources" just to have some measure of efficiency; it's not a very good measure.  Still, we need to take resources into account somehow - we don't want natural selection to look as "intelligent" as humans: human engineers, given 3.85 billion years and the opportunity to run 1e44 experiments, would produce products overwhelmingly superior to biology.

But this is really establishing an ordering based on superior performance with the same resources, not a quantitative metric.  I might have to be content with a partial ordering among intelligences, rather than being able to quantify them.  If so, one of the ordering characteristics will be the amount of resources used, which is what I was getting at by saying "divide by total resources".

The idiom of "division" is based around things that can be divided, that is, fungible resources.  A human economy based on mass production has lots of these.  In modern-day computing work, programmers use fungible resources like computing cycles and RAM, but tend to produce much less fungible outputs.  Informational goods tend to be mostly non-fungible: two copies of the same file are worth around as much as one, so every worthwhile informational good is unique.  If I draw on my memory to produce an essay, neither the sentences of the essay, nor the items of my memory, will be substitutable for one another.  If I create a unique essay by drawing upon a thousand unique memories, how well have I done, and how much resource have I used?

Economists have a simple way of establishing a kind of fungibility-of-valuation between all the inputs and all the outputs of an economy: they look at market prices.

But this just palms off the problem of valuation on hedge funds.  Someone has to do the valuing.  A society with stupid hedge funds ends up with stupid valuations.

Steve Omohundro has pointed out that for fungible resources in an AI - and computing power is a fungible resource on modern architectures - there ought to be a resource balance principle: the marginal result of shifting a unit of resource between any two tasks should produce a decrease in expected utility, relative to the AI's probability function that determines the expectation.  To the extent any of these things have continuous first derivatives, shifting an infinitesimal unit of resource between any two tasks should have no effect on expected utility.  This establishes "expected utilons" as something akin to a central currency within the AI.
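A toy sketch of the resource balance principle, under loud assumptions: three hypothetical tasks, each with a made-up concave expected-utility curve over discrete units of one fungible resource, allocated greedily by marginal gain.  This is an illustration of the principle, not Omohundro's own formulation.

```python
import math

# Hypothetical tasks: expected utility as a concave function of units allocated.
# The task names and curves are illustrative assumptions.
TASKS = {
    "planning": lambda x: 3.0 * math.sqrt(x),
    "perception": lambda x: 2.0 * math.log1p(x),
    "memory": lambda x: 1.0 * math.sqrt(x),
}

def total_utility(alloc):
    """Sum of expected utility over all tasks for a given allocation."""
    return sum(TASKS[t](x) for t, x in alloc.items())

def greedy_allocate(budget):
    """Give each unit to whichever task currently gains the most from it."""
    alloc = {t: 0 for t in TASKS}
    for _ in range(budget):
        best = max(TASKS, key=lambda t: TASKS[t](alloc[t] + 1) - TASKS[t](alloc[t]))
        alloc[best] += 1
    return alloc

def best_single_shift_gain(alloc):
    """Largest change in total utility from moving one unit between two tasks."""
    base = total_utility(alloc)
    gains = []
    for src in alloc:
        for dst in alloc:
            if src != dst and alloc[src] > 0:
                shifted = dict(alloc)
                shifted[src] -= 1
                shifted[dst] += 1
                gains.append(total_utility(shifted) - base)
    return max(gains)

if __name__ == "__main__":
    balanced = greedy_allocate(20)
    lopsided = {"planning": 0, "perception": 0, "memory": 20}
    print(balanced, best_single_shift_gain(balanced))
    print(lopsided, best_single_shift_gain(lopsided))
```

In the balanced allocation no single shift of a unit improves expected utility, while the lopsided one can still be improved.  Note that the check only works because the utility curves are given in advance, which is exactly what an outside observer lacks.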

But this gets us back to the problems of criteria D and E.  If I look at a mind and see a certain balance of resources, is that because the mind is really cleverly balanced, or because the mind is stupid?  If a mind would rather have two units of CPU than one unit of RAM (and how can I tell this by observation, since the resources are not readily convertible?) then is that because RAM is inherently twice as valuable as CPU, or because the mind is twice as stupid in using CPU as RAM?

If you can assume the resource-balance principle, then you will find it easy to talk about the relative efficiency of alternative algorithms for use inside the AI, but this doesn't give you a good way to measure the external power of the whole AI.

Similarly, assuming a particular relative valuation of resources, as given by an external marketplace, doesn't let us ask questions like "How smart is a human economy?"  Now the relative valuation a human economy assigns to internal resources can no longer be taken for granted - a more powerful system might assign very different relative values to internal resources.

I admit that dividing optimization power by "total resources" is handwaving - more a qualitative way of saying "pay attention to resources used" than anything you could actually quantify into a single useful figure.  But I pose an open question to Robin (or any other economist) to explain how production theory can help us do better, bearing in mind that:

  • Informational inputs and outputs tend to be non-fungible;
  • I want to be able to observe the "intelligence" and "utility function" of a whole system without starting out assuming them;
  • I would like to be able to compare, as much as possible, the performance of intelligences with different utility functions;
  • I can't assume a priori any particular breakdown of internal tasks or "ideal" valuation of internal resources.

I would finally point out that all data about the market value of human IQ only applies to variances of intelligence within the human species.  I mean, how much would you pay a chimpanzee to run your hedge fund?

9 comments

Eliezer: have you given any thought to the problem of choosing a measure on the solution space? If you're going to count bits of optimization, you need some way of choosing a measure. In the real world solutions are not discrete and we cannot simply count them.

My (not so "fake") hint:

Think economics of ecologies. Coherence in terms of the average mutual information of the paths of trophic I/O provides a measure of relative ecological effectiveness (absent prediction or agency.) Map this onto the information I/O of a self-organizing hierarchical Bayesian causal model (with, for example, four major strata for human-level environmental complexity) and you should expect predictive capability within a particular domain, effective in principle, in relation to the coherence of the hierarchical model over its context.

As to comparative evaluation of the intelligence of such models without actually running them, I suspect this is similar to trying to compare the intelligence of phenotypical organisms by comparing the algorithmic complexity of their DNA.


I'm afraid that I'm not sure precisely what your measure is, and I think this is because you have given zero precise examples: even of its subcomponents. For example, here are two optimization problems:

1) You have to output 10 million bits. The goal is to output them so that no two consecutive bits are different.

2) You have to output 10 million bits. The goal is to output them so that when interpreted as an MP3 file, they would make a nice sounding song.

Now, the solution space for (1) consists of two possibilities (all 1s, all 0s) out of 2^10000000, for a total of 9,999,999 bits. The solution space for (2) is millions of times wider, leading to fewer bits. However, intuitively, (2) is a much harder problem and things that optimized (2) are actually doing more of the work of intelligence, after all (1) can be achieved in a few lines of code and very little time or space, while (2) takes much more of these resources.

(2) is a pretty complex problem, but can you give some specifics for (1)? Is it exactly 9,999,999 bits? If so, is this the 'optimization power'? Is this a function of the size of the solution space and the size of the problem space only? If there was another program attempting to produce a sequence of 100 million bits coding some complex solution to a large travelling salesman problem, such that only two bitstrings suffice, would this have the same amount of optimization power? Or is it a function of the solution space itself and not just its size?

Without even a single simple example, it is impossible to narrow down your answer enough to properly critique it. So far I see it as no more precise than Legg and Hutter's definition.

Sorry, I didn't see that you had answered most of this question in the other thread where I first asked it.

Toby, if you were too dumb to see the closed-form solution to problem 1, it might take an intense effort to tweak the bit on each occasion, or perhaps you might have trouble turning the global criterion of total success or failure into a local bit-fixer; now imagine that you are also a mind that finds it very easy to sing MP3s...

The reason you think one problem is simple is that you perceive a solution in closed form; you can imagine a short program, much shorter than 10 million bits, that solves it, and the work of inventing this program was done in your mind without apparent effort. So this problem is very trivial on the meta-level because the program that solves it optimally appears very quickly in the ordering of possible programs and is moreover prominent in that ordering relative to our instinctive transformations of the problem specification.

But if you were trying random solutions and the solution tester was a black box, then the alternating-bits problem would indeed be harder - so you can't be measuring the raw difficulty of optimization if you say that one is easier than the other.

This is why I say that the human notion of "impressiveness" is best constructed out of a more primitive notion of "optimization".

We also do, legitimately, find it more natural to talk about "optimized" performance on multiple problems than on a single problem - if we're talking about just a single problem, then it may not compress the message much to say "This is the goal" rather than just "This is the output."

I take it then that you agree that (1) is a problem of 9,999,999 bits and that the travelling salesman version is as well. Could you take these things and generate an example which doesn't just give 'optimization power', but 'intelligence' or maybe just 'intelligence-without-adjusting-for-resources-spent'. You say over a set of problem domains, but presumably not over all of them given the no-free-lunch theorems. Any example, or is this vague?


"But if you were trying random solutions and the solution tester was a black box"

Then you're not solving the same optimization problem anymore. If the black box just had two outputs, "good" and "bad", then, yes, a black box that accepts fewer input sequences is going to be one that is harder to make accept. On the other hand, if the black box had some sort of metric on a scale from "bad" going up to "good", and the optimizer could update on the output each time, the sequence problem is still going to be much easier than the MP3 problem.

I am not sure you are taking into account the possibility that an intelligence may yield optimal performance only within a specific resource range. Would a human mind given a 10x increase in memory (and memories) operate even marginally better? Or would it be overwhelmed by an amount of information it was not prepared for? Similarly, would a human mind even be able to operate given half the computational resources? Comparing mind A at 40 bits/1 trillion FPO with mind B at 80 bits/2 trillion FPO may just be a matter of how many resources are available, since we don't have any datapoints about how much each would yield given the other's resources.

So perhaps the trendy term of scalability might be one dimension of the intelligence metric you seek. Can a mind take advantage of additional resources if they are made available? I suspect that an intelligence A that can scale up and down (to a specific minimum) linearly may be thought of as superior to an intelligence B that may yield a higher optimization output for a specific amount of resources but is unable to scale up or down.

My parents just arrived for a week long visit so I've been distracted - have meant no disrespect to the reasonable question posed. Will respond ASAP.

The concept of a resource can be defined within ordinary decision theory: something is a resource iff it can be used towards multiple goals and spending it on one goal makes the resource unavailable for spending on a different goal. In other words, it is a resource iff spending it has a nontrivial opportunity cost. Immediately we have two implications: (1) whether or not something is a resource to you depends on your ultimate goal and (2) dividing by resources spent is useful only for intermediate goals: it never makes sense to care how efficiently an agent uses its resources to achieve its ultimate goal or to satisfy its entire system of terminal values.

If humanity was forced to choose a simple optimization process to submit itself to, I think capitalism would be our best bet.