Suppose that human beings had absolutely *no idea* how they performed arithmetic. Imagine that human beings had *evolved,* rather than having *learned,* the ability to count sheep and add sheep. People using this built-in ability have no idea how it works, the way Aristotle had no idea how his visual cortex supported his ability to see things. Peano Arithmetic as we know it has not been invented. There are philosophers working to formalize numerical intuitions, but they employ notations such as

Plus-Of(Seven, Six) = Thirteen

to formalize the intuitively obvious fact that when you add "seven" plus "six", of course you get "thirteen".

In this world, pocket calculators work by storing a giant lookup table of arithmetical facts, entered manually by a team of expert Artificial Arithmeticians, for starting values that range between zero and one hundred. While these calculators may be helpful in a pragmatic sense, many philosophers argue that they're only *simulating* addition, rather than really *adding.* No machine can really *count* - that's why humans have to count thirteen sheep before typing "thirteen" into the calculator. Calculators can recite back stored facts, but they can never know what the statements mean - if you type in "two hundred plus two hundred" the calculator says "Error: Outrange", when it's intuitively *obvious,* if you *know* what the words *mean*, that the answer is "four hundred".

Philosophers, of course, are not so naive as to be taken in by these intuitions. Numbers are really a purely formal system - the label "thirty-seven" is meaningful, not because of any inherent property of the words themselves, but because the label *refers to* thirty-seven sheep in the external world. A number is given this referential property by its *semantic network* of relations to other numbers. That's why, in computer programs, the LISP token for "thirty-seven" doesn't need any *internal* structure - it's only meaningful because of reference and relation, not some computational property of "thirty-seven" itself.

No one has ever developed an Artificial General Arithmetician, though of course there are plenty of domain-specific, narrow Artificial Arithmeticians that work on numbers between "twenty" and "thirty", and so on. And if you look at how slow progress has been on numbers in the range of "two hundred", then it becomes clear that we're not going to get Artificial General Arithmetic any time soon. The best experts in the field estimate it will be at least a hundred years before calculators can add as well as a human twelve-year-old.

But not everyone agrees with this estimate, or with merely conventional beliefs about Artificial Arithmetic. It's common to hear statements such as the following:

- "It's a framing problem - what 'twenty-one plus' equals depends on whether it's 'plus three' or 'plus four'. If we can just get enough arithmetical facts stored to cover the common-sense truths that everyone knows, we'll start to see real addition in the network."
- "But you'll never be able to program in that many arithmetical facts by hiring experts to enter them manually. What we need is an Artificial Arithmetician that can *learn* the vast network of relations between numbers that humans acquire during their childhood by observing sets of apples."
- "No, what we really need is an Artificial Arithmetician that can understand natural language, so that instead of having to be explicitly told that twenty-one plus sixteen equals thirty-seven, it can get the knowledge by exploring the Web."
- "Frankly, it seems to me that you're just trying to convince yourselves that you can solve the problem. None of you really know what arithmetic is, so you're floundering around with these generic sorts of arguments. 'We need an AA that can learn X', 'We need an AA that can extract X from the Internet'. I mean, it sounds good, it sounds like you're making progress, and it's even good for public relations, because everyone thinks they understand the proposed solution - but it doesn't really get you any closer to *general* addition, as opposed to domain-specific addition. Probably we will never know the fundamental nature of arithmetic. The problem is just too hard for humans to solve."
- "That's why we need to develop a general arithmetician the same way Nature did - evolution."
- "Top-down approaches have clearly failed to produce arithmetic. We need a bottom-up approach, some way to make arithmetic *emerge.* We have to acknowledge the basic unpredictability of complex systems."
- "You're all wrong. Past efforts to create machine arithmetic were futile from the start, because they just didn't have enough computing power. If you look at how many trillions of synapses there are in the human brain, it's clear that calculators don't have lookup tables anywhere near that large. We need calculators as powerful as a human brain. According to Moore's Law, this will occur in the year 2031 on April 27 between 4:00 and 4:30 in the morning."
- "I believe that machine arithmetic will be developed when researchers scan each neuron of a complete human brain into a computer, so that we can simulate the biological circuitry that performs addition in humans."
- "I don't think we have to wait to scan a whole brain. Neural networks are just like the human brain, and you can train them to do things without knowing how they do them. We'll create programs that will do arithmetic without we, our creators, ever understanding how they do arithmetic."
- "But Gödel's Theorem shows that no formal system can ever capture the basic properties of arithmetic. Classical physics is formalizable, so to add two and two, the brain must take advantage of quantum physics."
- "Hey, if human arithmetic were simple enough that we could reproduce it in a computer, we wouldn't be able to count high enough to build computers."
- "Haven't you heard of John Searle's Chinese Calculator Experiment? Even if you did have a huge set of rules that would let you add 'twenty-one' and 'sixteen', just imagine translating all the words into Chinese, and you can see that there's no genuine addition going on. There are no real *numbers* anywhere in the system, just labels that humans use for numbers..."

There is more than one moral to this parable, and I have told it with different morals in different contexts. It illustrates the idea of levels of organization, for example - a CPU can add two large numbers because the numbers aren't black-box opaque objects, they're ordered structures of 32 bits.
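As a rough illustration of that contrast, here is a minimal Python sketch. It is not a description of any real calculator or CPU: the table `LOOKUP` and the functions `lookup_add` and `ripple_carry_add` are hypothetical names invented for this example. The first function treats numbers as opaque tokens and can only recite stored facts; the second treats a number as an ordered structure of bits and adds by chaining one-bit full adders.

```python
# Hypothetical "Artificial Arithmetician": hand-entered facts about opaque tokens.
LOOKUP = {
    ("seven", "six"): "thirteen",
    ("twenty-one", "sixteen"): "thirty-seven",
}


def lookup_add(a: str, b: str) -> str:
    """Recite a stored fact; the tokens have no internal structure to compute on."""
    return LOOKUP.get((a, b), "Error: Outrange")


def ripple_carry_add(a: int, b: int, width: int = 32) -> int:
    """Add two unsigned integers bit by bit, as a chain of one-bit full adders."""
    result = 0
    carry = 0
    for i in range(width):
        bit_a = (a >> i) & 1
        bit_b = (b >> i) & 1
        sum_bit = bit_a ^ bit_b ^ carry
        # Carry out is set when at least two of the three input bits are set.
        carry = (bit_a & bit_b) | (carry & (bit_a ^ bit_b))
        result |= sum_bit << i
    # The final carry is dropped, so results wrap around at 2**width.
    return result
```

The lookup version fails on "two hundred plus two hundred" because nobody entered that fact. The bit-level version handles it without ever having seen it, since the rule operates on the structure of the numbers rather than on their labels.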

But for purposes of overcoming bias, let us draw two morals:

- First, the danger of believing assertions you can't regenerate from your own knowledge.
- Second, the danger of trying to dance around basic confusions.

Lest anyone accuse me of generalizing from fictional evidence, both lessons may be drawn from the real history of Artificial Intelligence as well.

The first danger is the object-level problem that the AA devices ran into: they functioned as tape recorders playing back "knowledge" generated from outside the system, using a process they couldn't capture internally. A human could tell the AA device that "twenty-one plus sixteen equals thirty-seven", and the AA devices could record this sentence and play it back, or even pattern-match "twenty-one plus sixteen" to output "thirty-seven!", but the AA devices couldn't generate such knowledge for themselves.

Which is strongly reminiscent of believing a physicist who tells you "Light is waves", recording the fascinating words and playing them back when someone asks "What is light made of?", without being able to generate the knowledge for yourself. More on this theme tomorrow.

The second moral is the meta-level danger that consumed the Artificial Arithmetic researchers and opinionated bystanders - the danger of dancing around confusing gaps in your knowledge. The tendency to do just about anything *except* grit your teeth and buckle down and fill in the damn gap.

Whether you say, "It is emergent!", or whether you say, "It is unknowable!", in neither case are you acknowledging that there is a basic insight required which is possessable, but unpossessed by you.

How can you know when you'll have a new basic insight? And there's no way to get one except by banging your head against the problem, learning everything you can about it, studying it from as many angles as possible, perhaps for years. It's not a pursuit that academia is set up to permit, when you need to publish at least one paper per month. It's certainly not something that venture capitalists will fund. You want to either go ahead and build the system *now,* or give up and do something else instead.

Look at the comments above: none are aimed at setting out on a quest for the missing insight which would *make numbers no longer mysterious,* make "twenty-seven" more than a black box. None of the commenters realized that their difficulties arose from ignorance or confusion in their own minds, rather than an inherent property of arithmetic. They were not trying to achieve a state where the confusing thing ceased to be confusing.

If you read Judea Pearl's "Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference" then you will see that the basic insight behind graphical models is *indispensable* to problems that require it. (It's not something that fits on a T-Shirt, I'm afraid, so you'll have to go and read the book yourself. I haven't seen any online popularizations of Bayesian networks that adequately convey the reasons behind the principles, or the importance of the math being exactly the way it is, but Pearl's book is wonderful.) There were once dozens of "non-monotonic logics" awkwardly trying to capture intuitions such as "If my burglar alarm goes off, there was probably a burglar, but if I then learn that there was a small earthquake near my home, there was probably not a burglar." With the graphical-model insight in hand, you can give a mathematical explanation of exactly why first-order logic has the wrong properties for the job, and express the correct solution in a compact way that captures all the common-sense details in one elegant swoop. Until you have that insight, you'll go on patching the logic here, patching it there, adding more and more hacks to force it into correspondence with everything that seems "obviously true".
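To make the burglar-alarm intuition concrete, here is a minimal sketch of a three-node network (burglar and earthquake both influence the alarm). The probabilities are made-up numbers chosen for illustration, not figures from Pearl's book, and the brute-force enumeration below is the simplest possible inference method rather than the propagation algorithms the book develops. The names `P_BURGLAR`, `P_QUAKE`, `P_ALARM`, `joint`, and `prob_burglar` are all hypothetical.

```python
from itertools import product

# Illustrative, made-up parameters (assumptions, not taken from any source).
P_BURGLAR = 0.01   # prior probability of a burglary
P_QUAKE = 0.001    # prior probability of a small earthquake
# P(alarm rings | burglar, quake)
P_ALARM = {
    (True, True): 0.99,
    (True, False): 0.95,
    (False, True): 0.90,
    (False, False): 0.001,
}


def joint(burglar: bool, quake: bool, alarm: bool) -> float:
    """Joint probability, factored along the graph: P(B) * P(E) * P(A | B, E)."""
    p = P_BURGLAR if burglar else 1 - P_BURGLAR
    p *= P_QUAKE if quake else 1 - P_QUAKE
    p_alarm = P_ALARM[(burglar, quake)]
    p *= p_alarm if alarm else 1 - p_alarm
    return p


def prob_burglar(**evidence: bool) -> float:
    """P(burglar | evidence), summing the joint over all consistent worlds."""
    numerator = 0.0
    denominator = 0.0
    for burglar, quake, alarm in product([True, False], repeat=3):
        world = {"burglar": burglar, "quake": quake, "alarm": alarm}
        if any(world[name] != value for name, value in evidence.items()):
            continue
        p = joint(burglar, quake, alarm)
        denominator += p
        if burglar:
            numerator += p
    return numerator / denominator
```

With these assumed numbers, `prob_burglar(alarm=True)` comes out above one half, while `prob_burglar(alarm=True, quake=True)` drops far below it: learning about the earthquake "explains away" the alarm. Adding evidence lowers the belief, which is exactly the non-monotonic behavior that a purely deductive logic has to be patched to imitate.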

You won't *know* the Artificial Arithmetic problem is unsolvable without its key. If you don't know the rules, you don't know the rule that says you need to know the rules to do anything. And so there will be all sorts of clever ideas that seem like they might work, like building an Artificial Arithmetician that can read natural language and download millions of arithmetical assertions from the Internet.

And yet *somehow* the clever ideas never work. Somehow it always turns out that you "couldn't see any reason it wouldn't work" because you were ignorant of the obstacles, not because no obstacles existed. Like shooting blindfolded at a distant target - you can fire blind shot after blind shot, crying, "You can't prove to me that I won't hit the center!" But until you take off the blindfold, you're not even in the aiming game. When "no one can prove to you" that your precious idea *isn't* right, it means you don't have enough information to strike a small target in a vast answer space. *Until you know your idea will work, it won't.*

From the history of previous key insights in Artificial Intelligence, and the grand messes which were proposed prior to those insights, I derive an important real-life lesson: *When the basic problem is your ignorance, clever strategies for bypassing your ignorance lead to shooting yourself in the foot.*

Well, shooting randomly at a distant target is more likely to produce a bulls-eye than not shooting at all, even though you're almost certainly going to miss (and probably shoot yourself in the foot while you're at it). It's probably better to try to find a way to take off that blindfold. As you suggest, we don't yet understand intelligence, so there's no way we're going to make an intelligent machine without either significantly improving our understanding or winning the proverbial lottery.

"Programming is the art of figuring out what you want so precisely that even a machine can do it." - Some guy who isn't famous

Well, shooting randomly is perhaps a bad idea, but I think the best we can do is shoot systematically, which is hardly better (it takes exponentially many bullets). So you either have to be lucky, or hope the target isn't very far, so you don't need a wide cone to take pot shots at, or hope P=NP.

@Doug & Gray: AGI is a William Tell target. A near miss could be very unfortunate. We can't responsibly take a proper shot till we have an appropriate level of understanding and confidence of accuracy.

Eliezer,

Did you include your own answer to the question of why AI hasn't arrived yet in the list? :-)

This is a nice post. Another way of stating the moral might be: "If you want to understand something, you have to stare your confusion right in the face; don't look away for a second."

So, what is confusing about intelligence? That question is problematic: a better one might be "what isn't confusing about intelligence?"

Here's one thing I've pondered at some length. The VC theory states that in order to generalize well a learning machine m...

That's not how William Tell managed it. He had to practice aiming at less-dangerous targets until he became an expert, and only then did he attempt to shoot the apple.

It is not clear to me that it is desirable to prejudge what an artificial intelligence should desire or conclude, or even possible to purposefully put real constraints on it in the first place. We should simply create the god, then acknowledge the truth: that we aren't capable of evaluating the thinking of gods.

Adding to DanBurFoot, is there a link you want to point to that shows your real, tangible results for AI, based on your superior methodology?

For what it's worth, Benoit Essiambre, the things you have just said are nonsense. The reason logicians seem to be unable to make a distinction between 1.999... and 2 is that there is no distinction. They are not two different definable real numbers, they are the same definable real number.

Yes, by "take a proper shot" I meant shooting at the proper target with proper shots. And yes, practice on less-dangerous targets is necessary, but it's not sufficient.

...

No, by 2 I mean 1.999...

A_A

1.9999... = 2 is not an "issue" or a "paradox" in mathematics.

If you use a limited number of digits in your calculations, then your quantization errors can accumulate. (And suppose the quantity you are measuring is the difference of two much larger numbers.)

Of course it's possible that there's nothing in the real world that corresponds exactly to our so-called "real numbers". But until we actually know what smaller-scale structure it is that we're approximating, it would be crazy to pick some arbitrary "lower-resolution"...

"...mathematics that represent continuous scales which would be best represented by the real numbers system with the limited significant digits."

If you limit the number of significant digits, your mathematics are discrete, not continuous. I'm guessing the concept you're really after is the idea of computable numbers. The set of computable numbers is a dense countable subset of the reals.

"With the graphical-network insight in hand, you can give a mathematical explanation of exactly why first-order logic has the wrong properties for the job, and express the correct solution in a compact way that captures all the common-sense details in one elegant swoop."

Consider the following example, from Menzies's "Causal Models, Token Causation, and Processes"[*]:

An assassin puts poison in the king's coffee. The bodyguard responds by pouring an antidote in the king's coffee. If the bodyguard had not put the antidote in the coffee, the king would...

"But until we actually know what smaller-scale structure".

From http://en.wikipedia.org/wiki/Planck_Length: "Combined, these two theories imply that it is impossible to measure position to a precision greater than the Planck length, or duration to a precision greater than the time a photon traveling at c would take to travel a Planck length"

Therefore, one could in fact say that all time- and distance- derived measurements can in fact be truncated to a fixed number of decimal places without losing any real precision, by using precisions b...

"When the basic problem is your ignorance, clever strategies for bypassing your ignorance lead to shooting yourself in the foot."

I like this lesson. It rings true to me, but the problem of ego is not one to be overlooked. People like feeling smart and having the status of being a "learned" individual. It takes a lot of courage to profess ignorance in today's academic climate. We are taught that we have such sophisticated techniques to solve really hard problems. There are armies of scientists and engineers working to advance our so...

anonymous--I'd like to second that motion

I read a book on the philosophy of set theory -- and I got lost right at the point where classical infinite thought was replaced by modern infinite thought. IIRC the problem was paradoxes based on infinite recursion (Zeno et al.) and finding mathematical foundations to satisfy calculus limits. Then something about Cantor, cardinality and some hand-wavy 'infinite sets are real!'.

1.999... is just an infinite set summation of finite numbers 1 + 0.9 + 0.09 + ...

Now, how an infinite process on an infinite set can equal an integer is a problem I still grapple...

Better question: why do you insist that those examples are of failures to acknowledge intelligence when you also insist that we are unable to meaningfully define intelligence?

mclaren, your comment is way too long. I have truncated it and emailed you the full version. Feel free to post the comment to your blog, then post a link to the blog here.

Anonymous (re Planck scales etc.), sure you can truncate your representations of lengths at the Planck length, and likewise for your representations of times, but this doesn't simplify your *number* system unless you have acceptable ways of truncating all the other numbers you need to use. And, at present, we don't. Sure, maybe really the universe is best considered as some sort of discrete network with some funky structure on it, but that doesn't give us any way of simplifying (or making more appropriate) our mathematics until we know just what sort of disc...

Thanks g for the tip about computable numbers, that's pretty much what I had in mind. I didn't quite get from the Wikipedia article if these numbers could or could not replace the reals for all of useful mathematics but it's interesting indeed.

James, I share your feelings of uneasiness about infinite digits. As you said, the problem is not that these numbers will not represent the same points at the limit, but that they shouldn't be taken to the limit so readily, as this doesn't seem to add anything to mathematics but confusion.

@James:

If I recall my Newton correctly, the only way to take this "sum of an infinite series" business consistently is to interpret it as shorthand for the *limit* of an infinite series. (Cf. Newton's *Principia Mathematica*, Lemma 2. The infinitesimally wide parallelograms are dubitably real, but the area under the curve between the sets of parallelograms is clearly a real, definite area.)

@Benoit:

Why shouldn't we take 1.9999... as just another, needlessly complicated (if there's no justifying context) way of writing "2"? Just as I could conceivably count "1, 2, 3, 4, d(5x)/dx, 6, 7" if I were a crazy person.

Benquo, I see two possible reasons:

1) '2' leads to confusion as to whether we are representing a real or a natural number. That is, whether we are counting discrete items or we are representing a value on a continuum. If we are counting items then '2' is correct.

2) If it is clear that we are representing numbers on a continuum, I could see the number of significant digits used as an indication of the amount of uncertainty in the value. For any real problem there is *always* uncertainty caused by A) the measuring instrument and B) the representation system it...

Benoit Essiambre,

Right now Wikipedia's article is claiming that calculus cannot be done with computable numbers, but a Google search turned up a paper from 1968 which claims that differentiation and integration can be performed on functions in the field of computable numbers. I'll go and fix Wikipedia, I suppose.

Benoit Essiambre,

You say:

"1) '2' leads to confusion as to whether we are representing a real or a natural number. That is, whether we are counting discrete items or we are representing a value on a continuum."

If I recall correctly, this "confusion" is what allowed modern, atomic chemistry. Chemical substances -- measured as continuous quantities -- seem to combine in simple natural-number ratios. This was the primary evidence for the existence of atoms.

What is the practical negative consequence of the confusion you're trying to avoid?...

Benoit, it was "Cyan" and not me who mentioned computable numbers.

Benoit, you assert that our use of real numbers leads to confusion and paradox. Please point to that confusion and paradox.

Also, how would your proposed number system represent pi and e? Or do you think we don't need pi and e?

Well, for example, the fact that two different reals represent the same point (2.00... and 1.99...), and the fact that they are not computable in a finite amount of time. pi and e are quite representable within a computable number system; otherwise we couldn't reliably use pi and e on computers!

Benoit, those are *two different ways* of *writing* the *same* real, just like 0.333... and 1/3 (or 1.0/3.0, if you insist) are the same number. That's *not* a paradox. 2 is a computable number, and thus so are 2.000... and 1.999..., even though you can't write down *those ways of expressing them* in a finite amount of time. See the definition of a computable number if you're confused.

1.999... = 2.000... = 2. Period.

Benoit,

In the decimal numeral system, every number with a terminating decimal representation also has a non-terminating one that ends with recurring nines. Hence, 1.999... = 2, 0.74999... = 0.75, 0.986232999... = 0.986233, etc. This isn't a paradox, and it has nothing to do with the precision with which we measure actual real things. This sort of recurring representation happens in any positional numeral system.

You seem very confused as to the distinction between what numbers are and how we can represent them. All I can say is, these matters have been well thought out, and you'd profit by reading as much as you can on the subject and by trying to avoid getting too caught up in your preconceptions.
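For readers who want the identity spelled out, here is a short worked derivation under the standard definition of an infinite decimal as the limit of its partial sums. Nothing in it depends on measurement precision.

```latex
\begin{align*}
1.999\ldots
  &= 1 + \sum_{k=1}^{\infty} \frac{9}{10^{k}}
   = 1 + \lim_{n \to \infty} \sum_{k=1}^{n} \frac{9}{10^{k}} \\
  &= 1 + \lim_{n \to \infty} \left( 1 - 10^{-n} \right)
   = 1 + 1 = 2 .
\end{align*}
```

The middle step uses the finite geometric sum $\sum_{k=1}^{n} 9 \cdot 10^{-k} = 1 - 10^{-n}$. The same argument with the last non-nine digit adjusted gives 0.74999... = 0.75.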

This old post led me to an interesting question: will AI find itself in the position of our fictional philosophers of addition? The basic four functions of arithmetic are so fundamental to the operation of the digital computer that an intelligence built on digital circuitry might well have no idea of how it adds numbers together (unless told by a computer scientist, of course).

Bog: You are correct. That is, you do not understand this article at all. Pay attention to the first word, "Suppose..."

We are not talking about how calculators are designed in reality. We are discussing how they are designed in a hypothetical world where the mechanism of arithmetic is not well-understood.

"Like shooting blindfolded at a distant target"

So long as you know where the target is within five feet, it doesn't matter how small it is, how far away it is, whether or not you're blindfolded, or whether or not you even know how to use a bow. You'll hit it on a natural twenty. http://www.d20srd.org/srd/combat/combatStatistics.htm#attackRoll

Thread necromancy:

It occurred to me that a real life example of this kind of thing is *grammar*. I don't know what the grammatical rules are for which of the words "I" or "me" should be used when I refer to myself, but I can still use those words with perfect grammar in everyday life*. This may be a better example to use since it's one that everyone can relate to.

*I do use a rule for working out whether I should say "Sarah and I" or "Sarah and me", but that rule is just "use whichever one you would use if you were just talking about yourself". Thinking about it now I can guess at the "I/me" rule, but there's plenty of other grammar I have no idea about.

McDermott's old article, "Artificial Intelligence and Natural Stupidity" is a good reference for suggestively-named tokens and algorithms.

Someone needs to teach them how to count: {}, {{}}, {{},{{}}}, {{},{{}},{{},{{}}}}...

Gah! Any field with a publishing requirement like that... I shudder.

And... is it me, or is this one of the stupidest discussion threads on this site?

"I don't think we have to wait to scan a whole brain. Neural networks are just like the human brain, and you can train them to do things without knowing how they do them. We'll create programs that will do arithmetic without we, our creators, ever understanding how they do arithmetic."

This sort of anti-predicts the deep learning boom, but only sort of.

Fully connected networks didn't scale effectively; researchers had to find (mostly principled, but some ad-hoc) network structures that were capable of more efficiently learning complex patterns.

A...

I'd be interested to hear whether it is the case that people were saying things like this about AGI two, three, four, five decades ago:

...

I've pointed out the cases of Moravec (1997) and Shane Legg pre-DM (~2009) as saying pretty much exactly that and in the case of Legg, influencing his DM founding timeline. I am pretty sure that if you were able to go back and do a thorough survey of the connectionist literature and influenced people, you'd find more instances.

For example, yesterday I was collating my links on AI Dungeon and I ran into a 1989 text adventure talk by Doug Sharp mostly about his *King of Chicago* & simple world/narrative simulation approach to IF, where before discussing *King*, to my shock, he casually drops in Moravec's 1988 *Mind Children*'s forecast for human-level compute in 2030 and compute as a prerequisite for "having this AI problem licked", and notes

Well, I can't disagree with that! It's only 2021, and AI Dungeon and its imitators owe essentially nothing to the last 46 years of IF, and have to inven...

At first I was perplexed, thinking that Yudkowsky for some reason wants to use programs for AI, and not neural networks. This article showed me very clearly why you need to understand the general principle first, and not try to do anything now. Even if you can randomly find answers to a specific quadratic equation, it won't solve even other quadratic equations, let alone cubic or any other problem in mathematics.