Mathematical simplicity bias and exponential functions

[-]byrnema16y80

I have this vague idea that sometime in our past, people thought that knowledge was like an almanac; a repository of zillions of tiny true facts that summed up to being able to predict stuff about stuff, but without a general understanding of how things work. There was no general understanding because any heuristic that would begin to explain how things work would immediately be discounted by the single tiny fact, easily found, that contradicted it. Details and concern with minutia and complexity is actually anti-science for this reason. It’s not that deta... (read more)

5taw16y

I must disagree with the premise that biology is not making progress while physics is. As far as I can tell biology is making progress many orders of magnitude larger and more practically significant than physics at the moment. And it requires this messy complex paradigm of accumulating plenty of data and mining it for complicated regularities - even the closest things biology has to "physical laws", like the Central Dogma or how DNA sequences translate to protein sequences, each have enough exceptions and footnotes to fill a small book. The world isn't simple. Simple models are usually very wrong. Exceptions to this pattern like basic physics are extremely unusual, and shouldn't be taken as a paradigm for all science.

2Simon_Jester16y

The catch is that complex models are also usually very wrong. Most possible models of reality are wrong, because there are an infinite legion of models and only one reality. And if you try too hard to create a perfectly nuanced and detailed model, because you fear your bias in favor of simple mathematical models, there's a risk. You can fall prey to the opposing bias: the temptation to add an epicycle to your model instead of rethinking your premises. As one of the wiser teachers of one of my wiser teachers said, you can always come up with a function that fits 100 data points perfectly... if you use a 99th-order polynomial. Naturally, this does not mean that the data are accurately described by a 99th-order polynomial, or that the polynomial has any predictive power worth giving a second glance. Tacking on more complexity and free parameters doesn't guarantee a good theory any more than abstracting them out does.
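The polynomial point can be checked on a smaller scale. The sketch below is an illustration only: the data set (a straight line plus alternating noise), the choice of 10 points rather than 100, and all function names are assumptions made for the example. It fits a degree-9 polynomial exactly through the 10 points and compares its prediction one step beyond the data with that of a least-squares straight line.

```python
# Sketch: an exact-fit polynomial versus a straight line on noisy linear data.
# The data set is invented for illustration: y = 2x + 1 plus alternating noise.

def lagrange_interpolate(xs, ys, x):
    """Evaluate the unique degree-(n-1) polynomial through n points at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def true_value(x):
    """The noise-free relationship the data were generated from."""
    return 2.0 * x + 1.0

xs = [float(k) for k in range(10)]
noise = [0.5 if k % 2 == 0 else -0.5 for k in range(10)]
ys = [true_value(x) + e for x, e in zip(xs, noise)]

slope, intercept = fit_line(xs, ys)
x_new = 10.0  # one step beyond the data

poly_error = abs(lagrange_interpolate(xs, ys, x_new) - true_value(x_new))
line_error = abs(slope * x_new + intercept - true_value(x_new))

print(f"degree-9 polynomial error at x=10: {poly_error:.1f}")
print(f"straight-line error at x=10:       {line_error:.3f}")
```

The polynomial reproduces every training point exactly, yet its error one step outside the data is several hundred times larger than the line's. That is the "perfect fit, no predictive power" situation described above.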

0byrnema16y

I actually entirely agree with you. Biology is making terrific progress, and shouldn't be overly compared with physics. Two supporting comments: First, when biology is judged as nascent, this may be because it is being overly compared with physics. Success in physics meant finding and describing the most fundamental relationship between variables analytically, but this doesn't seem to be what the answers look like in biology. (As Simon Jester wrote here, describing the low-level rules is just the beginning, not the end.) And the relatively simple big ideas, like the theory of evolution and the genetic code, are still often judged as inferior in some way as scientific principles. Perhaps because they're not so closely identified with mathematical equations. Further, and secondly, the scientific culture that measures progress in biology using the physics paradigm may still be slowing down our progress. While we are making good progress, I also feel a resistance: the reality of biology doesn't seem to be responding well to the scientific epistemology we are throwing at it. But I'm still open-minded, maybe our epistemology needs to be updated or maybe our epistemology is fine and we just need to keep forging on.

3Johnicholas16y

Rather than describing the difference between physics and biology as "simple models" vs. "complex models", describe them in terms of expected information content. Physicists generally expect an eventual Grand Unified Theory to be small in information content (one or a few pages of very dense differential equations, maybe as small as this: http://www.cs.ru.nl/~freek/sm/sm4.gif ). On the order of kilobytes, plus maybe some free parameters. Biologists generally expect an eventual understanding of a species to be much much bigger. At the very least, the compressed human genome alone is almost a gigabyte; a theory describing how it works would be (conservatively) of the same order of magnitude. All things being equal, would biologists prefer a yottabyte-sized theory to a zettabyte-sized theory? No, absolutely not! The scientific preference is still MOSTLY in the direction of simplicity. There's a lot of sizes out there, and the fact that gigabyte-sized theories seem likely to defeat kilobyte-sized theories in the biological domain shouldn't be construed as a violation of the general "prefer simplicity" rule.

0timtyler15y

The uncompressed human genome is about 750 megabytes.
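For reference, a figure of that size follows from simple arithmetic. The sketch assumes roughly 3 billion base pairs and 2 bits per base; both numbers are round assumptions, not taken from the thread.

```python
# Back-of-the-envelope size of an uncompressed genome.
# Assumptions (not from the thread): ~3 billion base pairs, 2 bits per base.
BASE_PAIRS = 3_000_000_000
BITS_PER_BASE = 2  # four possible bases: A, C, G, T

size_bytes = BASE_PAIRS * BITS_PER_BASE // 8
size_megabytes = size_bytes / 1_000_000

print(f"{size_megabytes:.0f} MB")
```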

0Johnicholas15y

Thanks, and I apologize for the error.

0Gavin16y

Biology is a special case of physics. Physicists may at some point arrive at a Grand Unified Theory of Everything that theoretically implies all of biology. Biology is the classification and understanding of the complicated results of physics, so it is in many ways basically an almanac.

0byrnema16y

I hope that when we understand biology better, it won't seem like an almanac. I predict that our understanding of what "understanding" means will shift dramatically as we continue to make progress in biology. For example -- just speculating -- perhaps we will feel like we understand something if we can compute it. Perhaps we will develop and run models of biological phenomena as trivially as using a calculator, so that such knowledge seems like an extension of what we "know". And then understanding will mean identifying the underlying rules, while the almanac part will just be the nitty gritty output; like doing a physics calculation for specific forces. (For example, it's pretty neat that WHO is using modeling in real time to generate information about the H1N1 pandemic.)

1Gavin16y

My use of the word "almanac" was more of a reference to the breadth of the area covered by biology, rather than a comment on the difficulty or content of the information. It's funny that you mention predictive modeling--one of the main functions of an Almanac is to provide predictions based on models. From http://en.wikipedia.org/wiki/Almanac: "Modern almanacs include a comprehensive presentation of statistical and descriptive data covering the entire world. Contents also include discussions of topical developments and a summary of recent historical events."

0byrnema16y

Yes, I noticed that I was still nevertheless describing biology as an almanac, as a library of information (predictions) that we will feel like we own because we can generate it. I suppose the best way to say what I was trying to say is that I hope that when we have a better understanding of biology, the term "almanac" won't seem pejorative, but the legitimate way of understanding something that has large numbers of similar interacting components.

0Simon_Jester16y

This is profoundly misleading. Physicists already have a good handle on how the things biological systems are made of work, but it's a moot point because trying to explain the details of how living things operate in terms of subatomic particles is a waste of time. Unless you've got a thousand tons of computronium tucked away in your back pocket, you're never going to be able to produce useful results in biology purely by using the results of physics. Therefore, the actual study of biology is largely separate from physics, except for the very indirect route of quantum physics => molecular chemistry => biochemistry => biology. Most of the research in the field has little to do with those paths, and each step in the indirect chain is another level of abstraction that allows you to ignore more of the details of how the physics itself works.

1Gavin16y

The ultimate goal of physics is to break things down until we discover the simplest, most basic rules that govern the universe. The goals of biology do not lead down what you call the "indirect route." As you state, Biology abstracts away the low-level physics and tries to understand the extremely complicated interactions that take place at a higher level. Biology attempts to classify and understand all of the species, their systems, their subsystems, their biochemistry, and their interspecies and environmental interactions. The possible sum total of biological knowledge is an essentially limitless dataset, what I might call the "Almanac of Life." I'm not sure quite where you think we disagree. I don't see anything in our two posts that's contradictory--unless you find the use of the word "Almanac" disparaging to biologists? I hope it's clear that it wasn't a literal use -- biology clearly isn't a yearly book of tabular data, so perhaps the simile is inapt.

1Simon_Jester16y

The way you put it does seem to disparage biologists, yes. The biologists are doing work that is qualitatively different from what physicists do, and that produces results the physicists never will (without the aforementioned thousand tons of computronium, at least). In a very real sense, biologists are exploring an entirely different ideaspace from the one the physicists live in. No amount of investigation into physics in isolation would have given us the theory of evolution, for instance. And weirdly, I'm not a biologist; I'm an apprentice physicist. I still recognize that they're doing something I'm not, rather than something that I might get around to by just doing enough physics to make their results obvious.

[-]Psychohistorian16y50

I recall a discussion I had with a fellow econ student on the effects of higher taxes. He said something to the effect of, "Higher taxes are inefficient, and all you need to do to prove that is to draw the graph." (Unfortunately the topic changed before I could point out the problems with this statement.)

This (rather common) view reflects two major problems with modeling (particularly in economics): an amoral value (economic efficiency) becomes a normative value because it's relatively easy to understand and (in theory) measure, and, more relevan... (read more)

3Vladimir_Nesov16y

This is related to this post by Katja Grace:

0PhilGoetz16y

I would bet that your fellow econ student became a Republican or Libertarian before convincing himself that higher taxes are provably inefficient. (Higher than what? Failing to have an answer to that proves irrationality.) Confusion induced by ideology is different from confusion induced by math.

1Douglas_Knight16y

Most Democratic academic economists agree with the claim that higher taxes are inefficient ("deadweight loss"). That inefficiency is the main cost of taxation, which must be balanced against the good that can be accomplished by the government using the revenue. ("Higher than what" you ask? Almost any increase in tax is inefficient. But DeLong and Mankiw certainly agree that a height tax is efficient.)

1CronoDAS16y

Well, the key is "what kind of taxes, and on what?" Taxes that distort incentives away from the no-externality, no-taxes perfect competition equilibrium, do create econ-101-style inefficiency, but not all possible taxes distort incentives, not all possible taxes are on things that have no negative externalities, and not all markets are in a perfect competition equilibrium. In the real world, all else is never equal.

[-]SforSingularity16y40

One of biases that are extremely prevalent in science, but are rarely talked about anywhere, is bias towards models that are mathematically simple and easier to operate on.

I think that this is a heuristic rather than a bias, because favoring simple models over complex ones is generally a good thing. In particular, the complexity prior is claimed by some to be a fundamental principle of intelligence.

1taw16y

This is only true as long as the differences between simple and complex models are small, and only because a simple model avoids the overfitting problem. When the failures are many orders of magnitude in size, choosing a simple and wrong model over a complex and right one is not very intelligent.

0SforSingularity16y

There is a formal theory describing how to balance model complexity against fit to data: describe the model using a program on a simple, fixed Turing machine, and then penalize that model by assigning a prior probability to it of 2^-L, where L is the length of the program... this has all been worked out.
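A toy version of that trade-off can be written down directly. In the sketch below the two "models", their program lengths, and their per-data-point likelihoods are all hypothetical numbers chosen for illustration; only the 2^-L prior comes from the comment above. Each model is scored by log2(prior) + log2(likelihood), so every extra bit of program length has to be paid for by a better fit.

```python
def log2_prior(program_length_bits):
    """log2 of the complexity prior 2^-L for a program of length L bits."""
    return -float(program_length_bits)

def log2_posterior_score(program_length_bits, log2_likelihood):
    """Unnormalized log2 posterior: log2 prior plus log2 likelihood."""
    return log2_prior(program_length_bits) + log2_likelihood

# Hypothetical models: (program length in bits, log2 likelihood per data point).
SIMPLE = (10, -4.0)
COMPLEX = (30, -2.5)

def preferred(num_points):
    """Name of the model with the higher score after num_points observations."""
    scores = {
        name: log2_posterior_score(length, per_point * num_points)
        for name, (length, per_point) in (("simple", SIMPLE), ("complex", COMPLEX))
    }
    return max(scores, key=scores.get)

print(preferred(10))  # few data points
print(preferred(40))  # more data
```

With these made-up numbers the shorter program wins on 10 data points and the longer one wins on 40: the prior penalizes complexity by a fixed amount, while the better fit keeps accumulating as data arrive.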

1Nick_Tarleton16y

Which Turing machine, though?

0SforSingularity16y

How about picking one of these?

2steven046116y

unpaywalled version

0[anonymous]16y

...using a model. (I suppose someone could argue that it's not complex enough.)

0[anonymous]16y

It can't be bettered on average, assuming that the thing you are modelling is computable. But I haven't seen any proof to say that any other strategy will do worse on average. Anyone got any links?

0SforSingularity16y

See Hutter, Legg.

0[anonymous]16y

If I understand the maths right, the important part of http://www.hutter1.net/ai/paixi.ps for using Kolmogorov complexity is the part of section 2 that says "The SPΘμ system is best in the sense that EnΘμ ≤ Enρ for any ρ." That doesn't rule out EnΘμ = Enρ holding for large numbers of different ρ, which would invalidate any claims of it being the one right way of doing things. I was interested in links to papers that disprove that possibility.

[-]Mike Bishop16y30

To me, the problem is not "Mathematical Simplicity Bias," but rather, failing to check the model with empirical data. It seems totally reasonable to start with a simple model and add complexity necessary to explain the phenomenon. (Of course it is best to test the model on new data.)

Also, if you're going to claim Mathematical Simplicity Bias is, "One of biases that are extremely prevalent in science," it would help to provide real examples of scientists failing because of it.

2Daniel_Burfoot16y

Careful. It is reasonable to add complexity if the complexity is justified by increased explanatory power on a sufficiently large quantity of data. If you attempt to use a complex model to explain a small amount of data, you will end up overfitting the data. Note that this leaves us in a somewhat unpleasant situation: if there is a complex phenomenon regarding which we can obtain only small amounts of data, we may be forced to accept that the phenomenon simply cannot be understood.

2Mike Bishop16y

Yes, this is exactly the point I was getting at when I wrote: "Of course it is best to test the model on new data."

[-]Johnicholas16y30

In general, rules of thumb have two dimensions - applicability (that is the size of the domain where it applies) and efficacy (the amount or degree of guidance that the rule provides).

Simplicity, a.k.a Occam's Razor, is mentioned frequently as a guide in these (philosophy of science/atheist/AI aficionado) circles. However, it is notable more for its broad applicability than for its efficacy compared to other, less-broadly-applicable guidelines.

Try formulating a rule for listing natural numbers (positive integers) without repeats that does not generally tre... (read more)

-3timtyler16y

What's with the Occam bashing? Yes, the OP wrote: "Nature doesn't care all that much for mathematical simplicity." ...but that doesn't make it true: Occam's Razor is great!

[-]CronoDAS16y30

A discharging capacitor is a pretty good fit for exponential decay. (At least, until it's very very close to being completely discharged.)
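For concreteness, the ideal model being referred to is V(t) = V0 * exp(-t / (R*C)). The sketch below evaluates it for hypothetical component values (the 5 V, 10 kOhm and 100 uF figures are assumptions for the example) and prints ln(V), which falls by the same amount in every equal time step when the decay is exactly exponential.

```python
import math

def capacitor_voltage(t, v0, resistance_ohms, capacitance_farads):
    """Ideal RC discharge: V(t) = V0 * exp(-t / (R*C))."""
    return v0 * math.exp(-t / (resistance_ohms * capacitance_farads))

# Hypothetical component values: 5 V across 10 kOhm and 100 uF, so RC = 1 s.
V0, R, C = 5.0, 10_000.0, 100e-6

for t in (0.0, 1.0, 2.0, 3.0):
    v = capacitor_voltage(t, V0, R, C)
    print(f"t={t:.0f}s  V={v:.3f}  ln(V)={math.log(v):.3f}")
```

Plotting measured ln(V) against time and looking for departures from a straight line is one simple way to see where a real component stops following the ideal model.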

0Bo10201016y

This is consistent with my experience... I've always been skeptical of very simple models to describe large behaviors, but in my first EE labs classes I was astounded at how well a very simple superposition of functions described empirical measurements.

0taw16y

Funny that you mention it, as I remember performing high school physics experiments with discharging a battery (not a capacitor), and due to heating there was a very significant deviation from exponential decay: as the battery discharges power, it heats up, and that changes its resistance. That's more a 10% kind of error than an order-of-magnitude kind of error. (With a capacitor, heating will more likely occur inside the load than inside the capacitor, but you might get a similar effect.) And of course properties near complete discharge will be very different, which should be very clear on a log-log plot.

[-]fnc16y20

I don't see how they can even try to apply -any- curve to something that has feedbacks over time, like population or gdp. Technology is an obvious confounding variable there, with natural events coming into play as well.

[-]Shalmanese16y20

"All models are wrong, some are useful" - George Box

1Furcas16y

If model X is more useful than model Y, it's probably because model X is closer to the truth than model Y. "All models are wrong" only means that 100% certainty is impossible.

1Johnicholas16y

"If model X is more useful than model Y, it's probably because model X is closer to the truth than model Y." What if model X is tractable in some useful way? Box's emphasis on utility over correctness would be nigh-meaningless if they were the same thing.

1Furcas16y

Sure. To use Eliezer's example, if we want to fire artillery shells, Newtonian mechanics is more useful than general relativity, because we're more interested in computational speed than in accuracy. But that's not the point that the people who say things like the quote above are usually trying to make. When I hear similar statements, it's from people who say they don't believe in a theory because it's true, but because it's useful for making predictions, as if the two concepts were completely disconnected! That said, after googling George Box, he's certainly not one of those people. Wikiquote gives another quote from him, which I like better: "Remember that all models are wrong; the practical question is how wrong do they have to be to not be useful."

[-]RolfAndreassen16y20

Comparing a made-up exponential to a process that no scientist who knew anything about radioactivity would expect to model with anything but a sum of coupled exponentials is a bit of a straw man. There's a bias to simplicity, certainly, but there's not that much bias!

0taw16y

I used radioactivity example because it was painfully (as in smacked in the face by truth) clear what the correct answer is. But people do use stupid models like simple exponential growth for things like population and economic growth all the time.

2PhilGoetz16y

Like certain singulatarian futurists.

2milindsmart9y

So someone has mentioned it on LW after all. Lots of singulatarian ideas depend heavily on exponential growth.

[-]Psychohistorian16y20

Obligatory topical XKCD link. Though it's linear, not exponential.

[-]talisman16y10

I do not think your claim is what you think it is.

I think your claim is that some people mistake the model for the reality, the map for the territory. Of course models are simpler than reality! That's why they're called "models."

Physics seems to have gotten wiser about this. The Newtonians, and later the Copenhagenites, did fall quite hard for this trap (though the Newtonians can be forgiven to some degree!). More recently, however, the undisputed champion physical model, whose predictions hold to 987 digits of accuracy (not really), has the ... (read more)

[-]CronoDAS16y10

Any continuous function is approximately linear over a small enough scale. ;)

5SforSingularity16y

False. What you mean is "any differentiable function is approximately linear over a small enough scale". See this

8Eliezer Yudkowsky16y

Heck, any linear function is approximately exponential over a small enough scale.

0SforSingularity16y

Do you mean "the exponential function is approximately linear over a small enough scale"?

2[anonymous]16y

Both are true.

2ArthurB16y

Question is, what do you mean "approximately". If you mean, for any error size, the supremum of distance between the linear approximation and the function is lower than this error for all scales smaller than a given scale, then a necessary and sufficient condition is "continuous". Differentiable is merely sufficient. When the function is differentiable, you can make claims on how fast the error decreases asymptotically with scale.

0Johnicholas16y

And if you use the ArthurB definition of "approximately" (which is an excellent definition for many purposes), then a piecewise constant function would do just as well.

0ArthurB16y

Indeed. But I may have gotten "scale" wrong here. If we scale the error at the same time as we scale the part we're looking at, then differentiability is necessary and sufficient. If we're concerned about approximating the function, on a smallish part, then continuous is what we're looking for.
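The two notions of "approximately" can be compared numerically. The sketch below is an illustration under stated assumptions: the functions x^2 and |x|, the approximating lines, and the helper name `sup_error` are all choices made for the example. It measures the largest error of a line on the window [x0 - h, x0 + h] and then divides that error by h, which is the "scale the error along with the window" criterion.

```python
def sup_error(f, approx, x0, h, samples=2001):
    """Largest |f(x) - approx(x)| on an evenly spaced grid over [x0-h, x0+h]."""
    worst = 0.0
    for i in range(samples):
        x = x0 - h + 2.0 * h * i / (samples - 1)
        worst = max(worst, abs(f(x) - approx(x)))
    return worst

def square(x):
    """Differentiable everywhere."""
    return x * x

def tangent_at_one(x):
    """Tangent line to x^2 at x0 = 1."""
    return 2.0 * x - 1.0

def zero_line(x):
    """The line y = 0, used to approximate |x| near its kink at 0."""
    return 0.0

for h in (0.1, 0.01, 0.001):
    e_sq = sup_error(square, tangent_at_one, 1.0, h)
    e_abs = sup_error(abs, zero_line, 0.0, h)
    print(f"h={h}: x^2 error/h = {e_sq / h:.4f}, |x| error/h = {e_abs / h:.4f}")
```

In both cases the raw error shrinks as the window shrinks, which is all continuity is needed for. Only for the differentiable function does error/h also shrink; for |x| at the kink it stays at 1 for this line.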

0CronoDAS16y

Indeed, you can't get a good linear approximation to that function...

0[anonymous]16y

Locally, even an elephant is approximately a tree trunk. Or a rope.

3Johnicholas16y

Under the usual mathematical meanings of "continuous", "function" and so on, this is strictly false. See: http://en.wikipedia.org/wiki/Weierstrass_function It might be true under some radically intuitionist interpretation (a family of philosophies I have a lot of sympathy with). For example, I believe Brouwer argued that all "functions" from "reals" to "reals" are "continuous", though he was using his own interpretation of the terms inside of quotes. However, such an interpretation should probably be explained rather than assumed. ;)

1ArthurB16y

No, he's right. The Weierstrass function can be approximated with a piecewise linear function. It's obvious: pick N equally spaced points and join them linearly. For N big enough, you won't see the difference. It means that the difference is becoming infinitesimally small as N gets bigger.

0SforSingularity16y

That's because you can't "see" the Weierstrass function in the first place, because our eyes cannot see functions that are everywhere (or almost everywhere) nondifferentiable. When you look at a picture of the Weierstrass function on Google image search, you are looking at a piecewise linear approximation of it. Hence, if you compare what you see on Google image search with a piecewise linear approximation of it, they will look the same...

0[anonymous]16y

I'm sort of annoyed by your insistence that the Weierstrass function cannot be approximated by piecewise linear functions when, after all, it is the limit of a series of piecewise linear functions. RTFM.

-1SforSingularity16y

That is because our eyes cannot see nowhere-differentiable functions, so a "picture" of the Weierstrass function is some piecewise linear function that is used as a human-readable symbol for it. Consider that when you look at a "picture" of the Weierstrass function and pick a point on it, you would swear to yourself that the curve happens to be "going up" at that point. Think about that for a second: the function isn't differentiable - it isn't "going" anywhere at that point!

2ArthurB16y

That is because they are approximated by piecewise linear functions. It means on any point you can't make a linear approximation whose precision increases like the inverse of the scale, it doesn't mean you can't approximate.

0SforSingularity16y

taboo "approximate" and restate.

1ArthurB16y

I defined approximate in another comment. Approximate around x: for every epsilon > 0, there is a neighborhood of x over which the absolute difference between the approximation and the approximated function is always lower than epsilon. Adding a slope to a small segment doesn't help or hurt the ability to make a local approximation, so continuous is both sufficient and necessary.

1SforSingularity16y

OK, but with this definition of "approximate", a piecewise linear function with finitely many pieces cannot approximate the Weierstrass function. Furthermore, two nonidentical functions f and g cannot approximate each other. Just choose an x where f(x) and g(x) differ, and an epsilon less than |f(x) - g(x)|; then no matter how small your neighbourhood is, |f(x) - g(x)| > epsilon.

1ArthurB16y

The original question is whether a continuous function can be approximated by a linear function at a small enough scale. The answer is yes. If you want the error to decrease linearly with scale, then continuous is not sufficient of course.

-2SforSingularity16y

I think we have just established that the answer is no... for the definition of "approximate" that you gave...

1ArthurB16y

Hum no you haven't. The approximation depends on the scale of course.

0CronoDAS16y

Yeah, you're right. I think I needed to say any analytic function, or something like that.

0tut16y

Mathematically he should have said "any C1 function". But if you are measuring with a tolerance level that allows a step function to be called exponential, then we can probably say that any continuous function is analytic too.

2Vladimir_Nesov16y

Which is the origin of many physical laws, since Nature usually doesn't care about scale at which nonlinear effects kick in, leaving huge areas of applicability for the laws based on linear approximation.

[-][anonymous]16y10

My computer is biased toward not running at 100 petahertz and having 70 petabytes of RAM. My brain is biased toward not using so many complicated models that it needs 1 trillion neurons each with 1 million connections and firing up to 10,000 times per second.

And now for something perhaps more useful than sarcasm: it seems to me that people tend to simply come up with the consistent model that is either the easiest one to compute or the simplest one to describe. Are heuristics for inconsistency, such as "exponential growth/decay rarely occurs in nature", quickly spread and often used? How about better approximations such as logistic growth?
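To make the comparison concrete, here is a small sketch of exponential versus logistic growth started from the same initial value and rate. The parameter values (initial size 1, rate 1, carrying capacity 10^6) are hypothetical and chosen only so the divergence is easy to see.

```python
import math

def exponential(t, n0, rate):
    """Unbounded exponential growth: N(t) = N0 * exp(r*t)."""
    return n0 * math.exp(rate * t)

def logistic(t, n0, rate, capacity):
    """Logistic growth with the same initial rate and a carrying capacity K."""
    return capacity / (1.0 + (capacity / n0 - 1.0) * math.exp(-rate * t))

# Hypothetical parameters chosen only to make the divergence visible.
N0, RATE, CAPACITY = 1.0, 1.0, 1e6

for t in (2, 10, 20, 30):
    ratio = exponential(t, N0, RATE) / logistic(t, N0, RATE, CAPACITY)
    print(f"t={t:2d}: exponential / logistic = {ratio:.3g}")
```

Early on the two curves are nearly indistinguishable, so a short data series cannot tell them apart; once the logistic curve saturates, the exponential extrapolation overshoots by many orders of magnitude, always in the same direction.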

3cousin_it15y

Hahaha, anthropic Occam's Razor! If a science allows simple theories that can fit in our tiny brains, we call it a good science and observe with satisfaction that it "obeys Occam". If a science doesn't allow simple theories, we call it a bad science and go off to play somewhere else! Come to think of it, physics seems to be the only science where Occam's Razor actually works. Even math is a counterexample: there's no law of nature saying simple theorems should have short proofs, and easy-to-formulate statements like 4-color or Fermat's last can cause huge explosions of complexity when you try to prove them.

2[anonymous]15y

Occam's razor still applies. If we're looking for the most elegant possible proof of a theorem (whatever that means), any sufficiently short proof is much more likely to be it than any sufficiently long proof. If you want to take a completely wild guess about what statement an unknown theorem proves, you're better off guessing short statements than long ones.

2cousin_it15y

Could you try to make that statement more precise? Because I don't believe it. If you take the shortest possible proofs to all provable theorems of length less than N, both the maximum and the average length of those proofs will be extremely (uncomputably) fast-growing functions of N. To see that, imagine Gödel-like self-referential theorems that say "I'm not provable in less than 3^^^^3 steps" or somesuch. They're all true (because otherwise the axiom system would prove a false statement), short and easy to formulate, trivially seen to be provable by finite enumeration, but not elegantly provable because they're true. Another way to reach the same conclusion: if "expected length of shortest proof" were bounded from above by some computable f(N) where N is theorem length in bits, we could write a simple algorithm that determines whether a theorem is provable: check all possible proofs up to length f(N)*2^N. if the search succeeds, say "yes". If the search fails, the shortest proof (if it exists) must be longer than f(N)*2^N, which is impossible because that would make the average greater than f(N). Therefore no shortest proof exists, therefore no proof exists at all, so say "no". But we know that provability cannot be decidable by an algorithm, so f(N) must grow uncomputably fast.

0[anonymous]15y

Could you give a precise meaning to that statement? I can't think of any possible meaning except "if a proof exists, it has finite length", which is trivial. Are short proofs really more likely? Why?

0wedrifid15y

More emphasis on the most elegant possible.

0cousin_it15y

Sorry for deleting my comment, I got frustrated and rewrote it. See my other reply to grandparent.

0wedrifid15y

I don't believe it either, by the way.

[-][anonymous]16y00

I recall a discussion I had with a fellow econ student on the effects of higher taxes. He said, "Higher taxes are inefficient; should I draw the graph." (Unfortunately the topic changed before I could dissect this for him.)

[-]MendelSchmiedekamp16y00

Generally (and therefore somewhat inaccurately) speaking, one way that our brains seem to handle the sheer complexity of computing in the real world is a tendency to simplify the information we gather.

In many cases these sorts of extremely simple models didn't start that way. They may have started with more parameters and complexity. But as they were repeated, explained and applied the model becomes, in effect, simpler. The example begins to represent the entire model, rather than serving to show only a piece of it.

Technically the exponential radioactive... (read more)

0fburnaby16y

So pretty much, this: http://en.wikipedia.org/wiki/Medawar_zone

0MendelSchmiedekamp16y

No. The Medawar zone is more about scientific discoveries as marketable products to the scientific community, not the cultural and cognitive pressures of those communities which affect how those products are used as they become adopted. Different phenomena, although there are almost certainly common causes.

0taw16y

If errors were a few percent randomly up or down it wouldn't matter, but the inaccuracy is not tiny: over long timescales it's many orders of magnitude, and almost always in the same direction - growth/decay are slower over the long term than exponential models predict.

0MendelSchmiedekamp16y

Oh yes, but it's not just a predilection for simple models in the first place, but also a tendency to culturally and cognitively simplify the model we access to use - even if the original model had extensions to handle this case, and even to the tune of orders of magnitude of error. Of course sometimes it may be worth computing an estimate that is (unknown to you) orders of magnitude off, in a very short amount of time. Certainly if the impact of the estimate is delayed and subtle, less conscious trade-offs may factor in between the cognitive effort to access and use a more detailed model and the consequences of error. Yet another form of akrasia.
