LESSWRONG
LW

Personal Blog

4

Paper: "A simple function of one parameter (θ) can fit any collection of ordered pairs {Xi,Yi} to arbitrary precision"--implications for Occam?

by fortyeridania
31st May 2018
1 min read
9

4

This is a linkpost for https://colala.bcs.rochester.edu/papers/piantadosi2018one.pdf
Personal Blog

4

Paper: "A simple function of one parameter (θ) can fit any collection of ordered pairs {Xi,Yi} to arbitrary precision"--implications for Occam?
9jimrandomh
4fortyeridania
7Charlie Steiner
5paulfchristiano
6Dacyn
0paulfchristiano
4Dacyn
4yagudin
2fortyeridania
New Comment
9 comments, sorted by
top scoring
Click to highlight new comments since: Today at 6:48 AM
[-]jimrandomh7y90

Careful formulations of Occam's Razor have always spoken of bits of parameters, rather than count of parameters, for basically this reason: a "parameter" could be a boolean yes/no, or a number, or a five-node digraph, or... pretty much anything you can write down. This paper demonstrates that point very well, but I don't think the point itself is new.

Reply
[-]fortyeridania7y40

Well put.

Reply
[-]Charlie Steiner7y70

Naturally, that one parameter has to be very precise in order to work - if you have 1000 bits of data, the parameter will take at least 1000 bits to write down.

Pretty cool scheme for fitting general scatterplots. You could do the same in higher dimensions, but intuitively it seems like you are actually anti-compressing the data. Their point about not measuring complexity by parameter count is made.

Reply
[-]paulfchristiano7y50

I didn't read the paper, but you can already fit arbitrary data with a high frequency sine curve, sin(x/T) for T close to 0 Not sure if they gain anything extra from their complicated-looking function (it looks like their result is weaker).

ETA: I'm totally wrong (I like that you can downvote yourself).

Reply
[-]Dacyn7y60

They want all of the error to be in the vertical direction. So e.g. sin(x/T) cannot approximate (0,1) in their sense for any T. Similarly, it cannot simultaneously approximate (1,0) and (2,1).

Reply
[-]paulfchristiano7y00

Sorry, we need sin(ax + b).

Reply
[-]Dacyn7y40

Still doesn't work. If (xi)3i=1 is an arithmetic progression then (exp(i(axi+b)))3i=1 is a geometric progression, the imaginary part of which is (sin(axi+b))3i=1. This implies that for example sin(ax+b) cannot simultaneously approximate (0,0), (1,0), and (2,1).

Reply
[-]yagudin7y40

Somehow related papers in ML / DL:

  • Keeping NN Simple by Minimizing the Description Length of the Weights (Hinton, 1997);
  • Binarized Neural Networks (Courbariaux, 2016).
Reply
[-]fortyeridania7y20

Thanks to Alex Tabarrok at Marginal Revolution: https://marginalrevolution.com/marginalrevolution/2018/05/one-parameter-equation-can-exactly-fit-scatter-plot.html

Title: "One parameter is always enough"

Author: Steven T. Piantadosi, ( University of Rochester)

Abstract:

We construct an elementary equation with a single real valued parameter that is capable of fitting any “scatter plot” on any number of points to within a fixed precision. Specifically, given given a fixed  > 0, we may construct fθ so that for any collection of ordered pairs {(xj , yj )} n j=0 with n, xj ∈ N and yj ∈ (0, 1), there exists a θ ∈ [0, 1] giving |fθ(xj ) − yj | <  for all j simultaneously. To achieve this, we apply prior results about the logistic map, an iterated map in dynamical systems theory that can be solved exactly. The existence of an equation fθ with this property highlights that “parameter counting” fails as a measure of model complexity when the class of models under consideration is only slightly broad.

After highlighting the two examples in the paper, Tabarrok provocatively writes:

Aside from the wonderment at the result, the paper also tells us that Occam’s Razor is wrong. Overfitting is possible with just one parameter and so models with fewer parameters are not necessarily preferable even if they fit the data as well or better than models with more parameters.

Occam's Razor in its narrow form--the insight that simplicity is renders a claim more probable--is a consequence of the interaction between Kolmogorov complexity and Bayes' theorem. I don't see how this result affects this idea per se. But perhaps it shows the flaws of conceptualizing complexity as "number of parameters."

Reply
Moderation Log
More from fortyeridania
View more
Curated and popular this week
9Comments