Deriving Our World From Small Datasets

One problem here is that there is a tradeoff between data and compute, and we are already at a bad place for compute.

People plausibly estimate the current universe with all its contents at kilobits (Standard Model equations + initial Big Bang conditions); that's radical data efficiency, hard to do better than that. So this minimal dataset problem is already solved. But even if someone handed the Turing machine of the universe to you, you can't compute it, it's too expensive. That's the problem with Kolmogorov complexity: it is the shortest program given unlimited compute. And it spends any amount of compute for a shorter program, implying that longer programs (up to simply encoding the answer literally) can be much more compute-efficient.

So if they hand you megabytes of text for free, that should make it way easier to run a cheap program... but as the Hutter Prize shows, we can't even run the programs which would solve that either! So we definitely can't run the programs which are more sample-efficient than the Hutter Prize.

And if, even if we had the minimal datasets, we couldn't do anything with them, what's the point?

[-]Capybasilisk4y10

That’s the problem with Kolmogorov complexity: it is the shortest program given unlimited compute. And it spends any amount of compute for a shorter program

I don't see why it's assumed that we'd necessarily be searching for the most concise models rather than, say, optimizing for CPU cycles or memory consumption. I'm thinking of something like Charles Bennett's Logical Depth.

These types of approaches also take it for granted that we're conducting an exhaustive search of model-space, which yes, is ludicrous. Of course we'd burn through our limited compute trying to brute-force the space. There's plenty of room for improvement in a stochastic search of models which, while still expensive, at least has us in the realm of the physically possible. There might be something to be said for working primarily on the problem of probabilistic search in large, discrete spaces before we even turn to the problem of trying to model reality.

(Standard Model equations + initial Big Bang conditions); that’s radical data efficiency,

Allow me to indulge in a bit of goal-post shifting.

A dataset like that gives us the entire Universe, ie. Earth and a vast amount of stuff we probably don't care about. There might come a point where I care about the social habits of a particular species in the Whirlpool Galaxy, but right now I'm much more concerned about the human world. I'm far more interested in datasets that primarily give us our world, and through which the fundamental workings of the Universe can be surmised. That's why I nominated the VIX as a simple, human/Earth-centric dataset that perhaps holds a great amount of extractible information.

[-]gwern4y30

rather than, say, optimizing for CPU cycles or memory consumption

As I already pointed out, we already do. And turns out that you need to optimize more for CPU/memory, past the kilobytes of samples which are already flabby and unnecessary from the point of view of KC. And more. And more. Go right past 'megabyte' without even stopping. Still way too small, way too compute/memory-hungry. And a whole bunch more beyond that. And then you hit the Hutter Prize size, and that's still too optimized for sample-efficiency, and we need to keep going. Yes, blow through 'gigabyte', and then more, more, and some more - and eventually, a few orders of magnitude sample-inefficiency later, you begin to hit projects like GPT-3 which are finally getting somewhere, having traded off enough sample-inefficiency (hundreds of gigabytes) to bring the compute requirements down into the merely mortal realm.

A dataset like that gives us the entire Universe, ie. Earth and a vast amount of stuff we probably don't care about.

You can locate the Earth in relatively few bits of information. Off the top of my head: the observable universe is only 45 billion lightyears radius; how many bits could an index into that possibly take? 24 bits to encode distance from origin in lightyears out of 45b, maybe another 24 bits to encode angle? <50 bits for such a crude encoding, giving an upper bound. You need to locate the Earth in time as well? Another <20 bits or so to pin down which year out of ~4.5b years. If you can do KC at all, another <60 bits or so shouldn't be a big deal...

[-]Alex Vermillion4y10

I'm not so sure that knowing primes tells you anything useful about cicadas, for the simple reason that prime numbers function just as well on planets that with or without cicadas. Because of this, learning about primes doesn't inform your knowledge of whether or not cicadas are real by very much (in the best case).

I think this applies similarly to much of the middle part of your thesis, and I think the end kind of just puts your theoretical AI into already-existing "machine learning" territory (unless I'm misunderstanding something here).

LESSWRONG
is fundraising!
LW

LESSWRONG
is fundraising!
LW

5

Deriving Our World From Small Datasets

5

5