(Cross-posted from another blog but written with LessWrong in mind. Don't worry: if this post isn't well-received, then as far as LessWrong is concerned the series will end here.)


Summary: This post is the beginning of a systematic attempt to answer the question "what is the most important thing?". (Updateless) decision theory is used to provisionally define "importance" and Juergen Schmidhuber's theory of beauty is introduced as a possible answer to the question. The motivations for bringing in Schmidhuber's theory are discussed. This post is also intended to serve as an example of how to understand and solve hard problems in general, and emphasizes the heuristic "go meta".


 

This post is the first in a series about what might be the most important question we know to ask: What is the most important thing?

Don't try to answer the question yet. When faced with a tough question, our first instinct should always be to go meta. What is it that causes me to ask the question "what is the most important thing"? What makes me think the question is itself important? Is that thing important? Does it point to itself as the most important thing? If not, then where does it point? Does the thing it points to, point to itself? If we follow this chain, where do we end up? How path-dependent is the answer? How much good faith do we have to assume on the part of the various things, to trust that they'll give their honest opinions? If we can't simply assume good faith, can we design a mechanism to promote honesty? What mechanisms are already in place, and are there cheap, local improvements we can make to those mechanisms?

And to ask all those questions we have to assume various commonsense notions that we might in fact need to pin down more precisely beforehand. Like, what is importance? Luckily we have some tools we can use to try to figure that part out.

Decision theory is one such tool. In Bayesian decision theory, "importance" might be a fair name for what is measured by your decision policy, which you get by multiplying your beliefs by your utility function. Informally, your decision policy tells you what options or actions to pay most attention to, or what possibilities are most important. But arguably it's your values themselves that should be considered "important", and your beliefs just tell you how the important stuff relates to what is actually going on in the world. Of the decision policy and the utility function, which should we provisionally consider a better referent for "importance"?
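As a rough illustration of that Bayesian picture, here is a minimal sketch in Python; the world states, actions, and numbers are made up purely for the example:

```python
# A toy illustration of the Bayesian picture above. The world states,
# actions, and numbers are hypothetical, chosen only for the example.

beliefs = {"rain": 0.3, "sun": 0.7}               # your probabilities over world states
utility = {                                       # your utility for (action, state) pairs
    ("umbrella", "rain"): 1.0,  ("umbrella", "sun"): 0.4,
    ("no_umbrella", "rain"): -1.0, ("no_umbrella", "sun"): 1.0,
}

def expected_utility(action):
    """Multiply beliefs by values and sum over world states."""
    return sum(p * utility[(action, state)] for state, p in beliefs.items())

# The "decision policy": a ranking of options by expected utility, i.e. by
# how much attention they deserve under this provisional notion of importance.
actions = ["umbrella", "no_umbrella"]
for action in sorted(actions, key=expected_utility, reverse=True):
    print(action, expected_utility(action))
```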

Luckily, decision theories like updateless decision theory (UDT) un-ask the question for us. As the name suggests, unlike Bayesian decision theories like Eliezer's timeless decision theory, UDT doesn't update its beliefs. It just has a utility function which specifies what actions it should take in all of the possible worlds it finds itself in. It doesn't care about the state of the world on top of its utility function—i.e., it doesn't have beliefs—because what worlds it cares about is a fact already specified by its utility function, and not something added in. So "importance" can only be one thing, and it's a surprisingly simple notion that's powerful enough to solve simple decision problems. UDT has problems with mathematical uncertainty and reflection—it has a magical "mathematical intuition module", and weird things happen when it proves things about its own output after taking into account that it will always give the "optimal" solution to a problem—but those issues don't change the fact that decision theory's notion of importance is a decent provisional notion for us to work with.

Of course, many meta-ethicists would have reservations about defining importance this way. They would say that (moral) importance isn't something agent-specific: it's an objective fact of the universe what's (morally) important. But even given that, as bounded agents we have to find out what's actually important somehow, so when we're making decisions we can talk about our best guess at what's important without committing ourselves to any meta-ethical position. The kind of importance that has bearing on all our decisions is a prescriptive notion of importance, not a descriptive one nor a normative one. It's our agent-specific, best approximation of normative importance.

So given our decision-theoretic notion of importance we can get back to the question posed above: what is the most important thing? If, counterfactually, we had all of our values represented as a utility function, what would be the term with the most utility associated with it? We don't yet know how to talk about such concepts computationally, but for now we'll let ourselves use vague human concepts. Would the most important thing be eudaimonia, maybe? How about those other Aristotelian emphases of arete (virtue) and phronesis (practical and moral wisdom)? Maybe the sum of all three? Taken together they surely cover a lot of ground.

Various answers are plausible, but again, this is a perfect time to go meta. What causes the question "what is the most important thing?" to rise to our attention, and what causes us to try to find the answer?

One reason we ask is that it's an interesting question of its own accord. We want to understand the world, and we're curious about the answers to some questions even when they don't seem to have any practical significance, like with chess problems or with jigsaw puzzles. We're curious by nature.

We can always go meta again; we can always seek whence cometh a sequence [pdf]. What causes us to be interested in things, and what causes things to be interesting? It might be a subtle point that these can be distinct questions. Maybe aliens are way more interested in sorting pebbles into prime-numbered heaps than we are. In that case we might want to acknowledge that sorting pebbles into prime-numbered heaps can be interesting in a certain general sense—it just doesn't really interest us. But we might be interested that the aliens find it interesting: I'd certainly want to know why the aliens are so into prime numbers, pebbles, and the conjunction of the two. Given my knowledge of psychology and sociology their hypothetical fixation strikes me as highly unlikely. And that brings us to the question of what in general, in a fairly mind-universal sense, causes things to be interesting.

Luckily we can take a computational perspective to get a preliminary answer. Juergen Schmidhuber's theory of beauty and other stuff is an attempt to answer the question of what makes things interesting. The best introduction to his theory is his descriptively-titled paper "Driven by Compression Progress: A Simple Principle Explains Essential Aspects of Subjective Beauty, Novelty, Surprise, Interestingness, Attention, Curiosity, Creativity, Art, Science, Music, Jokes". Here's the abstract:

I argue that data becomes temporarily interesting by itself to some self-improving, but computationally limited, subjective observer once he learns to predict or compress the data in a better way, thus making it subjectively simpler and more beautiful. Curiosity is the desire to create or discover more non-random, nonarbitrary, regular data that is novel and surprising not in the traditional sense of Boltzmann and Shannon but in the sense that it allows for compression progress because its regularity was not yet known. This drive maximizes interestingness, the first derivative of subjective beauty or compressibility, that is, the steepness of the learning curve. It motivates exploring infants, pure mathematicians, composers, artists, dancers, comedians, yourself, and (since 1990) artificial systems.
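To make "compression progress" a bit more concrete, here is a minimal sketch of the idea as an intrinsic reward signal. The data and the Laplace-smoothed bigram predictor below are deliberately simplistic stand-ins, not Schmidhuber's actual systems (which use general adaptive compressors); the point is only that the reward is the number of bits shaved off the description of past data each time the model improves.

```python
import math
from collections import defaultdict

# A toy operationalization of "compression progress" as a curiosity reward.

def code_length(seq, counts):
    """Bits needed to encode seq under the current (frozen) bigram model."""
    bits, prev = 0.0, None
    for s in seq:
        ctx = counts[prev]
        total = sum(ctx.values()) + 2            # Laplace smoothing over {"a", "b"}
        bits += -math.log2((ctx[s] + 1) / total)
        prev = s
    return bits

history = "ab" * 8                               # regular data the observer can learn
counts = defaultdict(lambda: defaultdict(int))   # counts[previous_symbol][next_symbol]

prev_bits = code_length(history, counts)
for symbol, prev in zip(history, [None] + list(history[:-1])):
    counts[prev][symbol] += 1                    # the observer improves its model
    bits = code_length(history, counts)
    print(f"model update on '{symbol}': {bits:6.2f} bits, "
          f"compression progress (reward) = {prev_bits - bits:5.2f}")
    prev_bits = bits
```

Each step the history becomes cheaper to describe, and the drop in code length is the "steepness of the learning curve" the abstract calls interestingness.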

This compression-centric formulation of beauty and interestingness reminds me of a Dinosaur Comic.

In Schmidhuber's beautiful and interesting theory, compression plays a key role, and explains many things that we find important. So is compression the most important thing? Should we structure our decision theory around a compression progress drive, as Schmidhuber has done with some of his artificial intelligences?

I doubt it—I don't think we've gone meta enough. But we'll further consider that question, and continue our exploration of the more important question "what's the most important thing?" in future posts.

Comments:

Thanks. Burfoot's book is mostly irrelevant to the post: it's about epistemology, whereas the post is about meta-ethics and axiology.

A literature review is an essential part of any serious publication. It shows that you did your homework and so gives you more credibility.

Granted. I didn't think it necessary because I don't think Schmidhuber's theory is a legitimate answer to the question; it's just a step on the way to finding an answer. Also, Burfoot's book doesn't include the relevant aspect of Schmidhuber's theory, which is the axiological aspect, i.e. the aspect pertaining to beauty and so on. A literature review of meta-ethics, on the other hand, would be more relevant, but wouldn't be necessary given the modest nature of this post. Existing meta-ethics will start needing referencing in Part II and beyond, though it doesn't look like Part II will show up on LessWrong.

If you don't do Part 2 here, please post replies here with pointers to your blog so I'll know when they're available.

It's not irrelevant, at least as background - in the sense that I suspect even a fair number of LWers are wondering why compression is supposed to be such an overarching paradigm that it could cover all of science, much less extend further to meta-ethics and axiology.

Granted, and a good point. Schmidhuber's paper should be enough of an explanation, but Burfoot's book lends additional credibility to the notion, and of course gives us additional information on the subject.

The merits of Schmidhuber's formulation would be discussed in Part II, but it seems this post won't be received well, so even if Part II is posted elsewhere it probably won't appear on LessWrong. (ETA: Actually, Part II likely won't be put here in any case, as it might start to justify meta-ethical theism, and many LessWrong users will see the conclusion, meta-ethical theism, and infer by backwards-chaining that the arguments must be wrong even before seriously considering them. I don't wish to provoke opprobrium on LessWrong, so Part II likely won't show up here—but I do wish to note that my silence shouldn't be taken as approval of such mind-killed epistemic habits. (The God question is, of course, extremely political.) That said, Part II might not get to God—I might try to structure the series such that God is introduced at the very beginning of Part III. Please note that Part I has nothing at all to do with God.)

Seems like this post is missing a decent summary/abstract.

I don't think that "Most Important" is a label that catches a single, unambiguous thing. "Important" is a big label that we use to say something like, "Is in the class of things deserving my focus and attention."

"Most important" would then seem to be something like "The single item most deserving my focus and attention."

(If there is another definition of Important being used here, that points to Important being an ambiguous term that may need to be better defined.)

But there probably isn't a single item that is always most deserving my focus and attention. If I haven't slept in 3 days, sleep becomes the most important thing for me. If I haven't eaten in 2 days, food becomes the most important thing for me.

I suppose you can try to abstract away the "my" to "the single item most deserving (everyone's?) focus and attention" or create some specialized caveats like "aside from human needs" or something like that, but even then I am not sure that a single item gets the sort of consistent Importance that you are implying. I suppose it is possible that you have found some incredibly clever method, maybe finding a unifying principle under all of the things a person might find important, but I am skeptical.

So you're searching for "the most important thing", and reason that this is the same as searching for some utility function, and then you note that one reason this question seems worth thinking about is because it's interesting, and then you refer to Schmidhuber's definition of interestingness (which would yield a utility function), and note that it is itself interesting, so maybe importance is the same as interestingness, because importance has to be itself important and (Schmidhuberian) interestingness satisfies this requirement by being itself interesting.

At this point I'm not very impressed. This seems to be the same style of reasoning that gets people obsessed with "complexity" or "universal instrumental values" as the ultimate utility functions.

At the end you say you doubt that interestingness is the ultimate utility function, too, but apparently you still think engaging in this style of reasoning is a good idea, we just have to take it even further.

At this point I'm thinking that it could go either way: you could come up with an interesting proposal in the class of CEV or "Indirect Normativity", which definitely are in some sense the result of going meta about values, or you could come up with something that turns out to be just another fake utility function in the class of "complexity" and "universal instrumental values".

[anonymous]

First impression: Given the low expectation of the post saying something groundbreaking, its style seems overly pompous.

[This comment is no longer endorsed by its author]

I found this post interesting but somewhat confusing. You start by talking about UDT in order to talk about importance. But really the only connection from UDT to importance is the utility function, so you might as well start with that. And then you ignore utility functions in the rest of your post when you talk about Schmidhuber's theory.

It just has a utility function which specifies what actions it should take in all of the possible worlds it finds itself in.

Not quite. The utility function doesn't specify what action to take; it specifies what worlds are desirable. UDT also requires a prior over worlds and a specification of how the agent interacts with the world (like the Python programs here). The combination of this prior and the expected value computations that UDT does would constitute "beliefs".
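For concreteness, here is a toy sketch of that structure: a prior over worlds, a map from worlds to observations standing in for "how the agent interacts with the world", and expected-value computations over whole policies. The worlds, payoffs, and names are invented for illustration, and real UDT reasons over programs rather than lookup tables.

```python
from itertools import product

# A toy sketch: prior over worlds, observation map, utility over outcomes.
worlds = {"W1": 0.5, "W2": 0.5}                  # prior over worlds
observation = {"W1": "obs_A", "W2": "obs_B"}     # how each world appears to the agent
payoff = {("W1", "x"): 1, ("W1", "y"): 0,        # which (world, action) outcomes are desirable
          ("W2", "x"): 0, ("W2", "y"): 2}

# A policy maps observations to actions. UDT-style choice: pick the policy with
# the highest prior expected utility, without updating on the observation received.
obs_values = sorted(set(observation.values()))
actions = ["x", "y"]
policies = [dict(zip(obs_values, choice))
            for choice in product(actions, repeat=len(obs_values))]

def expected_utility(policy):
    return sum(p * payoff[(w, policy[observation[w]])] for w, p in worlds.items())

best = max(policies, key=expected_utility)
print(best, expected_utility(best))              # {'obs_A': 'x', 'obs_B': 'y'} 1.5
```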

Informally, your decision policy tells you what options or actions to pay most attention to, or what possibilities are most important.

I don't see how this is. Your decision policy tells you what to do once you already know what you can do. If you're using "important" to mean "valuable", just say that instead.

I do like the idea of modelling the mind as an approximate compression engine. This is great for reducing some thought processes to algorithms. For example, I think property dualism can be thought of as a way to compress the fact that I am me rather than some other person, or at least to make explicit the fact that this must be compressed.

Schmidhuber's theory is interesting but incomplete. You can create whatever compression problem you want through a template, e.g. a pseudorandom sequence you can only compress by guessing the seed. Yet repetitions of the same problem template are not necessarily interesting. It seems that some bits are more important than other bits; physicists are very interested in compressing the number of spatial dimensions in the universe even though this quantity can be specified in a few bits. I don't know of any formal approaches to quantifying the importance of compressing different things.
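As a quick sketch of the template point (the generator and the seed below are arbitrary choices): the data looks incompressible to a generic statistical compressor, yet the short generating program plus the seed reproduces it exactly.

```python
import random
import zlib

def generate(seed, n=4096):
    """A short program whose output looks statistically random."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(n))

data = generate(seed=42)
print("raw bytes:      ", len(data))
print("after zlib (9): ", len(zlib.compress(data, 9)))   # no real savings
# Guessing (or searching for) the seed compresses the data down to the size of
# `generate` plus one small integer -- a regularity zlib will never find.
```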

I wrote a paper on this subject (compression as it relates to theory of the mind). I also wrote this LessWrong post about using compression to learn values.

[anonymous]

By the way, clearly I'm Will_Newsome, but I didn't want this post to be judged in the context of my previous posts. I made this decision after thinking carefully about both the benefits and harms of this policy, especially the harms of it becoming a categorical rule. You needn't remind me of the downsides, I assure you I've already weighed them.

[This comment is no longer endorsed by its author]

Can you figure out something (anything!) without compressing the whole data context? I can't imagine a situation like that.

Newton compressed the data we had about the Solar system, quite a bit. Einstein added some compression when he explained Mercury's orbit. A theory always compresses the data we already have.

Darwin's killing of the God-life-creator reduced the large number of bits one needed to explain God's work.

You must compress the earthquake data to predict the next one.

And so on, and so on. I can't find a counterexample.

Juergen Schmidhuber just ought to be right.

Can you figure out something (anything!) without compressing the whole data context? I can't imagine a situation like that.

Depends what you mean by "figure out"; there's always stamp collecting.

Upvoted. I think

This drive maximizes interestingness, the first derivative of subjective beauty or compressibility, that is, the steepness of the learning curve.

is a cool way to quantify interestingness. I wonder if future posts might compare this to other possible measures of interestingness, such as Gelman's suggestion of the Kullback-Leibler divergence between the original and updated model (i.e., the relative entropy between the two).

ETA: Nevermind, I just read the paper you linked to and the author mentioned it:

Note that the concepts of Huffman coding [28] and relative entropy between prior and posterior immediately translate into a measure of learning progress reflecting the number of saved bits—a measure of improved data compression. Note also, however, that the naive probabilistic approach to data compression is unable to discover more general types of algorithmic compressibility. For example, the decimal expansion of π looks random and incompressible but isn’t: there is a very short algorithm computing all of π, yet any finite sequence of digits will occur in π’s expansion as frequently as expected if π were truly random, that is, no simple statistical learner will outperform random guessing at predicting the next digit from a limited time window of previous digits. More general program search techniques are necessary to extract the underlying algorithmic regularity.

Still, I wonder whether KL divergence (or another entropic measure) is an improvement over pure compressibility in some settings, which would imply a sort of trade-off.
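For what it's worth, here is a minimal numeric sketch of the relative-entropy measure under discussion; the prior and posterior below are invented, and the only point is the computation and its "saved bits" reading.

```python
import math

# Made-up prior and posterior over three hypotheses, purely for illustration.
prior     = {"H1": 0.50, "H2": 0.30, "H3": 0.20}
posterior = {"H1": 0.80, "H2": 0.15, "H3": 0.05}

# KL(posterior || prior) in bits: the expected bits saved per observation by
# coding with the updated model instead of the original one (assuming the data
# really follows the posterior) -- the "saved bits" reading quoted above.
kl_bits = sum(q * math.log2(q / prior[h]) for h, q in posterior.items())
print(f"KL(posterior || prior) = {kl_bits:.3f} bits")
```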