(Cross-posted from another blog but written with LessWrong in mind. Don't worry: if this post isn't well-received, then as far as LessWrong is concerned the series will end here.)


Summary: This post is the beginning of a systematic attempt to answer the question "what is the most important thing?". (Updateless) decision theory is used to provisionally define "importance" and Juergen Schmidhuber's theory of beauty is introduced as a possible answer to the question. The motivations for bringing in Schmidhuber's theory are discussed. This post is also intended to serve as an example of how to understand and solve hard problems in general, and emphasizes the heuristic "go meta".


 

This post is the first in a series about what might be the most important question we know to ask: What is the most important thing?

Don't try to answer the question yet. When faced with a tough question, our first instinct should always be to go meta. What is it that causes me to ask the question "what is the most important thing"? What makes me think the question is itself important? Is that thing important? Does it point to itself as the most important thing? If not, then where does it point? Does the thing it points to, point to itself? If we follow this chain, where do we end up? How path-dependent is the answer? How much good faith do we have to assume on the part of the various things, to trust that they'll give their honest opinions? If we can't simply assume good faith, can we design a mechanism to promote honesty? What mechanisms are already in place, and are there cheap, local improvements we can make to those mechanisms?

And to ask all those questions we have to assume various commonsense notions that we might in fact need to pin down more precisely beforehand. Like, what is importance? Luckily we have some tools we can use to try to figure that part out.

Decision theory is one such tool. In Bayesian decision theory, "importance" might be a fair name for what is measured by your decision policy, which you get by multiplying your beliefs by your utility function. Informally, your decision policy tells you what options or actions to pay most attention to, or what possibilities are most important. But arguably it's your values themselves that should be considered "important", and your beliefs just tell you how the important stuff relates to what is actually going on in the world. Of the decision policy and the utility function, which should we provisionally consider a better referent for "importance"?
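As a rough illustration of that Bayesian picture, here is a minimal sketch in Python; the world states, actions, and numbers are made up purely for the example:

```python
# A toy illustration of the Bayesian picture above. The world states,
# actions, and numbers are hypothetical, chosen only for the example.

beliefs = {"rain": 0.3, "sun": 0.7}               # your probabilities over world states
utility = {                                       # your utility for (action, state) pairs
    ("umbrella", "rain"): 1.0,  ("umbrella", "sun"): 0.4,
    ("no_umbrella", "rain"): -1.0, ("no_umbrella", "sun"): 1.0,
}

def expected_utility(action):
    """Multiply beliefs by values and sum over world states."""
    return sum(p * utility[(action, state)] for state, p in beliefs.items())

# The "decision policy": a ranking of options by expected utility, i.e. by
# how much attention they deserve under this provisional notion of importance.
actions = ["umbrella", "no_umbrella"]
for action in sorted(actions, key=expected_utility, reverse=True):
    print(action, expected_utility(action))
```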

Luckily, decision theories like updateless decision theory (UDT) un-ask the question for us. As the name suggests, unlike Bayesian decision theories like Eliezer's timeless decision theory, UDT doesn't update its beliefs. It just has a utility function which specifies what actions it should take in all of the possible worlds it finds itself in. It doesn't care about the state of the world on top of its utility function—i.e., it doesn't have beliefs—because what worlds it cares about is a fact already specified by its utility function, and not something added in. So "importance" can only be one thing, and it's a surprisingly simple notion that's powerful enough to solve simple decision problems. UDT has problems with mathematical uncertainty and reflection—it has a magical "mathematical intuition module", and weird things happen when it proves things about its own output after taking into account that it will always give the "optimal" solution to a problem—but those issues don't change the fact that decision theory's notion of importance is a decent provisional notion for us to work with.

Of course, many meta-ethicists would have reservations about defining importance this way. They would say that (moral) importance isn't something agent-specific: it's an objective fact of the universe what's (morally) important. But even given that, as bounded agents we have to find out what's actually important somehow, so when we're making decisions we can talk about our best guess at what's important without committing ourselves to any meta-ethical position. The kind of importance that has bearing on all our decisions is a prescriptive notion of importance, not a descriptive one nor a normative one. It's our agent-specific, best approximation of normative importance.

So given our decision-theoretic notion of importance we can get back to the question posed above: what is the most important thing? If, counterfactually, we had all of our values represented as a utility function, what would be the term with the most utility associated with it? We don't yet know how to talk about such concepts computationally, but for now we'll let ourselves use vague human concepts. Would the most important thing be eudaimonia, maybe? How about those other Aristotelian emphases of arete (virtue) and phronesis (practical and moral wisdom)? Maybe the sum of all three? Taken together they surely cover a lot of ground.

Various answers are plausible, but again, this is a perfect time to go meta. What causes the question "what is the most important thing?" to rise to our attention, and what causes us to try to find the answer?

One reason we ask is that it's an interesting question of its own accord. We want to understand the world, and we're curious about the answers to some questions even when they don't seem to have any practical significance, like with chess problems or with jigsaw puzzles. We're curious by nature.

We can always go meta again; we can always seek whence cometh a sequence [pdf]. What causes us to be interested in things, and what causes things to be interesting? It might be a subtle point that these can be distinct questions. Maybe aliens are way more interested in sorting pebbles into prime-numbered heaps than we are. In that case we might want to acknowledge that sorting pebbles into prime-numbered heaps can be interesting in a certain general sense—it just doesn't really interest us. But we might be interested that the aliens find it interesting: I'd certainly want to know why the aliens are so into prime numbers, pebbles, and the conjunction of the two. Given my knowledge of psychology and sociology their hypothetical fixation strikes me as highly unlikely. And that brings us to the question of what in general, in a fairly mind-universal sense, causes things to be interesting.

Luckily we can take a computational perspective to get a preliminary answer. Juergen Schmidhuber's theory of beauty and other stuff is an attempt to answer the question of what makes things interesting. The best introduction to his theory is his descriptively-titled paper "Driven by Compression Progress: A Simple Principle Explains Essential Aspects of Subjective Beauty, Novelty, Surprise, Interestingness, Attention, Curiosity, Creativity, Art, Science, Music, Jokes". Here's the abstract:

I argue that data becomes temporarily interesting by itself to some self-improving, but computationally limited, subjective observer once he learns to predict or compress the data in a better way, thus making it subjectively simpler and more beautiful. Curiosity is the desire to create or discover more non-random, nonarbitrary, regular data that is novel and surprising not in the traditional sense of Boltzmann and Shannon but in the sense that it allows for compression progress because its regularity was not yet known. This drive maximizes interestingness, the first derivative of subjective beauty or compressibility, that is, the steepness of the learning curve. It motivates exploring infants, pure mathematicians, composers, artists, dancers, comedians, yourself, and (since 1990) artificial systems.
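To make "compression progress" a bit more concrete, here is a minimal sketch of the idea as an intrinsic reward signal. The data and the Laplace-smoothed bigram predictor below are deliberately simplistic stand-ins, not Schmidhuber's actual systems (which use general adaptive compressors); the point is only that the reward is the number of bits shaved off the description of past data each time the model improves.

```python
import math
from collections import defaultdict

# A toy operationalization of "compression progress" as a curiosity reward.

def code_length(seq, counts):
    """Bits needed to encode seq under the current (frozen) bigram model."""
    bits, prev = 0.0, None
    for s in seq:
        ctx = counts[prev]
        total = sum(ctx.values()) + 2            # Laplace smoothing over {"a", "b"}
        bits += -math.log2((ctx[s] + 1) / total)
        prev = s
    return bits

history = "ab" * 8                               # regular data the observer can learn
counts = defaultdict(lambda: defaultdict(int))   # counts[previous_symbol][next_symbol]

prev_bits = code_length(history, counts)
for symbol, prev in zip(history, [None] + list(history[:-1])):
    counts[prev][symbol] += 1                    # the observer improves its model
    bits = code_length(history, counts)
    print(f"model update on '{symbol}': {bits:6.2f} bits, "
          f"compression progress (reward) = {prev_bits - bits:5.2f}")
    prev_bits = bits
```

Each step the history becomes cheaper to describe, and the drop in code length is the "steepness of the learning curve" the abstract calls interestingness.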

This compression-centric formulation of beauty and interestingness reminds me of a Dinosaur Comic.

In Schmidhuber's beautiful and interesting theory, compression plays a key role, and explains many things that we find important. So is compression the most important thing? Should we structure our decision theory around a compression progress drive, as Schmidhuber has done with some of his artificial intelligences?

I doubt it—I don't think we've gone meta enough. But we'll further consider that question, and continue our exploration of the more important question "what's the most important thing?" in future posts.

Comments:

Thanks. Burfoot's book is mostly irrelevant to the post: it's about epistemology, whereas the post is about meta-ethics and axiology.

A literature review is an essential part of any serious publication. It shows that you did your homework and so gives you more credibility.

Granted. I didn't think it necessary because I don't think Schmidhuber's theory is a legitimate answer to the question; it's just a step on the way to finding an answer. Also, Burfoot's book doesn't include the relevant aspect of Schmidhuber's theory, which is the axiological aspect, i.e. the aspect pertaining to beauty and so on. A literature review of meta-ethics, on the other hand, would be more relevant, but wouldn't be necessary given the modest nature of this post. Existing meta-ethics will start needing referencing in Part II and beyond, though it doesn't look like Part II will show up on LessWrong.

If you don't do Part 2 here, please post replies here with pointers to your blog so I'll know when they're available.

It's not irrelevant, at least as background - in the sense that I suspect even a fair number of LWers are wondering why compression is supposed to be such an overarching paradigm that it could cover all of science, much less extend further to meta-ethics and axiology.

Granted, and a good point. Schmidhuber's paper should be enough of an explanation, but Burfoot's book lends additional credibility to the notion, and of course gives us additional information on the subject.

The merits of Schmidhuber's formulation would be discussed in Part II, but it seems this post won't be received well, so even if Part II is posted elsewhere it probably won't appear on LessWrong. (ETA: Actually, Part II likely won't be put here in any case, as it might start to justify meta-ethical theism, and many LessWrong users will see the conclusion, meta-ethical theism, and infer by backwards-chaining that the arguments must be wrong even before seriously considering them. I don't wish to provoke opprobrium on LessWrong, so Part II likely won't show up here—but I do wish to note that my silence shouldn't be taken as approval of such mind-killed epistemic habits. (The God question is, of course, extremely political.) That said, Part II might not get to God—I might try to structure the series such that God is introduced at the very beginning of Part III. Please note that Part I has nothing at all to do with God.)

Seems like this post is missing a decent summary/abstract.

I don't think that "Most Important" is a label that catches a single, unambiguous thing. "Important" is a big label that we use to say something like, "Is in the class of things deserving my focus and attention."

"Most important" would then seem to be something like "The single item most deserving my focus and attention."

(If there is another definition of Important being used here, that points to Important being an ambiguous term that may need to be better defined.)

But there probably isn't a single item that is always most deserving my focus and attention. If I haven't slept in 3 days, sleep becomes the most important thing for me. If I haven't eaten in 2 days, food becomes the most important thing for me.

I suppose you can try to abstract away the "my" to "the single item most deserving (everyone's?) focus and attention" or create some specialized caveats like "aside from human needs" or something like that, but even then I am not sure that a single item gets the sort of consistent Importance that you are implying. I suppose it is possible that you have found some incredibly clever method, maybe finding a unifying principle under all of the things a person might find important, but I am skeptical.

So you're searching for "the most important thing", and reason that this is the same as searching for some utility function, and then you note that one reason this question seems worth thinking about is because it's interesting, and then you refer to Schmidhuber's definition of interestingness (which would yield a utility function), and note that it is itself interesting, so maybe importance is the same as interestingness, because importance has to be itself important and (Schmidhuberian) interestingness satisfies this requirement by being itself interesting.

At this point I'm not very impressed. This seems to be the same style of reasoning that gets people obsessed with "complexity" or "universal instrumental values" as the ultimate utility functions.

At the end you say you doubt that interestingness is the ultimate utility function, too, but apparently you still think engaging in this style of reasoning is a good idea, we just have to take it even further.

At this point I'm thinking that it could go either way: you could come up with an interesting proposal in the class of CEV or "Indirect Normativity", which definitely are in some sense the result of going meta about values, or you could come up with something that turns out to be just another fake utility function in the class of "complexity" and "universal instrumental values".

[anonymous]

First impression: Given the low expectation of the post saying something groundbreaking, its style seems overly pompous.

[This comment is no longer endorsed by its author]

I found this post interesting but somewhat confusing. You start by talking about UDT in order to talk about importance. But really the only connection from UDT to importance is the utility function, so you might as well start with that. And then you ignore utility functions in the rest of your post when you talk about Schmidhuber's theory.

It just has a utility function which specifies what actions it should take in all of the possible worlds it finds itself in.

Not quite. The utility function doesn't specify what action to take; it specifies what worlds are desirable. UDT also requires a prior over worlds and a specification of how the agent interacts with the world (like the Python programs here). The combination of this prior and the expected value computations that UDT does would constitute "beliefs".
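For concreteness, here is a toy sketch of that structure: a prior over worlds, a map from worlds to observations standing in for "how the agent interacts with the world", and expected-value computations over whole policies. The worlds, payoffs, and names are invented for illustration, and real UDT reasons over programs rather than lookup tables.

```python
from itertools import product

# A toy sketch: prior over worlds, observation map, utility over outcomes.
worlds = {"W1": 0.5, "W2": 0.5}                  # prior over worlds
observation = {"W1": "obs_A", "W2": "obs_B"}     # how each world appears to the agent
payoff = {("W1", "x"): 1, ("W1", "y"): 0,        # which (world, action) outcomes are desirable
          ("W2", "x"): 0, ("W2", "y"): 2}

# A policy maps observations to actions. UDT-style choice: pick the policy with
# the highest prior expected utility, without updating on the observation received.
obs_values = sorted(set(observation.values()))
actions = ["x", "y"]
policies = [dict(zip(obs_values, choice))
            for choice in product(actions, repeat=len(obs_values))]

def expected_utility(policy):
    return sum(p * payoff[(w, policy[observation[w]])] for w, p in worlds.items())

best = max(policies, key=expected_utility)
print(best, expected_utility(best))              # {'obs_A': 'x', 'obs_B': 'y'} 1.5
```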

Informally, your decision policy tells you what options or actions to pay most attention to, or what possibilities are most important.

I don't see how this is. Your decision policy tells you what to do once you already know what you can do. If you're using "important" to mean "valuable", just say that instead.

I do like the idea of modelling the mind as an approximate compression engine. This is great for reducing some thought processes to algorithms. For example, I think property dualism can be thought of as a way to compress the fact that I am me rather than some other person, or at least to make explicit the fact that this must be compressed.

Schmidhuber's theory is interesting but incomplete. You can create whatever compression problem you want through a template, e.g. a pseudorandom sequence you can only compress by guessing the seed. Yet repetitions of the same problem template are not necessarily interesting. It seems that some bits are more important than other bits; physicists are very interested in compressing the number of spatial dimensions in the universe even though this quantity can be specified in a few bits. I don't know of any formal approaches to quantifying the importance of compressing different things.
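As a quick sketch of the template point (the generator and the seed below are arbitrary choices): the data looks incompressible to a generic statistical compressor, yet the short generating program plus the seed reproduces it exactly.

```python
import random
import zlib

def generate(seed, n=4096):
    """A short program whose output looks statistically random."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(n))

data = generate(seed=42)
print("raw bytes:      ", len(data))
print("after zlib (9): ", len(zlib.compress(data, 9)))   # no real savings
# Guessing (or searching for) the seed compresses the data down to the size of
# `generate` plus one small integer -- a regularity zlib will never find.
```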

I wrote a paper on this subject (compression as it relates to theory of the mind). I also wrote this LessWrong post about using compression to learn values.

[anonymous]

By the way, clearly I'm Will_Newsome, but I didn't want this post to be judged in the context of my previous posts. I made this decision after thinking carefully about both the benefits and harms of this policy, especially the harms of it becoming a categorical rule. You needn't remind me of the downsides, I assure you I've already weighed them.

[This comment is no longer endorsed by its author]

Can you figure out something (anything!) without compressing the whole data context? I can't imagine a situation like that.

Newton compressed the data we had about the Solar system, quite a bit. Einstein added some compression when he explained Mercury's orbit. A theory always compresses the data we already have.

Darwin's killing of the God-life-creator reduced the large number of bits one needed to explain God's work.

You must compress the earthquake data to predict the next one.

And so on, and so on. I can't find a counterexample.

Juergen Schmidhuber just ought to be right.

Can you figure out something (anything!) without compressing the whole data context? I can't imagine a situation like that.

Depends what you mean by "figure out"; there's always stamp collecting.

Upvoted. I think

This drive maximizes interestingness, the first derivative of subjective beauty or compressibility, that is, the steepness of the learning curve.

is a cool way to quantify interestingness. I wonder if future posts might compare this to other possible measures of interestingness, such as Gelman's suggestion of the Kullback-Leibler divergence between the original and updated model (i.e., the relative entropy between the two).

ETA: Nevermind, I just read the paper you linked to and the author mentioned it:

Note that the concepts of Huffman coding [28] and relative entropy between prior and posterior immediately translate into a measure of learning progress reflecting the number of saved bits—a measure of improved data compression. Note also, however, that the naive probabilistic approach to data compression is unable to discover more general types of algorithmic compressibility. For example, the decimal expansion of π looks random and incompressible but isn’t: there is a very short algorithm computing all of π, yet any finite sequence of digits will occur in π’s expansion as frequently as expected if π were truly random, that is, no simple statistical learner will outperform random guessing at predicting the next digit from a limited time window of previous digits. More general program search techniques are necessary to extract the underlying algorithmic regularity.

Still, I wonder whether KL divergence (or another entropic measure) is an improvement over pure compressibility in some settings, which would imply a sort of trade-off.
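For what it's worth, here is a minimal numeric sketch of the relative-entropy measure under discussion; the prior and posterior below are invented, and the only point is the computation and its "saved bits" reading.

```python
import math

# Made-up prior and posterior over three hypotheses, purely for illustration.
prior     = {"H1": 0.50, "H2": 0.30, "H3": 0.20}
posterior = {"H1": 0.80, "H2": 0.15, "H3": 0.05}

# KL(posterior || prior) in bits: the expected bits saved per observation by
# coding with the updated model instead of the original one (assuming the data
# really follows the posterior) -- the "saved bits" reading quoted above.
kl_bits = sum(q * math.log2(q / prior[h]) for h, q in posterior.items())
print(f"KL(posterior || prior) = {kl_bits:.3f} bits")
```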