Wiki Contributions


Thank you for the comprehensive answer and for correcting the points where I wasn't clear. Also, thank you for pointing out that the Kolmogorov complexity of a program is the length of the program that writes that program

The complexity of the algorithms was totally arbitrary and for the sake of the example.

I still have some doubts, but everything is more clear now (see my answer to Charlie Steiner also)

I think that re-reading again your answer made something click. So thanks for that

The observed data is not **random**, because random is not a property of the data itself.
The hypotheses that we want to evaluate are not random either, because we are analysing Turing machines that generate those data deterministically.

If the data is HTHTHT,  we do not test a python script that is doing:

random.choices(["H","T"], k=6)

What we test instead is something more like

["H"] +["T"]+["H"]+["T"]+["H"]+["T"]



In this case, this last script will be simpler and for that reason, will receive a higher prior.

If we apply this is a Bayesian setting, the likelihood of all these hyptohesis is necessarily 1, so the posterior probabilty just becomes the prior (divided by some factor), which is proportional to the length of the program. This makes sense because it is in agreement with Occam's razor.

The thing I still struggle to see is how I connect this framework with probabilistic hypothesis that I want to test, such as the data was generated by a fair coin. One possibility that I see (but I am not sure this is the correct thing) is testing all the possible strings generated by an algorithm like this:

while True:
   random.choices(["H","T"], k=6)

The likelihood of the strings like HHTHTH is 0 so we remove them and then we are left only with the algorithms that are consistent with the data.

Not totally sure of the last part

The part I understood is, that you weigh the programs based on the length in bits, the longer the program the less weight it has. This makes total sense.

I am not sure that I understand the prefix thing and I think that's relevant. For example, it is not clear to me if once I consider a program that outputs 0101 I will simply ignore other programs that output that same thing plus one bit (e.g. 01010).

I also find still fuzzy (and know at least I can put my finger on it) is the part where Solomonoff induction is extended to deal with randomness.

Let me see if I can make my question more specific:

Let's imagine for a second that we live in a universe where only the next programs could be written:

  • A) A program that produces deterministically a given sequence of five digits (there are 2^5 of this programs
  • B) A program that produces deterministically a given sequence of 6 digits (there are 2^6 of them)
  • C) A program that produces 5 random coin flips with p=0.5

The programs in A have 5 bits of Kolmogorov complexity each. The programs in B have 6 bits. The program C has 4

We observe the sequence O = HTHHT

I measure the likelihood for each possible model. I discard the models with L = 0

A) There is a model here with likelihood 1

B) There are 2 models here, each of them with likelihood 1 too

C) This model has likelihood 2^-5

Then, things get murky:

the priors for each models will be 2^-5 for model A, 2^-6 for model B and 2^-4 for model C, according to their Kolmogorov complexity? 

Yes, this is something I can see easily, but I am not sure how Solomonoff induction accounts for that 

I think this is pointing to what I don't understand: how do you account for hypotheses that explain data generated randomly? How do you compare a hypothesis which is a random number generator with some parameters against a hypothesis which has some deterministic component?

 Is there a way to understand this without reading the original paper (which will probably take me quite long)? 

When you understood this, how was your personal process that took you from knowing about probabilities and likelihood to understanding Solomonoff induction? Did you have to read the original sources or you found some good explanations somewhere?

I also don't get if this is a calculation that you can do in a single step or if this is a continuous thing. In other words, Solomonoff induction would work only if we assume that we keep observing new data? 

Sorry for the stupid questions, as you can see, I am confused.

I have followed a similar strategy using Anki cards. However, I think that allocating a specific time slot to review your principles and then "act" on then is probably much more effective than passively remind those principles. I will adopt this.

What happens if there is more than one powerful agent just playing the charade game? Is there any good article about what happens in a universe where multiple AGI are competing among them? I normally find only texts that consider that once we get AGI we all die so there is no room for these scenarios.

I have been (and I am not the only one) very put off by the trend in the last months/years of doomerism pervading LW, with things like "we have to get AGI right at the first try or we all die" repeated constantly as a dogma.

To someone who is very skeptical of the classical doomist position (aka AGI will make nanofactories and will kill everyone at once), this post is very persuasive and compelling. This is something I could see happening.  This post serves as an excellent example for those seeking effective ways to convince skeptics.

Load More