YVLIAZ

00

There is a difference between "having an idea" and "solid theoretical foundations". Chemists before quantum mechanics had a lot of ideas. But they didn't have a solid theoretical foundation.

That's a bad example. You are essentially asking researchers to predict what they will discover 50 years down the road. A more appropriate example is a person thinking he has medical expertise after reading bodybuilding and nutrition blogs on the internet, vs a person who has gone through medical school and is an MD.

00

I think you are overhyping the PAC model. It surely is an important foundation for probabilistic guarantees in machine learning, but there are some serious limitations when you want to use it to constrain something like an AGI:

It only deals with supervised learning

Simple things like finite automata are not learnable, but in practice it seems like humans pick them up fairly easily.

It doesn't deal with temporal aspects of learning.

However, there are some modifications of the PAC model that can ameliorate these problems, like learning with membership queries (item 2).

It's also perhaps a bit optimistic to say that PAC-style bounds on a possibly very complex system like an AGI would be "quite doable". We don't even know, for example, whether DNF is learnable in polynomial time under the distribution-free assumption.
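To give a sense of what a PAC-style bound looks like in the easiest textbook setting, here is a minimal sketch. It assumes a finite hypothesis class, the realizable case, and a learner that outputs any hypothesis consistent with the sample; under those assumptions, m >= (ln|H| + ln(1/delta)) / epsilon samples suffice. The function name and the example numbers are mine, purely for illustration. Note that this only bounds the number of samples, not the computation needed to find a consistent hypothesis, which is where open questions like DNF come in.

```python
import math


def pac_sample_bound(hypothesis_count: int, epsilon: float, delta: float) -> int:
    """Samples sufficient for a consistent learner over a finite class.

    Assumes the realizable case: some hypothesis in the class has zero error.
    With this many samples, any consistent hypothesis has error at most
    epsilon with probability at least 1 - delta.
    """
    if hypothesis_count < 1:
        raise ValueError("hypothesis_count must be at least 1")
    if not (0 < epsilon < 1 and 0 < delta < 1):
        raise ValueError("epsilon and delta must lie strictly between 0 and 1")
    return math.ceil((math.log(hypothesis_count) + math.log(1 / delta)) / epsilon)


# Illustrative numbers only: a class of 2**20 hypotheses,
# 10% error tolerance, 95% confidence.
samples_needed = pac_sample_bound(2**20, epsilon=0.1, delta=0.05)
```

The bound grows only logarithmically in the size of the class, which is why sample complexity is often the easy part and running time the hard part.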

00

Yes, so the exact definition of "have-to" and "want-to" already presents some difficulties in pinpointing what exactly the theory says.

In my personal experience, it's not so much "fear" as fatigue and frustration. I also don't feel that my desire to read diminishes; it stays intense, but my brain just can't keep absorbing information, and I find myself rereading the same passages because I can't wrap my head around them.

80

I can see this theory working in several scenarios, despite (or perhaps rather *because of*) the relative fuzziness of its description (which is of course the norm in psychological theories so far). However, I have personal experiences that, at least at face value, this theory doesn't seem able to explain:

During my breaks I would read textbooks, mostly mathematics and logic, but also branching into biology/neuroscience, etc. I would begin with pleasure, but if I read the same book for too long (several days) my reading speed slows down and I start flipping a couple of pages ahead to see how far it is till the next section/chapter. So to me this seems not like a motivation shift from "have-to" to "want-to", but rather like the brain getting fatigued at parsing text/building its knowledge database. Subjectively I still *want to* keep reading, and advancing page by page still brings me pleasure, but there's something "biological" that holds me back (of course everything about me is biological, but I mean it in a metaphorical way: it feels quite distinct from the motivational system that makes me *want to* read).

Now I have found an easy way to snap out of it: simply switch the book/subject. Switching from math to biology/neuroscience works better than switching from math to math (e.g. algebra to topology, category theory to recursion theory, etc), but the latter can still relieve some of the mental resistance that has built up. I don't see how this can fit in the framework of "have-to" and "want-to". Nobody's forcing me to read these books; it's purely my desire. If the majority of executive function can be explained in the way expounded by the paper, then I do not see how switching the subject I'm reading can make such a big difference.

Of course I may be an outlier here, or I may be misunderstanding what does or does not constitute "willpower". Feel free to offer your opinions.

Either way, I'm glad that this is an active area of research. I'm quite interested in motivation myself.

00

I bought these with a 4 socket adapter. However, I think my lamp can't power them all. Does anyone know a higher output lamp?

Actually I'm not even sure if that is how lights work. If someone can explain how I can determine the power that goes to the light bulbs, it'd be greatly appreciated.

10

I'm going out on a limb on this one, but since the whole universe includes separate branching “worlds”, and over time this means we have more worlds now than 1 second ago, and since the worlds can interact with each other, how does this not violate conservation of mass and energy?

The "number" of worlds increases, but each world is weighted by a complex number (its amplitude), such that the squared magnitudes of all these amplitudes add up to 1. This effectively preserves mass and energy across all worlds, inside the universal wave function.
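Here is a toy sketch of that bookkeeping, not a physical simulation. It assumes each "world" is just a complex amplitude in a list, and that a branching event replaces one amplitude with several children whose splitting factors have squared magnitudes summing to 1. The names `total_weight` and `split_branch` and the particular factors are made up for illustration.

```python
import math


def total_weight(amplitudes):
    """Sum of squared magnitudes over all branches."""
    return sum(abs(a) ** 2 for a in amplitudes)


def split_branch(amplitudes, index, factors):
    """Replace the branch at `index` with one child branch per factor.

    The factors are assumed to have squared magnitudes summing to 1,
    so the children together carry exactly the parent's weight.
    """
    parent = amplitudes[index]
    children = [parent * f for f in factors]
    return amplitudes[:index] + children + amplitudes[index + 1:]


# Start with a single world carrying all the weight.
worlds = [complex(1, 0)]
# First branching: an even split with a relative phase.
worlds = split_branch(worlds, 0, [math.sqrt(0.5), 1j * math.sqrt(0.5)])
# Second branching: an uneven split (0.36 + 0.64 = 1) of one child.
worlds = split_branch(worlds, 1, [0.6, 0.8j])

branch_count = len(worlds)
weight = total_weight(worlds)
```

The number of branches goes from 1 to 3, while the total weight stays at 1 up to floating-point rounding.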

30

In contrast with the title, you did not show that the MWI is falsifiable nor testable.

I agree that he didn't show that it is testable, but rather the possibility of testing it (and the formalization of that).

You just showed that MWI is "better" according to your "goodness" index, but that index is not so good

There's a problem with choosing the language for Solomonoff/MML, so the index's goodness can be debated. However, I think in general the index is sound.

You calculate the probability of a theory and use this as an index of the "truthness" of it, but that's confusing the reality with the model of it.

I don't think he's saying that theories fundamentally have probabilities. Rather, as a Bayesian, he assigns a prior to each theory. As more evidence accumulates, the probability of the right theory gets updated and approaches 1.
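A minimal sketch of that updating process, under made-up numbers: two competing theories, where the one assumed to be right starts with a low prior of 0.1 but assigns each observation twice the likelihood that its rival does. The function name and all values are illustrative assumptions.

```python
def bayes_update(priors, likelihoods):
    """One step of Bayes' rule over a list of competing theories."""
    unnormalized = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Theory 0 is assumed to be the right one; it starts out disfavored.
beliefs = [0.1, 0.9]
for _ in range(20):
    # Each observation is twice as likely under theory 0 as under theory 1.
    beliefs = bayes_update(beliefs, [0.8, 0.4])
```

After 20 such observations the probability of theory 0 is very close to 1, despite the unfavorable prior.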

The reason human understanding can't be part of the equations is, as EY says, shorter "programs" are more likely to govern the universe than longer "programs," essentially because these "programs" are more likely to be written if you throw down some random bits to make a program that governs the universe.
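The "random bits" argument can be sketched numerically. This assumes programs are binary strings and that each bit is an independent fair coin flip, and it ignores details such as prefix-free encodings and the choice of language. The function names and the lengths of 100 and 110 bits are made up for illustration.

```python
def random_bits_probability(program_length_bits: int) -> float:
    """Chance that fair coin flips spell out one specific program of this length."""
    return 2.0 ** -program_length_bits


def prior_odds(shorter_bits: int, longer_bits: int) -> float:
    """How many times more probable the shorter program is a priori."""
    return 2.0 ** (longer_bits - shorter_bits)


# A theory that needs 10 extra bits to state is 2**10 times less probable a priori.
odds = prior_odds(100, 110)
```

So every extra bit a theory needs halves its prior weight, regardless of how intuitive the theory feels to a human reader.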

So I don't buy your arguments in the next section.

But that argument is not valid: we rejected the hypothesis that nebulae are not distant galaxies not because the Occam's Razor is irrelevant, but because we measured their distance and found that they are inside our galaxy; without this information, the simpler hypothesis would be that they are distant galaxies.

EY is comparing the angel explanation with the galaxies explanation; you are supposed to reject the angels and usher in the galaxies. In that case, the anticipations are truly the same. You can't really prove whether there are angels.

But how often does the Occam's Razor induce you to neglect a good model, as opposed to how often it let us neglect bad models?

What do you mean by "good"? Which one is "better" out of 2 models that give the same prediction? (By "model" I assume you mean "theory")

but indeed it is useful to represent effectively what are the results of the typical educational experiments, where the difference between "big" and "small" is in no way ambiguous, and allows you to familiarize fast with the bra-ket math.

You admit that Copenhagen is unsatisfactory but it is useful for education. I don't see any reason not to teach MWI in the same vein.

Now imagine Einstein did not develop the General Relativity, but we anyway developed the tools to measure the precession of Mercury and we have to face the inconsistency with our predictions through Newton's Laws: the analogous of the CI would be "the orbit of Mercury is not the one anticipated by Newton's Laws but this other one, now if you want to calculate the transits of Mercury as seen from the Earth for the next million years you gotta do THIS math and shut up"; the analogous of the MWI would be something like "we expect the orbit of Mercury to precede at this rate X but we observe this rate Y; well, there is another parallel universe in which the preceding rate of Mercury is Z such that the average between Y and Z is the expected X due to our beautiful indefeasible Newton's Law".

If indeed the expectation value of observable V of Mercury is X but we observe Y with Y ≠ X (that is to say that the variance of V is nonzero), then there isn't a determinate formula to predict V exactly in your first Newton/random formula scenario. At the same time, someone who holds the Copenhagen interpretation would have the same expectation value X, but instead of saying there's another world he says there's a wave function collapse. I still think that the parallel world is a result deduced from the universal wave function, superposition, decoherence, etc., all of which Copenhagen also recognizes. So the Copenhagen view essentially says "actually, even though the equations say there's another world, there is none, and on top of that we are gonna tell you how this collapsing business works". This extra sentence is what causes the Razor to favor MWI.

Much of what you are arguing seems to stem from your dissatisfaction with the formalization of Occam's Razor. Do you still feel that we should favor something like human understanding of a theory over the probability of a theory being true based on its length?

I just want to point out some nuances.

1) The divide between your so-called "old CS" and "new CS" is more of a divide (or perhaps a continuum) between engineers and theorists. The former are concerned with on-the-ground systems, where quadratic time algorithms are costly and statistics is the better weapon for dealing with real world complexities. The latter are concerned with abstracted models where polynomial time is good enough and logical deduction is the only tool. These models will probably never be applied literally by engineers, but they provide human understanding of engineering problems, and because of their generality, they will last longer. The idea of a Turing machine will last centuries if not millennia, but a Pascal programmer might not find a job today and a Python programmer might not find a job in 20 years. Machine learning techniques constantly come in and out of vogue, but something like the PAC model will be here to stay for a long time. But of course at the end of the day it's engineers who realize new inventions and technologies.

Theorists' ideas can transform an entire engineering field, and engineering problems inspire new theories. We need both types of people (or rather, people across the spectrum from engineers to theorists).

2) With neural networks increasing in complexity, making the learning converge is no longer as simple as just running gradient descent. In particular, something like a K12 curriculum will probably emerge to guide the AGI past local optima. For example, the recent paper on neural Turing machines has already employed curriculum learning, as the authors couldn't get good performance otherwise. So there is a nontrivial maintenance cost (in designing a curriculum) to a neural network so that it adapts to a changing environment, which will not lessen if we don't better our understanding of it.

Of course expert systems also have maintenance costs, of a different type. But my point is that neural networks are not free lunches.

3) What caused the AI winter was that AI researchers didn't realize how difficult it was to do what seems so natural to us --- motion, language, vision, etc. They were overly optimistic because they succeeded in what was difficult for humans --- chess, math, etc. I think it's fair to say that ANNs have "swept the board" in the former category, the category of lower level functions (machine translation, machine vision, etc), but the high level stuff is still predominantly logical systems (formal verification, operations research, knowledge representation, etc). It's unfortunate that the neural camp and the logical camp don't interact much, but I think it is a major objective to combine the flexibility of neural systems with the power and precision of logical systems.

Schmidhuber invented something called the speed prior that weighs an algorithm according to how fast it generates the observation, rather than how simple it is. He makes some ridiculous claims about our (physical) universe assuming the speed prior. Ostensibly one can also weigh in accuracy of approximation in there to produce another variant of prior. (But of course all of these will lose the universality enjoyed by the Occam prior)