I listened to a few Marvin Minsky lectures a few weeks ago. Now I'm trying to go back and find more information on two things he discussed in Lecture 3, "Cognitive Architectures", given in Fall 2011. Sorry to offload this on people here, but I have very little idea how to search for more information (I tried a few things on Google and Google Scholar without any luck).
Here's quote #1 from 1:08:21:
"I think I mentioned Doug Lenat's rule. Some people will assign probabilities to things, to behaviors, and then pick the way to react in proportion to the probability that that thing has worked in the past. And Doug Lenat thought of doing that, but instead, he just put the things in a list. And whenever a hypothesis worked better than another one, he would raise it, push it toward the front of the list. And then whenever there was a choice, he would pick-- of all the rules that fit, he would pick the one at the top of the list. And if that didn't work, it would get demoted. So that's when I became an anti-probability person. That is, if just sorting the things on a list worked pretty well, are probabilities going to do much better?"
This sounds fascinating. It sounds much more computationally feasible than keeping track of probabilities and doing Bayesian updating. Does anyone have references for further work along these lines, i.e., specifically on keeping a small ranked list of hypotheses rather than computing probabilities over a large number of hypotheses? [By the way, Minsky did mention Lenat in Lecture 2, but it was very brief and didn't contain any useful information.] [Searching through Douglas Lenat's work is quite a headache because he has decades of publications, most of them on esoteric rule-based systems.]
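To make the scheme in the quote concrete, here is a minimal Python sketch. The quote doesn't say how far a rule moves, so two details are my own assumptions: a rule that works is promoted one step toward the front, and a rule that fails is sent to the back of the list (a one-step demotion can leave two failing rules trading places at the top indefinitely). The `Rule` and `RankedRules` names are hypothetical, not Lenat's code.

```python
from typing import Callable, List, Optional


class Rule:
    """Hypothetical rule: a name plus a test for whether it fits a situation."""

    def __init__(self, name: str, fits: Callable[[dict], bool]):
        self.name = name
        self.fits = fits


class RankedRules:
    """Rules kept in an ordered list; list position, not a probability, encodes trust."""

    def __init__(self, rules: List[Rule]):
        self.rules = list(rules)

    def choose(self, situation: dict) -> Optional[Rule]:
        # Of all the rules that fit, pick the one nearest the front.
        for rule in self.rules:
            if rule.fits(situation):
                return rule
        return None

    def update(self, rule: Rule, worked: bool) -> None:
        i = self.rules.index(rule)
        if worked:
            # Assumed promotion: move one step toward the front.
            if i > 0:
                self.rules[i - 1], self.rules[i] = self.rules[i], self.rules[i - 1]
        else:
            # Assumed demotion: send the failed rule to the back of the list.
            self.rules.append(self.rules.pop(i))

    def order(self) -> List[str]:
        return [rule.name for rule in self.rules]


if __name__ == "__main__":
    always = lambda situation: True
    ranked = RankedRules([Rule("a", always), Rule("b", always), Rule("c", always)])
    for _ in range(3):
        chosen = ranked.choose({})
        # Hypothetical environment in which only rule "c" ever works.
        ranked.update(chosen, worked=(chosen.name == "c"))
    print(ranked.order())  # ['c', 'a', 'b']
```

The only state is the ordering itself: after three choices the one rule that actually works sits at the front, and no probability was ever stored or updated.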
Here's quote #2 from 1:09:37: (lightly edited for readability)
"Ray Solomonoff discovered that if you have a set of probabilities that something will work, and you have no memory... I think I mentioned that the other day, but it's worth emphasizing, because nobody in the world seems to know it. Suppose you have a list of things, p equals this, or that, or that. In other words, suppose there's 100 boxes here, and one of them has a gold brick in it, and the others don't. And so for each box, suppose the probability is 0.9 that this one has the gold brick, and this one has 0.01. And this has 0.01. Let's see, how many of them-- so there's 10 of these. Now, what should you do? Suppose you're allowed to keep choosing a box, and you want to get your gold brick as soon as possible. What's the smart thing to do? ... you have no memory. Maybe the gold brick is decreasing in value, I don't care. So should you keep trying 0.9 if you have no memory? Of course not. Because if you don't get it the first time, you'll never get it. Whereas if you tried them at random each time, then you'd have 0.9 chance of getting it, so in two trials, you'd have-- what am I saying? In 100 trials, you're pretty sure to get it, but in [? e-hundred ?] trials, almost certain. So if you don't have any memory, then probability matching is not a good idea. Certainly, picking the highest probability is not a good idea, because if you don't get it the first trial, you'll never get it. If you keep using the probabilities at-- what am I saying? Anyway, what do you think is the best thing to do? It's to take the square roots of those probabilities, and then divide them by the sum of the square roots so it adds up to 1. So a lot of psychologists design experiments until they get the rat to match the probability. And then they publish it. ... but if the animal is optimal and doesn't have much memory, then it shouldn't match the probability of the unknown. It should-- end of story.
"Every now and then, I search every few years to see if anybody has noticed this thing -- and I've never found it on the web."
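Here is a small numerical check of the three strategies in the quote, under assumptions of my own. "No memory" is read as: every draw opens box $i$ with a fixed probability $q_i$, independent of earlier misses, so if the gold is in box $i$ the wait is geometric with mean $1/q_i$ and the expected number of draws is $\sum_i p_i / q_i$. The numbers as spoken don't quite add up (100 boxes, then ten at 0.01), so the sketch uses a hypothetical prior of one box at 0.9 and ten boxes at 0.01. All function names are hypothetical.

```python
import math
import random
from typing import List


def expected_trials(p: List[float], q: List[float]) -> float:
    """Expected number of memoryless draws until the gold box is opened.

    p[i]: prior probability that the gold is in box i.
    q[i]: probability of opening box i on every draw, independent of past draws.
    Given the gold is in box i, the wait is geometric with mean 1 / q[i].
    """
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue
        if qi == 0:
            return math.inf  # the gold may be in a box that is never opened
        total += pi / qi
    return total


def normalize(weights: List[float]) -> List[float]:
    s = sum(weights)
    return [w / s for w in weights]


def always_best(p: List[float]) -> List[float]:
    """Always open the single most likely box."""
    best = max(range(len(p)), key=lambda i: p[i])
    return [1.0 if i == best else 0.0 for i in range(len(p))]


def probability_matching(p: List[float]) -> List[float]:
    """Open each box with probability equal to its prior."""
    return normalize(p)


def square_root_rule(p: List[float]) -> List[float]:
    """Open each box with probability proportional to sqrt(prior)."""
    return normalize([math.sqrt(x) for x in p])


def simulate(p: List[float], q: List[float], runs: int, rng: random.Random) -> float:
    """Monte Carlo estimate of the mean number of draws, to cross-check expected_trials."""
    boxes = range(len(p))
    total = 0
    for _ in range(runs):
        gold = rng.choices(boxes, weights=p)[0]
        if q[gold] == 0:
            return math.inf  # this strategy can never find the gold
        draws = 1
        while rng.choices(boxes, weights=q)[0] != gold:
            draws += 1
        total += draws
    return total / runs


if __name__ == "__main__":
    # Hypothetical prior: one box at 0.9 and ten boxes at 0.01 each.
    p = [0.9] + [0.01] * 10
    for name, rule in [("always best", always_best),
                       ("matching", probability_matching),
                       ("square root", square_root_rule)]:
        print(f"{name:>12}: {expected_trials(p, rule(p)):.3f}")
```

With this prior, always opening the 0.9 box has an infinite expected wait (with probability 0.1 the gold is never found). Probability matching averages exactly 11 draws, one per box with nonzero prior. The square-root rule needs $(\sqrt{0.9} + 1)^2 \approx 3.80$.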
It's definitely not obvious to me why taking the square roots of the probabilities is the optimal choice here. I'm curious to know more about what Solomonoff was up to. I tried searching for it but couldn't find anything.
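For what it's worth, the square root does come out of a short Lagrange-multiplier calculation under one reading of "no memory". I'm assuming each trial independently opens box $i$ with a fixed probability $q_i$, and $p_i$ is the prior probability that the gold is in box $i$. This is my own reconstruction, not necessarily Solomonoff's argument:

$$
\begin{aligned}
\mathbb{E}[T] &= \sum_i \frac{p_i}{q_i}, \qquad \text{subject to } \sum_i q_i = 1, \\
\frac{\partial}{\partial q_i}\Big(\sum_j \frac{p_j}{q_j} + \lambda \sum_j q_j\Big) &= -\frac{p_i}{q_i^2} + \lambda = 0
\;\Rightarrow\; q_i = \sqrt{p_i / \lambda} \propto \sqrt{p_i}, \\
q_i^\ast &= \frac{\sqrt{p_i}}{\sum_j \sqrt{p_j}}, \qquad
\mathbb{E}[T]_{\min} = \Big(\sum_i \sqrt{p_i}\Big)^2 .
\end{aligned}
$$

Each term $p_i / q_i$ is convex for $q_i > 0$, so this stationary point is the minimum. For comparison, probability matching ($q_i = p_i$) gives $\mathbb{E}[T] = n$, the number of boxes with nonzero prior. By Cauchy-Schwarz, $\big(\sum_i \sqrt{p_i}\big)^2 \le n \sum_i p_i = n$, so under this model the square-root rule never does worse than matching.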