2 Answers sorted by
top scoring

Nov 19, 2021

For your second question, this paper describes the square root law, though in a somewhat different setting: Strong profiling is not mathematically optimal for discovering rare malfeasors | PNAS. (Incidentally, a friend of mine used this once in an argument against stop-and-frisk.)

It doesn't give a complete proof, though it describes it as a "straightforward minimization with a Lagrange multiplier".

[-]delton1374y10

Very cool, will take a look. This basically solves question 1. It seems the original Solomonoff work isn't published anywhere. By the way, the author, William H. Press, is a real polymath! I am curious if there is any extension of this work to agents with finite memory.. as an example, the same situation where you're screening a large number of people, but now you have a memory where you can store N results of prior screenings for reference. I'm going to look into it..

2gwern4y

Seems like a memory version would be identical, just with a smaller n after subtracting the individuals you screen. When you fill up your memory with cleared individuals, why would you then ever want to 'forget' them? By stipulation, you learn nothing about other individuals or the population, only about the ones you look at. If you forget them to replace them with a new memory, that de facto makes the n bigger, and worsens your odds since you've flushed back into the pool the only individuals you knew for sure you never want to sample again (because they are clear) and so now you may waste a sample to test them again while gaining nothing. And once you remove them from the population via your memory, you're back to the solved memoryless problem and have to square-root it.

simon

Nov 19, 2021

With respect to the second question the answer will depend on the discount rate. I expect Solomonoff is assuming that we are in the limit of low discount rate, where exponential decay will look linear, so essentially you are minimizing the expected total number of attempts.

I haven't done the math to confirm Somolonoff's answer, but if you were to go to each box with probability equal to it being correct, then your expected number of attempts would be equal to the number of boxes, since each box would have an expected number of attempts conditional on it being the right box equal to the the inverse of its probability. So this is no better than choosing randomly. With this in mind it seems intuitive that some intermediate strategy, such as square roots, would then be better.

[-]gwern4y60

https://www.reddit.com/r/compsci/comments/7yd765/reference_request_on_a_result_by_ray_solomonoff/

1 comment, sorted by

top scoring

Click to highlight new comments since: Today at 2:40 PM

[-]gwern4y*240

As far as #1 goes, it's worth pointing out that we do not use Lenat's systems like Cyc. So one answer to "if it works pretty well, why don't we do that everywhere" may simply be "but it doesn't work pretty well". (Like Rodney Brooks, Minsky was never a big fan of Bayesian or connectionist approaches - perhaps because they are both so depressingly infeasibly computationally expensive - and he's always looking for shortcuts or loopholes.)

A more serious answer is: sorting (ranking) is a pervasive trick throughout statistics and decision theory. Many nonparametric methods will involve ranking, and the improper linear model literature and stuff like 0/1 discretized variables can work well and even outperform humans. I recall Chris Stucchio's somewhat notorious "Why a pro/con list is 75% as good as your fancy machine learning algorithm" post in this vein too. Lenat's list trick there sounds a lot like the "player-the-winner" adaptive trial/bandit algorithm, where if a treatment works, you use it on the next iteration, else switch to a random other, which is a great deal simpler than Thompson sampling or other adaptive algorithms. It can also be seen as a policy gradient with very large fixed-sized updates.

The drawback is, as Stucchio's title indicates, the methods may work, but you are giving up a potentially lot of performance. In order statistics or nonparametric statistics, ranking gets you robustness to outliers because you throw away all the info aside from the ordering, but then that means that if you can model the data more sensibly, you can estimate everything more efficiently than the nonparametric rank-based approaches (you could look at estimates of the cost of a Wilcox U-test vs a t-test on normally-distributed data to quantify this, as estimating median/mean is something of a best case and the comparisons get worse from there). In Stucchio's checklist example, 25% of the total is a lot of utility to leave on the table! Play-the-winner is simple, but that is its only virtue, and there are a lot of analyses to the effect that it's a pretty bad adaptive algorithm compared to Waldian sequential testing or Thompson sampling or best-arm finding. One way to put it is that from a bandit perspective, the regret is much worse than the usual log(t) regret that principled methods enjoy. (I think the regret would be linear?) No matter how much data you accumulate, a bad round might send the optimal choice back to the end of the list. A policy gradient method which never decays its step sizes will never converge, and will just oscillate eternally. Then you enjoy additional problems like being unable to calculate Value of Information or do exploration because you don't have anything remotely like a standard error, much less a posterior... And how are you going to think about expected value and risk for planning if all you have is a sorted list of options?

Cox is not mocked. If you don't update in a Bayesian fashion, it's going to cost you somehow.

Moderation Log

LESSWRONG
LW

LESSWRONG
LW

1

[ Question ]

Does anyone know what Marvin Minsky is talking about here?

1

1

2 Answers sorted by
top scoring

Nov 19, 2021

Nov 19, 2021

1

[ Question ]

Does anyone know what Marvin Minsky is talking about here?

1

1

2 Answers sorted by top scoring

Nov 19, 2021

Nov 19, 2021

2 Answers sorted by
top scoring