Prediction-Augmented Evaluation Systems

[-]habryka7y50

Promoted to curated: I think this post is the best summary of a bunch of important ideas that have been floating around, related to a bunch of things that Ought is doing, and also to a bunch of Paul's AI Alignment agenda.

I generally like the structure of the post, and also appreciate the addition of a video. I do think that the post is still somewhat hard to read and is I think unnecessarily dry, which made me a bit hesitant to curate it, and is probably also the reason why it's at a relatively low karma score.

I am very excited about people implementing more projects in this space, and think this serves as a quite good overview over potential applications and considerations. Thanks a lot for writing it!

[-]ozziegooen7y30

Thanks for the feedback! I was unsure about the structure; my main goals here was to set up a categorization system and have information explained, even if it wasn't particularly understandable. I'll mess around with other techniques in future posts.

[-]Charlie Steiner7y50

This reminds me of boosted decision trees. In fact, boosting translates very well from aggregating decision trees to aggregating human judgment.

[-]rk7y30

Thanks for the video! I had already skimmed this post when I noticed it, and then I watched it and reread the post. Perhaps my favourite thing about it was that it was slightly non-linear (skipping ahead to the diagram, non-linearity when covering sections).

Could you say a bit more about your worries with (scaling) prediction markets?

Do you have any thoughts about which experiments have the best expected information value per $?

[-]ozziegooen7y10

I'm not too optimistic about traditional prediction markets, I have feelings similar to Zvi. I haven't seen prediction markets be well subsidized for even a few dozen useful variables; in prediction augmented evaluation systems they would have to be done for thousands+ variables. They seem like more overhead per variable then simply stating one's probability and moving on.

My next step is just messing around a lot with my own prediction application and seeing what seems to work. I plan to gradually invite people, but let them mostly do their own testing. At this point, I want to get an intuitive idea of what seems useful, similar to my experiences making other experimental applications. I'm really not sure what ideas I may come up with, with more experimentation.

That said, I am particularly excited about estimating expected values of things, but realize I may not be able to make all of these public, or may have to keep things very apolitical. I expect it to be really easy to anger people if estimates that are actually important are public.

https://www.lesswrong.com/posts/a4jRN9nbD79PAhWTB/prediction-markets-when-do-they-work

[-]rk7y20

On estimating expected value, I'm reminded by some of Hanson's work where he suggests predicting later evaluation (recent example: http://www.overcomingbias.com/2018/11/how-to-fund-prestige-science.html). I think this is an interesting subcase of the evaluating subprocess. It also fits nicely with this post by PC

[-]ozziegooen7y20

Good find. I didn't see that post (it came out a day after I published this, coincidentally). I'm surprised it came out so recently but imagine he probably had similar ideas, and likely wrote them down, much earlier. I definitely recommend it for more details on the science aspect.

From the post: "For each scientific paper, there is a (perhaps small) chance that it will be randomly chosen for evaluation in, say, 30 years. If it is chosen, then at that time many diverse science evaluation historians (SEH) will study the history of that paper and its influence on future science, and will rank it relative to its contemporaries. To choose this should-have-been prestige-rank, they will consider how important was its topic, how true and novel were its claims, how solid and novel were its arguments, how influential it actually was, and how influential it would have been had it received more attention.

....

Using these assets, markets can be created wherein anyone can trade in the prestige of a paper conditional on that paper being later evaluated. Yes traders have to wait a long time for a final payoff. But they can sell their assets to someone else in the meantime, and we do regularly trade 30 year bonds today. Some care will have to be taken to make sure the base asset that is bet is stable, but this seems quite feasible."

[-]ozziegooen7y30

If you have other ideas for things to be evaluated / other uses, please post them below!

[-]ryan_b7y70

Kahneman has a consultancy now where they do noise audits, which are about making professional assessments more consistent. They mean things like loan officers, sales pricing, assorted risk evaluation.

He also mentions using ‘reasoned rules’, which are an even weighting of all of the variables. This can be done with simple algorithms, and provided the correct variables are chosen this meets or exceeds expert performance in most domains.

[-]ozziegooen7y20

Interesting. Looks like a book is coming out too: https://www.thebookseller.com/news/william-collins-scoops-kahnemans-book-7-figure-pre-empt-752276

[-]rk7y20

I'm interested in the predictors' incentives.

One problem with decision markets is that you only get paid for your information about an option if the decision is taken, which can incentivise you to overstate the case for an option (if you see its predicted benefit X, its true benefit is X+k and it would have to be at X+k+l to be chosen, if l < k, you will want to move the predicted benefit to X+k+l and make a k-l profit).

Maybe you avoid this if you pay for participation in PAES, but then you might risk people piling on to obvious judgments to get paid. Maybe you evaluate the counterfactual shift in confidence from someone making a judgment, and reward accordingly? But then it seems possible that the problems in the previous paragraph would appear again.

[-]ozziegooen7y20

I'm happy to talk theoretically, though have the suspicion that there are a whole lot of different ways to approach this problem and experimentation really is the most tractable way to make progress on it.

That said, ideally, a prediction system would include ways of predicting the EVs of predictions and predictors, and people could get paid somewhat accordingly; in this world, high-EV predictions would be ones which may influence decisions counterfactually. You may be able to have a mix of judgments from situations that will never happen, and ones that are more precise but only applicable to ones that do.

I would be likewise suspicious that naive decision markets that use one or two techniques like that would be enough to really make a system robust, but could imagine those ideas being integrated with others for things that are useful.

LESSWRONG
LW

LESSWRONG
LW

44

Prediction-Augmented Evaluation Systems

44

44

Desiderata:

Prediction-Augmentation Example

Subcomponents

Scaling & Amplification

Existing/Possible Variants

Possible Uses

Related Work