steven0461's Shortform Feed

by steven0461 · 1 min read · 30th Jun 2019 · 17 comments
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

This is where I'll put content that's too short for a whole post.


Considering how much people talk about superforecasters, how come there aren't more public sources of superforecasts? There are prediction markets and sites like ElectionBettingOdds that make it easier to read their odds as probabilities, but only for limited questions. There's Metaculus, but it only shows a crowd median (with a histogram of predictions) and in some cases the result of an aggregation algorithm that I don't trust very much. There's PredictionBook, but it's not obvious how to extract a good single probability estimate from it. Both prediction markets and Metaculus are competitive and disincentivize public cooperation. What else is there if I want to know something like what the probability of war with Iran is?

I think the Metaculus crowd median is among the highest-quality predictions out there, especially when someone goes through all the questions where they're confident the median is off and makes comments pointing this out. I used to do this some months back, when there were more short-term questions on Metaculus and more questions where I differed from the community. When you made a bunch of comments of this type a month back on Metaculus, that covered most of the 'holes', in my opinion, and now there are only a few questions where I differ from the median prediction.

Another source of predictions is the IARPA Geoforecasting Challenge, where if you're competing you have access to hundreds of MTurk human predictions through an API. The quality of the predictions is not as great, and there are some questions where the MTurk crowd is way off. But they do have a question on whether Iran will execute or be targeted in a national military attack.

I agree that it's quite possible to beat the best publicly available forecasts. I've been wanting to work together on a small team to do this (where I imagine the same set of people would debate and make the predictions). If anyone's interested in this, I'm datscilly on Metaculus and can be reached at [my name] at gmail.

Maybe Good Judgment Open? I don't know how they actually get their probabilities though.

I think one could greatly outperform the best publicly available forecasts through collaboration between 1) some people good at arguing and looking for info and 2) someone good at evaluating arguments and aggregating evidence. Maybe just a forum thread where a moderator keeps a percentage estimate updated in the top post.

I would trust the aggregation algorithm on Metaculus more than an average (mostly because its performance is evaluated against an average). So I think that's usually pretty decent.

I would normally trust it more, but it's recently been doing way worse than the Metaculus crowd median (average log score 0.157 vs 0.117 over the sample of 20 yes/no questions that have resolved for me), and based on the details of the estimates that doesn't look to me like it's just bad luck. It does better on the whole set of questions, but I think still not much better than the median; I can't find the analysis page at the moment.

"based on the details of the estimates that doesn't look to me like it's just bad luck"

For example:

  • There's a question about whether the S&P 500 will end the year higher than it began. When the question closed, the index had increased from 2500 to 2750. The index has increased most years historically. But the Metaculus estimate was about 50%.
  • On this question, at the time of closing, 538's estimate was 99+% and the Metaculus estimate was 66%. I don't think Metaculus had significantly different information than 538.
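To put a rough number on how much a gap like 66% versus 99% matters, here is a minimal sketch of a log score comparison. It assumes the natural-log scoring rule (the log of the probability assigned to what actually happened, so closer to zero is better), which may not be the exact convention behind the 0.157 and 0.117 figures quoted earlier in the thread. It treats "99+%" as exactly 0.99, and it shows both possible outcomes because the thread does not say how the question resolved.

```python
import math

def log_score(p_yes, outcome):
    """Natural-log score of a yes/no forecast: ln of the probability
    assigned to the outcome that occurred. Closer to 0 is better.
    (Assumed convention; a site's own log score may be scaled differently.)"""
    p_assigned = p_yes if outcome else 1.0 - p_yes
    return math.log(p_assigned)

# Probabilities taken from the second bullet; 0.99 stands in for "99+%".
forecasts = {"538": 0.99, "Metaculus": 0.66}

for outcome in (True, False):
    for name, p_yes in forecasts.items():
        score = log_score(p_yes, outcome)
        print(f"resolves {'yes' if outcome else 'no'}: {name} scores {score:.2f}")
# resolves yes: 538 scores -0.01
# resolves yes: Metaculus scores -0.42
# resolves no: 538 scores -4.61
# resolves no: Metaculus scores -1.08
```

Under this rule the more confident forecast gains about 0.4 per question if it is right and loses about 3.5 if it is wrong, so over a sample of 20 questions a handful of underconfident estimates like these could plausibly move an average noticeably.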

Online posts function as hard-to-fake signals of readiness to invest verbal energy into arguing for one side of an issue. This gives readers the feeling they won't lose face if they adopt the post's opinion, which overlaps with the feeling that the post's opinion is true. This function sometimes makes posts longer than would be socially optimal.

Newcomb's Problem sometimes assumes Omega is right 99% of the time. What is that conditional on? If it's just a base rate (Omega is right about 99% of people), what happens when you condition on having particular thoughts and modeling the problem on a particular level? (Maybe there exists a two-boxing lesion and you can become confident you don't have it.) If it's 99% conditional on anything you might think, e.g. because Omega has a full model of you but gets hit by a cosmic ray 1% of the time, isn't it clearer to just assume Omega gets it 100% right? Is this explained somewhere?
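For reference, a minimal sketch of the usual expected-payoff arithmetic under the second reading, where the 99% holds conditional on whatever you actually think and choose. The payoffs ($1,000,000 in the opaque box, $1,000 in the transparent one) are the commonly used values and are an assumption here, since the comment above does not state them. Under the base-rate reading this calculation would not straightforwardly apply, which is the point of the question.

```python
from fractions import Fraction

# Assumed standard payoffs; not given in the comment above.
OPAQUE = 1_000_000
TRANSPARENT = 1_000

def expected_payoffs(accuracy):
    """Return (one_box, two_box) expected payoffs.

    Assumes `accuracy` is P(Omega predicted correctly | your actual choice),
    i.e. the figure holds no matter what you think or choose.
    """
    one_box = accuracy * OPAQUE                      # box is full iff Omega was right
    two_box = (1 - accuracy) * OPAQUE + TRANSPARENT  # box is full iff Omega was wrong
    return one_box, two_box

def break_even_accuracy():
    """Accuracy at which both choices have the same expected payoff."""
    return Fraction(OPAQUE + TRANSPARENT, 2 * OPAQUE)

one_box, two_box = expected_payoffs(Fraction(99, 100))
print(one_box, two_box)        # 990000 11000
print(break_even_accuracy())   # 1001/2000
```

With these payoffs the comparison comes out the same way at 99% as at 100% (where it is 1,000,000 versus 1,000), and only flips near 50.05% accuracy, so on this reading the 1% error rate does not seem to change the structure of the problem.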

There's been some discussion of tradeoffs between a group's ability to think together and its safety from reputational attacks. Both of these seem pretty essential to me, so I wish we'd move in the direction of a third option: recognizing public discourse on fraught topics as unavoidably farcical as well as often useless, moving away from the social norm of acting as if a consideration exists if and only if there's a legible Post about it, building common knowledge of rationality and strategic caution among small groups, and in general becoming skilled at being esoteric without being dishonest or going crazy in ways that would have been kept in check by larger audiences. I think people underrate this approach because they understandably want to be thought gladiators flying truth as a flag. I'm more confident of the claim that we should frequently acknowledge the limits of public discourse than the other claims here.

I don't think there's a general solution. Eliezer's old quote "The AI does not hate you, nor does it love you, but you are made out of atoms which it can use for something else." applies to social movements and discussion groups just as well. It doesn't matter if you're on the right or the wrong side - you have attention and resources that the war can use for something else.

There _may_ be an option to be on neither side, and just stay out. Most often, that's only available to those with no useful resources, or that can plausibly threaten both sides.

How much should I worry about the unilateralist's curse when making arguments that it seems like some people should have already thought of and that they might have avoided making because they anticipated side effects that I don't understand?

In most domains, people don't make arguments either because they think the arguments aren't strong or because making them would lose them social status.

The cases where an argument carries with it real danger are relatively few, and in most of those cases it should be possible to know that you are in a problematic area. In those cases, you should first make arguments nonpublicly to people who you consider to be good judges of whether those arguments should be made publicly.

A naive argument says the influence of our actions on the far future is ~infinity times as intrinsically important as the influence of our actions on the 21st century because the far future contains ~infinity times as much stuff. One limit to this argument is that if 1/1,000,000 of the far future stuff is isomorphic to the 21st century (e.g. simulations), then having an influence on the far future is "only" a million times as important as having the exact same influence on the 21st century. (Of course, the far future is a very different place so our influence will actually be of a very different nature.) Has anyone tried to get a better abstract understanding of this point or tried to quantify how much it matters in practice?

I haven't thought about it much, but it seems like the fraction of far future stuff isomorphic to the 21st century is probably fairly negligible from a purely utilitarian viewpoint, because the universe is so big that even using 1/1,000,000 of it for simulations would be a lot of simulations, and why would the far future want that many simulations of the 21st century? It doesn't seem like a good use of resources to do that many duplicate historical simulations in terms of either instrumental value or terminal value.

I guess I wasn't necessarily thinking of them as exact duplicates. If there are 10^100 ways the 21st century can go, and for some reason each of the resulting civilizations wants to know how all the other civilizations came out when the dust settled, each civilization ends up having a lot of other civilizations to think about. In this scenario, an effect on the far future still seems to me to be "only" a million times as big as the same effect on the 21st century, only now the stuff isomorphic to the 21st century is spread out across many different far future civilizations instead of one.

Maybe 1/1,000,000 is still a lot, but I'm not sure how to deal with uncertainty here. If I just take the expectation of the fraction of the universe isomorphic to the 21st century, I might end up with some number like 1/10,000,000 (because I'm 10% sure of the 1/1,000,000 claim) and still conclude the relative importance of the far future is huge but hugely below infinity.
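The arithmetic in the last paragraph can be sketched as follows. This reads "10% sure of the 1/1,000,000 claim" as putting the remaining 90% of credence on a fraction of zero, which is an assumption but is what the 1/10,000,000 figure implies. The `relative_importance` helper is hypothetical and just encodes the thread's simplification that the far future counts for 1/fraction times as much as the 21st century.

```python
from fractions import Fraction

def relative_importance(fraction_isomorphic):
    """Importance of influencing the far future relative to the same
    influence on the 21st century, under the thread's simplification
    that it equals 1 / (fraction of far-future stuff isomorphic to
    the 21st century)."""
    return 1 / fraction_isomorphic

claimed_fraction = Fraction(1, 1_000_000)
credence = Fraction(1, 10)  # "10% sure" of the claim

# Assumption: the other 90% of credence is on a fraction of exactly zero.
expected_fraction = credence * claimed_fraction + (1 - credence) * 0

print(expected_fraction)                        # 1/10000000
print(relative_importance(claimed_fraction))    # 1000000
print(relative_importance(expected_fraction))   # 10000000
```

Note that the order of operations matters: averaging the fraction first and then inverting gives a finite ratio, whereas averaging the ratio itself would be dominated by the zero-fraction case, where the ratio is unbounded. That may be part of why it is unclear how to handle the uncertainty here.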

Has anyone tried to make complex arguments in hypertext form using a tool like Twine? It seems like a way to avoid the usual mess of footnotes and disclaimers.