Thoughts on the SPIES Forecasting Method?

So, what do you think? Does this method seem at all promising? I'm debating with myself whether I should begin using SPIES on Metaculus or elsewhere.

I'm not super impressed, tbh. I don't see "give a 90% confidence interval for x" as a question that comes up frequently. (At least in the context of eliciting forecasts and estimates from humans - it comes up quite a bit in data analysis.)

For example, I don't really understand how you'd use it as a method on Metaculus. Metaculus has 2 question types - binary and continuous. For binary you have to give the prob... (read more)

2022 ACX predictions: market prices

17. Unemployment below five percent in December:

73 (Kalshi said 92% that unemployment never goes above 6%; 49 from Manifold)

I'm not sure exactly how you're converting 92% unemployment < 6% to < 5%, but I'm not entirely convinced by your methodology?

15. The Fed ends up doing more than its currently forecast three interest rate hikes:

None (couldn't find any markets)

Looking at the SOFR Dec-22 3M futures 99.25/99.125 put spread on the 14-Feb, I put this probability at ~84%.
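For readers unfamiliar with the mechanics, here's a minimal sketch of how a tight vertical spread price maps to an implied probability. The numbers below are hypothetical, not the actual SOFR quotes:

```python
def spread_implied_prob(price_paid: float, strike_width: float) -> float:
    """A tight vertical spread pays out its full width if the underlying
    settles beyond both strikes (and ~nothing otherwise), so
    price / width approximates the risk-neutral probability of that outcome."""
    return price_paid / strike_width

# e.g. paying 0.105 for a put spread with strikes 0.125 apart
print(round(spread_implied_prob(0.105, 0.125), 2))  # 0.84
```

The approximation is better the tighter the spread, since the payoff then looks more like a digital option struck at the midpoint.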

Thanks for doing this, I started doing it before I saw your competition an... (read more)


Thanks for this feedback!
Re 17: You are right to be skeptical, because my methodology for this one was silly and ad hoc. I somewhat arbitrarily turned a 92% chance that unemployment never goes above 6% into an 80% chance that unemployment isn't above 5% in December. This is completely unprincipled, but I didn't have any better ideas, and the alternative was to ignore the Kalshi market completely and defer entirely to the 5 bettors on Manifold, which seemed worse. If you have a more reasonable way of getting a number here, I'll happily defer to it.
Re 15: Thanks! I'll edit that number in and point to your comment.
Thanks also for the work you put into doing this last year! That post (along
with Zvi's re-predictions) led to me running a small prediction contest with a
handful of friends. That went well, was a lot of fun, and straightforwardly grew
into me asking Scott if he wanted me and Eric to run the same thing for the ACX
community. So, making up some numbers and hoping I can use Shapley values
correctly, I estimate that you get 40% of the credit for this year's prediction
contest happening.

Capturing Uncertainty in Prediction Markets

And one way to accomplish that would be to bet on what percentage of bets are on "uncertainty" vs. a prediction.

How do you plan on incentivising people to bet on "uncertainty"? All the ways I can think of lead to people either gaming the index, or turning uncertainty into a KBC.

Capturing Uncertainty in Prediction Markets

The market and most of the indicators you mentioned would be dominated by the 60 that placed large bets

I disagree with this. Volatility, liquidity, # predictors, spread of forecasts will all be affected by the fact that 20 people aren't willing to get involved. I'm not sure what information you think is being lost by people stepping away? (I guess the difference between "the market is wrong" and "the market is uninteresting"?)


What is being lost is related to your intuition in the earlier comment:
Without knowing how many people of the "I've studied this subject, and still
don't think a reasonable prediction is possible" variety didn't participate in
the market, it's very hard to place any trust in it being the "right" price.
This is similar to the "pundit" problem where you are only hearing from the most opinionated people. If 60 nutritionists are on TV and writing papers saying eating fats is bad, you may draw the "wrong" conclusion from that, because unknown to you, 40 nutritionists believe "we just don't know yet". And these 40 are provided no incentives to say so.
Take the Russia-Kiev question
[https://www.metaculus.com/questions/9459/russian-troops-in-kiev-in-2022/] on
Metaculus which had a large number of participants. It hovered at 8% for a long time. If prediction markets are to be useful beyond just pure speculation, that market didn't tell me how many knowledgeable people thought an opinion was simply not possible.
The ontological skepticism signal is missing - people saying there is no right
or wrong that "exists" - we just don't know. So be skeptical of what this market
says.
As for KBC - most markets allow you to change/sell your bet before the event happens, especially for longer-term events. So my guess is that this is already happening. In fact, the uncertainty index would separate out much of the "What do other people think?" element into its own question.
For locked in markets like ACX where the suggestion is to leave your prediction
blank if you don't know, imagine every question being paired with "What
percentage of people will leave this prediction blank?"

Capturing Uncertainty in Prediction Markets

There are a bunch of different metrics which you could look at on a prediction market / prediction platform to gauge how "uncertain" the forecast is:

- Volatility - if the forecast is moving around quite a bit, there are two reasons:
- Lots of new information arriving and people updating efficiently
- There is little conviction around "fair value" so traders can move the price with little capital

- Liquidity - if the market is 49.9 / 50.1 in millions of dollars, then you can be fairly confident that 50% is the "right" price. If the market is 40 / 60 with $1 on t


All these indicators are definitely useful for a market observer. And betting on
these indicators would make for an interesting derivatives market - especially
on higher volume questions. The issue I was referring to is that all these
indicators are still only based on traders who felt certain enough to bet on the
market.
Say 100 people who have researched East-Asian geopolitics saw the question "Will
China invade Taiwan this year?". 20 did not feel confident enough to place a
bet. Of the remaining 80 people, 20 bet small amounts because of their lack of
certainty.
The market and most of the indicators you mentioned would be dominated by the 60
that placed large bets. A LOT of information about uncertainty would be lost.
And this would have been fairly useful information about an event.
The goal would be to capture the uncertainty signal of the 40 that did not place
bets, or placed small bets. One way to do that would be to make "uncertainty"
itself a bettable property of the question. And one way to accomplish that would
be to bet on what percentage of bets are on "uncertainty" vs. a prediction.

Prediction Markets are for Outcomes Beyond Our Control

Prediction markets function best when liquidity is high, but they break completely if the liquidity exceeds the price of influencing the outcome. Prediction markets function only in situations where outcomes are expensive to influence.

There are a ton of fun examples of this failing:

- Libor
- "Chicken Libor"
- Every sport, all the time
- Option expiries (I don't have a good single link for this)

Money-generating environments vs. wealth-building environments (or "my thoughts on the stock market")

I don't know enough about how equities trade during earnings, but I do know a little about how some other products trade during data releases and while people are speaking.

In general, the vast, vast, vast majority of liquidity is withdrawn from the market before the release. There will be a few stale orders people have left by accident + a few orders left in at levels deemed ridiculously unlikely. As soon as the data is released, the fastest players will generally send quotes making a (fairly wide) market around their estimate of the fair price. Over time (a... (read more)

Use Normal Predictions

I agree identifying model failure is something people can be good at (although I find people often forget to consider it). Pricing it is something they are usually pretty bad at.

Use Normal Predictions

I'd personally be more interested in asking someone for their 95% CI than their 68% CI, if I had to ask them for exactly one of the two. (Although it might again depend on what exactly I plan to do with this estimate.)

I'm usually *much* more interested in a 68% CI (or a 50% CI) than a 95% CI because:

- People in general aren't super calibrated, especially at the tails
- You won't find out for a while how good their intervals are anyway
- What happens most often is usually the main interest. (Although in some scenarios the tails are all that matters, so again, depends


Oh okay.
Maybe I just haven't yet understood what you do with a 68% CI.
re 1: maybe we just have different intuitions - I somehow feel people are always
better at qualitative stuff than quantitative - and identifying model failure is
more qualitative.


I agree with both points.
If you are new to continuous predictions then you should focus on the 50% interval, as it gives you the most information about your calibration. If you are skilled and use, for example, a t-distribution, then you have σ for the trunk and ν for the tail; even then, few predictions should land in the tails, so most data should provide more information about how to adjust σ than how to adjust ν.
Hot take: I think the focus on 95% is an artifact of us focusing on p<0.05 in frequentist statistics.
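A sketch of the t-distribution suggestion, with made-up parameters (scipy assumed): σ (the scale) sets the body of the interval and ν (the degrees of freedom) the tails, and the 50% interval can be checked directly against simulated outcomes:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Predictive t-distribution: sigma controls the body, nu the tails
mu, sigma, nu = 0.0, 1.0, 5

# Central 50% interval of the predictive distribution
lo, hi = stats.t.ppf([0.25, 0.75], df=nu, loc=mu, scale=sigma)

# If outcomes really follow this distribution, ~50% should land inside,
# so this interval gives calibration feedback on almost every prediction
draws = stats.t.rvs(df=nu, loc=mu, scale=sigma, size=100_000, random_state=rng)
coverage = np.mean((draws > lo) & (draws < hi))
print(round(coverage, 2))
```

Very few draws land far out in the tails, which is the point above: most of your data informs σ, not ν.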

Use Normal Predictions

Under what assumption?

1/ You aren't "[assuming] the errors are normally distributed" in what you've written above. (Since a mixture of two normals isn't normal.)

2/ If your assumption is z ∼ N(0, 1), then yes, I agree the median of z² is ~0.45 (although

```
from scipy import stats
stats.chi2.ppf(.5, df=1)  # 0.454936
```

would have been an easier way to illustrate your point). I think this is *actually* the assumption you're making. [Which is a horrible assumption, because if it were true, you would already be perfectly calibrated.]

3/ I guess ... (read more)


Our ability to talk past each other is impressive :)
Yes, this is almost the assumption I am making. The general point of this post is to assume that all your predictions follow a Normal distribution, with μ as "guessed" and with a σ that is different from what you guessed, and then use χ² to get a point estimate for the counterfactual σ you should have used. And as you point out, if the (counterfactual) σ=1 then the point estimate suggests you are well calibrated.
In the post, the counterfactual σ is written σ̂_z.
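A minimal sketch of that point estimate, with made-up forecasts (the σ̂_z of the post is, as I understand it, the square root of the mean squared z-score):

```python
import numpy as np

# Hypothetical forecasts: (guessed mean, guessed sd, observed outcome)
forecasts = [(10.0, 2.0, 13.0), (5.0, 1.0, 4.2), (0.0, 3.0, -4.5)]

# Standardize each outcome by the guessed mean and sd
z = np.array([(mu_i - 0 + x - mu_i) / sd for mu_i, sd, x in [(m, s, o) for m, s, o in forecasts]])
z = np.array([(x - mu) / sd for mu, sd, x in forecasts])

# Point estimate of the sigma you *should* have used:
# if you were calibrated, mean(z**2) ~ 1 and sigma_hat ~ 1
sigma_hat = np.sqrt(np.mean(z ** 2))
print(round(sigma_hat, 2))  # 1.31 -> intervals should have been ~1.31x wider
```

With these made-up numbers the suggestion would be to widen future intervals by about that factor.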

The Unreasonable Feasibility Of Playing Chess Under The Influence

I think the controversy is mostly irrelevant at this point. Leela performed comparably to Stockfish in the latest TCEC season and is based on Alpha Zero. It has most of the "romantic" properties mentioned in the post.


Not just in the latest TCEC season, they've been neck-and-neck for quite a bit
now

Use Normal Predictions

That isn't a "simple" observation.

Consider an error which is 0.5 22% of the time, 1.1 78% of the time. The squared errors are 0.25 and 1.21. The median error is 1.1 > 1. (The mean squared error is 1)
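A quick numeric check of the counterexample (same hypothetical error mixture as above):

```python
# Errors of 0.5 (22% of the time) and 1.1 (78% of the time)
p_small, p_big = 0.22, 0.78
mse = p_small * 0.5**2 + p_big * 1.1**2
print(round(mse, 3))  # 0.999, i.e. mean squared error ~ 1

# Yet the median |error| is 1.1 (> 1), since 78% of errors equal 1.1
```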


Yes, you are right, but under the assumption that the errors are normally distributed, then I am right:
If:
p ∼ Bern(0.78), ε = p·N(0, 1.1) + (1−p)·N(0, 0.5)
then median(ε²) ≈ 0.37, which is much less than 1.
proof:
import numpy as np
from scipy import stats
x1 = stats.norm(0, 0.5).rvs(22 * 10000)
x2 = stats.norm(0, 1.1).rvs(78 * 10000)
x12 = np.concatenate([x1, x2])
print(np.median(x12 ** 2))  # ~0.37

Use Normal Predictions

Metaculus uses the cdf of the predicted distribution, which is better if you have lots of predictions; my scheme gives an actionable number faster

You keep claiming this, but I don't understand why you think this

Use Normal Predictions

If you suck like me and get a prediction very close then I would probably say: that sometimes happens :) note I assume the average squared error should be 1, which means most errors are less than 1, because (0² + 2²)/2 = 2 > 1

I assume you're making some unspoken assumptions here, because E[ε²] = 1 is not enough to say that. A naive application of Chebyshev's inequality would just say that P(|ε| ≥ 1) ≤ 1.

To be more concrete, if you were very weird, and either end up forecasting 0.5 s.d. or 1.1 s.d. away, (still with mean 0 and average... (read more)


I am making the simple observation that the median error is less than one because the mean squared error is one.

Use Normal Predictions

Go to your profile page. (Will be something like https://www.metaculus.com/accounts/profile/{some number}/). Then in the track record section, switch from Brier Score to "Log Score (continuous)"

Use Normal Predictions

I'd be happy to.


I upvoted all comments in this thread for constructive criticism, response to
it, and in the end even agreeing to review each other!

Two ominous charts on the financial markets

The 2000-2021 VIX has averaged 19.7, S&P 500 annualized vol 18.1.

I think you're trying to say something here like 18.1 <= 19.7, therefore VIX (and by extension, options) are expensive. This is an error. I explain more in detail here, but in short you're comparing expected variance and expected volatility, which aren't the same thing.

... (read more) From a secondary source: "The mean of the realistic volatility risk premium since 2000 has been 11% of implied volatility, with a standard deviation of roughly 15%-points", from https://www.sr-sv.com/realistic-volatility-risk-premia

Use Normal Predictions

I still think you're missing my point.

If you're making ~20 predictions a year, you shouldn't be doing any funky math to analyse your forecasts. Just go through each one after the fact and decide whether or not the forecast was sensible with the benefit of hindsight.

I am even explaining what a normal distribution is because I do not expect my audience to know...

I think this is exactly my point, if someone doesn't know what a normal distribution is, maybe they should be looking at their forecasts in a fuzzier way than trying to back fit some model to them.

... (read more)


I would love you as a reviewer of my second post, as there I will try to justify why I think this approach is better. You can even super-dislike it before I publish, if you still feel like that when I present my strongest arguments, or maybe convince me that I am wrong, so I don't publish part 2 and make a partial retraction of this post :). There is a decent chance you are right, as you are the stronger predictor of the two of us :)

Use Normal Predictions

I disagree with that characterisation of our disagreement, I think it's far more fundamental than that.

- I think you misrepresent the nature of forecasting (in its generality) versus modelling in some specifics
- I think your methodology is needlessly complicated
- I propose what I think is a better methodology

To expand on 1. I think (although I'm not certain, because I find your writing somewhat convoluted and unclear) that you're making an implicit assumption that the error distribution is consistent from forecast to forecast. Namely your errors when forecastin... (read more)

I am sorry if I have straw-manned you, and I think your above post is generally correct. I think we are coming from two different worlds.

You are coming from Metaculus, where people make a lot of predictions. Where having 50+ predictions is the norm, and thus looking at a U(0, 1) gives a lot of intuitive evidence of calibration.

I come from a world where people want to improve in all kinds of ways, and one of them is prediction; few people write more than 20 predictions down a year, and when they do, they more or less ALWAYS make dichotomous predictions. I ... (read more)

Two ominous charts on the financial markets

d/ is actually completely consistent with the vol market (I point this out here), so it's not clear that's their recommendation.


I should have checked before posting, thanks for this.

Use Normal Predictions

If you think 2 data points are sufficient to update your methodology to 3 s.f. of precision, I don't know what to tell you. I think if I have 2 data points and one of them is 0.99, then it's pretty clear I should make my intervals wider, but how much wider is still very uncertain with very little data. (It's also not clear if I should be making my intervals wider or changing my mean too.)


I don't know what s.f. is, but the interval around 1.73 is obviously huge; with 5-10 data points it's quite narrow if your predictions are drawn from N(1, 1.73). That is what my next post will be about. There might also be a smart way to do this using the Uniform, but I would be surprised if its dispersion is smaller than a chi² distribution :) (Changing the mean is cheating; we are talking about calibration, so you can only change your dispersion.)

Use Normal Predictions

you are missing the step where I am transforming arbitrary distribution to U(0, 1)

I am absolutely not missing that step. I am suggesting that should be the *only* step.

(I don't agree with your intuitions in your "explanation" but I'll let someone else deconstruct that if they want)


Hard disagree. From two data points I calculate that my future intervals should be 1.73 times wider; converting these two data points to U(0, 1) I get [0.99, 0.25]. How should I update my future predictions now?
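For what it's worth, a sketch of that calculation (scipy assumed); it lands near the quoted 1.73, with small differences presumably from rounding of the original inputs:

```python
import numpy as np
from scipy import stats

# The two data points, expressed as percentiles of the forecasts
u = np.array([0.99, 0.25])

# Back to z-scores via the inverse normal CDF: ~[2.33, -0.67]
z = stats.norm.ppf(u)

# The "widen your intervals" factor: sqrt of mean squared z-score
factor = np.sqrt(np.mean(z ** 2))
print(round(factor, 2))  # ~1.71
```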

Use Normal Predictions

you need less data to check whether your squared errors are close to 1 than whether your inverse CDF look uniform

I don't understand why you think that's true. To rephrase what you've written:

"You need less data to check whether samples are approximately N(0,1) than if they are approximately U(0,1)"

It seems especially strange when you think that transforming your U(0,1) samples to N(0,1) makes the problem soluble.


TLDR for our disagreement:
SimonM: Transforming to Uniform distribution works for any continuous variable
and is what Metaculus uses for calibration
Me: the variance trick to calculate σ_z from this post is better if your variables are from a Normal distribution, or something close to a normal.
SimonM: Even for a Normal, the Uniform is better.


you are missing the step where I am transforming arbitrary distribution to U(0, 1)
Medium confident in this explanation: because the square of random variables from the same distribution follows a gamma distribution, and it's easier to see violations from a gamma than from a uniform. If the majority of your predictions are from weird distributions then you are correct, but if they are mostly from normal or unimodal ones, then I am right. I agree that my solution is a hack that would make no statistician proud :)
Edit: Intuition pump: a T(0, 1, 100) obviously looks very normal, so transforming to U(0, 1) and then to N(0, 1) will create basically the same distribution. The square of a bunch of normals is χ², so the χ² is the best distribution for detecting violations. Obviously there is a point where this approximation sucks and U(0, 1) still works.

Use Normal Predictions

(If this makes no sense, then ignore it): Use an arbitrary distribution for predictions, then use its CDF (Universality of the Uniform) to convert to U(0, 1), and then transform to a z-score using the inverse CDF (percentile point function) of the Unit Normal. Finally, use this as z when calculating your calibration.

Well, this makes some sense, but it would make even more sense to do only half of it.

Take your forecast, calculate its percentile. Then you can do all the traditional calibration stuff. All this stuff with z-scores is needless... (read more)
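A minimal sketch of the half both sides agree on - the probability-integral transform - assuming scipy (the t-distribution forecast is just an illustrative choice):

```python
from scipy import stats

# Suppose the forecast was a t-distribution, and the outcome is observed
forecast = stats.t(df=4, loc=10, scale=2)
outcome = 12.5

# Universality of the Uniform: the CDF of the outcome is U(0,1) if calibrated
u = forecast.cdf(outcome)

# Optional extra step from the post: map to a z-score via the unit normal
z = stats.norm.ppf(u)
print(round(u, 3), round(z, 3))
```

The disagreement above is only about whether the second step (u to z) adds anything over checking the u values for uniformity directly.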


Where can I access this for my profile on Metaculus? I have everything unlocked
but don't see it in the options.


Can I use this image for my "part 2" post, to explain how "pros" calibrate their continuous predictions and how it stacks up against my approach? I will add you as a reviewer before publishing so you can make corrections in case I accidentally straw-man or misunderstand you :)
I will probably also make a part 3 titled "Try t predictions" :), which should address some of your other critiques about the normal being bad :)


This is a good point, but you need less data to check whether your squared errors are close to 1 than whether your inverse-CDF values look uniform, so if the majority of predictions are normal, I think my approach is better.
The main advantage of SimonM/Metaculus is that it works for any continuous
distribution.

Two ominous charts on the financial markets

I absolutely considered writing about the difference between risk-neutral probabilities and real-world probabilities in this context but decided against because: **Over the course of a year, the difference is going to be small relative to the width of the forecast**

I'd be interested to hear if you think the differences would be material to my point, i.e. that [-60%, +30%] isn't a ~90% range for stock returns next year and that his forecast is not materially different from what the market is forecasting.


The 2000-2021 VIX has averaged 19.7, S&P 500 annualized vol 18.1.
From a secondary source: "The mean of the realistic volatility risk premium since 2000 has been 11% of implied volatility, with a standard deviation of roughly 15%-points", from https://www.sr-sv.com/realistic-volatility-risk-premia/. So 1/3 of the time the premium is outside [-4%, 26%], which swamps a lot of VIX info about true expected vol.
-60% would be the worst drawdown ever; the prior should be <<1%. However, 8 years have been above 30% since 1928 (9%), so it seems you're using a non-symmetric CI.
The reasoning for why there'd be such a drawdown is backwards in the OP: because real rates are so low, the returns for owning stocks have declined accordingly. If you expect 0% rates and no growth, stocks are priced reasonably, yielding 4%/year more than bonds. Thinking in the level of rates, not changes to rates, makes more sense, since investments are based on current projected rates. A discounted cash flow analysis works regardless of how rates change year to year. Currently the 30yr is trading at 2.11%, so real rates around the 0 bound is the consensus view.

Use Normal Predictions

I think you're advocating two things here:

- Make a continuous forecast when forecasting a continuous variable
- Use a normal distribution to approximate your continuous forecast

I think that 1. is an excellent tip in general for modelling. Here is Andrew Gelman making the same point

However, I don't think it's actually always good advice when eliciting forecasts. For example, fairly often people ask whether or not they should make a question on Metaculus continuous or binary. Almost always my answer is "make it binary". Binary questions get considerably more inte... (read more)


Agreed 100% on 1), and with 2) I think my point is "start using normal predictions as a gateway drug to over-dispersed and model-based predictions".
I stole the idea from Gelman and simplified it for the general community; I am mostly trying to raise the sanity waterline by spreading the gospel of predicting on the scale of the observed data. All your critiques of normal forecasts are spot on.
Ideally, everybody would use mixtures of over-dispersed distributions or models when making predictions, to capture all sources of uncertainty.
It is my hope that by educating people in continuous prediction, the Metaculus trade-off you mention will slowly start to favor continuous predictions because people find them as easy as binary predictions... but this is probably a pipe dream, so I take your point.

Two ominous charts on the financial markets

That's not a chart of "real rates", that's the spread between a 10y rate and a spot inflation estimate. Real rates is (ideally) the rate paid on an inflation linked bond, or at least the k-year rate minus the k-year forecast inflation. The BoE have historic data here going back to '85 and the rally is several hundred basis points less than your chart implies.

Two ominous charts on the financial markets

Sorry, I should have made that more clear. I am talking about the period since the start of the interest rate decline (mid 1980s to today).

I think you're going to have to be more explicit about what time period you're forecasting your market collapse. (Or whatever it is you're forecasting, it's still not clear to me).

... (read more) Let me try to rephrase that: I think we will be seeing a fundamental change in the financial markets due to an end to the 35-year-long reduction of real interest rates. And most of the actors have only known investing in an environment with mor

Two ominous charts on the financial markets

I'm not sure what you consider to be "neutral" to hold, but forward returns for holding cash don't look great either.

(I'm also not sure what you're trying to say about Warren Buffett, can you be more explicit)

Two ominous charts on the financial markets

tl;dr all your conclusions are equally consistent with equity returns being similar to the past going forward

a) To infer future average equity returns from the past decades seems to me quite dangerous.

Agreed, although what else do we have?

b) The valuation level (Shiller CAPE) and the elimination of the interest rate reduction effect for stock valuations indicate that expected future equity returns will probably be significantly lower than those of past decades.

Which decades are you looking at? As recently as the decade before last (2000s) we had negative r... (read more)


Options nitpick: You can't use equity index* option prices as true probabilities, because adding hedges to a portfolio makes the whole portfolio more valuable. People then buy options based on their value when added to the portfolio, not as individual investments.
The first reason option hedges make your portfolio more valuable is preferences: people don't just want to maximize their expected return, but also reduce the chance they go broke. People don't like risk and hedges reduce risk, ergo they pay more to get rid of risk. However, you can't just subtract X vols to adjust, as this "risk premium" isn't constant over time.
Secondly, hedges maximize long-term returns (or: why you shouldn't sell options). You want to maximize your average geometric annual return, not your average annual return. You care about geometric averages because if for 3 years your returns were +75%, +75%, -100%, you don't have 50% more money than when you started, but 0. The average of annual returns was 10.7% over the past 30 years, but if you'd invested in 1992 you'd've only compounded at 8.5%/year till 2022.
Geometric returns are the nth root of a product of n numbers and have the approximation: geometric return ≈ average annual return - variance/2. If you could reduce variance and not reduce annual returns, your portfolio (market + hedges) would grow faster than the market.
These reasons are why, despite the worst annual return being -48% in 1931, you can say there's a 5% chance of returns below -50% based on option markets.
*I'm specifically talking about index options because that's the portfolio investors have (or something similar) and the total is what they care about. If you were to use prices as true probabilities for, say, a merger going through, these reasons don't apply as much and prices would be more accurate.
PS. I've referred to investors as all having the same portfolio because most people do have highly correlated index holdings, and it's at this level of generality that you can think about investors as a class.


Sorry, I should have made that more clear. I am talking about the period since the start of the interest rate decline (mid 1980s to today).
Let me try to rephrase that: I think we will be seeing a fundamental change in the financial markets due to an end to the 35-year-long reduction of real interest rates. And most of the actors have only known investing in an environment with more or less constantly decreasing real interest rates. I believe this combination could lead to widespread panic in the markets once the people making investment decisions realize that they no longer know how the markets react in the new environment.
I believe that rationale for investing in equities is quite widespread today. Obviously, there is an alternative (accepting secure negative real returns), but in order to avert guaranteed losses, investors take on risk. I am not saying that this is necessarily the wrong strategy, but it poses an interesting question: how will investors with this motivation for holding stocks react in a downturn?


When required to be fully invested this is trueish.
However you can sit in cash while no appealing investments exist. And buy in
size when prices become more appealing.
inb4 market timing is not possible
Have a look at Warren Buffett's track record and the amount of cash he held in
early 2000 and now.

Scott Alexander 2021 Predictions: Market Prices - Resolution

Thanks for flagging, fixed

Scott Alexander 2021 Predictions: Market Prices - Resolution

I'll add that note

Scott Alexander 2021 Predictions: Market Prices - Resolution

OK, so I am obviously biased but I'll look to see if I think this is fair.

Yeah, this is definitely my bad. I didn't ask you (or Scott) whether or not you were happy with me comparing your comments to market forecasts. I apologise. I also didn't intend to make this as normative as it sounds. (FWIW, in the past I have gone to bat for your forecasting skills, and given your forecast and a market forecast, most of the time I would expect to update away from the market and towards you.)

... (read more) I'll let Simon decide what to do with the rest. I also find it super weird

Scott Alexander 2021 Predictions: Market Prices - Resolution

So I'm a little worried we've used different sources for your forecasts, but to explain where we differ:

- We agree
- We agree
- We agree
- Happy to change your number, although your forecast was: "Depending on what counts as ‘recalled’ this is either at least 10%, or it’s damn near 0%. I don’t see how you get 5%. Once you get an election going, anything can happen. Weird one, I’d need more research." Which I averaged to 5%. Happy to change to 1%?
- We agree
- "It’s definitely a thing that can happen but there isn’t that much time involved, and the timing doesn’t seem


https://thezvi.wordpress.com/2021/04/27/scott-alexander-2021-predictions-buy-sell-hold/
is the canonical version. Surprised the differences were this big. The struggle
on knowing when to update all versions is real, especially now that there's 3x.
Then beyond that your decisions seem fine.
And no need to apologize for doing the exercise, it's good to check things, long
as it's clear what's being done.
When/if I do predictions for 2022 I'll see what I can do about also including
explicit fairs (and ideally, where I'd call BS on a market, and where I
wouldn't).

Scott Alexander 2021 Predictions: Market Prices - Resolution

If anyone can figure out how to format that table, I would appreciate it, thanks!


I have been trying to format tables on LW for a while, gave up, and started using images.

Retail Investor Advantages

I don't think you've found the most unbiased description of PFOF out there

Combining Forecasts

I realise you've been very careful about avoiding mentioning any explicit average in your section on "Combining External Forecasts", I was wondering if you had any thoughts on mean-vs-median (links below)

- When pooling forecasts, use the geometric mean of the odds
- My current best guess on how to aggregate forecasts

I was also wondering if you had any thoughts on extremising the forecasts you're ensembling too. (The classic example of 4 people all forecasting 60% but all based on independent information)
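For concreteness, a sketch of geometric-mean-of-odds pooling with an extremizing exponent (the function name and the exponent value here are made up for illustration):

```python
import numpy as np

def pool_geo_odds(probs, extremize=1.0):
    """Pool binary forecasts via the geometric mean of odds.
    extremize > 1 pushes the pooled forecast away from 50%,
    which is what you'd want when forecasters hold independent evidence."""
    probs = np.asarray(probs, dtype=float)
    odds = probs / (1 - probs)
    pooled_odds = np.exp(np.mean(np.log(odds))) ** extremize
    return pooled_odds / (1 + pooled_odds)

# Four forecasters all at 60%
print(round(pool_geo_odds([0.6, 0.6, 0.6, 0.6]), 2))      # 0.6 -- plain pooling
print(round(pool_geo_odds([0.6] * 4, extremize=2.5), 2))  # pushed past 0.6
```

With no extremizing, four identical 60% forecasts pool to 60%; the exponent is what encodes the "independent information" intuition in the classic example.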

Retail Investor Advantages

I'm afraid you're confused about how PFOF works. It's absolutely not about "frontrunning trades"


https://public.com/learn/payment-for-order-flow-pfof-explained-and-why-it-matters
[https://public.com/learn/payment-for-order-flow-pfof-explained-and-why-it-matters]
writes:
To me it sounds like leogao's description of PFOF as being due to misinformed
traders is wrong.

Retail Investor Advantages

Okay, but your examples are now all the same as your "2." (which I don't disagree with). Size isn't the advantage here, it's being able to be involved in weird things. (I was disagreeing with your point "3.")

28mo

Fair enough. I suppose I'm having trouble coming up with examples of
opportunities that are both not weird, and also not systematizable. (Though I do
think evaluation of individual penny stocks counts.)
I'm keeping that as separate from 2 though because I think that if you do find
something like that, the retail trader is potentially advantaged. And in
general, I think it's true on a spectrum — the more capacity a strategy has, the
more you shouldn't expect to beat the market with it.
I think of the market as like an ecosystem. If you look at a cubic meter of
rainforest, there's a ton of activity going on at different levels, from
bacterium up to tree. Different organisms are taking advantage of different
metabolic opportunities of different sizes and types (and their activity
provides opportunities to each other). Each creature has its niche.
I think of the market as like that. You've got big long-term macro funds taking
positions that last for months. And you've got little nimble HFT shops making
money off of the big slow macro fund's predictable-on-short-timescales trading
behavior.
And I claim the retail trader can potentially find a niche here too. And part of
what they should look for are opportunities that are not worth the time of
bigger firms. (Though note that this might just mean that this retail trader is
currently being undervalued, if they can find opportunities that are worth their
while, but wouldn't be worth the while of an employee at a firm.)

Retail Investor Advantages

Small size means you can look for opportunities with a good return, but low capacity (e.g. some opportunity that could turn 10k into 20k, but couldn't turn 10M into 20M). I think this is a much bigger deal than the low slippage advantage that comes from small size.

I'm kinda curious as to what sort of opportunities people think these are (especially in developed markets)

The sorts of things which have low enough variance to be "good" trades without doing them systematically would require large, concrete mispricings. I struggle to see how the opportunity is li... (read more)

28mo

These opportunities are especially going to be found outside developed markets,
or in things that a firm can't do systematically.
I agree that 10k (or much less) profit per trade is not too small for an HFT
shop, if it's part of an overall strategy that does many trades. The capacity of
a trading strategy isn't how much it makes per trade, but how much capital you
can productively allocate to it over its lifetime. And a firm is not going to
devote weeks of an employee's time to developing a strategy that's only ever
going to make 10k.
Instead, I'm thinking of weird, one-off things like:
* taking advantage of credit card sign-up bonus arbitrages
* deciding that a particular house for sale (not the whole housing sector) is
undervalued
* speculating on a rare book, or piece of art, or other collectible
* betting on obscure cryptocurrencies that you've done some analysis of
* taking advantage of DeFi yield farming schemes
These are generally going to be weird small things that a traditional firm can't
easily trade in a systematic way, such that it's not worth their time to look
into them.
Note that it might not be worth the retail investor's time either, depending on
how they value their time and their opportunity costs. But in some cases I think
you can stumble upon knowledge that you can then take advantage of, without
worrying that your knowledge / reasoning is mistaken because there shouldn't be
a $20 bill on the sidewalk. If you stumble upon some information that makes you
think the S&P500 is undervalued, you should be a lot more skeptical of that than
of some analysis that suggests some obscure collectible / penny stock /
cryptocurrency / NFT is undervalued.

Retail Investor Advantages

Two of those "advantages" aren't as much "advantages" as the market telling you that it thinks it knows better than you. The fact that you have lower trading costs and lower slippage (actually the same thing) is because the market doesn't respect you.

Re: information acquisition cost. Sure, you might have one small piece of information that BigTradingFirm doesn't have, but they have plenty of information you don't have. The relative value of the information is what matters.

08mo

Suppose as a domain expert you highly suspect company X will fail within
timeframe Y. This company is pretty small and there is a reasonable amount of
irreducible uncertainty so you (or anyone else) could make a maximum of $10k off
of this bet. It costs you ~nothing on the margin to take this opportunity, but
it would cost BigFund more than $10k in opportunity cost to acquire this
information and act on it, so it's not worth it to them to bother with it.
Also, the market underestimating me is a good thing for my bottom line.

1[comment deleted]8mo

48mo

“Sure, you might have one small piece of information that BigTradingFirm doesn't
have, but they have plenty of information you don't have. The relative value of
the information is what matters.”
As an example, let’s say you’re a scientist who works in the field of
bioprinting. A new company IPOs, planning to make artificial tissue for
transplant via bioprinting. You’ve been working with similar technology for 25
years, know the founder personally, and are certain that the tech won’t work and
the founder’s a dishonest-yet-charismatic person with a history of exploiting
others to make themselves look good. So you short the stock.
A hedge fund doesn’t have your experience. But they do have lots of information
about your industry, historical performance of companies in this sector,
advisors (including from your peers), regulatory insight, and much more. They
understand that the CEO can be replaced, the product can pivot, etc. They have
better overall judgment about how to weigh and synthesize all the information
about the company into a prediction about where the price will go.

58mo

To elaborate on the information acquisition cost point; small pieces of
information won't be worth tying up a big amount of capital for.
If you have a company worth $1 billion and you have very good insider info that
a project of theirs that the market implicitly values at $10 million is going to
flop, if the only way you can express that opinion is to short the stock of the
whole company that's likely not even worth it. Even with 10% margin you'd be at
best making a 10% return on capital over the time horizon that the market
figures out the project is bad (maybe O(1) years), and that mean return would
come with way more risk than just buying into the S&P 500, so your Sharpe would
be much worse.
In general this kind of trading is only worth it if your edge over the market is
big enough. If you just know something the market doesn't know that's not very
useful unless you can find someone to bet on that exact thing rather than have
to involve a ton of other variance in your trades, and even if you try to do
that people can figure out what you're up to and refuse to take the other side
of your trades anyway.
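The capital-tie-up arithmetic in that example can be sketched as follows. The $1B company value, $10M project value, and 10% margin come from the comment; the notional shorted is an arbitrary assumption, since the return on capital is scale-free:

```python
# Numbers from the comment: $1B company, project the market values at $10M,
# 10% margin requirement. Notional is arbitrary (returns scale linearly).
company_value = 1_000_000_000
project_value = 10_000_000
margin_requirement = 0.10

notional = 1_000_000                                        # short $1M of the stock
profit_if_right = notional * project_value / company_value  # $10,000 when the project flops
capital_posted = notional * margin_requirement              # $100,000 tied up
return_on_capital = profit_if_right / capital_posted
print(return_on_capital)  # 0.1 -> ~10% over the O(1)-year horizon, with single-stock risk
```

A ~10% return over roughly a year, carrying idiosyncratic single-stock risk the whole time, is the comparison against simply holding the index that makes the Sharpe unattractive.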

Base Rates and Reference Classes

I did a similar calculation not just for the base rate of completing his term, but of being the next nominee and the next US President a while back

[linkpost] Acquisition of Chess Knowledge in AlphaZero

There's already some discussion here

Average probabilities, not log odds

I think it would be perhaps helpful to link to a few people advocating averaging log-odds rather than averaging probabilities, eg:

- When pooling forecasts, use the geometric mean of the odds
- My current best guess on how to aggregate forecasts

Personally, I see this question as being an empirical question. Which method works best?

In the cases I care about, both averaging log odds and taking a median *far* outperform taking a mean. (Fwiw Metaculus agrees that it's a very safe bet too)

... (read more) In contrast, there are no conditions under which average log odds is the co

39mo

Thanks for the links!
Contrived how? What additional structure do you imagine I added? In what sense
do you claim that averaging log odds preserves additivity of probability for
disjoint events in the face of an example showing that the straightforward
interpretation of this claim is false?
It isn't; you can tell because additivity of probability for disjoint events
continues to hold after Bayesian updates. [Edit: Perhaps a better explanation
for why it isn't a Bayesian update is that it isn't even the same type signature
as a Bayesian update. A Bayesian update takes a probability distribution and
some evidence, and returns a probability distribution. Averaging log-odds takes
some finite set of probabilities, and returns a probability]. I'm curious what
led you to believe this, though.
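The additivity point can be checked numerically: pooling each of three disjoint, exhaustive outcomes by averaging log odds yields probabilities that no longer sum to 1, while the arithmetic mean of probabilities does. The two forecasters' numbers below are made up purely for illustration:

```python
import math

def geo_odds_pool(ps):
    """Average log odds (equivalently, take the geometric mean of odds)."""
    mean = sum(math.log(p / (1 - p)) for p in ps) / len(ps)
    return 1 / (1 + math.exp(-mean))

# Two forecasters over three disjoint, exhaustive outcomes (each row sums to 1).
f1 = {"A": 0.1, "B": 0.3, "C": 0.6}
f2 = {"A": 0.4, "B": 0.4, "C": 0.2}

linear = {k: (f1[k] + f2[k]) / 2 for k in f1}
log_odds = {k: geo_odds_pool([f1[k], f2[k]]) for k in f1}

print(round(sum(linear.values()), 3))    # 1.0: averaging probabilities stays additive
print(round(sum(log_odds.values()), 3))  # 0.942: averaging log odds does not
```

Renormalizing the log-odds-pooled numbers to sum to 1 is possible, but that is an extra step on top of the pooling rule itself, which is the point being made above.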

Worth checking your stock trading skills

I figured that's the first thing someone would think of upon hearing "7x" which is why I mentioned "This was done using a variety of strategies across a large number of individual names" in the OP.

Right, I wasn't disagreeing with you, just explaining why 7x isn't strong evidence in my own words.

Can you please give some examples of such people? I wonder if there are any updates or lessons there for me.

Yes, but I don't think there's a huge amount of value in doing that. If you spend any time following stock touts on twitter / stock picking forums etc you wil... (read more)

19mo

The people I follow generally don't advertise their track record? For the hedge
fund manager I mentioned, I had to certify that I'm an accredited investor and
sign up for his fund letters to get his past returns. For the ones that do,
e.g., paid services on SeekingAlpha that advertise past returns, it has not been
my experience that they "then fail to do so out of sample" (at least the ones
that passed my filter of being worth subscribing to).
Personally, I wish I had seen a post like this 10 years ago. My guess is that
there's at least 2 or 3 people on LW who could become good traders if they
tried. Even if 10 times that many people try and don't succeed, that seems
overall a win from my perspective, as the social/cultural/signaling and monetary
gains from the winners more than offset the losses. In part I want LW to become
a bigger cultural force, and clear success stories that can't be dismissed as
"luck" seem very helpful for that.
Pre-tax.
Maybe try some of my tips, if you haven't already? :)

Worth checking your stock trading skills

Without even checking, I can think of a bunch of assets which 7x'ed since Jan 2020. (BTC/general crypto, TSLA, GME/AMC etc). So yes, I agree this depends on the portfolio you ran.

Personally, I have seen enough people claiming to outperform, but then fail to do so out of sample. (I mean, out of sample for me, not for them) for me to doubt any claim on the internet without a trading record.

Either way, I think it's very hard to convince me with just ~1.5 years of evidence that you have edge. I think if you showed me ~1k trades with some sensible risk parameters at all times, then I could be convinced. (Or if in another year and a half you have $300mm because you've managed to 7x your small HF AUM, I will be convinced).

69mo

I figured that's the first thing someone would think of upon hearing "7x" which
is why I mentioned "This was done using a variety of strategies across a large
number of individual names" in the OP. Just to further clarify, I have some
exposure to crypto but I'm not counting it for this post, I bought some TSLA
puts (forgot whether I made a profit overall), and didn't touch AMC. I had a
0.1% exposure to some GME calls which went to 1% of my portfolio and that's the
only involvement there.
Can you please give some examples of such people? I wonder if there are any
updates or lessons there for me.
I don't think I've done that many trades (depending on how you define a trade,
e.g., presumably accumulating a position across different days doesn't count as
separate trades). Maybe in the low hundreds? But why would you need ~1k trades
to verify that I was not doing particularly high variance strategies? I guess
this is mostly academic though, as it would take a lot of labor to parse my
trade logs and understand the underlying market mechanics to figure out what I
was doing and how much risk I was taking (e.g., some pair/arbitrage trades were
spread across several brokers depending on where I could find borrow). I don't
suppose you'd actually want to do this? (I also have some privacy concerns on
my end, but maybe could be persuaded if the "value added" in doing this seems
really high.)
I'm definitely not expecting such high returns going forward. ("600% return" was
meant to be Bayesian evidence to update on, not used to directly set
expectations. I thought that went without saying around here...) Obviously there
was a significant amount of luck involved, for example as I mentioned the market
was particularly inefficient last year. One of the hedge fund managers I follow
had returns similar to mine this year and last year, but not in the years before
that. I'd guess 20-50% above market returns is a realistic expectation if market
conditions stay similar to today's, and

Worth checking your stock trading skills

Everyone else has already pointed out that you misunderstood what EMH states, so I won't bother adding to their chorus. (Except to say I agree with them).

I will also disagree with:

at most one-in-five people [...] It should therefore probably update us nontrivially away from the possibility that the post author just got lucky.

1 in 5 isn't especially strong evidence. How many of those 5 people would you expect to be publishing on the internet saying "You should trade stocks"?

29mo

(You're not wrong, but I wanted to flag: the way I read John's comment, the word
"nontrivially" already admitted this. If he thought it was strong evidence I'd
expect him to have used a stronger word. Nothing wrong with adding
clarification, but I don't particularly think you're disagreeing with him on
this point.)

39mo

I agree this isn't a very strong argument. I think theoretically we can probably
get a much tighter probability bound than 20% by looking directly at the
variance of my strategy, and concluding that given that variance, the
probability of getting 600% return by chance (assuming expected return = market
return) is <p for some smaller p. But in practice I'm not sure how to compute
this variance. Intuitively I can point to the fact that my portfolios did not
have very high leverage/beta, nor did I put everything into a single or very few
highly volatile stocks or sectors, which are probably the two most common high
variance strategies people use. (Part of the reason for me writing this post is
that while LW does have a number of people who achieved very high investment
returns, they all AFAIK did it by using one of these two methods, which makes it
hard to cite them as exemplifying the power of rationality.)
Assuming the above is still not very convincing, I wonder what kind of evidence
would be...

Risk Premiums vs Prediction Markets

I don't really know how you incentivise people (seriously) in the non-real money prediction markets.

Non-money prediction markets have lots of difficulties to them:

- How do you size your bet? (i.e. knowing a probability vs. just "higher" or "lower" than the market estimate)
- Difficult to arbitrage (i.e. share information between markets)
- How do you show your conviction? (this is 50% and I'm certain it's a coin flip vs. this is 50% because I don't understand the question)
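On the bet-sizing point: with real money, conviction maps to stake via something like the Kelly criterion for a binary contract. This is a standard sketch rather than anything from the post, and `kelly_fraction` is my own helper name:

```python
def kelly_fraction(p, market_price):
    """Kelly-optimal fraction of bankroll for a binary contract.

    Buying at price q risks q per contract to win (1 - q), i.e. net odds
    b = (1 - q) / q, so the Kelly fraction (p*b - (1 - p)) / b simplifies
    to (p - q) / (1 - q). If p < q, buy the NO side at price (1 - q).
    """
    q = market_price
    if p >= q:
        return (p - q) / (1 - q)  # stake on YES
    return (q - p) / q            # stake on NO

print(round(kelly_fraction(0.7, 0.5), 3))  # 0.4 -> stake 40% of bankroll
print(round(kelly_fraction(0.5, 0.5), 3))  # 0.0 -> a true coin flip gets no stake
```

Note how this answers the conviction question too: "50% and certain it's a coin flip" and "50% because I don't understand the question" both produce a zero Kelly stake, which is exactly the information a points-based market fails to elicit.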

Worth checking your stock trading skills

I don't know enough about the etiquette here, but I am having to fight the urge to post a bunch of memes along the lines of "It's not the bull market, I really am a genius".

I would strongly advise anyone who's considering following this to consider doing this with considerably less than their whole portfolio and with much lower expectations than 7x'ing your money.

79mo

A lot of people respond to things like this post by assuming that the author was
lucky. This is usually correct, at least when applied to random claimants on the
interwebs, but we can put some bounds on it: the higher the returns, the more
luck required to achieve them, assuming an efficient market. The efficient
markets model says that any strategy during this period had expected return of
50%. So, if the post author used a strategy with probability p of achieving 600%
returns, and 1-p of losing everything, then efficient markets implies
p*(100+600) + (1-p)*0 = (100+50), i.e. p = 0.21 (roughly). This is the highest
probability any strategy could have of achieving 600% returns during this
period, without exploiting any market inefficiency.
In other words: at most one-in-five people could achieve returns that high
without exploiting market inefficiency. It should therefore probably update us
nontrivially away from the possibility that the post author just got lucky.
(Though depending on your priors, you might still think the post author got
lucky.)
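The bound in that comment is just one line of arithmetic, restated here as a sketch:

```python
# An efficient market gives every strategy an expected return of 50% over
# the period, so a strategy returning +600% with probability p (and -100%
# otherwise) satisfies p * (100 + 600) = (100 + 50), i.e. p * 700 = 150.
market_return = 0.5
strategy_return = 6.0
p_max = (1 + market_return) / (1 + strategy_return)
print(round(p_max, 3))  # 0.214 -> at most roughly one in five
```

This matches the p = 0.21 (roughly) quoted above; the bound is an upper limit on how likely any inefficiency-free strategy could be to deliver that return.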

No - I think it's the probability that's supposed to be a martingale, but I might be being dumb here.