All of Davidmanheim's Comments + Replies

Small and Vulnerable

Someone with no personal experience of suffering should also be moved by that consideration.

That sounds like a fantastic reason for someone with that experience to post it, as occurred here, as a way to explain what it is like to others.

In fact, only the existence of suffering for some concrete individual justifies the abstract conclusion of altruism. Without that concrete level, the abstraction is hypothetical, and should not provide the same level of reason to be altruistic.

Scott Alexander 2021 Predictions: Buy/Sell/Hold

Update: we did this, I bought shares, we'll see how it goes.

Strong Evidence is Common

Extreme, in this context, meant far from the consensus expectation. That implies both "seen as radical" and "involving very high [consensus] confidence [against the belief]."

Contra your first paragraph, I think, I claim that this "extremeness" is valid Bayesian evidence for it being false, in the sense that you identify in your third paragraph - it has low prior odds. Given that, I agree that it would be incorrect to double-count the evidence of being extreme. But my claim was that, holding "extremeness" constant, the newness of a claim was in... (read more)

Scott Alexander 2021 Predictions: Buy/Sell/Hold

Want to sell me USDC on there in exchange for paypal, so I can bet? (I'll gladly pay a 2% "commission" for, say, $200 in USDC.)

Liam Donovan (8d, 9 points): I'm happy to; no commission needed. If anyone else wants to get money from fiat into polymarket easily with no fee, just let me know
Scott Alexander 2021 Predictions: Buy/Sell/Hold

It's a pain to redo, but can someone add Ought embedded predictions to all of these?

https://forecast.elicit.org/binary

(Alternatively/additionally, can they all be on Metaculus?)

Zvi (9d, 2 points): Would be happy if this happened but definitely don't have the bandwidth to do it myself.
SimonM (9d, 8 points): I detailed a few of them which are already on Metaculus here [https://www.lesswrong.com/posts/sAmkpdmfDMCohXiBX/scott-alexander-2021-predictions-market-prices]. If there are others which you are particularly keen to see added I'm sure they could be written
What topics are on Dath Ilan's civics exam?

Relatedly, and perhaps even more fundamentally, there is the basic discipline of thinking about a system and implementing a mathematical model or simulation to explore these topics, which is what drove the insights you mention. And in many ways, this is easier to test without worrying about people gaming the system, because you can give new examples and require them to actually explore the question.

What topics are on Dath Ilan's civics exam?

That's fine, but choosing the question set to give the self-motivated children, the set on which you provide the instant computer-driven feedback, is the same type of question: what is it that we want the child interested in X to learn?
 

Concretely, my 8 year old son likes math. He's fine with multiplication and division, but enjoys thinking about math. If I want him to be successful applying math later in life, should I start him on knot theory, pre-algebra equation solving, adding and subtracting unlike fractions, or coding in python? I see real advantages to ... (read more)

What topics are on Dath Ilan's civics exam?

Partly agree with your criticism of the quoted claim, but there are two things I think you should consider.

First, evaluating tests for long-term outcomes is fundamentally hard. The extent to which a 5th grade civics or math test predicts performance in policy or engineering is negligible. In fact, I would expect that the feedback from test scores, by determining what a child focuses on, has a far larger impact on a child's trajectory than the object-level prediction captures.

Second, standardizing tests greatly reduces the cost of development and allows larger sample sizes for validation. For either reason alone, it makes sense to use standardized tests as much as possible.

ChristianKl (10d, 2 points): I don't believe that 5th grade civics or math tests are a good idea. At that age you want to encourage children to learn by following their curiosity and if you teach math in a structured way you likely want to have instant computer driven feedback and not the idea that children are supposed to have a certain level of math knowledge at a certain age for which they get tested.
Scott Alexander 2021 Predictions: Buy/Sell/Hold

12. Netanyahu is still Israeli PM: 40%

This is the PredictIt line for him on 6/30, and Scott’s predicting this out to January 1. I’m guessing that he didn’t notice? Otherwise, given how many things can go wrong, it’s a rather large disagreement – those wacky Israelis have elections constantly. I’m going to sell this down to 30% even though I have system 1 intuitions he’s not going anywhere. Math is math. 

 

I would buy at this price, probably up to 50%, but there are some wrinkles to how it gets resolved. At least 45% of the population really really... (read more)

Liam Donovan (9d, 4 points): https://polymarket.com/market/will-benjamin-netanyahu-remain-prime-minister-of-israel-through-june-30-2021 - If you want to, you can in fact bet that Netanyahu will be PM on June 30th at 30c
LessWrong help desk - free paper downloads and more

Also just requested on reddit: https://www.reddit.com/r/Scholar/comments/mtwl4d/chapter_k_hoskin_1996_the_awful_idea_of/

LessWrong help desk - free paper downloads and more

Request: "K. Hoskin (1996) The ‘awful idea of accountability’: inscribing people into the measurement of objects. In Accountability: Power , Ethos and the Technologies of Managing, R. Munro and J. Mouritsen (Eds). London, International Thomson Business Press, and references therein."

(Cited by: Strathern, Marilyn (1997). "'Improving ratings': audit in the British University system". European Review. John Wiley & Sons. 5 (3): 305–321. doi:10.1002/(SICI)1234-981X(199707)5:3<05::AID-EURO184>3.0.CO;2-4.)

See Google Books, and Worldcat (Available in man... (read more)

gwern (19d, 4 points): If Reddit falls through, email me and I can order a scan for you. (Might want to delete your duplicate comments here too.) EDIT: ordered a scan
Davidmanheim (19d, 2 points): I also just requested this on reddit [https://www.reddit.com/r/Scholar/comments/mtwl4d/chapter_k_hoskin_1996_the_awful_idea_of/]
Davidmanheim (19d, 2 points): Also just requested on reddit: https://www.reddit.com/r/Scholar/comments/mtwl4d/chapter_k_hoskin_1996_the_awful_idea_of/
Systematizing Epistemics: Principles for Resolving Forecasts

 Yeah, that's true. I don't recall exactly what I was thinking. 

Perhaps it was regarding time-weighting, and the difficulty of seeing what your score will be based on what you predict - but the Metaculus interface handles this well, modulo early closings, which screw lots of things up. Also, log-scoring is tricky when you have both continuous and binary outcomes, since they don't give similar measures - being well calibrated for binary events isn't "worth" as much, which seems perverse in many ways.
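
To illustrate the binary-versus-continuous point, here is a minimal sketch (generic log scoring, not Metaculus's actual point formula): the log score of a continuous forecast is a log density, which can exceed zero, while a binary forecast's log score is capped at zero, so calibration on binary questions is structurally "worth" less.

```python
import math

# Log score for a binary forecast: log of the probability assigned to the outcome.
# Even a well-calibrated 80% forecast that comes true scores log(0.8) ~= -0.22,
# and the best possible binary score is log(1.0) = 0.
binary_score = math.log(0.8)

# Log score for a continuous forecast: log of the predictive density at the outcome.
# A tight Gaussian (sigma = 0.1) centered on the realized value has density ~4,
# so its log score is positive and larger than any binary score can ever be.
sigma, error = 0.1, 0.0
density = math.exp(-error**2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))
continuous_score = math.log(density)

print(f"binary log score:     {binary_score:.2f}")      # ~ -0.22
print(f"continuous log score: {continuous_score:.2f}")  # ~ +1.38
```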

Systematizing Epistemics: Principles for Resolving Forecasts

In many cases, yes. But for some events, the "obvious" answers are not fully clear until well after the event in question takes place - elections, for example.

How many micromorts do you get per UV-index-hour?

About 20% of Americans develop skin cancer during their lifetime, and the 5-year overall survival rate for melanoma is over 90 percent. Taking that roughly 10% mortality as the mortality risk for skin cancer generally, i.e. ignoring timing and varied risk levels, that's about a 2% lifetime risk of (eventual) death.
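
Spelling out the arithmetic behind the 2% figure (this treats the melanoma survival rate as if it applied to all skin cancer, as the simplification above does):

```latex
P(\text{eventual death}) \approx P(\text{skin cancer}) \times (1 - \text{5-year survival}) = 0.20 \times (1 - 0.90) = 0.02
```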

But the risk of skin cancer depends on far more than sun exposure - and the more important determinant is frequency of sunbathing below age 30. Other factors that seem to matter are skin color, skin response (how much you burn), weight, and family history of cancers.
 

Systematizing Epistemics: Principles for Resolving Forecasts

re: "Get this wrong" versus "the balance should be better," there are two different things that are being discussed. The first is about defining individual questions via clear resolution criteria, which I think is doe well, and the second is about defining clear principles that provide context and inform what types of questions and resolution criteria are considered good form.

A question like "will Democrats pass H.R.2280 and receive 51 votes in the Senate" is very well defined, but super-narrow, and easily resolved "incorrectly" if the bill is incorporated... (read more)

Systematizing Epistemics: Principles for Resolving Forecasts

I haven't said, and I don't think, that the majority of markets and prediction sites get this wrong. I think they navigate this without a clear framework, which I think the post begins providing. And I strongly agree that there isn't a slam-dunk-no-questions case for principles overriding rules, which the intro might have implied too strongly. I also agree with your point about downsides of ambiguity potentially overriding the benefits of greater fidelity to the intent of a question, and brought it up in the post. Still, excessive focus on making rules on ... (read more)

SimonM (1mo, 3 points): (I realise everything I'm commenting seems like a nitpick, and I do think that what you've written is interesting and useful; I just don't have anything constructive to add on that side of things.)

I don't like litigating via quotes, but:

and I read the bit I've emphasised as saying "prediction sites have got this balance wrong", contradicting your comment saying you think they have it right.

I think it's really hard for this adaptive approach to work when there's more than a small group of like-minded people involved in a forecast. (This is related to my final point.)

The problem for me (with this) is that what is "clear" for some people is not clear for others. To give one example of this, the language in this question [https://www.metaculus.com/questions/5598/masks-in-schools-u-turn/#comment-47680] was completely unambiguous to me (and its author) but another predictor found it unclear. (I don't think this is a particularly good example, but it's just one which I thought of when trying to think of an example of where some people thought something was ambiguous and some people didn't.)
Thirty-three randomly selected bioethics papers

As an aside, I find it bizarre that Economics gets put at 9 - I think a review of what gets done in top econ journals would cause you to update that number down by at least 1. (It's not usually very bad, but it's often mostly useless.) And I think it's clear that lots of Econ does, in fact, have a replication crisis. (But we'll see if that is true as some of the newer replication projects actually come out with results.)

Rob Bensinger (1mo, 6 points): I guess I was thinking of 9/10 as a relatively low bar in the grand scheme of things ("pretty good"), and placing it so far from journalism (etc.) to express my low regard for the latter more so than my high regard for econ. But it sounds like it may belong lower on the scale regardless.
Resolutions to the Challenge of Resolving Forecasts

Generally agree that there's something interesting here, but I'm still skeptical that in most prediction market cases there would be enough money across questions, and enough variance in probabilities, for this to work well.

Resolutions to the Challenge of Resolving Forecasts

For betting markets, the market maker may need to manage the odds differently, and for prediction markets, it's because otherwise you're paying people in lower Brier scores for watching the games rather than for being good predictors beforehand. (The way that time-weighted Brier scores work is tricky - you could get it right, but in practice it seems that last-minute failures to update are fairly heavily penalized.)
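
For concreteness, here is a minimal sketch of time-weighted Brier scoring - a simplified stand-in, not GJ Open's or Metaculus's exact formula - showing that whatever probability you leave standing keeps accruing its full Brier penalty for as long as it stands, so late updates move the total only a little:

```python
def time_averaged_brier(forecasts, outcome):
    """forecasts: list of (fraction_of_question_lifetime, probability) pairs,
    each held constant until the next update. Lower is better."""
    total = 0.0
    for (t0, p), (t1, _) in zip(forecasts, forecasts[1:] + [(1.0, None)]):
        total += (t1 - t0) * (p - outcome) ** 2
    return total

# A forecaster at 30% who never updates, on a question that resolves "yes":
print(time_averaged_brier([(0.0, 0.3)], outcome=1))             # 0.49

# The same forecaster who jumps to 90% with 10% of the time remaining:
print(time_averaged_brier([(0.0, 0.3), (0.9, 0.9)], outcome=1))  # ~0.44
```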

Dark Matters

That's good to hear. But if "he started at 60%," that seems to mean if he "still thinks dark matter is overwhelmingly likely" he is updating in the wrong direction. (Perhaps he thought it was 60% likely that the LHC found dark matter? In which case I still think that he should update away from "overwhelmingly likely" - it's weak evidence against the hypothesis, but unless he started out almost certain, "overwhelmingly" seems to go a bit too far.)

Charlie Steiner (2mo, 2 points): Yes, 60% that the LHC would find a dark matter candidate. Anyhow, maybe you should take away that this emphasizes that he does (and cosmologists in general do) have lots of evidence.
Resolutions to the Challenge of Resolving Forecasts

Yes, that was exactly what I was thinking of, but 1) I didn't remember the name, and 2) I wanted a concrete example relevant to prediction markets.

And I agree it's hard to estimate in general, but the problem can still be relevant in many cases - which is why I used my example. In the baseball game, if the market closes before the game begins, we don't have a model as good as the market, but once the game is 7/9ths complete, we can do better than the pre-game market prediction.

gwern (2mo, 2 points): Why close the markets, though?
Resolutions to the Challenge of Resolving Forecasts

It's an interesting idea, but one that seems to have very high costs for forecasters in keeping the predictions updated and coherent.

If we imagine that we pay forecasters the market value of their time, an active forecasting question with a couple dozen people spending a half hour each updating their forecasts "costs" thousands of dollars per week. Multiplying that across questions, even when accounting for reduced costs for similar questions, seems not worth it.
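
Rough arithmetic behind "thousands of dollars per week" (the $100/hour rate is an assumed figure for illustration, not from the post):

```latex
\underbrace{24}_{\text{forecasters}} \times \underbrace{0.5\ \text{hr}}_{\text{per update}} \times \underbrace{\$100/\text{hr}}_{\text{assumed rate}} \approx \$1{,}200\ \text{per update round}
```

A few update rounds per week on a single active question, before even multiplying across questions, already runs into the thousands.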

mike_hawke (2mo, 3 points): Hm okay. And is this a problem for prediction markets too, even though participants expect to profit from their time spent? The way I imagine it, sloppier traders will treat a batch of nearly identical questions as identical, arbitraging among them and causing the prices to converge. Meanwhile, the more literal-minded traders will think carefully about how the small changes in the wording might imply large changes in probability, and they will occasionally profit by pushing the batch of prices apart. But maybe most traders won't be that patient, and will prefer meta-resolution or offloading. I still feel like I'm onto something here...
Dark Matters

"isn't it quite odd that looking around at different parts of the universe seems to produce such a striking level of agreement on how much missing mass there is?"

But they don't. Dark matter, as a theory, posits that the amount of mass that "must be there somewhere" varies in amount and distribution in an ad-hoc fashion to explain the observations. I think it's likely that whatever is wrong with the theory, on the other hand, isn't varying wildly by where in the universe it is. Any such explanation would (need to) be more parsimonious, not less so.

And I agr... (read more)

Charlie Steiner (2mo, 5 points): Most physicists actually have updated - if you listen to Sean Carroll's podcast, he just this week talked about how when the LHC started up he thought there was about a 60% chance of finding a dark matter candidate, and that he's updated his views in light of our failure to find it. But he also explained that he still thinks dark matter is overwhelmingly likely (because of evidence like that explained in the post).
Dark Matters

This was fantastic, and it still leaves me with the conclusion that "dark matter" isn't a specific hypothesis; it's a set of reasons to think we're missing something in our theories which isn't modified gravity.

That is, saying "Given that everything we see is consistent with Gravity being correct, we conclude that there is not enough baryonic matter to account for what we see," doesn't prove the existence of large amounts of non-baryonic matter. Instead, the evidence provides strong indication that either A) there is something we can't see that has some propert... (read more)

If there's something wrong with some theory, isn't it quite odd that looking around at different parts of the universe seems to produce such a striking level of agreement on how much missing mass there is? If there was some out-of-left-field thing, I'd expect it to have confusing manifestations in many different areas and astronomers angsting about dramatically inconsistent measurements, I would not expect the CMB to end up explained away (and the error bars on those measurements are really really small) by the same 5:1 mix of non-baryonic matter vs baryon... (read more)

Strong Evidence is Common

"Worth having" is a separate argument about relative value of new information. It is reasonable when markets exist or we are competing in other ways where we can exploit our relative advantage. But there's a different mistake that is possible which I want to note.

Most extreme beliefs are false; for every correct belief, there are many, many extreme beliefs that are false. Strong consensus on some belief is (evidence for the existence of) strong evidence of the truth of that belief, at least among the considered alternatives. So picking a belief on the basi... (read more)

Chrysophylax (6d, 1 point): You seem to have made two logical errors here.

First, "This belief is extreme" does not imply "This belief is true", but neither does it imply "This belief is false". You shouldn't divide beliefs into "extreme" and "non-extreme" buckets and treat them differently.

Second, you seem to be using "extreme" to mean both "involving very high confidence" and "seen as radical", the latter of which you might mean to be "in favour of a proposition I assign a very low prior probability".

Restating my first objection, "This belief has prior odds of 1:1024" is exactly 10 bits of evidence against the belief. You can't use that information to update the probability downward, because -10 bits is "extreme", any more than you can update the probability upward because -10 bits is "extreme". If you could do that, you would have a prior that immediately requires updating based on its own content (so it's not your real prior), and I'm pretty sure you would either get stuck in infinite loops of lowering and raising the probability of some particular belief (based on whether it is "extreme" or not), or else be able to pump out infinite evidence for or against some belief.
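
A quick numerical restatement of the bits framing above (a generic Bayes-by-odds illustration, not taken from either comment): prior odds of 1:1024 already are the 10 bits of "extremeness"; counting extremeness again double-counts, while genuinely new evidence updates the odds multiplicatively.

```python
import math

def posterior_odds(prior_odds, likelihood_ratios):
    """Bayes by odds: multiply prior odds by each independent likelihood ratio."""
    odds = prior_odds
    for lr in likelihood_ratios:
        odds *= lr
    return odds

prior = 1 / 1024                             # "extreme": 10 bits against
print(math.log2(prior))                      # -10.0 -- the extremeness *is* the prior

# Double-counting: also treating "it's extreme" as another 10-bit strike
# wrongly drives the odds down to 1:1,048,576.
print(posterior_odds(prior, [1 / 1024]))     # ~9.5e-07

# Correct use of new evidence: two independent observations, each 20x more
# likely if the belief is true, move 1:1024 to roughly 1:2.6.
print(posterior_odds(prior, [20, 20]))       # ~0.39
```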
Resolutions to the Challenge of Resolving Forecasts

I think we agree on this - iterated closing is an interesting idea, but I'm not sure it solves a problem. It doesn't help with ambiguity, since we can't find bounds. And earlier payouts are nice, but by the time we can do partial payouts, they are either tiny, because of large ranges, or they are not much before closing. (They also create nasty problems with incentive compatibility, which I'm unsure can be worked out cleanly.)

Resolutions to the Challenge of Resolving Forecasts

"partial resolution seems like it would be useful"
I hadn't thought of this originally, but Nuno added the category of "Resolve with a Probability," which does this. The idea of iterated closing of a question as the bounds improve is neat, but probably technically challenging. (GJ Inc. kind-of does this when they close answer options that are already certain to be wrong, such as total ranges below the current number of COVID cases.) I'd also worry it creates complexity that makes it much less clear to forecasters how things will work.

"one helpful mechanism ... (read more)

abramdemski (2mo, 2 points): Here's how I imagine it working. Suppose a prediction market includes a numerically-valued proposition, like if we forecast COVID numbers not by putting probabilities on different ranges, but rather, by letting people buy and sell contracts which pay out proportional to COVID numbers. The market price of such a contract becomes our projection. (Or, you know, some equivalent mechanism for non-cash markets.)

Then, when we get partial information about COVID numbers, we create a partial payout: if we're confident covid numbers for a given period were at least 1K, we can cause sellers of the contract to pay 1K's worth to buyers. As the lower bound gets better, they pay more.

Of course, the mathematical work deciding when we can be "confident" of a given lower bound can be challenging, and the forecasters have to guess how this will be handled. And a big problem with this method is that it will low-ball the number in question, since the confidence interval will never close up to a single number, and forecasters only have to worry about the lower end of the confidence interval.
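
A toy sketch of the partial-payout mechanism described above (an illustration of the idea only, not any existing market's implementation; the class and method names are made up):

```python
class ScaledContract:
    """Contract that ultimately pays the buyer the realized value of a number
    (e.g. a COVID case count), with early partial payouts as a confident
    lower bound on that number rises."""

    def __init__(self):
        self.paid_so_far = 0.0

    def partial_payout(self, confident_lower_bound):
        """Seller pays buyer the newly guaranteed portion of the final value."""
        increment = max(0.0, confident_lower_bound - self.paid_so_far)
        self.paid_so_far += increment
        return increment

    def settle(self, realized_value):
        """At resolution, pay whatever the partial payouts haven't covered."""
        return self.partial_payout(realized_value)

c = ScaledContract()
print(c.partial_payout(1_000))   # lower bound hits 1k: seller pays 1000
print(c.partial_payout(1_500))   # bound improves: seller pays another 500
print(c.settle(1_800))           # final number known: remaining 300 paid
```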
RadVac Commercial Antibody Test Results

Not sure that you'd get reactions from large subunits if they fold differently than the full spike - but my biochemistry/immunology isn't enough to be sure about how this would work.

johnswentworth (2mo, 8 points): The marketing material for the test kit has a long description of steps they took to get the same conformation. It's still marketing material, so I don't trust it 100%, but there's at least a plausible story that it should be the same.
RadVac Commercial Antibody Test Results

 "Aside from the test result, we do have one more small piece of information to update on: I was quite congested for 1-2 days after the most recent three doses (and I was generally not congested the rest of the week). That's exactly what we'd expect to see if the vaccine is working as intended, and it's pretty strong evidence that it's doing something."


Agree that this is evidence it is doing something, but my strong prior is that the adjuvant alone (chitosan) would cause this to happen. 

I'm also unclear about why you chose the weekly schedule, or... (read more)

johnswentworth (2mo, 5 points): My current plan is to do one more dose, to make sure we've had three total doses with all nine peptides, then cut it off there. In general, I expect more dakka to be valuable mainly if the dosage is sufficient to induce response in some people but not everyone, so e.g. if we saw a response in my girlfriend but not in me then I'd be a lot more inclined to add another couple doses.
RadVac Commercial Antibody Test Results

I agree that posting the results was the correct thing to do, and appreciate that John is trying to figure out if this is useful - but I actually claim the post is an example of how rationality is hard, and even pursuing it can be misleading if you aren't very, very careful.

In The Twelve Virtues of Rationality, this post gets virtue points for the first (curiosity, for looking into whether it works), third (lightness, being willing to update marginally on evidence), fourth (evenness, updating even when the evidence isn't in the direction desired), sixth (e... (read more)

BB6 (2mo, 1 point): They did test for antibodies before taking RadVac. That was a precaution to lower the probability that they would get antibodies due to infection.
RadVac Commercial Antibody Test Results

You need to see if the spike peptide included corresponds to the antibody being tested for - and given how many targets there are, I would be surprised if it did.

Despite holding a far lower prior on efficacy, I'm agreeing with Christian - this evidence shouldn't be a reason to update anywhere near as strongly as you did against effectiveness.

johnswentworth (2mo, 2 points): Based on the fact sheet [https://www.diasorin.com/sites/default/files/allegati_prodotti/covid_-_brochure_igg_unica_m0870004366-d_low.pdf], it sounds like they're using full S1 and S2, which together constitute basically the entire spike protein. You seem to be imagining that they would use short peptides in the test rather than whole proteins or large subunits; any particular reason why?
Making Vaccine

Mostly vague "accidents and harmful unknown unknowns aren't that unlikely here" - because we have data on baseline success at "not having harmful side effects," and it is low. We also know that lots of important side effects are unusual, so the expected loss can be high even after a number of "successes," and this is doubly true because no one is actually tracking side effects. We don't know much about efficacy either, but again, on base rates it is somewhat low. (Base rates for mRNA are less clear, and may be far higher - but these sequences are unfiltered,... (read more)

How my school gamed the stats

Sorry, this is clearly much more confrontational than I intended.

gjm (2mo, 4 points): To whatever extent that's my fault, I'm sorry too. :-)
How my school gamed the stats

First, I apologize. I really didn't intend for the tone to be attacking, and I am sorry that was how it sounded. I certainly wasn't intentionally "suggesting [you were] somehow trying to hide or deny" any of the issues. I thought it was worth noting that the initial characterization was plausibly misleading, given that the sole indicator of being a "nice middle class area" seemed to be the percentage of people with PhDs. Your defense was that it was no more than 3x the general population's rate of PhDs, but that doesn't mean the top 1/3, a point which you later agreed to. And after ... (read more)

gjm (2mo, 2 points): I don't actively "want to continue", in that it seems to me that the whole content of this discussion is you saying or implying that I've badly misrepresented how affluent my area is, and me pointing out in various ways that that isn't so. However, your last paragraph seems once again like an accusation of inconsistency, so let me clarify.

"Upper middle class" means different things in different places. In the US, "class" is largely (but not wholly) about wealth. In the UK, "class" is largely (but not wholly) about social background. These are less different than that makes them sound because the relevant differences in social background are mostly driven by the wealth of one's forebears, and in both societies there is a strong correlation between that and one's own wealth.

The US has a more wealth-based notion of class and is also richer. So being "upper middle class" in the US means a level of wealth that would make you quite rich in the UK. The UK has a more social-background-based notion of class, which in particular is strongly influenced by the existence of a (statistically very small) aristocratic class. So "upper class" in the UK means a smaller, more-elite fraction of the population than in the US, and "upper middle class" is pulled in the same direction. So being "upper middle class" in the UK typically (but not always, because of the wealth/background distinction) means being at a distinctly higher percentile of wealth than it does in the US. The combined effect of these things is to put the typical "upper middle class" person or family at something like the same level of wealth in the two countries, although there's plenty of fuzziness and variability.

(Perhaps I have by now made it clear that I do have some idea how social class works in the UK, enough so that you might believe me if I tell you that (1) I am definitely lower-upper-middle-class and (2) my household income is somewhere around the 95th percentile.)

So, issue 1 is that you're wanting
How my school gamed the stats

Wait, the claim was never that everyone is well off - of course we expect there to be a distribution. But if a sizeable portion of the children at the school have very high-socioeconomic-status parents, even if it's only 10% of the parents, compared to a median of plausibly less than 1% of parents across schools overall, then it would be incorrect to infer that the way the school is run can be usefully compared to the "average" school.

gjm (2mo, 2 points): I said the school was "in a nice middle-class area". You replied (way way back upthread):

Which only makes sense to me if it's saying that what I described as middle-class is actually "the very upper part of the upper middle class", and what I described as middle-class is not "the best-off parents of pupils at the school" but the area the school was in. (For greater precision I should really have been talking about the people at the school rather than the area as such, but I take it that was always understood, and in fact I think the school's pupils are pretty representative of its catchment area.)

And of course I agree that the school isn't average. That's why I said, way back in my original comment,

Emphasis added here to make it absolutely clear that right at the outset I explicitly noted the things you're now suggesting I was somehow trying to hide or deny. Note the last sentence: I wasn't responding to a report about the failings of an average school by saying "but my nice middle-class school is different", I was responding to a report about the failings of a nice middle-class school by saying "but my nice middle-class school seems to be different from your nice middle-class school".

I also want to push back a bit again about "the middle of the upper middle class" and "very high-socioeconomic-status parents". I think it is flatly untrue that the area I'm in is "in the middle of the upper middle class" by either UK or US definitions, and I think it's at best debatable whether "a sizeable portion of the children at the school largely have very high-socioeconomic-status parents". Let me quote you some bits of Wikipedia, as indicative of typical usage of the term "upper middle class".

Very few people in the village where I live send their children to public schools, or even to other independent schools. (And obviously the parents of children at my daughter's school don't.) I don't know anything much about Matthew Pinsent, but again: if there's anyone here in
The slopes to common sense

Great post. 

My only comment is that I think you're confused in section iv when you say, "but the origin of the universe is essentially an infinity of inferential steps away given the sheer scale of the issue," and I think you're misunderstanding some tricky and subtle points about the epistemology of science and what inferential steps would be needed. So people might be right when they say you meant "We can't make any meaningful factual claims about the origin of the universe. We are too limited to understand an event like this." - but the object level... (read more)

George (2mo, 3 points): That is the best example I had of how one could, e.g., disagree with a scientific field by just erring on scepticism rather than taking the opposite view.

***

To answer your critique of that point, though again, I think it bears no or little relation to the article itself:

* The "predictions" by which the theory is judged here are just as fuzzy and inferentially distant.
* I am not a cosmologist; what I've read regarding cosmology has been mainly papers around unsupervised and semi-supervised clustering on noisy data, and incidental evidence from those has made me doubt the complex ontologies proposed by cosmologists, given the seemingly huge degree of error acceptable in the process of "cleaning" data.
* There are many examples of people fooling themselves into making experiments to confirm a theory and "correcting" or discarding results that don't confirm it (see e.g. phlogiston, mass of the electron, the pushback against proton gradients as a fundamental mechanism of cell energy production, vitalism-confirming experiments, Roman takes on gravity).
* One way science can be guarded against modelling an idealized reality that no longer is related to the real world is by making something "obviously" real (e.g. electric lightbulb, nuclear bomb, vacuum engines).
* Focusing on real-world problems also allows for different types of skin-in-the-game, i.e. going against the consensus for profit, even if you think the consensus is corrupt.

Cosmology is a field that requires dozens of years to "get into", it has no practical applications that validate its theories, and its only validation comes from observational evidence using data that is supposed to describe objects that are, again, a huge inferential distance away in both time/space and SNR... data which is heavily cleaned based on models created and validated by cosmology. So I tend to err on the side of "bullshit" provided lack of relevant predictions that can be validate by lit
How my school gamed the stats

That's fair - thanks for checking, and I'd agree that that would better match "very nice middle-class area" than my assertion. (In the US, the top 2-3% is usually considered upper class, while the next 15-20% are upper middle class, and the next ~25% are "lower middle class." This income level definitely puts your neighborhood in the middle of the upper middle class.)

gjm (2mo, 2 points): Some other relevant numbers: mean household income in my village (and the area around it that's part of the same area, as used by that tool) is about £36k before, and about £33k after, the cost-of-living adjustment. Those are means; presumably the median is lower. Again, that makes it a better-off-than-average area, but note that £36k is not by any reasonable standard a middle-upper-middle-class household income. So yes, this is definitely a nice area, but no, it's not the case that everyone here is very well off or very high-status.
How my school gamed the stats

I'd agree with most of your models, and agree that there is divergence at the extremes of a distribution - but that's at the very extremes, and usually doesn't lead to strong anti-correlation even in the extreme tails. 

But I think we're better off being more concrete. I don't know where you live, but I suspect that your postal code is around the 90th income percentile, after housing costs - a prediction which you can check easily. And that implies that the tails for income and education are still pretty well correlated at only the 97th percentile for e... (read more)

gjm (2mo, 2 points): I checked. Annoyingly, the tool you linked to only tells you which 10%-sized block of percentiles the area is in. It says 70-80 before, and 80-90 after, adjusting for housing costs. (If you're trying to measure social status, upper-middle-class-ness, etc., then I claim you should actually use the figures before adjusting for housing costs.) That's the village where I live and where the school is located, but it takes pupils from other places too; the neighbouring village that I think provides the largest number of other pupils is in the 70-80 percentile by both metrics.

The same page has a thing that gives finer-grained percentiles but only for the before-housing-costs figure (which, again, I think is actually the more relevant here). My village gets about 82%, the other one I mentioned gets about 78%. I think all of this exactly matches my original description: very nice middle-class area but not "the very upper part of the upper middle class".

I'm not sure exactly what you mean by "still pretty well correlated"; 97% on one -> 90% on the other isn't so different from what my toy model says, and 97% on one -> ~80% on the other (which I think is better supported by the evidence) is pretty much exactly what my toy model says.
How my school gamed the stats

Even given your numbers, I think it's very likely that you're underestimating how privileged the group is. Most things like educational status are Pareto-distributed; 80% of PhDs are in 20% of areas. That assumption may be unfair, but if it were correct, an area with 3x the average would be at about the 97th percentile.

And yes, you're near Cambridge, which explains the concentration of PhDs, and makes it seem less elite compared to Cambridge itself, but doesn't change the class of the people compared to the country as a whole.
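
As a check on the "3x the average is at about the 97th percentile" figure: assuming a Pareto distribution calibrated to the 80/20 rule (a modeling assumption, using the standard Lorenz-curve relationship), the numbers do work out:

```python
import math

# Pareto tail index implied by "80% of PhDs are in 20% of areas":
# the top q share of areas holds q**(1 - 1/alpha) of the PhDs.
alpha = 1 / (1 - math.log(0.8) / math.log(0.2))   # ~1.16

x_min = 1.0
mean = alpha * x_min / (alpha - 1)                # ~7.2 * x_min

# Percentile of an area with 3x the average PhD density:
x = 3 * mean
percentile = 1 - (x_min / x) ** alpha
print(round(alpha, 3), round(percentile, 3))      # ~1.161, ~0.972
```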

gjm (2mo, 2 points): It's certainly possible that I'm underestimating the level of privilege here. But I guarantee that the area is not at all "the very upper part of the upper middle class". In particular, I know a lot of the parents-with-PhDs, and I'm pretty sure none of us is upper-upper-middle-class by any reasonable definition. To whatever extent the school sounds startlingly privileged rather than merely distinctly more than averagely privileged, I think it's more likely that my estimates of the parents are skewed than that the school is really super-duper-elite. (For instance: I'm an academic sort of person, the people I know will tend to be academic sorts of people, and so it would be very unsurprising if I overestimated how many parents have PhDs. Also, I am consistently trying to overestimate rather than underestimate, because I want to be honest about the fact that this school is serving a pretty "good" population.)

I don't think I agree with your second paragraph. I have a super-handwavy model in my head according to which populations like "Cambridge graduates", while of course both well-educated and high-status, are less high-status than you would expect from e.g. seeing where they sit in the distribution of education and assuming they're in the same place in the distribution of social status. Let me try to make it a bit more concrete and see whether I still disagree with you.

Effect #1: education and status are correlated but not at all the same thing. Toy model: status = education + otherstuff, both are uniform(0,1). If status <= 1 then status quantile = status^2/2; if status >= 1 then status quantile = 1 - (2-status)^2/2. If your education quantile (= your education) is 0.9 then your status is uniform between 0.9 and 1.9, so your average status quantile is (if I've got the calculations right) about 0.78. If your education quantile is 0.97 then your average status quantile is about 0.82. High, but not that high.

Effect #2: when you condition on somewhat extreme values
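
For what it's worth, the toy model in the reply above checks out numerically (a quick Monte Carlo, illustration code only, not gjm's):

```python
import random

def avg_status_quantile(education, n=500_000):
    """status = education + otherstuff, both Uniform(0,1); return the mean
    quantile of status within the overall sum-of-two-uniforms distribution."""
    def quantile(s):
        return s**2 / 2 if s <= 1 else 1 - (2 - s)**2 / 2
    return sum(quantile(education + random.random()) for _ in range(n)) / n

print(round(avg_status_quantile(0.90), 2))  # ~0.78
print(round(avg_status_quantile(0.97), 2))  # ~0.82
```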
How my school gamed the stats

Note that only around 3% of UK residents have PhDs - so I strongly suspect that what you're calling "middle-class" is closer to the top 5% of the population, or what sociologists would say is the very upper part of the upper middle class. 

gjm (3mo, 3 points): Well, the "substantial fraction" I have in mind isn't all that large. Certainly not more than 20% of pupils having >= one parent with a PhD ~= 10% of parents having PhDs, which would be ~3x the general population, suggesting that the population here might be comparable to the top 1/3 of the population as a whole. Kinda. Which is, as I say, certainly a nice middle-class area, but not much like taking the top 5% of the population.

Class terminology is used differently by different people; the Wikipedia page for "upper middle class", which is probably reasonably representative, says that "[t]he upper middle class in Britain traditionally consists of the educated professionals who were born into higher-income backgrounds, such as legal professionals, executives, and surgeons"; interpreting "higher-income" fairly broadly, that might be 10% of the school's pupils. So no, not by any means "the very upper part of the upper middle class". (Also, we're near Cambridge, which means a lot of people with good academic qualifications; so in this population, good academic qualifications will be less evidence of e.g. wealth and social status than in the population at large.)
Promoting Prediction Markets With Meaningless Internet-Point Badges

Yes, it's super important to update frequently when the scores are computed as time-weighted. And for Metaculus, that's a useful thing, since viewers want to know what the current best guess is, but it's not the only way to do scoring. But saying frequent updating makes you better at forecasting isn't actually a fact about how accurate the individual forecasts are - it's a fact about how they are scored.

Making Vaccine

"Immunity" and "efficacy" seem like they should refer to the same thing, but they really don't. And if you talk to people at the FDA, or CDC, they should, and probably would, talk about efficacy, not immunity, when talking about these vaccines.

And I understand that the technical terms and usage aren't the same as what people understand, and I was trying to  point out that for technical usage, the terms don't quite mean the things you were assuming. 

And yes, the vaccines have not been proven to provide immunizing protection - which again, is diffe... (read more)

Covid 2/11: As Expected

There was a lesswrong post about this a while back that I can't find right now, and I wrote a twitter thread on a related topic. I'm not involved with the reasoning behind the structure for GJP or Metaculus, so for both it's an outside perspective. However, I was recently told there is a significant amount of ongoing internal Metaculus discussion about the scoring rule, which, I think, isn't nearly as bad as it seemed. (But even if there is a better solution, changing the rule now would have really weird impacts on motivation of current users, which is cri... (read more)

Unnamed (3mo, 2 points): https://www.lesswrong.com/posts/tyNrj2wwHSnb4tiMk/incentive-problems-with-current-forecasting-competitions ?

Having a meetup on this seems interesting. Will PM people.

Covid 2/11: As Expected

If the user is interested in getting into the top ranks, this strategy won't be anything like enough. And if not, but they want to maximize their score, the scoring system is still incentive compatible - they are better off reporting their true estimate on any given question. And for the worst (but still self-aware) predictors, this should be the Metaculus prediction anyways - so they can still come away with a positive number of points, but not many. Anything much worse than that, yes, people could have negative overall scores - which, if they've predicted on a decent number of questions, is pretty strong evidence that they really suck at forecasting.
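
On "incentive compatible": the core of the claim is that the log score is a proper scoring rule, so reporting your true probability maximizes your expected score (a generic illustration; Metaculus's actual point formula layers participation and time-weighting on top of this):

```python
import math

def expected_log_score(true_p, reported_p):
    """Expected log score when the event happens with probability true_p
    but the forecaster reports reported_p."""
    return true_p * math.log(reported_p) + (1 - true_p) * math.log(1 - reported_p)

true_p = 0.7
reports = [0.5, 0.6, 0.7, 0.8, 0.9]
best = max(reports, key=lambda r: expected_log_score(true_p, r))
print(best)  # 0.7 -- reporting your true belief maximizes expected score
```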

elifland (3mo, 5 points): I think this isn't true empirically for a reasonable interpretation of top ranks. For example, I'm ranked 5th on questions that have resolved in the past 3 months [https://archive.vn/5KZvT] due to predicting on almost every question. Looking at my track record, for questions resolved in the last 3 months, evaluated at all times, here's how my log score looks compared to the community:

* Binary questions (N=19): me: -.072 vs. community: -.045
* Continuous questions (N=20): me: 2.35 vs. community: 2.33

So if anything, I've done a bit worse than the community overall, and am in 5th by virtue of predicting on all questions. It's likely that the predictors significantly in front of me are that far ahead in part due to having predicted on (a) questions that have resolved recently but closed before I was active and (b) a longer portion of the lifespan for questions that were open before I became active.

Edit: I discovered that the question set changes when I evaluate at "resolve time" and filter for the past 3 months, not sure why exactly. Numbers at resolve time:

* Binary questions (N=102): me: .598 vs. community: .566
* Continuous questions (N=92): me: 2.95 vs. community: 2.86

I think this weakens my case substantially, though I still think a bot that just predicts the community as soon as it becomes visible and updates every day would currently be at least top 10. [Elicit prediction: elicit.org/binary/questions/uVlFdPH1T]

I agree that this should have some effect of being less welcoming to newcomers, but I'm curious to what extent. I have seen plenty of people with worse Brier scores than the median continuing to predict on GJO rather than being demoralized and quitting (disclaimer: survivorship bias).
Covid 2/11: As Expected

Not really. Overall usefulness is really about something like covariance with the overall prediction - are you contributing different ideas and models. That would be very hard to measure, while making the points incentive compatible is not nearly as hard to do.

And how well an individual predictor will do, based on historical evidence, is found by comparing their Brier score to the Metaculus prediction on the same set of questions. This is information which users can see on their own page. But it's not a useful figure unless you're asking about relative performance, which, as an outsider interpreting predictions, you shouldn't care about - because you want the aggregated prediction.

Covid 2/11: As Expected

I agree that actually offering money would require incentives to avoid, essentially, sybil attacks. But making sure people don't make "noise predictions" isn't a useful goal - those noise predictions don't really affect the overall Metaculus prediction much, since it weights past accuracy.

 

Covid 2/11: As Expected

As someone who is involved in both Metaculus and the Good Judgement Project, I think it's worth noting that Zvi's criticism of Metaculus - that points are given just for participating, so that making the community-average guess gets you points - applies to Good Judgement Inc's predictions by superforecasters in almost exactly the same way: the superforecasters are paid for a combination of participation and performance, so that guessing the forecast median earns them money. (GJI does have a payment system for superforecasters which is more complex than this, and which I probably am not allowed to talk about - but the central point remains true.)

3Andrew_Clough3moIt also applies to the stock market where buying an index fund that just invests in everything leads to fairly regular positive returns.