Someone with no personal experience of suffering should also be moved by that consideration.
That sounds like a fantastic reason for someone with that experience to post it, as occurred here, as a way to explain what it is like to others. In fact, only the existence of suffering for some concrete individual justifies the abstract conclusion of altruism. Without that concrete level, the abstraction is hypothetical, and should not provide the same level of reason to be altruistic.
Update: we did this, I bought shares, we'll see how it goes.
Extreme, in this context, meant far from the consensus expectation. That implies both "seen as radical" and "involving very high [consensus] confidence [against the belief]." Contra your first paragraph, I think, I claim that this "extremeness" is valid Bayesian evidence for it being false, in the sense that you identify in your third paragraph - it has low prior odds. Given that, I agree that it would be incorrect to double-count the evidence of being extreme. But my claim was that, holding "extremeness" constant, the newness of a claim was in... (read more)
Want to sell me USDC on there in exchange for paypal, so I can bet? (I'll gladly pay a 2% "commission" for, say, $200 in USDC.)
It's a pain to redo, but can someone add Ought embedded predictions to all of these?
https://forecast.elicit.org/binary

(Alternatively/additionally, can they all be on Metaculus?)
Relatedly, and perhaps even more fundamentally, there is the basic discipline of thinking about a system and implementing a mathematical model or simulation to explore these topics, which drove the insights you mention. And in many ways, it's easier to test this without worrying about people gaming the system, because you can give new examples and require them to actually explore the question.
That's fine, but choosing the question set on which you give self-motivated children instant, computer-driven feedback raises the same type of question: what is it that we want the child interested in X to learn?
Concretely, my 8 year old son likes math. He's fine with multiplication and division, but enjoys thinking about math. If I want him to be successful applying math later in life, should I start him on knot theory, pre-algebra equation solving, adding and subtracting unlike fractions, or coding in python? I see real advantages to ... (read more)
Partly agree with your criticism of the quoted claim, but there are two things I think you should consider.
First, evaluating tests for long-term outcomes is fundamentally hard. The extent to which a 5th grade civics or math test predicts performance in policy or engineering is negligible. In fact, I would expect that the feedback from test scores in determining what a child focuses on has a far larger impact on the child's trajectory than the test's object-level predictive power would suggest.

Second, standardizing tests greatly reduces the cost of development, and allows larger sample sizes for validation. For either reason alone, it makes sense to use standardized tests as much as possible.
12. Netanyahu is still Israeli PM: 40%
This is the PredictIt line for him on 6/30, and Scott’s predicting this out to January 1. I’m guessing that he didn’t notice? Otherwise, given how many things can go wrong, it’s a rather large disagreement – those wacky Israelis have elections constantly. I’m going to sell this down to 30% even though I have system 1 intuitions he’s not going anywhere. Math is math.
I would buy at this price, probably up to 50%, but there are some wrinkles to how it gets resolved. At least 45% of the population really really... (read more)
I also just requested this on reddit
Also just requested on reddit: https://www.reddit.com/r/Scholar/comments/mtwl4d/chapter_k_hoskin_1996_the_awful_idea_of/
Request: "K. Hoskin (1996) The 'awful idea of accountability': inscribing people into the measurement of objects. In Accountability: Power, Ethos and the Technologies of Managing, R. Munro and J. Mouritsen (Eds). London, International Thomson Business Press, and references therein."

(Cited by: Strathern, Marilyn (1997). "'Improving ratings': audit in the British University system". European Review. John Wiley & Sons. 5 (3): 305–321. doi:10.1002/(SICI)1234-981X(199707)5:3<305::AID-EURO184>3.0.CO;2-4.)
See Google Books, and Worldcat (Available in man... (read more)
Noting the obvious connection to Goodhart's law - and elsewhere I've described the mistake of pushing to maximize easy-to-measure / cognitively available items rather than true goals.
Yeah, that's true. I don't recall exactly what I was thinking. Perhaps it was regarding time-weighting, and the difficulty of seeing what your score will be based on what you predict - but the Metaculus interface handles this well, modulo early closings, which screw lots of things up. Also, log-scoring is tricky when you have both continuous and binary outcomes, since they don't give similar measures - being well calibrated for binary events isn't "worth" as much, which seems perverse in many ways.
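To illustrate the mismatch, here's a toy sketch (not any platform's actual scoring rule): a binary log score is bounded above by log(1) = 0, while a sharp continuous forecast's probability density can exceed 1, so its log score can go positive.

```python
import math

# Toy comparison of log scores; not any platform's actual scoring rule.
# Binary forecast: 80% on the correct outcome - the score can never beat log(1) = 0.
binary_score = math.log(0.8)

# Continuous forecast: a sharp Normal density (sigma = 0.1) evaluated exactly at
# the true value exceeds 1, so the log score is positive.
sigma, error = 0.1, 0.0
density = math.exp(-error**2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))
continuous_score = math.log(density)

print(round(binary_score, 2))      # -0.22
print(round(continuous_score, 2))  # 1.38
```

So a forecaster who is merely well calibrated on binary questions can't earn the large positive scores available on sharp continuous forecasts.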
In many cases, yes. But for some events, the "obvious" answers are not fully clear until well after the event in question takes place - elections, for example.
About 20% of Americans develop skin cancer during their lifetime, and the 5-year overall survival rate for melanoma is over 90%. Taking this as the mortality risk, i.e. ignoring timing and varied risk levels, it's a 2% risk of (eventual) death.

But risk of skin cancer depends on far more than sun exposure - and the more important determinant is frequency of sunbathing below age 30. Other factors that seem to matter are skin color, skin response (how much you burn), weight, and family history of cancers.
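The arithmetic behind that 2%, as a quick back-of-envelope sketch:

```python
# Back-of-envelope mortality estimate, ignoring timing and varied risk levels.
lifetime_incidence = 0.20   # ~20% of Americans develop skin cancer
five_year_survival = 0.90   # 5-year overall survival for melanoma is over 90%

eventual_death_risk = lifetime_incidence * (1 - five_year_survival)
print(round(eventual_death_risk, 3))  # 0.02, i.e. a ~2% risk of eventual death
```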
re: "Get this wrong" versus "the balance should be better," there are two different things being discussed. The first is about defining individual questions via clear resolution criteria, which I think is done well, and the second is about defining clear principles that provide context and inform what types of questions and resolution criteria are considered good form.

A question like "will Democrats pass H.R.2280 and receive 51 votes in the Senate" is very well defined, but super-narrow, and easily resolved "incorrectly" if the bill is incorporated... (read more)
I haven't said, and I don't think, that the majority of markets and prediction sites get this wrong. I think they navigate this without a clear framework, which I think the post begins providing. And I strongly agree that there isn't a slam-dunk-no-questions case for principles overriding rules, which the intro might have implied too strongly. I also agree with your point about downsides of ambiguity potentially overriding the benefits of greater fidelity to the intent of a question, and brought it up in the post. Still, excessive focus on making rules on ... (read more)
As an aside, I find it bizarre that Economics gets put at 9 - I think a review of what gets done in top econ journals would cause you to update that number down by at least 1. (It's not usually very bad, but it's often mostly useless.) And I think it's clear that lots of Econ does, in fact, have a replication crisis. (But we'll see if that is true as some of the newer replication projects actually come out with results.)
Generally agree that there's something interesting here, but I'm still skeptical that in most prediction market cases there would be enough money across questions, and enough variance in probabilities, for this to work well.
For betting markets, the market maker may need to manage the odds differently, and for prediction markets, it's because otherwise you're paying people in lower Brier scores for watching the games, rather than for being good predictors beforehand. (The way that time-weighted Brier scores work is tricky - you could get it right, but in practice it seems that last-minute failures to update are fairly heavily penalized.)
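A toy version of the penalty (simplified; real platforms weight and aggregate differently): if each day's Brier score is averaged over a question's lifetime, a forecaster who stops updating near the end gets hit hard even when their early forecasts were identical.

```python
def time_avg_brier(daily_forecasts, outcome):
    """Average the daily Brier score (p - outcome)^2 over the question's life."""
    return sum((p - outcome) ** 2 for p in daily_forecasts) / len(daily_forecasts)

# Ten-day question that resolves YES (outcome = 1).
attentive  = [0.9] * 10                # keeps the forecast current throughout
distracted = [0.9] * 8 + [0.5, 0.5]    # identical early, stops watching at the end

print(round(time_avg_brier(attentive, 1), 3))   # 0.01
print(round(time_avg_brier(distracted, 1), 3))  # 0.058
```

Two days of stale forecasts roughly sextuple the distracted forecaster's average score (lower is better), which is exactly the "paid for watching the game" effect.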
That's good to hear. But if "he started at 60%," that seems to mean if he "still thinks dark matter is overwhelmingly likely" he is updating in the wrong direction. (Perhaps he thought it was 60% likely that the LHC found dark matter? In which case I still think that he should update away from "overwhelmingly likely" - it's weak evidence against the hypothesis, but unless he started out almost certain, "overwhelmingly" seems to go a bit too far.)
Yes, that was exactly what I was thinking of, but 1) I didn't remember the name, and 2) I wanted a concrete example relevant to prediction markets.

And I agree it's hard to estimate in general, but the problem can still be relevant in many cases - which is why I used my example. In the baseball game, if the market closes before the game begins, we don't have a model as good as the market; but once the game is 7/9ths complete, we can do better than the pre-game market prediction.
It's an interesting idea, but one that seems to have very high costs for forecasters in keeping the predictions updated and coherent.

If we imagine that we pay forecasters the market value of their time, an active forecasting question with a couple dozen people spending a half hour each updating their forecasts "costs" thousands of dollars per week. Multiplying that out, even when accounting for reduced costs on similar questions, the total seems not worth the benefit.
"isn't it quite odd that looking around at different parts of the universe seems to produce such a striking level of agreement on how much missing mass there is?"

But they don't agree. Dark matter, as a theory, posits that the amount of mass that "must be there somewhere" varies in amount and distribution in an ad-hoc fashion to explain the observations. I think it's likely that whatever is wrong with the theory, on the other hand, isn't varying wildly by where in the universe it is. Any such explanation would (need to) be more parsimonious, not less so.

And I agr... (read more)
This was fantastic, and still leaves me with the conclusion that "dark matter" isn't a specific hypothesis; it's a set of reasons to think we're missing something in our theories which isn't modified gravity.

That is, saying "Given that everything we see is consistent with Gravity being correct, we conclude that there is not enough baryonic matter to account for what we see," doesn't prove the existence of large amounts of non-baryonic matter. Instead, the evidence provides strong indication that either A) there is something we can't see that has some propert... (read more)
If there's something wrong with some theory, isn't it quite odd that looking around at different parts of the universe seems to produce such a striking level of agreement on how much missing mass there is? If there was some out-of-left-field thing, I'd expect it to have confusing manifestations in many different areas and astronomers angsting about dramatically inconsistent measurements, I would not expect the CMB to end up explained away (and the error bars on those measurements are really really small) by the same 5:1 mix of non-baryonic matter vs baryon... (read more)
"Worth having" is a separate argument about relative value of new information. It is reasonable when markets exist or we are competing in other ways where we can exploit our relative advantage. But there's a different mistake that is possible which I want to note.
Most extreme beliefs are false; for every correct belief, there are many, many extreme beliefs that are false. Strong consensus on some belief is (evidence for the existence of) strong evidence of the truth of that belief, at least among the considered alternatives. So picking a belief on the basi... (read more)
I think we agree on this - iterated closing is an interesting idea, but I'm not sure it solves a problem. It doesn't help with ambiguity, since we can't find bounds. And earlier payouts are nice, but by the time we can do partial payouts, they are either tiny, because of large ranges, or they are not much before closing. (They also create nasty problems with incentive compatibility, which I'm unsure can be worked out cleanly.)
"partial resolution seems like it would be useful"

I hadn't thought of this originally, but Nuno added the category of "Resolve with a Probability," which does this. The idea of iterated closing of a question as the bounds improve is neat, but probably technically challenging. (GJ Inc. kind-of does this when they close answer options that are already certain to be wrong, such as total ranges below the current number of COVID cases.) I'd also worry it creates complexity that makes it much less clear to forecasters how things will work.

"one helpful mechanism ... (read more)
Not sure that you'd get reactions from large subunits if they fold differently than the full spike - but my biochemistry/immunology isn't enough to be sure about how this would work.
"Aside from the test result, we do have one more small piece of information to update on: I was quite congested for 1-2 days after the most recent three doses (and I was generally not congested the rest of the week). That's exactly what we'd expect to see if the vaccine is working as intended, and it's pretty strong evidence that it's doing something."
Agree that this is evidence it is doing something, but my strong prior is that the adjuvant alone (chitosan) would cause this to happen. I'm also unclear about why you chose the weekly schedule, or... (read more)
I agree that posting the results was the correct thing to do, and appreciate that John is trying to figure out if this is useful - but I actually claim the post is an example of how rationality is hard, and even pursuing it can be misleading if you aren't very, very careful.

In The Twelve Virtues of Rationality, this post gets virtue points for the first (curiosity, for looking into whether it works), third (lightness, being willing to update marginally on evidence), fourth (evenness, updating even when the evidence isn't in the direction desired), sixth (e... (read more)
You need to see if the spike peptide included corresponds to the antibody being tested for - and given how many targets there are, I would be surprised if it did.

Despite holding a far lower prior on efficacy, I'm agreeing with Christian - this evidence shouldn't be a reason to update anywhere nearly as strongly as you did against effectiveness.
Mostly vague "accidents and harmful unknown unknowns aren't that unlikely here" - because we have data on baseline success at "not having harmful side effects," and it is low. We also know that lots of important side effects are unusual, so the expected loss can be high even after a number of "successes," and this is doubly true because no one is actually tracking side effects. We don't know much about efficacy either, but again, on base rates it is somewhat low. (Base rates for mRNA are less clear, and may be far higher - but these sequences are unfiltered,... (read more)
Sorry, this is clearly much more confrontational than I intended.
First, I apologize. I really didn't intend for the tone to be attacking, and I am sorry that was how it sounded. I certainly wasn't intentionally "suggesting [you were] somehow trying to hide or deny" any of the issues. I thought it was worth noting that the initial characterization was plausibly misleading, given that the sole indicator of being a "nice middle class area" seemed to be percentage of people with PhDs. Your defense was that it was no more than 3x the number of PhDs, but that doesn't mean top 1/3, a point which you later agreed to. And after ... (read more)
Wait, the claim was never that everyone is well off - of course we expect there to be a distribution. But if a sizeable portion of the children at the school have very high-socioeconomic-status parents (even if it's only 10% of the parents, compared to a median of plausibly less than 1% across schools overall), it would be incorrect to infer that the way the school is run can be usefully compared to the "average" school.
Great post. My only comment is that I think you're confused in section iv when you say, "but the origin of the universe is essentially an infinity of inferential steps away given the sheer scale of the issue" - I think you're misunderstanding some tricky and subtle points about the epistemology of science and what inferential steps would be needed. So people might be right when they say you meant "We can't make any meaningful factual claims about the origin of the universe. We are too limited to understand an event like this." - but the object level... (read more)
That's fair - thanks for checking, and I'd agree that that would better match "very nice middle-class area" than my assertion. (In the US, the top 2-3% is usually considered upper class, while the next 15-20% are upper middle class, and the next ~25% are "lower middle class." This income level definitely puts your neighborhood in the middle of the upper middle class.)
I'd agree with most of your models, and agree that there is divergence at the extremes of a distribution - but that's at the very extremes, and usually doesn't lead to strong anti-correlation even in the extreme tails. But I think we're better off being more concrete. I don't know where you live, but I suspect that your postal code is around the 90% income percentile, after housing costs - a prediction which you can check easily. And that implies that the tails for income and education are still pretty well correlated at only the 97th percentile for e... (read more)
Even given your numbers, I think it's very likely that you're underestimating how privileged the group is. Most things like educational status are Pareto-distributed; 80% of PhDs are in 20% of areas. While that assumption may be unfair, if it were correct, the point with 3x the average would be in the 97th percentile.

And yes, you're near Cambridge, which explains the concentration of PhDs, and makes it seem less elite compared to Cambridge itself, but doesn't change the class of the people compared to the country as a whole.
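The sketch behind that figure, assuming a Pareto distribution calibrated to the 80/20 rule (a strong assumption, as noted above):

```python
import math

# Pareto tail: P(X > x) = (x_min / x) ** alpha.
# The 80/20 rule pins down the shape parameter: alpha = log(5) / log(4) ≈ 1.16.
alpha = math.log(5) / math.log(4)
x_min = 1.0
mean = alpha * x_min / (alpha - 1)   # ≈ 7.2 * x_min

x = 3 * mean                         # an area with 3x the average PhD density
fraction_above = (x_min / x) ** alpha
percentile = 100 * (1 - fraction_above)
print(round(percentile, 1))          # ≈ 97.2, i.e. roughly the 97th percentile
```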
Note that only around 3% of UK residents have PhDs - so I strongly suspect that what you're calling "middle-class" is closer to the top 5% of the population, or what sociologists would say is the very upper part of the upper middle class.
Yes, it's super important to update frequently when the scores are computed as time-weighted. And for Metaculus, that's a useful thing, since viewers want to know what the current best guess is, but it's not the only way to do scoring. But saying frequent updating makes you better at forecasting isn't actually a fact about how accurate the individual forecasts are - it's a fact about how they are scored.
"Immunity" and "efficacy" seem like they should refer to the same thing, but they really don't. And if you talk to people at the FDA or CDC, they should, and probably would, talk about efficacy, not immunity, when discussing these vaccines.

And I understand that the technical terms and usage aren't the same as what people understand; I was trying to point out that for technical usage, the terms don't quite mean the things you were assuming. And yes, the vaccines have not been proven to provide immunizing protection - which again, is diffe... (read more)
There was a lesswrong post about this a while back that I can't find right now, and I wrote a twitter thread on a related topic. I'm not involved with the reasoning behind the structure for GJP or Metaculus, so for both it's an outside perspective. However, I was recently told there is a significant amount of ongoing internal Metaculus discussion about the scoring rule, which, I think, isn't nearly as bad as it seemed. (But even if there is a better solution, changing the rule now would have really weird impacts on motivation of current users, which is cri... (read more)
Having a meetup on this seems interesting. Will PM people.
If the user is interested in getting into the top ranks, this strategy won't be anything like enough. And if not, but they want to maximize their score, the scoring system is still incentive compatible - they are better off reporting their true estimate on any given question. And for the worst (but still self-aware) predictors, this should be the metaculus prediction anyways - so they can still come away with a positive number of points, but not many. Anything much worse than that, yes, people could have negative overall scores - which, if they've predicted on a decent number of questions, is pretty strong evidence that they really suck at forecasting.
Not really. Overall usefulness is really about something like covariance with the overall prediction - are you contributing different ideas and models? That would be very hard to measure, while making the points incentive compatible is not nearly as hard to do.

And how well an individual predictor will do, based on historical evidence, is found by comparing their Brier score to the Metaculus prediction on the same set of questions. This is information users can see on their own page. But it's not a useful figure unless you're asking about relative performance, which, as an outsider interpreting predictions, you shouldn't care about - because you want the aggregated prediction.
I agree that actually offering money would require incentives to avoid, essentially, sybil attacks. But making sure people don't make "noise predictions" isn't a useful goal - those noise predictions don't really affect the overall Metaculus prediction much, since it weights past accuracy.
As someone who is involved in both Metaculus and the Good Judgement Project, I think it's worth noting that Zvi's criticism of Metaculus - that points are given just for participating, so that making the community average guess gets you points - applies to Good Judgement Inc's predictions by superforecasters in almost exactly the same way: the superforecasters are paid for a combination of participation and performance, so that guessing the forecast median earns them money. (GJI does have a payment system for superforecasters which is more complex than this, and which I probably am not allowed to talk about - but the central point remains true.)