This seems correct to me, but I don't immediately see the importance/relevancy of this? At any rate the escape is speculative at this point.
There have also been a dozen or so instances when new variants dominated some country that subsequently fizzled out
I completely failed to notice this, whoops. Do you have some more information on this?
The Dutch festival actually was a 2-day event with a total capacity of 10,000 people per day. But it is reasonable to assume that some amount of people attend the first and then the second day, so the total number of participants is lower than 20,000 and correspondingly the rate of infection is unknown but somewhere between 5% and 10%.
Just wanted to confirm you have accurately described my thoughts, and I feel I have a better understanding of your position as well now.
I agree with your reading of my points 1,2,4 and 5 but think we are not seeing eye to eye on points 3 and 6. It also saddens me that you condensed the paragraph on how I would like to view the how-much-should-we-trust-science landscape to its least important sentence (point 4), at least from my point of view.
As for point 3, I do not want to make a general point about the reliability of science at all. I want to discuss what tools we have to evaluate the accuracy of any particular paper or claim, so that we can have more appropriate confidence across the board. I think this is the most important discussion regardless of whether it increases or decreases general confidence. In my opinion, attempting to give a 0th-order summary by discussing the average change in confidence from this approach is doing more harm than good. The sentence "You just want to make the general point that you can't trust everything you read, with the background understanding that sometimes this is more important, and sometimes less." is exactly backwards from what I am trying to say.
For point 6, I think it might be very relevant to point out that I'm European, and the anti-vax and global warming denialism really is not that popular around where I live. They are more considered stereotypes of being untrustworthy than properly held beliefs, thankfully. But ignoring that, I think that most of the people influencing social policy and making important decisions are leaning heavily on science, and unfortunately particularly on the types of science I have the lowest confidence in. I was hoping to avoid going into great detail on this, but as short summary I think it is reasonable to be less concerned with the accuracy of papers that have low (societal) impact and more concerned with papers that have high impact. If you randomly sample a published paper on Google Scholar or whatever I'll happily agree that you are likely to find an accurate piece of research. But this is not an accurate representation of how people encounter scientific studies in reality. I see people break the fourth virtue all the way from coffeehouse discussions to national policy debates, which is so effective precisely because the link between data and conclusion is murky. So a lot of policy proposals can be backed by some amount of references. Over the past few years my attempts to be more even have led me to strongly decrease my confidence in a large number of scientific studies, if only to account for the selection effect that these, and not others, were brought to my attention.
Also I think psychology and nutrition are doing a lot better than they were a decade or two ago, which I consider a great sign. But that's more of an aside than a real point.
I've upvoted you for the clear presentation. Most of the points you state are beliefs I held several years ago, and sounded perfectly reasonable to me. However, over time the track record of this view worsened and worsened, to the point where I now disagree not so much on the object level as with the assumption that this view is valuable to have. I hope you'll bear with me as I try to give explaining this a shot.
I think the first, major point of disagreement is that the target audience of a paper like this is the "level 1" readers. To me it seems like the target audience consists of scientists and science fans, most of whom already have a lot of faith in the accuracy of the scientific process. It is completely true that showing this piece to someone who has managed to work their way into an unreasonable belief can make it harder to escape that particular trap, but unfortunately that doesn't make it wrong. That's the valley of bad rationality and all that. In fact, I think that strongly supports my main original claim - there are so many ways of using sophisticated arguments to get to a wrong conclusion, and only one way to accurately tally up the evidence, that it takes skill and dedication to get to the right answer consistently.
I'm sorry to hear about your friend, and by all means try to keep them away from posts like this. If I understand correctly, you are roughly saying "Science is difficult and not always accurate, but posts like this overshoot on the skepticism. There is some value in trusting published peer-reviewed science over the alternatives, and this view is heavily underrepresented in this community. We need to acknowledge this to dodge the most critical of errors, and only then look for more nuanced views on when to place exactly how much faith in the statements researchers make." I hope I'm not misrepresenting your view here, this is a statement I used to believe sincerely. And I still think that science has great value, and published research is the most accurate source of information out there. But I no longer believe that this "level 2 view", extrapolating (always dangerous :P) from your naming scheme, is a productive viewpoint. I think the nuance that I would like to introduce is absolutely essential, and that conflating different fields of research or even research questions within a field under this umbrella does more harm than good. In other words, I would like to discuss the accuracy of modern science with the understanding that this may apply to smaller or larger degree to any particular paper, exactly proportional to the hypothetical universe-separating ability of the data I introduced earlier. I'm not sure if I should spell that out in great detail every couple of sentences to communicate that I am not blanket arguing against science, but rather comparing science-as-practiced with truthfinding-in-theory and looking for similarities and differences on a paper-by-paper basis.
Most critically, I think the image of 'overshooting' or 'undershooting' trust in papers in particular or science in general is damaging to the discussion. Evaluating the accuracy of inferences is a multi-faceted problem. In some sense, I feel like you are pointing out that if we are walking in a how-much-should-I-trust-science landscape, to a lot of people the message "it's really not all it's cracked up to be" would be moving further away from the ideal point. And I agree. But simultaneously, I do not know of a way to get close (not "help the average person get a bit closer", but get really close) to the ideal point without diving into this nuance. I would really like to discuss in detail what methods we have for evaluating the hard work of scientists to the best of our ability. And if some of that, taken out of context, forms an argument in the arsenal of people determined to metaphorically shoot their own foot off that is a tragedy but I would still like to have the discussion.
As an example, in your quote block I love the first paragraph but think the other 4 are somewhere between irrelevant and misleading. Yes, this discussion will not be a panacea to the replication crisis, and yes, without prior experience comparing crackpots to good sources you may well go astray on many issues. Despite all that, I would still really like to discuss how to evaluate modern science. And personally I believe that we are collectively giving it more credit than it deserves, which is spread in complicated ways between individual claims, research topics and entire fields of science.
That is very interesting, mostly because I do exactly think that people are putting too much faith in textbook science. I'm also a little bit uncomfortable with the suggested classification.
I have high confidence in claims that I think are at low risk of being falsified soon, not because it is settled science but because this sentence is a tautology. The causality runs the other way: if our confidence in the claim is high, we provisionally accept it as knowledge.
By contrast, I am worried about the social process of claims moving from unsettled to settled science. In my personal opinion there is an abundance of overconfidence in what we would call "settled science". The majority of the claims therein are likely to be correct and hold up under scrutiny, but the bar is still lower than I would prefer.
But maybe I'm way off the mark here, or maybe we are splitting hairs and describing the same situation from a different angle. There is lots of good science out there, and you need overwhelming evidence to justify questioning a standard textbook. But there is also plenty of junk that makes it all the way into lecture halls, never mind all the previous hoops it had to pass through to get there. I am very worried about the statistical power of our scientific institutes in separating truth from fiction, and I don't think the settled/unsettled distinction helps address this.
It seems to me that we should be really careful before extrapolating from the specific datasets, methods, and subfields these researchers are investigating into others. In particular, I'd like to see some care put into forecasting and selecting research topics that are likely or unlikely to stand up to a multiteam analysis.
I think this is good advice, but only when taken literally. In my opinion there is more than sufficient evidence to suggest that the choices made by researchers (pick any of the descriptions you cited) have a significant impact on the conclusions of papers across a wide variety of fields. Indeed, I think this should be the default assumption until proven otherwise. I'd motivate this primarily by the argument that there are many different ways to draw a wrong conclusion (especially under uncertainty), but only one right way to weigh up all the evidence. Put differently, I think undue influence of arbitrary decisions is the default, and it is only through hard work and collective scientific standards that we stand a chance of avoiding this.
I've seen calls to improve all the things that are broken right now: <list>
I think this is a flaw in and of itself. There are many, many ways to go wrong, and the entire standard list (p-hacking, selective reporting, multiple stopping criteria, you name it) should be interpreted more as symptoms than as causes of a scientific crisis.
The crux of the whole scientific approach is that you empirically separate hypothetical universes. You do this by making your universe-hypotheses spit out predictions, and then verify them. It seems to me that by and large this process is ignored or even completely absent when we start asking difficult soft science questions. And to clarify: I don't particularly blame any researcher, or institute, or publishing agency or peer doing some reviewing. I think that the task at hand is so inhumanly difficult that collectively we are not up to it, and instead we create some semblance of science and call it a day.
From a distanced perspective, I would like my entire scientific process to look like reverse-engineering a big black box labeled 'universe'. It has input buttons and output channels. Our paradigm postulate correlations between input settings and outputs, and then an individual hypothesis makes a claim about the input settings. We track forward what outputs would be caused by any possible input setting, observe the reality, and update with Bayesian odds ratios.
The problem is frequently that the data we are relying on is influenced by an absolutely gargantuan number of factors - as an example in the OP, the teenage pregnancy rate. I have no trouble believing that statewide schooling laws have some impact on this, but possibly so do for example above-average summer weather, people's religious background, the ratio of boys to girls in a community, economic (in)stability, recent natural disasters and many more factors. So having observed the teenage pregnancy rates, inferring the impact of the statewide schooling laws is a nigh impossible task. Even just trying to put this into words my mind immediately translated this to "what fraction of the state-by-state variance in teenage pregnancy rates can be attributed to this factor, and what fraction to other factors" but even this is already an oversimplification - why are we comparing states at a fixed time, instead of tracking states over time, or even taking each state-time snapshot as an individual dataset? And why is a linear correlation model accurate, who says we can split the multi-factor model into additive components (implied by the fractions)?
The point I am failing to make is that in this case it is not at all clear what difference in the pregnancy rates we would observe if the statewide schooling laws had a decidedly negative, small negative, small positive or decidedly positive impact, as opposed to one or several of the other factors dominating the observed effects. And without that causal connection we can never infer the impact of these laws from the observed data. This is not a matter of p-hacking or biased science or anything of the sort - the approach doesn't have the (information theoretic) power to discern the answer we are looking for in the first place, i.e. to single out the true hypothesis from between the false ones.
As for your pragmatic question, how can we tell if a study is to be trusted? I'd recommend asking experts in your field first, and only listening to cynics second. If you insist on asking, my method is to evaluate whether or not it seems plausible to me that, assuming that the conclusion of the paper holds, this would show up as the announced effect observed in the paper. Simultaneously I try to think of several other explanations for the same data. If either of these tries gives some resounding result I tend to chuck the study in the bin. This approach is fraught with confirmation bias ("it seems implausible to me because my view of the world suggests you shouldn't be able to measure an effect like this"), but I don't have a better model of the world to consult than my model of the world.
Thank you for the wonderful links, I had no idea that (meta)research like this was being conducted. Of course it doesn't do to draw conclusion from just one or two papers like that, we would need a bunch more to be sure that we really need a bunch more before we can accept the conclusion.
Jokes aside, I think there is a big unwarranted leap in the final part of your post. You correctly state that just because the outcome of research seems to not replicate we should not assume evil intent (subconscious or no) on the part of the authors. I agree, but also frankly I don't care. The version of Science Nihilism you present almost seems like strawman Nihilism: "Science does not replicate therefore everything a Scientist says is just their own bias". I think a far more interesting statement would be "The fact that multiple well-meaning scientists get diametrically opposed results using the same data and techniques, which are well-accepted in the field, shows that the current standards in Science are insufficient to draw the type of conclusions we want."
Or, from a more information-theoretic point of view, our process of honest effort by scientists followed by peer review and publication is not a sufficiently sharp tool to assign numbers to the questions we're asking, and a large part of the variance in the published results is indicative of numerous small choices by the researchers instead of indicative of patterns in the data. Whether or not scientists are evil shills with social agendas (hint: they're mostly not) is somewhat irrelevant if the methods used won't separate truth from fiction. To me that's proper Science Nihilism, none of this 'intent' or 'bias' stuff.
In a similar vein I wonder if the page count of the robustness check is really an indication of a solution to this problem. The alternative seems bleak (well, you did call it Nihilism) but maybe we should allow for the possibility that the entire scientific process as commonly practiced is insufficiently powerful to answer these research questions (for example, maybe the questions are ill-posed). To put it differently, to answer a research question we need to separate the hypothetical universes where it has one answer from the hypothetical universes where it has a different answer, and then observe data to decide which universe we happen to be in. In many papers this link between the separation and the data to be observed is so strenuous that I would be surprised if the outcome was determined by anything but the arbitrary choices of the researchers.