Founder of www.rationally.io
Already many good answers, but I want to reinforce some and add others.
1. Beware of multiplicity - does the experiment include a large number of hypotheses, explicitly or implicitly? Implicit hypotheses include "Does the intervention have an effect on subjects with attributes A, B or C?" (subgroups) and "Does the intervention have an effect that is shown by measuring X, Y or Z?" (multiple endpoints). If multiple hypotheses were tested, were the results for each diligently reported? Note that multiplicity can be sneaky and you're often looking for what was left unsaid, such as a lack of plausible mechanism for the reported effect.
For example, take the experimental result "Male subjects who regularly consume Vitamin B in a non-multi-vitamin form have a greater risk of developing lung cancer (irrespective of dose)." Did they *intentionally* hypothesize that vitamin B would increase the likelihood cancer, but only if 1) it was not consumed as part of a multi vitamin and 2) in a manner that was not dose-dependent? Unlikely! The real conclusion of this study should have been "Vitamin B consumption does not appear correlated to lung cancer risk. Some specific subgroups did appear to have a heightened risk, but this may be statistical anomaly."
2. Beware of small effect sizes and look for clinical significance - does the reported effect sound like something that matters? Consider the endpoint (e.g. change in symptoms of depression, as measured by the Hamilton Depression Rating Scale) and the effect size (e.g. d = 0.3, which is generally interpreted as a small effect). As a depressive person, I don't really care about a drug that has a small effect size.* I don't care if the effect is real but small or not real at all, because I'm not going to bother with that intervention. The "should I care" question cuts through a lot of the bullshit, binary thinking and the difficulty in interpreting small effect sizes (given their noisiness).
3. Beware of large effect sizes - lots of underpowered studies + publication bias = lots of inflated effect sizes reported. Andrew Gelman's "Type M" (magnitude) errors are a good way to look at this - an estimate of the how inflated the effect size is likely to be. However, this isn't too helpful unless you're ready to bust out R when reading research. Alternately, a good rule of thumb is to be skeptical of 1) large effect sizes reported from small N studies and 2) confidence intervals wide enough to drive a trunk through.
4. Beware of low prior odds - is this finding in a highly exploratory field of research, and itself rather extraordinary? IMO this is an under-considered conclusion of Ioannidis' famous "Why Most Published Research Findings are False" paper. This Shinyapp nicely illustrates "positive predictive value" (PPV), which takes into account bias & prior odds.
5. Consider study design - obviously look for placebo control, randomization, blinding etc. But also look for repeated measures designs, e.g. "crossover" designs. Crossover designs achieve far higher power with fewer participants. If you're eyeballing study power, keep this in mind.
6. Avoid inconsistent skepticism - for one, don't be too skeptical of research just because of its funding source. All researchers are biased. It's small potatoes $$-wise compared to a Pfizer, but postdoc Bob's career/identity is on the line if he doesn't publish. Pfizer may have $3 billion on the line for their Phase III clinical trial, but if Bob can't make a name for himself, he's lost a decade of his life and his career prospects. Then take Professor Susan who built her career on Effect X being real - what were those last 30 years for, if Effect X was just anomaly?
Instead, look at 1) the quality of the study design, 2) the quality and transparency of the reporting (including COI disclosures, preregistrations, the detail and organization in said preregistrations, etc).
7. Learn to love meta-analysis - Where possible, look at meta-analyses rather than individual studies. But beware: meta-analyses can suffer their own design flaws, leading to some people saying "lies, damn lies and meta-analysis." Cochrane is the gold standard. If they have a meta-analysis for the question at hand, you're in luck. Also, check out the GRADE criteria - a pragmatic framework for evaluating the quality of research used by Cochrane and others.
*unless there is high heterogeneity in the effect amongst a subgroup with whom I share attributes, which is why subgrouping is both hazardous and yet still important.