Context: My experience is primarily with psychology papers (heuristics & biases, social psych, and similar areas), and it seems to generalize pretty well to other social science research and fields with similar sorts of methods.
One way to think about this is to break it into three main questions:
1. Is this "result" just noise? Or would it replicate?
2. (If there's something besides noise) Is there anything interesting going on here? Or are all the "effects" just confounds, statistical artifacts, demonstrating the obvious, etc.?
3. (If there is something interesting going on here) What is going on here? What's the main takeaway? What can we learn from this? Does it support the claim that some people are tempted to use it to support?
There is some benefit just to explicitly considering all three questions, and keeping them separate.
For #1 ("Is this just noise?") people apparently do a pretty good job of predicting which studies will replicate. Relevant factors include:
1a. How strong is the empirical result (tiny p-value, large sample size, precise estimate of effect size, etc.)?
1b. How plausible is this effect on priors? Including: How big an effect size would you expect on priors? And: How definitively does the researchers' theory predict this particular empirical result?
1c. Experimenter degrees of freedom / garden of forking paths / possibility of p-hacking. Preregistration is best; visible signs of p-hacking are worst.
1d. How filtered is this evidence? How much publication bias?
1e. How much do I trust the researchers about things like (c) and (d)?
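To make the interaction between (a) and (b) concrete, here is a minimal sketch of the standard Bayes' rule calculation for how likely a "significant" result is to reflect a real effect. The function name and all the input numbers are hypothetical, chosen only for illustration, and the calculation ignores (c) and (d), which in practice push the effective false positive rate above the nominal alpha.

```python
def prob_effect_is_real(prior, power, alpha=0.05):
    """P(effect is real | significant result), by Bayes' rule.

    prior: probability the effect is real before seeing the study
    power: probability of a significant result if the effect is real
    alpha: probability of a significant result if there is no effect
    """
    true_pos = prior * power
    false_pos = (1 - prior) * alpha
    return true_pos / (true_pos + false_pos)


# Hypothetical inputs, for illustration only.
plausible = prob_effect_is_real(prior=0.5, power=0.8)    # plausible effect, well-powered study
surprising = prob_effect_is_real(prior=0.05, power=0.3)  # surprising effect, small study

print(round(plausible, 2), round(surprising, 2))
```

Under these assumptions, the same "p < .05" moves a plausible, well-powered result to a high probability of being real, while a surprising result from an underpowered study stays more likely noise than not.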
This post on how to think about whether a replication study "failed" has also helped clarify my thinking about whether a study is likely to replicate.
If there are many studies of essentially the same phenomenon, then try to find the methodologically strongest few and focus mainly on those. (Rather than picking one study at random and dismissing the whole area of research if that study is bad, or assuming that just because there are lots of studies they must add up to solid evidence.)
If you care about effect size, it's also worth keeping in mind that the things which turn noise into "statistically significant results" also tend to inflate effect sizes.
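A small simulation can illustrate that inflation. This is a sketch under simplifying assumptions that are mine, not from any particular study: a true effect of 0.2 SD, 20 participants per group, a known SD of 1 (so a z-test applies), and "publication" only when the result is significant in the predicted direction.

```python
import math
import random
import statistics

random.seed(0)

TRUE_EFFECT = 0.2   # hypothetical true difference, in SD units
N_PER_GROUP = 20    # hypothetical small-sample study
N_STUDIES = 5000    # number of simulated studies

# Standard error of a difference in means when each group's SD is known to be 1.
se = math.sqrt(2 / N_PER_GROUP)

all_estimates = []
significant = []
for _ in range(N_STUDIES):
    control = [random.gauss(0, 1) for _ in range(N_PER_GROUP)]
    treated = [random.gauss(TRUE_EFFECT, 1) for _ in range(N_PER_GROUP)]
    diff = statistics.fmean(treated) - statistics.fmean(control)
    all_estimates.append(diff)
    if diff / se > 1.96:  # "significant" in the predicted direction
        significant.append(diff)

print("mean estimate, all studies:     ", round(statistics.fmean(all_estimates), 2))
print("mean estimate, significant only:", round(statistics.fmean(significant), 2))
```

Averaged over all simulated studies the estimate sits near the true 0.2, but every study that clears the significance bar has an estimate above 1.96 standard errors (about 0.62 here), so the "published" average is at least three times the true effect.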
For #2 ("Is there anything interesting going on here?"), understanding methodology & statistics is pretty central. Partly that's background knowledge & expertise that you keep building up over the years, and partly that's taking the time & effort to sort out what's going on in this study (if you care about this study and can't sort it out quickly). Sometimes you can find other writings that comment on the methodology of this study, which can help a lot. You can try googling for criticisms of this particular study or line of research (or check Google Scholar for papers that have cited it), or google for criticisms of specific methods they used. It is often easier to recognize when someone makes a good argument than to come up with that argument yourself.
One framing that helps me think about a study's methodology (and whether or not there's anything interesting going on here) is to try to flesh out "null hypothesis world": in the world where nothing interesting is going on, what would I expect to see come out of this experimental process? Sometimes I'll come up with more than one world that feels like a null hypothesis world. Exercise: try that with this study (Egan, Santos, Bloom 2007). Another exercise: Try that with the hot hand effect.
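As one way into the hot hand exercise, here is a sketch of a candidate null hypothesis world: every shot is an independent 50/50 coin flip, and for each player you compute the hit rate on shots that immediately follow a hit. The 4-shot sequence length, the 50% hit rate, and the function name are my assumptions for illustration. The code enumerates every equally likely sequence exactly rather than sampling.

```python
from fractions import Fraction
from itertools import product


def mean_hit_rate_after_hit(n_shots):
    """Average, over all equally likely hit/miss sequences, of the
    within-sequence fraction of hits among shots that follow a hit."""
    rates = []
    for seq in product([0, 1], repeat=n_shots):
        after_hit = [seq[i + 1] for i in range(n_shots - 1) if seq[i] == 1]
        if after_hit:  # skip sequences with no shot following a hit
            rates.append(Fraction(sum(after_hit), len(after_hit)))
    return sum(rates, Fraction(0)) / len(rates)


print(mean_hit_rate_after_hit(4))  # prints 17/42
```

In this null world there is no streakiness at all, yet the average per-sequence hit rate after a hit is 17/42 (about 40%), not 50%. So an analysis that compares this statistic to 50% would be biased against finding a hot hand.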
#3 ("What is going on here?") is the biggest/broadest question of the three. It's the one that I spend the most time on (at least if the study is any good), and it's the one that I could most easily write a whole bunch about (making lots of points and elaborating on them). But it's also the one that is the most distant from Eli's original question, and I don't want to turn this post into a big huge essay, so I'll just highlight a few things here.
A big part of the challenge is thinking for yourself about what's going on and not being too anchored on how things are described by the authors (or the press release or the person who told you about the study). Some moves here:
3a. Imagine (using your inner sim) being a participant in the study, such that you can picture what each part of the study was like. In particular, be sure that you understand every experimental manipulation and measurement in concrete terms (okay, so then they filled out this questionnaire which asked if you agree with statements like such-and-such and blah-blah-blah).
3b. Be sure you can clearly state the pattern of results of the main finding, in a concrete way which is not laden with the authors' theory (e.g. not "this group was depleted" but "this group gave up on the puzzles sooner"). You need this plus 3a to understand what happened in the study; from there you're trying to draw inferences about what the study implies.
3c. Come up with (one or several) possible models/theories about what could be happening in this study. Especially look for ones that seem commonsensical / that are based in how you'd inner sim yourself or other people in the experimental scenario. It's fine if you have a model that doesn't make a crisp prediction, or if you have a theory that seems a lot like the authors' theory (but without their jargon). Exercise: try that with a typical willpower depletion study.
3d. Have in mind the key takeaway of the study (e.g., the one sentence summary that you would tell a friend; this is the thing that's the main reason why you're interested in reading the study). Poke at that sentence to see if you understand what each piece of it means. As you're looking at the study, see if that key takeaway actually holds up. e.g., Does the main pattern of results match this takeaway or do they not quite match up? Does the study distinguish the various models that you've come up with well enough to strongly support this main takeaway? Can you edit the takeaway claim to make it more precise / to more clearly reflect what happened in the study / to make the specifics of the study unsurprising to someone who heard the takeaway? What sort of research would it take to provide really strong support for that takeaway, and how does the study at hand compare to that?
3e. Look for concrete points of reference outside of this study which resemble the sort of thing the researchers are talking about. Search in particular for ones that seem out-of-sync with this study. e.g., This study says not to tell other people your goals, but the other day I told Alex about something I wanted to do and that seemed useful; do the specifics of this experiment change my sense of whether that conversation with Alex was a good idea?
Some narrower points which don't neatly fit into my 3-category breakdown:
A. If you care about effect sizes then consider doing a Fermi estimate, or otherwise translating the effect size into numbers that are intuitively meaningful to you. Also think about the range of possible effect sizes rather than just the point estimate, and remember that the issues with noise in #1 also inflate effect size.
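One way to do that translation, sketched here under the assumption that both groups are roughly normal with equal variance: convert a standardized effect size (Cohen's d) into the probability that a randomly chosen person from the treatment group scores higher than a randomly chosen person from the control group. The interval endpoints below are hypothetical, and the function names are mine.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def prob_superiority(d):
    """P(random treated score > random control score) for Cohen's d,
    assuming normal distributions with equal variance."""
    return normal_cdf(d / math.sqrt(2))


# Hypothetical point estimate and 95% interval for d.
d_low, d_point, d_high = 0.05, 0.40, 0.75

for label, d in [("low", d_low), ("point", d_point), ("high", d_high)]:
    print(f"{label:>5}: d = {d:.2f} -> P(treated > control) = {prob_superiority(d):.2f}")
```

Running the whole interval through the conversion, not just the point estimate, shows how different the low end ("barely better than a coin flip") and the high end can feel once they are in intuitive units.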
B. If the paper finds a null effect and claims that it's meaningful (e.g., that the intervention didn't help) then you do care about effect sizes. (e.g., If it claims the intervention failed because it had no effect on mortality rates, then you might assume a value of $10M per life and try to calculate a 95% confidence interval on the value of the intervention based solely on its effect on mortality.)
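A sketch of that mortality calculation, using made-up trial counts (5,000 people per arm, 100 deaths in control, 90 in treatment) and a simple normal-approximation interval for the difference in proportions. Only the $10M per life figure comes from the example above; everything else is an assumption for illustration.

```python
import math

VALUE_PER_LIFE = 10_000_000  # dollars, the figure assumed in the text

# Hypothetical trial counts, made up for illustration.
control_n, control_deaths = 5000, 100
treated_n, treated_deaths = 5000, 90

p_control = control_deaths / control_n
p_treated = treated_deaths / treated_n

# Lives saved per person treated (positive means the intervention helps).
risk_reduction = p_control - p_treated

# Normal-approximation (Wald) standard error for a difference in proportions.
se = math.sqrt(
    p_control * (1 - p_control) / control_n
    + p_treated * (1 - p_treated) / treated_n
)

low = (risk_reduction - 1.96 * se) * VALUE_PER_LIFE
point = risk_reduction * VALUE_PER_LIFE
high = (risk_reduction + 1.96 * se) * VALUE_PER_LIFE

print(f"value per person treated: ${point:,.0f} (95% CI ${low:,.0f} to ${high:,.0f})")
```

With these made-up numbers the difference in mortality is not statistically significant, but the interval runs from a loss of roughly $34k per person to a gain of roughly $74k per person. A null result like that says little about whether the intervention is worthwhile unless its cost per person is well outside that range.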
C. New papers that claim to debunk an old finding are often right when they claim that the old finding has issues with #1 (it didn't replicate) or #2 (it had methodological flaws) but are rarely actually debunkings if they claim that the old finding has issues with #3 (it misdescribes what's really going on). The new study on #3 might be important and cause you to change your thinking in some ways, but it's generally an incremental update rather than a debunking. Examples that look to me like successful debunkings: behavioral social priming research (#1), the Dennis-dentist effect (#2), the hot hand fallacy (#2 and some of B), the Stanford Prison Experiment (closest to #2), various other things that didn't replicate (#1). Examples of alleged "debunkings" which seem like interesting but overhyped incremental research: the bystander effect (#3), loss aversion (this study) (#3), the endowment effect (#3).
Here's an answer for condensed matter physics:
Step 1: Read the title, journal name, author list, and affiliations.
By reading papers in a field, talking to people in the field, and generally keeping track of the field as a social enterprise, you should be able to place papers in a context even before reading them. People absolutely have reputations, and that should inform your priors. You should also have an understanding of what the typical research methods are to answer a certain question - check either the title or the abstract to make sure that the methods used match the problem.
Actually, you know what?
Step 0: Spend years reading papers and keeping track of people to develop an understanding of trust and reputation as various results either pan out or don't. Read a few textbooks to understand the physical basis of the commonly-used experimental and theoretical techniques, then check that understanding by reading more papers and keeping track of what kind of data quality is the standard in the field, how techniques are best applied, and which techniques and methods of analysis provide the most reliable results.
For example, by combining steps 0 and 1, you can understand that certain experimental techniques might be more difficult and easier to fool yourself with, but might be the best method available for answering some specific question. If you see a paper applying this technique to this sort of question, this actually should increase your confidence in the paper relative to the base rate for this technique, because it shows that the authors are exercising good judgment. Next...
Step 2: Read the abstract and look at the figures.
This is good for understanding the paper too, not just evaluating trustworthiness. Look for data quality (remember that you learned how to judge the data quality of the most common techniques in step 0) and whether they've presented it in a way that clearly backs up the core claims of the abstract, or presents the information you're trying to learn from the paper. Data that is merely suggestive of the authors' claims is actually a red flag, because remember, everyone just presents the nicest figure they can. Responsible scientists reduce their claims when the evidence is weak.
Step 3: Read the paper.
If you have specific parts you know you care about, you can usually just read those in detail and skim the rest. But if you really care about assessing this particular paper, check the procedures and compare them to your knowledge of how this sort of work should go. If there are specific parts that you want to check yourself, and you can do so, do so. This is also useful so you can...
Step 4: Compare it to similar papers.
You should have background knowledge, but it's also useful to keep similar papers (both in terms of what methods they used, and what problem they studied) directly on hand if you want to check something. If you know a paper that did a similar thing, use that to check their methods. Find some papers on the same problem and cross-check how they present the details of the problem and the plausibility of various answers, to get a feel for the consensus. Speaking of consensus, if there are two similar papers from way in the past that you found via Google Scholar and one of them has 10x the citations of the other, take that into account. When you notice confusing statements, you can check those similar papers to see how they handled it. But once you're really getting into the details, you'll have to...
Step 5: Follow up citations for things you don't understand or want to check.
If someone is using a confusing method or explanation, there should be a nearby citation. If not, that's a red flag. Find the citation and check whether it supports the claim in the original paper (recursing if necessary). Accept that this will require lots of work and thinking, but hey, at least this feeds back into step 0 so you don't have to do it as much next time.
Step 6: Ask a friend.
There are smart people out there. Hopefully you know some, so that if something seems surprising and difficult to understand, you can ask them what they think about it.
This seems great for figuring out the consensus in a field, but not for identifying when the consensus is wrong.