Here is a thing I wrote 10 years ago assessing an N-back study. It's an easy-for-me-to-remember example, and I also remember that the writeup comes pretty close to how I was actually thinking through the paper as I read it.
This maybe isn't quite the same as what you're asking for, but I recommend reading any past Slate Star Codex post with "Much more than you wanted to know" in its name. They differ in how technical they get, but they're generally about analyzing the literature in a field of research and trying to figure out which studies are more meaningful, what they mean, and how it all adds up.
You need to learn what to avoid: http://shape-of-code.coding-guidelines.com/2016/06/10/finding-the-gold-nugget-papers-in-software-engineering-research/
Andrew Gelman's blog has lots of what you are after: https://statmodeling.stat.columbia.edu/
I'm having trouble understanding how people assess scientific papers, as I'm a layman when it comes to reading them and to statistics in general.
My general (and probably too simplistic) heuristics are:
"Is the p value significant?"
"Is the sample size large or small?"
"Is the population studied generalizable?"
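One reason the first two heuristics have to be read together is that a p-value depends heavily on sample size: a tiny, practically meaningless effect will look "significant" if the study is large enough. Here's a minimal stdlib-only Python simulation of that (the numbers, function name, and the z-test approximation are my own illustration, not taken from any particular study):

```python
import math
import random

def two_sample_p(n, effect, seed=0):
    """Simulate two groups of size n, where group B's true mean is
    shifted by `effect` standard deviations. Returns a two-sided
    p-value from a z-test (a normal approximation, reasonable at
    these sample sizes)."""
    rng = random.Random(seed)
    a = [rng.gauss(0.0, 1.0) for _ in range(n)]
    b = [rng.gauss(effect, 1.0) for _ in range(n)]
    mean_a, mean_b = sum(a) / n, sum(b) / n
    var_a = sum((x - mean_a) ** 2 for x in a) / (n - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (n - 1)
    se = math.sqrt(var_a / n + var_b / n)  # standard error of the difference
    z = (mean_b - mean_a) / se
    # two-sided p-value under the normal approximation
    return math.erfc(abs(z) / math.sqrt(2))

# Same tiny true effect (0.1 SD), wildly different sample sizes:
for n in (20, 20000):
    print(n, two_sample_p(n, 0.1))
```

With the same small true effect, the small study will usually fail to reach significance while the huge study produces a vanishingly small p-value, so "p < 0.05" alone tells you little about whether the effect matters in practice.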
When I read the comments in the post I hyperlinked, it still doesn't quite click for me. I need more detailed examples: ideally someone taking a deep dive into one specific study and picking it apart, from the introduction through the methods and discussion to the conclusion, noting what the authors did right, what they did wrong, and so on.
Does anyone know of any I could reference? I'm specifically reading positive psychology papers if that helps.