PSA: Please list your references, don't just link them

In what became the 5th most-read new post on LessWrong in 2012, Morendil told us about a study widely cited in its field... except that the source cited, which isn't online and is really difficult to get, makes a different claim, and turns out to not even be the original research, but a PowerPoint presentation given ten years after the original study was published!

Fortunately, the original study turns out to be freely available online, for all to read; Morendil's post has a link. The post also tells us the author and the year of publication. But that's all: Morendil didn't provide a list of references; he showed how the presentation is usually cited, but didn't give a full citation for the original study.

The link is broken now. The Wayback Machine doesn't have a copy. The address doesn't give hints about the study's title. I haven't been able to find anything on Google Scholar with author, year, and likely keywords.

I rest my case.


I only take citations as weak evidence until I've reviewed them. Too many people dabbling in scientism these days with the internet making it easy to link to a few articles whose abstracts support your point. Oh look another nutrition article based on rat studies and elderly stroke victims. Fun.

You don't think an article's abstract is significant Bayesian evidence? (How about the abstract of a meta-analysis?) Which is the weaker link here: from blog post to abstract or from abstract to actual paper?

Too many people dabbling in scientism these days with the internet making it easy to link to a few articles whose abstracts support your point.

Can't have those unwashed masses linking to scientific papers now can we? :)

Abstracts of meta-analyses are significantly better. The problem with normal papers is that the abstract doesn't always specify the methodology, effect size, and clinical relevance.

Plenty of people complain about my long lists of references in earlier posts. :(

Maybe the best is to put the references in a comment that is linked from the end of the post.

My current solution, besides paranoid archiving for Internet links, is to exploit tool tips: so the full title & authors exist in the page, but are entirely unobtrusive to readers of either the article or comments. It seems to be working out so far.

I couldn't find this quickly: What's the Markdown and HTML for adding tooltips?

In Markdown, it goes like [displayed text](hyperlink "tooltip alt text"); in HTML it's an additional attribute on the <a> element (alongside href) which goes title="tooltip alt text", so for my example above:

  • Markdown:

    [paranoid archiving for Internet links](http://www.gwern.net/Archiving%20URLs "'Archiving URLs', gwern 2013")

  • HTML:

    <a href="http://www.gwern.net/Archiving%20URLs" rel="nofollow" title="&#39;Archiving URLs&#39;, gwern 2013">paranoid archiving for Internet links</a>
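For anyone who wants to generate these links rather than type them by hand, here is a minimal sketch in Python. The function names `markdown_link` and `html_link` are made up for illustration, and it assumes the citation contains no double quotes that would need smarter handling than a simple swap.

```python
from html import escape


def markdown_link(text, url, citation):
    """Build a Markdown inline link whose title (tooltip) carries the citation."""
    # Markdown delimits the title with double quotes, so swap any inside it.
    safe_citation = citation.replace('"', "'")
    return f'[{text}]({url} "{safe_citation}")'


def html_link(text, url, citation):
    """Build the equivalent HTML anchor, with the citation in the title attribute."""
    return (
        f'<a href="{escape(url, quote=True)}" '
        f'title="{escape(citation, quote=True)}">{escape(text)}</a>'
    )


if __name__ == "__main__":
    args = (
        "paranoid archiving for Internet links",
        "http://www.gwern.net/Archiving%20URLs",
        "'Archiving URLs', gwern 2013",
    )
    print(markdown_link(*args))
    print(html_link(*args))
```

Either way the full title and author stay in the page source, so a reader can still recover the reference if the link itself rots.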

I like this solution a whole lot. I'm late for work so I won't search for the specific occasion but I seem to recall I've suggested stealing it for LW.

They are when I'm reading on a desktop or a notebook, but they're a pain in the ass to scroll down beyond when I'm reading on a smaller device such as a smartphone.

Then don't do that.

Seriously: I strongly suggest that our articles should be optimised for checkability and in-depth reading, not convenience on the bus.

Actually, the Wayback Machine might have a copy, but even if it did, you couldn't get it: http://findarticles.com/robots.txt now specifies User-agent: * Disallow: / which is a big FU to the Internet Archive and also the Google cache.
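To see what that robots.txt rule means for a crawler, here is a rough sketch using Python's standard `urllib.robotparser`. The crawler name `ExampleArchiveBot` and the page path are invented for illustration; only the two rule lines come from the robots.txt quoted above.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The rules quoted above: every crawler, every path, disallowed.
    blocking_rules = "User-agent: *\nDisallow: /\n"
    # Hypothetical crawler name and page path, for illustration only.
    print(is_allowed(blocking_rules, "ExampleArchiveBot",
                     "http://findarticles.com/some/page"))
```

With `Disallow: /` under `User-agent: *`, any crawler that honours robots.txt is told to stay away from the whole site, which is why a well-behaved archive would not serve a copy.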

By the way, based solely on the information in the article, I was able to find the citation and the actual full original publication in under 2 minutes. Can you guess how?

Ah, thanks. (Here, page 57 and following; the article is "Dissecting software failures", published in the Hewlett-Packard journal, April 1989.) I did forget to try that, but it's rather a piece of luck that Morendil's article contains that.

So how did you find it this time? I'm always curious about this phenomenon where person A goes "I can't do it!", person B says "there is a solution", and person A then goes "ah!"

This incident actually looks a bit like the Shannon anecdote I quote in http://www.gwern.net/on-really-trying

BTW, if you hate Scribd as much as I do, once you know what issue of the HP journal it's in, you can easily find the official HP archives and download the PDF at http://www.hpl.hp.com/hpjournal/pdfs/IssuePDFs/1989-04.pdf (Scribd shows up as the main hit just because they did OCR on their copy of the PDF, but where did their uploader get it, one should wonder upon seeing it.)

I'm always curious about this phenomenon where person A goes "I can't do it!", person B says "there is a solution", and person A then goes "ah!"

Yes, I noticed this back when I was doing math competitions: it was often much easier for me to find a solution to a problem if someone told me that they had found a solution, especially if they had found it quickly. The obvious corollary is that you should first approach problems as if you knew someone who had found a solution quickly, but I never successfully internalized this.

My favorite variation of this was when one of our developers asked me to review a design she was contemplating for fixing a defect.

So she went through it in some detail, and I worked through some edge cases, and finally said "Yeah, this looks OK to me. You should go talk to Mark about the tax allocation bit over here, though, because he understands the tax code better than I do and he may notice stuff I won't. For example, he'd probably notice that this will fail in cases where thus-and-such is true.... um... which I, er, wouldn't notice."

And she looked at me a little confused, and I said "So, there's a problem with this design in cases where thus-and-such is true. We should modify the design" and we kept going as if that particular brain failure hadn't been narrated out loud.

My guess is I do this all the time, but I remember that incident because I was vocalizing my thoughts.

I have also been told to use this as a problem-solving technique (namely pretending you are a different person and seeing what they would notice), but I am not very good at this either. I tried to run a simulation of MoR!Quirrell in my head, but my head is not a sufficiently interesting place for him to be at the moment, so I think he left.

*chuckle*

I've done some playing around with this and have come to the tentative conclusion, backed up by no evidence, that the key thing isn't really pretending to be someone else, but rather relaxing the constraints that I keep around "me". That is, it's not so much creating a "what would Mark think?" simulation as it is temporarily purging my "what kinds of things does Dave not think?" filters.

Which is to say, it's basically a question of maximizing creativity.

So you think these sorts of incidents are just another form of rubber-ducking?

Mm... in a sufficiently broad sense, yes, but in detail, not really.

I would say that rubber-ducking (by which I assume you mean the exercise of explaining a complex technical concept, like the flow of control through code, to an inanimate object before submitting it to group review) is primarily a technique for attentional control; it forces me to actually think through a problem rather than simply telling myself that I have thought through the problem.

I think what goes on in these sorts of incidents is somewhat different, though related in many ways.

Basically, I think I've got a set of "the sorts of things Dave thinks" filters that run in my head, and there are some useful thoughts that my brain is capable of generating that tend to get excluded from my conscious awareness by those filters (because they "aren't the sort of thing Dave would think"), and sometimes it can be useful to subvert or reconfigure those filters.

And role-playing of this sort ("What would I say if I were Mark?") is one way to reconfigure those filters.

And role-playing of this sort ("What would I say if I were Mark?") is one way to reconfigure those filters.

So is that what is going on in this search and that Shannon example? But that seems a little weird, why would Benja have a 'gwern filter' in his head which says 'the article has a direct quote from G89, gwern would try searching a direct quote, so I should too'?

WRT the Shannon example... well, yes and no.

I suppose something similar is going on: Shannon has been invited to step out of the frame that he's in and step into a new one, where he is identifying with his brother, who knows something important about how to get to a solution to the puzzle from where Shannon is now, and that reframing helps encourage creativity. But also, and significantly, Shannon's brother has given him a new datum: there is a discrete thing-to-be-told which would significantly help. (This is, admittedly, implicit. But if I don't assume it, the story makes no sense to me.)

So no, I don't think it's the only thing going on, or necessarily the most important thing.

And I disagree with "you can always give it to yourself," actually. Or, rather, with the implicit statement that doing so is necessarily useful. For some puzzles his brother might have instead said "Huh. You probably want to rethink your whole approach." Which is also a hint I can always give myself, but it's a different hint that leads me in different directions.

There's probably a huge number of hints like that I can give myself for any given problem, but picking them at random is perhaps not the best problem solving strategy.

Still, if I'm stuck, trying a few is better than nothing.

WRT the Benja search... I suspect that was more of a case of trying harder by virtue of being motivated by the knowledge that success is possible/likely, and to some extent breaking out of transient mental sets.

But even if it were a case of temporarily reconfiguring more persistent unhelpful filters like I describe, it wouldn't follow that Benja has a "gwern filter", merely that Benja, like gwern, has some learned techniques for finding stuff on Google, which includes 'search for direct quotes' along with a million other things, and that the default Benja filter for whatever reason excludes that technique when it searches for techniques to suggest for this kind of problem, and the role-playing exercise encourages disabling the default Benja filter, making that technique easier to access. The "gwernyness" of that reconfiguration, much like the "markiness" in my example, is rather tangential; the importance of being gwern, in this hypothetical, would be that it entails not being Benja.

But also, and significantly, Shannon's brother has given him a new datum: there is a discrete thing-to-be-told which would significantly help. (This is, admittedly, implicit. But if I don't assume it, the story makes no sense to me.)

But he's solving a puzzle, there's always a thing-to-be-told!

(shrug) Indeed. More generally, he's a bounded agent, there's always a thing-to-be-told, which may or may not have anything to do with solving jigsaw puzzles.

For example, "there's a piece that fits with another piece somewhere in this puzzle box" is certainly a thing to be told, and is always true of non-pathological jigsaw puzzles. And "There are no sharks on Mars" is also a thing his brother could have told him.

But, yes, if the Shannons didn't have an implicit shared context that strongly suggested that there was a less generic thing-to-be-told in his brother's mind than those examples, then most of what I said about the Shannon example is simply false.

What you said out loud wasn't wrong. There are likely cases which are much like the one that you did find, except that you would not be able to find them.

True, though it's less clear that Mark would probably notice them.
Still, that's probably true as well.

The other question is whether it's helpful to quickly look for obvious answers when there isn't one. The information content of "there is a solution" is actually not only one bit (yes vs no), because the fact that that person told it to you means that they solved it quickly using techniques that they already know about. This usually helps you because you either share much of their knowledge, or have an idea of what things they are knowledgeable about. The correct advice in some other cases might have been "you need to learn something else completely new before you'll get it" or "just stop trying because this problem is really of no value and has no easy answer".

Gurer'f n qverpg dhbgr sebz gur negvpyr va Zberaqvy'f cbfg. (Rot'ed in case someone else feels like trying themselves.)

Nygreangviryl, frnepuvat sbe gur hey jvgubhg /ct_2/ lvryqf gur shyy pvgngvba.

Yes, that's how I did it too. V nffhzr V pbhyq nyfb unir tbar sebz gur erfrnepure'f fheanzr naq gur lrne gb gur negvpyr gvgyr naq sbhaq vg gung jnl nyfb, ohg V unira'g gevrq fvapr gur dhbgr zrgubq jbexrq vgf hfhny frnepu zntvp.
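For readers who haven't met rot13 before: it shifts each letter 13 places around the alphabet, so applying it twice gives back the original text. A small sketch using Python's built-in `rot_13` codec (the sample string is made up, so it spoils nothing in the comments above):

```python
import codecs


def rot13(text):
    """Shift each ASCII letter 13 places; everything else is left untouched."""
    return codecs.encode(text, "rot_13")


if __name__ == "__main__":
    hidden = rot13("Hello, world")
    print(hidden)          # the obscured form
    print(rot13(hidden))   # applying rot13 again restores the original
```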

Person B saying "there is a solution" provides person A with useful information.

Little details, such as the speed at which another person finds the solution (and the fact that they found it at all), give clues as to what type of problem it is - divergent or convergent thinking, overall hardness, etc.

The fact that a specific person x was able to find the solution narrows the space to "things that person x would be good at solving".

Finally, the resources which another person put into finding the solution provide a rough upper bound to how many resources the seeker will have to devote to find it for himself, reducing the risk involved in the investment.

All of these effects are social in nature, which means that it is not unlikely that we humans have in-built mechanisms to use this information without being able to consciously articulate what exactly the information we have gained is.

That someone found the solution cannot be relevant in cases where it's known that there is a solution, where this effect seems to still apply. I don't see how one could extract anything about divergent or convergent thinking, since you don't know how they solved it or usually how long they took; if you knew how long it took and you knew whether they tended towards convergent thinking, then you could infer whether you should focus harder on convergent or divergent thinking, but if you know neither...?

I think my explanation of my thoughts is lacking, let me give a specific example of what I mean.

Imagine a teacher with a penchant for pointless questions asks non-mathematics students the following question:

"What is 6+7+8+9+...+347"?

Most of the students in the classroom will begin dutifully adding the numbers up. Some of them won't even bother - they've estimated the time it will take and it isn't worth the effort to solve such uninteresting busywork.

Of course, someone will take about five seconds to shout out that they have an answer.

Now the other students know that there is a way to solve the problem that doesn't involve investing a large amount of time. They'll get out of "let's tediously add all the numbers" mode and go into "let's find a quick shortcut to solving this" mode.

Everyone knew a solution existed, but they didn't imagine it would be the quick, clever sort of solution until someone actually solved it quickly. The fact that someone found the answer without investing large amounts of time and resources into the problem gave them vital information about the best method for finding the answer.
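To make the two modes concrete, here is a short sketch of both approaches to the classroom sum. The function names are invented for illustration; the shortcut is the standard arithmetic-series formula (number of terms times the average of the first and last term), which is presumably what the five-second student used.

```python
def tedious_sum(first, last):
    """Add the numbers one at a time, as most of the class starts doing."""
    total = 0
    for n in range(first, last + 1):
        total += n
    return total


def shortcut_sum(first, last):
    """Arithmetic series: count of terms times the average of first and last."""
    count = last - first + 1
    # count * (first + last) is always even for consecutive integers,
    # so integer division is exact here.
    return count * (first + last) // 2


if __name__ == "__main__":
    print(tedious_sum(6, 347))
    print(shortcut_sum(6, 347))  # 342 terms * (6 + 347) / 2 = 60363
```

Both give the same answer; the difference is that the second takes a fixed handful of operations no matter how long the run of numbers is.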

One could also appeal to the story about Gauss as a child adding up 1..100 by a clever trick, and none of his classmates figuring it out despite clearly seeing that Gauss must've done something clever.

But notice how your example does not fit my points: "since you don't know how they solved it or usually how long they took"; in this case, you have a very good estimate of how long it will take them to use the O(n) summation algorithm from all your past sums, and since you were all assigned the problem at the same time, you also know precisely how long it took them.

In the Shannon anecdote, you know nothing about how long it took the brother to answer it nor, given how heterogeneous puzzles can be, how long it might take him to solve it, nor is there even any 'brute force' approach for most puzzles which you could compare against a 'clever' approach and so choose to look for a clever approach rather than spend more time executing the brute force approach.

Similarly for web searching, there's typically no brute force approach at all: if Google spits out a list of 10 hits total for the paper title and you look at all 10 and they fail, then what? What's the dumb brute force approach in searching? You simply have to try another 'clever' approach, because you've exhausted all your available data.

Sorry, you're right, I didn't read your previous post carefully enough.

I agree that if this phenomenon is real, in order to explain it in terms of a rational agent you do need to either know something about the person who solved it, or how long they took, or some other detail about them in order for this to be helpful in any way.

In the real world, however, a declaration of having solved the problem always leaves some sort of knowledge. In the web search case that just unfolded in this thread, by posting a solution you leaked the information that a solution existed and that it didn't take an unreasonable amount of time to figure out, which provided Benja additional incentive to start looking for a clever approach.

I'll agree that it does seem like there is more than simple information gain going on here though. Perhaps there are other factors, such as the insertion of an element of competition?

I'll agree that it does seem like there is more than simple information gain going on here though. Perhaps there are other factors, such as the insertion of an element of competition?

Certainly seems possible. I admit I tend to announce the time it took to find something that someone failed to as part of showing off and elevating myself, so it would be no surprise if the recipient felt shamed and inflamed into looking better - the difference between peak and average performance might explain the differential.

Thanks for the heads-up!

Yes to your overall point: link rot is a nasty problem; one that will increasingly mess with things like scientific citation.

Now for the nitpicks. G89 wasn't even the "original" study, just the earliest source I could find that discussed those "results".

What I wanted was to show the quote in question - to make it available to the reader of my post so they could check that I had my facts right. For that purpose the link is what I really needed, not "merely" a citation; and it sucks that the link went dead, but that wasn't under my control.

I have updated the post with another link (the last extant copy of this content; we can hope the link remains longer than the previous one, but I'm under no illusion that it will). I have also added the title of the original article and the publication.

BTW, I don't know what "PSA" means?

It is kind of superfluous in the title, I wish it was removed. Besides being Americentric it actually made me want to read a thoughtful suggestion less.

Point taken on "original", and thanks for updating the article! Gwern has also found a link on the HP homepage.

What I wanted was to show the quote in question - to make it available to the reader of my post so they could check that I had my facts right. For that purpose the link is what I really needed, not "merely" a citation; and it sucks that the link went dead, but that wasn't under my control.

I'm not saying you shouldn't have given the link -- I'm saying that if you had also given the citation, then even after the link broke, it would have been slightly more inconvenient but not difficult for me to look it up! That's the main point of also giving the citation: to make the source available to the reader of your post even if the link rots.