See previously “A good volunteer is hard to find”

Back in February 2012, lukeprog announced that SIAI was hiring more part-time remote researchers, and you could apply just by demonstrating your chops on a simple test: review the psychology literature on habit formation with an eye towards practical application. What factors strengthen new habits? How long do they take to harden? And so on. I was assigned to read through and rate the submissions, so Luke could then look at them individually to decide whom to hire. We didn’t get as many submissions as we were hoping for, so in April Luke posted again, this time with a quicker, easier application form. (I don’t know how that has been working out.)

But in February, I remembered the linked post above from GiveWell, where they mentioned that many would-be volunteers did not even finish the test task. I did finish it, and I didn’t find it that bad - actually a kind of interesting exercise in critical thinking & being careful. People suggested that perhaps the attrition was due not to low volunteer quality, but to the feeling that the volunteers were not appreciated and were doing useless makework. (The same reason so many kids hate school…) But how to test this?

Simple! Tell people that their work was not useless and that even if they were not hired, their work would be used! And we could do Science by randomizing which people got the encouraging statement. The added paragraph looked like this:

The primary purpose of this project is to evaluate applicants on their ability to do the kind of work we need, but we’ll collate all the results into one good article on the subject, so even if we don’t hire you, you don’t have to feel your time was wasted.

Well, all the reviews have been read & graded as of yesterday; with submissions trickling in over months, I think everyone who was going to submit has done so, and it’s now time for the final step. So many people failed to send in any submission (only ~18 of ~40 did) that it’s relatively easy to analyze - there’s just not that much data!

So, the first question is: did people who got the extra paragraph do a better job of writing their review, as measured by my rating of each review on a 2-10 scale?

Surprisingly, they did seem to - despite my expectation that any result would be noise, as the sample is so small. If we code getting no paragraph as 0 and getting the paragraph as 1, sum each review’s two subscores into a 2-10 total, and strip out all personal info, you get this CSV. Load it up in R:

> mydata <- read.table("2012-feb-researcher-scores.csv", header=TRUE, sep=",")
> t.test(Good~Extra, data=mydata)

        Welch Two Sample t-test

data:  Good by Extra
t = -2.448, df = 14.911, p-value = 0.02723
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -4.028141 -0.277415
sample estimates:
mean in group 0 mean in group 1
       4.625000        6.777778
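For readers without R handy, Welch’s unequal-variances t statistic (what R’s t.test computes by default) can be replicated from scratch. A minimal Python sketch follows; the score lists are made-up stand-ins chosen only so the group means match the R output above - they are not the actual submission data:

```python
import math

def welch_t(a, b):
    """Welch's two-sample t statistic and degrees of freedom
    (unequal variances), matching R's t.test() default behavior."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)  # sample variance, group a
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)  # sample variance, group b
    se2 = va / na + vb / nb                        # squared standard error
    t = (ma - mb) / math.sqrt(se2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

# Hypothetical scores: 8 controls, 9 treated, picked so the group
# means equal the 4.625 and 6.778 reported above (variances are guesses).
no_para   = [3, 4, 5, 4, 6, 5, 5, 5]
with_para = [6, 7, 8, 6, 7, 7, 6, 7, 7]
t, df = welch_t(no_para, with_para)
```

Because the within-group variances here are invented, the resulting t and df will not match the R session exactly - the point is only the mechanics of the test.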

The result is not hugely robust: if you set the last score to 10 rather than 6, for example, the p-value rises to 0.16. The effect size looks interesting, though:

....
mean in group 0 mean in group 1
       5.125000        6.777778
> sd(mydata$Good, TRUE)
[1] 2.318405
> (6.7 - 5.125) / 2.32
[1] 0.6788793

A d of 0.67 isn’t bad.
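That effect size is just Cohen’s d - the difference in group means divided by the standard deviation - and the arithmetic can be double-checked in a couple of lines, using the same rounded figures from the R session:

```python
# Cohen's d: standardized mean difference, computed from the rounded
# group means and overall SD reported in the R session above.
mean_treatment = 6.7    # mean in group 1 (rounded, as in the post)
mean_control   = 5.125  # mean in group 0
sd_overall     = 2.32   # sd(mydata$Good), rounded

d = (mean_treatment - mean_control) / sd_overall  # ~0.68
```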

The next question to me is, did the paragraph influence whether people would send in a submission at all? Re-editing the CSV, we load it up and analyze again:

> mydata <- read.table("2012-feb-researcher-completion.csv", header=TRUE, sep=",")
> t.test(Received~Extra, data=mydata)

        Welch Two Sample t-test

data:  Received by Extra
t = 0.1445, df = 36.877, p-value = 0.8859
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.3085296  0.3558981
sample estimates:
mean in group 0 mean in group 1
      0.4736842      0.4500000

Nope. This null result is fairly robust, since we can use everyone who applied: I have to flip something like 6 values before the p-value goes down even to 0.07.
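Since completion is a yes/no outcome, a two-proportion z-test is the more conventional check than a t-test on 0/1 data, and it agrees. This sketch infers the counts from the group means in the R output (0.4736842 = 9/19 and 0.45 = 9/20 are the only counts consistent with ~40 applicants); the counts are reconstructed, not taken from the raw CSV:

```python
import math

def two_prop_z(x1, n1, x2, n2):
    """Pooled two-sample proportion z-test: an alternative to
    running t.test() on 0/1 completion data."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # pooled success rate under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Counts inferred from the reported means: 9 of 19 completed without
# the paragraph, 9 of 20 completed with it.
z = two_prop_z(9, 19, 9, 20)  # ~0.15, close to the t = 0.1445 above
```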

So, lessons learned? It’s probably a good idea to include such a paragraph since it’s so cheap and apparently isn’t at the expense of submissions in general.

Comments

It’s probably a good idea to include such a paragraph since it’s so cheap and apparently isn’t at the expense of submissions in general.

Well, include the paragraph if it's true.

I think I was one of the ones who "failed" to send in a sample. I was told I'd be contacted at some point. I was not (as far as I can tell) contacted at some point. Maybe this happened with other people?

As far as I am aware the hiring process is severely broken. It would be good to get some more public information and a fix.

Interesting experiment.

As someone who participated in the application process -- thank goodness I got the prompt that INCLUDED the extra paragraph! I would not have spent nearly as long on it, had I not thought that it was actually being used for something.

My thought process when I was working on it went something like: "Well, it's a lot of 'free' work, but I'm on their volunteer list anyway, and would happily do it if they sent it out that way." I was definitely focusing on "doing something useful for SI, whether I get the job or not," rather than "putting in a whole bunch of effort on a random task on the off-chance of getting a job."

If I had NOT thought that the work was being used for a post or somesuch, then I would have put in maybe a quarter of the time on it, at most, and almost certainly NOT gotten the job.

...Now I feel sort of bad for the people who didn't get the extra paragraph...

So, lessons learned? It’s probably a good idea to include such a paragraph since it’s so cheap and apparently isn’t at the expense of submissions in general.

Now that you've written this post up though, don't you risk people wondering "are they actually going to do that, or are they just saying that to encourage me?"

Yes, but this could be said of any experiment ever which involves deception.

And besides, research ethics typically calls for deceived participants to be debriefed. Either the meta-paper will be written, in which case no deception was involved and there was no reason not to post this writeup; or it won't, in which case deception was involved and this writeup is ethically mandated. So either way, this writeup was worth doing.

this writeup is ethically mandated

Do you mean that this post to Less Wrong is the debriefing? I don't think that posting to a blog, even one that the experimental subjects often read, is enough. What did the ethics committee say? :-)


I asked the committee in my head, and they said 'We're sure they read Discussion as much as you do. So good job - as a reward, you can go order yourself some mead!' I feel sorry for all the other guys who don't have as understanding an ethics committee as I do.

So... we're not going to see an article built out of all the submissions?

I thought there was going to be, but I'm not involved in using the hired researchers; you'd have to ask Luke.

Ah. After a few weeks I just assumed that the response rate or overall quality was too low to be usable.

I'd be willing to share mine, if people were interested in the subject. It's long though.

I'm interested.

PDF for people who like those.

I don't think this is a good article title, reference to previous article notwithstanding. Good researchers are not difficult to find -- there is a glut of smart people who want to do research. It's just that such people want to do their own thing, and their time is very expensive. How much is SIAI really willing to shell out?

So there's both a glut and they're also very expensive?

They are expensive if you want them to do things other than what they want to do. Academics often consult, and they are not cheap.

I know Scott Aaronson sometimes communicates with this community. I have seen him at a meetup in Boston once. He's a world class researcher.

Let's just call a spade a spade -- what is hard is finding good people willing to do the intellectual work you want them to do for the meager price you are willing to pay. What you are doing is equivalent to folks advertising 15 dollar an hour programming jobs on college campuses and then wondering, after they find somebody, why the resulting code is terrible.

We conduct the first natural field experiment to explore the relationship between the "meaningfulness" of a task and worker effort. We employed about 2,500 workers from Amazon's Mechanical Turk (MTurk), an online labor market, to label medical images. Although given an identical task, we experimentally manipulated how the task was framed. Subjects in the meaningful treatment were told that they were labeling tumor cells in order to assist medical researchers, subjects in the zero-context condition (the control group) were not told the purpose of the task, and, in stark contrast, subjects in the shredded treatment were not given context and were additionally told that their work would be discarded. We found that when a task was framed more meaningfully, workers were more likely to participate. We also found that the meaningful treatment increased the quantity of output (with an insignificant change in quality) while the shredded treatment decreased the quality of output (with no change in quantity). We believe these results will generalize to other short-term labor markets. Our study also discusses MTurk as an exciting platform for running natural field experiments in economics.

I intended to complete the assignment, but then my (very young) cat died (very unexpectedly). I was depressed all the way through the deadline and was suffering from massive akrasia. Since then, I have filled the spare time I would have used for SIAI research with other projects (including a start-up). Oh well.