Thanks for pointing this out.
Let's use Beeminder as an example. When I emailed Daniel he said this: "we've talked with the CFAR founders in the past about setting up RCTs for measuring the effectiveness of beeminder itself and would love to have that see the light of day".
That's a little open-ended, so I'm going to arbitrarily decide that we'll study Beeminder for weight loss effectiveness.
Story* as follows:
Daniel goes to (our thing).com and registers a new study. He agrees to the terms, and tells us that this is a study which can impact health -- meaning that mandatory safety questions will be required. Once the trial is registered it is viewable publicly as "initiated".
He then takes whatever steps we decide on to locate participants. Those participants are randomly assigned to two groups: (1) act normal, and (2) use Beeminder to track exercise and food intake. Every day the participants are sent a text message with a URL where they can log that day's data. They do so.
After two weeks, the study completes and both Daniel and the world are greeted with the results. Daniel can now update Beeminder.com to say that Beeminder users lost XY pounds more than the control group... and when a rationalist sees such claims they can actually believe them.
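To make the random assignment step in the story concrete, here is a minimal sketch in Python. The function name assign_groups, the seed parameter, and the arm labels are my own placeholders, not anything the project has decided on. The idea is just that a seed fixed at registration time makes the split reproducible by anyone.

```python
import random


def assign_groups(participant_ids, seed):
    """Shuffle participants with a fixed seed and split them into two arms.

    Hypothetical helper: the arm names and the seeding scheme are
    illustrative assumptions, not part of any agreed protocol.
    """
    rng = random.Random(seed)        # seeded so the assignment can be re-derived
    shuffled = list(participant_ids)  # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"control": shuffled[:half], "beeminder": shuffled[half:]}
```

Calling assign_groups(ids, seed=42) twice gives the same split, and with an odd number of participants the two arms differ in size by one.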
Those participants are randomly assigned to two groups: (1) act normal, and (2) use Beeminder to track exercise and food intake.
These kinds of studies suffer from the Hawthorne effect. It is better to assign the control group to do virtually anything instead of nothing. In this case I'd suggest having them simply monitor their exercise and food intake without any magical line and/or punishment.
Thanks for the example. It leads me to questions:
7 - Can sponsors do a private mini-trial to test their trial design before going full bore (presumably, with their promise not to publicize the results)?
This is an awesome idea. I had not considered this until you posted it. This sounds great.
To work well, I think it needs a good name. In terms of long term social dynamics, creating a meta-brand that helps smaller brands seems essential. Like when people initially see the "tested by X" logo they won't know what it means.
Assuming the web app works as intended, and assuming any significant fraction of the population just stops believing any of the classes of claims that might be tested this way and lack the logo, then the process should gain more and more credibility over the course of months and years. The transition from an unknown l...
In the Reproducibility Initiative, PLoS and a few partners came together to improve the quality of science.
I would suggest asking all the people listed as advisors in the Reproducibility Initiative whether they are interested in your project. PLoS would be a good trusted third party with an existing brand.
the cost of doing these less flexible studies will approach the cost of the raw product to be tested. For most web companies, that's $0.
Nope. The cost of doing less flexible studies will be the cost of losing that flexibility. For companies which expect a particular result from a study this cost can be considerable.
One issue that seems more likely to be problematic when the web application is being created and launched than later on, is whether the questions are well designed. There's a whole area of expertise that goes into creating scales that are reliable, valid, and discriminative. One possibility is to construct them from scratch from first principles, and then make them publicly available, but another possibility is to find the best of what exists already that is open sourced.
For general biotics and meal squares it seems like some measure of "not having ...
...Here are some initial features we should consider:
- Data will be collected by a webapp controlled by a trusted third party, and will only be editable by study participants.
- The results will be computed by software decided on before the data is collected.
- Studies will be published regardless of positive or negative results.
- Studies will have mandatory general-purpose safety questions. (web-only products likely exempt)
- Follow up studies will be mandatory for continued use of results in advertisements.
- All software/contracts/questions used will be open sourc
Hi David,
This is a worthwhile initiative. All the very best to you.
I would advise that this data be maintained in a blockchain-like data structure. It will be highly redundant and very difficult to corrupt, which I think is one of the primary concerns here.
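For anyone unsure what a "blockchain-like" structure means in practice, here is a rough sketch of the simplest version: an append-only log where each entry stores the SHA-256 hash of the previous one. The names append_entry, verify_chain and GENESIS are hypothetical. Note that this only makes tampering detectable; the redundancy part would additionally require several independent parties to hold copies.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry (assumption)


def _entry_hash(prev_hash, record):
    """Hash the previous entry's hash together with a canonical JSON record."""
    payload = json.dumps(record, sort_keys=True)
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(chain, record):
    """Append a record, linking it to the hash of the entry before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({
        "record": record,
        "prev_hash": prev_hash,
        "hash": _entry_hash(prev_hash, record),
    })
    return chain


def verify_chain(chain):
    """Return True only if every entry still matches its stored hashes."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _entry_hash(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True
```

Editing any earlier record changes its hash, which breaks the link to every later entry, so verify_chain fails unless the whole tail of the log is rewritten as well.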
Okay, sorry I've been away from the thread for a while. I spent the last half day hacking together a rough version of the data collection webapp. This seemed reasonable because I haven't heard any disagreement on that part of the project, and I hope that having some working code will excite us :)
The models are quite good and well tested at this point, but the interface is still a proof of concept. I'll have some more time tomorrow evening, which will hopefully be enough time to finish off the question rendering and SMS sending. I think with those two featu...
Could you please link to examples of the kind of marketing studies that you are talking about? I'd especially like to see examples of those that you consider good vs. those you consider bad.
This is an interesting project!
An obvious relevant model is Gwern's self-experimentation (http://www.gwern.net/Nootropics)
The key difference being, of course, that you are interested in group differences.
A key step will be offering power calculations so that the sample size can be estimated prior to performing the test. (Also, so that post hoc, you can understand how big an effect your study should have been able to detect.)
There are already some web apps that perform this, however. How will your app improve over those, or will you...
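As a rough illustration of the power calculation being asked for, here is a sketch using the usual normal approximation for comparing two group means, with the effect expressed as a standardized difference (Cohen's d). The function names and the defaults (alpha = 0.05, power = 0.8) are my assumptions. An exact t-test calculation gives slightly larger sample sizes.

```python
import math
from statistics import NormalDist


def sample_size_per_group(effect_size, alpha=0.05, power=0.8):
    """Participants needed in each arm to detect a standardized effect.

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_power) / d) ** 2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


def detectable_effect(n_per_group, alpha=0.05, power=0.8):
    """Smallest standardized effect a finished study of this size could detect."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return (z_alpha + z_power) * math.sqrt(2 / n_per_group)
```

The first function covers the planning case (how many participants to recruit), the second covers the post hoc question of how big an effect the study should have been able to see.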
This is not at all self-evident to me. How, for example, would you demonstrate product safety (for a diverse variety of products) via a standard template?
Templates, not template. I think if you know roughly which bodily systems a product is likely to affect, the questions are not so diverse.
My background is not in question selection (it's ML and webapp programming), but here are some general question ideas for edible products:
The mandatory questions are intended to give LessWrong / everyone a say in what startups will test their products for -- NOT to provide a 100% guarantee of general safety (the FDA already handles that). We should use these questions to learn about unanticipated side effects.
Research costs money and requires competent people. If it were possible to do meaningful research on the cheap just by reusing the same template, don't you think it would be a very popular opinion already?
I hope it will do something akin to what Google Translate did for translation: lower the cost for modest use cases. If you want a high quality translation (poetry) you still need to hire a good translator. However, if you are willing to accept a reasonably good level of translation quality, it's now free.
I agree it's weird that somebody else hasn't noticed. testifiable.com is the closest I've found. I've already spoken with Testifiable's founders and invited them to this thread.
I hope it will do something akin to what Google Translate did for translation
There is a critical difference: Google Translate does not guarantee the quality of results and, in fact, often generates something close to garbage. It may produce a "reasonably good level of translation quality" or it may not and that's fine because it made zero promises about its capabilities.
You are planning to set yourself up as a standard of research which means you must generate better than adequate results every single time.
P.S. Oh, and a random thought. What do you think 4Chan will do with your "webapp"? X-)
I'm a LW reader, two-time CFAR alumnus, and rationalist entrepreneur.
Today I want to talk about something insidious: marketing studies.
Until recently I considered studies of this nature merely unfortunate, funny even. However, my recent experiences have caused me to realize the situation is much more serious than this. Product studies are the public's most frequent interaction with science. By tolerating (or worse, expecting) shitty science in commerce, we are undermining the public's perception of science as a whole.
The good news is this appears fixable. I think we can change how startups perform their studies immediately, and use that success to progressively expand.
Product studies have three features that break the assumptions of traditional science: (1) few if any follow-up studies will be performed, (2) the scientists are in a position of moral hazard, and (3) the corporation seeking the study is in a position of moral hazard (for example, the filing cabinet bias becomes more of a "filing cabinet exploit" if you have low morals and the budget to perform 20 studies).
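To put a rough number on the 20-study example: assuming each study is independent, the product has no real effect, and the conventional 0.05 significance threshold is used (that threshold is my assumption, not something stated above), the chance that at least one study comes back "significant" can be sketched like this. The function name is hypothetical.

```python
def prob_at_least_one_false_positive(num_studies, alpha=0.05):
    """Chance that at least one of several independent null studies looks significant."""
    # Each study avoids a false positive with probability (1 - alpha);
    # all of them must do so for the sponsor to come up empty-handed.
    return 1 - (1 - alpha) ** num_studies


print(round(prob_at_least_one_false_positive(20), 2))  # prints 0.64
```

Under those assumptions a sponsor who runs 20 studies and publishes only the best one gets a publishable result roughly two times out of three.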
I believe we can address points 1 and 2 directly, and overcome point 3 by appealing to greed.
Here's what I'm proposing: we create a webapp that acts as a high quality (though less flexible) alternative to a Contract Research Organization. Since it's a webapp, the cost of doing these less flexible studies will approach the cost of the raw product to be tested. For most web companies, that's $0.
If we spend the time to design the standard protocols well, it's quite plausible any studies done using this webapp will be in the top 1% in terms of scientific rigor.
With the cost low, and the quality high, such a system might become the startup equivalent of citation needed. Once we have a significant number of startups using the system, and as we add support for more experiment types, we will hopefully attract progressively larger corporations.
Is anyone interested in helping? I will personally write the webapp and pay for the security audit if we can reach quorum on the initial protocols.
Companies that have expressed interest in using such a system if we build it:
(I sent out my inquiries at 10pm yesterday, and every one of these companies got back to me by 3am. I don't believe "startups love this idea" is an overstatement.)
So the question is: how do we do this right?
Here are some initial features we should consider:
Any placebos used in the studies must be available for purchase as long as the results are used in advertising, allowing for trivial study replication.
Significant contributors will receive:
I'm hoping that if a system like this catches on, we can get an "effective startups" movement going :)
So how do we do this right?