Self-Blinded Caffeine RCT

niplav

I test 200mg caffeine in an n=1, m=50 self-blinded RCT. The outcomes are encouraging.

	Log-score of predictions of substance	Absorption effect size d (λ, p, σ increase)	Mindfulness effect size d (λ, p, σ increase)	Productivity effect size d (λ, p, σ increase)	Creativity effect size d (λ, p, σ increase)	Happiness effect size d (λ, p, σ increase)	Contentment effect size d (λ, p, σ increase)	Relaxation effect size d (λ, p, σ increase)	Horniness effect size d (λ, p, σ increase)
200mg Caffeine (n=1, m=40)	-0.6	0.61 (λ=13.3, p=0.00017, -0.072)	0.58 (λ=11.8, p=0.0007, 0.021)	0.58 (λ=28.9, p=1.3^-12, 0.11)	0.38 (λ=32.9, p=5.2^-15, 0.09)	0.27 (λ=10.6, p=0.002, 0.3)	0.13 (λ=7.66, p=0.02, 0.47)	-0.11 (λ=5, p=0.15, 0.42)	-0.14 (λ=1.9, p=0.64, 0.11)

I am especially interested in testing many different substances for their effect on meditation, while avoiding negative side effects. The benefits from high meditational attainments valuable to me, and seem especially likely to benefit from chemical intervention, since the Algernon argument likely doesn't apply: Meditative attainments might've not led to a fitness advantage (even, by opportunity cost, to a fitness disadvantage), and so were likely selected against, but most of us don't care that much about inclusive genetic fitness and more about psychological well-being. Evolutionary dynamics favor being like Dschingis Khan (dozens to hundreds of offspring) over Siddharta Gautama (one son), but I'd rather attain sotāpanna than pillage and murder.

And meditative attainments are costly: they take tens to hundreds to thousands of hours to reach, which would make simple psychopharmacological interventions worthwhile. I also don't buy that they miss the point of meditation—most people already struggle enough, so some help doesn't make it a cakewalk; "reach heaven through fraud". One must be careful not to fall into the trap of taking substances that feel good but lessen sensory clarity (which I believe was the original intent behind the fifth precept, and so I'll exclude e.g. opiates from the substances to test.

Variables tracked (see more here):

Meditation: 45 minutes of ānāpānasati, started 0-60 minutes after taking the dose, tracking two variables.
- Mindfulness: How aware I was of what was going on in my head, modulo my ability to influence it.
- Absorption (often called concentration): How "still" my mind was, how easily I was swept away by my thoughts.
Flashcard performance: Did my daily flashcards for ~20 minutes, started 0-60 minutes after finishing meditation.
Arm Prediction: I tried to predict whether the substance I'd taken was placebo or caffeine.
Mood: Tracking 4 different variables at random points during the day, namely
- Happiness/Sadness
- Contentment/Discontentment
- Relaxation/Stress
- Horniness/Chastity: Chastity being simply the opposite of horniness in this case.
Productivity and creativity, recorded at the end of the day.

The total cost of the experiment is at least 21.5€:

Time: The Clearer Thinking tool for the value of my time returns 15€/hour, which gives a time cost of 18.75€ for preparing the experiment.
- Time for filling: 35 minutes
- Time for preparing envelopes: 40 minutes
Cost of caffeine pills:
Cost of empty capsules: $\frac{0.03 €}{capsule} \cdot 25 capsules = 0.75 €$
Cost of sugar: Negligible.

200mg caffeine pills, placebo pills filled with sugar, of each 25. Put each pill with a corresponding piece of paper ("C" for caffeine, "P" for placebo) into an unlabeled envelope. Used seq 1 50 | shuf to number the envelopes, and sorted them accordingly.

Notes on the experiment:

3rd dose: Out of fear that the placebo pills have some sugar stuck outside of them, which could de-blind the dose, I take a bit (~10 g) of sugar with each pill.
7th dose: Increase time between consumption and starting to meditate to ~45 minutes, after finding out that the onset of action is 45 minutes-1 hour.
14th dose: Noticed that during meditation, sharpness/clarity of attention is ~high, and relaxing after becoming mindful is easy, but attention strays just as easily.
49th dose: Took the pill, meditated, lay down during meditation and fell asleep. Likely_10% placebo.

Statistical Method

In general, I'll be working with the likelihood ratio test. For this, let $v_{P}$ be the distribution of values of a variable for the placebo arm, and $v_{C}$ the distribution of values for a variable of the caffeine arm. (I apologise for the $C$ being ambiguous, since it could also refer to the control arm).

Then let $θ_{0} = (μ_{0}, σ_{0}) = M L E_{N} (v_{P})$ be the Gaussian maximum likelihood estimator for our placebo values, and $θ = (μ, σ) = M L E_{N} (v_{C})$ be the MLE for our caffeine values.

Then the likelihood ratio statistic $λ$ is defined as $λ = 2 log \frac{L_{C} (θ)}{L_{C} (θ_{0})}$ where $L_{C} (θ)$ is the likelihood the caffeine distribution assigns to the parameters $θ$ . This test is useful here because we fix all values of $θ_{0}$ . See Wasserman 2003 ch. 10.6 for more.

If $λ \approx 0$ , then the MLE for the placebo arm is very close to the MLE for the caffeine arm, the distributions are similar. If $λ > 0$ , then the MLE for the placebo arm is quite different from the caffeine arm (though there is no statement about which has higher values). $λ < 0$ is not possible, since that would mean that the MLE of the placebo distribution has a higher likelihood for the caffeine data than the MLE of the caffeine distribution itself—not very likely.

Note that I'm not a statistician, this is my first serious statistical analysis, so please correct me if I'm making some important mistakes. Sorry.

Predictions on the Outcomes of the Experiment

After collecting the data, but before analysing it, I want to make some predictions about the outcome of the experiment, similar to another attempt here.

Moved here.

Analysis

We start by setting everything up and loading the data.

import math
import numpy as np
import pandas as pd
import scipy.stats as scistat

substances=pd.read_csv('../..//data/substances.csv')

meditations=pd.read_csv('../../data/meditations.csv')
meditations['meditation_start']=pd.to_datetime(meditations['meditation_start'], unit='ms', utc=True)
meditations['meditation_end']=pd.to_datetime(meditations['meditation_end'], unit='ms', utc=True)

creativity=pd.read_csv('../../data/creativity.csv')
creativity['datetime']=pd.to_datetime(creativity['datetime'], utc=True)

productivity=pd.read_csv('../../data/productivity.csv')
productivity['datetime']=pd.to_datetime(productivity['datetime'], utc=True)

expa=substances.loc[substances['experiment']=='A'].copy()
expa['datetime']=pd.to_datetime(expa['datetime'], utc=True)

The mood data is a bit special, since it doesn't have timezone info, but that is easily remedied.

mood=pd.read_csv('../../data/mood.csv')
alarms=pd.to_datetime(pd.Series(mood['alarm']), format='mixed')
mood['alarm']=pd.DatetimeIndex(alarms.dt.tz_localize('CET', ambiguous='infer')).tz_convert(tz='UTC')
dates=pd.to_datetime(pd.Series(mood['date']), format='mixed')
mood['date']=pd.DatetimeIndex(dates.dt.tz_localize('CET', ambiguous='infer')).tz_convert(tz='UTC')

Summary Statistics

We can first test how well my predictions fared:

probs=np.array(expa['prediction'])
substances=np.array(expa['substance'])
outcomes=np.array([0 if i=='sugar' else 1 for i in substances])

drumroll

>>> np.mean(list(map(lambda x: math.log(x[0]) if x[1]==1 else math.log(1-x[0]), zip(probs, outcomes))))
-0.5991670759554912

At least this time I was better than chance:

>>> np.mean(list(map(lambda x: math.log(x[0]) if x[1]==1 else math.log(1-x[0]), zip([0.5]*40, outcomes))))
-0.6931471805599453

Meditation

Merging the meditations closest (on the right) to the consumption and selecting the individual variables of interest:

meditations.sort_values("meditation_start", inplace=True)
meditations_a=pd.merge_asof(expa, meditations, left_on='datetime', right_on='meditation_start', direction='forward')
caffeine_concentration=meditations_a.loc[meditations_a['substance']=='caffeine']['concentration_rating']
placebo_concentration=meditations_a.loc[meditations_a['substance']=='sugar']['concentration_rating']
caffeine_mindfulness=meditations_a.loc[meditations_a['substance']=='caffeine']['mindfulness_rating']
placebo_mindfulness=meditations_a.loc[meditations_a['substance']=='sugar']['mindfulness_rating']

So, does it help?

>>> (caffeine_concentration.mean()-placebo_concentration.mean())/meditations['concentration_rating'].std()
0.6119357868347828
>>> (caffeine_mindfulness.mean()-placebo_mindfulness.mean())/meditations['mindfulness_rating'].std()
0.575981762563846

Indeed! Cohen's d here looks pretty good. Taking caffeine also reduces the variance of both variables:

>>> caffeine_concentration.std()-placebo_concentration.std()
-0.0720877290884765
>>> caffeine_mindfulness.std()-placebo_mindfulness.std()
0.02186797288826836

Productivity and Creativity

We repeat the same procedure for the productivity and creativity data:

prod_a=pd.merge_asof(expa, productivity, left_on='datetime', right_on='datetime', direction='forward')
creat_a=pd.merge_asof(expa, creativity, left_on='datetime', right_on='datetime', direction='forward')
caffeine_productivity=prod_a.loc[meditations_a['substance']=='caffeine']['productivity']
placebo_productivity=prod_a.loc[meditations_a['substance']=='sugar']['productivity']
caffeine_creativity=creat_a.loc[meditations_a['substance']=='caffeine']['creativity']
placebo_creativity=creat_a.loc[meditations_a['substance']=='sugar']['creativity']

And the result is…

>>> (caffeine_productivity.mean()-placebo_productivity.mean())/prod_a['productivity'].std()
0.5784143673702401
>>> (caffeine_creativity.mean()-placebo_creativity.mean())/creat_a['creativity'].std()
0.38432393552829164

Again surprisingly good! The creativity values are small enough to be a fluke, but the productivity values seem cool.

In this case, though, caffeine increases variance in the variables (not by very much):

>>> caffeine_productivity.std()-placebo_productivity.std()
0.1139221931098384
>>> caffeine_creativity.std()-placebo_creativity.std()
0.08619686235791152

Mood

Some unimportant pre-processing, in which we filter for mood recordings 0-10 hours after caffeine intake, since pd.merge_asof doesn't do cartesian product:

mood_a=expa.join(mood, how='cross')
mood_a=mood_a.loc[(mood_a['alarm']-mood_a['datetime']<pd.Timedelta('10h'))&(mood_a['alarm']-mood_a['datetime']>pd.Timedelta('0h'))]
caffeine_mood=mood_a.loc[mood_a['substance']=='caffeine']
placebo_mood=mood_a.loc[mood_a['substance']=='sugar']

And now the analysis:

>>> caffeine_mood[['happy', 'content', 'relaxed', 'horny']].describe()
           happy    content    relaxed      horny
count  88.000000  88.000000  88.000000  88.000000
mean   52.193182  51.227273  50.704545  46.568182
std     2.396635   2.911441   3.115254   3.117601
[…]
>>> placebo_mood[['happy', 'content', 'relaxed', 'horny']].describe()
           happy    content    relaxed      horny
count  73.000000  73.000000  73.000000  73.000000
mean   51.575342  50.876712  51.041096  47.000000
std     2.101043   2.437811   2.699992   3.009245
[…]

Which leads to d of ~0.27 for happiness, ~0.13 for contentment, ~-0.11 for relaxation and ~-0.14 for horniness.

Likelihood Ratios

We assume (at first) that the data is distributed normally. Then we can define a function for the gaussian likelihood of a distribution given some parameters:

def normal_likelihood(data, mu, std):
    return np.product(scistat.norm.pdf(data, loc=mu, scale=std))

And now we can compute the likelihood ratio $\frac{L θ}{L θ_{0}}$ for the null hypothesis $θ_{0} = MLE (v_{P})$ for the placebo data $v_{P}$ , and also the result of the likelihood ratio test:

def placebo_likelihood(active, placebo):
    placebo_mle_lh=normal_likelihood(active, placebo.mean(), placebo.std())
    active_mle_lh=normal_likelihood(active, active.mean(), active.std())
    return active_mle_lh/placebo_mle_lh

def likelihood_ratio_test(lr):
    return 2*np.log(lr)

And this gives us surprisingly large values:

>>> placebo_likelihood_ratio(caffeine_concentration, placebo_concentration)
776.6147119766716
>>> likelihood_ratio_test(placebo_likelihood_ratio(caffeine_concentration, placebo_concentration))
13.309888722406932
>> placebo_likelihood_ratio(caffeine_mindfulness, placebo_mindfulness)
363.3984201164464
>>> likelihood_ratio_test(placebo_likelihood_ratio(caffeine_mindfulness, placebo_mindfulness))
11.790999616893938
>>> placebo_likelihood_ratio(caffeine_productivity, placebo_productivity)
1884090.6347491818
>>> likelihood_ratio_test(placebo_likelihood_ratio(caffeine_productivity, placebo_productivity))
28.8979116811553
>>> placebo_likelihood_ratio(caffeine_creativity, placebo_creativity)
14009015.173307568
>>> likelihood_ratio_test(placebo_likelihood_ratio(caffeine_creativity, placebo_creativity))
32.910423242578126

And, if one is interested in p-values, those correspond to (with 2 degrees of freedom each):

def llrt_pval(lmbda, df=2):
    return scistat.chi2.cdf(df, lmbda)

>>> llrt_pval([13.309888722406932,11.790999616893938, 28.8979116811553, 32.910423242578126])
array([1.66559304e-04, 7.23739116e-04 ,1.34836408e-12, 5.17222209e-15])

I find these results surprisingly strong, and am still kind of mystified why. Surely caffeine isn't that reliable!

And, the same, for mood:

>>> placebo_likelihood_ratio(caffeine_mood['happy'], placebo_mood['happy'])
204.81283712162838
>>> likelihood_ratio_test(placebo_likelihood_ratio(caffeine_mood['happy'], placebo_mood['happy']))
10.644193144917832
>>> placebo_likelihood_ratio(caffeine_mood['content'], placebo_mood['content'])
46.08310645632934
>>> likelihood_ratio_test(placebo_likelihood_ratio(caffeine_mood['content'], placebo_mood['content']))
7.6608928570645105
>>> placebo_likelihood_ratio(caffeine_mood['relaxed'], placebo_mood['relaxed'])
12.229945616108525
>>> likelihood_ratio_test(placebo_likelihood_ratio(caffeine_mood['relaxed'], placebo_mood['relaxed']))
5.007775005855661
>>> placebo_likelihood_ratio(caffeine_mood['horny'], placebo_mood['horny'])
2.670139324155222
>>> likelihood_ratio_test(placebo_likelihood_ratio(caffeine_mood['horny'], placebo_mood['horny']))
1.9642613047646074

And the p-values of those are:

>>> llrt_pval([10.644193144917832, 7.6608928570645105, 5.007775005855661, 1.9642613047646074])
array([0.0020736 , 0.02462515, 0.15015613, 0.63984027])

Conclusion

Caffeine appears helpful for everything except relaxation (and it makes me hornier, which I'm neutral about). I'd call this experiment a success and will be running more in the future, while in the meantime taking caffeine before morning meditations.

Full code for the experiment here.

Predictions on the Outcome of the Experiment

Predicting the outcomes of personal experiments give a useful way to train ones own calibration, I take it a step further and record the predictions for the world to observe my idiocy.

Prediction of Arm
- My prediction about the content of the pill is more accurate than random guesses_80%: Yes.
- My prediction about the content of the pill has a log score of more than -0.5_60%: No.
Meditation
- On days with caffeine, my average mindfulness during meditation was higher than days with placebo_60%: Yes.
- On days with caffeine, my average absorption during meditation was higher than days with placebo_40%: Yes.
- On days with caffeine, the variance of values for mindfulness during meditation was lower than on placebo days_55%: No.
- On days with caffeine, the variance of values for absorption during meditation was lower than on placebo days_35%: Yes.
- $λ < 1$ for the mindfulness values_20%: No.
- $λ < 1$ for the absorption values_25%: No.
- $λ < 4$ for the mindfulness values_82%: No.
- $λ < 4$ for the absorption values_88%: No.
Mood
- On days with caffeine, my average happiness during the day was higher than days with placebo_65%: Yes.
- On days with caffeine, my average contentment during the day was higher than days with placebo_45%: Yes.
- On days with caffeine, my average relaxation during the day was higher than days with placebo_35%: No.
- On days with caffeine, my average chastity during the day was higher than days with placebo_50%: No.
- On days with caffeine, the variance of values for happiness during the day was lower than on placebo days_55%: No.
- On days with caffeine, the variance of values for contentment during the day was lower than on placebo days_30%: No.
- On days with caffeine, the variance of values for relaxation during the day was lower than on placebo days_30%: No.
- On days with caffeine, the variance of values for chastity during the day was lower than on placebo days_50%: No.
- $λ < 1$ for the happiness values_45%: No.
- $λ < 1$ for the contentment values_40%: No.
- $λ < 1$ for the relaxation values_37%: No.
- $λ < 1$ for the chastity values_60%: No.
- $λ < 4$ for the happiness values_85%: No.
- $λ < 4$ for the contentment values_90%: No.
- $λ < 4$ for the relaxation values_90%: No.
- $λ < 4$ for the chastity values_95%: Yes.
Productivity and Creativity
- On days with caffeine, my average productivity during the day was higher than days with placebo_52%: Yes.
- On days with caffeine, my average creativity during the day was higher than days with placebo_55%: Yes.
- On days with caffeine, the variance of values for productivity during the day was lower than on placebo days_40%: No.
- On days with caffeine, the variance of values for creativity during the day was lower than on placebo days_65%: No.
- $λ < 1$ for the productivity values_40%: No.
- $λ < 1$ for the creativity values_45%: No.
- $λ < 4$ for the productivity values_75%: No.
- $λ < 4$ for the creativity values_80%: No.

I also recorded my predictions about the content of the pill on PredictionBook (1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50).

[-]Tofly1y95

Did you consider withdrawal effects at all? A day without caffeine having used it the previous few days is going to be very different from one where you haven't used caffeine in months.

[-]niplav1y20

That's a good point. I have, but shelved them for sometime later™. If it were true that withdrawal effects are important, you'd expect the difference in placebo-caffeine scores to drop as the experiment progresses, but I haven't checked yet.

[-]Mart_Korz1y40

you'd expect the difference in placebo-caffeine scores to drop

I am not sure about this. I could also imagine that the difference remains similar, but instead the baseline for concentration etc. shifts downwards such that caffeine-days are only as good as the old baseline and placebo-days are worse than the old baseline.

That's a good point, I'll keep it in mind.

[-]sludgepuddle1y10

I'm surprised you were only able to predict whether you'd taken caffeine 80% of the time. 200 mg is not a heroic dose, but on little to no tolerance it should be quite noticeable.

[-]niplav1y30

Blinding is powerful! Not sure where you get 80% from, do you mean the number of times when I was directionally right in the prediction?

[-]Ben1y40

I think they mean that beforehand you said "My prediction about the content of the pill is more accurate than random guesses_80%" - meaning that you were 80% sure you would do better than a 50/50 guess of what type of pill it was throughout the trial. Then you found that you did indeed do better than 50/50, but didn't give a number and I think sludgepuddle thought you had guessed right 80% of the time.

Ah, I see. In the table is the log score of my pill predictions (-0.6). I made predictions about the accuracy of my predictions—confusing perhaps.

[-]Mart_Korz1y12

log score of my pill predictions (-0.6)

If did not make a mistake, this score could be achieved by e.g. giving ~55% probabilities and being correct every time or by always giving 70% probabilities and being right ~69 % of the time.