Extremely cool.
As you say, looking at treatment effect heterogeneity is hard given power. But -- if I understand correctly, you did not take any melatonin between nights in which you randomized -- have you looked "treatment effect vs. length of time since last experimental night"? This would be a very crude way of getting at tolerance effects.
I have a (completed) 5-year melatonin self-experiment that I will hopefully write up later this year (although... I have been saying that for 12+ months at this point), will be fun to compare notes.
But -- if I understand correctly, you did not take any melatonin between nights in which you randomized -- have you looked "treatment effect vs. length of time since last experimental night"? This would be a very crude way of getting at tolerance effects.
Good idea! Had a brief look now: I filtered my data for the 40 days on which I took melatonin, then for each one calculated the time (in days) since I last took melatonin (so not the last day I ran the experiment, but the last day I ran the experiment where I was in one of the two intervention groups), and looked for a correlation between number of days since previous melatonin intake and time to fall asleep. There's maybe a tiny hint that there could be tolerance effects at play, but the data is insufficient for anything conclusive:
The point on the very right is the first day where I took melatonin - for that one, the "days since last intake" is not really defined, so I just chose the maximum distance between days I had + 1.
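The gap calculation described above could be sketched roughly like this. It is only an illustration: the function name and the example dates are made up, and the first intake day gets "largest observed gap + 1" as described in the text.

```python
from datetime import date

def days_since_last_intake(intake_dates):
    """Days since the previous melatonin night, one value per intake date.

    The first intake has no predecessor, so it is assigned the largest
    observed gap + 1, following the workaround described in the text.
    """
    ordered = sorted(intake_dates)
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    if not gaps:
        # Zero or one intake date: no gap is defined at all.
        return [None] * len(ordered)
    return [max(gaps) + 1] + gaps

# Hypothetical dates, NOT taken from the experiment.
example = [date(2025, 2, 1), date(2025, 2, 3), date(2025, 2, 10), date(2025, 2, 11)]
print(days_since_last_intake(example))
```

The resulting list can then be paired with the time-to-fall-asleep values of the same nights to compute a correlation.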
We do find a very slightly negative correlation, which seems to indicate that taking a break from the experiment (or having had some control group days recently) made the melatonin slightly more effective at reducing time to fall asleep, but then again, a [-0.4, 0.22] CI doesn't tell us much. :)
(Update: I also made a small linear regression and obtained the formula predicted_time_to_fall_asleep = 27.1 - 0.24 * days_since_last_intake (for days on which I took melatonin) - but, again, large error bars around that coefficient)
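To get a feel for what that fitted line implies, here is a tiny sketch that just evaluates the formula quoted above for a few gap lengths. The function name and the chosen gap values are illustrative, and the large uncertainty around the slope applies to every number it prints.

```python
def predicted_time_to_fall_asleep(days_since_last_intake):
    """Evaluate the fitted line quoted in the comment (minutes)."""
    return 27.1 - 0.24 * days_since_last_intake

# Illustrative gap lengths, not specific nights from the data.
for days in (1, 7, 14):
    print(f"{days:>2} days since last intake -> "
          f"{predicted_time_to_fall_asleep(days):.1f} min")
```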
I have a (completed) 5-year melatonin self-experiment that I will hopefully write up later this year (although... I have been saying that for 12+ months at this point), will be fun to compare notes.
Oh wow, please do!
You haven't spelled it out directly, but if I understand correctly, melatonin reduced the length of your sleep? If yes, by how much?
Thanks for asking! Are you referring to the slightly earlier wake-up time? I just had a look at the net sleep time in the three groups, and got the following comparison:
Control: 8h 00m
0.15mg: 8h 02m
0.3mg: 7h 45m
But large p values as you can guess from the overlapping CIs.
(The seeming discrepancy between this data and wake-up time can be explained by the fact that wake-up time was the absolute time, whereas net sleep time is also affected by when I went to bed and how long it took me to fall asleep)
Yeah, the earlier wake-up time was what I was referring to. I've been interested for a while in whether melatonin reduces sleep duration, most studies use too much and find slight sleep lengthening. The 0.3mg decrease of 15 minutes is intriguing, begs for a replication attempt.
Throughout the first half of 2025 I did a blinded experiment to see how low to moderate melatonin intake affects me. In this post I summarize my findings.
Tl;dr: In my blinded self-experiment on melatonin, conducted over n = 60 days and analyzed with hypothesize.io, I found significantly positive effects for a dosage as low as 0.15mg of melatonin, taken on average ~1h before going to bed. Time to fall asleep averaged ~25 instead of ~35 minutes (p ~= 0.001). Feeling awake the following morning improved to 5.74/10 from 4.95/10 (p = 0.006), but this effect only appeared for the lower of the two dosages and did not persist throughout the rest of the day; it could well be a false positive, as I didn’t correct for multiple hypothesis testing.
Feel free to jump ahead to the Results section if you don't care much about my methodology.
I randomized between 3 groups on a (roughly) nightly basis: 0 drops (control), 1 drop (0.15mg of melatonin), or 2 drops (0.3mg).
The dosage may seem low, but past experiences with melatonin showed that taking more than 2 drops often caused me to wake up around 4am, not feeling well rested, which I wanted to avoid.
For blinding, I asked my girlfriend to prepare a glass with ~20ml of water and the 0-2 drops for me, which I then drank. I think blinding worked well, as I almost never had any idea whether the water I consumed contained melatonin or not (there were one or two exceptions where I had a feeling that the water tasted unusual, at which point I did my best not to think about it further).
Before starting the experiment I discussed my setup with GPT-4o (nowadays I would certainly use a reasoning model) and we concluded that aiming for ~60 nights of measurements would be a suitable amount for the effect sizes I was hoping to find.
As I failed to run the experiment on many evenings (e.g. due to traveling, because either my girlfriend or I wasn’t home, plus skipping some days, sometimes weeks, where I felt like my situation or mental state wasn’t suitable to yield representative results), it took almost 5 months (January 29 till June 19) to collect all the data. This means more than half of the days were skipped. However, given that the decision of when to skip was not causally downstream of the group assignment, but independent of it, it should be fine to analyze the data on a per-protocol basis (filtering for only the 60 days where I actually did the experiment) rather than intention-to-treat (where I would take all days into account, including those where I didn’t actually do the experiment).
This is what I measured each evening that I did run the experiment:
The next morning/day, I then further tracked:
Once I had collected 60 such data points, they turned out to entail 20 cases of the control group, 25 cases of 1 drop (0.15mg), and 15 cases of 2 drops (0.3mg) – the round numbers are coincidence, all 3 groups had the same chance of occurring each day.
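As a rough illustration of why uneven group sizes are unsurprising, here is a small sketch that draws one of three equally likely groups independently for each of 60 nights. How the nightly draw was actually done is not described in this post, so the use of Python's `random` module, the seed, and the group labels are all assumptions.

```python
import random
from collections import Counter

# Labels are illustrative; the three arms match the doses named in the post.
GROUPS = ["0 drops (control)", "1 drop (0.15mg)", "2 drops (0.3mg)"]

def draw_group(rng):
    """Pick one arm with equal probability, independently of earlier nights."""
    return rng.choice(GROUPS)

rng = random.Random(0)  # arbitrary seed, only for reproducibility
assignments = [draw_group(rng) for _ in range(60)]
counts = Counter(assignments)
for group in GROUPS:
    print(group, counts[group])
```

Because each night is drawn independently, the group sizes are not forced to be 20/20/20; block randomization would be the alternative if balanced groups were required.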
I was originally planning to evaluate the data “by hand”, possibly within Google Sheets, but shortly before concluding the experiment, I learned of Hypothesize – a new sister project of Clearer Thinking, and a website that makes data analysis like this very easy, so I used that instead and can indeed recommend it[1]. All I had to do was slightly reformat my spreadsheet (making sure my table starts at the upper-left-most cell) and turn times (from hh:mm format) into numbers, then export that as CSV file, and drop it into Hypothesize, which pretty much walked me through the analysis process at that point. All the charts in the results section come directly from Hypothesize.
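The "turn times into numbers" step could look something like the sketch below. Whether the spreadsheet used decimal hours or minutes is not stated here, so decimal hours (and the function name) are an assumption.

```python
def hhmm_to_hours(text):
    """Convert a clock time like '07:03' into decimal hours (7.05)."""
    hours, minutes = text.strip().split(":")
    return int(hours) + int(minutes) / 60

for clock_time in ("07:03", "06:51", "23:30"):
    print(clock_time, "->", round(hhmm_to_hours(clock_time), 2))
```

One caveat with this simple conversion: times on either side of midnight (e.g. 23:30 vs. 00:15) end up far apart numerically, so bedtimes may need an offset before averaging.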
When planning out the experiment, I primarily expected (/hoped for) an improvement in wakefulness upon waking up the next morning, expecting around a 1-point improvement on a 10-point scale (which didn't materialize; and, in hindsight, also seems like a pretty large expected effect anyway).
I’m surprised now to notice I didn’t put much thought into how melatonin would affect how long it takes me to fall asleep, which ended up being the clearest effect of all. However, I did at least think an effect there is plausible, given I explicitly decided to measure this. Based on the measurements I decided to make, my “implied hypotheses” were that melatonin could have an impact on:
But I didn’t quantify these further.
A summary of the clearest findings:
Besides these findings, the data showed very little of interest (or significance).
Given that melatonin is very cheap and taking a single drop of it takes merely a few seconds, the results seem promising enough that I’ll keep taking one drop of melatonin (0.15mg) within an hour before going to bed for the foreseeable future. Saving 10 minutes of lying awake each night seems like a great deal on its own, whether or not there are any effects on sleep quality or wakefulness the next day.
Of course a relevant question is how well my results generalize to other people. Hard to say! I just liked the idea of putting my results out there in a structured manner. Claude 4 claims my average time to fall asleep of ~35 minutes (under control conditions) suggests I may have gone into the experiment with pre-existing sleep issues, which I never really considered. To summarize my general approach to sleep:
I also have no meaningful insights into possible tolerance effects, but would be surprised if taking such low dosages sporadically, 40 times over 5 months, ran into that issue. And if it’s the case, the data is probably not sufficient to get to any meaningful conclusions.
A quick look did, to my surprise, yield a correlation of 0.24 (p = 0.02) between fall-asleep time and day of the experiment (looking only at intervention days, n=40), which could be tolerance-related, but could also have any number of other possible causes, such as the warmer weather in the summer months. On control days (n=20) the correlation is 0.283 (p>0.2), so it appears I generally took longer to fall asleep in later months. I guess I can't rule out tolerance effects, but would expect other (such as seasonal) causes to explain this trend.
Subjectively reported sleep quality, on a 0-10 scale. The chart shows the sleep quality of the two test groups compared to the control group. The dashed line would be effect size 0. The three colors (dark blue, light blue, gray) represent confidence intervals of 80%, 95% and 99% respectively.
Group | Mean (0…10) | Change | p-value |
0mg (Control) | 7.15 | | |
0.15mg | 7.24 | +1.3% | 0.73 |
0.3mg | 7.25 | +1.4% | 0.72 |
Conclusion: No meaningful effect. Very slightly promising direction, but would require a much larger study (or more accurate measurements than my subjective self assessment) to test whether there’s something there, and is probably not worth the effort. In short, nothing to see here.
Subjective assessment made the next morning of how many minutes it took from going to bed to actually falling asleep. Naturally, there will be a lot of noise here and my guessed time can easily be off by several minutes. But as this noise affects all groups equally and I didn’t have any better measurement available, this seems useful enough.
Note: if the experiment were not blinded, I would be highly skeptical of this, but given that blinding worked well, I put a lot of credence in these findings.
Group | Mean (minutes) | Change | p-value |
0mg (Control) | 34.5 | | |
0.15mg | 25.24 | -26.8% | 0.0013 |
0.3mg | 24.29 | -29.6% | 0.0009 |
Conclusion: Pretty strong positive effect. Lying awake for ten minutes less each night seems like a win.
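For anyone who wants to check the "Change" column, here is a minimal sketch that recomputes it from the group means in the table above. The helper name is made up; the means are the ones reported.

```python
def relative_change(control_mean, group_mean):
    """Percentage change of a group mean relative to the control mean."""
    return (group_mean - control_mean) / control_mean * 100

control = 34.5  # minutes, control group mean from the table
for label, group_mean in [("0.15mg", 25.24), ("0.3mg", 24.29)]:
    change = relative_change(control, group_mean)
    saved = control - group_mean
    print(f"{label}: {change:+.1f}% ({saved:.2f} minutes less)")
# 0.15mg: -26.8% (9.26 minutes less)
# 0.3mg: -29.6% (10.21 minutes less)
```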
Subjective assessment of wakefulness in the morning of the following day, on a 0-10 scale.
Group | Mean (0…10) | Change | p-value |
0mg (Control) | 4.95 | | |
0.15mg | 5.74 | +16% | 0.006 |
0.3mg | 5.07 | +2.4% | 0.72 |
Conclusion: Questionable. The naive interpretation, that 0.15mg has a notable effect on my wakefulness while 0.3mg has none, is of course possible, but I certainly wouldn’t have predicted such an outcome ahead of time. It’s somewhat reassuring that both values point in a positive direction, but I wouldn’t rely much on these – particularly in light of the two findings that follow. Plus, given the many things I’ve tested here (without correcting for multiple hypothesis testing), the risk of running into false positives is high, so this may well be one.
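To make the multiple-testing worry concrete, here is a rough sketch of a Bonferroni correction applied to the p-values reported in this post's tables. This is not an analysis I ran as part of the experiment: Bonferroni is just one (conservative) choice of correction, and treating exactly these 12 comparisons as the full family of tests is an assumption.

```python
# (outcome, dose, p-value) as reported in the tables of this post.
# "wakefulness 1/2/3" are the three wakefulness tables in order of appearance.
REPORTED = [
    ("sleep quality", "0.15mg", 0.73),
    ("sleep quality", "0.3mg", 0.72),
    ("time to fall asleep", "0.15mg", 0.0013),
    ("time to fall asleep", "0.3mg", 0.0009),
    ("wakefulness 1", "0.15mg", 0.006),
    ("wakefulness 1", "0.3mg", 0.72),
    ("wakefulness 2", "0.15mg", 0.40),
    ("wakefulness 2", "0.3mg", 0.73),
    ("wakefulness 3", "0.15mg", 0.72),
    ("wakefulness 3", "0.3mg", 0.68),
    ("wake-up time", "0.15mg", 0.55),
    ("wake-up time", "0.3mg", 0.62),
]

def bonferroni_survivors(results, alpha=0.05):
    """Keep only results whose p-value is below alpha / number of tests."""
    threshold = alpha / len(results)
    return [(outcome, dose, p) for outcome, dose, p in results if p <= threshold]

survivors = bonferroni_survivors(REPORTED)
print(f"threshold: {0.05 / len(REPORTED):.4f}")
for outcome, dose, p in survivors:
    print(f"still significant: {outcome}, {dose} (p = {p})")
```

Under these assumptions the threshold is about 0.0042, so both time-to-fall-asleep results survive while the p = 0.006 wakefulness result does not, which fits the cautious reading above.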
Subjective assessment of wakefulness at noon of the following day, on a 0-10 scale.
Group | Mean (0…10) | Change | p-value |
0mg (Control) | 7.05 | | |
0.15mg | 7.22 | +2.4% | 0.40 |
0.3mg | 7.13 | +1.1% | 0.73 |
Conclusion: Tiny effect sizes and very far from statistical significance, so nothing to see here.
Subjective assessment of wakefulness later on the following day, on a 0-10 scale.
Group | Mean (0…10) | Change | p-value |
0mg (Control) | 7.05 (it’s coincidence that this is the exact same value as for the noon measurement) | | |
0.15mg | 6.96 | -1.3% | 0.72 |
0.3mg | 6.93 | -1.7% | 0.68 |
Conclusion: Again, probably nothing to see here.
This is not necessarily a metric I care that much about or where I would even have a strong opinion on which direction is preferable. I just wanted to have a look at effects on this, as my past experience seemed to suggest that, when taking melatonin, I occasionally would wake up at e.g. 4AM, which doesn’t really happen otherwise. Also note that the values here were pretty strongly capped by me typically getting up via a 7:30AM alarm.
In this chart, the y-axis shows the difference of the two test groups' wake-up time to the control group in hours:
Group | Mean (hh:mm) [AM] | Change | p-value |
0mg (Control) | 07:03 | | |
0.15mg | 06:51 | -12m | 0.55 |
0.3mg | 06:52 | -11m | 0.62 |
Conclusion: Direction as expected, but effect size too small to find anything conclusive. Definitely not as clear or strong an effect as I would have assumed based on past experiences.
Lastly, a speculative section about timing of the intake. This is probably mostly useless but has some nice looking charts.
While running the analysis, I thought it might be interesting to figure out what is likely to be the best time to take melatonin. I wasn’t at all systematic about this during the experiment: the time between taking melatonin and going to bed ranged from 19 minutes to 174 minutes. This is the distribution (x-axis shows time difference in hours, green dots are the data points):
Note that this time delay was not randomized, so any conclusions derived from this data are much more uncertain and likely subject to confounding factors.
To test if the time makes a difference, I discarded the control group, looking only at the data where I took melatonin (not distinguishing in this case whether 0.15mg or 0.3mg), and split this resulting data (consisting of n = 40 data points) into three eye-balled groups: <36 minutes (n = 11, group 0), 36 - 60 minutes (n = 12, group 1), and >60 minutes (n = 17, group 2).
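The grouping step above could be sketched as follows. The cut-offs are the ones named in the text; which group a value of exactly 36 or 60 minutes falls into is my assumption here, and the example nights are made up rather than taken from the data.

```python
from statistics import mean

def timing_group(minutes_before_bed):
    """Map the gap between intake and bedtime to group 0, 1 or 2."""
    if minutes_before_bed < 36:
        return 0  # <36 minutes
    if minutes_before_bed <= 60:
        return 1  # 36 - 60 minutes (both boundaries assumed inclusive)
    return 2      # >60 minutes

# Hypothetical (gap in minutes, minutes to fall asleep) pairs, NOT real data.
nights = [(19, 20), (30, 25), (45, 22), (60, 28), (90, 30), (174, 35)]

by_group = {}
for gap, latency in nights:
    by_group.setdefault(timing_group(gap), []).append(latency)

group_means = {group: mean(values) for group, values in sorted(by_group.items())}
print(group_means)
```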
For these three groups, I then compared the time to fall asleep (as this one had the largest effect sizes and hence the best chance of yielding results):
So, it appears, directionally, that going to bed relatively soon after taking melatonin worked better for me, although the differences don’t quite reach statistical significance.
Here’s each of the three groups compared to the other two groups:
Group | Mean (time to fall asleep) | p-value (group compared to other two groups) |
<36 minutes | 22.4 minutes | 0.12 |
36-60 minutes | 24.2 minutes | 0.35 |
>60 minutes | 29.8 minutes | 0.02 |
Additionally, here’s a scatter plot showing the association between the time diff between taking melatonin and going to bed (x-axis) and the time it took me to fall asleep (y-axis):
Conclusion: limited and very noisy (and also in this case only observational!) data, might very weakly suggest that taking melatonin relatively shortly before going to bed (0-60 minutes) may be beneficial – but getting more conclusive evidence on this would require an experiment design with actual randomization of the timing.
[1] Full disclosure, I'm affiliated with some people involved in Hypothesize and may thus have a more positive inclination towards the tool. But it's a fact that it made my job analysing this data much easier and this post would likely be much less insightful/accurate without it. :)