This is an awesome self-experiment. Sadly the 100ug dose used might be too low by at least an order of magnitude. How was it chosen?
I'm very impressed by the self-experimentation and rigor, and would be excited to see and fund more of this work on Manifund.
Thanks for this background, it's super helpful!
For dosing, I think we based the dose on the Deadwyler et al. study in monkeys as well as user experiences.
It's unfortunate that even a 10-20x dose in humans seems to have only small effects. I would have expected them to use a C-terminal -NH2 modification on their orexin-A to prevent degradation, but it doesn't look like they did (though maybe they did and I missed it). If they didn't, that might somewhat reduce the effective dose gap between our trial and theirs.
I'm excited to see our results at a higher dose, though part of me is frustrated by how difficult peptides are to work with. But hopefully Takeda or someone else will perfect small-molecule orexin agonists!
It's great that you took on running this experiment. I'm a bit curious whether part of your interest in orexin was downstream of my post Orexin and the quest for more waking hours.
When I asked ChatGPT "What happens to orexin if you store it in a saline solution?", the response was "If you store orexin in saline, expect it to become unreliable fairly quickly." Using such a low dose, and additionally a poor way to store it, probably resulted in the null effect.
That's a great post! It did more to popularize the idea than I ever could. I've been thinking about this for a while; my first writing on the topic was in 2021. I'm going to refrain from linking to it because I'm planning on deprecating that blog soon, though.
Re storage and handling: this part was tricky. We opted to dissolve in sterile water (not saline) and froze the batch after mixing, so doses were only exposed to room temperature for ~minutes. We also used a C-terminal NH2 peptide that is less susceptible to degradation.
There's still no guarantee that the peptide is stable under these conditions, and we don't have a good way to check. This is a big reason why we thought there was a ~60% chance of a null result on this trial (and perhaps the next one too). But hope springs eternal!
I think this is one of the cases where a quick discussion with an LLM can be helpful to check trial protocols. ChatGPT did find https://cdn.caymanchem.com/cdn/insert/15073.pdf which suggests dissolving the Orexin first in DMSO.
This is enormous overanalysis of an underpowered study design. N=2 to evaluate what you hypothesise to be a small effect is pointless. Did you perform a power analysis before you started?
I don't know why others are downvoting this. Almost the first thing I did on opening this article was Cmd+F search for "power." When I hear about a null result for something I care about, whether the study had enough power to detect a positive result, if there is one, is one of the first things I want to know; if the answer is no, then there's little to be gained by reading it.
I don't like the jump from "N=2" to "underpowered" (I read a good part of a book on single-case study design), but that's more analysis than I found skimming and searching through TFA.
Thank you for offering a more constructive comment.
We did a power analysis to set the total number of trials (iirc we assumed d=0.5, alpha=0.05, and 80% power, so ~30 total test weeks, 10 weeks/person). However, the design proved unsustainable for us, and the Fitbit dropped one person's data.
Though in some sense it worked out: we can pursue a better trial now.
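For readers curious what that power analysis looks like, here is a back-of-the-envelope reconstruction using the standard normal approximation for a paired design; the parameter values (d=0.5, two-sided alpha=0.05, 80% power) come from the comment above, but the calculation itself is my sketch, not the authors' original code:

```python
from statistics import NormalDist

# Normal-approximation sample size for a paired comparison:
#   n ≈ ((z_{1-α/2} + z_{power}) / d)^2  pairs
z = NormalDist().inv_cdf
d, alpha, power = 0.5, 0.05, 0.80   # assumed effect size, two-sided alpha, target power

n_pairs = ((z(1 - alpha / 2) + z(power)) / d) ** 2
print(round(n_pairs))  # ~31 orexin/placebo pairs, i.e. roughly 30 test weeks
```

This matches the "~30 total test weeks" figure; an exact paired t-test calculation would add a couple of pairs on top of the normal approximation.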
I personally preferred to get up early while No Magic Pill and niplav preferred to stay up late and get up at the usual time.
Nit: this article's first author is niplav and I have no idea who is "I" here.
Also your footnote got fucked by the wysiwyg -> markdown conversion
Very interesting. My crude guess is that not enough gets into the brain before the peptide degrades. Orexin nasal spray trials for narcolepsy have been kind of disappointing so far, which is why companies like Takeda are developing orexin agonists.
Keep up the experimentation. I wrote about something related, by the way: S-modafinil, the shorter-acting enantiomer of modafinil (modafinil, as you know, boosts orexin, or orexin signaling, something like that, and also boosts dopamine).
Yeah, we used C-terminal NH2-modified orexin to prevent degradation, but it's possible it simply wasn't effective.
Interesting that orexin sprays haven't been working, I'll have to look into this. Do you know the names of any off the top of your head?
Love that post! It made me realize that sleep-need-reduction therapies have to be pretty specific in which receptors they hit. A stimulant like modafinil that hits orexin + other stuff doesn't reduce long-term sleep need in healthy individuals, right? So a sleep-need therapy has to stimulate for just the right window while still enabling efficient sleep at night.
My experience from playing a lot of online chess is that tiredness, exhaustion, illness, etc. don't necessarily immediately crash my performance. Often I feel bad but still play well. Performance then crashes over the following days.
Interesting! Once you get a good night's sleep or a break, does your performance go back to normal? Or does it take a few days?
Over the last few months we[1] have been doing a sleep experiment inspired by our suspicion that orexin is an exciting target for sleep need reduction.
We mildly deprived ourselves of sleep (5-5.5 hours, relative to 7-7.5 hours normally) and took either a placebo or orexin intranasally. We tracked our sleep the night before and after taking a dose in the morning and completed various tests of mental acuity during the day.
The results from our initial experiment are exclusively null results that don’t cross standard thresholds for statistical significance. This wasn’t particularly surprising; we expected a ~60% chance of it happening. We’re considering next steps, and need your feedback!
For now, there are a few things to cover in the results.
Trial Design
We performed a self-blinded randomized controlled trial with blocking: each participant took either the placebo (2.5 mL of sterile water) or the orexin (100 μg of orexin-A dissolved in 2.5 mL of sterile water). Here’s the procedure, repeated for every block:
Each person had a substantial amount of leeway in how they structured their day. On sleep deprived days, I personally preferred to get up early while No Magic Pill and niplav preferred to stay up late and get up at the usual time. We each took doses at a consistent time, but the time differed between people.
We didn’t standardize things because we thought it was more important to have ecological validity, i.e. to use orexin the way we would actually use it in everyday life. This is a higher-variance but lower-bias approach.
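To make the "self-blinded with blocking" setup concrete, here is a minimal sketch of how placebo/orexin doses can be assigned within blocks; the block size, labels, and helper function are illustrative assumptions, not the authors' exact protocol:

```python
import random

def blocked_assignment(n_blocks, doses_per_block=2, seed=None):
    """For each block, shuffle an equal mix of placebo and orexin doses.

    In a self-blinded trial the numbered vials would be prepared so that the
    participant can't tell which is which until the schedule is unblinded
    at analysis time.
    """
    rng = random.Random(seed)
    schedule = []
    for _ in range(n_blocks):
        block = ["placebo", "orexin"] * (doses_per_block // 2)
        rng.shuffle(block)  # random order within the block
        schedule.append(block)
    return schedule

# Example: 5 blocks of 2 doses each, one placebo and one orexin per block
print(blocked_assignment(5, seed=42))
```

Blocking this way guarantees a balanced number of orexin and placebo days per participant, which is what makes the within-pair matched analysis below possible.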
The Results
In our initial proposal, we pointed out that the main thing we wanted to see was orexin causing less rebound sleep the following night. A simple stimulant effect isn’t enough for us; we wanted to use orexin to sleep less and get away with it.
So here’s the average sleep time for the night after taking orexin vs the night after taking placebo:
Unfortunately, the difference wasn’t significant and the effect size is small. This could be for a couple reasons that we want to address in the next trial.
Did orexin have any sort of stimulant effect during the day? Nope, none of the mental acuity tests are significantly different.
In setting up this trial we had a sneaky second hypothesis: Does sleep deprivation actually make you dumber?
One caveat before we look at the data: typically our “baseline” days came after our sleep-deprivation days, which means baseline days enjoy more cumulative practice than sleep-deprivation days. That should bias the results by making baseline days look better. On the other hand, if sleep deprivation has long-term cumulative effects, then perhaps baseline days are at a disadvantage. But that doesn’t match our experience of feeling significantly better on baseline days.
So, does sleep deprivation make you dumber? Not really!
Depending on how you correct for multiple comparisons, the psychomotor vigilance task (PVT) differences might be significant, and I’d expect them to become significant with more data points. From what I (Sam) remember of doing the PVT on sleep-deprivation days, I felt just as fast, but I would slip up more from inattention or distractions. This is consistent with the large gap in the slowest 10% of reaction times.
But overall, this is a nice example of how our intuitions around sleep can lead us astray. It sure feels like sleep deprivation should make you dumber. But we don’t see that here. It’s important to actually check what changes our productivity because our intuitions around this are pretty fuzzy.
The Next Trial
There are a few reasons why we might be getting a null result: we might have too few data points, the dose might be too low, or, more concerningly, we might be storing the orexin improperly.
So the first next step is a slightly bigger trial where we see if a higher dose of orexin changes our results. From anecdotes online, some have felt effects while others haven’t. But even if orexin doesn’t have obvious effects, it might still reduce sleep need. We need to try higher doses and collect more data to find out.
That said, sleep deprivation is uncomfortable for Sam and No Magic Pill and extremely uncomfortable for niplav. We’ve decided to try a different design: sleep ad libitum on all nights of the week, but observe whether orexin reduces the amount we sleep the night after. This should make it sustainable to collect a lot more data.
Appendix A: Details about the Data Analysis
We collected two separate datasets:
We aggregated mental acuity tests per-test to avoid pseudoreplication (so two data points per day), and aggregated Fitbit data per-day. We analyzed the data via matched controls (days within a participant-block were matched so as to analyze within-pair differences) and ran two separate analyses: one frequentist and one Bayesian. The code for the analysis, written in Julia by Claude Opus 4.6, is available here. Our mental acuity test data is available here; the aggregated full data is available here.
Frequentist Analysis and Additional Results
In our frequentist analysis we ran the paired t-test on paired data with cardinal measurements and the Wilcoxon signed-rank test on paired data with ordinal measurements; we also report an effect size (Cohen’s d or a rank-biserial correlation r) for each measurement. We Bonferroni-corrected the p-values, not that that was necessary…
| Variable | Effect Size | p-value | p-corrected | Orexin | Placebo | Difference |
|----------|-------------|---------|-------------|--------|---------|------------|
| PVT Mean RT (ms) | 0.100 (Cohen's d) | 0.624 | 1.000 | 256.0 ± 28.0 (n=50) | 253.3 ± 26.2 (n=46) | +2.7 |
| PVT Median RT (ms) | 0.149 (d) | 0.469 | 1.000 | 243.6 ± 18.3 (n=50) | 240.8 ± 18.9 (n=46) | +2.8 |
| PVT Slowest 10% (ms) | -0.024 (d) | 0.908 | 1.000 | 296.7 ± 59.9 (n=50) | 298.2 ± 68.3 (n=46) | -1.5 |
| DSST Correct | 0.211 (d) | 0.303 | 1.000 | 69.7 ± 10.6 (n=51) | 67.4 ± 11.3 (n=46) | +2.3 |
| DigitSpan Forward | 0.175 (rank-biserial correlation r) | 0.148 | 1.000 | 7.86 ± 1.00 (n=42) | 8.10 ± 1.13 (n=40) | -0.24 |
| DigitSpan Backward | 0.061 (r) | 0.627 | 1.000 | 7.31 ± 0.95 (n=42) | 7.38 ± 1.25 (n=40) | -0.07 |
| DigitSpan Total | 0.127 (r) | 0.318 | 1.000 | 15.2 ± 1.7 (n=42) | 15.5 ± 2.0 (n=40) | -0.3 |
| SSS Rating | -0.178 (r) | 0.112 | 1.000 | 3.29 ± 1.02 (n=52) | 2.98 ± 0.86 (n=46) | +0.31 |
| Sleep Duration (hrs) | 0.212 (d) | 0.542 | 1.000 | 8.60 ± 1.91 (n=17) | 8.27 ± 1.05 (n=17) | +0.33 |
| Sleep Efficiency (%) | -0.257 (d) | 0.460 | 1.000 | 89.4 ± 5.3 (n=17) | 90.5 ± 3.7 (n=17) | -1.2 |
| Sleep Deep (min) | -0.011 (d) | 0.974 | 1.000 | 74.5 ± 22.9 (n=17) | 74.7 ± 19.3 (n=17) | -0.2 |
| Sleep Light (min) | 0.232 (d) | 0.505 | 1.000 | 283 ± 69 (n=17) | 270 ± 40 (n=17) | +13 |
| Sleep REM (min) | -0.150 (d) | 0.665 | 1.000 | 101.2 ± 27.3 (n=17) | 104.9 ± 21.8 (n=17) | -3.7 |
| Sleep Wake (min) | 0.341 (d) | 0.331 | 1.000 | 56.9 ± 38.0 (n=17) | 46.6 ± 19.6 (n=17) | +10.3 |
| HRV Daily RMSSD (ms) | 0.079 (d) | 0.814 | 1.000 | 32.8 ± 13.0 (n=18) | 31.7 ± 15.1 (n=18) | +1.1 |
| HRV Deep RMSSD (ms) | 0.369 (d) | 0.276 | 1.000 | 31.8 ± 12.2 (n=18) | 27.2 ± 13.0 (n=18) | +4.6 |
| SpO2 Avg (%) | -0.286 (d) | 0.397 | 1.000 | 95.7 ± 1.0 (n=18) | 96.0 ± 1.0 (n=18) | -0.3 |
| SpO2 Min (%) | -0.059 (d) | 0.861 | 1.000 | 93.6 ± 1.4 (n=18) | 93.7 ± 1.6 (n=18) | -0.1 |
| Breathing Rate (breaths/min) | 0.314 (d) | 0.382 | 1.000 | 16.4 ± 2.0 (n=17) | 15.8 ± 1.9 (n=15) | +0.6 |
| Skin Temp Δ (°C) | -0.041 (d) | 0.905 | 1.000 | 0.01 ± 0.65 (n=17) | 0.04 ± 0.49 (n=17) | -0.02 |
| Steps | 0.032 (d) | 0.909 | 1.000 | 6478 ± 6403 (n=27) | 6282 ± 5996 (n=26) | +196 |
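The two ingredients of this analysis, a paired effect size and the Bonferroni correction, can be sketched in a few lines. This is a stdlib-only illustration with made-up numbers (the actual analysis code is in Julia), and the paired Cohen's d here is one common variant (mean of the within-pair differences divided by their SD):

```python
from statistics import mean, stdev

def paired_cohens_d(orexin, placebo):
    """Cohen's d for paired data: mean difference / SD of differences."""
    diffs = [o - p for o, p in zip(orexin, placebo)]
    return mean(diffs) / stdev(diffs)

def bonferroni(pvals):
    """Bonferroni correction: multiply each p by the number of tests, cap at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]

# Illustrative p-values: with ~20 outcomes, even the smallest uncorrected
# p in the table (0.112) would be multiplied past 1.0 and capped there.
print(bonferroni([0.112, 0.148, 0.624]))
```

This also shows why every corrected p in the table reads 1.000: with roughly twenty outcomes, any uncorrected p above ~0.05 is pushed past the cap.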
Bayesian Analysis and Additional Results
We fit a hierarchical Bayesian linear model with participant random intercepts, using NUTS (4 chains × 2000 samples per metric). The primary estimand is δ, a standardized treatment effect (Cohen's d-like), with a weakly informative N(0,1) prior.
Formally, the likelihood is yᵢ ~ N(μ + δσ·treatmentᵢ + α[pᵢ], σ), where treatmentᵢ ∈ {0,1} encodes placebo/orexin. The raw treatment effect on the outcome scale is δσ; δ alone is dimensionless. Priors: μ ~ N(0,10) (vague grand mean), σ ~ half-N(0,10) (residual SD), τ ~ half-N(0,5) (between-participant SD), α[j] ~ N(0,τ) iid for each participant j.
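To make that likelihood concrete, here is a small forward simulation of the generative model (drawing data from fixed parameters; the actual inference, via NUTS, lives in the Julia code, and the parameter values below are arbitrary illustrations):

```python
import random

def simulate(n_per_participant=10, participants=3,
             mu=250.0, sigma=25.0, tau=10.0, delta=0.1, seed=0):
    """Simulate y_i ~ N(mu + delta*sigma*treatment_i + alpha[p_i], sigma),
    with participant random intercepts alpha[j] ~ N(0, tau).

    delta is the dimensionless (Cohen's d-like) treatment effect;
    delta*sigma is the raw effect on the outcome scale.
    """
    rng = random.Random(seed)
    alpha = [rng.gauss(0.0, tau) for _ in range(participants)]
    data = []
    for p in range(participants):
        for i in range(n_per_participant):
            t = i % 2  # alternate placebo (0) and orexin (1) observations
            y = rng.gauss(mu + delta * sigma * t + alpha[p], sigma)
            data.append((p, t, y))
    return data

rows = simulate()  # list of (participant, treatment, outcome) tuples
```

Simulating forward like this is also a cheap sanity check of the model: fitting the sampler to simulated data with a known delta should recover it.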
Priors and posteriors for cognitive acuity tests and sleep measurements:
Priors and posteriors for additional Fitbit data:
Learning Effects on Mental Acuity Tests
Circles for the first test of the day, diamonds for the second test of the day.
Appendix B: Threats to Validity
Our method seems simple on its face, but there were a lot of annoyances along the way.
Appendix C: Personal Experiences
Niplav:
Sam:
No Magic Pill:
Sam Harsimony, niplav, No Magic Pill.