I ran through my checklist. Looks low-risk to me. Basically a very deadly disease that nevertheless shows nil or inefficient human-to-human transmission and doesn't show signs of regional or global growth. It depends heavily on specific South American rat species for transmission to humans. The original exposure was on or around April 1. There are no new concerning mutations (see below), and so far the flight attendant who had been feared to have picked it up on a flight has tested negative, though the incubation period can run up to 6 weeks and she was exposed on April 25th, about 2 weeks ago. Three other passengers who had contact with an infected person on that flight have also tested negative so far.
The consensus public health position is that "person-to-person transmission is possible through close personal contact, such as between couples." If it turns out that this is wrong and that it's much more human-transmissible than previously believed, then the complete failure to quarantine passengers after the discovery of hantavirus on the Hondius cruise ship and first passenger death would completely alter my projection. But so far, there do not seem to be signs of that.
On an evolutionary biology level, if Andes virus were to become efficiently human-transmissible, the last place I'd expect to see it enter the human population is via tourists picking it up while bird watching in a garbage dump. That's a setting where the human-rodent interface is minimal. In that setting, there's little selection pressure on the virus to jump to humans and transmit efficiently. In a hypothetical Andes virus pandemic, I'd strongly have expected patient zero to be a Chilean or Argentinian villager living in conditions that sadly expose them to routine contact with rats. I'd expect the outbreak to spread first in a local village, not on a luxury cruise ship.
Instead, it looks to me like a couple of tourists made the risky and unusual decision to put themselves in a uniquely rodent-infested location without protection, where they likely had the misfortune to inhale a large quantity of virus-infested aerosolized rat droppings. Then they got into the close quarters of a cruise ship, which completely failed to execute the measures it should have taken to address their illness, including a failure to quarantine. This allowed limited spread to other close-quarters passengers, exactly as we have observed this virus to be capable of in the past. This, combined with the lack of new mutations, suggests we're dealing with exposure to a known-quantity, transmission-inefficient virus in a demographic that doesn't usually get exposed.
The concern stems from the fact that those passengers had travelled internationally by the time quarantining and contact tracing were initiated, combined with the deadliness of the virus. But it takes a combination of spread in international populations and transmissibility and deadliness and lack of treatment to make a pandemic, and currently we only appear to have deadliness and lack of treatment. The only reason it even gets 4 points is probably that it's so rare and such a disease of poverty that big pharma's never invested in a vaccine or treatment. The main scary thing, as with Ebola, is the very high CFR. But that is not grounds for a global pandemic. It's just a tragedy for a few hundred people a year.
Version: 0.2.4
Current score: 4/14
Last updated: 8 May 2026
The Andes-strain hantavirus outbreak aboard the MV Hondius cruise ship is tragic. But I estimate a <1% chance this becomes a pandemic. It is on track to self-extinguish within weeks.
Hard disagree. 8% is wildly overinflated and it's at 4% now. If you don't find yourself substantially revising your plans based on a 50% drop in pandemic probability, and if you didn't get seriously concerned about an 8% chance of a 40% CFR pandemic, then even you aren't taking these numbers seriously. The main utility of this prediction market is driving internet chatter on how useful prediction markets are.
I think the prediction market can be useful even if you need to apply corrections to the headline number. It makes a huge difference to me whether Manifold says 8% or 40%, even though I won't necessarily fully trust either number until it's had a while longer to shake out (and even then it depends on the market dynamics, liquidity, etc.)
I think that's an unfair deployment of xkcd mockery. 8% of a pandemic this year is not a tiny chance, which means a 50% drop is actually a big deal. The issue was interpreting the prediction market as an accurate percentage when it should just be an indication of approximate risk.
I mean, no, of course I am not changing my plans based on a 50% drop in pandemic probability. There are tons of pandemic probabilities that change from 0.1% to 0.05%, every week or so, and I am not changing my actions based on that.
In this case, knowing the probability seems to be somewhere in the 1%-10% range is already extremely helpful! I don't really need to know much more. And I have proxies I can use to evaluate the robustness of the market (like volume), so I am not miscalibrated about the noise.
Hard disagree. 8% is wildly overinflated... The main utility of this prediction market is driving internet chatter on how useful prediction markets are.
Polymarket has it at 10%. I don't know where you're located, but if you're able to buy some NO, and the WHO doesn't characterize Hantavirus as a pandemic this year, then you'll make ~11% returns.
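For what it's worth, the implied return is easy to check. This is a sketch assuming a NO share costs one minus the implied YES probability and pays out $1 on a NO resolution, ignoring fees and capital lock-up:

```python
# Sanity-checking the ~11% figure for buying NO on a 10% market.
implied_yes = 0.10
no_price = 1.0 - implied_yes                # $0.90 per NO share
gross_return = (1.0 - no_price) / no_price  # profit per dollar staked
print(f"{gross_return:.1%}")                # 11.1%
```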
In 2015:
B: "Do you think there's gonna be a Hantavirus pandemic?"
A: "uh well it seems unlikely"
B: "I disagree"
A: "totally valid"
I am not sure what you mean, do you currently think the probability is outside the 1%-10% range? Imperfect accuracy is fine, especially when it's easy to adjust for.
It sounds to me like you're expecting too much from information in general. Consider the information that physical matter is made of atoms, which are about a tenth of a nanometer across, with positively charged protons and neutral neutrons in the center and much smaller electrons whizzing around them. This has been revolutionary for society, enabling so much of engineering, chemistry, our understanding of cosmology, and so forth.
A typical teenager learning this information for the first time might ask "what utility will this give me personally?" and find it has little direct application. Yet I would not advise them that this information is not worth knowing.
That's the first point, that information doesn't need to be directly connected to an outcome to be worth knowing.
The second point is that fine gradations in information are valuable. I can imagine someone similarly saying "Why should it be valuable to anyone to know the difference between Apple stock being at $285 versus $275? Surely we should just care whether it's doing well or not? Why don't we just replace the number with the words 'Great', 'Good', 'Bad', 'Worse'?" Yet often small signs tell us something. A few percentage points of dip can imply that a new product release went poorly. A change in CEO followed by the stock price rising a few points can indicate very good things about the new CEO.
In this case, I find information like "8% chance of a pandemic" valuable in lots of ways.
Added: The third point is that public legibility is a massive value-add, and could well be most of the value. Because it's public, I'm more confident that, if the number were to rise, people would notice and warn me. Much of the news landscape is just people arguing over whether something is an emergency (which our memetic incentives perversely push toward "yes" all of the time), so how alarmed people are acting just isn't something most of us can track sensitively. The change from 5% to 50% is serious for me, and yet I don't know how to tell that difference from the tone of people on Twitter or in many media outlets, especially when they are not themselves precise.
To go into more detail on this, I had an LLM write the following so that the math checked out: you can imagine a world where every year carries a 5% pandemic risk, and another world where every 15 years carries a 75% pandemic risk. Over 60 years, both imply 3 expected pandemics, but they suggest very different prevention strategies: steady-state risk reduction in the first case, versus identifying and defusing rare high-risk transition periods in the second.
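That equivalence in expectation is easy to verify. A quick sketch using the numbers from the example above, treating each period's risk as independent:

```python
# Two risk regimes with the same expected number of pandemics over 60 years.
years = 60
steady = 0.05 * years        # 5% risk every year -> 3.0 expected
lumpy = 0.75 * (years / 15)  # 75% risk each 15-year window -> 3.0 expected
print(steady, lumpy)         # 3.0 3.0

# Yet the regimes differ in other respects, e.g. the chance of at
# least one pandemic over the 60 years:
p_steady = 1 - 0.95 ** 60    # ~0.954
p_lumpy = 1 - 0.25 ** 4      # ~0.996
print(round(p_steady, 3), round(p_lumpy, 3))
```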
Like, hooray, we have a number. What now?
I mean, so much. First of all, the number is enormously more helpful than reading a bunch of articles that give me lots of detail and then force me to build my own model of the situation to arrive at a probability.
And then the number itself of course drives all kinds of actions! Another COVID-level pandemic would be a huge deal that would change my actions drastically in hundreds of ways (this market is not about it being another COVID-level pandemic, but is a lower threshold).
Why don't you think that's useful? It seems like at least some smart people thought about it carefully, which is a great start. Are you referencing some other thread? If so, please share?
I'm referring to this post from last month (and I assume that post is why habyrka is tagging mabramov).
Sure, maybe there's some usefulness in that it got smart people thinking about the question. And it gave us a figure of 8%.
But I don't really understand what a person is to do with that number. What utility follows from that? And why is it worth, as @mabramov emphasises, nine-figure EA funding?
Thanks for clarifying.
The use is that I don't have to figure it out for myself! This saved me at least an hour, probably more.
My reaction to an 8% probability of a pandemic is currently "doing nothing," so in some sense I agree with you. If it was >50%, I would probably read up more on hantavirus and think about preparations I could make now (maybe at that point, I would have picked up on ambient freaking-out even without the prediction market, but I do like having a precise number). Maybe I "should" in fact be preparing for hantavirus even at an 8% probability, but I only have so much willpower and time in the day.
Notably, I can't immediately think of any important decisions in my life that I made differently due to prediction markets. But someone whose job depends on making quick decisions based on global events, e.g. some government role in biosecurity, would probably find calibrated estimates on this kind of thing quite useful.
Cruise ships, unfortunately, are prone to exacerbating any sort of disease outbreak: close quarters, shared dining, and a confined population all raise contact rates. For instance, back in 2020 aboard the Diamond Princess, Covid spread with an R0 roughly 5x higher than usual.
If anything, it's surprising that more people didn't get infected by this, which shows just how poor hantavirus is at human-to-human spread.
This is interesting because it's not discussed enough that AlphaGo worked the same way LLMs do: it was pretrained on a large dataset of human moves, then posttrained through reinforcement learning. The shift to AlphaZero removed the pretraining entirely, showing that pretraining data wasn't needed for the model's capabilities (if anything, the opposite: AlphaZero is superior to AlphaGo). While this coding model is still pretrained, it likewise suggests that good pretraining data doesn't matter nearly as much as one might think.
it also shows that good pretraining data doesn't matter nearly as much as one could think
I don’t think it shows that. It arguably suggests that abundant pretraining data doesn’t matter as much as one could think. As opposed to good pretraining data. I presume that the codebases + agentic coding transcripts that they SFT’d on were high quality, right? [ETA: WHOOPS SEE MATRICE REPLY]
As for data efficiency, after the pre-1930 pretraining, IIUC it takes 250 training examples ≈ 13 million tokens before “the model solves its first [SWE-bench] issue”, and 75000 training examples ≈ 4 billion tokens gets to pass@1 of 4.5%.
Is that more or less than expected? I dunno, it depends on what you were expecting. For what it's worth, Gemini says 13 million tokens is about what a human could read in 650 hours non-stop (40 hours/week for 16 weeks).
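A quick back-of-envelope from the figures above (the tokens/hour rate is inferred from the 13-million-tokens-in-650-hours claim, not a source figure):

```python
# Tokens per training example, from the two data points quoted above.
print(13e6 / 250)      # 52,000 tokens per example
print(4e9 / 75_000)    # ~53,333 tokens per example (consistent)

# Implied human reading rate behind the "650 hours" estimate:
print(13e6 / 650)      # 20,000 tokens/hour, i.e. ~5.5 tokens/second
```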
I don’t think it shows that. It arguably suggests that abundant pretraining data doesn’t matter as much as one could think. As opposed to good pretraining data.
I think it straightforwardly shows the reverse (good pretraining data doesn't matter as much as one could think, but abundant pretraining data does)? Olmedo himself notes: "What holds the 1930 model back is that it is severely undertrained (only 260B tokens), rather than its pre-training data."
I presume that the codebases + agentic coding transcripts that they SFT’d on were high quality, right?
SFT is posttraining??
Lol oops yeah SFT is posttraining, that explains why I found your comment confusing.
The quote emphasized “after just 250 training examples”, and I thought that was the context, i.e. that you were impressed by the “just 250” part and commenting on that. But I guess the “just 250” was irrelevant to your comment.
On top of that, I tend to mentally lump pretraining and SFT together because they’re algorithmically exactly the same thing, except maybe different hyperparameters. So that’s the other half of why I misread your comment.
Still, given that pretraining and SFT are algorithmically exactly the same thing, it would follow that you need no pretraining data whatsoever if you have enough of the right kind of SFT data. …In principle. Probably not in practice. But still, that’s relevant context here I think.
I think the main shock comes from none of the pretraining data having any code in it, and the model then quickly learning coding skills from only a few examples.
You say "with only a few examples", I say "after ~13 million tokens of some mix of code and agentic reasoning about code". Is it a "shock" because you were expecting it to take much more than 13 million tokens? Or is it a "shock" because you expected "number of examples" to be an important constraint independent of the length of each example? Or something else?
Hmm. I didn't realise there were that many tokens per example. I must have glossed your comment incorrectly. Sorry on my part.
I think the initial link is good to share, but I disagree with the analogy to AlphaGo/AlphaZero. The RL process for current models still involves humans heavily in creating the tasks, assuring task correctness, and deciding what kind of tasks are useful to train on. We don't have anything like self-play except maybe a small amount in math training (synthetic math data could be construed as self-play).