I talked to someone at Mechanize who thought safety would be net unaffected by timelines. I knew this guy to be operating in good faith, and his point is a meaningful caveat to the model that the safety/capabilities investment ratio is what matters. The argument was:
However, even if 1-3 are true, I claim 4 doesn't follow.
4 only holds if the time lag between deployment and safety fixes decreases in lockstep with the increased investment. This seems false: most ways capabilities would speed up involve more parallel labor/compute or faster algorithmic progress, not faster feedback loops from the real world of the kind that would transfer to safety.
By analogy, suppose the speed of cars had jumped from 20mph to 100mph in a single year in 1910. Even if the same technology sped up the invention of airbags, ABS, etc. to 2010 levels, more people would die in car crashes, because those features couldn't be installed until the next model year [1]. The alternative, proactively doing real-world testing and delaying the release of the super-fast car until such safety features could be invented and integrated, would be a huge competitive disadvantage. Likewise, if cyberattacks/power-seeking/etc. are solvable but require real-world data, and fixes only make it into the next model release, then immediately getting a 2035-era superintelligence plus 10 years of safety research will result in way more cyberattacks and power-seeking.
[1] Retooling Ford factories was actually super expensive; I read somewhere that switching models required shutting down factories for six months.
We can use the number of mistakes to get a very noisy estimate of Claude 4.5 Sonnet's coffee time horizon. By my count, Claude made three unrecoverable mistakes that required human assistance:
Now, this was a "try until success" task rather than a success/failure task. But if we try to apply the same standards as the METR benchmark, the task needs to be economically valuable (so it includes adding milk/sugar), and any mistake that would make automation non-viable should count as a failure. I think any robot butler that typically made one of these mistakes would be unemployable.
I'd guess an experienced human would take about 7 minutes to make coffee in an unfamiliar house if they get the milk and sugar ready while the kettle is boiling, so we get a rate of 1 failure every 2.3 human-minutes, which puts the 50% success point at around ln(2) * 2.3 ≈ 1.6 minutes. Of course, this is just one task, but we already know its coffee time horizon isn't something like 20 minutes-- the probability of three events from a Poisson process with rate ln(2) / 20 is only 0.2%. Claude says the 95% confidence interval is (33 seconds, 8 minutes).
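Here's a minimal sketch of that arithmetic (the 7-minute task time and three failures are my assumptions from above; the exact Poisson interval reproduces Claude's CI):

```python
from math import log
from scipy import stats

failures = 3        # unrecoverable mistakes needing human assistance
task_minutes = 7    # guessed human time to make coffee in an unfamiliar house

# MLE failure rate, and the horizon where success probability is 50%:
# P(no failures in t minutes) = exp(-rate * t) = 0.5  =>  t = ln(2) / rate
rate = failures / task_minutes        # ~0.43 failures per human-minute
print(log(2) / rate)                  # ~1.6 minutes

# If the true horizon were 20 minutes (rate = ln(2)/20), how likely are
# three or more failures during a 7-minute task?
lam = log(2) / 20 * task_minutes
print(1 - stats.poisson.cdf(2, lam))  # ~0.002, i.e. 0.2%

# Exact (Garwood) 95% CI for a Poisson rate given 3 observed events,
# converted to a horizon: roughly (33 seconds, 7.8 minutes).
rate_lo = stats.chi2.ppf(0.025, 2 * failures) / (2 * task_minutes)
rate_hi = stats.chi2.ppf(0.975, 2 * failures + 2) / (2 * task_minutes)
print(log(2) / rate_hi, log(2) / rate_lo)
```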
This is below trend for RLBench, though the data is extremely bad. If I speculate anyway, maybe real-world tasks like coffee are harder than RLBench or OSWorld-- making coffee certainly requires much more planning than 5-20 second simulated robotics tasks. Or maybe Claude just hasn't been trained for the real world.
METR could probably use a methodology like this if we had more long tasks and labeling were free, so maybe it's worth looking into approaches that automate parts of it, like having smarter agents unblock dumber agents.
Tagging @Megan Kinniment who has also thought about recoverable and unrecoverable failures
I'm spending about 1/4 of my time thinking about how to best get data on this and predict whether we're heading for a software intelligence explosion. For now, one thought is that the inference scaling curve is more likely to be a power law, because a power law is scale-free and consistent with a world where AIs are prone to get stuck on harder tasks, but get stuck less and less as their capability increases.
My current guess is still something like the independent-steps model which has a power law.
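To illustrate what scale-free buys you, here's a toy sketch (alpha and beta are arbitrary illustration constants, not fitted to anything):

```python
import math

# Failure probability as a function of inference compute c, two candidate shapes.
def power_law_failure(c, alpha=0.5):
    return c ** -alpha

def exponential_failure(c, beta=0.1):
    return math.exp(-beta * c)

# Scale-free: for the power law, doubling compute always multiplies failure
# probability by the same factor, no matter where you start. The exponential's
# doubling factor depends on the starting point.
for c in [1, 10, 100]:
    print(power_law_failure(2 * c) / power_law_failure(c),      # always 2**-0.5 ~ 0.71
          exponential_failure(2 * c) / exponential_failure(c))  # 0.90, 0.37, 4.5e-5
```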
If you want it to be the default, LW should enable it by default with a checkbox for "Hide score".
I guess I should review this post, given that I noticed the unit conversion error in the original. How did I do that? It was really nothing special: OP explicitly said they were confused about what the strange unit "ppm*hr" meant, so I thought about what it could mean, cross-referenced, and it turned out the implied concentration was lower than expected. Clear writing was super important here, the skill of tracking orders of magnitude and units will be familiar to anyone who does Fermi estimates regularly, and it probably helped too that I read OP's own epistemic spot check blog posts as a baby rationalist.
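For the unit itself, a minimal sketch of the cross-check with made-up numbers (none of these are from the original post):

```python
# ppm*hr is a dose: concentration multiplied by exposure time. Dividing a
# quoted dose by an assumed duration gives the implied concentration.
dose_ppm_hr = 120.0     # hypothetical spec quoted in ppm*hr
exposure_hours = 8.0    # hypothetical exposure duration
implied_ppm = dose_ppm_hr / exposure_hours
print(implied_ppm)      # 15 ppm: compare against the concentration you expected
```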
This is one of the best April Fools' jokes ever on this platform. It's well executed, still extremely funny, and illustrates the folly (from the alignment community's perspective, anyway) of doing capabilities research without really thinking about whether your safety plan makes sense. The only way it could be better is if it had started a conversation in the media or generated broad agreement or something, which it doesn't appear to have (e.g. Matthew Barnett doesn't agree). But this is a super high bar, so I still think it deserves 4.
This gets 9 points from me. I think it's the first I had heard of the Jones Act, and the post's anti-Jones-Act stance is one that I am proud to still hold. It's so distortionary that shipping between US ports costs more than twice as much as equivalent international shipping, for very dubious strategic benefit. Imagine if the law instead required that 50% of the volume of all ships between US ports be filled with rubber ducks. The Jones Act is actually WORSE than this in many respects, because not only does it more than double the price, it removes flexibility from supply chains and surge capacity during disasters.
The post also touches on how special interest groups control American politics beyond just "big oil" etc., the many ways a market economy should make its citizens' lives better and how many of them run through shipping, and the failure of American shipbuilding [1]. It predates Abundance (published four months later, in March 2025) and is certainly an abundance idea.
As for downsides, it's somewhat long-winded and I'm a bit skeptical that repeal is actually feasible (some of the commenters point out the large number of people who would actually need to be compensated, and I don't think a government at our current competence level could do this).
[1] This last topic is getting more relevant, as the US Navy recently canceled the Constellation program, which marks its third straight failed frigate program.
I'm giving this -4 points because it seems anti-helpful on net, but not all bad.
Unfortunately, the available benchmark tasks do not allow for 99%+ reliability measurements. Because we don't have 1,000 different one-minute tasks, the best we could do would be something like checking whether GPT-5.1 can do all 40 tasks 25 times each with perfect reliability. Most likely it would succeed at all of them, because we just don't have a task that happens to trip it up.
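To make that concrete: even a perfect record only gives a lower confidence bound on reliability, and the bound is driven by the number of independent tasks, not the total run count (quick sketch below):

```python
# One-sided 95% lower bound on success probability given zero failures in
# n independent trials: p_low = 0.05 ** (1 / n), roughly 1 - 3/n
# (the "rule of three"). Since 25 runs of the same task are highly
# correlated, the effective n is closer to 40 than 1,000.
for n in [40, 40 * 25]:
    print(n, 0.05 ** (1 / n))  # 40 -> ~0.928, 1000 -> ~0.997
```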
As for humans' 99.9%, at a granular enough level it would be 0.2 seconds (typing one keystroke), because few people type with better than 99.9% accuracy. But in the context of a larger task, we can correct our typos, so it isn't super relevant.