An indiscriminate space weapon against low earth orbit satellites is feasible for any spacefaring nation. Recent rumors claim that Russia is already developing one, so I'm writing a post about it. I will explain why I think
What objections or details should I include? Also is it a dangerous infohazard?
I have a correction which would take a while to fully write up. Basically, it seems like non-maneuverable warheads could still evade cheap interceptors using static winglets/grid fins, because their enormous velocity means even a small amount of lift would allow them to pull several gs and move tens to hundreds of meters to the side. The defense has several options against this, but I would need to see if any of them work.
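As a rough check on the "tens to hundreds of meters" figure, here is a minimal kinematics sketch. It assumes a constant lateral acceleration starting from zero lateral velocity; the g-loads and time windows are illustrative assumptions, not numbers from any real system.

```python
G = 9.81  # m/s^2, standard gravity


def lateral_offset_m(lateral_g: float, seconds: float) -> float:
    """Sideways displacement under constant lateral acceleration,
    starting from zero lateral velocity: d = 0.5 * a * t^2."""
    return 0.5 * lateral_g * G * seconds ** 2


# Hypothetical g-loads and maneuver durations, chosen only for illustration.
for g_load in (2.0, 5.0):
    for t in (1.0, 2.0, 3.0):
        offset = lateral_offset_m(g_load, t)
        print(f"{g_load:.0f} g for {t:.0f} s -> {offset:.0f} m sideways")
```

Under these assumptions, a few gs sustained for a second or two already gives offsets in the tens of meters, and a few seconds gives hundreds.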
Plausibly yes, but I'd be worried enough about residual exposure (reflections off walls, improper installation) to other UV wavelengths that installation is likely to require some care and expense too. The second link has several accounts of acute health effects from people doing upper room UV wrong. Probably still great to have in train stations, airports, etc., though, given the enormous benefit/cost ratio.
How much more effective is far UVC (shone directly on people) vs. upper room UV?
I'm not really sure; there would be a component from surfaces and a component from extra ACH (air changes per hour) due to not relying on vertical air mixing. There are probably studies.
The light should point mostly in a horizontal plane just below the ceiling of the room, so that no one has the light shining directly in their eyes. Here's a source and there are more sources linked from here, including a DIY guide.
Since upper room UVGI when not filtered to 222nm is probably safe, and far-UVC which IS filtered to 222nm is probably safe even when it shines on occupants' eyes and bodies, it stands to reason that upper-room UVC has enough safety margin. To the best of my knowledge, far-UVC has been tested up to doses equivalent to 3 years of 8h/day exposure at the current safety threshold, but eyes are delicate so I would prefer studies of 10-100x higher cumulative doses.
It's already safe beyond a reasonable doubt if kept above eye level (7 ft / 2.13 m), since this massively cuts the dose absorbed. I think many public spaces should install UVC immediately, and if they're not convinced yet, just use removable shutters that keep it above eye level until more research is done.
It goes without saying METR didn't simply stipulate "linear score improvement = exponentially increasing time horizons": it arose from a lot of admirable empirical work demonstrating the human completion time is roughly log-distributed.
Not sure I agree with this. We constructed the benchmark to span a wide range, and the empirical work was mostly to show that the model success rate curve was logistic in human time.
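To make "logistic in human time" concrete, here is a minimal sketch of such a success curve. It assumes the logistic is taken in log2 of the human completion time; the function names, the 60-minute horizon, and the slope are hypothetical illustrations, not METR's actual code or fitted values.

```python
import math


def success_probability(human_minutes: float, horizon_minutes: float, slope: float) -> float:
    """Logistic success curve in log2(human time).

    Equals 0.5 when the task's human time matches the 50% time horizon,
    and falls toward 0 as tasks get longer.
    """
    x = math.log2(horizon_minutes) - math.log2(human_minutes)
    return 1.0 / (1.0 + math.exp(-slope * x))


def horizon_at(p: float, horizon_minutes: float, slope: float) -> float:
    """Human task length at which the modeled success rate equals p (0 < p < 1)."""
    logit = math.log(p / (1.0 - p))
    return horizon_minutes * 2.0 ** (-logit / slope)


# Hypothetical model: 50% horizon of 60 minutes, slope of 0.7 per doubling.
for minutes in (1, 15, 60, 240, 960):
    p = success_probability(minutes, horizon_minutes=60.0, slope=0.7)
    print(f"{minutes:>4} min task -> modeled success {p:.2f}")
```

In this framing the time horizon is a single parameter read off the fitted curve, which is why the benchmark needs tasks on both sides of it.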
But at least when taken as the colloquial byword for AI capabilities, this crucial contour feels a bit too mechanistic to me. I think the fact that you can generalise the technique widely to other benchmarks deepens rather than alleviates this concern: if human benchmarking exercises would give log-distributed horizons across the items in many (/most?) benchmarks, such that progressive linear increments in model performance would give a finding of exponentially improving capabilities, maybe too much is being proven.
This isn't always the case. What matters for a benchmark is a range that spans many orders of magnitude, not an exactly log-shaped distribution. In that report, we saw that many benchmarks spanned narrower ranges than the METR task suite, so there isn't enough data to show that increments in model performance correspond to exponentially growing time horizons. In others, task length was poorly correlated with difficulty for models. But taken together, the fact that models went from 0% to saturating MATH500, then the AIME, and now the IMO points to a dramatic increase in capabilities.
I also think our everyday experience points to something like an exponential relationship between an intuitive "task complexity rating" and human time. It's natural to think of a level n+1 task as being decomposable into several level n tasks (e.g. get groceries = drive to the store, shop, drive back from the store), which naturally gives you an exponential relationship.
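The decomposition argument can be sketched in a few lines. This assumes every level n+1 task splits into the same number of level n subtasks; the branching factor of 3 and the 1-minute base task are hypothetical values for illustration.

```python
def human_time_minutes(level: int, base_minutes: float = 1.0, branching: int = 3) -> float:
    """Time for a level-`level` task if each task splits into `branching`
    subtasks one level down, and a level-0 task takes `base_minutes`."""
    return base_minutes * branching ** level


# Each +1 in complexity level multiplies human time by the branching factor.
for level in range(6):
    print(f"level {level}: {human_time_minutes(level):.0f} min")
```

A linear step in the complexity rating is then a constant multiplicative step in human time, so the rating is proportional to the logarithm of human time.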
It doesn't seem to me the frontier has gotten ~3x more capable over this year, and although I'm no software engineer, it doesn't look from the outside like e.g. Opus 4.5 is 2x better at SWE than Opus 4.1, etc.
Some of this is probably the benchmark being unrepresentative of real tasks, but it's not clear why an agent with 2x the time horizon should feel 2x better. When using an assistant with double the time horizon on real tasks, you need to intervene half as much, but each intervention takes you longer, since it's written about double the code, fails in more complicated ways, and you have less understanding of what it's doing. Combined with Amdahl's law effects, I wouldn't be surprised if doubling the time horizon only causes a speedup of 1.2x or so on average on tasks much longer than the agent's time horizon, which are still most of them.
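Here is a toy Amdahl-style model of that estimate. It assumes a fraction of your time on a long task goes to interventions, that doubling the horizon halves the number of interventions, and that each remaining intervention costs some multiple more. The fraction and cost multiplier below are made-up illustrative values, not measurements.

```python
def speedup_from_doubling(intervention_fraction: float, cost_multiplier: float) -> float:
    """Overall speedup when the agent's time horizon doubles.

    intervention_fraction: share of original wall-clock time spent intervening
        (the part that benefits from a longer horizon).
    cost_multiplier: how much longer each intervention takes afterwards.
    Half as many interventions at `cost_multiplier` times the cost each;
    the remaining share of the time is unchanged (the Amdahl's law part).
    """
    unchanged = 1.0 - intervention_fraction
    interventions = intervention_fraction * cost_multiplier / 2.0
    return 1.0 / (unchanged + interventions)


# Hypothetical: half the time is interventions, each one gets 4/3 as costly.
print(round(speedup_from_doubling(0.5, 4.0 / 3.0), 2))
# Best case for comparison: all time is interventions, cost per intervention unchanged.
print(round(speedup_from_doubling(1.0, 1.0), 2))
```

Only when the whole task is intervention time and interventions get no harder does doubling the horizon give a full 2x; any unchanged share or added per-intervention cost pulls the speedup down quickly.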
@habryka what do you see Lightcone's discount rate as? I'll likely be in a better position to donate next year but could donate now if urgent.
If you don't emotionally believe in enough uncertainty to use normal reasoning methods like "what else has to go right for the future to go well and how likely does that feel", or "what level of superintelligence can this handle before we need a better plan", and you want to think about the end-to-end result of an action, and you don't want to use explicit math or language, I think you're stuck. I'm not aware of anyone who has successfully used the dignity frame (maybe habryka?). It seems to replace estimating EV with something much more poorly defined which, depending on your attitude towards it, may or may not be positively correlated with what you care about. I also think doing this inner sim end-to-end adds a lot more noise than just thinking about whether the action accomplishes some proximal goal.
Thanks for writing this. I disagree with some of the claims in this post but, as one of the authors on the original paper, I've been meaning to write a post on ways time horizon is overrated/misinterpreted. Limited number and distribution of tasks is definitely near the top of the list.
Yes, any spacefaring country, but probably not most private citizens, given the ITAR controls that are already put on rocket tech to regulate ballistic missiles. As for bioweapons, they're higher on the escalation ladder and harder to control. Will be sure to think about it more and cover this.