Someone points out that in the case of one firm and one person, it's mathematically impossible to get perfect price discrimination because the owner's willingness to spend will be arbitrarily high if all the profits flow back to them. Not sure what this means for the larger case.
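One way to make the argument concrete (my own sketch, under the reading that "perfect price discrimination" means extracting the consumer's entire budget; the notation below is mine, not from the original comment):

```latex
% Assumed setup: one consumer owns the one firm; w_0 = outside wealth,
% c = production cost, p = total amount the firm extracts.
% Profit (p - c) flows back to the owner, so their budget is
% B(p) = w_0 + (p - c). Perfect price discrimination would charge
% the full budget:
\[
  p \;=\; w_0 + (p - c) \;\;\Longrightarrow\;\; w_0 = c ,
\]
% which no finite p satisfies in general: each extra dollar extracted
% raises the owner's budget by a dollar, so the willingness to spend
% has no finite fixed point.
```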
Have you looked at rituals people do in the ~hopeless situations Daniel mentioned for inspiration? I think the actions of the seed bank during the siege of Leningrad were admirable; they frequently used honor/glory framings and the consequences for future generations as motivation. But those framings were used to inspire people who had heroic jobs to do; for people without that kind of agency, it seems tricky to frame things well, and unclear how they should relate to possible doom.
Not speaking for anyone else at METR, but I personally think it's inherently difficult to raise the salience of something like time horizon during a period of massive hype without creating some degree of hype about the benchmark, and that the overall impact of the project is still highly net positive.
Basically, companies already believe in and explicitly aim for recursive self-improvement (RSI), but the public doesn't. Therefore, we want to tell the public what labs already believe: that RSI could be technically feasible within a few years, that current AIs can do things that take humans a couple of hours under favorable conditions, and that there's a somewhat consistent trend. We then help the public make use of this info to reduce risks, e.g. by communicating with policymakers and helping companies formulate RSPs, which boosts the ratio of benefit to cost.
You might still think: how large is the cost? Well, the world would look pretty different if driving investment toward RSI were the primary effect of the time horizon work. Companies would be asking METR how to make models more agentic, enterprise deals would be decided based on time horizon, and we'd see leaked or public roadmaps from companies aiming for 16-hour time horizons by Q2. (Being able to plan for the next node is likely how Moore's Law sped up semiconductor progress; this is much more difficult to do for time horizon for various reasons.) Also, the amount of misbehavior (especially power-seeking) from increased agency has been a bit below my expectations, so it's unlikely we'll push things above a near-term danger threshold.
If we want to create less pressure towards RSI, it's not clear what to do. There are some choices we made in the original paper and the current website, like not color-coding the models by company, discussing risks in several sections of the paper, not publishing a leaderboard, keeping many tasks private (though this is largely for benchmark integrity), and adding various caveats and follow-up studies. More drastic options include releasing numbers less frequently, making a worse benchmark, or doing less publicity, and none of these seem appealing in the current environment, although they might become so in the future.
Raptor's claimed vacuum ISP is 380 [...] I also don't know where I'd go if I wanted to prove to myself that the number is legit (wikipedia just cites an Elon tweet...).
The Isp of a closed-cycle rocket engine with a given propellant mix is largely a function of its chamber pressure and expansion ratio, so one can use a program like RPA to plug in the known numbers and see whether the other claims are consistent with an Isp of 380. Example (for the SL variant) in this tweet.
My guess is that 380 is achievable if they narrow the throat and use a large enough nozzle, but that they'll opt for a slightly lower Isp in order to cram 9 engines into the upper stage. With Starship staging at record-low velocities, reducing gravity losses through higher thrust might matter more than a 1% efficiency gain.
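For a reproducible version of the consistency check described above, here is a rough sketch using rocketcea, a Python wrapper around NASA CEA (RPA itself is a GUI tool). The chamber pressure, mixture ratio, and expansion ratios below are illustrative assumptions rather than confirmed Raptor Vacuum figures, and CEA's ideal Isp runs a few percent above delivered Isp.

```python
# Rough Isp consistency check with rocketcea (pip install rocketcea),
# a Python wrapper around NASA CEA. All engine parameters here are
# illustrative assumptions, not confirmed Raptor Vacuum specs.
from rocketcea.cea_obj import CEA_Obj

cea = CEA_Obj(oxName="LOX", fuelName="CH4")

pc_psia = 300 * 14.5   # ~300 bar chamber pressure, converted to psia (assumed)
mr = 3.6               # assumed O/F mixture ratio
for eps in (40, 80, 120, 150):          # candidate nozzle expansion ratios
    isp_ideal = cea.get_Isp(Pc=pc_psia, MR=mr, eps=eps)  # ideal vacuum Isp, s
    isp_est = 0.96 * isp_ideal          # crude allowance for real-engine losses
    print(f"eps={eps:3d}  ideal={isp_ideal:6.1f}s  ~delivered={isp_est:6.1f}s")
```

If a claimed 380 s only shows up at implausibly large expansion ratios, that's evidence the spec sheet is optimistic.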
Don't exactly disagree but there's a difference between Starship landing reliably and scaling up vs truly being "finished".
$15/kg basically requires airline-like operations (keeping total operational cost to 3-4x fuel cost) while maintaining a ~4% payload fraction. I don't think the next version of Starship is capable of this, given the sheer number of kinks to work out before a reusable upper stage gets down to ~0 maintenance items per launch. So the first time it could happen is with a later version of Starship, similar to what happened with Falcon 9 Block 5, which came 8 years after the first Falcon 9 launch and had almost 2x the payload. It's also possible that it requires a complete redesign (9 meter -> 12 meter diameter, new engines) or future advances in e.g. TPS materials, or that it doesn't happen until all the maintenance is automated.
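To show why $15/kg is so demanding, here is a back-of-the-envelope sketch; every input (propellant load, propellant price, liftoff mass) is a rough public-ballpark assumption rather than an official figure.

```python
# Back-of-the-envelope version of the $15/kg arithmetic above.
# All inputs are rough assumptions for illustration, not official numbers.
propellant_t = 4600           # full-stack methalox load, tonnes (assumed)
propellant_cost_per_t = 200   # $/tonne of methalox (assumed)
gross_liftoff_mass_t = 5000   # full-stack liftoff mass, tonnes (assumed)
payload_fraction = 0.04       # the 4% payload fraction from the comment
ops_multiplier = 3.5          # total cost ~3-4x fuel cost ("airline-like")

fuel_cost = propellant_t * propellant_cost_per_t              # ~$0.9M/launch
total_cost = ops_multiplier * fuel_cost                       # ~$3.2M/launch
payload_kg = gross_liftoff_mass_t * payload_fraction * 1000   # ~200,000 kg

print(f"fuel ~${fuel_cost/1e6:.1f}M, total ~${total_cost/1e6:.1f}M, "
      f"payload ~{payload_kg/1e3:.0f} t, cost ~${total_cost/payload_kg:.0f}/kg")
```

With these assumptions the result lands around $16/kg, and any slippage in payload fraction or the ops multiplier pushes it well above $15/kg.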
I didn't like the new feed until I found the Source Weights sliders and was able to upweight Recent Comments and Latest Posts to 50, restoring some of the old behavior. But I think I still prefer the old interface. Not sure yet if I'll keep this opinion or learn to interact with LW in some other way.
I liked reading recent comments because
I also think the interface could be improved a bit. This one took up a huge amount of vertical space on my screen, and I didn't understand any of the three comments, nor were they relevant to me. Many of the comments are only understandable with context from a parent comment that one is unlikely to read, so these should either be filtered out or just take up less space.
To clarify, I'm not very confident that AI will be aligned; I still put >5% on p(takeover doom | 10% of AI investment is spent on safety). I'm not really sure why it feels different emotionally, but I guess this is just how brains are sometimes.
I'm glad to see this post come out. I've previously opined that solving these kinds of problems is what proves a field has become paradigmatic:
Paradigms gain their status because they are more successful than their competitors in solving a few problems that the group of practitioners has come to recognize as acute. ––Thomas Kuhn
It has been shown many times across scientific fields that a method that can solve these recognized proxy tasks is more likely to succeed in real applications. The approaches sketched out here seem like a particularly good fit for a large lab like GDM, because the North Star can be made somewhat legible and the team has enough resources to tackle a series of proxy tasks that are both relevant and impressive. Not that it would be a bad fit elsewhere, either.
And in terms of communication costs (which are paid at the synaptic junction for the synapse -> dendrite -> soma path), that 1e5 eV is only enough to carry a reliable 1-bit signal about ~100μm (1e5 nm) through irreversible wires (the wire bit energy for axons/dendrites and modern cmos is about the same).
This model of interconnect energy has been thoroughly debunked here, as coax cables violate it by a factor of 200: https://www.lesswrong.com/posts/fm88c8SvXvemk3BhW/brain-efficiency-cannell-prize-contest-award-ceremony
If it does apply in the specific cases of axons and CMOS, there should be a justification for why it does, though given the amount of prior discussion I don't think pursuing this would be fruitful.
Should everyone do pragmatic interpretability, or are pragmatic interp and curiosity-driven basic science complementary? What should people do if they're highly motivated by the curiosity frame and have found success using it?