I have computed time-horizon trends for more general software engineering tasks (i.e. tasks with a bigger context), and my preliminary results point towards a logistic trend, i.e. the exponential is already tapering off. However, I am still pretty uncertain about that.
I predict this is basically due to noise, or at best is a very short-lived trend, similar to the purported faster trend of RL scaling allowing a 4-month doubling time on certain tasks, which is basically driven by good scaffolding (which is what RL-on-CoTs was mostly shown to be) rather than by the creation of new capabilities.
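To make the exponential-vs-logistic question concrete, here's a minimal sketch of the kind of comparison at issue, using made-up horizon numbers rather than any real data (the fitting functions and values below are purely illustrative):

```python
# Minimal sketch: compare an exponential vs. a logistic fit to time-horizon
# estimates over time. The data points below are fabricated for illustration,
# not taken from METR or from my own analysis.
import numpy as np
from scipy.optimize import curve_fit

t = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 5.5])    # years since some start date
h = np.array([0.2, 0.6, 2.0, 6.0, 15.0, 28.0, 35.0])  # 50%-success horizon, arbitrary units

def exponential(t, h0, d):
    # Pure exponential: the horizon doubles every d years.
    return h0 * 2.0 ** (t / d)

def logistic(t, K, r, t0):
    # Logistic: looks exponential early on, then tapers off toward a ceiling K.
    return K / (1.0 + np.exp(-r * (t - t0)))

p_exp, _ = curve_fit(exponential, t, h, p0=[0.2, 1.0], maxfev=10000)
p_log, _ = curve_fit(logistic, t, h, p0=[50.0, 1.0, 4.0], maxfev=10000)

def rmse(model, params):
    return float(np.sqrt(np.mean((model(t, *params) - h) ** 2)))

print("exponential RMSE:", rmse(exponential, p_exp))
print("logistic    RMSE:", rmse(logistic, p_log))
# With this few noisy points, the two fits are often nearly indistinguishable,
# which is part of why I treat the apparent tapering as plausibly just noise.
```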
I won't speak for Jacob Pfau, but the easy answer for why infinite time horizons don't exist is simply that we have a finite memory capacity, so tasks that require more than a certain amount of memory simply aren't doable.
At the very best (though already I have to deviate from real humans by assuming infinite lifespans), you can have time horizons that are exponentially larger than your memory capacity, and no larger. The reason is that an agent with B bits of memory has at most 2^B distinct internal states, so once it runs for more than 2^B units of time it must revisit a state and repeat itself in a loop, meaning that a task which requires longer than 2^B units of time to solve will never be completed.
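A minimal sketch of the pigeonhole fact doing the work here, assuming a deterministic agent whose entire state fits in B bits and that receives no new external input during the task (the setup and function below are purely illustrative):

```python
# Illustration (my framing, not anyone else's exact argument): a deterministic
# agent with B bits of memory has at most 2**B internal states, so by the
# pigeonhole principle its state must repeat within 2**B steps, after which
# its behaviour loops forever.
import random

def first_repeat_time(B: int, seed: int = 0) -> int:
    """Steps until a randomly chosen deterministic transition on 2**B states revisits a state."""
    rng = random.Random(seed)
    n_states = 2 ** B
    # Fix one deterministic next-state function (chosen at random once).
    transition = [rng.randrange(n_states) for _ in range(n_states)]
    seen, state, steps = set(), 0, 0
    while state not in seen:
        seen.add(state)
        state = transition[state]
        steps += 1
    return steps  # always <= 2**B, since there are only 2**B distinct states

if __name__ == "__main__":
    for B in range(1, 13):
        t = first_repeat_time(B, seed=B)
        assert t <= 2 ** B
        print(f"B={B:2d} bits -> state repeats after {t} steps (bound: {2**B})")
```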
I do agree that METR's horizon work is definitely overrelied on (there are only a few datapoints, and there are reasons to believe the benchmark is biased towards tasks that require little context or memory, among other issues). But I do think exponential growth in AI capabilities is very plausible a priori, and I wrote up a post on why this should generally be expected (with the caveat that doubling times can differ dramatically, so we need to make sure we aren't overextrapolating from a narrow selection of tasks). So I think METR's observation of exponential growth is likely to generalize to messy tasks; it's just that the time horizons and doubling factors will be different.
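To spell out the extrapolation being discussed (my notation, not METR's exact framing):

$$h(t) = h_0 \cdot 2^{(t - t_0)/d}$$

where $h(t)$ is the time horizon at calendar time $t$ and $d$ is the doubling time. My claim is only that the exponential form plausibly carries over to messier task distributions, while $h_0$ and $d$ can differ a lot between them.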
One big reason why people don't endorse Heuristic C (though not the only reason) is that the general population is much more selfish/has much higher time preference than LW/EA people, and in general one big assumption that I think EAs/LWers rely on way too much is that the population inherently cares about the future of humanity, independent of their selfish preferences.
More generally, I think Robin Hanson is right that a lot of our altruism is mostly fictional: it is a way to signal and to exploit social systems, and, when it isn't fictional, a way to cooperate with other people. The behavior we see is most likely in a world where people's altruism is mostly fictional, combined with people not knowing all that much about AI.
This is complementary to other explanations like xpym's.
More generally, a potential crux with a lot of the post is that I think something like "rationalizing why your preferred policies are correct", to quote PoignardAzur, is ultimately what has to happen in ethical reasoning in general; there's no avoiding that part, and it inevitably involves dealing with conflict theory. (That comment argues that the proposed examples are bad because they invoke political debates/conflict-theory issues, but contra that comment, I think this isn't avoidable in this domain.)
There are interesting questions to ask about how we got to the morals we have (I'd say something like cooperation between people who need to share things in order to survive and thrive explains why we developed any altruism/moral system that wasn't purely selfish). But in general, the moral-objectivism assumptions embedded in the discourse are pretty bad if we want to talk about how we got to the morality/values we have, and it's worth trying to frame the discussion in moral-relativist terms.
For what it's worth, I don't think it matters for now, for a couple of reasons:
So I don't really worry about models trying to change their behavior in ways that negatively affect safety, or sandbagging tasks, via steganography or one-forward-pass reasoning to fool CoT monitors.
We shall see in 2026 and 2027 whether this continues to hold for the next 5-10 years or so, or potentially more depending on how slowly AI progress goes.
Edit: I retracted the claim that most capabilities come from CoT, due to the paper linked in the very next tweet, and now think that RL on CoTs is basically capability elicitation, not a generator of new capabilities.
While I don't fully endorse Mechanize's post, and have some reservations about the level of technological determinism expressed in the article, I am substantially more skeptical than Sahil that the AI paradigm can be shifted in directions that don't boost capabilities nearly as much, I'm much more skeptical of the claim that automating away humans was a contingent goal, and I tend towards more technological determinism than Sahil does:
The Future of AI is Already Written
(I already talked about why AIs are easy to shut down, using a simpler hypothesis than Sahil did; this comment is more about how it's far more difficult to steer technological development than people appreciate, and how people overestimate the level of control humanity actually has over these things.)
I think the counterpoint basically makes the paper ~0 evidence for the claim that large latent reasoners will exist by next year. More generally, generic task improvements matter more than specialized task improvements because of the messiness and complexity of reality, and one of my updates over the past 2 years is that RL, inference, and pre-training scaling dwarf scaffolding improvements by such large margins that scaffolding quickly becomes worthless, so I no longer consider scaffolded LLMs a relevant concern/threat.
I'd update back to your prior belief on how likely it is that LLMs will become latent reasoners/have something like neuralese.
I'd also be substantially worried about data leakage here.
I'm retracting the claim that scaffolding permanently doesn't matter (though admittedly I was biased by things like AutoGPT no longer being talked about, presumably because newer LLMs have completely obsoleted that kind of scaffolding).
Edit: Apparently current RL is mostly just the good version of the scaffolding people had in mind in 2023, if you believe the paper here.
Firstly, your researchers normally have access to the model architecture. This is unfortunate if you want to avoid it leaking. It's not clear how important this is. My sense is that changes to model architecture have been a minority of the algorithmic improvement since the invention of the transformer.
I agree with this, but I'd say it's good to do this anyway, because if AIs start being able to do more and more research, the chances of architecture/paradigm changes go up, and this is especially true if AI labor scales faster than human labor, so it's worth guarding against architecture leaks early on.
Also, good news on the new Tinker API.
I would have just answered "It depends on what you want to do", with there being no single best prior/Universal Turing Machine, because of theorems like the No Free Lunch theorem (and more generally, a takeaway from learning and computational theory is that there is no one best prior that is always justified, contrary to the ancient philosophers' hopes).
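Roughly the statement I have in mind (the Wolpert-style supervised-learning version, stated informally in my own notation):

$$\frac{1}{|\mathcal{Y}|^{|\mathcal{X}|}} \sum_{f:\,\mathcal{X}\to\mathcal{Y}} \mathbb{E}\big[\mathrm{err}_{\mathrm{OTS}}(A \mid f, m)\big] \quad \text{is the same for every learner } A,$$

i.e. uniformly averaged over all possible target functions on finite $\mathcal{X}$ and $\mathcal{Y}$, every algorithm has the same expected off-training-set error after $m$ training examples, so "which prior is best" only has an answer relative to which targets you actually expect to face.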
Link to long comments that I want to pin, but are too long to be pinned:
https://www.lesswrong.com/posts/Zzar6BWML555xSt6Z/?commentId=aDuYa3DL48TTLPsdJ
https://www.lesswrong.com/posts/uMQ3cqWDPHhjtiesc/?commentId=Gcigdmuje4EacwirD
https://www.lesswrong.com/posts/DCQ8GfzCqoBzgziew/?commentId=RhTNmgZqjJpzGGAaL