In particular, if the sample efficiency of RL increases with model size, it might turn out that the optimal strategy for RLing early transformative models is to produce many fewer, much more expensive labels than people use when training current systems. I think people often neglect this possibility when thinking about the future of scalable oversight.
This paper found higher sample efficiency for larger reinforcement learning models (see Fig. 5 and Section 5.5).
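For intuition, here's a toy sketch of the tradeoff (every number and functional form below is made up; none of it comes from the paper): with a fixed labeling budget, a model that needs fewer samples saturates on label count sooner, so the remaining gains come from label quality, which pushes the optimum toward fewer, more expensive labels.

```python
# Toy illustration only: how higher sample efficiency could shift the optimal
# labeling strategy toward fewer, pricier labels. All constants and curves
# below are hypothetical.

import numpy as np

BUDGET = 1_000_000                     # hypothetical total labeling budget ($)
label_costs = np.logspace(0, 3, 200)   # $1 quick labels ... $1,000 expert labels

def label_quality(cost):
    # Assume quality per label rises with spend but saturates.
    return 1 - np.exp(-cost / 100)

def policy_performance(n_labels, quality, sample_efficiency):
    # Assume performance saturates in the effective number of labels,
    # and that a more sample-efficient model saturates sooner.
    effective = n_labels * sample_efficiency
    return quality * (1 - np.exp(-effective / 50_000))

for sample_efficiency, name in [(1.0, "small model"), (20.0, "large model")]:
    n_labels = BUDGET / label_costs
    perf = policy_performance(n_labels, label_quality(label_costs), sample_efficiency)
    best_cost = label_costs[np.argmax(perf)]
    print(f"{name}: optimal spend per label ≈ ${best_cost:,.0f}")
```

Under these toy assumptions, the more sample-efficient model's optimum shifts to substantially more expensive labels. Whether the real curves look anything like this is exactly the open question.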
I picked the dotcom bust as an example precisely because it was temporary. The scenarios I'm asking about are ones in which a drop in investment occurs and timelines turn out to be longer than most people expect, but where TAI is still developed eventually. I asked my question because I wanted to know how people would adjust to timelines lengthening.
Then what do you mean by "forces beyond yourself"? In your original shortform it sounded to me like you meant a movement, an ideology, a religion, or a charismatic leader. Creative inspiration and ideas that you're excited about aren't from "beyond yourself" unless you believe in a supernatural explanation, so what does the term actually refer to? I would appreciate some concrete examples.
There are more than two options for how to choose a lifestyle. Just because the 2000s productivity books had an unrealistic model of motivation doesn't mean that you have to deceive yourself into believing in gods and souls and hand over control of your life to other people.
That's not as bad, since it doesn't have the rapid back-and-forth reward loop of most Twitter use.
The time expenditure isn't the crux for me; the crux is the effect of Twitter on its users' habits of thinking. Those effects also apply to people who aren't alignment researchers. For those people, trading away epistemic rationality for Twitter influence is still very unlikely to be worth it.
I strongly recommend against engaging with Twitter at all. The LessWrong community has been significantly underestimating the extent to which it damages the quality of its users' thinking. Twitter pulls its users into a pattern of seeking social approval in a fast-paced loop. Tweets shape their regular readers' thoughts into becoming more tweet-like: short, vague, lacking in context, status-driven, reactive, and conflict-theoretic. AI alignment researchers, more than perhaps anyone else right now, need to preserve their ability to engage in high-quality thinking. For them especially, spending time on Twitter isn't worth the risk of damaging their ability to think clearly.
AI safety research is speeding up capabilities. I hope this is somewhat obvious to most.
That claim is in tension with the Bitter Lesson, though. Current AI safety research doesn't contribute to increased scaling, either through hardware advances or through algorithmic efficiency improvements. To the extent that it makes AI more usable for mundane tasks, it does so in a way that doesn't involve making models larger. As long as the scaling hypothesis continues to hold, fears of capabilities externalities from alignment research are unfounded.
Philosophy is frequently (probably most of the time) done in order to signal group membership rather than as an attempt to accurately model the world. Just look at political philosophy or philosophy of religion. Most of the observations you note can be explained by philosophers operating at simulacrum level 3 instead of level 1.