To me there seem to be many examples of good impactful strategy research that don't introduce big new ontologies or go via illegibility:
I do also see examples of big contributions that take the form of new ontologies, like Reframing Superintelligence. But these seem less common to me.
If you have utility proportional to the logarithm of (1 dollar plus your wealth), and you Nash bargain across all your possible selves, you end up approximately maximizing the expected logarithm of the logarithm of (1 dollar plus your wealth).
I wonder if you could make the result here imply much less extreme risk aversion by taking the disagreement point to be "your possible selves control money proportional to their probability" rather than "no money".
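For concreteness, here's the calculation as I understand it (my notation, not anything from the original post). With utility $u(w)=\log(1+w)$ and possible selves $i$ occurring with probability $p_i$, (asymmetric) Nash bargaining picks the outcome maximizing

$$\prod_i \big(u_i - d_i\big)^{p_i} \quad\Longleftrightarrow\quad \sum_i p_i \log\big(u_i - d_i\big),$$

where $d_i$ is self $i$'s disagreement utility. With "no money" as the disagreement point, $d_i = \log(1+0) = 0$, so you're maximizing $\mathbb{E}\big[\log\log(1+w)\big]$. The alternative disagreement point would make each $d_i$ positive, which is what would change the risk-aversion result.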
I'd have thought that the METR trend is largely newer models sustaining the SAME slope but for MORE TOKENS. I.e., the slope goes horizontal after a bit for AI (but not for humans), and the point at which it goes horizontal is being delayed more and more.
Yeah, I thought this piece struck a really nice tone and pointed to something important.
Re the counterfactual, it's hard to know. I was already thinking about risks from centralising AGI development, and about the ease of the leading project getting a DSA at this point. And I think Lukas was already thinking about the risk of AI-enabled coups. So I think it's pretty unlikely that this was counterfactually responsible for the AI-enabled coups report happening vs not.
But I certainly read this piece and it influenced my thinking, and I believe Lukas read it as well. I think it gave me more confidence to lean into disagreement with the status quo, and more conviction in doing so.
Thanks! (Quickly written reply!)
I believe I was thinking here about how society has, at least in the past few hundred years, spent only a minority of GDP on obtaining new raw materials, which suggests that access to such materials wasn't a significant bottleneck on expansion.
So it's a stronger claim than "hard cap". I think a hard cap would, theoretically, result in all GDP being used to unblock the bottleneck, as there's no other way to increase GDP. I think you could quantify the strength of the bottleneck as the marginal elasticity of GDP with respect to additional raw materials. In a task-based model, I think the % of GDP spent on each task is proportional to this elasticity?
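To spell out that last claim with a toy version (my notation, not anything from the thread): in a Cobb-Douglas task-based model

$$Y = A \prod_i x_i^{\alpha_i}, \qquad \sum_i \alpha_i = 1,$$

the elasticity of output with respect to task $i$ is $\partial \ln Y / \partial \ln x_i = \alpha_i$, and with competitive pricing the share of GDP spent on task $i$ is $p_i x_i / Y = \alpha_i$. So in that special case the GDP share of a task exactly equals its output elasticity; more general production functions would only give the rough proportionality I gestured at.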
Though this seems kind of like a fully general argument
Yeah, I think maybe it is? I do feel like, given the very long history of sustained growth, it's on the sceptic to explain why their proposed bottleneck will kick in with explosive growth but not before. So you could state my argument as: raw materials never bottlenecked growth before; there's no particular reason they would just because growth is faster, because that faster growth is driven by having more labour and capital, which can be used for gathering more resources; so we shouldn't expect raw materials to bottleneck growth in the future.
To be clear, this is all compatible with "if we had way more raw materials then this would boost output". E.g. in Cobb-Douglas, doubling an input increases output notably, but there still aren't bottlenecks.
(And I actually agree that it's more like CES with rho < 0, i.e. raw materials are a stronger bottleneck, but I just think we'll be able to spend output to get more raw materials.)
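To make the contrast concrete (a toy two-factor version, in my notation): with Cobb-Douglas, $Y = K^{\alpha} R^{1-\alpha}$, output grows without bound as the labour-plus-capital aggregate $K$ grows even if raw materials $R$ are held fixed, just with diminishing returns. With CES,

$$Y = \big(\alpha K^{\rho} + (1-\alpha) R^{\rho}\big)^{1/\rho}, \qquad \rho < 0,$$

output converges to $(1-\alpha)^{1/\rho} R$ as $K \to \infty$, i.e. a fixed multiple of $R$. That's the sense in which raw materials bind much harder when $\rho < 0$, and why "we can spend output to get more raw materials" is doing the real work in my view.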
(Also, to clarify: this is all about the feasibility of explosive growth. I'm not claiming it would be good to do any of this!)
Yeah, I think one of the biggest weaknesses of this model, and honestly of most thinking on the intelligence explosion, is not carefully thinking through the data.
During an SIE, AIs will need to generate data themselves, by doing the things that human researchers currently do to generate data. That includes finding new untapped data sources, creating virtual environments, creating SFT data themselves by doing tasks with scaffolds, etc.
On one hand, it seems unlikely they'll have anything as easy as the internet to work with. On the other hand, internet data is actually very poorly targeted at teaching AIs how to do crucial real-world tasks, so perhaps with abundant cognitive labour you can do much better and make curricula that directly target the skills that most need improving.
Yep, the 'gradual boost' section is the one for this. Also, my earlier work on the compute-centric model (see link in post) models gradual automation in detail.
So if you've fully ignored the fact that pre-ASARA systems have sped things up, then accounting for that will make takeoff less fast, because by the time ASARA comes around you'll have already plucked much of the low-hanging fruit of software progress.
But I didn't fully ignore that, even outside of the gradual boost section. I somewhat adjusted my estimates of r and of "distance to effective limits" to account for intermediate software progress. Then, in the gradual boost section, I got rid of these adjustments as they weren't needed. It turned out that takeoff was then faster. My interpretation (as I say in the gradual boost section): dropping those adjustments had a bigger effect than changing the modelling.
To put it another way: if you run the gradual boost section but literally leave all the parameters unchanged, you'll get a slower takeoff.
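Here's a toy numerical illustration of the low-hanging-fruit effect (to be clear, this is not the model from the post, just a much simpler stand-in with made-up parameter values): suppose the returns to software R&D, r, decline linearly with every OOM of software progress made so far (a crude proxy for distance to effective limits), and post-ASARA progress accrues at a rate proportional to r.

```python
# Toy stand-in, NOT the model from the post: r (returns to software R&D)
# falls linearly with total OOMs of software progress made so far, and
# post-ASARA progress accrues at a rate proportional to r.

def takeoff_years(post_asara_ooms, r0, r_decay_per_oom, pre_asara_ooms=0.0,
                  base_ooms_per_year=3.0, dt=0.01):
    """Years for software to advance post_asara_ooms OOMs after ASARA arrives."""
    progress, years = 0.0, 0.0
    while progress < post_asara_ooms:
        # r depends on ALL progress to date, including pre-ASARA progress
        r = r0 - r_decay_per_oom * (pre_asara_ooms + progress)
        if r <= 0:
            return float("inf")  # hit effective limits before takeoff completes
        progress += base_ooms_per_year * r * dt
        years += dt
    return years

# Same parameters, ignoring vs accounting for pre-ASARA software progress:
print(takeoff_years(post_asara_ooms=5, r0=1.2, r_decay_per_oom=0.1))                      # faster
print(takeoff_years(post_asara_ooms=5, r0=1.2, r_decay_per_oom=0.1, pre_asara_ooms=3.0))  # slower
```

With every parameter held fixed, accounting for pre-ASARA progress (the second call) gives a slower takeoff; the faster result in the gradual boost section comes from also dropping the downward adjustments to r and to distance-to-effective-limits at the same time.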
Forethought is hiring!
You can see our research here.
You can read about what it’s like to work with us here.
We’re currently hiring researchers, and I’d love LW readers to apply.
If you like writing and reading LessWrong, I think you might also enjoy working at Forethought.
I joined Forethought a year ago, and it's been pretty transformative for my research: I get lots of feedback and great collaboration opportunities.
The median views of our staff are often different from the median views of LW. E.g. we probably have a lower probability on AI takeover (though I'm still >10% on that). That's part of the reason I'm excited for LW readers to apply. I think a great way to make intellectual progress is via debate. So we want to hire people who strongly disagree with us, and have their own perspectives on what's going on in AI.
We’ve also got a referral bounty of £10,000 for counterfactual recommendations for successful Senior Research Fellow hires, and £5,000 for Research Fellows.
The deadline for applications is Sunday 2nd November. Happy to answer questions!
I also work at Forethought!
I agree with a lot of this post, but wanted to flag that I would be very excited for people doing blue-skies research to apply, and want Forethought to be a place that's good for that. We want to work on high-impact research, and understand that sometimes means doing things where it's unclear up front whether they will bear fruit.
Thanks for articulating your view in such detail. (This was written with transcription software. Sorry if there are mistakes!)
AI risk:
When I articulate the case for AI takeover risk to people I know, I don't find the need to introduce them to new ontologies. I can just say that AI will be way smarter than humans. It will want things different from what humans want, and so it will want to seize power from us.
But I think I agree that if you want to actually do technical work to reduce the risks, it is useful to have new concepts that point out why the risk might arise. I think reward hacking, instrumental convergence, and corrigibility are good examples.
To me, this seems like a case where you can identify a new risk without inventing a new ontology, but it's plausible that you need to make ontological progress to solve the problem.
Simulations:
On the simulation argument, I think that people do in fact reason about the implications of simulations, for example when thinking about acausal trade or threat dynamics. So I don't agree that it hasn't gone anywhere. It obviously hasn't become very practical yet, but I wouldn't attribute that to the nature of the concept rather than to the inherent subject matter.
I don't really understand why we would need new concepts to think about what's outside a simulation, rather than just applying the existing concepts we use to describe the physical world outside of simulations within our universe, and to describe other ways the universe could have been.
Longtermism:
Okay, it's helpful to know that you see these as providing new valuable ontologies to some extent.
In my mind, there is not much ontological innovation going on in these concepts, because they can be stated in one sentence using pre-existing concepts. The vulnerable world hypothesis is the idea that, among the many technologies we will eventually develop, at some point one of them will allow the person who develops it to easily destroy everyone else. Astronomical waste is the idea that there is a massive amount of stuff in space, but that if we wait a hundred years before grabbing it all, we will still be able to grab pretty much just as much stuff, so there is no need to rush.
To be clear, I think that this work is great. I just thought you had something more illegible in mind by what you consider to be ontological progress. So maybe we're closer to each other than I thought.
Extinction:
It sometimes seems to me like you jump to the conclusion that all the action is in the edge cases without actually arguing for it. According to most of the traditional stories about AI risk, everyone does literally die. And in worlds where we align AI, I do expect that people will be able to stay in their biological forms if they want to.
Lock-in:
I'm sympathetic to the idea that there's useful work to do in finding a better ontology here.
Human power grabs:
I've seen you say this a lot, but I still haven't seen you actually argue for it convincingly. It seems totally possible that alignment will be easy, and that the only force behind the power grab will be coming from humans, with AIs only doing it because humans train them to do so. It also seems plausible that the humans who develop superintelligence don't try a power grab, but that the AI is misaligned and does so itself. In my mind, both of the pure-case scenarios are very plausible. Again, it seems to me like you're jumping to the conclusion that all the action is in the edge case, without arguing for it convincingly.
Separating out the two is useful for thinking about mitigations, because there are certain technical mitigations you'd use for misaligned AI that don't help with human motivations to seek power. And there are certain technical and governance mitigations you would use if you're worried about humans seeking power that would not help with misaligned AIs.
Epistemics:
It seems pretty plausible to me that if you improved our fundamental understanding of how societal epistemics works, that would really help with improving it. At the same time, I think identifying that this is a massive lever over the future is important strategy work even if you haven't yet developed the new ontology. This might be like identifying that AI takeover risk is a big risk without developing the ontology needed to, say, solve it.
Zooming out:
In general, a theme here is that I find myself more sympathetic to your claims when the task is fully solving a very complex problem like alignment, but I disagree that you need new ontologies to identify new, important problems.
I like the idea that you could play a role translating between the pro-illegibility camp and the people more sympathetic to legibility, because I think you are a clear writer, but certainly seem drawn to illegible things.