That isn't what I mean. I'm not saying you aren't using real things to construct your argument, but that doesn't mean the structure of the argument reflects something real. I imagine it looking something like a rationalist Jenga tower, where if one piece gets moved, it all crashes down. Except, by referencing other blog posts, it becomes a kind of Meta-Jenga: a Jenga tower composed of other Jenga towers. Take "Coherent decisions imply consistent utilities": I view this alone as its own mini Jenga tower. This is where I think String Theorists went wrong. It's not that humans can't, in theory, form good reasoning based on other reasoning based on other reasoning and actually arrive at the correct answer; it's just that we tend to be really, really bad at it.
The sort of thing that would change my mind: there's some widespread phenomenon in machine learning that perplexes most, but is expected according to your model, and any other model either doesn't predict it as accurately, or is more complex than yours.
I dislike the overuse of analogies in the AI space, but to use your analogy, I guess it's like you keep assigning a team of engineers to build a car, and one of two things happens. Possibility One: the engineers are actually building car engines, which gives us a lot of relevant information for how to build safe cars (torque, acceleration, speed, other car things), even if we don't know all the details for how to build a car yet. Possibility Two: they are actually just building soapbox racers, which doesn't give us much information for building safe cars, but also means that just tweaking how the engineers work won't suddenly give us real race cars.
If progress in AI is continuous, we should expect record levels of employment. Not the opposite.
My thinking is that if progress in AI doesn't have a sudden, foom-level jump, and if we don't all die, most of the fears of human unemployment are unfounded... at least for a while. Say we get AIs that can replace 90% of the workforce. The productivity surge from this should dramatically boost the economy, creating more companies, more trading, and more jobs. Since AIs can be copied, they would be cheap, abundant labor. This means anything a human can do that an AI still can't becomes a scarce, highly valued resource. Companies with thousands or millions of AI instances working for them would likely compete for human labor, because making more humans takes much longer than making more AIs. Then say, after a few years, AIs are able to automate 90% of the remaining 10%. That creates even more productivity, more economic growth, and even more jobs. This could continue for a few decades. Eventually, humans will be rendered completely obsolete, but by that point most of them might be so filthy rich that they won't especially care.
This doesn't mean it'll all be smooth sailing or that humans will be totally happy with this shift. Some people probably won't enjoy having to switch to a new career, only for that new career to be automated away after a few years, and then having to switch again. This will probably be especially true for people who are older, have families, or want a stable and certain future. None of this will be made easier by the fact that it'll probably be hard to tell when true human obsolescence is on the horizon, so some might be in a state of perpetual anxiety, and others in constant denial.
I think my main problem with this is that it isn't based on anything. Countless times, you just reference other blog posts, which reference other blog posts, which reference nothing. I fear a whole lot of people thinking about alignment are starting to decouple themselves from reality. It's starting to turn into the AI version of String Theory. You could be correct, but given the enormous number of assumptions your ideas are stacked on (and that even a few of those assumptions being wrong leads to completely different conclusions), the odds of you even being in the ballpark of correct seem low.
At first I strong-upvoted this, because I thought it made a good point. However, upon reflection, that point is making less and less sense to me. You start by claiming current AIs provide nearly no data for alignment, that they are in a completely different reference class from human-like systems... and then you claim we can get such systems with just a few tweaks? I don't see how you can go from a system that, you claim, provides almost no data for studying how an AGI would behave, to suddenly having a homunculus-in-a-box that becomes superintelligent and kills everyone. Homunculi seem really, really hard to build. By your characterization of how different actual AGI is from current models, it seems this would have to be fundamentally architecturally different from anything we've built so far, not the kind of thing that would be created by near-accident.
Contra One Critical Try: AIs are all cursed
I don't feel like making this a whole blog post, but my biggest source of optimism that we won't need to one-shot an aligned superintelligence is that anyone who's trained AI models knows that AIs are unbelievably cursed. What do I mean by this? I mean even the first quasi-superintelligent AI we get will have so many problems and so many exploits that taking over the world will simply not be possible. Take a "superintelligence" that only had to beat humans at the very constrained game of Go, which is far simpler than the real world. Everyone talked about how such systems were unbeatable by humans, until some humans used a much "dumber" AI to find glaring holes in Leela Zero's strategy. I expect that, in the far more complex "real world", a superintelligence will have even more holes and even more exploits: a kind of "swiss cheese superintelligence". You can say "but that's not REAL superintelligence", and I don't care, and the AIs won't care. But it's likely the thing we'll get first. Patching all of those holes, and finding ways to make such an ASI sufficiently not cursed, will also probably mean better understanding of how to stop it from wanting to kill us, if it wanted to kill us in the first place. I think we can probably get AIs that are sufficiently powerful in a lot of human domains, and can probably even self-improve, and are still cursed, the same way we have AIs with natural language understanding, something once thought to be a core component of human intelligence, that are still cursed. A cursed ASI is a danger for exploitation, but it's also an opportunity.
I'm kind of surprised this has almost 200 karma. This feels much more like a blog post on Substack, and much less like the thoughtful, insightful new takes on rationality that used to get this level of attention on the forum.
Why would it matter if they notice or not? What are they gonna do? EMP the whole world?
I think you're missing the point. If we could establish that all important information had been extracted from the original, would you expect humans to then destroy the original or allow it to be destroyed?
My guess is that they wouldn't. Which I think means practicality is not the central reason why humans do this.
If we could somehow establish how information from the original was extracted, do you expect humans to then destroy the original or allow it to be destroyed?