Engineer working on next-gen satellite navigation at Xona Space Systems. I write about effective-altruist and longtermist topics at nukazaria.substack.com, or you can read about puzzle videogames and other things at jacksonw.xyz
I think that jumping straight to big-picture ideological questions is a mistake. But for what it's worth, I tried to tally up some pros and cons of "ideologically who is more likely to implement a post-scarcity socialist utopia" here; I think it's more of a mixed bag than many assume.
I think when it comes to the question of "who's more likely to use AGI to build fully-automated luxury communism", there are actually a lot of competing considerations on both sides, and it's not nearly as clear-cut as you make it out to be.
Xi Jinping, the leader of the CCP, seems like kind of a mixed bag:
Meanwhile, on the American side, I'd probably agree with you that the morality of America's current national leaders leaves much to be desired, to put it lightly. Personally, I would give Trump maybe only 1 or 1.5 points out of three on my earlier criteria of "fundamentally humanist outlook + not a psychopath + not a destructive idiot".
Finally, I would note that you are basically raising concerns about humanity's "gradual disempowerment" through misaligned economic and political processes, AI concentration-of-power risks where a small cadre of capricious national leaders and insiders gets to decide the fate of humanity, etc. Per my other comment in this thread, these types of AI safety concerns seem to be discussed, right now, almost exclusively in the West, and not in China. (This particular gradual-disempowerment stuff seems even MORE lopsided in favor of the West, even compared to superintelligence / existential-risk concerns in general, which are already more lopsided in favor of the West than the entire category of AI safety overall.) So... maybe give some weight to the idea that if you are worried about a big problem, the problem might be more likely to get solved in the country where people are talking about the problem!
@Tomás B. There is also vastly less of an "AI safety community" in China -- probably much less AI safety research in general, and, in percentage terms, much less of it aimed at thinking ahead about superintelligent AI. (ie, more of China's "AI safety research" is probably focused on things like reducing LLM hallucinations, making sure models don't make politically incorrect statements, etc.)
When people ask this question about the relative value of "US" vs "Chinese" AI, they often go straight for big-picture political questions about whether the leadership of China or the US is more morally righteous, less likely to abuse human rights, et cetera. Personally, in these debates, I do tend to favor the USA, although certainly both the US and China have many deep and extremely troubling flaws -- both seem very far from the kind of responsible, competent, benevolent entity to whom I would like to entrust humanity's future.
But before we even get to that question of "What would national leaders do with an aligned superintelligence, if they had one," we must answer the question "Do this nation's AI labs seem likely to produce an aligned superintelligence?" Again, the USA leaves a lot to be desired here. But oftentimes China seems to not even be thinking about the problem. This is a huge issue from both a technical perspective (if you don't have any kind of plan for how you're going to align superintelligence, perhaps you are less likely to align superintelligence), AND from a governance perspective (if policymakers just think of AI as a tool for boosting economic / military progress and haven't thought about the many unique implications of superintelligence, then they will probably make worse decisions during an extremely important period in history).
Now, indeed -- has Trump thought about superintelligence? Obviously not -- just trying to understand intelligent humans must be difficult for him. But the USA in general seems much more full of people who "take AI seriously" in one way or another -- Silicon Valley CEOs, Pentagon advisers, billionaire philanthropists, et cetera. Even in today's embarrassing administration, there are very high-ranking people (like Elon Musk and J. D. Vance) who seem at least aware of the transformative potential of AI. China's government is more opaque, so maybe they're thinking about this stuff too. But all public evidence suggests to me that they're kinda just blindly racing forward, trying to match and surpass the West on capabilities, without giving much thought as to where this technology might ultimately go.
Thanks for this informative review! (May I suggest that The Witness is a much better candidate for "this generation's Myst"!)
That's a good point -- the kind of idealized personal life coach / advisor Dario describes in his post "Machines of Loving Grace" is definitely in a sense a personality upgrade over Claude 3.7. But I feel like when you think about it more closely, most of the improvements from Claude to ideal-AI-life-coach are coming from non-personality improvements, like:
Semi-related: if I'm reading OpenAI's recent post "How we think about safety and alignment" correctly, they seem to announce that they're planning on implementing some kind of AI Control agenda. Under the heading "iterative development" in the section "Our Core Principles" they say:
In the future, we may see scenarios where the model risks become unacceptable even relative to benefits. We’ll work hard to figure out how to mitigate those risks so that the benefits of the model can be realized. Along the way, we’ll likely test them in secure, controlled settings. We may deploy into constrained environments, limit to trusted users, or release tools, systems, or technologies developed by the AI rather than the AI itself.
Given the surrounding context in the original post, I think most people would read those sentences as saying something like: "In the future, we might develop AI with a lot of misuse risk, ie AI that can generate compelling propaganda or create cyberattacks. So we reserve the right to restrict how we deploy our models (ie giving the biology tool only to cancer researchers, not to everyone on the internet)."
But as written, I think OpenAI intends the sentences above to ALSO cover AI control scenarios like: "In the future, we might develop misaligned AIs that are actively scheming against us. If that happens, we reserve the right to continue to use those models internally, even though we know they're misaligned, while using AI control techniques ('deploy into constrained environments, limit to trusted users', etc) to try and get useful superalignment work out of them anyways."
I don't have a take on the pros/cons of a control agenda, but I haven't seen anyone else note this seeming policy statement of OpenAI's, so I figured I'd write it up.
Maybe this is obvious / already common knowledge, but I noticed that OpenAI's post seems to be embracing an AI control agenda for their future superalignment plans. The heading "iterative development" in the section "Our Core Principles" says the following (emphasis mine):
It’s an advantage for safety that AI models have been growing in usefulness steadily over the years, making it possible for the world to experience incrementally better capabilities. This allows the world to engage with the technology as it evolves, helping society better understand and adapt to these systems while providing valuable feedback on their risks and benefits. Such iterative deployment helps us understand threats from real world use and guides the research for the next generation of safety measures, systems, and practices.
**In the future, we may see scenarios where the model risks become unacceptable even relative to benefits. We’ll work hard to figure out how to mitigate those risks so that the benefits of the model can be realized. Along the way, we’ll likely test them in secure, controlled settings. We may deploy into constrained environments, limit to trusted users, or release tools, systems, or technologies developed by the AI rather than the AI itself.** These approaches will require ongoing innovation to balance the need for empirical understanding with the imperative to manage risk. For example, making increasingly capable models widely available by sharing their weights should include considering a reasonable range of ways a malicious party could feasibly modify the model, including by finetuning (see our 2024 statement on open model weights). We continue to develop the Preparedness Framework to help us navigate and react to increasing risks.
I think most people would read those bolded sentences as saying something like: "In the future, we might develop AI with a lot of misuse risk, ie AI that can generate compelling propaganda or create cyberattacks. So we reserve the right to restrict how we deploy our models (ie giving the biology tool only to cancer researchers, not to everyone on the internet)."
But as written, I think OpenAI intends the sentences above to ALSO cover scenarios like: "In the future, we might develop misaligned AIs that are actively scheming against us. If that happens, we reserve the right to continue to use those models internally, even though we know they're misaligned, while using AI control techniques ('deploy into constrained environments, limit to trusted users', etc) to try and get useful superalignment work out of them anyways."
I don't have a take on the broader implications of this statement (trying to get useful work out of scheming AIs seems risky but also plausibly doable, and other approaches have their own risks, so idk). But I haven't seen anyone else note this seeming policy statement of OpenAI's, so I figured I'd write it up.
I enjoyed this post, which feels to me like part of a cluster of recent posts pointing out that the current LLM architecture is showing some limitations, that future AI capabilities will likely be quite jagged (thus more complementary to human labor, rather than perfectly substituting for labor as a "drop-in remote worker"), and that a variety of skills around memory, long-term planning, agenticness, etc, seem like important bottlenecks.
(Some other posts in this category include this one about Claude's abysmal Pokemon skills, and the section called "What I suspect AI labs will struggle with in the near term" in this post from Epoch).
Much of this stuff seems right to me. The jaggedness of AI capabilities, in particular, seems like something we should've spotted much sooner (indeed, it feels like we could've gotten most of the way there just from first-principles reasoning), but which has been obscured by the use of helpful abstractions like "AGI" / "human-level AI", or even more rigorous formulations like "when X% of tasks in the economy have been automated".
I also agree that it's hard to envision AI transforming the world without a more coherent sense of agency / ability to play Pokemon / etc, although I'm agnostic about whether we'll be able to imbue LLMs with agency via tinkering with scaffolds and training with RL (as discussed elsewhere in this comment thread). At least mild versions of agency seem pretty doable with RL -- just train on a bunch of videogames and web-browsing tasks, and I expect AI to get pretty good at completing videogames and web tasks. But whether that'll scale all the way to being able to manage large software projects and do people's entire jobs autonomously, I dunno.
If math is solved, though, I don't know how to estimate the consequences, and it might invalidate the rest of my predictions.
...there's an explicit carve-out for ??? consequences if math is solved
This got me curious, so I talked to Claude about it. Unfortunately it seems like some of the biggest real-world impacts of "solving math" might come in the form of very significant AI algorithmic improvements, which might obviate some of your other points! (Also: the state of cybersecurity might be thrown into chaos, quant trading would get much more powerful, albeit not infinitely powerful, and assorted scientific tools could see big improvements.) Here is my full conversation; for the most interesting bit, scroll down to Claude's final response (ctrl-f for "Category 1: Direct Mathematical Optimization").
Improved personality is indeed a real, important improvement in the models, but (compared to traditional pre-training scaling) it feels like more of a one-off "unhobbling" than something we should expect to keep driving improved performance in the future. Going from pure next-token-predictors to chatbots with RLHF was a huge boost in usefulness. Then, going from OpenAI's chatbot personality to Claude's chatbot personality was a noticeable (but much smaller) boost. But where do we go from here? I can't really imagine a way for Anthropic to improve Claude's personality by 10x or 100x (whatever that would even mean), whereas I can imagine scaling RL to improve a reasoning model's math skills by 100x.
My thoughts about the story, perhaps interesting to any future reader trying to decipher the "mysterianism" here, a la these two analyses of sci-fi short stories, or my own attempts at exegeses of videogames like Braid or The Witness. Consider the following also as a token of thanks for all the enjoyment I've received from reading Gwern's various analyses over the years.