That’s the path the world seems to be on at the moment. It might end well and it might not, but it seems like we are on track for a heck of a roll of the dice.
I agree with almost everything you've written in this post, but you must have some additional inside information about how the world got to this state, having been on the board of OpenAI for several years, and presumably knowing many key decision makers. Presumably this wasn't the path you hoped that OpenAI would lead the world onto when you decided to get involved? Maybe you can't share specific details, but can you at least talk generally about what happened?
(In addition to satisfying my personal curiosity, isn't this important information for the world to have, in order to help figure out how to get off this path and onto a better one? Also, does anyone know if Holden monitors the comments here? He apparently hasn't replied to anyone in months.)
How much have you looked into other sources of AI risk?
We have a lot of experience and knowledge of building systems that are broadly beneficial and safe, while operating in the human capabilities regime.
What? A major reason we're in the current mess is that we don't know how to do this. For example, we don't seem to know how to build a corporation (or more broadly an economy) such that its most powerful leaders don't act like Hollywood villains (racing for AI to make a competitor 'dance'). Even our "AGI safety" organizations don't behave safely (e.g., racing for capabilities, and handing them over to others, such as Microsoft, with little or no control over how they're used). You yourself wrote:
Unfortunately, given that most other actors are racing for as powerful and general AIs as possible, we won’t share much in terms of technical details for now.
How is this compatible with the quote above?!
Looking forward to your next post, but in the meantime:
My first thought upon hearing about Microsoft deploying a GPT derivative was (as I told a few others in private chat) "I guess they must have fixed the 'making up facts' problem." My thinking was that a big corporation like Microsoft that mostly sells to businesses would want to maintain a reputation for only deploying reliable products. I honestly don't know how to adjust my model of the world to account for whatever happened here... except to be generically more pessimistic?
But it seems increasingly plausible that AIs will not have explicit utility functions, so that doesn’t seem much better than saying humans could merge their utility functions.
There are a couple of ways to extend the argument:
I think AIs with simpler values (e.g., paperclip maximizers) have an advantage with both 1 and 2, which seems like bad news from the perspective of AI risk.
Whereas shard theory seems aimed at a model of human values that’s both accurate and conceptually simple.
Let's distinguish between shard theory as a model of human values, versus implementing an AI that learns its own values in a shard-based way. The former seems fine to me (pending further research on how well the model actually fits), but the latter worries me in part because it's not reflectively stable and the proponents haven't talked about how they plan to ensure that things will go well in the long run. If you're talking about the former and I'm talking about the latter, then we might have been talking past each other. But I think the shard-theory proponents are proposing to do the latter (correct me if I'm wrong), so it seems important to consider that in any overall evaluation of shard theory?
BTW, here are two other reasons for my worries. Again, these may have already been addressed somewhere and I just missed them.
PBR-A, EGY, BTU, ARCH, AMR, SMR.AX, YAL.AX (probably not a good time to enter this last one) (Not investment advice, etc.)
I don't think I understand: what's the reason to expect that the "acausal economy" will look like a bunch of acausal norms, as opposed to, say, each civilization first figuring out what its ultimate values are, encoding them into a utility function, and then merging that utility function with every other civilization's? (Not saying that I know it will be the latter, just that I don't know how to tell at this point.)
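For concreteness, the simplest formal picture of "merging" I have in mind here is a weighted aggregation of the parties' utility functions, with the weights fixed by some bargaining process; this is just a standard illustration, not something taken from the post:

$$U_{\text{merged}}(x) \;=\; \sum_i w_i \, U_i(x), \qquad w_i \ge 0, \quad \sum_i w_i = 1,$$

where $U_i$ is civilization $i$'s utility function and the weights $w_i$ are chosen (e.g., via bargaining) so that each party prefers the merged agent to the no-deal outcome.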
Also, given that I think AI risk is very high for human civilization, and that there's no reason to suspect we're not a typical pre-AGI civilization, most of the "acausal economy" might well consist of unaligned AIs (created accidentally by other civilizations), which makes it seem even harder to reason about what this "economy" looks like.