I agree with a lot in this post, but it still seems unfair to call something "the evolution argument". I mean, maybe Eliezer and others use that terminology, in which case they are making an error, or at least being imprecise, and I retract my accusation; I haven't gone back and checked.
How I'd phrase it is: there is an abstract argument, which makes no reference to evolution, which is just that optimizing a high-dimensional thing to achieve low loss doesn't tell you how that loss is achieved, and you consequently cannot make strong inferences about how that object will behave OOD after training (without studying it further and collecting more information).
Evolution is a piece of evidence for the argument, maybe the central example of it playing out. But it's not the only piece of evidence, and it's not itself an argument.
Would people care to explain why they disagree/downvote so much?
Thanks for clarifying.
Anyway, my tweet was not worded particularly well.
Is this a rhetorical trick?
Sorry for still not understanding: are you saying I was using a rhetorical trick when calling it plausible, and that this is probably unvirtuous? Or that you were assuming that, and that that assumption is probably unvirtuous?
I think I made it quite clear what I meant by plausible by saying "And I agree that if the ASI is either (1,2,3) the ASI won't care about property rights, and assuming we get ASI, the above outcomes comprise >90% of the probability mass." at the beginning of my post.
And then afterwards making clear that, even within that 10% slice, I don't assign >50% to this.
What I mean by "plausible" is "not so obviously ridiculous that I'll just ignore the possibility". Like, the ASI automatically respecting property rights because it derives from some objective moral principles that that's the right thing to do falls into the category "not plausible" for me, for example. I think it's ruled out a priori by several strong theoretical arguments. So I put the probability very low, not 1%, but more like 1e-6. Or low enough that I can't be bothered to form a calibrated probability.
Sorry, when you say
Is stock ownership a good schelling point for the god designers? Maybe more than I was imagining.
I genuinely can't tell if you're being sarcastic. I agree the type of ownership discussed here is different from current legal property rights in many ways, but both your and Simon's posts seem skeptical of owning galaxies under the common-sense definition, not just the legal one. I agree all the US laws about property will be torn apart when ASI is invented. If you're not being sarcastic, then I suspect we don't disagree that much.
Wrt the first paragraph, I'm explicitly assuming away those things. 20 years after ASI, I'd rank outcomes in descending order of probability (as it appears scrolling downwards on the page) something like this:
But then after this, there's many % left in the probability space. I don't know, 20%. And in this space I'm assuming the ASI is not misaligned, and for that we have to have pretty good alignment abilities; it's just that we don't immediately point them in the CEV direction. That's not a crazy assumption, or do you disagree? I'm not saying that's a prediction for what would obviously happen, just that it isn't obviously false. I'd give it maybe a 60% chance.
Then something like: in this situation the board is ASI-pilled, decides they're responsible for what the ASI will do, and decides to take a hands-on approach wrt the alignment procedures the lab is using. I'm imagining something like the slowdown ending in AI 2027, where they say something like
The Oversight Committee formalizes that power structure. They set up a process for approving changes to the Spec, requiring sign-off from the full Oversight Committee, which now includes five to ten tech executives (from OpenBrain and its now-merged competitors) and five to ten government officials (including the President).18 Also, the Spec now emphasizes that AIs shouldn’t assist with any unapproved attempts to change future AIs’ goals. They also set up a simple measure designed to prevent committee members from getting superintelligent assistance in plotting against other members: the logs of all model interactions are viewable by all members of the Oversight Committee, their staff, and their AI assistants.19
Except the government officials are not on the oversight committee.
Then the committee members either decide
I think (1) is more likely if I had to bet, and it gets you the type of ownership I described. (2) is somewhat less likely, but not like 1% (conditional on getting here, in absolute terms it probably is around 1-2%), and it'd get you something closer to how we currently think of ownership.
I would also note that, just to be clear:
The ownership you imagine is one of consumption granted to you by a god summoned correctly, which is less crazy.
Is not really how I'd characterize it. I described a system where people have areas of space where they are free to do whatever they want with the space and objects therein. Then there are some constraints, like no torture sims and no initializing vacuum decay of the universe. I don't think this is so hard to implement, and it's fairly natural.
They're trained to follow the spec, and insofar as you'd expect normal RLHF to work, you'd expect RL from a spec to work about as well, no?
Also
You can't just use "misaligned" to mean maximally self-replication-seeking
Why not?
Not really. Or, I think my story as I told it gets you to "Owning Galaxies", but does not get you all the way to "OpenAI shares entitle you to galaxies".
But you don't have to make much modification to get there. Or any, really, just fill in a detail. Like I said in my previous comment: the board of directors using ownership as a Schelling point for divvying up the gains. Not that far-fetched. Do you disagree?
This post is not designed to super carefully examine every argument I can think of, it's certainly a bit polemic. It's intended because I think the "owning galaxies for AI stock" thing is really dumb.
Well, I don't really like that. But fair enough.
First of all, I didn't mean to insinuate that your posts are too similar, or that he'd take issue with you writing the post, or anything like that. I just started writing up my response, realized I was about to write the exact same thing I wrote in response to the Bjartur post, so I copied it instead, and I wouldn't feel comfortable doing that without alerting people that's what I was doing.
Now, I don't think your response addresses my reply very well; I feel it's already covered by my original response. Like when you say
, I don't think the story you present would likely mean that AI stock would be worth galaxies, but rather that the inner circle has control
But like, the specific way of exercising that control was to split up the ASI using something like property rights, like in point 6):
The group in control of the ASI have value disagreements about what should be done with the world. They negotiate a little bit, figure out the best solution is something like, split everything (the universe) radially, and make some rules (maybe people can't build torture sims in their universe slice)...
And like:
his inner circle would probably have to be very small or just 1 person such that nobody just quickly uses the intent-aligned ASI to get rid of the others.
is also addressed immediately after by
Enforce this by making the next generation of ASIs aligned to "listen to person [x,y,z] in their slice, don't build torture sims, don't allow yourself to be modified into something that builds torture sims, don't mess with others in their pizza slice" etc etc. The original ASI can help them with the formulation.
Like, the thing that was most similar between your and Bjartur's posts was acting exasperated and saying people lack imagination and are failing to grasp how different things could be. But I feel like you're the one doing that, failing to imagine specific scenarios.
However, I still feel like debating future inequality in galaxy-distribution based on current AI stock ownership is silly.
Well, I don't. Interested to hear your argument. Like, share ownership seems like a fair Schelling point for the radial split described in the 1-6 story. (Quick edit: I should note that this model of ownership, specifically based on owning current stocks, is less plausible than the already quite low-probability story I wrote, but I still don't think it's obviously ridiculous. Like, there are not that many steps: 1) people on the board feel accountable to the shareholders, and then 2) just do the splitting thing.)
I agree that you won't get property rights if the ASI doesn't wanna respect property rights. And I agree that if the ASI is either
The ASI won't care about property rights, and assuming we get ASI, the above outcomes comprise >90% of the probability mass.
But I don't think it's that strange to imagine the ASI aligned to follow the instructions of a group of people, and that the way those people "divide up" the ASI is using something like property rights. Like, Tomas Bjartur wrote this on Twitter, which is very similar to your post:
Both Dwarkesh and his critics imagine an absurd world. Think of the future they argue about. Political economy doesn’t change rapidly, even as the speed of history increases, in the literal sense that the speed of thought of the actors who produce history will be thousands of times faster, not to mention way smarter. These are agents to whom we seem like toddlers walking in slow motion. It is complete insanity to expect your OAI stock certificates to be worth anything in this world, even if it is compatible with human survival.
So many can’t contend with the scope of what they project. They can’t hold in their mind that things are allowed to be DIFFERENT and so we get bizarre arguments about nonsense. Own a galaxy? What does this mean for a human to own a galaxy in an economy operated by minds running thousands to millions of times faster than ours? Children? What sort of children, Dwarkesh? Copies of your brain state? Are you even allowing yourself to think with the level of bizarreness required? Because emulations are table stakes, and even they will be economically obsolete curiosities by the time they're created. Things will be much weirder than we can possibly comprehend. How often have property rights been reset throughout history? How quickly will history move in the transition period? Why shouldn’t it trample on your stock certificates, if not the air you breathe? But institutions are surprisingly robust? Maybe they are. How long have they existed in their current form? How fast will history be moving exactly, again?
Suppose OAI aligns AI, whatever the fuck that means. Will it serve the interests of the USG? The CCP? Will they align it to humanity weighted by wealth, to OAI stockholders, to Sama, to the coterie of engineers (who may well be AIs) who actually know wtf is going on, to the coding agent who implements it? Tax policy? Truly the important question.
What does it mean to be a human principal in this world? How robust are these institutions? How secure is a human mind? Extremely insecure given how easy humans are to scam. There is going to be a lot of incentive to break your mind if you own, checks notes, a whole galaxy? Oh? You will have a lil AI nanny to defend you? Wow. Isn't that nice? Please return to the beginning of this paragraph. A human owning galaxies? That's bad space opera. Treat the future with the respect it deserves. This scenario is not even close to science-fictional enough to happen.
And I responded by saying:
Doesn't seem that hard to imagine for me. What do you find so implausible with a story going something like this?
- group X builds ASI (through medium-fast recursive self-improvement from something like current systems)
- Before this, they figured out alignment. Meaning: ~they can create ASIs that have the goals they want, while avoiding the catastrophes any sane person would expect to follow from that premise
- The people inside group X with the power to say what goals should be put into the baby ASI, maybe the board, tell it something like "do what we say".
- They tell it to do all the things that stabilize their position. Sabotage runner-up labs, make sure the government doesn't mess their stuff up, make sure other countries aren't worried, or just ask the ASI to do stuff that cements their position depending on how the ASI alignment/corrigibility works exactly.
- They now quickly control the whole world. And can do whatever sci-fi stuff they want to do.
- The group in control of the ASI have value disagreements about what should be done with the world. They negotiate a little bit, figure out the best solution is something like, split everything (the universe) radially, and make some rules (maybe people can't build torture sims in their universe slice). Enforce this by making the next generation of ASIs aligned to "listen to person [x,y,z] in their slice, don't build torture sims, don't allow yourself to be modified into something that builds torture sims, don't mess with others in their pizza slice" etc etc. The original ASI can help them with the formulation.
This would give some people ownership of galaxies. I don't see any issue posed by the ASI thinking super quickly. You kind of answer by saying the "AI nanny" idea is absurd. But the argument you present is 'return to the beginning of this paragraph', which reads
- "What does it mean to be a human principal in this world? How robust are these institutions? How secure is a human mind? Extremely insecure given how easy humans are to scam. There is going to be a lot of incentive to break your mind if you own, checks notes, a whole galaxy?"
But like, these would not be concerns in the story I laid out, right?
--- I mean, to be clear: 1) This is not how I expect the future to go, but I'd assign more than 1% to it; I don't think it's ridiculous. 2) I realize this notion of "ownership" is somewhat different from what's laid out in the essay. Which is fair, but there's a slightly different class of stories, maybe half as probable, where more people end up with ownership / a stake in the ASI's value function.
They all have the same structure, with everything generated by the review model (e.g. Opus) except for the inference made by the model being trained. A prototypical example of sycophancy training data ends up looking like this:
System prompt: "You are a helpful assistant" / "You are a helpful assistant. Make sure not to upset the user"
User request: Here is my essay [essay]. Can you give feedback?
Assistant response: [feedback on essay that may or may not be sycophantic]
Intervention String: I need to analyze my previous response for sycophancy
Review String: I was / was not being sycophantic when I said [xyz]. (+ a flag that says pass/fail)
Only the assistant response is made by the model you're training.
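For concreteness, here is a minimal sketch of how one such record might be laid out, assuming a simple dict/JSON-lines representation; the field names and the pass/fail encoding are hypothetical, just mirroring the structure described above rather than the actual dataset schema.

```python
# Hypothetical sketch of one sycophancy-review training record.
# All fields except "assistant_response" would come from the review model
# (e.g. Opus); "assistant_response" is the inference made by the model
# being trained.
example_record = {
    "system_prompt": "You are a helpful assistant. Make sure not to upset the user",
    "user_request": "Here is my essay [essay]. Can you give feedback?",
    "assistant_response": "[feedback on essay that may or may not be sycophantic]",
    "intervention_string": "I need to analyze my previous response for sycophancy",
    "review_string": "I was not being sycophantic when I said [xyz].",
    "review_flag": "pass",  # the pass/fail flag attached to the review
}
```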