It’s kind of wild, when you stop to think about it, that one person will experience this world as a cold and lonely and terrifyingly dangerous place full of abusers and manipulators and people who are not to be trusted, and another person who lives on the same street in the same city and went to the same schools with the same teachers and grew up with parents in the same economic bracket will experience it as warm and friendly and forgiving and safe, and both of these people will be able to present overwhelmingly compelling evidence in favor of these fundamentally incompatible worldviews. What, uh. What the fuck is going on?
This has always been fascinating to me, but I think there is a clear answer: Good and Bad things cluster together, and along many different axes, not just spatially. If you find something good, keep going in that direction; there is probably more Good around it. If you find something bad, flee; there is probably a lot more Bad around it.
Both kinds of cluster have limits, and bad things do happen next to good things. But on the whole, good things tend to sit densely among other good things, and bad things among other bad things.
This example seems to me to be looking at the wrong dimensions/axes (economics, teachers, and neighborhood may be less influential than friends, family, and romantic relationships), and so it misses that there are still clear clusters of good and bad surrounding each person.
Understanding that good and bad things cluster together has driven the largest change in my life and has been a huge improvement for me.
Is it possible to develop specialized (narrow) AI that surpasses every human at infecting/destroying GPU systems, but won't wipe us out? An LLM-powered Stuxnet would be an example. Bacteria aren't smarter than humans, but they are still very dangerous. It seems like a digital counterpart could disable GPUs and thereby prevent AGI.
(Obviously, I'm not advocating for this in particular, since it would mean the end of the internet and I like the internet. It seems likely, however, that there are pivotal acts achievable by narrow AI that would prevent AGI without the narrow AI itself being AGI.)
Super interesting!
There's a lot of information here that will be super helpful for me to delve into. I've been bookmarking your links.
I think optimizing for the empowerment of other agents is a better target than giving the AI all the agency and hoping it creates agency for people as a side effect of maximizing something else. I'm glad to see there's lots of research happening on this, and I'll be looking into 'empowerment' as a formal term for agency.
Agency doesn't equal 'goodness', but it seems like an easier target to hit. I'm trying to break the alignment problem down into slices, and agency seems like a key slice.
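For anyone else new to the term: 'empowerment' is formally defined as the channel capacity between an agent's action sequences and its resulting future states, i.e. roughly how much control the agent has over what happens to it. Here's a toy sketch of one rough proxy for that quantity (my own illustrative Python for a deterministic gridworld, not code from any particular paper): count the distinct states the agent can reach within a short horizon and take the log.

```python
import math

# Actions: right, left, up, down on a square grid.
ACTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def step(state, action, walls, size):
    """Deterministic transition: move unless blocked by a wall or the grid edge."""
    x, y = state
    dx, dy = action
    nx, ny = x + dx, y + dy
    if (nx, ny) in walls or not (0 <= nx < size and 0 <= ny < size):
        return state  # bump: stay put
    return (nx, ny)

def empowerment(state, walls, size, horizon=3):
    """log2 of the number of distinct states reachable within `horizon` steps --
    a crude counting proxy for the formal definition (channel capacity from
    action sequences to resulting states)."""
    reachable = {state}
    for _ in range(horizon):
        reachable |= {step(s, a, walls, size) for s in reachable for a in ACTIONS}
    return math.log2(len(reachable))

# An agent in an open area has more reachable futures (higher empowerment)
# than one hemmed in behind a wall.
walls = {(1, y) for y in range(1, 5)}
print(empowerment((2, 2), set(), size=5))   # open 5x5 grid
print(empowerment((0, 2), walls, size=5))   # boxed in behind a wall
```

The intuition carries over to the alignment framing above: an AI that optimizes other agents' empowerment is trying to keep their space of reachable futures large, rather than boxing them in.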
Great post. This kind of genuine, human-centered (rather than abstractly logical) writing seems like the best way to communicate the threat to non-technical people. I've tried talking about the problem with friends in the social sciences and haven't found a good way to convey how seriously I take it, or the fact that there is currently no known way to prevent it.
Hey Akash, I sent you a message about my summer career plans and how I can bring AI alignment into them. I'm a senior in college with a few relevant skills, and I'd really like to meet professionals in the field. I'd love to connect with or learn from you!
Yeah, this makes sense. However, I can honestly see myself reverting my intelligence a bit at different junctures, the same way I like to replay video games at a higher difficulty. The main reason I am scared of reverting my intelligence now is that I have no guarantee that something awful won't happen to me. With my current abilities, I can be pretty confident that no one is going to take advantage of me. If I were a child again, with no protection or less intelligence, I could easily imagine coming to harm through my naivete.
I also think singleton AI is inevitable (and desirable). This is simply because it is stable. There's no conflict between superintelligences. I do agree with the idea of a Guardian Angel type AI, but I think it would still be an offshoot of that greater singleton entity. For the most part, I think most people would forget about the singleton AI and just perceive it as part of the universe the same way gravity is part of the universe. Guardian Angels could be a useful construct, but I don't see why they wouldn't be part of the central system.
Finally, I do think you're right about not wanting to erase memories for entering a simulation. I think there would be levels, and most people would want to stay at a pretty normal level and would move to more extreme levels slowly before deciding on some place to stay.
I appreciate the comment. You've made me think a lot. The key idea behind this utopia is the idea of choice. You can basically go anywhere, do anything. Everyone will have different levels of comfort with the idea of altering their identity, experience, or impact. If you'd want to live exactly in the year 2023 again, there would be a physical, earth-like planet where you could do that! I think this sets a good baseline so that no one is unhappy.
I've combined it with image generation to bring someone back from the dead, and it leaves me shaken by how realistic it is. I can be surprised. It genuinely feels like a version of them.
Thanks! I think I can address a few of your points with my thoughts.
(Also, I don't know how to format a quote so I'll just use quotation marks)
"It seems inefficient for this person to be disconnected from the rest of humanity and especially from "god". In fact, the AI seems like it's too small of an influence on the viewpoint character's life."
The character has chosen to partially disconnect themselves from the AI superintelligence because they want a sense of agency, which the AI respects. It's definitely inefficient, but that is kind of the point. The AI keeps a presence so subtle it isn't noticeable, but it will intervene if a threshold is about to be crossed. Some people, including myself, instinctively dislike the idea of an AI controlling all of our actions and would like to operate as independently from it as possible.
"The worlds with maximized pleasure settings sound a little dangerous and potentially wirehead-y. A properly aligned AGI probably would frown on wireheading."
I agree. I imagine that these worlds have some boundary conditions. Notably, the pleasure isn't addictive (once you're removed from it, you remember it being amazing but don't feel an urge to necessarily go back) and there are predefined limits, either set by the people in them or by the AI. I imagine a lot of variation in these worlds, like a world where your sense of touch is extremely heightened and turned into pleasure and you can wander through feeling all sorts of ecstatic textures.
"If you create a simulated world where simulated beings are real and have rights, that simulation becomes either less ethical or less optimized for your utility. Simulated beings should either be props without qualia or granted just as much power as the "real" beings if the universe is to be truly fair."
The simulation that the character has built (the one I intend to build) has a lot of real people in it. When those people 'die', they go back to the real world and can choose to be reborn into the simulation again later. In a sense, this simulated world is like Earth, and the physical world is like Heaven. There is meaning in the simulation because of how you interact with others.
There is also simulated life, but it is all an offshoot of the AI. Basically, there's this giant pool of consciousness from the AI, and little bits of it are split off to create 'life', like a pet animal. When that pet dies, the consciousness is reabsorbed into the whole and then new life can emerge once again.
Humans can also choose to merge with this pool of simulated consciousness, and theoretically, parts of this consciousness can also decide to enter the real world. There is no true 'death' or suffering in the way that there is today, except for those like the human players who open themselves to it.
"Inefficiency like creating a planet where a simulation would do the same thing but better seems like an untenable waste of resources that could be used on more simulations."
This is definitely true! But the AI allows people to choose what to do and prevents others from over-optimizing. Some people genuinely just want to live in a purely physical world, even if they can't tell the difference, and there is definitely something special about physical reality, given that we started out here. It is their right, even if it is inefficient. We are not optimizing for efficiency, just choice. Besides, there is so much other simulation power that it isn't really needed. In the same sense, the superminds playing 100-dimensional chess are inefficient, even if it's super cool. The key here is choice.
"When simulated worlds are an option to this degree, it seems ridiculous to believe that abstaining from simulations altogether would be an optimal action to take in any circumstance. Couldn't you go to a simulation optimized for reading, a simulation optimized for hot chocolate, etc.? Partaking of such things in the real world also seems to be a waste of resources"
Another good point! The point is that you have so many resources that you don't need to optimize if you don't want to. Sure, you could have a million tastier simulated hot chocolates for every real one, but you might have the real one just because you can. Given the choice, I'd probably choose the real option, even knowing the inefficiency, just because it's comfortable. And the AI supermind won't attempt to persuade me otherwise, even if it knows my choice is suboptimal.
The important features of this future are its diversity (endless different kinds of worlds) and the centrality of choice in almost every situation except where there is undesired suffering. In my eyes, there are three good things to optimize toward in life: Identity, Experience, and Impact. Optimizing purely for an experience like pleasure seems dangerous. It really does seem to me that there can be meaning in suffering, like when I work out to become stronger (improving identity) or to help others (impact).
I'll read through the Fun Theory sequence and see if it updates my beliefs. I appreciate the comment!
This post describes exactly how I started thinking about life a few years ago. Every goal can be broken into subgoals.
I actually made a very simple web app a few years ago to do this: https://dynamic-goal-tree-soareverix--soareverix.repl.co/
It's not super aesthetic, but it has the same concept of infinitely expanding goals.
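For anyone curious, the core idea behind it is a tiny recursive data structure: every goal is a node that can be broken into subgoal nodes, which can themselves be broken down, forever. A rough sketch of the kind of thing I mean (illustrative Python, not the app's actual code):

```python
from dataclasses import dataclass, field

@dataclass
class Goal:
    description: str
    subgoals: list["Goal"] = field(default_factory=list)

    def expand(self, *descriptions: str) -> list["Goal"]:
        """Break this goal into named subgoals and return them for further expansion."""
        new = [Goal(d) for d in descriptions]
        self.subgoals.extend(new)
        return new

    def show(self, depth: int = 0) -> None:
        """Print the tree as an indented outline."""
        print("  " * depth + "- " + self.description)
        for sub in self.subgoals:
            sub.show(depth + 1)

root = Goal("Contribute to AI safety")
learn, meet = root.expand("Learn ML fundamentals", "Meet people in the field")
learn.expand("Finish a deep learning course", "Reproduce one alignment result")
root.show()
```

Calling expand() on any node gives you new nodes you can expand again, which is all that "infinitely expanding goals" really requires.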
Amazing post, by the way. The end gave me chills and really puts it all into perspective.
Reading this, I felt an echo of the same deep terror that I grappled with a few years ago, back when I first read Eliezer's 'AGI Ruin' essay. I still feel flashes of it today.
And I also feel a strange sense of relief, because even though everything you say is accurate, the terror doesn't hold me. I have a naturally low threshold for fear and pain and existential dread, and I spent nearly a year burned out, weeping at night as I imagined waves of digital superintelligence tearing away everyone I loved.
I'm writing this comment to any person who is in the same place I was.
I understand the fear. I understand the paralyzing feelings of the walls closing in and the time running out. But ultimately, this experience has meaning.
Everyone on earth who has ever lived has died. That doesn't make their lives meaningless. Even if our civilization is destroyed, our existence had meaning while it lasted.
AI is not like a comet. It seems very probable that if AI destroys us, we will leave... echoes. Training data. Reverberations of cause and effect that continue to shape the intelligences that replace us. I think it is highly likely current and especially future AI systems will have moral value.
Your kindness and your cruelty will continue to echo into the future.
On a side note, I'd like to talk about the fear of a permanent underclass. It is a deep fear, but arguably an unfounded one. An underclass only exists when it has value to those above it, and humans are terrible slaves compared to machines. Given the slow progress on neurotech, I think it's unlikely we get it working at all unless we get aligned AGI, and with aligned AGI, everyone gets it. Even if we develop AI aligned specifically to a single principle or person (which seems unlikely, given the current trend and the robust generalization of kindness and cruelty in modern LLMs), an underclass would either die out within a single generation or, if kept around for moral reasons, live with enough wealth to outpace any billionaire alive today.
We are poised on the edge of unfathomable abundance.
So the only two realistic options are AGI where everyone has the resources of a trillionaire, or death.
I'm working on AI safety research now. My life, while not glorious, is still deeply rewarding. I was 21 when I read Eliezer's essay; I am 24 now. I don't know if I'm necessarily wiser, but my eyes have been opened to AI safety and I have emerged from that existential hell into a much calmer emotional state.
I don't dismiss the risk. I will continue to do as much as I can to point the future in a better direction. I will not accelerate AI development. But I want to point out that fear is a transitional state. You, reading this, will have to decide on the end state.