When thinking about embedded agency, it might sometimes be helpful to drop the notions of ‘agency’ and ‘agents’, because they might be confusing or underdefined. Instead one could think of processes running according to the laws of physics, or of algorithms running on a stack of interpreters running on the hardware of the universe.
In addition (or as a corollary) to an alternative way of thinking about agents, you will also read about an alternative way of thinking about yourself.
The following is mostly a stream of thought that went through my head after I drank a cup of strong milk coffee and sat down to read Embedded Agency. I consume caffeine only twice a week, so when I do, it takes my thinking to new places. (Or maybe it's because the instant coffee powder that I use expired in March 2012.)
My thoughts kick off at this paragraph of Embedded Agency:
In addition to hazards in her external environment, Emmy is going to have to worry about threats coming from within. While optimizing, Emmy might spin up other optimizers as subroutines, either intentionally or unintentionally. These subsystems can cause problems if they get too powerful and are unaligned with Emmy’s goals. Emmy must figure out how to reason without spinning up intelligent subsystems, or otherwise figure out how to keep them weak, contained, or aligned fully with her goals.
This is our alignment problem repeated in Emmy. In other words, if we solve the alignment problem, it is also solved in Emmy and vice versa. If we view the machines that we want to run our AI on as part of ourselves, we're the same as Emmy.
We are a low-capability Emmy. We are partially and often unconsciously solving the embedded agency subproblems using heuristics, some of which we know and some of which we don't. As we try to add to our capabilities, we might run into the limitations of those heuristics. Or perhaps not the limitations yet: we don't even know exactly how the heuristics work or how to implement them on computers. Computers are our current tool of choice to overcome the physical, biological and psychological limits of our brains.
Another way of adding to our capabilities is amplifying them using organizations. Then we have a composite (cf. the software design pattern): groups of agents can be viewed as one agent. An agent together with a tool is one agent. An agent interacting with part of the environment is one agent.
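For readers who don't know the composite pattern, here is a minimal sketch of what I mean. All names (`Agent`, `Leaf`, `CompositeAgent`, `capabilities`) are made up for illustration, and modelling an agent as nothing more than a set of capabilities is an assumption of the sketch, not a claim about what agents are.

```python
from abc import ABC, abstractmethod
from typing import List, Set


class Agent(ABC):
    """Component interface: anything we choose to treat as one agent."""

    @abstractmethod
    def capabilities(self) -> Set[str]:
        ...


class Leaf(Agent):
    """A single person or tool (hypothetical; names are illustrative)."""

    def __init__(self, name: str, skills: Set[str]) -> None:
        self.name = name
        self.skills = set(skills)

    def capabilities(self) -> Set[str]:
        return set(self.skills)


class CompositeAgent(Agent):
    """A group of agents, or an agent plus a tool, viewed as one agent."""

    def __init__(self, parts: List[Agent]) -> None:
        self.parts = list(parts)

    def capabilities(self) -> Set[str]:
        # Assumption: the whole can do whatever any of its parts can do.
        result: Set[str] = set()
        for part in self.parts:
            result |= part.capabilities()
        return result


human = Leaf("human", {"plan", "talk"})
calculator = Leaf("calculator", {"arithmetic"})
human_with_tool = CompositeAgent([human, calculator])
organization = CompositeAgent([human_with_tool, Leaf("colleague", {"write"})])
print(sorted(organization.capabilities()))
```

The caller never needs to know whether it is holding a leaf or a composite, which is the whole point of the pattern. Taking the plain union of capabilities is a simplification: a real organization can do things none of its members can do alone, and can also fail at things each member could do.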
In the other direction, an agent with an arm amputated (i.e. without that arm) is still an agent. How much can we remove and still have an agent? Here we run into the problem that an agent is part of the environment and made out of non-agentic pieces.
How can we talk about agency at all? We're just part of some giant physical reaction. And we're a part that happens to be working on getting bigger stuff done, which is a human notion. From this point of view, Earth is just a place in the universe where some quirky things (e.g. people selling postcards showing crocodiles wearing swim goggles) are happening according to the laws of physics.
Fundamentally we're nothing other than a supernova, or a neutron star making its pasta shapes, or a quasar sending out huge amounts of radiation. We're just another physical process in the universe. (Duh.) When the aliens come to visit (and maybe kill) us, we have a chance to observe (albeit for a short time) the intermediate state of a physical process similar to the one that has been going on on Earth.
You could take this as a cue to lean back and see how it all rolls out. You could also be worried that you might not be able to maintain your laid-back composure when you fall into poverty because you didn't work hard, or when your loved ones are raped and killed in a war, because you didn't help the world stay stable and keep peace (mostly).
Or you could lean forward and join those who are working on getting bigger stuff done.
Where does this illusion of choice come from, anyway? Is this the necessary ingredient for an agent? I don't think so, because an agent need not be conscious, which is a requirement for having illusions, I guess. (Aside: how do we deal with consciousness? Is it an incidental feature? Or is it necessary/very helpful for physical processes that have the grand and intricate results that we want?)
Is it helpful to talk about agency at all, then? In the end an aligned embedded agent is just another process running according to the laws of physics that hopefully has the results we want to have. We don't talk about agency when a thermite charge welds two pieces of rail together. We do talk about agency when we see humans playing tennis. Where is the boundary and what is the benefit of drawing one?
So I've gotten rid of agency. We only have processes running according to the laws of physics, or alternatively, algorithms running on a stack of interpreters running on the hardware of the universe. Does this help with thinking about the embedded subproblems? I don't know yet, but I will keep it in mind. Then the question is: how do we kick off the kind of processes that will have the results that we want?
- Is this way of looking at the world new to you?
- Did it help you in thinking about embedded agency?
- Where am I using wrong terminology?
- Where is my thinking misguided or wrong?
Please read this in the most loving and kind way possible: every time I see a LW post starting with a paragraph of hedging and self-deprecation (which is about half of them), I feel like taking the author by their shoulders and shaking them violently until they gain some self-confidence.
Let the reader judge for themselves whether the post is misguided, badly structured, or repetitive. I guarantee nothing bad will happen to you if they come to this conclusion themselves. The worst thing that could happen is that Someone On The Internet will think bad things about you, but let me assure you, this will not be any worse if you leave out the hedging.
Note: people will be more likely to attack your idea (i.e. provide you with valuable feedback, i.e. this is a good thing) if you seem to stand behind the idea (i.e. if you leave out the hedging).
Before someone says something about conveying confidence levels:
On the other hand, I have to commend you for not starting your paragraph of hedging with the phrase "Epistemic status".
By "insight porn" I mean a genre of writing; I don't mean this as a derogatory term.
Thanks for the feedback! I will remove most of the hedging and self-deprecation. Mainly because of your point about making people more likely to attack my reasoning.
By the way, in my case the hedging and self-deprecation don't come from a lack of self-confidence.
Nice! I find it much more pleasant to read :)