I've been working on defining "optimizer", and I'm wondering about what people consider to be or not be an optimizer. I'm planning on talking about it in my own post, but I'd like to ask here first because I'm a scaredy cat.
I know a person or AI refining plans or hypotheses would generally be considered an optimizer.
What about systems that evolve? Would an entire population of a type of creature be its own optimizer? It's optimizing for genetic fitness of the individuals, so I don't see why it wouldn't be. Evolutionary programming just emulates it, and it'... (read more)
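To make the evolution-as-optimizer framing concrete, here's a toy sketch (my own illustration, not from any particular source) of a population "optimizing" a fitness function purely through selection and mutation, the way evolutionary programming emulates it:

```python
import random

def fitness(genome):
    # Toy fitness: how close the genome's sum is to an arbitrary target.
    return -abs(sum(genome) - 42)

def evolve(pop_size=50, genome_len=10, generations=100):
    population = [[random.uniform(-10, 10) for _ in range(genome_len)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: the fitter half of the population survives.
        population.sort(key=fitness, reverse=True)
        survivors = population[:pop_size // 2]
        # Reproduction with mutation: no individual is "trying" to optimize,
        # yet the population's fitness climbs anyway.
        children = [[g + random.gauss(0, 0.5) for g in random.choice(survivors)]
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    return max(population, key=fitness)

print(fitness(evolve()))  # approaches 0 as the population climbs the fitness landscape
```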
I agree that intelligent agents have a tendency to seek power and that this is a large part of what makes them dangerous. Agents could potentially cause catastrophes in other ways, but I'm not sure whether any of those are realistic.
As an example, suppose an agent creates powerful self-replicating nanotechnology that makes a pile of paperclips, the agent's goal. However, because the agent didn't want to spend the time engineering a way to stop replication, the self-replicating nanobots eat the world.
But catastrophes like this would probably also be dealt w... (read more)
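To gesture at the timescale involved, here's a back-of-the-envelope sketch, where the nanobot mass and doubling time are made-up assumptions purely for illustration:

```python
import math

EARTH_MASS_KG = 5.97e24
NANOBOT_MASS_KG = 1e-15    # assumed mass per nanobot, purely illustrative
DOUBLING_TIME_HOURS = 1.0  # assumed replication rate, purely illustrative

# Doublings needed for one nanobot's descendants to match Earth's mass.
doublings = math.log2(EARTH_MASS_KG / NANOBOT_MASS_KG)
print(f"{doublings:.0f} doublings ≈ {doublings * DOUBLING_TIME_HOURS / 24:.1f} days")
# ~132 doublings, i.e. under a week at these assumptions
```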
I hadn't thought about the distinction between gaining and using resources. You can still wreak havoc without gaining resources, though, by using the ones you already have in a damaging way. But I can see why the distinction might be helpful to think about.
It still seems to me that an agent using equation 5 would pretty much act like a human imitator for anything that takes more than one step, which is why I was using it as a comparison. I can try to explain my reasoning if you want, but I suppose it's a moot point now. And I don't know if I'm right, anyway.
Basically, ... (read more)
Is there much the reduced-impact agent with reward shaping could do that an agent using human mimicry couldn't?
Perhaps it could improve over mimicry by being able to consider all actions, while a human mimic would, in effect, only consider the actions a human would take. But I don't think there are usually many single-step actions to choose from, so I'm guessing this isn't a big benefit. Could the performance improvement come from better understanding the current state than mimics could? I'm not sure when this would make a big difference, though.
I'm also still co... (read more)
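Here's a toy sketch of the difference I have in mind (my own framing, with made-up function names): a mimic is restricted to actions a human would plausibly take, while the shaped agent can rank every available action by its penalized value:

```python
import numpy as np

def mimic_action(human_policy):
    # A human imitator effectively samples from the human action distribution,
    # so actions a human would never take are never considered.
    return int(np.random.choice(len(human_policy), p=human_policy))

def shaped_action(q_values, impact_penalties, lam=1.0):
    # A reduced-impact agent with reward shaping can rank all actions,
    # including ones no human would think of, trading value against impact.
    return int(np.argmax(np.asarray(q_values) - lam * np.asarray(impact_penalties)))
```

With only a handful of single-step actions to choose from, the two mostly coincide, which is why I doubt the benefit is large.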
I have a question about attainable utility preservation. Specifically, I read the post "Attainable Utility Preservation: Scaling to Superhuman", and I'm wondering how an agent using the attainable utility implementation in equations 3, 4, and 5 could actually be superhuman. I've been misunderstanding and mis-explaining things recently, so I'm asking here instead of on the post for now to avoid wasting an AI safety researcher's time.
The equations incentivize the AI to take actions that will provide an immediate reward in the next timestep, but pe... (read more)
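For reference, here's roughly how I understand the penalized reward from that post. This is a hedged sketch from memory, so the exact normalization, and which auxiliary Q-function gets used, may not match equations 3, 4, and 5:

```python
def aup_reward(r, q_aux_after_action, q_aux_after_noop, lam=0.1, eps=1e-8):
    # Penalize shifts in attainable auxiliary value relative to doing nothing.
    # The scaling term is my assumption about the normalization; the post's
    # exact form may differ.
    penalty = abs(q_aux_after_action - q_aux_after_noop)
    scale = max(abs(q_aux_after_noop), eps)
    return r - lam * penalty / scale
```

Since r here is the immediate reward for the next timestep, the shaped objective looks myopic to me, which is the heart of my question.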
Thanks for the link. It turns out I missed some of the articles in the sequence. Sorry for misunderstanding your ideas.
I thought about it, and I don't think your agent would have the issue I described.
Now, if the reward function was learned using something like a universal prior, then other agents might be able to hijack the learned reward function to make the AI misbehave. But that concern is already known.
Thanks for the response.
In my comment, I imagined the agent used evidential or functional decision theory and cared about the actual paperclips in the external state. But I'm concerned other agent architectures would result in misbehavior for related reasons.
Could you describe what sort of agent architecture you had in mind? I'm imagining you're thinking of an agent that learns a function for estimating future state, percepts, and reward based on the current state and the action taken. And I'm imagining the system uses some sort of learning algorithm that ... (read more)
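Concretely, the kind of architecture I'm imagining looks something like this (an illustrative interface I made up, not something you described):

```python
from typing import Any, NamedTuple

class Prediction(NamedTuple):
    next_state: Any  # the model's belief about the next environment state
    percept: Any     # the observation the agent expects to receive
    reward: float    # the reward it expects for taking the action

class WorldModel:
    def predict(self, state: Any, action: Any) -> Prediction:
        # Learned from experience by some learning algorithm; the agent then
        # plans by rolling this model forward and choosing actions that
        # maximize predicted reward.
        raise NotImplementedError
```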
I realized both explanations I gave were overly complicated and confusing. So here's a newer, hopefully much easier to understand, one:
I'm concerned a reduced-impact AI will reason as follows:
"I want to make paperclips. I could use this machinery I was supplied with to make them. But the paperclips might be low quality, I might not have enough material to make them all, and I'll have some impact on the rest of the world, potentially large ones due to chaotic effects. I'd like something better.
What if I instead try to take over the world and make huge numbe... (read more)
Oh, I'm sorry, I looked through posts I read to see where to add the comment and apparently chose the wrong one.
Anyways, I'll try to explain better. I hope I'm not just crazy.
An agent's beliefs about what world it's currently in influence its plans. But its plans also have the potential to influence its beliefs about what world it's currently in. For example, if the AI originally thinks it's not in a simulation, but then plans on making lots of simulations of itself, it would come to think it's more likely that it currently is in a simulation. Similarly,... (read more)
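A toy version of the update I mean, using a naive count-the-copies rule (an assumption for illustration, not any standard anthropics):

```python
def p_in_simulation(n_planned_sims):
    # If the plan, once executed in base reality, creates n indistinguishable
    # simulated copies of the agent, a naive self-location count says only
    # 1 of the (n + 1) copies is in base reality.
    return n_planned_sims / (n_planned_sims + 1)

print(p_in_simulation(0))      # 0.0 before adopting the plan
print(p_in_simulation(10**6))  # ~1.0 once the plan includes a million simulations
```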
Am I correct that counterfactual environments for computing impact in a reduced-impact agent would need to include acausal connections, or that the AI would need some sort of constraint on the actions or hypotheses considered, for the impact measure to work correctly?
If it doesn't consider acausal impacts, then I'm concerned the AI would consider this strategy: act like you would if you were trying to take over the world in base-level reality. Once you succeed, act like you would if you were in base-level reality and trying to run an extremely large number of ... (read more)
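To show what I mean by the impact measure missing this: a purely causal measure only compares modeled environment rollouts, so it's blind to channels like being predicted or simulated elsewhere. A toy formalization of my own, with hypothetical model.rollout and distance interfaces:

```python
def causal_impact(model, state, action, noop, distance, horizon=10):
    # Roll the agent's environment model forward under the action vs. a no-op
    # and measure how different the resulting trajectories are.
    world_a = model.rollout(state, first_action=action, horizon=horizon)
    world_b = model.rollout(state, first_action=noop, horizon=horizon)
    # Any impact routed through agents outside the modeled environment
    # (e.g. predictors or simulators reacting to this policy) registers as zero.
    return distance(world_a, world_b)
```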