"So how can it ensure that future self-modifications will accomplish its current objectives? For one thing, it has to make those objectives clear to itself. If its objectives are only implicit in the structure of a complex circuit or program, then future modifications are unlikely to preserve them. Systems will therefore be motivated to reflect on their goals and to make them explicit." -- Stephen M. Omohundro, The Basic AI Drives
This AI becomes able to improve itself in a haphazard way, makes various changes that are net improvements but may introduce value drift, and then gets smart enough to do guaranteed self-improvement, at which point its values freeze (forever). -- Eliezer Yudkowsky, What I Think, If Not Why
I have stopped understanding why these quotes are correct. Help!
More specifically, if you design an AI using "shallow insights" without an explicit goal-directed architecture - some program that "just happens" to make intelligent decisions that can be viewed by us as fulfilling certain goals - then it has no particular reason to stabilize its goals. Isn't the expectation that it would do so just anthropomorphizing? We humans don't exhibit a lot of goal-directed behavior, but we do have a verbal concept of "goals", so the verbal phantom of "figuring out our true goals" sounds meaningful to us. But why would AIs behave the same way if they don't think verbally? It looks more likely to me that an AI that acts semi-haphazardly may well continue doing so even after amassing a lot of computing power. Or is there some more compelling argument that I'm missing?
The quotes are correct in the sense that "P implies P" is correct; that is, the authors postulate the existence of an entity constructed in a certain way so as to have certain properties, then argue that it would indeed have those properties. True, but not necessarily consequential, as there is no compelling reason to believe in the future existence of an entity constructed in that way in the first place. Most humans aren't like that, after all, and neither are existing or in-development AI programs; nor is it a matter of lacking "intelligence"...
Saying that there is an agent refers (in my view; a working definition for this thread) to a situation where future events are, in some sense, expected to be optimized according to some goals, to the extent that certain other events ("actions") control those future events. There might be many sufficient conditions for that in terms of particular AI designs, but they should all amount to this expectation.
So an agent is already associated with goals in terms of its actual effect on its environment. Given that agent's own future state (design) is an easily controlled pa...
Let's start with the template for an AGI, the seed for a generally intelligent expected-utility maximizer capable of recursive self-improvement.
As far as I can tell, the implementation of such a template would do nothing at all, because its utility function would be a "blank slate".
What happens if you now plug the computation of Pi into its utility function? Would it reflect on this goal and try to figure out its true goals? Why would it do so? Where would the incentive come from?
Would complex but implicit goals change its behavior? Why would it i...
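To make the "blank slate" point concrete, here's a minimal sketch (the toy action set, world model, and utility functions are all made up for illustration, not taken from the quoted authors): an expected-utility chooser with a pluggable utility function. With a constant utility every action ties and the maximizer has nothing to prefer; plug in a "digits of Pi computed" utility and it gets a definite ranking, yet nothing in the loop hands it an incentive to reflect on or rewrite that utility.

```python
import random

def choose_action(actions, world_model, utility):
    """Pick the action whose predicted outcome has the highest utility."""
    scored = [(utility(world_model(a)), a) for a in actions]
    best = max(score for score, _ in scored)
    # With a constant ("blank slate") utility every action ties, so the
    # maximizer has nothing to prefer, including reflection on or
    # rewriting of its own utility function.
    return random.choice([a for score, a in scored if score == best])

# Toy world model: each action yields some number of Pi digits computed.
world_model = lambda action: {"idle": 0, "compute_pi_digits": 1000, "self_modify": 3}[action]

blank_slate = lambda outcome: 0        # constant utility: everything ties
pi_utility  = lambda outcome: outcome  # utility = digits of Pi computed

actions = ["idle", "compute_pi_digits", "self_modify"]
print(choose_action(actions, world_model, blank_slate))   # arbitrary tie-break
print(choose_action(actions, world_model, pi_utility))    # "compute_pi_digits"
```

The point is only that any incentive to reflect has to come from somewhere inside the utility/world-model loop; it doesn't appear for free.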
Do you not count reward-seeking / reinforcement-learning / AIXI-like behavior as goal-directed behavior? If not, why not? If yes, it doesn't seem possible to build an AI that makes intelligent decisions without a goal-directed architecture.
A superintelligence might be able to create a jumble of wires that happen to do intelligent things, but how are we humans supposed to stumble onto something like that, given that all existing examples of intelligent behavior and theories about intelligent decision...
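For what it's worth, here is a toy illustration of the "reward-seeking counts as goal-directed" point (a made-up two-armed bandit, not any real AIXI or RL system): the learner never represents a goal verbally, yet its update rule is explicitly built to steer behavior toward whatever maximizes the reward signal.

```python
import random

def run_bandit(true_payoffs, steps=1000, epsilon=0.1, lr=0.1):
    """Epsilon-greedy value learner: estimates each arm's payoff and
    increasingly picks the arm with the highest estimate."""
    estimates = [0.0] * len(true_payoffs)
    for _ in range(steps):
        if random.random() < epsilon:
            arm = random.randrange(len(true_payoffs))                         # explore
        else:
            arm = max(range(len(true_payoffs)), key=lambda i: estimates[i])   # exploit
        reward = random.gauss(true_payoffs[arm], 1.0)      # noisy payoff from the chosen arm
        estimates[arm] += lr * (reward - estimates[arm])   # nudge estimate toward observed reward
    return estimates

print(run_bandit([0.2, 1.0]))   # the second arm's estimate ends up the higher one
```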
I think that that's where you're looking at it differently from Eliezer et al. I think that Eliezer at least is talking about an AI which has goals, but does not, when it starts modifying itself, understand itself well enough to keep them stable. Once it gets good enough at self-modification to keep its goals stable, it will do...
I'm finding it hard to imagine an agent that can get a diversity of difficult things done in a complex environment (which sounds to me like a requirement of general intelligence) without forming goals and subgoals. AGI seems to require many-step plans, and planning seems to require goals.
The Omohundro quote sounds like what humans do. If humans do it, machines might well do it too.
The Yudkowsky quote seems more speculative. It assumes that values are universal and don't need to adapt to local circumstances. This would be in contrast to what has happened in evolution so far, where there are many creatures in different niches, and the organisms (and their values) adapt to those niches.
I understood Omohundro's Basic AI Drives as applying only to successful (although not necessarily Friendly) GAI. If a recursively self-improving GAI suffered massive value drift with each iterative improvement in its ability to achieve its values, it'd end up just flailing around, carrying out a stochastic series of actions with superhuman efficiency.
I think the Eliezer quote is predicated on the same sort of idea: that you've designed the AI to attempt to preserve its values; you just did it imperfectly. Assuming the value of value preservation isn't among the ones...
Part of the problem, it appears to me, is that you're ascribing a verbal understanding to a mechanical process. Consider: for AIs to have values, those values must be 'stored' in a medium compatible with their calculations.
However, once an AI begins to 'improve' itself -- that is, once an AI has as an available "goal" the ability to form better goals -- then it's going to base its decisions about what counts as an improved goal on the goals and values it already has. This will cause...
If the AI is an optimization process, it will try to make explicit what it's optimizing. If not, it's not intelligent.