Summary: It seems likely that for advanced agents, the agent's representation of the world will change in unforeseen ways as it becomes smarter. The ontology identification problem is to create a preference framework for the agent that optimizes the same external facts, even as the agent modifies its representation of the world. For example (see the technical tutorial), if the intended goal were to create large amounts of diamond material, one type of ontology identification problem would arise if the programmers thought of carbon atoms as primitive during the AI's development phase, and then the advanced AI discovered nuclear physics.

Clickbait: How do we link an agent's utility function to its model of the world, when we don't know what that model will look like?
A simplified but still very difficult open problem in AI alignment is to state an unbounded program implementing a diamond maximizer that will turn as much of the physical universe into diamond as possible. The goal of "making diamonds" was chosen to have a crisp-seeming definition for our universe (the amount of diamond is the number of carbon atoms covalently bound to four other carbon atoms). If we can crisply define exactly what a 'diamond' is, we can avert issues of trying to convey complex values into the agent. (The unreflective diamond maximizer putatively has unlimited computing power, runs on a Cartesian processor, and confronts no other agents similar to itself. This averts many other problems of reflectivity, decision theory, and value alignment.)
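To see why this definition looks crisp, here is a minimal sketch of how it could be computed, assuming a toy molecular-graph representation; the `Atom` class and `count_diamond_carbon` function are illustrative inventions for this article, not part of any proposed agent design:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Atom:
    element: str                                        # element symbol, e.g. "C"
    bonds: List["Atom"] = field(default_factory=list)   # covalently bonded neighbors

def count_diamond_carbon(atoms: List[Atom]) -> int:
    """Amount of diamond = number of carbon atoms covalently bound to
    four other carbon atoms (the crisp-seeming definition above)."""
    return sum(
        1
        for atom in atoms
        if atom.element == "C"
        and sum(1 for neighbor in atom.bonds if neighbor.element == "C") == 4
    )
```

Given a fixed atomic ontology, "amount of diamond" looks like a short, computable predicate; the rest of the article is about what happens when the ontology itself shifts under the agent.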
Even with a seemingly crisp goal of "make diamonds", we might still run into two problems if we tried to write a hand-coded object-level utility function that identified the amount of diamond material:
To introduce the general issues in ontology identification, we'll walk through the construction of an unbounded agent that would maximize diamonds, trying specific methods and noting the difficulties each one is anticipated to run into.
This difficulty ultimately arises from AIXI being constructed around a Cartesian paradigm of sequence prediction, with AIXI's sense inputs and motor outputs being treated as sequence elements, and the Turing machines in its hypothesis space having inputs and outputs matched to the sequence elements and otherwise being treated as black boxes. This means we can only get AIXI to maximize direct functions of its sensory input, not any facts about the outside environment.
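For reference, the following is the standard expectimax expression from Hutter's AIXI papers (the notation is Hutter's, not something introduced elsewhere in this article): at step $k$, with horizon $m$, AIXI picks

$$a_k := \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m} \big[ r_k + \cdots + r_m \big] \sum_{q \,:\, U(q,\, a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}$$

where the $a_i$, $o_i$, $r_i$ are actions, observations, and rewards, and the inner sum ranges over programs $q$ of length $\ell(q)$ for the reference universal Turing machine $U$ that reproduce the interaction history. The rewards $r_i$ appear only as elements of the percept sequence; nothing in the expression refers to the internal state of the environment programs $q$, which is why the only straightforward way to point AIXI at anything is through its sensory channel.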
(We can't make AIXI maximize diamonds by making it want pictures of diamonds because then it will just, e.g., build an environmental subagent that seizes control of AIXI's webcam and shows it pictures of diamonds. If you ask AIXI to show itself sensory pictures of diamonds, you can get it to show its webcam lots of pictures of diamonds, but this is not the same thing as building an environmental diamond maximizer.)
As an unrealistic example: Suppose someone was trying to define...
Suppose our own real universe were amended to contain a single impermeable hypercomputer, but to otherwise be exactly the same. Suppose we defined an agent like the one above, using simulations of 1900-era classical models of physics, and ran that agent on the hypercomputer. Should we expect the result to be an actual diamond maximizer - that most mass in the universe will be turned into carbon and arranged into diamonds?
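To make the worry concrete, here is a toy sketch in Python of a utility function hand-coded against the 1900-era ontology; all class and method names (`world_model`, `primitive_atoms`, and so on) are hypothetical, chosen only to illustrate how the function's referents can vanish when the agent's best model stops containing primitive atoms:

```python
def diamond_utility_1900(world_model):
    """Score a world-model whose primitives are indivisible atoms with
    element labels and covalent bonds (a hypothetical 1900-era ontology)."""
    total = 0
    for atom in world_model.primitive_atoms():          # assumed accessor
        if atom.element != "C":
            continue
        carbon_neighbors = sum(1 for b in atom.bonds if b.element == "C")
        if carbon_neighbors == 4:
            total += 1
    return total

# A nuclear-era world-model exposes configurations of protons, neutrons, and
# electrons; it has no primitive_atoms() at all.  Evaluated naively against
# the new ontology, the old function either raises an error or scores every
# future as containing zero diamond: its referents have vanished rather than
# been re-identified.
```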
Intuitively, we would think it was common sense for an agent that wanted diamonds to react to the experimental data identifying nuclear physics, by deciding that a carbon atom is 'really' a nucleus containing six protons, and atomic binding is 'really' covalent electron-sharing. We can imagine this agent common-sensically updating its model of the universe to a nuclear model, and redefining the 'carbon atoms' that its old...