Summary: It seems likely that for advanced agents, the agent's representation of the world will change in unforeseen ways as it becomes smarter. The ontology identification problem is to create a preference framework for the agent that optimizes the same external facts, even as the agent modifies its representation of the world. For example (see the technical tutorial), if the intended goal were to create large amounts of diamond material, one type of ontology identification problem would arise if the programmers thought of carbon atoms as primitive during the AI's development phase, and then the advanced AI discovered nuclear physics.

Clickbait: How do we link an agent's utility function to its model of the world, when we don't know what that model will look like?
A simplified but still very difficult open problem in AI alignment is to state an unbounded program implementing a diamond maximizer that will turn as much of the physical universe into diamond as possible. The goal of "making diamonds" was chosen to have a crisp-seeming definition for our universe (the amount of diamond is the number of carbon atoms covalently bound to four other carbon atoms). If we can crisply define exactly what a 'diamond' is, we can avert issues of trying to convey complex values into the agent. (The unreflective diamond maximizer putatively has unlimited computing power, runs on a Cartesian processor, and confronts no other agents similar to itself. This averts many other problems of reflectivity, decision theory, and value alignment.)
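To see why this definition looks crisp, here is a minimal sketch of how it could be computed, assuming a toy molecular-graph representation; the `Atom` class and `count_diamond_carbon` function are illustrative inventions for this article, not part of any proposed agent design:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Atom:
    element: str                                        # element symbol, e.g. "C"
    bonds: List["Atom"] = field(default_factory=list)   # covalently bonded neighbors

def count_diamond_carbon(atoms: List[Atom]) -> int:
    """Amount of diamond = number of carbon atoms covalently bound to
    four other carbon atoms (the crisp-seeming definition above)."""
    return sum(
        1
        for atom in atoms
        if atom.element == "C"
        and sum(1 for neighbor in atom.bonds if neighbor.element == "C") == 4
    )
```

Given a fixed atomic ontology, "amount of diamond" looks like a short, computable predicate; the rest of the article is about what happens when the ontology itself shifts under the agent.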
Even with a seemingly crisp goal of "make diamonds", we might still run into two problems if we tried to write a hand-coded object-level utility function that identified the amount of diamond material:
To introduce the general issues in ontology identification, we'll walk through the construction of an unbounded agent that would maximize diamonds, trying specific methods and noting the difficulties each one is anticipated to run into.
This difficulty ultimately arises from AIXI being constructed around a Cartesian paradigm of sequence prediction, with AIXI's sense inputs and motor outputs being treated as sequence elements, and the Turing machines in its hypothesis space having inputs and outputs matched to the sequence elements and otherwise being treated as black boxes. This means we can only get AIXI to maximize direct functions of its sensory input, not any facts about the outside environment.
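For reference, the following is the standard expectimax expression from Hutter's AIXI papers (the notation is Hutter's, not something introduced elsewhere in this article): at step $k$, with horizon $m$, AIXI picks

$$a_k := \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m} \big[ r_k + \cdots + r_m \big] \sum_{q \,:\, U(q,\, a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}$$

where the $a_i$, $o_i$, $r_i$ are actions, observations, and rewards, and the inner sum ranges over programs $q$ of length $\ell(q)$ for the reference universal Turing machine $U$ that reproduce the interaction history. The rewards $r_i$ appear only as elements of the percept sequence; nothing in the expression refers to the internal state of the environment programs $q$, which is why the only straightforward way to point AIXI at anything is through its sensory channel.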
(We can't make AIXI maximize diamonds by making it want pictures of diamonds because then it will just, e.g., build an environmental subagent that seizes control of AIXI's webcam and shows it pictures of diamonds. If you ask AIXI to show itself sensory pictures of diamonds, you can get it to show its webcam lots of pictures of diamonds, but this is not the same thing as building an environmental diamond maximizer.)
As an unrealistic example: Suppose someone was trying to define...
Suppose our own real universe were amended to contain a single impermeable hypercomputer, but to otherwise be exactly the same. Suppose we defined an agent like the one above, using simulations of 1900-era classical models of physics, and ran that agent on the hypercomputer. Should we expect the result to be an actual diamond maximizer - that most mass in the universe will be turned into carbon and arranged into diamonds?
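To make the worry concrete, here is a toy sketch in Python of a utility function hand-coded against the 1900-era ontology; all class and method names (`world_model`, `primitive_atoms`, and so on) are hypothetical, chosen only to illustrate how the function's referents can vanish when the agent's best model stops containing primitive atoms:

```python
def diamond_utility_1900(world_model):
    """Score a world-model whose primitives are indivisible atoms with
    element labels and covalent bonds (a hypothetical 1900-era ontology)."""
    total = 0
    for atom in world_model.primitive_atoms():          # assumed accessor
        if atom.element != "C":
            continue
        carbon_neighbors = sum(1 for b in atom.bonds if b.element == "C")
        if carbon_neighbors == 4:
            total += 1
    return total

# A nuclear-era world-model exposes configurations of protons, neutrons, and
# electrons; it has no primitive_atoms() at all.  Evaluated naively against
# the new ontology, the old function either raises an error or scores every
# future as containing zero diamond: its referents have vanished rather than
# been re-identified.
```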
Intuitively, we would think it was common sense for an agent that wanted diamonds to react to the experimental data identifying nuclear physics, by deciding that a carbon atom is 'really' a nucleus containing six protons, and atomic binding is 'really' covalent electron-sharing. We can imagine this agent common-sensically updating its model of the universe to a nuclear model, and redefining the 'carbon atoms' that its old...