Here is a possible mechanism for the Decision Theory Update Problem.
The agent first considers several scenarios, each weighted by its importance. For each scenario, the agent compares the output of its current decision theory with the output of its candidate decision theory and computes a loss for that scenario. The loss is higher when the current decision theory judges the candidate's preferred action to be certainly wrong or very wrong. The agent will update its decision theory when the weighted average of these losses is sma...
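The mechanism described so far can be sketched roughly as follows. This is only an illustration under stated assumptions, not the author's implementation: the names `Scenario`, `weighted_average_loss`, and `should_update` are hypothetical, decision theories are modeled as plain functions from a scenario to an action, and the loss function is supplied by the caller. The text is cut off at the update condition, so the sketch assumes the agent updates when the weighted average loss falls below some threshold; the actual condition may differ.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Scenario:
    name: str
    weight: float  # importance weight (assumed non-negative)


# Hypothetical types: a decision theory maps a scenario to an action label.
DecisionTheory = Callable[[Scenario], str]
# The loss scores, from the current theory's point of view, how wrong the
# candidate's action is: loss(scenario, current_action, candidate_action).
LossFn = Callable[[Scenario, str, str], float]


def weighted_average_loss(
    scenarios: Sequence[Scenario],
    current: DecisionTheory,
    candidate: DecisionTheory,
    loss: LossFn,
) -> float:
    """Average the per-scenario losses, weighted by scenario importance."""
    total_weight = sum(s.weight for s in scenarios)
    if total_weight <= 0:
        raise ValueError("scenario weights must sum to a positive number")
    total = sum(s.weight * loss(s, current(s), candidate(s)) for s in scenarios)
    return total / total_weight


def should_update(
    scenarios: Sequence[Scenario],
    current: DecisionTheory,
    candidate: DecisionTheory,
    loss: LossFn,
    threshold: float,
) -> bool:
    # ASSUMPTION: the source is truncated here; "update when the weighted
    # average loss is below a threshold" is a guess at the intended rule.
    return weighted_average_loss(scenarios, current, candidate, loss) < threshold


# Toy usage with made-up scenarios, actions, and losses.
demo_scenarios = [Scenario("s1", 3.0), Scenario("s2", 1.0)]


def demo_current(s: Scenario) -> str:
    return "a"


def demo_candidate(s: Scenario) -> str:
    return "a" if s.name == "s1" else "b"


def demo_loss(s: Scenario, current_action: str, candidate_action: str) -> float:
    # 0 when both theories agree; 1 when the current theory rejects the change.
    return 0.0 if current_action == candidate_action else 1.0


demo_avg = weighted_average_loss(
    demo_scenarios, demo_current, demo_candidate, demo_loss
)
demo_decision = should_update(
    demo_scenarios, demo_current, demo_candidate, demo_loss, threshold=0.5
)
```

In this toy case the two theories disagree only in the low-weight scenario, so the weighted average loss is 0.25 and the assumed rule permits the update at a threshold of 0.5. A graded loss (for example, larger values for actions the current theory considers "certainly wrong") would slot into `demo_loss` without changing the rest.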