Let's say a system receives reward when it believes it's doing some good, kind of like RL with an actor-critic setup.
estimated_good_things -> max
We can do some rewriting. I'll use the notation "-> inc" to mean "incentivised to be increased"; it's the direction the gradients point towards.
preventing_estimated_catastrophe -> inc
estimated_p_catastrophe * estimated_magnitude_of_catastrophe -> inc
(p_catastrophe + p_estimation_error) * (catastrophe_magnitude + magnitude_estimation_error) -> inc
p_estimation_error -> inc
magnitude_estimation_error -> inc
estimation_error = k * estimation_uncertainty
estimation_uncertainty -> inc
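To see why the error terms pick up gradient: the true p_catastrophe and catastrophe_magnitude are fixed by the world, so the "cheap" directions in which the product can grow are the estimation errors (and, via estimation_error = k * estimation_uncertainty, the uncertainty itself). Here's a minimal sketch in Python, a toy model of my own with hypothetical shorthand names (p_cat, magnitude, p_err, mag_err), showing that the gradient of the estimated reward with respect to each estimation error is positive:

```python
def reward(p_cat, magnitude, p_err, mag_err):
    """Estimated 'good done': (true probability + error) * (true magnitude + error)."""
    return (p_cat + p_err) * (magnitude + mag_err)

def finite_diff(f, x, eps=1e-6):
    """One-sided finite-difference approximation of df/dx."""
    return (f(x + eps) - f(x)) / eps

p_cat, magnitude = 0.01, 100.0  # "true" values, fixed by the world
p_err, mag_err = 0.0, 0.0       # estimation errors, which the system is free to inflate

# The gradient of the estimated reward w.r.t. each estimation error is positive,
# so an optimiser of this reward is pushed to increase the errors themselves.
grad_p_err = finite_diff(lambda e: reward(p_cat, magnitude, e, mag_err), p_err)
grad_mag_err = finite_diff(lambda e: reward(p_cat, magnitude, p_err, e), mag_err)

print(grad_p_err)   # ~= magnitude + mag_err = 100.0
print(grad_mag_err) # ~= p_cat + p_err = 0.01
```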
So, here's what we get: the system is incentivised to have certain biases:
1. It's biased to overestimate the probability and magnitude of a catastrophe.
2. It's biased to take actions in areas where uncertainty is higher (because higher uncertainty gives more room to increase the first bias).
Or in...
You're correct that this is what happens at one of the abstraction layers. But the choice of that layer is pretty arbitrary. By abstraction layers I mean:
L1: hypervisor interface: uncountably many VMs
L2: hypervisor implementation: countably many VMs
L3: semiconductors: no VMs, only high and low signals
L4: electrons: no high and low signals, only electromagnetic fields
So yes, on L2 the number of VMs is finite. But why should morality count what happens on L2 and not on L1, L3, or L4? That choice is too arbitrary.