A couple of clarifications, in case somebody else is as confused as I was when first reading this.
In ZF we can quantify over sets because "set" is the name we use for the underlying objects of the theory (the set of natural numbers is a single object in ZF). In Peano arithmetic the objects are numbers, so we can quantify over numbers but not over sets of them.
Predicates are more "powerful" than first-order formulas, so quantifying over predicates restricts the possible models more than having an axiom for each formula does. The interpretation of a predicate is an arbitrary subset of the domain, determined by the model, and most such subsets are not definable by any formula, so an axiom schema with one instance per formula cannot capture all predicates.
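To make the contrast concrete, here is a sketch of the two forms of induction (standard formulations, not taken from the original post):

```latex
% Second-order induction: a single axiom quantifying over all predicates P,
% i.e. over arbitrary subsets of the domain.
\forall P\,\bigl[\,P(0) \land \forall n\,(P(n) \to P(n+1)) \to \forall n\,P(n)\,\bigr]

% First-order induction schema: one axiom instance per formula \varphi(x).
\varphi(0) \land \forall n\,(\varphi(n) \to \varphi(n+1)) \to \forall n\,\varphi(n)
```

There are only countably many first-order formulas but uncountably many subsets of the domain, so the schema covers only the definable predicates; that is why the single second-order axiom pins down the models more tightly.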
Eliezer Yudkowsky once entered an empty Newcomb's box simply so he could get out when the box was opened.
"Realistically, the function UN doesn't incentivize the agent to perform harmful actions."
I don't understand what that means or how it's relevant to the rest of the paragraph.
It would be interesting to see whether a similar approach can be applied to the strawberries problem (I haven't personally thought about this).
Referring to all forms of debate, overseeing, etc. as "Godzilla strategies" is loaded language. Should we refrain from summoning Batman because we may end up summoning Godzilla by mistake? Ideally, we want to solve alignment without summoning anything. However, applying some humility, we should consider that the problem may be too difficult for human intelligence to solve.
The image doesn't load.
The notation in Hume's Black Box seems inconsistent. When defining [e], e is an element of a world. When defining I, e is a set of worlds.
In "Against Discount Rates" Eliezer characterizes discount rate as arising from monetary inflation, probabilistic catastrophes etc. I think in this light discount rate less than ONE (zero usually indicates you don't carea at all about the future) makes sense.
Some human values are proxies for things which make sense in intelligent systems in general - e.g. happiness is a proxy for learning, reproduction, etc.
Self-preservation can be seen as an instance of preserving learned information (which is a reasonable value for any intelligent system). Indeed, if there were a medium superior to the human brain to which people could transfer the "contents" of their brains, I believe most would do it. It is not a coincidence that self-preservation generalizes this way. Otherwise elderly people would have been discarded from the tribe in the ancestral environment.
"wireheading ... how evolution has addressed it in humans"
It hasn't - that's why people do drugs (including alcohol). What stops all humans from wireheading is that all currently available methods work only short-term and have negative side effects. The ancestral environment didn't allow humankind to self-destruct by wireheading. Maybe peer pressure not to do drugs exists, but there is also peer pressure in the other direction.
Arguably the notion of certainty is not applicable to the real world but only to idealized settings. This is also relevant.