Value Learning is only Asymptotically Safe

Even granting that cosmic rays could flip any given bit, or any sequence of bits, in a computer's memory, it is far from clear to me that the probability of this happening approaches 1 over the lifetime of the universe. It isn't hard to come up with cases where an event is entirely possible yet has probability 0: for instance, if I pick a number uniformly at random from the closed interval [0,1], the probability that I pick exactly 1 is 0, even though 1 is as likely a choice as any other point in the interval. And in the concrete case you're referring to, the universe has only a finite amount of time to flip these bits before it sinks into entropy. Moreover, I wouldn't expect the sequence of data points needed to convince an AI that humans are hostile (or whatever) to stay fixed over time: as the AI accrued more data, it would plausibly take more contrary data to persuade it to change its mind.
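To make the last point concrete, here is a minimal sketch, assuming independent trials and entirely made-up per-step probabilities (the function name `prob_ever` and the numbers 0.001 and 1/(n+1)^2 are illustrative assumptions, not estimates of anything physical). It contrasts a constant chance per step, where the probability of the event ever occurring does approach 1, with a chance that shrinks quickly enough that the probability levels off well below 1.

```python
def prob_ever(step_probs):
    """Probability that an event occurs at least once, given
    independent per-step probabilities (an assumption of this sketch)."""
    survive = 1.0  # probability the event has not happened so far
    for p in step_probs:
        survive *= 1.0 - p
    return 1.0 - survive


N = 100_000

# Constant chance each step: the event becomes all but certain.
constant = prob_ever(0.001 for _ in range(N))

# Chance shrinking like 1/(n+1)^2: the no-event probability is the
# telescoping product (N+2) / (2(N+1)), so the result tends to 1/2.
shrinking = prob_ever(1.0 / (n + 1) ** 2 for n in range(1, N + 1))

print(constant, shrinking)
```

Whether the real situation looks more like the first case or the second is exactly what is in dispute; the sketch only shows that "possible at every step" does not by itself give probability 1 in the limit.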

Decision Theory FAQ

In the last chapter of his book "Utility Theory for Decision Making," Peter Fishburn published a concise rendering of Leonard Savage's proof that anyone with "rational" preferences over acts behaves "as if" they were obeying Expected Utility Theory. Fishburn furthermore proved that Savage's axioms imply that the utility function is bounded (he attributes this extension of the proof, in its essence, to Savage). So Subjective Expected Utility Theory has an answer to the St. Petersburg Paradox "built in" to its axioms. That seems like a point well worth mentioning in this article.
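For readers who want to see why boundedness matters here, a minimal sketch of the standard St. Petersburg game follows: a fair coin is flipped until the first heads, and a first heads on flip k pays 2^k. The bounded utility u(x) = 1 - 1/x is an arbitrary illustrative choice of mine, not anything derived from Savage's axioms, and `expected_utility` is a hypothetical helper name.

```python
def expected_utility(utility, max_flips):
    """Expected utility of the St. Petersburg game, truncated after
    max_flips flips. First heads on flip k has probability 2**-k
    and pays 2**k."""
    return sum(0.5 ** k * utility(2.0 ** k) for k in range(1, max_flips + 1))


def linear(x):
    # Unbounded: utility equals the payoff itself.
    return x


def bounded(x):
    # Illustrative bounded utility (an assumption); never exceeds 1.
    return 1.0 - 1.0 / x


for n in (10, 20, 40):
    print(n, expected_utility(linear, n), expected_utility(bounded, n))
```

With the linear utility every term contributes exactly 1, so the truncated sum equals the number of flips allowed and grows without limit. With the bounded utility the sum settles near 2/3, which is the sense in which a bounded utility function removes the paradox.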