Comparing Utilities

"The Nash solution differs significantly from the other solutions considered so far. [...]

2. This is the first proposal where the additive constants matter. Indeed, now the multiplicative constants are the ones that don't matter!"

In what sense do additive constants matter here? Aren't they neutralized by the subtraction?

Do mesa-optimizer risk arguments rely on the train-test paradigm?

You don't even need a catastrophe in any global sense. Disrupting the training procedure at step t should be sufficient.

AI Unsafety via Non-Zero-Sum Debate

"My intuition is that there will be a class of questions where debate is definitely safe, a class where it is unsafe, and a class where some questions are safe, some unsafe, and we don’t really know which are which."

Interesting. Do you have some examples of types of questions you expect to be safe or potential features of save questions? Is it mostly about the downstram consquences that answers would have, or more about instrumental goals that the questions induce for debaters?

Tradeoff between desirable properties for baseline choices in impact measures

I like the insight that offsetting is not always bad and the idea of dealing with the bad cases using the task reward. State-based reward functions that capture whether or not the task is currently done also intuitively seem like the correct way of specifying rewards in cases where achieving the task does not end the episode.

I am a bit confused about the section on the markov property: I was imagining that the reason you want the property is to make applying standard RL techniques more straightforward (or to avoid making already existing partial observability more complicated). However if I understand correctly, the second modification has the (expectation of the) penalty as a function of the complete agent policy and I don't really see, how that would help. Is there another reason to want the markov property, or am I missing some way in which the modification would simplify applying RL methods?

Good and bad ways to think about downside risks

Nice post!

I would like to highlight that a naive application of the expected value perspective could lead to problems like the unilateralist's curse and think that the post would be even more useful for readers who are new to these kinds of considerations if it discussed that more explicitly (or linked to relevant other posts prominently).

My prediction for Covid-19

"If, at some point in the future, we have the same number of contagious people, and are not at an appreciable fraction of group immunity, it will at that point again be a solid decision to go into quarantine (or to extend it). "

I think for many people the number of infections at which this becomes a good idas has increased as we have more accurate information about the CFR and how quickly realistic countermeasures can slow down an outbreak in a given area, which should decrease credence in some of the worst case scenarios many were worried about a few months ago.

The case for C19 being widespread

"Czech Researchers claim that Chinese do not work well "

This seems to be missing a word ;)

Conflict vs. mistake in non-zero-sum games

Nitpick: I am pretty sure non-zero-sum does not imply a convex Pareto front.

Instead of the lens of negotiation position, one could argue that mistake theorists believe that the Pareto Boundary is convex (which implies that usually maximizing surplus is more important than deciding allocation), while conflict theorists see it as concave (which implies that allocation is the more important factor).

Credibility of the CDC on SARS-CoV-2

Even if the claim was usually true on longer time scales, I doubt that pointing out an organisations mistakes and not entirely truthful statements usually increases the trust in them on the short time scales that might be most important here. Reforming organizations and rebuilding trust usually takes time.

Load More