JesseClifton

JesseClifton's Comments

Instrumental Occam?

In model-free RL, policy-based methods choose policies by optimizing a noisy estimate of the policy's value. This is analogous to choosing a predictive model by optimizing a noisy estimate of prediction accuracy (i.e., accuracy on the training data). So, just as in predictive modeling, we often need to accept some bias in exchange for lower variance in the policy-learning case (i.e., shrink towards simpler policies).
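A minimal simulation of this point, with made-up numbers: if "complex" policies can only be evaluated with noisier rollouts, picking the raw argmax of the noisy value estimates mostly selects on noise, while shrinking the estimates toward a prior mean (a crude stand-in for preferring simpler policies) loses less true value on average.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup: 20 "simple" policies evaluated with low-noise
# rollouts, 20 "complex" policies with high-noise rollouts; true values
# of all policies are drawn from the same prior.
n, prior_sd = 20, 0.3
noise_sd = np.concatenate([np.full(n, 0.2), np.full(n, 1.5)])

def selection_loss(shrink):
    """Average true value lost by selecting the argmax of the estimates."""
    losses = []
    for _ in range(2000):
        true = rng.normal(0.0, prior_sd, size=2 * n)
        est = true + rng.normal(0.0, noise_sd)
        if shrink:
            # Bayes/James-Stein-style shrinkage toward the prior mean:
            # noisier estimates are pulled harder toward 0.
            est = est * prior_sd**2 / (prior_sd**2 + noise_sd**2)
        chosen = np.argmax(est)
        losses.append(true.max() - true[chosen])
    return np.mean(losses)

# Shrinkage trades a little bias for much less selection-on-noise.
print(selection_loss(shrink=False) > selection_loss(shrink=True))
```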

MichaelA's Shortform

There are "reliabilist" accounts of what makes a credence justified. The accounts differ, but they say (very roughly) that a credence is justified if it is produced by a process that is close to the truth on average. See [this paper](https://philpapers.org/rec/PETWIJ-2).

Frequentist statistics can be seen as a version of reliabilism. Criteria like the Brier score for evaluating forecasters can also be understood in a reliabilist framework.
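For concreteness, the Brier score is just the mean squared difference between probabilistic forecasts and binary outcomes, so it directly rewards forecasting processes that track the truth on average (the example numbers below are made up):

```python
# Brier score: mean squared error of probabilistic forecasts against
# binary outcomes; lower is better.
def brier_score(forecasts, outcomes):
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

outcomes = [1, 0, 1, 1, 0]
# A forecaster whose process reliably tracks the truth...
print(brier_score([0.8, 0.2, 0.7, 0.9, 0.1], outcomes))  # 0.038
# ...scores better than an overconfident, unreliable one.
print(brier_score([1.0, 0.0, 0.0, 1.0, 1.0], outcomes))  # 0.4
```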

Exploring safe exploration

Maybe pedantic, but couldn't we just look at the decision process as a sequence of episodes from the POMDP, and formulate the problem in terms of the regret incurred by our learning algorithm in this decision process? In particular, if catastrophic outcomes (i.e., ones which dominate the total regret) are possible, then a low-regret learning algorithm will have to be safe while still gathering some information that helps in future episodes. (On this view, the goal of safe exploration research is the same as the goal of learning generally: design low-regret learning algorithms. It's just that the distribution of rewards in some cases implies that low-regret learning algorithms have to be "safe" ones.)
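A toy illustration of "catastrophes dominate the regret" (all numbers are made up): in a two-armed bandit where one arm is catastrophic, even a single exploratory pull of the unsafe arm swamps anything else in the regret, so a low-regret algorithm is forced to be a safe one.

```python
# Two-armed bandit: arm 0 is safe (reward 1 per episode), arm 1 is
# catastrophic (reward -1000). Best policy always pulls arm 0.
safe_reward, catastrophe = 1.0, -1000.0

def regret(k):
    """Regret of an algorithm that 'explores' the catastrophic arm k times."""
    return k * (safe_reward - catastrophe)  # each unsafe pull costs 1001

# One catastrophic pull already dominates the regret of, say, 1000
# episodes of otherwise-optimal play.
print(regret(1))  # 1001.0
print(regret(0))  # 0.0
```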

Sections 5 & 6: Contemporary Architectures, Humans in the Loop

I definitely think it's worth exploring. I have the intuition that creating a single agent might be difficult for various logistical and political reasons, and so it feels more robust to figure out the multiagent case. But I would certainly like to have a clearer picture of how and under what circumstances several AI developers might implement a single compromise agent.

Sections 1 & 2: Introduction, Strategy and Governance

Ah, I see now that I did not make this clear at all. The main point in the case of war is that, under certain payoff structures, a state might not be able to credibly commit to the terms of a peaceful settlement if it is expected to increase in power relative to its counterpart. Thus the state that expects to lose relative power will sometimes rather wage preventative war (while it is still relatively strong) than settle. This is still a problem in models with complete information and divisible stakes.
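One standard way to formalize this, with purely illustrative numbers: in a two-period bargaining model where power shifts between periods, the rising state cannot credibly promise generous terms tomorrow, so the declining state can prefer war today even with complete information and divisible stakes.

```python
# Stylized two-period commitment-problem model (numbers are illustrative).
# The stakes are worth 1 per period. State A wins a war with probability
# p1 = 0.7 today but only p2 = 0.2 tomorrow; fighting destroys c = 0.1.
p1, p2, c = 0.7, 0.2, 0.1

# Preventative war today: A locks in today's favorable odds for both periods.
war_now = 2 * p1 - c                      # 1.3

# Peace: A can get at most the whole pie today, but tomorrow the
# now-stronger B reneges and offers A only its reservation value p2 - c.
best_peace = 1 + (p2 - c)                 # 1.1

# With complete information and divisible stakes, A still prefers war,
# because B cannot credibly commit to generous terms after the power shift.
print(war_now > best_peace)               # True: 1.3 > 1.1
```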

I'll try to edit the text to make this clearer soon, thanks for bringing it up.

Sections 1 & 2: Introduction, Strategy and Governance

It seems plausible that if players could truthfully disclose private information and divide stakes, the ability to credibly commit would often not be needed

Even if the players can find a settlement that they both prefer to conflict (e.g., flipping a coin to decide who gets the territory), there's still the problem of committing to honoring that settlement (you might still just attack me if the coin doesn't land in your favor). So I think there's still a problem. But maybe you're saying that if there's really no private information, then there is no credibility problem, because players can anticipate defections, since they know everything about their counterpart? Something like that?

Preface to EAF's Research Agenda on Cooperation, Conflict, and TAI

Do you think focusing on s-risks leads to meaningfully different technical goals than focusing on other considerations?

I think it definitely leads to a difference in prioritization among the things one could study under the broad heading of AI safety. Hopefully this will be clear in the body of the agenda. And, some considerations around possible downsides of certain alignment work might be more salient to those focused on s-risk; the possibility that attempts at alignment with human values could lead to very bad “near misses” is an example. (I think some other EAF researchers have more developed views on this than myself.) But, in this document and my own current research I’ve tried to choose directions that are especially important from the s-risk perspective but which are also valuable by the lights of non-s-risk-focused folks working in the area.

[Just speaking for myself here]

I find myself somewhat confused by s-risks as defined here

For what it’s worth, EAF is currently deliberating about this definition and it might change soon.