Cooperation, Conflict, and Transformative Artificial Intelligence: A Research Agenda


Against strong bayesianism

I don't think bayesianism gives you particular insight into that for the same reasons I don't think it gives you particular insight into human cognition

In the areas I focus on, at least, I wouldn’t know where to start if I couldn’t model agents using Bayesian tools. Game-theoretic concepts like social dilemma, equilibrium selection, costly signaling, and so on seem indispensable, and you can’t state these crisply without a formal model of preferences and beliefs. You might disagree that these are useful concepts, but at this point I feel like the argument has to take place at the level of individual applications of Bayesian modeling, rather than a wholesale judgement about Bayesianism.

misleading concepts like "boundedly rational" (compare your claim with the claim that a model in which all animals are infinitely large helps us identify properties that are common to "boundedly sized" animals)

I’m not saying that the idealized model helps us identify properties common to more realistic agents just because it's idealized. I agree that many idealized models may be useless for their intended purpose. I’m saying that, as it happens, whenever I think of various agentlike systems it strikes me as useful to model those systems in a Bayesian way when reasoning about some of their aspects --- even though the details of their architectures may differ a lot.

I didn’t quite understand why you said “boundedly rational” is a misleading concept; I’d be interested to see you elaborate.

if we have no good reason to think that explicit utility functions are something that is feasible in practical AGI

I’m not saying that we should try to design agents who are literally doing expected utility calculations over some giant space of models all the time. My suggestion was that it might be good --- for the purpose of attempting to guarantee safe behavior --- to design agents which in limited circumstances make decisions by explicitly distilling their preferences and beliefs into utilities and probabilities. It's not obvious to me that this is intractable. Anyway, I don't think this point is central to the disagreement.

Against strong bayesianism

I agree with the rejection of strong Bayesianism. I don’t think it follows from what you’ve written, though, that “bayesianism is not very useful as a conceptual framework for thinking either about AGI or human reasoning”.

I'm probably just echoing things that have been said many times before, but:

You seem to set up a dichotomy between two uses of Bayesianism: modeling agents as doing something like "approximate Solomonoff induction", and Bayesianism as just another tool in our statistical toolkit. But there is a third use of Bayesianism, the way that sophisticated economists and political scientists use it: as a useful fiction for modeling agents who try to make good decisions in light of their beliefs and preferences. I’d guess that this is useful for AI, too. These will be really complicated systems and we don’t know much about their details yet, but it will plausibly be reasonable to model them as “trying to make good decisions in light of their beliefs and preferences”. In turn, the Bayesian framework plausibly allows us to see failure modes that are common to many boundedly rational agents.

Perhaps a fourth use is that we might actively want to try to make our systems more like Bayesian reasoners, at least in some cases. For instance, I mostly think about failure modes in multi-agent systems. I want AIs to compromise with each other instead of fighting. I’d feel much more optimistic about this if the AIs could say “these are our preferences encoded as utility functions, these are our beliefs encoded as priors, so here is the optimal bargain for us given some formal notion of fairness” --- rather than hoping that compromise is a robust emergent property of their training.
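To make the "optimal bargain given some formal notion of fairness" idea concrete, here is a minimal sketch that assumes the fairness notion is the Nash bargaining solution over splits of a single divisible resource. The function name `nash_bargain`, the grid search, and the example utility functions are all illustrative assumptions, not something proposed in the comment above.

```python
import math

def nash_bargain(u1, u2, d1, d2, grid=1001):
    """Return agent 1's share x in [0, 1] that maximizes the Nash product.

    u1, u2: utility functions over each agent's own share (hypothetical).
    d1, d2: disagreement utilities, i.e. what each agent gets if talks fail.
    """
    best_x, best_val = None, -math.inf
    for i in range(grid):
        x = i / (grid - 1)
        gain1 = u1(x) - d1
        gain2 = u2(1 - x) - d2
        # Skip splits that leave either agent worse off than disagreement.
        if gain1 < 0 or gain2 < 0:
            continue
        val = gain1 * gain2
        if val > best_val:
            best_x, best_val = x, val
    return best_x

# Two risk-neutral agents split the resource evenly.
even_split = nash_bargain(lambda x: x, lambda y: y, 0.0, 0.0)

# If agent 2 is risk-averse (square-root utility), agent 1 gets about 2/3.
skewed_split = nash_bargain(lambda x: x, math.sqrt, 0.0, 0.0)
```

The point of the sketch is only that once preferences are written down as explicit utilities, the bargain becomes a well-defined computation. In a fuller model the utilities would be expected utilities under each agent's stated beliefs.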

Equilibrium and prior selection problems in multipolar deployment

The new summary looks good =) I do second Michael Dennis' comment below, though: the infinite regress of priors is avoided in standard game theory by specifying a common prior. Indeed, the specification of this prior leads to a prior selection problem.

The formality of "priors / equilibria" doesn't have any benefit in this case (there aren't any theorems to be proven)

I’m not sure if you mean “there aren’t any theorems to be proven” or “any theorem that’s proven in this framework would be useless”. The former is false, e.g. there are things to prove about the construction of learning equilibria in various settings. I’m sympathetic with the latter criticism, though my own intuition is that working with the formalism will help uncover practically useful methods for promoting cooperation, and point to problems that might not be obvious otherwise. I'm trying to make progress in this direction in this paper, though I wouldn't yet call this practical.

The one benefit I see is that it signals that "no, even if we formalize it, the problem doesn't go away", to those people who think that once formalized sufficiently all problems go away via the magic of Bayesian reasoning

Yes, this is a major benefit I have in mind!

The strategy of agreeing on a joint welfare function is already a heuristic and isn't an optimal strategy; it feels very weird to suppose that initially a heuristic is used and then we suddenly switch to pure optimality

I’m not sure what you mean by “heuristic” or “optimality” here. I don’t know of any good notion of optimality which is independent of the other players, which is why there is an equilibrium selection problem. The welfare function selects among the many equilibria (i.e. it selects one which optimizes the welfare). I wouldn't call this a heuristic. There has to be some way to select among equilibria, and the welfare function is chosen such that the resulting equilibrium is acceptable by each of the principals' lights.
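As a toy illustration of a welfare function selecting among equilibria, the sketch below enumerates the pure-strategy Nash equilibria of a small two-player coordination game and picks the one that maximizes a given welfare function. The game `GAME`, its payoffs, and the two welfare functions are assumptions chosen for illustration.

```python
def pure_nash_equilibria(payoffs):
    """payoffs[(a1, a2)] = (u1, u2). Return all pure-strategy Nash equilibria."""
    actions1 = sorted({a1 for a1, _ in payoffs})
    actions2 = sorted({a2 for _, a2 in payoffs})
    equilibria = []
    for a1 in actions1:
        for a2 in actions2:
            u1, u2 = payoffs[(a1, a2)]
            # Neither player can gain by deviating unilaterally.
            best1 = all(payoffs[(b1, a2)][0] <= u1 for b1 in actions1)
            best2 = all(payoffs[(a1, b2)][1] <= u2 for b2 in actions2)
            if best1 and best2:
                equilibria.append((a1, a2))
    return equilibria

def select_equilibrium(payoffs, welfare):
    """Pick the equilibrium whose payoff pair maximizes the welfare function."""
    return max(pure_nash_equilibria(payoffs), key=lambda a: welfare(*payoffs[a]))

# Hypothetical coordination game with two equilibria, (A, A) and (B, B).
GAME = {
    ("A", "A"): (5, 1), ("A", "B"): (0, 0),
    ("B", "A"): (0, 0), ("B", "B"): (2, 3),
}

utilitarian = select_equilibrium(GAME, lambda u1, u2: u1 + u2)   # ("A", "A")
nash_product = select_equilibrium(GAME, lambda u1, u2: u1 * u2)  # ("B", "B")
```

In this example the two welfare functions pick different equilibria, which is one way to see why the principals have to agree on the welfare function rather than each optimizing on their own.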

Equilibrium and prior selection problems in multipolar deployment

both players want to optimize the welfare function (making it a collaborative game)

The game is collaborative in the sense that a welfare function is optimized in equilibrium, but the principals will in general have different terminal goals (reward functions) and the equilibrium will be enforced with punishments (cf. tit-for-tat).
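A minimal sketch of punishment-based enforcement, assuming a standard iterated prisoner's dilemma with the textbook payoffs (temptation 5, reward 3, punishment 1, sucker 0). The strategy and function names are illustrative. The players have different reward functions, yet defecting against tit-for-tat pays less over ten rounds than cooperating with it.

```python
# Payoffs indexed by (player 1 action, player 2 action); "C" cooperates, "D" defects.
PAYOFF = {
    ("C", "C"): (3, 3), ("C", "D"): (0, 5),
    ("D", "C"): (5, 0), ("D", "D"): (1, 1),
}

def tit_for_tat(opponent_history):
    """Cooperate first, then copy the opponent's previous move."""
    return "C" if not opponent_history else opponent_history[-1]

def always_defect(opponent_history):
    return "D"

def play(strategy1, strategy2, rounds=10):
    """Return total payoffs for both players over the given number of rounds."""
    history1, history2 = [], []
    total1 = total2 = 0
    for _ in range(rounds):
        a1 = strategy1(history2)
        a2 = strategy2(history1)
        p1, p2 = PAYOFF[(a1, a2)]
        total1 += p1
        total2 += p2
        history1.append(a1)
        history2.append(a2)
    return total1, total2

cooperate_totals = play(tit_for_tat, tit_for_tat)   # (30, 30)
defect_totals = play(always_defect, tit_for_tat)    # (14, 9)
```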

the issue is primarily that in a collaborative game, the optimal thing for you to do depends strongly on who your partner is, but you may not have a good understanding of who your partner is, and if you're wrong you can do arbitrarily poorly

Agreed, but there's the additional point that in the case of principals designing AI agents, the principals can (in theory) coordinate to ensure that the agents "know who their partner is". That is, they can coordinate on critical game-theoretic parameters of their respective agents.

How special are human brains among animal brains?

Chimpanzees, crows, and dolphins are capable of impressive feats of higher intelligence, and I don’t think there’s any particular reason to think that Neanderthals are capable of doing anything qualitatively more impressive

This seems like a pretty cursory treatment of what seems like quite a complicated and contentious subject. A few possible counterexamples jump to mind. These are just things I remember coming across when browsing cognitive science sources over the years.

My nonexpert sense is that it is at least controversial both how each of these is connected with language, and the extent to which nonhumans are capable of them.

Instrumental Occam?

In model-free RL, policy-based methods choose policies by optimizing a noisy estimate of the policy's value. This is analogous to optimizing a noisy estimate of prediction accuracy (i.e., accuracy on the training data) to choose a predictive model. So we often need to trade variance for bias in the policy-learning case (i.e., shrink towards simpler policies) just as in the predictive modeling case.
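The following toy simulation sketches why optimizing a noisy value estimate favors restricting the policy class. It assumes every candidate policy has the same true value (zero) and that value estimates carry independent Gaussian noise. The function name `selection_bias` and all parameters are assumptions for illustration, not a description of any particular RL algorithm.

```python
import random

def selection_bias(num_policies, noise_sd=1.0, trials=2000, seed=0):
    """Average gap between the selected policy's estimated and true value.

    All candidates have true value 0, so any positive gap comes purely
    from picking the maximum of noisy estimates.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        estimates = [rng.gauss(0.0, noise_sd) for _ in range(num_policies)]
        total += max(estimates)  # true value of the selected policy is 0
    return total / trials

small_class_bias = selection_bias(num_policies=2)
large_class_bias = selection_bias(num_policies=100)
```

Under these assumptions the overestimate grows with the size of the policy class, so searching over a smaller or simpler class accepts some bias in exchange for fitting less noise.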

MichaelA's Shortform

There are "reliabilist" accounts of what makes a credence justified. The accounts differ, but they say (very roughly) that a credence is justified if it is produced by a process whose outputs are close to the truth on average. See this paper.

Frequentist statistics can be seen as a version of reliabilism. Criteria like the Brier score for evaluating forecasters can also be understood in a reliabilist framework.
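For reference, the Brier score for binary events is the mean squared difference between forecast probabilities and outcomes (0 or 1), so lower is better. A minimal sketch, with the function name `brier_score` chosen for illustration:

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must be non-empty and equal length")
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

# A forecaster who always says 50% scores 0.25 regardless of what happens.
uninformative = brier_score([0.5, 0.5], [1, 0])
```

On the reliabilist reading, a low score over many forecasts is evidence that the process producing the forecasts tends to land close to the truth.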
