We present a formal model demonstrating how utilitarian reasoning creates a structural vulnerability that allows AI corporations to acquire a public veneer of safety at arbitrarily low cost.
Drawing on the work of Houy [2014], we prove that an organisation can acquire safety-minded employees for a vanishingly small premium $\varepsilon$.
This result formalises a well-known phenomenon in AI safety, wherein researchers concerned about existential risk from AI join accelerationist corporations under the rationale of "changing things from the inside", without ever producing measurable safety improvements.
We discuss implications for AI governance, organisational credibility, and the limitations of utilitarian decision-making in competitive labour markets.
The title is a play on It will cost you nothing to "kill" a Proof-of-Stake crypto-currency, a folklore result about crypto-currencies.
If an attacker credibly commits to buying 51% of the coins in a Proof-of-Stake crypto-currency, the value of all coins goes to 0. Thus, given a credible commitment, it is in every holder's interest to sell their coins at any price greater than 0. This lets the attacker kill the network for an arbitrarily small price.
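As a toy illustration of that dynamic (a minimal sketch with made-up numbers, not taken from Houy's paper), here is the holder's decision once the 51% commitment is credible:

```python
# Toy sketch of the Proof-of-Stake "costless kill" dynamic.
# Assumption (illustrative): once the attacker credibly commits to acquiring
# 51% of the stake, holding a coin is worth 0.

def holder_sells(offer_price: float, value_of_holding: float = 0.0) -> bool:
    """A rational holder sells whenever the offer beats the value of holding on."""
    return offer_price > value_of_holding

# Any strictly positive offer clears, so the attacker's budget can be arbitrarily small.
for offer in (1.0, 0.01, 1e-9):
    print(f"offer={offer}: holder sells? {holder_sells(offer)}")  # True every time
```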
A similar dynamic is at play with Utilitarians.
For instance, researchers at DeepMind, OpenAI and Anthropic will often claim that they work at these AI corps to eventually change things from the inside, for the better. They will say things like "It is better for me to be there than someone else, at least I can push for more safety."
I aim to give a formal account of what's happening there.
Let's consider a few agents: $C$, an accelerationist AI corp; and $U = \{u_1, \ldots, u_n\}$, a set of safety-minded utilitarians.
The accelerationist corporation has an interest in hiring a critical number of safety-minded utilitarians. This grants it a bunch of street-cred, social capital, PR points, and credibility in front of governments. Let's call the utility gained $V$, and let's call $k$ the minimal number of safety-minded utilitarians needed to get the utility.
Conversely, each safety-minded utilitarian $u_i$'s interest in getting hired by $C$ is exactly $p_i$, the safety premium that $C$ would pay by having $u_i$ join the company. (Here, we assume that they can always find a comparable job at a non-AI company, or at least a non-accelerationist one.)
Normally, we'd expect that "the market clears" iff there is a coalition $S \subseteq U$ with $|S| \geq k$ and $\sum_{i \in S} p_i \leq V$.
In other words, $V$ is exactly $C$'s willingness-to-pay.
Thus, the arrangement should work exactly when there is a coalition of safety-minded utilitarians that is big enough for the accelerationist corp to get its PR benefits, but whose aggregate safety premium is small enough to be worth those benefits.
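As a sanity check on that condition, here is a minimal sketch of the "normal" market-clearing test, using the symbols of the model and illustrative premiums of my own choosing:

```python
# Sketch of the classical market-clearing condition: the deal should happen iff
# some coalition S with |S| >= k has aggregate safety premium sum(p_i) <= V.
# The premiums, k and V below are made-up numbers for illustration.

def market_clears(premiums: list[float], k: int, V: float) -> bool:
    # Checking the k cheapest utilitarians suffices: any larger or pricier
    # coalition only increases the aggregate premium.
    cheapest_k = sorted(premiums)[:k]
    return len(cheapest_k) == k and sum(cheapest_k) <= V

print(market_clears(premiums=[5.0, 7.0, 9.0, 12.0], k=3, V=25.0))  # True: 5+7+9 = 21 <= 25
print(market_clears(premiums=[5.0, 7.0, 9.0, 12.0], k=3, V=18.0))  # False: 21 > 18
```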
—
However, this is not how utilitarians reason.
From a utilitarian perspective, the only thing that matters is the safety premium $C$ will pay if they join the corp vs if they do not join it.
As a result, $C$ decides to simply offer a premium $\varepsilon$ to the first $k$ safety-minded utilitarians. They all take it, reasoning that having $C$ pay a safety premium of $\varepsilon$ is better than 0.
And thus, $C$ can buy $V$ for $k\varepsilon$, where $\varepsilon$ can be set to an arbitrarily small number.
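Continuing the sketch above (same illustrative numbers), the utilitarian failure mode looks like this:

```python
# Sketch of the exploit: each utilitarian compares "C pays epsilon more safety"
# against "C pays 0 more safety" and accepts, regardless of their full premium p_i.

def utilitarian_accepts(epsilon: float) -> bool:
    return epsilon > 0.0  # marginal reasoning: any positive premium beats zero

def cost_of_veneer(k: int, epsilon: float) -> float:
    """Total amount C spends to hire the first k safety-minded utilitarians."""
    return k * epsilon if all(utilitarian_accepts(epsilon) for _ in range(k)) else float("inf")

# C obtains the full PR value V for an arbitrarily small outlay, independent of the p_i.
print(cost_of_veneer(k=3, epsilon=1e-6))  # 3e-06
```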
The result is significant.
Our model gives a mathematical account of a mechanism by which organisations can exploit utilitarian reasoning to acquire a veneer of safety at costs far below their actual willingness-to-pay.
The model directly explains how researchers who assign a non-trivial chance of human extinction from AI nevertheless join AI corporations. Despite their safety premium $p_i$ potentially being quite large (reflecting genuine concerns about extinction risks), each researcher accepts a marginal offer of $\varepsilon$, which produces no measurable safety improvement.
This mechanism generalises beyond technical research and engineering positions.
Individuals in policy, governance and lobbying roles within accelerationist corporations also invoke utilitarian grounds to rationalise their position. As long as they can plausibly explain how they cause marginally less damage than their replacement would, our model explains how they will justify any course of action. The logic is structurally identical to accepting $\varepsilon$ in the base model.
The exploitation extends to actors outside the organisations.
From organisations working to get politicians to "secure voluntary commitments"[1] by companies, to people being nice to some companies in the hope of creating "incentive gradients", our framework explains these behaviours as well.
Across all three domains, the core vulnerability is the same: utilitarian decision-makers evaluate actions based solely on marginal impact, making them collectively exploitable.
Several extensions to our framework would increase its explanatory power and predictive accuracy. We outline four promising directions for future research.
Distortion effects of large compensation packages. There is ample literature demonstrating how financial incentives can affect risk perception and safety assessments across domains. We believe bringing this work to bear on AI Safety will be fruitful.
Impact maximisation framework. Some actors operate under a non-standard framework, in which one seeks to maximise their impact on the final outcomes, rather than their utility. Such an impact framework helps model behaviour driven by status-seeking, where individuals maximise their centrality to high-stakes decisions, even when their net impact is negative.
The advantage of depicting oneself as a moderate. Depicting an AI Pause as "Radical" (Buck [2025]) or "Most Extreme" (Dario [2023]) often confers credibility that allows individuals to maintain influence across multiple stakeholder groups. It would be interesting to study whether such credibility translates into a higher safety premium than the usual $\varepsilon$, and whether this dominates the negative effects of shooting down pause advocates.
Social enforcement and peer dynamics. Our current model ignores the impact of peers on the decisions of individuals. Communities that are critical of the choices of their members may effectively increase the cost for an individual of joining an accelerationist corporation, thereby acting as a form of collective bargaining power. Conversely, communities that are deeply enmeshed with accelerationist corporations may impose a cost on publicly opposing said corporations.
In case it wasn't clear, this is a joke article.
Nevertheless, it captures a real dynamic. I have seen many people completely sell out, for no legible benefit, coming up with utilitarian excuses to justify their choices.
This is true of the employees of AI corporations, of the founders of AI corporations[2], and also of people close to AI corporations who only advocate for solutions that are cheap to them (like Buck and Paul).
I do not hope that this article will make them reconsider. I hope instead that it makes it legible to the people around them that selling out for an epsilon is indicative of a lack of courage (as Nate puts it), not of master strategy.
Cheers!