Crossposted to the Effective Altruism Forum
So far, the idea of differential technological development has been discussed in ways that emphasize (1) ratios of progress rates, (2) ratios of remaining work, (3) maximizing or minimizing correlations (for example, minimizing the overlap between the capability to do harm and the desire to do so), or (4) implementing safe tech before developing and implementing unsafe tech, along with (5) the occasional niche analysis (possibly see also a complementary aside relating differential outcomes to growth rates in the long run). I haven’t seen much work on how various capabilities (a generalization of technology) may interact with each other in general in ways that prevent downside effects (though see also The Vulnerable World Hypothesis), and I wish to elaborate on this interaction type.
As technology improves, our capacity to do both harm and good increases, and each additional capacity unlocks new capacities that can be implemented. For example, the invention of engines unlocked railroads, which in turn unlocked more efficient trade networks. However, the invention of engines also enabled the construction of mobile war vehicles. How, in an ideal world, could we implement capacities so we get the outcomes we want while creating minimal harm and risks in the process?
What does implementing a capacity do? It enables us to change something. A normal progression is:
The problem is that downside effects in stages 2 and 3 could overwhelm the value achieved during those stages and at stage 4, especially when considering powerful, game-changing technologies that could lead to existential risks.
Even more fundamentally, as agents in the world we want to avoid shifting the expected utility in a negative direction relative to other options (the opportunity costs). As with any other plan, we want to implement new capacities in the best sequence so as to maximize the value we achieve. But value is a property of an entire plan, which makes it harder to reason about than the question of what the optimal (or safe) next step is (ignoring what is done after). We wish to make choosing which capacities to develop more manageable and easier to think about. One way to do this is to make sure that each capacity we implement is immediately an improvement relative to the state we’re in before implementing it (this simplification is an example of a greedy algorithm heuristic). What does this simplification imply about the sequence of implementing capacities?
This implies that we want to have the capacities to do good without their downside effects and risks. How do we do this? If we’re lucky, the capacity itself has no downside risks, and we’re done. But if we’re not lucky, we need to implement a regulator on that capacity: a safety regulator. Let’s define a safety regulator as a capacity that helps control other capacities to mitigate their downside effects. Once a capacity has been fully safety regulated, it is unlocked and we can implement it to positive effect.
Some distinctions we want to pay attention to are then:
Running the suggested heuristic strategy then looks like this: if a capacity is unlocked, implement it; otherwise, either implement an unlocked safety regulator for it first or choose a different capacity to implement. We could call this a safety regulated capacity expanding feedback loop. For instance, with respect to nuclear reactions: (1) humanity had the implemented capacity of access to radioactivity, (2) this made available the safety regulator of controlling chain reactions, (3) determining how to control chain reactions was implemented (through experimentation and calculation), (4) this unlocked the capacity to use chain reactions (in a controlled fashion), and (5) the capacity of using chain reactions was implemented.
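To make the loop concrete, here is a minimal Python sketch of the heuristic applied to the nuclear example. It is only an illustration under stated assumptions: the `Capacity` class, the `prerequisites` and `regulators` fields, and the functions `is_unlocked`, `next_step`, and `run_loop` are hypothetical names introduced here, and treating "unlocked" as "all prerequisites and all regulators already implemented" is a simplification of the idea, not a definition from the post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capacity:
    name: str
    # Assumption: capacities that must be implemented before this one is available.
    prerequisites: frozenset = frozenset()
    # Assumption: safety regulators that must be implemented before this one is unlocked.
    regulators: frozenset = frozenset()


def is_unlocked(cap, implemented):
    """True if the capacity is available and fully safety regulated."""
    return cap.prerequisites <= implemented and cap.regulators <= implemented


def next_step(target, catalog, implemented):
    """Pick what to implement next for `target`.

    Returns the target if it is unlocked, otherwise an unlocked safety
    regulator for it, otherwise None (meaning: choose a different capacity).
    """
    cap = catalog[target]
    if is_unlocked(cap, implemented):
        return target
    for reg_name in sorted(cap.regulators - implemented):
        if is_unlocked(catalog[reg_name], implemented):
            return reg_name
    return None


def run_loop(target, catalog, implemented):
    """Repeat the heuristic until `target` is implemented or no safe step remains."""
    implemented = set(implemented)
    order = []
    while target not in implemented:
        step = next_step(target, catalog, implemented)
        if step is None:
            break  # nothing safe to do for this target right now
        implemented.add(step)
        order.append(step)
    return order


# Hypothetical encoding of the nuclear example from the text.
catalog = {
    "radioactivity": Capacity("radioactivity"),
    "control_chain_reactions": Capacity(
        "control_chain_reactions",
        prerequisites=frozenset({"radioactivity"}),
    ),
    "use_chain_reactions": Capacity(
        "use_chain_reactions",
        prerequisites=frozenset({"radioactivity"}),
        regulators=frozenset({"control_chain_reactions"}),
    ),
}

order = run_loop("use_chain_reactions", catalog, {"radioactivity"})
print(order)  # ['control_chain_reactions', 'use_chain_reactions']
```

Starting from access to radioactivity, the sketch implements the regulator (controlling chain reactions) before the capacity it regulates (using chain reactions). If nothing is implemented yet, `next_step` returns `None` for this target, which corresponds to the "choose a different capacity" branch of the heuristic.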
Limitations and extensions to this method:
How do you deal with the knowledge problem? Typically, the actual, experienced pain in steps 2 and 3 is critical to the safety measures implemented in 3 and enjoyed in 4. Progress is not delayed for every possible problem, but the worst of them get addressed: the incentive to be safe (reduce pain) aligns with the incentive to use the technology at all.
This works for pain (risk that's short-term enough that its cost and incidence can be measured). It's not clear that it works for rarer but more severe risks (x-risk or just giant economic risk).
In other words, the regulators are part of the technology in the first place - what's the guarantee (or even the mechanism to start) that the regulators are addressing only the critical risks?
How do you see the safety regulator model working in a case like bridges, where safety is already part of the primary function of the system? A bridge is built to optimize for getting people across a gap they couldn't otherwise cross, and being better at being a bridge (getting more people across) means being safer (fewer people fail to make it across for deadly reasons). It's not entirely clear where we might draw the line to demarcate a safety regulator in such cases, where safety is naturally part of the function.
(though see also The Vulnerable World Hypothesis), and wish to elaborate on this interaction type.
What is the subject of 'wish'?
Edited to add "I" immediately in front of "wish".