“Concern, Respect, and Cooperation” is a contemporary moral-philosophy book by Garrett Cullity which advocates for a pluralistic foundation of morality, based on three distinct principles:
What I recently noticed here and want to write down is a loose correspondence between these different foundations for morality and some approaches to safe superintelligence:
Cullity argues that none of his principles is individually a satisfying foundation for morality, but that all four together (elaborated in certain ways with many caveats) seem adequate (and maybe just the first three). I have a similar intuition about AI safety approaches. I can’t yet make the analogy precise, but I feel worried when I imagine corrigibility alone, CEV alone, bargaining alone (whether causal or acausal), or Earth-as-wildlife-preserve alone; whereas I feel pretty good imagining a superintelligence that somehow balances all four. I can imagine that one of them might suffice as a foundation for the others, but I think this would be path-dependent at best. I would be excited about work that tries to do for Cullity’s entire framework what CEV does for pure single-agent utilitarianism (namely, make it more coherent and robust and closer to something that could be formally specified).
So how are we supposed to solve ELK, if we are to assume that it's intractable?
A different answer to this could be that a "solution" to ELK is one that is computable, even if intractable. By analogy, algorithmic "solutions" to probabilistic inference on Bayes nets are still solutions, even though the problem is provably NP-hard. It's up to the authors of ELK to disambiguate what they're looking for in a "solution"; I like the ideas here (especially in Level 2), but I just wanted to point out this alternative to the premise.
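To make "computable but intractable" concrete, here is a toy illustration (my own example with made-up numbers, not anything from the ELK report): exact inference by enumeration on a three-variable Bayes net. The same algorithm takes exponential time in the number of variables, yet it is unambiguously an algorithmic solution to the inference problem.

```python
# Exact Bayes-net inference by enumeration over the full joint distribution.
# "Computable but intractable": fine for three variables, exponential in general.
from itertools import product

# P(Rain), P(Sprinkler | Rain), P(WetGrass | Sprinkler, Rain) -- made-up numbers.
P_rain = {True: 0.2, False: 0.8}
P_sprinkler = {True: {True: 0.01, False: 0.99}, False: {True: 0.4, False: 0.6}}
P_wet = {
    (True, True): {True: 0.99, False: 0.01},
    (True, False): {True: 0.9, False: 0.1},
    (False, True): {True: 0.8, False: 0.2},
    (False, False): {True: 0.0, False: 1.0},
}

def joint(r, s, w):
    """Probability of a full assignment (Rain=r, Sprinkler=s, WetGrass=w)."""
    return P_rain[r] * P_sprinkler[r][s] * P_wet[(s, r)][w]

# Query: P(Rain = True | WetGrass = True), by summing out Sprinkler.
numer = sum(joint(r, s, True) for r, s in product([True, False], repeat=2) if r)
denom = sum(joint(r, s, True) for r, s in product([True, False], repeat=2))
print(numer / denom)
```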
For various collective-epistemics and cooperative-decision-making endeavours, I think a key technical enabler might be DVCS for structured data. To that end, I am interested in funding work in this direction. Aside from being in a position to allocate some funding, I think I have some comparative advantage in a broad inside-view awareness of potentially relevant theoretical footholds, and this post is intended to start unfurling that inside view, initially as a list of links. People who I fund to work on this should read the abstracts of all of these papers, pay special attention to those marked with (!), skim/read further as they see fit, and cite them in their writeups (at least under "related work"). I'm posting this here as part of a general policy of using this platform for any of my even-vaguely-technical output that goes beyond tweets.
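To gesture at the kind of primitive a DVCS for structured data might rest on, here is a minimal, purely hypothetical sketch (my own toy code, not drawn from any of the papers): a content-addressed Merkle-DAG store for JSON-like trees, in which equal subtrees get equal hashes, which is what makes structural diff and merge cheap.

```python
# Minimal content-addressed (Merkle-DAG) store for JSON-like trees.
# Hypothetical illustration only, not a proposed design.
import hashlib, json

store: dict[str, str] = {}  # content hash -> canonical serialization of one node

def put(value) -> str:
    """Recursively store a JSON-like value, returning its content hash."""
    if isinstance(value, dict):
        node = {"kind": "map", "entries": {k: put(v) for k, v in sorted(value.items())}}
    elif isinstance(value, list):
        node = {"kind": "list", "items": [put(v) for v in value]}
    else:
        node = {"kind": "leaf", "value": value}
    blob = json.dumps(node, sort_keys=True)
    digest = hashlib.sha256(blob.encode()).hexdigest()
    store[digest] = blob
    return digest

# Two documents that share subtrees share storage, because equal subtrees
# hash to the same node; only the differing nodes are added.
a = put({"title": "draft", "sections": ["intro", "methods"]})
b = put({"title": "final", "sections": ["intro", "methods"]})
print(a != b, len(store))
```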
I'm interested in all of the further topics you mentioned, but especially how to derive this setup from polynomial functors and how it gives rise to an ELK proposal.
My impression of the plurality perspective around here is that the examples you give (e.g. overweighting contemporary ideology, reinforcing non-truth-seeking discourse patterns, and people accidentally damaging themselves with AI-enabled exotic experiences) are considered unfortunate but acceptable defects in a "safe" transition to a world with superintelligences. These scenarios don't violate existential safety because something that is still recognizably humanity has survived (perhaps even more recognizably human than you and I would hope for).
I agree with your sense that these are salient bad outcomes, but I think they can only be considered "existentially bad" if they plausibly get "locked in," i.e. persist throughout a substantial fraction of some exponentially discounted future light-cone. I think Paul's argument amounts to saying that a corrigibility approach focuses directly on mitigating the "lock-in" of wrong preferences, whereas ambitious value learning would try to get the right preferences but has a greater risk of locking in its best guess.
I think this order does satisfy Homogeneous Mixtures, but not Intermediate Mixtures. Homogeneous Mixtures is a theorem if you model lotteries as measures, because it’s asking that your preference ordering respect a straight-up equality of measures (which it must if it’s reflexive).
Intermediate Mixtures and Weak Dominance are asking that your preference ordering be willing to strictly order mixtures if it would strictly order their components in a certain way, and the ordering you’ve proposed preserves sanity by sometimes refusing to rank pathological mixtures.
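Spelling out the measure picture (the notation here is my own choice): if a lottery is identified with a probability measure $\mu$ over outcomes, then a homogeneous mixture of copies of the same lottery is literally the same measure,

$$\sum_k p_k\,\mu \;=\; \mu \qquad \text{whenever } \sum_k p_k = 1,$$

so any reflexive preference relation is forced to rank the two sides as equivalent. Intermediate Mixtures, by contrast, demands a strict ranking between genuinely different measures, and an ordering can decline to provide that.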
It may not be too late; I believe Eric originated the initialism, and it hasn’t spread too widely yet. I too would vote for QLNR.
To anyone who is still not convinced: that last move, exchanging the order of the two infinite summations, is justified by Tonelli’s theorem, merely because all of the terms being summed are non-negative (for every index).
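For reference, the counting-measure special case of Tonelli's theorem being invoked (with indices of my own choosing) is just:

$$a_{ij} \ge 0 \ \text{ for all } i,j \quad\Longrightarrow\quad \sum_i \sum_j a_{ij} \;=\; \sum_j \sum_i a_{ij},$$

with both sides possibly equal to $+\infty$; no convergence hypothesis beyond non-negativity is needed.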
The way I look at this is that these objects live in a function space like $\mathbb{R}^{\mathcal{O}}$ (real-valued functions on the set of outcomes $\mathcal{O}$), specifically the subspace of functions that are non-negative and integrable with respect to counting measure on $\mathcal{O}$, with total mass $1$. In other words, objects like these are probability mass functions (pmfs); the pmf corresponding to a single outcome assigns $1$ to that outcome and $0$ to anything else. When we write what looks like an infinite series of lotteries, $\sum_i p_i X_i$, what this really means is that we’re defining a new pmf $f$ by pointwise infinite summation: $f(x) = \sum_i p_i f_i(x)$, where $f_i$ is the pmf of $X_i$. So, for this new $f$ to be well-defined, only each collection of terms containing a given outcome $x$ needs to form a convergent series. And for it to equal another pmf $g$, the convergent sums only need to be equal pointwise (for each $x$, $f(x) = g(x)$). In Paul’s proof above, there is only one outcome for which the collection of terms containing it is even infinite; that’s the reason he’s “just calculating” that one sum.
Log-normal is a good first guess, but I think its tails are too small (at both ends).
Some alternatives to consider:
Of course, the best Bayesian forecast you could come up with, derived from multiple causal factors such as hardware and economics in addition to algorithms, would probably score a bit better than any simple closed-form family like this, but I'd guess literally only about 1 to 2 bits better (in terms of log-score).
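As a rough illustration of what "1 to 2 bits better" cashes out to (everything here is hypothetical and chosen by me, not from this thread): compare the average held-out log-score, in bits per observation, of a fitted log-normal against a heavier-tailed alternative such as the log-logistic.

```python
# Hypothetical comparison of held-out log-scores (in bits) for two candidate
# forecast families: log-normal vs. the heavier-tailed log-logistic ("fisk" in
# scipy). The data are synthetic, just to show the bookkeeping.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = stats.fisk.rvs(c=3.0, scale=2.0, size=4000, random_state=rng)
train, test = data[:2000], data[2000:]

# Fit both families by maximum likelihood on the training half (location pinned at 0).
ln_s, _, ln_scale = stats.lognorm.fit(train, floc=0)
fk_c, _, fk_scale = stats.fisk.fit(train, floc=0)

# Average log-score per held-out observation, converted from nats to bits.
ln_bits = stats.lognorm.logpdf(test, ln_s, scale=ln_scale).mean() / np.log(2)
fk_bits = stats.fisk.logpdf(test, fk_c, scale=fk_scale).mean() / np.log(2)

print(f"log-normal:   {ln_bits:.3f} bits/obs")
print(f"log-logistic: {fk_bits:.3f} bits/obs  (difference: {fk_bits - ln_bits:.3f} bits)")
```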