davidad

Comments

What's a good probability distribution family (e.g. "log-normal") to use for AGI timelines?
Answer by davidad, Apr 18, 2022

Log-normal is a good first guess, but I think its tails are too small (at both ends).

Some alternatives to consider:

  • Erlang distribution (by when will k Poisson events have happened?), or its generalization, Generalized gamma distribution
  • Frechet distribution (what will be the max of a large number of i.i.d. samples?) or its generalization, Generalized extreme value distribution
  • Log-logistic distribution (like log-normal, but heavier-tailed), or its generalization, Singh–Maddala distribution

Of course, the best Bayesian forecast you could come up with, derived from multiple causal factors such as hardware and economics in addition to algorithms, would probably score a bit better than any simple closed-form family like this, but I'd guess literally only about 1 to 2 bits better (in terms of log-score).
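
To make the tail comparison concrete, here is a minimal sketch of my own (all parameters are illustrative, not from the answer): the survival functions of a log-normal, a log-logistic, a Fréchet, and an Erlang (k=2) distribution, roughly matched to a median of 20 years, evaluated far out in the tail.

```python
# Hypothetical sketch: compare tail weight of candidate timeline families,
# each roughly matched to a median of ~20 years (illustrative parameters).
import math

MEDIAN = 20.0

def lognormal_sf(t, sigma=1.0, scale=MEDIAN):
    """P(T > t) for a log-normal; the median equals `scale`."""
    return 0.5 * math.erfc(math.log(t / scale) / (sigma * math.sqrt(2)))

def loglogistic_sf(t, c=2.0, scale=MEDIAN):
    """P(T > t) for a log-logistic (heavier-tailed); median equals `scale`."""
    return 1.0 / (1.0 + (t / scale) ** c)

def frechet_sf(t, c=2.0, scale=MEDIAN):
    """P(T > t) for a Frechet (max of many i.i.d. samples intuition)."""
    return 1.0 - math.exp(-((t / scale) ** -c))

def erlang2_sf(t, theta=MEDIAN / 1.678):
    """P(T > t) for an Erlang with k=2 (time until 2 Poisson events);
    theta is chosen so the median is approximately 20."""
    x = t / theta
    return (1.0 + x) * math.exp(-x)

for t in (30.0, 100.0, 1000.0):
    print(f"t={t:6.0f}  lognormal={lognormal_sf(t):.2e}  "
          f"loglogistic={loglogistic_sf(t):.2e}  "
          f"frechet={frechet_sf(t):.2e}  erlang2={erlang2_sf(t):.2e}")
```

By t = 1000 the log-logistic and Fréchet tails dominate the log-normal's, while the Erlang tail (thin at both ends) is far smaller, which is the sense in which the log-normal's tails may be "too small."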

davidad's Shortform

“Concern, Respect, and Cooperation” is a contemporary moral-philosophy book by Garrett Cullity which advocates for a pluralistic foundation of morality, based on three distinct principles:

  • Concern: Moral patients’ welfare calls for promotion, protection, sensitivity, etc.
  • Respect: Moral patients’ self-expression calls for non-interference, listening, address, etc.
  • Cooperation: Worthwhile collective action calls for initiation, joining in, collective deliberation, sharing responsibility, etc.

And one bonus principle, whose necessity he’s unsure of:

  • Protection: Precious objects call for protection, appreciation, and communication of the appreciation.

What I recently noticed here and want to write down is a loose correspondence between these different foundations for morality and some approaches to safe superintelligence:

  • CEV-maximization corresponds to finding a good enough definition of human welfare that Concern alone suffices for safety.
  • Corrigibility corresponds to operationalizing some notion of Respect that would alone suffice for safety.
  • Multi-agent approaches lean in the direction of Cooperation.
  • Approaches that aim to just solve literally the “superintelligence that doesn’t destroy us” problem, without regard for the cosmic endowment, sometimes look like Protection.

Cullity argues that none of his principles is individually a satisfying foundation for morality, but that all four together (elaborated in certain ways with many caveats) seem adequate (and maybe just the first three). I have a similar intuition about AI safety approaches. I can’t yet make the analogy precise, but I feel worried when I imagine corrigibility alone, CEV alone, bargaining alone (whether causal or acausal), or Earth-as-wildlife-preserve; whereas I feel pretty good imagining a superintelligence that somehow balances all four. I can imagine that one of them might suffice as a foundation for the others, but I think this would be path-dependent at best. I would be excited about work that tries to do for Cullity’s entire framework what CEV does for pure single-agent utilitarianism (namely, make it more coherent and robust and closer to something that could be formally specified).

ELK Computational Complexity: Three Levels of Difficulty

So how are we supposed to solve ELK, if we are to assume that it's intractable?

A different answer to this could be that a "solution" to ELK is one that is computable, even if intractable. By analogy, algorithmic "solutions" to probabilistic inference on Bayes nets are still solutions even though the problem is provably NP-hard. It's up to the authors of ELK to disambiguate what they're looking for in a "solution," and I like the ideas here (especially in Level 2), but just wanted to point out this alternative to the premise.
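
To illustrate "computable even if intractable" (a toy example of my own, not from the post): exact inference on a Bayes net by brute-force enumeration always terminates and is a legitimate "solution," even though its running time is exponential in the number of variables and the problem is NP-hard in general.

```python
# Toy Bayes net (Rain -> Sprinkler, both -> WetGrass); CPT numbers are made up.
# Brute-force enumeration: O(2^n) time, but always computable.
from itertools import product

P_rain = {True: 0.2, False: 0.8}
P_sprinkler = {True: {True: 0.01, False: 0.99},   # conditioned on Rain
               False: {True: 0.4, False: 0.6}}
P_wet = {(True, True): {True: 0.99, False: 0.01},
         (True, False): {True: 0.8, False: 0.2},
         (False, True): {True: 0.9, False: 0.1},
         (False, False): {True: 0.0, False: 1.0}}

def joint(rain, sprinkler, wet):
    # Full joint probability of one assignment, via the chain rule.
    return P_rain[rain] * P_sprinkler[rain][sprinkler] * P_wet[(rain, sprinkler)][wet]

def posterior_rain_given_wet():
    # Enumerate every assignment consistent with the evidence WetGrass=True.
    num = sum(joint(True, s, True) for s in (True, False))
    den = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
    return num / den

print(f"P(Rain | WetGrass) = {posterior_rain_given_wet():.3f}")
```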

davidad's Shortform

For various collective-epistemics and cooperative-decision-making endeavours, I think a key technical enabler might be DVCS for structured data. To that end, I am interested in funding work in this direction. Aside from being in a position to allocate some funding, I think I have some comparative advantage in a broad inside-view awareness of potentially relevant theoretical footholds, and this post is intended to start unfurling that inside view, initially as a list of links. People who I fund to work on this should read the abstracts of all of these papers, pay special attention to those marked with (!), skim/read further as they see fit, and cite them in their writeups (at least under "related work"). I'm posting this here as part of a general policy of using this platform for any of my even-vaguely-technical output that goes beyond tweets.

  • (!) A Categorical Theory of Patches
  • (!) The CALM Theorem: When Distributed Consistency is Easy
  • Logic and Lattices for Distributed Programming
  • (!) Double-pushout-rewriting of C-sets
    • This paper describes the theory and implementation of an operation (double-pushout-rewriting a.k.a. DPO-rewriting) on a data model (C-sets) which is what I'm currently most excited about exploring as a notion of "patch" (or "commit") in DVCS of structured data.
    • (!) One specific type of structured data I'm particularly interested in applying the proposed work to is hypernets, a combinatorial representation of string diagrams, defined alongside a double-pushout-rewriting scheme for them in section 4 of Functorial String Diagrams for Reverse-mode Automatic Differentiation. Ideally, this should be a special case of the above.
  • A Generalized Concurrent Rule Construction for Double-Pushout Rewriting
    • Describes when DPO rewrite rules are compatible in the sense that they can be merged into a single concurrent rule.
  • Replication-Aware Linearizability
    • This paper introduces a consistency criterion that is weak enough to be satisfied by most CRDTs, but stronger than mere eventual consistency. I would expect any DVCS to probably satisfy this criterion.
  • (!) Making CRDTs Byzantine Fault Tolerant
    • This short paper outlines some simple, git-like mechanisms for rejecting invalid messages that I would want to see in basically any decentralized system.
  • (!) Fixing incremental computation: Derivatives of fixpoints & the recursive semantics of Datalog
    • I would like the system to support incremental updating of the outputs of functions defined on the structured data (a.k.a. "views" in the database sense), and I think this paper describes a good building block for that. I have flagged it for special attention because I want compatibility with the Datalog model to be in the back of our minds as we make design decisions.
  • Knowledge Representation in Bicategories of Relations
    • This is an alternative data model where database states are functors from C to Rel instead of functors from C to Set. This is in some ways closer to the relational database model, so there might be some interest in generalizing the C-Set DPO algorithm to "C-Rel"s. However, morphisms in Rel can be represented as spans in Set, and that's probably easier.
  • Datalog: Bag Semantics via Set Semantics
  • Algebraic Property Graphs
  • Concise, Type-Safe, and Efficient Structural Diffing
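
For flavor on the CRDT items above (a minimal sketch of my own, not code from any of the cited papers): a state-based grow-only counter, whose merge operation is commutative, associative, and idempotent — the algebraic properties that make eventual consistency work regardless of message ordering or duplication.

```python
# Minimal state-based CRDT sketch: a grow-only counter (G-Counter).
# State maps replica id -> that replica's local count.
def increment(state, replica_id, amount=1):
    new = dict(state)
    new[replica_id] = new.get(replica_id, 0) + amount
    return new

def merge(a, b):
    # Pointwise max over replica entries; merge order never matters.
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}

def value(state):
    return sum(state.values())

# Two replicas diverge, then sync in either order and agree.
alice = increment({}, "alice", 3)
bob = increment({}, "bob", 2)
assert merge(alice, bob) == merge(bob, alice)                # commutative
assert merge(alice, merge(alice, bob)) == merge(alice, bob)  # idempotent
print(value(merge(alice, bob)))  # 5
```
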
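
(Continuing the sketch above, the same merge discipline is what the Byzantine-fault-tolerance paper hardens: a replica simply rejects states that could not have arisen from valid increments. This validity check is my own hypothetical illustration, not the paper's mechanism.)

```python
# Hypothetical validity check a replica might run before merging a peer's
# G-Counter state: entries must be nonnegative integers.
def is_valid(state):
    return all(isinstance(v, int) and v >= 0 for v in state.values())

def safe_merge(local, remote):
    # Reject invalid remote states instead of corrupting local state.
    if not is_valid(remote):
        return local
    return {k: max(local.get(k, 0), remote.get(k, 0))
            for k in local.keys() | remote.keys()}

print(safe_merge({"alice": 3}, {"bob": -7}))  # {'alice': 3}
```
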

REPL's: a type signature for agents

I'm interested in all of the further topics you mentioned, but especially how to derive this setup from polynomial functors and how it gives rise to an ELK proposal.

A broad basin of attraction around human values?

My impression of the plurality perspective around here is that the examples you give (e.g. overweighting contemporary ideology, reinforcing non-truth-seeking discourse patterns, and people accidentally damaging themselves with AI-enabled exotic experiences) are considered unfortunate but acceptable defects in a "safe" transition to a world with superintelligences. These scenarios don't violate existential safety because something that is still recognizably humanity has survived (perhaps even more recognizably human than you and I would hope for).

I agree with your sense that these are salient bad outcomes, but I think they can only be considered "existentially bad" if they plausibly get "locked-in," i.e. persist throughout a substantial fraction of some exponentially-discounted future light-cone. I think Paul's argument amounts to saying that a corrigibility approach focuses directly on mitigating the "lock-in" of wrong preferences, whereas ambitious value learning would try to get the right preferences but has a greater risk of locking-in its best guess.

Impossibility results for unbounded utilities

I think this order does satisfy Homogeneous Mixtures, but not Intermediate Mixtures. Homogeneous Mixtures is a theorem if you model lotteries as measures, because it’s asking that your preference ordering respect a straight-up equality of measures (which it must if it’s reflexive).

Intermediate Mixtures and Weak Dominance are asking that your preference ordering be willing to strictly order mixtures if it would strictly order their components in a certain way, and the ordering you’ve proposed preserves sanity by sometimes refusing to rank pathological mixtures.

QNR prospects are important for AI alignment research

It may not be too late; I believe Eric originated the initialism, and it hasn’t spread too widely yet. I too would vote for QLNR.

Impossibility results for unbounded utilities

To anyone who is still not convinced: that last move, rearranging the infinite sum, is justified by Tonelli’s theorem, merely because every term is nonnegative.
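
For reference (a general statement, not the specific sum from the comment), Tonelli’s theorem specialized to series says that iterated sums of nonnegative terms can always be exchanged:

```latex
% Tonelli's theorem for series: if every term is nonnegative,
% the two iterated sums agree (possibly both equal to +infinity).
\[
  a_{ij} \ge 0 \ \text{for all } i, j
  \quad\Longrightarrow\quad
  \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} a_{ij}
  \;=\;
  \sum_{j=0}^{\infty} \sum_{i=0}^{\infty} a_{ij}.
\]
```
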

Impossibility results for unbounded utilities

The way I look at this is that objects like $\sum_i p_i X_i$ live in a function space like $\mathbb{R}^{\mathcal{O}}$ (real-valued functions on the set of outcomes $\mathcal{O}$), specifically the subspace of that where the functions are integrable with respect to counting measure on $\mathcal{O}$ and sum to $1$. In other words, objects like $\sum_i p_i X_i$ are probability mass functions (pmf): $f(X_i)$ is $p_i$, and $f$ of anything else is $0$. When we write what looks like an infinite series $\sum_i p_i X_i$, what this really means is that we’re defining a new pmf $f$ by pointwise infinite summation: $f(Y) = \sum_{i : X_i = Y} p_i$. So only each collection of terms that contains a given outcome $Y$ needs to form a convergent series in order for this new $f$ to be well-defined. And for it to equal another pmf $g$, the convergent sums only need to be equal pointwise (for each $Y$, $f(Y) = g(Y)$). In Paul’s proof above, there is only one outcome for which the collection of terms containing it is even infinite. That’s the reason he’s “just calculating” that one sum.
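
A concrete rendering of this pmf view (a sketch with made-up lotteries): a lottery is a map from outcomes to probabilities, and a weighted series of lotteries just means pointwise summation of the weighted pmfs.

```python
# A lottery as a probability mass function (outcome -> probability);
# mixing weighted lotteries is pointwise summation of the weighted pmfs.
from collections import defaultdict

def mix(weighted_lotteries):
    """Combine (weight, pmf) pairs by pointwise summation."""
    out = defaultdict(float)
    for w, pmf in weighted_lotteries:
        for outcome, p in pmf.items():
            out[outcome] += w * p
    return dict(out)

# Two illustrative lotteries over integer outcomes.
L1 = {1: 0.5, 2: 0.5}
L2 = {2: 0.25, 3: 0.75}

m = mix([(0.5, L1), (0.5, L2)])
print(m)  # {1: 0.25, 2: 0.375, 3: 0.375}
```

In the infinite case, each outcome's entry is an infinite series of weighted terms, and only those per-outcome series need to converge for the mixture to be well-defined.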
