Church's views on AI seem far from mine and from those of most people in the AI risk community, and they really intrigued me. It would be great to try to distil and summarise these views so that I can update on them properly.
Model of the threat and interventions for mesa-optimization
Thanks, that makes sense. To clarify, I realise there are references/links throughout. But I forgot that the takeoff speeds post was basically making the same claim as that quote, and so I was expecting a reference more from the biology space. And there are other places where I'm curious what informed you, e.g. the progress of guns, though that's easier to read up on myself.
A team including Smolensky and Schmidhuber has produced better results on a mathematics problem set by combining BERT with tensor product representations (Smolensky et al., 2016), a formal system for representing symbolic variables and their bindings, creating a new system called TP-Transformer (Schlag et al., 2019).
Notable that the latter paper was rejected from ICLR 2020, partly for unfair comparison. It seems unclear at present whether TP-Transformer is better than the baseline transformer.
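For readers unfamiliar with the tensor-product idea, here is a minimal NumPy sketch of a Smolensky-style tensor product representation: a symbolic variable (a "role") is bound to a value (a "filler") by taking the outer product of their embeddings, and multiple bindings are superimposed by addition. The roles, fillers, dimensions, and cleanup step below are illustrative assumptions, not taken from the TP-Transformer paper itself.

```python
import numpy as np

# Minimal sketch of a tensor product representation (TPR), using made-up
# roles/fillers and dimensions; this is not the TP-Transformer itself.
rng = np.random.default_rng(0)
dim = 32

# Orthonormal role vectors (variable slots), so unbinding is exact.
q, _ = np.linalg.qr(rng.normal(size=(dim, 2)))
roles = {"subject": q[:, 0], "object": q[:, 1]}

# Random filler vectors (the values bound to those slots).
fillers = {"Alice": rng.normal(size=dim), "Bob": rng.normal(size=dim)}

# Bind each filler to its role with an outer product, then superimpose
# the bindings into a single matrix representing the whole structure.
structure = (np.outer(fillers["Alice"], roles["subject"])
             + np.outer(fillers["Bob"], roles["object"]))

# Unbind by multiplying with a role vector: this recovers that role's filler.
recovered = structure @ roles["subject"]

# Cleanup: match the recovered vector against the known fillers.
best = max(fillers, key=lambda name: float(np.dot(recovered, fillers[name])))
print(best)  # expected: Alice
```

As I understand it, TP-Transformer uses learned role vectors inside the Transformer's attention mechanism rather than hand-built bindings like these, but the binding/unbinding intuition is the same.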
I think this is a good analysis, and I'm really glad to see this kind of deep dive on an important crux. The most clarifying thing for me was connecting old and new arguments - they seem to have more common ground than I thought.
One thing I would appreciate is more in-text references. There are a bunch of claims here about e.g. history and evolution with no explicit reference. Maybe it seems like common knowledge, but I wasn't sure whether to believe some things, e.g.
Evolution was optimizing for fitness, and driving increases in intelligence only indirectly and intermittently by optimizing for winning at social competition. What happened in human evolution is that it briefly switched to optimizing for increased intelligence, and as soon as that happened our intelligence grew very rapidly but continuously.
Could you clarify? I thought biological evolution always optimizes for inclusive genetic fitness.
Thanks! Comments are much appreciated.
Why the arrow from "agentive AI" to "humans are economically outcompeted"? The explanation makes it sound like it should point to "target loading fails"?
It's been a few months and I didn't write down in detail why that arrow is there, so I can't be certain of the original reason. My understanding now: humans getting economically outcompeted means AI systems are competing with humans, and therefore optimising against them on some level. Goal-directedness enables or worsens this.
Looking back at the linked explanation of the target loading problem, I understand it as more "at the source": coming up with a procedure that makes the AI actually behave as intended. As Richard said there, one can think of it as a more general version of the inner-optimiser (mesa-optimiser) problem. This is why e.g. there's an arrow from "incidental agentive AGI" to "target loading fails". Pointing the arrow from "agentive AI" there might also make sense, but to me the connection isn't strong enough to justify spending the diagram's "clutter budget" on it.
Suggestion: make the blue boxes without parents more apparent? e.g. a different shade of blue? Or all sitting above the other ones? (e.g. "broad basin of corrigibility" could be moved up and left).
Changing the design of those boxes sounds good. I don't want to move them because the arrows would get more cluttered.
Noted and updated.