Context for posting link:

Sixteen months ago, I read a draft by a researcher whom few in AI Safety know about, Forrest Landry. 

Forrest claimed something counter-intuitive and scary about AGI safety. He argued toward a stark conclusion, claiming he had nailed the coffin shut. I felt averse to the ambiguity of the prose and the (self-confirming?) confidence of the author.

There was no call to action – if the conclusion was right, were we not helpless to act?
Yet, profound points were made and stuck. I could not dismiss it.

But busy as I was, running research programs and all that, the matter kept dropping aside. It took a mutual contact – who had passed on the draft, and had their own doubts – to encourage me to start summarising the arguments for LessWrong.

Just before, I tried to list where our like-minded community fails to "map the territory". In at least six blindspots, we tended to overlook aspects relevant to whether work we scale up, including in AI safety, ends up having a massive negative impact. Yet if we could bridge the epistemic gap to different-minded outsiders, they could point out the aspects.

Forrest’s writings had a hippie holistic vibe that definitely marked him as a different-minded outsider. Drafting my first summary, I realised the arguments fell under all six blindspots.

Forrest wrote back feedback, which raised new questions for me. We set up a call.

Eleven months ago, Forrest called. It was late evening. I said I wanted to probe the arguments. Forrest said this would help me deal with common counter-arguments, so I knew how to convince others in the AI Safety community. I countered that my role was to find out whether his arguments made sense in the first place. We agreed that in practice, we were aligned.

Over three hours, Forrest answered my questions. Some answers made clear sense. Others slid past like a word salad of terms I could not grok (terms seemed to be defined with respect to each other). This raised new questions, many of which Forrest dismissed as side-tangents. It felt like being forced blindly down a narrow valley of argumentation – by some unknown outsider.

That was my perspective as the listener. If you click the link, you will find Forrest’s perspective as the explainer. Text is laid out in his precise research note-taking format.

I have probed at, nuanced, and cross-checked the arguments to understand them deeply. Forrest’s methods of defining concepts and their argumentative relations turned out sensible – they felt weird at first because of my unfamiliarity with them.

Now I can relate from the side of the explainer. I call with technical researchers who are busy, impatient, disoriented, counter-argumentative, and straight-up averse to getting into this shit – just like I was!

The situation would be amusing, if it was not so grave. 

If you want to probe at the arguments yourself, please be patient – perhaps start here.

If you want to cut to the chase instead – say obtain a short, precisely formalised, and intuitively followable summary of the arguments – this is not going to work.

Trust me, I tried to write seven summaries. 
Each needed much one-on-one clarification of the premises, term definitions and reasoning steps to become more comprehensible to a few persons who were patient enough to ask clarifying questions, paraphrase back the arguments, and listen curiously.

Better to take months to dig further, whenever you have the time, like I did.

If you want to inquire further, there will be a project just for that at AI Safety Camp.

8 comments

I did try to read some of Forrest Landry's writing just now. I understand that this was a light reading, not the kind of deep engagement you are suggesting. I'm just saying that I'm not going to engage deeply (and don't feel like I should) and will try to briefly explain why.

I ended up convinced that this isn't about EA community blindspots; the entire scientific community would probably consider this writing to be crankery. That may reflect a blindspot of the scientific community, but I think it's relevant for establishing that the next step probably involves clearer explanations.

Following links for the claim that AI safety is impossible, I got to the article Galois theory as applied to the hypothesis of AGI terminal extinction risk mitigation, which claims:

it is 100% possible to know, now, today, that it is 100% impossible to 'align' AGI, and/or to establish/create any viable notion of 'safe' AGI, to within any reasonable limits and over any ethical timescale, using any present or even future possible technology, means, or method... it is fully knowable (today) that this total future extinction convergence cannot be shifted by (on the basis of) any combination of any current or future engineering methods

It's worth noting up front that this sounds pretty crazy. There are very few examples of anyone saying "It is 100% possible to know that X is 100% impossible" without having a very clear argument (this would be an exaggerated claim even if X was "perpetual motion machines," which is based on a really strong argument built by a normal scientific process over many years). So this is looking pretty cranky right from the top, and hopefully you can sympathize with someone who has that reaction.

The article goes on to say "The technique of our proof is akin to Galois Theory... There are some aspects of the Godel Theorem in this also." I understand those pieces of machinery, and I'm very much in the market for an argument of similar type showing that AGI alignment is impossible. But it's also worth being aware that the world is full of cranks writing sentences that sound just like this. Moreover, this is an incredible project; it would probably be one of the most impressive projects of formalization ever. So the odds are against.

(Also note that it's a red flag to call this kind of informal argument a "proof," not for fundamental reasons but because that's the kind of thing cranks always do.)

The article says "Our interest is:"

  • "All of the effects/methods of engineering are based only on the application of the principle of causation, as modelable by only some combination of mathematics and/or computer science"
  • "It is fundamentally inherent in the nature of the modeling process itself (ie as a/any combination of a definite set of logical/physical operators/operations), on both a real physical and on a mathematical level, that there are hard inherent limits on what can and cannot be modeled/predicted, and that therefore;"
  • "that there are definite and inherent limits on what sorts of outcome (or characterizations of outcomes) can be achieved using any combination of engineering, causative modeling, mathematical, or algorithmic process"
  • "That the problem of [AI safety] is strictly within the set of unsolvable/impossible problems given any possible combination of any extension of the named operators and problem solving techniques."

I was hoping for a summary, but this did not illuminate; it continues to allude to limits without helping articulate what those limits are or how you become confident in them. It goes on:

The following is a formal IM triple necessary and sufficient to the practice of 'engineering': physics (as pure immanent, when in 1st person), mathematics (pure omniscient; always is 3rd person), computer science (applied transcendent; 2nd person).

I admit that I don't know what an IM triple is, and I'm not going to go looking (having failed to find it by Google or in the article itself) because I don't see how this could possibly help build a sense of how the purported impossibility result is going to go, no matter what the definition is. This sounds really crazy.

The notion of 'General Artificial Intelligence/agency' specifically implies: multiple diverse domains of sense and action [...] intrinsic non-reducible possibility for self modification (as due to the multiplicity of domains of sense/action and of the inherent inter-relationships of these domains).

This doesn't sound true? (And at any rate this is not the kind of argument that makes one confident in things.)

It's clear that AI systems can change their environment in complicated ways and so analyzing the long-term outcome of any real-world decision is hard. But that applies just as well to having a kid as to building an AI, and yet I think there are ways to have a kid that are socially acceptable. I don't think this article is laying out the kind of steps that would distinguish building an AI from having a kid.

The meta-algorithm (the learning/adapting/changing process) is effectively arbitrary (and thus subject to Halting Problem and Rice Theorem type limits) (where based on unknowable complexity dynamics of those domains, via micro-state amplification, etc.); hence;

it is inherently undecidable as to whether all aspects of its own self agency/intention are fully defined by only its builders/developers/creators

These technical results are at best an allegory for the proposed argument, there's no mathematical meat here at all (and note that the problems we care about obviously aren't undecidable, again in order to make sense of this I have to read it as an allegory rather than words I can take literally, but at some point this article probably needs to say something I can take literally).

never under-estimate the degree to which some population of male engineering type persons, due to unconscious evolutionary drives and biases, inherently wants to have something to prove, and that when/upon hearing that something "is impossible", will strive unceasingly to "be the one" who does the impossible -- and is "right" when "everyone else is wrong". Hence we end up with any number of people attempting to do 'over unity' perpetual motion machines and/or to demonstrate and actually make various mathematical (or engineering) impossibilities.

Though it doesn't affect the correctness of the arguments, I think this shows a lack of self awareness. Right now the state of play is more like the author of this document arguing that "everyone else is wrong," not someone who is working on AI safety.

And that's the end. This was linked to justify the claim that safety is impossible. There are no pointers to somewhere else where the argument is explained in more detail.

(Also note that it's a red flag to call this kind of informal argument a "proof," not for fundamental reasons but because that's the kind of thing cranks always do.)

I somewhat recently updated away from this stance. The rate is not anomalous so it can't work as evidence. It is not a thing that cranks do, it's a thing that people talking about a subject do.

I agree that people often use "proof" to mean "an argument which I expect could be turned into a proof." There is a spectrum from "I'm quite confident" to "I think there's a reasonable chance it will work" (where the latter would usually be called a "proof sketch.")

But this argument is not on that spectrum, it's not even the same kind of object. If you talk to a mathematician or computer scientist you shouldn't call something like this a proof.

(I have much more sympathy for someone saying "Yes this isn't what a mathematician or computer scientist would call a proof, I'm just using language differently from them" than someone saying "Actually this is the same kind of thing that people usually call a proof." Though you lose a lot of credibility if you do that while peppering your writing with references to other theorems and implying a similarity.)

It does feel like an isolated demand for rigour. Mathematicians writing to other mathematicians about new results seems like a fair comparison of speech activity, and this expresses a similar level of confidence (carefully combed analysis, willing to defend but open to being wrong and open to questioning on details).

I don't understand what the two types that would make a type error would be. Both are the one shared by "It can be shown that an angle cannot be trisected with compass and ruler". People who are far apart in inferential distance have some license to remain a bit clouded and not reach full clarity in short sentences. And I think it is perfectly fair to classify someone you can't make sense of as a nutjob while that distance remains.