A philosophical argument against "the AI-fear".


There’s a bad argument against AGI risk that goes kinda like this:

  1. Transformers will not scale to AGI.
  2. Ergo, worrying about AGI risk is silly.
  3. Hey, while you're here, let me tell you about this other R&D path which will TOTALLY lead to AGI … …
  4. Thanks for listening! (applause)

My read is that this blog post has that basic structure. He goes through an elaborate argument and eventually winds up in Section 10 where he argues that a language model trained on internet data won’t be a powerful agent that gets things done in the world, but, if we train an embodied AI with a robot body, then it could be a powerful agent that gets things done in the world.

And my response is: “OK fine, whatever”. Let’s consider the hypothesis “we need to train an embodied AI with a robot body in order to get a powerful agent that gets things done in the world”. If that’s true, well, people are perfectly capable of training AIs with robot bodies! And if that’s really the only possible way to build a powerful AGI that gets things done in the world, then I have complete confidence that sooner or later people will do that!!

We can argue about whether the hypothesis is correct, but it’s fundamentally not a crazy hypothesis, and it seems to me that if the hypothesis is true then it changes essentially nothing about the core arguments for AGI risk. Just because the AI was trained using a robot body doesn’t mean it can’t crush humanity, and also doesn’t mean that it won’t want to.

In Venkatesh’s post, the scenario where “people build an embodied AI with a robot body” is kinda thrown in at the bottom, as if it were somehow a reductio ad absurdum?? I’m not crystal clear on whether Venkatesh thinks that such an AI (A) won’t get created in the first place, or (B) won’t be able to crush humanity, or (C) won’t want to crush humanity. I guess probably (B)? There’s kinda a throwaway reference to (C) but not an argument. A lot of the post could be taken as an argument against (B), in which case I strongly disagree for the usual reasons, see for example §3.2 here (going through well-defined things that an AGI could absolutely do sooner or later, like run the same algorithms as John von Neumann’s brain but 100× faster and with the ability to instantly spin off clone copies etc.), or §1.6 here (for why radically superhuman capabilities seem unnecessary for crushing humanity anyway).

(Having a robot body does not prevent self-replication: the AGI could presumably copy its mind into an AI with a similar virtual robot body in a VR environment, and then it’s no longer limited by robot bodies; all it would need is compute.)

(I kinda skimmed, sorry to everyone if I’m misreading / mischaracterizing!)

I would ask him: why can an "AI" be better at chess than any human, but not better than any human at worldly games like politics, war, or conquest?

I bounced off this a short ways through because it seemed like it was focused on consciousness and something-it-is-like-to-be-ness, which just has very little to do with AI fears as commonly described on LessWrong. I tried skipping to the end to see if it would tie the gestalt of the argument together and see if I missed something.

Can you give a brief high level overview of who you're arguing against and what you think their position is? Or, what the most important takeaways of your position are regardless of whether they're arguing against anything in particular?

You’re saying “you”, but the blog post was written by Venkatesh Rao, who AFAIK does not have a LessWrong account.

I think that Rao thinks that he is arguing against AI fears as commonly described on LessWrong. I think he thinks that something-it-is-like-to-be-ness is a prerequisite to being an effective agent in the world, and that’s why he brought it up. Low confidence on that though.

Ah, whoops. Well, then I guess given the circumstances I'll reframe the question as "PointlessOne, what are you hoping we get out of this?"

Also, lol I just went to try and comment on the OP and it said "only paid subscribers can comment."

[-] lc · 2y · 20

If someone here has an existing subscription I'd love for them to use it to copy Steven Byrnes's top-level comment. Otherwise I'm gonna reluctantly pay to do so in the next couple hours.

Umm, different audiences have different shared assumptions etc., and in particular, if I were writing directly to Venkatesh, rather than on LessWrong, I would have written a different comment.

Maybe if I had commenting privileges at Venkatesh’s blog I would write the following:

My impression from Section 10 is that you think that, if future researchers train embodied AIs with robot bodies, then we CAN wind up with powerful AIs that can do the kinds of things that humans can do, like understand what’s going on, creatively solve problems, take initiative, get stuff done, make plans, pivot when the plans fail, invent new technology, etc. Is that correct? 

If so, do you think that (A) nobody will ever make AI that way, (B) this type of AI definitely won’t want to crush humanity, (C) this type of AI definitely wouldn’t be able to crush humanity even if it wanted to? (It can be more than one of the above. Or something else?)

(I disagree with all three, briefly because, respectively, (A) “never” is a very long time, (B) we haven’t solved The Alignment Problem, and (C) we will eventually be able to make AIs that can run essentially the same algorithms as run by adult John von Neumann’s brain, but 100× faster, and with the ability to instantly self-replicate, and there can eventually be billions of different AIs of this sort with different skills and experiences, etc. etc.)

I’m OK with someone cross-posting the above, and please DM me if he replies. :)

[-] lc · 2y · 20

Replied, we'll see.

I shared it as I thought it might be an interesting alternative view on a topic often discussed here. It was somewhat new to me, at least.

Sharing is not endorsement, if you're asking that. But it might be a discussion starter.

[-] lc · 2y · 61

In the same way smart Christians have a limited amount of time to become atheists before they irrecoverably twist their minds into an Escher painting justifying their theological beliefs, I propose people have a limited amount of time to see the danger behind bash+GPT-100 before they become progressively more likely to make up some pseudophilosophical argument about AI alignment being an ill-posed question, and thus that they're not gonna get eaten by nanobots.

Some random quotes (not necessarily my personal views, and not pretending to be a good summary):

“Sentience” is normally constructed as a 0-1 pseudo-trait that you either have or not, but could be conceptualized in a 0-1-infinity way, to talk about godly levels of evolved consciousness for example. A rock might have Zen Buddha-nature but is not sentient, so it is a 0. You and I are sentient at level 1. GPT-3 is 0.8 but might get to 1 with more GPUs, and thence to 100. At a super-sentience level of 1008 we would grant it god-status. I think this is all philosophical nonsense, but you can write and read sentences like this, and convince yourself you’re “thinking.”

Then:

AlphaGoZero becomes “proof” of the conceptual coherence of “super-intelligence,” because the “moves ahead” legible metric makes sense in the context of the game of Go (notably, a closed-world game without an in-game map-territory distinction). This sort of unconscious slippage into invalid generalizations is typical of philosophical arguments that rest on philosophical nonsense concepts.

And:

It’s like trying to prove ghosts exist by pointing to the range of material substantiveness that exists between solid objects and clouds of vapor, so you get to an “ectoplasm” hypothesis, which is then deployed as circumstantial “evidence” in favor of ghosts.


The problem arises because of a sort of motte-and-bailey maneuver where people use the same vague terms (like “intention”) to talk about thermostats, paperclip maximizers, and inscrutable god intelligences that are perhaps optimizing some 9321189-dimensional complex utility function that is beyond our puny brains to appreciate.

My personal summary of the point in the linked post, likely grossly inadequate, is that hyperanthropomorphism is a wild, incoherent and unjustified extrapolation from assuming that "there is something it is like to be" a person/monkey/salamander to "there is something it is like to be" a superintelligence, which then becomes a scary monster that will end us all unless it's tamed.

The author also constructs a steelmanned version of this argument, called Well-Posed God AIs:

Again, running with the best-case what-if (or what some would call worst-case), I think I know how to proceed. The key is two traits I’ve already talked about in the past that are associated with robots, but not disembodied AIs: situatedness, and embodiment.

...

Situatedness means the AI is located in a particular position in space-time, and has a boundary separating it from the rest of the world.

Embodiment means it is made of matter within its boundary that it cares about it in a specific way, seeking to perhaps protect it, grow it, perpetuate it, and so on.

...

Why are these important, and why don’t current AIs have them? The thing is, robots can have what I call worldlike experiences by virtue of being situated and embodied that an AI model like GPT-3 cannot. Being robot-like is not the only way for an AI to have a worldlike experience of being, but it is one way, so let’s start there.

[skipped a lot]

Why does SIILTBness [something it is like to be...] emerge in a human as it grows from infanthood? The answer, I think, is painfully simple: because we experience the actual world, which is in fact non-nonsensical to partition into I, and non-I, and proceed from there.

...

The blooming-buzzing confusion of an infant’s world is some blurry version of this experience, and the important thing is that it can bootstrap into a worldlike experience, because there is, in fact, a world out there to be experienced!

How it is different from AI:

But now consider our best modern AIs, trained on a very different sort of “blooming buzzing confusion” — the contents of the scraped internet.

Is that a worldlike experience? One that will naturally resolve into an I, not-I, sorted perceptions, map-territory distinctions, other minds to which SIILTBness can be imputed by the I, and so on?

The answer is clearly no. The sum of the scraped data of the internet isn’t about anything, the way an infant’s visual field is about the world.

and

Could you argue that the processes at work in a deep-learning framework are sufficiently close to the brain that you could argue the AI is still constructing an entirely fictional, dream-like surreal worldlike experience? One containing an unusual number of cat-like visual artifacts, and lacking a “distance” dimension, but coherent in other ways, a maya-world for itself? Perhaps it is evolving what Bruce Sterling has been calling “alt intelligence”?

Certainly. But to the extent the internet is not about the actual world in any coherent way, but just a random junk-heap of recorded sensations from it, and textual strings produced about it by entities (us) that it hasn’t yet modeled, a modern AI model cannot have a worldlike experience by overfitting The junk-heap. At best, it can have a surreal, dream-like experience. One that at best, encodes an experience of time (embodied by the order of training data presented to it), but no space, distance, relative arrangement, body envelope, materiality, physics, or anything else that might contribute to SIILTBness.

On reconciling the two:

So if you actually wanted to construct an AI capable of coherently evolving along trajectories that get to hyper-SIILTBness, and perhaps exhibiting super-traits of any sort, that’s where you’d start: by feeding it vast amounts of sensorily structured training data that admits a worldlike experience rather than a dreamlike one, induces an I for which SIILTBNess is at least a coherent unknown, cranking up various knobs to Super 11, and finally, waiting for it to turn superevil, or at least superindifferent to us.

Then finally, there might be an “alignment” problem that is potentially coherent and distinct enough from ordinary engineering risk management to deserve special attention. I assess the probability of getting there to be about the same as discovering that ghosts are real.

Again, this is just an incomplete summary from a cursory reading.

Take a list of tasks such as

  1. Winning a chess game
  2. Building a mars rover
  3. Writing a good novel

....

I think it is possible to make a machine that does well at all these tasks, not because it has a separate hardcoded subsection for each task, but because there are simple general principles, like Occam's razor and updating probability distributions, that can be applied to all of them.

The existence of the human brain, which does pretty well at a wide variety of tasks, despite those tasks not being hard coded in by evolution, provides some evidence for this. 
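To make that concrete, here is a toy sketch (purely illustrative, not something from Rao's post or this thread; the hypothesis names, bit counts, and likelihoods are made up) of "Occam's razor plus updating probability distributions": candidate explanations are weighted by a simplicity prior of 2^(-description length) and then updated on how well they fit some data.

```python
# Toy Bayesian update with an Occam-style simplicity prior.
# All hypotheses and numbers below are invented for illustration.

hypotheses = [
    # (name, description length in bits, likelihood of the observed data)
    ("simple rule",    5,  0.40),
    ("moderate rule", 20,  0.55),
    ("baroque rule", 200,  0.60),
]

# Occam prior: weight each hypothesis by 2^(-description length).
priors = [2.0 ** -bits for _, bits, _ in hypotheses]

# Bayes: posterior is proportional to prior times likelihood, then normalize.
unnormalized = [p * lik for p, (_, _, lik) in zip(priors, hypotheses)]
total = sum(unnormalized)
posteriors = [u / total for u in unnormalized]

for (name, bits, _), post in zip(hypotheses, posteriors):
    print(f"{name:>13} ({bits:>3} bits): posterior {post:.6f}")

# The baroque rule fits the data slightly better, but the simplicity prior makes
# the simple rule dominate; the same trade-off applies whether the data is a chess
# position, sensor readings from a rover, or the text of a novel.
```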

AIXI is a theoretical AI that brute-force simulates everything. It should do extremely well on all of these tasks. Do you agree that, if we had infinite compute, AIXI would be very good at all tasks, including hacking its reward channel?
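For reference, one standard way to write AIXI's action rule (roughly Hutter's formulation) is below: the agent considers every program $q$ that, run on a universal Turing machine $U$, reproduces the interaction history, weights each by the simplicity prior $2^{-\ell(q)}$, and picks the action that maximizes expected reward out to horizon $m$:

$$a_k \;=\; \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m} \bigl[ r_k + \cdots + r_m \bigr] \sum_{q \,:\, U(q,\, a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}$$

(It is uncomputable, hence "if we had infinite compute.")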

Do you agree that there is nothing magical about human brains? 

Like many philosophical arguments against superintelligence, it doesn't make clear where the intelligence stops. Can a single piece of software be at least 2500 Elo at chess, and be able to drive a car for a million miles without an accident? Can it do that and also prove the Riemann hypothesis and make a billion-dollar startup? Looking at compute or parameters, you might be able to say that no AI could achieve all that with less than X flops. I have no idea how you would find X. But at least those are clear, unambiguous technical predictions.

Perhaps this is too much commentary (on Rao's post), but given (I believe) he's pretty widely followed/respected in the tech commentariat, and has posted/tweeted on AI alignment before, I've tried to respond to his specific points in a separate LW post.  Have tried to incorporate comments below, but please suggest anything I've missed.  Also if anyone thinks this isn't an awful idea, I'm happy to see if a pub like Noema (who have run a few relevant things e.g. Gary Marcus, Yann LeCun, etc.) would be interested in putting out an (appropriately edited) response - to try to set out the position on why alignment is an issue, in publishing venues where policymakers/opinion makers might pick it up (who might be reading Rao's blog but are perhaps not looking at LW/AF).  Apologies for any conceptual or factual errors, my first LW post :-)

Maybe a disclaimer first:

  • I have no formal philosophical education. Nor do I have much exposure to the topic as an amateur.
  • Neither do I have any formal logic education, but I have some exposure to the concepts in my professional endeavours.
  • These are pretty much unedited notes taken as I read the OP. At the moment I don't have much left in me to actually make a coherent argument out of them, so you can treat them as random thoughts.

There might be something it is like to be a computer or robot at salamander levels of capability at least, or there might not. But it’s a well-posed possibility, at least if you start with Strong AI assumptions. My fellow AI-fear skeptics might feel I’m ceding way too much territory at this point, but yes, I want to grant the possibility of SIILTBness here, just to see if we get anywhere interesting.

But you know what does _not_, even at this thought-experiment level, possess SIILTBness? The AIs we are currently building and extrapolating, via lurid philosophical nonsense, into bogeymen that represent categorically unique risks. And the gap has nothing to do with enough transistors or computational power.

This is a weird statement. The AIs we currently build are basically stateless functions. That is because we don't let AIs change themselves. Once we invent an NN that can train itself and let it do so, we're f… doomed in any case where we messed up the error function (or where the NN can change it). SIILTBness implies continuity of experience. Saying that current AI doesn't have it, while probably factually correct at the moment, doesn't mean it can't ever.

In my opinion, there are two necessary conditions for hyperanthropomorphic projection to point to something coherent and well-posed associated with an AI or robot (which is a precondition for that something to potentially become something worth fearing in a unique way that doesn’t apply to other technologies):

  1. SIILTBness: There is something it is like to be it
  2. Hyperness: The associated quality of experience is an enhanced entanglement with reality relative to our own on one or more pseudo-trait dimensions

Almost all discussion and debate focuses on the second condition: various sorts of “super”-ness.

This is most often attached to the pseudo-trait dimension of “intelligence,” but occasionally to others like “intentionality” and “sentience” as well. Super-intentionality for example, might be construed as the holding of goals or objectives too complex to be scrutable to us.

In my experience, "intelligence" as a word might be used to refer to the entity, but the actual pseudo-trait is a bit different. I'd call it "problem solving", followed, probably, by "intentionality" or "goal fixation". E.g. a paperclip maximiser has the "intention" to keep making paperclips and possesses the capacity to solve any general problem, such as coming up with novel approaches to obtain raw material and neutralising interference from meatbags trying to avoid being turned into paperclips.

a bus schedule printed on paper arguably thinks 1000 moves ahead (before the moves start to repeat), and if you can’t memorize the whole thing, it is “smarter” than you.

Does it, though? Road construction is planned for next Monday and a diversion is introduced. Suddenly the schedule is very wrong but cannot correct its predictions. The author mentioned Gettier earlier, but here they seemingly forgot about that concept.

The sum of the scraped data of the internet isn’t about anything, the way an infant’s visual field is about the world. So anything trained on the text and images comprising the internet cannot bootstrap a worldlike experience. So conservatively, there is nothing it is like to be GPT-3 or Dalle2, because there is nothing the training data is about.

However, that is an experience. And it is world-like, but it cannot be experienced the way you're used to. You can't touch or smell it, but you can experience it. Likewise, you can manipulate it, just not in the way you're used to. You cannot push anything a meter to the left, but you can write a post on Substack.

You can argue this implies the absence of AI SIILTBness, but it can also point to the non-transferability of SIILTBness, or rather to the inadequacy of the term as defined. You as a human can experience what it's like to be a salamander because there's sufficient overlap in the ability to experience the same things in a similar way, but you cannot experience what it's like to be an NN on the Internet because the experience is too alien.

This also means that an NN cannot experience what it's like to be a human (or a salamander), but it doesn't mean it cannot model humans in a sufficiently accurate way to be able to interact with them. And an NN on the Internet can probably gather enough information through its experiences of said Internet to build such a model.

Could you argue that the processes at work in a deep-learning framework are sufficiently close to the brain that you could argue the AI is still constructing an entirely fictional, dream-like surreal worldlike experience? One containing an unusual number of cat-like visual artifacts, and lacking a “distance” dimension, but coherent in other ways, a maya-world for itself? Perhaps it is evolving what Bruce Sterling has been calling “alt intelligence”?

Certainly. But to the extent the internet is not about the actual world in any coherent way, but just a random junk-heap of recorded sensations from it, and textual strings produced about it by entities (us) that it hasn’t yet modeled, a modern AI model cannot have a worldlike experience by overfitting The junk-heap. At best, it can have a surreal, dream-like experience. One that at best, encodes an experience of time (embodied by the order of training data presented to it), but no space, distance, relative arrangement, body envelope, materiality, physics, or anything else that might contribute to SIILTBness.

The Internet is not completely separate from the physical world (which we assume is the base reality), though. The Internet is full of webcams, sensors, IoT devices, space satellites, and such. In a sense, an NN on the Internet can experience the real world better than any human: it can see the tiniest things through electron microscopes and the biggest, furthest things through telescopes; it can see in virtually the entire EM spectrum; it can listen through thousands of microphones all over the planet at the same time, in ranges wider than any ear can hear; it can register every vibration a seismograph detects (including those on Mars); it can feel atmospheric pressure all over the world; it can know how much water is in most rivers on the planet; it knows the atmospheric composition everywhere at once.

It can likewise manipulate the world in some ways more than any particular human can. Thousands of robotic arms all over the world can assemble all sorts of machines. Traffic can be diverted by millions of traffic lights, rail switches, and instructions to navigation systems on planes and boats. And how much economic influence can an NN on the Internet have by only manipulating data on the Internet itself?

Incompleteness of experience does not mean absence or deficiency of SIILTBness. Until very recently, humans had no idea that there was a whole zoo of subatomic particles. Did that deny or inhibit human SIILTBness? Doesn't seem like it. Now we're looking at quantum physics and still coming to grips with the idea that actual physical reality can be completely different from what we seem to experience. That, however, didn't make a single philosopher even blink.

Is this enough for it to pose a special kind of threat? Possibly. Psychotic humans, or ones tortured with weird, incoherent sensory streams might form fantastical models of reality and act in unpredictable and dangerous ways as a result. But this is closer to a psychotic homeless person on the street attacking you with a brick than a “super” anything. It is a “sub” SIILTBness, not “super” because the sensory experiences that drive the training do not admit an efficient and elegant fit to a worldlike experience that could fuel “super” agent-like behaviors of any sort.

Doesn't this echo the core concern of "the AI-fear" view? AGI might end up not human-like but still capable of influencing the world. Its "fantastical model of reality" can be just close enough that we end up "attacked with a brick".

So if you actually wanted to construct an AI capable of coherently evolving along trajectories that get to hyper-SIILTBness, and perhaps exhibiting super-traits of any sort, that’s where you’d start: by feeding it vast amounts of sensorily structured training data that admits a worldlike experience rather than a dreamlike one, induces an I for which SIILTBNess is at least a coherent unknown, cranking up various knobs to Super 11, and finally, waiting for it to turn superevil, or at least superindifferent to us.

This is a weird strategy to propose in a piece entitled "Beyond Hyperanthropomorphism". It basically suggests recreating an artificial human intelligence by having it go through the typical human experience and then, somehow, by "cranking up dials", achieving hyper-SIILTBness. I don't believe anyone on the "AI-fear" side of the argument is actually worried about this specific scenario. After all, if the AI is human-like, there's not much of an alignment problem: the AI would already understand what humans want. We might still need to negotiate with it to convince it that it doesn't need to kill us all. Well, so we have half of an alignment problem.

The other half is for the case that I believe is more likely: an NN on the Internet. In this case we would actually need to let it know what it is that we actually want, which is arguably harder because, to my knowledge, no one has ever fully stated it to any degree of accuracy. The OP dismisses this case on the basis of SIILTBness non-transferability.


Overall, I feel like this is not a good argument. I have vague reservations about the validity of the approach. I don't see justification for some of the claims, but the author openly admits that some claims come with no justification:

I’m going to assume, without justification or argument, that we can attribute SIILTBness to all living animals above the salamander complexity point

I'm also not convinced that there's a solid logical progression from one claim to the next at every step, but I'm a little too tired to investigate it further. Maybe it's just my lack of education rearing its ugly head.

In the end, I feel like the author doesn't engage fully in good faith. There are a lot of mentions of the central concept of SIILTBness, and even an OK introduction of the concept in the first two parts, but the core of the argument seems to be left out.

They do not get what it means to establish SIILTBness or why it matters.

And the author fully agrees that people don't understand why it matters, while also not actually trying to explain why it does.

For me it's an interesting new way of looking at AI, but I fail to see how it actually addresses "the AI-fear".