In the early 1970s I discovered that “Kubla Khan” had a rich, marvelous, and fantastically symmetrical structure. I'd found myself intellectually. I knew what I was doing. I had a specific intellectual mission: to find the mechanisms behind “Kubla Khan.” As defined, that mission failed, and still has not been achieved some 40-odd years later.
It's like this: If you set out to hitch rides from New York City to, say, Los Angeles, and don't make it, well then your hitch-hike adventure is a failure. But if you end up on Mars instead, just what kind of failure is that? Yeah, you’re lost. Really really lost. But you’re lost on Mars! How cool is that!
Of course, it might not actually be Mars. It might just be an abandoned set on a studio back lot.
That's a bit metaphorical. Let's just say I've read and thought about a lot of things having to do with the brain, mind, and culture, and published about them as well. I've written a bunch of academic articles and two general trade books, Visualization: The Second Computer Revolution (Harry Abrams, 1989), co-authored with Richard Friedhoff, and Beethoven's Anvil: Music in Mind and Culture (Basic Books, 2001). Here's what I say about myself at my blog, New Savanna. I've got a conventional CV at Academia.edu. I've also written a lot of stuff that I've not published in a conventional venue. I think of them as working papers; I've got them all at Academia.edu. Some of my best – certainly my most recent – stuff is there.
A surface-level explanation is that Japan is quite techno-optimistic compared to the west, and has strong intuitions that AI will operate harmoniously with humans. A more nuanced explanation is that Buddhist- and Shinto-inspired axioms in Japanese thinking lead to the conclusion that superintelligence will be conscious and aligned by default.
YES.
I've got some knowledge of Japanese popular culture. Robots, particularly anthropomorphic robots, have a strong presence in Japanese popular culture, one that is quite different from their presence in Western culture. You should get a book by Frederik Schodt, Inside the Robot Kingdom: Japan, Mechatronics and the Coming Robotopia. It's a bit old (1988), but it is excellent and has recently been reissued in a Kindle edition. Schodt knows Japanese popular culture quite well, as he has translated many manga, including Astro Boy and Ghost in the Shell. He talks about the Shinto influence and tells a story from the early days of industrial robotics: when a new robot was to be brought online, they'd perform a Shinto ceremony to welcome the robot to the team.
I've written a blog post about the Astro Boy stories, The Robot as Subaltern: Tezuka's Mighty Atom, where I point out that many of the stories are about civil rights for robots. Fear of rogue robots and AIs plays little role in those stories. I've also got a post, Who’s losing sleep at the prospect of AIs going rogue? As far as I can tell, not the Japanese, where I quote from an article by Joi Ito (former director of MIT's Media Lab) on why the Japanese do not fear robots.
As an exercise, you might want to compare the anime Ghost in the Shell with The Matrix, which derives style and motifs from the anime. The philosophical concerns of the two are very different. The central characters in Ghost are almost all cyborg to some extent. At the very least they've got sockets through which they can plug into the net, but some have a mostly artificial body. Humans are not dominated by AIs in the way they are in The Matrix.
I've written two essays about two manga by Osamu Tezuka, who has had enormous influence on Japanese popular culture. They are about two of the three manga in his early so-called Science Fiction sequence (from about 1950). Each, in a way, is about alignment. Dr. Tezuka’s Ontology Laboratory and the Discovery of Japan runs through an extensive ontology from insects to space aliens while Tezuka’s Metropolis: A Modern Japanese Fable about Art and the Cosmos turns on the difference between electro-mechanical robots and artificial beings.
I'm somewhat more interested in similarity to (human) brains than to von Neumann computers. This is from a relatively recent blog post, where I suggest that the generation of a single token is analogous to a single whole-brain "frame" of neural computation:
I’m thinking in particular of the work of the late Walter Freeman, who was a pioneer in the field of complex neurodynamics. Toward the end of his career he began developing a concept of “cinematic consciousness.” As you know, the movement in motion pictures is an illusion created by the fact that the individual frames of the image are projected on the screen more rapidly than the mind can resolve them. So, while the frames are in fact still, they change so rapidly that we see motion.
First I’ll give you some quotes from Freeman’s article to give you a feel for his thinking (alas, you’ll have to read the article to see how those things connect up), then I’ll explain what that has to do with LLMs. The paragraph numbers are from Freeman’s article.
[20] EEG evidence shows that the process in the various parts occurs in discontinuous steps (Figure 2), like frames in a motion picture (Freeman, 1975; Barrie, Freeman and Lenhart, 1996).
[23] Everything that a human or an animal knows comes from the circular causality of action, preafference, perception, and up-date. It is done by successive frames of self-organized activity patterns in the sensory and limbic cortices. [...]
[35] EEG measurements show that multiple patterns self-organize independently in overlapping time frames in the several sensory and limbic cortices, coexisting with stimulus-driven activity in different areas of the neocortex, which structurally is an undivided sheet of neuropil in each hemisphere receiving the projections of sensory pathways in separated areas. [...]
[86] Science provides knowledge of relations among objects in the world, whereas technology provides tools for intervention into the relations by humans with intent to control the objects. The acausal science of understanding the self distinctively differs from the causal technology of self-control. "Circular causality" in self-organizing systems is a concept that is useful to describe interactions between microscopic neurons in assemblies and the macroscopic emergent state variable that organizes them. In this review intentional action is ascribed to the activities of the subsystems. Awareness (fleeting frames) and consciousness (continual operator) are ascribed to a hemisphere-wide order parameter constituting a global brain state. Linear causal inference is appropriate and essential for planning and interpreting human actions and personal relations, but it can be misleading when it is applied to microscopic-macroscopic relations in brains.
Notice that Freeman refers to “a hemisphere-wide order parameter constituting a global brain state.” The cerebral cortex consists of 16B neurons, each with roughly 10K connections. Further, all areas of the cortex have connections with subcortical regions. That’s an awful lot of neurons communicating in parallel in a single time step. As I recall from another article, these frames occur at a rate of 6-7 Hz.
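For concreteness, here's the back-of-the-envelope arithmetic those figures imply (the 16B, 10K, and 6-7 Hz numbers come from the text above; everything else is simple multiplication):

```python
# Rough arithmetic on the figures quoted above; these are order-of-magnitude
# estimates, not authoritative neuroscience values.
cortical_neurons = 16e9        # ~16B neurons in the cerebral cortex
connections_per_neuron = 1e4   # ~10K connections each
total_connections = cortical_neurons * connections_per_neuron
print(f"~{total_connections:.1e} cortical connections")

# At a frame rate of 6-7 Hz, each "frame" of neural activity lasts roughly:
for hz in (6, 7):
    print(f"{hz} Hz -> {1000 / hz:.0f} ms per frame")
```

So each "frame" is a window of roughly 140-170 ms in which on the order of 10^14 connections can participate.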
The nervous system operates in parallel. I believe it is known that the brain exhibits a small-world topology, so all neurons are within a relatively small number of links of one another. Though at any moment some neurons will be more active than others, they are all active – the only inactive neuron is a dead neuron. Similarly, ANNs exhibit a high degree of parallelism. LLMs are parallel virtual machines being simulated by so-called von Neumann machines. The use of multiple cores gives a small degree of parallelism, but that’s quite small in relation to the overall number of parameters the system has.
I propose that the process of generating a single token in an LLM is comparable to a single “frame” of consciousness in Freeman’s model. All the parameters in the system are visited during a single time-step for the system. In the case of ChatGPT I believe that’s 175B parameters.
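The core of the analogy can be shown with a toy model. The sketch below is not a real transformer (the layer sizes are arbitrary placeholders), but it illustrates the point: producing one token is a single forward pass in which every parameter in the stack participates, just as Freeman's frame involves the whole cortical sheet.

```python
import numpy as np

# Toy stand-in for an LLM: a small stack of dense layers.
# Shapes are arbitrary; a real model would have billions of parameters.
rng = np.random.default_rng(0)
layer_shapes = [(64, 128), (128, 128), (128, 64)]
weights = [rng.standard_normal(shape) for shape in layer_shapes]

def generate_one_token(x, weights):
    """One 'frame': the input flows through every weight matrix in turn."""
    for W in weights:
        x = np.tanh(x @ W)
    return int(np.argmax(x))  # index of the next token

x = rng.standard_normal(64)          # current context, as a vector
token = generate_one_token(x, weights)
params_touched = sum(W.size for W in weights)
print(f"token {token}: all {params_touched} parameters were visited")
```

Every weight matrix is multiplied through on every single token, which is the sense in which one token-generation step "visits" all of the model's parameters.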
I've just taken a quick look & have a quick, crude reaction.
Consider how natural language is learned. The infant & toddler is surrounded by people who speak. They begin to babble and eventually manage to babble in a way that intends meaning. So they've got a device for producing audio tokens as motor output, tokens that can intermingle with the audio input tokens being produced by others.
We're now dealing with two token streams. There's a large audio stream, with input from various sources. And the smaller motor stream, which is closely correlated with some of the tokens in the audio stream because it has 'produced' them.
You need to take a look at Lev Vygotsky's account of language learning as a process of internalizing the speech streams of others. Here's a quick intro. Also, think of language as an index over one's conceptual space. & one LLM can index the space of another.
Try transforming the language task and the image task into the same format and comparing the two. It's easy to rasterize images so that they become transformed into strings of colored dots. For language, replace each token with a colored dot and do so uniformly across all texts. You have now transformed each text into a string of colored dots.
Now take, say, the first billion dots from your (transformed) image collection and the first billion dots from your (transformed) text collection. Which string has the higher entropy?
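Here's a minimal sketch of that comparison. The entropy estimator is standard Shannon entropy over symbol frequencies; the two "dot" streams below are random placeholders (a real test would use actual rasterized images and uniformly recoded text), and the alphabet sizes (256 colors, 50K token types) are illustrative assumptions, not figures from the text.

```python
from collections import Counter
import math
import random

def shannon_entropy(stream):
    """Estimate entropy in bits per symbol from symbol frequencies."""
    counts = Counter(stream)
    n = len(stream)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

random.seed(0)
# Placeholder stand-ins for the two transformed streams of colored dots.
image_dots = [random.randrange(256) for _ in range(100_000)]     # e.g. 256 colors
text_dots = [random.randrange(50_000) for _ in range(100_000)]   # e.g. 50K token types

print(f"image stream: {shannon_entropy(image_dots):.2f} bits/symbol")
print(f"text stream:  {shannon_entropy(text_dots):.2f} bits/symbol")
```

With real data the interesting part is that neither stream is uniform: the estimator would pick up the statistical regularities (color gradients in images, word-frequency skew in text), and the question is which stream those regularities compress more.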
I've just posted something at my home blog, New Savanna, in which I consider the idea that
...the question of whether or not the language produced by LLMs is meaningful is up to us. Do you trust it? Do WE trust it? Why or why not?
That's the position I'm considering. If you understand "WE" to mean society as a whole, then the answer is that the question is under discussion and is undetermined. But some individuals do seem to trust the text from certain LLMs, at least under certain circumstances. For the most part I trust the output of ChatGPT and GPT-4, though I have considerably less experience with the latter. I know that both systems make mistakes of various kinds, including what is called "hallucination." It's not clear to me that that differentiates them from ordinary humans, who make mistakes and often say things without foundation in reality.
That's a bunch of stuff, more than I can deal with at the moment.
On the meaning of "meaning," it's a mess, and people in various disciplines have been arguing it for three-quarters of a century or more at this point. You might want to take a look at a longish comment I posted above, if you haven't already. It's a passage from another article, where I make the point that terms like "think" don't really tell us much at all. What matters to me at this point are the physical mechanisms, and those terms don't convey much about those mechanisms.
On LLMs, GPT-4 now has plug-ins. I recently saw a YouTube video about the Wolfram Alpha plug-in. You ask GPT-4 a question, it decides to query Wolfram Alpha and sends a message. Alpha does something and sends the result back to GPT-4, which presents the result to you. So now we have Alpha interpreting messages from GPT-4 and GPT-4 interpreting messages from Alpha. How reliable is that circuit? Does it give the human user what they want? How does "meaning" work in that circuit?
I first encountered the whole business of meaning in philosophy and literary criticism. So, you read Dickens' A Tale of Two Cities or Frank Herbert's Dune, whatever. It's easy to say those texts have meaning. But where does that meaning come from? When you read those texts, the meaning comes from you. When I read them, it comes from me. What about the meanings the authors put into them? You can see where I'm going with this. Meaning is not like wine, which can be poured from one glass to another and remain the same. Well, literary critics argued about that one for decades. The issue's never really been settled. It's just been dropped, more or less.
ChatGPT produces text, lots of it. When you read one of those texts, where does the meaning come from? Let's ask a different question. People are now using output from LLMs as a medium for interacting with one another. How is that working out? Where can LLM text be useful and where not? What's the difference? Those strike me as rather open-ended questions for which we do not have answers at the moment.
And so on....
Um, err, at this point, unless someone actually reads the LLM's output, that output goes nowhere. It's not connected to anything.
So, what is it you care about? Because at this point this conversation strikes me as just pointless thrashing about with words.
But I claim it is also important that the LLM is coupled, via the corpus, to the world, and hence its output is coupled, via the LLM, to the world.
What? The corpus is coupled to the world through the people who wrote the various texts and who read and interpret them. Moreover that sentence seems circular. You say, "its output is coupled..." What is the antecedent of "its"? It would seem to be the LLM. So we have something like, "The output of the LLM is coupled, via the LLM, to the world."
I'm tired of hearing about airplanes (and birds) and submarines (and fish). In all cases we understand more or less the mechanics involved. We can make detailed comparisons and talk about similarities and differences. We can't do that with humans and LLMs.
Thanks. I don't speak Japanese. I'll take a look at the Slack channel.