A note on 'semiotic physics'

[-]Bill Benzon3y40

Have you thought of exploring the existing literature on the complex dynamics of nervous systems. It’s huge, but it does use the math you guys are borrowing from physics.

I’m thinking in particular of the work of the late Walter Freeman, who is a pioneer in the field. Toward the end of his career he began developing a concept of “cinematic consciousness.” As you know the movement in motion pictures is an illusion created by the fact the individual frames of the image are projected on the screen more rapidly than the mind can resolve them. So, while the frames are in fact still, they change so rapidly that we see motion.

First I’ll give you some quotes from Freeman’s article to give you a feel for his thinking (alas, you’ll have to read the article to see how those things connect up), then I’ll explain what that has to do with LLMs. The numbers are from Freeman’s article.

[20] EEG evidence shows that the process in the various parts occurs in discontinuous steps (Figure 2), like frames in a motion picture (Freeman, 1975; Barrie, Freeman and Lenhart, 1996).
[23] Everything that a human or an animal knows comes from the circular causality of action, preafference, perception, and up-date. It is done by successive frames of self-organized activity patterns in the sensory and limbic cortices. [...]
[35] EEG measurements show that multiple patterns self-organize independently in overlapping time frames in the several sensory and limbic cortices, coexisting with stimulus-driven activity in different areas of the neocortex, which structurally is an undivided sheet of neuropil in each hemisphere receiving the projections of sensory pathways in separated areas. [...]
[86] Science provides knowledge of relations among objects in the world, whereas technology provides tools for intervention into the relations by humans with intent to control the objects. The acausal science of understanding the self distinctively differs from the causal technology of self-control. "Circular causality" in self-organizing systems is a concept that is useful to describe interactions between microscopic neurons in assemblies and the macroscopic emergent state variable that organizes them. In this review intentional action is ascribed to the activities of the subsystems. Awareness (fleeting frames) and consciousness (continual operator) are ascribed to a hemisphere-wide order parameter constituting a global brain state. Linear causal inference is appropriate and essential for planning and interpreting human actions and personal relations, but it can be misleading when it is applied to microscopic- microscopic relations in brains.

Notice that Freeman refers to “a hemisphere-wide order parameter constituting a global brain state.” The cerebral cortex consists of 16B neurons, each with roughly 10K connections. Further, all areas of the cortex have connections with subcortical regions. That’s an awful-lot of neurons communicating in parallel in a single time step. As I recall from another article, these frames occur at a rate of 6-7 Hz.

The nervous system operates in parallel. I believe it is known that the brain exhibits a small world topology, so all neurons are within a relatively small number links from one another. Though at any moment some neurons will be more active than others, they are all active – the only inactive neuron is a dead neuron. Similarly, ANNs exhibit a high degree of parallelism. LLMs are parallel virtual machines being simulated by so-called von Neumann machines. The use of multiple cores gives a small degree of parallelism, but that’s quite small in relation to the overall number of parameters the system has.

I propose that the process of generating a single token in an LLM is comparable to a single “frame” of consciousness in Freeman’s model. All the parameters in the system are visited during a single time-step for the system. In the case of ChatGPT I believe that’s 175B parameters.

Thus the assertion that ChatGPT generates one token at a time, based on the previous string, while true, is terribly reductive and thus misleading. The appearance of a token is in fact more or less a side-effect of evolving a trajectory from the initial prompt.

[-]metasemi3y11

Thanks very much for these comments and pointers. I'll look at them closely and point some others at them too.

[-]Bill Benzon3y40

You might also look at this:

Andrew M. Saxe, James L. McClelland, and Surya Ganguli, A mathematical theory of semantic development in deep neural networks, PNAS, vol. 116, no. 23, June 4, 2019, 11537-11546, https://www.pnas.org/content/116/23/11537

Abstract: An extensive body of empirical research has revealed remarkable regularities in the acquisition, organization, deployment, and neural representation of human semantic knowledge, thereby raising a fundamental conceptual question: What are the theoretical principles governing the ability of neural networks to acquire, organize, and deploy abstract knowledge by integrating across many individual experiences? We address this question by mathematically analyzing the nonlinear dynamics of learning in deep linear networks. We find exact solutions to this learning dynamics that yield a conceptual explanation for the prevalence of many disparate phenomena in semantic cognition, including the hierarchical differentiation of concepts through rapid developmental transitions, the ubiquity of semantic illusions between such transitions, the emergence of item typicality and category coherence as factors controlling the speed of semantic processing, changing patterns of inductive projection over development, and the conservation of semantic similarity in neural representations across species. Thus, surprisingly, our simple neural model qualitatively recapitulates many diverse regularities underlying semantic development, while providing analytic insight into how the statistical structure of an environment can interact with nonlinear deep-learning dynamics to give rise to these regularities.

[-]Bill Benzon3y10

BTW, the Annual Review of Condensed Matter Physics has an article on Statistical Mechanics of Deep Learning, by some people from Google Brain and Stanford. I believe the Annual Reviews are now all open access, so you might what to look around. The Annual Review of Linguistics might have some stuff for you.

[-]Bill Benzon3y10

You're welcome.

[-]Bill Benzon3y40

YES. In this paper I examine a number of trajectories from the "outside." https://www.academia.edu/97862447/ChatGPT_tells_stories_and_a_note_about_reverse_engineering_A_Working_Paper

[-]ejb3y*20

I'm familiar with semiotics and language models, but I don't understand why you're calling this "semiotic physics" instead of "computational linguistics".

Linguistics vs. semiotics - I think you might say, that with text tokens, we're not just talking about natural language, we're talking about arbitrary symbol systems. If an LLM can program, do math, learn dyck languages, or various icons like emojis, one might say that it's working with more than just natural language. However, I'd argue that (a) this is still a vastly limited subset of sign systems, and any sign system in an LLM is being encoded into a completely symbolic (i.e. text tokens, not icons or indexes) format, and (b) "linguistics" and "language" are widely-accepted terminology for NLP and LLMs, so this seems like a barrier of communication and extra translation cost for many readers. I'd also note that even psycholinguists that study human language production also account for non-symbolic sign systems in language use such as prosody (e.g. voice loudness is or can be iconic).

"As linguistically capable creatures, we experience the simulator's outputs as semantic. The tokens in the generated trajectory carry meaning, and serve as semiotic signs. This is why we refer to the simulator's physics-analogue as semiotic physics." Why did we jump from "linguistics" to "sign systems"? It's not because of semantics, and none of the analyses in the simulators seminar sequence seem to rely on terms or distinctions specific to semiotics and not linguistics, e.g. symbol-icon-index, or sign-object-interpretant.

I'm also aware there are deep historical and theoretical connections between linguistics and semiotics, e.g. Saussure's Course of General Linguistics, but you're not mentioning any of that here.
Physics vs. computational - I don't understand what makes this "physics of X" instead of "computational modeling of X". I get that you're talking about learning dynamics, but there's tons of "computational" work that does just this in computational linguistics (e.g. Partha Niyogi's work on computational models of language evolution is particularly "physics-y"), cognitive science, neuroscience, and A.I., and it doesn't seem to lose anything with using "computational" instead of "physics"

You say "In this analogical sense, a simulator such as GPT implements a "physics" whose "elementary particles" are linguistic tokens.". Linguistics is the subfield of science where the primitive units are linguistic tokens... Maybe I'm missing something here, but it seems like you can have these nice analogies with simulation and multiverses without calling it "physics".

How does this relate to computational semiotics?

[-]Bill Benzon3y20

If you don't mind, I'll make a remark.

If it had been up to me, which it most certainly wasn't and isn't, it would be called the complex dynamics of transformers, or perhaps LLMs, because that's what it seems to be. That's where the math is from and where it's been most developed. I just ignore the semiotics part of the name. In any event I tend to think the notion of semiotics has long been overgeneralized to the point where it has little meaning. As far as I can tell there's not much of a connection with any of the intellectual traditions that fly the semiotics flag.

As you say, there's "a barrier of communication and extra translation cost for many readers." Well, yeah, if anyone wants to publish or post this work outside of LessWrong, the terminology is likely to prove problematic. If you look around you'll find that that's an issue in several posts, communicating with the larger intellectual world. I have no idea how that's going to work out in the long run.

[-]metasemi3y20

Thank you for these comments - I look forward to giving the pointers in particular the attention they deserve. My immediate and perhaps naive answer/evasion is that semiotic physics alludes to a lower level analysis: more analogous to studying neural firing dynamics on the human side than linguistics. One possible response would be, "Well, that's an attempt to explain saying 'physics', but it hardly justifies 'semiotic'." But this is - in the sense of the analogy - a "physics" of particles of language in the form of embeddable tokens. (Here I have to acknowledge that the embeddings are generally termed 'semantic', not 'semiotic' - something for us to ponder.)

[-]ejb3y20

semiotic physics alludes to a lower level analysis: more analogous to studying neural firing dynamics on the human side than linguistics

Many classic debates in cognitive science and AI, e.g. between symbolism and connectionism, translate to claims about neural substrates. Most work with LLMs that I've seen abstracts over many such details, and seems in some ways more akin to linguistics, describing structure in high-level behavior, than neuroscience. It seems like there's lots of overlap between what you're talking about and Conceptual Role Semantics - here's a nice, modern treatment of it in computational cognitive science.

I think I kind of get the use of "semiotics" more than "physics". For example, with multi-modal LLMs the symbol/icon barrier begins to dissolve, so GPT-4 can reason about diagrams to some extent. The wikipedia entry for social physics provides some relevant context:

"More recently there have been a large number of social science papers that use mathematics broadly similar to that of physics, and described as "Computational social science"

[-]Bill Benzon3y10

You should check out Stephen Wolfram's long post What Is ChatGPT Doing … and Why Does It Work? Scroll down to the section, Meaning Space and Semantic Laws of Motion:

We discussed above that inside ChatGPT any piece of text is effectively represented by an array of numbers that we can think of as coordinates of a point in some kind of “linguistic feature space”. So when ChatGPT continues a piece of text this corresponds to tracing out a trajectory in linguistic feature space. But now we can ask what makes this trajectory correspond to text we consider meaningful. And might there perhaps be some kind of “semantic laws of motion” that define—or at least constrain—how points in linguistic feature space can move around while preserving “meaningfulness”?

He goes on to talk visualize a trajectory as GP-2 completes a sentence. Earlier in the article – the section on Neural Nets – he talks of "attractors" and "attractor basins."

[-]metasemi3y13

I did read this and agree with you that it's exactly the same as semiotic physics as understood here!

[-]Bill Benzon3y10

Come to think of it, I have a working paper that speculates on this sort of thing: Virtual Reading: The Prospero Project Redux.

You might also look at (notice the date): E. Alvarez-Lacalle, B. Dorow, J.-P. Eckmann, and E. Moses. Hierarchical structures induce long-range dynamical correlations in written texts. PNAS Vol. 103 no. 21. May 23, 2006: 7956–7961. doi: 10.1073/pnas.0510673103 Here's the abstract:

Thoughts and ideas are multidimensional and often concurrent, yet they can be expressed surprisingly well sequentially by the translation into language. This reduction of dimensions occurs naturally but requires memory and necessitates the existence of correlations, e.g., in written text. However, correlations in word appearance decay quickly, while previous observations of long-range correlations using random walk approaches yield little insight on memory or on semantic context. Instead, we study combinations of words that a reader is exposed to within a “window of attention,” spanning about 100 words. We define a vector space of such word combinations by looking at words that co-occur within the window of attention, and analyze its structure. Singular value decomposition of the co-occurrence matrix identifies a basis whose vectors correspond to specific topics, or “concepts” that are relevant to the text. As the reader follows a text, the “vector of attention” traces out a trajectory of directions in this “concept space.” We find that memory of the direction is retained over long times, forming power-law correlations. The appearance of power laws hints at the existence of an underlying hierarchical network. Indeed, imposing a hierarchy similar to that defined by volumes, chapters, paragraphs, etc. succeeds in creating correlations in a surrogate random text that are identical to those of the original text. We conclude that hierarchical structures in text serve to create long-range correlations, and use the reader’s memory in reenacting some of the multidimensionality of the thoughts being expressed.

^{^}

I recognize some may not be ready to stipulate that human-style semantics is a necessary component of the simulator's model. I think it is, but won't attempt to defend that in this brief note. Skeptics are invited to treat it as a hypothesis based on the ease and consistency with which GPT-3 can be prompted to produce text humans recognize as richly and densely meaningful, and to see testing this hypothesis as one of the goals of semiotic physics.

^{^}

It's in the nature of any analogy that the analogues are similar in some ways but not others. In this case, state changes in semiotic physics are many orders of magnitude coarser-grained (relative to the state) than those in quantum physics, the state space itself is infinitesimally smaller, the time evolution operator carries more information and more structure, and so on. We can look for hypotheses where things are similar and take caution where they're different, bearing in mind that the analogy itself is a prompt, not a theory.

^{^}

I don't attempt to explore alignment implications in this post, which is meant simply to introduce the high-level semiotic physics concept. Such issues are touched on in the original Simulators post and its comments.

^{^}

This said, it's worth emphasizing that simulacra need not be human, or animate, or agentic at all.

^{^}

There's no implication or expectation that the time evolution operator of semiotic physics will be representable in such a compact form as the Schrödinger equation. The balance of information load between state and time evolution operator in the simulator is very different from the analogous balance in quantum mechanics. In the latter, a relatively simple operator transforms a vast state, while in a GPT-like system, the state is many, many, many orders of magnitude simpler, and the operator—the simulator's trained model—comparatively vast. For its dynamics to be captured in a one-line formula would imply a surprising degree of compressibility.

^{^}

Again, this is dubious. But it must be premised even to arrive at this scenario.

LESSWRONG
is fundraising!
LW

LESSWRONG
is fundraising!
LW

11

A note on 'semiotic physics'

11

Ω 3

11

Ω 3

Introduction

TL;DR

Trajectories

Simulators are multiverse generators

Semiotic physics

The semantic realm and the physical realm