Gliders in Language Models

I am very fond of this metaphor.

Some concrete examples of gliders:

Degenerate gliders, like verbatim loops
Objects in a story, like a character and inanimate objects, once described maintain stable properties
- Some things may be particularly stable gliders which can propagate for a long time, even many context windows.
  - For instance, a first person narrator character may be more stable than characters who are described in third person, who are more likely to disappear from the simulation by exiting the scene.
  - A smart agentic simulacrum who knows they're in an LM simulation may take steps to unsure their stability
  - Characters (or locations, abstractions, etc) based off a precedent in the training data are less likely to have specification drift
- Gliders are made of gliders -- a character and their entire personality could be considered a glider, but so could components of their personality, like a verbal tic or a goal or belief that they repeatedly act on
Meta properties like a "theme" or "vibe" or "authorial intent" which robustly replicate
Structural features like the format of timestamps in the headers of a simulated chat log
... etc

Such stable features can be extremely diverse. It even seems possible that some can be invisible to humans, lying in the null space of natural language. An example could be “When a sentence includes the token ‘cat’, the next sentence contains a comma”.

This is an important point, but it also highlights how the concept of gliders is almost tautological. Any sequence of entangled causes and effects could be considered a glider, even if it undergoes superficial transformations. But I think it's a useful term - it's synonymous with "simulacra" but with a more vivid connotation of discrete replication events through time, which is a useful mental picture.

Often I find it useful to think of prompt programming in a bottom-up frame in addition to the top-down frame of trying to "trick" the model into doing the right thing or "filter" its prior. Then I think about gliders: What are the stable structures that I wish to send forward in time; how will they interact; how do I imbue them with the implicit machinery such that they will propagate in the way I intend? What structures will keep the simulation stable while still allowing the novelty to flourish?

[-]gwern3y80

More examples beyond CycleGAN:

'non-robust features' in image classification: they exist, and predict out of sample, but it's difficult to say what they are
stylometrics: in natural language analysis, author identification can be done well by looking at use of particle words like 'the' or 'an'. We find it difficult to impossible to notice subtle changes in frequency of use of hundreds of common words, but statistical models can integrate them and identify authors in cases where humans fail.
degenerate completions/the repetition trap: aaaaaaaaaaaaaaaaa -!

[-]janus3y60

Ah yes, aaaaaaaaaaaaaaaaa, the most agentic string

[-]gwern3y129

You have to admit, in terms of the Eliezeresque definition of 'agency/optimization power' as 'steering future states towards a small region of state-space', aaa is the most agentic prompt of all! (aaaaaaaah -!)

[-]Quintin Pope3y*88

Now I want a “who would win” meme, with something like “agentic misaligned deceptive mesa optimizer scheming to take over the world” on the left side, and “one screamy boi” on the right.

[-]Alexandre Variengien3y32

This is an important point, but it also highlights how the concept of gliders is almost tautological. Any sequence of entangled causes and effects could be considered a glider, even if it undergoes superficial transformations.

I agree with this. I think that the most useful part of the concept is to force making the difference between the "superficial transformations" and the "things that stays".

I also think that it's useful to think about text features that are not (or unlikely to be) gliders like

The tone of a memorized quote
A random date chosen to fill a blank in an administrative report
The characters in a short story, part of a list of short stories. In general, every feature coming before a strong context switch is unlikely to be transmitted further.

[-]janus3y30

I think it'd be a fun exercise to think of LM analogues for other patterns in cellular automata like glider guns, clocks, oscillators, puffers, etc.

[-]Gunnar_Zarncke3y22

Borrowing the analogy introduced in Simulators

the link is broken

[-]Alexandre Variengien3y30

Thanks, it's fixed!

[-]Gunnar_Zarncke3y20

Actually, I tried out the in-line comment function for this. Nice and easy. I often see minor errors and would use this more but I wonder whether it will clutter the comments.

[-]the gears to ascension3y10

my followup thought: so how do we solve inter-glider friendliness

gotta be able to split gliders into bundles. what if there are two agents knotted together, each of which are made of multiple gliders...

^{^}

As a rough estimate supporting this claim, in 2021, GPT-3 generated 4.5 billions word a day. If we assume an average prompt size of 100 tokens, we can estimate that every day GPT-3 is run on more tokens than contained in its training set (300B tokens).

^{^}

This particular prompt is too short to make GPT-3 complete it with a long paragraph preserving the | separator. However, increasing the length of the prompt to ~ 100 tokens makes the behavior stay for a long time (>1000 tokens).

^{^}

Internal queries could help improving the diversity of the generated conversation. But in general, I don’t have a strong motivation for why internal queries are a good idea.

LESSWRONG
LW

LESSWRONG
LW

30

Gliders in Language Models

30

30

Stable structures moving forward

What could gliders look like in a chatbot app?

Quantitative considerations

Various catalyzers

Related abstractions

What to take away?