Transformers Represent Belief State Geometry in their Residual Stream

Ω 1124d

Produced while being an affiliate at PIBBSS^[1]. The work was done initially with funding from a Lightspeed Grant, and then continued while at PIBBSS. Work done in collaboration with @Paul Riechers, @Lucas Teixeira, @Alexander Gietelink Oldenziel, and Sarah Marzen. Paul was a MATS scholar during some portion of this work. Thanks to Paul, Lucas, Alexander, Sarah, and @Guillaume Corlouer for suggestions on this writeup.

Introduction

What computational structure are we building into LLMs when we train them on next-token prediction? In this post we present evidence that this structure is given by the meta-dynamics of belief updating over hidden states of the data-generating process. We'll explain exactly what this means in the post. We are excited by these results because

We have a formalism that relates training data to internal

...
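To make "belief updating over hidden states of the data-generating process" concrete, here is a minimal sketch of one Bayesian update step for an observer of a hidden Markov model. Everything in it is an illustrative assumption rather than the post's own code: the function name `update_belief`, the `transition[token][i][j]` layout, and the two-state toy process are all hypothetical.

```python
def update_belief(belief, token, transition):
    """One step of Bayesian belief updating over the hidden states of an HMM.

    belief[i]               = current probability of being in hidden state i
    transition[token][i][j] = P(emit `token` and move to state j | in state i)
    (This layout is an assumption made for the sketch.)
    """
    n = len(belief)
    # Joint probability of seeing `token` and landing in each state j.
    unnormalized = [
        sum(belief[i] * transition[token][i][j] for i in range(n))
        for j in range(n)
    ]
    total = sum(unnormalized)  # P(token | current belief)
    if total == 0:
        raise ValueError("token has zero probability under the current belief")
    # Renormalize to get the posterior belief over hidden states.
    return [p / total for p in unnormalized]


# Hypothetical two-state process: state 0 emits token 0 and stays (p=0.5)
# or emits token 1 and moves to state 1 (p=0.5); state 1 always emits
# token 1 and moves back to state 0.
TOY_HMM = {
    0: [[0.5, 0.0], [0.0, 0.0]],
    1: [[0.0, 0.5], [1.0, 0.0]],
}

prior = [0.5, 0.5]
after_zero = update_belief(prior, 0, TOY_HMM)  # token 0 can only come from state 0
after_one = update_belief(prior, 1, TOY_HMM)
```

Applying `update_belief` repeatedly along a token sequence traces out a path through the space of beliefs; the set of beliefs reachable this way is the kind of structure the post goes on to look for in the residual stream.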


dr_s3m20

Given that the model eventually outputs the next token, shouldn't the final embedding matrix be exactly your linear fit matrix multiplied by the probability of each state outputting a given token? Could you use that?

2dr_s8m

This is extremely cool! Can you go into more detail about the step used to project the 64-dimensional residual stream to 3-dimensional space? Did you do a linear fit over a few test points and then use it on all the others?

2cousin_it2h

I have maybe a naive question. How much do we need to know to find the MSP image within the neural network? Is it only doable if we know the HMM to begin with? Or could it be feasible someday to inspect a neural network, find something that looks like an MSP image, and infer the HMM from it?

1Adam Shai10h

This all looks correct to me! Thanks for this.

Morpheus's Shortform

Morpheus

Morpheus13m10

Can anyone here recommend particular tools to practice grammar? Or does anyone have strong opinions on the best workflow/tool to correct grammar on the fly? I already know Grammarly and LanguageTool, but Grammarly seems steep at $30 per month when I don’t know if it is any good. I have tried GPT-4 before, but the main problems I have there are that it is too slow and changes my sentences more than I would like (I tried to make it do that less through prompting, which did not help that much).

I notice that feeling unconfident about my grammar/punctuation leads me to wri...

How to know whether you are an idealist or a physicalist/materialist

JackOfAllTrades

22m

You have heard and perhaps even used the expression "observable universe", right? What is included in the purportedly observable universe? The moon? The whole of the moon? If you had heard the expression "observable universe" a century ago, would you have been including the far side of the moon in that category?

Nathan Young's Shortform

Nathan Young

Nathan Young1h20

I recall a comment on the EA forum about Bostrom donating a lot to global dev work in the early days. I've looked for it for 10 minutes. Does anyone recall it or know where donations like this might be recorded?

What's up with all the non-Mormons? Weirdly specific universalities across LLMs

mwatkins

tl;dr: Recently reported GPT-J experiments [1 2 3 4] prompting for definitions of points in the so-called "semantic void" (token-free regions of embedding space) were extended to fifteen other open source base models from four families, producing many of the same bafflingly specific outputs. This points to an entirely unexpected kind of LLM universality (for which no explanation is offered, although a few highly speculative ideas are riffed upon).

Work supported by the Long Term Future Fund. Thanks to quila for suggesting the use of "empty string definition" prompts, and to janus for technical assistance.

Introduction

"Mapping the semantic void: Strange goings-on in GPT embedding spaces" presented a selection of recurrent themes (e.g., non-Mormons, the British Royal family, small round things, holes) in outputs produced by prompting GPT-J to define...


3the gears to ascension3h

Claude is such a swell dude tbh. hope he's ok

Ann1h10

Hope so, yeah. I'm cautiously optimistic he's doing well by his standards at least.

1Ann12h

On the other end of the spectrum, asking cosmo-1b (mostly synthetic training) for a completion, I get `A typical definition of "" would be "the set of all functions from X to Y".`

4Gunnar_Zarncke14h

If I haven't overlooked the explanation (I have read only part of it and skimmed the rest), my guess for the non-membership definition of the empty string would be all the SQL and programming queries where "" stands for matching all elements (or sometimes matching none). The small round things are a riddle for me too.

Daniel Dennett has died (1942-2024)

114

kave

20h

This is a linkpost for https://dailynous.com/2024/04/19/daniel-dennett-death-1942-2024/

Daniel Dennett, professor emeritus of philosophy at Tufts University, well-known for his work in philosophy of mind and a wide range of other philosophical areas, has died.
Professor Dennett wrote extensively about issues related to philosophy of mind and cognitive science, especially consciousness. He is also recognized as having made significant contributions to the concept of intentionality and debates on free will. Some of Professor Dennett’s books include Content and Consciousness (1969), Brainstorms: Philosophical Essays on Mind and Psychology (1981), The Intentional Stance (1987), Consciousness Explained (1992), Darwin’s Dangerous Idea (1995), Breaking the Spell (2006), and From Bacteria to Bach and Back: The Evolution of Minds (2017). He published a memoir last year entitled I’ve Been Thinking. There are also several books about him and his ideas. You

...


johnlawrenceaspden2h20

A Great Man and an inspiration to me and to this community and to all thinking men.

God rest his soul in peace in Paradise.

1tangerine4h

My introduction to Dennett, half a lifetime ago, was this talk: That was the start of his profound influence on my thinking. I especially appreciated his continuous and unapologetic defense of the meme as a useful concept, despite the many detractors of memetics. Sad to know that we won't be hearing from him anymore.


Rationality Freiburg

Freiburg - Lightning Discussions

May 10th, Innenhof, Rehlingstraße 9, Freiburg im Breisgau

omark, Bibhu kar

English: https://www.rationality-freiburg.de/events/2024-05-10-lightning-discussions/

Deutsch: https://www.rationality-freiburg.de/de/termine/2024-05-10-blitzdiskussionen/

Self-Blinded L-Theanine RCT

niplav

6mo

Value tracked	200 mg Caffeine (n=1, m=50): effect size d (λ, p, σ change)	500 mg L-theanine (n=1, m=50): effect size d (λ, p, σ change)
Log-score substance prediction^[1]	-0.6	-0.7
Absorption	0.61 (λ=13.3, p=0.00017, -0.072)	0.04 (λ=1.38, p=0.77, -0.07)
Mindfulness	0.58 (λ=11.8, p=0.0007, 0.021)	0.12 (λ=0.72, p=0.89, -0.018)
Productivity	0.58 (λ=28.9, p=1.3×10^-12, 0.11)	-0.28 (λ=5.51, p=0.109, 0.03)
Creativity	0.45 (λ=51, p=4.6×10^-27, 0.09)	-0.12 (λ=5.05, p=0.14, -0.04)
Happiness	0.27 (λ=10.6, p=0.002, 0.3)	0.16 (λ=3.98, p=0.27, -0.155)
Contentment	0.13 (λ=7.66, p=0.02, 0.47)	0.25 (λ=6.83, p=0.04, -0.04)
Relaxation	-0.11 (λ=5, p=0.15, 0.42)	0.12 (λ=1.5, p=0.74, 0.02)
Chastity^[2]	-0.14 (λ=1.9, p=0.64, 0.11)	-0.03 (λ=1.15, p=0.8, 0.25)
Flashcard ease	0.003 (λ≈∞, p≈0, -0.009)	-0.072 (λ=∞, p≈0, -0.01)
Flashcard ease factor	-0.039 (λ≈∞, p≈0, -32.7)	0.0026 (λ=∞, p≈0, -18.9)
Flashcard new interval	0.011 (λ≈∞, p≈0, -1.88)	-0.016 (λ=∞, p≈0, 3.1)
Time per flashcard^[3]	0.006 (λ≈∞, p≈0, 273.4)	0.003 (λ=∞, p≈0, 13.66)
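For readers unfamiliar with the "effect size d" column: the excerpt does not say which variant of d was used, so the following is only a minimal sketch of the common pooled-standard-deviation form of Cohen's d. The function name `cohens_d` and the sample numbers are made up for illustration and are not data from the experiment.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(treatment, control):
    """Cohen's d with a pooled standard deviation (an assumed variant)."""
    n1, n2 = len(treatment), len(control)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = (
        (n1 - 1) * variance(treatment) + (n2 - 1) * variance(control)
    ) / (n1 + n2 - 2)
    # Difference in means, expressed in units of the pooled standard deviation.
    return (mean(treatment) - mean(control)) / sqrt(pooled_var)


# Hypothetical ratings on substance days vs. placebo days.
example_d = cohens_d([2, 4, 6], [1, 3, 5])
```

Read this way, a d of 0.58 means the substance-day mean sits a bit more than half a pooled standard deviation above the placebo-day mean.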

L-Theanine is synergistic with caffeine with regard to attention switching^[318] and alertness^[319]^[320] and reduces susceptibility to distractions (focus).^[320]^[321] However, alertness seems to be relatively subjective

...


Mir3h10

Edit: I found the post usefwl, thankmuch!!

Mh, was gonna ask when you were taking it. I'm preparing to try it as a sleep-aid for when I adjust my polyphasic sleep-schedule (wanting to go fm 16h-cycles potentially down to 9h) bc it seems potentially drowsymaking and has much faster plasma decay-rate^[1] compared to alts. This is good for polyphasic if not want drowsy aft wake.

The data in ^[1] concerns 100mg tablets, however, and a larger dose (eg 400mg) may be longer. The kinetic model^[2] they use will prob be good estimate of p...

The Poker Theory of Poker Night

omark

13d

This is a linkpost for https://www.codeandbugs.com/post/poker-theory-poker-night/

Link to my own article. I removed the explanation of EV since I assume on LW that's not necessary.

A group of friends and I occasionally like to get together to play Poker. Yet something keeps happening that I have observed time and again with these kinds of group gatherings: it is hard to find a suitable date, and on top of that people cancel at the last minute. This is demotivating for the other participants, who in turn become less committed, and this often leads to such groups failing.

Here is one theory of why this happens and how to solve it, explained with Poker. This article will assume Texas Hold'em Poker, probably the most popular variant.

tl;dr People's incentives are not aligned. The solution is to create a social rule that makes folding (canceling attendance) have a bit...


omark3h10

I'm gonna guess that you actually wouldn't make people pay for drinks if they said they missed because they had COVID, there was a death in the family, etc.?

This is a tough call. How do you determine what is a "legitimately bad enough" case to miss the event? The examples you mention are clearly bad enough, but there are other situations where it's much more personal. If I'm feeling low on energy, is that a choice I am making or an unavoidable fact about my metabolism? You would have to set up some kind of tribunal or voting for deciding on these cases. Th...


LessOnline

A Festival of Writers Who are Wrong on the Internet

May 31 - Jun 2, Berkeley, CA