Nisan

Comments

Some AI research areas and their relevance to existential safety

I'd believe the claim if I thought that alignment was easy enough that AI products which pass internal product review and don't immediately trigger lawsuits would be aligned enough not to end the world through alignment failure. But I don't think that's the case, unfortunately.

It seems like we'll have to put special effort into both single/single alignment and multi/single "alignment", because the free market might not give it to us.

Some AI research areas and their relevance to existential safety

I'd like more discussion of the claim that alignment research is unhelpful-at-best for existential safety because it accelerates deployment. It seems to me that alignment research has a couple of paths to positive impact which might balance that risk:

  1. Tech companies will be incentivized to deploy AI with slipshod alignment, which might then take actions that no one wants and which pose existential risk. (Concretely, I'm thinking of the "out with a whimper" and "out with a bang" scenarios.) But the existence of better alignment techniques might legitimize governance demands, i.e. demands that tech companies don't make products that do things that literally no one wants.

  2. Single/single alignment might be a prerequisite to certain computational social choice solutions. E.g., once we know how to build an agent that "does what [human] wants", we can then build an agent that "helps [human 1] and [human 2] draw up incomplete contracts for mutual benefit subject to the constraints in the [policy] written by [human 3]". And slipshod alignment might not be enough for this application.

Singularity Mindset

2 years later, do you have an answer to this?

Risk is not empirically correlated with return

Hm, I think all I meant was:

"If you have two assets with the same per-share price, and asset A's value per share has a higher variance than asset B's value per share, then asset A's per-share value must have a higher expectation than asset B's per-share value."

I guess I was using "cost" to mean "price" and "return" to mean "discounted value or earnings or profit".
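
To make that concrete, here's a toy numerical illustration of the claim as I intended it. The numbers are made up, and it assumes a risk-averse market that demands compensation for variance; it's a sketch of the intended meaning, not an empirical argument.

```python
import statistics

price = 100.0  # both assets trade at the same per-share price

# Hypothetical per-share values one period later (equally likely outcomes).
asset_a = [130.0, 90.0]    # risky: high variance in value per share
asset_b = [105.0, 105.0]   # safe: zero variance in value per share

for name, outcomes in [("A (risky)", asset_a), ("B (safe)", asset_b)]:
    mean = statistics.mean(outcomes)
    var = statistics.pvariance(outcomes)
    print(f"Asset {name}: price={price}, E[value]={mean}, Var[value]={var}")

# Both assets cost 100, but A's expected value (110) exceeds B's (105).
# Under the risk-aversion assumption, the market would only price the two
# assets equally if the higher-variance asset carried the higher expectation.
```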

Maybe Lying Can't Exist?!

(I haven't read any of the literature on deception you cite, so this is my uninformed opinion.)

I don't think there's any propositional content at all in these sender-receiver games. As far as the predator is concerned, the signal means "I want to eat you" and the prey wants to be eaten.

If the environment were somewhat richer, the agents would model each other as agents, and they'd have a shared understanding of the meaning of the signals, and then I think we'd have a better shot at understanding deception.
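
For concreteness, here's a minimal sketch of the receiver's side of a sender-receiver game. The payoffs and probabilities are my own illustrative assumptions for a generic predator/prey mimicry setup, not the specific game from the post or from the deception literature.

```python
# Receiver's problem in a toy sender-receiver game. The signal's "meaning"
# is exhausted by the posterior it induces and the payoffs it feeds into;
# nothing proposition-like is asserted.

# Receiver's beliefs about the sender's type, conditional on seeing the signal.
p_mate, p_predator = 0.7, 0.3  # illustrative numbers

# Receiver payoffs for each (sender type, receiver action) pair.
receiver_payoff = {
    ("mate", "approach"): 1.0,
    ("mate", "ignore"): 0.0,
    ("predator", "approach"): -5.0,  # approaching a predator is very costly
    ("predator", "ignore"): 0.0,
}

def expected_payoff(action):
    """Receiver's expected payoff from an action, given its beliefs."""
    return (p_mate * receiver_payoff[("mate", action)]
            + p_predator * receiver_payoff[("predator", action)])

best = max(["approach", "ignore"], key=expected_payoff)
print({a: round(expected_payoff(a), 2) for a in ["approach", "ignore"]}, "->", best)
```

In this formalism the signal just shifts probabilities and payoffs; nothing is asserted that could be true or false, which is the sense in which calling the predator's signal a "lie" feels strained without richer agents.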

What Would I Do? Self-prediction in Simple Algorithms

Ah, are you excited about Algorithm 6 because the recurrence relation feels iterative rather than topological?

Self-sacrifice is a scarce resource

Like, if you’re in a crashing airplane with Eliezer Yudkowsky and Scott Alexander (or substitute your morally important figures of choice) and there are only two parachutes, then sure, there’s probably a good argument to be made for letting them have the parachutes.

This reminds me of something that happened when I joined the Bay Area rationalist community. A number of us were hanging out and decided to pile in a car to go somewhere, I don't remember where. Unfortunately there were more people than seatbelts. The group decided that one of us, who was widely recognized as an Important High-Impact Person, would definitely get a seatbelt; I ended up without a seatbelt.

I now regret going on that car ride. Not because of the danger; it was a short drive and traffic was light. But the self-signaling was unhealthy. I should have stayed behind, to demonstrate to myself that my safety is important. I needed to tell myself "the world will lose something precious if I die, and I have a duty to protect myself, just as these people are protecting the Important High-Impact Person".

Everyone involved in this story has grown a lot since then (me included!) and I don't have any hard feelings. I bring it up because offhand comments or jokes about sacrificing one's life for an Important High-Impact Person sound a bit off to me; they possibly reveal an unhealthy attitude towards self-sacrifice.

(If someone actually does find themselves in a situation where they must give their life to save another, I won't judge their choice.)

Classifying games like the Prisoner's Dilemma

Von Neumann and Morgenstern also classify the two-player games, but they get only two games, up to equivalence. The reason is they assume the players get to negotiate beforehand. The only properties that matter for this are:

  • The maximin value $m_i$ of each player $i$, which represents that player's best alternative to negotiated agreement (BATNA).

  • The maximum total utility $M$.

There are two cases:

  1. The inessential case, $m_1 + m_2 = M$. This includes the Abundant Commons with … . No player has any incentive to negotiate, because the BATNA is Pareto-optimal.

  2. The essential case, $m_1 + m_2 < M$. This includes all other games in the OP.

It might seem strange that VNM consider, say, Cake Eating to be equivalent to Prisoner's Dilemma. But in the VNM framework, Player 1 can threaten not to eat cake in order to extract a side payment from Player 2, and this is equivalent to threatening to defect.
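
As a small check on the inessential/essential distinction, here's a Python sketch that computes the maximin values and the maximum total utility for a 2x2 game. The Prisoner's Dilemma payoffs below are standard illustrative numbers, not the ones from the OP; it uses pure-strategy maximin, which coincides with the mixed value here because defecting dominates.

```python
# A 2x2 game given as payoff matrices indexed [row_action][col_action].
# Illustrative Prisoner's Dilemma payoffs (actions: 0 = cooperate, 1 = defect).
P1 = [[3, 0],
      [5, 1]]  # row player's payoffs
P2 = [[3, 5],
      [0, 1]]  # column player's payoffs

def maximin_row(payoff):
    """Best payoff the row player can guarantee with a pure strategy."""
    return max(min(row) for row in payoff)

def maximin_col(payoff):
    """Best payoff the column player can guarantee with a pure strategy."""
    return max(min(col) for col in zip(*payoff))

m1 = maximin_row(P1)  # player 1's BATNA
m2 = maximin_col(P2)  # player 2's BATNA
M = max(P1[i][j] + P2[i][j] for i in range(2) for j in range(2))  # max total utility

print(f"m1 + m2 = {m1 + m2}, M = {M}")
print("inessential" if m1 + m2 == M else "essential")
# Here m1 + m2 = 2 < 6 = M, so the Prisoner's Dilemma is essential:
# there's surplus to be gained (and fought over) by negotiating.
```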
