Aprillion (Peter Hozák)

https://peter.hozak.info

Posts

3 Aprillion (Peter Hozák)'s Shortform

16d

3 The Usefulness Paradigm

41 Why square errors?

Wiki Contributions

Roko's Basilisk

(+208/-272)

Comments

Transformers Represent Belief State Geometry in their Residual Stream

Aprillion (Peter Hozák)7d10

To me, as a programmer and not a mathematician, the distinction doesn't make practical intuitive sense.

If we can create 3 functions f, g, h so that they "do the same thing" like f(a, b, c) == g(a)(b)(c) == average(h(a), h(b), h(c)), it seems to me that cross-entropy can "do the same thing" as some particular objective function that would explicitly mention multiple future tokens.
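A minimal sketch of what I mean by "do the same thing", assuming the simplest possible case where the shared result is an arithmetic mean and `h` is the identity (both are my illustrative assumptions, only the names `f`, `g`, `h` and `average` come from the sentence above):

```python
def h(x):
    # Per-item score; the identity here, purely as an illustrative assumption.
    return x


def average(*values):
    # Combine independently computed per-item scores.
    return sum(values) / len(values)


def f(a, b, c):
    # "Global" form: sees all three arguments at once.
    return (a + b + c) / 3


def g(a):
    # Curried form: receives one argument at a time.
    def g_b(b):
        def g_c(c):
            return (a + b + c) / 3
        return g_c
    return g_b


# All three formulations agree on the same inputs.
assert f(3, 6, 9) == g(3)(6)(9) == average(h(3), h(6), h(9)) == 6.0
```

The three call shapes differ, but the computed quantity is the same, which is the sense in which a per-token objective might coincide with one that mentions several tokens explicitly.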

My intuition is that cross-entropy-powered "local accuracy" can approximate "global accuracy" well enough in practice that I should expect better global reasoning from larger model sizes, faster compute, algorithmic improvements, and better data.

Implications of this intuition might be:

- myopia is a quantity, not a quality: a model can be incentivized to be more or less myopic, but I don't expect it will be proven possible to enforce it "in the limit"
- instruct training on longer conversations ought to produce "better" overall conversations if the model simulates that it's "in the middle" of a conversation, where follow-up questions are better than giving a final answer "when close to the end of this kind of conversation"

What nuance should I consider to understand the distinction better?

Transformers Represent Belief State Geometry in their Residual Stream

Aprillion (Peter Hozák)9d94

transformer is only trained explicitly on next token prediction!

I find myself understanding language/multimodal transformer capabilities better when I think about the whole document (up to context length) as a mini-batch for calculating the gradient in transformer (pre-)training. So I imagine it is minimizing the document-global prediction error; it wasn't trained to optimize for just single-next-token accuracy...
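A minimal sketch of that picture, assuming the common setup where the training loss for one document is the mean of the per-position next-token cross-entropies (the function names and the toy probabilities are hypothetical, chosen only for illustration):

```python
import math


def token_cross_entropy(predicted_probs, target_index):
    # Loss at one position: negative log-probability assigned to the true next token.
    return -math.log(predicted_probs[target_index])


def document_loss(per_position_probs, targets):
    # One scalar for the whole document: the mean of the per-position losses.
    # The gradient of this scalar is what updates the weights, so every
    # position in the document contributes to a single update.
    losses = [
        token_cross_entropy(probs, target)
        for probs, target in zip(per_position_probs, targets)
    ]
    return sum(losses) / len(losses)


# Toy document with two positions and a vocabulary of two tokens.
toy_probs = [[0.5, 0.5], [0.25, 0.75]]
toy_targets = [0, 1]
loss = document_loss(toy_probs, toy_targets)
```

Each term is still "local" (one next token), but the optimized quantity is the average over all positions in the document.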

Transformers Represent Belief State Geometry in their Residual Stream

Aprillion (Peter Hozák)9d20

Can you help me understand a minor labeling convention that puzzles me? I can see how we can label from the Z1R process as $η_{11}$ in MSP because we observe 11 to get there, but why is $S_{1}$ labeled as $η_{01}$ after observing either 100 or 00, please?

Aprillion (Peter Hozák)'s Shortform

Aprillion (Peter Hozák)16d10

Pushing writing ideas to external memory for my less burned out future self:

agent foundations need path-dependent notion of rationality
- economic world of average expected values / amortized big O if f(x) can be negative or you start very high
- vs min-maxing / worst case / risk-averse scenarios if there is a bottom (death)
alignment is a capability
- they might sound different in the limit, but the difference disappears in practice (even close to the limit? 🤔)
in a universe with infinite Everett branches, I was born in the subset that wasn't destroyed by nuclear winter during the cold war - no matter how unlikely it was that humanity didn't destroy itself (they could have done that in most worlds and I wasn't born in such a world, I live in the one where Petrov heard the Geiger counter beep in some particular pattern that made him more suspicious or something... something something anthropic principle)
- similarly, people alive in 100 years will find themselves in a world where AGI didn't destroy the world, no matter what the odds are - as long as there is at least 1 world with non-zero probability (something something Born rule ... only if any decision along the way is a wave function, not if all decisions are classical and the uncertainty comes from subjective ignorance)
- if you took quantum risks in the past, you now live only in the branches where you are still alive and didn't die (but you could be in pain or whatever)
- if you personally take a quantum risk now, your future self will find itself only in a subset of the futures, but your loved ones will experience all your possible futures, including the branches where you die ... and you will experience everything until you actually die (something something s-risk vs x-risk)
- if humanity finds itself in unlikely branches where we didn't kill our collective selves in the past, does that bring any hope for the future?

Natural Latents: The Concepts

Aprillion (Peter Hozák)1mo30

Now, suppose Carol knows the plan and is watching all this unfold. She wants to make predictions about Bob’s picture, and doesn’t want to remember irrelevant details about Alice’s picture. Then it seems intuitively “natural” for Carol to just remember where all the green lines are (i.e. the message M), since that’s “all and only” the information relevant to Bob’s picture.

(Writing before I read the rest of the article): I believe Carol would "naturally" expect that Alice and Bob share more mutual information than she does with Bob herself (even if they weren't "old friends", they both "decided to undertake an art project" while she "wanted to make predictions"), thus she would weigh the costs of remembering more than just the green lines against the expected prediction improvement given her time constraints, lost opportunities, ... - I imagine she could complete purple lines on her own, and then remember some "diff" about the most surprising differences...

Also, not all of the green lines would be equally important, so a "natural latent" would be some short messages in "tokens of remembering", not necessarily corresponding to the mathematical abstraction encoded by the 2 tokens of English "green lines" => Carol doesn't need to be able to draw the green lines from her memory if that memory was optimized to predict purple lines.

If the purpose was to draw the green lines, I would be happy to call that memory "green lines" (and in that, I would assume to share a prior between me and the reader that I would describe as: "to remember green lines" usually means "to remember steps how to draw similar lines on another paper" ... also, similarity could be judged by other humans ... also, not to be confused with a very different concept "to remember an array of pixel coordinates" that can also be compressed into the words "green lines", but I don't expect people will be confused about the context, so I don't have to say it now, just keep in mind if someone squints their eyes just-so which would provoke me to clarify).

It's OK to eat shrimp: EAs Make Invalid Inferences About Fish Qualia and Moral Patienthood

Aprillion (Peter Hozák)5mo4-2

yeah, I got a similar impression that this line of reasoning doesn't add up...

we interpret other humans as feeling something when we see their reactions

we interpret other eukaryotes as feeling something when we see their reactions 🤷

The Brain as a Universal Learning Machine

Aprillion (Peter Hozák)6mo10

(there are a couple of circuit diagrams of the whole brain on the web, but this is the best. From this site.)

could you update the 404 image, please? (link to the site still works for now, just the image is gone)

Features and Adversaries in MemoryDT

Aprillion (Peter Hozák)6mo10

S5

What is S5, please?

Are humans misaligned with evolution?

Aprillion (Peter Hozák)6mo32

I agree with what you say. My only peeve is that the concept of IGF is presented as a fact from the science of biology, while it's used as a confused mess of 2 very different concepts.

Both talk about evolution, but inclusive fitness is a model of how we used to think about evolution before we knew about genes. If we model biological evolution on the genetic level, we don't have any need for additional parameters on the individual organism level: natural selection and the other 3 forces in evolution explain the observed phenomena without a need to talk about individuals on top of genetic explanations.

Thus the concept of IF is only a good metaphor when talking approximately about optimization processes, not when trying to go into details. I am saying that taking the metaphor too far will result in confusing discussions.

Are humans misaligned with evolution?

Aprillion (Peter Hozák)6mo110

humans don't actually try to maximize their own IGF

Aah, but humans don't have IGF. Humans have https://en.wikipedia.org/wiki/Inclusive_fitness, while genes have allele frequency https://en.wikipedia.org/wiki/Gene-centered_view_of_evolution.

Inclusive genetic fitness is a non-standard name for the latter view of biology as communicated by Yudkowsky - as a property of genes, not a property of humans.

The fact that bio-robots created by human genes don't internally want to maximize the genes' IGF should be a non-controversial point of view. The human genes successfully make a lot of copies of themselves without any need whatsoever to encode their own goal into the bio-robots.

I don't understand why anyone would talk about IGF as if genes ought to want the bio-robots to care about IGF. That cannot possibly be the optimal thing that genes should "want" to do (if I understand the examples from Yudkowsky correctly, he doesn't believe that either, he uses this as an obvious example that there is nothing about optimization processes that would favor inner alignment) - genes "care" about genetic success, they don't care about what the bio-robots ought to believe at all 🤷