Itay Yona — LessWrong

Great stuff! Shamelessly sharing my own work: this emergent behavior is also behind a known LLM failure mode, in which the model is asked to repeat the same token.

Interpreting the Repeated Token Phenomenon in Large Language Models
https://arxiv.org/abs/2503.08908

Inferring the model dimension of API-protected LLMs

Itay Yona · 2y

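The true rank is revealed because the output dimensionality is vocab_size, which is much larger than hidden_dim. Each logit vector is (up to a bias term) a linear projection of a hidden_dim-dimensional hidden state, so the observed logit vectors span a subspace whose dimension gives away hidden_dim. It is unclear how to obtain an equivalent signal from the cortex. One option is to record many neurons at once (a population) and apply dimensionality reduction, usually some form of manifold learning, to estimate the intrinsic dimensionality of the population. This has proven useful in some brain areas, such as the hippocampal formation.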
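
A minimal sketch of the rank argument, using toy sizes. VOCAB_SIZE, HIDDEN_DIM, N_QUERIES and the random integer "unembedding" are hypothetical stand-ins, not any real model's values or API. Stacking many logit vectors gives a matrix with vocab_size columns but rank at most hidden_dim; exact rational arithmetic sidesteps the floating-point tolerance a real SVD-based estimate would need.

```python
import random
from fractions import Fraction

# Hypothetical toy sizes; the point is only that VOCAB_SIZE >> HIDDEN_DIM.
VOCAB_SIZE = 40
HIDDEN_DIM = 6
N_QUERIES = 25  # number of prompts whose full logit vectors we "observe"


def matrix_rank(rows):
    """Exact rank via Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        # Find a pivot at or below the current rank with a nonzero entry.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every row below the pivot row.
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            if factor != 0:
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def simulate_logits(seed=0):
    """Return N_QUERIES logit vectors, each computed as W_U @ h (no bias)."""
    rng = random.Random(seed)
    # Hypothetical unembedding matrix W_U with shape (VOCAB_SIZE, HIDDEN_DIM).
    w_u = [[rng.randint(-5, 5) for _ in range(HIDDEN_DIM)] for _ in range(VOCAB_SIZE)]
    logits = []
    for _ in range(N_QUERIES):
        # Hypothetical final hidden state h for one prompt.
        h = [rng.randint(-5, 5) for _ in range(HIDDEN_DIM)]
        logits.append([sum(w * x for w, x in zip(row, h)) for row in w_u])
    return logits


if __name__ == "__main__":
    observed = simulate_logits()
    print("output dim:", len(observed[0]), "rank:", matrix_rank(observed))
```

Running it prints an output dimension of 40 and a rank that can never exceed 6 (it equals 6 unless the random draws happen to be degenerate). With a real API one would only see floating-point logits, so the same idea would rely on the singular-value spectrum rather than an exact rank.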

Analogies between Software Reverse Engineering and Mechanistic Interpretability

Itay Yona · 3y

Thanks, that's a good insight. In my opinion, the graph representation of code is very different from automated decompilation like Hex-Rays. I agree that graph representation is probably the most critical step towards higher-level analysis and understanding. I am not sure why you claim it required decades of tooling, though, since Turing machines have been described with graphs since the dawn of computer science.

In any case, this is an interesting point, as it suggests we might want to focus on finding graph-like concepts that are useful for describing the different states of a neural network's computation, and later on developing IDA-like tools :)
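
To make "graph representation of code" concrete, here is a minimal sketch of how a disassembler-style tool might split a flat instruction listing into basic blocks and link them into a control-flow graph. The toy instruction format and opcodes (`mov`, `cmp`, `jz`, `jmp`, `ret`) are hypothetical, not IDA's internals or any real ISA encoding; the "leaders" rule (jump targets and instructions following a jump start new blocks) is the standard textbook approach.

```python
from collections import defaultdict

# Hypothetical toy program: (opcode, operand); jump operands are instruction indices.
PROGRAM = [
    ("mov", "r0, 0"),   # 0
    ("cmp", "r1, 0"),   # 1
    ("jz", 5),          # 2: conditional jump to 5
    ("add", "r0, r1"),  # 3
    ("jmp", 1),         # 4: unconditional jump back to 1
    ("ret", None),      # 5
]

JUMPS = {"jmp", "jz"}


def find_leaders(program):
    """Sorted indices at which a basic block starts."""
    leaders = {0}
    for i, (op, arg) in enumerate(program):
        if op in JUMPS:
            leaders.add(arg)  # a jump target starts a block
        if (op in JUMPS or op == "ret") and i + 1 < len(program):
            leaders.add(i + 1)  # the instruction after a jump/ret starts a block
    return sorted(leaders)


def build_cfg(program):
    """Return (blocks, edges): block start -> instruction indices, block start -> successors."""
    leaders = find_leaders(program)
    blocks = {
        start: list(range(start, end))
        for start, end in zip(leaders, leaders[1:] + [len(program)])
    }
    edges = defaultdict(set)
    for start, idxs in blocks.items():
        last = idxs[-1]
        op, arg = program[last]
        if op in JUMPS:
            edges[start].add(arg)  # taken-branch edge
        if op not in ("jmp", "ret") and last + 1 < len(program):
            edges[start].add(last + 1)  # fall-through edge
    return blocks, dict(edges)


if __name__ == "__main__":
    blocks, edges = build_cfg(PROGRAM)
    for start, succs in sorted(edges.items()):
        print(start, "->", sorted(succs))
```

The loop shows up as the back edge from block 3 to block 1, which is immediately visible in graph form but easy to miss in the flat listing; that is the kind of structure an IDA-like view surfaces.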

Since we share similar backgrounds and aspirations, feel free to reach out:

https://www.linkedin.com/in/itay-yona-b40a7756/

Analogies between Software Reverse Engineering and Mechanistic Interpretability

Itay Yona · 3y

I strongly agree! When you study RE, it is critical to understand many details of how the machine works, and most people I knew were already familiar with those. What they lacked was the skill of using their low-level understanding to conduct useful research effectively.

It is natural to pay much less attention to the 1->2 phase, since there are many more intermediate researchers than complete newbies or experts. Interestingly, this means intermediate researchers may think they are talking with person 1 when they are actually talking with person 3.

Thanks, you gave me something to think about :)

What if memes are common in highly capable minds?

Answer by Itay Yona · Jun 06, 2022

[In my opinion]

Memes are self-replicating concepts (given enough humans to spread them). Highly capable minds are different: they contain predictive models of the world, of themselves, and of others. This allows them to manipulate both objects in the world and other people to fulfill their needs. Since memes lack these capacities, they should not be treated as the cause of human behavior, even though they are related to it. Even if the best way to explain human behavior is through memes, they don't necessarily account for most of the decision-making process.

[/In my opinion]
