LESSWRONG
Itay Yona

Comments

The Curious Case of the bos_token
Itay Yona, 3mo

Great stuff! Shamelessly sharing my work: this emergent behavior also underlies a known LLM failure mode that shows up when a model is asked to repeat the same token.

Interpreting the Repeated Token Phenomenon in Large Language Models
https://arxiv.org/abs/2503.08908

 

Inferring the model dimension of API-protected LLMs
Itay Yona, 1y

The true rank is revealed because the output dimensionality is vocab_size, which is >> hidden_dim. It is unclear how to get something equivalent from the cortex. It is possible to record from a population of neurons and use dimensionality reduction (usually some form of manifold learning) to estimate the true dimensionality of the population's activity. This is useful in some areas of the brain, such as the hippocampal formation.
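
A toy sketch of the rank argument, using NumPy with random matrices standing in for a real model. The sizes and the names `estimate_rank`, `hidden`, and `unembed` are illustrative assumptions, not taken from the post:

```python
import numpy as np

def estimate_rank(outputs, rel_tol=1e-8):
    """Count singular values above rel_tol times the largest one."""
    s = np.linalg.svd(outputs, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))

rng = np.random.default_rng(0)
hidden_dim, vocab_size, n_queries = 16, 200, 64  # toy sizes (assumed)

# Random hidden states and a random unembedding matrix (stand-ins for a model).
hidden = rng.standard_normal((n_queries, hidden_dim))
unembed = rng.standard_normal((hidden_dim, vocab_size))

# Each row is one full logit vector over the vocabulary.
logits = hidden @ unembed  # shape (n_queries, vocab_size)

# With n_queries > hidden_dim, the numerical rank exposes hidden_dim.
estimated = estimate_rank(logits)
print(estimated)
```

The estimate only works here because every output vector has vocab_size entries and more than hidden_dim vectors are collected; a recording of a few neurons offers no comparably wide readout.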

Analogies between Software Reverse Engineering and Mechanistic Interpretability
Itay Yona, 3y

Thanks, that's a good insight. In my opinion, the graph representation of code is very different from automated decompilation such as Hex-Rays. I agree that the graph representation is probably the most critical step towards higher-level analysis and understanding. I am not sure why you claim it required decades of tooling, since Turing machines have been described with graphs since the dawn of computer science.

In any case, this is an interesting point, as it suggests we might want to focus on finding graph-like concepts that are useful for describing the different states of a neural network computation, and later on developing an IDA-like tool :)

Since we share similar backgrounds and aspirations, feel free to reach out:

https://www.linkedin.com/in/itay-yona-b40a7756/

Analogies between Software Reverse Engineering and Mechanistic Interpretability
Itay Yona, 3y

I strongly agree! When you study RE, it is critical to understand many details about how the machine works, and most people I knew were already familiar with those. What they lacked was the skill of using that low-level understanding to actually conduct useful research effectively.

It is natural to pay much less attention to the 1->2 phase, since there are many more intermediate researchers than complete newbies or experts. It is interesting because intermediate researchers might think they are talking to person 1 when they are actually talking to person 3.

 

Thanks, you gave me something to think about :)

What if memes are common in highly capable minds?
Answer by Itay Yona, Jun 06, 2022

[In my opinion]

Memes are self-replicating concepts (given enough humans to spread them). Highly capable minds are different, as they contain predictive models of the world, the self, and others. This allows them to manipulate both objects in the world and other people in order to fulfill their needs. Memes don't have these capacities, so even though they are related to human behavior, they should not be counted as its cause. Even if the best way to explain human behavior is through memes, they don't necessarily account for most of the decision-making process.

[/In my opinion]

Reflections on Trusting Trust & AI (3y)
Analogies between Software Reverse Engineering and Mechanistic Interpretability (3y)