LESSWRONG

723
JoNeedsSleep
117480

https://joneedssleep.github.io/ :)

Mech Interp Wiki Page and Why You Should Edit Wikipedia
JoNeedsSleep1mo10

Hi Aryaman, thanks again for the great technical writeup in the mech interp article. Moving to the mech interp talk page to address the COI and RS concerns.

Mech Interp Wiki Page and Why You Should Edit Wikipedia
JoNeedsSleep2mo10

Good call! Linking the relevant pages.

Mech Interp Wiki Page and Why You Should Edit Wikipedia
JoNeedsSleep2mo40

I don't think they've become less important. In my experience, Wikipedia is pretty heavily cited by LLMs when they go and do their own research, so Wikipedia articles are still valuable even if fewer humans visit them.

On the point of Google not prioritizing it so heavily: I don't think Google indexes a lot of new Wikipedia articles, but old established articles still top the search results. In our case, the mech interp wiki page never got indexed by Google until a Wikipedia New Page reviewer marked it as reviewed a couple of days ago; now it's a top result.

College Advice For People Like Me
JoNeedsSleep6mo40

totally had Henry's voice playing while reading your comment

JoNeedsSleep's Shortform
JoNeedsSleep8mo21

My best attempt at characterizing Kant's Transcendental Idealism: Kant's idealism says that essence, not existence, is dependent on us. That is to say, what it is to be depends on how we understand. For example, a classification schema in biology, such as genetic proximity, depends on the purposes it serves for us. What it is for animals to be depends, in other words, on the biologist. To push the biology analogy ad absurdum, transcendental idealism says something like "the genetic composition is the condition of the possibility of how we are able to make sense of biological objects in the first place". The existence of these classification schemata depends on our mind a priori.

JoNeedsSleep's Shortform
JoNeedsSleep1y41

The distinction between inner and outer alignment is quite unnatural. For example, even the concept of reward hacking implies a twofold failure: a reward that is not robust enough against exploitation, and a model that develops instrumental capabilities so as to find a way to trick the reward. Indeed, in the case of reward hacking, it's worth noting that, depending on the autonomy of the system in question, we could attribute the misalignment to either an inner or an outer failure. At its core, this distinction comes out of the policy <-> reward scheme of RL, though prediction <-> loss function in SL can be similarly characterized; I doubt that this framing generalizes well to other engineering choices.

IMO challenge bet with Eliezer
JoNeedsSleep1y30

Eliezer seems on track to win: the current AI benchmark for IMO geometry problems is at 27/30 (IMO Gold human performance is at 25.9/30). This new benchmark was set by an LLM-augmented neurosymbolic AI.

Wu's Method can Boost Symbolic AI to Rival Silver Medalists and AlphaGeometry to Outperform Gold Medalists at IMO Geometry [2024 April]

Transformers Represent Belief State Geometry in their Residual Stream
JoNeedsSleep1y20

Thank you for the insightful post! You mentioned that:

"Consider the relation a transformer has to an HMM that produced the data it was trained on. This is general - any dataset consisting of sequences of tokens can be represented as having been generated from an HMM."

and the linear projection consists of:

"Linear regression from the residual stream activations (64 dimensional vectors) to the belief distributions (3 dimensional vectors)."

Given any natural language dataset, if we didn't have the ground truth belief distribution, is it possible to reverse engineer (data → model) an HMM and extract the topology of the residual stream activations?

I've been running task-salient representation experiments on larger models and am very interested in replicating and possibly extending your result to noisier settings.
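
For readers unfamiliar with the setup, the linear projection quoted above (64-dimensional residual stream activations regressed onto 3-dimensional belief distributions) might look roughly like the sketch below. Everything here is an assumption for illustration: the data is synthetic, `fit_linear_probe` and `apply_probe` are hypothetical helper names, and the activations are simulated as a noisy linear embedding of the beliefs rather than taken from a real transformer. It is not the original post's code.

```python
import numpy as np

def fit_linear_probe(activations, beliefs):
    """Least-squares affine map from activations (n, d) to beliefs (n, k)."""
    # Append a column of ones so the fit includes a bias term.
    X = np.hstack([activations, np.ones((activations.shape[0], 1))])
    W, *_ = np.linalg.lstsq(X, beliefs, rcond=None)
    return W  # shape (d + 1, k)

def apply_probe(W, activations):
    """Project activations into belief space with a fitted probe."""
    X = np.hstack([activations, np.ones((activations.shape[0], 1))])
    return X @ W

# Synthetic stand-in data (assumption: no real model is involved).
rng = np.random.default_rng(0)
n, d, k = 500, 64, 3

# Ground-truth beliefs: points on the 2-simplex (each row sums to 1).
beliefs = rng.dirichlet(np.ones(k), size=n)

# Hypothetical "residual stream": a random linear embedding plus small noise.
mixing = rng.normal(size=(k, d))
activations = beliefs @ mixing + 0.01 * rng.normal(size=(n, d))

W = fit_linear_probe(activations, beliefs)
pred = apply_probe(W, activations)
mse = float(np.mean((pred - beliefs) ** 2))
print(f"probe weights: {W.shape}, in-sample MSE: {mse:.2e}")
```

Note that this sketch needs the ground-truth `beliefs` as regression targets, which is exactly what the question above asks about doing without.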

75 Mech Interp Wiki Page and Why You Should Edit Wikipedia
2mo
16
19 Undergrad AI Safety Conference
8mo
0
9 Call for Applications: XLab Summer Research Fellowship
8mo
0
1 JoNeedsSleep's Shortform
1y
2
9 Notes on Tuning Metacognition
1y
0