I think that more countries officially warning the world about AI risk can do a lot to shift the Overton window, which is very impactful.
I somehow stumbled on this old post. I'm curious how your experiment with different reinforcement schedules worked out.
My prediction is that your original scheme, getting one M&M for each pomodoro (bundled when that is practical), worked best, and any extra randomness didn't help.
My reasoning: whenever I read about reinforcement and extinction, I run the following test: would that outcome be predicted by assuming the animal was intelligently trying to figure out what is going on? And the answer is always "yes".
E.g. why are varied schedules harder to extinguish? Because it requires more evidence to be sure the reward is gone. If the reward is predictable, noticing its absence is easy, but if it's unpredictable, then you never know, at least if you're a lab animal.
When I apply this heuristic to your situation: if you miss an M&M one day, you know what is going on, and you know that this does not mean the M&Ms have stopped forever. This is very different from animal training.
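To make this concrete, here's a toy calculation (my own illustration, not anything from the animal-training literature): count how many unrewarded trials it takes before "the reward is gone" becomes a better explanation than "I've just been unlucky".

```python
# Toy model: how many consecutive unrewarded trials until the chance of that
# run, assuming rewards are still being given, drops below 5% (arbitrary cutoff).
def trials_to_notice_extinction(reward_prob, threshold=0.05):
    p_run = 1.0   # probability of the observed unrewarded run if rewards continue
    trials = 0
    while p_run >= threshold:
        p_run *= (1 - reward_prob)
        trials += 1
    return trials

print(trials_to_notice_extinction(1.0))  # fixed schedule: 1 unrewarded trial is enough
print(trials_to_notice_extinction(0.2))  # reward 20% of the time: takes 14 trials
```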
The latest Claude models, if asked to add two numbers together and then queried on how they did it, will still claim to use the standard “carry ones” algorithm for it.
Could anyone check if the lying feature activates for this? My guess is "no", 80% confident.
Thanks for responding.
I was imagining "local" to mean fewer than 5 or 10 tokens away, partly anchored on the example of detokenisation from the previous posts in the sequence, but also because that's what you're looking at. If your definition of "local" is longer than 10 tokens, then I'm confused why you didn't show the results for longer truncations. I thought the point of the plot was to show what happens if you include the local context but cut the rest.
Even if there is specialisation going on between local and long-range, I don't expect a sharp cutoff between what is local vs. non-local (and I assume neither do you). If some such soft boundary exists and it were in the 5-10 token range, then I'd expect the 5- and 10-token context lines not to be so correlated. But if you think the soft boundary is further away, then I agree that this correlation doesn't say much.
Attempting to re-state what I read from the graph: looking at the green line, the fact that most of the drop in cosine similarity is in the early layers suggests that longer-range attention (more than 10 tokens away) is mostly located in the early layers. The fact that the blue and red lines have their largest drops in the same regions suggests that short-ish (5-10 tokens) and very short (0-5 tokens) attention is also mostly located there. I.e. the graph does not give evidence of range specialisation across different attention layers.
Did you also look at the statistics of attention distance in the attention patterns of the various attention heads? I think that would be an easier way to settle this. Although maybe there is some technical difficulty in ruling out irrelevant attention that is just an artifact of the attention weights needing to sum to one?
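For reference, this is the kind of comparison I'm picturing behind those lines, sketched with TransformerLens and gpt-2 as stand-ins for whatever model and tooling you actually used (so the hook names, prompt, and truncation details are my assumptions):

```python
# Per-layer cosine similarity between the final token's residual stream with
# full context vs. with only the last 10 tokens of context.
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")
tokens = model.to_tokens(
    "The quick brown fox jumps over the lazy dog because it was late for dinner."
)
truncated = tokens[:, -10:]  # keep only the "local" context

_, cache_full = model.run_with_cache(tokens)
_, cache_trunc = model.run_with_cache(truncated)

for layer in range(model.cfg.n_layers):
    full = cache_full["resid_post", layer][0, -1]
    trunc = cache_trunc["resid_post", layer][0, -1]
    cos = torch.cosine_similarity(full, trunc, dim=0).item()
    print(f"layer {layer:2d}: cosine similarity {cos:.3f}")
```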
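Something like this is what I have in mind (same stand-ins as above; the 0.1 threshold is a crude attempt at the artifact problem, only counting attention weights that are clearly above "filler" level):

```python
# Mean attended distance per head, ignoring low-weight attention that may only
# be there because the weights have to sum to one.
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")
tokens = model.to_tokens(
    "The quick brown fox jumps over the lazy dog because it was late for dinner."
)
_, cache = model.run_with_cache(tokens)

seq = tokens.shape[1]
pos = torch.arange(seq)
distance = (pos[:, None] - pos[None, :]).clamp(min=0).float()  # [query, key]

for layer in range(model.cfg.n_layers):
    pattern = cache["pattern", layer][0]   # [n_heads, query, key]
    weights = pattern * (pattern > 0.1)    # drop the low-weight tail
    mean_dist = (weights * distance).sum(dim=(1, 2)) / weights.sum(dim=(1, 2))
    print(f"layer {layer:2d}: mean distance per head",
          [round(d, 1) for d in mean_dist.tolist()])
```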
I don't think this plot shows what you claim it shows. This looks like no specialisation of long-range vs. short-range to me.
My main argument for this interpretation is that the green and red lines move in almost perfect synchronisation. This shows that attending to tokens that are 5-10 tokens away happens in the same layers as attending to tokens that are 0-5 tokens away. The fact that the blue line drops more sharply only shows that close context is very important, not that it happens first, given that all three lines start dropping right away.
What it looks like to me:
(Very low confidence on 2-4, since the effects of the earlier lack of context can be amplified by later in-place processing, which would confound any interpretation of the graph in later layers.)
I think piotrm's question/concern was whether there is an injection that just tags the sentence it's injected into as the correct sentence, no matter the question. One way to test this is to ask a different question and see if this affects the result.
A related thing I'd be interested in is whether or not some injections were easier to localise, and what those injections were. And also how the strength of the injection affects the localisation success.
I've come across mentions of this concept a few times, and I had a very hard time getting it to stick in my head. I remember that the concept felt wrong and/or aversive.
However, I recently experienced a situation where this was the right tool. At some point in pondering the situation, my brain decided to reach for "split and commit", and now it just feels like a perfectly normal thing to do. I also feel like I've made this mental move a bunch of times before, without having any specific name for it.
I can't replicate my previous reaction to the concept, so I don't know what's up with that.
A thing that is not in the essay but that I noticed myself:
In any complicated situation, I will not take one but many separate actions in response. Some of these will be the same in several possible worlds. In my current case in particular, there are more things I would do the same in the less likely world than I would have naively expected if I hadn't actually considered it. Noticing this was a relief. It helped me see that [things being the other way] did not force me to do something I didn't want to do.
I do expect some amount of superposition, i.e. the model using almost orthogonal directions to encode more concepts than it has neurons. Depending on what you mean by "larger", this will result in a world model that is larger than the network. However, such an encoding will also result in noise. Superposition will necessarily lead to unwanted small-amplitude connections between uncorrelated concepts. Removing these should improve performance, and if it doesn't, it means that you did the decomposition wrong.
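For concreteness, here is a minimal numerical illustration of the kind of noise I mean (random unit vectors standing in for learned features; nothing here is specific to any real model):

```python
# Superposition interference: pack 1000 "concept" directions into 100 neurons
# and the readout of one concept picks up leakage from unrelated active ones.
import numpy as np

rng = np.random.default_rng(0)
d_neurons, n_concepts = 100, 1000

features = rng.standard_normal((n_concepts, d_neurons))
features /= np.linalg.norm(features, axis=1, keepdims=True)  # unit vectors

overlaps = features @ features.T
off_diag = overlaps[~np.eye(n_concepts, dtype=bool)]
print("typical overlap between unrelated concepts:", np.abs(off_diag).mean())  # ~0.08, not 0

# Activate 10 unrelated concepts; concept 0's readout is small but not zero.
active = features[1:11].sum(axis=0)
print("readout of inactive concept 0:", features[0] @ active)
```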
This paper is on my to-read list :)
Is this something you're still doing?
(Just asking in general, to keep track of what resources exist.)