Actually interested in your post and spaced repetition (SR) techniques, although I am not a specialist. I had no idea what means: "Anki," but it was on my LessWrong feed, so I gave it a look.
Consider adding a brief line to the intro - for those who, clueless like me, find their way to your post.
Something to identify what it is and most important acknowledge the developer - like: Anki is a free, open-source flashcard program that utilizes spaced repetition and active recall to help users memorize information effectively.
Elmes, D. (2024). Anki (Version 2.1.66) [Software]. https://apps.ankiweb.net/
https://en.wikipedia.org/wiki/Anki_(software)
Apologies if this is too basic or repeated elsewhere.
Thank you for the post.
Leo
Minor Point which may be mentioned in comments, but is the numbering in the subheads 'off' or 'deliberate?'
If you revise / reprint this post here or in your "Don't Worry About the Vase" substack, perhaps a note or correction?
Principles #1 / #4 / #8 / #10 / #13
Great post - I only noticed the numbering when I was making notes.
Cheers,
Is Lindsey using a nuanced definition of ""concept injection?
I am a non-specialist, just trying to follow and understand. I have to look up many [most] definitions and terms. This may ne a trivial matter, but for me to understand, definitions matter.
When I look up a meaning of "application steering" I find something more permanent. Has any discussion focused on Lindsey' use of the term concept injection as an application of activation steering: "We refer to this technique as concept injection—an application of activation steering"
To me the term suggests something in training that persists in the model, when in reality it's a per-query intervention. [Claude tells me that] another term would be: "inference-time modifications" that don't permanently alter the model - they're applied during each forward pass where the effect is desired.
If Activation Steering refers to 'modifying activations during a forward pass to control model behavior' and Concept Injection (Lindsey's usage) is a specific application where they "inject activation patterns associated with specific concepts directly into a model's activations" to test introspection, then isn't this much more like transient inference-time modifications that don't permanently alter the model.
[again, I am not a specialist ;-)]