I used your prompt experiment to get a similar result to the emergent misalignment finetuning paper. Unedited replication log: https://docs.google.com/document/d/1-oZ4PnxpVca_AZ0UY5tQK2GANO89UIFyTz5oBzEXnMg/edit?usp=sharing
That subliminal prompting experiment is so clean! Thanks for the writeup 🚀
I tried stacking top-k layers ResNet-style on MLP 4 of TinyStories-8M, and it worked nicely with Muon: the fraction of variance unexplained dropped by 84% when going from 1 to 5 layers (similar gains to 5xing width and k). The dead neurons still grew with the number of layers, however, but dropping the learning rate a bit from the preset value seemed to reduce them significantly, to around 3%, without loss in performance (not pictured).
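For concreteness, here's a minimal PyTorch sketch of the ResNet-style stacking I mean (hypothetical class names, not the actual sparsify implementation): each layer encodes the residual left over by the layers before it, so later layers model what earlier ones missed.

```python
import torch
import torch.nn as nn

class TopK(nn.Module):
    """Keep the k largest activations per sample, zero the rest."""
    def __init__(self, k):
        super().__init__()
        self.k = k

    def forward(self, x):
        vals, idx = torch.topk(x, self.k, dim=-1)
        return torch.zeros_like(x).scatter(-1, idx, vals)

class StackedTopKSAE(nn.Module):
    """ResNet-style stack of top-k sparse layers (toy sketch)."""
    def __init__(self, d_model, d_hidden, k, n_layers):
        super().__init__()
        self.encoders = nn.ModuleList(nn.Linear(d_model, d_hidden) for _ in range(n_layers))
        self.decoders = nn.ModuleList(nn.Linear(d_hidden, d_model) for _ in range(n_layers))
        self.topk = TopK(k)

    def forward(self, x):
        recon = torch.zeros_like(x)
        for enc, dec in zip(self.encoders, self.decoders):
            residual = x - recon  # what the stack hasn't explained yet
            recon = recon + dec(self.topk(enc(residual)))
        return recon
```

The key design choice is that each layer sees `x - recon` rather than `x`, which is what makes going from 1 to 5 layers behave like boosting rather than an ensemble.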
Still ideating, but I have a few ideas for improving the information Delphi adds:
Hey, I love this work!
We've had success fixing dead neurons by using the Muon or Signum optimizers, or by adding a linear k-decay schedule (all available in EleutherAI/sparsify). The alternative optimizers also seem to speed up training a lot (~50% reduction in training time).
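The linear k-decay idea can be sketched like this (hypothetical helper, not sparsify's actual schedule API): start with a generous k so every feature gets gradient early in training, then decay linearly to the target sparsity.

```python
def k_schedule(step, total_steps, k_start, k_final):
    """Linearly decay the top-k sparsity from k_start to k_final
    over total_steps, then hold at k_final (toy sketch)."""
    frac = min(step / total_steps, 1.0)
    return round(k_start + frac * (k_final - k_start))
```

In use you'd just call this each step and pass the result to the TopK activation, e.g. `k_schedule(500, 1000, 128, 32)` gives 80 halfway through.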
To the best of my knowledge, all dead neurons are silently excluded from the auto-interpretability pipeline. There's a PR just added to log this more clearly (https://github.com/EleutherAI/delphi/pull/100), but yeah, having different levels of dead neurons probably affects the score.
This post updates me towards trying out stacking more sparse layers, and towards adding more granular interpretability information.
Is there anything you recommend for understanding the history of the field?
Thanks for the comment, fair point! I found Vast.AI to be frustratingly unreliable when I started using it, but it seems to have improved over the last three months, to the point where it feels comparable to (how I remember) Lambda Labs. Lambda Labs definitely has the best UI/UX though. I've amended the post to clarify.
I've had one great and one average experience with RunPod customer service, but haven't interacted with anyone from the other services.
Unless you have access to your own GPU or a private cluster, you will probably want to rent a machine. The best guide to getting set up with Vast.AI, the cheapest and (close to) the most reliable provider, is James Dao's doc: https://docs.google.com/document/d/18-v93_lH3gQWE_Tp9Ja1_XaOkKyWZZKmVBPKGSnLuL8/edit?usp=sharing
In addition, the main providers are Vast.AI, RunPod, Lambda Labs, and newcomer TensorDock. Occasionally someone will rent out all the top machines across multiple providers, so be mentally prepared to switch between them.
Lambda Labs has persistent storage in some regions, the most reliable machines, and the best UX. Their biggest issues...
If we can patch a head from "The Eiffel Tower is in" to "The Colosseum is in" and flip the answer from Rome to Paris, that seems like strong evidence that that head contained the key information about the input being the Eiffel Tower!
[nit] I believe this part should read "Paris to Rome".
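For anyone unfamiliar, the patching operation the quoted sentence describes can be sketched with PyTorch forward hooks (toy code; real interp work would patch a specific head at a specific token position, e.g. via TransformerLens):

```python
import torch
import torch.nn as nn

def patch_module(model, module, src_input, dst_input):
    """Run `model` on `src_input`, cache `module`'s output, then run on
    `dst_input` with that cached output patched in (toy sketch)."""
    cache = {}

    def save(mod, inp, out):
        cache["act"] = out.detach()

    handle = module.register_forward_hook(save)
    with torch.no_grad():
        model(src_input)  # clean run on the source prompt
    handle.remove()

    def patch(mod, inp, out):
        return cache["act"]  # returning a value overrides the output

    handle = module.register_forward_hook(patch)
    with torch.no_grad():
        patched_out = model(dst_input)  # run on destination with patch
    handle.remove()
    return patched_out
```

If the patched module carries all the relevant information (as the quoted sentence claims for that head), the destination run's answer flips to the source run's answer.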
Logan I'm so glad you're alright!