We've found a method that tells you: * How functionally similar two neural networks are across ALL inputs, * Computed solely from the weights (i.e. no data), * Using a principled generalization of cosine similarity. There's only one catch: you have to use a tensor network. We've already shown that...
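To make "functional similarity across all inputs, computed only from weights" concrete, here is a minimal sketch of the simplest case I can state with confidence: plain linear maps with standard Gaussian inputs. This is an assumption for illustration, not the tensor-network method the post describes. For f(x) = Ax and g(x) = Bx, the expected inner product of the outputs equals the Frobenius inner product tr(AᵀB), so the cosine similarity of the weights matches a data-based estimate. The function names (`weight_cosine`, `sampled_cosine`) are hypothetical.

```python
import math
import random


def frobenius_inner(a, b):
    """Frobenius inner product of two same-shaped matrices (lists of rows)."""
    return sum(x * y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def weight_cosine(a, b):
    """Data-free similarity: cosine of the two weight matrices."""
    return frobenius_inner(a, b) / math.sqrt(
        frobenius_inner(a, a) * frobenius_inner(b, b)
    )


def apply_linear(w, x):
    """Compute the matrix-vector product w @ x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def sampled_cosine(a, b, n_samples=20000, seed=0):
    """Monte Carlo check: cosine of the outputs over standard Gaussian inputs."""
    rng = random.Random(seed)
    dim = len(a[0])
    ab = aa = bb = 0.0
    for _ in range(n_samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        ya, yb = apply_linear(a, x), apply_linear(b, x)
        ab += sum(p * q for p, q in zip(ya, yb))
        aa += sum(p * p for p in ya)
        bb += sum(p * p for p in yb)
    return ab / math.sqrt(aa * bb)


if __name__ == "__main__":
    A = [[1.0, 0.5], [0.0, 2.0]]
    B = [[0.8, 0.0], [0.3, 1.5]]
    print("from weights:", weight_cosine(A, B))
    print("from samples:", sampled_cosine(A, B))
```

The two printed numbers should agree up to sampling noise. Only the first is computed without any data, which is the property the post claims for its generalization.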
If we have models that are 10x less efficient but completely interpretable[1], this would be a multi-billion dollar industry. If you just needed to train your bio-model 10x longer to [reverse-engineer human bio-markers for dementia], then you can now sell your product. Task-specific models are much, much smaller than general SOTA...
The ones who are most successful at writeathons (Inkhaven, NaNoWriMo) are those with an overhang of things to say, usually in the form of: * draft posts * daydreams When Scott Alexander said: > Whenever I see a new person who blogs every day, it's very rare that that never...
This is my project proposal for Pivotal. Apply as a mentee by May 3rd. The field has accumulated a vocabulary of computational primitives (induction heads, skip-trigrams) through post-hoc analysis. We propose building a toy language from these known primitives to train tensor-transformers (see an early example in the last section)...
AKA scalable oversight of value drift TL;DR LLMs could be aligned but then corrupted through RL, instrumentally converging on deep consequentialism. If LLMs are sufficiently aligned and can properly oversee their training updates, they can prevent this. SOTA models can arguably be considered ~aligned,[1] but this isn't my main...
What's the best case scenario regarding OpenAI's contract w/ the Department of War (DoW)? * We have access to the full contract * It's airtight * OAI's engineers are on top of things in case the DoW breaks the contract * There's actual teeth for violations But even then, the...
I'll have a weird dream and wake up in a funk. Be overwhelmed w/ work. Read lots of news/reddit and become very upset or angry. Obviously it's good to feel these things, but sometimes I continue to feel awful no matter how hard I try to "process my feelings" (or...