Abstract: To help evaluate and understand the latent capabilities of language models, this paper introduces an approach using optimized input embeddings, or 'soft prompts,' as a metric of conditional distance between a model and a target behavior. The technique aims to facilitate latent capability discovery as a part of...
Thanks to George Wang, Liron Shapira, Eliezer[1], and probably dozens of other people at Manifest and earlier conferences that I can't immediately recall the names of for listening to me attempt to explain this in different ways and for related chit chat. I've gotten better at pitching these ideas over...
[This post is largely from the perspective of AI safety, but most of it should generalize.] For recipients, well-calibrated estimates about funding probability and quantity are extremely valuable. Funding-dependent individuals and organizations need information to optimize their decision-making; incorrect estimates cause waste. At the moment, getting that information seems...
[This was a submission to the AI Alignment Awards corrigibility contest that won an honorable mention. It dodges the original framing of the problem and runs off on a tangent, and while it does outline the shape of possible tests, I wasn't able to get them done in time for...
[Thanks to guy-whose-name-I-forgot working on paths to coherence for the conversation at Newspeak House after EAG London that prompted this thought, and Jozdien and eschatropic for some related chit-chat.] Many agents start with some level of incoherence in their preferences[1]. What paths should we expect agents to take to resolve...
Some observations: 1. ML-relevant hardware supply is bottlenecked at several points. 2. One company, NVIDIA, is currently responsible for most purchasable hardware.[1] 3. NVIDIA already implements driver licensing to force data center customers to buy into the more expensive product line.[2] 4. NVIDIA would likely not oppose even onerous regulation...