Most of what we currently call "feature discovery" in language models is wrapped up in dictionary-learning methods like sparse autoencoders (SAEs) – which work, and which have been scaled to millions of features on frontier-scale models, but which bundle two distinct commitments into a single training objective: a reconstruction loss...
Cross-posted from the Leap Labs blog. For many people, including me, the real promise of AI is massively accelerated scientific discovery. Chatbots, vibe coding, video generation: these things are magical, but what I really want is superhuman medicine, radical life extension, humanity blossoming out into the universe. Understanding the universe....
All examples in this post can be found in this notebook, which is also probably the easiest way to start experimenting with PIZZA. From the research & engineering team at Leap Laboratories (incl. @Arush, @sebastian-sosa, @Robbie McCorkell), where we use AI interpretability to accelerate scientific discovery from data. What is...
We are thrilled to introduce Leap Labs, an AI startup. We’re building a universal interpretability engine. We design robust interpretability methods with a model-agnostic mindset. These methods in concert form our end-to-end interpretability engine. This engine takes in a model, or ideally a model and its training dataset (or some...
The anomalous tokens we found in mid-January are now being described as 'glitch tokens' and 'aberrant tokens' in online discussion, as well as (perhaps more playfully) 'forbidden tokens', 'unspeakable tokens' and 'cursed tokens'. We've mostly just called them 'weird tokens'. GPT-3 speaks of 'the unspeakable one' when...
tl;dr: This is a follow-up to our original post on prompt generation and the anomalous token phenomenon which emerged from that research. Work done by Jessica Rumbelow and Matthew Watkins in January 2023 at SERI-MATS. [Figure: part of a typical semantically coherent cluster we found in GPT2-small's embedding space] Clustering: As...
UPDATE (14th Feb 2023): ChatGPT appears to have been patched! However, very strange behaviour can still be elicited in the OpenAI playground, particularly with the davinci-instruct model. More technical details here. Further (fun) investigation into the stories behind the tokens we found here. Work done at SERI-MATS, over the past...