Jessica Rumbelow

Scientific Discovery in the Age of Artificial Intelligence

Cross-posted from the Leap Labs blog. For many people, including me, the real promise of AI is massively accelerated scientific discovery. Chatbots, vibe coding, video generation: these things are magical, but what I really want is superhuman medicine, radical life extension, humanity blossoming out into the universe. Understanding the universe....

Jun 29, 2025 · 42 karma

Why did ChatGPT say that? Prompt engineering and more, with PIZZA.

All examples in this post can be found in this notebook, which is also probably the easiest way to start experimenting with PIZZA. From the research & engineering team at Leap Laboratories (incl. @Arush, @sebastian-sosa, @Robbie McCorkell), where we use AI interpretability to accelerate scientific discovery from data. What is...

Aug 3, 2024 · 43 karma

Introducing Leap Labs, an AI interpretability startup

We are thrilled to introduce Leap Labs, an AI startup. We’re building a universal interpretability engine. We design robust interpretability methods with a model-agnostic mindset. These methods in concert form our end-to-end interpretability engine. This engine takes in a model, or ideally a model and its training dataset (or some...

Mar 6, 2023 · 104 karma

SolidGoldMagikarp III: Glitch token archaeology

The set of anomalous tokens which we found in mid-January are now being described as 'glitch tokens' and 'aberrant tokens' in online discussion, as well as (perhaps more playfully) 'forbidden tokens', 'unspeakable tokens' and 'cursed tokens'. We've mostly just called them 'weird tokens'. GPT-3 speaks of 'the unspeakable one' when...

Feb 14, 2023 · 92 karma

SolidGoldMagikarp II: technical details and more recent findings

tl;dr: This is a follow-up to our original post on prompt generation and the anomalous token phenomenon which emerged from that research. Work done by Jessica Rumbelow and Matthew Watkins in January 2023 at SERI-MATS.
[Figure: part of a typical semantically coherent cluster we found in GPT2-small's embedding space]
Clustering
As...

Feb 6, 2023 · 114 karma

SolidGoldMagikarp (plus, prompt generation)

UPDATE (14th Feb 2023): ChatGPT appears to have been patched! However, very strange behaviour can still be elicited in the OpenAI playground, particularly with the davinci-instruct model. More technical details here. Further (fun) investigation into the stories behind the tokens we found here. Work done at SERI-MATS, over the past...

Feb 5, 2023 · 675 karma

Guardian AI (Misaligned systems are all around us.)

Work done @ SERI-MATS, idea from a conversation with Ivan Vendrov at Future Forum earlier this year. Misaligned systems are all around us. They are what make me watch another video of a man in filthy shorts building a hut using only tools made from rocks and his own armpit...

Nov 25, 2022 · 15 karma

LESSWRONG