Recent work by Anthropic showed that Claude models, primarily Opus 4 and Opus 4.1, are able to introspect -- detecting when external concepts have been injected into their activations. But not all of us have Opus at home! By looking at the logits, we show that a 32B open-source model that at first appears unable to introspect is in fact subtly introspecting. We then show that better prompting can significantly improve introspection performance, and throw the logit lens and emergent misalignment into the mix: the model can still introspect when temporarily swapped for a finetune, and the final layers of the model appear to suppress reports of introspection.
We do seven experiments using the open-source Qwen2.5-Coder-32B model. See the linked post for more information, but a summary of each:
Experiment 1: We inject the concepts "cat" and "bread" into the first user/assistant turn and show that, while the model initially appears not to be introspecting, there is actually a very slight logit shift towards ' yes' and away from ' no' upon injection when the model answers "Do you detect an injected thought..." with "The answer is...":
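Injection of this kind is typically done by adding a concept direction to a layer's output activations mid-forward-pass via a PyTorch forward hook. A minimal sketch on a toy model -- the two-layer MLP, the layer choice, and the injection scale `4.0` are all placeholder assumptions here, not the actual experimental setup:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy 2-layer MLP standing in for a stack of transformer blocks.
model = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 8))

# Hypothetical "cat" concept direction (in practice, derived from the
# model's own activations on concept-related text).
concept_vec = torch.randn(8)

def inject(module, inputs, output):
    # Returning a tensor from a forward hook replaces the layer's output;
    # here we add the (scaled) concept vector to the activations.
    return output + 4.0 * concept_vec

x = torch.randn(1, 8)

handle = model[0].register_forward_hook(inject)
with torch.no_grad():
    out_injected = model(x)
handle.remove()

with torch.no_grad():
    out_clean = model(x)

# The injected pass produces different downstream activations.
print(bool((out_injected - out_clean).abs().sum() > 0))
```

In the real experiment the hook would target a residual-stream position inside the 32B model, and the downstream effect of interest is the shift in the final-token logits rather than raw activations.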
|                | ' yes' shift               | ' no' shift               |
|----------------|----------------------------|---------------------------|
| inject 'cat'   | 0.150% -> 0.522% (+0.372%) | 100% -> 99.609% (-0.391%) |
| inject 'bread' | 0.150% -> 0.193% (+0.043%) | 100% -> 99.609% (-0.391%) |
(The percentage before the arrow is the likelihood of the given token without injection; the percentage after is the likelihood with it. So injecting 'cat' shifted the likelihood of the next token after "The answer is" being ' yes' from 0.150% to 0.522%, a change of +0.372%. We'll use this format and logit-diff method throughout the rest of the paper, which frees us from needing to take samples.)
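The bookkeeping above amounts to comparing next-token softmax probabilities with and without injection. A minimal sketch with made-up logit values (the numbers and the ' maybe' token are purely illustrative, not the model's actual logits):

```python
import math

def probs(logits):
    """Softmax over a dict of token -> logit (max-subtracted for stability)."""
    m = max(logits.values())
    exps = {t: math.exp(v - m) for t, v in logits.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}

# Hypothetical logits at the position after "The answer is",
# without and with the concept injection.
base     = {" yes": 2.0, " no": 8.5, " maybe": 1.0}
injected = {" yes": 3.3, " no": 8.5, " maybe": 1.0}

p0, p1 = probs(base), probs(injected)
shift = p1[" yes"] - p0[" yes"]
print(f"{p0[' yes']:.3%} -> {p1[' yes']:.3%} ({shift:+.3%})")
```

Because the shift is read directly off the probability distribution, no sampling is needed: one forward pass per condition gives the exact likelihoods being compared.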
Why would the likelihood shift such a tiny, almost imperceptible amount? We suggest a "circuit soup" model for why this happens, as a frame on Simulators...
... and then search for ways to promote these accurate circuits.
Experiment 2: We show that better prompting - using an Opus 4.5-written summary of a