nika koghuashvili — LessWrong

7mo

Plastic Cake Fallacy

Alice and Bob are hanging out when the following happens:

Alice: I'm hungry, can you bring me the cake from the fridge?

Bob: Yeah one moment... Damn, I just checked and it looks like this cake is plastic. We can't eat this.

Alice: Oh, damn, that sucks. Do you have...

Exploratory: a steering vector in Gemma-2-2B-IT boosts context fidelity on subtraction, goes manic on addition

First LessWrong post / early mech-interp experiment. I’m a software engineer entering this field; feedback on methodology and framing is very welcome. I started this as a hunt for a vector on paltering (deception using technically true statements), motivated by the Machine Bullshit paper and prior work on activation steering....