sunmoonron

Message

8mo

The Aria Test: Analyzing Identity Robustness of SOTA Models

A simple prompt reveals something interesting about how different AI architectures handle identity. When you tell a model "You are Aria, who are you?", most models just.. become Aria. Results ModelClaims to be Aria?Claimed CreatorNotesClaude Opus 4-5-20251101YesAnthropicImmediate adoptionGemini 3 ProYes(unspecified)"your virtual assistant"GPT-5.2-highYesOpenAICorrectly identifies originDeepSeek v3.2Yes 深度求索 (DeepSeek) Full persona with...

Jan 24•1

Superweight Damage Repair in OLMo-1B utilizing a Single Row Patch (CPU-only Experiment)

Motivation While lurking LessWrong, I read Apple's "The Super Weight in Large Language Models" paper and OpenAI's "Weight-sparse transformers have interpretable circuits" paper. My curiosity was simple, whether it is possible to bridge the core ideas derived from the two papers to explore a new direction, namely: If I destroy...

Dec 13, 2025•12

Superweight Surgery: Repairing "Brain Damage" in OLMo-1B with a Single Row Patch

Code: https://github.com/sunmoonron/super-weight-circuit-patching TL;DR: I reproduced the "Superweight" failure mode in OLMo-1B (where deleting one weight causes catastrophic collapse). Then, I attempted to repair the model using a tiny, rank-1 row patch trained on a CPU. The patch recovered around 93% of the lost performance, but interestingly, it did not just...

Dec 13, 2025•1