Yuxiao — LessWrong

I'm an AI safety researcher — mostly working on ways to see inside the systems we’ve built and understand what moves them. My background runs through statistical inference, machine learning, and generative models; lately I’ve been in the borderlands between mechanistic interpretability and probabilistic thinking,...

From Organized Shelves to Layered Catalogs: Architectural Explorations for Sparse Autoencoders -- Crosscoders & Ladder SAEs Towards Hierarchical Data Structure

by Yuxiao Li, Maxim Panteleev, Eslam Zaher, Zachary Baker, Maxim Finenko July 2025 | SPAR Spring '25 A post in our series "Feature Geometry & Structured Priors in Sparse Autoencoders" > TL;DR: We investigate how to impose and exploit structures across depths and widths in LLM internal representations. Building on...

Aug 10, 2025•3

From Messy Shelves to Master Librarians: Toy-Model Exploration of Block-Diagonal Geometry in LM Activations

by Yuxiao Li, Zachary Baker, Maxim Panteleev, Maxim Finenko June 2025 | SPAR Spring '25 A post in our series "Feature Geometry & Structured Priors in Sparse Autoencoders" > TL;DR: We explore the intrinsic block-diagonal geometry of LLM feature space--first observed in raw embeddings and family-tree probes--by measuring cosine-similarity heatmaps....

Jul 19, 2025•6

From Unruly Stacks to Organized Shelves: Toy Model Validation of Structured Priors in Sparse Autoencoders

by Yuxiao Li, Henry Zheng, Zachary Baker, Eslam Zaher, Maxim Panteleev, Maxim Finenko June 2025 | SPAR Spring '25 A post in our series "Feature Geometry & Structured Priors in Sparse Autoencoders" > TL;DR: We frame sparse autoencoding as a latent variable model (LVM) and inject simple correlated priors to...

Jul 6, 2025•9