Abstract/Concrete Axis: Robustness Testing Semantic Feature Separation in Gemma 3 270M
In my previous post, I found that abstract and concrete prompts activate almost entirely separate feature clusters in Gemma 3 270M’s sparse autoencoder decomposition. The abstract side looked like reasoning operations (qualification, problems/issues); the concrete side looked like physical domain ontologies (composition, geology). The Jaccard similarity between their active feature...
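For readers who want the overlap metric spelled out, here is a minimal sketch of Jaccard similarity between two sets of active SAE features. The helper names, the activation threshold, and the toy activation vectors are all assumptions made up for illustration; they are not the values or the code from the experiments.

```python
def active_features(activations, threshold=0.0):
    """Return the indices of features whose activation exceeds the threshold.

    The threshold of 0.0 is an assumed cutoff, not one taken from the post.
    """
    return {i for i, a in enumerate(activations) if a > threshold}


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| for two sets of feature indices."""
    union = a | b
    if not union:
        # Two empty sets: define the similarity as 0.0 to avoid dividing by zero.
        return 0.0
    return len(a & b) / len(union)


# Toy per-feature activations (hypothetical, six features only).
abstract_acts = [0.0, 1.2, 0.0, 0.7, 0.0, 0.3]
concrete_acts = [0.9, 0.0, 0.4, 0.0, 0.0, 0.1]

abstract_set = active_features(abstract_acts)  # {1, 3, 5}
concrete_set = active_features(concrete_acts)  # {0, 2, 5}

similarity = jaccard(abstract_set, concrete_set)
print(similarity)  # 0.2: one shared feature out of five active overall
```

A value near 0 means the two prompt types share almost no active features; a value of 1 means the active sets are identical.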