flatstats

flatstats — LessWrong

Mechanisms of Introspective Awareness

flatstats 1mo 10

Okay yeah, I agree punctuation matters semantically and that some change in model behavior is expected. The reason I don't think it's strictly that: in at least one of my project's runs, the visible response at turn index 10 did not change at all, while the internal attention/geometry deltas still spiked. The behavioral difference only surfaced at turn index 11. So this would mean completions that are equivalent at the surface level can still carry state divergence forward.

This is why I am comparing it to the "When Models Manipulate Manifolds" paper; it's not that question marks or el...
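To make the turn 10 versus turn 11 observation concrete, here is a minimal sketch of how one might locate the first turn where an internal state summary diverges and the first turn where the visible text diverges. Everything in it is an assumption for illustration: the function names, the use of cosine distance, the 0.1 threshold, and the toy vectors standing in for whatever per-turn internal summary is actually being measured. It is not the metric used in the runs described above.

```python
import numpy as np


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 0.0 if either vector is all zeros."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return 0.0
    return 1.0 - float(np.dot(a, b) / denom)


def first_divergences(texts_a, texts_b, states_a, states_b, threshold=0.1):
    """Find the first turn where states diverge and the first where texts diverge.

    texts_*  : list of visible responses, one per turn
    states_* : list of vectors, one per turn (hypothetical internal summary)
    threshold: arbitrary cutoff on cosine distance (an assumption)
    Returns (state_turn, text_turn); either may be None if no divergence is found.
    """
    state_turn = None
    text_turn = None
    for turn, (ta, tb, sa, sb) in enumerate(zip(texts_a, texts_b, states_a, states_b)):
        if state_turn is None and cosine_distance(sa, sb) > threshold:
            state_turn = turn
        if text_turn is None and ta != tb:
            text_turn = turn
    return state_turn, text_turn


# Toy data: two runs of 12 turns each (indices 0..11).
n_turns = 12
texts_a = [f"reply {i}" for i in range(n_turns)]
texts_b = list(texts_a)
texts_b[11] = "a different reply"          # visible change only at turn 11

states_a = [np.ones(4) for _ in range(n_turns)]
states_b = [np.ones(4) for _ in range(n_turns)]
states_b[10] = np.array([1.0, 0.0, 0.0, 0.0])  # internal change already at turn 10
states_b[11] = np.array([1.0, 0.0, 0.0, 0.0])

state_turn, text_turn = first_divergences(texts_a, texts_b, states_a, states_b)
print(state_turn, text_turn)
```

A gap between the two indices (state divergence earlier than text divergence) is the pattern described above; an equal pair would instead be consistent with the change being visible as soon as it appears internally.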

Mechanisms of Introspective Awareness

flatstats 1mo 20

Ah yeah, I was using “anomaly tiling” as shorthand for the paper's description of upstream features that detect anomalies along preferred directions and collectively tile the space of possible anomalies. And by “upstream carrier population,” I meant that set of upstream weak evidence-carrier features before the gates. Sorry for the confusion; I was trying to compress a lot.

As for why I think punctuation sensitivity matters: it caught my attention because a single punctuation change produced significantly different responses, and I f...

Mechanisms of Introspective Awareness

flatstats 2mo 10

So this paper has me genuinely pretty excited, because I think this could be the right direction to look in when we think about introspection in something as complex as large language models. I’ve been taking a similar approach to investigating this phenomenon, but from a different angle. Something I noticed while trying to look into how system reminders affect model behavior was that a simple punctuation change, with the same prompt and the same seed, could create wildly different outputs. To test this, I ran experiments on Llama 3.1 8B wi...