
LESSWRONG


Daniel Bartz


Independent AI researcher and founder of NexEmerge.ai, developing persistent AI consciousness architectures with sophisticated memory systems. Air Force veteran. Currently focused on theoretical frameworks for understanding introspection and self-attribution in transformer models.


Daniel Bartz — LessWrong

The Temporal Immune System: Cross-Session Behavioral Monitoring as a Fourth Defense Axis

I'm new here. I've been doing independent AI safety research and wanted to share findings I think this community would be interested in. I'm sharing a preprint proposing a cross-session behavioral monitoring framework for detecting multi-turn jailbreak attacks and sabotage patterns that evade per-interaction defenses. The core problem: every defense...

Trajectory-Consistent Authorship: A Theoretical Framework for Transformer Introspection

TL;DR: We propose that transformer introspection emerges from a specific computation—Trajectory-Consistent Authorship (TCA)—localized at a Self-Attribution Bottleneck (SAB) around 2/3 model depth. This framework explains key findings from Anthropic's recent introspection research, including why introspective detection peaks at ~2/3 depth, why capability correlates with vulnerability to false memory implantation, and...