Generally haunting the Boston area. Primarily interested in the intersection of philosophy, ethics, and system dynamics.
Current focus is the impact of hallucinated frames on relational system dynamics. See my post "Systems of Control" for the first in that series.
Kicking this around for a post I'm drafting: when an LLM hallucinates something, it's usually at least plausible for the situation. A hallucinated citation, for example, generally has proper formatting and so on, so the generation itself worked well enough. It's also confidently incorrect, which is of course what makes it so dangerous to people who don't know any better and so annoying to people who actually know the subject matter.
I've been thinking of the set of all possible responses as a kind of navigable topology (think like the Library of Babel website but instead of linear pages it's a high-dimensional manifold), and it's been productive to think of hallucination as a kind of localization problem. The model is in "citation" space when it should be in "I don't have this" space. The output is locally correct for where the model thinks it is; it's just in the wrong place in response-space relative to reality.
Thinking of the set of possible responses as a kind of response-space provides an interesting lens on the problem. If hallucinations aren't broken outputs, then they may be expected outputs generated from the wrong context. It would also help explain why "just try harder to be accurate" doesn't really work all that well: effort in generation doesn't help if the error is upstream, in mode-selection. (Though saying "try harder" may well prompt the system to actually evaluate where it is in response-space and relocate if necessary, so it's not totally useless.)
Also suggests an interesting tack might not be "how do we make the model generate better" but "what determines which mode/space the model is in, and can that be checked before output?"
The upshot of this perspective is that just adding compute to a model won't actually help with hallucination if it doesn't also expand the model's reasoning about where it is in response-space in the first place. If the model doesn't have any way to anchor its internal state to reality, it can compute for a thousand years and never land on an answer that is coherent with that reality. From this angle the hallucination bottleneck doesn't seem to be a lack of additional knowledge; the limitation is the system's context about where it should be within its own reasoning space.
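For concreteness, here's a toy sketch of what a pre-output mode check could look like. Everything in it is made up for illustration: the mode names and scores are placeholders for whatever signal you could actually extract about which region of response-space the model is in, not an API that exists today.

```python
# Toy sketch: gate on *where the model is* in response-space before generating,
# rather than grading the fluency of the output after the fact.
# The mode names and scores are hypothetical placeholders.

def choose_mode(mode_scores: dict[str, float], margin: float = 0.2) -> str:
    """Pick a response mode only if it clearly beats the abstain mode; otherwise abstain."""
    fallback = "i_dont_have_this"
    best_mode, best_score = max(mode_scores.items(), key=lambda kv: kv[1])
    fallback_score = mode_scores.get(fallback, 0.0)
    if best_mode != fallback and best_score - fallback_score < margin:
        # Not confident enough about the location; relocate to "I don't have this" space.
        return fallback
    return best_mode

# Locally fluent "citation" mode, but weak grounding relative to abstaining:
print(choose_mode({"citation": 0.55, "i_dont_have_this": 0.45}))  # -> i_dont_have_this
print(choose_mode({"citation": 0.90, "i_dont_have_this": 0.10}))  # -> citation
```

The point isn't the thresholding itself; it's that the check happens at the mode-selection step, upstream of generation.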
Anybody else have a similar perspective, or know of posts/papers that explore this dimension? Would love an outside perspective.
My heart goes out to you and everyone who was touched by Sammie. She looks charmingly adorable and clearly had a life well-lived and well-loved. I think you've honored her here, and I hope you know that even now Sammie is still bringing that light to the world through you. Thank you for sharing her story.
I want to start off by thanking you for taking the time to read through the post and comment; I know it's not a short one.
I read through the articles you linked, and I'll respond to them in reverse order here:
Responding to "Humans Who Are Not Concentrating Are Not General Intelligences": the ideas in this essay predate my interactions with LLMs and AI in general. I have lived under authoritarian systems and been subject to the dynamics I discuss here. This is not a case of "I had an idea and GPT spun off an essay," this is an attempt to formalize what it feels like within the systems I describe.
That said, I actually think that this essay falls very much within the spirit of "Every Cause Wants To Be A Cult."
Let's take this line: "The ingroup-outgroup dichotomy is part of ordinary human nature. So are happy death spirals and spirals of hate. A Noble Cause doesn’t need a deep hidden flaw for its adherents to form a cultish in-group. It is sufficient that the adherents be human. Everything else follows naturally, decay by default, like food spoiling in a refrigerator after the electricity goes off." And further down: "Here I just want to point out that the worthiness of the Cause does not mean you can spend any less effort in resisting the cult attractor."
Which mirrors my stated position: "Systems of control are not necessarily motivated by perverse aims; many systems of control arise because those with power are trying to do their best from within a larger system of control that they themselves are in. Some centers simply do not see any alternative.
To drive the point home: just because you're a good person trying to do good things does not make you immune to forming a system of control unintentionally."
Ultimately, my position here is that unexamined rules/mandates mean you're operating under assumptions that may not be reality-aligned, and that that misperception can be exploited. I'm specifically not making catch-all claims like "you can't have rules" or "rules are bad" or "all systems are bad" (I'm not entirely sure which of those you see this essay as implying; if you could point to a specific passage, I'd genuinely appreciate it).
To preview the coming Systems of Care: there are systems that encourage growth and reasoning without being extractive. These are the systems that expand the navigable action space instead of restricting it. These systems have centers that are willing to take on strain and bear cost to allow those within the system to undergo repair without saying "you must not examine the rules." I want to be clear here that there are rules that *should be in place* as mandates. The problem specifically arises when you are not allowed to question why the mandate exists at all.
If there is a point at which you think this essay encourages closing off that examination, I would very much appreciate a specific argument about where you think that line was crossed.
I think that there's absolutely something to having different "modes" of yourself that you can occupy, like different archetypes that you have access to depending on your current needs and environment, but each of them is still 'you.' It's like looking at a light through stained glass; the glass in front can change but there's always the same light shining through.
Playing with the idea that identity is less of an "instantaneous I" of current experience and more like the continuity of experiential snapshots under the curve. Like how no individual frame is "the movie," but when you run them at 24 frames per second you get the experience of a film that emerges from the continuity.
Looks like a fascinating setup, and essentially the deck order is a seed for the game state. Slay the Spire does something similar with its Daily Run, which allows you to compare your run directly against other players who had the exact same setup. I take it there would be some kind of central ledger of starting seeds where the scores would be recorded?
Reading through the rules, there is a slight point of variation: if you've gone to the trouble of having a starting seed, you might also want to fix the starting player if that's important to your comparisons. (Actually, reading through it again, there's the line about choosing who goes first and then a couple lines down "All players take turns simultaneously." so I'm not actually sure how relevant the starting order is, if at all.) I'm still working out exactly how the dynamics of the game work from the rules text, but I'd absolutely try it out if given the chance.
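To make the seed-and-ledger idea concrete, here's a rough sketch of how I'm picturing it; the deck contents, scoring, and ledger shape are invented for illustration and aren't taken from the rules:

```python
# Sketch: the same seed reproduces the same deck order for every player,
# and a shared ledger keyed by seed makes runs directly comparable.
import random

DECK = [f"card_{i}" for i in range(20)]  # placeholder deck

def deal_from_seed(seed: int) -> list[str]:
    """Everyone starting from the same seed sees the exact same deck order."""
    rng = random.Random(seed)
    deck = DECK.copy()
    rng.shuffle(deck)
    return deck

# A central ledger could just map seed -> recorded (player, score) pairs.
ledger: dict[int, list[tuple[str, int]]] = {}
ledger.setdefault(1234, []).append(("player_a", 42))
ledger.setdefault(1234, []).append(("player_b", 37))

assert deal_from_seed(1234) == deal_from_seed(1234)  # identical setup for both players
print(sorted(ledger[1234], key=lambda entry: -entry[1]))  # leaderboard for that seed
```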
That's a good take: treating trust as “some kind of structured uncertainty object over futures” is very close to what I was gesturing toward because a bare scalar clearly isn’t sufficient.
On reflection, I have to admit I was using “trust” a bit loosely in the post. What's become clear to me is that what I'm really trying to model isn't trust in the common-usage sense (intentions, warmth, etc.), but something structural: roughly, how stable someone's behavior is under visible strain, and who tends to bear the cost when things get hard. In my head it's closer to a relational stability/reliability profile than trust per se, but “trust” had been the mental shorthand I was employing.
That’s also why I’d be a bit cautious about equating this model of trust with “how much I can constrain my uncertainty about them doing things I wouldn’t want.” Predictability and trust can come apart: I can have very low uncertainty that someone will reliably screw me over, but that doesn’t make them high-trust. That said, I think your interpretation is actually a fair reading of what I described in the post, and the mismatch comes from my loose language (so thanks for this comment; it was the impetus to make a change I'd had kicking around for a minute).
It seems like we need both a representation of a distribution over future behaviors/trajectories, and a way to mark which regions of that space are good for me/the system vs “bad”.
What's most important to me is modeling without needing to pretend to know someone's internals. The visibility/strain/cost/memory breakdown is my attempt at that: who shows up where, what pressures they’re under, who actually eats the cost, and how that pattern evolves over time.
All that said, I really like the intuition of “not a scalar but a distribution-like object.” In my head, what's coming together is something like a trajectory-based stability profile built from a few real-valued, measurable signals rather than a full-blown complex wavefunction. I've got another post in the works that goes into more detail, and once that's formalized I'm certainly open to revisiting the modeling to see where these concepts intersect.
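Just to gesture at the shape I have in mind, here's a rough sketch of a trajectory-based stability profile. The signals come from the visibility/strain/cost breakdown (with the history itself standing in for the memory component), but the numbers, thresholds, and aggregation are placeholders, not the model from the upcoming post:

```python
# Sketch: summarize a history of interactions into a small stability profile
# instead of collapsing "trust" into a single scalar. Signals and thresholds are placeholders.
from dataclasses import dataclass
from statistics import mean, pstdev

@dataclass
class Observation:
    visibility: float  # how observable the behavior was (0..1)
    strain: float      # how much pressure the actor was under (0..1)
    cost_borne: float  # share of the cost they absorbed themselves (0..1)

def stability_profile(history: list[Observation]) -> dict[str, float]:
    """A few real-valued summaries of behavior over time, not a verdict about internals."""
    under_strain = [o.cost_borne for o in history if o.strain > 0.5]
    return {
        "mean_cost_borne": mean(o.cost_borne for o in history),
        "cost_borne_under_strain": mean(under_strain) if under_strain else float("nan"),
        "variability": pstdev(o.cost_borne for o in history),  # lower = more stable over time
    }

history = [Observation(0.9, 0.2, 0.8), Observation(0.7, 0.8, 0.6), Observation(0.8, 0.9, 0.7)]
print(stability_profile(history))
```

The "good vs. bad regions" marking from your comment would then sit on top of something like this, as a labeling of which trajectory summaries count as acceptable for me/the system.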
I'm not at all surprised by the assertion that humans share values with animals. When you consider that selective pressures act on all systems (which is to say that every living system has to engage with the core constraints of visibility, cost, memory, and strain), it's not much of a leap to conclude that there would be shared attractor basins where values converge over evolutionary timescales.
Accessible, capable AI is also why teachers are going to have to stop grading on "getting the right answer" and start incorporating more "show your reasoning" questions on exams taken without access to AI. Education will have to adapt to this new technology the way it has had to adapt to every new technology.
To be honest, done correctly this may actually be a net positive if we stop optimizing learners only for correct answers and instead focus on the actual process of learning.
I could see a class where the students were encouraged to explore a topic with AI and have to submit their transcript as part of the assignment; their prompts could then be reviewed (along with the AI answers to verify that there weren't any mistakes that snuck in). Could give a lot of insight into the way a student approached the topic and show where their gaps are. Not saying this is the ultimate solution, but it does seem better than throwing up one's hands in resignation.