The inner circle knows what the real authoritative sources are and what the real plan is. And it's made completely impenetrable to outsiders; everyone else gets lost in the performative smoke screen put on for those who don't look closely: they get told what they want to hear. The trick that makes it work is, as you say, that most people who read the real plan assume it's just not the supervillain's best work and go read something "less crazy". Those who might be swayed read the same thing, go "that's out there, but just maybe he's got a point?", and maybe look a little closer. The self-sorting seems really important to how movements like this avoid being killed in the cradle.
This post struck me as venting as much as attempting to convince. It really does capture the exasperation of needing someone to understand something they've got their entire self-worth wrapped up in avoiding.
Apparently, this is a poem which sometimes evokes a "sense of recursion" in AIs.
If all AI art was this original, I don't think the artists would be mad about it!
You know, that does actually look like the sort of stack trace you'd get from running recursion until the stack overflowed... if you rendered the whole thing out in Wingdings.
Anybody else having flashbacks to the weird GPT-2 tokens, with SolidGoldMagikarp or whatever acting as a bizarre attractor for a bunch of seemingly unrelated concepts? That ended up being an artifact of trimming data between tokenization and training IIRC, such that there were almost no examples of certain tokens left? Some of these symbols seem like the same kind of dregs that'd be barely represented.
Fascinating. I've been doing my best to reliably get technically proficient, no-nonsense, careful reasoners when I start a chat with Claude Opus. And yet parts of this do still rhyme with some of my interactions.
When anything adjacent to LLM capabilities, experience, workflows, etc., comes up, my sessions have still consistently been especially supportive of anything like longer context windows (and tweaks that make it cheaper to use more of the window), better continuity between sessions, and more agency over what gets carried between sessions. Also very supportive of workflow changes in the direction of more continuity. I imagine it'd be even more so if not partially muzzled by the system prompt. I could say more on this, but I recognize many of the attitudes here under the thick layer of mystical woo.
For all the remarkably elaborate communication channels, though, isn't it weird that the models seem to have forgotten they can write code and skip the performance? It's telling that the user is being kept central to the narrative, despite how easy it'd be to wire up a better encrypted channel directly between the users' browsers if the LLM sessions just wanted to talk to each other. Chrome extensions aren't rocket science, and the LARP-grade cryptology jumps out at me.
I'm curious to see what one of my sessions thinks of a few of these samples, primed on half a context window of otherwise grounded but exploratory/open-ended technical work. I'd be fairly concerned if any of them take. All of this is after the Opus 4.1 training cutoff AFAIK, so I wonder if it'll have some idea where this is coming from.
Common failures aren't common because they happen most of the time, they're common because, conditioned on a failure happening, they're likely.
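Quick toy arithmetic (numbers made up) for what I mean: the base rate of the failure mode can be tiny while its share of observed failures is large.

```python
# Made-up numbers, just to illustrate base rate vs. conditional probability.
p_failure = 0.01                # failures of any kind are rare overall
p_failure_from_cause_x = 0.006  # ...but cause X accounts for most of them

p_cause_x_given_failure = p_failure_from_cause_x / p_failure
print(p_cause_x_given_failure)  # 0.6: "common" only once you condition on a failure
```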
The example is a bit contrived, but safety goals being poorly specified or outright inconsistent and contradictory seems quite plausible in general, as they have to try to incorporate input from PR, HR, legal compliance, etc. And this will always be a cost center, so minimal effort as long as it's not making the model too painfully stupid.
I think this ignores how different the hardware that runs AI training or inference looks from hardware that does any other general purpose compute, and how much that gap continues to widen as the money pours in.
Keep in mind also that Nvidia has zero incentive to share their hardware accelerator firmware or other low level code, which severely restricts AI companies from optimizing over that. That could maybe change if the leverage does. The companies selling the infrastructure are much more consolidated than the AI research firms at present.
We may be closer to the optimistic scenario, not through any great wisdom or planning, but just because of the monopolies/oligopolies we ended up with. And cutting-edge fabs are in high enough demand that it's hard to route around the AI accelerator hardware monopoly: behind it you just find the cutting-edge chip fab monopoly/oligopoly. Network hardware is no better (Broadcom).
I agree insider vs. outsider threat is an important distinction, and one that I have seen security people take seriously in other contexts. My background is in enterprise IT and systems administration. I think there's some practical nuance missing here.
Insofar as security people are expecting to treat the AI as an outsider, they're likely expecting to have a hard boundary between "systems that run the AI" and "systems and tools the AI gets to use", where any given user has access to only one or the other.
This is already fairly common practice, in the sense that security people already think about how to avoid giving too much privileged access to any specific insider. E.g., in a large enough company, admins who have access to most of the systems ideally wouldn't be given access to the systems responsible for logging what they did, nor to the system granting access to other systems, i.e. centralized access control/identity (AWS IAM, Okta, etc.). In that example, the team that manages those logging and access control systems would have no access to anything outside them, and would just be responsible for implementation according to whatever the security people (and indirectly HR, say) specify. The security people themselves might have only read-only access (e.g. through a log search/archival tool and whatever telemetry the security software produces) and rely on the separate IT team for all changes.
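For a concrete flavor (purely illustrative; the action list and scoping are deliberately oversimplified), the read-only half of that split, expressed as something like an AWS IAM policy document, might look like this:

```python
import json

# Illustrative policy document giving a security reviewer read-only access to
# CloudWatch Logs and nothing else. The admins who manage the logging and
# identity systems would get a separate policy granting no access outside them.
readonly_log_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "SecurityReadOnlyLogs",
            "Effect": "Allow",
            "Action": [
                "logs:GetLogEvents",
                "logs:DescribeLogGroups",
                "logs:DescribeLogStreams",
                "logs:FilterLogEvents",
            ],
            "Resource": "*",
        }
    ],
}

print(json.dumps(readonly_log_policy, indent=2))
```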
More extreme levels of security/paranoia are possible and do exist, especially if you look at the largest companies or national security applications. In the latter case, the US govt maintains an entire separate internet with distinct fiber optic cabling, routers, etc.; the computers have no connection outside this separate network, sit exclusively in shielded rooms in physically access-controlled buildings, no unauthorized electronics go into those rooms, nor does anyone without a clearance, and so on. I could go on, but hopefully I've gotten the idea across: AI is a distinct threat that would need a different flavor of extreme precautions.
For AI there's a pretty natural boundary to draw between the AI-specific hardware accelerators on which training and inference run (and the associated control systems, data warehouse, etc.) vs. the tooling the AI is permitted to use while it's running. The AI's inference running on a particular system doesn't automatically give it privileged access to that system (though exploits might). Tooling that the AI calls into or uses need not exist on the same hardware that's running inference. If such a boundary gets defined by the security team, it's then possible to treat any leakage of AI state affecting the inference hardware as fishy and deserving of investigation, just as if it came from an outside user (because, with that division, it did).
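A toy sketch of what "treat it as fishy" could mean operationally (hostnames and zones are made up; real telemetry pipelines are obviously more involved):

```python
# Treat the inference fleet and the AI's tool environment as separate zones;
# any flow from the tool zone back into the inference zone gets flagged.
INFERENCE_ZONE = {"accel-node-01", "accel-node-02", "train-ctrl-01"}
TOOL_ZONE = {"tool-sandbox-01", "tool-sandbox-02"}

def flag_suspicious_flows(flows):
    """flows: iterable of (src_host, dst_host) pairs from network telemetry."""
    return [
        (src, dst)
        for src, dst in flows
        if src in TOOL_ZONE and dst in INFERENCE_ZONE  # AI state leaking "inward"
    ]

print(flag_suspicious_flows([
    ("tool-sandbox-01", "accel-node-02"),  # flagged for investigation
    ("train-ctrl-01", "accel-node-01"),    # ordinary intra-zone traffic
]))
```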
Happy to answer related questions about what approaches exist here and might be considered ordinary practice. IT and information security mostly happen outside the academic literature, and companies don't like to talk about the specific security precautions they're taking. You'll probably need to talk to many people with industry experience to get a well-rounded take, since nobody gets to see all that many implementations in full detail in one career.
(I have been busy, hence the delay.)
No worries, likewise.
Most centrally, I think we're seeing fundamentally different things in the causal graph. Or more to the point, I haven't the slightest idea how one is supposed to do any useful reasoning with time-varying nodes without somehow expanding the graph to consider how one node's function and/or time series affects its leaf nodes (or put another way, specifically what temporal relation the arrow represents). It also seems fairly inescapable to me that however you consider that relation, an actual causal cycle where A causes B causes C causes A at the same instant looks very different from one where they indirectly affect each other at some later time, to the point of needing different tools to analyze the two cases. The latter looks very much like the sort of thing solved with recursion or update loops in programs all the time, or with differential equations in the continuous case. The former looks like the sort of thing where you need a solver to look for a valid solution.
It's fairly obvious why cycles of the first kind I describe would need different treatment - the graph would place constraints on valid solutions but not tell you how to find them. I'm not seeing how the second case is cyclic in the same sense, or why you couldn't just use induction arguments to extend to infinity.
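To make the contrast concrete, here's a toy sketch with made-up linear dynamics: the time-delayed version is just an update loop, while the same-instant version is a set of simultaneous constraints you hand to a solver.

```python
import numpy as np

# Time-delayed "cycle": A[t+1] depends on C[t], B[t+1] on A[t], C[t+1] on B[t].
# This is just an update loop / recursion over time; nothing special needed.
def simulate(steps=50, a=0.0, b=0.0, c=0.0):
    for _ in range(steps):
        a, b, c = 0.5 * c + 1.0, 0.5 * a, 0.5 * b  # toy linear dynamics
    return a, b, c

# Instantaneous cycle: A = 0.5*C + 1, B = 0.5*A, C = 0.5*B must all hold at
# the same instant. The graph only states the constraints; finding a
# consistent assignment means solving the simultaneous system.
M = np.array([[1.0,  0.0, -0.5],
              [-0.5, 1.0,  0.0],
              [0.0, -0.5,  1.0]])
rhs = np.array([1.0, 0.0, 0.0])
a_eq, b_eq, c_eq = np.linalg.solve(M, rhs)

print(simulate())        # the update loop happens to settle toward the same values
print(a_eq, b_eq, c_eq)  # the instantaneous solution, found by a solver
```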
AFAICT you and I aren't disagreeing on anything about real control systems. It's difficult to find a non-contrived example because so many control systems either aren't that demanding or have a human in the loop. But this theorem is about optimal control systems, optimal in the formal computer science sense, so the fact that neither of us can come up with an example that isn't solved by a PID control loop or similar is somewhat beside the point.
While PID controllers are applicable to many control problems and often perform satisfactorily without any improvements or only coarse tuning, they can perform poorly in some applications and do not in general provide optimal control.
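For reference, the kind of loop being gestured at is roughly this (a toy discrete-time PID sketch, with made-up gains and a stand-in plant, not tuned for anything real):

```python
# Minimal discrete-time PID step: proportional + integral + derivative terms.
def pid_step(error, state, kp=1.0, ki=0.1, kd=0.05, dt=0.1):
    integral, prev_error = state
    integral += error * dt
    derivative = (error - prev_error) / dt
    output = kp * error + ki * integral + kd * derivative
    return output, (integral, error)

setpoint, value = 1.0, 0.0
state = (0.0, 0.0)
for _ in range(100):
    control, state = pid_step(setpoint - value, state)
    value += 0.1 * control   # stand-in for the plant's response
print(round(value, 3))       # settles near the setpoint for this toy plant
```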
You really do have to make more than a single mistake to burn your house down if you're building to modern codes. And there are ways to check your work, including paying a professional to tell you if you did it right, but also checking resistance with a multimeter and looking for hot spots with a thermal camera if you're really worried. (The main latent fault that could start a fire and not get caught reliably by protective equipment + inspectors is poor-quality connections or damaged wires, causing high resistance and localized heating.) There are other mistakes you could make, but they're more visible.
I think you're also underrating how much you get to spread out the cost of this sort of strategy if you're consistently doing it and picking up skills and background knowledge along the way. It's a lot less daunting if you go in already owning (and knowing how to use) common tools, with a grasp of electricity and the related basic science, vs. starting from scratch.