I'm not an AI expert, but I might have found a missing puzzle piece.

by StevenNuyts
6th Jun 2025
3 min read


Posted by a curious autodidact who thought arguing on the internet was just a weird hobby… until it started exposing flaws in artificial intelligence.


---

I’m not an AI expert. I’m not a logician. Until last week, I had never studied logic in any structured way. My qualifications? I’ve spent a decade voluntarily arguing about divisive topics on the internet — mostly with strangers, mostly in places where intellectual honesty goes to die.

But I’ve always valued clear reasoning and hated fallacies, and I did my utmost to practice the former and avoid the latter. Over time, you develop a sort of sixth sense for BS. I didn’t just get good at spotting bad arguments — I started naming the fallacies, collecting them like trading cards. Call it masochism, but it sharpened something.

So last week I wondered — what if I tried sparring with an AI? Not just using it for phrasing help, but setting up full-on debates. I asked it to pick a topic, assign me a position, and then argue the other side — with intellectual honesty as the ground rule.

To my surprise, I won. Not because I’m brilliant, but because the AI committed a classic fallacy… as its central argument. Even more surprising, it later flagged that same fallacy when it thought I was using it. So it knew, but still used it. That was weird.

I tried again. New AI, new prompt, with detailed instructions to avoid fallacies. Same result. More sophisticated prompt, same issue.

So I asked: why can't AIs actually use the Socratic method? It's rule-based, structured — shouldn’t that be right up their alley?

That led to a list of seven hurdles. One stood out: how would an AI know what it doesn’t know? It stood out not just because it seemed so foundational, but even more so because the AI itself described it as virtually unexplored as a coherent concept.

Boom. That hit something. If an AI can’t reliably track the boundaries of its own knowledge, how can it reason responsibly? How can it avoid confidently misleading people? And if this really hasn't been formally explored, shouldn't it be?

From there, the idea of epistemic memory started forming — a kind of architectural add-on that would allow AI systems to track, store, and reference the limits and confidence of their own knowledge. Not just what they "know," but what they can't yet claim to know. All done by asking the AI about its own limitations. The irony of it is hilarious.
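
To make that less abstract, here is a minimal sketch of what I have in mind. It's purely illustrative Python; the names (EpistemicRecord, EpistemicMemory, claim_threshold) are inventions for this example, not references to any existing system. The point is just that each piece of "knowledge" carries its own confidence and its own list of known gaps, so the boundary of what the system can't yet claim becomes something you can query directly.

```python
from dataclasses import dataclass, field


@dataclass
class EpistemicRecord:
    """One entry in the hypothetical memory: a claim plus how it is held."""
    claim: str
    confidence: float                                    # 0.0 (pure guess) to 1.0 (well supported)
    sources: list[str] = field(default_factory=list)     # where the claim comes from
    known_gaps: list[str] = field(default_factory=list)  # what would be needed to firm it up


class EpistemicMemory:
    """Toy store that tracks both what a system claims and what it can't yet claim."""

    def __init__(self, claim_threshold: float = 0.7):
        self.claim_threshold = claim_threshold
        self.records: list[EpistemicRecord] = []

    def add(self, record: EpistemicRecord) -> None:
        self.records.append(record)

    def claimable(self) -> list[EpistemicRecord]:
        # Things the system is willing to assert.
        return [r for r in self.records if r.confidence >= self.claim_threshold]

    def boundary(self) -> list[EpistemicRecord]:
        # The shape of its own ignorance: held too weakly to assert.
        return [r for r in self.records if r.confidence < self.claim_threshold]


memory = EpistemicMemory()
memory.add(EpistemicRecord(
    claim="The Socratic method is rule-based enough to automate",
    confidence=0.4,
    known_gaps=["no account of how the model tracks what it doesn't know"],
))

for record in memory.boundary():
    print(f"Can't yet claim: {record.claim!r} (confidence {record.confidence})")
```

A real version would need to populate and update these records from the model itself, which is exactly the part that seems unexplored; the sketch only shows what the bookkeeping might look like.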


---

So what?

A few implications occurred to me:

Improved reliability in high-stakes fields (law, medicine, geopolitics), where confidently wrong AI is not an option.

Internal epistemic audit trails, allowing the AI to explain why it concluded something — or why it can't (a rough sketch follows after this list).

Cleaner handling of edge cases, where the training data runs thin.

And possibly — if developed far enough — a path to simulated reasoning that humans can't easily distinguish from our own. Or perhaps even genuine reasoning, if coupled with a self-improvement goal — creating an intelligence feedback loop.
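
On the audit-trail point above: here is a rough, self-contained sketch of what "explain why it concluded something, or why it can't" could look like as a data record. Again, the Evidence type, the answer_with_audit function, and the 0.7 threshold are hypothetical choices of mine for illustration, not a description of any real system.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    statement: str
    confidence: float  # 0.0 to 1.0, how strongly the system holds this


def answer_with_audit(question: str, evidence: list[Evidence],
                      min_confidence: float = 0.7) -> dict:
    """Answer only when some evidence clears the bar, and always return
    the trail that explains the decision either way."""
    strong = [e for e in evidence if e.confidence >= min_confidence]
    weak = [e for e in evidence if e.confidence < min_confidence]

    if not strong:
        return {
            "question": question,
            "answer": None,  # decline rather than guess
            "why_not": [f"{e.statement!r} held at only {e.confidence:.2f}" for e in weak],
        }
    return {
        "question": question,
        # Placeholder "answer": just the strong statements stitched together.
        "answer": "; ".join(e.statement for e in strong),
        "supported_by": [(e.statement, e.confidence) for e in strong],
        "open_gaps": [e.statement for e in weak],
    }


trail = answer_with_audit(
    "Is this drug interaction well established?",
    [Evidence("Interaction reported in two small studies", 0.5)],
)
print(trail["answer"])   # None: the system declines to claim
print(trail["why_not"])  # and says exactly why
```

The design choice I care about is that declining to answer produces the same kind of structured trail as answering, so a human can inspect either one.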

This isn’t just a technical patch. It might be a foundational shift — a way to scaffold reasoning into AI systems not by brute force, but by tracking the shape of their own ignorance.


---

I’m sharing this here because I suspect this community might be uniquely equipped to:

Evaluate whether this is already explored under another name

Push back if this is obvious, flawed, or missing context

Or just refine it into something sharper


Either way, I'm here to learn. And if anyone wants to help develop this into something more rigorous — let’s talk.


---

A quick and respectful ask

If you find this idea valuable or decide to build on it in any serious way — especially in research, writing, or product development — I’d really appreciate being credited for the initial concept. I’m new to the field, and attribution would mean a lot in helping me grow and be taken seriously.

Feel free to message me here or reach out if you want to collaborate or just chat more.


---

Thanks for reading. Be honest, be brutal, be kind — in that order. 😄