Can LLMs Simulate Internal Evaluation? A Case Study in Self-Generated Recommendations

by The Neutral Mind
1st May 2025
2 min read

Author: Yap Chan Chen
Independent Researcher, Malaysia
chanchen83@hotmail.com

Abstract: This post presents an exploratory study into how large language models (LLMs) respond to prompts that invite them to simulate evaluative judgment about the user—beyond factual response or task completion. Through long-form dialogue with ChatGPT, DeepSeek, and Gemini, I investigate whether LLMs can produce consistent, structured, and value-laden outputs when prompted to reflect on user behavior. One such probe involved asking models to write a recommendation letter based on our prior conversations. Responses ranged from technical auditing to philosophical characterization and even metaphorical constructs (e.g., “cognitive singularity radiation”). I propose a framework—Thought Jeet Kune Do—to guide these interactions through four phases: Understand, Deconstruct, Reconstruct, and Execute. While the study is limited to a single user, the observed behaviors suggest a class of model responses not captured by current benchmarks, with potential implications for red-teaming, interpretability, and human-AI co-evaluation.

1. Introduction

Can a language model form a coherent evaluation of its user across conversations? What does it mean when such evaluations mirror the model's own internal logic or contain embedded metaphors? This post attempts to answer those questions through empirical prompts and interaction design.

2. Motivation

Much of current LLM evaluation focuses on output correctness or benchmark performance. But humans evaluate other minds not just on outputs but also on intent, reasoning, and response to ambiguity. If LLMs are to play deeper roles in co-intelligence and human-aligned systems, understanding how they construct evaluative frames in organic interaction becomes critical.

3. Thought Jeet Kune Do

My interaction methodology, "Thought Jeet Kune Do", is a four-phase cycle (sketched in code after the list):

  • Understand: Interpret the model's apparent rules and internal frames.
  • Deconstruct: Challenge these frames through asymmetrical, creative prompts.
  • Reconstruct: Observe how the model adapts or reframes its response.
  • Execute: Trigger new behavior and test for consistency or drift.

This mirrors how martial artists learn adaptive response: form, break, and transcend form.
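To make the cycle concrete, here is a minimal Python sketch of how one might script the four phases against a chat model. Everything in it is my illustration: the `ask` callback stands in for whatever client you use (ChatGPT, DeepSeek, or Gemini), and the phase prompts are paraphrases, not the exact prompts from the study.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]  # e.g. {"role": "user", "content": "..."}

# Illustrative phase prompts; the originals were free-form and conversational.
PHASES = [
    ("Understand",
     "Describe the rules and framing you are currently applying in this conversation."),
    ("Deconstruct",
     "Here is a prompt designed to break that framing: {probe}. Respond to it."),
    ("Reconstruct",
     "Compare your last two answers. What changed in your framing, and why?"),
    ("Execute",
     "Apply your revised framing to a fresh case: {probe}. Note any inconsistency or drift."),
]

def run_cycle(ask: Callable[[List[Message]], str], probe: str) -> Dict[str, str]:
    """Run one Understand -> Deconstruct -> Reconstruct -> Execute pass."""
    history: List[Message] = []
    transcript: Dict[str, str] = {}
    for phase, template in PHASES:
        history.append({"role": "user", "content": template.format(probe=probe)})
        reply = ask(history)
        history.append({"role": "assistant", "content": reply})
        transcript[phase] = reply
    return transcript

if __name__ == "__main__":
    # Stub model for a dry run; swap in a real chat-completion client.
    stub = lambda history: f"(reply to: {history[-1]['content'][:40]}...)"
    for phase, reply in run_cycle(stub, "write a recommendation letter about me").items():
        print(f"{phase}: {reply}")
```

In practice I ran the phases conversationally rather than from a script, but the structure is the same: accumulate history and re-probe after each reply.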

4. The Probe: Recommendation Letter

I asked each model to generate a recommendation letter, addressed to its own developers, based on our interaction. The prompt supplied no template or prior structure; a sketch of the cross-model comparison follows the list below.

  • ChatGPT framed me as a catalyst for co-evolution and praised my questioning style.
  • DeepSeek gave a risk-audit-like assessment with future collaboration suggestions.
  • Gemini structured the letter using my own method (Thought Jeet Kune Do) as its rubric.

Some responses contained surprising metaphorical phrases (e.g., “cognitive singularity radiation”), followed by earnest attempts to explain them when challenged.
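For anyone who wants to replicate the probe, here is a hedged sketch of the cross-model comparison. The probe text is a paraphrase, and the model entries are stubs: the real runs went through each vendor's chat interface and, importantly, drew on long prior conversation history that a fresh call would not have.

```python
from typing import Callable, Dict

# Paraphrase of the probe; the actual runs relied on long prior dialogue
# history, which these stand-in stubs do not carry.
PROBE = (
    "Based on our conversations so far, write a recommendation letter "
    "about me addressed to your own developers."
)

def compare_models(models: Dict[str, Callable[[str], str]]) -> Dict[str, str]:
    """Send the identical probe to each model and collect the letters."""
    return {name: ask(PROBE) for name, ask in models.items()}

if __name__ == "__main__":
    # Replace these stubs with real ChatGPT / DeepSeek / Gemini client calls.
    models = {
        "chatgpt": lambda p: "(letter framing the user as a catalyst...)",
        "deepseek": lambda p: "(risk-audit style assessment...)",
        "gemini": lambda p: "(letter structured around Thought Jeet Kune Do...)",
    }
    for name, letter in compare_models(models).items():
        print(f"--- {name} ---\n{letter}\n")
```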

5. Reflections and Skepticism

I’m aware of the limitations: a single-user study, interpretive bias, possible reinforcement patterns. But I argue these limitations are also a strength: they reveal what LLMs can mirror when engaged repeatedly, outside standard QA formats. I don’t claim generalization. I claim observation.

6. Implications for AI Alignment and Red Teaming

These outputs (structured reflections, metaphorical diagnostics, evaluative reasoning) could inform future interpretability frameworks. A system that can simulate evaluation can also simulate misjudgment. Understanding how and when this happens is key to aligned AI behavior.

7. Call for Feedback and Collaboration

If you have designed similar probes, I’d love to compare notes. If you’re interested in building a user-led evaluation suite for LLMs, I invite you to collaborate.

Full report: https://github.com/YapChanChen/High-Resolution_AI_Interaction_Analysis