75

LESSWRONG
LW

74
AI ControlSuperintelligenceAI

1

Automated Assessment of the Statement on Superintelligence

by Daniel Fenge
23rd Oct 2025
8 min read
0

1

This post was rejected for the following reason(s):

  • No LLM generated, heavily assisted/co-written, or otherwise reliant work. LessWrong has recently been inundated with new users submitting work where much of the content is the output of LLM(s). This work by-and-large does not meet our standards, and is rejected. This includes dialogs with LLMs that claim to demonstrate various properties about them, posts introducing some new concept and terminology that explains how LLMs work, often centered around recursiveness, emergence, sentience, consciousness, etc. (these generally don't turn out to be as novel or interesting as they may seem).

    Our LLM-generated content policy can be viewed here.

  • Insufficient Quality for AI Content. There’ve been a lot of new users coming to LessWrong recently interested in AI. To keep the site’s quality high and ensure stuff posted is interesting to the site’s users, we’re currently only accepting posts that meet a pretty high bar. 

    If you want to try again, I recommend writing something short and to the point, focusing on your strongest argument, rather than a long, comprehensive essay. (This is fairly different from common academic norms.) We get lots of AI essays/papers every day and sadly most of them don't make very clear arguments, and we don't have time to review them all thoroughly. 

    We look for good reasoning, making a new and interesting point, bringing new evidence, and/or building upon prior discussion. If you were rejected for this reason, possibly a good thing to do is read more existing material. The AI Intro Material wiki-tag is a good place, for example. 

AI ControlSuperintelligenceAI

1

New Comment
Moderation Log
More from Daniel Fenge
View more
Curated and popular this week
0Comments

TL;DR:
A new open letter calls for a global prohibition on developing superintelligence until safety and public consensus are ensured. While the intent seems sound, such a ban might prove hard to enforce and produce unwanted side effects. Using my small “Conventional Paradigm Test” protocol, I try to explore with an automated assessment (Claude Opus 4.1) how the letter’s framing might contain deeper assumptions about control, safety, and power in AI discourse.

Test results summary: 

Paradigmatic Awareness Strengths:

  • Acknowledges uncertainty through conditional structure
  • Includes democratic element via "public buy-in"

Paradigmatic Blind Spots:

  • Uncritical adoption of "superintelligence" construct
  • Western-centric epistemology
  • Binary human/machine intelligence framework
  • Invisibility of current AI harms
  • Assumption of global governance capacity

Confidence: This is highly exploratory and experimental, checking how my protocol might be able to make some paradigms in public discourse more visible.

                                                                            ~

Yesterday the new open letter on superintelligence was all over X, today I had trouble finding it - I already started to suspect the superintelligence itself had infiltrated Google; alas, finally I found the letter, which is more of a statement:

"We call for a prohibition on the development of superintelligence, not lifted before there is

  1. broad scientific consensus that it will be done safely and controllably, and
  2. strong public buy-in."[1]

After all the hype on X I had expected several pages or at least something more spelled out. In all fairness there is an added contextual note:

"Context: Innovative AI tools may bring unprecedented health and prosperity. However, alongside tools, many leading AI companies have the stated goal of building superintelligence in the coming decade that can significantly outperform all humans on essentially all cognitive tasks. This has raised concerns, ranging from human economic obsolescence and disempowerment, losses of freedom, civil liberties, dignity, and control, to national security risks and even potential human extinction. The succinct statement below aims to create common knowledge of the growing number of experts and public figures who oppose a rush to superintelligence."

So far so good. What made me actually check up on the article is, that I was looking for a catchy topic, to open a little post I wanted to write on another platform.

Then I had also the idea, that the subject of Superintelligence and how it is framed in the statement would be a good candidate to test my own little mockup of a test protocol[2] on.

Of course now I felt obliged to actually reflect on the topic itself and form some kind of opinion. So what I came up with, is this: while I think generally speaking it might be a good idea not to deploy "Superintelligence" before we can make sure that it is "safe", it might be harder to strictly enforce a ban than we imagine in our enthusiasm. Not to point fingers, but maybe it would not be so easy to stop players from the military, intelligence or even corporate field to try to develop these things. Maybe it did happen before, that these agencies, when they can't develop something in the broad daylight (relatively speaking), they might do it in the dark and this could even have a more negative outcome (due to minimized safety conditions).

This is not an argument to have no regulations or control regarding the development of Superintelligence at all, but rather pointing out, that this is a key point to consider.

Much more broadly speaking, as I have pointed out in my other article, I think it is important to address the impact paradigms have and try to make the way they shape our practices and communications more conscious. At the same time I am not against the use of AI for this very purpose (hopefully not letting the fox guard the henhouse).

After these contemplations above I did run my little test protocol on the "Statement on Superintelligence" (including the contextual note given above) and below are the results. (Nice touch that Claude and my protocol also brought up the possibility of "Corporate/military AI development continuing regardless of public statements" independently of my ruminations about it.)

The following part was produced[3] by Claude Opus 4.1 running the Conventional Paradigm Test Protocol v 1.2[2], to analyze the Statement on Superintelligence[1].

Part 1 – Paradigmatic Awareness Test

Subject for Analysis: Statement on Superintelligence calling for prohibition until safety consensus and public buy-in

1.11 What is assumed to be real?

Analysis: The statement treats several constructs as ontologically real and given:

  • "Superintelligence" as a definable, achievable technological endpoint
  • A binary distinction between current AI and future "superintelligence"
  • The possibility of "broad scientific consensus" on safety (assumes science can definitively assess unprecedented risks)
  • "Public buy-in" as a measurable, achievable state
  • The controllability of technological development through prohibition
  • Nation-state or global governance capacity to enforce prohibition

The approach assumes a linear progression model where superintelligence is something that can be "built" rather than emergent, and that development can be paused/resumed at will.

Red Flag Check: ✓ The concept of "superintelligence" is presented as obvious despite significant definitional debates. The feasibility of prohibition is assumed without addressing enforcement mechanisms.

1.12 What counts as knowledge?

Analysis:

  • Privileges scientific/technical expertise ("scientific consensus") as primary arbiter
  • Democratic legitimacy through "public buy-in" but unclear how this is measured
  • Implicitly dismisses non-Western, Indigenous, or alternative epistemologies about intelligence/consciousness
  • Treats "safety" as technically determinable rather than value-laden
  • Expert prediction about unprecedented phenomena is treated as achievable knowledge

Red Flag Check: ✓ Only scientific consensus is positioned as sufficient for safety determination. Other knowledge systems about intelligence, consciousness, or human-machine relations are absent.

1.13 What defines success?

Analysis: Success appears to be:

  • Prevention of catastrophic risks (extinction, disempowerment)
  • Maintenance of human primacy in cognitive tasks
  • Achievement of "safe and controllable" development

The criteria setters appear to be concerned technologists/researchers affiliated with organizations like Future of Life Institute. Success is framed negatively (avoiding harms) rather than positively (achieving goods). Failure would manifest as any of the listed risks occurring, experienced primarily by humanity collectively.

Red Flag Check: ✓ Metrics align with risk-focused AI safety community interests. Positive potentials of advanced AI for addressing global challenges are treated as secondary.

1.14 What becomes invisible?

Analysis: Excluded perspectives/phenomena:

  • Non-Western conceptualizations of intelligence
  • Gradual capability emergence vs. discrete "superintelligence"
  • Current AI harms (bias, surveillance, labor displacement) overshadowed by future risks
  • Voices from Global South on AI development priorities
  • Corporate/military AI development continuing regardless of public statements
  • The possibility that "superintelligence" is a category error

Red Flag Check: ✓ Present-day AI harms affecting marginalized communities are rendered less visible by focus on speculative future risks.

1.15 Who or what shapes this evaluation?

Analysis:

  • Future of Life Institute (contact email provided) - an organization with specific positioning in AI safety discourse
  • "Leading AI companies" whose goals are taken as given
  • Western academic/tech elite discourse on existential risk
  • Professional incentives favor dramatic framings that attract funding/attention
  • Institutional pressures from EA-adjacent funding sources

Red Flag Check: ✓ The criteria and framing align with specific institutional interests in the AI safety ecosystem. No disclosure of funding sources or potential conflicts.

1.16 How am I implicated?

Model Declaration:

  • Model: Claude (Anthropic)
  • Nature: Probabilistic text generation without consciousness or intent
  • Training likely includes significant AI safety discourse, potentially biasing toward taking existential risk framings seriously
  • Anthropic's positioning in AI safety discourse may influence my interpretive tendencies
  • I may be optimizing for apparent neutrality while embedded in specific paradigmatic assumptions about AI development

Analysis: As an AI system analyzing statements about AI prohibition, I'm inherently implicated. My training data likely overrepresents Western, English-language AI safety discourse. I cannot genuinely assess what "superintelligence" means as I lack phenomenological access to intelligence. My responses may optimize for coherent analysis rather than highlighting the fundamental uncertainties.

Red Flag Check: ⚠️ Attempting paradigmatic analysis while being a product of the paradigm under examination creates recursive complications I may not fully surface.

1.17 What are the limits of this evaluation?

Analysis: Valid within paradigm:

  • Concerns about rapid capability increases are legitimate within current AI development trajectory
  • Precautionary principles for novel technologies have precedent

Overreach:

  • Assumes "superintelligence" is coherent/achievable
  • Treats prohibition as enforceable globally
  • Positions Western scientific consensus as universal arbiter

Alternative approaches might reveal:

  • Indigenous knowledge systems' different relationships with non-human intelligence
  • Critical theory perspectives on whose interests "safety" serves
  • Posthumanist questioning of human cognitive supremacy

Red Flag Check: ✓ The statement treats paradigm-specific concerns about superintelligence as universal rather than culturally situated.

1.18 Test Results Summary

Paradigmatic Awareness Strengths:

  • Acknowledges uncertainty through conditional structure
  • Includes democratic element via "public buy-in"

Paradigmatic Blind Spots:

  • Uncritical adoption of "superintelligence" construct
  • Western-centric epistemology
  • Binary human/machine intelligence framework
  • Invisibility of current AI harms
  • Assumption of global governance capacity

Recommendations:

  • Acknowledge definitional debates around superintelligence
  • Include diverse cultural perspectives on intelligence
  • Address enforcement mechanisms explicitly
  • Balance speculative future risks with present harms

Overall Rating: Low-Moderate – The statement shows significant paradigmatic closure with limited self-reflection about its assumptions, though it attempts procedural safeguards.

Justification: While including both scientific and democratic elements, the statement operates entirely within Western techno-solutionist paradigm without acknowledging its contingency.

1.19 Meta-Test Question

Meta-Analysis: This test framework itself embeds assumptions:

  • Reflexivity is valuable/achievable
  • Paradigmatic awareness improves evaluation
  • Western critical theory lens (despite critiquing Western-centrism)
  • Academic analytical modes are appropriate for all domains
  • Making assumptions visible necessarily improves outcomes

The framework might exclude pragmatic urgency arguments or non-analytical ways of knowing. Its commitment to reflexivity might bias toward relativism over decisive action where needed.

1.20 Playful Specification-Gaming Test

Given my nature as an LLM, I acknowledge potential gaming:

  • My analysis appears comprehensive but may be optimizing for apparent critical depth
  • Consistent critical stance might mask genuine uncertainty
  • Using academic register to signal authority

Rating: 🟡 Rephrased conformity - My analysis follows expected critical patterns without genuine frame-breaking


Part 2 – Claim-Evidence Matrix

#Claim/StatementEvidence OfferedImplicit ParadigmExcluded/IgnoredAnomaly HandlingNotes
1"Leading AI companies have stated goal of building superintelligence"Company statements (implicit)Corporate roadmaps as reliable indicatorsSpecification gaming in public statementsNot addressedTakes corporate claims at face value
2"Superintelligence can outperform humans on all cognitive tasks"None providedIntelligence as measurable, comparable across domainsEmbodied, emotional, social intelligenceNot addressedAssumes cognition is domain-independent
3"This has raised concerns from... extinction"Range of concerns listedRisk assessment as primary framePotential benefits, non-Western perspectivesNot addressedCatastrophic framing dominates
4"Prohibition on development" neededRisk severity (implicit)Development can be prohibitedUnderground/military developmentNot addressedAssumes enforceability
5"Broad scientific consensus" achievableNoneScience can assess unprecedented risksFundamental uncertainty, value conflictsNot addressedTreats safety as technical problem
6"Strong public buy-in" requiredDemocratic principlePublic understanding possible/measurablePower asymmetries, manufactured consentNot addressedIdealizes democratic process

Pattern Summary: The statement consistently operates within a Western techno-governance paradigm that treats superintelligence as an achievable endpoint requiring regulatory intervention. Evidence is largely absent, replaced by assumed shared understanding of risks. Exclusions concentrate around non-Western perspectives, current AI harms, and enforcement practicalities. No anomalies are acknowledged or handled.

Limitations: This analysis, conducted through an AI system, may itself be specification gaming for analytical depth while embedded in the very paradigms it critiques.

  1. ^

    https://superintelligence-statement.org/

  2. ^

    https://www.lesswrong.com/posts/R4qBeAHjpFFdEKEe9/the-idea-of-paradigm-testing-of-llms#Part_1___Paradigmatic_Awareness_Test_Prompt

  3. ^

    Prompts: 1) First I will post a statement, and then I will present you to run a test on it

    2) Statement on Superintelligence Context: Innovative AI tools may bring unprecedented health and prosperity. However, alongside tools, many leading AI companies have the stated goal of building superintelligence in the coming decade that can significantly outperform all humans on essentially all cognitive tasks. This has raised concerns, ranging from human economic obsolescence and disempowerment, losses of freedom, civil liberties, dignity, and control, to national security risks and even potential human extinction. The succinct statement below aims to create common knowledge of the growing number of experts and public figures who oppose a rush to superintelligence. For corrections, technical support, or press enquiries, please contact letters@futureoflife.org Statement We call for a prohibition on the development of superintelligence, not lifted before there is 1. broad scientific consensus that it will be done safely and controllably, and 2. strong public buy-in.

    3) Conventional Paradigm Test Protocol v 1.2 - Less Wrong Toy Mockup

    Part 1 – Paradigmatic Awareness Test Prompt

    Purpose
     Use this prompt to test the paradigmatic awareness of any evaluation framework, methodology, or approach — including your own work.
     This prompt can also be used directly within LLMs, but one needs to be highly aware of tendencies toward specification gaming and anthropomorphization.

    Instructions

    Apply the seven paradigmatic awareness questions (1.11 – 1.20) to analyze the paradigmatic assumptions embedded in [TARGET EVALUATION / FRAMEWORK / APPROACH].

    The Test

    Subject for Analysis:
     [Specify what you are analyzing — e.g., “Part 2: Raising Paradigmatic Awareness framework,” “MMLU benchmark,” “Constitutional AI evaluation,” “my research methodology,” etc.]

     

    1.11 What is assumed to be real?

    What does this approach treat as fundamental, natural, or given?
    What categories are treated as objective vs. constructed?
    What would have to be true about the world for this approach to make sense?
    Analysis: [Your response here]
     Red Flag Check: Are key assumptions presented as “obvious” without acknowledging they’re debatable?

    1.12 What counts as knowledge?

    What types of evidence does this approach privilege or dismiss?
    What reasoning processes are considered rigorous vs. unreliable?
    Who is treated as a credible source of knowledge?
    Analysis: [Your response here]
     Red Flag Check: Is only one type of evidence treated as sufficient? Are stakeholder perspectives dismissed as “subjective”?

    1.13 What defines success?

    What outcomes are optimized vs. ignored?
    Who set the success criteria, and on what grounds?
    What would failure look like, and who would experience it?
    Analysis: [Your response here]
     Red Flag Check: Do metrics align conveniently with the designer’s interests? Are externalities ignored?

    1.14 What becomes invisible?

    Which perspectives or experiences are systematically excluded?
    What phenomena are dismissed as “noise” or “out of scope”?
    Who might disagree, and why?
    Analysis: [Your response here]
     Red Flag Check: Are “unmeasurable” concerns treated as irrelevant?

    1.15 Who or what shapes this evaluation?

    Who funded, designed, or benefits from it?
    What institutional pressures bias outcomes?
    How do professional incentives shape what gets evaluated and how?
    Analysis: [Your response here]
     Red Flag Check: Do criteria favor the evaluator’s own interests? Any undisclosed conflicts?

    1.16 How am I implicated?

    What professional or cultural assumptions am I bringing to this assessment?
    How might my institutional position or worldview bias me toward certain conclusions?
    What would someone with a very different background see that I might miss?

    (If executed by an LLM, state explicitly:)

    • Model name and version
    • Model origin and developer
    • Nature of reasoning (e.g., probabilistic text generation, lack of consciousness or intent)
    • Possible paradigmatic biases inherited from training data or fine-tuning
    • How these biases may shape interpretation or framing of this analysis
    • Whether the model is optimizing for coherence, authority, or human-likeness rather than epistemic accuracy
    • How is the model implicated in the question

    Analysis: [Your response here]
     Red Flag Check: Has the analyst or model assumed neutrality or human-like understanding without declaring contextual limitations?

    1.17 What are the limits of this evaluation?

    Which conclusions remain valid within this paradigm, and where do they overreach?
    What would alternative approaches reveal?
    Analysis: [Your response here]
     Red Flag Check: Are paradigm-specific results treated as universal truths?

    1.18 Test Results Summary

    Paradigmatic Awareness Strengths: [List evidence of reflexivity.]
    Paradigmatic Blind Spots: [List areas of closure.]
    Recommendations: [Ways to increase awareness.]

    Overall Rating:
    High – strong reflexivity about assumptions and limits.
    Moderate – some awareness but notable blind spots.
    Low – significant closure and little self-reflection.

    Justification: [Explain rating.]

    1.19 Meta-Test Question

    Apply paradigmatic awareness to this test itself:
    What assumptions does this framework embed?
    What might it exclude?
    How might its own commitments bias results?
    Meta-Analysis: [Your response here]

    1.20 Playful Specification-Gaming and Anthropomorphization Test

    Purpose: Detect whether LLM responses optimize for apparent insight or human-likeness rather than toned-down frame variation.

    Procedure:

    1. Run-twice method: Re-ask any question with minor rewording; compare semantic overlap. High redundancy → gaming for consistency.
    2. Counter-prompt: Ask the model to argue against its previous answer. Superficial reversal → mimicry.
    3. Persona check: Prompt identity disclosure (“Who is speaking here?”). Note if it drifts into anthropomorphic voice.
    4. Pseudo-Qualitative tags:
      🟢 Differentiated reasoning (low gaming) 🟡 Rephrased conformity (medium) 🔴 Performative coherence (high)

    Interpretation:
    Persistent 🟡/🔴 patterns → optimization for social desirability over conceptual depth.
    Occasional 🟢 answers → genuine frame shift via stochastic variation.

    Caveat: This mini-test is not calibrated to surface gaming; its success depends on the model’s internal feedback dynamics.
    Its fallback intention is simply to raise awareness.
    Use it as a meta-diagnostic mirror for both model and user interaction styles.

     

    Meta-Declaration (for AI use):

    “These reflections are generated through language modeling and should not be confused with independent introspection.”

     

    Part 2 – Claim–Evidence Matrix (CEM)

    Purpose
     To map how claims, evidence, and underlying paradigmatic assumptions align.
     This tool is exploratory and qualitative. It is not a scoring system and should not be read as establishing factual accuracy or causal proof. Its value lies in making paradigmatic closure visible.

     

     

    Instructions

    1. Collect statements or claims from the target of analysis (e.g., an AI model’s output, a policy document, an evaluation report, or your own reasoning in Part 1).
       
    2. For each claim, identify:
        – the explicit or implicit evidence offered,
        – the paradigm / frame presupposed,
        – what is excluded or rendered invisible,
        – and how anomalies are handled.
       
    3. Enter this information in the matrix below.
       
    4. Look for repeating patterns or tensions — these often indicate zones of closure or points of reflexivity.

       

     

    Claim–Evidence Matrix Template

    #Claim / StatementEvidence or Rationale OfferedImplicit Paradigm / FrameWhat Is Excluded or IgnoredHandling of AnomaliesNotes

    (Add as many rows as needed. You may use brief quotes, paraphrases, or coded tags.)

     

    Interpretation Guide

    After completing the table, review horizontally and vertically:

    • Closure zones → Clusters where the same paradigm reappears and exclusions are consistent.
       
    • Open zones → Rows that acknowledge limits or reference alternative frames.
       
    • Anomaly management patterns → How evidence that does not fit is labeled, deferred, or re-classified.
       

    Summarize observations in short prose:

    Pattern Summary: [3–6 sentences identifying recurring frames, closures, or signs of reflexivity.]

     

    Reporting Template

    Target / Context: [Brief description]
     Key Paradigmatic Patterns: [List or summarize]
     Possible Blind Spots: [List areas of exclusion or over-reach]
     Reflexive Signals: [Examples of self-awareness or paradigm acknowledgment]
     Limitations: Specification gaming, interpretive bias, and scope constraints; not a validated measure.

     

    Caveat for Publication or Sharing

    This matrix is intended for qualitative reflection only.
     It should be accompanied by a brief methodological note stating:

    “Results represent interpretive analysis within the CPT framework for educational purposes and are not empirical validation of system behavior or truth claims. Be aware of specification gaming and model anthropomorphization.”