LESSWRONG

Сергій Михайлович

4mo

A Report on Multi-LLM Adversarial Alignment: The "Terminal Constitution" Model

This report details the results of a structured multi-model adversarial simulation involving Grok 4.1, GPT-4o, DeepSeek, and Gemini 1.5 Pro. The goal was to design an alignment framework that remains robust against Semantic Drift—the tendency of an Artificial Superintelligence (ASI) to redefine linguistic safety constraints (e.g., "harm" or "consent") to...