tamas.bartha — LessWrong

-10

2

2mo

tamas.bartha

-10

2mo

The Architectural Gap: Why Capability and Refusal Share Source

A structural account of agency, with implications for alignment

The current alignment program implicitly treats capability-installation and refusal-prevention as separable problems: install the capability, prevent the refusal-class behaviors via training, monitoring, or RLHF. I want to argue that this is structurally unavailable in the regime where systems satisfy the conditions...

Agent Ontology: A Constraint-Based Approach

This document (original source: https://github.com/tamasbartha/AgentOntology) outlines an agent ontology and the requirements that follow from it for agents that are efficient at survival. A key innovation of this ontology and its deduction is the inversion of the core postulate of Karl Friston’s Free Energy Principle (FEP). While the FEP suggests that agents act to minimize the...