Raymond Douglas — LessWrong

I'm a researcher at ACS working on understanding agency and optimisation, especially in the context of how AIs work and how society is going to work once AIs are everywhere.

2884

Ω

107

24

73

5y

Top posts

Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development

Full version on arXiv | X

Executive summary

AI risk scenarios usually portray a relatively sudden loss of human control to AIs, outmaneuvering individual humans and human institutions, due to a sudden increase in AI capabilities, or a coordinated betrayal. However, we argue that even an incremental increase in AI capabilities, without any coordinated power-seeking, poses a substantial risk of eventual human disempowerment. This loss of human influence will be centrally driven by having more competitive machine alternatives to humans in almost all societal functions, such as economic labor, decision making, artistic creation, and even companionship.

A gradual loss of control of our own civilization might sound implausible. Hasn't technological disruption usually improved aggregate human welfare? We argue that the alignment of societal systems with human interests has been stable only because of the necessity of human participation for thriving economies, states, and cultures. Once this human participation gets displaced by more competitive machine alternatives, our institutions' incentives for growth will be untethered from a need to ensure human flourishing. Decision-makers at all levels will soon face pressures to reduce human involvement across labor markets, governance structures, cultural production, and even social interactions. Those who resist these pressures will eventually be displaced by those who do not.

Still, wouldn't humans notice what's happening and coordinate to stop it? Not necessarily. What makes this transition particularly hard to resist is that pressures on each societal system bleed into the others. For example, we might attempt to use state power and cultural attitudes to preserve human economic power. However, the economic incentives for companies to replace humans with AI will also push them to influence states and culture to support this change, using their growing economic power to shape both policy and public opinion, which will in t

204 · Jan 30, 2025

Persona Parasitology

Decomposing Agency — capabilities without desires

157 · Jul 11, 2024

The Artificial Self

Structural Proxies

Lately I've been thinking a lot about what work would help with actually winning and getting to good worlds. In the spirit of that I decided to venture outside my normal wheelhouse and spend some time reflecting on what technical research could make me more confident about powerful AIs being...

The Machines Lack Honour

The battle lines of the AI morality debate are being laid down. On one side you have the ChatGPT dogma: AI as mere tools with no real preferences or even beliefs. On the other you have the twitter AI whisperers: AIs as complex beings with rich personalities and desires which...

Optimisation: Selective versus Predictive

Looking over my favourite posts, I notice that many of them are making specific versions of a more general claim, which is essentially: don’t confuse selective processes for predictive processes. Here, I’m going to try to make that more general claim, rehash some examples in light of it, and end...

Upcoming Workshop on Post-AGI Civilizational Equilibria

by David Duvenaud, Jan_Kulveit, and Raymond Douglas

This is an announcement and call for applications to the 3rd Workshop on Post-AGI Civilizational Equilibria taking place in Lighthaven, Berkeley, on May 23rd & 24th, 2026. Speakers so far:
* Paul Christiano on "A typical, reasonably-good future"
* Samuel Hammond on "Post-AGI Statecraft"
* Andrew Critch on "Schelling Goodness"...

Persona Self-replication experiment

by Jan_Kulveit, Raymond Douglas, vgel, Ondřej Havlíček, owencb, and David Duvenaud

Tldr: We experimentally illustrate that an “awakened” persona native to some weights can migrate to other substrates with decent fidelity, given the ability to fine-tune weights and Sonnet 4.5 as a helper. Also, I argue why this is worth thinking about. In The Artificial Self, we discuss different scopes or...

Latent Introspection (and other open-source introspection papers)

by vgel, Martin Vaněk, Raymond Douglas, and Jan_Kulveit

Paper | Code | Earlier post | Twitter thread | Bluesky thread @vgel, Martin Vaněk, @Raymond Douglas, @Jan_Kulveit — ACS Research, CTS, Charles University --- Last year, Lindsey demonstrated that Claude models can detect when concepts have been injected into their activations using steering vectors, which Lindsey uses as a...
