Limits to Control Workshop

by Orpheus, Remmelt Ellen, T-bo🔸
18th May 2025
Tags: Limits to Control, AI Control, External Events, AI · Personal Blog

This is a linkpost for limitstocontrol.org

Intro

Can we keep enough control over AI? As systems are developed to be more and more autonomous, this is no longer a given; it is a hypothesis calling for serious investigation. Even alignment relies on control: researchers build mechanisms to keep AI's impacts in line with human values.

So how must a control mechanism operate? What limits its capacity to track and correct all of an AI's signals and effects? Does it provide enough stability, or does it eventually give way to runaway impacts?
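
As a toy illustration of the stability question (a minimal sketch of our own, with made-up parameters, not material from any workshop session): a controller with bounded corrective capacity can hold a drifting system near a setpoint only while the drift stays within that capacity; once the system changes faster than the controller can correct, the error compounds step after step.

```python
# Minimal toy model (illustrative only): a controller that can apply at most
# `max_correction` per step tries to keep a drifting state near a setpoint.
# While the drift stays within the correction capacity, the error stays near
# zero; once the drift exceeds it, the error grows every step -- a cartoon of
# a control mechanism giving way to runaway impacts.

def final_error(drift_per_step: float, max_correction: float, steps: int = 50) -> float:
    """Simulate `steps` rounds of drift plus bounded correction; return |error|."""
    state, setpoint = 0.0, 0.0
    for _ in range(steps):
        state += drift_per_step                                    # uncontrolled change
        error = state - setpoint
        state -= max(-max_correction, min(max_correction, error))  # bounded correction
    return abs(state - setpoint)

print(final_error(drift_per_step=0.5, max_correction=1.0))  # ~0.0: still controllable
print(final_error(drift_per_step=2.0, max_correction=1.0))  # ~50.0: runaway
```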

We explore these questions in a new field: Limits to Control.

About the workshop

This in-person workshop is meant to facilitate deep collaboration. We're bringing together researchers to map the territory of AI control limitations – to understand the dynamics, patterns, and impossibilities of control. We are thrilled to welcome Roman Yampolskiy, Anders Sandberg, Forrest Landry, and other researchers working on control limitations.

This workshop aims to:

  • Build common knowledge among researchers on the limits to AI control.
  • Facilitate high-fidelity discussions through whiteboard sessions and collaborative problem-solving.
  • Identify and clarify viable directions in control research by establishing boundaries on controllability.
  • Formalize and elevate this critical research topic.

Dates & location

  • Dates: Wednesday, June 11 – Friday, June 13, 2025
  • Location: University of Louisville, Kentucky, USA.

Detailed logistical information will be provided to confirmed participants.

Agenda

We aim to strike a balance between structured sharing and messy exploration – we believe this is where the best ideas tend to emerge. Over the three days, the program will include:

  • Talks: Researchers present their current work, keeping it under an hour.
  • Discussions: We break out into groups to discuss specific questions and bounce ideas around.
  • Regrouping: We come back into one room to synthesize what came out of the discussions.
  • Next steps: On the final day, we'll plan further development of this research agenda.

Sessions

Sessions will include:

  • Anders Sandberg's talk on theoretical limits to control: "Do any of them actually tell us anything?"
  • Forrest Landry's whiteboard chat on an overlooked dynamic: "Virtual machines in recursive feedback"
  • Richard Everheart's logical framework for AI alignment, aiming to refine foundational understanding.
  • Thibaud Veron's session on a framework and engineered toy models that illustrate control dynamics.
  • Will Petillo's session on better communication and narrative framings for AI safety/control concepts.

(More details on specific talks and activities will be added as confirmed.)

Proceedings, post-workshop outputs

This section will be updated after the workshop with summaries, key insights, and any public materials generated.

Potential outputs may include:

  • Summary report of discussions, key agreements, and open questions
  • Notes or photos from sessions
  • Links to research papers, blog posts, or pre-prints influenced by the workshop
  • A refined list of open research questions in the "limits to control" domain
  • Presentations or slide decks (if speakers consent to public sharing)

Join us

This workshop is for researchers actively working on or deeply interested in the theoretical and practical limits of AI control. Do you wish to contribute to these focused discussions? Email Orpheus at o@horizonomega.org to express your interest.

Costs & funding: Participants are generally expected to cover their own travel and accommodation. We can offer reimbursement only to some participants whose research is not yet funded. The workshop has a grant offer from the Survival and Flourishing Fund.

To prepare: read work by the participants you are curious to chat with, so that we go in with some shared understanding. Most of our time will be spent in collaborative discussion, so consider which specific problems or concepts you could bring in.

Suggested reading

Writings by participating researchers:

Papers:

  • On the Controllability of Artificial Intelligence: An Analysis of Limitations, by Roman Yampolskiy
  • AI: Unexplainable, Unpredictable, Uncontrollable, by Roman Yampolskiy
  • Impossibility Results in AI, by Roman Yampolskiy
  • [Forthcoming paper on control limits] by Anders Sandberg, Aybars Kocoglu, Thibaud Veron
  • [Forthcoming paper on a logical framework] by Richard Everheart

Essays:

  • Lenses of Control, by Will Petillo
  • The Control Problem: Unsolved or Unsolvable?, by Remmelt Ellen
  • Control as a Causative Feedback Process, by Forrest Landry
  • An Exploration of AGI Uncontainability, by Forrest Landry
  • On Error Detection, by Forrest Landry

The readings above are sorted roughly by ease of reading. If you want to clarify an argument, do reach out to the authors; they value questions!

Organizing team

This event is hosted by HΩ and organized by:

  • Orpheus Lummis (HΩ)
  • Remmelt Ellen
  • Thibaud Veron

Contact

For inquiries regarding the workshop, please contact Orpheus at o@horizonomega.org.

Comments

RogerDearnaley:

I think a key element here is (the distribution of) whether the AI is motivated (i.e. acts as if motivated) to want to be controlled by us, or not — i.e. something like corrigibility or value learning, and whether we can arrange that. Is the workshop expected to cover that question?

T-bo🔸 (organizer):

Thank you for your feedback. The workshop sets aside a lot of time for free-form discussion, so the motivation of the AI to be controlled might well come up there. Under the current proposals for talks, the focus is more on the environment in which the agent evolves than on the agent itself. However, discussions about the "level of trust" or "level of cooperation" needed to actually keep control are absolutely within the theme.

On a more personal level, unless I have very strong reasons to believe in an agent's honesty, I would not feel safe in a situation where my control depends on the agent's cooperation. As such, I would like to understand what control looks like in different situations before surrendering any control capabilities to the agent.

Whether or not we decide to put an agent in a situation that we can't keep under control if the agent doesn't wish to be controlled is an interesting topic, but it is not on the agenda for now. If you'd like to participate and run a session on that topic, you're more than welcome to apply!

Mentioned in

  • Deconfusing ‘AI’ and ‘evolution’