One year ago, we nearly died.
This is maybe an overdramatic statement, but long story short, nearly all of us underwent carbon monoxide (CO) poisoning[1]. The benefit is, we all suddenly got back in touch with a failure mode we had forgotten about, and we decided to make it a yearly celebration.
Usually, when we think about failure, we might think about not being productive enough, or not solving the right work-related problem, or missing a meeting. We might suspect that our schedule could be better organized or that one of our habits really sucks. We might fear not to spot an obvious psychological flaw or a decision-making issue.
We often forget that the single most important failure prior to all of these is dying. Yet even if we think...
Can we keep enough control over AI? If systems are developed to be more and more autonomous, this is no longer a given. It's a hypothesis, calling for serious investigation. Even alignment relies on control. Researchers build mechanisms to control AI's impacts in line with human values.
So how must a control mechanism operate? What limits its capacity to track and correct all the AI signals/effects? Does it provide enough stability, or give way eventually to runaway impacts?
We explore these questions in a new field: Limits to Control.
This in-person workshop is meant to facilitate deep collaboration. We're bringing together researchers to map the territory of AI control limitations – to understand the dynamics, patterns, and impossibilities of control. We are thrilled to welcome Roman Yampolskiy, Anders Sandberg,...
Thank you for your feedback. The workshop arranges a lot of time for free discussions, so the motivation of the AI to be controlled might pop up there. Under the current proposals for talks, the focus is more on the environment in which the agent evolves than the agent itself. However, discussions about the "level of trust" or "level of cooperation" needed to actually keep control is absolutely in the theme.
On a more personal level, unless I have very strong reasons to believe in an agent's honesty, I would not feel safe in a situation where my control dep... (read more)