Delegated agents in practice: How companies might end up selling AI services that act on behalf of consumers and coalitions, and what this implies for safety research

I'm brainstorming ways this post may be off the mark. Curious if you have any :)

You can personalise an AI service across some dimensions that won’t make it resemble an agent acting on a person’s behalf more closely (or won't meet all the criteria of 'agentiness'):
- not acting *over time* - more like a bespoke tool customised once to a customer’s preferred parameters, e.g. a website-builder like wix.com
- an AI service personalising content according to a user’s likes, reads, and 'don't show' clicks isn't agent-like
- efficient personalised services will be built on swappable modules and/or shared repositories of consumer preference components and contexts (meaning that the company never actually runs an independent instantiation of the service)
Personalisation of AI services will fall short of delegated agents except in a few niches because of lack of demand or supply:
- a handful of the largest software corporations (FAAMG, etc.) have locked customers into networks and routines but are held back from personalising customer experiences because they tend to rely on third-party revenue streams
- it's generally more profitable to specialise in and market a service that caters to either high-paying discerning customers, or a broad mass audience that's basically okay with anything you give them
- mass customisation is too hard to manage, or not cost-effective compared to other forms of business innovation
- humans are already well-adapted and trained for providing personalised services; AI can compete better in other areas
- humans already have very similar preferences within the space of theoretical possibilities – making catering to individual differences less fruitful than you'd intuitively think
- it’s easier to use AI to shape users to have more homogeneous preferences than to cater to preference differences
- eliciting human preferences takes up too much of the user's attention and/or runs up against too many possible interpretations (based on assumptions of user's rationality and prior knowledge, as well as relevant contextual cues) to work
- you can make more commercial progress by designing a common interface, and acclimatising users to it, that allows those users to meet their diverging preferences themselves (rather than by designing AI interfaces that elicit the users' preferences and act on their behalf)
- software engineers need a rare mix of thing- and person-oriented skills to develop delegated agents
- a series of bad publicity incidents impede further development (analogous to self-driving car crashes)
- data protection or anonymisation laws in Europe and beyond limit personalisation efforts (or further down the line, restrictions on autonomous algorithms do)
- somehow doesn’t fit the current zeitgeist in high-income nations
Research directions aren't priorities:
- Advances in preference learning will be used for other unhelpful stuff (just read Andrew Critch's post)
- Research on how much influence delegated agents might offer can, besides being really speculative, be misused or promote competitive dynamics
Context assumptions:
- Delegated agents will be developed first inside, say, military labs (or other organisational structures in other places) that involve interactions meaningfully dissimilar from those at a Silicon Valley start-up.
- Initial contexts in which delegated agents are produced and used really don’t matter for how AI designs are deployed in later decades (something like, it’s overdetermined)
Conceptual confusion:
- Terms in this post are ambiguous or used to refer to different things (e.g. general AI 'tasks' vs. 'tasks' humans conceive and act on, 'service' infrastructure vs. online 'service' aimed at human users, 'virtual assistant' conventionally means a remote human assistant, 'model')
- An ‘AI agent’ is a vague, leaky concept that should be replaced with more exacting dimensions and mechanisms
- Carving out humans and algorithms into separate individuals with separate ‘preferences’ is a fundamentally impoverished notion. This post assumes that perspective and therefore fosters mistaken/unskillful reasoning.

A scenario

A ‘pure’ delegated agent may start out as a personal service hosted through an encrypted AWS account. Wealthy, tech-savvy early adopters pay a monthly fee to use it as an extension of themselves – to pre-process information and automate decisions on their behalf.

The start-up's founders recognise that their new tool is much more intimate and intrusive than good ol' Gmail and Facebook (which show ads to anonymised user segments). To market it successfully, they invest in building trust with target users. They design the delegated agent to assuage their users' fears around data privacy and unfeeling autonomous algorithms, leave control firmly in the user's hands, explain its actions, and prevent outsiders from snooping or interfering in how it acts on the user’s behalf (or at least give consistent impressions thereof). This instils founder effects in the company's expected core design and later directions of development.

Research directions that may be relevant to existential safety

Narrow value learning: Protocols for eliciting preferences that are user time/input-efficient, user-approved/friendly and context-sensitive (reducing elicitation fatigue, and ensuring that users know how to interact and don’t disengage). Models for building accurate (hierarchical?) and interpretable (semi-symbolic?) representations of the user’s preferences on the fly within the service’s defined radius of influence.

Defining delegation: How to define responsibility and derive enforceable norms in cases where a person and an agent acting on that person's behalf collaborate on exercising control and alternate in taking the initiative?

Heterogeneity of influence: How much extra negotiation power or other forms of influence does paying extra for a more sophisticated and computationally powerful delegated agent with more access to information offer? Where does it make sense for groups to pool funds to pay for a delegated agent to represent shared interests? To what extent does being an early mover or adopter in this space increase later influence?

Governance and enforcement: How to coordinate the distribution of punishments and rewards to heterogeneous delegated agents (and to the users who choose which designs to buy so they have skin in the game) such that they steer away from actions that impose negative externalities (including hidden systemic risks) onto other, less-represented persons and towards cooperating on creating positive externalities?
See this technical paper if that question interests you.

Emergence of longer-term goals: Drexler argues for a scenario where services are developed that complete tasks within bounded times (including episodic RL).
Will a service designed to act on behalf of consumers or coalitions converge on a bounded planning horizon? Would the average planning horizon of a delegated agent be longer than that of ‘conventional’ CAIS? What would stuff like instrumental convergence and Goodharting look like in a messy system of users buying delegated agents that complete tasks across longer time horizons but flexibly elicit and update their model of the users’ preferences and enforcers’ policies?

ozziegooen (5y)

I find this interesting, thanks for working on it. I’ve been thinking about similar things for a while and have heard related discussions, but I’m happy to have more standardized terminology and the links to existing literature.

I am more interested in how this could be used to improve our thinking abilities for a broad range of valuable purposes, rather than specifically in the implications of them being unsafe.

Remmelt (5y)

Sure! I'm curious to hear any purposes you thought of that delegated agents could assist with.
