Remmelt Ellen


Delegated agents in practice: How companies might end up selling AI services that act on behalf of consumers and coalitions, and what this implies for safety research

I'm brainstorming ways this post may be off the mark. Curious if you have any :)

  • You can personalise an AI service along some dimensions without making it more closely resemble an agent acting on a person’s behalf (or without it meeting all criteria of 'agentiness')
    • not acting *over time* - more like a bespoke tool customised once to a customer’s preferred parameters, e.g. a website-builder
    • an AI service personalising content according to a user’s likes/reads/don't show clicks isn't agent-like
    • efficient personalised services will be built on swappable modules and/or shared repositories of consumer preference components and contexts (meaning that the company never actually runs an independent instantiation of the service)
  • Personalisation of AI services will fall short of delegated agents, except in a few niches, because of a lack of demand or supply
    • a handful of the largest software corporations (FAAMG, etc.) have locked customers into their networks and routines, but are held back from personalising customer experiences because they tend to rely on third-party revenue streams
    • it's generally more profitable to specialise in and market a service that caters to either high-paying discerning customers, or a broad mass audience that's basically okay with anything you give them
    • too hard to manage mass customisation or not cost-effective compared to other forms of business innovation
    • humans are already well-adapted and trained for providing personalised services; AI can compete better in other areas
    • humans already have very similar preferences within the space of theoretical possibilities – making catering to individual differences less fruitful than you'd intuitively think
    • it’s easier to use AI to shape users to have more homogenous preferences than to cater to preference differences
    • eliciting human preferences takes up too much of the user's attention, and/or admits too many possible interpretations (given assumptions about the user's rationality and prior knowledge, as well as relevant contextual cues) to work
    • you can make more commercial progress by designing a common interface and acclimatising users to it, so those users can meet their diverging preferences themselves (than by designing AI interfaces that elicit users' preferences and act on their behalf)
    • software engineers need a rare mix of thing- and person-oriented skills to develop delegated agents
    • a series of bad-publicity incidents impedes further development (analogous to self-driving car crashes)
    • data protection or anonymisation laws in Europe and beyond limit personalisation efforts (or further down the line, restrictions on autonomous algorithms do)
    • doesn’t fit current zeitgeist somehow in high-income nations
  • Research directions aren't priorities
    • Advances in preference learning will be used for other unhelpful stuff (just read Andrew Critch's post)
    • Research on how much influence delegated agents might offer can, besides being really speculative, be misused or promote competitive dynamics
  • Context assumptions:
    • Delegated agents will be developed first inside, say, military labs (or other organisational structures in other places) that involve meaningfully different interactions from those at a Silicon Valley start-up.
    • Initial contexts in which delegated agents are produced and used really don’t matter for how AI designs are deployed in later decades (something like, it’s overdetermined)
  • Conceptual confusion:
    • Terms in this post are ambiguous or used to refer to different things (e.g. general AI 'tasks' vs. 'tasks' humans conceive and act on, 'service' infrastructure vs. online 'service' aimed at human users, 'virtual assistant' conventionally means a remote human assistant, 'model')
    • An ‘AI agent’ is a vague, leaky concept that should be replaced with more exacting dimensions and mechanisms
    • Carving up humans and algorithms into separate individuals with separate ‘preferences’ is a fundamentally impoverished notion. This post assumes that perspective and therefore fosters mistaken/unskillful reasoning.
The Values-to-Actions Decision Chain

Ah, I have the first diagram in your article as one of my desktop backgrounds. :-) It was a fascinating demonstration of how experiences can be built up into more complex frameworks (even though I feel I only half-understand it). It was one of several articles that inspired and moulded my thinking in this post.

I'd value having a half-an-hour Skype chat with you some time. If you're up for it, feel free to schedule one here.

The Values-to-Actions Decision Chain

So, I do find it fascinating to analyse how multi-layered networks of agents interact and how those interactions can be improved to better reach shared goals. My impression is also that it’s hard to make progress in this area (otherwise several simple coordination problems would already have been solved), and I lack expertise in network science, complexity science, multi-agent systems and microeconomics. I haven’t set out a clear direction, but I do find your idea of making this into a larger project inspiring.

I’ll probably work on gathering more empirical data over time to overhaul any conclusions I came to in this article, and to gain a more fine-grained understanding of how people interact in the EA community. When I happen to make some creative connections between concepts again, I’ll start writing those up. :-)

I think I’ll also write a case study in the coming months that examines one possible implication of this model (e.g. local group engagement) in a more detailed, balanced way (for the strategic implications I wrote about in this post, I leant towards being concise and activating people to think about them, rather than dispassionately examining a bunch of separate data sources).

The Values-to-Actions Decision Chain

Thanks for mentioning this!

Let me think about your question for a while. Will come back on it later.

AI Safety Research Camp - Project Proposal

Thanks for mentioning it.

If later you happen to see a blind spot or a failure mode we should work on covering, we'd like to learn about it!

AI Safety Research Camp - Project Proposal

Do you mean for the Gran Canaria camp?

We're also working towards a camp 2.0 in late July in the UK. I assume that's during summer break for you.

"Taking AI Risk Seriously" (thoughts by Critch)

Great, let me throw together a reply to your questions in reverse order. I've had a long day and lack the energy to do the rigorous, concise write-up that I'd want to do. But please comment with specific questions/criticisms that I can look into later.

What is the thought process behind their approach?

RAISE (copy-paste from slightly-promotional-looking wiki):

AI safety is a small field. It has only about 50 researchers. The field is mostly talent-constrained. Given the dangers of an uncontrolled intelligence explosion, increasing the number of AI safety researchers is crucial for the long-term survival of humanity.

Within the LW community there are plenty of talented people that bear a sense of urgency about AI. They are willing to switch careers to doing research, but they are unable to get there. This is understandable: the path up to research-level understanding is lonely, arduous, long, and uncertain. It is like a pilgrimage. One has to study concepts from the papers in which they first appeared. This is not easy. Such papers are undistilled. Unless one is lucky, there is no one to provide guidance and answer questions. Then should one come out on top, there is no guarantee that the quality of their work will be sufficient for a paycheck or a useful contribution.

The field of AI safety is in an innovator phase. Innovators are highly risk-tolerant and have a large amount of agency, which allows them to survive in an environment with little guidance or supporting infrastructure. Let community organisers not fall for the typical mind fallacy, expecting risk-averse people to move into AI safety all by themselves. Unless one is particularly risk-tolerant or has a perfect safety net, one will not be able to fully take the plunge. Plenty of measures can be taken to make getting into AI safety more like an "It's a small world" ride:

  • Let there be a tested path with signposts along the way to make progress clear and measurable.
  • Let there be social reinforcement so that we are not hindered but helped by our instinct for conformity.
  • Let there be high-quality explanations of the material to speed up and ease the learning process, so that it is cheap.

AI Safety Camp (copy-paste from our proposal, which will be posted on LW soon):

Aim: Efficiently launch aspiring AI safety and strategy researchers into concrete productivity by creating an ‘on-ramp’ for future researchers.


  1. Get people started on and immersed into concrete research work intended to lead to papers for publication.
  2. Address the bottleneck in AI safety/strategy of few experts being available to train or organize aspiring researchers by efficiently using expert time.
  3. Create a clear path from ‘interested/concerned’ to ‘active researcher’.
  4. Test a new method for bootstrapping talent-constrained research fields.

Method: Run an online research group culminating in a two week intensive in-person research camp.

(Our plan is to test our approach in Gran Canaria on 12 April, for which we're taking applications right now, and then, based on our refinements, to organise a July camp at the planned EA Hotel in the UK.)

What material do these groups cover?

RAISE (off the top of my head)

The study group has finished writing video scripts on the first corrigibility unit for the online course. It has now split into two to work on the second unit:

  1. group A is learning about reinforcement learning using this book
  2. group B is writing video scripts on inverse reinforcement learning

Robert Miles is also starting to make the first video of the first corrigibility unit (we've allowed ourselves to get delayed too much in actually publishing and testing material, IMO). Past videos we've experimented with include a lecture by Johannes Treutin from FRI and lectures on corrigibility by Rupert McCallum.

AI Safety Camp (copy-paste from proposal)

Participants will work in groups on tightly-defined research projects on the following topics:

  • Agent foundations
  • Machine learning safety
  • Policy & strategy
  • Human values

Projects will be proposed by participants prior to the start of the program. Expert advisors from AI safety/strategy organisations will help refine them into proposals that are tractable, suitable for this research environment, and that address currently unsolved research questions. This allows for time-efficient use of advisors’ domain knowledge and research experience, and ensures that the research is well-aligned with current priorities.

Participants will then split into groups to work on these research questions in online collaborative groups over a period of several months. This period will culminate in a two week in-person research camp aimed at turning this exploratory research into first drafts of publishable research papers. This will also allow for cross-disciplinary conversations and community building. Following the two week camp, advisors will give feedback on manuscripts, guiding first drafts towards completion and advising on next steps for researchers.

Who's running them and what's their background?

Our two core teams mostly consist of young European researchers/autodidacts who haven't published much on AI safety yet (which does risk us not knowing enough about the outcomes we're trying to design for others).

RAISE (off the top of my head):

Toon Alfrink (founder, coordinator): AI bachelor student, also organises LessWrong meetups in Amsterdam.

Robert Miles (video maker): Runs a relatively well-known YouTube channel advocating carefully for AI safety.

Veerle de Goederen (oversees the prerequisites study group): Finished a Biology bachelor's (and has been our most reliable team member).

Johannes Heidecke (oversees the advanced study group): Master's student, researching inverse reinforcement learning in Spain.

Remmelt Ellen (planning coordinator): see below.

AI Safety Camp (copy-paste from proposal)

Remmelt Ellen Remmelt is the Operations Manager of Effective Altruism Netherlands, where he coordinates national events, supports organisers of new meetups and takes care of mundane admin work. He also oversees planning for the team at RAISE, an online AI Safety course. He is a Bachelor intern at the Intelligent & Autonomous Systems research group. In his spare time, he’s exploring how to improve the interactions within multi-layered networks of agents to reach shared goals – especially approaches to collaboration within the EA community and the representation of persons and interest groups by negotiation agents in sub-exponential takeoff scenarios.

Tom McGrath Tom is a maths PhD student in the Systems and Signals group at Imperial College, where he works on statistical models of animal behaviour and physical models of inference. He will be interning at the Future of Humanity Institute from Jan 2018, working with Owain Evans. His previous organisational experience includes co-running Imperial’s Maths Helpdesk and running a postgraduate deep learning study group.

Linda Linsefors Linda has a PhD in theoretical physics, which she obtained at Université Grenoble Alpes for work on loop quantum gravity. Since then she has studied AI and AI Safety online for about a year. Linda is currently working at Integrated Science Lab in Umeå, Sweden, developing tools for analysing information flow in networks. She hopes to be able to work full time on AI Safety in the near future.

Nandi Schoots Nandi did a research master in pure mathematics and a minor in psychology at Leiden University. Her master was focused on algebraic geometry and her thesis was in category theory. Since graduating she has been steering her career in the direction of AI safety. She is currently employed as a data scientist in the Netherlands. In parallel to her work she is part of a study group on AI safety and involved with the reinforcement learning section of RAISE.

David Kristoffersson David has a background as R&D Project Manager at Ericsson where he led a project of 30 experienced software engineers developing many-core software development tools. He liaised with five internal stakeholder organisations, worked out strategy, made high-level technical decisions and coordinated a disparate set of subprojects spread over seven cities on two different continents. He has a further background as a Software Engineer and has a BS in Computer Engineering. In the past year, he has contracted for the Future of Humanity Institute, and has explored research projects in ML and AI strategy with FHI researchers.

Chris Pasek After graduating from mathematics and theoretical computer science, Chris ended up touring the world in search of meaning and self-improvement, and finally settled on working as a freelance researcher focused on AI alignment. Currently also running a rationalist shared housing project on the tropical island of Gran Canaria and continuing to look for ways to gradually self-modify in the direction of a superhuman FDT-consequentialist.

Mistake: I now realise that not mentioning that I'm involved with both projects may resemble a conflict of interest – I had removed 'projects I'm involved with' from my earlier comment before posting it, to keep it concise.

"Taking AI Risk Seriously" (thoughts by Critch)

If you're committed to studying AI safety but have little money, here are two projects you can join (do feel free to add other suggestions):

1) If you want to join a beginners or advanced study group on reinforcement learning, post here in the RAISE group.

2) If you want to write research in a group, apply for the AI Safety Camp in Gran Canaria on 12-22 April.