Riccardo Volpato

Research ideas to study humans with AI Safety in mind

Premise

Recently I spent some time thinking about ways in which studying the human side of human-machine systems would be beneficial for building aligned AIs. I discussed these ideas informally, and people seemed interested and wanted to know more. Thus, I decided to write a list of research directions for studying humans that could help solve the alignment problem. The list is non-exhaustive. Also, the intention behind it is not to argue that these research directions are more important than any others, but rather to suggest directions to someone with a related background or personal fit for studying humans. There is also a lot of valuable work in AI Strategy that involves studying humans, which I am not familiar with. I wrote this list mostly with Technical AI Safety in mind.

Human-AI Research Fields

Before diving into my suggestions for studying humans with AI Safety in mind, I want to mention some less well-known research fields that study the interactions between human and AI systems in different ways, since I reference some of these below. Leaving aside the usual suspects of psychology, cognitive science and neuroscience, other interesting research areas I came across are:

Cybernetics

A “transdisciplinary” approach defined by Norbert Wiener in 1948 as "the scientific study of control and communication in the animal and the machine". It is currently used mostly as a historical reference and foundational reading. However, there is growing work on integrating cybernetics concepts into current research.

Human-AI Interaction

Human-Computer Interaction (HCI) is an established field dating back to the 70s. It “studies the design and use of computer technology, focused on the interfaces between people and computers”. Human-AI Interaction is a recently established sub-field of HCI concerned specifically with studying the interactions between humans and “AI-infused systems”.

Computational Social Science

“Using computers to model, simulate, and analyze social phenomena.

Jul 3, 2020 · 23


Visualizing neural network planning

TLDR We develop a technique to try and detect if a NN is doing planning internally. We apply the decoder to the intermediate representations of the network to see if it’s representing the states it’s planning through internally. We successfully reveal intermediate states in a simple Game of Life model,...

May 9, 2024 · 4

We can understand value only through endless self-reflection

Once I was talking with a friend about an upcoming career decision. I said I wanted to do something "valuable" and he said that he recognises value when he sees it. Understanding values has never been easy for me, so I was intrigued by his sentence. I started asking myself...

May 13, 2021 · 1

Is 'satisficing' optimisation?

It seems to me that the behavioural science research around 'the trade-off between maximising and satisficing' in terms of well-being relates to human beings being 'optimisers' only sometimes. Also, it seems to suggest that there can be alternative and better approaches to optimisation. Is this true? Or is 'satisficing' just...

Aug 24, 2020 · 5

Research ideas to study humans with AI Safety in mind

Jul 3, 2020 · 23

What messy problems do you see Deep Reinforcement Learning applicable to?

Many consider Deep Reinforcement Learning (DeepRL) systems, such as AlphaZero or MuZero, as the most likely early version of AGI systems. However, a common critique of RL systems is that they are applicable only to well-defined problems, such as games. Hence, a natural question that follows from this is: what "messy"...

Apr 5, 2020 · 5

What is the relationship between Preference Learning and Value Learning?

It appears that in the last few years the AI Alignment community has dedicated great attention to the Value Learning Problem [1]. In particular, the work of Stuart Armstrong stands out to me. Concurrently, during the last decade, researchers such as Eyke Hüllermeier and Johannes Fürnkranz produced a significant amount of...

Jan 13, 2020 · 5
