Riccardo Volpato — LessWrong

Visualizing neural network planning

by Nevan Wichers, Victor Tao, fbarez, and Riccardo Volpato

TLDR: We develop a technique to try to detect whether a NN is planning internally. We apply the decoder to the network's intermediate representations to see whether it represents the states it is planning through. We successfully reveal intermediate states in a simple Game of Life model,...

May 9, 2024

We can understand value only through endless self-reflection

Once I was talking with a friend about an upcoming career decision. I said I wanted to do something "valuable" and he said that he recognises value when he sees it. Understanding values has never been easy for me, so I was intrigued by his sentence. I started asking myself...

May 13, 2021

Is 'satisficing' optimisation?

It seems to me that the behavioural science research around 'the trade-off between maximising and satisficing' in terms of well-being relates to human beings being 'optimisers' only sometimes. It also seems to suggest that there can be alternative and better approaches to optimisation. Is this true? Or is 'satisficing' just...

Aug 24, 2020

Research ideas to study humans with AI Safety in mind

Premise: Recently I spent some time thinking about ways in which studying the human side of human-machine systems would be beneficial for building aligned AIs. I discussed these ideas informally and people seemed interested and wanted to know more. Thus, I decided to write a list of research directions for...

Jul 3, 2020

What messy problems do you see Deep Reinforcement Learning applicable to?

Many consider Deep Reinforcement Learning (DeepRL) systems, such as AlphaZero or MuZero, to be the most likely early version of AGI systems. However, a common critique of RL systems is that they are applicable only to well-defined problems, such as games. Hence, a natural question that follows from this is: what "messy"...

Apr 5, 2020

What is the relationship between Preference Learning and Value Learning?

It appears that in the last few years the AI Alignment community has dedicated great attention to the Value Learning Problem [1]. In particular, the work of Stuart Armstrong stands out to me. Concurrently, during the last decade, researchers such as Eyke Hüllermeier and Johannes Fürnkranz produced a significant amount of...

Jan 13, 2020