TLDR: We develop a technique to try to detect whether a NN is planning internally. We apply the decoder to the network's intermediate representations to see whether it internally represents the states it is planning through. We successfully reveal intermediate states in a simple Game of Life model,...
I was once talking with a friend about an upcoming career decision. I said I wanted to do something "valuable", and he said that he recognises value when he sees it. Understanding values has never been easy for me, so his remark intrigued me. I started asking myself...
It seems to me that the behavioural science research on 'the trade-off between maximising and satisficing' in terms of well-being relates to human beings being 'optimisers' only some of the time. It also seems to suggest that there can be alternative, and better, approaches to optimisation. Is this true? Or is 'satisficing' just...
Premise: Recently I spent some time thinking about ways in which studying the human side of human-machine systems could help us build aligned AIs. I discussed these ideas informally, and people seemed interested and wanted to know more. Thus, I decided to write a list of research directions for...
Many consider Deep Reinforcement Learning (DeepRL) systems, such as AlphaZero or MuZero, the most likely early version of AGI systems. However, a common critique of RL systems is that they are applicable only to well-defined problems, such as games. Hence, a natural question that follows from this is: what "messy"...
It appears that in the last few years the AI Alignment community has devoted considerable attention to the Value Learning Problem [1]. In particular, the work of Stuart Armstrong stands out to me. Concurrently, during the last decade, researchers such as Eyke Hüllermeier and Johannes Fürnkranz produced a significant amount of...