LESSWRONG
LW

Joseph Emerson — LessWrong

Small point on this reference:

"While some proponents of AIF believe that it is a more principled rival to Reinforcement Learning (RL), it has been shown that AIF is formally equivalent to the control-as-inference formulation of RL.^[8]"

I believe the paper cited here says that AIF is formally equivalent to control-as-inference only in its likelihood-AIF variant, i.e. when the value is moved into a biased likelihood and made equivalent to the control-as-inference optimality variable. The paper otherwise shows that AIF and control-as-inference are not identical, and that this arises from differences in how value is encoded in each. In AIF, value is encoded in the prior preferences of the agent over observations, whereas in control-as-inference, value has a separate representation from the veridical generative model.

The authors may have meant to explain that AIF in the specific case of the likelihood variant is formally equivalent to control-as-inference, in which case they should state that clearly.

Do Deep Neural Networks Have Brain-like Representations?: A Summary of Disagreements

Joseph Emerson

Acknowledgments: Thanks to SPAR summer 2024 and the Promisingness of Automating AI Alignment group, particularly the supervisor for this project, Bogdan-Ionut Cirstea, for his feedback.

This post began as a write up of my review of neuro-AI literature for the Promisingness of Automating AI Alignment. After returning to the document to make some updates, I decided it’s worth posting despite the absence of strong conclusions.

TL;DR

Do artificial neural networks form representations similar to those used by biological brains? If so, it could be important to consider implications for AI alignment, such as understanding how brain-AI alignment impacts the difficulty of ontology identification in AI systems. I looked at the empirical case for representation alignment... (read 7675 more words →)

Replying toNeurotechnology is Critical for AI Alignment

Joseph Emerson2y

Neurotechnology is Critical for AI Alignment

Hey Milan, I’m broadly sympathetic to the argument in Proposition 1 Reason 2 that if we want to understand if models do human-derived cognitive operation X, we need to define what X is, and the best validation of our definition will come from testing it in humans. But recently, I’ve been wrestling with whether we need to define the cognition that models are doing in the same terms that we define human cognition to get alignment of model behavior.

For instance you could take the definition of deception given in this paper: “the systematic inducement of false beliefs in the pursuit of some outcome other than the truth”. This definition intentionally avoids relying on... (read 413 more words →)

Discovering Latent Knowledge in the Human Brain: Part 1 – Clarifying the concepts of belief and knowledge

Joseph Emerson

Acknowledgements: Many thanks to Milan Cvitkovic and Collin Burns for their help in workshopping this project proposal. Many of the ideas presented here regarding the use of neuroscience and neurotechnology were originally proposed by Milan here. This post represents my current thinking on the topic and does not necessarily represent the views of the individuals I have just mentioned.

tl;dr: Methods for promoting honest AI hinge on our ability to identify beliefs/knowledge in models. These concepts are derived from human cognition. In order to use them in AI, we need to provide definitions at the appropriate level of resolution. In this post, I describe why I think it would be a useful starting point... (read 3375 more words →)