Victor Tao — LessWrong

2

7y

Visualizing neural network planning

by Nevan Wichers, Victor Tao, fbarez, and Riccardo Volpato

TLDR: We develop a technique to try to detect whether a neural network is planning internally. We apply the decoder to the network's intermediate representations to see whether they represent the states it is planning through. We successfully reveal intermediate states in a simple Game of Life model,...

May 9, 2024 • 4