Link: Let's Think Dot by Dot: Hidden Computation in Transformer Language Models by Jacob Pfau, William Merrill & Samuel R. Bowman

Chris_Leong

27th Apr 2024

This is a linkpost for https://twitter.com/jacob_pfau/status/1783951795238441449

An important question for AI safety is the extent to which a model's outputs are aligned with its chain of thought.

This paper (Twitter thread linked) provides some relevant evidence. It demonstrates that, under some circumstances, a model can achieve performance comparable to chain-of-thought when the chain of thought is replaced with dots. In particular, the model can't just be trained on sequences like "QUESTION..............ANSWER"; sequences containing a "parallelizable" chain of thought also need to be mixed in. Here "parallelizable" means that different components of the chain of thought can be calculated in parallel rather than one after another.
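To make the training setup described above more concrete, here is a minimal sketch of what mixing the two kinds of sequences might look like. This is not the paper's code: the toy task (does any pair of numbers sum to a target?), the sequence format, and the names `parallel_steps`, `make_example` and `build_training_mix` are all assumptions made up for illustration. The point is only that each intermediate step depends on the question alone and never on an earlier step, which is roughly what I mean by "parallelizable".

```python
import random
from itertools import combinations


def parallel_steps(numbers):
    """Hypothetical 'parallelizable' chain of thought for a toy pair-sum task.

    Each step depends only on the question, never on an earlier step,
    so in principle all steps could be computed at the same time.
    """
    return [f"{a}+{b}={a + b}" for a, b in combinations(numbers, 2)]


def make_example(numbers, target, use_filler):
    """Build one training string: question, then steps or dots, then answer."""
    question = " ".join(map(str, numbers)) + f" ? {target}"
    steps = parallel_steps(numbers)
    has_pair = any(a + b == target for a, b in combinations(numbers, 2))
    answer = "yes" if has_pair else "no"
    # Filler variant: one dot per step that would otherwise have been written out.
    middle = ["."] * len(steps) if use_filler else steps
    return " ".join([question, *middle, answer])


def build_training_mix(problems, filler_fraction, seed=0):
    """Mix filler-token and chain-of-thought sequences (assumed sampling scheme)."""
    rng = random.Random(seed)
    return [
        make_example(numbers, target, rng.random() < filler_fraction)
        for numbers, target in problems
    ]


print(make_example([2, 5, 7], 9, use_filler=False))  # 2 5 7 ? 9 2+5=7 2+7=9 5+7=12 yes
print(make_example([2, 5, 7], 9, use_filler=True))   # 2 5 7 ? 9 . . . yes
```

Setting `filler_fraction` strictly between 0 and 1 gives a mixture of both formats, rather than dots-only sequences.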

In terms of relevance to AI safety, this provides an empirical demonstration that a model is capable of very effectively engaging in background computation under certain circumstances. It shows that the model is much better at performing parallelizable tasks in the background than non-parallelizable ones. In other words, the chain of thought is less binding than it might have been, because the model is free to perform some of the computations necessary for future tokens in the background.
