Chain-of-Thought Alignment

Edited by Roger Dearnaley et al. Last updated 1st Dec 2023.

"Chain-of-thought" autonomous agentic wrappers around an LLM, such as AutoGPT around GPT-4, and similar Language Model Cognitive Architectures (LMCAs; other commonly used terms are Language Model Autonomous Agents (LMAAs) and Scaffolded LLMs), are a recent candidate approach to building an AGI.

They create, edit, and maintain a natural-language context by recursively feeding parts of it back into the LLM along with suitable prompts for activities like subtask planning, self-criticism, and memory summarization, generating a textual stream of consciousness, memories, and so on. They thus combine LLM neural nets with natural-language symbolic thinking more along the lines of GOFAI.

Recent open-source examples are quite simple and not particularly capable, but it seems rather plausible that they could progress rapidly. They could make interpretability much easier than in pure neural-net systems, since their 'chain-of-thought'/'stream of consciousness' and 'memories' would be written in human natural language, and so would be interpretable and editable by a monitoring human or an LLM-based monitoring system (modulo concerns about opaque natural language, or about detecting steganographic side-channels hidden in apparently innocent natural language). This topic discusses the alignment problem for systems combining such agentic wrappers with LLMs, if they do in fact prove capable of approaching or reaching AGI.

See Also

  • LLM Powered Autonomous Agents (Lilian Weng, 2023)
Posts tagged Chain-of-Thought Alignment
(Ω marks posts crossposted to the Alignment Forum.)

  • Capabilities and alignment of LLM cognitive architectures [Ω] (Seth Herd, 2y; 88 points, 18 comments)
  • the case for CoT unfaithfulness is overstated [Ω] (nostalgebraist, 1y; 269 points, 43 comments)
  • Externalized reasoning oversight: a research direction for language model alignment [Ω] (tamera, 3y; 136 points, 23 comments)
  • Language Agents Reduce the Risk of Existential Catastrophe [Ω] (cdkg, Simon Goldstein, 2y; 39 points, 14 comments)
  • Scaffolded LLMs: Less Obvious Concerns [Ω] (Stephen Fowler, 2y; 34 points, 15 comments)
  • Alignment of AutoGPT agents (Ozyrus, 2y; 14 points, 1 comment)
  • On AutoGPT (Zvi, 2y; 248 points, 47 comments)
  • Training a Reward Hacker Despite Perfect Labels [Ω] (ariana_azarbal, vgillioz, TurnTrout, 1mo; 130 points, 45 comments)
  • We should start looking for scheming "in the wild" [Ω] (Marius Hobbhahn, 6mo; 91 points, 4 comments)
  • Unfaithful Reasoning Can Fool Chain-of-Thought Monitoring [Ω] (Benjamin Arnav, Pablo Bernabeu Perez, Timothy Kostolansky, HanneWhitt, Nathan Helm-Burger, Mary Phuong, 3mo; 76 points, 17 comments)
  • LLM AGI will have memory, and memory changes alignment (Seth Herd, 5mo; 73 points, 15 comments)
  • AI CoT Reasoning Is Often Unfaithful (Zvi, 5mo; 66 points, 4 comments)
  • Shane Legg interview on alignment (Seth Herd, 2y; 66 points, 20 comments)
  • Steganography in Chain of Thought Reasoning [Ω] (A Ray, 3y; 63 points, 13 comments)
  • Internal independent review for language model agent alignment [Ω] (Seth Herd, 2y; 56 points, 30 comments)
(15 of 87 tagged posts shown.)