[Linkpost] The Expressive Capacity of State Space Models: A Formal Language Perspective

by Bogdan Ionut Cirstea
28th May 2024
1 min read

This is a linkpost for https://arxiv.org/abs/2405.17394

3 comments, sorted by top scoring

Bogdan Ionut Cirstea:

Whoever (strong?) downvoted this, I'm genuinely curious about the reasoning behind it, especially since my previous linkpost was also downvoted to negative numbers at some point, for reasons unknown to me.

Mateusz Bagiński:

It wasn't me, but it's probably about spreading AI capabilities-relevant knowledge.

Bogdan Ionut Cirstea:

Huh, that would seem a bit unnuanced to me, especially since I mentioned weak forward passes; though maybe I should add more context, e.g. this comment / the whole thread. 

Thanks, might try to edit with a bit more context!

Paper authors: Yash Sarrof, Yana Veitsman, Michael Hahn.

Context: architectures with weak forward passes can be differentially transparent; see e.g. this comment / the whole thread, and research agendas like externalized reasoning or the translucent thought hypothesis.

Summary thread: https://x.com/yashYRS/status/1795340993757352402. In brief: like transformers, which have weak forward passes, SSMs are also in the TC0 computational complexity class, 'but cover distinct fragments within it'. 'SSMs can track hierarchical structures with optimal memory [...] suggesting that SSMs, while being more parallelizable, maintain sufficient power to handle the hierarchical structure of language.'
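
To make the recurrence behind these claims concrete, here is a minimal NumPy sketch (my own toy illustration, not code from the paper) of a diagonal linear SSM state update, computed once step by step and once by composing per-step affine maps with an associative operator; that associativity is what lets such recurrences be evaluated with a parallel prefix scan, which is the sense in which SSMs are more parallelizable than ordinary RNNs. All names and dimensions below are made up for illustration.

```python
import numpy as np

# Toy diagonal linear SSM recurrence: h_t = a * h_{t-1} + B @ x_t.
# Sizes and coefficients are arbitrary; this only illustrates the kind of
# recurrence analyzed in the paper, not any specific model.
rng = np.random.default_rng(0)
T, d_in, d_state = 6, 3, 4
a = rng.uniform(0.5, 0.99, size=d_state)   # diagonal state-transition coefficients
B = rng.normal(size=(d_state, d_in))
x = rng.normal(size=(T, d_in))

# 1) Sequential evaluation, one timestep at a time (RNN-style).
h = np.zeros(d_state)
seq_states = []
for t in range(T):
    h = a * h + B @ x[t]
    seq_states.append(h)
seq_states = np.stack(seq_states)

# 2) The same states via composition of per-step affine maps h -> a_t*h + b_t,
#    represented as pairs (a_t, b_t). Composition is associative:
#    (a1, b1) then (a2, b2)  ==  (a1*a2, a2*b1 + b2),
#    so prefix states can be computed with a parallel scan; here we simply
#    fold left to check that the composition rule reproduces the recurrence.
def compose(first, second):
    a1, b1 = first
    a2, b2 = second
    return a1 * a2, a2 * b1 + b2

steps = [(a, B @ x[t]) for t in range(T)]
acc, scan_states = steps[0], [steps[0][1]]
for step in steps[1:]:
    acc = compose(acc, step)
    scan_states.append(acc[1])   # with h_0 = 0, the state equals the offset term
scan_states = np.stack(scan_states)

assert np.allclose(seq_states, scan_states)
```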

Abstract:

Recently, recurrent models based on linear state space models (SSMs) have shown promising performance in language modeling (LM), competitive with transformers. However, there is little understanding of the in-principle abilities of such models, which could provide useful guidance to the search for better LM architectures. We present a comprehensive theoretical study of the capacity of such SSMs as it compares to that of transformers and traditional RNNs. We find that SSMs and transformers have overlapping but distinct strengths. In star-free state tracking, SSMs implement straightforward and exact solutions to problems that transformers struggle to represent exactly. They can also model bounded hierarchical structure with optimal memory even without simulating a stack. On the other hand, we identify a design choice in current SSMs that limits their expressive power. We discuss implications for SSM and LM research, and verify results empirically on a recent SSM, Mamba.
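
As a hedged illustration of the 'straightforward and exact solutions' point for star-free state tracking: one standard toy task of this kind is flip-flop memory, where the input is a stream of write-0, write-1, and ignore commands and the correct state after each step is the most recently written bit. A scalar recurrence of the gated SSM form h_t = a_t * h_{t-1} + b_t with input-dependent coefficients solves it exactly; the construction below is my own sketch for illustration, not necessarily the one used in the paper.

```python
# Toy flip-flop state-tracking task, solved exactly by a gated SSM-style update
# (illustrative construction, not code from the paper). Commands:
#   'w0' = write 0, 'w1' = write 1, 'i' = ignore.
# Target state after each step: the most recently written bit.
# Scalar recurrence h_t = a_t * h_{t-1} + b_t with input-dependent coefficients:
#   write b: (a_t, b_t) = (0, b)  -> overwrite the state with b
#   ignore : (a_t, b_t) = (1, 0)  -> carry the state through unchanged
COEFFS = {"w0": (0.0, 0.0), "w1": (0.0, 1.0), "i": (1.0, 0.0)}

def flip_flop_ssm(commands, h0=0.0):
    """Run the scalar gated recurrence and return the state after every step."""
    h, states = h0, []
    for cmd in commands:
        a_t, b_t = COEFFS[cmd]
        h = a_t * h + b_t
        states.append(h)
    return states

print(flip_flop_ssm(["w1", "i", "i", "w0", "i", "w1", "i"]))
# -> [1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0]
```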