(See this X (Twitter) thread for a much shorter version of this post.)
Introduction
With the advent of vision abilities in LLMs, a virtually limitless vista has opened in the realm of their evaluations: they can now be tested on any task that requires vision and has a ~finite input/action space. This includes tasks as practically useful as acting as a web agent, but also, notably, almost all video games.
To our knowledge, little of this testing has been done so far. The lack of work in this domain is not hard to understand: images are token-rich, and inference with current premier multimodal LLMs is relatively expensive per token. Aiming to get these kinds of evaluations off the...