Gemini Diffusion: watch this space
JanDisselhoff · 4mo · 50

I have seen one paper arguing that diffusion is stronger.

The most interesting result there: a small 6M(!)-parameter model is able to solve 9x9 sudokus with 100% accuracy. In my own experiments, using a Llama 3B model plus a lot of fine-tuning and engineering, I got up to ~50% accuracy with autoregressive sampling.[1]

For sudoku this seems pretty intuitive: in many cases there is a cell that is very easy to fill, so a diffusion model can solve the problem piece by piece, committing to the most certain cells first. Autoregressive models, on the other hand, sometimes have to keep the full solution "in their head" before filling the first cell.
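To make the "fill the easiest cell first" intuition concrete, here is a minimal sketch (my own illustration, not from any paper) of the classic "naked single" strategy: repeatedly fill any cell that has exactly one legal candidate. A diffusion model can in principle mimic this by denoising its most certain cells at each step, whereas an autoregressive model must emit cells in a fixed order.

```python
# Illustrative sketch: solve the "easy" cells of a sudoku first.
# A cell is easy when exactly one digit is legal there ("naked single").

def candidates(grid, r, c):
    """Digits that can legally go in the empty cell (r, c)."""
    used = set(grid[r]) | {grid[i][c] for i in range(9)}
    br, bc = 3 * (r // 3), 3 * (c // 3)  # top-left of the 3x3 box
    used |= {grid[i][j] for i in range(br, br + 3) for j in range(bc, bc + 3)}
    return [d for d in range(1, 10) if d not in used]

def solve_easy_cells_first(grid):
    """Repeatedly fill any empty cell (0) with a unique candidate."""
    progress = True
    while progress:
        progress = False
        for r in range(9):
            for c in range(9):
                if grid[r][c] == 0:
                    cand = candidates(grid, r, c)
                    if len(cand) == 1:
                        grid[r][c] = cand[0]
                        progress = True
    return grid
```

This loop alone only solves easy puzzles; hard sudokus need search or stronger propagation, which is exactly where keeping many mutually constrained cells "in mind" at once matters.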

(As an aside, I tried a few sudokus in Gemini Diffusion, and it couldn't solve them.)

So yeah, there are some tasks that should definitely be easier for diffusion LLMs to solve. Their scaling behaviour is not yet well researched, as far as I can tell.

  1. ^

    I don't fully trust the results of this paper, as they seem too strong and I have not yet been able to replicate them well. But the principle holds and the idea is good.
