(You can sign up here if you haven't already.) This is the first of my analyses of the deception chess games. The introduction will describe the setup of the game, and the conclusion will sum up what happened in general terms; the rest of the post will mostly be chess...
This is a link-post for the residual stream viewer, which can be found here. It's an online tool that aims to make interpretability research easier by letting you look at directions within the residual stream. It's still at an early, unpolished stage, so there may...
(Edit: This post is out of date, and understates ChatGPT's chess abilities. gpt-3.5-turbo-instruct, when prompted correctly, consistently plays legal moves at around the 1800-2000 Elo level. Thanks to commenter GoteNoSente for pointing this out and to ClevCode Ltd for writing a wrapper. See this related GitHub repository by ClevCode.) There are...
tl;dr: This post starts with a mystery about positional embeddings in GPT2-small, and from there explains how they relate to previous-token heads, i.e. attention heads whose role is to attend to the previous token. I tried to make the post relatively accessible even if you're not already very familiar with...
In the context of transformer models, the "positional embedding matrix" is the matrix that encodes the position of each token within a prompt. For example, the prompt: Hello my name is Adam would generally be broken down into tokens as follows: ['<|endoftext|>', 'Hello', ' my', ' name', '...
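To make the role of the positional embedding matrix concrete, here is a minimal NumPy sketch of how a GPT2-style model forms its initial residual stream: each token's embedding is added to the embedding of its position. The matrix shapes match GPT2-small (vocab 50257, context 1024, width 768), but the weights are random placeholders and the token ids are shown only for illustration, not taken from the real tokenizer.

```python
import numpy as np

# Sketch only: random weights stand in for GPT2-small's trained matrices.
rng = np.random.default_rng(0)

vocab_size, n_ctx, d_model = 50257, 1024, 768
W_E = rng.normal(size=(vocab_size, d_model))   # token embedding matrix
W_pos = rng.normal(size=(n_ctx, d_model))      # positional embedding matrix

# Hypothetical token ids for a short prompt (not real GPT-2 BPE output).
token_ids = np.array([50256, 101, 202, 303, 404])

# The residual stream at position i starts as W_E[token_id_i] + W_pos[i],
# so position information enters every downstream attention head.
residual = W_E[token_ids] + W_pos[: len(token_ids)]

print(residual.shape)  # one d_model-sized vector per prompt position
```

Because the positional term is added directly into the residual stream, attention heads (such as previous-token heads) can read it out later to decide where to attend.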
TL;DR: There are anomalous tokens for GPT3.5 and GPT4 which are difficult or impossible for the model to repeat; try playing around with SmartyHeaderCode, APolynomial, or davidjl. There are also plenty which can be repeated but are difficult for the model to spell out, like edTextBox or legalArgumentException. A couple...