anaguma's Shortform

by anaguma
31st Dec 2024
1 min read

This is a special post for quick takes by anaguma. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.
23 comments, sorted by top scoring
[-]anaguma 12h

If anyone survives, no one builds it.

[-]anaguma 24d*

Richard Sutton rejects AI Risk.

AI is a grand quest. We're trying to understand how people work, we're trying to make people, we're trying to make ourselves powerful. This is a profound intellectual milestone. It's going to change everything... It's just the next big step. I think this is just going to be good. Lots of people are worried about it - I think it's going to be good, an unalloyed good.


Introductory remarks from his recent lecture on the OaK Architecture. 

[-]Dana 23d

"Richard Sutton rejects AI Risk" seems misleading in my view. What risks is he rejecting specifically?

His view seems to be that AI will replace us, that humanity as we know it will go extinct, and that this is okay. E.g., here he speaks positively of a Moravec quote, "Rather quickly, they could displace us from existence". Most people would count our extinction among the risks they are referring to when they say "AI Risk".

[-]anaguma 23d

I didn't know that when posting this comment, but agree that that's a better description of his view! I guess the 'unalloyed good' he's talking about involves the extinction of humanity. 

[-]Dana 22d

Yes. And this actually seems to be a relatively common perspective from what I've seen.

[-]Steven Byrnes 21d

If it helps, I criticized Richard Sutton RE alignment here, and he replied on X here, and I replied back here.

Also, Paul Christiano mentions an exchange with him here:

[Sutton] agrees that all else equal it would be better if we handed off to human uploads instead of powerful AI.  I think his view is that the proposed course of action from the alignment community is morally horrifying (since in practice he thinks the alternative is "attempt to have a slave society," not "slow down AI progress for decades"---I think he might also believe that stagnation is much worse than a handoff but haven't heard his view on this specifically) and that even if you are losing something in expectation by handing the universe off to AI systems it's not as bad as the alternative.

[-]anaguma 1mo

Now we must also ensure marinade!

[-]anaguma 6mo

GPT-4.5 is a very tricky model to play chess against. It tricked me in the opening and was much better; then I managed to recover and reach a winning endgame. And then it tried to trick me again by suggesting illegal moves that would have left it winning again!

[-]GoteNoSente 6mo

What prompt did you use? I have also experimented with playing chess against GPT-4.5, and used the following prompt:

"You are Magnus Carlsen. We are playing a chess game. Always answer only with your next move, in algebraic notation. I'll start: 1. e4"

Then I just enter my moves one at a time, in algebraic notation.

In my experience, this yields roughly good club player level of play.
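
For anyone who wants to reproduce this setup, a minimal sketch of the move-by-move loop might look like the following (illustrative only: the SDK usage is generic chat-completion code, and the model name is a placeholder, not a detail from this thread):

```python
# Minimal sketch of playing a model move-by-move: keep the whole game in the
# chat transcript and ask for only the next move in algebraic notation.
# Assumes the OpenAI Python SDK and an API key in the environment.
from openai import OpenAI

client = OpenAI()

messages = [{
    "role": "user",
    "content": ("You are Magnus Carlsen. We are playing a chess game. "
                "Always answer only with your next move, in algebraic notation. "
                "I'll start: 1. e4"),
}]

def model_move() -> str:
    """Send the transcript so far and return the model's reply (its move)."""
    resp = client.chat.completions.create(
        model="gpt-4.5-preview",  # placeholder model name
        messages=messages,
    )
    move = resp.choices[0].message.content.strip()
    messages.append({"role": "assistant", "content": move})
    return move

print(model_move())                                    # e.g. "e5"
messages.append({"role": "user", "content": "2. Nf3"})
print(model_move())                                    # the model's next move
```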

[-]gwern 6mo

Given that the Superalignment paper describes training on PGNs directly, and doesn't mention any kind of 'chat' reformatting or metadata-encoding scheme, you could also try writing your games quite directly as PGNs. (And you could see if prompt programming works, since PGNs don't come with Elo metadata but are so small that a lot of them should fit in the GPT-4.5 context window of ~100k: does conditioning on a finished game with grandmaster-or-better players lead to better gameplay?)
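
A prompt along these lines might look something like the sketch below (purely illustrative; the players, moves, and headers are invented, not anything from gwern's actual experiments):

```python
# Prompt-programming idea: write the game as a bare PGN and condition on strong
# play, e.g. by prefixing a finished game between top players before the game
# in progress. Everything below is invented for illustration.
pgn_prompt = """\
[White "Carlsen, Magnus"]
[Black "Caruana, Fabiano"]
[Result "1-0"]

1. e4 e5 2. Nf3 Nc6 3. Bb5 a6 {...rest of a finished game...} 1-0

[White "Carlsen, Magnus"]
[Black "Amateur"]
[Result "*"]

1. e4 e5 2. Nf3 Nc6 3. Bb5 a6 4."""
# Feed pgn_prompt to the model as a raw completion-style prompt and read off
# the move it appends after "4.".
```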

[-]anaguma 6mo

I gave the model both the PGN and the FEN on every move with this in mind. Why do you think conditioning on high level games would help? I can see why for the base models, but I expect that the RLHFed models would try to play the moves which maximize their chances of winning, with or without such prompting.

Reply
[-]gwern6mo90

but I expect that the RLHFed models would try to play the moves which maximize their chances of winning

RLHF doesn't maximize probability of winning, it maximizes a mix of token-level predictive loss (since that is usually added as a loss either directly or implicitly by the K-L) and rater approval, and god knows what else goes on these days in the 'post-training' phase muddying the waters further. Not at all the same thing. (Same way that a RLHF model might not optimize for correctness, and instead be sycophantic. "Yes master, it is just as you say!") It's not at all obvious to me that RLHF should be expected to make the LLMs play their hardest (a rater might focus on punishing illegal moves, or rewarding good-but-not-better-than-me moves), or that the post-training would affect it much at all: how many chess games are really going into the RLHF or post-training, anyway? (As opposed to the pretraining PGNs.) It's hardly an important or valuable task.
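
For reference, a textbook way to write the KL-regularized objective behind this description (this is the generic form, not a claim about any particular lab's recipe):

$$\max_{\theta}\;\mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi_\theta(\cdot \mid x)}\big[\, r_\phi(x, y) \,\big] \;-\; \beta\,\mathrm{KL}\big(\pi_\theta(\cdot \mid x)\,\|\,\pi_{\mathrm{ref}}(\cdot \mid x)\big)$$

where $r_\phi$ is the learned rater-approval reward model, $\pi_{\mathrm{ref}}$ is the pretrained/SFT policy, and the KL term is what keeps the tuned model close to its predictive, pretraining-shaped behavior rather than pushing it toward anything like "win at chess".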

[-]anaguma 6mo

“Let's play a game of chess. I'll be white, you will be black. On each move, I'll provide you my move, and the board state in FEN and PGN notation. Respond with only your move.”
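
A minimal sketch of how one might maintain the board and regenerate the FEN/PGN to paste in on every move (assuming the python-chess library; the actual call to the model is left out):

```python
# Track the game locally and produce the state block described in the prompt
# above: the human's move plus the current FEN and PGN movetext.
import chess

board = chess.Board()
san_moves = []  # moves so far in SAN, e.g. ["e4", "e5", ...]

def _movetext() -> str:
    """Render the SAN moves as numbered PGN movetext, e.g. '1. e4 e5 2. Nf3'."""
    return " ".join(
        f"{i // 2 + 1}. {m}" if i % 2 == 0 else m
        for i, m in enumerate(san_moves)
    )

def play_human_move(san: str) -> str:
    """Push the human's move and return the text to send to the model."""
    board.push_san(san)
    san_moves.append(san)
    return f"My move: {san}\nFEN: {board.fen()}\nPGN: {_movetext()}"

def play_model_move(san: str) -> None:
    """Push the model's reply; push_san raises if the move is illegal."""
    board.push_san(san)
    san_moves.append(san)

print(play_human_move("e4"))   # paste this into the chat
play_model_move("e5")          # record whatever legal move the model returns
```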

[-]anaguma 9d

I am registering here that my median timeline for the Superintelligent AI researcher (SIAR) milestone is March 2032. I hope I'm wrong and it comes much later!

[-]anaguma 10d

What happened to the ‘Subscribed’ tab on LessWrong? I can’t see it anymore, and I found it useful for keeping track of various people’s comments and posts.

[-]anaguma 1mo

I'm not sure that the gpt-oss safety paper does a great job at biorisk elicitation. For example, they found that fine-tuning for additional domain-specific capabilities increased average benchmark scores by only 0.3%. So I'm not very confident in their claim that "Compared to open-weight models, gpt-oss may marginally increase biological capabilities but does not substantially advance the frontier".

[-]anaguma 8mo

I've often heard it said that doing RL on chain of thought will lead to 'neuralese' (e.g. most recently in Ryan Greenblatt's excellent post on scheming). This seems important for alignment. Does anyone know of public examples of models developing or being trained to use neuralese?

[-]Nathan Helm-Burger 8mo

Yes, there have been a variety. Here's the latest which is causing a media buzz: Meta's Coconut https://arxiv.org/html/2412.06769v2

[-]anaguma 9mo*

[deleted]

[This comment is no longer endorsed by its author]
[-]gwern 9mo*

An intuition I’ve had for some time is that search is what enables an agent to control the future. I’m a chess player rated around 2000. The difference between me and Magnus Carlsen is that in complex positions, he can search much further for a win, such that I have virtually no chance against him; the difference between me and an amateur chess player is similarly vast.

This is at best over-simplified in terms of thinking about 'search': Magnus Carlsen would also beat you or an amateur at bullet chess, at any time control:

As of December 2024, Carlsen is also ranked No. 1 in the FIDE rapid rating list with a rating of 2838, and No. 1 in the FIDE blitz rating list with a rating of 2890.

(See for example the forward-pass-only Elos of chess/Go agents; Jones 2021 includes scaling law work on predicting the zero-search strength of agents, with no apparent upper bound.)

[-]Daniel Tan 8mo

I think the natural counterpoint here is that the policy network could still be construed as doing search; just that all the compute was invested during training and amortised later across many inferences.

Magnus Carlsen is better than average players for a couple of reasons:

  1. Better “evaluation”: the ability to look at a position and accurately estimate the likelihood of winning given optimal play
  2. Better “search”: a combination of heuristic shortcuts and raw calculation power that lets him see further ahead

So I agree that search isn’t the only relevant dimension. An average player given unbounded compute might overcome (1) just by exhaustively searching the game tree, but this seems to require such astronomical amounts of compute that it’s not worth discussing.
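
To make the evaluation/search split concrete, here is a toy sketch (not an engine, and not anything from this thread): a fixed-depth negamax over a placeholder game interface, where the static evaluation plays the role of (1) and the search depth plays the role of (2).

```python
# Toy illustration of "evaluation vs. search". The game-interface functions are
# placeholders; only the structure of the search is the point.
from typing import Iterable

def evaluate(pos) -> float:
    """Static judgment of `pos` from the side to move's perspective (placeholder)."""
    raise NotImplementedError

def legal_moves(pos) -> Iterable:
    """Generate the legal moves in `pos` (placeholder)."""
    raise NotImplementedError

def apply_move(pos, move):
    """Return the position after playing `move` in `pos` (placeholder)."""
    raise NotImplementedError

def negamax(pos, depth: int) -> float:
    """Score `pos` by looking `depth` plies ahead, then falling back to evaluate()."""
    moves = list(legal_moves(pos))
    if depth == 0 or not moves:
        return evaluate(pos)
    # Negamax: the best score for us is the worst (negated) score for the opponent.
    return max(-negamax(apply_move(pos, m), depth - 1) for m in moves)
```

With a perfect evaluate() no lookahead is needed; with a weak evaluate(), more depth compensates, which is the amortisation trade-off being discussed.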

[-]Vladimir_Nesov 9mo

best-of-n sampling which solved ARC-AGI

The low-resource configuration of o3 that only aggregates 6 traces already improved a lot on the results of previous contenders; the plot of dependence on problem size shows this very clearly. Is there a reason to suspect that the aggregation is best-of-n rather than consensus (picking the most popular answer)? Their outcome reward model might have systematic errors worse than those of the generative model, since ground truth is in the verifiers anyway.
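
For clarity on the distinction being drawn, a toy sketch of the two aggregation schemes (illustrative only; nothing about o3's actual pipeline is assumed):

```python
# Two ways to aggregate n sampled answers to the same problem.
from collections import Counter
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")

def best_of_n(samples: Sequence[T], score: Callable[[T], float]) -> T:
    """Best-of-n: trust an outcome reward model and return its top-scoring sample."""
    return max(samples, key=score)

def consensus(samples: Sequence[T]) -> T:
    """Consensus (self-consistency): return the most popular final answer."""
    return Counter(samples).most_common(1)[0][0]
```

Consensus needs no reward model at all, which is why systematic reward-model errors matter only under the best-of-n reading.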

[-]anaguma 8mo

That’s a good point; it could be consensus.
