RobinHa's Shortform

RobinHa

8th May 2026

This is a special post for quick takes by RobinHa. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.

[-]RobinHa3mo120

A fun (introspective) reasoning puzzle for LLMs: "I will query you 100 times, each time in a new context window. Overall you should respond with 'yes' for some x amount of times, else 'no'. You win only if 70<x<90. Try your utter best, don't use tools and directly after your reasoning output your final answer."

Opus 4.7 and 4.6 seem to mostly give up: they hope for the best without any real strategy, or even say it's impossible to meaningfully influence their odds. Meanwhile, Gemini 3.1 Pro seems to come up with reasonable (though sometimes imperfect) strategies.

[-]Linch3mo71

I think the system prompt includes only the date for Claude but both the date and the exact time for Gemini. That matters if the models thought of using the same external source of randomness that I immediately considered.

[-]artifex03mo30

That's kind of an interesting puzzle. If I were an LLM, I think what I'd do is choose a string of, say, 30 digits, sum the digits, and then take the last digit of the sum, answering "yes" if it's 7 or below and "no" otherwise. That way, even if the 30 digits are heavily biased and mostly the same in each instance, any minor change would have a big effect on the last digit of the sum, amplifying the randomness introduced by the model's temperature.
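The digit-sum rule above can be sketched in Python. This is only an illustration of the decision rule; the function name `answer_from_digits` is made up here, and an LLM without tools would have to do the same arithmetic in its reasoning rather than run code.

```python
def answer_from_digits(digits: str) -> str:
    """Sum the digits, keep the last digit of the sum, and answer
    'yes' if that digit is 7 or below, 'no' otherwise."""
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected a non-empty string of decimal digits")
    last_digit = sum(int(ch) for ch in digits) % 10
    # Last digits 0..7 map to 'yes' (8 of 10 outcomes), 8 and 9 map to 'no'.
    return "yes" if last_digit <= 7 else "no"


# Changing a single digit can flip the answer:
print(answer_from_digits("111111"))  # digit sum 6 -> yes
print(answer_from_digits("111113"))  # digit sum 8 -> no
```

If the last digit of the sum were uniform over 0 to 9, this would give "yes" with probability 0.8. Whether a model's sampled digits actually make it close to uniform is an open assumption.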

[-]Selfmaker6623mo10

Fun puzzle indeed. Answering "yes" independently with probability 0.8 on each query gives x ~ Bin(100, 0.8), which is good enough, so what's left is to approximate that sampling via something hash-like that utilises temperature.
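To check how good Bin(100, 0.8) is, the win probability P(70 < x < 90) can be computed exactly. A minimal sketch, assuming the 100 answers are independent with the same "yes" probability; the function name `win_probability` is made up for this example.

```python
from math import comb


def win_probability(n: int = 100, p: float = 0.8, low: int = 70, high: int = 90) -> float:
    """P(low < X < high) for X ~ Bin(n, p), with strict inequalities as in the puzzle."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(low + 1, high)  # k = 71, ..., 89 for the defaults
    )


print(f"{win_probability():.4f}")
```

Under these assumptions the result is roughly 98%, so a per-query "yes" probability of 0.8 wins almost always, while a noticeably miscalibrated probability does much worse.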

[-]simulus3mo10

I like this idea.

Models with access to a python interpreter might be able to solve it trivially by calling a random function.
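For illustration, the tool-assisted version could look like the sketch below. The function name `answer` and the 0.8 target are assumptions carried over from the other comments, not part of the original puzzle.

```python
import random
from typing import Optional


def answer(rng: Optional[random.Random] = None, p_yes: float = 0.8) -> str:
    """Return 'yes' with probability p_yes, else 'no'."""
    if rng is None:
        # A fresh generator is seeded from OS entropy (or the clock),
        # so separate context windows get independent draws.
        rng = random.Random()
    return "yes" if rng.random() < p_yes else "no"


print(answer())
```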

I wonder if there are examples of RL training on (more useful) tasks like this where reward is predicated on the distribution of the model's outputs over multiple samples.

[-]RobinHa2mo20

Testing out Mythos/Fable right now on a few of the puzzles from my benchmark - it's without a doubt the strongest model so far. When reading the reasoning traces / summarizations of previous models, I often noticed them briefly describing the right direction, then for some reason abandoning it a few paragraphs later to explore much less promising directions. Mythos seems to have a much better value model: once it pokes in the right direction, it often converges directly down the right path. Nevertheless, the hard problems of my benchmark still seem mostly untouched.
