LESSWRONG
Srivatsan Sampath
Trust me bro, just one more RL scale up, this one will be the real scale up with the good environments, the actually legit one, trust me bro
Srivatsan Sampath · 1mo
  • I’d frame the pace of progress on RL environments with a simple 2×2:
    1. Is the task bounded (Codeforces, IMO-style problems) or unbounded (financial analysis in Excel, executive communication with slides, coding in unstructured codebases, design work in Photoshop, etc.)?
    2. Do we have in-house expertise (yes for coding, and easy to source for IMO) or not (as I write this comment, OpenAI is hiring finance professionals this week to help build evals for financial agents)? In-house expertise helps companies build RL environments that better reflect the actual problem space.
  • That gives a rough order of progress:

    1. Bounded problem + know-how: o3 preview crushed Codeforces in Dec 2024.
    2. Unbounded problem + know-how: the Codex product line.
    3. Unbounded problem + limited know-how: ChatGPT agents are still weak at spreadsheets and terrible at slides today, but I expect that to change in 6 to 12 months.

     I'm not sure where bounded problems with little know-how (e.g. FrontierMath) fall in this ordering, though...
