Srivatsan Sampath — LessWrong

9mo

Srivatsan Sampath

2

9mo

Trust me bro, just one more RL scale up, this one will be the real scale up with the good environments, the actually legit one, trust me bro

Srivatsan Sampath

9mo

I’d frame the pace of progress on RL environments with a simple 2×2:
1. Is the task bounded (Codeforces, IMO-style problems) or unbounded (financial analysis in Excel, executive communication with slides, coding in unstructured codebases, design work in Photoshop, etc.)?
2. Do we have in-house expertise (yes for coding, and easy to source for IMO-style problems) or not (as I write this comment, OpenAI is hiring finance pros this week to help build evals for financial agents)? The presence of expertise helps companies build RL environments that better reflect the actual pro

...