
Anthony Perez-sanz — LessWrong


Recent AI model progress feels mostly like bullshit

Anthony Perez-sanz (1y)

I feel like you're jumping to cheating way too quickly. I think everyone would agree that there is overfitting to benchmarks and to benchmark-like questions. Also, this is a very hard problem. The average person doesn't have a shot at contributing to security research. Even the typical appsec engineer with years of experience would fail at the task of navigating a new codebase, creating a threat model, and finding important security issues. This takes an expert in the field at least a few days of work. This is much longer than the time periods that AIs can be ...