Martin Vlach — LessWrong

LESSWRONG
LW

Are we on the same board with current evidence of DS_Math_v1 was 99% honest and immediately useful drop while v2 we are only around 90% sure on this?

Martin Vlach's Shortform

Martin Vlach2d10

Talking to chatbots with curiosity, ›What they do here?‹ is an ablation for evaluations/benchmarking.

The Memetics of AI Successionism

Martin Vlach6d10

Ooch, there are 5 sources of tension, you've named just the first one and I'd bet the some of the 5 covers more than a minority of our population.

Sample Interesting First

Martin Vlach10d10

Draft: A concise theory of agentic consciousness

Martin Vlach10d10

did you refer to
> dialing our sense of threat
or as a prominent emotion that does not fit the pattern described?

In the second case I might adjust with a bit more clarity, I did not perceive it as a "typical emotion".

Legible vs. Illegible AI Safety Problems

Martin Vlach18d10

https://philarchive.org/rec/KURTTA-2

Wow, that's comprehensive(≈long).

Anthropic & Dario’s dream

Martin Vlach18d10

It's simply not enough to develop AI gradually, perform evaluations and do interpretability work to build safe superintelligence.

but to develop AI gradually, perform evaluations and do interpretability to indicate whenever to stop developing( capabilities) seem sensibly safe.

Legible vs. Illegible AI Safety Problems

Martin Vlach18d10

Pretty brilliant and IMHO correct observations for counter-arguments, appreciated!

Why Future AIs will Require New Alignment Methods

Martin Vlach1mo20

Task duration for software engineering tasks that AIs can complete with 50% success rate (50% time horizon)

paragraph seems duplicated.

medical research doing so in concerning domains

"instead of" is missing..?

Martin Vlach's Shortform

Martin Vlach2mo10

My friends(M.K.,he's on Github) honorable aim to establish a term in the AI evals field: The cognitive asymetry, generating-verifying complexity gap for model-as-judge evals.

Various tasks that have a clear intelligence-to-solve vs. intelligence-to-verify-a-solution gap, ie. only X00-B LMs have a shot, but X-B model is strong on verifying are desired.
It fits nicely to the incremental iterative alighnment scaling playbook, I hope.

LESSWRONG
LW

LESSWRONG
LW

Posts

Wikitag Contributions

Comments