Tomáš Gavenčiak

The Human Substitution Test as a Sanity Check for AI Evaluations

by VojtaKovarik, Tomáš Gavenčiak, and Mateusz Bagiński

TL;DR: We suggest a sanity check for proposed evaluation or AI oversight schemes: Imagine the AI was replaced by a competent, strategic human — someone who knows they might get evaluated and has their own agenda. Would the evaluation still work? When we apply this mental move broadly, to all...

Jul 10

Entanglement Between an AI and Its Environment

by VojtaKovarik, Tomáš Gavenčiak, and Mateusz Bagiński

TL;DR:
* We introduce a concept we call "entanglement": roughly, the amount of information that the AI has about its environment.
* We distinguish between "actual" entanglement between a specific instance of an AI and its environment and "minimum" entanglement corresponding to some task, without which solving the task is...

Jul 7

Deployment Awareness Matters More Than Evaluation Awareness

by VojtaKovarik, Tomáš Gavenčiak, and Mateusz Bagiński

TL;DR Evaluation awareness — an AI recognizing it's being evaluated — is a widely discussed concept in AI safety. But there is a closely related concept that we claim is more important: deployment awareness, the AI's ability to recognize when it is not being evaluated and when its actions matter....

Jun 26

If This Were a Test, How Much Would It Cost?

by VojtaKovarik and Tomáš Gavenčiak

TL;DR A capable, strategic, misaligned AI doesn't need to figure out whether it's in a test or in real deployment. It just needs to ask: "If this were a test, how much would it have cost to create?" If the answer is "more than what the evaluator is able or...

Jun 16

Apply now to Human-Aligned AI Summer School 2026

by Anna Gajdova, Tomáš Gavenčiak, VojtaKovarik, and Jan_Kulveit

TL;DR: Apply now for the 6th Human-Aligned AI Summer School happening in Prague, July 13–16.

What: The sixth Human-Aligned AI Summer School, a four-day intensive summer school focused on technical AI alignment research, will take place in Prague from 13th to 16th July. We will meet again for lectures, workshops,...

May 20

Shallow review of technical AI safety, 2025

by technicalities, Tomáš Gavenčiak, Stephen McAleese, peligrietzer, Stag, jordinne, ozziegooen, Violet Hour, and lenz

Figure: Change in 18 latent capabilities between GPT-3 and o1, from Zhou et al. (2025).

This is the third annual review of what’s going on in technical AI safety. You could stop reading here and instead explore the data on the shallow review website. It’s shallow in the sense that...

Dec 17, 2025

Sample Interesting First

When looking for a new project that is both interesting and tractable, aim for interesting ideas first and only then filter them for tractability - not the other way around! Here, "interesting" is relative to whatever ambitious goal or metric you have, and "tractable" can mean realistic, practical, affordable or something...

Oct 18, 2025