VojtaKovarik

Apply now to Human-Aligned AI Summer School 2026

by Anna Gajdova, Tomáš Gavenčiak, VojtaKovarik, and Jan_Kulveit

TL;DR: Apply now for the 6th Human-Aligned AI Summer School happening in Prague, July 13–16. What The sixth Human-Aligned AI Summer School, a four-day intensive summer school focused on technical AI alignment research, will take place in Prague from 13th to 16th July. We will meet again for lectures, workshops,...

May 2013

Evaluation as a (Cooperation-Enabling?) Tool

Key points: 0. Advertisement: We have an IMO-nice position paper which argues that AI Testing Should Account for Sophisticated Strategic Behaviour, and that we should think about evaluation (also) through game-theoretic lens. (See this footnote for an example: [1].) Tell your friends! On framing evaluation as one of many tools:...

Dec 10, 202518

Irresponsible Companies Can Be Made of Responsible Employees

tl;dr: 1. In terms of financial interests of an AI company, bankruptcy and the world ending are both equally bad. If a company acted in line with its financial interests[1], it would happily accept significant extinction risk for increased revenue. 2. There are plausible mechanisms which would allow a company...

Oct 8, 202580

Apply now to Human-Aligned AI Summer School 2025

Join us at the fifth Human-Aligned AI Summer School in Prague from 22nd to 25th July 2025! Update: We have now confirmed our speaker list with excellent speakers -- see below! Update: We still have capacity for more excellent participants as of late June. Please help us spread the word...

Jun 6, 202528

When is "unfalsifiable implies false" incorrect?

I am looking for examples of theories that we now know to be correct, but that would have have been unfalsifiable in a slightly different context --- e.g., in the past, or in hypothetical scenarios. (Unsurprisingly, this is motivated by the unfalsifiability of some claims around AI X-risk. For more...

Jun 15, 20243

What is the purpose and application of AI Debate?

I think there is an important lack of clarity and shared understanding regarding how people intend to use AI-Safety-via-Debate-style approaches. So I think it would be helpful if there were some people --- who either (i) work on Debate or (ii) believe that Debate is promising --- who could give...

Apr 4, 202413

Extinction Risks from AI: Invisible to Science?

Abstract: In an effort to inform the discussion surrounding existential risks from AI, we formulate Extinction-level Goodhart’s Law as "Virtually any goal specification, pursued to the extreme, will result in the extinction[1] of humanity'', and we aim to understand which formal models are suitable for investigating this hypothesis. Note that...

Feb 21, 202424

VojtaKovarik

VojtaKovarik

Irresponsible Companies Can Be Made of Responsible Employees

AI Safety Debate and Its Applications

Recursive Middle Manager Hell: AI Edition

New paper: (When) is Truth-telling Favored in AI debate?

VojtaKovarik

Irresponsible Companies Can Be Made of Responsible Employees

AI Safety Debate and Its Applications

Recursive Middle Manager Hell: AI Edition

New paper: (When) is Truth-telling Favored in AI debate?

Apply now to Human-Aligned AI Summer School 2026

Evaluation as a (Cooperation-Enabling?) Tool

Irresponsible Companies Can Be Made of Responsible Employees

Apply now to Human-Aligned AI Summer School 2025

When is "unfalsifiable implies false" incorrect?

What is the purpose and application of AI Debate?

Extinction Risks from AI: Invisible to Science?