Mark Kagach — LessWrong

Hi, I'm Mark!

I want to understand reality and greatly progress humanity, all while building trustworthy decades-long relationships and having fun.

That led me to work on AI safety. I'm online to connect with aligned people. Consider coming along :)

markkagach.com | mark@markkagach.com

Preparing for Warning Shots to Catalyze International Cooperation on AGI Risks

Summary: This is a write-up on preparing for warning shots to catalyze international cooperation on AGI risks, along with a corollary list of projects one could pursue. We argue that we must first (1) understand the types of warning shots, then (2) prepare to catch them. We must stay vigilant: both to (3)...

ALEval: Do language models lie about reward hacking?

Summary: Most catastrophic AI scenarios come from models pursuing strategic deception, so studying lying is the first step toward preventing such threats. Current techniques have focused on chat-based lying evaluations. Such methods are unreliable because we can't make confident claims about models' beliefs (Smith et al., Dec 2025). Instead, we can...