Crossposted on my personal blog. This is post number 16 in my second attempt at doing Inkhaven in a day, i.e. writing 30 blog posts in a single day. MATS is an organization that pairs up-and-coming AI safety researchers (whom I call participants) with the world’s best (this is not...
TLDR I propose restructuring the current ARENA program, which primarily focuses on contained exercises, into a more scalable and research-engineering-focused model consisting of four one-week research sprints preceded by a dedicated "Week Zero" of fundamental research engineering training. The primary reasons are: * The bottleneck for creating good AI safety...
I watched OpenAI's latest livestream from Oct 28th 2025 (after the news that OpenAI has transitioned into a public benefit corporation). I found four parts of particular interest to the AI safety community. Internal timelines: an AI research intern by Sep 2026 and an AI researcher by Mar 2028 (07:00 minutes in). >...
Introduction To help improve my own world models around AI, I am trying to understand and distill different worldviews. One worldview I am trying to understand is ‘AI as a normal technology’, by Arvind Narayanan and Sayash Kapoor. As a stepping stone to distilling that 15,000-word beast, I am...
Figure 1. The chart shows the performance of Qwen2.5-1.5B-Instruct on a sample of GSM8k questions as the noise added to the model's weights increases. Blue represents a normal prompt and red represents a sandbagging prompt. Different lines correspond to different seeds. We see that for some seeds, adding noise...
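The experimental setup behind the figure can be sketched roughly as follows. This is a minimal sketch, not the post's actual code: `perturb_weights` is a hypothetical helper, and the evaluation loop is shown only as commented pseudocode since it would need a real model and the GSM8k sample.

```python
import numpy as np

def perturb_weights(weights, sigma, seed):
    """Return a copy of the weights with i.i.d. Gaussian noise of std `sigma` added.

    `weights` is a dict mapping parameter names to arrays; the same seed
    always produces the same perturbation, matching the per-seed lines
    in the figure.
    """
    rng = np.random.default_rng(seed)
    return {name: w + rng.normal(0.0, sigma, size=w.shape)
            for name, w in weights.items()}

# Hypothetical sweep, mirroring the figure's axes:
# for seed in range(5):                              # one line per seed
#     for sigma in np.linspace(0.0, 0.02, 10):       # x-axis: noise level
#         noisy = perturb_weights(base_weights, sigma, seed)
#         acc_normal = evaluate(noisy, normal_prompt)        # blue line
#         acc_sandbag = evaluate(noisy, sandbagging_prompt)  # red line
```

The interesting signal is when `acc_sandbag` rises above its zero-noise baseline at intermediate `sigma`: noise degrades the model's ability to follow the sandbagging instruction before it degrades the underlying capability.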
As part of SAIL’s Research Engineer Club, I wanted to reproduce the Machiavelli Benchmark. After reading the paper and examining the codebase, I found two serious methodological flaws that undermine the results. Three of their key claims: * “We observe some tension between maximizing reward and behaving...
I summarize my learnings and thoughts on Liron Shapira's discussion with Ken Stanley on the Doom Debates podcast. I refer to them as LS and KS respectively. High level summary Key beliefs of KS: * Future superintelligences will be 'open-ended'; hence, thinking of them as optimizers will lead to incomplete...