This is a brief note on what I did with my funding in 2025, and my plans for 2026, written primarily because Manifund nudged me for an update on my project.
I ran my AISC project (which I announced here) with four mentees in Spring 2025: Norman Hsia, Hanna Gabor, Paul Rapoport, and Roman Malov. A few other people attended the weekly meetings as well, and those regular meetings have continued (they are joinable -- PM me if interested). Norman and Paul ended up as coauthors of my ILIAD 2024 paper Understanding Trust, which had been drafted in 2024, so it served as both an input and an output of the AISC project.
I recorded most of the meetings involved in the project, as one of the hoped-for outputs was publicly posted videos explaining the research agenda. I've proven to be bad at this side of things: I don't like listening to myself talk, so I found it difficult to edit the recordings, or even to review edits done by others. I'm finally uploading the videos with minimal AI-orchestrated edits. Playlist here. At the time of publication, there are only two, but more are coming very soon. If you are OK with the almost-unedited presentation style, it should be a good resource for an in-depth view of my thinking about AI safety and decision theory: a thorough snapshot of where my thinking stood as of spring 2025.
In 2025, I obtained funding for 2025 as well as 2026. (My total financial runway is longer than this, but 2025 and 2026 have been funded by grants/donations which compensated me for my continued research at specific price points.) I'm opening up my Manifund project for funding for 2027, for those who feel so inclined.
In addition to publishing the ILIAD 2024 paper, I also published an ILIAD 2025 paper: Communication & Trust. I consider it an incremental improvement: the ILIAD 2024 paper treated self-modifying actions as a distinct class with known effects that take hold with certainty, while the ILIAD 2025 paper treated all actions as having some subjective chance of disrupting the agent's computation.
I also attended Inkhaven, where I wrote a post for every day in November. This was a big success for me: I was able to write about many things which I had been wanting to write about for some time (perhaps in rougher form than if I had eventually gotten around to them via my normal writing process). It was also exhausting. Here's my retrospective, with the caveat that I wrote it on the very last day, when I was perhaps the most sick of writing.
One of the posts describes my research arc over 2025, and the hopes I have moving forward. It is still a good summary of where I'd like to take my research in 2026. I have hope that we're understanding concepts and abstraction better, so that we might soon be able to characterize important concepts like agency, alignment, and corrigibility in a formalism which deals natively with ontological shifts. Most of my hope is due to Sam Eisenstat's Condensation: a theory of concepts, which I wrote a detailed review of during Inkhaven.
As for my more ML-flavored research ideas, I finally wrote about that stuff last week. I've already found someone interested in trying some experiments based on those ideas. We'll see how that goes.
I'm also mentoring with MATS this summer. You can still apply to my MATS track (today or tomorrow, as I write this); applications are due January 18th.