Full-time independent deconfusion researcher in AI Alignment. (Also PhD in the theory of distributed computing.)

If you're interested in some research ideas that you see in my posts, know that I keep private docs with the most compressed version of my deconfusion ideas while they're in the process of getting feedback. I can give you access if you PM me!

A list of topics I'm currently doing deconfusion on:

  • Goal-directedness for discussing AI Risk
  • Myopic Decision Theories for dealing with deception (with Evan Hubinger)
  • Universality for many alignment ideas of Paul Christiano
  • Deconfusion itself to get better at it
  • Models of Language Models to clarify the alignment issues surrounding them


Monthly Deep Dives
Reviews for the Alignment Forum
AI Alignment Unwrapped
Understanding Goal-Directedness
Toying With Goal-Directedness
Through the Haskell Jungle


Visualizing in 5 dimensions

Trying to picture the warmup is already hard enough for me, so I'll start by asking questions about that and revisit the rest later:

(Exercise: what are the lines of latitude? What about longitude? Can you picture the north and south pole in different placetimes, and the corresponding equators?)

I expect that the lines of longitude are obtained by choosing one point on the middle sphere and tracing the line that follows it on both sides? As for latitude, if I use the analogy of the 2-sphere, each circle in the film is one line of latitude; so maybe each 2-sphere in this film is a line of latitude?

Also, I don't understand what you mean by your last question. In the 2-sphere version, the poles are only visible at times 0 and 2, and the equator is only visible at time 1.
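To make my guess about the film concrete, here is the standard way of slicing the 3-sphere into 2-spheres; I'm assuming this is the intended construction, with the film's time parameter rescaled to run from 0 to 2:

```latex
% The unit 3-sphere sits in R^4:
\[
S^3 = \{ (x, y, z, w) \in \mathbb{R}^4 : x^2 + y^2 + z^2 + w^2 = 1 \}.
\]
% Slice it by the last coordinate, setting $w = \cos t$ for $t \in [0, \pi]$.
% Each frame of the film is then a 2-sphere of radius $\sin t$:
\[
x^2 + y^2 + z^2 = \sin^2 t,
\]
% which degenerates to a single point at $t = 0$ and $t = \pi$ (the two
% "poles" of $S^3$) and is largest, the unit 2-sphere, at $t = \pi/2$
% (the "equatorial" 2-sphere, the middle frame of the film).
```

Under this reading, the latitudes are indeed the individual 2-sphere frames, matching the 2-sphere analogy where each frame is a circle.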

Open problem: how can we quantify player alignment in 2x2 normal-form games?

I want to point out that this is a great example of a deconfusion open problem. There are a bunch of intuitions, some constraints, and then we want to clarify the confusion underlying it all. I'm not planning to work on it myself, but it sounds very interesting.

(The only caveat I have with the post itself is that the title could be more explicit that this is an open problem.)

Knowledge is not just digital abstraction layers

Nice post, as always.

What I take from the sequence up to this point is that the way we formalize information is unfit to capture knowledge. This is quite intuitive, but you also give concrete counterexamples that are really helpful.

It is reasonable to say that a data recorder is accumulating nonzero knowledge, but it is strange to say that exchanging the sensor data for a model derived from that sensor data is always a net decrease in knowledge.

Definitely agreed. This sounds like your proposal doesn't capture the transformation of information into more valuable precomputation (making valuable abstractions requires throwing away some information).

Looking Deeper at Deconfusion

That's one option. I actually wrote my thesis to be the readable version of this deconfusion process, so that's where I would redirect people by default (the first few pages are in French, but the actual thesis is in English).

Vignettes Workshop (AI Impacts)

I already told you yesterday, but great idea! I'll definitely be a part of it, and will try to bring some people with me.

Looking Deeper at Deconfusion

Glad you found this helpful!

Concerning your deconfusion issue, some things you could try are:

  • Be very clear about your application. Why do you want to deconfuse these ideas? That might give you some constraint on what the result must look like.
  • Maybe try the simplest form of handles? I'm quite fond of extensive definitions myself, as they're easier to create but still quite insightful.
  • One thing I didn't touch on in this post is how handle-building is often an iterative process, where you build a crude one that serves you to pinpoint some of the confusion, and then you build a better or different one, over and over again.

Hope this helps.

Looking Deeper at Deconfusion

I guess it depends on the application you have in mind. In principle, the most deconfused handle I can think of is a full mathematical formalization with just the right number of degrees of freedom. Maybe the best analogy is with the theories in physics.

Regarding the textbook, I would say that you probably need a pretty good level of deconfusion to write a good textbook, but textbook writing also involves a lot of bridging the inferential distance with newcomers, which doesn't count as deconfusion for me.

Does that answer your question?

Looking Deeper at Deconfusion


All in all, I think there are many more examples. It's just that deconfusion almost always plays a part, because we don't have one unified paradigm or approach that does the deconfusion for us. But actual problem solving, and most of normal science, are not deconfusion from my perspective.

[Event] Weekly Alignment Research Coffee Time (06/21)

Hey, it seems like others could use the link, so I'm not sure what went wrong. If you have the same problem tomorrow, just send me a PM.
