I want to map out all the open problems in Alignment; understand the key topics, how they relate and how the parts form a greater whole. I believe a small group of people who can notice when they're confused, can make substantial conceptual progress on many of the open problems. Besides, it sounds like a lot of fun.

To do this, I will have a 1-hr open meeting every week working on this. If you're reading this, you are free to join.

But wait, what would this look like?

Once a week, time TBD, we would call and go through a few questions, 5-minute timer style. Then discuss. For example:

Week 1:

1. Map out all of alignment from a first principles perspective. [Your solution should feel like it reasonably solves alignment. It can contain many "black boxes" that do the hard work, so spend extra time on opening up the black boxes and exploring them.] (30 minutes, then discuss)

2. Design images that explain the same information as in question 1 above. [You can use whatever tool you want; I will use draw.io] (10 minutes, then discuss/post images in Discord)

Week 2: Vote on a specific topic to zoom in on and explore from first principles

Week 3: Literature Review of topic from Week 2

Week 4 on: TBD

[Specific questions and phrasings will change weekly based on feedback]

What if I won't be helpful?

Discussion will be on a volunteer basis. I expect there may even be too much talking (or not enough time for everyone to talk). You will not be expected or called on to tell everyone your answer. You can even write up your thoughts in the Discord throughout the week, if that's how you'd like to contribute.

One hour/week won't amount to much

As mentioned, there is a Discord server attached. There will always be open problems to explore and work on during the week between meetings. If you join, you will NOT be expected to work on anything in between meetings, but I will (I only recommend doing so if you find the topic fun or interesting).

I would love to have a community of people focused on a specific topic, discussing and giving feedback. To me, a Discord server is useful for creative work because I hold what I write there to a lower standard (as opposed to writing a post like this).

What if you waste time on dead ends?

I will post weekly on what we figured out in the last meeting, as well as post the questions for the next week. I am counting on the LW community to provide feedback, to hopefully point out faulty assumptions and dead-end research directions.


If you're interested, PM me here on LW or email me at logansmith5(at)gmail. You will get a WhenToMeet to pick a time and a Discord server link. We will start the week of Aug 9th-15th, depending on the decided time.



Excellent initiative. I'm interested and will PM you. I am working on several similar projects, one of them being described in this post.

This seems great. Would it be ok to join once or twice to see if your group, your method, and whatever topic you decide to zoom into are a good fit for me?

I've started to collect various AI Safety initiatives here, so that we are aware of each other and can hopefully support each other. Let me know if you want to be listed there too.

Also, people who are interested in joining elriggs' group might also be interested in the AI Safety discussion days that JJ and I are organising. Same topic, different format.

FLI have made a map of all AI Safety research (or all that they could find at the time). Would this be a useful resource for you? I'm not linking it directly, because you might want to think for yourself first, before becoming too biased by others' ideas. But it seems like it would at least be a useful tool at the literature review stage.

Thanks for reaching out. I've sent you the links in a DM.

I would like to be listed in the list of various AI Safety initiatives.

I'm looking forward to this month's AI Safety discussion day (I saw yours and Vanessa's post about it in Diffractor's Discord).

I'll start reading others' maps of Alignment in a couple of days, so I would appreciate the link from FLI; thank you. Gyrodiot's post has several links related to "mapping AI", including one from FLI (Benefits and Risks of AI), but it seems like a different link than the one you meant.

The FLI map probably refers to The Landscape of AI Safety and Beneficence Research, also in my list but credited to its main author, Richard Mallah.

Interesting, but there are multiple schemes worth thinking about that I have a hard time fitting into the same top-down design. There are disjunctions like "either the AI should take good actions or it should find arguments that cause humans to take good actions" that prompt different ideas on different sides of the disjunction. On the other hand, maybe you plan to manage and catalogue all the disjunctions rather than splitting different branches into isolated discussions? Potentially doable, but also potentially leads to pandemonium.

It's not clear in the OP, but I'm planning on a depth-first search as opposed to breadth-first. Weeks 2-XX will focus on a singular topic (like turntrout's impact measures or johnswentworth's abstractions).

I am looking forward to disjunctive maps though!
