Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Ought has written a detailed update and analysis of recent experiments on factored cognition. These experiments use human participants and don’t involve any machine learning. The goal is to learn about the viability of IDA, Debate, and related approaches to AI alignment. For background, here are two prior LW posts on Ought: "Ought: Why it Matters and How to Help" and "Factored Cognition presentation".

Here is the opening of the research update:

Evaluating Arguments One Step at a Time
We’re studying factored cognition: under what conditions can a group of people accomplish complex cognitive tasks if each person only has minimal context?
In a recent experiment, we focused on dividing up the task of evaluating arguments. We created short, structured arguments for claims about movie reviews. We then tried to distinguish valid from invalid arguments by showing each participant only one step of the argument, not the review or the other steps.
In this experiment, we found that:
1. Factored evaluation of arguments can distinguish some valid from invalid arguments by identifying implausible steps in arguments for false claims.
2. However, experiment participants disagreed a lot about whether steps were valid or invalid. This method is therefore brittle in its current form, even for arguments which only have 1–5 steps.
3. More diverse argument and evidence types (besides direct quotes from the text), larger trees, and different participant guidelines should improve results.
In this technical progress update, we describe these findings in depth.
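To make the setup concrete, here is a minimal sketch of how per-step judgments might be combined into a verdict on a whole argument. It is an illustration only, not Ought's actual procedure: the function name, the 0-to-1 rating scale, the averaging of ratings, the "weakest step" rule, and the 0.5 threshold are all assumptions. Each participant is imagined to rate a single step without seeing the review or the other steps.

```python
from statistics import mean

def evaluate_argument(step_ratings, threshold=0.5):
    """Combine independent per-step ratings into one verdict (hypothetical rule).

    step_ratings: one inner list per argument step, holding the probabilities
    (0 to 1) that different participants gave for that step being valid.
    Returns (accepted, step_scores, step_spreads).
    """
    # Average the ratings each step received (assumed aggregation).
    step_scores = [mean(ratings) for ratings in step_ratings]
    # Spread between the most and least favourable rating per step,
    # a crude measure of participant disagreement.
    step_spreads = [max(ratings) - min(ratings) for ratings in step_ratings]
    # Assumed rule: the argument is only as strong as its weakest step.
    accepted = min(step_scores) >= threshold
    return accepted, step_scores, step_spreads

# Two steps: the first looks plausible, the second is mostly judged implausible.
accepted, scores, spreads = evaluate_argument([[0.9, 0.8], [0.2, 0.6, 0.1]])
```

Under these assumptions the example argument is rejected because its second step scores low on average, and the wide spread on that step shows the kind of disagreement described in finding 2.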

The rest of the post is here.
