LESSWRONG

AI
Frontpage

Improving Mathematical Reasoning with Process Supervision

by p.b.
31st May 2023

This is a linkpost for https://openai.com/research/improving-mathematical-reasoning-with-process-supervision
3 comments
harfe (2y, karma 5)

From just reading your excerpt (and not the whole paper), it is hard to determine how much alignment washing is going on here.

  • What is an aligned chain-of-thought? What would an unaligned chain-of-thought look like?
  • What exactly does alignment mean in the context of solving math problems?

But maybe these worries can be answered from reading the full paper...

Vladimir_Nesov (2y, karma 4)

"Our results below show that process supervision in fact incurs a negative alignment tax"

Some compelling arguments are given that alignment tax would be negative when this method is used to improve safety. The actual experimental results are about improving/eliciting capabilities and don't explore application of the method for safety, except by drawing an analogy.

Charlie Steiner (2y, karma 2)

I am shocked that higher quality training data based on more effortful human feedback produced a better result.


Alignment impact

Process supervision has several alignment advantages over outcome supervision. It directly rewards the model for following an aligned chain-of-thought, since each step in the process receives precise supervision. Process supervision is also more likely to produce interpretable reasoning, since it encourages the model to follow a human-approved process. In contrast, outcome supervision may reward an unaligned process, and it is generally harder to scrutinize.
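To make the contrast concrete, here is a minimal toy sketch of the two reward signals. It is not the method from the linked work: the `Step` tuple, both function names, and the arithmetic checker are hypothetical, and the checker only stands in for the per-step human approval that the excerpt describes. The example solution reaches the right final answer through a wrong intermediate step, which is the case where the two signals disagree.

```python
import operator
from typing import List, Tuple

# Hypothetical representation of one reasoning step:
# (left operand, operator symbol, right operand, claimed result).
Step = Tuple[int, str, int, int]

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def outcome_reward(steps: List[Step], correct_answer: int) -> float:
    """Outcome supervision: one signal, based only on the final claimed result."""
    final_answer = steps[-1][3]
    return 1.0 if final_answer == correct_answer else 0.0


def process_rewards(steps: List[Step]) -> List[float]:
    """Process supervision: one signal per step.

    Assumption: a simple arithmetic check stands in for a human label
    on whether each individual step is acceptable.
    """
    return [
        1.0 if OPS[op](left, right) == claimed else 0.0
        for left, op, right, claimed in steps
    ]


# Toy problem: 2 * 2 + 3 = 7.
# The first step is wrong (2 * 2 = 5); the second is locally valid
# arithmetic that happens to land on the correct final answer.
steps = [(2, "*", 2, 5), (5, "+", 2, 7)]

print(outcome_reward(steps, correct_answer=7))
print(process_rewards(steps))
```

In this toy case the outcome signal rewards the whole solution, while the per-step signal flags the first step, which illustrates how outcome supervision "may reward an unaligned process".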

In some cases, safer methods for AI systems can lead to reduced performance, a cost which is known as an alignment tax. In general, any alignment tax may hinder the adoption of alignment methods, due to pressure to deploy the most capable model. Our results below show that process supervision in fact incurs a negative alignment tax, at least in the math domain. This could increase the adoption of process supervision, which we believe would have positive alignment side-effects.


Paper here.

Related Prediction Market.