DarioAmodei — LessWrong

Learning from Human Preferences - from OpenAI (including Christiano, Amodei & Legg)

Re: title, probably worth pointing out that DeepMind was also involved in this paper.

I have a hunch that semi-neat approaches to AI may come back as a layer on top of neural nets. Consider the work on using neural net heuristics to decide the next step in theorem proving (https://arxiv.org/abs/1606.04442). In such a system the decision process is opaque, but the result is fully verifiable, at least in the world of math (in a powerful system, the theorems might be proved for eventual use in some fuzzy interface with reality). The extent to which future systems might look like this, or what that means for safety, isn't very clear yet (at least not to me), but it's another paradigm to consider.
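
To make the "opaque decision process, verifiable result" split concrete, here is a minimal sketch. It is not the method from the linked paper: the toy rule set, the `heuristic` function (a stand-in for a learned scorer), and all names are assumptions for illustration. The point is only that `verify` replays the steps using the rules alone and never consults the heuristic.

```python
import heapq

# Toy "proof" domain (an assumption for illustration): derive `target`
# from `start` by applying the inference rules below.
RULES = {
    "add3": lambda n: n + 3,
    "double": lambda n: n * 2,
}


def heuristic(state, target):
    """Stand-in for an opaque learned scorer; lower means more promising."""
    return abs(target - state)


def search(start, target, max_expansions=10_000):
    """Best-first search guided by the heuristic.

    Returns a list of rule names, or None if no derivation is found.
    Assumes positive integers, so both rules only increase the state.
    """
    frontier = [(heuristic(start, target), start, [])]
    seen = {start}
    for _ in range(max_expansions):
        if not frontier:
            return None
        _, state, steps = heapq.heappop(frontier)
        if state == target:
            return steps
        for name, rule in RULES.items():
            nxt = rule(state)
            # States above the target can never come back down, so prune them.
            if nxt not in seen and nxt <= target:
                seen.add(nxt)
                heapq.heappush(
                    frontier, (heuristic(nxt, target), nxt, steps + [name])
                )
    return None


def verify(start, target, steps):
    """Replay the steps using only the rules; the heuristic is never used."""
    state = start
    for name in steps:
        if name not in RULES:
            return False
        state = RULES[name](state)
    return state == target


steps = search(1, 11)
print(steps, verify(1, 11, steps))
```

However the search arrives at `steps`, trusting the result only requires trusting `RULES` and `verify`.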

DarioAmodei · 8y

To clarify and elaborate a bit on Paul’s point, our explicit methodology was to take a typical reinforcement learning system, with standard architecture and hyperparameter choices, and add in feedback mostly without changing the hyperparameters or architecture. There were a couple of exceptions: an agent that’s learning a reward function needs more incentive to explore than an agent with a fixed reward function, so we had to increase the exploration bonus, and there are also a few parameters specific to the reward predictor itself that we had to choose. However, we did our best to show the consequences of changing some of those parameters (that’s the ablation analysis section).
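
For readers who want a picture of what a reward predictor trained on feedback can look like, here is a minimal sketch. The comment above does not spell out the model, so everything here is an assumption for illustration: a linear reward over hand-made features, a logistic (Bradley-Terry-style) model of which of two trajectory segments is preferred, and plain gradient descent. The `lr` and `epochs` arguments are hypothetical examples of parameters specific to the predictor that someone has to choose.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def segment_return(w, segment):
    """Sum of predicted rewards w . x over the feature vectors in a segment."""
    return sum(sum(wi * xi for wi, xi in zip(w, x)) for x in segment)


def prob_first_preferred(w, seg_a, seg_b):
    """Modelled probability that seg_a is preferred over seg_b."""
    return sigmoid(segment_return(w, seg_a) - segment_return(w, seg_b))


def fit_reward(comparisons, dim, lr=0.1, epochs=200):
    """Fit w by gradient descent on the cross-entropy loss.

    Each comparison is (seg_a, seg_b, label), label 1.0 if seg_a was preferred.
    """
    w = [0.0] * dim
    for _ in range(epochs):
        for seg_a, seg_b, label in comparisons:
            p = prob_first_preferred(w, seg_a, seg_b)
            for i in range(dim):
                # d(loss)/d(w_i) = (p - label) * (feature sum of a - of b)
                feat_diff = sum(x[i] for x in seg_a) - sum(x[i] for x in seg_b)
                w[i] -= lr * (p - label) * feat_diff
    return w


# Synthetic "human": prefers the segment with the higher hidden true return.
rng = random.Random(0)
true_w = [1.0, -1.0]


def make_segment():
    return [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(3)]


data = []
for _ in range(50):
    a, b = make_segment(), make_segment()
    label = 1.0 if segment_return(true_w, a) > segment_return(true_w, b) else 0.0
    data.append((a, b, label))

w = fit_reward(data, dim=2)
print(w)
```

An RL agent would then be trained against `segment_return(w, ...)` in place of a fixed reward; that part is omitted here.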

To put it another way, our method was to take the existing black magic and show that we could build in something that does what we want (in this admittedly very limited case) without much further black magic or additional complication. As a general matter I do think it is desirable (including for safety reasons) to simplify the design of systems, but as Paul says, it’s not necessarily essential. In my view one promising route for simplification is turning fixed hyperparameters into adaptive ones that are responsive to data; consider the optimization method Adam, or batch normalization.
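
As an illustration of the Adam example, here is a minimal pure-Python sketch of one update step. The default values are the ones suggested in the Adam paper; the function name and list-based interface are assumptions for illustration. The effective per-parameter step size is set by running statistics of the gradients rather than by a single hand-tuned learning rate.

```python
import math


def adam_step(params, grads, m, v, t,
              lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update. t is the 1-based step count.

    Returns (new_params, new_m, new_v).
    """
    new_params, new_m, new_v = [], [], []
    for p, g, mi, vi in zip(params, grads, m, v):
        mi = beta1 * mi + (1 - beta1) * g        # running mean of gradients
        vi = beta2 * vi + (1 - beta2) * g * g    # running mean of squared gradients
        m_hat = mi / (1 - beta1 ** t)            # bias correction
        v_hat = vi / (1 - beta2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(mi)
        new_v.append(vi)
    return new_params, new_m, new_v


# Two parameters whose gradients differ in scale by a factor of a million.
params, m, v = [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]
params, m, v = adam_step(params, [1000.0, 0.001], m, v, t=1)
print(params)
```

With plain gradient descent and one shared learning rate, the two parameters above would move by amounts differing by a factor of a million; here both move by roughly `lr`, because each step is normalized by that parameter's own gradient scale.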
