Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.
This is a linkpost for https://arxiv.org/abs/2301.00723

A preprint by Devdhar Patel, Joshua Russell, Francesca Walsh, Tauhidur Rahman, Terrence Sejnowski, and Hava Siegelmann, published in December 2022.

Abstract:

We present temporally layered architecture (TLA), a biologically inspired system for temporally adaptive distributed control. TLA layers a fast and a slow controller together to achieve temporal abstraction that allows each layer to focus on a different time-scale. Our design is biologically inspired and draws on the architecture of the human brain, which executes actions at different timescales depending on the environment's demands. Such distributed control design is widespread across biological systems because it increases survivability and accuracy in certain and uncertain environments. We demonstrate that TLA can provide many advantages over existing approaches, including persistent exploration, adaptive control, explainable temporal behavior, compute efficiency, and distributed control. We present two different algorithms for training TLA: (a) closed-loop control, where the fast controller is trained over a pre-trained slow controller, allowing better exploration for the fast controller, which decides whether to "act-or-not" at each timestep; and (b) partially open-loop control, where the slow controller is trained over a pre-trained fast controller, allowing for open-loop control in which the slow controller picks a temporally extended action or defers the next n actions to the fast controller. We evaluate our method on a suite of continuous control tasks and demonstrate the advantages of TLA over several strong baselines.
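
To make the two training modes concrete, here is a minimal sketch of the control flow the abstract describes. This is my own illustration, not the authors' code: `SlowController`, `FastController`, the gate probability, and all stub policies are invented for exposition.

```python
"""Minimal sketch of the two TLA control modes described in the abstract.
An illustration under stated assumptions, not the authors' implementation;
all names and the stub (random) policies are invented."""
import random


class SlowController:
    """Plans on a coarse timescale (stub policy)."""
    def act(self, obs):
        return random.uniform(-1.0, 1.0)


class FastController:
    """Runs every timestep and can intervene (stub policy)."""
    def act(self, obs):
        return random.uniform(-1.0, 1.0)

    def should_act(self, obs):
        """The "act-or-not" gate: True means override the slow action."""
        return random.random() < 0.2


def closed_loop_step(fast, obs, standing_action):
    """Closed-loop mode: the fast controller monitors the standing slow
    action at every timestep and intervenes only when its gate fires."""
    if fast.should_act(obs):
        return fast.act(obs)        # intervene
    return standing_action          # repeat the slow controller's action


def partially_open_loop_plan(slow, fast, obs, n=4, defer_prob=0.5):
    """Partially open-loop mode: the slow controller either commits to one
    temporally extended action or defers the next n steps to the fast
    controller (the deferral decision is a stub here)."""
    if random.random() < defer_prob:
        return [fast.act(obs) for _ in range(n)]   # defer to the fast layer
    return [slow.act(obs)] * n                     # hold one action open-loop


if __name__ == "__main__":
    slow, fast = SlowController(), FastController()
    obs = 0.0
    standing = slow.act(obs)
    print("closed-loop action:", closed_loop_step(fast, obs, standing))
    print("open-loop plan:", partially_open_loop_plan(slow, fast, obs))
```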

Conclusion:

In this work, we presented Temporally Layered Architecture (TLA), a framework for distributed, adaptive response time in reinforcement learning. The framework allows the RL agent to achieve smooth control in a real-time setting using a slow controller, while a fast controller monitors and intervenes as required. Additionally, we demonstrated an alternative setting where the slow controller can gate the fast controller, activating it only when required for efficient control. We demonstrate faster convergence and more action repetition in the closed-loop approach, and fewer decisions and faster convergence in the partially open-loop approach. Additionally, we evaluate in a real-time setting where processing and actuation delays are taken into account, and show that our approach outperforms current approaches in the delayed setting while picking fewer actions. Our work demonstrates that a temporally adaptive approach has benefits for AI similar to those demonstrated in biology, and is an important direction for future research in artificially intelligent control.
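
For readers unfamiliar with the delayed setting the conclusion refers to, here is a toy sketch of what "actuation delay" means operationally: the action chosen now only reaches the plant d steps later. This is my own construction, not the paper's benchmark code; `DelayedActuation` and the integrator plant are invented.

```python
"""Sketch of a real-time setting with actuation delay: actions reach the
plant only after d steps. A toy construction, not the paper's benchmarks."""
from collections import deque


class DelayedActuation:
    """Wraps a step function so each action takes effect d steps late."""
    def __init__(self, step_fn, d, default_action=0.0):
        self.step_fn = step_fn
        # Pre-fill the pipeline so the first d steps apply a default action.
        self.pipeline = deque([default_action] * d)

    def step(self, state, action):
        self.pipeline.append(action)        # enqueue the fresh action
        applied = self.pipeline.popleft()   # apply the d-steps-old one
        return self.step_fn(state, applied)


if __name__ == "__main__":
    plant = lambda s, a: s + a              # trivial integrator plant
    env = DelayedActuation(plant, d=2)
    s = 0.0
    for a in (1.0, 1.0, 1.0):
        s = env.step(s, a)
    print(s)  # 1.0: the first two steps applied the default action
```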

Comments:

Old is new again. Figure 1(a) is cascade control (the higher level operates by means of the lower) and 1(b) is the subsumption architecture (the higher level operates instead of the lower).
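
A toy contrast of the two patterns (illustrative names and stand-in behaviors of my own, nothing from the paper):

```python
def cascade(high_level, low_level, obs):
    """Cascade control: the higher level acts *through* the lower level,
    here by setting the setpoint the lower loop tracks."""
    setpoint = high_level(obs)
    return low_level(obs, setpoint)


def subsumption(high_level, low_level, obs):
    """Subsumption: the higher level acts *instead of* the lower level,
    suppressing its output whenever it chooses to engage."""
    override = high_level(obs)
    return override if override is not None else low_level(obs)


if __name__ == "__main__":
    track = lambda obs, sp: sp - obs                  # proportional tracker
    high_gate = lambda obs: 1.0 if obs > 0.5 else None
    print(cascade(lambda obs: 0.8, track, 0.3))       # -> 0.5
    print(subsumption(high_gate, lambda obs: -obs, 0.7))  # -> 1.0
```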

I don't know why they call 1(b) "open loop". The loop looks closed to me.

I wonder if the authors have considered using arbitrarily many layers of control.

Did anything about this paper stand out to you? It doesn't strike me as anything revolutionary on its own. Interesting component, perhaps. Does it change your expectations about what safety approaches work? Is it mainly capabilities news?

It certainly is an interesting component of a research tree that will be key to making anything seriously scale, though.

No, just a piece of the puzzle. It fits into a broader understanding of AI self-control that I want to outline, one that should integrate ML, cognitive science, theory of consciousness, control theory/resilience theory, and dynamical systems theory/stability theory.

Only this sort of understanding could make the discussion of oracle AI vs. agent AI agendas truly substantive, IMO.