Covariant Policy Search

We investigate the problem of non-covariant behavior of policy gradient reinforcement learning algorithms. The policy gradient approach is amenable to analysis by information-geometric methods. This leads us to propose a natural metric on controller parameterization that results from considering the manifold of probability distributions over paths induced by a stochastic controller. Investigation of this approach leads to a covariant gradient ascent rule. Interesting properties of this rule are discussed, including its relation to actor-critic style reinforcement learning algorithms. The algorithms discussed here are computationally quite efficient and, on some interesting problems, lead to dramatic performance improvements over non-covariant rules.
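The difference between a covariant and a non-covariant update can be illustrated with a small sketch. It is not taken from the paper: it assumes a toy one-step problem with two actions, a Bernoulli policy whose logit is `theta = scale * phi`, and the Fisher information as the metric (the usual choice in information geometry). The names `vanilla_step`, `natural_step`, `policy_after_step`, and `scale` are hypothetical and exist only for this illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def vanilla_step(phi, scale, r1, r0, lr):
    """Plain gradient ascent on phi, where the policy logit is theta = scale * phi."""
    p = sigmoid(scale * phi)                       # probability of action 1
    grad_phi = scale * p * (1.0 - p) * (r1 - r0)   # dJ/dphi by the chain rule
    return phi + lr * grad_phi


def natural_step(phi, scale, r1, r0, lr):
    """Gradient ascent preconditioned by the inverse Fisher information (assumed metric)."""
    p = sigmoid(scale * phi)
    grad_phi = scale * p * (1.0 - p) * (r1 - r0)
    fisher_phi = scale ** 2 * p * (1.0 - p)        # Fisher information of the policy w.r.t. phi
    return phi + lr * grad_phi / fisher_phi


def policy_after_step(step, scale, theta0=0.0, r1=1.0, r0=0.0, lr=0.1):
    """Start from the same policy (logit theta0), take one step, return P(action 1)."""
    phi0 = theta0 / scale
    phi1 = step(phi0, scale, r1, r0, lr)
    return sigmoid(scale * phi1)


if __name__ == "__main__":
    for scale in (1.0, 10.0):
        print(
            "scale", scale,
            "vanilla", policy_after_step(vanilla_step, scale),
            "natural", policy_after_step(natural_step, scale),
        )
```

In this toy setting the plain gradient step produces a different policy depending on `scale`, even though both parameterizations describe the same family of policies and start from the same policy. The metric-preconditioned step produces the same policy for both values of `scale`, which is the covariance property the abstract refers to.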
J. Andrew Bagnell, Jeff G. Schneider
Added: 31 Oct 2010
Updated: 31 Oct 2010
Type: Conference
Year: 2003