Online Reinforcement Learning Control of an Electromagnetic Manipulator

Abstract

Machine Learning Control is a control paradigm that applies Artificial Intelligence methods to control problems. Within this domain, the field of Reinforcement Learning (RL) is particularly promising, since it provides a framework in which a control policy does not have to be programmed explicitly but can be learned by an intelligent controller directly from real-world data, making it possible to control systems that are arduous or even impossible to model analytically. In spite of this considerable potential, however, the RL paradigm poses a number of challenges that effectively hinder its application in the real world and in industry. It is therefore critical to advance research in this field until RL-based controllers can be demonstrated to be feasible and reliable in real-world practice. This thesis report presents attempts to apply control strategies based on Reinforcement Learning to a precise positioning task on a physical experimental setup. The setup is a magnetic manipulator (magman) characterized by a high degree of nonlinearity. The controller uses the spatially continuous magnetic field generated by four actuators to displace a steel ball, constrained to move in one dimension, towards a reference position. Two different implementations of the Q-learning algorithm (Sutton and Barto, 1998) were deployed. Despite good results in a simplified simulated environment, both implementations failed on the experimental setup. This negative outcome is mainly attributable to reward sparsity: because the task is a precise positioning task, the reward the learner obtains while interacting with the environment is too sparse for it to learn a stabilizing control policy. Other factors presumably contributed to the controllers' failure, such as the agent's lack of access to the full system state and sub-optimal tuning of the algorithms' hyper-parameters. Besides model-free RL, the model-based Value Iteration method was successfully applied both in simulation and on the experimental setup. The present findings suggest that solving the magman task with model-free RL requires more sophisticated algorithms: for example, an agent that naturally handles continuous state and action spaces, such as DDPG (Lillicrap et al., 2015), with exploration carried out in parameter space rather than in the control action space (Plappert et al., 2017), combined with better exploitation of the information extracted from the environment, for example through Hindsight Experience Replay (Andrychowicz et al., 2017).
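
To make the failure mode concrete, the following is a minimal Python sketch of the kind of tabular Q-learning loop the abstract refers to. The discretization sizes, hyper-parameters, and reward function are illustrative assumptions, not the thesis's actual implementation; the sparse_reward function is included only to show why, on a precise positioning task, the learner receives almost no feedback away from the reference position.

```python
import numpy as np

# Illustrative sizes and hyper-parameters; the thesis does not report its
# actual discretization or tuning, so these values are assumptions.
N_STATES = 100    # discretized ball positions along the 1-D track
N_ACTIONS = 4     # e.g. one "activate coil i" action per electromagnet
ALPHA, GAMMA, EPSILON = 0.1, 0.99, 0.1

Q = np.zeros((N_STATES, N_ACTIONS))

def sparse_reward(state: int, reference: int, tol: int = 1) -> float:
    """Reward only near the reference position -- the kind of sparse
    signal the abstract identifies as the main cause of failure."""
    return 1.0 if abs(state - reference) <= tol else 0.0

def select_action(state: int) -> int:
    """Epsilon-greedy exploration over the discrete action set."""
    if np.random.rand() < EPSILON:
        return np.random.randint(N_ACTIONS)
    return int(np.argmax(Q[state]))

def q_update(s: int, a: int, r: float, s_next: int) -> None:
    """Tabular Q-learning:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    td_target = r + GAMMA * np.max(Q[s_next])
    Q[s, a] += ALPHA * (td_target - Q[s, a])
```

With a reward of this shape, a randomly exploring agent almost never visits the rewarded region, so the temporal-difference updates propagate little useful value information, which is consistent with the abstract's diagnosis.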
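The abstract also notes that model-based Value Iteration did succeed both in simulation and on the setup. Below is a generic sketch of value iteration over a discretized model; the transition tensor P and reward matrix R are placeholders (the actual magman model used in the thesis is not reproduced here).

```python
import numpy as np

def value_iteration(P, R, gamma=0.99, tol=1e-6):
    """Value iteration for a finite MDP.

    P: transition probabilities, shape (n_states, n_actions, n_states),
       e.g. obtained by discretizing a magman model (assumption).
    R: expected immediate rewards, shape (n_states, n_actions).
    Returns the optimal value function and a greedy policy.
    """
    n_states, n_actions, _ = P.shape
    V = np.zeros(n_states)
    while True:
        # Bellman optimality backup:
        # Q(s,a) = R(s,a) + gamma * sum_s' P(s'|s,a) * V(s')
        Q = R + gamma * np.einsum('san,n->sa', P, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new
```

Because value iteration backs up value information through the known model rather than waiting for rewarded experience, it does not suffer from the reward sparsity that defeated the model-free learners.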