Machine learning can be applied effectively in control loops to make optimal control decisions robustly. There is increasing interest in using spiking neural networks (SNNs) as the machine learning apparatus in control engineering, because SNNs can potentially offer high energy efficiency and new SNN-enabling neuromorphic hardware is being rapidly developed. A defining characteristic of control problems is that environmental reactions and delayed rewards must be considered. While reinforcement learning (RL) provides the fundamental mechanisms to address such problems, realizing these mechanisms in SNN learning remains underexplored. Previously, spike-timing-dependent plasticity (STDP) learning schemes modulated by temporal difference (TD-STDP) or reward (R-STDP) factors have been proposed for RL with SNNs. Here we designed and implemented an SNN controller to explore and compare these two schemes, using Cart-Pole balancing as a representative example. Although the TD-based learning rules are very general, the resulting model converges rather slowly and produces noisy and imperfect results even after prolonged training. We show that by integrating an understanding of the environment dynamics into the reward function of R-STDP, a robust SNN-based controller can be learnt much more efficiently than with TD-STDP. The work of this master's thesis project has also been published as a paper in Electronics, Vol. 12.