Neural networks (NNs) have, in recent years, become a major part of modern pattern recognition, and both theoretical and applied research are evolving at an astounding pace. NNs are usually trained via gradient descent (GD), but research has shown that GD is not always capable of training very small networks. As a result, networks trained via GD are often significantly larger than necessary, demanding more computing power and energy to evaluate and thereby hampering adoption on lower-power devices. This thesis investigates whether evolutionary algorithms (EAs) can successfully train NNs and whether these networks can be smaller than those required by GD. Four algorithms, namely the GD-based Adam, Adam with cold restarts, and the EAs GOMEA and BIPOP-CMA-ES, were used to train various configurations of multilayer perceptrons (MLPs) with one hidden layer on the Exclusive-OR (XOR) problem. Their relative performance was gauged by comparing the rates at which each algorithm attained an acceptable loss for a given number of hidden nodes. The main findings are that the EAs could find smaller ReLU-activated networks than GD could, whereas GD could generally find smaller sigmoid-activated networks. However, GOMEA in particular was able to successfully train XOR networks using the highly discrete Heaviside activation function, whereas GD could not due to gradient erasure: the Heaviside function's derivative is zero almost everywhere, so no gradient signal reaches the weights. Problem-specific knowledge in the form of a soft symmetry-breaking constraint was found to be effective at increasing success rates in a limited number of cases. These findings indicate that using EAs is a viable strategy for training NNs, yielding smaller, and therefore more efficient, networks than those trained via GD, provided that the network topology is suitable for the chosen EA. This opens up avenues for future research, including but not limited to the many ways EAs can be scaled up to train larger NNs, hybridization with GD, and niche network topologies.
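
To make the gradient-erasure point concrete, the following is a minimal Python sketch, not taken from the thesis, of a one-hidden-layer MLP for XOR with a Heaviside activation. The 2-2-1 architecture, the weight initialisation, and the naive random-restart search standing in for an EA are all illustrative assumptions; the sketch only shows that the Heaviside derivative passes zero gradient to the weights, while a gradient-free search needs nothing but forward evaluations.

    import numpy as np

    # XOR truth table: two inputs, one target output per row.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([0.0, 1.0, 1.0, 0.0])

    def heaviside(z):
        # Step activation: 1 for z >= 0, else 0. Its derivative is 0 for all
        # z != 0, so the chain rule erases every backpropagated gradient that
        # would otherwise reach the hidden-layer weights.
        return (z >= 0).astype(float)

    def forward(params, x):
        # 2-2-1 MLP: Heaviside hidden layer, linear output node.
        W1, b1, W2, b2 = params
        h = heaviside(W1 @ x + b1)
        return float(W2 @ h + b2)

    def mse(params):
        # Mean squared error over the four XOR cases (forward passes only).
        return float(np.mean([(forward(params, x) - t) ** 2 for x, t in zip(X, y)]))

    rng = np.random.default_rng(0)

    def random_params():
        # Illustrative stand-in for an EA candidate: fully random weights.
        return [rng.normal(size=(2, 2)), rng.normal(size=2),
                rng.normal(size=2), rng.normal()]

    # A gradient-free search can still compare candidates and drive the loss
    # down, even though GD on the same network receives no gradient signal.
    best = min((random_params() for _ in range(5000)), key=mse)
    print("best MSE found without any gradient:", round(mse(best), 3))

The random restarts here are only a placeholder for the model-based search that GOMEA or BIPOP-CMA-ES performs; the point is that such methods evaluate candidates by forward passes alone, so the vanishing Heaviside derivative does not affect them.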