Do More Elaborate Search Strategies Lead to Better Neural Architecture Search Performance?

Abstract

Computer vision tasks, like supervised image classification, are effectively tackled by convolutional neural networks, provided that the architecture, which defines the structure of the network, is set correctly. Neural Architecture Search (NAS) is a relatively young and increasingly popular field concerned with automatically optimizing the architecture of neural networks. Prior work shows that, even though a recent trend has been to develop increasingly complex search strategies for NAS, several of these strategies do not significantly outperform simple approaches, such as randomly sampling from the search space, on single-objective NAS tasks. Additionally, proper ablation studies are often missing. Therefore, it is currently uncertain at best which mechanisms an algorithm needs in order to achieve excellent NAS performance.

In the first part of this thesis, Local Search (LS) and a differently biased form of random search are proposed for multi-objective (MO) NAS. The multi-objective version of NAS has received less attention, and understanding the trade-off between multiple objectives for architectures is arguably more interesting. We find that very simple algorithms can achieve search performance close to that of state-of-the-art evolutionary algorithms (EAs), while outperforming plain random search. Additionally, we find that the quality of the set of architectures found by LS is similar to that of the sets found by the EAs when compared in terms of test accuracy. Nevertheless, among the compared search strategies, the Multi-Objective Gene-pool Optimal Mixing Evolutionary Algorithm (MO-GOMEA), a state-of-the-art model-based EA, achieves the best performance.

In the second part of this thesis, we explore which mechanisms are essential for MO-GOMEA to achieve excellent search performance on NAS spaces. We find that MO-GOMEA's automatic population-sizing scheme offers welcome anytime performance, whereas objective-space clustering has only a small beneficial impact. The number of clusters can be set arbitrarily. Special (extreme) clusters that optimize for a single objective can be enabled according to the practitioner's preference, resulting in different search behaviors. The performance improvement gained by automatically detecting and exploiting dependencies within architectures is limited: this model-based aspect of MO-GOMEA appears helpful only for finding highly accurate networks.
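To make concrete what a simple multi-objective local search over a discrete architecture encoding can look like, below is a minimal Python sketch. The encoding size (`NUM_VARS`, `NUM_OPTIONS`), the `evaluate` stub, and the restart rule are illustrative assumptions, not the exact LS variant or NAS benchmark studied in the thesis.

```python
import random

# Minimal Pareto-style local search over a discrete architecture encoding.
# NUM_VARS, NUM_OPTIONS, and evaluate() are illustrative placeholders, not
# the exact setup used in the thesis.
NUM_VARS = 6      # decision variables, e.g., operation choices per cell
NUM_OPTIONS = 5   # candidate options per variable

def evaluate(arch):
    """Stand-in for the two objectives (both maximized): a real NAS setting
    would train or estimate the network's accuracy and cost here."""
    acc = sum(arch) / (NUM_VARS * (NUM_OPTIONS - 1)) + random.gauss(0, 0.01)
    size = sum(arch)       # pretend larger options cost more parameters
    return (acc, -size)    # negate cost so both objectives are maximized

def dominates(a, b):
    """True if objective vector a Pareto-dominates b (maximization)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def local_search(budget=500):
    arch = [random.randrange(NUM_OPTIONS) for _ in range(NUM_VARS)]
    archive = [(arch, evaluate(arch))]  # mutually non-dominated solutions
    evals = 1
    while evals < budget:
        improved = False
        # scan the single-variable neighborhood in a random order
        for i in random.sample(range(NUM_VARS), NUM_VARS):
            for v in range(NUM_OPTIONS):
                if v == arch[i] or evals >= budget:
                    continue
                neighbor = arch[:i] + [v] + arch[i + 1:]
                n_objs = evaluate(neighbor)
                evals += 1
                # accept the neighbor if no archived solution dominates it
                if not any(dominates(o, n_objs) for _, o in archive):
                    archive = [(a, o) for a, o in archive
                               if not dominates(n_objs, o)]
                    archive.append((neighbor, n_objs))
                    arch, improved = neighbor, True
        if not improved:
            # local optimum w.r.t. the archive: restart from a random point
            arch = [random.randrange(NUM_OPTIONS) for _ in range(NUM_VARS)]
    return archive

if __name__ == "__main__":
    for arch, objs in local_search():
        print(arch, objs)
```

The archive-based acceptance rule sketched here (accept any move that no archived solution dominates, then prune newly dominated entries) is one common way to adapt single-objective local search to the multi-objective setting; the returned archive approximates the Pareto front within the evaluation budget.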