Efficient Neural Architecture Search for Language Modeling

Abstract

Neural networks have achieved great success in many difficult learning tasks such as image classification, speech recognition, and natural language processing. However, neural architectures are hard to design and require considerable expertise and time from human experts. There has therefore been growing interest in automating the design of neural architectures. Although architectures found by neural architecture search (NAS) have achieved competitive performance on various tasks, the efficiency of NAS still needs to be improved. Moreover, current NAS approaches disregard the dependency between a node and its predecessors and successors.
This thesis builds upon BayesNAS, which employs classic Bayesian learning to search for CNN architectures, and extends it to neural architecture search for recurrent architectures. Hierarchical sparse priors are used to model the architecture parameters and to alleviate the dependency issue. Since the update of the posterior variance is based on the Laplace approximation, an efficient method to compute the Hessian of a recurrent layer is proposed. Candidate architectures can be found after training the over-parameterized network for only one epoch. Our experiments on Penn Treebank and WikiText-2 show that competitive architectures for the language modeling task can be found in 0.3 GPU days using a single GPU. We find that our algorithm is more efficient than state-of-the-art methods.
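To illustrate the general flavor of the approach, the following is a minimal sketch (in Python/NumPy) of how a diagonal Laplace approximation combined with a sparse prior can be used to score and prune architecture parameters. It uses a generic ARD-style precision re-estimation as a stand-in for the thesis's hierarchical-prior update, and it assumes the Hessian diagonal is already available; the names w, hess_diag, and prune_threshold are illustrative assumptions, not the thesis's actual interface.

# Illustrative sketch only (not the exact BayesNAS update): ARD-style
# re-estimation of per-edge posterior variances from a diagonal Laplace
# approximation, followed by pruning of weak architecture edges.
import numpy as np

def ard_prune(w, hess_diag, n_iters=10, prune_threshold=1e-3):
    # w: architecture (edge) weights of the over-parameterized cell.
    # hess_diag: diagonal of the loss Hessian w.r.t. w (Laplace approximation).
    # Returns a boolean mask of edges to keep.
    alpha = np.ones_like(w)                       # prior precisions, one per edge
    for _ in range(n_iters):
        post_var = 1.0 / (hess_diag + alpha)      # Laplace posterior variance
        alpha = 1.0 / (w ** 2 + post_var)         # ARD-style precision update
    keep = (w ** 2 + post_var) > prune_threshold  # weak edges are pruned
    return keep

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(scale=[1.0, 0.01, 0.8, 0.005], size=4)    # two strong, two weak edges
    hess_diag = np.abs(rng.normal(size=4)) + 1.0              # positive curvature estimates
    print(ard_prune(w, hess_diag))

After a few re-estimation steps, edges whose weights carry little evidence end up with large precisions and small posterior variances, and are dropped from the candidate architecture; the surviving edges define the recurrent cell that is then trained from scratch.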

Files

Thesis_Final.pdf
(pdf | 1.11 MB)
Unknown license