Solving machine learning with machine learning

Exploiting Very Large-Scale Neighbourhood Search for synthesizing machine learning pipelines

Abstract

This paper presents a comparative study of algorithms that can automatically search for high-performing pipelines for machine learning problems. Six algorithms are evaluated on three datasets: Very Large-Scale Neighbourhood search (VLSN), breadth-first search, Metropolis-Hastings, Monte-Carlo tree search (MCTS), enumerative A* search, and genetic programming. VLSN performs consistently acceptably on this task, but MCTS achieves the best performance. Interestingly, the results show that limiting the solution space to pipelines containing only a classifier operator does not significantly decrease performance. Three possible explanations are that the datasets used are too simple, that the use of default hyperparameters makes preprocessing and feature selection operators useless, or that evaluating pipelines on a limited training set makes the search procedure ineffective. This research contributes to the field of AutoML by shedding light on algorithm performance and providing insights for future improvements.
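
To make the difference between the full solution space and the classifier-only solution space concrete, the following minimal sketch enumerates both for a toy operator set. It assumes a pipeline consists of an optional preprocessor, an optional feature selector, and exactly one classifier. That structure, the operator names, and the function `enumerate_pipelines` are hypothetical and chosen for illustration only; they are not taken from the paper.

```python
from itertools import product

# Hypothetical operator names, used only for illustration.
# None means "skip this stage".
PREPROCESSORS = [None, "StandardScaler", "MinMaxScaler"]
FEATURE_SELECTORS = [None, "SelectKBest", "VarianceThreshold"]
CLASSIFIERS = ["DecisionTree", "LogisticRegression", "KNN"]


def enumerate_pipelines(classifier_only=False):
    """Return every candidate pipeline as a tuple of operator names."""
    pre = [None] if classifier_only else PREPROCESSORS
    sel = [None] if classifier_only else FEATURE_SELECTORS
    pipelines = []
    for p, s, c in product(pre, sel, CLASSIFIERS):
        # Drop skipped stages; the classifier is always the last operator.
        pipelines.append(tuple(op for op in (p, s, c) if op is not None))
    return pipelines


full_space = enumerate_pipelines()
restricted_space = enumerate_pipelines(classifier_only=True)
print(len(full_space), len(restricted_space))
```

Even in this toy setting the restricted space is a small subset of the full one, which is why a search confined to it is much cheaper; whether it loses accuracy is the empirical question the study addresses.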