Solving machine learning with machine learning

Sonneveld, A.C.

Exploiting Very Large-Scale Neighbourhood Search for synthesizing machine learning pipelines

Bachelor thesis (2023)

Authors

A.C. Sonneveld (Electrical Engineering, Mathematics and Computer Science)

Contributors

S. Dumančić (Algorithmics, mentor)

T.R. Hinnerichs (Algorithmics, mentor)

David Tax (Pattern Recognition and Bioinformatics, graduation committee member)

Faculty

Electrical Engineering, Mathematics and Computer Science

Program Synthesis, AutoML, Algorithm Selection

To reference this document use:

http://resolver.tudelft.nl/uuid:45c4af9f-ded9-4e33-9c0b-26d7e247d46c

More Info

Published Date

29-06-2023

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This paper presents a comparative study of several algorithms for automatically searching for high-performing pipelines on machine learning problems. These algorithms are Very Large-Scale Neighbourhood search (VLSN), breadth-first search, Metropolis-Hastings, Monte-Carlo tree search (MCTS), enumerative A* search, and genetic programming, and they are evaluated on three datasets. VLSN delivers consistently acceptable performance on this task, but MCTS performs best. Interestingly, the results show that restricting the solution space to pipelines containing only a classifier operator does not significantly decrease performance. There are three possible explanations. The datasets used may be too simple. Using default hyperparameters may make the preprocessing and feature selection operators useless. Alternatively, evaluating pipelines on a limited training set may make the search procedure ineffective. This research contributes to the field of AutoML by shedding light on how these search algorithms perform and by providing insights for future improvements.

Files

CSE3000_Final_Paper_1_.pdf (PDF, 0.287 MB)

Unknown license