Title: Minimize experimentation overhead through dataset selection, ensemble feature attention, and feature selection with reduced subset sizes
Author: Anton, Mihai (TU Delft Electrical Engineering, Mathematics and Computer Science)
Contributors: Cruz, Luis (mentor); Shome, A. (mentor); van Deursen, A. (graduation committee); van Gemert, J.C. (graduation committee); Cohen-Addad, Vincent (mentor); Jerome, Sammy (mentor)
Degree granting institution: Delft University of Technology
Programme: Computer Science
Date: 2024-04-24

Abstract
In large-scale ML, data size becomes a critical variable, especially at large companies, where models already exist and are hard to change or fine-tune. Time to market and model quality are essential metrics. Finding ways to select, prune, and augment the input data while treating the model as a black box can therefore shorten the path from raw data to a productionized model.

Datasets can have thousands of features and many redundant or duplicate samples, for various business-logic reasons. In some ML flows, only a subset of them may account for most of the final accuracy. In addition, identifying which data points are the most meaningful can help engineers collect more relevant samples or focus their attention on specific parts of the data distribution.

Subject: data; machine learning; feature selection; mlops; optimization
To reference this document use: http://resolver.tudelft.nl/uuid:3463cc02-99c4-4628-9a10-98cc7a40cfcc
Part of collection: Student theses
Document type: master thesis
Rights: © 2024 Mihai Anton
Files: PDF, MihaiAnton_5350123_MScThesis.pdf (7.5 MB)
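The abstract's idea of finding a feature subset that carries most of the accuracy while treating the model as a black box can be illustrated with greedy forward selection. This is a minimal sketch under stated assumptions, not the method developed in the thesis. The function `greedy_forward_selection`, the `evaluate` callable (assumed to score the unchanged model on a given feature subset and return a validation metric where higher is better), and the toy weights are all hypothetical.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple


def greedy_forward_selection(
    features: Sequence[str],
    evaluate: Callable[[FrozenSet[str]], float],
    max_features: int,
    min_gain: float = 0.0,
) -> Tuple[List[str], float]:
    """Grow a feature subset greedily, touching the model only via `evaluate`.

    Hypothetical helper: `evaluate` is assumed to train/score the black-box
    model on the given subset and return a metric where higher is better.
    """
    selected: List[str] = []
    best_score = evaluate(frozenset())  # baseline with no features
    remaining = list(features)
    while remaining and len(selected) < max_features:
        # Score every one-feature extension of the current subset.
        scored = [(evaluate(frozenset(selected + [f])), f) for f in remaining]
        # Pick the highest score; ties go to the earliest candidate.
        score, feature = max(scored, key=lambda pair: pair[0])
        if score - best_score <= min_gain:
            break  # no remaining feature improves the metric enough
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score


# Toy usage (hypothetical data): an additive "model" where only a and b matter.
weights = {"a": 0.5, "b": 0.3, "c": 0.0, "d": 0.01}


def toy_evaluate(subset: FrozenSet[str]) -> float:
    return sum(weights[f] for f in subset)


chosen, score = greedy_forward_selection(
    ["a", "b", "c", "d"], toy_evaluate, max_features=3, min_gain=0.05
)
print(chosen, score)
```

With these toy weights, `chosen` is `['a', 'b']`: adding `d` would raise the score by only 0.01, below `min_gain`. Each round costs one `evaluate` call per remaining feature, so with thousands of features the search itself becomes expensive; capping `max_features` and stopping on a small gain bound that cost.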