Filtering Knowledge: A Comparative Analysis of Information-Theoretical-Based Feature Selection Methods

Abstract

The data used by a machine learning algorithm strongly influences its capabilities. Feature selection techniques choose a subset of columns (features) that best serves a given learning goal. There is a wide variety of feature selection methods; the ones we cover in this comparative analysis belong to the information-theoretical-based family. We evaluate MIFS, MRMR, CIFE, and JMI using the machine learning algorithms Logistic Regression, XGBoost, and Support Vector Machines.
Multiple datasets with a variety of feature types are used during evaluation. We find that MIFS and MRMR run 2-4 times faster than CIFE and JMI, while MRMR and JMI select columns that reach significantly higher accuracy and lower root mean squared error earlier in the selection process. These results can help data scientists pick the right feature selection method for their datasets.
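
For context on how such criteria operate: all four methods select features greedily, trading off relevance against redundancy. MRMR, for example, scores a candidate feature by its mutual information with the target minus its mean mutual information with the features already selected. Below is a minimal sketch of greedy MRMR selection, assuming scikit-learn's mutual information estimators; the function name mrmr_select and the pairwise redundancy estimate are illustrative choices, not the implementation evaluated in this work.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

def mrmr_select(X, y, k):
    """Greedy MRMR sketch: pick k columns maximizing relevance I(X_i; y)
    minus mean redundancy with the already-selected columns."""
    relevance = mutual_info_classif(X, y, random_state=0)  # I(X_i; y) per column
    selected = []
    remaining = list(range(X.shape[1]))
    while len(selected) < k and remaining:
        scores = []
        for i in remaining:
            # Mean pairwise I(X_i; X_j) over selected columns (0 for the first pick)
            redundancy = np.mean([
                mutual_info_regression(X[:, [j]], X[:, i], random_state=0)[0]
                for j in selected
            ]) if selected else 0.0
            scores.append(relevance[i] - redundancy)
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected  # e.g. selected = mrmr_select(X, y, k=10)
```

In the standard formulations, swapping the redundancy term yields the other criteria: MIFS weights the summed redundancy by a fixed coefficient β instead of averaging, while CIFE and JMI subtract a class-conditional correction I(X_i; X_j | Y). Estimating that conditional term for every candidate-selected pair is what makes CIFE and JMI the costlier methods, consistent with the runtime gap reported above.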