Automatic feature augmentation ranking: XGBoost

Abstract

Automatic machine learning is a subfield of machine learning that automates the common procedures encountered in predictive tasks. One such procedure is automatic data augmentation, where the goal is to enrich the existing data to improve model performance. In relational data repositories, the data is stored in normal form, so the features relevant to a prediction task are spread across many tables. This poses a problem, because joining all tables and then performing feature selection is highly inefficient. This paper presents AFAR, an approach that performs automated feature augmentation efficiently and effectively by ranking candidate joins in a data repository. An experimental evaluation that validates the approach’s capabilities is also presented.