Locally Explainable Isolation Forest with Mixed-Attribute Data and Ternary Isolation Trees

Huistra, M.E.

Combatting Money Laundering with Anomaly Detection

Master thesis (2021)

Authors

M.E. Huistra, Electrical Engineering, Mathematics and Computer Science

Contributors

C.W. Oosterlee, Numerical Analysis (mentor)

Nestor Parolya, Statistics (graduation committee member)

N.V. Budko, Numerical Analysis (graduation committee member)

Evert Haasdijk, Deloitte (graduation committee member)

L.A. Souto Arias, Numerical Analysis (graduation committee member)

Faculty

Electrical Engineering, Mathematics and Computer Science

Keywords

Anomaly Detection, Anti-Money Laundering, Isolation Forest, Mixed-Attribute Data, Ternary Isolation Trees

To reference this document use:

http://resolver.tudelft.nl/uuid:903d52ed-66cc-4e88-8860-a08e30750b5b

Published Date

01-10-2021

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

In the fight against money laundering, demand for data-driven Anti-Money Laundering (AML) solutions is growing. Anomaly detection algorithms in particular have proven effective at detecting suspicious customer behaviour and at revealing patterns otherwise hidden in customer transaction data. In this thesis, the Isolation Forest anomaly detection algorithm is studied in combination with the model-specific local explanation method Multiple Indicator Local Depth-based Isolation Forest Feature Importance (MI-Local-DIFFI). To extend Isolation Forest to mixed-attribute data sets, the incorporation of nominal features is explored in more detail. This analysis resulted in the introduction of Isolation Forest with Categorical Sampling (iForestCS), a methodology that incorporates nominal attributes directly into an isolation tree without the need to encode them onto a numerical scale. The method is tested against different encoding strategies and against Isolation Forest Conditional Anomaly Detection (iForestCAD) on different synthetic data sets. Across different parameter settings of the underlying synthetic data, it outperforms the encoding strategies. Furthermore, this thesis explores the potential of a ternary Isolation Forest, in which the branching strategy of an isolation tree is expanded to produce three child nodes. Experiments on synthetic data show that the performance of MI-Local-DIFFI in particular degrades when it is applied to a ternary Isolation Forest. Finally, the research considers a practical use case: the locally explainable Isolation Forest is applied to mixed-attribute customer transaction data from Triodos Bank. This provided useful insight and resulted in the detection of suspicious customer behaviour and the introduction of new rules into business practices. Although the most interesting customer behaviour did not emanate directly from the nominal attributes, the method used to incorporate nominal features led to differences among the anomalies with the highest anomaly scores.
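
To make the idea of isolating points in mixed-attribute data concrete, the following is a minimal Python sketch, not the thesis's implementation. The anomaly score follows the standard Isolation Forest formulation, s(x) = 2^(-E[h(x)] / c(psi)), where h(x) is the path length and c(psi) the usual normalisation for a subsample of size psi. The treatment of nominal attributes (a randomly chosen category is split off from the rest) is an assumption made purely for illustration; the abstract does not specify how iForestCS samples categories. All function names, parameters, and the toy data are hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def average_path_length(n):
    """c(n): average path length of an unsuccessful binary search tree lookup."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def path_length(x, rows, rng, depth, max_depth):
    """Follow x down one randomly grown isolation tree and return its path length."""
    if depth >= max_depth or len(rows) <= 1:
        return depth + average_path_length(len(rows))
    # Only attributes that still vary among the remaining rows can be split on.
    candidates = [j for j in range(len(x)) if len({r[j] for r in rows}) > 1]
    if not candidates:
        return depth + average_path_length(len(rows))
    j = rng.choice(candidates)
    if isinstance(x[j], str):
        # ASSUMPTION (illustrative only): nominal split = one random category vs. the rest.
        chosen = rng.choice(sorted({r[j] for r in rows}))
        keep = [r for r in rows if (r[j] == chosen) == (x[j] == chosen)]
    else:
        # Numerical split: uniform threshold between the current min and max.
        values = [r[j] for r in rows]
        split = rng.uniform(min(values), max(values))
        keep = [r for r in rows if (r[j] < split) == (x[j] < split)]
    return path_length(x, keep, rng, depth + 1, max_depth)


def anomaly_score(x, data, n_trees=100, sample_size=64, seed=0):
    """Standard Isolation Forest score in (0, 1]; values near 1 indicate anomalies."""
    rng = random.Random(seed)
    psi = min(sample_size, len(data))  # assumes len(data) >= 2
    max_depth = math.ceil(math.log2(psi))
    total = sum(
        path_length(x, rng.sample(data, psi), rng, 0, max_depth)
        for _ in range(n_trees)
    )
    return 2.0 ** (-(total / n_trees) / average_path_length(psi))


# Hypothetical toy data: (transaction amount, channel), plus one unusual row.
demo_rng = random.Random(42)
data = [
    (demo_rng.gauss(100.0, 10.0), demo_rng.choice(["online", "branch"]))
    for _ in range(63)
]
data.append((5000.0, "crypto"))

outlier_score = anomaly_score(data[-1], data)
typical_score = anomaly_score((100.0, "online"), data)
print(f"outlier: {outlier_score:.3f}, typical: {typical_score:.3f}")
```

In this sketch the nominal attribute is never mapped onto a numerical scale; the tree branches on category membership directly, which is the general idea the abstract attributes to iForestCS. A ternary variant would return three partitions per node instead of two.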

Files

Thesis_Final_Mark_Huistra_4362... (pdf)

(pdf | 3.37 Mb)

Unknown license