Laundromats: More Than Just Missing Socks

Improving the estimation of the False Negative Rate of money laundering detection at Dutch banks

Abstract

The Dutch banking sector is mandated to identify and report transactions that may signify money laundering (ML) activities. Banks have relied on rule-based transaction monitoring (TM) systems that flag transactions exceeding predefined thresholds. While such systems are instrumental in filtering potential ML transactions, the inherently low prevalence of ML among financial transactions means that they flag only a limited number of transactions. Furthermore, because flagged transactions are only those surpassing certain rule thresholds, the alerts are biased toward presumed risk distributions. Consequently, the systems' performance regarding false negatives, that is, transactions that go unreviewed but should have been flagged, remains unknown. This lack of understanding is a critical gap in the efficacy of current anti-money laundering (AML) controls and motivates the need for better insight into the False Negative Rate (FNR). This study aims to narrow this knowledge gap by answering the following research question: ‘To what extent could a supervised machine learning classifier, when trained on historical alerts, assist Dutch banks in estimating the False Negative Rate of rule-based transaction monitoring systems concerning unreviewed transactions?’ To this end, the study adopts a mixed-methods research design, combining a literature review with seven interviews with domain specialists to acquire insights into transactional ML typologies and the underlying indicators and thresholds employed in existing rule-based TM systems. The study further developed eight different types of supervised machine learning classifiers, each applied with its default settings and, where possible, with balancing measures in place.
These classifiers were trained on two synthetically generated datasets of 180 million transactions each, one with a high and one with a low ML prevalence rate, that is, the relative frequency of ML transactions. The datasets mirrored real transactional patterns in order to evaluate the feasibility of estimating the FNR for unreviewed transactions using historical data on flagged transactions. In addition to establishing the effect of the two ML prevalence rates on performance, we also explored whether combining data from multiple financial institutions into one shared information perspective could be of additional value. The findings indicate that the classifiers struggle to accurately predict the FNR, especially in scenarios of low ML prevalence and without combining information from multiple institutions. There is a substantial discrepancy between the actual FNRs (0.729 to 0.988) and the predicted FNRs (0.156 to 0.649), even in higher ML prevalence settings. The low performance, as evidenced by poor Area Under the Precision-Recall Curve (AUPRC) and Matthews Correlation Coefficient (MCC) scores, highlights the challenges in using machine learning for ML transaction detection in Dutch banking and calls for further research and development of more advanced detection models. Despite the limited success in accurately estimating the FNR through supervised machine learning classifiers, the insights derived from this research are of considerable value. They prompt a critical examination of current TM systems and suggest a pivot toward more sophisticated machine learning techniques for FNR estimation...
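
For readers unfamiliar with the two headline quantities, the sketch below shows how the FNR and the MCC follow from confusion-matrix counts, using their standard textbook definitions. It is a minimal illustration only: the function names and the example counts are hypothetical and are not taken from the study's code or data.

```python
from math import sqrt


def false_negative_rate(tp: int, fn: int) -> float:
    """FNR = FN / (FN + TP): share of actual ML transactions that were missed."""
    positives = tp + fn
    if positives == 0:
        raise ValueError("FNR is undefined when there are no positive cases")
    return fn / positives


def matthews_corrcoef(tp: int, fp: int, tn: int, fn: int) -> float:
    """MCC from confusion-matrix counts; returns 0.0 when the denominator is zero."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not results from the study).
    tp, fn, fp, tn = 20, 80, 30, 9870
    print("FNR:", false_negative_rate(tp, fn))  # 80 missed out of 100 positives
    print("MCC:", round(matthews_corrcoef(tp, fp, tn, fn), 3))
```

With heavily imbalanced data such as transaction records, accuracy is dominated by the true negatives, which is one reason metrics such as the MCC and the AUPRC are commonly reported instead.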
