MalPaCA: Malware behaviour analysis using unsupervised machine learning

Comparative analysis of various clustering algorithms on determining the best performance in terms of network behaviour discovery

Abstract

MalPaCA uses unsupervised machine learning to assess malware capabilities by clustering the temporal behaviour found in malware network packet traces. This work compares several clustering algorithms to determine which performs best at network behaviour discovery: HDBSCAN, OPTICS, Agglomerative Hierarchical Clustering and K-medoids. Performance was measured with metrics that capture cluster separation, cohesion, purity and completeness. Agglomerative Hierarchical Clustering achieved the lowest total error, 0.950, against 1.381 for the baseline, HDBSCAN.
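
To make the idea of a total error concrete, the sketch below scores candidate clusterings against known labels and picks the one with the lowest combined error. It is an illustration under assumptions, not the method used in this work: the abstract does not say how the individual metrics are combined, so the sketch assumes the total error is the sum of (1 - score) over purity and completeness only. Separation and cohesion are left out because they need pairwise distances between traces. The function names, the candidate names and the toy labels are all hypothetical.

```python
import math
from collections import Counter


def purity(truth, pred):
    """Fraction of samples that carry the majority true label of their cluster."""
    clusters = {}
    for t, p in zip(truth, pred):
        clusters.setdefault(p, Counter())[t] += 1
    return sum(max(c.values()) for c in clusters.values()) / len(truth)


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def completeness(truth, pred):
    """1 - H(pred | truth) / H(pred); 1.0 when each true class sits in one cluster."""
    h_pred = entropy(pred)
    if h_pred == 0:
        return 1.0  # a single cluster is trivially complete
    n = len(truth)
    joint = Counter(zip(truth, pred))
    class_sizes = Counter(truth)
    h_cond = -sum(
        (c / n) * math.log(c / class_sizes[t]) for (t, _), c in joint.items()
    )
    return 1.0 - h_cond / h_pred


def total_error(truth, pred):
    """Assumed aggregation: sum of (1 - score) over the two metrics above."""
    return (1.0 - purity(truth, pred)) + (1.0 - completeness(truth, pred))


# Toy data, invented for illustration: two behaviour classes, three labelings.
truth = ["a", "a", "a", "b", "b", "b"]
candidates = {
    "candidate_exact": [0, 0, 0, 1, 1, 1],   # matches the true classes
    "candidate_split": [0, 0, 1, 2, 2, 2],   # pure, but class "a" is split
    "candidate_merged": [0, 0, 0, 0, 0, 0],  # complete, but impure
}

errors = {name: total_error(truth, pred) for name, pred in candidates.items()}
best = min(errors, key=errors.get)

for name, err in errors.items():
    print(f"{name}: total error {err:.3f}")
print("lowest total error:", best)
```

Summing the per-metric errors means an algorithm cannot win by maximising a single metric: merging everything into one cluster is perfectly complete but impure, while splitting a class keeps purity high but loses completeness.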