MalPaCA: Malware behaviour analysis using unsupervised machine learning

Comparative analysis of various clustering algorithms on determining the best performance in terms of network behaviour discovery

Bachelor thesis (2021)

Authors

H.J. de Heer (Electrical Engineering, Mathematics and Computer Science)

Contributors

A. Nadeem, Cyber Security (mentor)

S.E. Verwer, Cyber Security (graduation committee member)

M.A. Migut, Computer Science & Engineering-Teaching Team (coach)

Faculty

Electrical Engineering, Mathematics and Computer Science

Keywords

Clustering, MalPaCA, Comparative analysis, HDBSCAN

To reference this document use:

http://resolver.tudelft.nl/uuid:254db628-839c-4f99-b9be-91469453076e

Published Date

01-07-2021

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

MalPaCA uses unsupervised machine learning to assess malware capabilities by clustering the temporal behaviour of malware network packet traces. This thesis compares several clustering algorithms to determine which performs best at discovering network behaviour: HDBSCAN, OPTICS, Agglomerative Hierarchical Clustering and K-medoids. Performance was measured with metrics that capture cluster separation, cohesion, purity and completeness. Agglomerative Hierarchical Clustering achieved the lowest total error, 0.950, against 1.381 for the baseline, HDBSCAN.
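
The abstract names four metric families without defining them. As an illustration only, the sketch below shows one common way such scores could be computed from a precomputed distance matrix and a set of reference behaviour labels. Everything in it is an assumption rather than the thesis' method: the function names, the toy data, the use of the silhouette score for cohesion and separation, and the simple majority-count definitions of purity and completeness. It does not reproduce the "total error" reported above, whose formula the abstract does not state.

```python
from collections import Counter, defaultdict


def mean_silhouette(dist, labels):
    """Mean silhouette score from a precomputed distance matrix.

    High values mean points sit close to their own cluster (cohesion)
    and far from the nearest other cluster (separation).
    Every label is treated as an ordinary cluster, including any
    noise label such as -1.
    """
    clusters = defaultdict(list)
    for idx, lab in enumerate(labels):
        clusters[lab].append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist[i][j] for j in own) / len(own)
        # b: mean distance to the members of the nearest other cluster
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def purity(cluster_labels, class_labels):
    """Fraction of points that carry the majority class of their cluster."""
    by_cluster = defaultdict(Counter)
    for cluster, cls in zip(cluster_labels, class_labels):
        by_cluster[cluster][cls] += 1
    majority = sum(max(counts.values()) for counts in by_cluster.values())
    return majority / len(cluster_labels)


def completeness(cluster_labels, class_labels):
    """Fraction of points that fall in the dominant cluster of their class.

    Assumed definition: purity with the two labelings swapped. This is
    simpler than the entropy-based completeness score found in some libraries.
    """
    return purity(class_labels, cluster_labels)


# Hypothetical toy data: four traces placed on a line, with made-up labels.
positions = [0, 1, 10, 11]
dist = [[abs(p - q) for q in positions] for p in positions]
behaviours = ["scan", "scan", "c2", "scan"]

good = [0, 0, 1, 1]  # groups nearby traces together
bad = [0, 1, 0, 1]   # splits nearby traces apart

for name, labels in (("good", good), ("bad", bad)):
    print(
        name,
        round(mean_silhouette(dist, labels), 3),
        purity(labels, behaviours),
        completeness(labels, behaviours),
    )
```

On this toy input the silhouette score separates the two labelings clearly, while purity and completeness do not, which is one reason a comparison would combine several metrics instead of relying on a single one.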

Files

CSE3000_Research_project_Resea... (pdf)

(pdf | 1.42 Mb)