Attributing malware to the developers who wrote it is an important factor in cybercrime investigations. Clustering not only malware of the same family but also related malware across families provides more information about the authors and aids further malware analysis. This report questions previous work which concluded that compiled binaries can be attributed by a programmer's style. Given insight on this matter, this report explores new clustering techniques for both statically and dynamically derived features of malware binaries. The two methods are complementary, as they provide very different types of data. In the static analysis, the data for the similarity comparison is derived from disassembled binaries, while in the dynamic analysis the system calls executed by the malware during execution are recorded.
We use a finer granularity than a comparison of complete binaries with each other, so that fine similarities among malware families, rather than differences, can be found. Evaluating clusters is difficult because of the unsupervised nature of clustering and because of data quality issues. However, manual inspection of the generated clusters shows that the newly developed clustering methods confirm previously discovered similarities and also find new connections among malware families.
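To illustrate what a fine-grained comparison of recorded system calls might look like, the following minimal sketch splits each trace into short overlapping windows and compares the resulting sets. This is not the method developed in this report: the window length, the Jaccard measure, the function names, and the sample traces are all assumptions chosen purely for illustration.

```python
from itertools import combinations


def ngrams(trace, n=3):
    """Return the set of length-n windows over a system-call trace."""
    return {tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def pairwise_similarity(traces, n=3):
    """Compare every pair of samples on their shared n-gram windows."""
    grams = {name: ngrams(trace, n) for name, trace in traces.items()}
    return {
        (x, y): jaccard(grams[x], grams[y])
        for x, y in combinations(sorted(grams), 2)
    }


# Hypothetical traces, invented for this example only.
traces = {
    "sample_a": ["NtOpenFile", "NtReadFile", "NtWriteFile", "NtClose"],
    "sample_b": ["NtCreateKey", "NtOpenFile", "NtReadFile", "NtWriteFile"],
    "sample_c": ["NtCreateKey", "NtSetValueKey", "NtClose"],
}

scores = pairwise_similarity(traces)
for pair, score in sorted(scores.items()):
    print(pair, round(score, 3))
```

Because the comparison operates on short windows rather than on whole traces, two samples that share only a small behavioural fragment still receive a non-zero score, which a clustering step could then pick up.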