Accelerating Cluster Assignment for SeqClu

Bachelor thesis (2021)

Authors

R.K.N. Al-Obaidi (Electrical Engineering, Mathematics and Computer Science)

Contributors

A. Nadeem, Cyber Security (mentor)

S.E. Verwer, Cyber Security (mentor)

M.A. Migut, Computer Science & Engineering-Teaching Team (graduation committee member)

Faculty

Electrical Engineering, Mathematics and Computer Science

To reference this document use:

http://resolver.tudelft.nl/uuid:7ead7330-fe8a-4b94-a6cf-8f752083ecba

Published Date

01-07-2021

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Clustering refers to a family of unsupervised machine learning algorithms that group data into clusters. The most popular clustering algorithm is k-means clustering, which partitions the data into k clusters, each represented by the mean of its data points, called a centroid. If an actual data point (a medoid) is used to represent a cluster instead of the mean, the result is the k-medoids algorithm. Both algorithms work in an offline setting, where all the data is known beforehand, and usually use the Euclidean distance between points. SeqClu is a clustering algorithm for sequential data that works in an online setting and uses Dynamic Time Warping as its distance measure. It is based on k-medoids but represents each cluster by p sequences, called prototypes. It assigns an incoming sequence to the cluster with the lowest average distance between its prototypes and the incoming sequence. This approach has two drawbacks: it requires many Dynamic Time Warping distance calculations, which slows down clustering, and the average distance is sensitive to outliers that have been selected as prototypes, which lowers clustering accuracy. This paper proposes an alternative cluster assignment algorithm with three variants. The algorithm iterates through the prototypes in search of the closest one while excluding clusters that are deemed too far away, and assigns the incoming sequence to the cluster of the closest prototype it has found. Experiments on the UJI Pen Characters and UCR Synthetic Control datasets show an improvement in both clustering speed and accuracy.
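To make the two assignment strategies concrete, here is a minimal Python sketch, not the thesis implementation. It assumes the classic O(n·m) Dynamic Time Warping recurrence with an absolute-difference cost and represents clusters as a dict of prototype lists. The function names (`dtw_distance`, `assign_by_average`, `assign_by_closest_prototype`) are hypothetical. The abstract does not say how a cluster is "deemed too far", so the pruning rule below (abandon a cluster as soon as one of its prototypes is more than `prune_factor` times the best distance so far) is an illustrative assumption, not one of the three variants the thesis proposes.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Clusters = Dict[str, List[Sequence[float]]]


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic Dynamic Time Warping with |x - y| as the local cost."""
    n, m = len(a), len(b)
    # cost[i][j] holds the DTW distance between a[:i] and b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def assign_by_average(seq: Sequence[float], clusters: Clusters) -> Optional[str]:
    """Assignment as described for SeqClu: lowest average DTW distance to a cluster's prototypes."""
    best_cluster, best_avg = None, math.inf
    for cid, prototypes in clusters.items():
        avg = sum(dtw_distance(seq, p) for p in prototypes) / len(prototypes)
        if avg < best_avg:
            best_cluster, best_avg = cid, avg
    return best_cluster


def assign_by_closest_prototype(
    seq: Sequence[float], clusters: Clusters, prune_factor: float = 2.0
) -> Tuple[Optional[str], int]:
    """Closest-prototype assignment with a HYPOTHETICAL pruning rule.

    Returns the chosen cluster and the number of DTW computations performed.
    """
    best_cluster, best_dist = None, math.inf
    computed = 0
    for cid, prototypes in clusters.items():
        for p in prototypes:
            d = dtw_distance(seq, p)
            computed += 1
            if d < best_dist:
                best_cluster, best_dist = cid, d
            elif d > prune_factor * best_dist:
                # Assumed "too far" rule: skip the remaining prototypes of this cluster.
                break
    return best_cluster, computed


# Usage on two toy clusters of two prototypes each.
clusters = {"ramp": [[0, 1, 2, 3], [0, 1, 2, 4]], "flat": [[0, 0, 0, 0], [0, 0.1, 0, 0]]}
seq = [0, 1, 2, 3.5]
print(assign_by_average(seq, clusters))            # ramp (4 DTW computations)
print(assign_by_closest_prototype(seq, clusters))  # ('ramp', 3)
```

In this toy run both strategies pick the same cluster, but the pruned closest-prototype search skips the second "flat" prototype after the first one is already far more distant than the best match. Using the single closest prototype instead of an average also keeps one outlying prototype from dragging a cluster's score up or down.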
