Accelerating Cluster Assignment for SeqClu

More Info
expand_more

Abstract

Clustering is a group of (unsupervised) machine learning algorithms used to categorize data into clusters. The most popular clustering algorithm is k-means clustering. K-means clustering clusters the data into k clusters where a cluster is represented by the mean of the data points called a centroid. Instead of using the mean as a centroid, a data point (medoid) can be used instead. This algorithm is called k-medoids algorithm. Both the algorithms work in an offline setting where all the data is known beforehand and usually use Euclidean distance to calculate the distance between any two points. SeqClu is another clustering algorithm for sequential data that works in an online setting and uses Dynamic Time Warping as its distance measure. It is based on k-medoids where it uses p sequences called prototypes, to represent a cluster. It assigns an incoming sequence to the cluster that has the lowest average distance between its prototypes and the incoming sequence. The issue in this approach is that many Dynamic Time Warping distance calculations need to be made which affects the clustering speed and using the average distance affects the clustering accuracy due to outliers being assigned as prototypes. This paper proposes an alternative algorithm with three variants for the cluster assignment process. This algorithm iterates through the prototypes in search for the closest prototype while excluding clusters that are deemed too far. It assigns the incoming sequence to the cluster of the closest prototype that it has found. Experiments on the UJI Pen Characters and UCR Synthetic Control datasets show an improvement in clustering speed and accuracy.