Cluster analysis of high-dimensional data is a difficult but desirable task. Many existing methods fail to cluster high-dimensional data due to what is known as the curse of dimensionality, and more sophisticated clustering methods are therefore under active development. Along these lines, spectral modularity maximization emerged from the theory of random matrices and graph modularity; the method is based on filtering the spectral decomposition of a similarity matrix. Despite the recent success of this method, we uncover a fundamental challenge: spectral modularity breaks down as the number of groups in a data set grows. To mitigate this challenge, we propose two solutions, one based on regularization and one based on normalization. We perform a thorough empirical analysis of the clustering performance of these solutions and find that our methods not only resolve the breakdown of spectral modularity but also outperform existing clustering methods in a variety of settings.