Memory usage analysis of binary clustering algorithm

What is the gain in peak memory usage of the binary clustering algorithm compared to current state-of-the-art clustering methods?


Abstract

The rapid growth in the size of single-cell RNA-seq datasets poses significant performance challenges for evaluating these datasets and extracting information from them. We investigate an alternative input data format based on binarization, focusing primarily on peak memory usage. We describe the design and implementation of the solution in depth, with particular emphasis on the strategies used to minimize memory usage. Using memory profiling tools, we analyzed and validated memory usage patterns and their asymptotic behavior. However, our findings suggest that the memory savings on large modern datasets come solely from the binarized data format itself, not from the way the clustering workflow interacts with that format: the workflow's memory usage was independent of the input format.
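The following Python sketch gives a rough sense of why a binarized format can reduce memory on its own. It assumes that binarization means recording only whether each gene is detected in a cell (count > 0), and that the resulting bits are packed with NumPy's `np.packbits`. The abstract does not describe the actual storage layout, so the function name, matrix shape, and sparsity level below are hypothetical.

```python
import numpy as np


def dense_and_binarized_nbytes(n_cells, n_genes, seed=0):
    """Hypothetical helper: compare memory of a dense count matrix with
    boolean and bit-packed binarized versions of the same data."""
    rng = np.random.default_rng(seed)
    # Hypothetical count matrix (cells x genes) stored as float32,
    # with mostly zero entries as is typical for single-cell data.
    counts = rng.poisson(0.1, size=(n_cells, n_genes)).astype(np.float32)
    # Assumed binarization: 1 if the gene is detected in the cell, else 0.
    binary = counts > 0
    # Pack 8 boolean entries into each byte along the gene axis.
    packed = np.packbits(binary, axis=1)
    # Check that unpacking recovers exactly the binarized matrix.
    restored = np.unpackbits(packed, axis=1, count=n_genes).astype(bool)
    assert np.array_equal(restored, binary)
    return counts.nbytes, binary.nbytes, packed.nbytes


dense, boolean, packed = dense_and_binarized_nbytes(1000, 2000)
print(dense, boolean, packed)  # prints: 8000000 2000000 250000
```

A dense float32 matrix uses 32 bits per entry, while the packed binary matrix uses 1 bit, so this sketch shows a 32x reduction. Real single-cell matrices are often stored in sparse formats such as CSR, which changes the baseline for this comparison.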
