Similarity metrics for binary cell clustering

How close can we get to state-of-the-art?

Bachelor thesis (2023)

Authors

B.P. Golik Electrical Engineering, Mathematics and Computer Science

Contributors

M.J.T. Reinders Pattern Recognition and Bioinformatics - (mentor)

G.A. Bouland Pattern Recognition and Bioinformatics - (mentor)

B.H.M. Gerritsen Computer Science & Engineering-Teaching Team - (graduation committee member)

Faculty

Electrical Engineering, Mathematics and Computer Science

To reference this document use:

http://resolver.tudelft.nl/uuid:744f8be7-37bc-4dd4-a66e-3003f72429ad

Published Date

28-06-2023

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Analysing single-cell RNA sequencing data is becoming increasingly laborious as data sets grow. Recent findings suggest that these data sets can be binarized without losing much information, which in turn should allow memory- and time-efficient storage and computation. Many analysis techniques require cell clustering as a preliminary step, so the performance of the binary representation needs to be evaluated in that context. In this work we compare binary clustering results with the state of the art, focusing on the choice of similarity metric and its impact on the intermediate steps of the procedure (i.e. similarity matrices and kNN graphs). The method was evaluated on single-cell transcriptomic data sets, using an evaluation framework written in a combination of R and C++. We found that the results of some similarity metrics operating on continuous input can possibly be reproduced by similarity metrics operating on binary input.
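To make the intermediate steps named in the abstract concrete, the following is a minimal sketch of the general pipeline: binarize a count matrix, compute a pairwise similarity matrix, and derive a kNN graph. It is not the thesis's implementation (which uses R and C++). The Jaccard index, the detection threshold of zero, the toy counts, and all function names are assumptions chosen for illustration; the abstract does not state which metrics were evaluated.

```python
# Illustrative sketch only: metric, threshold, and data are assumptions.

def binarize(counts):
    """Map each count to 1 if the gene is detected (count > 0), else 0."""
    return [[1 if c > 0 else 0 for c in cell] for cell in counts]


def jaccard(a, b):
    """Jaccard index of two binary vectors: |a AND b| / |a OR b|."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def similarity_matrix(cells):
    """Pairwise cell-by-cell similarity matrix."""
    n = len(cells)
    return [[jaccard(cells[i], cells[j]) for j in range(n)] for i in range(n)]


def knn_graph(sim, k):
    """For each cell, list the indices of its k most similar other cells."""
    graph = {}
    for i, row in enumerate(sim):
        others = [j for j in range(len(row)) if j != i]
        others.sort(key=lambda j: row[j], reverse=True)
        graph[i] = others[:k]
    return graph


# Toy data: rows are cells, columns are genes (hypothetical counts).
counts = [
    [5, 0, 3, 0],
    [2, 0, 1, 0],
    [0, 4, 0, 7],
    [0, 1, 0, 2],
]
cells = binarize(counts)
sim = similarity_matrix(cells)
graph = knn_graph(sim, k=1)
print(graph)
```

In this toy example, cells 0 and 1 express the same genes at different levels, so after binarization they become identical and each is the other's nearest neighbour; the same holds for cells 2 and 3. A clustering step would then operate on the resulting kNN graph.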

Files

CSE3000_Final_Paper.pdf

(pdf | 2.15 Mb)