S. Makrodimitris | TU Delft Repository

An in-depth comparison of linear and non-linear joint embedding methods for bulk and single-cell multi-omics

Review (2024) - Stavros Makrodimitris (author), I.B. Pronk (author), T. Abdelaal (author), Marcel JT Reinders (author)

Multi-omic analyses are necessary to understand the complex biological processes taking place at the tissue and cell level, but also to make reliable predictions about, for example, disease outcome. Several linear methods exist that create a joint embedding using paired informati ...

Benchmarking variational AutoEncoders on cancer transcriptomics data

Journal article (2023) - Mostafa Eltager (author), Tamim Abdelaal (author), Mohammed Charrout (author), A. Mahfouz (author), Marcel J.T. Reinders (author), Stavros Makrodimitris (author)

Deep generative models, such as variational autoencoders (VAE), have gained increasing attention in computational biology due to their ability to capture complex data manifolds, which can subsequently be used to achieve better performance in downstream tasks, such as cancer type prediction or subtyping of cancer. However, these models are difficult to train due to the large number of hyperparameters that need to be tuned. To get a better understanding of the importance of the different hyperparameters, we examined six different VAE models when trained on TCGA transcriptomics data and evaluated on the downstream tasks of cluster agreement with cancer subtypes and survival analysis. We studied the effect of the latent space dimensionality, learning rate, optimizer, initialization and activation function on the quality of subsequent downstream tasks on the TCGA samples. We found that β-TCVAE and DIP-VAE perform well on average, despite being more sensitive to hyperparameter selection. Based on these experiments, we derived recommendations for selecting the different hyperparameter settings. To ensure generalization, we tested all hyperparameter configurations on the GTEx dataset. We found a significant correlation (ρ = 0.7) between the hyperparameter effects on clustering performance in the TCGA and GTEx datasets. This highlights the robustness and generalizability of our recommendations. In addition, we examined whether the learned latent spaces capture biologically relevant information. To this end, we measured the correlation and mutual information of the different representations with various data characteristics such as gender, age, days to metastasis, immune infiltration, and mutation signatures. We found that, for all models, the latent factors in general neither correlate uniquely with any single data characteristic nor capture separable information, even for models specifically designed for disentanglement.

Machine learning-based somatic variant calling in cell-free DNA of metastatic breast cancer patients using large NGS panels

Journal article (2023) - Elisabeth M. Jongbloed (author), Maurice P.H.M. Jansen (author), Vanja de Weerd (author), Jean A. Helmijr (author), Corine M. Beaufort (author), Marcel J.T. Reinders (author), Ronald van Marion (author), Wilfred F.J. van IJcken (author), Stavros Makrodimitris (author)

Next generation sequencing of cell-free DNA (cfDNA) is a promising method for treatment monitoring and therapy selection in metastatic breast cancer (MBC). However, distinguishing tumor-specific variants from sequencing artefacts and germline variation with low false discovery ra ...

Cell type deconvolution of methylated cell-free DNA at the resolution of individual reads

Journal article (2023) - Pia Keukeleire (author), Stavros Makrodimitris (author), Marcel JT Reinders (author)

Cell-free DNA (cfDNA) consists of DNA fragments originating from dying cells that are detectable in bodily fluids, such as plasma. Accelerated cell death, for example caused by disease, induces an elevated concentration of cfDNA. As a result, determining the cell type origins of cfDN ...

What does that gene do?

Gene function prediction by machine learning with applications to plants

Doctoral thesis (2021) - S. Makrodimitris (author)

Billions of people worldwide rely on plant-based food for their daily energy intake. As global warming and the spread of diseases (such as the banana Panama disease) are substantially hindering the cultivation of plants, the need to develop temperature- and/or disease-resistant v ...

The Power of Universal Contextualized Protein Embeddings in Cross-species Protein Function Prediction

Journal article (2021) - Irene van den Bent (author), Stavros Makrodimitris (author), MJT Reinders (author)

Computationally annotating proteins with a molecular function is a difficult problem that is made even harder due to the limited amount of available labeled protein training data. Unsupervised protein embeddings partly circumvent this limitation by learning a universal protein re ...

Is Wikipedia succeeding in reducing gender bias? Assessing changes in gender bias in Wikipedia using word embeddings

Conference paper (2020) - Katja Geertruida Schmahl (author), Tom Julian Viering (author), Stavros Makrodimitris (author), Arman Naseri Naseri (author), David M.J. Tax (author), M. Loog (author)

Large text corpora used for creating word embeddings (vectors which represent word meanings) often contain stereotypical gender biases. As a result, such unwanted biases will typically also be present in word embeddings derived from such corpora and downstream applications in the ...

Unsupervised protein embeddings outperform hand-crafted sequence and structure features at predicting molecular function

Journal article (2020) - A.O. Villegas Morcillo (author), Stavros Makrodimitris (author), R.C.H.J. van Ham (author), A.M. Gomez (author), Victoria Sanchez (author), M.J.T. Reinders (author)

Motivation: Protein function prediction is a difficult bioinformatics problem. Many recent methods use deep neural networks to learn complex sequence representations and predict function from these. Deep supervised models require a lot of labeled training data which are not avail ...

Automatic gene function prediction in the 2020’s

Review (2020) - Stavros Makrodimitris (author), Roeland C.H.J. Van Ham (author), Marcel J.T. Reinders (author)

The current rate at which new DNA and protein sequences are being generated is too fast to experimentally discover the functions of those sequences, emphasizing the need for accurate Automatic Function Prediction (AFP) methods. AFP has been an active and growing research field fo ...

A thorough analysis of the contribution of experimental, derived and sequence-based predicted protein-protein interactions for functional annotation of proteins

Journal article (2020) - Stavros Makrodimitris (author), Marcel J.T. Reinders (author), Roeland C.H.J. Van Ham (author)

Physical interaction between two proteins is strong evidence that the proteins are involved in the same biological process, making Protein-Protein Interaction (PPI) networks a valuable data resource for predicting the cellular functions of proteins. However, PPI networks are larg ...

Metric learning on expression data for gene function prediction

Journal article (2020) - Stavros Makrodimitris (author), Marcel J.T. Reinders (author), Roeland C.H.J. Van Ham (author)

Motivation: Co-expression of two genes across different conditions is indicative of their involvement in the same biological process. However, when using RNA-Seq datasets with many experimental conditions from diverse sources, only a subset of the experimental conditions is expec ...

Dynamic clonal hematopoiesis and functional T-cell immunity in a supercentenarian

Journal article (2020) - Erik Ben van den Akker (author), S. Makrodimitris (author), More Authors..., M. Hulsman (author), Martijn H. Brugman (author), Tatjana Nikolic (author), Ted Bradley (author), Quinten Waisfisz (author), Frank Baas (author), Marcel J.T. Reinders (author), H. Holstege (author)

Improving protein function prediction using protein sequence and GO-term similarities

Journal article (2019) - Stavros Makrodimitris (author), Roeland Van Ham (author), M.J.T. Reinders (author)

Motivation: Most automatic functional annotation methods assign Gene Ontology (GO) terms to proteins based on annotations of highly similar proteins. We advocate that proteins that are less similar are still informative. Also, despite their simplicity and structure, GO terms seem ...