Generative Adversarial Nets for generating synthetic Imaging Mass Spectrometry data
Abstract
This report investigates the use of Generative Adversarial Nets (GANs) for oversampling Imaging Mass Spectrometry (IMS) spectra. IMS is a technique used to measure the spatial distribution of molecules, which is valuable in fields such as oncology and biomarker discovery.
GANs are a class of machine learning frameworks in which two neural networks, the generator and the discriminator, are trained simultaneously in an adversarial process. The generator creates synthetic data, while the discriminator tries to distinguish between real and synthetic data.
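For reference, the adversarial training described above is commonly written as the standard GAN minimax game, in which the discriminator D maximizes and the generator G minimizes the value function below (x is a real spectrum, z is the noise input to the generator). The report may use a different loss variant; this is the textbook formulation, shown only to make the "adversarial process" precise.

```latex
\min_{G} \max_{D} \; V(D, G)
  = \mathbb{E}_{x \sim p_{\text{data}}}\!\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_{z}}\!\left[\log\bigl(1 - D(G(z))\bigr)\right]
```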
GAN-based oversampling aims to increase classifier performance by adding synthetic samples to classes that are underrepresented in the original data. Synthetic oversampling is especially relevant for IMS data because the measurement technique is destructive, which makes acquiring more real samples impossible. GANs have been shown to outperform other oversampling techniques, such as SMOTE, on various datasets. However, applying GANs directly to this dataset proved unsuccessful for the oversampling task.
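To make the oversampling idea concrete, here is a minimal sketch, assuming NumPy feature matrices `X` (one spectrum per row) and class labels `y`, that tops up every minority class to the size of the largest class. The names `oversample` and `smote_like` are hypothetical. The sample generator is pluggable; a SMOTE-style interpolator stands in for it here, not a GAN, and a class-conditional GAN generator with the same call signature could be substituted.

```python
import numpy as np


def smote_like(X_cls, n_new, k=5, rng=None):
    """SMOTE-style interpolation (stand-in generator, not the report's GAN).

    Each synthetic sample lies on the segment between a random sample of the
    class and one of its k nearest neighbours within that class.
    Requires at least two samples in X_cls.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = X_cls.shape[0]
    k = min(k, n - 1)
    # Pairwise squared Euclidean distances within the class
    d2 = ((X_cls[:, None, :] - X_cls[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(d2, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)
    partner = neighbours[base, rng.integers(0, k, size=n_new)]
    lam = rng.uniform(size=(n_new, 1))  # interpolation weight in [0, 1)
    return X_cls[base] + lam * (X_cls[partner] - X_cls[base])


def oversample(X, y, generate=smote_like, rng=None):
    """Append generated samples until every class matches the largest class."""
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for c, cnt in zip(classes, counts):
        if cnt < target:
            X_new = generate(X[y == c], target - cnt, rng=rng)
            X_parts.append(X_new)
            y_parts.append(np.full(target - cnt, c))
    return np.vstack(X_parts), np.concatenate(y_parts)
```

The original samples are kept unchanged at the start of the returned arrays, so the classifier still sees all real spectra. Because this stand-in only interpolates between existing minority samples, it cannot produce points outside their range, whereas a generative model is not restricted in that way.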
Several possible causes of the GANs' limited performance are studied, leading to improved experimental results when using spectra of reduced dimension and Wasserstein GANs with gradient penalty (WGAN-GP). Although these changes make the GANs appear to generate more realistic data, using this data for oversampling does not increase overall classifier performance. Instead, it causes the classifier to overfit towards the minority classes.
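The gradient penalty can be illustrated with a minimal NumPy sketch of the WGAN-GP critic loss: mean critic score on generated samples, minus mean score on real samples, plus lambda times the mean of (gradient norm minus 1) squared, where the gradient is taken at random interpolations between real and generated samples. The report does not specify its critic architecture, so the toy critic D(x) = sum of tanh(Wx), the function names, and lambda = 10 (the value suggested in the original WGAN-GP paper) are assumptions. A real implementation would obtain the input gradient by automatic differentiation; the analytic gradient here only keeps the example self-contained.

```python
import numpy as np


def critic(x, W):
    """Hypothetical toy critic D(x) = sum_j tanh((W x)_j); x: (n, d), W: (h, d)."""
    return np.tanh(x @ W.T).sum(axis=1)


def critic_grad(x, W):
    """Analytic gradient of the toy critic with respect to its input, shape (n, d)."""
    # d/dx sum_j tanh(w_j . x) = sum_j (1 - tanh^2(w_j . x)) * w_j
    return (1.0 - np.tanh(x @ W.T) ** 2) @ W


def wgan_gp_critic_loss(real, fake, W, lam=10.0, rng=None):
    """Critic loss to minimize: E[D(fake)] - E[D(real)] + lam * gradient penalty."""
    rng = np.random.default_rng() if rng is None else rng
    # One interpolation coefficient per sample, drawn uniformly from [0, 1)
    eps = rng.uniform(size=(real.shape[0], 1))
    x_hat = eps * real + (1.0 - eps) * fake
    grad_norm = np.linalg.norm(critic_grad(x_hat, W), axis=1)
    # Penalize deviations of the gradient norm from 1 (soft 1-Lipschitz constraint)
    penalty = np.mean((grad_norm - 1.0) ** 2)
    return critic(fake, W).mean() - critic(real, W).mean() + lam * penalty
```

With an all-zero weight matrix the critic is constant, its gradient norm is 0, and the loss reduces to lambda times 1, which is a quick sanity check that the penalty term is wired correctly.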
This report demonstrates that applying the designed GANs for oversampling minority classes on this dataset does not increase classifier performance. However, it also shows that GANs can be trained on IMS data and that they may be useful for applications with IMS data other than oversampling.