Synthetic data generation
Generative modelling of medical data for anonymisation purposes
Synthetic data generation with GANs
Using healthcare data to develop artificial intelligence (AI) models raises issues of personal integrity and regulatory compliance. Patient data usually cannot be shared freely, which limits its usefulness for building AI solutions.
What and why?
This project explores generative modelling techniques, specifically generative adversarial networks (GANs), for generating synthetic data, and examines the impact of synthetic data on model performance. It also compares the performance of machine learning models trained on real data with that of models trained on synthetic data, and assesses and compares data leakage. The work includes the following tasks:
- test GANs to generate artificial data (images and text),
- use synthetic data (conditional and unconditional GANs) for balancing classes and examine biases,
- use augmentation for balancing classes,
- test different ratios of real to synthetic data (using provided models, with the help of master's students),
- explain classification results using XAI methods,
- examine controllability in the latent space (master's students),
- combine text and image data in multimodal classification task.
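The class-balancing and real/synthetic ratio tasks above can be illustrated with a minimal sketch. It is not the project's implementation: the function names `synthetic_needed_per_class` and `mix_real_and_synthetic`, the class labels, and the `synthetic_fraction` parameter are hypothetical and chosen only for illustration. The sketch assumes the synthetic samples have already been generated (for example by a conditional GAN) and only shows the bookkeeping around them.

```python
import random
from collections import Counter


def synthetic_needed_per_class(labels):
    """Return how many synthetic samples each class needs to match the largest class."""
    counts = Counter(labels)
    target = max(counts.values())
    return {cls: target - n for cls, n in counts.items()}


def mix_real_and_synthetic(real, synthetic, synthetic_fraction, size, seed=0):
    """Draw `size` samples so that `synthetic_fraction` of them (rounded) are synthetic."""
    if not 0.0 <= synthetic_fraction <= 1.0:
        raise ValueError("synthetic_fraction must be in [0, 1]")
    n_synth = round(size * synthetic_fraction)
    n_real = size - n_synth
    if n_real > len(real) or n_synth > len(synthetic):
        raise ValueError("not enough samples for the requested mix")
    rng = random.Random(seed)  # fixed seed keeps the experiment repeatable
    mixed = rng.sample(real, n_real) + rng.sample(synthetic, n_synth)
    rng.shuffle(mixed)
    return mixed


# Hypothetical imbalanced label set: 8 benign and 2 malignant lesions.
labels = ["benign"] * 8 + ["malignant"] * 2
needed = synthetic_needed_per_class(labels)
print(needed)  # {'benign': 0, 'malignant': 6}

# Hypothetical sample pools; each item is tagged with its origin.
real = [("real", i) for i in range(20)]
synthetic = [("synthetic", i) for i in range(20)]
mixed = mix_real_and_synthetic(real, synthetic, synthetic_fraction=0.3, size=10)
```

Sweeping `synthetic_fraction` from 0.0 to 1.0 and training the same classifier on each mix is one simple way to compare different real/synthetic ratios.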
Technologies used: Python, PyTorch
Methods used: deep neural networks, skin disease detection and recognition, explainable artificial intelligence (XAI), multimodal learning
- Modified version of StyleGAN2-ADA for skin lesion generation
- Experiments using different XAI methods and the ISIC 2020 dataset
- Tabular data generation
- Multimodality for skin lesion classification
- Artificial Intelligence In Healthcare: Is synthetic data the future for improving medical diagnosis? | by Sylwia Majchrowska and Sandra Carrasco | Towards Data Science
- Artificial Intelligence in Healthcare Part II | by Sandra Carrasco and Sylwia Majchrowska | MLearning.ai
- On the evaluation of Generative Adversarial Networks | by Sandra Carrasco and Sylwia Majchrowska | Towards Data Science