Synthetic data generation

Generative modelling of medical data for anonymisation purposes

Last updated on Sep 24, 2022

Photo from SIIM-ISIC Melanoma Classification Kaggle Challenge

Synthetic data generation with GANs

Using healthcare data to develop artificial intelligence (AI) models raises issues of personal integrity and regulatory compliance. Patient data usually cannot be shared freely, which limits its usefulness for building AI solutions.

What and why?

This project explores generative modelling techniques, specifically generative adversarial networks (GANs), for generating synthetic data, and examines the impact of synthetic data on model performance. It also compares the performance of machine learning models trained on real data with that of models trained on synthetic data, and assesses and compares data leakage.

Contribution

Main tasks:

test GANs to generate artificial data (images and text),
use synthetic data (conditional and unconditional GANs) for balancing classes and examine biases,
use augmentation for balancing classes,
test different ratios of real to synthetic data (using provided models, with the help of master's students),
explain classification results using XAI methods,
examine controllability in the latent space (master's students),
combine text and image data in a multimodal classification task.
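As an illustration of the class-balancing and real/synthetic ratio tasks above, here is a minimal sketch in plain Python. It is not the project's code: the function names `balance_with_synthetic` and `mix_at_ratio`, the `(sample, label)` pair layout, and the class labels in the usage note are all assumptions made for this example. In practice the samples would be images or text produced by a trained GAN and wrapped in a PyTorch dataset.

```python
import random
from collections import Counter


def balance_with_synthetic(real, synthetic, seed=0):
    """Top up minority classes in `real` with same-label items from `synthetic`.

    Both arguments are lists of (sample, label) pairs (an assumed layout).
    Each class is filled up to the size of the largest real class, as far
    as the synthetic pool for that class allows.
    """
    rng = random.Random(seed)
    counts = Counter(label for _, label in real)
    target = max(counts.values())

    # Group the synthetic items by label so each class can be sampled separately.
    pool = {}
    for item in synthetic:
        pool.setdefault(item[1], []).append(item)

    combined = list(real)
    for label, n in counts.items():
        need = target - n
        candidates = pool.get(label, [])
        combined.extend(rng.sample(candidates, min(need, len(candidates))))
    return combined


def mix_at_ratio(real, synthetic, synthetic_fraction, size, seed=0):
    """Draw `size` items of which `synthetic_fraction` are synthetic."""
    if not 0.0 <= synthetic_fraction <= 1.0:
        raise ValueError("synthetic_fraction must be between 0 and 1")
    n_synthetic = round(size * synthetic_fraction)
    n_real = size - n_synthetic
    if n_synthetic > len(synthetic) or n_real > len(real):
        raise ValueError("not enough samples for the requested mix")

    rng = random.Random(seed)
    mixed = rng.sample(real, n_real) + rng.sample(synthetic, n_synthetic)
    rng.shuffle(mixed)
    return mixed
```

For example, with 8 real "benign" items, 2 real "melanoma" items, and a pool of synthetic "melanoma" items, `balance_with_synthetic` adds 6 synthetic melanoma items, while `mix_at_ratio(real, synthetic, 0.5, 10)` returns 5 real and 5 synthetic items. Fixing the seed keeps the comparison between ratios reproducible.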

Technologies used: Python, PyTorch

Methods used: deep neural networks, skin disease detection and recognition, explainable artificial intelligence (XAI), multimodal learning

Deliverables

GitHub code:

Medium posts:

Presentations/workshops:

Inauguration of AICC at SUH 29th November 2021: EYE FOR AI
Workshops for Paris University AI4Healthcare 10th February 2022: ai4healthcare workshops
Women in Data Science Ljubljana 12th March 2022: WiDS2022 Ljubljana workshops
Event for Chalmers students at AI Sweden 21st April 2022: Edge Lab - SUH
