Data-centric remedies for challenges in computer vision applications : insights from active learning, deep generative models, and explainable AI
Kaplan, Sinan (2024-05-31)
Doctoral dissertation
Lappeenranta-Lahti University of Technology LUT
Acta Universitatis Lappeenrantaensis
School of Engineering Science
School of Engineering Science, Computational Engineering
All rights reserved.
The permanent address of the publication is
https://urn.fi/URN:ISBN:978-952-412-075-3
Description
No information on accessibility
Abstract
Machine learning in the form of deep neural networks is widely applied in various contexts including computer vision. The robustness of these models relies heavily on the quality of data used for training them. Consequently, challenges with data quantity, quality, diversity, representativeness and transparency become critical in training large-scale deep learning models. This thesis focuses on data-centric techniques to solve specific challenges in selected computer vision tasks.
A case study approach is employed to address three challenges: (1) data collection and sampling, (2) class imbalance in medical image analysis, and (3) interpretability/transparency in data-intensive applications. In the first case, active learning is used to efficiently curate high-quality datasets for human pose estimation in order to improve model performance with minimal data. Second, to make the training data more diverse and tackle the class imbalance issue in retinal image analysis, deep generative models including generative adversarial networks and variational autoencoders (VAE) are studied to generate synthetic retinal images. Third, an explainable artificial intelligence (XAI) method is utilized to enable an online platform for examining the characteristics of data and a model trained for recognizing eye diseases from optical coherence tomography images.
The results obtained from the cases demonstrate that active learning reduces labelling costs while maintaining the performance of a human pose estimation model. In addition, deep generative models, particularly conditional VAE, show promise in generating diverse retinal images to mitigate the class imbalance issue. Lastly, the developed user-centric and interactive platform incorporating XAI provides a starting point for promoting interpretability and transparency in the development of artificial intelligence applications.
Collections
- Doctoral dissertations [1210]
