Abstract

The advancement of machine learning depends heavily on the quality of the data used to train models. This thesis explores the enhancement of data quality and data augmentation using Generative Adversarial Networks (GANs), specifically the Pix2Pix architecture. The research addresses three critical challenges: improving image quality, optimizing hyperparameters, and detecting fake images generated by GANs, with the aim of enhancing the robustness and reliability of machine learning models. This work presents a collection of metrics for measuring data quality for both structured and unstructured data. Furthermore, it defines data quality for machine learning in terms of three quality levels. It also discusses advanced GAN-based augmentation techniques, demonstrating how Pix2Pix can generate realistic and varied synthetic samples to address underrepresented data in training sets. To further enhance the quality of Pix2Pix's output, the thesis presents a grid-based search strategy for efficient hyperparameter tuning of Pix2Pix that incorporates early stopping criteria to save time while maintaining high image quality. In addition, results are presented that validate the use of neural networks to predict the performance impact of different hyperparameter configurations, reducing the need for extensive manual tuning and streamlining the optimization process. Higher-quality synthetic data in turn requires a suitable and dependable method for detecting synthetic samples. Another important line of research presented in this work is therefore the detection of synthetic images using image histograms and pre-trained deep learning models. The research presents a reliable method for distinguishing between real and synthetic images, which is crucial for maintaining data integrity in various applications.
The practical applications of these results span multiple domains where high-quality data is essential, including healthcare, manufacturing, and the detection of GAN-generated faces to counter manipulation. The findings offer valuable insights and methodologies for improving data quality and model performance, contributing to the advancement of artificial intelligence and its real-world applications.
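The grid-based hyperparameter search with early stopping described in the abstract can be illustrated with a small sketch. This is a hypothetical example, not the thesis's implementation: `toy_validation_loss` stands in for training Pix2Pix for one more epoch and returning a validation loss, the hyperparameter names `lr` and `lambda_l1` and their candidate values are assumptions, and the `patience`/`min_delta` rule is one common early stopping criterion rather than necessarily the one used in the thesis.

```python
import itertools
from typing import Callable, Dict, List, Optional, Tuple

Config = Dict[str, float]


def grid_search_early_stopping(
    grid: Dict[str, List[float]],
    evaluate_epoch: Callable[[Config, int], float],
    max_epochs: int = 50,
    patience: int = 5,
    min_delta: float = 1e-3,
) -> Tuple[Optional[Config], float, int]:
    """Try every combination in `grid`; stop each run once it stops improving.

    Returns the best configuration, its best validation loss (lower is better),
    and the total number of epochs actually evaluated across all runs.
    """
    best_config: Optional[Config] = None
    best_loss = float("inf")
    epochs_used = 0
    keys = sorted(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        config = dict(zip(keys, values))
        run_best, stale = float("inf"), 0
        for epoch in range(max_epochs):
            loss = evaluate_epoch(config, epoch)
            epochs_used += 1
            if loss < run_best - min_delta:
                run_best, stale = loss, 0  # meaningful improvement: reset patience
            else:
                stale += 1
                if stale >= patience:
                    break  # early stop: no improvement for `patience` epochs
        if run_best < best_loss:
            best_config, best_loss = config, run_best
    return best_config, best_loss, epochs_used


def toy_validation_loss(config: Config, epoch: int) -> float:
    """Hypothetical stand-in for one epoch of Pix2Pix training.

    The loss falls linearly for ten epochs, then plateaus at a floor that is
    lowest for lr=2e-4 and lambda_l1=100 (values assumed for illustration).
    """
    floor = abs(config["lr"] - 2e-4) * 1000 + abs(config["lambda_l1"] - 100) / 100
    return floor + max(0.0, 1.0 - 0.1 * epoch)


search_space = {"lr": [1e-4, 2e-4, 5e-4], "lambda_l1": [10, 100]}
best_config, best_loss, epochs_used = grid_search_early_stopping(
    search_space, toy_validation_loss
)
print(best_config, best_loss, epochs_used)
```

With these synthetic loss curves every run plateaus after ten epochs, so each of the six configurations stops after 16 evaluations instead of the 50-epoch budget (96 epochs in total rather than 300), which is where the time savings of early stopping come from.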

Keywords

Computer Vision, Generative Adversarial Networks, Data Quality, Machine Learning

Document Type

Thesis

Publication Date

2025

Embargo Period

2025-04-22

Creative Commons License

Creative Commons Attribution-NonCommercial 4.0 International License
