A traffic sign classification model based on vision transformer with convolution and GAN-based data
Liang, Zhikuo (2025)
Bachelor's thesis
School of Engineering Science, Information Technology
The permanent address of this publication is
https://urn.fi/URN:NBN:fi-fe2025052253054
Abstract
Traffic sign recognition is fundamental to advanced driver assistance systems and autonomous driving, yet achieving robustness under diverse conditions such as fog remains challenging. Convolutional neural networks (CNNs) dominate the field; Vision Transformers (ViTs) offer an alternative but can struggle to capture local features and to learn from limited data. This thesis investigates ViT-based models for traffic sign classification, enhanced through hybrid architectures and data augmentation. The study adapts two ViT-CoMer variants: a baseline and a proposed lightweight hybrid that integrates convolutional layers. To address the scarcity of adverse-weather data, Foggy-CycleGAN is used to generate synthetic foggy images from the German Traffic Sign Recognition Benchmark (GTSRB), which serve as augmentation data. Using MMDetection, the study compares these ViT variants, trained on the original and on the GAN-augmented GTSRB, against CNN and standard ViT baselines. The results show that the ViT models achieve competitive performance, especially with GAN augmentation. The augmented hybrid, ViTBaseline_2, performs particularly well on large objects (AP 0.881; APl, the AP on large objects, 0.993), and the augmented baseline, ViTBaseline, also improves significantly, reaching a peak AP of 0.887. These findings highlight the potential of combining ViTs, convolutional elements, and GAN augmentation for robust traffic sign classification, especially under adverse conditions.
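The abstract reports both an overall AP and APl, the AP restricted to large objects. Assuming these figures follow the COCO evaluation protocol (the usual choice for COCO-style evaluation in MMDetection; the abstract does not state this explicitly), each object is assigned to a size bucket by its area, using thresholds of 32² and 96² pixels, and APl is the AP computed only over the "large" bucket. The minimal sketch below illustrates that bucketing step only; it does not compute AP, and the function names and example boxes are hypothetical, not taken from the thesis.

```python
# Area thresholds (in pixels^2) from the COCO evaluation protocol.
# Assumption: the thesis's APl uses these COCO size ranges.
SMALL_MAX = 32 ** 2   # 1024 px^2
MEDIUM_MAX = 96 ** 2  # 9216 px^2


def size_bucket(width: float, height: float) -> str:
    """Assign a bounding box to a COCO-style size bucket by its area.

    Simplification: pycocotools treats range boundaries inclusively,
    so a box of exactly 32x32 or 96x96 px can count in two buckets there;
    this sketch uses half-open intervals instead.
    """
    area = width * height
    if area < SMALL_MAX:
        return "small"
    if area < MEDIUM_MAX:
        return "medium"
    return "large"


def group_by_size(boxes):
    """Group (width, height) boxes into small/medium/large lists."""
    groups = {"small": [], "medium": [], "large": []}
    for w, h in boxes:
        groups[size_bucket(w, h)].append((w, h))
    return groups


if __name__ == "__main__":
    # Hypothetical sign bounding boxes (width, height) in pixels.
    boxes = [(20, 20), (40, 50), (120, 110)]
    for name, items in group_by_size(boxes).items():
        print(name, items)
```

Under this protocol, a very high APl combined with a lower overall AP suggests that the remaining errors are concentrated among smaller signs.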
