Toward event-based object detection for autonomous vehicles
Han, Zongshuo (2026)
Bachelor's thesis
School of Engineering Science, Tietotekniikka (Information Technology)
All rights reserved.
Permanent address of the publication:
https://urn.fi/URN:NBN:fi-fe2026051847973
Abstract
Asynchronous, event-driven vision sensors offer clear advantages in latency and dynamic range for autonomous driving; however, the sparsity of their output poses a significant challenge to conventional frame-based object detection frameworks. This study systematically evaluates the YOLO (You Only Look Once) architecture, comparing YOLOv3 with the more recent YOLOv8 on the PROPHESEE Gen1 event dataset. To address the class imbalance and data sparsity inherent in neuromorphic data streams, we implement and analyze dataset optimization methods and two temporal encoding schemes: Surface of Active Events (SAE) and Temporal Binary Representation (TBR). Experimental results show that YOLOv8 substantially improves detection performance over YOLOv3. With a short 2.5-millisecond accumulation window (high temporal resolution), the lightweight YOLOv8n outperforms the full YOLOv3 model across all metrics, achieving an mAP50 of 0.535 while sustaining a real-time throughput of 284.7 FPS.
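The two encodings named above can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: it assumes events arrive as hypothetical `(t_us, x, y, polarity)` tuples with microsecond timestamps, ignores polarity (many implementations keep one channel per polarity), and, for TBR, assumes the earliest time bin maps to the least significant bit. The names `sae_frame` and `tbr_frame` are illustrative only.

```python
from typing import List, Sequence, Tuple

# Hypothetical event layout (assumption): (timestamp in microseconds, x, y, polarity)
Event = Tuple[int, int, int, int]
Frame = List[List[float]]


def sae_frame(events: Sequence[Event], width: int, height: int,
              t_start: int, t_end: int) -> Frame:
    """Surface of Active Events: each pixel holds the timestamp of its most
    recent event in [t_start, t_end), normalised to (0, 1]; 0.0 = no event."""
    frame = [[0.0] * width for _ in range(height)]
    span = t_end - t_start
    for t, x, y, _polarity in events:  # polarity ignored in this sketch
        if t_start <= t < t_end:
            # +1 us so an event exactly at t_start is not confused with "no event";
            # max() keeps the latest timestamp even if events are unsorted.
            frame[y][x] = max(frame[y][x], (t - t_start + 1) / span)
    return frame


def tbr_frame(events: Sequence[Event], width: int, height: int,
              t_start: int, t_end: int, n_bins: int = 8) -> Frame:
    """Temporal Binary Representation: split the window into n_bins equal bins,
    set bit i of a pixel if it fired in bin i, then read the bits as one
    binary number normalised to [0, 1]."""
    codes = [[0] * width for _ in range(height)]
    bin_len = (t_end - t_start) / n_bins
    for t, x, y, _polarity in events:
        if t_start <= t < t_end:
            i = min(int((t - t_start) / bin_len), n_bins - 1)
            codes[y][x] |= 1 << i  # assumption: earliest bin = least significant bit
    max_code = (1 << n_bins) - 1
    return [[c / max_code for c in row] for row in codes]


# Usage: a 4x3 sensor and an 8 ms window (timestamps in microseconds).
events = [(500, 0, 0, 1), (7500, 0, 0, 0), (4000, 2, 1, 1), (9000, 3, 2, 1)]
sae = sae_frame(events, width=4, height=3, t_start=0, t_end=8000)
tbr = tbr_frame(events, width=4, height=3, t_start=0, t_end=8000)
```

In this example, pixel (0, 0) fired in the first and last 1 ms bins, so its TBR code is binary 10000001 = 129, normalised to 129/255, whereas its SAE value only records the later event (7501/8000). SAE thus keeps the most recent timing per pixel, while TBR keeps a coarse history of when the pixel was active within the window.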
The study also examines the trade-off between temporal resolution and feature density: increasing the accumulation window to 30 milliseconds improved pedestrian mAP50 by 24.3%, effectively alleviating the sparsity bottleneck. A key finding is that the choice of encoding strategy affects the results: TBR captures structured vehicle motion with high precision, whereas SAE is better suited to pedestrians, significantly improving recall. Finally, a real-time inference system was developed that processes raw .dat event files on a GPU with a stable latency of approximately 2.0 milliseconds. This demonstrates the feasibility of deploying the optimized YOLOv8 architecture in high-speed, safety-critical vision applications.
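The resolution/density trade-off can be made concrete with a small sketch. It assumes time-sorted events stored as hypothetical `(t_us, x, y, polarity)` tuples and uses a synthetic constant-rate stream rather than Gen1 data; `split_into_windows` and `mean_events_per_window` are illustrative helpers, not the thesis's code.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical event layout (assumption): (timestamp in microseconds, x, y, polarity)
Event = Tuple[int, int, int, int]


def split_into_windows(events: Sequence[Event], window_us: int) -> Dict[int, List[Event]]:
    """Group time-sorted events into consecutive, non-overlapping windows of
    window_us microseconds, keyed by window index; empty windows are omitted."""
    windows: Dict[int, List[Event]] = {}
    if not events:
        return windows
    t0 = events[0][0]  # windows are aligned to the first event's timestamp
    for ev in events:
        windows.setdefault((ev[0] - t0) // window_us, []).append(ev)
    return windows


def mean_events_per_window(events: Sequence[Event], window_us: int) -> float:
    """Average event count over non-empty windows: a crude proxy for feature density."""
    windows = split_into_windows(events, window_us)
    return sum(len(w) for w in windows.values()) / len(windows) if windows else 0.0


# Synthetic stream (not Gen1 data): one event every 100 us for 60 ms at one pixel.
stream = [(t, 10, 20, 1) for t in range(0, 60_000, 100)]
for window_ms in (2.5, 30):
    window_us = int(window_ms * 1000)
    # prints 25 events per window for 2.5 ms and 300 for 30 ms
    print(f"{window_ms} ms -> {mean_events_per_window(stream, window_us):.0f} events per window")
```

At a constant event rate, a 30 ms window holds 12 times as many events as a 2.5 ms window (30 / 2.5 = 12). This is the density gain that helps sparse classes such as pedestrians, at the cost of each frame summarising a longer stretch of motion.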
