- Author
- Rajasekar Sankar, Technische Universität Dresden
- Title
- Transformer enhanced affordance learning for autonomous driving
- Citable URL
- https://nbn-resolving.org/urn:nbn:de:bsz:14-qucosa2-938738
- First published
- 2024
- Date of submission
- 17.09.2024
- Date of defense
- 30.10.2024
- Abstract (EN)
- Most existing autonomous driving perception approaches rely on the direct perception method with camera sensors, yet they often overlook the valuable 3D spatial data provided by other sensors such as LiDAR. This Master's thesis investigates enhancing affordance learning through a multimodal fusion transformer, aiming to refine autonomous vehicle (AV) perception and scene interpretation by effectively integrating multi-sensor data. Our approach introduces a two-stage network architecture: the first stage employs a backbone to fuse sensor data and extract features, while the second stage employs a Taskblock MLP network to predict both classification affordances (junction, red light, pedestrian, and vehicle hazards) and regression affordances (relative angle, lateral distance, and target vehicle distance). We utilized the TransFuser backbone, based on imitation learning, to integrate image and LiDAR BEV data using a self-attention mechanism and to extract the feature map. Our results are compared against image-only architectures such as Latent TransFuser and against other sensor fusion backbones. Integration with the OmniOpt 2 tool, developed by ScaDS.AI, facilitates hyperparameter optimization and enhances model performance. We assessed our model's effectiveness on the CARLA Town02 dataset as well as the real-world KITTI-360 dataset, demonstrating significant improvements in affordance prediction accuracy and reliability. This advancement underscores the potential of combining LiDAR and image data via transformer-based fusion to create safer and more efficient autonomous driving systems.
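The two-stage design described in the abstract can be sketched in a few lines: a fused feature vector (standing in for the Stage 1 TransFuser output) is fed to one small MLP task block per affordance, with a sigmoid on the four classification heads and raw continuous outputs on the three regression heads. This is a minimal illustrative sketch, not the thesis implementation; the feature dimension, hidden width, and all variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

FEATURE_DIM = 512  # assumed size of the flattened fused feature vector
HIDDEN_DIM = 128   # assumed hidden width of each task-block MLP

CLS_AFFORDANCES = ["junction", "red_light", "pedestrian_hazard", "vehicle_hazard"]
REG_AFFORDANCES = ["relative_angle", "lateral_distance", "target_vehicle_distance"]

def mlp_block(x, w1, b1, w2, b2):
    """One task-block MLP: linear -> ReLU -> linear."""
    h = np.maximum(x @ w1 + b1, 0.0)
    return h @ w2 + b2

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Stage 1 stand-in: in the thesis this vector comes from the TransFuser
# backbone fusing image and LiDAR BEV data via self-attention.
fused_features = rng.standard_normal(FEATURE_DIM)

# Stage 2: one small MLP head per affordance.
predictions = {}
for name in CLS_AFFORDANCES + REG_AFFORDANCES:
    w1 = rng.standard_normal((FEATURE_DIM, HIDDEN_DIM)) * 0.01
    b1 = np.zeros(HIDDEN_DIM)
    w2 = rng.standard_normal((HIDDEN_DIM, 1)) * 0.01
    b2 = np.zeros(1)
    out = mlp_block(fused_features, w1, b1, w2, b2)[0]
    # Classification affordances are squashed to probabilities;
    # regression affordances stay as raw continuous values.
    predictions[name] = sigmoid(out) if name in CLS_AFFORDANCES else out

print({k: round(float(v), 3) for k, v in predictions.items()})
```

In training, the classification heads would pair naturally with a binary cross-entropy loss and the regression heads with an L1 or L2 loss, matching the per-stage loss functions outlined in the thesis.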
- Free keywords (EN)
- Autonomous Driving, Affordance Learning, Sensor fusion, Imitation learning, Transformer, Taskblock MLP
- Classification (DDC)
- 380
- Classification (RVK)
- ZO 4660
- Reviewers
- Prof. Dr. Ostap Okhrin
- Prof. Dr. Georg Hirte
- Supervisor (university)
- M.Sc. Dianzhao Li
- Degree-granting / examining institution
- Technische Universität Dresden, Dresden
- Other participating institution
- Technische Universität Dresden, Dresden
- Version / review status
- published version / publisher's version
- URN Qucosa
- urn:nbn:de:bsz:14-qucosa2-938738
- Qucosa publication date
- 30.10.2024
- Document type
- Master's thesis / state examination thesis
- Language of the document
- English
- License / rights statement
- CC BY 4.0
- Table of Contents
  List of Figures
  List of Tables
  Abbreviations
  1 Introduction
    1.1 Autonomous Driving: Overview
      1.1.1 From highly automated to autonomous
      1.1.2 Autonomy levels
      1.1.3 Perception systems
    1.2 Three Paradigms for autonomous driving
    1.3 Sensor Fusion: Global context capture
    1.4 Research Questions and Methods
      1.4.1 Research Questions (RQ)
      1.4.2 Research Methods (RM)
    1.5 Structure of the work
  2 Research Background
    2.1 Affordance Learning
    2.2 Multi-Modal Autonomous Driving
    2.3 Sensor Fusion Methods for Object Detection and Motion Forecasting
    2.4 Attention for Autonomous Driving
  3 Methodology
    3.1 Problem Formulation
      3.1.1 Problem setting A
      3.1.2 Problem setting B
    3.2 Input and Output parametrization
      3.2.1 Input Representation
      3.2.2 Output Representation
    3.3 Definition of affordances
    3.4 Proposed Methodology
    3.5 Detailed overview of the Proposed Architecture
      3.5.1 Stage 1: TransFuser Backbone - Multimodal fusion transformer
      3.5.2 Fused Feature extraction
      3.5.3 Annotations extraction
      3.5.4 Stage 2: Task-Block MLP Network architecture
    3.6 Loss Functions
      3.6.1 Stage 1: Loss Function
      3.6.2 Stage 2: Loss Function
      3.6.3 Total Loss Function
    3.7 Other Backbone Architectures
      3.7.1 Latent TransFuser
      3.7.2 Geometric Fusion
      3.7.3 Late Fusion
    3.8 Hyperparameter Optimization: OmniOpt 2
  4 Training and Validation
    4.1 Dataset definition
      4.1.1 Types of Data
      4.1.2 Overview of Dataset Distribution
    4.2 Implementation Details
    4.3 Training
      4.3.1 Stage 1: Backbone architecture training
      4.3.2 Stage 2: TaskBlock MLP training
      4.3.3 Training Parameter Study
    4.4 Loss curves
      4.4.1 Stage 1 Loss curve
      4.4.2 Stage 2 Loss curve
    4.5 Validation
      4.5.1 Preparation of an optimization project
  5 Experimental Results
    5.1 Quantitative Insights into Regression-Based Affordance Predictions
      5.1.1 Comparative Analysis of Error Metrics against each Backbone
      5.1.2 Graphical Analysis of error metrics performance for TransFuser
    5.2 Quantitative Insights into Classification-Based Affordance Predictions
      5.2.1 Comparative Analysis of Classification Performance Metrics against each Backbone
      5.2.2 Graphical Analysis of classification performance for TransFuser
    5.3 OmniOpt 2 Hyperparameter Optimization Results
    5.4 Affordance Prediction Dashboard
  6 Evaluation
    6.1 Evaluation with CARLA Test dataset
      6.1.1 Results
    6.2 Evaluation with real world: The KITTI Dataset
      6.2.1 Dataset
      6.2.2 Results
  7 Conclusion
  Appendix
  A Ablation Study
    A.1 Latent TransFuser with MLP
    A.2 Results
      A.2.1 Comparative Analysis of Error Metrics in Latent TransFuser with Transformer and MLP
      A.2.2 Comparative Analysis of Classification Performance Metrics in Latent TransFuser with Transformer and MLP