- Authors
- Martin Waltz
- Title
- Essays on deep reinforcement learning with applications to path planning and control of autonomous vehicles
- Please use the following URL when citing:
- https://nbn-resolving.org/urn:nbn:de:bsz:14-qucosa2-967505
- Publication date
- 2025
- Date of submission
- 17.09.2024
- Date of defense
- 08.04.2025
- Abstract (EN)
- This thesis contributes to the field of deep reinforcement learning (DRL), providing both methodological advancements and practical applications to planning and control tasks of autonomous vehicles. DRL, which merges the approximation power of neural networks with the reinforcement learning (RL) paradigm, has shown exceptional promise in solving complex sequential decision-making tasks. The first paper of this thesis addresses the inherent action-value estimation bias in the famous $Q$-Learning algorithm by introducing new estimators for the maximum expected value of a set of random variables. These estimators are theoretically analyzed and used to derive novel RL and DRL algorithms with improved bias control. The second paper presents a spatial-temporal recurrent neural network architecture for a DRL agent to perform path planning and control of autonomous vessels on the ocean. The agent considers maritime traffic rules, is robust to partial observability, and performs favourably compared to conventional baselines. The third paper adapts the new network architecture to inland waterways, proposing a holistic architecture for autonomous vessels with two distinct DRL agents for the path planning and following tasks. The agents account for environmental forces, operate effectively in high traffic densities, and respect the geometric constraints of inland waterways. The fourth paper tackles dynamic obstacle avoidance, a critical component of local path planning, for autonomous vehicles operating in non-lane-based traffic environments. A two-step approach is proposed: First, a supervised learning module predicts the trajectories of nearby vehicles, which serves as a basis for computing enhanced collision risk metrics. Second, a DRL agent is trained with an observation space enriched by these risk metrics. This approach significantly reduces collision rates compared to a baseline agent configuration. 
The fifth paper applies DRL to self-organize autonomous aircraft in a terminal arrival problem in the context of urban air mobility. Leveraging the proposed spatial-temporal recurrent network, the aircraft operate efficiently and safely throughout various operations. Additionally, we validate the self-organized DRL approach through real-world drone demonstrations.
- Keywords (EN)
- deep reinforcement learning, path planning, control, autonomous vehicles
- Classification (DDC)
- 380
- Classification (RVK)
- ZO 4660
- WC 7722
- Examiner
- Prof. Dr. Ostap Okhrin
- Prof. Dr. Carlo D'Eramo
- Awarding institution
- Technische Universität Dresden, Dresden
- Version
- published version / publisher's version
- URN Qucosa
- urn:nbn:de:bsz:14-qucosa2-967505
- Qucosa date of publication
- 25.04.2025
- Document type
- Doctoral thesis
- Document language
- English
- Licence
- CC BY 4.0