Robust Wind-Resistant Hovering Control of Quadrotor UAVs Using Deep Reinforcement Learning

Jun Xue, Ziniu Liu, Guanjun Liu, Ziyuan Zhou, Kaiwen Zhang, Ying Tang, Jiacun Wang

Research output: Contribution to journal › Article › peer-review

8 Scopus citations

Abstract

Unmanned Aerial Vehicles (UAVs) have extensive applications, such as logistics transportation and aerial photography. However, UAVs are sensitive to wind. Traditional control methods, such as proportional-integral-derivative (PID) controllers, generally perform poorly when the strength and direction of the wind change frequently. In this work, deep reinforcement learning algorithms are combined with a domain randomization method to learn robust wind-resistant hovering policies. A novel reward function is designed to guide learning. This reward function uses a constant reward to keep the UAV in continuous flight, together with a weighted horizontal distance error to keep the UAV stable at its hovering altitude. A five-dimensional action representation, instead of the traditional four-dimensional one, is designed to strengthen the coordination of the UAV's rotors. We explain the rationale of our reward function theoretically, based on Q-learning and reward shaping theory. Experiments in simulation and in a real-world application both demonstrate the effectiveness of our method. To the best of our knowledge, this is the first paper to use reinforcement learning and domain randomization to address robust wind-resistant hovering control of quadrotor UAVs, providing a new approach to the study of wind-resistant hovering and flight of UAVs.
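The abstract names two ingredients without giving their exact form: a reward made of a constant per-step bonus plus a weighted horizontal distance error, and domain randomization over wind strength and direction. The sketch below shows one plausible shape for each, assuming a linear penalty on horizontal error and uniformly sampled wind that is re-drawn every few steps. The function names (`hover_reward`, `sample_wind`, `wind_schedule`) and all parameter values (`alive_bonus`, `w_xy`, `max_speed`, `resample_every`) are hypothetical illustrations, not the authors' definitions, and the five-dimensional action representation is not modeled because the abstract does not specify it.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Wind:
    speed: float      # wind speed (hypothetical units: m/s)
    direction: float  # heading in the horizontal plane, radians

    def velocity(self) -> tuple:
        """Horizontal wind velocity components (vx, vy)."""
        return (self.speed * math.cos(self.direction),
                self.speed * math.sin(self.direction))


def sample_wind(rng: random.Random, max_speed: float = 5.0) -> Wind:
    """Domain randomization (assumed form): draw a wind speed and direction uniformly."""
    return Wind(speed=rng.uniform(0.0, max_speed),
                direction=rng.uniform(0.0, 2.0 * math.pi))


def wind_schedule(rng: random.Random, steps: int,
                  resample_every: int = 50) -> List[Wind]:
    """Per-step wind for one episode; a new wind is drawn every `resample_every`
    steps so that strength and direction change during the episode."""
    winds: List[Wind] = []
    current = sample_wind(rng)
    for t in range(steps):
        if t > 0 and t % resample_every == 0:
            current = sample_wind(rng)
        winds.append(current)
    return winds


def hover_reward(pos: Sequence[float], target: Sequence[float],
                 alive_bonus: float = 1.0, w_xy: float = 0.5) -> float:
    """Hypothetical hovering reward: a constant bonus for each step the UAV stays
    airborne, minus a weighted horizontal (x, y) distance to the hover target."""
    d_xy = math.hypot(pos[0] - target[0], pos[1] - target[1])
    return alive_bonus - w_xy * d_xy


if __name__ == "__main__":
    rng = random.Random(0)
    episode_wind = wind_schedule(rng, steps=200, resample_every=50)
    print(len(episode_wind), episode_wind[0].velocity())
    print(hover_reward((0.3, -0.4, 1.0), (0.0, 0.0, 1.0)))
```

With these illustrative values, the per-step reward stays positive while the horizontal error is below `alive_bonus / w_xy` (2 units), so an agent that keeps flying near the target accumulates more return than one whose episode ends early. This is the intuition behind pairing a constant reward with a distance penalty, though the paper's actual weights and terms may differ.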

Original language: English (US)
Pages (from-to): 1-10
Number of pages: 10
Journal: IEEE Transactions on Intelligent Vehicles
State: Accepted/In press - 2023

All Science Journal Classification (ASJC) codes

  • Automotive Engineering
  • Control and Optimization
  • Artificial Intelligence
