TY - JOUR
T1 - Robust Wind-Resistant Hovering Control of Quadrotor UAVs Using Deep Reinforcement Learning
AU - Xue, Jun
AU - Liu, Ziniu
AU - Liu, Guanjun
AU - Zhou, Ziyuan
AU - Zhang, Kaiwen
AU - Tang, Ying
AU - Wang, Jiacun
N1 - Publisher Copyright: IEEE
PY - 2023
Y1 - 2023
N2 - Unmanned Aerial Vehicles (UAVs) have extensive applications such as logistics transportation and aerial photography. However, UAVs are sensitive to winds. Traditional control methods, such as proportional-integral-derivative controllers, generally fail to work well when the strength and direction of winds are changing frequently. In this work, deep reinforcement learning algorithms are combined with a domain randomization method to learn robust wind-resistant hovering policies. A novel reward function is designed to guide learning. This reward function uses a constant reward to maintain a continuous flight of a UAV as well as a weight of the horizontal distance error to ensure the stability of the UAV at altitude. A five-dimensional representation of actions instead of the traditional four dimensions is designed to strengthen the coordination of wings of a UAV. We theoretically explain the rationality of our reward function based on the theories of Q-learning and reward shaping. Experiments in simulation and in a real-world application both illustrate the effectiveness of our method. To the best of our knowledge, this is the first paper to use reinforcement learning and domain randomization to explore the problem of robust wind-resistant hovering control of quadrotor UAVs, providing a new way for the study of wind-resistant hovering and flying of UAVs.
AB - Unmanned Aerial Vehicles (UAVs) have extensive applications such as logistics transportation and aerial photography. However, UAVs are sensitive to winds. Traditional control methods, such as proportional-integral-derivative controllers, generally fail to work well when the strength and direction of winds are changing frequently. In this work, deep reinforcement learning algorithms are combined with a domain randomization method to learn robust wind-resistant hovering policies. A novel reward function is designed to guide learning. This reward function uses a constant reward to maintain a continuous flight of a UAV as well as a weight of the horizontal distance error to ensure the stability of the UAV at altitude. A five-dimensional representation of actions instead of the traditional four dimensions is designed to strengthen the coordination of wings of a UAV. We theoretically explain the rationality of our reward function based on the theories of Q-learning and reward shaping. Experiments in simulation and in a real-world application both illustrate the effectiveness of our method. To the best of our knowledge, this is the first paper to use reinforcement learning and domain randomization to explore the problem of robust wind-resistant hovering control of quadrotor UAVs, providing a new way for the study of wind-resistant hovering and flying of UAVs.
UR - http://www.scopus.com/inward/record.url?scp=85174856965&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85174856965&partnerID=8YFLogxK
U2 - 10.1109/TIV.2023.3324687
DO - 10.1109/TIV.2023.3324687
M3 - Article
AN - SCOPUS:85174856965
SN - 2379-8858
SP - 1
EP - 10
JO - IEEE Transactions on Intelligent Vehicles
JF - IEEE Transactions on Intelligent Vehicles
ER -