TY - GEN
T1 - Fused-Layer-based DNN Model Parallelism and Partial Computation Offloading
AU - Li, Mingze
AU - Wang, Ning
AU - Zhou, Huan
AU - Duan, Yubin
AU - Wu, Jie
N1 - Funding Information:
This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grants No. 62172255 and No. 61872221. The corresponding author is H. Zhou.
Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - With the development of the Internet of Things (IoT) and advances in deep learning, there is an urgent need to enable deep learning inference on IoT devices. To address the computational limitations of IoT devices in processing complex Deep Neural Networks (DNNs), partial computation offloading has been developed to dynamically adjust the offloading assignment strategy under different channel conditions for better performance. In this paper, we take advantage of intrinsic DNN computation characteristics and propose a novel Fused-Layer-based (FL-based) DNN model parallelism method to accelerate inference. The key idea is that a DNN layer can be converted into several smaller layers to increase partial computation offloading flexibility, thus enabling better computation offloading solutions. However, there is a trade-off between computation offloading flexibility and model parallelism overhead. We then discuss the optimal DNN model parallelism and the corresponding scheduling and offloading strategies in partial computation offloading. In particular, we present a Minimizing Waiting (MW) method, which jointly explores the FL strategy, the path scheduling strategy, and the path offloading strategy to reduce time complexity. Finally, we validate the effectiveness of the proposed method on commonly used DNNs. The results show that the proposed method can reduce DNN inference time by an average factor of 18.39 compared with the No FL (NFL) algorithm, and comes very close to the optimal Brute Force (BF) solution with greatly reduced time complexity.
AB - With the development of the Internet of Things (IoT) and advances in deep learning, there is an urgent need to enable deep learning inference on IoT devices. To address the computational limitations of IoT devices in processing complex Deep Neural Networks (DNNs), partial computation offloading has been developed to dynamically adjust the offloading assignment strategy under different channel conditions for better performance. In this paper, we take advantage of intrinsic DNN computation characteristics and propose a novel Fused-Layer-based (FL-based) DNN model parallelism method to accelerate inference. The key idea is that a DNN layer can be converted into several smaller layers to increase partial computation offloading flexibility, thus enabling better computation offloading solutions. However, there is a trade-off between computation offloading flexibility and model parallelism overhead. We then discuss the optimal DNN model parallelism and the corresponding scheduling and offloading strategies in partial computation offloading. In particular, we present a Minimizing Waiting (MW) method, which jointly explores the FL strategy, the path scheduling strategy, and the path offloading strategy to reduce time complexity. Finally, we validate the effectiveness of the proposed method on commonly used DNNs. The results show that the proposed method can reduce DNN inference time by an average factor of 18.39 compared with the No FL (NFL) algorithm, and comes very close to the optimal Brute Force (BF) solution with greatly reduced time complexity.
UR - http://www.scopus.com/inward/record.url?scp=85146936617&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85146936617&partnerID=8YFLogxK
U2 - 10.1109/GLOBECOM48099.2022.10000779
DO - 10.1109/GLOBECOM48099.2022.10000779
M3 - Conference contribution
AN - SCOPUS:85146936617
T3 - 2022 IEEE Global Communications Conference, GLOBECOM 2022 - Proceedings
SP - 5195
EP - 5200
BT - 2022 IEEE Global Communications Conference, GLOBECOM 2022 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2022 IEEE Global Communications Conference, GLOBECOM 2022
Y2 - 4 December 2022 through 8 December 2022
ER -