Computation offloading has been proposed to overcome a key obstacle to highly accurate, real-time deep inference on resource-constrained Internet of Things (IoT) devices. Cooperative deep inference was recently proposed to further mitigate the communication latency introduced by computation offloading: it partitions a Deep Neural Network (DNN) model into two parts and uses the IoT end device and the server to process the model cooperatively. We observe an important fact ignored in all previous works: DNN computation and communication can be conducted simultaneously in cooperative deep inference. As a result, the layer-wise processing schedule of the DNN affects inference latency, and finding the optimal schedule is non-trivial for State-Of-The-Art (SOTA) DNNs with Directed Acyclic Graph (DAG) computational architectures. The contributions of this paper are as follows. (1) The proposed Deep Inference Optimization with Layer-wise Schedule, Deep-Inference-L, is a unique pipeline-based DAG scheduling problem, which turns out to be NP-hard. (2) We categorize SOTA DNNs into three categories and derive the optimal processing schedule in special cases and efficient heuristic schedules in the general case. (3) The proposed solutions are extensively evaluated via a proof-of-concept prototype. (4) Results indicate that our algorithms achieve an 8x speedup over local inference in the best case.
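The overlap of computation and communication described above can be illustrated with a minimal latency model; this is a sketch only, and the branch names and per-branch timings below are hypothetical, not from the paper. Independent DNN branches finished on the device can be transmitted while the device keeps computing other branches, so the order in which branches are processed (the layer-wise schedule) changes the end-to-end makespan, much like a two-machine flow shop:

```python
def makespan(schedule, compute, transmit):
    """Total latency when branches are computed in `schedule` order and
    each branch's output is transmitted as soon as it is ready."""
    t_cpu = 0.0  # time at which the device finishes computing so far
    t_net = 0.0  # time at which the network finishes transmitting so far
    for b in schedule:
        t_cpu += compute[b]                       # device computes branch b
        t_net = max(t_net, t_cpu) + transmit[b]   # then its output is sent
    return t_net

# Hypothetical per-branch compute times and output transmission times (ms).
compute  = {"A": 4.0, "B": 1.0, "C": 2.0}
transmit = {"A": 1.0, "B": 5.0, "C": 2.0}

print(makespan(["A", "B", "C"], compute, transmit))  # 12.0
print(makespan(["B", "C", "A"], compute, transmit))  # 9.0
```

Scheduling branch B (short compute, long transmission) first hides its transmission behind the remaining computation, which is why the layer-wise schedule matters for inference latency.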