Fused-Layer-based DNN Model Parallelism and Partial Computation Offloading

Mingze Li, Ning Wang, Huan Zhou, Yubin Duan, Jie Wu

Research output: Contribution to journal › Conference article › peer-review

Abstract

With the development of the Internet of Things (IoT) and advances in deep learning, there is an urgent need to enable deep learning inference on IoT devices. To address the computational limitations of IoT devices in processing complex Deep Neural Networks (DNNs), partial computation offloading has been developed to dynamically adjust the offloading assignment strategy under different channel conditions for better performance. In this paper, we take advantage of intrinsic DNN computation characteristics and propose a novel Fused-Layer-based (FL-based) DNN model parallelism method to accelerate inference. The key idea is that a DNN layer can be converted into several smaller layers to increase the flexibility of partial computation offloading, and thus to yield better computation offloading solutions. However, there is a trade-off between computation offloading flexibility and model parallelism overhead. We then discuss the optimal DNN model parallelism and the corresponding scheduling and offloading strategies in partial computation offloading. In particular, we present a Minimizing Waiting (MW) method, which jointly explores the FL strategy, the path scheduling strategy, and the path offloading strategy to reduce time complexity. Finally, we validate the effectiveness of the proposed method on commonly used DNNs. The results show that the proposed method can reduce the DNN inference time by an average of 18.39 times compared with the No FL (NFL) algorithm, and that it performs very close to the optimal Brute Force (BF) solution with greatly reduced time complexity.
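The key idea and its trade-off can be illustrated with a toy sketch. It assumes that a layer stack is split spatially into tiles, each carrying a halo of overlapping input; the abstract does not confirm that this is how the paper splits layers. The function names, the 1D convolution, and the kernels are all hypothetical choices made for illustration. Each tile can be computed independently (locally or offloaded), but the overlapping halos mean the tiles together read more input than the unsplit layers do.

```python
def conv1d(x, w):
    """Valid 1D convolution (cross-correlation) of list x with kernel w."""
    k = len(w)
    return [sum(x[i + j] * w[j] for j in range(k))
            for i in range(len(x) - k + 1)]


def fused_full(x, kernels):
    """Run the stacked layers over the whole input, one layer after another."""
    for w in kernels:
        x = conv1d(x, w)
    return x


def fused_tiles(x, kernels, num_tiles):
    """Split the final output into num_tiles slices and compute each slice
    independently from the input region it depends on (tile plus halo).

    Returns the concatenated output and the total number of input elements
    read across all tiles (a rough proxy for parallelism overhead).
    """
    # Each valid convolution shrinks the signal by (kernel size - 1).
    halo = sum(len(w) - 1 for w in kernels)
    out_len = len(x) - halo
    bounds = [out_len * t // num_tiles for t in range(num_tiles + 1)]
    out, elements_read = [], 0
    for lo, hi in zip(bounds, bounds[1:]):
        if hi == lo:
            continue  # more tiles than outputs: skip empty slices
        region = x[lo:hi + halo]  # inputs needed for outputs lo..hi-1
        elements_read += len(region)
        out.extend(fused_full(region, kernels))
    return out, elements_read


if __name__ == "__main__":
    x = list(range(16))
    kernels = [[1, 0, -1], [1, 2, 1]]  # hypothetical two-layer stack
    full = fused_full(x, kernels)
    for n in (1, 2, 3, 4):
        tiled, read = fused_tiles(x, kernels, n)
        print(n, tiled == full, read)
```

In this toy setting, three tiles reproduce the unsplit output exactly while reading 24 input elements instead of 16. More tiles give a scheduler more units to place, at the cost of more redundant input.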

Original language: English (US)
Pages (from-to): 5195-5200
Number of pages: 6
Journal: Proceedings - IEEE Global Communications Conference, GLOBECOM
State: Published - 2022
Externally published: Yes
Event: 2022 IEEE Global Communications Conference, GLOBECOM 2022 - Virtual, Online, Brazil
Duration: Dec 4 2022 → Dec 8 2022

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Computer Networks and Communications
  • Hardware and Architecture
  • Signal Processing
