Fused-Layer-based DNN Model Parallelism and Partial Computation Offloading

Mingze Li, Ning Wang, Huan Zhou, Yubin Duan, Jie Wu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

With the development of the Internet of Things (IoT) and advances in deep learning, there is an urgent need to enable deep learning inference on IoT devices. To address the limited computation capacity of IoT devices when processing complex Deep Neural Networks (DNNs), partial computation offloading has been developed to dynamically adjust the computation offloading assignment under different channel conditions for better performance. In this paper, we take advantage of intrinsic DNN computation characteristics and propose a novel Fused-Layer-based (FL-based) DNN model parallelism method to accelerate inference. The key idea is that a DNN layer can be converted into several smaller layers to increase the flexibility of partial computation offloading, and thus to yield better computation offloading solutions. However, there is a trade-off between the offloading flexibility gained from parallelism and the model parallelism overhead. We therefore discuss the optimal DNN model parallelism and the corresponding scheduling and offloading strategies in partial computation offloading. In particular, we present a Minimizing Waiting (MW) method, which explores the FL strategy, the path scheduling strategy, and the path offloading strategy together to reduce time complexity. Finally, we validate the effectiveness of the proposed method on commonly used DNNs. The results show that the proposed method can reduce the DNN inference time by an average of 18.39 times compared with the No FL (NFL) algorithm, and is very close to the optimal Brute Force (BF) solution with greatly reduced time complexity.
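
The trade-off named in the abstract can be illustrated with a toy example. The sketch below is not the authors' MW method or their FL formulation; it is a minimal illustration, assuming two stacked 1D convolution layers, of how a fused stack of layers can be split into independent partitions that each could be scheduled or offloaded separately. The function names (`conv1d`, `run_layers`, `fused_partitions`) and the input sizes are hypothetical. Each partition needs a "halo" of extra input elements, so the total input processed grows with the number of partitions, which is one simple form of parallelism overhead.

```python
def conv1d(x, w):
    """Valid 1D convolution (cross-correlation) of list x with kernel w."""
    k = len(w)
    return [sum(x[i + j] * w[j] for j in range(k)) for i in range(len(x) - k + 1)]


def run_layers(x, kernels):
    """Apply the convolution layers one after another."""
    for w in kernels:
        x = conv1d(x, w)
    return x


def fused_partitions(x, kernels, parts):
    """Split the final output into `parts` contiguous slices and compute each
    slice from its own input tile, running that tile through all layers
    independently. Assumes parts <= number of final outputs.

    Returns (concatenated outputs, total input elements read by all tiles).
    """
    halo = sum(len(w) - 1 for w in kernels)  # extra inputs every tile needs
    out_len = len(x) - halo                  # length of the final output
    base, extra = divmod(out_len, parts)
    outputs, input_elems, start = [], 0, 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)  # outputs owned by this tile
        tile = x[start:start + size + halo]    # inputs this tile depends on
        input_elems += len(tile)
        outputs.extend(run_layers(tile, kernels))
        start += size
    return outputs, input_elems


if __name__ == "__main__":
    x = list(range(16))
    kernels = [[1, 0, -1], [1, 2, 1]]  # two illustrative 3-tap layers
    full = run_layers(x, kernels)
    for parts in (1, 2, 4):
        out, n = fused_partitions(x, kernels, parts)
        # Every partitioning reproduces the unpartitioned result.
        print(parts, out == full, n)
    # prints:
    # 1 True 16
    # 2 True 20
    # 4 True 28
```

In this toy setting, more partitions give a scheduler more independent units to place on the device or the server, but the overlapping halos mean more data is read and recomputed (16, 20, then 28 input elements here), which mirrors the flexibility-versus-overhead tension the abstract describes.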

Original language: English (US)
Title of host publication: 2022 IEEE Global Communications Conference, GLOBECOM 2022 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 5195-5200
Number of pages: 6
ISBN (Electronic): 9781665435406
State: Published - 2022
Event: 2022 IEEE Global Communications Conference, GLOBECOM 2022 - Virtual, Online, Brazil
Duration: Dec 4 2022 → Dec 8 2022

Publication series

Name: 2022 IEEE Global Communications Conference, GLOBECOM 2022 - Proceedings

Conference

Conference: 2022 IEEE Global Communications Conference, GLOBECOM 2022
Country/Territory: Brazil
City: Virtual, Online
Period: 12/4/22 → 12/8/22

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Computer Networks and Communications
  • Hardware and Architecture
  • Signal Processing
  • Renewable Energy, Sustainability and the Environment
  • Safety, Risk, Reliability and Quality
