Reducing makespans of DAG scheduling through interleaving overlapping resource utilization

Yubin Duan, Ning Wang, W. Jie

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

As data center clusters need to process quintillions of bytes of data per day, efficiently scheduling jobs to improve resource utilization becomes a critical problem. However, a data analysis job usually contains multiple stages with dependency relationships, which makes scheduling challenging. These stages are modeled as Directed Acyclic Graphs (DAGs), and the general DAG scheduling problem is NP-hard. In this paper, we note that in some parallel computing frameworks such as Spark, the execution of each stage can be divided into multiple phases that use different resources. We observe that interleaving the use of different resources in a pipelined manner can improve resource utilization. Based on this observation, we propose to minimize the job makespan by exploiting resource pipelining. We first theoretically analyze scheduling for perfectly parallel stages. In this case, our scheduling problem is equivalent to a DAG shop problem, which is NP-hard. A contention-free scheduler is proposed and its approximation properties are analyzed. Stages of real-world jobs are usually not perfectly parallel. For general jobs, a reinforcement learning (RL) based scheduler is proposed to adaptively adjust the resource contention. We evaluate our contention-free and RL-based schedulers on a Spark cluster deployed on Amazon EC2. Experiments on real-world and synthetic datasets show that our RL-based scheduler can improve CPU and network utilization by 33.0% and 29.7%, respectively.
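
The pipelining observation in the abstract can be illustrated with a toy calculation. The following is a minimal sketch, not the paper's contention-free or RL-based scheduler. It assumes independent stages with no DAG dependencies, exactly two phases per stage (a network phase followed by a CPU phase), and each resource serving one stage at a time. The function names and the stage durations are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# (network_time, cpu_time) per stage; hypothetical durations in arbitrary units
Stage = Tuple[float, float]


def sequential_makespan(stages: List[Stage]) -> float:
    """Run stages back to back: one resource sits idle while the other is busy."""
    return sum(net + cpu for net, cpu in stages)


def pipelined_makespan(stages: List[Stage]) -> float:
    """Overlap the network phase of the next stage with the CPU phase of the current one."""
    net_free = 0.0  # time at which the network finishes its current phase
    cpu_free = 0.0  # time at which the CPU finishes its current phase
    for net, cpu in stages:
        net_free += net  # network phases run one after another
        # The CPU phase starts once its data has arrived and the CPU is free.
        cpu_free = max(cpu_free, net_free) + cpu
    return cpu_free


stages = [(3.0, 2.0), (1.0, 4.0), (2.0, 2.0)]
print(sequential_makespan(stages))  # 14.0
print(pipelined_makespan(stages))   # 11.0
```

In this toy instance, interleaving the two resources shortens the makespan from 14 to 11 time units because the network fetches data for the next stage while the CPU is still computing. The order in which stages enter the pipeline also affects the result, which hints at why the scheduling problem studied in the paper is hard.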

Original language: English (US)
Title of host publication: Proceedings - 2020 IEEE 17th International Conference on Mobile Ad Hoc and Smart Systems, MASS 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 392-400
Number of pages: 9
ISBN (Electronic): 9781728198668
State: Published - Dec 2020
Event: 17th IEEE International Conference on Mobile Ad Hoc and Smart Systems, MASS 2020 - Virtual, Delhi, India
Duration: Dec 10 2020 to Dec 13 2020

Publication series

Name: Proceedings - 2020 IEEE 17th International Conference on Mobile Ad Hoc and Smart Systems, MASS 2020

Conference

Conference: 17th IEEE International Conference on Mobile Ad Hoc and Smart Systems, MASS 2020
Country/Territory: India
City: Virtual, Delhi
Period: 12/10/20 to 12/13/20

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Computer Networks and Communications
  • Computer Science Applications
  • Hardware and Architecture
  • Information Systems and Management
  • Safety, Risk, Reliability and Quality
