G-thinker: A distributed framework for mining subgraphs in a big graph

Da Yan, Guimu Guo, Md Mashiur Rahman Chowdhury, M. Tamer Ozsu, Wei Shinn Ku, John C.S. Lui

Research output: Chapter in Book/Report/Conference proceedingConference contribution

19 Scopus citations

Abstract

Mining from a big graph those subgraphs that satisfy certain conditions is useful in many applications such as community detection and subgraph matching. These problems have a high time complexity, but existing systems to scale them are all IO-bound in execution. We propose the first truly CPU-bound distributed framework called G-thinker that adopts a user-friendly subgraph-centric vertex-pulling API for writing distributed subgraph mining algorithms. To utilize all CPU cores of a cluster, G-thinker features (1) a highly-concurrent vertex cache for parallel task access and (2) a lightweight task scheduling approach that ensures high task throughput. These designs well overlap communication with computation to minimize the CPU idle time. Extensive experiments demonstrate that G-thinker achieves orders of magnitude speedup compared even with the fastest existing subgraph-centric system, and it scales well to much larger and denser real network data. G-thinker is open-sourced at http://bit.ly/gthinker with detailed documentation.

Original languageEnglish (US)
Title of host publicationProceedings - 2020 IEEE 36th International Conference on Data Engineering, ICDE 2020
PublisherIEEE Computer Society
Pages1369-1380
Number of pages12
ISBN (Electronic)9781728129037
DOIs
StatePublished - Apr 2020
Externally publishedYes
Event36th IEEE International Conference on Data Engineering, ICDE 2020 - Dallas, United States
Duration: Apr 20 2020Apr 24 2020

Publication series

NameProceedings - International Conference on Data Engineering
Volume2020-April
ISSN (Print)1084-4627

Conference

Conference36th IEEE International Conference on Data Engineering, ICDE 2020
Country/TerritoryUnited States
CityDallas
Period4/20/204/24/20

All Science Journal Classification (ASJC) codes

  • Software
  • Signal Processing
  • Information Systems

Fingerprint

Dive into the research topics of 'G-thinker: A distributed framework for mining subgraphs in a big graph'. Together they form a unique fingerprint.

Cite this