PrefixFPM: A parallel framework for general-purpose frequent pattern mining

Da Yan, Wenwen Qu, Guimu Guo, Xiaoling Wang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

21 Scopus citations

Abstract

Frequent pattern mining (FPM) has been a central theme in data mining research for decades, but the field lacks a general programming framework that can be easily customized to mine different kinds of frequent patterns. Existing solutions for FPM over big transaction databases are IO-bound, which leaves CPU cores underutilized even though FPM is NP-hard. This paper presents PrefixFPM, a general-purpose framework for FPM that is able to fully utilize the CPU cores in a multicore machine. PrefixFPM follows the idea of prefix projection to partition the workloads of FPM into independent tasks by divide and conquer. PrefixFPM exposes a unified programming interface that users can customize to mine their desired patterns; the parallel execution engine is transparent to end users and can be reused for mining all kinds of patterns. We have adapted the state-of-the-art serial algorithms for mining frequent patterns, including subsequences, subtrees, and subgraphs, on top of PrefixFPM, and extensive experiments demonstrate an excellent speedup ratio of PrefixFPM with the number of cores. A demo is available at https://youtu.be/PfioC0GDpsw; the code is available at https://github.com/yanlab19870714/PrefixFPM.
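The prefix-projection idea mentioned in the abstract can be illustrated with a minimal sketch for frequent subsequences. This is not the PrefixFPM API or the authors' implementation: the function names (`project`, `frequent_items`, `mine_task`, `mine`), the tuple-based sequence representation, and the use of a thread pool are all assumptions made for illustration. Each frequent first item defines a task that owns its own projected database, so the tasks share no state and can be scheduled independently.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def project(database, item):
    """Keep, for each sequence containing `item`, the suffix after its first occurrence."""
    return [seq[seq.index(item) + 1:] for seq in database if item in seq]


def frequent_items(database, min_support):
    """Items that occur in at least `min_support` sequences of `database`."""
    counts = defaultdict(int)
    for seq in database:
        for item in set(seq):  # count each item once per sequence
            counts[item] += 1
    return sorted(i for i, c in counts.items() if c >= min_support)


def mine_task(prefix, database, min_support):
    """One independent task: find every frequent pattern that strictly extends `prefix`.

    `database` is already projected on `prefix`, so the task needs no other data.
    """
    results = []
    for item in frequent_items(database, min_support):
        pattern = prefix + (item,)
        results.append(pattern)
        # Divide and conquer: recurse on the smaller projected database.
        results.extend(mine_task(pattern, project(database, item), min_support))
    return results


def mine(database, min_support, workers=4):
    """Split the search space by first item and run one task per prefix."""
    tasks = [((item,), project(database, item))
             for item in frequent_items(database, min_support)]
    patterns = [prefix for prefix, _ in tasks]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(mine_task, prefix, db, min_support)
                   for prefix, db in tasks]
        for future in futures:
            patterns.extend(future.result())
    return sorted(patterns)


# Hypothetical toy database of four sequences.
toy_db = [("a", "b", "c"), ("a", "c", "b"), ("a", "b"), ("b", "c")]
toy_patterns = mine(toy_db, min_support=2)
```

In CPython, threads do not speed up CPU-bound work because of the global interpreter lock; the pool is used here only to show that the per-prefix tasks are independent. The sketch also handles only sequences of single items, a simplification relative to general sequential pattern mining.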

Original language: English (US)
Title of host publication: Proceedings - 2020 IEEE 36th International Conference on Data Engineering, ICDE 2020
Publisher: IEEE Computer Society
Pages: 1938-1941
Number of pages: 4
ISBN (Electronic): 9781728129037
State: Published - Apr 2020
Externally published: Yes
Event: 36th IEEE International Conference on Data Engineering, ICDE 2020 - Dallas, United States
Duration: Apr 20 2020 to Apr 24 2020

Publication series

Name: Proceedings - International Conference on Data Engineering
Volume: 2020-April
ISSN (Print): 1084-4627

Conference

Conference: 36th IEEE International Conference on Data Engineering, ICDE 2020
Country/Territory: United States
City: Dallas
Period: 4/20/20 to 4/24/20

All Science Journal Classification (ASJC) codes

  • Software
  • Signal Processing
  • Information Systems
