PrefixFPM: a parallel framework for general-purpose mining of frequent and closed patterns

Da Yan, Wenwen Qu, Guimu Guo, Xiaoling Wang, Yang Zhou

Research output: Contribution to journalArticlepeer-review

13 Scopus citations

Abstract

A frequent pattern is a substructure that appears in a database with frequency (aka. support) no less than a user-specified threshold, while a closed pattern is one that has no super-pattern that has the same support. Here, a substructure can refer to different structural forms, such as itemsets, subsequences, subtrees, and subgraphs, and mining such substructures is important in many real applications such as product recommendation and feature extraction. Currently, there lacks a general programming framework that can be easily customized to mine different types of patterns, and existing parallel and distributed solutions are IO-bound rendering CPU cores underutilized. Since mining frequent and/or closed patterns are NP-hard, it is important to fully utilize the available CPU cores. This paper presents such a general-purpose framework called PrefixFPM. The framework is based on the idea of prefix projection which allows a divide-and-conquer mining paradigm. PrefixFPM exposes a unified programming interface to users who can readily customize it to mine their desired patterns. We have adapted the state-of-the-art serial algorithms for mining patterns including subsequences, subtrees, and subgraphs on top of PrefixFPM, and extensive experiments demonstrate an excellent speedup ratio of PrefixFPM with the number of CPU cores.

Original languageEnglish (US)
Pages (from-to)253-286
Number of pages34
JournalVLDB Journal
Volume31
Issue number2
DOIs
StatePublished - Mar 2022
Externally publishedYes

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Hardware and Architecture

Fingerprint

Dive into the research topics of 'PrefixFPM: a parallel framework for general-purpose mining of frequent and closed patterns'. Together they form a unique fingerprint.

Cite this