TY - JOUR
T1 - PrefixFPM
T2 - a parallel framework for general-purpose mining of frequent and closed patterns
AU - Yan, Da
AU - Qu, Wenwen
AU - Guo, Guimu
AU - Wang, Xiaoling
AU - Zhou, Yang
N1 - Funding Information:
Da Yan and Guimu Guo were supported by NSF OAC-1755464 and NSF DGE-1723250. Guimu Guo acknowledges financial support from the Alabama Graduate Research Scholars Program (GRSP) funded through the Alabama Commission for Higher Education and administered by the Alabama EPSCoR. Wenwen Qu and Xiaoling Wang were supported by NSFC grants (No. 61972155), the Science and Technology Commission of Shanghai Municipality (20DZ1100300), and the Open Project Fund from Shenzhen Institute of Artificial Intelligence and Robotics for Society.
Publisher Copyright:
© 2021, The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.
PY - 2022/3
Y1 - 2022/3
N2 - A frequent pattern is a substructure that appears in a database with frequency (aka. support) no less than a user-specified threshold, while a closed pattern is one that has no super-pattern that has the same support. Here, a substructure can refer to different structural forms, such as itemsets, subsequences, subtrees, and subgraphs, and mining such substructures is important in many real applications such as product recommendation and feature extraction. Currently, there lacks a general programming framework that can be easily customized to mine different types of patterns, and existing parallel and distributed solutions are IO-bound rendering CPU cores underutilized. Since mining frequent and/or closed patterns are NP-hard, it is important to fully utilize the available CPU cores. This paper presents such a general-purpose framework called PrefixFPM. The framework is based on the idea of prefix projection which allows a divide-and-conquer mining paradigm. PrefixFPM exposes a unified programming interface to users who can readily customize it to mine their desired patterns. We have adapted the state-of-the-art serial algorithms for mining patterns including subsequences, subtrees, and subgraphs on top of PrefixFPM, and extensive experiments demonstrate an excellent speedup ratio of PrefixFPM with the number of CPU cores.
AB - A frequent pattern is a substructure that appears in a database with frequency (aka. support) no less than a user-specified threshold, while a closed pattern is one that has no super-pattern that has the same support. Here, a substructure can refer to different structural forms, such as itemsets, subsequences, subtrees, and subgraphs, and mining such substructures is important in many real applications such as product recommendation and feature extraction. Currently, there lacks a general programming framework that can be easily customized to mine different types of patterns, and existing parallel and distributed solutions are IO-bound rendering CPU cores underutilized. Since mining frequent and/or closed patterns are NP-hard, it is important to fully utilize the available CPU cores. This paper presents such a general-purpose framework called PrefixFPM. The framework is based on the idea of prefix projection which allows a divide-and-conquer mining paradigm. PrefixFPM exposes a unified programming interface to users who can readily customize it to mine their desired patterns. We have adapted the state-of-the-art serial algorithms for mining patterns including subsequences, subtrees, and subgraphs on top of PrefixFPM, and extensive experiments demonstrate an excellent speedup ratio of PrefixFPM with the number of CPU cores.
UR - http://www.scopus.com/inward/record.url?scp=85112055252&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85112055252&partnerID=8YFLogxK
U2 - 10.1007/s00778-021-00687-0
DO - 10.1007/s00778-021-00687-0
M3 - Article
AN - SCOPUS:85112055252
SN - 1066-8888
VL - 31
SP - 253
EP - 286
JO - VLDB Journal
JF - VLDB Journal
IS - 2
ER -