TY - GEN
T1 - PrefixFPM
T2 - 36th IEEE International Conference on Data Engineering, ICDE 2020
AU - Yan, Da
AU - Qu, Wenwen
AU - Guo, Guimu
AU - Wang, Xiaoling
N1 - Funding Information:
We tested those 3 programs (parallel versions of PrefixSpan, gSpan and Sleuth) on large real datasets and found that in all experiments, an ideal speedup is obtained for at least 16 cores, and performance continues to improve till all 32 cores in our machine are used. As an illustration, Figure 7 shows the scalability results of gSpan’s parallel implementation with PrefixSpan on the NCI [2] molecular structure dataset with 265,242subgraphswherewesetτsup=60,000. Acknowledgments: The research of Da Yan and Guimu Guo is supported by NSF OAC 1755464 and NSF DGE 1723250. The research of Wenwen Qu and Xiaoling Wang is supported by NSFC grant (No. 61532021) and Zhejiang Lab (No. 2019KB0AB04).
Publisher Copyright:
© 2020 IEEE.
PY - 2020/4
Y1 - 2020/4
N2 - Frequent pattern mining (FPM) has been a focused theme in data mining research for decades, but there lacks a general programming framework that can be easily customized to mine different kinds of frequent patterns, and existing solutions to FPM over big transaction databases are IO-bound rendering CPU cores underutilized even though FPM is NP-hard.This paper presents, PrefixFPM, a general-purpose framework for FPM that is able to fully utilize the CPU cores in a multicore machine. PrefixFPM follows the idea of prefix projection to partition the workloads of PFM into independent tasks by divide and conquer. PrefixFPM exposes a unified programming interface to users who can customize it to mine their desired patterns, and the parallel execution engine is transparent to end-users and can be reused for mining all kinds of patterns. We have adapted the state-of-the-art serial algorithms for mining frequent patterns including subsequences, subtrees, and subgraphs on top of PrefixFPM, and extensive experiments demonstrate an excellent speedup ratio of PrefixFPM with the number of cores.A demo is available at https://youtu.be/PfioC0GDpsw; the code is available at https://github.com/yanlab19870714/PrefixFPM.
AB - Frequent pattern mining (FPM) has been a focused theme in data mining research for decades, but there lacks a general programming framework that can be easily customized to mine different kinds of frequent patterns, and existing solutions to FPM over big transaction databases are IO-bound rendering CPU cores underutilized even though FPM is NP-hard.This paper presents, PrefixFPM, a general-purpose framework for FPM that is able to fully utilize the CPU cores in a multicore machine. PrefixFPM follows the idea of prefix projection to partition the workloads of PFM into independent tasks by divide and conquer. PrefixFPM exposes a unified programming interface to users who can customize it to mine their desired patterns, and the parallel execution engine is transparent to end-users and can be reused for mining all kinds of patterns. We have adapted the state-of-the-art serial algorithms for mining frequent patterns including subsequences, subtrees, and subgraphs on top of PrefixFPM, and extensive experiments demonstrate an excellent speedup ratio of PrefixFPM with the number of cores.A demo is available at https://youtu.be/PfioC0GDpsw; the code is available at https://github.com/yanlab19870714/PrefixFPM.
UR - http://www.scopus.com/inward/record.url?scp=85085860957&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85085860957&partnerID=8YFLogxK
U2 - 10.1109/ICDE48307.2020.00208
DO - 10.1109/ICDE48307.2020.00208
M3 - Conference contribution
AN - SCOPUS:85085860957
T3 - Proceedings - International Conference on Data Engineering
SP - 1938
EP - 1941
BT - Proceedings - 2020 IEEE 36th International Conference on Data Engineering, ICDE 2020
PB - IEEE Computer Society
Y2 - 20 April 2020 through 24 April 2020
ER -