TY - GEN
T1 - PrefixFPM
T2 - 36th IEEE International Conference on Data Engineering, ICDE 2020
AU - Yan, Da
AU - Qu, Wenwen
AU - Guo, Guimu
AU - Wang, Xiaoling
N1 - Publisher Copyright:
© 2020 IEEE.
PY - 2020/4
Y1 - 2020/4
N2 - Frequent pattern mining (FPM) has been a focused theme in data mining research for decades, but there lacks a general programming framework that can be easily customized to mine different kinds of frequent patterns, and existing solutions to FPM over big transaction databases are IO-bound rendering CPU cores underutilized even though FPM is NP-hard.This paper presents, PrefixFPM, a general-purpose framework for FPM that is able to fully utilize the CPU cores in a multicore machine. PrefixFPM follows the idea of prefix projection to partition the workloads of PFM into independent tasks by divide and conquer. PrefixFPM exposes a unified programming interface to users who can customize it to mine their desired patterns, and the parallel execution engine is transparent to end-users and can be reused for mining all kinds of patterns. We have adapted the state-of-the-art serial algorithms for mining frequent patterns including subsequences, subtrees, and subgraphs on top of PrefixFPM, and extensive experiments demonstrate an excellent speedup ratio of PrefixFPM with the number of cores.A demo is available at https://youtu.be/PfioC0GDpsw; the code is available at https://github.com/yanlab19870714/PrefixFPM.
AB - Frequent pattern mining (FPM) has been a focused theme in data mining research for decades, but there lacks a general programming framework that can be easily customized to mine different kinds of frequent patterns, and existing solutions to FPM over big transaction databases are IO-bound rendering CPU cores underutilized even though FPM is NP-hard.This paper presents, PrefixFPM, a general-purpose framework for FPM that is able to fully utilize the CPU cores in a multicore machine. PrefixFPM follows the idea of prefix projection to partition the workloads of PFM into independent tasks by divide and conquer. PrefixFPM exposes a unified programming interface to users who can customize it to mine their desired patterns, and the parallel execution engine is transparent to end-users and can be reused for mining all kinds of patterns. We have adapted the state-of-the-art serial algorithms for mining frequent patterns including subsequences, subtrees, and subgraphs on top of PrefixFPM, and extensive experiments demonstrate an excellent speedup ratio of PrefixFPM with the number of cores.A demo is available at https://youtu.be/PfioC0GDpsw; the code is available at https://github.com/yanlab19870714/PrefixFPM.
UR - http://www.scopus.com/inward/record.url?scp=85085860957&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85085860957&partnerID=8YFLogxK
U2 - 10.1109/ICDE48307.2020.00208
DO - 10.1109/ICDE48307.2020.00208
M3 - Conference contribution
AN - SCOPUS:85085860957
T3 - Proceedings - International Conference on Data Engineering
SP - 1938
EP - 1941
BT - Proceedings - 2020 IEEE 36th International Conference on Data Engineering, ICDE 2020
PB - IEEE Computer Society
Y2 - 20 April 2020 through 24 April 2020
ER -