TY - GEN
T1 - Parallel mining of frequent subtree patterns
AU - Qu, Wenwen
AU - Yan, Da
AU - Guo, Guimu
AU - Wang, Xiaoling
AU - Zou, Lei
AU - Zhou, Yang
N1 - Publisher Copyright: © 2020, Springer Nature Switzerland AG.
PY - 2020
Y1 - 2020
N2 - Mining frequent subtree patterns in a tree database (or forest) is useful in domains such as bioinformatics and the mining of semi-structured data. We consider the problem of mining embedded subtrees in a database of rooted, labeled, and ordered trees. We compare two existing serial mining algorithms, PrefixTreeSpan and TreeMiner, and adapt them for parallel execution using PrefixFPM, our general-purpose framework for frequent pattern mining that is designed to effectively utilize the CPU cores in a multicore machine. Our experiments show that TreeMiner is faster than its successor PrefixTreeSpan when a limited number of CPU cores is used, as its total mining workload is smaller; however, PrefixTreeSpan has a much higher speedup ratio and can beat TreeMiner when given enough CPU cores.
AB - Mining frequent subtree patterns in a tree database (or forest) is useful in domains such as bioinformatics and the mining of semi-structured data. We consider the problem of mining embedded subtrees in a database of rooted, labeled, and ordered trees. We compare two existing serial mining algorithms, PrefixTreeSpan and TreeMiner, and adapt them for parallel execution using PrefixFPM, our general-purpose framework for frequent pattern mining that is designed to effectively utilize the CPU cores in a multicore machine. Our experiments show that TreeMiner is faster than its successor PrefixTreeSpan when a limited number of CPU cores is used, as its total mining workload is smaller; however, PrefixTreeSpan has a much higher speedup ratio and can beat TreeMiner when given enough CPU cores.
UR - http://www.scopus.com/inward/record.url?scp=85097244848&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85097244848&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-61133-0_2
DO - 10.1007/978-3-030-61133-0_2
M3 - Conference contribution
AN - SCOPUS:85097244848
SN - 9783030611323
T3 - Communications in Computer and Information Science
SP - 18
EP - 32
BT - Software Foundations for Data Interoperability and Large Scale Graph Data Analytics - 4th International Workshop, SFDI 2020, and 2nd International Workshop, LSGDA 2020, held in Conjunction with VLDB 2020, Proceedings
A2 - Qin, Lu
A2 - Zhang, Wenjie
A2 - Zhang, Ying
A2 - Peng, You
A2 - Kato, Hiroyuki
A2 - Wang, Wei
A2 - Xiao, Chuan
PB - Springer Science and Business Media Deutschland GmbH
T2 - 4th International Workshop on Software Foundations for Data Interoperability, SFDI 2020 and 2nd International Workshop on Large Scale Graph Data Analytics, LSGDA 2020, held in Conjunction with VLDB 2020
Y2 - 4 September 2020
ER -