TY - GEN
T1 - Semi-Supervised and Incremental Sequence Analysis for Taxonomic Classification
AU - Fasino, Adriana
AU - Ozdogan, Emrecan
AU - Sokhansanj, Bahrad A.
AU - Rosen, Gail
AU - Polikar, Robi
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
AB - Metagenomic analysis is vital for determining which organisms are present in a microbial sample and why they are present. In this study, we explore the utility of MMseqs2, a bioinformatics pipeline, for taxonomic classification in metagenomics, focusing on 16S rRNA gene sequences. We evaluate the algorithm's performance on the full dataset as well as in batch-by-batch incremental processing and, more importantly, add a semi-supervised classification capability to this otherwise clustering-only algorithm. Incremental updating is important because it allows seamless integration and processing of new data, whereas semi-supervised classification allows taxonomic identification of previously unknown organisms. We also evaluate the different clustering modes offered by MMseqs2 and compare MMseqs2 to our previously developed semi-supervised incremental algorithm, SSI-VSEARCH. We show that MMseqs2's built-in clusterupdate function works well and that our semi-supervised classification capability adds new functionality to this bioinformatics processing pipeline.
UR - http://www.scopus.com/inward/record.url?scp=85182947610&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85182947610&partnerID=8YFLogxK
U2 - 10.1109/SSCI52147.2023.10371886
DO - 10.1109/SSCI52147.2023.10371886
M3 - Conference contribution
AN - SCOPUS:85182947610
T3 - 2023 IEEE Symposium Series on Computational Intelligence, SSCI 2023
SP - 1132
EP - 1138
BT - 2023 IEEE Symposium Series on Computational Intelligence, SSCI 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2023 IEEE Symposium Series on Computational Intelligence, SSCI 2023
Y2 - 5 December 2023 through 8 December 2023
ER -