Semi-Supervised and Incremental Sequence Analysis for Taxonomic Classification

Adriana Fasino, Emrecan Ozdogan, Bahrad A. Sokhansanj, Gail Rosen, Robi Polikar

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Metagenomic analysis is vital in determining what organisms are present in a microbial sample and why they are present. In this study, we explore the utility of MMseqs2, a bioinformatics pipeline, for taxonomic classification in metagenomics, focusing on 16S rRNA gene sequences. We evaluate the algorithm's performance on the full dataset as well as in batch-by-batch incremental processing, and, more importantly, we add the capability of semi-supervised classification to this otherwise clustering-only algorithm. Incremental updating is important because it allows seamless integration and processing of new data, whereas semi-supervised classification allows taxonomic identification of previously unknown organisms. We also evaluate the different clustering modes offered by MMseqs2, and compare MMseqs2 to our previously developed semi-supervised incremental algorithm, SSI-VSEARCH. We show that MMseqs2's built-in clusterupdate function works well, and that our semi-supervised classification capability adds new functionality to this bioinformatics processing pipeline.
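
As an illustration of what semi-supervised classification on top of a clustering-only tool can look like, the sketch below propagates known taxonomic labels to unlabeled sequences that fall in the same cluster, using a simple majority vote. This is a minimal sketch under stated assumptions, not the method described in the paper: the majority-vote rule, the function name `propagate_labels`, the sequence identifiers, and the genus label are all hypothetical, and the input is assumed to be a list of (representative, member) pairs such as a two-column cluster table would provide.

```python
from collections import Counter, defaultdict


def propagate_labels(cluster_pairs, known_labels):
    """Give each unlabeled sequence the majority label of its cluster.

    cluster_pairs: iterable of (representative_id, member_id) tuples
                   (assumed input layout, one row per cluster member).
    known_labels:  dict mapping sequence id -> taxonomic label for the
                   labeled subset of sequences.
    Returns a dict of predicted labels for unlabeled sequences only.
    """
    # Group members under their cluster representative.
    clusters = defaultdict(list)
    for representative, member in cluster_pairs:
        clusters[representative].append(member)

    predicted = {}
    for members in clusters.values():
        # Count the labels of the labeled members in this cluster.
        votes = Counter(known_labels[m] for m in members if m in known_labels)
        if not votes:
            # No labeled member: the cluster stays unclassified.
            continue
        majority_label, _ = votes.most_common(1)[0]
        for m in members:
            if m not in known_labels:
                predicted[m] = majority_label
    return predicted


# Hypothetical toy data: two clusters, only the first has labeled members.
cluster_pairs = [
    ("seq1", "seq1"), ("seq1", "seq2"), ("seq1", "seq3"),
    ("seq4", "seq4"), ("seq4", "seq5"),
]
known_labels = {"seq1": "Escherichia", "seq2": "Escherichia"}

predicted = propagate_labels(cluster_pairs, known_labels)
print(predicted)
```

In this toy run, `seq3` inherits the label shared by `seq1` and `seq2`, while `seq4` and `seq5` remain unclassified because their cluster contains no labeled sequence. In an incremental setting, the same function could be rerun after each batch is merged into the existing clusters.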

Original language: English (US)
Title of host publication: 2023 IEEE Symposium Series on Computational Intelligence, SSCI 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1132-1138
Number of pages: 7
ISBN (Electronic): 9781665430654
State: Published - 2023
Externally published: Yes
Event: 2023 IEEE Symposium Series on Computational Intelligence, SSCI 2023 - Mexico City, Mexico
Duration: Dec 5 2023 to Dec 8 2023

Publication series

Name: 2023 IEEE Symposium Series on Computational Intelligence, SSCI 2023

Conference

Conference: 2023 IEEE Symposium Series on Computational Intelligence, SSCI 2023
Country/Territory: Mexico
City: Mexico City
Period: 12/5/23 to 12/8/23

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Computer Science Applications
  • Human-Computer Interaction
  • Decision Sciences (miscellaneous)
  • Safety, Risk, Reliability and Quality
