The Naïve Bayes classifier++ for metagenomic taxonomic classification - Query evaluation

Haozhe Duan, Gavin Hearne, Robi Polikar, Gail L. Rosen

    Research output: Contribution to journal › Article › peer-review

    Abstract

    Motivation: This study examines the query performance of the NBC++ (Incremental Naive Bayes Classifier) program across variations in canonicality, k-mer size, databases, and input sample data size. We demonstrate that both NBC++ and Kraken2 are influenced by database depth, with macro measures improving as depth increases. However, fully capturing the diversity of life, especially viruses, remains a challenge.

    Results: NBC++ can competitively profile the superkingdom content of metagenomic samples using a small training database. NBC++ spends less time training and can use a fraction of the memory that Kraken2 requires, but at the cost of longer query times. Major NBC++ enhancements include support for canonical k-mer storage (leading to significant storage savings) and adaptable, optimized memory allocation that accelerates query analysis and enables the software to run on nearly any system. Additionally, the output now includes log-likelihood values for each training genome, giving users valuable confidence information.
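To make the two ideas named in the abstract concrete, the following is a minimal Python sketch of canonical k-mer counting and a per-genome naive Bayes log-likelihood. It is not the NBC++ implementation: the function names, the add-one (Laplace) smoothing, the `vocab_size` parameter, and the toy sequences are all assumptions chosen for illustration.

```python
import math
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(kmer):
    """Pick the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))


def count_kmers(seq, k, use_canonical=True):
    """Count k-mers in seq, skipping windows that contain non-ACGT symbols."""
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[canonical(kmer) if use_canonical else kmer] += 1
    return counts


def log_likelihood(read_counts, genome_counts, vocab_size, alpha=1.0):
    """Multinomial naive Bayes log-likelihood of a read under one genome.

    alpha is an assumed smoothing constant; vocab_size is the number of
    distinct k-mers the model allows (fewer when k-mers are canonical).
    """
    denom = sum(genome_counts.values()) + alpha * vocab_size
    return sum(
        n * math.log((genome_counts.get(kmer, 0) + alpha) / denom)
        for kmer, n in read_counts.items()
    )


# Toy usage with hypothetical sequences: one score per training genome.
k = 3
vocab_size = 4 ** k // 2  # number of canonical k-mers when k is odd
genomes = {"genome_A": "AAAAAAAA", "genome_B": "CGCGCGCG"}
profiles = {name: count_kmers(seq, k) for name, seq in genomes.items()}
read = count_kmers("AAAA", k)
scores = {name: log_likelihood(read, prof, vocab_size) for name, prof in profiles.items()}
best = max(scores, key=scores.get)
```

Because a k-mer and its reverse complement share one key, a canonical profile stores roughly half as many entries as a strand-specific one, which is the kind of storage saving the abstract refers to. Keeping the full `scores` dictionary, rather than only `best`, mirrors the idea of reporting a log-likelihood for every training genome.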

    Original language: English (US)
    Article number: btae743
    Journal: Bioinformatics
    Volume: 41
    Issue number: 1
    State: Published - Jan 1 2025

    All Science Journal Classification (ASJC) codes

    • Statistics and Probability
    • Biochemistry
    • Molecular Biology
    • Computer Science Applications
    • Computational Theory and Mathematics
    • Computational Mathematics
