Abstract
Motivation: This study examines the query performance of the NBC++ (Incremental Naive Bayes Classifier) program for variations in canonicality, k-mer size, databases, and input sample data size. We demonstrate that both NBC++ and Kraken2 are influenced by database depth, with macro measures improving as depth increases. However, fully capturing the diversity of life, especially viruses, remains a challenge. Results: NBC++ can competitively profile the superkingdom content of metagenomic samples using a small training database. NBC++ spends less time training and can use a fraction of the memory than Kraken2 but at the cost of long querying time. Major NBC++ enhancements include accommodating canonical k-mer storage (leading to significant storage savings) and adaptable and optimized memory allocation that accelerates query analysis and enables the software to be run on nearly any system. Additionally, the output now includes log-likelihood values for each training genome, providing users with valuable confidence information.
Original language | English (US) |
---|---|
Article number | btae743 |
Journal | Bioinformatics |
Volume | 41 |
Issue number | 1 |
DOIs | |
State | Published - Jan 1 2025 |
All Science Journal Classification (ASJC) codes
- Statistics and Probability
- Biochemistry
- Molecular Biology
- Computer Science Applications
- Computational Theory and Mathematics
- Computational Mathematics