Neural network-based taxonomic clustering for metagenomics

Steven D. Essinger, Robi Polikar, Gail L. Rosen

Research output: Chapter in Book/Report/Conference proceedingConference contribution

1 Scopus citations

Abstract

Metagenomic studies inherently involve sampling genetic information from an environment potentially containing thousands of distinctly different microbial organisms. This genetic information is sequenced producing many short fragments (<500 base pair (bp)); each is tentatively a small representative of the DNA coding structure. Any of the fragments may belong to any of the organisms in the sample, but the relationship is unknown a priori. Furthermore, most of these organisms have not been identified and correspondingly are not represented in any of the publicly available search databases. Our goal is to be able to predict the taxonomic classification of an organism based on the fragments obtained from an environmental sample that may include many (some previously unidentified) organisms. To elucidate the diversity and composition of the sample, we first use a supervised naïve Bayes classifier to score the fragments of known genomes, followed by an unsupervised clustering to group fragments from similar organisms together. We are then free to analyze each cluster separately. This is challenging since we are not interested in similar sequences, but sequences that come from similar genomes, which are known to vary widely intra-genomically. Our dataset comprises of an extremely challenging scenario involving clustering fragments at the phyla level, where none of the phyla have been previously seen or identified. We present two variations of our proposed approach, one based on ART and K-means. We show that ART can cluster 500bp fragments from 17 novel phyla at an overall isolation/grouping that is 10% better than K-means and nearly 7 times over chance.

Original languageEnglish (US)
Title of host publication2010 IEEE World Congress on Computational Intelligence, WCCI 2010 - 2010 International Joint Conference on Neural Networks, IJCNN 2010
DOIs
StatePublished - 2010
Event2010 6th IEEE World Congress on Computational Intelligence, WCCI 2010 - 2010 International Joint Conference on Neural Networks, IJCNN 2010 - Barcelona, Spain
Duration: Jul 18 2010Jul 23 2010

Publication series

NameProceedings of the International Joint Conference on Neural Networks

Other

Other2010 6th IEEE World Congress on Computational Intelligence, WCCI 2010 - 2010 International Joint Conference on Neural Networks, IJCNN 2010
CountrySpain
CityBarcelona
Period7/18/107/23/10

All Science Journal Classification (ASJC) codes

  • Software
  • Artificial Intelligence

Fingerprint Dive into the research topics of 'Neural network-based taxonomic clustering for metagenomics'. Together they form a unique fingerprint.

Cite this