Clustering gene expression data using probabilistic non-negative matrix factorization

Belhassen Bayar, Nidhal Bouaynaya, Roman Shterenberg

    Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

    Abstract

    Non-negative matrix factorization (NMF) has proven to be a useful decomposition for multivariate data. Specifically, NMF appears to have advantages over other clustering methods, such as hierarchical clustering, for identifying distinct molecular patterns in gene expression profiles. The NMF algorithm, however, is deterministic; in particular, it does not account for the noisy nature of the measured genomic signals. In this paper, we extend the NMF algorithm to the probabilistic case, where the data are viewed as a stochastic process. We show that probabilistic NMF can be viewed as a weighted, regularized matrix factorization problem, and we derive the corresponding update rules. Our simulation results show that the probabilistic non-negative matrix factorization (PNMF) algorithm is more accurate and more robust than its deterministic counterpart in clustering cancer subtypes in a leukemia microarray dataset.
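    To make the abstract's terms concrete, here is a minimal sketch (Python with NumPy) of NMF-based sample clustering with multiplicative updates for a generic weighted, L2-regularized objective, 0.5 * sum_ij C_ij (X_ij - (WH)_ij)^2 + 0.5*alpha*||W||^2 + 0.5*beta*||H||^2. It is an illustration under stated assumptions, not the PNMF update rules derived in the paper: the confidence weights C and the penalties alpha and beta are hypothetical placeholders, and with C = 1 and alpha = beta = 0 the updates reduce to standard Lee-Seung Frobenius-norm NMF (the deterministic baseline). The function names weighted_nmf and cluster_samples are illustrative. Samples are assigned, following the common "metagene" convention, to the factor with the largest coefficient in H.

```python
import numpy as np


def weighted_nmf(X, k, C=None, alpha=0.0, beta=0.0, n_iter=500, seed=0, eps=1e-12):
    """Non-negative factorization X ~ W @ H (X: genes x samples).

    Minimizes 0.5 * sum(C * (X - W @ H)**2)
              + 0.5 * alpha * ||W||_F^2 + 0.5 * beta * ||H||_F^2
    with multiplicative updates. C holds hypothetical per-entry confidence
    weights; C = 1 and alpha = beta = 0 give plain Lee-Seung NMF.
    """
    X = np.asarray(X, dtype=float)
    C = np.ones_like(X) if C is None else np.asarray(C, dtype=float)
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k)) + eps
    H = rng.random((k, X.shape[1])) + eps
    CX = C * X
    errors = []
    for _ in range(n_iter):
        # Each ratio is (negative part) / (positive part) of the gradient,
        # so entries stay non-negative; eps only guards against division by zero.
        H *= (W.T @ CX) / (W.T @ (C * (W @ H)) + beta * H + eps)
        W *= (CX @ H.T) / ((C * (W @ H)) @ H.T + alpha * W + eps)
        R = X - W @ H
        errors.append(0.5 * np.sum(C * R**2)
                      + 0.5 * alpha * np.sum(W**2)
                      + 0.5 * beta * np.sum(H**2))
    return W, H, errors


def cluster_samples(W, H):
    # Rescale so each column of W has unit norm (W @ H is unchanged),
    # then assign each sample to the factor with the largest coefficient.
    norms = np.linalg.norm(W, axis=0)
    return np.argmax(H * norms[:, None], axis=0)


# Toy data: two groups of 5 samples driven by disjoint gene sets.
rng = np.random.default_rng(1)
X = np.full((20, 10), 0.1)
X[:10, :5] = 1.0
X[10:, 5:] = 1.0
X += 0.01 * rng.random(X.shape)
W, H, errors = weighted_nmf(X, k=2)
labels = cluster_samples(W, H)
print(labels)  # samples 0-4 and 5-9 are expected to form two groups
```

    Normalizing the columns of W before taking the argmax removes the scaling ambiguity W H = (W D)(D^-1 H) for any positive diagonal D, so the cluster labels do not depend on how a particular run distributes scale between the two factors.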

    Original language: English (US)
    Title of host publication: Proceedings 2011 IEEE International Workshop on Genomic Signal Processing and Statistics, GENSIPS'11
    Publisher: IEEE Computer Society
    Pages: 143-146
    Number of pages: 4
    ISBN (Print): 9781467304900
    State: Published - 2011
    Event: 2011 IEEE International Workshop on Genomic Signal Processing and Statistics, GENSIPS'11 - San Antonio, TX, United States
    Duration: Dec 4, 2011 to Dec 6, 2011

    Publication series

    Name: Proceedings - IEEE International Workshop on Genomic Signal Processing and Statistics
    ISSN (Print): 2150-3001
    ISSN (Electronic): 2150-301X

    All Science Journal Classification (ASJC) codes

    • Biochemistry, Genetics and Molecular Biology (miscellaneous)
    • Computational Theory and Mathematics
    • Signal Processing
    • Biomedical Engineering
