Non-negative matrix factorization (NMF) has proven to be a useful decomposition technique for multivariate data, where the non-negativity constraint is necessary to have a meaningful physical interpretation. NMF reduces the dimensionality of non-negative data by decomposing it into two smaller non-negative factors with physical interpretation for class discovery. The NMF algorithm, however, assumes a deterministic framework. In particular, the effect of the data noise on the stability of the factorization and the convergence of the algorithm are unknown. Collected data, on the other hand, is stochastic in nature due to measurement noise and sometimes inherent variability in the physical process. This paper presents new theoretical and applied developments to the problem of non-negative matrix factorization (NMF). First, we generalize the deterministic NMF algorithm to include a general class of update rules that converges towards an optimal non-negative factorization. Second, we extend the NMF framework to the probabilistic case (PNMF). We show that the Maximum a posteriori (MAP) estimate of the non-negative factors is the solution to a weighted regularized non-negative matrix factorization problem. We subsequently derive update rules that converge towards an optimal solution. Third, we apply the PNMF to cluster and classify DNA microarrays data. The proposed PNMF is shown to outperform the deterministic NMF and the sparse NMF algorithms in clustering stability and classification accuracy.
|Journal of Bioinformatics and Computational Biology
|Published - Feb 2014
All Science Journal Classification (ASJC) codes
- Molecular Biology
- Computer Science Applications