Objective: The study of pathogenic mechanism at the genetic level by imaging genetics methods enables to effectively reveal the association of histopathology and genetics. However, there is a lack of effective and accurate tools to establish association models from macroscopic to microscopic. Methods: The multi-constrained joint non-negative matrix factorization (MCJNMF) was developed for simultaneous integration of genomic data and image data to identify common modules related to disease. Two types of data matrices were projected onto a common feature space, in which heterogeneous variables with large coefficients in the same projected direction form a common module. Meanwhile, the correlation between original data features was integrated by using regularization constraints to improve the biological relevance. Sparsity constraints and orthogonal constraints were performed on decomposition factors to minimize the redundancy between different bases and to reduce algorithm complexity. Results: This algorithm was successfully performed on the module identification of lung metastasis in soft tissue sarcomas (STSs) by integrating FDG-PET image and DNA methylation data features. Multilevel analysis on the top extracted modules revealed that these modules were closely related to the lung metastasis. Particularly, several genes with diagnostic potential for lung metastasis can be discovered from high score modules. Conclusion: This method not only can be applied for the accurate identification of patterns related to pathogenic mechanism of diseases, but also has a significant implication for discovering protein biomarkers. Significance: This method provides avenues for further studies of identifying complex association patterns of diseases according to different types of biological data.
All Science Journal Classification (ASJC) codes
- Biomedical Engineering