Integrative analysis of gene expression and copy number alterations using canonical correlation analysis
Results: Using the correlation-maximizing methods regularized dual CCA and PCA+CCA, we show that an exploratory analysis, performed without pre-selecting known disease-relevant genes and without using information about clinical class membership, singles out two patient groups corresponding to well-known leukemia subtypes. Furthermore, the variables most relevant to the extracted features agree with previous biological knowledge about copy number alterations and gene expression changes in these subtypes. Finally, we show that the correlation-maximizing methods yield results that are more biologically interpretable than those of a covariance-maximizing method, and that they provide insight different from that obtained when each variable set is studied separately using PCA.
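To illustrate what "PCA+CCA" refers to, the following is a minimal NumPy sketch, not the authors' implementation. It assumes X and Y are sample-by-variable matrices over the same patients (for example, gene expression and copy number). Each set is reduced to its leading principal components, and classical CCA is then run on the component scores. The function name `pca_cca` and the parameters `kx`, `ky` (numbers of retained components) are hypothetical.

```python
import numpy as np


def pca_cca(X, Y, kx, ky):
    """Hypothetical sketch of PCA+CCA.

    X: (n, p) array, Y: (n, q) array; rows are the same n samples.
    kx, ky: numbers of principal components kept (assumed <= rank of each set).
    Returns canonical correlations and sample-space canonical variates.
    """
    # Center each variable so PCA and CCA operate on (co)variances.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)

    # Thin SVD: left singular vectors are the unit-norm principal component scores.
    Ux, _, _ = np.linalg.svd(Xc, full_matrices=False)
    Uy, _, _ = np.linalg.svd(Yc, full_matrices=False)
    Ux, Uy = Ux[:, :kx], Uy[:, :ky]

    # The retained scores are already uncorrelated with unit norm, so CCA on them
    # reduces to an SVD of their cross-product matrix.
    A, rho, Bt = np.linalg.svd(Ux.T @ Uy, full_matrices=False)

    # Canonical variates (one column per canonical pair), in sample space.
    u = Ux @ A
    v = Uy @ Bt.T
    return rho, u, v
```

Because PCA is applied first, CCA only ever sees a kx-by-ky problem, which is one way to keep the method tractable when p and q far exceed the number of samples. The choice of kx and ky acts as the regularization here.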
Conclusions: We conclude that regularized dual CCA and PCA+CCA are useful methods for exploratory analysis of paired genetic data sets, and that they can be implemented efficiently even when the number of variables is very large.
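The efficiency claim for the dual formulation rests on working with n-by-n sample-space (Gram) matrices rather than p-by-p covariance matrices. Below is a minimal sketch of one common variant, ridge-regularized CCA written in dual form with linear kernels. The abstract does not state the exact regularization used, so the ridge penalty `lam` and the function name `dual_ridge_cca` are assumptions for illustration.

```python
import numpy as np


def dual_ridge_cca(X, Y, lam, tol=1e-10):
    """Hypothetical sketch of regularized CCA in dual (sample-space) form.

    Assumed objective (ridge): maximize w_x' X'Y w_y subject to
    w_x'(X'X + lam*I)w_x = 1 and w_y'(Y'Y + lam*I)w_y = 1, with w_x = X'alpha.
    Only n x n matrices are decomposed, so p and q may be very large.
    Returns regularized canonical correlations and dual weights alpha, beta.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Kx = Xc @ Xc.T  # n x n linear-kernel (Gram) matrices
    Ky = Yc @ Yc.T

    def eig_nonzero(K):
        # K = U diag(e) U'; drop the numerically zero eigenvalues
        # (centering always introduces at least one).
        e, U = np.linalg.eigh(K)
        keep = e > tol * e.max()
        return U[:, keep], e[keep]

    Ux, ex = eig_nonzero(Kx)
    Uy, ey = eig_nonzero(Ky)

    # Shrink each eigendirection by sqrt(e / (e + lam)); lam = 0 gives plain CCA.
    dx = np.sqrt(ex / (ex + lam))
    dy = np.sqrt(ey / (ey + lam))
    M = (dx[:, None] * (Ux.T @ Uy)) * dy
    A, rho, Bt = np.linalg.svd(M, full_matrices=False)

    # Map back to dual weights: alpha = Ux diag(1 / sqrt(e (e + lam))) a.
    alpha = Ux @ (A / np.sqrt(ex * (ex + lam))[:, None])
    beta = Uy @ (Bt.T / np.sqrt(ey * (ey + lam))[:, None])
    return rho, alpha, beta
```

The sample-space canonical variates are `Kx @ alpha` and `Ky @ beta`. With `lam = 0` and more variables than samples, every canonical correlation equals 1 regardless of the data, which is why some form of regularization is needed in this setting.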
- Mathematics (Faculty of Engineering)
- Division of Clinical Genetics
- BioCARE: Biomarkers in Cancer Medicine improving Health Care, Education and Innovation
BioMed Central (BMC)
- Bioinformatics and Systems Biology
- ISSN: 1471-2105