COM.on C.A.6:e7/39-44   Published online Jan. 17, 2012.
Application of Partial Least Squares in high-dimensional genomic data analysis

Panpan Wang

MOE Laboratory of Contemporary Anthropology, School of Life Science, Fudan University, Shanghai 200433, China

ABSTRACT: Partial Least Squares (PLS) is a statistical regression technique that performs well in the analysis of high-dimensional genomic data, such as microarray data, SNP data from GWAS, and proteomic data. In this article, we review the challenges faced by classical linear regression and show how they motivate the advantages of PLS. Through dimension reduction, PLS resolves not only the problem of collinearity but also the singularity that arises in regression when the sample size is small and the predictor variables are high-dimensional. We also present several modified PLS algorithms together with their applications to real biological data. For example, sparse partial least squares achieves dimension reduction and variable selection simultaneously, and combining PLS with cluster analysis or general linear regression can address a variety of data-analysis problems.

Key words: partial least squares, high-dimensional genomic data, dimension reduction, variable selection

Received: Dec. 7, 2011   Accepted: Dec. 14, 2011  Corresponding:

Communication on Contemporary Anthropology, Vol. 6, Article e7, pp. 39-44. Published online Jan. 17, 2012.




MOE Laboratory of Contemporary Anthropology, School of Life Science, Fudan University, Shanghai 200433




Received: Dec. 7, 2011   Revised: Dec. 14, 2011  Corresponding: Panpan Wang

