COM.on C.A.6:e7/39-44
Published online on Jan. 17, 2012
Panpan Wang
MOE Key Laboratory of Contemporary Anthropology, School of Life Sciences, Fudan University, Shanghai 200433, China
Key words: partial least squares, high-dimensional genomic data, dimension reduction, variable selection
Received: Dec. 7, 2011  Accepted: Dec. 14, 2011  Corresponding author: catherine64278@163.com
Communications on Contemporary Anthropology, Vol. 6, Article e7, pp. 39-44, published online January 17, 2012
Review
Applications of Partial Least Squares in the Analysis of High-Dimensional Genomic Data
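Abstract: Partial least squares (PLS) is a highly efficient statistical regression technique that is well suited to the analysis of high-dimensional genomic data, such as gene expression microarray data, SNP data from genome-wide association studies, and even proteomic data. In this article, we start from ordinary linear regression and use it to show the advantages of PLS regression for high-dimensional data. Through dimension reduction, PLS resolves collinearity among the predictors, and it also resolves the singularity that arises in regression when there are fewer samples than predictors. Drawing on applications of PLS to real biological data, we then present modified algorithms: for example, sparse PLS performs variable selection at the same time as dimension reduction, and combining PLS with cluster analysis or generalized linear models extends it to a wider range of data-analysis problems.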
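To make two of the abstract's claims concrete (that PLS sidesteps the singular X^T X that ordinary least squares must invert when there are fewer samples than predictors, and that a sparse variant can select variables while reducing dimension), here is a minimal NumPy sketch. It assumes a single continuous response (PLS1 with NIPALS-style deflation); the function names `pls1_fit` and `soft_threshold`, the `sparsity` parameter, and the simulated data are hypothetical and used only for illustration. The sparse option simply soft-thresholds each weight vector, a simplified heuristic in the spirit of sparse PLS rather than the exact algorithm discussed in the article.

```python
import numpy as np


def soft_threshold(w, lam):
    """Shrink every entry of w toward zero by lam; entries with |w_j| <= lam become exactly 0."""
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)


def pls1_fit(X, y, n_components, sparsity=0.0):
    """PLS1 regression (one response) with NIPALS-style deflation.

    sparsity in [0, 1) is a hypothetical knob: each weight vector is soft-thresholded
    at sparsity * max|w|. 0.0 gives ordinary PLS1; larger values zero out more predictors.
    Returns (beta, intercept) so that predictions are X @ beta + intercept.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xh, yh = X - x_mean, y - y_mean            # centred copies, deflated after each component
    n_features = X.shape[1]
    W = np.zeros((n_features, n_components))   # weight (direction) vectors
    P = np.zeros((n_features, n_components))   # X loadings
    q = np.zeros(n_components)                 # y loadings
    for h in range(n_components):
        w = Xh.T @ yh                          # direction of maximal covariance with the residual y
        if sparsity > 0:
            w = soft_threshold(w, sparsity * np.max(np.abs(w)))
        norm = np.linalg.norm(w)
        if norm < 1e-12:                       # residual y already explained: stop early
            W, P, q = W[:, :h], P[:, :h], q[:h]
            break
        w = w / norm
        t = Xh @ w                             # latent component (scores), one value per sample
        tt = t @ t
        P[:, h] = Xh.T @ t / tt
        q[h] = yh @ t / tt
        W[:, h] = w
        Xh = Xh - np.outer(t, P[:, h])         # deflate X
        yh = yh - q[h] * t                     # deflate y
    if q.size == 0:
        return np.zeros(n_features), y_mean
    # P^T W is unit upper triangular, so this solve is well posed even when p > n.
    beta = W @ np.linalg.solve(P.T @ W, q)
    return beta, y_mean - x_mean @ beta


# Simulated example (hypothetical data): fewer samples than predictors.
rng = np.random.default_rng(0)
n, p = 30, 200
X = rng.normal(size=(n, p))
true_beta = np.zeros(p)
true_beta[:5] = 2.0                            # only 5 predictors actually matter
y = X @ true_beta + rng.normal(scale=0.5, size=n)

# OLS would need (X^T X)^{-1}, but X^T X is p x p with rank at most n, hence singular.
print(np.linalg.matrix_rank(X.T @ X))

beta_pls, b0 = pls1_fit(X, y, n_components=3)
beta_spls, b0s = pls1_fit(X, y, n_components=3, sparsity=0.5)
print(np.count_nonzero(beta_pls), np.count_nonzero(beta_spls))
```

With `sparsity=0.0` the coefficients are generically all nonzero, whereas the thresholded fit can only use predictors that enter some weight vector, so most coefficients are exactly zero. In practice `n_components` and `sparsity` would typically be chosen by cross-validation.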
Received: December 7, 2011; Revised: December 14, 2011