Similar articles
20 similar articles found (search time: 0 ms)
1.
MOTIVATION: One important aspect of data-mining of microarray data is to discover the molecular variation among cancers. In microarray studies, the number n of samples is relatively small compared to the number p of genes per sample (usually in thousands). It is known that standard statistical methods in classification are efficient (i.e. in the present case, yield successful classifiers) particularly when n is (far) larger than p. This naturally calls for the use of a dimension reduction procedure together with the classification one. RESULTS: In this paper, the question of classification in such a high-dimensional setting is addressed. We view the classification problem as a regression one with few observations and many predictor variables. We propose a new method combining partial least squares (PLS) and Ridge penalized logistic regression. We review the existing methods based on PLS and/or penalized likelihood techniques, outline their interest in some cases and theoretically explain their sometimes poor behavior. Our procedure is compared with these other classifiers. The predictive performance of the resulting classification rule is illustrated on three data sets: Leukemia, Colon and Prostate.

2.
MOTIVATION: One important application of gene expression microarray data is classification of samples into categories, such as the type of tumor. The use of microarrays allows simultaneous monitoring of thousands of gene expressions per sample. This ability to measure gene expression en masse has resulted in data with the number of variables p (genes) far exceeding the number of samples N. Standard statistical methodologies in classification and prediction do not work well or even at all when N < p. Modification of existing statistical methodologies or development of new methodologies is needed for the analysis of microarray data. RESULTS: We propose a novel analysis procedure for classifying (predicting) human tumor samples based on microarray gene expressions. This procedure involves dimension reduction using Partial Least Squares (PLS) and classification using Logistic Discrimination (LD) and Quadratic Discriminant Analysis (QDA). We compare PLS to the well known dimension reduction method of Principal Components Analysis (PCA). Under many circumstances PLS proves superior; we illustrate a condition when PCA particularly fails to predict well relative to PLS. The proposed methods were applied to five different microarray data sets involving various human tumor samples: (1) normal versus ovarian tumor; (2) Acute Myeloid Leukemia (AML) versus Acute Lymphoblastic Leukemia (ALL); (3) Diffuse Large B-cell Lymphoma (DLBCLL) versus B-cell Chronic Lymphocytic Leukemia (BCLL); (4) normal versus colon tumor; and (5) Non-Small-Cell-Lung-Carcinoma (NSCLC) versus renal samples. Stability of classification results and methods were further assessed by re-randomization studies.

3.
Bjørnstad A, Westad F, Martens H. Hereditas, 2004, 141(2): 149-165
The utility of a relatively new multivariate method, bi-linear modelling by cross-validated partial least squares regression (PLSR), was investigated in the analysis of QTL. The distinguishing feature of PLSR is to reveal reliable covariance structures in data of different types with regard to the same set of objects. Two matrices X (here: genetic markers) and Y (here: phenotypes) are interactively decomposed into latent variables (PLS components, or PCs) in a way which facilitates statistically reliable and graphically interpretable model building. Natural collinearities between input variables are utilized actively to stabilise the modelling, instead of being treated as a statistical problem. The importance of cross-validation/jack-knifing as an intuitively appealing way to avoid overfitting is emphasized. Two datasets from chromosomal mapping studies of different complexity were chosen for illustration (QTL for tomato yield and for oat heading date). Results from PLSR analysis were compared to published results and to results using the package PLABQTL in these data sets. In all cases PLSR gave at least similar explained validation variances as the reported studies. An attractive feature is that PLSR allows the analysis of several traits/replicates in one analysis, and the direct visual identification of individuals with desirable marker genotypes. It is suggested that PLSR may be useful in structural and functional genomics and in marker assisted selection, particularly in cases with a limited number of objects.

4.
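In Wuqi County, Shaanxi Province, where vegetation restoration measures have been implemented, 100 loessial soil samples were collected in the field from different layers of 24 soil profiles. After measuring and analysing the total nitrogen (TN) and alkali-hydrolysable nitrogen (AHN) contents of the samples together with their laboratory reflectance spectra, calibration models for the TN and AHN contents of loessial soil were built by combining correlation analysis (CA) with partial least squares (PLS) regression, and the calibration models were validated with independent samples. The results showed that, among the calibration models built with six spectral transformations, the model based on derivative spectra was the best for predicting soil TN content in the study area, with calibration and validation R² of 0.929 and 0.935, root mean square errors (RMSE) of 0.045 and 0.047 g·kg⁻¹, respectively, and a relative prediction deviation (RPD) of 3.12. The model based on the normalization transformation was the best for predicting soil AHN content, with calibration and validation R² of 0.873 and 0.773, RMSE of 9.946 and 16.204 mg·kg⁻¹, respectively, and an RPD of 1.538. The TN model can effectively predict TN in the 0-40 cm soil layer, whereas the AHN model can give only a rough prediction for the same depth. This study provides a good method for the rapid prediction of total nitrogen in loessial soil in degraded ecosystems under vegetation restoration, but accurate and rapid prediction of AHN requires further research.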
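The three validation statistics quoted in this abstract (R², RMSE and RPD) can be sketched in a few lines of Python. The abstract does not give its formulas, so the definitions below are assumptions based on common usage: R² as one minus the ratio of residual to total sum of squares, and RPD as the sample standard deviation of the observed values divided by the RMSE. The function names are hypothetical.

```python
import math
import statistics


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = statistics.fmean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rpd(observed, predicted):
    """RPD, assumed here to be SD of the observed values divided by RMSE."""
    return statistics.stdev(observed) / rmse(observed, predicted)


if __name__ == "__main__":
    # Made-up illustrative numbers, not data from the study.
    obs = [1.0, 2.0, 3.0, 4.0]
    pred = [1.1, 1.9, 3.2, 3.8]
    print(rmse(obs, pred), r_squared(obs, pred), rpd(obs, pred))
```

Under these assumed definitions, a larger RPD means the prediction error is small relative to the spread of the reference values, which is consistent with the abstract rating the TN model (RPD 3.12) above the AHN model (RPD 1.538).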

5.
Internal forces in the human body can be estimated from measured movements and external forces using inverse dynamic analysis. Here we present a general method of analysis which makes optimal use of all available data, and allows the use of inverse dynamic analysis in cases where external force data is incomplete. The method was evaluated for the analysis of running on a partially instrumented treadmill. It was found that results correlate well with those of a conventional analysis where all external forces are known.

6.

Background

The objective of the present study was to test the ability of the partial least squares regression technique to impute genotypes from low-density single nucleotide polymorphism (SNP) panels (i.e. 3K or 7K) to a high-density panel with 50K SNPs. No pedigree information was used.

Methods

Data consisted of 2093 Holstein, 749 Brown Swiss and 479 Simmental bulls genotyped with the Illumina 50K Beadchip. First, a single-breed approach was applied by using only data from Holstein animals. Then, to enlarge the training population, data from the three breeds were combined and a multi-breed analysis was performed. Accuracies of genotypes imputed using the partial least squares regression method were compared with those obtained by using the Beagle software. The impact of genotype imputation on breeding value prediction was evaluated for milk yield, fat content and protein content.

Results

In the single-breed approach, the accuracy of imputation using partial least squares regression was around 90% and 94% for the 3K and 7K platforms, respectively; corresponding accuracies obtained with Beagle were around 85% and 90%. Moreover, computing time required by the partial least squares regression method was on average around 10 times lower than computing time required by Beagle. Using the partial least squares regression method in the multi-breed analysis resulted in lower imputation accuracies than using single-breed data. The impact of the SNP-genotype imputation on the accuracy of direct genomic breeding values was small. The correlation between estimates of genetic merit obtained by using imputed versus actual genotypes was around 0.96 for the 7K chip.

Conclusions

Results of the present work suggested that the partial least squares regression imputation method could be useful to impute SNP genotypes when pedigree information is not available.

7.
A computer program is described for the rapid calculation of least squares solutions for data fitted to different functions normally used in reassociation and hybridization kinetic measurements. The equations for the fraction not reacted as a function of Cot ($C_0t$) follow: first order, $\exp(-kC_0t)$; second order, $(1+kC_0t)^{-1}$; variable order, $(1+kC_0t)^{-n}$; approximate fraction of DNA sequence remaining single stranded, $(1+kC_0t)^{-0.44}$; and a function describing the pairing of tracer when the rate constant for the tracer ($k$) is distinct from the driver rate constant ($k_d$): (formula: see text). Several components may be used for most of these functional forms. The standard deviations of the individual parameters at the solutions are calculated.
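The functional forms listed in this abstract translate directly into code. The following is a minimal sketch, not the program the abstract describes: the function names, the multi-component mixing helper and the residual sum of squares are illustrative assumptions, and the tracer/driver function is omitted because the abstract only says "formula: see text".

```python
import math


def first_order(k, cot):
    """Fraction not reacted for first-order kinetics: exp(-k * Cot)."""
    return math.exp(-k * cot)


def second_order(k, cot):
    """Fraction not reacted for second-order kinetics: (1 + k * Cot)^-1."""
    return 1.0 / (1.0 + k * cot)


def variable_order(k, cot, n):
    """Fraction not reacted for variable-order kinetics: (1 + k * Cot)^-n."""
    return (1.0 + k * cot) ** (-n)


def single_stranded_fraction(k, cot):
    """Approximate fraction of sequence still single stranded (n = 0.44)."""
    return variable_order(k, cot, 0.44)


def multi_component(fractions, rate_constants, cot, model):
    """Hypothetical helper: weighted sum of one model over several components."""
    return sum(f * model(k, cot) for f, k in zip(fractions, rate_constants))


def sum_squared_residuals(model, k, cot_values, observed):
    """Least squares objective for a single-parameter model of this kind."""
    return sum((obs - model(k, cot)) ** 2 for cot, obs in zip(cot_values, observed))
```

A least squares fit then amounts to choosing the rate constants (and component fractions) that minimise `sum_squared_residuals`; the minimisation routine used by the original program is not stated in the abstract and is not reproduced here.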

8.
There is an increasing need to link the large amount of genotypic data, gathered using microarrays for example, with various phenotypic data from patients. The classification problem in which gene expression data serve as predictors and a class label phenotype as the binary outcome variable has been examined extensively, but there has been less emphasis on dealing with other types of phenotypic data. In particular, patient survival times with censoring are often not used directly as a response variable due to the complications that arise from censoring. We show that the issues involving censored data can be circumvented by reformulating the problem as a standard Poisson regression problem. The procedure for solving the transformed problem is a combination of two approaches: partial least squares, a regression technique that is especially effective when there is severe collinearity due to a large number of predictors, and generalized linear regression, which extends standard linear regression to deal with various types of response variables. The linear combinations of the original variables identified by the method are highly correlated with the patient survival times and at the same time account for the variability in the covariates. The algorithm is fast, as it does not involve any matrix decompositions in the iterations. We apply our method to data sets from lung carcinoma and diffuse large B-cell lymphoma studies to verify its effectiveness.

9.
In the linear model with right-censored responses and many potential explanatory variables, regression parameter estimates may be unstable or, when the covariates outnumber the uncensored observations, not estimable. We propose an iterative algorithm for partial least squares, based on the Buckley-James estimating equation, to estimate the covariate effect and predict the response for a future subject with a given set of covariates. We use a leave-two-out cross-validation method for empirically selecting the number of components in the partial least-squares fit that approximately minimizes the error in estimating the covariate effect of a future observation. Simulation studies compare the methods discussed here with other dimension reduction techniques. Data from the AIDS Clinical Trials Group protocol 333 are used to motivate the methodology.

10.
Testing for serial correlation in least squares regression. II   (cited 4 times; self-citations: 0; citations by others: 4)
Durbin J, Watson GS. Biometrika, 1951, 38(1-2): 159-178

11.
Testing for serial correlation in least squares regression. I   (cited 3 times; self-citations: 0; citations by others: 3)
Durbin J, Watson GS. Biometrika, 1950, 37(3-4): 409-428

12.
Testing for serial correlation in least squares regression. III   (cited 4 times; self-citations: 0; citations by others: 4)
Durbin J, Watson GS. Biometrika, 1971, 58(1): 1-19

13.
14.
15.
A microcomputer program using an electronic spreadsheet program was developed to calculate pharmacokinetic values of thiacetarsamide sodium, a drug used to kill the adult heartworm (Dirofilaria immitis) parasite of the dog. A least squares, semilogarithmic regression analysis was done on data using a two compartment intravenous model. The data entered include the time after injection and the drug concentration at that time. A graph of the points can be viewed or plotted, with the time on the x-axis, and the natural log of the drug concentration on the y-axis. The program was developed on a Televideo (512 kbytes) microcomputer. The spreadsheet used was Lotus 1-2-3.
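The semilogarithmic least squares step mentioned in this abstract can be sketched as an ordinary straight-line fit of the natural log of concentration against time. This is a minimal sketch under stated assumptions, not the spreadsheet program itself: it fits a single exponential phase, whereas a two-compartment intravenous model is a sum of two exponentials, to which a fit of this kind is usually applied one phase at a time. The function names are hypothetical.

```python
import math


def semilog_fit(times, concentrations):
    """Least squares fit of ln(concentration) = intercept + slope * time.

    Returns (intercept, slope). For a decaying phase the slope is negative,
    and exp(intercept) is the extrapolated concentration at time zero.
    """
    logs = [math.log(c) for c in concentrations]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return intercept, slope


def half_life(slope):
    """Half-life of a mono-exponential phase with the given (negative) slope."""
    return math.log(2.0) / -slope


if __name__ == "__main__":
    # Synthetic data from C(t) = 10 * exp(-0.5 t); not measurements from the study.
    t = [1.0, 2.0, 3.0, 4.0]
    c = [10.0 * math.exp(-0.5 * ti) for ti in t]
    a, b = semilog_fit(t, c)
    print(math.exp(a), b, half_life(b))
```

Plotting time on the x-axis against the natural log of concentration, as the abstract describes, shows each exponential phase as a straight line, which is why a linear fit is sufficient per phase.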

16.
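Estimating forest canopy closure using partial least squares regression   (cited 6 times; self-citations: 0; citations by others: 6)
Based on remote sensing data and Category I forest resource inventory data, this study examined the feasibility of selecting the optimal variables for estimating canopy closure with the Bootstrap method and of building a partial least squares regression model to estimate forest canopy closure. The results showed that the relative bias of the canopy closure estimates was about 5%, whether the model was built from all variables or from the selected optimal variables. The selected optimal variables differed markedly from those reported for other regions, indicating that, besides the selection method, differences in zonal vegetation and in topography and landform also lead to differences in the optimal variables for estimating canopy closure.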

17.
Partial least squares discriminant analysis (PLS-DA) is a partial least squares regression of a set Y of binary variables describing the categories of a categorical variable on a set X of predictor variables. It is a compromise between the usual discriminant analysis and a discriminant analysis on the significant principal components of the predictor variables. This technique is especially suited to deal with a much larger number of predictors than observations and with multicollinearity, two of the main problems encountered when analysing microarray expression data. We explore the performance of PLS-DA with published data from breast cancer (Perou et al. 2000). Several such analyses were carried out: (1) before vs after chemotherapy treatment, (2) estrogen receptor positive vs negative tumours, and (3) tumour classification. We found that the performance of PLS-DA was extremely satisfactory in all cases and that the discriminant cDNA clones often had a sound biological interpretation. We conclude that PLS-DA is a powerful yet simple tool for analysing microarray data.
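To make the definition in this abstract concrete, here is a minimal sketch of PLS-DA for the simplest case: a single binary response coded 0/1 and a single PLS component. It is an illustration under stated assumptions, not the authors' analysis: the function names, the dictionary layout and the 0.5 decision threshold are all hypothetical choices, and a real analysis would normally extract several components and choose their number by cross-validation.

```python
import math


def fit_pls_da_one_component(X, y):
    """Fit a one-component PLS regression of a 0/1 response y on rows of X."""
    n, p = len(X), len(X[0])
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]  # centred X
    yc = [v - y_mean for v in y]                                  # centred y
    # Weight vector: direction in X of maximal covariance with y, unit length.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Scores (latent variable) and the regression of y on the scores.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    return {"x_means": x_means, "y_mean": y_mean, "w": w, "q": q}


def predict_class(model, row, threshold=0.5):
    """Predict 0 or 1 by thresholding the fitted response (threshold assumed)."""
    score = sum((row[j] - model["x_means"][j]) * model["w"][j]
                for j in range(len(row)))
    fitted = model["y_mean"] + model["q"] * score
    return 1 if fitted > threshold else 0


if __name__ == "__main__":
    # Toy data with more structure than samples would allow in practice.
    X = [[1.0, 0.2, 5.0], [1.2, 0.1, 4.8], [3.0, 0.3, 1.0], [3.2, 0.2, 1.1]]
    y = [0, 0, 1, 1]
    model = fit_pls_da_one_component(X, y)
    print([predict_class(model, row) for row in X])
```

Because the weight vector is built from the covariance between each predictor and the response, it remains well defined when predictors far outnumber samples or are strongly collinear, which is the situation the abstract highlights for microarray data.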

18.
MOTIVATION: Gene association/interaction networks provide vast amounts of information about essential processes inside the cell. A complete picture of gene-gene associations/interactions would open new horizons for biologists, ranging from pure appreciation to successful manipulation of biological pathways for therapeutic purposes. Therefore, identification of important biological complexes whose members (genes and their protein products) interact with each other is of prime importance. Numerous experimental methods exist but, for the most part, they are costly and labor intensive. Computational techniques, such as the one proposed in this work, provide a quick 'budget' solution that can be used as a screening tool before more expensive techniques are attempted. Here, we introduce a novel computational method based on the partial least squares (PLS) regression technique for reconstruction of genetic networks from microarray data. RESULTS: The proposed PLS method is shown to be an effective screening procedure for the detection of gene-gene interactions from microarray data. Both simulated and real microarray experiments show that the PLS-based approach is superior to its competitors both in terms of performance and applicability. AVAILABILITY: R code is available from the supplementary web-site whose URL is given below.

19.
20.
On the regression analysis of multivariate failure time data   (cited 19 times; self-citations: 0; citations by others: 19)


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.)  ICP filing no. 京ICP备09084417号