Similar literature
 Found 20 similar documents; search took 765 ms
1.
When the explanatory variables of a linear model are split into two groups, two notions of collinearity can be defined: collinearity among the variables within each group, whose mean is called residual collinearity, and collinearity between the two groups, called explained collinearity. Canonical correlation analysis provides information about the collinearity: large canonical correlation coefficients correspond to some small eigenvalues and eigenvectors of the correlation matrix and characterise the explained collinearity. Other small eigenvalues of this matrix correspond to the residual collinearity. Predictors can be selected from the canonical correlation variables according to their partial correlation coefficient with the explained variable. In the proposed application, the results obtained by selecting canonical variables are better than those given by classical regression and by principal component regression.

2.
Many studies have indicated relationships between individual species, but none have related combinations of overstory variables to understory herbaceous vegetation in a Ponderosa pine/Gambel oak ecosystem. Our objective was not only to determine the general relationships between the two sets of variables, but also to identify the highest contributing variables. We used canonical correlation analysis to relate overstory variables (canopy cover, basal cover and density) to herbaceous vegetation cover variables. Canopy, basal, and ground cover were measured by the line intercept method using a 12.2 m tape as a sample unit. Tree density was measured by the Point-Center-Quarter method. The analysis was made with selected overstory variables and 5 understory herbaceous cover variables. This analysis revealed a significant canonical correlation between the two canonical variables (r=0.69). Among herbaceous cover variables, Oregon grape, Kentucky bluegrass, sedge, and foxtail barley, and among overstory variables, the density and the basal cover of Ponderosa pine, made the highest positive contributions to the correlation between the two linear combinations, while the density and canopy of Gambel oak contributed negatively to the canonical correlation.
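
As an illustration of the statistic reported above, the following is a minimal sketch of how canonical correlations between two sets of variables can be computed. It is not the authors' procedure: the function name `canonical_correlations` is hypothetical, NumPy is assumed to be available, and both data matrices are assumed to have full column rank.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between column sets X (n x p) and Y (n x q).

    Assumption: both centered matrices have full column rank.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # Center each variable so that correlations, not raw moments, are used.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the two column spaces (reduced QR).
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # The singular values of Qx^T Qy are the cosines of the principal
    # angles between the two subspaces, i.e. the canonical correlations.
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)
```

The values are returned in descending order, so the first entry corresponds to what abstracts in this list call the first canonical correlation.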

3.
In this paper we develop an efficient optimization algorithm for solving canonical correlation analysis (CCA) with complex structured-sparsity-inducing penalties, including overlapping-group-lasso penalty and network-based fusion penalty. We apply the proposed algorithm to an important genome-wide association study problem, eQTL mapping. We show that, with the efficient optimization algorithm, one can easily incorporate rich structural information among genes into the sparse CCA framework, which improves the interpretability of the results obtained. Our optimization algorithm is based on a general excessive gap optimization framework and can scale up to millions of variables. We demonstrate the effectiveness of our algorithm on both simulated and real eQTL datasets.

4.
This paper presents a novel method to explore the intrinsic morphological correlation between the bones of a shoulder joint (humerus and scapula). To model this correlation, canonical correlation analysis (CCA) is used. We also propose a technique to predict a three-dimensional (3D) bone shape from its adjoining segment at a joint based on partial least squares regression (PLS). The high-dimensional 3D surface information of a bone is represented by a few variables using principal component analysis, which also captures the pattern of variability of the shapes in our datasets. Our results show that the humerus set and scapula set have a highly linear morphological relationship and that the correlation information can be used as a classifier. In this study, primate shoulder bone datasets were categorised into two clusters: great apes (including humans) and monkeys. A leave-one-out experiment was performed to test the robustness of this prediction method. Predictions made with this method are statistically significantly better than those obtained using the mean shape from the training set.

5.
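Two ordination methods, co-inertia analysis (PCA-CA COIA) and canonical correspondence analysis (CCA), were used to analyze the Amur cork tree (Phellodendron amurense) communities at the Xiaolongmen Forest Farm in Beijing, and Spearman rank correlation coefficients were used to test the correlation between corresponding ordination axes. The two methods gave broadly consistent results: the first ordination axis of both reflected the influence of elevation and slope aspect on community distribution, whereas the environmental meanings of the second and third axes differed somewhat between the methods and overlapped. Nevertheless, the first three axes of both methods reflected the trends in elevation, slope position, soil thickness and litter layer thickness. This indicates that when the number of environmental factors is small or collinearity effects are not pronounced, co-inertia analysis can achieve the same analytical performance as CCA, and the proportion explained by its ordination axis eigenvalues is higher than that of canonical correspondence analysis.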

6.
In this study we set out to investigate the possibility of linking phenological phases throughout the vegetation cycle, as a local-scale biological phenomenon, directly with large-scale atmospheric variables via two different empirical downscaling techniques. In recent years a number of methods have been developed to transfer atmospheric information at coarse General Circulation Model grid resolutions to local scales and individual points. Here multiple linear regression (MLR) and canonical correlation analysis (CCA) have been selected as downscaling methods. Different validation experiments (e.g. temporal cross-validation, split-sample tests) are used to test the performance of both approaches and compare them for time series of 17 phenological phases and air temperatures from Central Europe as microscale variables. A number of atmospheric variables over the North Atlantic and Europe are utilized as macroscale predictors. The period considered is 1951–1998. Temporal cross-validation reveals that the CCA model generally performs better than MLR, which explains 20%–50% of the phenological variances, whereas the CCA model shows a range from 40% to over 60% throughout most of the vegetation cycle. To show the validity of employing phenological observations for downscaling purposes, both methods (MLR and CCA) are also applied to gridded local air temperature time series over Central Europe. In this case there is no obvious superiority of the CCA model over the MLR model. Both models show explained variances from 40% to over 70% in the temporal cross-validation experiment. The results of this study indicate that time series of phenological occurrence dates are very compatible with the needs of empirical downscaling originally developed for local-scale atmospheric variables.

7.
Biomarker discovery aims to find small subsets of relevant variables in 'omics data that correlate with the clinical syndromes of interest. Despite the fact that clinical phenotypes are usually characterized by a complex set of clinical parameters, current computational approaches assume univariate targets, e.g. diagnostic classes, against which associations are sought. We propose an approach based on asymmetrical sparse canonical correlation analysis (SCCA) that finds multivariate correlations between the 'omics measurements and the complex clinical phenotypes. We correlated plasma proteomics data to multivariate overlapping complex clinical phenotypes from tuberculosis and malaria datasets. We discovered relevant 'omic biomarkers that have a high correlation to profiles of clinical measurements and are remarkably sparse, containing 1.5–3% of all 'omic variables. We show that using clinical view projections we obtain remarkable improvements in diagnostic class prediction, up to 11% in tuberculosis and up to 5% in malaria. Our approach finds proteomic biomarkers that correlate with complex combinations of clinical biomarkers. Using the clinical biomarkers improves the accuracy of diagnostic class prediction while not requiring the measurement of plasma proteomic profiles of each subject. Our approach makes it feasible to use 'omics data to build accurate diagnostic algorithms that can be deployed to community health centres lacking the expensive 'omics measurement capabilities.

8.
We address the identification of optimal biomarkers for the rapid diagnosis of neonatal sepsis. We employ both canonical correlation analysis (CCA) and sparse support vector machine (SSVM) classifiers to select the best subset of biomarkers from a large hematological data set collected from infants with suspected sepsis from Yale-New Haven Hospital's Neonatal Intensive Care Unit (NICU). CCA is used to select sets of biomarkers of increasing size that are most highly correlated with infection. The effectiveness of these biomarkers is then validated by constructing a sparse support vector machine diagnostic classifier. We find that the following set of five biomarkers captures the essential diagnostic information (in order of importance): Bands, Platelets, neutrophil CD64, White Blood Cells, and Segs. Further, the diagnostic performance of the optimal set of biomarkers is significantly higher than that of isolated individual biomarkers. These results suggest an enhanced sepsis scoring system for neonatal sepsis that includes these five biomarkers. We demonstrate the robustness of our analysis by comparing CCA with the Forward Selection method and SSVM with LASSO Logistic Regression.

9.
This analysis was conducted to explore the association between 5 birth size measurements (weight, length and head, chest and mid-upper arm [MUAC] circumferences) as dependent variables and 10 maternal factors as independent variables using canonical correlation analysis (CCA). CCA considers simultaneously sets of dependent and independent variables and, thus, generates a substantially reduced type 1 error. Data were from women delivering a singleton live birth (n = 14506) while participating in a double-masked, cluster-randomized, placebo-controlled maternal vitamin A or β-carotene supplementation trial in rural Bangladesh. The first canonical correlation was 0.42 (P<0.001), demonstrating a moderate positive correlation mainly between the 5 birth size measurements and 5 maternal factors (preterm delivery, early pregnancy MUAC, infant sex, age and parity). A significant interaction between infant sex and preterm delivery on birth size was also revealed from the score plot. Thirteen percent of birth size variability was explained by the composite score of the maternal factors (Redundancy, RY/X = 0.131). Given an ability to accommodate numerous relationships and reduce complexities of multiple comparisons, CCA identified the 5 maternal variables able to predict birth size in this rural Bangladesh setting. CCA may offer an efficient, practical and inclusive approach to assessing the association between two sets of variables, addressing the innate complexity of interactions.

10.
To understand the role of human microbiota in health and disease, we need to study effects of environmental and other epidemiological variables on the composition of microbial communities. The composition of a microbial community may depend on multiple factors simultaneously. Therefore we need multivariate methods for detecting, analyzing and visualizing the interactions between environmental variables and microbial communities. We provide two different approaches for multivariate analysis of these complex combined datasets: (i) We select variables that correlate with overall microbiota composition and microbiota members that correlate with the metadata using canonical correlation analysis, determine independence of the observed correlations in a multivariate regression analysis, and visualize the effect size and direction of the observed correlations using heatmaps; (ii) We select variables and microbiota members using univariate or bivariate regression analysis, followed by multivariate regression analysis, and visualize the effect size and direction of the observed correlations using heatmaps. We illustrate the results of both approaches using a dataset containing respiratory microbiota composition and accompanying metadata. The two approaches provide slightly different results: approach (i), which uses canonical correlation analysis to select determinants and microbiota members, detects only fewer and stronger correlations, whereas approach (ii), which uses univariate or bivariate analyses to select determinants and microbiota members, detects a similar but broader pattern of correlations. The proposed approaches both detect and visualize independent correlations between multiple environmental variables and members of the microbial community. Depending on the size of the datasets and the hypothesis tested, one can select the method of preference.

11.
Posture segmentation plays an essential role in human motion analysis. The state-of-the-art method extracts sufficiently high-dimensional features from 3D depth images for each 3D point and learns an efficient body part classifier. However, high-dimensional features are memory-consuming and difficult to handle on large-scale training datasets. In this paper, we propose an efficient two-stage dimension reduction scheme, termed biview learning, to encode two independent views, which are depth-difference features (DDF) and relative position features (RPF). Biview learning explores the complementary property of DDF and RPF, and uses two stages to learn a compact yet comprehensive low-dimensional feature space for posture segmentation. In the first stage, discriminative locality alignment (DLA) is applied to the high-dimensional DDF to learn a discriminative low-dimensional representation. In the second stage, canonical correlation analysis (CCA) is used to explore the complementary property of RPF and the dimensionality-reduced DDF. Finally, we train a support vector machine (SVM) over the output of CCA. We carefully validate the effectiveness of DLA and CCA utilized in the two-stage scheme on our 3D human point cloud dataset. Experimental results show that the proposed biview learning scheme significantly outperforms the state-of-the-art method for human posture segmentation.

12.
13.
This laboratory study of a variably mineralized and hydrothermally altered granite outcrop investigated the influences of rock-surface chemistry and heavy metal content on resident bacterial populations. Results indicated that elevated heavy metal concentrations had a profound impact on bacterial community structure, with strong relationships found between certain ribotypes and particular chemical/heavy metal elements. Automated ribosomal intergenic sequence analysis (ARISA) was used to assess the nature and extent of bacterial diversity, and this was combined with chemical analysis and multivariate statistics to identify the main geochemical factors influencing bacterial community structure. A randomization test revealed significant changes in bacterial structure between samples, while canonical correspondence analysis (CCA) related each individual ARISA profile to linear combinations of the chemical variables (mineralogy, major element and heavy metal concentrations), revealing the geochemical factors that correlated with changes in the ARISA data. ANOVA was performed to further explore interactions between individual ribotypes and chemical/heavy metal composition, and revealed that a high proportion of ribotypes correlated significantly with heavy metals.

14.
The genome-wide association study (GWAS) approach has discovered hundreds of genetic variants associated with diseases and quantitative traits. However, despite clinical overlap and statistical correlation between many phenotypes, GWAS are generally performed one-phenotype-at-a-time. Here we compare the performance of modelling multiple phenotypes jointly with that of the standard univariate approach. We introduce a new method and software, MultiPhen, that models multiple phenotypes simultaneously in a fast and interpretable way. By performing ordinal regression, MultiPhen tests the linear combination of phenotypes most associated with the genotypes at each SNP, and thus potentially captures effects hidden to single phenotype GWAS. We demonstrate via simulation that this approach provides a dramatic increase in power in many scenarios. There is a boost in power for variants that affect multiple phenotypes and for those that affect only one phenotype. While other multivariate methods have similar power gains, we describe several benefits of MultiPhen over these. In particular, we demonstrate that other multivariate methods that assume the genotypes are normally distributed, such as canonical correlation analysis (CCA) and MANOVA, can have highly inflated type-1 error rates when testing case-control or non-normal continuous phenotypes, while MultiPhen produces no such inflation. To test the performance of MultiPhen on real data we applied it to lipid traits in the Northern Finland Birth Cohort 1966 (NFBC1966). In these data MultiPhen discovers 21% more independent SNPs with known associations than the standard univariate GWAS approach, while applying MultiPhen in addition to the standard approach provides 37% increased discovery. The most associated linear combinations of the lipids estimated by MultiPhen at the leading SNPs accurately reflect the Friedewald Formula, suggesting that MultiPhen could be used to refine the definition of existing phenotypes or uncover novel heritable phenotypes.

15.
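Grey canonical correlation model for arthropod community stability analysis and its application   Total citations: 2, self-citations: 0, citations by others: 2
Chen Chaoying (陈超英). Acta Ecologica Sinica (《生态学报》), 2007, 27(8): 3370-3378
Stability is one of the most important characteristics of an ecosystem. Based on the principle that accumulated generation strengthens the linear correlation between two monotonically increasing sequences, and using canonical correlation analysis, a grey canonical correlation model for community stability analysis was constructed, with the ratio mn/mp (mp being the number of pest individuals and mn the number of natural enemy individuals) as the index measuring community stability. The procedure is as follows: (1) the abundance series of each population is reordered by ascending total pest abundance on each sampling occasion, each series is divided by its range to make it dimensionless, and each series is then subjected to accumulated generation; (2) with the pest populations as one set of variables and the natural enemy populations as the other, canonical correlation analysis is used to obtain the pairs of canonical variables. For each pair of canonical variables that meets the linear fitting requirement, a regression equation is built with the pest as the independent variable and the natural enemy as the dependent variable, and the variables in these equations are restored by inverse accumulation; (3) these equations are linearly combined into a single equation, with the combination coefficients chosen to maximize the linear correlation between the linear combination of the pest canonical variables and the total pest abundance series; (4) the concept of a conversion coefficient is introduced to build a model for converting between total natural enemy abundance and total pest abundance, called the grey canonical correlation model, with which the role of each population in community stability can be analyzed. The model was applied to the stability analysis of the arthropod community in the Jinshan tea plantation, Fuzhou; the results were largely consistent with reality, indicating that the grey canonical correlation model is feasible.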
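
Step (1) of the procedure above, and the inverse accumulation used in step (2), can be sketched as follows. This is only an illustration under stated assumptions, not the author's code: the function names `ascending_order`, `accumulate_series` and `restore_series` are hypothetical, and "accumulated generation" is taken to mean a running cumulative sum.

```python
from itertools import accumulate

def ascending_order(pest_totals):
    """Indices of the sampling occasions sorted by ascending total pest count."""
    return sorted(range(len(pest_totals)), key=lambda i: pest_totals[i])

def accumulate_series(series, order):
    """Reorder one population series, divide it by its range, then
    apply accumulated generation (assumed here to be a cumulative sum)."""
    reordered = [series[i] for i in order]
    span = max(reordered) - min(reordered)
    if span == 0:
        raise ValueError("series has zero range and cannot be scaled")
    scaled = [value / span for value in reordered]
    return list(accumulate(scaled))

def restore_series(accumulated):
    """Inverse accumulation: successive differences recover the scaled series."""
    return [accumulated[0]] + [
        later - earlier for earlier, later in zip(accumulated, accumulated[1:])
    ]
```

Each pest and natural enemy series would be passed through `accumulate_series` with the same `order` before the canonical correlation step, so that all series stay aligned by sampling occasion.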

16.
Aquatic Oligochaetes in ditches   Total citations: 4, self-citations: 4, citations by others: 0

17.
Errors‐in‐variables models in high‐dimensional settings pose two challenges in application. First, the number of observed covariates is larger than the sample size, while only a small number of covariates are true predictors under an assumption of model sparsity. Second, the presence of measurement error can result in severely biased parameter estimates, and also affects the ability of penalized methods such as the lasso to recover the true sparsity pattern. A new estimation procedure called SIMulation‐SELection‐EXtrapolation (SIMSELEX) is proposed. This procedure makes double use of lasso methodology. First, the lasso is used to estimate sparse solutions in the simulation step, after which a group lasso is implemented to do variable selection. The SIMSELEX estimator is shown to perform well in variable selection, and has significantly lower estimation error than naive estimators that ignore measurement error. SIMSELEX can be applied in a variety of errors‐in‐variables settings, including linear models, generalized linear models, and Cox survival models. It is furthermore shown in the Supporting Information how SIMSELEX can be applied to spline‐based regression models. A simulation study is conducted to compare the SIMSELEX estimators to existing methods in the linear and logistic model settings, and to evaluate performance compared to naive methods in the Cox and spline models. Finally, the method is used to analyze a microarray dataset that contains gene expression measurements of favorable histology Wilms tumors.

18.
The Mantel test is widely used to test the linear or monotonic independence of the elements in two distance matrices. It is one of the few appropriate tests when the hypothesis under study can only be formulated in terms of distances; this is often the case with genetic data. In particular, the Mantel test has been widely used to test for spatial relationship between genetic data and spatial layout of the sampling locations. We describe the domain of application of the Mantel test and derived forms. Formula development demonstrates that the sum-of-squares (SS) partitioned in Mantel tests and regression on distance matrices differs from the SS partitioned in linear correlation, regression and canonical analysis. Numerical simulations show that in tests of significance of the relationship between simple variables and multivariate data tables, the power of linear correlation, regression and canonical analysis is far greater than that of the Mantel test and derived forms, meaning that the former methods are much more likely than the latter to detect a relationship when one is present in the data. Examples of difference in power are given for the detection of spatial gradients. Furthermore, the Mantel test does not correctly estimate the proportion of the original data variation explained by spatial structures. The Mantel test should not be used as a general method for the investigation of linear relationships or spatial structures in univariate or multivariate data. Its use should be restricted to tests of hypotheses that can only be formulated in terms of distances.
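
For readers unfamiliar with the test discussed above, here is a minimal sketch of a permutation Mantel test based on the Pearson correlation between the off-diagonal elements of two distance matrices. It is an illustration only: the function name `mantel_test`, the one-sided alternative, the default of 999 permutations and the fixed seed are all assumptions, not details taken from the abstract.

```python
import random
from math import sqrt

def _pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

def _upper(D, perm):
    """Upper-triangle distances after relabelling the objects by perm."""
    n = len(perm)
    return [D[perm[i]][perm[j]] for i in range(n) for j in range(i + 1, n)]

def mantel_test(D1, D2, n_perm=999, seed=0):
    """Return (r, p) for a one-sided Mantel test of two square distance matrices."""
    identity = list(range(len(D1)))
    v1 = _upper(D1, identity)
    r_obs = _pearson(v1, _upper(D2, identity))
    rng = random.Random(seed)
    hits = 1  # the observed arrangement counts as one permutation
    perm = identity[:]
    for _ in range(n_perm):
        # Permute rows and columns of D2 together by relabelling objects.
        rng.shuffle(perm)
        if _pearson(v1, _upper(D2, perm)) >= r_obs:
            hits += 1
    return r_obs, hits / (n_perm + 1)
```

Objects are permuted rather than individual distances because the entries of a distance matrix are not independent of one another.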

19.
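Multivariate spatial correlation analysis is mostly based on time-series data and places strict requirements on series length and statistics, whereas analysis of spatial non-stationarity can use single-period data to analyze correlations among multiple variables. A spatially varying coefficient regression model was used to analyze the spatial variation in the effects of precipitation and temperature on the vegetation cover index in the Ili region of Xinjiang in 2006 and 2011. Regression coefficient surfaces were estimated with the local linear geographically weighted regression (GWR) method, revealing spatial heterogeneity in the interactions among the variables, and the results were compared with ordinary least squares estimates of a linear regression. The results show that: (1) the spatially varying coefficient regression model can be used for spatial correlation analysis among variables; (2) the local linear GWR estimator is clearly superior to the least squares estimator of linear regression; (3) the fitted results show that the effects of precipitation and temperature on the vegetation cover index in the Ili region are significantly spatially non-stationary; (4) the model estimation error arises from factors other than precipitation and air temperature, such as topography, landforms and human activities, and requires further study. The method offers an approach to spatial correlation analysis among spatially non-stationary variables and to simulating the spatial distribution of the vegetation cover index.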
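
The idea of a regression coefficient surface can be sketched with basic geographically weighted regression, in which a separate weighted least squares fit is made at every site. Note that this is the basic (local constant) form, not the local linear GWR estimator used in the study; the function name `gwr_coefficients`, the Gaussian kernel and the fixed bandwidth are assumptions made for illustration, and NumPy is assumed to be available.

```python
import numpy as np

def gwr_coefficients(coords, X, y, bandwidth):
    """Basic GWR: one weighted least squares fit per site.

    coords: (n, 2) site locations; X: (n, k) predictors; y: (n,) response.
    Returns an (n, k + 1) array of local coefficients, intercept first.
    """
    coords = np.asarray(coords, dtype=float)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])
    betas = np.empty((len(y), design.shape[1]))
    for i, site in enumerate(coords):
        # Gaussian kernel: nearby observations get weights close to 1.
        dist2 = np.sum((coords - site) ** 2, axis=1)
        weights = np.exp(-0.5 * dist2 / bandwidth ** 2)
        # Weighted least squares via row scaling by sqrt(weight).
        root_w = np.sqrt(weights)
        beta, *_ = np.linalg.lstsq(
            design * root_w[:, None], y * root_w, rcond=None
        )
        betas[i] = beta
    return betas
```

Mapping each column of the returned array over the site coordinates gives one coefficient surface per predictor; if the true relationship does not vary in space, the local fits reduce to the global least squares coefficients.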

20.
The main objective of this study is to find associations between site characteristics (topographic, and soil physical and chemical properties) and soybean [Glycine max (L.) Merr.] plant performance (e.g. yield, canopy development) occurring at a field scale. The study took place in an Illinois production field in the 2000 and 2001 seasons. These associations were studied with canonical correlation analysis (CCA) followed by a spatial analysis of the resulting canonical variables with semivariography. The CCA discovered several significant associations between site characteristics and soybean performance. The first pair of canonical variables had a correlation coefficient of 0.76. The site characteristics most consistently correlated with the first pair of canonical variables were organic matter (OM) (r = 0.64 and 0.51 for the 2000 and 2001 seasons, respectively), pH (r = 0.39 and 0.51 for the 2000 and 2001 seasons, respectively), and deep electrical conductivity (ECD) (r = 0.53 and 0.49 for the 2000 and 2001 seasons, respectively). Site variables soil phosphorus (P) and soil potassium (K) were inconsistently correlated with the site characteristics canonical variable. These results indicate that site variables related to soil water retention are more consistently associated with soybean performance than site variables related to soil fertility. The plant performance characteristics most correlated with the soybean performance canonical variable were NDVIN (r = 0.76 and 0.72 for the 2000 and 2001 seasons, respectively), SPAD (r = 0.70 and 0.47 for the 2000 and 2001 seasons, respectively), and yield (r = 0.44 and 0.58 for the 2000 and 2001 seasons, respectively). The variables NDVIN, yield and ECD are obtained with sensors and thus can be easily used at a production field scale. The common spatial structures in pairs of the canonical variables confirm the relationship between site properties and soybean performance, demonstrating their potential for the demarcation of uniform areas within production fields. This approach can be used to explore soil-plant relationships in other field studies.
