Similar Documents
A total of 20 similar documents were retrieved.
1.
Abstract. Numerous ecological studies use Principal Components Analysis (PCA) for exploratory analysis and data reduction. Determination of the number of components to retain is the most crucial problem confronting the researcher when using PCA. An incorrect choice may lead to the underextraction of components, but commonly results in overextraction. Of several methods proposed to determine the significance of principal components, Parallel Analysis (PA) has proven consistently accurate in determining the threshold for significant components, variable loadings, and analytical statistics when decomposing a correlation matrix. In this procedure, eigenvalues from a data set prior to rotation are compared with those from a matrix of random values of the same dimensionality (p variables and n samples). PCA eigenvalues from the data greater than PA eigenvalues from the corresponding random data can be retained; all components with eigenvalues below this threshold should be considered spurious. We illustrate Parallel Analysis on an environmental data set. We reviewed all articles utilizing PCA or Factor Analysis (FA) from 1987 to 1993 in Ecology, Ecological Monographs, Journal of Vegetation Science and Journal of Ecology. Analyses were first separated into those PCAs that decomposed a correlation matrix and those that decomposed a covariance matrix. Parallel Analysis was then applied to each PCA/FA found in the literature. Of 39 analyses (in 22 articles), 29 (74.4%) applied no threshold rule, presumably retaining all components judged interpretable. According to the PA results, 26 (66.7%) overextracted components. This overextraction may have led to potentially misleading interpretation of spurious components. We suggest that routine use of PA in multivariate ordination will increase confidence in the results and reduce the subjective interpretation of supposedly objective methods.
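
As a rough illustration of the procedure described above, the following Python sketch compares observed correlation-matrix eigenvalues with those of random data of the same dimensionality. The function name, the number of random replicates, the simulated data and the use of the mean random eigenvalue as the threshold are illustrative choices, not taken from the article.

    import numpy as np

    def parallel_analysis(X, n_random=1000, seed=0):
        """Observed correlation-matrix eigenvalues vs. mean eigenvalues of random data of the same size."""
        rng = np.random.default_rng(seed)
        n, p = X.shape
        observed = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]   # descending order
        random_mean = np.zeros(p)
        for _ in range(n_random):
            R = rng.standard_normal((n, p))                                 # random matrix with the same n and p
            random_mean += np.linalg.eigvalsh(np.corrcoef(R, rowvar=False))[::-1]
        random_mean /= n_random
        return observed, random_mean

    # Stand-in for an n-samples x p-variables environmental table (illustrative data only).
    X = np.random.default_rng(1).standard_normal((60, 10))
    observed, threshold = parallel_analysis(X)
    n_retain = int(np.sum(observed > threshold))   # components above the random threshold are retained
    print(n_retain)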

2.
Accounting for population genetic substructure is important in reducing type 1 errors in genetic studies of complex disease. As efforts to understand complex genetic disease are extended to different continental populations, an understanding of genetic substructure within those continents will be useful in the design and execution of association tests. In this study, population differentiation (Fst) and Principal Components Analyses (PCA) are examined using >200 K genotypes from multiple populations of East Asian ancestry. The population groups included those from the Human Genome Diversity Panel [Cambodian, Yi, Daur, Mongolian, Lahu, Dai, Hezhen, Miaozu, Naxi, Oroqen, She, Tu, Tujia, Xibo, and Yakut], HapMap [Han Chinese (CHB) and Japanese (JPT)], and East Asian or East Asian American subjects of Vietnamese, Korean, Filipino and Chinese ancestry. Pairwise Fst (Weir and Cockerham) showed close relationships between CHB and several large East Asian population groups (CHB/Korean, 0.0019; CHB/JPT, 0.00651; CHB/Vietnamese, 0.0065), with greater separation from the Filipino group (CHB/Filipino, 0.014). Low levels of differentiation were also observed between Dai and Vietnamese (0.0045) and between Vietnamese and Cambodian (0.0062). Similarly, small Fst values were observed among presumed Han Chinese populations originating in different regions of mainland China and Taiwan (Fst < 0.0025 with CHB). For PCA, the first two PCs showed a pattern of relationships that closely followed the geographic distribution of the different East Asian populations. PCA revealed substructure both between different East Asian groups and within the Han Chinese population. These studies also identified a subset of East Asian substructure ancestry informative markers (EASTASAIMS) that may be useful for future complex genetic disease association studies, both in reducing type 1 errors and in identifying homogeneous groups that may increase the power of such studies.
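
For orientation only, here is a minimal Python sketch of a pairwise Fst calculation from allele frequencies. It uses a simple Hudson-style estimator rather than the Weir and Cockerham estimator applied in the study, and the allele-frequency arrays are simulated, so the numbers bear no relation to the values quoted above.

    import numpy as np

    def hudson_fst(p1, p2, n1, n2):
        """Hudson-style Fst from per-SNP allele frequencies of two populations
        (ratio-of-averages form; n1 and n2 are haploid sample sizes)."""
        p1, p2 = np.asarray(p1, float), np.asarray(p2, float)
        num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
        den = p1 * (1 - p2) + p2 * (1 - p1)
        return num.sum() / den.sum()

    rng = np.random.default_rng(0)
    shared = rng.uniform(0.05, 0.95, 5000)                             # shared ancestral allele frequencies
    p_pop1 = np.clip(shared + rng.normal(0, 0.02, 5000), 0.01, 0.99)   # two closely related populations
    p_pop2 = np.clip(shared + rng.normal(0, 0.02, 5000), 0.01, 0.99)
    print(hudson_fst(p_pop1, p_pop2, n1=200, n2=200))                  # small value for closely related groups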

3.
The work reported in this paper examines the use of principal component analysis (PCA), a multivariate statistical technique, to facilitate the extraction of meaningful diagnostic information from a data set of chromatographic traces. Two data sets mimicking archived production records were analysed using PCA. In the first, a full-factorial experimental design was used to generate the data; in the second, the chromatograms were generated by adjusting just one of the process variables at a time. Database mining was achieved through the generation of both gross and disjoint principal component (PC) models. PCA provided easily interpretable 2-dimensional diagnostic plots revealing clusters of chromatograms obtained under similar operating conditions. PCA methods can be used to detect and diagnose changes in process conditions; however, the results show that a PCA model may require recalibration if an equipment change is made. We conclude that PCA methods may be useful for the diagnosis of subtle deviations from process specification that are not readily distinguishable by the operator.
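
A minimal sketch of the general idea, not the authors' implementation: fit PCA to archived "in-spec" traces and flag a new trace whose Hotelling T-squared in the retained-PC space is unusually large. The simulated peak shapes, the two-component model and the 99% chi-square limit are all illustrative assumptions.

    import numpy as np
    from scipy.stats import chi2
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    t_axis = np.linspace(0, 10, 300)
    # 40 archived traces: one Gaussian peak whose height and position vary slightly run to run.
    heights = 1.0 + 0.1 * rng.standard_normal(40)
    shifts = 0.05 * rng.standard_normal(40)
    normal_traces = np.array([h * np.exp(-(t_axis - 5 - s) ** 2) for h, s in zip(heights, shifts)])
    normal_traces += 0.01 * rng.standard_normal(normal_traces.shape)

    pca = PCA(n_components=2).fit(normal_traces)

    def hotelling_t2(trace):
        """T^2 of one trace in the space of the two retained principal components."""
        scores = pca.transform(trace.reshape(1, -1))[0]
        return float(np.sum(scores ** 2 / pca.explained_variance_))

    new_trace = np.exp(-(t_axis - 5.5) ** 2) + 0.01 * rng.standard_normal(300)   # peak shifted beyond normal variation
    limit = chi2.ppf(0.99, df=2)                                                 # simple 99% control limit
    print(hotelling_t2(new_trace), limit)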

4.

Background

The dairy cattle breeding industry is a highly globalized business, which needs internationally comparable and reliable breeding values of sires. The international Bull Evaluation Service, Interbull, was established in 1983 to respond to this need. Currently, Interbull performs multiple-trait across country evaluations (MACE) for several traits and breeds in dairy cattle and provides international breeding values to its member countries. Estimating parameters for MACE is challenging since the structure of datasets and conventional use of multiple-trait models easily result in over-parameterized genetic covariance matrices. The number of parameters to be estimated can be reduced by taking into account only the leading principal components of the traits considered. For MACE, this is readily implemented in a random regression model.

Methods

This article compares two principal component approaches to estimate variance components for MACE using real datasets. The methods tested were a REML approach that directly estimates the genetic principal components (direct PC) and the so-called bottom-up REML approach (bottom-up PC), in which traits are sequentially added to the analysis and the statistically significant genetic principal components are retained. Furthermore, this article evaluates the utility of the bottom-up PC approach to determine the appropriate rank of the (co)variance matrix.

Results

Our study demonstrates the usefulness of both approaches and shows that they can be applied to large multi-country models considering all concerned countries simultaneously. These strategies can thus replace the current practice of estimating the covariance components required through a series of analyses involving selected subsets of traits. Our results support the importance of using the appropriate rank in the genetic (co)variance matrix. Using too low a rank resulted in biased parameter estimates, whereas too high a rank did not result in bias, but increased standard errors of the estimates and notably the computing time.

Conclusions

In terms of estimation accuracy, both principal component approaches performed equally well and permitted the use of more parsimonious models through random regression MACE. The advantage of the bottom-up PC approach is that it does not require any prior knowledge of the rank. However, with a predetermined rank, the direct PC approach needs less computing time than the bottom-up PC approach.  相似文献
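
The parameter reduction can be pictured with a small numerical sketch (not the REML machinery used in the paper): keep only the leading eigenvalue/eigenvector pairs of a genetic (co)variance matrix and rebuild a reduced-rank approximation. The matrix size, the retained rank and the parameter counts for a rank-k factor-analytic structure are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((25, 25))
    G = A @ A.T                                   # stand-in for a 25-country genetic (co)variance matrix

    eigval, eigvec = np.linalg.eigh(G)            # eigenvalues in ascending order
    k = 5                                         # retained rank (leading principal components)
    V, d = eigvec[:, -k:], eigval[-k:]
    G_k = V @ np.diag(d) @ V.T                    # rank-k approximation of G

    full_params = G.shape[0] * (G.shape[0] + 1) // 2
    reduced_params = k * G.shape[0] - k * (k - 1) // 2    # loadings of a rank-k factor-analytic structure
    print(full_params, reduced_params, np.linalg.norm(G - G_k) / np.linalg.norm(G))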

5.
Purpose: To demonstrate the unique information potential of a powerful multivariate data processing method, principal component analysis (PCA), in detecting complex interrelationships between diverse patient, disease and treatment variables and in prognosticating therapy outcome and patient response after mastectomy.
Patients and Methods: One hundred forty-two patients with breast cancer were retrospectively evaluated. The patients were selected from a group of 201 patients who had been treated and observed in the same oncology ward; selection was based on the availability of a complete set of information describing each patient, consisting of 60 specific data items. The resulting 142 × 60 data matrix was subjected to PCA using commercially available statistical software on a personal computer.
Results: Two principal components, PC1 and PC2, were extracted; they accounted for 26% of the total data variance. Projections of the 60 variables and 142 patients were made onto the plane determined by PC1 and PC2. Clear clustering of both the variables and the patients was observed and is discussed in terms of similarity (dissimilarity) of the variables and the patients, respectively. A strikingly clear separation was demonstrated between the group of patients surviving more than 7 years after mastectomy and the group of deceased patients.
Conclusion: PCA offers a promising alternative for the statistical analysis of multivariable data on cancer patients. Using PCA, potentially useful information on both the factors affecting treatment outcome and general prognosis may be extracted from large data sets.

6.
Endothelial protein C receptor (EPCR) is a CD1-like transmembrane glycoprotein with important regulatory roles in the protein C (PC) pathway, enhancing PC's anticoagulant, anti-inflammatory, and antiapoptotic activities. Like the homologous CD1d, EPCR binds a phospholipid [phosphatidylethanolamine (PTY)] in a groove corresponding to the antigen-presenting site, although it is not clear whether lipid exchange can occur in EPCR as it does in CD1d. The presence of PTY seems essential for binding of the PC γ-carboxyglutamic acid (Gla) domain; however, the lipid-free form of EPCR has not been characterized. We have investigated the structural role of PTY in EPCR by multiple molecular dynamics (MD) simulations of the ligand-bound and unbound forms of the protein. Structural changes subsequent to ligand removal led to the identification of two stable, folded ligand-free conformations. Compared with the bound form, the unbound structures showed a narrowing of the A′ pocket and high flexibility of the helices around it, in agreement with CD1d simulations. Thus, a lipid exchange with a mechanism similar to that of CD1d is proposed. In addition, the unbound conformations presented a reduced interaction surface for the Gla domain, confirming the role of PTY in establishing the proper EPCR conformation for interaction with its partner protein. Single MD simulations were also obtained for 29 mutant models with predicted structural stability and impaired binding ability. Ligand affinity calculations based on the linear interaction energy method showed that substitution-induced conformational changes affecting the helices around the A′ pocket were associated with reduced binding affinity. Mutants responsible for this effect may represent useful reagents for experimental tests.

7.
Hiriote S, Chinchilli VM. Biometrics 2011, 67(3): 1007-1016
Summary: In many clinical studies, Lin's concordance correlation coefficient (CCC) is a common tool for assessing the agreement of a continuous response measured by two raters or methods. However, the need for measures of agreement may arise in more complex situations, such as when the responses are measured on more than one occasion by each rater or method. In this work, we propose a new CCC in the presence of repeated measurements, called the matrix-based concordance correlation coefficient (MCCC), based on a matrix norm that possesses the properties needed to characterize the level of agreement between two p × 1 vectors of random variables. It can be shown that the MCCC reduces to Lin's CCC when p = 1. For inference, we propose an estimator for the MCCC based on U-statistics. Furthermore, we derive the asymptotic distribution of the estimator, which is proven to be normal. Simulation studies confirm that, in terms of accuracy, precision, and coverage probability, the estimator of the MCCC works very well in general cases, especially when n is greater than 40. Finally, we use real data from an Asthma Clinical Research Network (ACRN) study and the Penn State Young Women's Health Study for demonstration.
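
As background for the p = 1 special case mentioned above, here is a short Python sketch of Lin's CCC; the MCCC itself, its matrix-norm construction and its U-statistic estimator are not reproduced. The two rating vectors are made up.

    import numpy as np

    def lins_ccc(x, y):
        """Lin's concordance correlation coefficient for two univariate measurements."""
        x, y = np.asarray(x, float), np.asarray(y, float)
        mx, my = x.mean(), y.mean()
        vx, vy = x.var(), y.var()                      # population (1/n) variances
        sxy = ((x - mx) * (y - my)).mean()
        return 2 * sxy / (vx + vy + (mx - my) ** 2)

    rater1 = [10.1, 12.3, 9.8, 11.5, 10.9]
    rater2 = [10.4, 12.0, 10.1, 11.2, 11.3]
    print(lins_ccc(rater1, rater2))                    # equals 1.0 only under perfect agreement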

8.
Arild O. Gautestad. Oikos 2013, 122(4): 612-620
How to differentiate between scale-free space use such as Lévy walk and a two-level, scale-specific process such as composite random walk (a mixture of intra- and inter-patch habitat movement) remains controversial. A composite random walk may under some parameter conditions appear Lévy-walk-like from the perspective of the path's distribution of step lengths, owing to a superabundance of very long steps relative to the expectation from a classic (single-level) random walk. However, a more explicit focus on the qualitative differences between studying movement at a high-resolution mechanistic (behavioral) level and at the more coarse-grained statistical mechanical level may help resolve both this and other issues related to scaling complexity. Specifically, re-sampling a composite random walk at time lags larger than the micro-level unit time step of the simulation makes its Lévy-look-alike step-length distribution re-shape towards a Brownian motion-like pattern. Conversely, a true Lévy walk maintains its scaling characteristics upon re-sampling. This result illustrates how a confusing pattern at the mechanistic level may be resolved by changing the observational scale from the micro level to the coarser statistical mechanical meso- or macro-scale. The instability of the composite random walk pattern under rescaling is a consequence of the central limit theorem. I propose that a coarse-graining test – studying simulated animal paths at a coarsened temporal scale by re-sampling the series – should be routinely performed before comparing theoretical results with patterns generated from GPS data describing animal movement paths. Fixes from terrestrial mammals are often collected at hourly or longer intervals, and such a priori coarse-grained series may thus comply better with the statistical mechanical meso- or macro-level of analysis than with the behavioral mechanics observed at finer resolutions, typically in the range of seconds and minutes. If fixes of real animals are collected at such high frequencies, coarse-graining both the simulated and the real series is advised in order to bring the analysis into a temporal scale domain where analytical methods from statistical mechanics can be applied.
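
The proposed coarse-graining test can be sketched with a toy simulation (not the author's model): generate a two-level composite walk, re-sample the path at a coarser lag, and compare a tail statistic of the step-length distribution at the two lags. All parameter values, including the choice of tail statistic, are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000
    # Composite (two-level) walk: mostly short intra-patch steps, occasional long inter-patch steps.
    is_long = rng.random(n) < 0.02
    steps = np.where(is_long, rng.exponential(50.0, n), rng.exponential(1.0, n))
    angles = rng.uniform(0, 2 * np.pi, n)
    path = np.cumsum(np.stack([steps * np.cos(angles), steps * np.sin(angles)], axis=1), axis=0)

    def step_lengths(path, lag):
        """Step lengths of the path re-sampled every `lag` time units."""
        sub = path[::lag]
        return np.linalg.norm(np.diff(sub, axis=0), axis=1)

    # A composite walk loses its heavy tail under coarse graining (central limit theorem),
    # whereas a true Lévy walk would keep its scaling; compare a tail ratio at two lags.
    for lag in (1, 50):
        L = step_lengths(path, lag)
        print(lag, np.percentile(L, 99) / np.median(L))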

9.
Given a set of alternative models for a specific protein sequence, the model quality assessment (MQA) problem asks for an assignment of scores to each model in the set. A good MQA program assigns these scores such that they correlate well with the real quality of the models, ideally giving the best score to the model closest to the true structure. In this article, we present a new approach to the MQA problem. It is based on distance constraints extracted from alignments to templates of known structure, and is implemented in the Undertaker program for protein structure prediction. One novel feature is that we extract noncontact constraints as well as contact constraints. We describe how the distance constraint extraction is done and show how the constraints can be used to address the MQA problem. We compared our method on CASP7 targets, and the results show that it is at least comparable with the best MQA methods assessed at CASP7. We also propose a new evaluation measure, Kendall's τ, which is more interpretable than the conventional measures used for evaluating MQA methods (Pearson's r and Spearman's ρ). We show clear examples where Kendall's τ agrees much better with our intuition of a correct MQA, and we therefore propose that Kendall's τ be used for future CASP MQA assessments.
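
To illustrate the kind of comparison involved (with made-up quality and MQA scores, not CASP7 data), the three rank/linear correlation measures can be computed side by side with SciPy:

    from scipy.stats import kendalltau, pearsonr, spearmanr

    true_quality = [0.82, 0.75, 0.60, 0.55, 0.40]   # e.g. similarity of each model to the native structure
    mqa_scores   = [0.90, 0.70, 0.72, 0.50, 0.35]   # scores assigned by a hypothetical MQA program

    tau, _ = kendalltau(true_quality, mqa_scores)
    r, _ = pearsonr(true_quality, mqa_scores)
    rho, _ = spearmanr(true_quality, mqa_scores)
    # Kendall's tau counts concordant vs. discordant pairs, which is easy to interpret directly.
    print(tau, r, rho)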

10.
Knowledge about the diversity, locomotor adaptations, and evolution of the marsupial forelimb is limited, resulting in an underrepresentation of marsupials in the comparative anatomical literature on mammalian forelimb anatomy. This study investigated hand proportions in the diverse marsupial order Diprotodontia. Fifty-two measurements of 95 specimens representing 47 species, as well as 6 non-diprotodontian specimens, were explored using principal components analysis (PCA). Bootstrapping was used to assess the reliability of the loadings. Phylogenetically independent contrasts and phylogenetic ANOVA were used to test for correlation with size and for functional adaptation of the forelimbs to locomotor habit, scored as arboreal vs. terrestrial. Analysis of first principal component (PC1) scores revealed significant differences between arboreal and terrestrial species, related to the relative slenderness of their phalangeal elements. Both locomotor groups displayed allometry along PC1, but with different intercepts, such that PC1 discriminated between the two locomotor habits almost completely. PC2 separated some higher-level clades and burrowing species. Analysis of locomotor predictors commonly applied by palaeontologists indicates that ratios between proximal and intermediate phalanges are unsuitable as predictors of arboreality/terrestriality, whereas the phalangeal index is more effective. From the PCA results, a phalangeal slenderness ratio was developed that proved to be a useful discriminator, suggesting that a single unallocated phalanx can be used to gain an impression of locomotor mode in fossils. Most Diprotodontia are laterally paraxonic or ectaxonic, with the exception of digging species, whose hands are medially paraxonic. Our results complement those of studies on placental mammals, suggesting that the demands of arboreality, terrestriality, or frequent digging on intrinsic hand proportions are met with similar anatomical adaptations in marsupials.

11.
Principal components analysis (PCA) has not been much in vogue in the field of movement coordination, even though it is useful for reducing data dimensionality and revealing underlying data structures. Traditionally, studies of coordination between two joints have predominantly made use of relative phase analyses. This has resulted in the identification of principal constraints that govern the Central Nervous System's organization and control of coordination patterns. However, relative phase analyses of pairwise joints have drawbacks: they are not optimal for revealing convergent patterns among multijoint coordination modes or for unraveling generic control strategies. In this paper, we present a method to analyze multijoint coordination based on the properties of the principal components, more specifically the eigenvalues and eigenvectors of the covariance matrix. The comparison between relative phase analysis and PCA shows that both provide similar and consistent results, underscoring the latter technique's sensitivity for the study of coordination performance. In addition, PCA provides a method for automatic pattern detection as well as an index of performance for each joint within the context of the global coordination pattern. Finally, the merit of the PCA technique within the context of central pattern generators (CPG) is discussed.
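
A minimal sketch of the covariance-eigendecomposition idea, using simulated joint-angle signals rather than the authors' experimental data: the share of variance on the first eigenvector indexes how strongly the joints covary, and the eigenvector weights give each joint's contribution to the global pattern. Signal shapes and noise level are arbitrary.

    import numpy as np

    rng = np.random.default_rng(0)
    t = np.linspace(0, 10, 1000)
    # Three joint angles sharing one underlying oscillatory pattern plus noise.
    base = np.sin(2 * np.pi * 1.0 * t)
    joints = np.stack([1.0 * base, 0.8 * base, -0.6 * base], axis=1)
    joints += 0.1 * rng.standard_normal((t.size, 3))

    C = np.cov(joints, rowvar=False)                # 3 x 3 covariance matrix of the joint angles
    eigval, eigvec = np.linalg.eigh(C)              # ascending order
    order = np.argsort(eigval)[::-1]
    eigval, eigvec = eigval[order], eigvec[:, order]

    print(eigval / eigval.sum())    # variance share of each component (coordination index)
    print(eigvec[:, 0])             # weight of each joint in the dominant coordination pattern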

12.
Standard optimization algorithms for maximizing likelihood may not be applicable to the estimation of flexible multivariable models that are nonlinear in their parameters. For applications where the model's structure permits separating the estimation of mutually exclusive subsets of parameters into distinct steps, we propose the alternating conditional estimation (ACE) algorithm. We validate the algorithm in simulations for the estimation of two flexible extensions of Cox's proportional hazards model where standard maximum partial likelihood estimation does not apply, with simultaneous modeling of (1) nonlinear and time-dependent effects of continuous covariates on the hazard, and (2) nonlinear interaction and main effects of the same variable. We also apply the algorithm in real-life analyses to estimate nonlinear and time-dependent effects of prognostic factors for mortality in colon cancer. Analyses of both simulated and real-life data illustrate the good statistical properties of the ACE algorithm and its ability to yield new, potentially useful insights about the data structure.
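
The alternating idea can be illustrated with a generic toy example, not the flexible Cox-model extensions estimated in the paper: fit y ≈ a·exp(b·x) by alternating between a closed-form update of a given b and a one-dimensional search for b given a. The model, data and number of iterations are illustrative assumptions.

    import numpy as np
    from scipy.optimize import minimize_scalar

    rng = np.random.default_rng(0)
    x = rng.uniform(0, 2, 200)
    y = 3.0 * np.exp(0.7 * x) + rng.normal(0, 0.3, 200)     # true a = 3.0, b = 0.7

    def sse(a, b):
        return float(np.sum((y - a * np.exp(b * x)) ** 2))

    a, b = 1.0, 0.0                                         # crude starting values
    for _ in range(20):                                     # alternating conditional estimation steps
        ebx = np.exp(b * x)
        a = float(np.sum(y * ebx) / np.sum(ebx ** 2))       # least-squares a given b (closed form)
        b = minimize_scalar(lambda bb: sse(a, bb), bounds=(-5, 5), method="bounded").x   # b given a
    print(a, b, sse(a, b))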

13.
Canonical correlation analysis is a multivariate statistical technique for testing the maximum correlation between two sets of variables. In this paper, this technique was combined with Pearson's correlation coefficients and PCA to study the correlation between the importance values of plants in a plant community and soil components. The results show that canonical correlation analysis can provide an excellent quantitative account of the correlation between two sets of ecological variables and can indicate the combined action of multiple factors. It is also emphasized that, because canonical correlation analysis requires linearity in the original data, the data should be standardized and subjected to a preliminary PCA.
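
A brief hedged sketch of the workflow described above: canonical correlation between standardized, PCA-reduced species importance values and soil variables. All data, dimensions and the preliminary PCA rank are illustrative, not taken from the study.

    import numpy as np
    from sklearn.cross_decomposition import CCA
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    species = rng.standard_normal((40, 12))          # importance values: 40 plots x 12 species
    soil = rng.standard_normal((40, 6))              # 6 soil components per plot

    species_pc = PCA(n_components=4).fit_transform(StandardScaler().fit_transform(species))
    soil_std = StandardScaler().fit_transform(soil)

    cca = CCA(n_components=2).fit(species_pc, soil_std)
    U, V = cca.transform(species_pc, soil_std)
    # Canonical correlations: correlation between each pair of canonical variates.
    can_corr = [np.corrcoef(U[:, i], V[:, i])[0, 1] for i in range(2)]
    print(can_corr)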

14.
Summary: In 2002, Ker-Chau Li introduced the liquid association measure to characterize three-way interactions between genes, and developed a computationally efficient estimator that can be used to screen gene expression microarray data for such interactions. That study, and others published since then, have established the biological validity of the method and clearly demonstrated it to be a useful tool for the analysis of genomic data sets. To build on this work, we have sought a parametric family of multivariate distributions with the flexibility to model the full range of trivariate dependencies encompassed by liquid association. Such a model could situate liquid association within a formal inferential theory. In this article, we describe such a family of distributions: a trivariate, conditional normal model having Gaussian univariate marginal distributions and including the trivariate Gaussian family as a special case. Perhaps the most interesting feature of the distribution is that its parameterization naturally parses the three-way dependence structure into a number of distinct, interpretable components. One of these components is very closely aligned with liquid association and is developed as a measure we call modified liquid association. We develop two methods for estimating this quantity and propose statistical tests for the existence of this type of dependence. We evaluate these inferential methods in a set of simulations and illustrate their use in the analysis of publicly available experimental data.
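
A hedged sketch of the original liquid association estimator that this article builds on (not the modified measure or the conditional normal model proposed here): after a normal-score transform of each variable, LA is estimated as the average product of the three values. The simulated data merely exhibit the intended three-way pattern.

    import numpy as np
    from scipy.stats import norm, rankdata

    def normal_scores(v):
        """Map values to standard-normal quantiles via their ranks."""
        r = rankdata(v)
        return norm.ppf(r / (len(v) + 1))

    def liquid_association(x, y, z):
        x, y, z = (normal_scores(np.asarray(v, float)) for v in (x, y, z))
        return float(np.mean(x * y * z))

    rng = np.random.default_rng(0)
    z = rng.standard_normal(500)
    x = rng.standard_normal(500)
    # x and y are positively correlated when z is high, negatively when z is low.
    y = np.tanh(z) * x + 0.5 * rng.standard_normal(500)
    print(liquid_association(x, y, z))               # a positive value indicates this pattern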

15.
Principal components analysis (PCA) is a statistical method commonly used in population genetics to identify structure in the distribution of genetic variation across geographical location and ethnic background. However, while the method is often used to inform about historical demographic processes, little is known about the relationship between fundamental demographic parameters and the projection of samples onto the primary axes. Here I show that for SNP data the projection of samples onto the principal components can be obtained directly from the average coalescent times between pairs of haploid genomes. The result provides a framework for interpreting PCA projections in terms of underlying processes, including migration, geographical isolation, and admixture. I also demonstrate a link between PCA and Wright's Fst and show that SNP ascertainment has a largely simple and predictable effect on the projection of samples. Using examples from human genetics, I discuss the application of these results to empirical data and the implications for inference.
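
A numerical sketch of the kind of equivalence described, using simulated unstructured genotypes for illustration only: for 0/1 SNP data, the sample projections from PCA on mean-centred genotypes match, up to the sign of each axis, the coordinates obtained by double-centring the matrix of pairwise mismatch counts, which in coalescent terms serves as a proxy for average pairwise coalescence times.

    import numpy as np

    rng = np.random.default_rng(0)
    n, L = 40, 2000
    freqs = rng.uniform(0.05, 0.95, L)
    G = rng.binomial(1, freqs, size=(n, L)).astype(float)    # haploid 0/1 genotypes

    # PCA on mean-centred genotypes via the SVD.
    Gc = G - G.mean(axis=0)
    U, s, Vt = np.linalg.svd(Gc, full_matrices=False)
    pc_direct = U[:, :2] * s[:2]

    # Same projection from pairwise mismatch counts via double centring (classical MDS).
    D = np.array([[np.sum(G[i] != G[j]) for j in range(n)] for i in range(n)], float)
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ D @ J
    w, V = np.linalg.eigh(B)
    pc_from_D = V[:, -2:][:, ::-1] * np.sqrt(w[-2:][::-1])

    # The two coordinate sets agree up to the sign of each axis.
    print(abs(np.corrcoef(pc_direct[:, 0], pc_from_D[:, 0])[0, 1]))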

16.
17.
MOTIVATION: ANOVA is a technique frequently used in the analysis of microarray data, e.g. to assess the significance of treatment effects and to select interesting genes based on P-values. However, it does not give information about what exactly is causing the effect. Our purpose is to improve the interpretation of the results from ANOVA on large microarray datasets by applying PCA to the individual variance components. Interaction effects can be visualized by biplots, showing genes and variables in one plot and providing insight into the effect of, for example, treatment or time on gene expression. Because ANOVA has removed uninteresting sources of variance, the results are much more interpretable than without ANOVA. Moreover, the combination of ANOVA and PCA provides a simple way to select genes based on the interactions of interest. RESULTS: It is shown that the components from an ANOVA model can be summarized and visualized with PCA, which improves the interpretability of the models. The method is applied to a real time-course gene expression dataset of mesenchymal stem cells, designed to investigate the effect of different treatments on osteogenesis. The biplots generated with the algorithm give specific information about the effects of specific treatments on genes over time, in agreement with the literature. Biological validation with GO annotation of the genes present in the selections shows that biologically relevant groups of genes are selected. AVAILABILITY: R code with the implementation of the method for this dataset is available from http://www.cac.science.ru.nl under the heading "Software".
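
A minimal ANOVA-then-PCA sketch in the spirit of the abstract, using simulated expression values rather than the stem-cell dataset; the design, effect sizes and the choice to treat genes as PCA observations are illustrative simplifications. Per-gene treatment effects are estimated, collected in an effect matrix, and that matrix is decomposed with PCA.

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    n_genes, n_reps, n_treat = 200, 4, 3
    # Expression matrix: genes x (treatments * replicates).
    X = rng.standard_normal((n_genes, n_treat * n_reps))
    X[:50, n_reps:2 * n_reps] += 2.0                 # 50 genes respond to the second treatment

    labels = np.repeat(np.arange(n_treat), n_reps)
    grand_mean = X.mean(axis=1, keepdims=True)
    # Treatment-effect matrix: per-gene, per-treatment deviation from the grand mean (the "ANOVA" step).
    effects = np.stack([X[:, labels == t].mean(axis=1) for t in range(n_treat)], axis=1) - grand_mean

    pca = PCA(n_components=2)
    scores = pca.fit_transform(effects)              # genes placed in the space of treatment effects
    # Genes with large PC1 scores are those driving the treatment effect (cf. the biplots above).
    print(pca.explained_variance_ratio_, scores[:5, 0])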

18.
Although advances have recently been made in modeling multivariate count data, existing models still have several limitations: (i) the multivariate Poisson log-normal model (Aitchison and Ho, 1989) cannot be used to fit multivariate count data with excess zero-vectors; (ii) the multivariate zero-inflated Poisson (ZIP) distribution (Li et al., 1999) cannot be used to model zero-truncated/deflated count data and is difficult to apply in high-dimensional cases; (iii) the Type I multivariate zero-adjusted Poisson (ZAP) distribution (Tian et al., 2017) can only model multivariate count data with a special correlation structure in which the correlations between random components are all positive or all negative. In this paper, we first introduce a new multivariate ZAP distribution, based on a multivariate Poisson distribution, which allows a more flexible dependency structure between components; that is, some of the correlation coefficients can be positive while others are negative. We then develop its important distributional properties and provide efficient statistical inference methods for the multivariate ZAP model with or without covariates. Two real data examples in biomedicine are used to illustrate the proposed methods.

19.
20.
Principal components analysis and trend surface analysis have been applied to a transition mire with the aim of characterizing the vegetation pattern and revealing the major trends of variation. The first three PCA axes were ecologically interpretable, viz. the 1st and 2nd as a complex soil moisture gradient and the 3rd as a gradient in the amount of peat in the soil. The ecological interpretability of the 1st PCA axis after VARIMAX rotation is unclear, because some outlier samples caused a reorientation of the axis. TSA appeared useful for clarifying the joint patterns of species groups that were major contributors to the ordination axes in terms of component loadings. The smoothing effect of TSA is briefly discussed in connection with the influence of extreme values on the resulting trend structure. The use of four-variable TSA, including a time series, is emphasized for the study of spatial-temporal relations and ecological succession.
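
For readers unfamiliar with trend surface analysis, a small hedged sketch: a second-order polynomial surface fitted by least squares to an ordination score (e.g. the first PCA axis) over the sample coordinates. The data are simulated and the polynomial order is an arbitrary choice.

    import numpy as np

    rng = np.random.default_rng(0)
    x, y = rng.uniform(0, 10, 80), rng.uniform(0, 10, 80)
    pc1 = 0.5 * x - 0.3 * y + 0.1 * x * y + rng.standard_normal(80)   # a PCA axis score per sample

    # Second-order trend surface: pc1 ~ 1 + x + y + x^2 + xy + y^2, fitted by least squares.
    A = np.column_stack([np.ones_like(x), x, y, x ** 2, x * y, y ** 2])
    coef, *_ = np.linalg.lstsq(A, pc1, rcond=None)
    fitted = A @ coef                                 # smoothed spatial trend of the ordination score
    print(coef)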

