Similar articles
 Found 20 similar articles; search took 15 ms
1.
Wood and McCarthy (1984) found a ‘misallocation of variance’ when applying PCA, including Varimax rotation, to simulated data. Here it is demonstrated that this effect can be produced by Varimax rotation, without PCA as an intervening step. PCA does not distort or lose information when extracting components, since it is shown that the prototypes may be perfectly reconstructed from the unrotated solution. However, it is stressed that infinitely many sets of prototypes may render the same final solution, a fact which cannot be overcome by any method. The role of the rotation step within this framework is discussed.

2.
Berner D. Oecologia, 2011, 166(4): 961-971
Morphological traits typically scale with the overall body size of an organism. A meaningful comparison of trait values among individuals or populations that differ in size therefore requires size correction. A frequently applied size correction method involves subjecting the set of n morphological traits of interest to (common) principal component analysis [(C)PCA], and treating the first principal component [(C)PC1] as a latent size variable. The remaining variation (PC2–PCn) is considered size-independent and interpreted biologically. I here analyze simulated data and natural datasets to demonstrate that this (C)PCA-based size correction generates systematic statistical artifacts. Artifacts arise even when all traits are tightly correlated with overall size, and they are particularly strong when the magnitude of variance is heterogeneous among the traits, and when the traits under study are few. (C)PCA-based approaches are therefore inappropriate for size correction and should be abandoned in favor of methods using univariate general linear models with an adequate independent body size metric as covariate. As I demonstrate, (C)PC1 extracted from a subset of traits, not themselves subjected to size correction, can provide such a size metric.

3.
Two regression analyses were used to study how the absence/presence of fungi in south Swedish beech forest is related to topsoil and litter chemistry. Since many soil variables are correlated, each species was related to models of (1) the rotated principal components of the soil properties (as suggested by a previous study) and (2) the underlying primary variables. The study indicated that the two analyses are complementary and provide a means for further interpretation of the results, since they consider different aspects of sporophore occurrence in relation to soil properties. One of the conclusions is that various litter variables, partly related to the mull/mor gradient, are of greater importance than indicated in a previous study. Abbreviations: PCA, principal component analysis; PC, principal component.

4.
Purpose

To demonstrate the unique information potential of a powerful multivariate data processing method, principal component analysis (PCA), in detecting complex interrelationships between diverse patient, disease and treatment variables and in prognostication of therapy outcome and response of patients after mastectomy.

Patients and Methods

One hundred forty-two patients with breast cancer were retrospectively evaluated. The patients were selected from a group of 201 patients who had been treated and observed in the same oncology ward. The selection was based on the availability of a complete set of information describing each patient. The set consisted of 60 specific data items. A matrix of 142 × 60 data points was subjected to PCA using commercially available professional statistical software and a personal computer.

Results

Two principal components, PC1 and PC2, were extracted. They accounted for 26% of total data variance. Projections of 60 variables and 142 patients were made on a plane determined by PC1 and PC2. A clear clustering of the variables and of the patients was observed. It was discussed in terms of similarity (dissimilarity) of the variables and the patients, respectively. A strikingly clear separation was demonstrated to exist between the group of patients living over 7 years after mastectomy and the group of deceased patients.

Conclusion

PCA offers a promising new alternative for the statistical analysis of multivariable data on cancer patients. Using PCA, potentially useful information on both the factors affecting treatment outcome and general prognosis may be extracted from large data sets.

5.
Principal component analysis (PCA) is a dimensionality reduction and data analysis tool commonly used in many areas. The main idea of PCA is to represent high-dimensional data with a few representative components that capture most of the variance present in the data. However, traditional PCA has an obvious disadvantage when it is applied to data where interpretability is important. In applications where the features have physical meanings, we lose the ability to interpret the principal components extracted by conventional PCA because each principal component is a linear combination of all the original features. For this reason, sparse PCA has been proposed to improve the interpretability of traditional PCA by introducing sparsity to the loading vectors of principal components. Sparse PCA can be formulated as an ℓ1-regularized optimization problem, which can be solved by proximal gradient methods. However, these methods do not scale well because computation of the exact gradient is generally required at each iteration. The stochastic gradient framework addresses this challenge by computing an expected gradient at each iteration. Nevertheless, stochastic approaches typically have low convergence rates due to the high variance. In this paper, we propose a convex sparse principal component analysis (Cvx-SPCA), which leverages a proximal variance reduced stochastic scheme to achieve a geometric convergence rate. We further show that the convergence analysis can be significantly simplified by using a weak condition which allows a broader class of objectives to be applied. The efficiency and effectiveness of the proposed method are demonstrated on a large-scale electronic medical record cohort.
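The role of the ℓ1 penalty and the proximal step described above can be illustrated with a minimal sketch. It is not the Cvx-SPCA algorithm of the abstract (no convex reformulation, no stochastic variance reduction); it is a simplified deterministic heuristic that alternates a gradient step on the variance term, soft-thresholding (the proximal operator of the ℓ1 penalty), and renormalization. The function names, the penalty value `lam`, and the toy covariance matrix are all assumptions made for illustration.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1: shrink every entry towards zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_pc(S, lam=0.1, iters=500):
    """Heuristic sparse leading loading vector of a covariance matrix S."""
    p = S.shape[0]
    step = 1.0 / np.linalg.norm(S, 2)          # 1 / largest eigenvalue of S
    w = np.ones(p) / np.sqrt(p)                # dense starting vector
    for _ in range(iters):
        # gradient step on the variance term, then the l1 proximal step
        w = soft_threshold(w + step * (S @ w), step * lam)
        norm = np.linalg.norm(w)
        if norm == 0.0:                        # penalty too strong: all loadings removed
            break
        w /= norm                              # keep the loading vector at unit length
    return w

# Toy covariance (hypothetical): features 0-2 share one strong factor,
# features 3-5 carry only a little independent variance.
v = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0]) / np.sqrt(3)
S = 4.0 * np.outer(v, v) + 0.01 * np.eye(6)

w = sparse_pc(S)
print(np.round(w, 3))
```

In this toy case the loadings of the three weak features are driven exactly to zero, which is the interpretability gain the abstract attributes to sparse PCA; an unpenalized power iteration would converge to the same leading direction here only because the weak features are uncorrelated with the strong ones.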

6.
Principal component analysis (PCA) was used to analyse the behaviour of a chromatographic separation as its scale increased. Three 4.6 mm diameter columns, identical in every respect except for column length (25, 15 and 5 cm), were used to generate the data from a test system based on the reversed-phase HPLC separation of crude erythromycin on a polystyrene matrix (PLRP 1000) having a particle diameter of 8 µm and a pore diameter of 100 nm. The species were separated with an isocratic solvent composed of 45/55 acetonitrile/water at about pH 7. An experimental design technique was used to investigate the effects of four process variables (load volume, load concentration, temperature and pH of buffer) on the chromatogram shapes. Following appropriate pre-processing of the chromatographic data, subsets of critical chromatograms were selected which sufficiently characterised the entire data set. From this subset, the corresponding runs were performed on the different sized columns and principal component models were generated for each. At 5 and 15 cm a single principal component was sufficient to characterise all the variance in the chromatograms which the range of process variables introduced, but at 25 cm two principal components were required, particularly to characterise the chromatograms with small loads. Excellent correlations were observed between the first principal components at the three scales. The possibility of predicting the separations on the 25 cm column from an analysis of the separations observed at 5 cm was investigated. The study revealed that good predictions could be made at high loads (>92%), but the model was not effective at low loads because of the need to incorporate a second principal component which was not defined by the range of variables applied to the 5 cm column.

7.
The effect of interobserver error on a principal components analysis of a small sample of human crania is examined. A comparison of individual specimen scores for components is made to find rotated principal components which identify interobserver error. The individual variables which load highly on such components are then tested for interobserver error univariately. Multivariate components which must identify interobserver error contain no high loadings for variables which demonstrate interobserver error in the univariate case. Principal component analysis, in defining new component variables, extracts such error in an easily identified way which makes comparison of samples measured by more than one anthropometrist more reliable.

8.
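Soil quality assessment of typical alpine meadows in Haibei Prefecture, Qinghai Province   Total citations: 4, self-citations: 0, citations by others: 4
Typical alpine meadows in Haibei Prefecture, Qinghai Province (Potentilla fruticosa shrub meadow, Kobresia humilis meadow, and Kobresia pygmaea meadow) were taken as the study objects. Soil biological fertility properties, comprising 7 soil microbial activity indicators and 10 soil physicochemical indicators, served as the evaluation indicators, and principal component analysis (PCA) was applied to the soil quality of the meadows under different land-use patterns. The results showed that the integrated soil quality of the alpine meadows could be characterized by three principal components (PCs): 13 indicators had high loadings on PC1, 3 indicators had high loadings on PC2, and only total phosphorus had a high loading on PC3. In combination with the Norm value method, 11 indicators were selected (microbial biomass carbon, urease, alkaline phosphatase, protease, organic carbon, total nitrogen, available nitrogen, available phosphorus, available potassium, bulk density, and cation exchange capacity (CEC)) to establish a minimum data set (MDS) for evaluating the integrated soil quality of the Haibei alpine meadows. Based on the principal components and their corresponding weight coefficients, the integrated soil quality of the three meadows was ranked as follows: in the 0-10 cm layer, Kobresia humilis meadow > Potentilla fruticosa shrub > Kobresia pygmaea meadow; in the 10-20 cm layer, Potentilla fruticosa shrub > Kobresia pygmaea meadow > Kobresia humilis meadow.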

9.
The aim of this paper is (1) to find and discuss the best multivariate statistical method for exploring the soil productivity function in an East-Hungarian region; (2) to evaluate and interpret the edaphic indicators and the Hungarian soil quality index (HSQI); and (3) to identify the main determinant factors and indicators in this region. Soil pH, carbonate content, soluble and exchangeable Na+, clay, humus, available phosphorus and potassium content were analyzed. Topographical position and HSQI were evaluated as well. Yield data (maize, winter wheat, sunflower) of 10 years were standardized using the calculated relative yield of each crop. With the simple indicators, stepwise linear regressions for mean relative yield were inadequate for choosing uncorrelated indicators that have a significant influence on yields. The variables were analyzed using principal component analysis (PCA) with Varimax rotation. Retaining eigenvalues greater than 1, the PCA yielded three principal components (PCs) explaining a total of 89.471% of the variance for the entire data set. These factors could be well interpreted as derived complex indicators. With the three PCs, a stepwise linear regression process (PCR) was conducted with mean relative yield as the dependent variable. The explained variance for mean relative yield was as high as adjusted R² = 0.771 (p < 0.001). The three PC factors together explained the mean relative yield better than the simple indicators and the HSQI. So, the variables can effectively explain the yield and the variability together with other variables as linear combinations. Consequently, PCR is a successful method to reveal the site-specific relationship between soil properties and yields and to revise the HSQI at local level.
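The pipeline summarized above (retain components with eigenvalue greater than 1, then regress yield on the component scores) can be sketched as follows. This is a minimal illustration under stated assumptions: the Varimax rotation and the stepwise selection used in the study are omitted, the data are synthetic, and the function name `kaiser_pcr` is hypothetical.

```python
import numpy as np

def kaiser_pcr(X, y):
    """Regress y on the principal components whose eigenvalue exceeds 1."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardize the indicators
    R = np.corrcoef(Z, rowvar=False)                   # correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)               # eigenvalues in ascending order
    keep = eigvals > 1.0                               # eigenvalue-greater-than-1 rule
    scores = Z @ eigvecs[:, keep]                      # unrotated component scores
    A = np.column_stack([np.ones(len(y)), scores])     # intercept plus scores
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)       # ordinary least squares
    ss_res = np.sum((y - A @ coef) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    n, k = scores.shape
    adj_r2 = 1.0 - (ss_res / ss_tot) * (n - 1) / (n - k - 1)
    return int(k), adj_r2

# Synthetic stand-in for soil indicators (hypothetical): 60 sites, 8 correlated
# indicators driven by 2 latent factors, and a yield that depends on both factors.
rng = np.random.default_rng(1)
f = rng.normal(size=(60, 2))
X = np.hstack([f[:, [0]] + 0.3 * rng.normal(size=(60, 4)),
               f[:, [1]] + 0.3 * rng.normal(size=(60, 4))])
y = f[:, 0] - f[:, 1] + 0.1 * rng.normal(size=60)

n_pcs, adj_r2 = kaiser_pcr(X, y)
print(n_pcs, round(adj_r2, 3))
```

Because the component scores are mutually uncorrelated, the regression avoids the collinearity problem that made stepwise regression on the raw indicators unreliable in the study.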

10.
An improved method for deconvoluting complex spectral maps from bidimensional fluorescence monitoring is presented, relying on a combination of principal component analysis (PCA) and feedforward artificial neural networks (ANN). With the aim of reducing ANN complexity, spectral maps are first subjected to PCA, and the scores of the retained principal components are subsequently used as ANN input vector. The method is presented using the case study of an extractive membrane biofilm reactor, where fluorescence maps of a membrane-attached biofilm were analysed, which were collected under different reactor operating conditions. During ANN training, the spectral information is associated with process performance indicators. Originally, 231 excitation/emission pairs per fluorescence map were used as ANN input vector. Using PCA, each fluorescence map could be represented by a maximum of six principal components, thereby catching 99.5% of its variance. As a result, the dimension of the ANN input vector and hence the complexity of the artificial neural network was significantly reduced, and ANN training speed was increased. Correlations between principal components and ANN predicted process performance parameters were good with correlation coefficients in the order of 0.7 or higher.
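The dimension-reduction step described above (replace 231 excitation/emission pairs by the scores of the few components that capture 99.5% of the variance) might look like the sketch below. The data are synthetic and the function name `pca_scores` is an assumption; the neural network itself is not shown, only the compressed input vectors that would be fed to it.

```python
import numpy as np

def pca_scores(X, variance_target=0.995):
    """Scores of the fewest principal components that reach variance_target."""
    Xc = X - X.mean(axis=0)                            # centre each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)                # variance fraction per component
    # first position where the cumulative fraction reaches the target
    k = int(np.searchsorted(np.cumsum(explained), variance_target)) + 1
    k = min(k, len(s))
    return Xc @ Vt[:k].T, explained[:k]

# Synthetic stand-in for fluorescence maps (hypothetical): 40 maps, each with
# 231 excitation/emission pairs, generated from 3 latent factors plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(40, 3))
mixing = rng.normal(size=(3, 231))
maps = latent @ mixing + 0.01 * rng.normal(size=(40, 231))

scores, explained = pca_scores(maps)
print(scores.shape, round(float(explained.sum()), 4))
```

Each row of `scores` is the reduced input vector for one map, so a downstream model sees 3 inputs instead of 231 in this toy case.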

11.
Changes in metabolites in fermented soymilk prepared with selected Bifidobacterium and Streptococci strains were analyzed using a 1H-NMR-based metabolomic technique. Principal components analysis (PCA) allowed the clear separation of 50% methanol extracts from fermented soymilk with different fermentation times by combining principal components PC1 and PC3, which accounted for 55.1% of the total variance. Loading plot analysis was performed to select major compounds contributing to the separation, and the relative levels of selected metabolites were determined. In addition, the free-radical scavenging activities of each sample were investigated, and the underlying mechanisms were elucidated by determining the total phenolics and total flavonoids contents of each sample. The present study suggests the usefulness of combining 1H-NMR with PCA in discriminating fermented soymilk samples with different fermentation times, and elucidates the factors affecting the free-radical scavenging activities of fermented soymilk.

12.
A database of nonredundant structures of EF-hand domains--i.e., pairs of helix-loop-helix motifs--has been assembled, and the six angles among the four helices re-determined. A principal component analysis of these angles allows us to use two such components (PC1 and PC2) to describe the system retaining 80% of the total variance. A PC2 against PC1 plot representation allows us to represent in a compact way the full range of structural diversity of EF-hand domains, their grouping into protein families, and the variation for each family upon calcium and peptide binding.

13.
H Gao, T Zhang, Y Wu, Y Wu, L Jiang, J Zhan, J Li, R Yang. Heredity, 2014, 113(6): 526-532
Given the drawbacks of implementing multivariate analysis for mapping multiple traits in genome-wide association study (GWAS), principal component analysis (PCA) has been widely used to generate independent 'super traits' from the original multivariate phenotypic traits for the univariate analysis. However, parameter estimates in this framework may not be the same as those from the joint analysis of all traits, leading to spurious linkage results. In this paper, we propose to perform the PCA for residual covariance matrix instead of the phenotypical covariance matrix, based on which multiple traits are transformed to a group of pseudo principal components. The PCA for residual covariance matrix allows analyzing each pseudo principal component separately. In addition, all parameter estimates are equivalent to those obtained from the joint multivariate analysis under a linear transformation. However, a fast least absolute shrinkage and selection operator (LASSO) for estimating the sparse oversaturated genetic model greatly reduces the computational costs of this procedure. Extensive simulations show statistical and computational efficiencies of the proposed method. We illustrate this method in a GWAS for 20 slaughtering traits and meat quality traits in beef cattle.

14.

Background

The dairy cattle breeding industry is a highly globalized business, which needs internationally comparable and reliable breeding values of sires. The international Bull Evaluation Service, Interbull, was established in 1983 to respond to this need. Currently, Interbull performs multiple-trait across country evaluations (MACE) for several traits and breeds in dairy cattle and provides international breeding values to its member countries. Estimating parameters for MACE is challenging since the structure of datasets and conventional use of multiple-trait models easily result in over-parameterized genetic covariance matrices. The number of parameters to be estimated can be reduced by taking into account only the leading principal components of the traits considered. For MACE, this is readily implemented in a random regression model.

Methods

This article compares two principal component approaches to estimate variance components for MACE using real datasets. The methods tested were a REML approach that directly estimates the genetic principal components (direct PC) and the so-called bottom-up REML approach (bottom-up PC), in which traits are sequentially added to the analysis and the statistically significant genetic principal components are retained. Furthermore, this article evaluates the utility of the bottom-up PC approach to determine the appropriate rank of the (co)variance matrix.

Results

Our study demonstrates the usefulness of both approaches and shows that they can be applied to large multi-country models considering all concerned countries simultaneously. These strategies can thus replace the current practice of estimating the covariance components required through a series of analyses involving selected subsets of traits. Our results support the importance of using the appropriate rank in the genetic (co)variance matrix. Using too low a rank resulted in biased parameter estimates, whereas too high a rank did not result in bias, but increased standard errors of the estimates and notably the computing time.

Conclusions

In terms of estimation accuracy, both principal component approaches performed equally well and permitted the use of more parsimonious models through random regression MACE. The advantage of the bottom-up PC approach is that it does not need any previous knowledge on the rank. However, with a predetermined rank, the direct PC approach needs less computing time than the bottom-up PC.

15.
We examined fluctuating asymmetry (FA) and body condition (BC) as two measures of environmental stress using museum specimens of Lophuromys aquilus, a rodent species complex widespread across the African Albertine Rift. We related FA and BC to a spatially-derived index of anthropogenic impact using a principal components analysis (PCA). We found no relationship between the four PCA scores and mean FA or BC, but did find that FA variance was higher in areas with lower anthropogenic impact. There was also a negative, albeit non-significant, trend for PC3, suggesting that populations with higher than average BC were in areas with higher anthropogenic impact. Overall, our case study does not support FA and BC as effective predictors of environmental stress with low to moderate habitat disturbance. In fact, L. aquilus, as a habitat generalist, may be positively affected by some aspects of anthropogenic change. Studies relating environmental stress to anthropogenic impact should examine sites with a wide range of habitat qualities and human impact and utilize multiple measures of environmental stress to characterize the health of one or more populations.

16.
As a systematic and holistic study of metabolites in plants, animals, and human beings, metabolomics has advanced considerably in recent years, due largely to the rapid development of analytical technology and the application of multivariate data analysis methods. Exploratory data analysis, which has played a crucial role in this advance, aims to examine the natural data structure to reveal important information. Principal components analysis (PCA) is probably the most widely used technique for exploratory data analysis, but projection pursuit (PP) is another important method that often outperforms PCA because it is based on distributional rather than variance optimization. Recent algorithmic improvements have made the implementation of PP easier, but, when the sample size is small compared to the number of variables, it is found that PP (with kurtosis as a projection index) fails to give meaningful information. Mathematically, this involves the ill-posed inverse problem that also occurs for many other multivariate data analysis methods that result in overfitting. In this work, a regularized projection pursuit (RPP) method is proposed to solve this problem and iterative optimization algorithms are developed for both step-wise univariate and multivariate PP. The utility of the algorithms is established using simulated data, which also demonstrates the use of ridge trace plots for the optimization of the ridge parameter. Three experimental data sets in the public domain are also analyzed, including a study on soy bean disease (47 samples × 35 variables), NMR spectral data for glomerulonephritis patients (50 × 200) and metabolomics data from a bovine diet study (39 × 47). In all cases, RPP showed superior class separation compared to PCA or ordinary PP.

17.
The method of principal component analysis (PCA) was applied to the absorption-wavelength-time surfaces generated by rapid scanning stopped-flow spectrophotometry (RSSFS). The method was used to resolve the absorption surfaces generated during the reduction of cytochrome c oxidase by 5,10-dihydro-5-methyl phenazine (MPH) into the individual spectral shapes and time courses of the component chromophores. Two forms of resting cytochrome oxidase were used in these analyses: one that has its maximum absorption in the Soret region at 418 nm (418-nm species) and the other has its absorption maximum at 424 nm (424-nm species). A weighting scheme suitable for RSSFS data was developed. The optical absorption spectra obtained by W.H. Vanneste (1966, Biochemistry, 5:838-848) for the oxidase components were found to fit adequately as components of the experimental surfaces. Among these spectra were the oxidized forms of cytochromes a and a3 in the wavelength region 330-520 nm for the 418-nm species. Vanneste's spectral shape for the oxidized cytochrome a3 did not fit as a component in the spectrum of the 424-nm species. After accounting for the spectral shape of all components present, PCA provided a straightforward method for determining the separate time courses of each chromophore. We have found for both forms used that cytochrome a is reduced by MPH in the initial stages of the reaction, while cytochrome a3 is reduced in subsequent, slow phases. An important aspect of PCA is that it provided confirmation of the spectra of the various oxidase components without requiring the use of inhibitors or the use of simplifying mechanistic assumptions. The resolution of time profiles of strongly overlapping chromophores is also demonstrated.

18.
Event-related brain potentials (ERPs) were recorded from 74 subjects (45 men) between 18 and 82 years of age in a simple visual detection task. On each trial the subject reported the location of a triangular flash of light presented briefly 20° laterally to the left or right visual field or to both fields simultaneously. ERPs to targets exhibited a similar morphology including P1, N1, P2, N2, and P3 components across all age groups. The principal effects of advancing age were (1) a marked reduction in amplitude of the posterior P1 component (75–150 ms latency) together with an amplitude increase of an anterior positivity at the same latency; (2) an increase in amplitude of the P3 component that was most prominent over frontal scalp areas; and (3) a linear increase in P3 peak latency. These results extend the findings of age-related changes in P3 peak latency and distribution to a non-oddball task in the visual modality and raise the possibility that short-latency ERPs may index changes in visual attention in the elderly.

19.
The work reported in this paper examines the use of principal component analysis (PCA), a technique of multivariate statistics, to facilitate the extraction of meaningful diagnostic information from a data set of chromatographic traces. Two data sets mimicking archived production records were analysed using PCA. In the first, a full-factorial experimental design approach was used to generate the data. In the second, the chromatograms were generated by adjusting just one of the process variables at a time. Database mining was achieved through the generation of both gross and disjoint principal component (PC) models. PCA provided easily interpretable 2-dimensional diagnostic plots revealing clusters of chromatograms obtained under similar operating conditions. PCA methods can be used to detect and diagnose changes in process conditions; however, results show that a PCA model may require recalibration if an equipment change is made. We conclude that PCA methods may be useful for the diagnosis of subtle deviations from process specification not readily distinguishable to the operator.

20.
JX Liu, Y Xu, CH Zheng, Y Wang, JY Yang. PloS one, 2012, 7(7): e38873
Conventional gene selection methods based on principal component analysis (PCA) use only the first principal component (PC) of PCA or sparse PCA to select characteristic genes. These methods indeed assume that the first PC plays a dominant role in gene selection. However, in a number of cases this assumption is not satisfied, so the conventional PCA-based methods usually provide poor selection results. In order to improve the performance of the PCA-based gene selection method, we put forward the gene selection method via weighting PCs by singular values (WPCS). Because different PCs have different importance, the singular values are exploited as the weights to represent the influence on gene selection of different PCs. The ROC curves and AUC statistics on artificial data show that our method outperforms the state-of-the-art methods. Moreover, experimental results on real gene expression data sets show that our method can extract more characteristic genes in response to abiotic stresses than conventional gene selection methods.
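The idea of weighting several PCs by their singular values, rather than relying on the first PC alone, can be illustrated with a short sketch. The abstract does not give the exact WPCS scoring formula, so the score used here (the sum over the retained PCs of singular value times absolute gene loading) is an assumption, as are the function name `weighted_pc_scores` and the synthetic expression matrix.

```python
import numpy as np

def weighted_pc_scores(X, n_pcs):
    """Score each gene (row) by singular-value-weighted absolute loadings."""
    Xc = X - X.mean(axis=1, keepdims=True)             # centre each gene across samples
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return np.abs(U[:, :n_pcs]) @ s[:n_pcs]            # sum_k s_k * |u_jk| (assumed score)

# Synthetic expression matrix (hypothetical): 100 genes x 20 samples of noise,
# with two planted groups of characteristic genes on orthogonal patterns.
rng = np.random.default_rng(2)
X = 0.1 * rng.normal(size=(100, 20))
p1 = np.array([1.0] * 10 + [-1.0] * 10)                # pattern 1
p2 = np.array([1.0, -1.0] * 10)                        # pattern 2, orthogonal to pattern 1
X[:5] += 3.0 * p1                                      # genes 0-4 follow pattern 1
X[5:10] += 2.0 * p2                                    # genes 5-9 follow pattern 2

top_first_pc = np.argsort(weighted_pc_scores(X, 1))[::-1][:10]
top_weighted = np.argsort(weighted_pc_scores(X, 2))[::-1][:10]
print(sorted(top_first_pc[:5].tolist()), sorted(top_weighted.tolist()))
```

With only the first PC, the genes that follow the second pattern score no higher than noise; with two weighted PCs, both planted groups fill the top ten, which is the failure mode of first-PC selection that the abstract describes.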
