Similar literature
 Found 20 similar articles; search time 15 ms
1.
Summary: Principal components analysis is well suited for many data analysis problems in ecology, particularly for data reduction and hypothesis generation; but the structure of PCA is poorly suited for indirect gradient analysis. Whatever the intended application of PCA, the user must exercise special care in selecting data transformations to prevent the analysis from being overwhelmed by the purely numerical effects in the variance structure of the data. I would like to thank R. H. Whittaker, H. G. Gauch, R. E. Moeller, and S. R. Searle for their guidance and assistance.
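
The warning about purely numerical effects can be illustrated with a small sketch. PCA on untransformed data partitions the total variance, so a variable measured on a large numeric scale can account for almost all of it before any ecological structure is considered. The abundance values and the function names `variance_shares` and `standardize` below are hypothetical, chosen only for illustration; standardization is one possible transformation, not necessarily the one the author recommends.

```python
import statistics

def variance_shares(columns):
    """Return each variable's share of the total variance."""
    variances = [statistics.pvariance(col) for col in columns]
    total = sum(variances)
    return [v / total for v in variances]

def standardize(col):
    """Centre a variable and scale it to unit variance (z-scores)."""
    mean = statistics.fmean(col)
    sd = statistics.pstdev(col)
    return [(x - mean) / sd for x in col]

# Hypothetical abundances at five sites: one common and one rare species.
common = [120.0, 340.0, 90.0, 510.0, 260.0]
rare = [1.0, 3.0, 0.0, 2.0, 4.0]

# Untransformed: the common species carries nearly all of the variance.
raw_shares = variance_shares([common, rare])

# Standardized: each species contributes an equal share.
std_shares = variance_shares([standardize(common), standardize(rare)])
```

Here the common species holds more than 99% of the raw variance, so an untransformed PCA would essentially reproduce that one variable; after standardization the two species contribute equally.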

2.
Influence in principal components analysis   Total citations: 4, self-citations: 0, citations by others: 4
CRITCHLEY  FRANK 《Biometrika》1985,72(3):627-636

3.
Local influence in principal components analysis   Total citations: 5, self-citations: 0, citations by others: 5
SHI  LEI 《Biometrika》1997,84(1):175-186

4.
5.
It has long been recognized that tooth crown diameters in hominoids are all positively intercorrelated with one another. This study reports on sex-specific correlation matrices derived from 2,650 individuals from the Solomon Islands, Melanesia. Mesiodistal and buccolingual diameters of all permanent teeth from one side are used, excluding third molars. Analysis discloses significant sex dimorphism in the strengths of the intercorrelations, with females being better integrated. Principal components analysis (PCA) provides an objective means of data reduction (shown here to be preferable to simple size summation methods) and decorrelation of the resulting linear combinations. Four components are extracted (with results being virtually identical in the two sexes) and arguments are put forth that varimax rotation to "a simpler solution" may be counterproductive. Before rotation, the four components are 1) overall size, 2) buccolingual widths contrasted with mesiodistal lengths, 3) anterior (I,C) contrasted with posterior (P,M) teeth, and 4) premolars contrasted with molars. Most of the explained (shared) variance (63%) extracted by PCA is in overall size of the dentition. There is a strong urge to view the results of these principal components analyses as reflective of biologically and genetically meaningful entities.

6.
Landgrebe J  Wurst W  Welzl G 《Genome biology》2002,3(4):research0019.1-research0019.11

Background  

In microarray data analysis, the comparison of gene-expression profiles with respect to different conditions and the selection of biologically interesting genes are crucial tasks. Multivariate statistical methods have been applied to analyze these large datasets. Less work has been published concerning the assessment of the reliability of gene-selection procedures. Here we describe a method to assess reliability in multivariate microarray data analysis using permutation-validated principal components analysis (PCA). The approach is designed for microarray data with a group structure.

7.

Background

Airway inflammation in COPD can be measured using biomarkers such as induced sputum and FeNO. This study set out to explore the heterogeneity of COPD using biomarkers of airway and systemic inflammation and pulmonary function by principal components analysis (PCA).

Subjects and Methods

In 127 COPD patients (mean FEV1 61%), pulmonary function, FeNO, plasma CRP and TNF-α, sputum differential cell counts and sputum IL8 (pg/ml) were measured. Principal components analysis and multivariate analysis were performed.

Results

PCA identified four main components (% variance): (1) sputum neutrophil cell count and supernatant IL8 and plasma TNF-α (20.2%), (2) Sputum eosinophils % and FeNO (18.2%), (3) Bronchodilator reversibility, FEV1 and IC (15.1%) and (4) CRP (11.4%). These results were confirmed by linear regression multivariate analyses which showed strong associations between the variables within components 1 and 2.

Conclusion

COPD is a multidimensional disease. Unrelated components of disease were identified, including neutrophilic airway inflammation which was associated with systemic inflammation, and sputum eosinophils which were related to increased FeNO. We confirm dissociation between airway inflammation and lung function in this cohort of patients.

8.
Principal component analysis (PCA) of published DNA-relatedness data showed the usefulness of this method in displaying relationships among closely related bacteria. Very similar ordinations were obtained when relative binding ratios (RBR) at 60°C or 75°C or ΔTm values were used to form the data matrix. A curvilinear relationship was found between 75°C RBR and 60°C RBR, and a (quasi) linear relationship between ΔTm and 60°C RBR. These statistical relationships explain the similarity of PCA results using either measurement (60°C RBR, 75°C RBR, or ΔTm). Use of PCA is suggested to delineate groups within a complex set of DNA-relatedness data. The level of ΔTm within groups and between groups should help decide whether these groups are genospecies.

9.
10.
J Ma  CI Amos 《PloS one》2012,7(7):e40224
Despite the significant advances made over the last few years in mapping inversions with the advent of paired-end sequencing approaches, our understanding of the prevalence and spectrum of inversions in the human genome has lagged behind other types of structural variants, mainly due to the lack of a cost-efficient method applicable to large-scale samples. We propose a novel method based on principal components analysis (PCA) to characterize inversion polymorphisms using high-density SNP genotype data. Our method applies to non-recurrent inversions for which recombination between the inverted and non-inverted segments in inversion heterozygotes is suppressed due to the loss of unbalanced gametes. Inside such an inversion region, an effect similar to population substructure is thus created: two distinct "populations" of inversion homozygotes of different orientations and their 1:1 admixture, namely the inversion heterozygotes. This kind of substructure can be readily detected by performing PCA locally in the inversion regions. Using simulations, we demonstrated that the proposed method can be used to detect and genotype inversion polymorphisms using unphased genotype data. We applied our method to the phase III HapMap data and inferred the inversion genotypes of known inversion polymorphisms at 8p23.1 and 17q21.31. These inversion genotypes were validated by comparing with literature results and by checking Mendelian consistency using the family data whenever available. Based on the PCA approach, we also performed a preliminary genome-wide scan for inversions using the HapMap data, which resulted in 2040 candidate inversions, 169 of which overlapped with previously reported inversions. Our method can be readily applied to the abundant SNP data, and is expected to play an important role in developing human genome maps of inversions and exploring associations between inversions and susceptibility to diseases.

11.
Identification of population structure is a common goal for a variety of applications, including conservation, wildlife management, and medical genetics. The outcome of these analyses can have far-reaching implications; therefore, it is important to ensure an understanding of the strengths and weaknesses of the methodologies used. Increasing in popularity, the discriminant analysis of principal components (DAPC) method incorporates combinations of genetic variables (alleles) into a model that differentiates individuals into genetic clusters. However, users may not have a full understanding of how to best parameterize the model. In this issue of Molecular Ecology Resources, Thia (2022) looks under the hood of the DAPC. Using simulated data, he demonstrates the importance of careful parameter selection in developing a DAPC model, what the implications are for over-fitting the model, and finally, how best to evaluate the results of DAPC models. This work highlights the issues that can arise when over-parameterizing the DAPC model when gene flow is high among clusters and provides important guidelines to ensure researchers are making conclusions that are biologically relevant.

12.
Many recent approaches to decoding neural spike trains depend critically on the assumption that for low-pass filtered spike trains, the temporal structure is optimally represented by a small number of linear projections onto the data. We therefore tested this assumption of linearity by comparing a linear factor analysis technique (principal components analysis) with a nonlinear neural network based method. It is first shown that the nonlinear technique can reliably identify a neuronally plausible nonlinearity in synthetic spike trains. However, when applied to the outputs from primary visual cortical neurons, this method shows no evidence for significant temporal nonlinearities. The implications of this are discussed. Received: 29 November 1996 / Accepted in revised form: 1 July 1997

13.
14.
15.
The immense volume and rapid growth of human genomic data, especially single nucleotide polymorphisms (SNPs), present special challenges for both biomedical researchers and automatic algorithms. One such challenge is to select an optimal subset of SNPs, commonly referred to as "haplotype tagging SNPs" (htSNPs), to capture most of the haplotype diversity of each haplotype block or gene-specific region. This information-reduction process facilitates cost-effective genotyping and, subsequently, genotype-phenotype association studies. It also has implications for assessing the risk of identifying research subjects on the basis of SNP information deposited in public domain databases. We have investigated methods for selecting htSNPs by use of principal components analysis (PCA). These methods first identify eigenSNPs and then map them to actual SNPs. We evaluated two mapping strategies, greedy discard and varimax rotation, by assessing the ability of the selected htSNPs to reconstruct genotypes of non-htSNPs. We also compared these methods with two other htSNP finders, one of which is PCA based. We applied these methods to three experimental data sets and found that the PCA-based methods tend to select the smallest set of htSNPs to achieve a 90% reconstruction precision.

16.
Various indicators rooted in the concepts of information and entropy have been proposed for ecological network analysis. They are theoretically well grounded and widely used in the literature, but have always been difficult to interpret due to an apparent lack of strict relations with node and link weight. We generated several sets of 10,000 networks in order to explore such relations and work towards a sounder interpretation. The indices we explored are based on network composition (i.e., type and importance of network compartments), or network flows (i.e., type and importance of flows among compartments), including Structural Information (SI), Total System Throughput (TST), Average Mutual Information (AMI), Flow Diversity (H), and Ascendency (ASC). A correlation analysis revealed a lack of strict relationships among the responses of the investigated indicators within the simulated space of variability of the networks. However, fairly coherent patterns of response were revealed when networks were sorted by following a “bottom-up” criterion, i.e. by increasing the dominance of the large-sized top predator in the network. This ranking is reminiscent of ecosystem succession, along which the prominence of higher trophic level organisms progressively increases. In particular, the results show that a simple increase in organisms having large size and low consumption rates is potentially able to simultaneously lead to an increase of different types of information (such as SI, H and AMI), thus also emphasizing the importance of bionomic traits related to body size in affecting information-related properties in a trophically connected community. The observed trends suffer from a certain dispersion of data, which was diminished by imposing specific and ecologically meaningful constraints, such as mass balancing and restriction to a certain range of the ratio A/C, an index related to the viability of ecological networks.
These results suggest that the identification of a set of effective constraints may help to identify improved conditions for applicability of the investigated flow-based indicators, and also provide indication on how to normalise them with respect to meaningful network properties or reference states. Thus, in order to increase confidence in the derived network metrics describing a particular ecosystem state, and thus increase their applicability, it is advisable to construct replicate networks by taking the variability of input data into account, and by applying uncertainty and sensitivity analyses.

17.
Do KA  Kirk K 《Biometrics》1999,55(1):174-181
Principal component analysis enhanced by the use of smoothing is used in conjunction with discriminant analysis techniques to devise a statistical classification method for the analysis of event-related potential (ERP) data. A training set of premedication potentials collected from adolescents with attention-deficit hyperactivity disorder (ADHD) is used to predict whether adolescents from an independent subject group will respond to long-term medication. Comparison of outcome prediction rates demonstrates that this method, which uses information from the whole ERP curve, is superior to the classification technique currently used by clinicians, which is based on a single ERP curve feature. The need to administer an initial dose of medication to classify patients is also eliminated.

18.
Non-centred Principal Components Analysis (NPCA) ordinates sites and species simultaneously, and can be solved either by direct iteration or by eigenvector calculation. The weight of sites and species in the analysis is proportional to their overall abundance. Because of this, the method is not susceptible to distortion by rare species, as is the case with Reciprocal Averaging (RA). Detrending techniques can also be applied to this method to eliminate arch effects. When NPCA was tried with field data, it produced ordination axes that were significantly associated with independently measured environmental variables. In contrast, RA failed to produce axes related to environmental factors, even after the main rare species had been eliminated from the analysis. Abbreviations: NPCA, Non-centred Principal Components Analysis; RA, Reciprocal Averaging
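
The "direct iteration" route mentioned in the abstract can be sketched as ordinary power iteration on the uncentred abundance table. This is a minimal sketch under stated assumptions: the abstract does not give the exact iteration scheme or any weighting, so the function `ncpca_first_axis`, its `iterations` parameter, and the abundance table are all hypothetical. It extracts only the first axis, which is the leading eigenvector of the uncentred cross-product matrix.

```python
import math

def ncpca_first_axis(table, iterations=200):
    """First non-centred PCA axis by power iteration.

    `table` is a list of rows (sites), each a list of species abundances.
    No column means are subtracted, which is what makes the analysis
    non-centred. Returns (site_scores, species_scores).
    """
    n_sites, n_species = len(table), len(table[0])
    species = [1.0] * n_species  # arbitrary non-zero starting vector
    for _ in range(iterations):
        # Site scores are abundance-weighted sums of species scores ...
        sites = [sum(table[i][j] * species[j] for j in range(n_species))
                 for i in range(n_sites)]
        # ... and species scores are abundance-weighted sums of site scores.
        raw = [sum(table[i][j] * sites[i] for i in range(n_sites))
               for j in range(n_species)]
        length = math.sqrt(sum(x * x for x in raw))
        species = [x / length for x in raw]  # rescale to unit length
    sites = [sum(table[i][j] * species[j] for j in range(n_species))
             for i in range(n_sites)]
    return sites, species

# Hypothetical abundance table: 4 sites (rows) by 3 species (columns).
abundance = [
    [10.0, 2.0, 0.0],
    [8.0, 4.0, 1.0],
    [3.0, 7.0, 5.0],
    [0.0, 5.0, 9.0],
]
site_scores, species_scores = ncpca_first_axis(abundance)
```

Because sites and species are scored in the same loop, both ordinations come out together, and abundant species pull the axis more strongly than rare ones, consistent with the weighting the abstract describes.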

19.
The recent release of the Bovine HapMap dataset represents the most detailed survey of bovine genetic diversity to date, providing an important resource for the design and development of livestock production. We studied this dataset, comprising more than 30,000 Single Nucleotide Polymorphisms (SNPs) for 19 breeds (13 taurine, three zebu, and three hybrid breeds), seeking to identify small panels of genetic markers that can be used to trace the breed of unknown cattle samples. Taking advantage of the power of Principal Components Analysis and algorithms that we have recently described for the selection of Ancestry Informative Markers from genomewide datasets, we present a decision-tree which can be used to accurately infer the origin of individual cattle. In doing so, we present a thorough examination of population genetic structure in modern bovine breeds. Performing extensive cross-validation experiments, we demonstrate that 250-500 carefully selected SNPs suffice in order to achieve close to 100% prediction accuracy of individual ancestry, when this particular set of 19 breeds is considered. Our methods, coupled with the dense genotypic data that is becoming increasingly available, have the potential to become a valuable tool and have considerable impact in worldwide livestock production. They can be used to inform the design of studies of the genetic basis of economically important traits in cattle, as well as breeding programs and efforts to conserve biodiversity. Furthermore, the SNPs that we have identified can provide a reliable solution for the traceability of breed-specific branded products.

20.
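Evaluation of an integrated soil fertility index based on principal components analysis   Total citations: 38, self-citations: 0, citations by others: 38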
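Taking Yangling as the study area and the field scale as the evaluation unit, soil was sampled from 27 fields and soil fertility quality was evaluated with the integrated index model from fuzzy mathematics. A statistical analysis of sample-size adequacy showed that 27 sampling points meet the sample-size requirements for relative errors of 15% and 20% at the 90% confidence level. Using principal components analysis, the Norm value of each variable was calculated to select available potassium, calcium carbonate, total phosphorus, total nitrogen, soil organic matter, CEC, available phosphorus, and total potassium for the minimum data set (MDS) for soil fertility evaluation. Nonlinear membership functions were used to bring the evaluation indicators onto a common dimensionless scale and to score the individual fertility indicators, and radar charts were used to show the status of each single fertility indicator and the overall state of soil fertility. The results show that total phosphorus and calcium carbonate are the limiting factors for soil fertility in the study area. At the field scale, the integrated soil fertility index of the study area ranged from 0.7 to 0.8.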
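
The abstract does not spell out how the Norm value is calculated. A definition often used in minimum-data-set studies, and assumed here, is the length of a variable's loading vector weighted by the eigenvalues of the retained components: Norm_i = sqrt(sum over k of u_ik^2 * lambda_k), where u_ik is the loading of variable i on component k and lambda_k is that component's eigenvalue. The sketch below applies that assumed definition; the function name `norm_values`, the loadings, and the eigenvalues are hypothetical and are not taken from the study.

```python
import math

def norm_values(loadings, eigenvalues):
    """Norm value per variable under the assumed definition.

    `loadings` maps a variable name to its loadings on the retained
    components; `eigenvalues` lists those components' eigenvalues in
    the same order.
    """
    return {
        name: math.sqrt(sum(u * u * lam for u, lam in zip(row, eigenvalues)))
        for name, row in loadings.items()
    }

# Hypothetical loadings on two retained components (not the study's data).
example_loadings = {
    "total_P": [0.85, 0.20],
    "CEC": [0.30, 0.40],
}
example_eigenvalues = [3.2, 1.4]

norms = norm_values(example_loadings, example_eigenvalues)
# Variables with larger Norm values carry more of the retained
# information and are stronger candidates for the minimum data set.
ranked = sorted(norms, key=norms.get, reverse=True)
```

Under this definition a variable that loads heavily on a component with a large eigenvalue ranks high, which is why Norm values are used as a screening criterion before redundant, highly correlated variables are dropped.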
