Similar literature
 20 similar documents found (search time: 15 ms)
1.
Shriner D. Heredity, 2011, 107(5): 413-420
Principal components analysis of genetic data is used to avoid inflation in type I error rates in association testing due to population stratification by covariate adjustment using the top eigenvectors and to estimate cluster or group membership independent of self-reported or ethnic identities. Eigendecomposition transforms correlated variables into an equal number of uncorrelated variables. Numerous stopping rules have been developed to identify which principal components should be retained. Recent developments in random matrix theory have led to a formal hypothesis test of the top eigenvalue, providing another way to achieve dimension reduction. In this study, I compare Velicer's minimum average partial test to a test based on the Tracy-Widom distribution as implemented in EIGENSOFT, the most widely used implementation of principal components analysis in genome-wide association analysis. In computer simulations of vicariance based on coalescent theory, EIGENSOFT systematically overestimates the number of significant principal components. Furthermore, this overestimation is larger for samples of admixed individuals than for samples of unadmixed individuals. Overestimating the number of significant principal components can potentially lead to a loss of power in association testing by adjusting for unnecessary covariates and may lead to incorrect inferences about group differentiation. Velicer's minimum average partial test is shown to have both smaller bias and smaller variance, often with a mean squared error of 0, in estimating the number of principal components to retain. Velicer's minimum average partial test is implemented in R code and is suitable for genome-wide genotype data with or without population labels.
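
For readers unfamiliar with the minimum average partial (MAP) criterion, the following is a minimal sketch in Python with NumPy. It is not the R code mentioned in the abstract. The function names `velicer_map` and `simulate_two_factor_data`, the simulated loadings, and the choice to stop at p - 2 components are all assumptions made for illustration. The idea is to partial the first m principal components out of the correlation matrix, average the squared off-diagonal partial correlations, and retain the m that minimises that average.

```python
import numpy as np


def velicer_map(corr):
    """Return (number of components, list of average squared partial correlations)."""
    corr = np.asarray(corr, dtype=float)
    p = corr.shape[0]
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]              # sort components by decreasing eigenvalue
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    off_diag = ~np.eye(p, dtype=bool)

    # m = 0: average squared correlation before any component is removed
    avg_sq = [np.mean(corr[off_diag] ** 2)]
    # m = 1 .. p-2 (assumption of this sketch: the last components are skipped,
    # because the residual matrix becomes numerically degenerate)
    for m in range(1, p - 1):
        a = loadings[:, :m]
        partial_cov = corr - a @ a.T               # covariance after removing m components
        d = np.sqrt(np.diag(partial_cov))
        partial_corr = partial_cov / np.outer(d, d)
        avg_sq.append(np.mean(partial_corr[off_diag] ** 2))
    return int(np.argmin(avg_sq)), avg_sq


def simulate_two_factor_data(n=2000, seed=0):
    """Hypothetical test data: 12 variables driven by two independent factors."""
    rng = np.random.default_rng(seed)
    factors = rng.standard_normal((n, 2))
    load = np.zeros((12, 2))
    load[:6, 0] = 0.9                              # variables 0-5 load on factor 1
    load[6:, 1] = 0.7                              # variables 6-11 load on factor 2
    unique_sd = np.sqrt(1.0 - (load ** 2).sum(axis=1))
    noise = rng.standard_normal((n, 12)) * unique_sd
    return factors @ load.T + noise


data = simulate_two_factor_data()
n_components, criterion = velicer_map(np.corrcoef(data, rowvar=False))
print(n_components)
```

On genotype data the input would be the correlation matrix of the markers or individuals being decomposed; how that matrix is built is outside the scope of this sketch.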

2.
Simulated event-related potential (ERP) components were used to investigate the ability of principal component analysis (PCA), Varimax rotation and univariate analysis of variance (ANOVA) to reconstruct component wave shapes, to allocate variance correctly across components, and to identify the correct locus of simulated experimental treatment. The simulated ERPs consisted of 800 randomly weighted combinations of three 64-point components, corresponding to a 2 × 2 × 10 repeated-measures design with 20 subjects. Covariance PCAs, Varimax rotations and univariate ANOVAs were performed on each of 400 such simulations, 100 with no effect of any experimental treatment and 100 each with main effects on each of the 3 components. Eight hundred additional simulations were performed to investigate the effects of systematic variations in the size of the experimental treatments and the number of subjects per experiment. The wave shapes of the simulated components were reconstructed reasonably well, although not completely, by the rotated principal component (PC) loadings. However, comparison of rotated PC scores with the random weights used to generate the simulated ERPs indicated that PCA incorrectly allocated variance across overlapping components, producing dramatic increases in type I error (the largest in excess of 80%) for ANOVAs on one another. Although these results should not be overgeneralized, they clearly demonstrate that the PCA-Varimax-ANOVA strategy can incorrectly distribute variance across components, resulting in serious misinterpretation of treatment effects. Additional simulation studies are needed to determine the generality of the variance misallocation problem; pending the outcome of such studies, results obtained with the PCA-Varimax-ANOVA strategy should be interpreted cautiously.

3.
We examined acoustic individuality in wild agile gibbons (Hylobates agilis agilis) and determined the acoustic variables that contribute to individual discrimination using multivariate analyses. We recorded 125 female-specific songs (great calls) from six groups in west Sumatra and measured 58 acoustic variables for each great call. We performed principal component analysis to summarize the 58 variables into six acoustic principal components (PCs). Generally, each PC corresponded to a part of the great call. Significant individual differences were found across six individual gibbons in each of the six PCs. Moreover, strong acoustic individuality was found in the introductory and climax parts of the great call. In contrast, the terminal part contributed little to individual identification. Discriminant analysis showed that these PCs contributed to individual discrimination with high repeatability. Although we cannot conclude that agile gibbons use these acoustic components for individual discrimination, they are potential candidates for individual recognition.

4.
Principal component analysis (PCA) is a one-group method. Its purpose is to transform correlated variables into uncorrelated ones and to find linear combinations accounting for a relatively large amount of the total variability, thus reducing the number of original variables to a few components only.
In the simultaneous analysis of different groups, similarities between the principal component structures can often be modelled by the methods of common principal components (CPCs) or partial CPCs. These methods assume that either all components or only some of them are common to all groups, the discrepancies being due mainly to sampling error.
Previous authors have dealt with the k-group situation either by pooling the data of all groups, or by pooling the within-group variance-covariance matrices before performing a PCA. The latter technique is known as multiple group principal component analysis or MGPCA (Thorpe, 1983a). We argue that CPC- or partial CPC-analysis is often more appropriate than these previous methods.
A morphometrical example using males and females of Microtus californicus and M. ochrogaster is presented, comparing PCA, CPC and partial CPC analyses. It is shown that the new methods yield estimated components having smaller standard errors than when groupwise analyses are performed. Formulas are given for estimating standard errors of the eigenvalues and eigenvectors, as well as for computing the likelihood ratio statistic used to test the appropriateness of the CPC- or partial CPC-model.

5.
Principal component analysis is a powerful tool in biomechanics for reducing complex multivariate datasets to a subset of important parameters. However, interpreting the biomechanical meaning of these parameters can be a subjective process. Biomechanical interpretations that are based on visual inspection of extreme 5th and 95th percentile waveforms may be confounded when extreme waveforms express more than one biomechanical feature. This study compares interpretation of principal components using representative extremes with a recently developed method, called single component reconstruction, which provides an uncontaminated visualization of each individual biomechanical feature. Example datasets from knee joint moments, lateral gastrocnemius EMG, and lumbar spine kinematics are used to demonstrate that the representative extremes method and single component reconstruction can yield equivalent interpretations of principal components. However, single component reconstruction interpretation cannot be contaminated by other components, which may enhance the use and understanding of principal component analysis within the biomechanics community.
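
The abstract does not spell out how single component reconstruction is computed. One plausible reading, sketched below in Python with NumPy, is to add a single loading vector, scaled by the 5th and 95th percentile scores of that component, to the mean waveform, so that no other component contributes to the plotted curves. The function name `single_component_reconstruction` and this exact formula are assumptions for illustration, not the authors' published procedure.

```python
import numpy as np


def single_component_reconstruction(waveforms, component=0, percentiles=(5, 95)):
    """Return two waveforms: mean + (low or high percentile score) * one loading vector."""
    x = np.asarray(waveforms, dtype=float)         # rows = trials, columns = time points
    mean = x.mean(axis=0)
    centered = x - mean
    # Rows of vt are the principal component loading vectors (unit length)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    loading = vt[component]
    scores = centered @ loading                    # score of every trial on this component
    low, high = np.percentile(scores, percentiles)
    return mean + low * loading, mean + high * loading


# Hypothetical usage: five trials that differ only in the amplitude of one shape
base = np.array([1.0, 2.0, 3.0, 2.0])
shape = np.array([0.0, 1.0, 0.0, -1.0])
trials = base + np.outer([-2.0, -1.0, 0.0, 1.0, 2.0], shape)
low_wave, high_wave = single_component_reconstruction(trials)
```

Because the sign of a principal component is arbitrary, which of the two returned curves lies above the mean can differ between runs or libraries; only the pair as a whole is meaningful.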

6.
A comparison is made between a 200-ps molecular dynamics simulation in vacuum and a normal mode analysis on the protein bovine pancreatic trypsin inhibitor (BPTI) in order to elucidate the dual aspects of harmonicity and anharmonicity in the dynamics of proteins. The molecular dynamics trajectory is analyzed using principal component analysis, an effective harmonic analysis suited for comparison with the results from the normal mode analysis. The results suggest that the first principal component shows qualitatively different behavior from higher principal components and is associated with apparent barrier crossing events on an anharmonic conformational energy surface. The higher principal components appear to have probability distributions that are well approximated by Gaussians, indicating harmonicity. Eliminating the contribution from the first principal component reveals a great deal of correspondence between the 2 methods. This correspondence, however, involves a factor of 2, as the variances of the distribution of the higher principal components are, on average, roughly twice those found from the normal mode analysis. A model is proposed to reconcile these results with those from previous analyses.

7.
Availability of complete genome sequences allows in-depth comparison of single-residue and oligopeptide compositions of the corresponding proteomes. We have used principal component analysis (PCA) to study the landscape of compositional motifs across more than 70 genera from all three superkingdoms. Unexpectedly, the first two principal components clearly differentiate archaea, eubacteria, and eukaryota from each other. In particular, we contrast compositional patterns typical of the three superkingdoms and characterize differences between species and phyla, as well as among patterns shared by all compositional proteomic signatures. These species-specific patterns may even extend to subsets of the entire proteome, such as proteins pertaining to individual yeast chromosomes. We identify factors that affect compositional signatures, such as living habitat, and detect strong eukaryotic preference for homopeptides and palindromic tripeptides. We further detect oligopeptides that are either universally over- or underabundant across the whole proteomic landscape, as well as oligopeptides whose over- or underabundance is phylum- or species-specific. Finally, we report that species composition signatures preserve evolutionary memory, providing a new method to compare phylogenetic relationships among species that avoids problems of sequence alignment and ortholog detection.

8.
In this study, we examine acoustic individuality in male duet songs of wild, non-habituated Bornean southern gibbons (Hylobates albibarbis) and identify contributing acoustic variables. We recorded 174 male duet songs from nine groups in a rainforest in Central Kalimantan, Indonesia. Each male portion of the duet was analysed for 14 acoustic variables at three levels of variation, including six note-specific variables (start frequency, end frequency, minimum frequency, maximum frequency, average frequency and duration), four phrase-specific variables (minimum frequency, maximum frequency, duration and number of syllables) and four song-specific variables (minimum frequency, maximum frequency, duration and number of syllables). Principal component analysis was performed to summarise each of these sets of variables into a total of six principal components (PCs). Strong acoustic individuality was found in all PCs and at all three levels: note, phrase and song (all p < 0.001). Furthermore, a particularly high magnitude of individuality was found in PC 1 of the song-specific analysis, defined by the acoustic variables of duration and number of syllables. Due to the high levels of individuality, we suggest that these acoustic variables may be used by Bornean southern gibbons for individual discrimination. As well as furthering our biological understanding of male gibbon song with regard to individuality and associated conspecific recognition, these findings also have the potential to help improve population survey methods, such as the acoustic sampling method using listening points, by offering a more accurate method of individual recognition.

9.
Ott J, Rabinowitz D. Human heredity, 1999, 49(2): 106-111
For many traits, genetically relevant disease definition is unclear. For this reason, researchers applying linkage analysis often obtain information on a variety of items. With a large number of items, however, the test statistic from a multivariate analysis may require a prohibitively expensive correction for the multiple comparisons. The researcher is faced, therefore, with the issue of choosing which variables or combinations of variables to use in the linkage analysis. One approach to combining items is to first subject the data to a principal components analysis, and then perform the linkage analysis of the first few principal components. However, principal-components analyses do not take family structure into account. Here, an approach is developed in which family structure is taken into account when combining the data. The essence of the approach is to define principal components of heritability as the scores with maximum heritability in the data set, subject to being uncorrelated with each other. The principal components of heritability may be calculated as the solutions to a generalized eigensystem problem. Four simulation experiments are used to compare the power of linkage analyses based on the principal components of heritability and the usual principal components. The first of the experiments corresponds to the null hypothesis of no linkage. The second corresponds to a setting where the two kinds of principal components coincide. The third corresponds to a setting in which they are quite different and where the first of the usual principal components is not expected to have any power beyond the type I error rate. The fourth set of experiments corresponds to a setting where the usual principal components and the principal components of heritability differ, but where the first of the usual principal components is not without power. 
The results of the simulation experiments indicate that the principal components of heritability can be substantially different from the standard principal components and that when they are different, substantial gains in power can result by using the principal components of heritability in place of the standard principal components in linkage analyses.
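
The generalized eigensystem mentioned above can be illustrated with a short sketch in Python with NumPy. It assumes, purely for illustration, that a genetic covariance matrix G and a residual covariance matrix E are already available; the abstract does not say how the authors estimate them. The name `heritability_components` is hypothetical. The sketch solves G w = h² (G + E) w by whitening with a Cholesky factor, so that each weight vector w defines a score with heritability h², and the scores are mutually uncorrelated.

```python
import numpy as np


def heritability_components(genetic_cov, residual_cov):
    """Solve G w = h2 * (G + E) w; return heritabilities (descending) and weight vectors."""
    g = np.asarray(genetic_cov, dtype=float)
    p = g + np.asarray(residual_cov, dtype=float)  # total phenotypic covariance
    chol = np.linalg.cholesky(p)                   # p = chol @ chol.T
    inv = np.linalg.inv(chol)
    whitened = inv @ g @ inv.T                     # ordinary symmetric eigenproblem
    h2, u = np.linalg.eigh(whitened)
    order = np.argsort(h2)[::-1]                   # most heritable score first
    weights = inv.T @ u[:, order]                  # map back to the original variables
    return h2[order], weights


# Hypothetical covariance matrices for two measured items
G = np.array([[0.5, 0.2], [0.2, 0.3]])
E = np.array([[0.5, 0.0], [0.0, 0.7]])
h2, w = heritability_components(G, E)
```

Ordinary principal components would instead come from the eigenvectors of G + E alone, which is why the two kinds of components can differ substantially when the genetic and residual covariance structures are dissimilar.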

10.
A quickly growing number of characteristics reflecting various aspects of gene function and evolution can be either measured experimentally or computed from DNA and protein sequences. The study of pairwise correlations between such quantitative genomic variables as well as collective analysis of their interrelations by multidimensional methods have delivered crucial insights into the processes of molecular evolution. Here, we present a principal component analysis (PCA) of 16 genomic variables from Saccharomyces cerevisiae, the largest data set analyzed so far. Because many missing values and potential outliers hinder the direct calculation of principal components, we introduce the application of Bayesian PCA. We confirm some of the previously established correlations, such as evolutionary rate versus protein expression, and reveal new correlations such as those between translational efficiency, phosphorylation density, and protein age. Although the first principal component primarily contrasts genomic change and protein expression, the second component separates variables related to gene existence and expressed protein functions. Enrichment analysis on genes affecting variable correlations unveils classes of influential genes. For example, although ribosomal and nuclear transport genes make important contributions to the correlation between protein isoelectric point and molecular weight, protein synthesis and amino acid metabolism genes help cause the lack of significant correlation between propensity for gene loss and protein age. We present the novel Quagmire database (Quantitative Genomics Resource), which allows exploring relationships between more genomic variables in three model organisms: Escherichia coli, S. cerevisiae, and Homo sapiens (http://webclu.bio.wzw.tum.de:18080/quagmire).

11.
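This study takes adult male skulls (668 specimens) from nine population groups living in different regions as its main material and, through cluster analysis and principal component analysis of 14 metric traits, explores the value of multivariate statistical methods in anthropological research. The results show that Euclidean distance coefficients allow a preliminary assessment of the relationships and differences among the groups; that relationships among populations inferred from cluster-analysis dendrograms are influenced by the author's subjective judgment, so credible conclusions should rest on agreement among the results of several clustering methods; and that the results of principal component analysis depend to some extent on the variables selected, so choosing different sets of variables affects the outcome. Compared with cluster analysis, principal component analysis reflected the relationships among populations relatively well. These findings suggest that conclusions about relationships among populations drawn from multivariate statistical methods should be treated with caution.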

12.
We propose a new method for selection of the most informative variables from the set of variables which can be measured directly. The information is measured by metrics similar to those used in experimental design theory, such as the determinant of the dispersion matrix of prediction or various functions of its eigenvalues. The basic model admits both population variability and observational errors, which allows us to introduce algorithms based on ideas of optimal experimental design. Moreover, we can take into account the cost of measuring various variables, which makes the approach more practical. It is shown that the selection of optimal subsets of variables is invariant to scale transformations, unlike other methods of dimension reduction, such as principal components analysis or methods based on direct selection of variables, for instance principal variables and battery reduction. The performance of different approaches is compared using clinical data.

13.
Principal components for allometric analysis
Logarithmic bivariate regression slopes and logarithmic principal component coefficient ratios are two methods for estimating allometry coefficients corresponding to the exponent a in the classic power formula Y = BX^a. Both techniques depend on high correlation between variables. Interpretation is logically limited to the variables included in analysis. Principal components analysis depends also on relatively uniform intercorrelations; given this, it serves satisfactorily as a method for summarizing many bivariate combinations. Unmodified major principal component coefficients cannot represent scaling to body weight; rather, they represent scaling to a composite size vector which usually is highly correlated with body size or weight but has an unspecified allometry. Thus, the concepts of proportionality and of isometry must be kept distinct.
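
The two estimators named in this abstract can be compared on a toy example. The sketch below, in Python with NumPy, uses a hypothetical function `allometry_estimates` and made-up noise-free data generated from the power formula with B = 3 and a = 0.75; neither comes from the paper. It computes the slope of log Y on log X and the ratio of the first principal component coefficients of the log-transformed variables.

```python
import numpy as np


def allometry_estimates(x, y):
    """Estimate the allometric exponent by log-log regression and by the PC1 coefficient ratio."""
    log_x, log_y = np.log(x), np.log(y)
    # Method 1: least-squares slope of log Y on log X
    slope = np.polyfit(log_x, log_y, 1)[0]
    # Method 2: ratio of first principal component coefficients (log Y over log X)
    cov = np.cov(np.vstack([log_x, log_y]))
    eigvals, eigvecs = np.linalg.eigh(cov)
    pc1 = eigvecs[:, np.argmax(eigvals)]
    ratio = pc1[1] / pc1[0]                        # the sign of pc1 cancels in the ratio
    return slope, ratio


# Hypothetical noise-free data following the power formula with B = 3, a = 0.75
x = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
y = 3.0 * x ** 0.75
slope, ratio = allometry_estimates(x, y)
```

With perfectly correlated data both estimates recover the exponent; once scatter is added, the regression slope shrinks toward zero in proportion to the correlation while the principal component ratio does not, which is one reason both techniques depend on high correlation between variables.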

14.
Cell viability assays are important tools in oncological research and clinical practice to assess the tumor cell sensitivity of individual patients. The purpose of this study was to demonstrate the comparability of 3 widely used assays (MTT, ATP, calcein assays) by principal component analysis. The study included 4 different cytostatics (cisplatin, docetaxel, doxorubicin, vinblastine) and 3 different human cancer cell lines (MCF-7, A2780, doxorubicin resistant A2780adr). Ninety-three percent of the total variance of all variables included in the principal component analysis (resulting from 3 cell lines and 3 assays) could be explained by 1 principal component. Factor loadings were > 0.937 except for the variable MTT-A2780adr, which was 0.872. These results indicate the similarity of the 3 assays. A 2nd principal component analysis included literature data and showed accordance of data from this study and the literature. The MTT assay was further improved as a high-throughput screening-capable assay. The ATP assay is able to detect effects of cytostatics after as little as 1 h of incubation. The determination of resistance factors made it possible to differentiate cytostatics into P-gp or non-P-gp substrates. In conclusion, this study provides improved microplate reader-based cell viability assays and sets a statistically solid basis for a future comparison of data obtained in different laboratories by any of the 3 assays.

15.
A set of cranial characters was examined in the fruit bats Rousettus egyptiacus and Eidolon helvum to compare trends and relative importance of major components of bilateral morphometric variation, and their relationship with character size. Using two‐way, sides‐by‐individuals ANOVA, four components of variation were estimated for each bilateral variable: individual variation (I), directional asymmetry (DA), non‐directional asymmetry (NDA) and measurement error (E). Both species exhibit similar major trends of variation in asymmetry across characters, as shown by principal component analysis, using variance components as variables. Degree of interspecific congruence among characters was confirmed by a two‐way ANOVA with species and variance components as fixed factors. Congruence of asymmetry patterns between species suggests that the concept of population asymmetry parameter (PAP) could be extended to higher hierarchies. PAPs above the species level may result from common mechanisms or similar developmental constraints acting on species’ buffering capacities and morphological integration processes.

16.
A method was developed to evaluate the cumulative effect of wetland mosaics in the landscape on stream water quality and quantity in the nine-county region surrounding Minneapolis-St. Paul, Minnesota. A Geographic Information System (GIS) was used to record and measure 33 watershed variables derived from historical aerial photos. These watershed variables were then reduced to eight principal components which explained 86% of the variance. Relationships between stream water quality variables and the three wetland-related principal components were explored through stepwise multiple regression analysis. The proximity of wetlands to the sampling station was related to principal component two, which was associated with decreased annual concentrations of inorganic suspended solids, fecal coliform, nitrates, specific conductivity, flow-weighted NH4, flow-weighted total P, and a decreased proportion of phosphorus in dissolved form (p < 0.05). Wetland extent was related to decreased specific conductivity, chloride, and lead concentrations. The wetland-related principal components were also associated with the seasonal export of organic matter, organic nitrogen, and orthophosphate. Relationships between water quality and wetlands components were different for time-weighted averages as compared to flow-weighted averages. This suggests that wetlands were more effective in removing suspended solids, total phosphorus, and ammonia during high flow periods but were more effective in removing nitrates during low flow periods.

17.
18.
The microstructure of a DNA helix is characterized by several base pair and base step parameters such as twist, rise, roll, propeller twist, etc., in addition to conformational parameters such as the backbone and the glycosidic torsion angles. Among these only a few, which are independent of all others and of each other, may be used to precisely characterize the helix. The problem however is to identify these independent parameters. We have used principal component analysis to identify a relatively small set of independent parameters, with which to characterize each DNA helix. We show that these principal components clearly discriminate between A and B DNA helical types. The calculations further suggest that the microstructure of a DNA helix is better characterized using dinucleotides.

19.
Principal component similarity (PCS) analysis was used to evaluate judge performance from a wine competition. Data were analyzed for five international judges and seven wine makers, for 42 white, 30 red and 25 specialty wines, using a 20-point quality scoring system. Principal similarity plots were used to group judges according to judging 'style' and to identify outliers, for each wine category. Judge groupings were consistent when three different references were used; however, the most interpretable PCS plot was obtained when the overall mean-judge-score was used as the reference. Results from PCS were compared to principal component analysis (PCA). PCS analysis allowed the information from all significant principal components to be graphically represented in two dimensions and was more successful in classifying judges than plots based on the first three principal components. The technique of PCS is an important complement to existing methodologies, and can provide wine competition coordinators with an objective technique for judge evaluation and selection.

20.
The recognition of individuals is a basic cognitive ability of social animals. A prerequisite for individual recognition is distinct characteristics that can be used to distinguish between other conspecific individuals. Studies of birds have shown that visual information, such as colour patterning, is used in individual recognition. However, in the case of monochromatic birds, colour patterning cannot be used to identify individuals. Therefore, we expected that the configuration of facial features, such as the shape of the bills or eyes, may have enough individuality to permit individual recognition in such species. In this study, we aimed to clarify visible individual differences in the facial configuration of large-billed crows (Corvus macrorhynchos). Specifically, we analysed the profile pictures of 16 crows. We measured 26 variables in 20 pictures of each bird and then performed principal component analysis and discriminant function analysis. The results showed that the configuration of the facial profiles was individually distinct, but re-classification by discriminant functions implied that it did not clearly differ between sexes. These results suggest that crows may be able to recognise individuals on the basis of the individuality of facial configuration, even in the absence of any conspicuous colour patterning.
