Similar literature
20 similar documents found (search time: 171 ms)
1.
A standard multivariate principal components (PCs) method was utilized to identify clusters of variables that may be controlled by a common gene or genes (pleiotropy). Heritability estimates were obtained and linkage analyses performed on six individual traits (total cholesterol (Chol), high and low density lipoproteins, triglycerides (TG), body mass index (BMI), and systolic blood pressure (SBP)) and on each PC to compare our ability to identify major gene effects. Using the simulated data from Genetic Analysis Workshop 13 (Cohort 1 and 2 data for year 11), the quantitative traits were first adjusted for age, sex, and smoking (cigarettes per day). Adjusted variables were standardized and PCs calculated followed by orthogonal transformation (varimax rotation). Rotated PCs were then subjected to heritability and quantitative multipoint linkage analysis. The first three PCs explained 73% of the total phenotypic variance. Heritability estimates were above 0.60 for all three PCs. We performed linkage analyses on the PCs as well as the individual traits. The majority of pleiotropic and trait-specific genes were not identified. Standard PCs analysis methods did not facilitate the identification of pleiotropic genes affecting the six traits examined in the simulated data set. In addition, genes contributing 20% of the variance in traits with over 0.60 heritability estimates could not be identified in this simulated data set using traditional quantitative trait linkage analyses. Lack of identification of pleiotropic and trait-specific genes in some cases may reflect their low contribution to the traits/PCs examined or more importantly, characteristics of the sample group analyzed, and not simply a failure of the PC approach itself.
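The standardization and component extraction described in the abstract above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' pipeline: the trait values are made up, the covariate adjustment and the varimax rotation are omitted, and only the first component is extracted, using plain power iteration. The function names `standardize`, `correlation_matrix`, and `leading_component` are hypothetical.

```python
import math

def standardize(columns):
    """Z-score each trait: subtract its mean, divide by its sample SD."""
    out = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        out.append([(v - mean) / sd for v in col])
    return out

def correlation_matrix(z_columns):
    """Correlation matrix of columns that are already z-scored."""
    n = len(z_columns[0])
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in z_columns]
            for ci in z_columns]

def leading_component(matrix, iterations=500):
    """Largest eigenvalue and unit eigenvector of a symmetric matrix (power iteration)."""
    p = len(matrix)
    vec = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    # Rayleigh quotient of the converged unit vector gives the eigenvalue.
    eigenvalue = sum(vec[i] * sum(matrix[i][j] * vec[j] for j in range(p))
                     for i in range(p))
    return eigenvalue, vec

# Made-up values for three traits measured on six individuals (illustration only).
traits = {
    "chol": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "tg":   [2.0, 1.0, 4.0, 3.0, 6.0, 5.0],
    "sbp":  [3.0, 1.0, 2.0, 6.0, 4.0, 5.0],
}
z = standardize(list(traits.values()))
corr = correlation_matrix(z)
eigenvalue, loadings = leading_component(corr)
# For standardized traits the total variance equals the number of traits.
explained = eigenvalue / len(corr)
print(f"PC1 explains {explained:.0%} of the standardized variance")
```

Because the traits are standardized first, the total variance equals the number of traits, so the share explained by a component is its eigenvalue divided by that count. The 73% figure in the abstract is the corresponding sum over the first three components.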

2.
Influence in principal components analysis   Total citations: 4, self-citations: 0, citations by others: 4
CRITCHLEY  FRANK 《Biometrika》1985,72(3):627-636

3.
Local influence in principal components analysis   Total citations: 5, self-citations: 0, citations by others: 5
SHI  LEI 《Biometrika》1997,84(1):175-186

4.
5.
J Ma  CI Amos 《PloS one》2012,7(7):e40224
Despite the significant advances made over the last few years in mapping inversions with the advent of paired-end sequencing approaches, our understanding of the prevalence and spectrum of inversions in the human genome has lagged behind other types of structural variants, mainly due to the lack of a cost-efficient method applicable to large-scale samples. We propose a novel method based on principal components analysis (PCA) to characterize inversion polymorphisms using high-density SNP genotype data. Our method applies to non-recurrent inversions for which recombination between the inverted and non-inverted segments in inversion heterozygotes is suppressed due to the loss of unbalanced gametes. Inside such an inversion region, an effect similar to population substructure is thus created: two distinct "populations" of inversion homozygotes of different orientations and their 1:1 admixture, namely the inversion heterozygotes. This kind of substructure can be readily detected by performing PCA locally in the inversion regions. Using simulations, we demonstrated that the proposed method can be used to detect and genotype inversion polymorphisms using unphased genotype data. We applied our method to the phase III HapMap data and inferred the inversion genotypes of known inversion polymorphisms at 8p23.1 and 17q21.31. These inversion genotypes were validated by comparing with literature results and by checking Mendelian consistency using the family data whenever available. Based on the PCA-approach, we also performed a preliminary genome-wide scan for inversions using the HapMap data, which resulted in 2040 candidate inversions, 169 of which overlapped with previously reported inversions. Our method can be readily applied to the abundant SNP data, and is expected to play an important role in developing human genome maps of inversions and exploring associations between inversions and susceptibility of diseases.
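The idea of running PCA locally and reading three groups off the first component, as described in the abstract above, can be sketched on a toy genotype matrix. Everything here is an assumption made for illustration: the 0/1/2 genotype values are invented, the first component is found by power iteration, and the "two largest gaps" rule used to split the scores is a stand-in chosen for brevity, not the clustering used by the authors. The names `first_pc_scores` and `split_three` are hypothetical.

```python
import math

def first_pc_scores(rows, iterations=500):
    """Scores of each individual on the first principal component."""
    n, p = len(rows), len(rows[0])
    means = [sum(col) / n for col in zip(*rows)]
    centred = [[v - m for v, m in zip(row, means)] for row in rows]
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(p)]
           for a in range(p)]
    vec = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):  # power iteration on the covariance matrix
        nxt = [sum(cov[a][b] * vec[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    return [sum(r[a] * vec[a] for a in range(p)) for r in centred]

def split_three(scores):
    """Label individuals 0, 1, 2 by cutting the sorted scores at the two largest gaps."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    gaps = [(scores[order[k + 1]] - scores[order[k]], k)
            for k in range(len(order) - 1)]
    cuts = sorted(k for _, k in sorted(gaps, reverse=True)[:2])
    labels = [0] * len(scores)
    for rank, idx in enumerate(order):
        labels[idx] = sum(rank > c for c in cuts)
    return labels

# Invented unphased genotypes (rows: individuals, columns: SNPs inside the region).
genotypes = [
    [0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1],   # homozygous, one orientation
    [1, 1, 1, 1], [1, 1, 2, 1], [1, 0, 1, 1],   # inversion heterozygotes
    [2, 2, 2, 2], [2, 2, 1, 2], [2, 2, 2, 1],   # homozygous, other orientation
]
scores = first_pc_scores(genotypes)
labels = split_three(scores)
print(labels)
```

The middle label corresponds to the heterozygotes, whose scores fall between the two homozygote groups. Which homozygote group receives label 0 and which receives label 2 depends on the arbitrary sign of the component, so the orientation itself cannot be read from the labels alone.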

6.
It has long been recognized that tooth crown diameters in hominoids are all positively intercorrelated one with another. This study reports on sex-specific correlation matrices derived from 2,650 individuals from the Solomon Islands, Melanesia. Mesiodistal and buccolingual diameters of all permanent teeth from one side are used, excluding third molars. Analysis discloses significant sex dimorphism in the strengths of the intercorrelations, with females being better integrated. Principal components analysis (PCA) provides an objective means of data reduction (shown here to be preferable to simple size summation methods) and decorrelation of the resulting linear combinations. Four components are extracted (with results being virtually identical in the two sexes) and arguments are put forth that varimax rotation to "a simpler solution" may be counterproductive. Before rotation, the four components are 1) overall size, 2) buccolingual widths contrasted with mesiodistal lengths, 3) anterior (I,C) contrasted with posterior (P,M) teeth, and 4) premolars contrasted with molars. Most of the explained (shared) variance (63%) extracted by PCA is in overall size of the dentition. There is a strong urge to view the results of these principal components analyses as reflective of biologically and genetically meaningful entities.

7.
Landgrebe J  Wurst W  Welzl G 《Genome biology》2002,3(4):research0019.1-research001911

Background  

In microarray data analysis, the comparison of gene-expression profiles with respect to different conditions and the selection of biologically interesting genes are crucial tasks. Multivariate statistical methods have been applied to analyze these large datasets. Less work has been published concerning the assessment of the reliability of gene-selection procedures. Here we describe a method to assess reliability in multivariate microarray data analysis using permutation-validated principal components analysis (PCA). The approach is designed for microarray data with a group structure.

8.
Principal components analysis revealed two main groups among 163 Vibrio anguillarum cultures from diseased fish from Norwegian waters. Nearly all isolates from farmed salmonids fell in group I (arabinose positive) but those from wild fish, particularly saithe Gadus virens, more commonly appeared in group II (arabinose negative).

9.
Many recent approaches to decoding neural spike trains depend critically on the assumption that for low-pass filtered spike trains, the temporal structure is optimally represented by a small number of linear projections onto the data. We therefore tested this assumption of linearity by comparing a linear factor analysis technique (principal components analysis) with a nonlinear neural network based method. It is first shown that the nonlinear technique can reliably identify a neuronally plausible nonlinearity in synthetic spike trains. However, when applied to the outputs from primary visual cortical neurons, this method shows no evidence for significant temporal nonlinearities. The implications of this are discussed. Received: 29 November 1996 / Accepted in revised form: 1 July 1997

10.
Gervini  Daniel 《Biometrika》2008,95(3):587-600
We present robust estimators for the mean and the principal components of a stochastic process in . Robustness and asymptotic properties of the estimators are studied theoretically, by simulation and by example. It is shown that the proposed estimators are generally more robust to outliers than the commonly used sample mean and principal components, although their properties depend on the spacings of the eigenvalues of the covariance function.

11.
We propose a modelling framework to study the relationship between two paired longitudinally observed variables. The data for each variable are viewed as smooth curves measured at discrete time-points plus random errors. While the curves for each variable are summarized using a few important principal components, the association of the two longitudinal variables is modelled through the association of the principal component scores. We use penalized splines to model the mean curves and the principal component curves, and cast the proposed model into a mixed-effects model framework for model fitting, prediction and inference. The proposed method can be applied in the difficult case in which the measurement times are irregular and sparse and may differ widely across individuals. Use of functional principal components enhances model interpretation and improves statistical and numerical stability of the parameter estimates.

12.
The purpose of many microarray studies is to find the association between gene expression and sample characteristics such as treatment type or sample phenotype. There has been a surge of effort to develop methods for delineating this association. Aside from the high dimensionality of microarray data, one well-recognized challenge is that genes can be inter-related in complicated ways, which makes many statistical methods inappropriate to apply directly to the expression data. Multivariate methods such as principal component analysis (PCA) and clustering are often used as part of the effort to capture the gene correlation, and the derived components or clusters are used to describe the association between gene expression and sample phenotype. We propose a method for patient population dichotomization using maximally selected test statistics in combination with the PCA method, which shows favorable results. The proposed method is compared with a currently well-recognized method.

13.
Dou Y  Mi H  Zhao L  Ren Y  Ren Y 《Analytical biochemistry》2006,351(2):174-180
A method for simultaneous, nondestructive analysis of aminopyrine and phenacetin in compound aminopyrine phenacetin tablets with different concentrations has been developed by principal component artificial neural networks (PC-ANNs) on near-infrared (NIR) spectroscopy. In PC-ANN models, the spectral data were initially analyzed by principal component analysis. Then the scores of the principal components were chosen as input nodes for the input layer instead of the spectral data. The artificial neural network models using the spectral data as input nodes were also established and compared with the PC-ANN models. Four different preprocessing methods (first-derivative, second-derivative, standard normal variate (SNV), and multiplicative scatter correction) were applied to three sets of NIR spectra of compound aminopyrine phenacetin tablets. The PC-ANNs approach with SNV preprocessing spectra was found to provide the best results. The degree of approximation was used as the selection criterion for the optimum network parameters.
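Of the preprocessing steps named in the abstract above, the standard normal variate (SNV) transform is the simplest to illustrate: each spectrum is centred on its own mean and scaled by its own standard deviation. The sketch below assumes the common definition with the sample standard deviation (n - 1 in the denominator); the absorbance values are invented, and the function name `snv` is hypothetical. The principal component and neural network stages of the PC-ANN model are not shown.

```python
import math

def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by its own mean and SD."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    sd = math.sqrt(sum((a - mean) ** 2 for a in spectrum) / (n - 1))
    return [(a - mean) / sd for a in spectrum]

# Invented absorbances at five wavelengths.
reference = [0.20, 0.35, 0.50, 0.42, 0.30]
# The same spectrum with a multiplicative and an additive offset applied,
# which is the kind of distortion SNV is meant to remove.
distorted = [2.0 * a + 0.5 for a in reference]

snv_reference = snv(reference)
snv_distorted = snv(distorted)
largest_difference = max(abs(x - y) for x, y in zip(snv_reference, snv_distorted))
print(largest_difference)
```

After the transform the two spectra coincide up to floating-point rounding, which is why SNV is often applied before computing principal component scores from NIR spectra.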

14.
Summary: Principal components analysis is well suited for many data analysis problems in ecology, particularly for data reduction and hypothesis generation; but the structure of PCA is poorly suited for indirect gradient analysis. Whatever the intended application of PCA, the user must exercise special care in selecting data transformations to prevent the analysis from being overwhelmed by the purely numerical effects in the variance structure of the data. I would like to thank R. H. Whittaker, H. G. Gauch, R. E. Moeller, and S. R. Searle for their guidance and assistance.

15.
16.
The immense volume and rapid growth of human genomic data, especially single nucleotide polymorphisms (SNPs), present special challenges for both biomedical researchers and automatic algorithms. One such challenge is to select an optimal subset of SNPs, commonly referred to as "haplotype tagging SNPs" (htSNPs), to capture most of the haplotype diversity of each haplotype block or gene-specific region. This information-reduction process facilitates cost-effective genotyping and, subsequently, genotype-phenotype association studies. It also has implications for assessing the risk of identifying research subjects on the basis of SNP information deposited in public domain databases. We have investigated methods for selecting htSNPs by use of principal components analysis (PCA). These methods first identify eigenSNPs and then map them to actual SNPs. We evaluated two mapping strategies, greedy discard and varimax rotation, by assessing the ability of the selected htSNPs to reconstruct genotypes of non-htSNPs. We also compared these methods with two other htSNP finders, one of which is PCA based. We applied these methods to three experimental data sets and found that the PCA-based methods tend to select the smallest set of htSNPs to achieve a 90% reconstruction precision.

17.
The recent release of the Bovine HapMap dataset represents the most detailed survey of bovine genetic diversity to date, providing an important resource for the design and development of livestock production. We studied this dataset, comprising more than 30,000 Single Nucleotide Polymorphisms (SNPs) for 19 breeds (13 taurine, three zebu, and three hybrid breeds), seeking to identify small panels of genetic markers that can be used to trace the breed of unknown cattle samples. Taking advantage of the power of Principal Components Analysis and algorithms that we have recently described for the selection of Ancestry Informative Markers from genomewide datasets, we present a decision-tree which can be used to accurately infer the origin of individual cattle. In doing so, we present a thorough examination of population genetic structure in modern bovine breeds. Performing extensive cross-validation experiments, we demonstrate that 250-500 carefully selected SNPs suffice in order to achieve close to 100% prediction accuracy of individual ancestry, when this particular set of 19 breeds is considered. Our methods, coupled with the dense genotypic data that is becoming increasingly available, have the potential to become a valuable tool and have considerable impact in worldwide livestock production. They can be used to inform the design of studies of the genetic basis of economically important traits in cattle, as well as breeding programs and efforts to conserve biodiversity. Furthermore, the SNPs that we have identified can provide a reliable solution for the traceability of breed-specific branded products.

18.
Non-centred Principal Components Analysis (NPCA) ordinates sites and species simultaneously, and can be solved either by direct iteration or by eigenvector calculation. The weight of sites and species in the analysis is proportional to their overall abundance. Because of this, the method is not susceptible to distortion by rare species, as is the case with Reciprocal Averaging (RA). Detrending techniques can also be applied to this method to eliminate arch effects. When NPCA was tried with field data, it produced ordination axes that were significantly associated with independently measured environmental variables. In contrast, RA failed to produce axes related to environmental factors, even after the main rare species had been eliminated from the analysis. Abbreviations: NPCA, Non-centred Principal Components Analysis; RA, Reciprocal Averaging.
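The "direct iteration" route mentioned in the abstract above can be illustrated with a generic power method on an uncentred sites-by-species table: species scores are computed from site scores, site scores from species scores, and both are rescaled until they stop changing. This is a sketch under assumptions, not the paper's exact algorithm or weighting; the abundance table is invented, only the first axis is computed, and the names `normalise` and `npca_first_axis` are hypothetical.

```python
import math

def normalise(vec):
    """Rescale a vector to unit length."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]

def npca_first_axis(table, iterations=200):
    """First non-centred PCA axis: site and species scores obtained together."""
    n_sites, n_species = len(table), len(table[0])
    site_scores = normalise([1.0] * n_sites)
    for _ in range(iterations):
        # Species scores are abundance-weighted sums of the site scores ...
        species_scores = normalise(
            [sum(table[i][j] * site_scores[i] for i in range(n_sites))
             for j in range(n_species)])
        # ... and site scores are abundance-weighted sums of the species scores.
        site_scores = normalise(
            [sum(table[i][j] * species_scores[j] for j in range(n_species))
             for i in range(n_sites)])
    return site_scores, species_scores

# Invented abundances (rows: sites, columns: species); the data are not centred.
abundance = [
    [10, 5, 0],
    [8, 6, 1],
    [2, 7, 9],
    [0, 3, 12],
]
site_scores, species_scores = npca_first_axis(abundance)
print(site_scores)
print(species_scores)
```

At convergence the species scores are the leading eigenvector of the uncentred cross-product matrix, so the iteration and the eigenvector calculation give the same axis. Further axes would need an extra step such as deflation, which is not shown here.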

19.
Guanine-rich DNA repeat sequences located at the terminal ends of chromosomal DNA can fold in a sequence-dependent manner into G-quadruplex structures, notably the terminal 150-200 nucleotides at the 3′ end, which occur as a single-stranded DNA overhang. The crystal structures of quadruplexes with two and four human telomeric repeats show an all-parallel-stranded topology that is readily capable of forming extended stacks of such quadruplex structures, with external TTA loops positioned to potentially interact with other macromolecules. This study reports on possible arrangements for these quadruplex dimers and tetramers, which can be formed from 8 or 16 telomeric DNA repeats, and on a methodology for modeling their interactions with small molecules. A series of computational methods including molecular dynamics, free energy calculations, and principal components analysis have been used to characterize the properties of these higher-order G-quadruplex dimers and tetramers with parallel-stranded topology. The results confirm the stability of the central G-tetrads, the individual quadruplexes, and the resulting multimers. Principal components analysis has been carried out to highlight the dominant motions in these G-quadruplex dimer and multimer structures. The TTA loop is the most flexible part of the model, and the overall multimer quadruplex becomes more stable with the addition of further G-tetrads. The addition of a ligand to the model confirms the hypothesis that flat planar chromophores stabilize G-quadruplex structures by making them less flexible.

20.
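Evaluation of an integrated soil fertility index based on principal components analysis   Total citations: 38, self-citations: 0, citations by others: 38
Taking Yangling as the study area and the field plot as the evaluation unit, soil was sampled from 27 fields and soil fertility quality was evaluated with the integrated index evaluation model from fuzzy mathematics. A statistical analysis of sample-size adequacy showed that 27 sampling points satisfy the sample sizes required to estimate soil fertility in the study area with relative errors of 15% and 20% at the 90% confidence level. Using principal components analysis, and by computing the Norm value of each variable, available potassium, calcium carbonate, total phosphorus, total nitrogen, soil organic matter, CEC, available phosphorus, and total potassium were selected for the minimum data set (MDS) for soil fertility evaluation. Nonlinear membership functions were used to normalize the evaluation indicators to a common dimensionless scale and to score the individual fertility indicators, and radar charts were used to show directly the status of each single fertility indicator in the soil as well as the overall state of soil fertility. The results show that total phosphorus and calcium carbonate are the limiting factors for soil fertility in the study area. At the field evaluation scale, the integrated soil fertility index in the study area ranges from 0.7 to 0.8.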
