Similar articles
 Found 20 similar articles (search time: 843 ms)
1.
Comparative methods analyses have usually assumed that the species phenotypes are the true means for those species. In most analyses, the actual values used are means of samples of modest size. The covariances of contrasts then involve both the covariance of evolutionary changes and a fraction of the within-species phenotypic covariance, the fraction depending on the sample size for that species. Ives et al. have shown how to analyze data in this case when the within-species phenotypic covariances are known. The present model allows them to be unknown and to be estimated from the data. A multivariate normal statistical model is used for multiple characters in samples of finite size from species related by a known phylogeny, under the usual Brownian motion model of change and with equal within-species phenotypic covariances. Contrasts in each character can be obtained both between individuals within a species and between species. Each contrast can be taken for all of the characters. These sets of contrasts, each the same contrast taken for different characters, are independent. The within-set covariances are unequal and depend on the unknown true covariance matrices. An expectation-maximization algorithm is derived for making a reduced maximum likelihood estimate of the covariances of evolutionary change and the within-species phenotypic covariances. It is available in the Contrast program of the PHYLIP package. Computer simulations show that the covariances are biased when the finiteness of sample size is not taken into account and that using the present model corrects the bias. Sampling variation reduces the power of inference of covariation in evolution of different characters. An extension of this method to incorporate estimates of additive genetic covariances from a simple genetic experiment is also discussed.
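
As a rough univariate illustration of why finite samples matter, the sketch below computes the expected variance of an unstandardized contrast between the sample means of two sister species. The function name and all parameters (`sigma2_evol`, `sigma2_within`, branch lengths `v1` and `v2`, sample sizes `n1` and `n2`) are assumptions made for this example; it is not the EM algorithm of the Contrast program.

```python
def contrast_variance(sigma2_evol, sigma2_within, v1, v2, n1, n2):
    """Expected variance of (sample mean 1 - sample mean 2) for sister species.

    Under Brownian motion the true species means differ with variance
    sigma2_evol * (v1 + v2); each sample mean adds sigma2_within / n.
    """
    evolutionary = sigma2_evol * (v1 + v2)
    sampling = sigma2_within * (1.0 / n1 + 1.0 / n2)
    return evolutionary + sampling


# Hypothetical values: evolutionary rate 1, within-species variance 4,
# two branches of length 0.5 each.
small_samples = contrast_variance(1.0, 4.0, 0.5, 0.5, n1=5, n2=5)
large_samples = contrast_variance(1.0, 4.0, 0.5, 0.5, n1=5000, n2=5000)
```

With five individuals per species, the sampling term in this toy setting is larger than the evolutionary term, which is the kind of inflation the abstract describes when sample size is ignored.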

2.
Mathew T, Nordström K. Biometrics 1999, 55(4): 1221-1223
When data come from several independent studies for the purpose of estimating treatment-control differences, meta-analysis can be carried out either on the best linear unbiased estimators computed from each study or on the pooled individual patient data modelled as a two-way model without interaction, where the two factors represent the different studies and the different treatments. Assuming that observations within and between studies are independent and have a common variance, Olkin and Sampson (1998) obtained the surprising result that the two meta-analytic procedures are equivalent, i.e., they both produce the same estimator. In this article, the same equivalence is established for the two-way fixed-effects model without interaction under the sole assumption that the observations across studies are independent. A consequence of the equivalence result is that, regardless of the covariance structure, it is possible to get an explicit representation for the best linear unbiased estimator of any vector of treatment contrasts in a two-way fixed-effects model without interaction as long as the studies are independent. Another interesting consequence is that, for the purpose of best linear unbiased estimation, an unbalanced two-way fixed-effects model without interaction can be treated as several independent unbalanced one-way models, regardless of the covariance structure, when the studies are independent.

3.
We introduce a new method, moment reconstruction, of correcting for measurement error in covariates in regression models. The central idea is similar to regression calibration in that the values of the covariates that are measured with error are replaced by "adjusted" values. In regression calibration the adjusted value is the expectation of the true value conditional on the measured value. In moment reconstruction the adjusted value is the variance-preserving empirical Bayes estimate of the true value conditional on the outcome variable. The adjusted values thereby have the same first two moments and the same covariance with the outcome variable as the unobserved "true" covariate values. We show that moment reconstruction is equivalent to regression calibration in the case of linear regression, but leads to different results for logistic regression. For case-control studies with logistic regression and covariates that are normally distributed within cases and controls, we show that the resulting estimates of the regression coefficients are consistent. In simulations we demonstrate that for logistic regression, moment reconstruction carries less bias than regression calibration, and for case-control studies is superior in mean-square error to the standard regression calibration approach. Finally, we give an example of the use of moment reconstruction in linear discriminant analysis and a nonstandard problem where we wish to adjust a classification tree for measurement error in the explanatory variables.

4.
We have recently developed analysis methods (GREML) to estimate the genetic variance of a complex trait/disease and the genetic correlation between two complex traits/diseases using genome-wide single nucleotide polymorphism (SNP) data in unrelated individuals. Here we use analytical derivations and simulations to quantify the sampling variance of the estimate of the proportion of phenotypic variance captured by all SNPs for quantitative traits and case-control studies. We also derive the approximate sampling variance of the estimate of a genetic correlation in a bivariate analysis, when two complex traits are either measured on the same or different individuals. We show that the sampling variance is inversely proportional to the number of pairwise contrasts in the analysis and to the variance in SNP-derived genetic relationships. For bivariate analysis, the sampling variance of the genetic correlation additionally depends on the harmonic mean of the proportion of variance explained by the SNPs for the two traits and the genetic correlation between the traits, and depends on the phenotypic correlation when the traits are measured on the same individuals. We provide an online tool for calculating the power of detecting genetic (co)variation using genome-wide SNP data. The new theory and online tool will be helpful to plan experimental designs to estimate the missing heritability that has not yet been fully revealed through genome-wide association studies, and to estimate the genetic overlap between complex traits (diseases) in particular when the traits (diseases) are not measured on the same samples.
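
The stated proportionality can be turned into a back-of-the-envelope calculation. The sketch below is only a reading of that one sentence: the function name, the proportionality `constant` (set to 1 here) and the relationship variance of 2e-5 are assumptions for illustration, and this is not the authors' online tool.

```python
import math


def approx_se_snp_heritability(n_individuals, var_relationship, constant=1.0):
    """Approximate standard error of a SNP-heritability estimate.

    Sampling variance is taken as constant / (n_pairs * var_relationship),
    i.e. inversely proportional to the number of pairwise contrasts and to
    the variance of the SNP-derived relationships. `constant` is assumed.
    """
    n_pairs = n_individuals * (n_individuals - 1) / 2
    return math.sqrt(constant / (n_pairs * var_relationship))


# Assumed relationship variance among unrelated individuals (illustrative).
VAR_A = 2e-5
se_5k = approx_se_snp_heritability(5000, VAR_A)
se_10k = approx_se_snp_heritability(10000, VAR_A)
```

Because the number of pairs grows roughly with the square of the sample size, doubling the sample approximately halves the standard error in this sketch.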

5.
We prove that the slope parameter of the ordinary least squares regression of phylogenetically independent contrasts (PICs) conducted through the origin is identical to the slope parameter of the method of generalized least squares (GLS) regression under a Brownian motion model of evolution. This equivalence has several implications: (1) Understanding the structure of the linear model for GLS regression provides insight into when and why phylogeny is important in comparative studies. (2) The limitations of the PIC regression analysis are the same as the limitations of the GLS model. In particular, phylogenetic covariance applies only to the response variable in the regression and the explanatory variable should be regarded as fixed. Calculation of PICs for explanatory variables should be treated as a mathematical idiosyncrasy of the PIC regression algorithm. (3) Since the GLS estimator is the best linear unbiased estimator (BLUE), the slope parameter estimated using PICs is also BLUE. (4) If the slope is estimated using different branch lengths for the explanatory and response variables in the PIC algorithm, the estimator is no longer the BLUE, so this is not recommended. Finally, we discuss whether and how to accommodate phylogenetic covariance in regression analyses, particularly in relation to the problem of phylogenetic uncertainty. This discussion is from both frequentist and Bayesian perspectives.
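
The equivalence can be checked numerically on a toy case. The sketch below assumes a hypothetical three-species tree `((A:1,B:1):1,C:2)` and made-up trait values; the helper names (`solve`, `gls_slope`, `pics`, `pic_slope`) are illustrative, and the contrasts are hard-coded for this one tree rather than computed by a general tree traversal.

```python
import math


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [mr - factor * mc for mr, mc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def gls_slope(x, y, cov):
    """Slope of y on x (with intercept) by GLS with error covariance cov."""
    ones = [1.0] * len(x)
    ci1, cix, ciy = solve(cov, ones), solve(cov, x), solve(cov, y)
    # Normal equations (X' C^-1 X) beta = X' C^-1 y with X = [1, x].
    xtx = [[dot(ones, ci1), dot(ones, cix)],
           [dot(x, ci1), dot(x, cix)]]
    xty = [dot(ones, ciy), dot(x, ciy)]
    return solve(xtx, xty)[1]


def pics(values):
    """Standardized contrasts for the tree ((A:1,B:1):1,C:2) only."""
    a, b, c = values
    contrast_ab = (a - b) / math.sqrt(1.0 + 1.0)
    node = (a + b) / 2.0                       # equal branches, equal weights
    node_branch = 1.0 + (1.0 * 1.0) / (1.0 + 1.0)  # lengthened stem branch
    contrast_root = (node - c) / math.sqrt(node_branch + 2.0)
    return [contrast_ab, contrast_root]


def pic_slope(x, y):
    """OLS slope through the origin on the contrasts."""
    cx, cy = pics(x), pics(y)
    return dot(cx, cy) / dot(cx, cx)


# Brownian-motion covariance: shared path length from the root.
C = [[2.0, 1.0, 0.0],
     [1.0, 2.0, 0.0],
     [0.0, 0.0, 2.0]]
x = [1.0, 2.0, 4.0]   # made-up explanatory values for A, B, C
y = [2.0, 3.0, 9.0]   # made-up response values for A, B, C

slope_gls = gls_slope(x, y, C)
slope_pic = pic_slope(x, y)
```

For these values both routes give a slope of 2.25, in line with the identity stated in the abstract.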

6.
Molecular markers make it possible to estimate the pairwise relatedness between the members of a breeding pool when their selection history is no longer available or has become too complex for a classical pedigree analysis. The field of population genetics has several estimation procedures at its disposal, but when the genotyped individuals are highly selected inbred lines, their application is not warranted because the theoretical assumptions on which these estimators were built, usually linkage equilibrium between marker loci or even Hardy–Weinberg equilibrium, are not met. An alternative approach requires the availability of a genotyped reference set of inbred lines, which makes it possible to correct the observed marker similarities for their inherent upward bias when used as a coancestry measure. However, this approach does not guarantee that the resulting coancestry matrix is at least positive semi-definite (psd), a necessary condition for its use as a covariance matrix. In this paper we present the weighted alikeness in state (WAIS) estimator. This marker-based coancestry estimator is compared to several other commonly applied relatedness estimators under realistic hybrid breeding conditions in a number of simulations. We also fit a linear mixed model to phenotypic data from a commercial maize breeding programme and compare the likelihood of the different variance structures. WAIS is shown to be psd, which makes it suitable for modelling the covariance between genetic components in linear mixed models involved in breeding value estimation or association studies. Results indicate that it generally produces a low root mean squared error under different breeding circumstances and provides a fit to the data that is comparable to that of several other marker-based alternatives. Recommendations for each of the examined coancestry measures are provided.

7.
Krafty RT, Gimotty PA, Holtz D, Coukos G, Guo W. Biometrics 2008, 64(4): 1023-1031
Summary: In this article we develop a nonparametric estimation procedure for the varying coefficient model when the within-subject covariance is unknown. Extending the idea of iterative reweighted least squares to the functional setting, we iterate between estimating the coefficients conditional on the covariance and estimating the functional covariance conditional on the coefficients. Smoothing splines for correlated errors are used to estimate the functional coefficients with smoothing parameters selected via the generalized maximum likelihood. The covariance is nonparametrically estimated using a penalized estimator with smoothing parameters chosen via a Kullback-Leibler criterion. Empirical properties of the proposed method are demonstrated in simulations and the method is applied to the data collected from an ovarian tumor study in mice to analyze the effects of different chemotherapy treatments on the volumes of two classes of tumors.

8.
A major challenge in biomedical studies in recent years has been the classification of gene expression profiles into categories, such as cases and controls. This is done by first training a classifier on a training set containing labeled samples from the two populations, and then using that classifier to predict the labels of new samples. Such predictions have recently been shown to improve the diagnosis and treatment selection practices for several diseases. This procedure is complicated, however, by the high dimensionality of the data. While microarrays can measure the levels of thousands of genes per sample, case-control microarray studies usually involve no more than several dozen samples. Standard classifiers do not work well in these situations where the number of features (gene expression levels measured in these microarrays) far exceeds the number of samples. Selecting only the features that are most relevant for discriminating between the two categories can help construct better classifiers, in terms of both accuracy and efficiency. In this work we developed a novel method for multivariate feature selection based on the Partial Least Squares algorithm. We compared the method's variants with common feature selection techniques across a large number of real case-control datasets, using several classifiers. We demonstrate the advantages of the method and the preferable combinations of classifier and feature selection technique.

9.
Simultaneous confidence intervals for contrasts of means in a one-way layout with several independent samples are well established for Gaussian distributed data. Procedures addressing different hypotheses are available, such as all pairwise comparisons or comparisons to control, comparison with average, or different tests for order-restricted alternatives. However, if the distribution of the response is not Gaussian, corresponding methods are usually not available or not implemented in software. For the case of comparisons among several binomial proportions, we extended recently proposed confidence interval methods for the difference of two proportions or single contrasts to multiple contrasts by using quantiles of the multivariate normal distribution, taking the correlation into account. The small sample performance of the proposed methods was investigated in simulation studies. The simple adjustment of adding 2 pseudo-observations to each sample estimate leads to reasonable coverage probabilities. The methods are illustrated by the evaluation of real data examples of a clinical trial and a toxicological study. The proposed methods and examples are available in the R package MCPAN. (© 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim).
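
For a single contrast, the pseudo-observation adjustment can be sketched as follows. This is a hedged illustration, not the MCPAN implementation: the function name is invented, the two pseudo-observations are assumed to be one success and one failure per sample, and a univariate normal quantile is used where the multiple-contrast method would use a multivariate normal quantile.

```python
import math
from statistics import NormalDist


def add2_difference_interval(x1, n1, x2, n2, level=0.95):
    """Wald-type interval for p1 - p2 after adding 2 pseudo-observations.

    Assumption: each sample gains one pseudo-success and one pseudo-failure.
    """
    p1 = (x1 + 1) / (n1 + 2)
    p2 = (x2 + 1) / (n2 + 2)
    se = math.sqrt(p1 * (1 - p1) / (n1 + 2) + p2 * (1 - p2) / (n2 + 2))
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    diff = p1 - p2
    return diff - z * se, diff + z * se


# Hypothetical data: 0 of 10 events in one group, 5 of 10 in the other.
lower, upper = add2_difference_interval(0, 10, 5, 10)
```

With zero observed events the unadjusted variance estimate for the first group would be exactly zero; the adjusted estimate stays positive, which is one reason such adjustments tend to behave better in small samples.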

10.
In DNA microarray analysis, there is often interest in isolating a few genes that best discriminate between tissue types. This is especially important in cancer, where different clinicopathologic groups are known to vary in their outcomes and response to therapy. The identification of a small subset of gene expression patterns distinctive for tumor subtypes can help design treatment strategies and improve diagnosis. Toward this goal, we propose a methodology for the analysis of high-density oligonucleotide arrays. The gene expression measures are modeled as censored data to account for the quantification limits of the technology, and two gene selection criteria based on contrasts from an analysis of covariance (ANCOVA) model are presented. The model is formulated in a hierarchical Bayesian framework, which in addition to making the fit of the model straightforward and computationally efficient, allows us to borrow strength across genes. The elicitation of hierarchical priors, as well as issues related to parameter identifiability and posterior propriety, are discussed in detail. We examine the performance of our proposed method on simulated data, then present a detailed case study of an endometrial cancer dataset.

11.
Hsu L, Chen L, Gorfine M, Malone K. Biometrics 2004, 60(4): 936-944
Estimating the marginal hazard function from the correlated failure time data arising from case-control family studies is complicated by the noncohort study design and by risk heterogeneity due to unmeasured, shared risk factors among the family members. Accounting for both factors in this article, we propose a two-stage estimation procedure. At the first stage, we estimate the dependence parameter in the distribution for the risk heterogeneity without obtaining the marginal distribution first or simultaneously. Assuming that the dependence parameter is known, at the second stage we estimate the marginal hazard function by iterating between estimation of the risk heterogeneity (frailty) for each family and maximization of the partial likelihood function with an offset to account for the risk heterogeneity. We also propose an iterative procedure to improve the efficiency of the dependence parameter estimate. The simulation study shows that both methods perform well under finite sample sizes. We illustrate the method with a case-control family study of early onset breast cancer.

12.
This study is concerned with statistical methods used for the analysis of comparative data (in which observations are not expected to be independent because they are sampled across phylogenetically related species). The phylogenetically independent contrasts (PIC), phylogenetic generalized least-squares (PGLS), and phylogenetic autocorrelation (PA) methods are compared. Although the independent contrasts are not orthogonal, they are independent if the data conform to the Brownian motion model of evolution on which they are based. It is shown that uncentered correlations and regressions through the origin using the PIC method are identical to those obtained using PGLS with an intercept included in the model. The PIC method is a special case of PGLS. Corrected standard errors are given for estimates of the ancestral states based on the PGLS approach. The treatment of trees with hard polytomies is discussed and is shown to be an algorithmic rather than a statistical problem. Some of the relationships among the methods are shown graphically using the multivariate space in which variables are represented as vectors with respect to OTUs used as coordinate axes. The maximum-likelihood estimate of the autoregressive parameter, ρ, has not been computed correctly in previous studies (an appendix with MATLAB code provides a corrected algorithm). The importance of the eigenvalues and eigenvectors of the connection matrix, W, for the distribution of ρ is discussed. The PA method is shown to have several problems that limit its usefulness in comparative studies. Although the PA method is a generalized least-squares procedure, it cannot be made equivalent to the PGLS method using a phylogenetic model.

13.
Deng HW, Gao G, Li JL. Genetics 2002, 162(3): 1487-1500
The genomes of all organisms are subject to continuous bombardment of deleterious genomic mutations (DGM). Our ability to accurately estimate various parameters of DGM has profound significance in population and evolutionary genetics. The Deng-Lynch method can estimate the parameters of DGM in natural selfing and outcrossing populations. This method assumes constant fitness effects of DGM and hence is biased under variable fitness effects of DGM. Here, we develop a statistical method to estimate DGM parameters by considering variable mutation effects across loci. Under variable mutation effects, the mean fitness and genetic variance for fitness of parental and progeny generations across selfing/outcrossing in outcrossing/selfing populations and the covariance between mean fitness of parents and that of their progeny are functions of DGM parameters: the genomic mutation rate U, average homozygous effect s, average dominance coefficient h, and covariance of selection and dominance coefficients cov(h, s). The DGM parameters can be estimated by the algorithms we developed herein, which may yield improved estimation of DGM parameters over the Deng-Lynch method as demonstrated by our simulation studies. Importantly, this method is the first one to characterize cov(h, s) for DGM.

14.
Several methods have been proposed to estimate the variance in disease liability explained by large sets of genetic markers. However, current methods do not scale up well to large sample sizes. Linear mixed models require solving high-dimensional matrix equations, and methods that use polygenic scores are very computationally intensive. Here we propose a fast analytic method that uses polygenic scores, based on the formula for the non-centrality parameter of the association test of the score. We estimate model parameters from the results of multiple polygenic score tests based on markers with p values in different intervals. We estimate parameters by maximum likelihood and use profile likelihood to compute confidence intervals. We compare various options for constructing polygenic scores, based on nested or disjoint intervals of p values, weighted or unweighted effect sizes, and different numbers of intervals, in estimating the variance explained by a set of markers, the proportion of markers with effects, and the genetic covariance between a pair of traits. Our method provides nearly unbiased estimates and confidence intervals with good coverage, although estimation of the variance is less reliable when jointly estimated with the covariance. We find that disjoint p value intervals perform better than nested intervals, but the weighting did not affect our results. A particular advantage of our method is that it can be applied to summary statistics from single markers, and so can be quickly applied to large consortium datasets. Our method, named AVENGEME (Additive Variance Explained and Number of Genetic Effects Method of Estimation), is implemented in R software.

15.
Summary: In this article, we propose a positive stable shared frailty Cox model for clustered failure time data where the frailty distribution varies with cluster-level covariates. The proposed model accounts for covariate-dependent intracluster correlation and permits both conditional and marginal inferences. We obtain marginal inference directly from a marginal model, then use a stratified Cox-type pseudo-partial likelihood approach to estimate the regression coefficient for the frailty parameter. The proposed estimators are consistent and asymptotically normal and a consistent estimator of the covariance matrix is provided. Simulation studies show that the proposed estimation procedure is appropriate for practical use with a realistic number of clusters. Finally, we present an application of the proposed method to kidney transplantation data from the Scientific Registry of Transplant Recipients.

16.
The paper reviews the linear mixed models (LMM) methodology that is suitable for the statistical and genetic analyses of spatially repeated measures collected from clonal progeny tests. For example, we consider a poplar clonal trial where progenies of different families are propagated by cuttings, and only one ramet per clone is planted on each block. Modeling covariance structures following the LMM theory allows improving genetic parameter estimation based on clonal testing. Besides variance components, we also obtained an estimate of the covariance between residuals (within clonal effects in two different blocks). This covariance is due to planting more than one ramet from the same genotype in the same trial, which generates correlated residual effects from different blocks. Its estimation can significantly improve the comparison among clones within a progeny test or between tests in a clonal testing network. Results indicate that the covariance is also a component of the genetic variance estimator and plays a significant role in assessing the variance of specific (micro) environmental effects. A positive covariance implies that ramets show a similar performance in more than one block. Thus, a larger and more positive covariance implies a stronger genetic effect controlling the expression of the trait in the local environment and a smaller variance of specific environmental effects. On the contrary, a negative covariance implies that the performance of individual ramets is affected by strong microenvironmental effects, specific to one or more blocks, which can also directly increase the within-clone variability.

17.
Genomewide association studies are now a widely used approach in the search for loci that affect complex traits. After detection of significant association, estimates of penetrance and allele-frequency parameters for the associated variant indicate the importance of that variant and facilitate the planning of replication studies. However, when these estimates are based on the original data used to detect the variant, the results are affected by an ascertainment bias known as the "winner's curse." The actual genetic effect is typically smaller than its estimate. This overestimation of the genetic effect may cause replication studies to fail because the necessary sample size is underestimated. Here, we present an approach that corrects for the ascertainment bias and generates an estimate of the frequency of a variant and its penetrance parameters. The method produces a point estimate and confidence region for the parameter estimates. We study the performance of this method using simulated data sets and show that it is possible to greatly reduce the bias in the parameter estimates, even when the original association study had low power. The uncertainty of the estimate decreases with increasing sample size, independent of the power of the original test for association. Finally, we show that application of the method to case-control data can improve the design of replication studies considerably.
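
The ascertainment bias itself is easy to reproduce in a toy simulation. The sketch below only demonstrates the bias, not the correction proposed in the abstract; the function name, the normally distributed effect estimates, and every parameter value are assumptions chosen for illustration.

```python
import random
from statistics import mean


def simulate_winners_curse(true_effect=0.1, se=0.1, z_threshold=1.96,
                           n_studies=10000, seed=1):
    """Compare the mean estimate over all studies with the mean over
    only those studies that reached significance (estimate / se > z)."""
    rng = random.Random(seed)
    estimates = [rng.gauss(true_effect, se) for _ in range(n_studies)]
    significant = [e for e in estimates if e / se > z_threshold]
    return mean(estimates), mean(significant), len(significant)


all_mean, significant_mean, n_significant = simulate_winners_curse()
```

Every estimate that passes the threshold exceeds `z_threshold * se`, so in this low-power setting the average among "winners" is necessarily well above the true effect, while the average over all simulated studies stays close to it.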

18.
Association mapping in structured populations   Total citations: 43 (self-citations: 0; citations by others: 43)
The use, in association studies, of the forthcoming dense genomewide collection of single-nucleotide polymorphisms (SNPs) has been heralded as a potential breakthrough in the study of the genetic basis of common complex disorders. A serious problem with association mapping is that population structure can lead to spurious associations between a candidate marker and a phenotype. One common solution has been to abandon case-control studies in favor of family-based tests of association, such as the transmission/disequilibrium test (TDT), but this comes at a considerable cost in the need to collect DNA from close relatives of affected individuals. In this article we describe a novel, statistically valid, method for case-control association studies in structured populations. Our method uses a set of unlinked genetic markers to infer details of population structure, and to estimate the ancestry of sampled individuals, before using this information to test for associations within subpopulations. It provides power comparable with the TDT in many settings and may substantially outperform it if there are conflicting associations in different subpopulations.
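
A small worked example shows how population structure can create a spurious association and why testing within subpopulations removes it. The counts are invented, and subpopulation membership is taken as known here, whereas the method described above infers it from unlinked markers.

```python
def odds_ratio(case_carrier, case_non, ctrl_carrier, ctrl_non):
    """Odds ratio of carrying the allele in cases versus controls."""
    return (case_carrier * ctrl_non) / (case_non * ctrl_carrier)


# Hypothetical counts: (case carriers, case non-carriers,
#                       control carriers, control non-carriers).
# Subpopulation 1 has a high allele frequency and many cases;
# subpopulation 2 has a low allele frequency and few cases.
subpop_1 = (80, 20, 40, 10)
subpop_2 = (10, 40, 20, 80)
pooled = tuple(a + b for a, b in zip(subpop_1, subpop_2))

or_1 = odds_ratio(*subpop_1)
or_2 = odds_ratio(*subpop_2)
or_pooled = odds_ratio(*pooled)
```

Within each subpopulation the odds ratio is 1 (no association), yet pooling the samples gives an odds ratio of 2.25, purely because allele frequency and disease prevalence both differ between the groups.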

19.
Although mitochondrial DNA markers have several properties that make them suitable for phylogeographic studies, they are not free of difficulties. Phylogeographic inferences within and between closely related species can be misled by introgression and retention of ancestral polymorphism. Here we combine different phylogenetic, phylogeographic, and population genetic methods to extract the maximum information from the Liolaemus darwinii complex. We estimate the phylogeographic structure of L. darwinii across most of its distributional range, and we then estimate relationships between L. darwinii and the syntopic species L. laurenti and L. grosseorum. Our results suggest that range expansion of these lineages brought them into secondary contact in areas where they are presently in syntopy. Here we present the first evidence for introgression in lizards from temperate South America (of L. darwinii mitochondrial DNA into L. laurenti and L. grosseorum), and for incomplete lineage sorting (between L. darwinii and L. laurenti). We show that a combination of methods can provide additional support for inferences derived from any single method and thus provide more robust interpretations and narrow the range of plausible hypotheses about mechanisms and processes of divergence. Additional studies are needed in this group of lizards and in other codistributed groups to determine if Pleistocene climatic changes could be a general factor influencing the evolutionary history of a regional biota.

20.
Genotype-environment interactions and natural selection can result in local specialization when different genotypes are favored in different environments. Restricted gene flow or genetic subdivision enhances local genetic diversification across a species when natural selection acts on such variation. The indirect evolution of reproductive isolation and the restriction of gene flow between species in statu nascendi may provide a central role for genotype-environment interactions in speciation genetics. We derive the expected genetic covariance between heterospecific and conspecific viability fitness under several different models of selection, dominance, and breeding structure. Standard quantitative genetic methods can be used to estimate these covariances in experimental studies. These genetic covariances permit us to evaluate in a formal way the indirect effects of selection within a species on the evolution of hybrid inviability between species. We find that, for autosomal loci and random mating, the genetic covariance across species is equal to the product of three quantities: (1) the viability of the best hybrid genotype; (2) the viability effect of an allele in hybrids; and (3) the change in allele frequency due to selection in the conspecific population. Inbreeding within the conspecific population, expressed as Wright's coefficient, F, increases the genetic covariance by a factor (1 + F). In all cases, a negative genetic covariance across species is evidence for hybrid inviability evolving as an indirect effect of selection within species for adaptive (as opposed to neutral) genetic change.
“It is an irony of evolutionary genetics, that although it is a fusion of Mendelism and Darwinism, it has made no direct contribution to what Darwin obviously saw as the fundamental problem: the origin of species…. While it is a question of elementary population genetics to state how many generations will be required for the frequency of an allele to change from q1 to q2, we do not know how to incorporate such a statement into speciation theory, in large part because we know virtually nothing about the genetic changes that occur in species formation.” (Lewontin 1974, p. 159)
