Similar articles
20 similar articles found (search time: 15 ms)
1.
Probabilistic models of sequence evolution are in widespread use in phylogenetics and molecular sequence evolution. These models have become increasingly sophisticated and, combined with statistical model comparison techniques, have helped to shed light on how genes and proteins evolve. Models of codon evolution have been particularly useful because, in addition to providing a significant improvement in model realism for protein-coding sequences, codon models can also be designed to test hypotheses about the selective pressures that shape the evolution of the sequences. Such models typically assume a phylogeny and can be used to identify sites or lineages that have evolved adaptively. Recently some of the key assumptions that underlie phylogenetic tests of selection have been questioned, such as the assumption that the rate of synonymous changes is constant across sites or that a single phylogenetic tree can be assumed at all sites for recombining sequences. While some of these issues have been addressed through the development of novel methods, others remain as caveats that need to be considered on a case-by-case basis. Here, we outline the theory of codon models and their application to the detection of positive selection. We review some of the more recent developments that have improved their power and utility, laying a foundation for further advances in the modeling of coding sequence evolution.
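As a rough illustration of the kind of codon model this abstract refers to, the sketch below computes the off-diagonal substitution rate between two codons in the style of Goldman-Yang-type models: changes at more than one position (and changes involving stop codons) get rate zero, transitions are scaled by kappa, and nonsynonymous changes are scaled by omega, with omega > 1 conventionally read as a signal of positive selection. The function name `rate`, the parameter names, and the codon-frequency dictionary `pi` are assumptions made for illustration; they are not taken from the paper.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
SENSE = [c for c, aa in CODE.items() if aa != "*"]  # the 61 sense codons
PURINES = {"A", "G"}


def rate(ci, cj, kappa, omega, pi):
    """Off-diagonal rate from codon ci to codon cj (illustrative sketch).

    kappa: transition/transversion rate ratio
    omega: nonsynonymous/synonymous rate ratio
    pi:    dict of equilibrium frequencies for the sense codons
    Diagonal entries (ci == cj) are not handled here and return 0.0.
    """
    if CODE[ci] == "*" or CODE[cj] == "*":
        return 0.0  # stop codons are excluded from the state space
    diffs = [(a, b) for a, b in zip(ci, cj) if a != b]
    if len(diffs) != 1:
        return 0.0  # only single-nucleotide changes are allowed
    a, b = diffs[0]
    r = pi[cj]
    if (a in PURINES) == (b in PURINES):
        r *= kappa  # transition (purine<->purine or pyrimidine<->pyrimidine)
    if CODE[ci] != CODE[cj]:
        r *= omega  # nonsynonymous change
    return r
```

With uniform frequencies, `rate("TTT", "TTC", 2.0, 0.5, {c: 1 / 61 for c in SENSE})` gives a synonymous transition rate of kappa times the target frequency. A full model would also fill in the diagonal so that each row sums to zero and would usually rescale the matrix.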

2.
In the past, two kinds of Markov models have been considered to describe protein sequence evolution. Codon-level models have been mechanistic, with a small number of parameters designed to take into account features such as transition-transversion bias, codon frequency bias, and synonymous-nonsynonymous amino acid substitution bias. Amino acid models have been empirical, attempting to summarize the replacement patterns observed in large quantities of data and not explicitly considering the distinct factors that shape protein evolution. We have estimated the first empirical codon model (ECM). Previous codon models assume that protein evolution proceeds only by successive single nucleotide substitutions, but our results indicate that model accuracy is significantly improved by incorporating instantaneous doublet and triplet changes. We also find that the affiliations between codons, the amino acid each encodes, and the physicochemical properties of the amino acids are the main factors driving the process of codon evolution. Neither multiple nucleotide changes nor the strong influence of the genetic code nor amino acids' physicochemical properties form a part of standard mechanistic models and their views of how codon evolution proceeds. We have implemented the ECM for likelihood-based phylogenetic analysis, and an assessment of its ability to describe protein evolution shows that it consistently outperforms comparable mechanistic codon models. We point out the biological interpretation of our ECM and possible consequences for studies of selection.

3.
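The QTL mapping efficiency of the threshold model and the general linear model was compared using maximum likelihood interval mapping, and the main factors affecting the efficiency of QTL detection for discrete traits (QTL effect, trait heritability, and phenotypic incidence) were studied by simulation. The experimental design was a multi-family daughter design with a resource population of 500 animals. The results show that the threshold model method has a clear advantage in QTL parameter estimation and test power: its efficiency in mapping QTL for discrete traits is markedly higher than that of the LM (Linear Model) method, and its mapping accuracy is also higher. In addition, trait heritability, QTL effect size, and phenotypic incidence directly affect the accuracy of QTL mapping; mapping efficiency improves further as trait heritability and phenotypic incidence rise and as the QTL effect increases.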
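For readers unfamiliar with threshold models, the sketch below shows the usual liability formulation that such models rest on: a discrete phenotype is treated as the result of an unobserved, normally distributed liability crossing a fixed threshold, so the phenotypic incidence determines where the threshold sits. This is an assumed textbook formulation on a standard-normal liability scale, and the function names are hypothetical; the study's exact parameterization may differ.

```python
from statistics import NormalDist


def liability_threshold(incidence):
    """Threshold t on a standard-normal liability with P(liability > t) = incidence."""
    if not 0.0 < incidence < 1.0:
        raise ValueError("incidence must lie strictly between 0 and 1")
    return NormalDist().inv_cdf(1.0 - incidence)


def phenotype(liability, threshold):
    """Observed binary phenotype: 1 if the liability exceeds the threshold, else 0."""
    return 1 if liability > threshold else 0
```

Under this formulation a rarer trait corresponds to a higher threshold, so fewer animals show the trait and the binary records carry less information about the underlying liability.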

4.
In the present paper the linear logistic extension of latent class analysis is described. Thereby it is assumed that the item latent probabilities as well as the class sizes can be attributed to some explanatory variables. The basic equations of the model state the decomposition of the log-odds of the item latent probabilities and of the class sizes into weighted sums of basic parameters representing the effects of the predictor variables. Further, the maximum likelihood equations for these effect parameters and statistical tests for goodness-of-fit are given. Finally, an example illustrates the practical application of the model and the interpretation of the model parameters.

5.
A maximum likelihood procedure is developed to estimate the dependence relations between plants at equal distances along a row, by fitting simultaneous bilateral models to the observations. Where there is more than one characteristic measured on each plant, a simultaneous bilateral vector model can be fitted by maximum likelihood procedures. The latter model also applies when one characteristic is measured on each plant in a two-dimensional planting array where interplant distances within each row are equal but interrow spacing varies. The estimation is particularly suited to the small sample situation.

6.
Almost all studies that estimate phylogenies from DNA sequence data under the maximum-likelihood (ML) criterion employ an approximate approach. Most commonly, model parameters are estimated on some initial phylogenetic estimate derived using a rapid method (neighbor-joining or parsimony). Parameters are then held constant during a tree search, and ideally, the procedure is repeated until convergence is achieved. However, the effectiveness of this approximation has not been formally assessed, in part because doing so requires computationally intensive, full-optimization analyses. Here, we report both indirect and direct evaluations of the effectiveness of successive approximations. We obtained an indirect evaluation by comparing the results of replicate runs on real data that use random trees to provide initial parameter estimates. For six real data sets taken from the literature, all replicate iterative searches converged to the same joint estimates of topology and model parameters, suggesting that the approximation is not starting-point dependent, as long as the heuristic searches of tree space are rigorous. We conducted a more direct assessment using simulations in which we compared the accuracy of phylogenies estimated using full optimization of all model parameters on each tree evaluated to the accuracy of trees estimated via successive approximations. There is no significant difference between the accuracy of the approximation searches relative to full-optimization searches. Our results demonstrate that successive approximation is reliable and provide reassurance that this much faster approach is safe to use for ML estimation of topology.

7.
Laplace's approximation for nonlinear mixed models   Cited by: 5 in total (0 self-citations, 5 by others)
WOLFINGER RUSS. Biometrika, 1993, 80(4): 791-795

8.
A Gaussian mixture model with a finite number of components and correlated random effects is described. The ultimate objective is to model somatic cell count information in dairy cattle and to develop criteria for genetic selection against mastitis, an important udder disease. Parameter estimation is by maximum likelihood or by an extension of restricted maximum likelihood. A Monte Carlo expectation-maximization algorithm is used for this purpose. The expectation step is carried out using Gibbs sampling, whereas the maximization step is deterministic. Ranking rules based on the conditional probability of membership in a putative group of uninfected animals, given the somatic cell information, are discussed. Several extensions of the model are suggested.
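The ranking rule mentioned in this abstract rests on a posterior membership probability. As a deliberately simplified sketch, the code below applies Bayes' rule to a plain two-component Gaussian mixture with known parameters; it ignores the correlated random effects and the Monte Carlo EM estimation that the abstract describes, and every name and parameter in it is an assumption made for illustration.

```python
import math


def normal_pdf(y, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def prob_uninfected(y, weight_uninf, mean_uninf, sd_uninf, mean_inf, sd_inf):
    """Posterior probability that observation y belongs to the 'uninfected' component.

    weight_uninf is the mixing proportion of the uninfected component;
    the infected component gets weight 1 - weight_uninf.
    """
    a = weight_uninf * normal_pdf(y, mean_uninf, sd_uninf)
    b = (1.0 - weight_uninf) * normal_pdf(y, mean_inf, sd_inf)
    return a / (a + b)
```

Animals could then be ranked by this probability, with higher values suggesting membership in the putative uninfected group. In the model described above the probability would additionally condition on the animal's random effects.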

9.
Computer programs are available in the software package SAGE to perform a variety of segregation and linkage analyses used by human geneticists. These methods are designed specifically to uncover major gene segregation in pedigree data coming from non-inbred populations. With the aid of a closely linked polymorphic marker, they can detect a locus that contributes as little as 10% to the variation of a quantitative trait in a pedigree sample of several hundred individuals.

10.
Despite the proliferation of increasingly sophisticated models of DNA sequence evolution, choosing among models remains a major problem in phylogenetic reconstruction. The choice of appropriate models is thought to be especially important when there is large variation among branch lengths. We evaluated the ability of nested models to reconstruct experimentally generated, known phylogenies of bacteriophage T7 as we varied the terminal branch lengths. Then, for each phylogeny we determined the best-fit model by progressively adding parameters to simpler models. We found that in several cases the choice of best-fit model was affected by the parameter addition sequence. In terms of phylogenetic performance, there was little difference between models when the ratio of short to long terminal branches was 1:3 or less. However, under conditions of extreme terminal branch-length variation, there were not only dramatic differences among models, but best-fit models were always among the best at overcoming long-branch attraction. The performance of minimum-evolution-distance methods was generally lower than that of discrete maximum-likelihood methods, even if maximum-likelihood methods were used to generate distance matrices. Correcting for among-site rate variation was especially important for overcoming long-branch attraction. The generality of our conclusions is supported by earlier simulation studies and by a preliminary analysis of mitochondrial and nuclear sequences from a well-supported four-taxon amniote phylogeny.

11.
FAREWELL V. T. Biometrika, 1979, 66(1): 27-32

12.
Summary. L-splines are a large family of smoothing splines defined in terms of a linear differential operator. This article develops L-splines within the context of linear mixed models and uses the resulting mixed model L-spline to analyze longitudinal data from a grassland experiment. In the spirit of time-series analysis, a periodic mixed model L-spline is developed, which partitions data into a smooth periodic component plus smooth long-term trend.

13.
The present study discusses two variants of linear logistic models for polytomous variables, for "unordered" and for "ordered" categories (polydimensional and one-dimensional model). The ML estimation equations and the possibilities to test the validity of the model are given for both. A test for goodness-of-fit (external validity) and a test for equality of the parameter estimates for split data (internal validity) are suggested. In addition, statistical tests for the significance of individual parameters on the basis of the information matrix and likelihood ratio tests for one or more parameters are described. The presentation is completed by an empirical example from the area of audiology.

14.
An exponential model for the spectrum of a scalar time series   Cited by: 8 in total (0 self-citations, 8 by others)
BLOOMFIELD P. Biometrika, 1973, 60(2): 217-226

15.
Statistical analysis of diversification with species traits   Cited by: 1 in total (0 self-citations, 1 by others)
Testing whether some species traits have a significant effect on diversification rates is central in the assessment of macroevolutionary theories. However, we still lack a powerful method to tackle this objective. I present a new method for the statistical analysis of diversification with species traits. The required data are observations of the traits on recent species, the phylogenetic tree of these species, and reconstructions of ancestral values of the traits. Several traits, either continuous or discrete, and in some cases their interactions, can be analyzed simultaneously. The parameters are estimated by the method of maximum likelihood. The statistical significance of the effects in a model can be tested with likelihood ratio tests. A simulation study showed that past random extinction events do not affect the Type I error rate of the tests, whereas statistical power is decreased, though some power is still kept if the effect of the simulated trait on speciation is strong. The use of the method is illustrated by the analysis of published data on primates. The analysis of these data showed that the apparent overall positive relationship between body mass and species diversity is actually an artifact due to a clade-specific effect. Within each clade the effect of body mass on speciation rate was in fact negative. The present method allows both effects (clade and body mass) to be taken into account simultaneously.
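Several abstracts in this list rely on likelihood ratio tests between nested models. As a minimal sketch, assuming the usual asymptotic chi-square reference distribution holds (nested models, no parameter on a boundary), the function below turns two maximized log-likelihoods into a test statistic and p-value. It only covers 1 or 2 degrees of freedom, where the chi-square tail has a closed form; the function name and arguments are hypothetical and not taken from any of the papers.

```python
import math


def lrt(lnl_null, lnl_alt, df):
    """Likelihood ratio test for nested models (illustrative sketch).

    lnl_null, lnl_alt: maximized log-likelihoods of the null and alternative models
    df: difference in the number of free parameters (only 1 or 2 supported here)
    Returns (statistic, p_value) using the chi-square approximation.
    """
    stat = 2.0 * (lnl_alt - lnl_null)
    if stat < 0.0:
        raise ValueError("alternative fits worse than the null; check the optimization")
    if df == 1:
        p = math.erfc(math.sqrt(stat / 2.0))  # chi-square(1) upper tail
    elif df == 2:
        p = math.exp(-stat / 2.0)  # chi-square(2) upper tail
    else:
        raise NotImplementedError("closed form used here covers df = 1 or 2 only")
    return stat, p
```

For example, a log-likelihood gain of about 1.92 units for one extra parameter corresponds to a p-value of about 0.05. For other degrees of freedom a statistics library's chi-square survival function would be the natural replacement.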

16.
Due to increasing discoveries of biomarkers and observed diversity among patients, there is growing interest in personalized medicine for the purpose of increasing the well-being of patients (ethics) and extending human life. In fact, these biomarkers and observed heterogeneity among patients are useful covariates that can be used to achieve the ethical goals of clinical trials and improve the efficiency of statistical inference. Covariate-adjusted response-adaptive (CARA) design was developed to use information in such covariates in randomization to maximize the well-being of participating patients as well as increase the efficiency of statistical inference at the end of a clinical trial. In this paper, we establish conditions for consistency and asymptotic normality of maximum likelihood (ML) estimators of generalized linear models (GLM) for a general class of adaptive designs. We prove that the ML estimators are consistent and asymptotically follow a multivariate Gaussian distribution. The efficiency of the estimators and the performance of response-adaptive (RA), CARA, and completely randomized (CR) designs are examined based on the well-being of patients under a logit model with categorical covariates. Results from our simulation studies and application to data from a clinical trial on stroke prevention in atrial fibrillation (SPAF) show that RA designs lead to ethically desirable outcomes as well as higher statistical efficiency compared to CARA designs if there is no treatment by covariate interaction in an ideal model. CARA designs were however more ethical than RA designs when there was significant interaction.

17.
Recent technological advances continue to provide noninvasive and more accurate biomarkers for evaluating disease status. One standard tool for assessing the accuracy of diagnostic tests is the receiver operating characteristic (ROC) curve. Few statistical methods exist to accommodate multiple continuous-scale biomarkers in the framework of ROC analysis. In this paper, we propose a method to integrate continuous-scale biomarkers to optimize classification accuracy. Specifically, we develop semiparametric transformation models for multiple biomarkers. We assume that unknown and marker-specific transformations of biomarkers follow a multivariate normal distribution. Our models accommodate biomarkers subject to limits of detection and account for the dependence among biomarkers by including a subject-specific random effect. We also propose a diagnostic measure using an optimal linear combination of the transformed biomarkers. Our diagnostic rule does not depend on any monotone transformation of biomarkers and is not sensitive to extreme biomarker values. Nonparametric maximum likelihood estimation (NPMLE) is used for inference. We show that the parameter estimators are asymptotically normal and efficient. We illustrate our semiparametric approach using data from the Endometriosis, Natural History, Diagnosis, and Outcomes (ENDO) study.
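To make the ROC-based accuracy measure concrete, the sketch below computes the empirical area under the ROC curve for a single marker as the proportion of diseased/healthy pairs that the marker orders correctly, counting ties as one half. This is the standard rank-based estimate, not the semiparametric method of the paper, and the function name and sample values are made up for illustration.

```python
def empirical_auc(diseased, healthy):
    """Empirical AUC: fraction of (diseased, healthy) pairs ranked correctly.

    Higher marker values are assumed to indicate disease; ties count as 1/2.
    """
    if not diseased or not healthy:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for x in diseased:
        for y in healthy:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(diseased) * len(healthy))
```

Because the estimate depends only on the ordering of the values, applying any strictly increasing transformation to the marker leaves it unchanged, which is the same invariance property the abstract claims for its diagnostic rule.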

18.
"Stochastic survival models which adjust for covariate information have been developed by Beck (1979). These models can include one or two living states and several competing death states. The transitions between stages are assumed irreversible and the transition intensity functions are assumed to be independent of time but dependent upon the covariates." Explicit solutions of the maximum likelihood equations for such models when there are one or two dichotomous covariates are presented. Applications of these models to the case of heart transplants and lung cancer are discussed, and survival in two or four groups is compared. (Summary in French.)

19.
Robust estimation of multivariate covariance components   Cited by: 1 in total (0 self-citations, 1 by others)
Dueck A, Lohr S. Biometrics, 2005, 61(1): 162-169
In many settings, such as interlaboratory testing, small area estimation in sample surveys, and heritability studies, investigators are interested in estimating covariance components for multivariate measurements. However, the presence of outliers can seriously distort estimates obtained using standard procedures such as maximum likelihood. We propose a procedure based on M-estimation for robustly estimating multivariate covariance components in the presence of outliers; the procedure applies to balanced and unbalanced data. We present an algorithm for computing the robust estimates and examine the performance of the estimator through a simulation study. The estimator is used to find covariance components and identify outliers in a study of variability of egg length and breadth measurements of American coots.

20.
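Yin Zongjun (殷宗俊), Zhang Qin (张勤), Zhang Jigang (张纪刚), Ding Xiangdong (丁向东). Acta Genetica Sinica (《遗传学报》), 2005, 32(11): 1147-1155
QTL mapping methods for ordinal disease-resistance traits in livestock were studied by simulation within the framework of generalized linear models, with QTL parameters estimated by maximum likelihood. The QTL mapping efficiency of the threshold model method was compared with that of the general linear method, and the main factors affecting mapping efficiency for ordinal traits (QTL effect and trait heritability) were examined by simulation. The experimental design was a multi-family daughter design with a resource population of 500 animals. The results show that the threshold model method has some advantage in estimating QTL position parameters and in test power, and its power for mapping QTL of ordinal resistance traits is also higher than that of the linear method. In addition, trait heritability and QTL effect size directly affect the accuracy of QTL mapping; as trait heritability and the QTL effect …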
