Similar Literature (20 results)
1.
The robustness (sensitivity to violation of assumptions) of the maximum-likelihood and neighbor-joining methods was examined using simulation. Maximum likelihood and neighbor joining were implemented with Jukes-Cantor, Kimura, and gamma models of DNA substitution. Simulations were performed in which the assumptions of the methods were violated to varying degrees on three model four-taxon trees. The performance of the methods was evaluated with respect to their ability to correctly estimate the unrooted four-taxon tree. Maximum likelihood outperformed neighbor joining in 29 of the 36 cases in which the assumptions of both methods were satisfied. In 133 of 180 of the simulations in which the assumptions of the maximum-likelihood and neighbor-joining methods were violated, maximum likelihood outperformed neighbor joining. These results are consistent with a general superiority of maximum likelihood over neighbor joining under comparable conditions. They extend and clarify an earlier study that found an advantage for neighbor joining over maximum likelihood for gamma-distributed mutation rates.
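The Jukes-Cantor model used above has a simple closed form: after a branch of length d (expected substitutions per site), a site differs from its ancestor with probability 3/4·(1 − e^(−4d/3)), and the distance can be recovered from the observed proportion of differences. A minimal simulation sketch of this model (not the authors' simulation code; the sequence length and branch length are arbitrary illustrative choices):

```python
import math
import random

def jc69_evolve(seq, d, rng):
    """Evolve a DNA sequence along one branch under the Jukes-Cantor (JC69)
    model; d is the branch length in expected substitutions per site."""
    p_change = 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))
    out = []
    for base in seq:
        if rng.random() < p_change:
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)

rng = random.Random(42)
ancestor = "".join(rng.choice("ACGT") for _ in range(10000))
tip = jc69_evolve(ancestor, 0.3, rng)

# Observed proportion of differing sites, and the JC69 distance estimate
p_obs = sum(a != b for a, b in zip(ancestor, tip)) / len(ancestor)
d_hat = -0.75 * math.log(1.0 - 4.0 * p_obs / 3.0)  # should be near 0.3
```

Repeating this along each branch of a model tree gives the kind of simulated alignments on which tree-estimation methods can then be compared.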

2.
There has been growing interest in the likelihood paradigm of statistics, where statistical evidence is represented by the likelihood function and its strength is measured by likelihood ratios. The available literature in this area has so far focused on parametric likelihood functions, though in some cases a parametric likelihood can be robustified. This focused discussion on parametric models, while insightful and productive, may have left the impression that the likelihood paradigm is best suited to parametric situations. This article discusses the use of empirical likelihood functions, a well-developed methodology in the frequentist paradigm, to interpret statistical evidence in nonparametric and semiparametric situations. A comparative review of the literature shows that, while an empirical likelihood is not a true probability density, it has the essential properties, namely consistency and local asymptotic normality, that unify and justify the various parametric likelihood methods for evidential analysis. Real examples are presented to illustrate and compare the empirical likelihood method and the parametric likelihood methods. These methods are also compared in terms of asymptotic efficiency by combining relevant results from different areas. It is seen that a parametric likelihood based on a correctly specified model is generally more efficient than an empirical likelihood for the same parameter. However, when the working model fails, a parametric likelihood either breaks down or, if a robust version exists, becomes less efficient than the corresponding empirical likelihood.
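As a concrete illustration of a nonparametric likelihood, Owen-style empirical likelihood for a mean can be computed with a one-dimensional search for the Lagrange multiplier. This is a generic textbook sketch, not the article's methodology:

```python
import math

def el_logratio(x, mu, tol=1e-10):
    """Empirical log-likelihood ratio for a mean mu: maximize sum(log(n*w_i))
    subject to sum(w_i) = 1 and sum(w_i*(x_i - mu)) = 0. The inner problem
    reduces to a one-dimensional root-find for the Lagrange multiplier."""
    z = [xi - mu for xi in x]
    if min(z) >= 0 or max(z) <= 0:
        return float("-inf")  # mu outside the data's convex hull: ratio is 0
    # The multiplier must keep every weight positive: 1 + lam*z_i > 0
    lo = -1.0 / max(z) + 1e-9
    hi = -1.0 / min(z) - 1e-9

    def g(lam):  # strictly decreasing in lam; its root is the optimum
        return sum(zi / (1.0 + lam * zi) for zi in z)

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return -sum(math.log(1.0 + lam * zi) for zi in z)
```

At the sample mean the log-ratio is 0 and it decreases as mu moves away; minus twice it is asymptotically chi-squared with one degree of freedom, which is the Wilks-type behaviour the abstract alludes to.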

3.
Advocates of cladistic parsimony methods have invoked the philosophy of Karl Popper in an attempt to argue for the superiority of those methods over phylogenetic methods based on Ronald Fisher's statistical principle of likelihood. We argue that the concept of likelihood in general, and its application to problems of phylogenetic inference in particular, are highly compatible with Popper's philosophy. Examination of Popper's writings reveals that his concept of corroboration is, in fact, based on likelihood. Moreover, because probabilistic assumptions are necessary for calculating the probabilities that define Popper's corroboration, likelihood methods of phylogenetic inference, with their explicit probabilistic basis, are easily reconciled with his concept. In contrast, cladistic parsimony methods, at least as described by certain advocates of those methods, are less easily reconciled with Popper's concept of corroboration. If those methods are interpreted as lacking probabilistic assumptions, then they are incompatible with corroboration. Conversely, if parsimony methods are to be considered compatible with corroboration, then they must be interpreted as carrying implicit probabilistic assumptions. Thus, the non-probabilistic interpretation of cladistic parsimony favored by some advocates of those methods is contradicted by an attempt by the same authors to justify parsimony methods in terms of Popper's concept of corroboration. In addition to being compatible with Popperian corroboration, the likelihood approach to phylogenetic inference permits researchers to test the assumptions of their analytical methods (models) in a way that is consistent with Popper's ideas about the provisional nature of background knowledge.

4.
Allozyme data are widely used to infer the phylogenies of populations and closely related species. Numerous parsimony, distance, and likelihood methods have been proposed for phylogenetic analysis of these data; the relative merits of these methods have been debated vigorously, but their accuracy has not been well explored. In this study, I compare the performance of 13 phylogenetic methods (six parsimony, six distance, and continuous maximum likelihood) by applying a congruence approach to eight allozyme data sets from the literature. Clades are identified that are supported by multiple data sets other than allozymes (e.g. morphology, DNA sequences), and the ability of different methods to recover these 'known' clades is compared. The results suggest that (1) distance and likelihood methods generally outperform parsimony methods, (2) methods that utilize frequency data tend to perform well, and (3) continuous maximum likelihood is among the most accurate methods, and appears to be robust to violations of its assumptions. These results are in agreement with those from recent simulation studies, and help provide a basis for empirical workers to choose among the many methods available for analysing allozyme characters.

5.
Methods for the analysis of unmatched case-control data based on a finite population sampling model are developed. Under this model, and the prospective logistic model for disease probabilities, a likelihood for case-control data that accommodates very general sampling of controls is derived. This likelihood has the form of a weighted conditional logistic likelihood. The flexibility of the methods is illustrated by providing a number of control sampling designs and a general scheme for their analyses. These include frequency matching, counter-matching, case-base, randomized recruitment, and quota sampling. A study of risk factors for childhood asthma illustrates an application of the counter-matching design. Some asymptotic efficiency results are presented and computational methods discussed. Further, it is shown that a 'marginal' likelihood provides a link to unconditional logistic methods. The methods are examined in a simulation study that compares frequency and counter-matching using conditional and unconditional logistic analyses and indicates that the conditional logistic likelihood has superior efficiency. Extensions that accommodate sampling of cases and multistage designs are presented. Finally, we compare the analysis methods presented here to other approaches, compare counter-matching and two-stage designs, and suggest areas for further research.

6.
Coalescent likelihood is the probability of observing the given population sequences under the coalescent model. Computation of coalescent likelihood under the infinite sites model is a classic problem in coalescent theory. Existing methods are based on either importance sampling or Markov chain Monte Carlo and are inexact. In this paper, we develop a simple method that can compute the exact coalescent likelihood for many data sets of moderate size, including real biological data whose likelihood was previously thought to be difficult to compute exactly. Our method works for both panmictic and subdivided populations. Simulations demonstrate that the practical range of exact coalescent likelihood computation for panmictic populations is significantly larger than what was previously believed. We investigate the application of our method in estimating mutation rates by maximum likelihood. A main application of the exact method is comparing the accuracy of approximate methods. To demonstrate the usefulness of the exact method, we evaluate the accuracy of program Genetree in computing the likelihood for subdivided populations.
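The coalescent model underlying these likelihoods has a simple generative form: with k lineages present, the waiting time to the next coalescence is exponential with rate k(k−1)/2, in units of 2N generations. A quick simulation check of the standard result E[TMRCA] = 2(1 − 1/n), offered as an illustration of the model rather than of the paper's exact-likelihood algorithm:

```python
import random

def sample_tmrca(n, rng):
    """Time to the most recent common ancestor of n lineages under the
    standard coalescent, in units of 2N generations: while k lineages
    remain, the next coalescence time is Exp(k*(k-1)/2) distributed."""
    t, k = 0.0, n
    while k > 1:
        t += rng.expovariate(k * (k - 1) / 2.0)
        k -= 1
    return t

rng = random.Random(1)
sims = [sample_tmrca(10, rng) for _ in range(20000)]
mean_tmrca = sum(sims) / len(sims)  # theory: 2*(1 - 1/10) = 1.8
```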

7.
Kluge's (2001, Syst. Biol. 50:322-330) continued arguments that phylogenetic methods based on the statistical principle of likelihood are incompatible with the philosophy of science described by Karl Popper are based on false premises related to Kluge's misrepresentations of Popper's philosophy. Contrary to Kluge's conjectures, likelihood methods are not inherently verificationist; they do not treat every instance of a hypothesis as confirmation of that hypothesis. The historical nature of phylogeny does not preclude phylogenetic hypotheses from being evaluated using the probability of evidence. The low absolute probabilities of hypotheses are irrelevant to the correct interpretation of Popper's concept termed degree of corroboration, which is defined entirely in terms of relative probabilities. Popper did not advocate minimizing background knowledge; in any case, the background knowledge of both parsimony and likelihood methods consists of the general assumption of descent with modification and additional assumptions that are deterministic, concerning which tree is considered most highly corroborated. Although parsimony methods do not assume (in the sense of entailing) that homoplasy is rare, they do assume (in the sense of requiring to obtain a correct phylogenetic inference) certain things about patterns of homoplasy. Both parsimony and likelihood methods assume (in the sense of implying by the manner in which they operate) various things about evolutionary processes, although violation of those assumptions does not always cause the methods to yield incorrect phylogenetic inferences. Test severity is increased by sampling additional relevant characters rather than by character reanalysis, although either interpretation is compatible with the use of phylogenetic likelihood methods. 
Neither parsimony nor likelihood methods assess test severity (critical evidence) when used to identify a most highly corroborated tree(s) based on a single method or model and a single body of data; however, both classes of methods can be used to perform severe tests. The assumption of descent with modification is insufficient background knowledge to justify cladistic parsimony as a method for assessing degree of corroboration. Invoking equivalency between parsimony methods and likelihood models that assume no common mechanism emphasizes the necessity of additional assumptions, at least some of which are probabilistic in nature. Incongruent characters do not qualify as falsifiers of phylogenetic hypotheses except under extremely unrealistic evolutionary models; therefore, justifications of parsimony methods as falsificationist based on the idea that they minimize the ad hoc dismissal of falsifiers are questionable. Probabilistic concepts such as degree of corroboration and likelihood provide a more appropriate framework for understanding how phylogenetics conforms with Popper's philosophy of science. Likelihood ratio tests do not assume what is at issue but instead are methods for testing hypotheses according to an accepted standard of statistical significance and for incorporating considerations about test severity. These tests are fundamentally similar to Popper's degree of corroboration in being based on the relationship between the probability of the evidence e in the presence versus absence of the hypothesis h, i.e., between p(e|hb) and p(e|b), where b is the background knowledge. 
Both parsimony and likelihood methods are inductive in that their inferences (particular trees) contain more information than (and therefore do not follow necessarily from) the observations upon which they are based; however, both are deductive in that their conclusions (tree lengths and likelihoods) follow necessarily from their premises (particular trees, observed character state distributions, and evolutionary models). For these and other reasons, phylogenetic likelihood methods are highly compatible with Karl Popper's philosophy of science and offer several advantages over parsimony methods in this context.

8.
Only recently has much attention been given to deriving maximum likelihood methods for estimating the intercept and slope parameters from a binormal ROC curve that assesses the accuracy of a continuous diagnostic test. We propose two new methods for estimating these parameters. The first method uses the profile likelihood and a simple algorithm to produce fully efficient estimates. The second method is based on a pseudo-maximum likelihood that can easily accommodate adjusting for covariates that could affect the accuracy of the continuous test.
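The binormal model referred to here has the closed form ROC(t) = Φ(a + b·Φ⁻¹(t)), with area under the curve AUC = Φ(a/√(1 + b²)). A small sketch of these formulas (the paper's estimation procedures are not reproduced; the inverse normal CDF is computed by bisection purely for simplicity):

```python
import math

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi_inv(t):
    """Inverse normal CDF by bisection (adequate for a sketch)."""
    lo, hi = -10.0, 10.0
    while hi - lo > 1e-12:
        mid = 0.5 * (lo + hi)
        if phi(mid) < t:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def binormal_roc(t, a, b):
    """Sensitivity at false-positive rate t under the binormal model."""
    return phi(a + b * phi_inv(t))

def binormal_auc(a, b):
    """Area under the binormal ROC curve."""
    return phi(a / math.sqrt(1.0 + b * b))
```

With a = 0 the curve is the diagonal (AUC = 0.5); increasing a shifts the whole curve up, which is why estimating (a, b) well matters for summarizing test accuracy.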

9.
Thoresen M, Laake P. 《Biometrics》 2000, 56(3): 868-872
Measurement error models in logistic regression have received considerable theoretical interest over the past 10-15 years. In this paper, we present the results of a simulation study that compares four estimation methods: the so-called regression calibration method, probit maximum likelihood as an approximation to the logistic maximum likelihood, the exact maximum likelihood method based on a logistic model, and the naive estimator, which is the result of simply ignoring the fact that some of the explanatory variables are measured with error. We have compared the behavior of these methods in a simple, additive measurement error model. We show that, in this situation, the regression calibration method is a very good alternative to more mathematically sophisticated methods.
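The idea behind regression calibration is easiest to see in the linear analogue of the additive measurement error model, where the naive slope is attenuated by the reliability ratio λ = var(X)/var(W) and dividing by λ corrects it. A simulated sketch (linear rather than logistic, for transparency; all parameter values are arbitrary and this is not the paper's simulation design):

```python
import random

rng = random.Random(0)
n, beta, sigma_u = 50000, 2.0, 1.0                    # arbitrary choices
xs = [rng.gauss(0.0, 1.0) for _ in range(n)]          # true covariate X
ws = [x + rng.gauss(0.0, sigma_u) for x in xs]        # error-prone W = X + U
ys = [beta * x + rng.gauss(0.0, 1.0) for x in xs]     # outcome depends on X

def ols_slope(u, v):
    """Ordinary least-squares slope of v on u."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    sxy = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    sxx = sum((a - mu) ** 2 for a in u)
    return sxy / sxx

naive = ols_slope(ws, ys)            # attenuated toward zero: about beta/2 here
lam = 1.0 / (1.0 + sigma_u ** 2)     # reliability ratio var(X)/var(W)
calibrated = naive / lam             # regression-calibration correction
```

In the logistic setting studied in the paper the correction is applied by replacing W with an estimate of E[X|W] before fitting, but the attenuation-and-correction logic is the same.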

10.
The multispecies coalescent model provides a natural framework for species tree estimation accounting for gene-tree conflicts. Although a number of species tree methods under the multispecies coalescent have been suggested and evaluated using simulation, their statistical properties remain poorly understood. Here, we use mathematical analysis aided by computer simulation to examine the identifiability, consistency, and efficiency of different species tree methods in the case of three species and three sequences under the molecular clock. We consider four major species-tree methods, including concatenation, two-step, independent-sites maximum likelihood, and maximum likelihood. We develop approximations that predict that the probit transform of the species tree estimation error decreases linearly with the square root of the number of loci. Even in this simplest case, major differences exist among the methods. Full-likelihood methods are considerably more efficient than summary methods such as concatenation and two-step. They also provide estimates of important parameters such as species divergence times and ancestral population sizes, whereas these parameters are not identifiable by summary methods. Our results highlight the need to improve the statistical efficiency of summary methods and the computational efficiency of full likelihood methods of species tree estimation.

11.
Stochastic models of nucleotide substitution are playing an increasingly important role in phylogenetic reconstruction through such methods as maximum likelihood. Here, we examine the behaviour of a simple substitution model, and establish some links between the methods of maximum parsimony and maximum likelihood under this model.

12.
Summary: The efficiency of obtaining the correct tree by the maximum likelihood method (Felsenstein 1981) for inferring trees from DNA sequence data was compared with that of distance methods. It was shown that the maximum likelihood method is more efficient than distance methods, particularly when the evolutionary rate differs among lineages.

13.
Composite likelihood methods have become very popular for the analysis of large-scale genomic data sets because of the computational intractability of the basic coalescent process and its generalizations: It is virtually impossible to calculate the likelihood of an observed data set spanning a large chromosomal region without using approximate or heuristic methods. Composite likelihood methods are approximate methods and, in the present article, assume the likelihood is written as a product of likelihoods, one for each of a number of smaller regions that together make up the whole region from which data is collected. A very general framework for neutral coalescent models is presented and discussed. The framework comprises many of the most popular coalescent models that are currently used for analysis of genetic data. Assume data is collected from a series of consecutive regions of equal size. Then it is shown that the observed data forms a stationary, ergodic process. General conditions are given under which the maximum composite likelihood estimator of the parameters describing the model (e.g. mutation rates, demographic parameters and the recombination rate) is a consistent estimator as the number of regions tends to infinity.

14.
The structure of dependence between neighboring genetic loci is intractable under some models that treat each locus as a single data-point. Composite likelihood-based methods present a simple approach under such models by treating the data as if they are independent. A maximum composite likelihood estimator (MCLE) is not easy to find numerically, as in most cases we do not have a way of knowing if a maximum is global. We study the local maxima of the composite likelihood (ECLE, the efficient composite likelihood estimators), which are straightforward to compute. We establish desirable properties of the ECLE and provide an estimator of the variance of MCLE and ECLE. We also modify two proper likelihood-based tests to be used with composite likelihood. We modify our methods to make them applicable to datasets where some loci are excluded.

15.
1. Observations of different organisms can often be used to infer environmental conditions at a site. These inferences may be useful for diagnosing the causes of degradation in streams and rivers.
2. When used for diagnosis, biological inferences must not only provide accurate, unbiased predictions of environmental conditions; pairs of inferred environmental variables must also covary no more strongly than actual measurements of those same variables.
3. Mathematical analysis of the relationship between the measured and inferred values of different environmental variables provides an approach for comparing the covariance between measurements with the covariance between inferences. Simulated and field-collected data are then used to assess the performance of weighted average and maximum likelihood inference methods.
4. Weighted average inferences became less accurate as covariance in the calibration data increased, whereas maximum likelihood inferences were unaffected by covariance in the calibration data. In contrast, the accuracy of weighted average inferences was unaffected by changes in measurement error, whilst the accuracy of maximum likelihood inferences decreased as measurement error increased. Weighted average inferences artificially increased the covariance of environmental variables beyond what was expected from measurements, whereas maximum likelihood inference methods more accurately reproduced the expected covariances.
5. Multivariate maximum likelihood inference methods can potentially provide more useful diagnostic information than single-variable inference models.
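A weighted average inference model of the kind evaluated above can be written in a few lines: each taxon's optimum is the abundance-weighted mean of the environmental variable over calibration sites, and the inferred value at a new site is the abundance-weighted mean of the optima of the taxa present. A toy sketch (not the paper's implementation; the data values are invented):

```python
def wa_optima(abund, env):
    """Taxon optima: abundance-weighted means of the environmental variable
    across calibration sites. abund[site][taxon], env[site]."""
    ntaxa = len(abund[0])
    optima = []
    for j in range(ntaxa):
        num = sum(abund[i][j] * env[i] for i in range(len(env)))
        den = sum(abund[i][j] for i in range(len(env)))
        optima.append(num / den)
    return optima

def wa_infer(sample, optima):
    """Inferred environmental value at a new site: abundance-weighted
    mean of the taxon optima."""
    return sum(a * o for a, o in zip(sample, optima)) / sum(sample)

env = [4.0, 6.0, 8.0]               # measured variable at three calibration sites
abund = [[10, 0], [5, 5], [0, 10]]  # taxon 0 prefers low values, taxon 1 high
opt = wa_optima(abund, env)         # [70/15, 110/15]
inferred = wa_infer([5, 5], opt)    # 6.0 for an evenly mixed sample
```

Because every inference is a weighted mean of the same fixed optima, inferred values for different variables are pulled toward each other, which is the source of the inflated covariance the study reports for weighted average methods.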

16.
Statistical models are the traditional choice to test scientific theories when observations, processes or boundary conditions are subject to stochasticity. Many important systems in ecology and biology, however, are difficult to capture with statistical models. Stochastic simulation models offer an alternative, but they were hitherto associated with a major disadvantage: their likelihood functions can usually not be calculated explicitly, and thus it is difficult to couple them to well-established statistical theory such as maximum likelihood and Bayesian statistics. A number of new methods, among them Approximate Bayesian Computing and Pattern-Oriented Modelling, bypass this limitation. These methods share three main principles: aggregation of simulated and observed data via summary statistics, likelihood approximation based on the summary statistics, and efficient sampling. We discuss principles as well as advantages and caveats of these methods, and demonstrate their potential for integrating stochastic simulation models into a unified framework for statistical modelling.
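The simplest of these methods, rejection ABC, illustrates all three principles: summarize the data, compare simulated to observed summaries, and keep the parameter draws that come close. A minimal sketch with a Gaussian toy model (the tolerance, prior, and summary statistic are illustrative choices, not recommendations from the article):

```python
import random

def abc_rejection(data, prior_sample, simulate, summary, eps, n_draws, rng):
    """Rejection ABC: draw parameters from the prior, simulate data, and
    keep draws whose summary lies within eps of the observed summary."""
    s_obs = summary(data)
    kept = []
    for _ in range(n_draws):
        theta = prior_sample(rng)
        if abs(summary(simulate(theta, len(data), rng)) - s_obs) < eps:
            kept.append(theta)
    return kept

rng = random.Random(7)
data = [rng.gauss(3.0, 1.0) for _ in range(200)]   # "observed" data, true mean 3
post = abc_rejection(
    data,
    prior_sample=lambda r: r.uniform(0.0, 10.0),   # flat prior on the mean
    simulate=lambda th, n, r: [r.gauss(th, 1.0) for _ in range(n)],
    summary=lambda xs: sum(xs) / len(xs),          # sample mean as summary
    eps=0.1,
    n_draws=5000,
    rng=rng,
)
post_mean = sum(post) / len(post)                  # concentrates near 3
```

The accepted draws approximate the posterior without ever evaluating a likelihood; shrinking eps sharpens the approximation at the cost of a lower acceptance rate.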

17.
To explore the influence of evolutionary models on DNA barcoding classification, this study obtained COI gene sequences from specimens of 44 species of Noctuidae from Mt. Wuling (雾灵山). Phylogenetic trees were constructed using neighbor-joining, maximum parsimony, maximum likelihood, and Bayesian inference, and model success rates were evaluated for 12 models under neighbor-joining, 7 models under maximum likelihood, and 2 models under Bayesian inference. The results showed that the success rates of the 12 neighbor-joining models differed little and were relatively stable, whereas the success rates of the different maximum-likelihood and Bayesian models differed markedly and were unstable; maximum parsimony is not model-based, and its success rate was relatively stable. Neighbor-joining and maximum likelihood share six models in common, and the success rates of these six models differed between the two methods. In addition, cases where a species was represented by only a single sequence in the molecular data significantly reduced model success rates, indicating that multiple samples per species are needed in DNA barcoding studies.

18.
MOTIVATION: Maximum likelihood (ML) methods have become very popular for constructing phylogenetic trees from sequence data. However, despite noticeable recent progress, with large and difficult datasets (e.g. multiple genes with conflicting signals) current ML programs still require huge computing time and can become trapped in bad local optima of the likelihood function. When this occurs, the resulting trees may still show some of the defects (e.g. long branch attraction) of starting trees obtained using fast distance or parsimony programs. METHODS: Subtree pruning and regrafting (SPR) topological rearrangements are usually sufficient to intensively search the tree space. Here, we propose two new methods to make SPR moves more efficient. The first method uses a fast distance-based approach to detect the least promising candidate SPR moves, which are then simply discarded. The second method locally estimates the change in likelihood for any remaining potential SPRs, as opposed to globally evaluating the entire tree for each possible move. These two methods are implemented in a new algorithm with a sophisticated filtering strategy, which efficiently selects potential SPRs and concentrates most of the likelihood computation on the promising moves. RESULTS: Experiments with real datasets comprising 35-250 taxa show that, while indeed greatly reducing the amount of computation, our approach provides likelihood values at least as good as those of the best-known ML methods so far and is very robust to poor starting trees. Furthermore, combining our new SPR algorithm with local moves such as PHYML's nearest neighbor interchanges, the time needed to find good solutions can sometimes be reduced even more.

19.
A central task in the study of molecular evolution is the reconstruction of a phylogenetic tree from sequences of current-day taxa. The most established approach to tree reconstruction is maximum likelihood (ML) analysis. Unfortunately, searching for the maximum likelihood phylogenetic tree is computationally prohibitive for large data sets. In this paper, we describe a new algorithm that uses Structural Expectation Maximization (EM) for learning maximum likelihood phylogenetic trees. This algorithm is similar to the standard EM method for edge-length estimation, except that during iterations of the Structural EM algorithm the topology is improved as well as the edge lengths. Our algorithm performs iterations of two steps. In the E-step, we use the current tree topology and edge lengths to compute expected sufficient statistics, which summarize the data. In the M-step, we search for a topology that maximizes the likelihood with respect to these expected sufficient statistics. We show that searching for better topologies inside the M-step can be done efficiently, as opposed to standard methods for topology search. We prove that each iteration of this procedure increases the likelihood of the topology, and thus the procedure must converge. This convergence point, however, can be a suboptimal one. To escape from such "local optima," we further enhance our basic EM procedure by incorporating moves in the flavor of simulated annealing. We evaluate these new algorithms on both synthetic and real sequence data and show that for protein sequences even our basic algorithm finds more plausible trees than existing methods for searching maximum likelihood phylogenies. Furthermore, our algorithms are dramatically faster than such methods, enabling, for the first time, phylogenetic analysis of large protein data sets in the maximum likelihood framework.

20.
Xue Liugen, Zhu Lixing. 《Biometrika》 2007, 94(4): 921-937
A semiparametric regression model for longitudinal data is considered. The empirical likelihood method is used to estimate the regression coefficients and the baseline function, and to construct confidence regions and intervals. It is proved that the maximum empirical likelihood estimator of the regression coefficients achieves asymptotic efficiency and the estimator of the baseline function attains asymptotic normality when a bias correction is made. Two calibrated empirical likelihood approaches to inference for the baseline function are developed. We propose a groupwise empirical likelihood procedure to handle the inter-series dependence for the longitudinal semiparametric regression model, and employ bias correction to construct the empirical likelihood ratio functions for the parameters of interest. This leads us to prove a nonparametric version of Wilks' theorem. Compared with methods based on normal approximations, the empirical likelihood does not require consistent estimators for the asymptotic variance and bias. A simulation compares the empirical likelihood and normal-based methods in terms of coverage accuracies and average areas/lengths of confidence regions/intervals.
