Similar articles
Found 20 similar articles (search time: 3 ms)
1.
2.
Ancestral maximum likelihood (AML) is a method that simultaneously reconstructs a phylogenetic tree and ancestral sequences from extant data (sequences at the leaves). The tree and ancestral sequences maximize the probability of observing the given data under a Markov model of sequence evolution, in which branch lengths are also optimized but constrained to take the same value on any edge across all sequence sites. AML differs from the more usual form of maximum likelihood (ML) in phylogenetics because ML averages over all possible ancestral sequences. ML has long been known to be statistically consistent; that is, it converges on the correct tree with probability approaching 1 as the sequence length grows. However, the statistical consistency of AML has not been formally determined, despite informal remarks in a literature that dates back 20 years. In this short note we prove a general result that implies that AML is statistically inconsistent. In particular, we show that AML can 'shrink' short edges in a tree, resulting in a tree that has no internal resolution as the sequence length grows. Our results apply to any number of taxa.

3.
Tuffley and Steel (Bull. Math. Biol. 59:581–607, 1997) proved that maximum likelihood and maximum parsimony methods in phylogenetics are equivalent for sequences of characters under a simple symmetric model of substitution with no common mechanism. This result has been widely cited ever since. We show that small changes to the model assumptions suffice to make the two methods inequivalent. In particular, we analyze the case of bounded substitution probabilities as well as the molecular clock assumption. We show that in these cases, even under no common mechanism, maximum parsimony and maximum likelihood might make conflicting choices. We also show that if there is an upper bound on the substitution probabilities which is ‘sufficiently small’, every maximum likelihood tree is also a maximum parsimony tree (but not vice versa).

4.
Biases present in maximum likelihood and parsimony are investigated through a simulation study in a 10-taxon case in which several long branches coexist with short branches in the modeled topology. The performance of these methods is explored while increasing the length of the long branches with different amounts of data. Simulations with different taxonomic sampling schemes are also examined. The presence of a strong bias in parsimony is corroborated: the well-known long-branch attraction. Likelihood performance is found to be sensitive to the mere presence of extreme branch length disparity, retrieving topologies compatible with long-branch attraction and long-branch repulsion, irrespective of the correctness of the model used.

5.
In this paper, we investigate a conjecture by Arndt von Haeseler concerning the Maximum Parsimony method for phylogenetic estimation, which was published by the Newton Institute in Cambridge on a list of open phylogenetic problems in 2007. This conjecture deals with the question of whether Maximum Parsimony trees are hereditary. The conjecture suggests that a Maximum Parsimony tree for a particular (DNA) alignment necessarily has subtrees of all possible sizes which are most parsimonious for the corresponding subalignments. We answer the conjecture affirmatively for binary alignments on 5 taxa but also show how to construct examples for which Maximum Parsimony trees are not hereditary. Apart from showing that a most parsimonious tree cannot generally be reduced to a most parsimonious tree on fewer taxa, we also show that compatible most parsimonious quartets do not have to provide a most parsimonious supertree. Last, we show that our results can be generalized to Maximum Likelihood for certain nucleotide substitution models.

6.

Background

Phylogenetic networks are generalizations of phylogenetic trees that are used to model evolutionary events in various contexts. Several different methods and criteria have been introduced for reconstructing phylogenetic trees. Maximum Parsimony is a character-based approach that infers a phylogenetic tree by minimizing the total number of evolutionary steps required to explain a given set of data assigned on the leaves. Exact solutions for optimizing parsimony scores on phylogenetic trees have been introduced in the past.

Results

In this paper, we define the parsimony score on networks as the sum of the substitution costs along all the edges of the network, and show that certain well-known algorithms that calculate the optimum parsimony score on trees, such as the Sankoff and Fitch algorithms, extend naturally to networks, barring conflicting assignments at the reticulate vertices. We provide heuristics for finding the optimum parsimony scores on networks. Our algorithms can be applied to any cost matrix that may contain unequal substitution costs of transforming between different characters along different edges of the network. We analyzed this for experimental data on 10 leaves or fewer with at most 2 reticulations and found that for almost all networks, the bounds returned by the heuristics matched the exhaustively determined optimum parsimony scores.
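
For orientation, the tree case that the abstract builds on can be sketched as follows. This is a minimal sketch of the classical Fitch bottom-up pass for a single character with unit substitution costs on a rooted binary tree; it is not the network extension or the heuristics of the paper. The nested-tuple tree encoding and the names `Tree` and `fitch_score` are assumptions made for illustration.

```python
from typing import Dict, Set, Tuple, Union

# Hypothetical encoding: a leaf is a taxon name, an internal node is a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_score(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Return (candidate state set at this node, minimum number of changes below it)."""
    if isinstance(tree, str):
        # Leaf: the observed state is the only candidate, and no change is needed.
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_score(left, states)
    right_set, right_cost = fitch_score(right, states)
    common = left_set & right_set
    if common:
        # The children agree on at least one state: no extra change at this node.
        return common, left_cost + right_cost
    # The children disagree: take the union and count one substitution.
    return left_set | right_set, left_cost + right_cost + 1


# Illustrative data (not from the paper): one character observed on four taxa.
example_tree: Tree = (("A", "B"), ("C", "D"))
example_states = {"A": "G", "B": "G", "C": "T", "D": "G"}
root_set, score = fitch_score(example_tree, example_states)
print(root_set, score)  # {'G'} 1
```

For unequal substitution costs, as in the cost matrices mentioned above, the Sankoff dynamic program would replace the set operations with a per-state cost vector at each node.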

Conclusion

The parsimony score we define here does not directly reflect the cost of the best tree in the network that displays the evolution of the character. However, when searching for the most parsimonious network that describes a collection of characters, it becomes necessary to add cost considerations that prefer simpler structures, such as trees over networks. The parsimony score on a network that we describe here takes into account the substitution costs along the additional edges incident on each reticulate vertex, in addition to the substitution costs along the other edges which are common to all the branching patterns introduced by the reticulate vertices. Thus the score contains a built-in cost for the number of reticulate vertices in the network, and would provide a criterion that is comparable among all networks. Although the problem of finding the parsimony score on the network is believed to be computationally hard to solve, heuristics such as the ones described here would be beneficial in our efforts to find a most parsimonious network.

7.
8.
9.

Background  

Maximum parsimony is one of the most commonly used criteria for reconstructing phylogenetic trees. Recently, Nakhleh and co-workers extended this criterion to enable reconstruction of phylogenetic networks, and demonstrated its application to detecting reticulate evolutionary relationships. However, one of the major problems with this extension has been that it favors more complex evolutionary relationships over simpler ones, thus having the potential for overestimating the amount of reticulation in the data. An ad hoc solution to this problem that has been used entails inspecting the improvement in the parsimony length as more reticulation events are added to the model, and stopping when the improvement is below a certain threshold.

10.
Maximum Likelihood Estimation of Population Parameters (cited 10 times: 5 self-citations, 5 citations by others)
Y. X. Fu, W. H. Li. Genetics, 1993, 134(4): 1261-1270
One of the most important parameters in population genetics is θ = 4Nₑμ, where Nₑ is the effective population size and μ is the rate of mutation per gene per generation. We study two related problems, using the maximum likelihood method and the theory of coalescence. One problem is the potential improvement of accuracy in estimating the parameter θ over existing methods and the other is the estimation of parameter λ, which is the ratio of two θ's. The minimum variances of estimates of the parameter θ are derived under two idealized situations. These minimum variances serve as the lower bounds of the variances of all possible estimates of θ in practice. We then show that Watterson's estimate of θ based on the number of segregating sites is asymptotically an optimal estimate of θ. However, for a finite sample of sequences, substantial improvement over Watterson's estimate is possible when θ is large. The maximum likelihood estimate of λ = θ₁/θ₂ is obtained and the properties of the estimate are discussed.
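
As a point of reference for the baseline discussed above, Watterson's estimate is usually written as the number of segregating sites S divided by the harmonic sum a_n = 1 + 1/2 + ... + 1/(n − 1) for a sample of n sequences. The sketch below assumes that standard form; the function name `watterson_theta` and the example numbers are illustrative and are not taken from the paper, and the maximum likelihood estimators studied there are not reproduced.

```python
def watterson_theta(segregating_sites: int, sample_size: int) -> float:
    """Watterson's estimate of theta: S / a_n, with a_n = sum_{i=1}^{n-1} 1/i."""
    if sample_size < 2:
        raise ValueError("at least two sequences are needed")
    if segregating_sites < 0:
        raise ValueError("the number of segregating sites cannot be negative")
    a_n = sum(1.0 / i for i in range(1, sample_size))
    return segregating_sites / a_n


# Illustrative numbers: 25 segregating sites in a sample of 5 sequences.
# a_5 = 1 + 1/2 + 1/3 + 1/4 = 25/12, so the estimate is approximately 12.
print(watterson_theta(25, 5))
```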

11.
Neural networks have received much attention in recent years, mostly from non-statisticians. The purpose of this paper is to incorporate neural networks in a non-linear regression model and obtain maximum likelihood estimates of the network parameters using a standard Newton-Raphson algorithm. We use maximum likelihood estimators instead of the usual back-propagation technique and compare the neural network predictions with predictions of quadratic regression models and with non-parametric nearest neighbor predictions. These comparisons are made using data generated from a variety of functions. Because of the number of parameters involved, neural network models can easily over-fit the data; hence, validation of results is crucial.

12.
A condition for practical independence of contact distribution functions in Boolean models is obtained. This result allows the authors to use maximum likelihood methods, via sparse sampling, for estimating unknown parameters of an isotropic Boolean model. The second part of this paper is devoted to a simulation study of the proposed method. AMS classification: 60D05

13.
A maximum likelihood estimator is obtained for the mortality rate function of a specific type appearing in survival data analysis. Strict consistency of this estimator is proved.

14.
The bootstrap is an important tool for estimating the confidence interval of monophyletic groups within phylogenies. Although bootstrap analyses are used in most evolutionary studies, there is no clear consensus as to how best to interpret bootstrap probability values. To study the bootstrap method further, nine small subunit ribosomal DNA (SSU rDNA) data sets were submitted to bootstrapped maximum parsimony (MP) analyses using unweighted and weighted sequence positions. Analyses of the lengths (i.e., parsimony steps) of the bootstrap trees show that the shape and mean of the bootstrap tree distribution may provide important insights into the evolutionary signal within the sequence data. With complex phylogenies containing nodes defined by short internal branches (multifurcations), the mean of the bootstrap tree distribution may differ by 2 standard deviations from the length of the best tree found from the original data set. Weighting sequence positions significantly increases the bootstrap values at internal nodes. There may, however, be strong bootstrap support for conflicting species groupings among different data sets. This phenomenon appears to result from a correlation between the topology of the tree used to create the weights and the topology of the bootstrap consensus tree inferred from the MP analysis of these weighted data. The analyses also show that characteristics of the bootstrap tree distribution (e.g., skewness) may be used to choose between alternative weighting schemes for phylogenetic analyses.
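
The resampling step behind such analyses can be sketched as below, assuming the usual nonparametric phylogenetic bootstrap in which alignment columns are drawn with replacement (unweighted positions only). The dictionary-based alignment and the name `bootstrap_alignment` are assumptions for illustration; tree search and the weighting schemes of the study are not shown.

```python
import random
from typing import Dict


def bootstrap_alignment(alignment: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Build one bootstrap replicate by sampling alignment columns with replacement."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must be aligned to the same length")
    n_sites = lengths.pop()
    # Draw n_sites column indices uniformly at random, with replacement.
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    # Apply the same sampled columns to every taxon so that sites stay aligned.
    return {taxon: "".join(seq[i] for i in columns) for taxon, seq in alignment.items()}


# Illustrative toy alignment (not from the study).
toy_alignment = {"taxon1": "ACGTAC", "taxon2": "ACGTTC", "taxon3": "AGGTTC"}
replicate = bootstrap_alignment(toy_alignment, random.Random(0))
print(replicate)
```

Each replicate would then be analysed with the same tree-building method as the original data, and the lengths of the resulting trees give the bootstrap tree distribution described above.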

15.
This paper deals with the problem of making inferences on the maximum radius and the intensity of the Poisson point process associated with a Boolean model of circular primary grains with uniformly distributed random radii. The only sample information used is the observed radii of circular clumps (DUPAC, 1980). The behaviour of maximum likelihood estimation has been evaluated by means of Monte Carlo methods.

16.
The maximum parsimony (MP) method for inferring phylogenies is widely used, but little is known about its limitations in non-asymptotic situations. This study employs large-scale computations with simulated phylogenetic data to estimate the probability that MP succeeds in finding the true phylogeny for up to twelve taxa and 256 characters. The set of candidate phylogenies is taken to be unrooted binary trees; for each simulated data set, the tree lengths of all (2n − 5)!! candidates are computed to evaluate quantities related to the performance of MP, such as the probability of finding the true phylogeny, the probability that the tree with the shortest length is unique, the probability that the true phylogeny has the shortest tree length, and the expected inverse of the number of trees sharing the shortest length. The tree length distributions are also used to evaluate and extend the skewness test of Hillis for distinguishing between random and phylogenetic data. The results indicate, for example, that the critical point after which MP achieves a success probability of at least 0.9 is around 128 characters. The skewness test is found to perform well on simulated data and the study extends its scope to up to twelve taxa.
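
To give a sense of the size of the exhaustive search, the count (2n − 5)!! of unrooted binary trees on n labelled taxa can be computed directly. This is a small sketch; the function name `count_unrooted_binary_trees` is an assumption for illustration.

```python
def count_unrooted_binary_trees(n_taxa: int) -> int:
    """Number of unrooted binary trees on n labelled taxa: (2n - 5)!! for n >= 3."""
    if n_taxa < 3:
        raise ValueError("the formula applies to three or more taxa")
    count = 1
    # Double factorial: the product of the odd numbers 3, 5, ..., 2n - 5.
    for k in range(3, 2 * n_taxa - 5 + 1, 2):
        count *= k
    return count


for n in (4, 5, 12):
    print(n, count_unrooted_binary_trees(n))
# 4 3
# 5 15
# 12 654729075
```

At twelve taxa, every simulated data set therefore requires scoring roughly 6.5 × 10^8 candidate trees.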

17.
The purpose of this paper is to present a procedure for obtaining approximate maximum likelihood estimates for compound binary response models. The extra-binomial variation is incorporated into the model by adding random effects to the fixed effects on the probit (or logit) scale. Numerical integration techniques are used to arrive at a solution of the likelihood equations. The paper also presents an illustrative numerical example based on a large toxicological data set. The computations are carried out within the GLIM statistical package.

18.
The problem of assessing the relative calibrations and relative accuracies of a set of p instruments, each designed to measure the same characteristic on a common group of individuals, is considered by using the EM algorithm. As shown, the EM algorithm provides a general solution for this problem. Its implementation is simple and in its most general form requires no extra iterative procedures within the M step. One important feature of the algorithm in this setup is that the error variance estimates are always positive. Thus, it can be seen as a kind of restricted maximization procedure. The expected information matrix for the maximum likelihood estimators is derived, from which the large-sample estimated covariance matrix for the maximum likelihood estimators can be computed. The problem of testing hypotheses about the calibration lines can be approached by using Wald statistics. The approach is illustrated by re-analysing two data sets in the literature.

19.
Statistical techniques are presented for the analysis of geographic variation in allelic frequencies. Likelihood ratio test criteria are derived from a multinomial sampling distribution, and are used to answer three questions. (1) Are there geographic differences in allelic frequencies? (2) Are population differences in allelic frequencies associated with environmental differences? (3) Is there any residual "lack of fit" variation among populations, after accounting for that variation associated with environmental differences? The two- and three-allele cases are explicitly treated, and the extension to more alleles is indicated.
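
Question (1) can be illustrated with the textbook likelihood ratio (G) statistic for homogeneity of multinomial proportions across populations, G = 2 Σ O ln(O/E), with (populations − 1)(alleles − 1) degrees of freedom. This is a sketch under that assumption and may differ in detail from the criteria derived in the paper; the name `g_statistic` and the example counts are illustrative.

```python
import math
from typing import List, Tuple


def g_statistic(counts: List[List[int]]) -> Tuple[float, int]:
    """Likelihood ratio statistic for equal allele frequencies across populations.

    counts[i][j] is the number of copies of allele j sampled from population i.
    Returns (G, degrees of freedom); G is compared with a chi-square distribution.
    """
    if len({len(row) for row in counts}) != 1:
        raise ValueError("every population needs a count for every allele")
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    total = sum(row_totals)
    g = 0.0
    for i, row in enumerate(counts):
        for j, observed in enumerate(row):
            if observed > 0:  # a zero cell contributes nothing to the sum
                expected = row_totals[i] * col_totals[j] / total
                g += 2.0 * observed * math.log(observed / expected)
    degrees_of_freedom = (len(counts) - 1) * (len(counts[0]) - 1)
    return g, degrees_of_freedom


# Illustrative counts: two alleles sampled in three populations.
print(g_statistic([[30, 10], [22, 18], [10, 30]]))
```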

20.
The accuracy of phylogenetic methods is reinvestigated for the four-taxon case with a two-edge rate and a three-edge rate. Unlike previous studies involving computer simulations, the two-edge rate relates to branches that are sister taxa in the model tree. As with previous studies, certain methods are found to behave inaccurately in a portion of the parameter space where the two-edge rate is proportionally large. This phenomenon, to which parsimony is immune, is termed “long-branch repulsion” and the region of poor performance is called the Farris Zone. Maximum likelihood methods are shown to be particularly prone to failure when closely related taxa have long branches. Long-branch repulsion is demonstrated with an empirical case involving Strepsiptera and Diptera.
