Similar Articles
20 similar articles found
1.
We explore the maximum parsimony (MP) and ancestral maximum likelihood (AML) criteria in phylogenetic tree reconstruction. Both problems are NP-hard, so we seek approximate solutions. We formulate the two problems as Steiner tree problems under appropriate distances. The gist of our approach is the succinct characterization of Steiner trees for a small number of leaves for the two distances. This enables the use of known Steiner tree approximation algorithms. The approach leads to a 16/9 approximation ratio for AML and asymptotically to a 1.55 approximation ratio for MP.
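The MP criterion scores a tree by the minimum number of character-state changes it requires. For a fixed rooted binary tree, that score (the "small parsimony" problem) can be computed site by site with Fitch's algorithm. The sketch below assumes trees written as nested tuples of leaf names and equal-length sequences; the function names are hypothetical, and this is only the scoring step, not the Steiner-tree approximation described above.

```python
def fitch_site_score(tree, leaf_states):
    """Fitch's algorithm for one site.

    tree: nested 2-tuples of leaf names, e.g. (("A", "B"), ("C", "D")).
    leaf_states: dict mapping each leaf name to its character state.
    Returns the minimum number of state changes on the tree.
    """
    def visit(node):
        if isinstance(node, str):            # leaf: its state set is fixed
            return {leaf_states[node]}, 0
        left, right = node
        s1, c1 = visit(left)
        s2, c2 = visit(right)
        common = s1 & s2
        if common:                           # children agree: no extra change
            return common, c1 + c2
        return s1 | s2, c1 + c2 + 1          # disagreement costs one change


    return visit(tree)[1]


def parsimony_length(tree, sequences):
    """Tree length = sum of Fitch scores over all alignment columns."""
    n_sites = len(next(iter(sequences.values())))
    return sum(
        fitch_site_score(tree, {name: seq[i] for name, seq in sequences.items()})
        for i in range(n_sites)
    )
```

For example, with sequences A="AC", B="AC", C="GT", D="GT", the tree ((A,B),(C,D)) has length 2, while ((A,C),(B,D)) has length 4, so MP prefers the former. The score does not depend on where the tree is rooted.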

2.
Biases in maximum likelihood and parsimony are investigated through a simulation study of a 10-taxon case in which several long branches coexist with short branches in the modeled topology. The performance of these methods is explored as the long branches are lengthened, using different amounts of data. Simulations with different taxonomic sampling schemes are also examined. The presence of a strong bias in parsimony, the well-known long-branch attraction, is corroborated. Likelihood performance is found to be sensitive to the mere presence of extreme branch-length disparity, retrieving topologies compatible with both long-branch attraction and long-branch repulsion, irrespective of whether the model used is correct.

3.
4.
We have reconstructed the evolution of the anciently derived kinesin superfamily using various alignment and tree-building methods. In addition to classifying previously described kinesins from protists, fungi, and animals, we analyzed a variety of kinesin sequences from the plant kingdom, including 12 from Zea mays and 29 from Arabidopsis thaliana. Also included in our data set were four sequences from the anciently diverged amitochondriate protist Giardia lamblia. The overall topology of the best tree we found is more likely than previously reported topologies and allows us to make the following new observations: (1) kinesins involved in chromosome movement, including MCAK, chromokinesin, and CENP-E, may be descended from a single ancestor; (2) kinesins that form complex oligomers are limited to a monophyletic group of families; (3) kinesins that crosslink antiparallel microtubules at the spindle midzone, including BIMC, MKLP, and CENP-E, are closely related; (4) Drosophila NOD and human KID group with other characterized chromokinesins; and (5) Saccharomyces SMY1 groups with kinesin-I sequences, forming a family of kinesins capable of class V myosin interactions. In addition, we found that one monophyletic clade composed exclusively of sequences with a C-terminal motor domain contains all known minus end-directed kinesins. Received: 20 February 2001 / Accepted: 5 June 2001

5.
Ancestral maximum likelihood (AML) is a method that simultaneously reconstructs a phylogenetic tree and ancestral sequences from extant data (sequences at the leaves). The tree and ancestral sequences maximize the probability of observing the given data under a Markov model of sequence evolution, in which branch lengths are also optimized but constrained to take the same value on any edge across all sequence sites. AML differs from the more usual form of maximum likelihood (ML) in phylogenetics because ML averages over all possible ancestral sequences. ML has long been known to be statistically consistent; that is, it converges on the correct tree with probability approaching 1 as the sequence length grows. However, the statistical consistency of AML has not been formally determined, despite informal remarks in literature dating back 20 years. In this short note we prove a general result that implies that AML is statistically inconsistent. In particular, we show that AML can "shrink" short edges in a tree, resulting in a tree that has no internal resolution as the sequence length grows. Our results apply to any number of taxa.

6.
This paper describes mathematical and computational methodology for estimating the parameters of the Burr Type XII distribution by the method of maximum likelihood. Expressions for the asymptotic variances and covariances of the parameter estimates are given, and the modality of the log-likelihood and conditional log-likelihood functions is analyzed. As a result of this analysis for various a priori known and unknown parameter combinations, conditions are given which guarantee that the parameter estimates obtained will, indeed, be maximum likelihood estimates. An efficient numerical method for maximizing the conditional log-likelihood function is described, and mathematical expressions are given for the various numerical approximations needed to evaluate the expressions given for the asymptotic variances and covariances of the parameter estimates. The methodology discussed is applied in a numerical example to life test data arising in a clinical setting.
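As a rough illustration of conditional (profile) maximization, assume the two-parameter Burr XII density f(x) = c k x^(c-1) (1 + x^c)^(-(k+1)) for x > 0. Setting the derivative of the log-likelihood in k to zero gives k̂(c) = n / Σ log(1 + x_i^c), so only a one-dimensional search over c remains. The sketch below uses golden-section search, which assumes the profile is unimodal on the chosen bracket (exactly the kind of property the paper analyzes). The function names and bracket are hypothetical choices, not the paper's numerical method.

```python
import math


def burr_loglik(xs, c, k):
    """Log-likelihood of Burr XII with shape parameters c, k > 0."""
    n = len(xs)
    return (n * math.log(c) + n * math.log(k)
            + (c - 1) * sum(math.log(x) for x in xs)
            - (k + 1) * sum(math.log1p(x ** c) for x in xs))


def conditional_k(xs, c):
    """Closed-form MLE of k for fixed c: n / sum(log(1 + x^c))."""
    return len(xs) / sum(math.log1p(x ** c) for x in xs)


def golden_max(f, lo, hi, tol=1e-6):
    """Golden-section search for the maximizer of a unimodal f on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    x1, x2 = b - inv_phi * (b - a), a + inv_phi * (b - a)
    f1, f2 = f(x1), f(x2)
    while b - a > tol:
        if f1 > f2:                      # maximum lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - inv_phi * (b - a)
            f1 = f(x1)
        else:                            # maximum lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + inv_phi * (b - a)
            f2 = f(x2)
    return (a + b) / 2


def fit_burr12(xs, lo=0.1, hi=10.0):
    """Maximize the profile log-likelihood in c, then plug in k_hat(c)."""
    c_hat = golden_max(lambda c: burr_loglik(xs, c, conditional_k(xs, c)), lo, hi)
    return c_hat, conditional_k(xs, c_hat)
```

Simulated data for a quick check can be drawn by inverting the CDF F(x) = 1 - (1 + x^c)^(-k), i.e. x = ((1 - u)^(-1/k) - 1)^(1/c) for uniform u.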

7.
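Interest in estimating deleterious genomic mutation (DGM) parameters from mutation accumulation (MA) experiments has recently grown sharply. Two common methods for estimating DGM in MA experiments, maximum likelihood (ML) and the method of moments (MM), are compared using computer simulations and software applied to real data. The conclusions are: with ML it is difficult to obtain maximum likelihood estimates (MLEs), so ML is less efficient than MM; even when MLEs can be obtained, they are biased because of severe small-sample errors (i.e., bias and sampling variance); and the likelihood surface is rather flat, making it hard to distinguish leptokurtic from platykurtic distributions.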

8.
Estimation of variance components by Monte Carlo (MC) expectation maximization (EM) restricted maximum likelihood (REML) is computationally efficient for large data sets and complex linear mixed effects models. However, efficiency may be lost due to the need for a large number of iterations of the EM algorithm. To decrease the computing time, we explored the use of faster-converging Newton-type algorithms within MC REML implementations. The implemented algorithms were: MC Newton-Raphson (NR), where the information matrix was generated via sampling; MC average information (AI), where the information was computed as an average of observed and expected information; and MC Broyden's method, where the zero of the gradient was searched for using a quasi-Newton-type algorithm. Performance of these algorithms was evaluated using simulated data. The final estimates were in good agreement with the corresponding analytical ones. MC NR REML and MC AI REML enhanced convergence compared to MC EM REML and gave standard errors for the estimates as a by-product. MC NR REML required a larger number of MC samples, while each MC AI REML iteration required the mixed model equations to be solved an additional number of times equal to the number of parameters to be estimated. MC Broyden's method required the largest number of MC samples with our small data set and did not give standard errors for the parameters directly. We studied the performance of three different convergence criteria for the MC AI REML algorithm. Our results indicate the importance of defining a suitable convergence criterion and critical value in order to obtain an efficient Newton-type method utilizing an MC algorithm. Overall, the use of an MC algorithm with Newton-type methods proved feasible, and the results encourage testing of these methods in different kinds of large-scale problem settings.

9.
The accuracy of phylogenetic methods is reinvestigated for the four-taxon case with a two-edge rate and a three-edge rate. Unlike previous studies involving computer simulations, the two-edge rate relates to branches that are sister taxa in the model tree. As with previous studies, certain methods are found to behave inaccurately in a portion of the parameter space where the two-edge rate is proportionally large. This phenomenon, to which parsimony is immune, is termed “long-branch repulsion” and the region of poor performance is called the Farris Zone. Maximum likelihood methods are shown to be particularly prone to failure when closely related taxa have long branches. Long-branch repulsion is demonstrated with an empirical case involving Strepsiptera and Diptera.

10.
A condition for practical independence of contact distribution functions in Boolean models is obtained. This result allows the authors to use maximum likelihood methods, via sparse sampling, for estimating unknown parameters of an isotropic Boolean model. The second part of this paper is devoted to a simulation study of the proposed method. AMS classification: 60D05

11.
Statistical techniques are presented for the analysis of geographic variation in allelic frequencies. Likelihood ratio test criteria are derived from a multinomial sampling distribution and are used to answer three questions: (1) Are there geographic differences in allelic frequencies? (2) Are population differences in allelic frequencies associated with environmental differences? (3) Is there any residual "lack of fit" variation among populations after accounting for the variation associated with environmental differences? The two- and three-allele cases are explicitly treated, and the extension to more alleles is indicated.
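Question (1) can be illustrated with the standard likelihood ratio (G) test of homogeneity on a populations-by-alleles table of counts: under multinomial sampling, G = 2 Σ O log(O / E), where E is the expected count if all populations share the same allele frequencies, with (populations - 1)(alleles - 1) degrees of freedom. This is a minimal sketch of that textbook statistic, not the paper's derivation; the function name is hypothetical, and it does not address the environmental-covariate questions (2) and (3).

```python
import math


def g_test_homogeneity(counts):
    """Likelihood ratio (G) statistic for equal allele frequencies across populations.

    counts[i][j] = number of copies of allele j sampled in population i.
    Returns (G, df); compare G with a chi-square quantile on df degrees of freedom.
    """
    row_tot = [sum(row) for row in counts]
    col_tot = [sum(col) for col in zip(*counts)]
    n = sum(row_tot)
    g = 0.0
    for i, row in enumerate(counts):
        for j, obs in enumerate(row):
            if obs > 0:                  # cells with zero count contribute 0
                expected = row_tot[i] * col_tot[j] / n
                g += obs * math.log(obs / expected)
    df = (len(counts) - 1) * (len(col_tot) - 1)
    return 2.0 * g, df
```

For two populations with allele counts (30, 70) and (50, 50), G ≈ 8.40 on 1 degree of freedom, which exceeds the 5% critical value of 3.84.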

12.
In this paper we consider a cell population, such as bacteria, consisting of two types of cells, mutant and nonmutant. Assuming mutation and a homogeneous pure birth process, this paper derives a maximum likelihood procedure for estimating the mutation rate and the birth rate. The method is applied to Newcombe's data, and Monte Carlo studies are also performed. The numerical results indicate that the method is quite efficient for estimating genetic parameters in cell populations.
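The birth-rate part of such a procedure has a simple closed form if mutation is ignored. For a homogeneous pure birth (Yule) process started from n0 cells, the count at time t is negative binomial: P(N(t) = n) = C(n - 1, n - n0) p^n0 (1 - p)^(n - n0) with p = exp(-λt). Maximizing in p gives p̂ = n0 / n, hence λ̂ = log(n / n0) / t, and pooling independent cultures observed at the same t sums the counts. This sketch covers only that simplified case; it is not the paper's joint mutation-rate and birth-rate estimator, and the function name is hypothetical.

```python
import math


def yule_birth_rate_mle(initial_counts, final_counts, t):
    """MLE of the per-cell birth rate for independent pure-birth cultures.

    Assumes no mutation, a common observation time t, and
    final count >= initial count for each culture.
    """
    n0 = sum(initial_counts)
    n = sum(final_counts)
    if t <= 0 or n0 <= 0 or n < n0:
        raise ValueError("need t > 0, positive initial size, and growth n >= n0")
    # p_hat = n0 / n maximizes n0*log(p) + (n - n0)*log(1 - p); lambda = -log(p) / t
    return math.log(n / n0) / t
```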

13.
Hidden Markov models have been successfully applied in various fields of time series analysis, especially to ion channel recordings. The maximum likelihood estimator (MLE) has recently been proven to be asymptotically normally distributed. Here, we investigate finite-sample properties of the MLE and of different types of likelihood ratio tests (LRTs) by means of simulation studies. The MLE is shown to reach its asymptotic behavior within sample sizes that are common in various applications. Thus, reliable estimates and confidence intervals can be obtained. We give an approximate scaling function for the estimation error in finite samples and investigate the power of different LRTs suitable for applications to ion channels, including tests for superimposed hidden Markov processes. Our results are applied to physiological sodium channel data.
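The MLE of a hidden Markov model maximizes a log-likelihood that is usually evaluated with the scaled forward algorithm. The sketch below assumes Gaussian emissions with a state-specific mean current and a common noise standard deviation, a simple stand-in for ion channel recordings; the parameterization and names are hypothetical, and an optimizer would still be needed on top of this function to obtain the MLE.

```python
import math


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def hmm_loglik(obs, init, trans, means, sd):
    """Scaled forward algorithm for an HMM with Gaussian emissions.

    init[s]     : initial state probabilities
    trans[r][s] : P(state s at time t+1 | state r at time t)
    means[s]    : mean observation (e.g. current level) in state s
    sd          : common emission standard deviation
    """
    k = len(init)
    alpha = [init[s] * normal_pdf(obs[0], means[s], sd) for s in range(k)]
    scale = sum(alpha)
    loglik = math.log(scale)
    alpha = [a / scale for a in alpha]        # rescale to avoid underflow
    for x in obs[1:]:
        alpha = [sum(alpha[r] * trans[r][s] for r in range(k))
                 * normal_pdf(x, means[s], sd) for s in range(k)]
        scale = sum(alpha)
        loglik += math.log(scale)             # log P(x_t | x_1..x_{t-1})
        alpha = [a / scale for a in alpha]
    return loglik
```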

14.
The CRF04_cpx strains of HIV-1 account for approximately 2–10% of the infected population in Greece, across different transmission risk groups. CRF04_cpx was the lineage documented in an HIV-1 transmission network in Thessalonica, northern Greece. Most of the transmissions occurred through unprotected heterosexual contact between 1989 and 1993. Blood samples were available for six patients, obtained 6–10 years later, except for one patient sampled in 1991. Our objective was to examine whether the transmission history is compatible with the evolutionary tree of the virus in partial gag, partial env, and partial gag+env. The phylogenetic tree inferred with maximum likelihood and Bayesian methods in partial gag+env was much closer to the transmission tree than that obtained using either env or gag separately. Our findings suggest that the epidemiological relationships among patients who have been infected by a common source correspond almost exactly to the evolutionary trees of the virus, provided that enough phylogenetic signal is present in the alignment. Moreover, we found evidence that recombination is not the most parsimonious explanation for the phylogenetic incongruence between gag and env. For patients with known infection dates, the dates of the coalescent events estimated in gag+env with molecular clock calculations based on a newly developed Bayesian method agreed with the actual infection dates. This article contains online supplementary material. Reviewing Editor: Dr. Lauren Ancel-Meyers. Sequences isolated from patients belonging to the CRF04_cpx transmission network correspond to partially characterized gag, env, and gag+env genomic regions.

15.
The maximum parsimony (MP) method for inferring phylogenies is widely used, but little is known about its limitations in non-asymptotic situations. This study employs large-scale computations with simulated phylogenetic data to estimate the probability that MP succeeds in finding the true phylogeny for up to twelve taxa and 256 characters. The set of candidate phylogenies is taken to be the unrooted binary trees; for each simulated data set, the tree lengths of all (2n − 5)!! candidates are computed to evaluate quantities related to the performance of MP, such as the probability of finding the true phylogeny, the probability that the tree with the shortest length is unique, the probability that the true phylogeny has the shortest tree length, and the expected inverse of the number of trees sharing the shortest length. The tree length distributions are also used to evaluate and extend the skewness test of Hillis for distinguishing between random and phylogenetic data. The results indicate, for example, that the critical point after which MP achieves a success probability of at least 0.9 is roughly 128 characters. The skewness test is found to perform well on simulated data, and the study extends its scope to up to twelve taxa.
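The count (2n − 5)!! = 1 · 3 · 5 ⋯ (2n − 5) of unrooted binary trees on n labeled taxa shows why exhaustive evaluation stops around twelve taxa: each added taxon multiplies the count by the next odd number. A small sketch (the function name is hypothetical):

```python
def num_unrooted_binary_trees(n):
    """Number of unrooted binary trees on n >= 3 labeled leaves: (2n - 5)!!."""
    if n < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    for k in range(3, 2 * n - 4, 2):   # multiply 3 * 5 * ... * (2n - 5)
        count *= k
    return count
```

For n = 12 this gives 654,729,075 candidate trees per simulated data set, and a thirteenth taxon would multiply that by 21.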

16.
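The nonlinear reproductive dispersion random effects model is a generalization and extension of exponential-family nonlinear random effects models and nonlinear reproductive dispersion models. By treating the random effects in the model as hypothetical missing data and applying the Metropolis-Hastings (MH) algorithm, a Monte Carlo EM (MCEM) algorithm for maximum likelihood estimation of the model parameters is proposed, and its feasibility is illustrated by a simulation study and a real-data example.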
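In an MCEM algorithm of this kind, the E-step is approximated by drawing the random effects from their conditional distribution given the data and current parameters, often with MH when that distribution is known only up to a constant. The sketch below is a generic random-walk Metropolis-Hastings sampler for a scalar random effect; it assumes an unnormalized log target supplied by the caller, and the names, step size, and burn-in are hypothetical choices rather than the authors' settings.

```python
import math
import random


def rw_metropolis(log_target, x0, n_samples, step=1.0, burn_in=1000, seed=None):
    """Random-walk Metropolis-Hastings draws from an unnormalized log density.

    log_target(x): log of the target density up to an additive constant,
    e.g. log p(y | b, theta) + log p(b | theta) for a random effect b.
    """
    rng = random.Random(seed)
    x, lp = x0, log_target(x0)
    samples = []
    for i in range(burn_in + n_samples):
        proposal = x + rng.gauss(0.0, step)          # symmetric proposal
        lp_prop = log_target(proposal)
        # accept with probability min(1, target(proposal) / target(x))
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            x, lp = proposal, lp_prop
        if i >= burn_in:
            samples.append(x)
    return samples
```

An MCEM iteration would then average the complete-data log-likelihood over such draws and maximize that average with respect to the model parameters (the M-step).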

17.
Exact and heuristic algorithms for the Indel Maximum Likelihood Problem.
Given a multiple alignment of orthologous DNA sequences and a phylogenetic tree for these sequences, we investigate the problem of reconstructing the most likely scenario of insertions and deletions capable of explaining the gaps observed in the alignment. This problem, which we call the Indel Maximum Likelihood Problem (IMLP), is an important step toward the reconstruction of ancestral genomic sequences, and is important for studying evolutionary processes, genome function, adaptation, and convergence. We solve the IMLP using a new type of tree hidden Markov model whose states correspond to single-base evolutionary scenarios and whose transitions model dependencies between neighboring columns. The standard Viterbi and forward-backward algorithms are optimized to produce the most likely ancestral reconstruction and to compute the level of confidence associated with specific regions of the reconstruction. A heuristic is presented to make the method practical for large data sets while retaining an extremely high degree of accuracy. The methods are illustrated on a 1-Mb alignment of the CFTR region from 12 mammals.
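For readers unfamiliar with the decoding step, the standard Viterbi algorithm finds the single most probable hidden-state path by dynamic programming in log space. The sketch below is the ordinary chain HMM version with discrete emissions; it is not the paper's tree HMM over single-base evolutionary scenarios, and the state and symbol names in the usage example are hypothetical.

```python
import math


def viterbi(obs, states, log_init, log_trans, log_emit):
    """Most probable state path for a discrete-emission HMM.

    log_init[s], log_trans[r][s], log_emit[s][o] are log-probabilities.
    Returns (path, log-probability of that path jointly with obs).
    """
    best = [{s: log_init[s] + log_emit[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda r: best[t - 1][r] + log_trans[r][s])
            best[t][s] = best[t - 1][prev] + log_trans[prev][s] + log_emit[s][obs[t]]
            back[t][s] = prev                 # remember the best predecessor
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):      # trace back from the end
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]
```

For example, with two states "H" and "L" that stay put with probability 0.9 and emit "a" or "b" respectively with probability 0.9, the observations "aabb" decode to the path H, H, L, L.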

18.
19.
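Nonlinear reproductive dispersion random effects models include nonlinear random effects models and exponential-family nonlinear random effects models, among others. By treating the random effects in the model as hypothetical missing data and applying the Metropolis-Hastings (MH) algorithm, a stochastic approximation algorithm for maximum likelihood estimation of the model parameters is proposed. A simulation study and a real-data analysis demonstrate the feasibility of the algorithm.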

20.
We implement an isolation with migration model for three species, with migration occurring between two closely related species while an out-group species is used to provide further information concerning gene trees and model parameters. The model is implemented in the likelihood framework for analyzing multilocus genomic sequence alignments, with one sequence sampled from each of the three species. The prior distribution of gene tree topology and branch lengths at every locus is calculated using a Markov chain characterization of the genealogical process of coalescence and migration, which integrates over the histories of migration events analytically. The likelihood function is calculated by integrating numerically over branch lengths in the gene trees (coalescent times). We analyze the model to study the gene tree-species tree mismatch probability and the time to the most recent common ancestor at a locus. The model is used to construct a likelihood ratio test (LRT) of speciation with gene flow. We conducted computer simulations to evaluate the LRT and found that the test is generally conservative, with the false positive rate well below the significance level. For the test to have substantial power, hundreds of loci are needed. Application of the test to a human-chimpanzee-gorilla genomic data set suggests gene flow around the time of speciation of the human and the chimpanzee.
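A test of "no gene flow" fixes a migration rate at the boundary value 0, where the usual χ² approximation for 2Δℓ does not apply directly. A common textbook treatment for a single boundary parameter uses a 50:50 mixture of a point mass at 0 and χ²₁ as the null distribution; the abstract does not say which null distribution the authors used, so the sketch below is only an illustration of that convention, with hypothetical function and argument names. It uses P(χ²₁ > x) = erfc(√(x/2)).

```python
import math


def boundary_lrt_pvalue(loglik_null, loglik_alt):
    """LRT for one parameter tested at a boundary (e.g. migration rate = 0).

    Null distribution assumed: 0.5 * (point mass at 0) + 0.5 * chi-square(1).
    Returns (test statistic, p-value).
    """
    stat = max(0.0, 2.0 * (loglik_alt - loglik_null))
    if stat == 0.0:
        return stat, 1.0
    p_chi2_1 = math.erfc(math.sqrt(stat / 2.0))   # P(chi2_1 > stat)
    return stat, 0.5 * p_chi2_1
```

Under this mixture, the 5% critical value is about 2.71 rather than the 3.84 used for an ordinary one-degree-of-freedom χ² test.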


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP Registration No. 09084417