Similar Articles
20 similar articles found (search time: 15 ms)
1.
Inching toward reality: An improved likelihood model of sequence evolution   Total citations: 3, self-citations: 0, citations by others: 3
Summary: Our previous evolutionary model is generalized to permit approximate treatment of multiple-base insertions and deletions as well as regional heterogeneity of substitution rates. Parameter estimation and alignment procedures that incorporate these generalizations are developed. Simulations are used to assess the accuracy of the parameter estimation procedure, and an example of an inferred alignment is included. Offprint requests to: J.L. Thorne

2.
Evolutionary trees from DNA sequences: A maximum likelihood approach   Total citations: 129, self-citations: 0, citations by others: 129
Summary: The application of maximum likelihood techniques to the estimation of evolutionary trees from nucleic acid sequence data is discussed. A computationally feasible method for finding such maximum likelihood estimates is developed, and a computer program is available. This method has advantages over the traditional parsimony algorithms, which can give misleading results if rates of evolution differ in different lineages. It also allows hypotheses about the constancy of evolutionary rates to be tested by likelihood ratio tests, and gives a rough indication of the error of the estimate of the tree. By acceptance of this article, the publisher and/or recipient acknowledges the U.S. government's right to retain a nonexclusive, royalty-free licence in and to any copyright covering this paper. This report was prepared as an account of work sponsored by the United States Government. Neither the United States nor the United States Department of Energy, nor any of their employees, nor any of their contractors, subcontractors, or their employees, makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness or usefulness of any information, apparatus, product or process disclosed, or represents that its use would not infringe privately-owned rights.
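Felsenstein's pruning algorithm is the standard way to evaluate such a tree likelihood. The sketch below assumes the Jukes-Cantor (JC69) substitution model and a single alignment column; the nested-tuple tree encoding and the function names are hypothetical, and the paper's own model and program may differ.

```python
import math

BASES = "ACGT"

def jc69_prob(i, j, t):
    """JC69 probability that state i becomes state j along a branch of length t
    (t in expected substitutions per site)."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e

def partials(node):
    """Conditional likelihoods L[x] = P(observed bases below node | node in state x).

    A leaf is a one-character string (the observed base); an internal node
    is a tuple of (child, branch_length) pairs.
    """
    if isinstance(node, str):
        return [1.0 if b == node else 0.0 for b in BASES]
    result = [1.0] * 4
    for child, t in node:
        child_l = partials(child)
        for x in range(4):
            # Sum over the unknown state y at the child end of the branch.
            result[x] *= sum(jc69_prob(x, y, t) * child_l[y] for y in range(4))
    return result

def site_likelihood(tree):
    """Likelihood of one alignment column; JC69 root frequencies are 1/4 each."""
    return sum(0.25 * lx for lx in partials(tree))

# Three-taxon example: ((A:0.1, C:0.2):0.05, A:0.3)
tree = (((("A", 0.1), ("C", 0.2)), 0.05), ("A", 0.3))
print(site_likelihood(tree))
```

For a full alignment, the log site likelihoods are summed over columns (assuming sites evolve independently). Under a reversible model such as JC69 the result does not depend on where the tree is rooted.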

3.
Summary: The maximum likelihood (ML) method for constructing phylogenetic trees (both rooted and unrooted) from DNA sequence data was studied. Although there is a theoretical problem in comparing ML values that are conditional on each topology, a heuristic argument can be made to justify the method. Based on this argument, a new algorithm for estimating the ML tree is presented. It is shown that, under the assumption of a constant rate of evolution, the ML method and UPGMA always give the same rooted tree for three operational taxonomic units (OTUs). This also seems to hold approximately for four OTUs. When unrooted trees are considered under the assumption of varying rates of nucleotide substitution, the efficiency of the ML method in obtaining the correct tree is similar to that of the maximum parsimony and distance methods. The ML method was applied to Brown et al.'s data; the tree topology obtained was the same as that found by the maximum parsimony method but differed from those obtained by distance methods.
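UPGMA, which the abstract compares with ML for three OTUs, builds a rooted tree by repeatedly joining the closest pair of clusters and replacing their distances with size-weighted averages. A minimal sketch; the input format (distances keyed by `frozenset` pairs) is an assumption made for illustration.

```python
import itertools

def upgma(labels, dist):
    """UPGMA clustering.

    labels: OTU names.
    dist: dict mapping frozenset({a, b}) to the distance between OTUs a and b.
    Returns (root, heights): root is a nested tuple of labels, and heights maps
    each internal node to its height (half the distance at which it was formed).
    """
    size = {name: 1 for name in labels}  # current clusters and their OTU counts
    dist = dict(dist)  # copy; merged clusters get new entries
    heights = {}
    while len(size) > 1:
        # Join the closest pair of current clusters.
        a, b = min(itertools.combinations(size, 2),
                   key=lambda pair: dist[frozenset(pair)])
        merged = (a, b)
        heights[merged] = dist[frozenset((a, b))] / 2.0
        for c in size:
            if c not in (a, b):
                # Size-weighted average of the distances from a and b to c.
                dist[frozenset((merged, c))] = (
                    size[a] * dist[frozenset((a, c))]
                    + size[b] * dist[frozenset((b, c))]
                ) / (size[a] + size[b])
        size[merged] = size.pop(a) + size.pop(b)
    (root,) = size
    return root, heights

d = {frozenset(("A", "B")): 2.0,
     frozenset(("A", "C")): 6.0,
     frozenset(("B", "C")): 6.0}
root, h = upgma(["A", "B", "C"], d)
```

Here A and B join at height 1 and C joins them at height 3, giving the rooted tree ((A, B), C).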

4.
Summary: The efficiency of the maximum likelihood method (Felsenstein 1981) in recovering the correct tree from DNA sequence data was compared with that of distance methods. The maximum likelihood method was shown to be more efficient than distance methods, particularly when the evolutionary rate differs among lineages.

5.
A "Long Indel" model for evolutionary sequence alignment   Total citations: 7, self-citations: 0, citations by others: 7
We present a new probabilistic model of sequence evolution, allowing indels of arbitrary length, and give sequence alignment algorithms for our model. Previously implemented evolutionary models have allowed (at most) single-residue indels or have introduced artifacts such as the existence of indivisible "fragments." We compare our algorithm to these previous methods by applying it to the structural homology dataset HOMSTRAD, evaluating the accuracy of (1) alignments and (2) evolutionary time estimates. With our method, it is possible (for the first time) to integrate probabilistic sequence alignment, with reliability indicators and arbitrary gap penalties, in the same framework as phylogenetic reconstruction. Our alignment algorithm requires that we evaluate the likelihood of any specific path of mutation events in a continuous-time Markov model, with the event times integrated out. To this end, we introduce a "trajectory likelihood" algorithm (Appendix A). We anticipate that this algorithm will be useful in more general contexts, such as Markov Chain Monte Carlo simulations.

6.
The calculation of maximum likelihood pedigrees for related organisms using genotypic data is considered. The problem is formulated so that the domain of optimization is a permutation space. This is a feature shared by the travelling salesman problem, for which simulated annealing is known to be effective. Using this technique it is found that pedigrees can be reconstructed with minimal error using genotypic data of a quality currently realizable. In complex pedigrees accurate reconstruction can be done with no a priori age or sex information. For smaller numbers of individuals a method of efficiently enumerating all admissible pedigrees of nonzero likelihood is given.
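Simulated annealing over a permutation space, as described above, can be sketched generically. The pedigree likelihood is not given in the abstract, so `cost` here is a hypothetical stand-in for a negative log-likelihood; the swap-move neighbourhood and the geometric cooling schedule are assumptions, not the authors' settings.

```python
import math
import random

def anneal_permutation(cost, n, steps=20000, t0=2.0, t_end=1e-3, seed=0):
    """Minimize cost(perm) over permutations of range(n) by simulated annealing.

    Neighbours are generated by swapping two positions; the temperature
    decreases geometrically from t0 to t_end over the run.
    """
    rng = random.Random(seed)
    perm = list(range(n))
    rng.shuffle(perm)
    current = cost(perm)
    best, best_cost = perm[:], current
    alpha = (t_end / t0) ** (1.0 / steps)  # per-step cooling factor
    temp = t0
    for _ in range(steps):
        i, j = rng.sample(range(n), 2)
        perm[i], perm[j] = perm[j], perm[i]
        new = cost(perm)
        delta = new - current
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current = new
            if current < best_cost:
                best, best_cost = perm[:], current
        else:
            perm[i], perm[j] = perm[j], perm[i]  # undo the rejected swap
        temp *= alpha
    return best, best_cost
```

For example, `anneal_permutation(lambda p: sum(abs(v - i) for i, v in enumerate(p)), 6)` searches for the permutation with zero total displacement (the identity). In a pedigree setting, `cost` would instead score the pedigree encoded by the permutation.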

7.

Background

The study of discrete characters is crucial for understanding evolutionary processes. Although great advances have been made in the analysis of nucleotide sequences, computer programs for non-DNA discrete characters are often dedicated to specific analyses and lack flexibility. Discrete characters often have different transition rate matrices and variable rates among sites, and sometimes contain unobservable states. Accurately estimating rates for a variety of discrete characters therefore calls for programs with sophisticated methodologies and flexible settings.

Results

DiscML performs maximum likelihood estimation of evolutionary rates of discrete characters on a provided phylogeny, with options to correct for unobservable data, for rate variation, and for unknown prior root probabilities, using the empirical data. Users can customize the instantaneous transition rate matrices or choose predetermined matrices from models such as birth-and-death (BD), birth-death-and-innovation (BDI), equal rates (ER), symmetric (SYM), general time-reversible (GTR) and all rates different (ARD). We also present example applications of DiscML to gene family data and to intron presence/absence data.

Conclusion

DiscML was developed as a unified R program for estimating evolutionary rates of discrete characters with no restriction on the number of character states, and with the flexibility to use different transition models. DiscML is ideal for the analysis of binary (1s/0s) patterns, multi-gene families, and multistate discrete morphological characters.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-320) contains supplementary material, which is available to authorized users.
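The ER, SYM and ARD options named above differ only in how the off-diagonal entries of the rate matrix Q are tied together; transition probabilities along a branch of length t then follow from P(t) = exp(Qt). DiscML itself is an R program; this Python sketch does not reproduce its interface, and the function names and the parameter ordering are hypothetical.

```python
import math

def rate_matrix(k, model, rates):
    """k-state instantaneous rate matrix Q (rows sum to zero).

    model: "ER"  - one rate shared by all off-diagonal cells (rates[0]);
           "SYM" - q_ij == q_ji, one rate per unordered pair;
           "ARD" - every off-diagonal cell has its own rate.
    Free rates are consumed in row-major order of the off-diagonal cells.
    """
    if model not in ("ER", "SYM", "ARD"):
        raise ValueError(f"unknown model {model!r}")
    q = [[0.0] * k for _ in range(k)]
    free = iter(rates)
    pair_rate = {}
    for i in range(k):
        for j in range(k):
            if i == j:
                continue
            if model == "ER":
                q[i][j] = rates[0]
            elif model == "SYM":
                key = (min(i, j), max(i, j))
                if key not in pair_rate:
                    pair_rate[key] = next(free)
                q[i][j] = pair_rate[key]
            else:  # ARD
                q[i][j] = next(free)
        q[i][i] = -sum(q[i][j] for j in range(k) if j != i)
    return q

def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]

def expm(a, terms=20):
    """Matrix exponential by scaling and squaring plus a truncated Taylor series."""
    n = len(a)
    norm = max(sum(abs(x) for x in row) for row in a)
    s = max(0, math.ceil(math.log2(norm)) + 1) if norm > 0 else 0
    scaled = [[x / 2 ** s for x in row] for row in a]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for m in range(1, terms):
        term = [[x / m for x in row] for row in mat_mul(term, scaled)]
        result = [[r + t for r, t in zip(rrow, trow)]
                  for rrow, trow in zip(result, term)]
    for _ in range(s):
        result = mat_mul(result, result)  # undo the scaling: exp(A) = exp(A/2^s)^(2^s)
    return result

def transition_probs(q, t):
    """P(t) = exp(Qt); entry [i][j] is P(state j at time t | state i at time 0)."""
    return expm([[x * t for x in row] for row in q])
```

A binary presence/absence character under ARD, for instance, has two free rates (gain and loss), whereas ER forces them to be equal.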

8.
9.
Summary: A bias correction was derived for the maximum likelihood estimator (MLE) of the intraclass correlation. The bias consisted of two parts: a correction from the MLE to the analysis of variance (ANOVA) estimator, and the bias of the ANOVA estimator. The total possible bias was always negative and depended on both the degree of correlation and the design size and balance. The first part of the bias was an exact algebraic expression relating the MLE to the ANOVA estimator, and correcting the MLE by this part yields the ANOVA estimator. The first correction term was also shown to be equivalent to Fisher's reciprocal bias correction on his Z scores. The total possible bias of the MLE was large for small and moderate samples. Relative biases were larger for small parametric values and vice versa. To keep the relative bias below 10% for an intraclass correlation of 0.025, which is not unusual in animal genetic studies, the total number of observations (N) should be at least 500. From a design point of view, for fixed N the minimum bias occurred at n = 2, the minimum possible family size.
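The ANOVA estimator referred to above can be sketched for a balanced one-way design with k families of equal size n (the abstract also covers unbalanced designs, which this sketch does not handle). With mean squares MSB between families and MSW within families, the estimator is (MSB - MSW) / (MSB + (n - 1) MSW).

```python
def anova_icc(groups):
    """ANOVA estimator of the intraclass correlation for a balanced one-way design.

    groups: list of k families, each a list of n observations (same n for all).
    """
    k = len(groups)
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes equal family sizes")
    grand = sum(sum(g) for g in groups) / (k * n)
    means = [sum(g) / n for g in groups]
    # Between-family and within-family mean squares.
    msb = n * sum((m - grand) ** 2 for m in means) / (k - 1)
    msw = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g) / (k * (n - 1))
    return (msb - msw) / (msb + (n - 1) * msw)
```

The estimate ranges from -1/(n - 1) (all variation within families) to 1 (all variation between families); for example, `anova_icc([[1, 2, 3], [4, 5, 6], [7, 8, 9]])` gives 26/29.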

10.
11.
12.
13.
An evolutionary Monte Carlo algorithm for predicting DNA hybridization   Total citations: 1, self-citations: 0, citations by others: 1
Kim JS  Lee JW  Noh YK  Park JY  Lee DY  Yang KA  Chai YG  Kim JC  Zhang BT 《Bio Systems》2008,91(1):69-75
Many DNA-based technologies, such as DNA computing, DNA nanoassembly and DNA biochips, rely on DNA hybridization reactions. Previous hybridization models have focused on macroscopic reactions between two DNA strands at the sequence level. Here, we propose a novel population-based Monte Carlo algorithm that simulates a microscopic model of reacting DNA molecules. The algorithm uses two essential thermodynamic quantities of DNA molecules: the binding energy of bound DNA strands and the entropy of unbound strands. Using this evolutionary Monte Carlo method, we obtain a minimum free energy configuration in the equilibrium state. We applied this method to a logical reasoning problem and compared the simulation results with the experimental results of the wet-lab DNA experiments performed subsequently. Our simulation predicted the experimental results quantitatively.

14.
The phylogeny of the Drosophila hydei subgroup, a member of the D. repleta species group, was inferred from 1,515 base pairs of mitochondrial DNA sequence from the cytochrome oxidase subunits I, II, and III. Four of the seven species in the subgroup were examined; they are placed into two taxonomic complexes: the D. bifurca complex (D. bifurca and D. nigrohydei) and the D. hydei complex (D. hydei and D. eohydei). Both complexes appear to be monophyletic, although the D. bifurca complex is only weakly supported. The evolution of chromosomal change, interspecific crossability, sperm gigantism, and divergence times of the subgroup is discussed in a phylogenetic context. Correspondence to: G. Spicer

15.
Schafer DW 《Biometrics》2001,57(1):53-61
This paper presents an EM algorithm for semiparametric likelihood analysis of linear, generalized linear, and nonlinear regression models with measurement errors in explanatory variables. A structural model is used in which probability distributions are specified for (a) the response and (b) the measurement error. A distribution is also assumed for the true explanatory variable but is left unspecified and is estimated by nonparametric maximum likelihood. For various types of extra information about the measurement error distribution, the proposed algorithm makes use of available routines that would be appropriate for likelihood analysis of (a) and (b) if the true x were available. Simulations suggest that the semiparametric maximum likelihood estimator retains a high degree of efficiency relative to the structural maximum likelihood estimator based on correct distributional assumptions and can outperform maximum likelihood based on an incorrect distributional assumption. The approach is illustrated on three examples with a variety of structures and types of extra information about the measurement error distribution.
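The nonparametric piece of this approach, estimating the unspecified distribution of the true explanatory variable, can be illustrated with the classical EM iteration for a discrete mixing distribution on a fixed grid. This sketch assumes additive normal measurement error with known standard deviation and omits the regression model entirely; it is not Schafer's full algorithm, and the function name is hypothetical.

```python
import math

def npmle_weights(w, sigma, grid, iters=200):
    """EM estimate of a discrete distribution for the true x on a fixed grid.

    Model (assumed): w_i = x_i + u_i with u_i ~ N(0, sigma^2) and sigma known.
    Returns probabilities pi[g] for each grid point.
    """
    m = len(grid)
    pi = [1.0 / m] * m  # start from a uniform distribution over the grid
    # Gaussian kernel values; the normalizing constant cancels in the E-step.
    kern = [[math.exp(-0.5 * ((wi - g) / sigma) ** 2) for g in grid] for wi in w]
    for _ in range(iters):
        new = [0.0] * m
        for row in kern:
            f = sum(p * k for p, k in zip(pi, row))  # marginal density of w_i, up to a constant
            for g in range(m):
                new[g] += pi[g] * row[g] / f  # posterior P(x_i = grid[g] | w_i)
        pi = [v / len(w) for v in new]  # M-step: average the posteriors
    return pi
```

Each iteration cannot decrease the marginal likelihood of the observed w. In the full method, the same posterior weights would also feed the regression fit for the response.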

16.
Summary: A large amount of information is contained within the phylogenetic relationships between species. In addition to their branching patterns, it is also possible to examine other aspects of the biology of the species. Here, the influence that deleterious selection might have is determined. The likelihood of different phylogenies in the presence of selection is explored to determine the properties of such a likelihood surface. Calculating likelihoods for a phylogeny in the presence and absence of selection permits the application of a likelihood ratio test to search for selection. It is shown that even a single selected site can have a strong effect on the likelihood. The method is illustrated with an example from Drosophila melanogaster, and the results suggest that deleterious selection may be acting on transposable elements.
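The likelihood ratio test mentioned above compares the maximized log-likelihoods with and without selection. The sketch assumes the selection model adds exactly one free parameter, so the reference distribution is chi-square with one degree of freedom; the abstract does not say how many parameters are involved, and if the parameter lies on the boundary of its range under the null hypothesis, the chi-square reference is no longer exact.

```python
import math

def lrt_df1(loglik_null, loglik_alt):
    """Likelihood ratio test with one degree of freedom.

    Returns (statistic, p_value), where statistic = 2 * (lnL_alt - lnL_null)
    and the p-value uses the chi-square(1) tail: P(chi2_1 > s) = erfc(sqrt(s / 2)).
    """
    stat = 2.0 * (loglik_alt - loglik_null)
    stat = max(stat, 0.0)  # guard against tiny negative values from optimizer noise
    return stat, math.erfc(math.sqrt(stat / 2.0))
```

For example, log-likelihoods of -100.0 without selection and -98.0 with selection give a statistic of 4.0 and a p-value of about 0.046.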

17.
Choice of a substitution model is a crucial step in the maximum likelihood (ML) method of phylogenetic inference, and investigators tend to prefer complex mathematical models to simple ones. However, when complex models with many parameters are used, the extent of noise in statistical inferences increases, and thus complex models may not produce the true topology with a higher probability than simple ones. This problem was studied using computer simulation. When the number of nucleotides used was relatively large (1000 bp), the HKY+Gamma model showed a smaller d(T) (topological distance between the inferred and the true trees) than the JC and Kimura models. For shorter sequences (300 bp), simpler models and search algorithms, such as the JC model and SA+NNI search, were found to be as efficient as more complicated models and searches in terms of topological distance, although the topologies obtained under the HKY+Gamma model had the highest likelihood values. The performance of the relatively simple SA+NNI search was essentially the same as that of the more extensive SA+TBR search under all models studied. Similar to the conclusions reached by Takahashi and Nei [Mol. Biol. Evol. 17 (2000) 1251], our results indicate that simple models can be as efficient as complex models, and that complex models do not necessarily give more reliable trees than simple models.
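The topological distance d(T) used above is commonly computed as the Robinson-Foulds distance: the number of internal-branch bipartitions present in one tree but not the other (some authors divide this count by two). A sketch for unrooted trees written as nested tuples, an encoding assumed here for illustration:

```python
def leaves(tree):
    """Set of leaf labels in a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(c) for c in tree))

def splits(tree):
    """Nontrivial bipartitions of an unrooted tree written as nested tuples.

    Each split is stored as the side that does not contain a fixed reference
    leaf, so the same bipartition always has the same representation.
    """
    all_leaves = leaves(tree)
    ref = min(all_leaves)
    result = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(c) for c in node))
        side = below if ref not in below else all_leaves - below
        # Skip trivial splits (a single leaf) and the empty split at the root.
        if 1 < len(side) < len(all_leaves) - 1:
            result.add(side)
        return below

    walk(tree)
    return result

def rf_distance(t1, t2):
    """Number of splits found in exactly one of the two trees."""
    return len(splits(t1) ^ splits(t2))
```

For example, ((A, B), (C, D)) and ((A, C), (B, D)) share no internal split, so their distance is 2.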

18.
Pledger S 《Biometrics》2000,56(2):434-442
Agresti (1994, Biometrics 50, 494-500) and Norris and Pollock (1996a, Biometrics 52, 639-649) suggested using methods of finite mixtures to partition the animals in a closed capture-recapture experiment into two or more groups with relatively homogeneous capture probabilities. This enabled them to fit the models Mh, Mbh (Norris and Pollock), and Mth (Agresti) of Otis et al. (1978, Wildlife Monographs 62, 1-135). In this article, finite mixture partitions of animals and/or samples are used to give a unified linear-logistic framework for fitting all eight models of Otis et al. by maximum likelihood. Likelihood ratio tests are available for model comparisons. For many data sets, a simple dichotomy of animals is enough to substantially correct for heterogeneity-induced bias in the estimation of population size, although there is the option of fitting more than two groups if the data warrant it.

19.
MALLET  A. 《Biometrika》1986,73(3):645-656

20.
Summary: Studies are carried out on the uniqueness of the stationary point of the likelihood function for estimating molecular phylogenetic trees, yielding a proof that there exists at most one stationary point, i.e., the maximum point, in the parameter range for the one-parameter model of nucleotide substitution. The proof is simple yet applicable to any tree topology with an arbitrary number of operational taxonomic units (OTUs). It ensures that any valid approximation algorithm can reach the unique maximum point under the conditions mentioned above. An algorithm incorporating Newton's approximation method is then developed and compared with the conventional one by computer simulation. The results show that the new algorithm always requires less CPU time than the conventional one, while both algorithms lead to identical molecular phylogenetic trees, in accordance with the proof. Contribution No. 1780 from the National Institute of Genetics, Mishima 411, Japan
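For the simplest case, two sequences under the one-parameter (Jukes-Cantor) model, the unique maximum has a closed form, which makes it easy to check a Newton iteration against it. This is only a sketch of that special case; the paper's proof concerns trees with any number of OTUs, and its algorithm is not reproduced here.

```python
import math

def jc69_ml_distance_newton(n_sites, n_diff, tol=1e-12, max_iter=50):
    """ML distance between two sequences under JC69, found by Newton's method.

    Log-likelihood (up to a constant):
        l(d) = (n - x) * ln(1 - p(d)) + x * ln p(d),  p(d) = 3/4 * (1 - exp(-4d/3)).
    """
    n, x = n_sites, n_diff
    if x == 0:
        return 0.0  # no differences: the maximum is at the boundary d = 0
    if x / n >= 0.75:
        raise ValueError("JC69 distance is infinite when p >= 3/4")
    d = x / n  # start from the uncorrected p-distance
    for _ in range(max_iter):
        e = math.exp(-4.0 * d / 3.0)
        p = 0.75 * (1.0 - e)
        dp, d2p = e, -4.0 / 3.0 * e  # first and second derivatives of p(d)
        g = x / p - (n - x) / (1.0 - p)  # dl/dp
        h = -x / p ** 2 - (n - x) / (1.0 - p) ** 2  # d2l/dp2
        grad = dp * g  # dl/dd by the chain rule
        hess = d2p * g + dp ** 2 * h  # d2l/dd2
        step = grad / hess
        d -= step
        if abs(step) < tol:
            break
    return d

def jc69_distance_closed_form(n_sites, n_diff):
    """Closed-form JC69 ML distance: d = -3/4 * ln(1 - 4p/3)."""
    p = n_diff / n_sites
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

With 20 differences in 100 sites, both functions give about 0.2326 substitutions per site.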


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417