Similar Documents
Found 20 similar documents (search time: 31 ms)
1.
Several stochastic models of character change, when implemented in a maximum likelihood framework, are known to give a correspondence between the maximum parsimony method and the method of maximum likelihood. One such model has an independently estimated branch-length parameter for each site and each branch of the phylogenetic tree. This model, the no-common-mechanism model, has many parameters; in fact, the number of parameters grows as fast as the alignment is extended. We take a Bayesian approach to the no-common-mechanism model and place independent gamma prior probability distributions on the branch-length parameters. We were able to integrate analytically over the branch lengths, which allowed us to implement an efficient Markov chain Monte Carlo method for exploring the space of phylogenetic trees. We were able to reliably estimate the posterior probabilities of clades for phylogenetic trees of up to 500 sequences. However, the Bayesian approach to the problem, at least as implemented here with an independent prior on the length of each branch, does not tame the behavior of the branch-length parameters. The integrated likelihood appears to be a simple rescaling of the parsimony score for a tree, and the marginal posterior probability distribution of the length of a branch depends on how the maximum parsimony method reconstructs the characters at the interior nodes of the tree. The method we describe is nevertheless of potential importance in the analysis of morphological character data, and also for improving the behavior of Markov chain Monte Carlo methods implemented for models in which sites share a common branch-length parameter.
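Since the integrated likelihood described above reduces to a rescaling of the parsimony score, it may help to recall how that score is computed. Below is a minimal sketch of the Fitch small-parsimony count for a single character on a fixed rooted binary tree; the tree shape and states are invented examples, not data from the paper.

```python
# Minimal sketch of the Fitch (1971) small-parsimony count for one character
# on a fixed rooted binary tree; the topology and states are hypothetical.

def fitch_score(tree, states):
    """Return the minimum number of state changes for one character.
    `tree` is a nested tuple of leaf names; `states` maps leaf -> state."""
    changes = 0

    def post_order(node):
        nonlocal changes
        if isinstance(node, str):              # leaf: singleton state set
            return {states[node]}
        left, right = node
        a, b = post_order(left), post_order(right)
        if a & b:                              # intersection: no extra change
            return a & b
        changes += 1                           # union: one change required
        return a | b

    post_order(tree)
    return changes

# ((A,B),(C,D)) with states A=C=0, B=D=1 needs two changes on this topology
tree = (("A", "B"), ("C", "D"))
states = {"A": "0", "B": "1", "C": "0", "D": "1"}
print(fitch_score(tree, states))  # 2
```

Summing this count over all characters gives the parsimony score of the tree.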

2.
Begun and Aquadro have demonstrated that levels of nucleotide variation correlate with recombination rate among 20 gene regions from across the genome of Drosophila melanogaster. It has been suggested that this correlation results from genetic hitchhiking associated with the fixation of strongly selected mutants. The hitchhiking process can be described as a series of two-step events. The first step consists of a strongly selected substitution wiping out linked variation in a population; this is followed by a recovery period in which polymorphism can build up via neutral mutations and random genetic drift. Genetic hitchhiking has previously been modeled as a steady-state process driven by recurring selected substitutions. We show here that the characteristic parameter of this steady-state model is alpha v, the product of selection intensity (alpha = 2Ns) and the frequency of beneficial mutations v (where N is population size and s is the selective advantage of the favored allele). We also demonstrate that the steady-state model describes the hitchhiking process adequately, unless the recombination rate is very low. To estimate alpha v, we use the data of DNA sequence variation from 17 D. melanogaster loci from regions of intermediate to high recombination rates. We find that alpha v is likely to be > 1.3 × 10^-8. Additional data are needed to estimate this parameter more precisely. The estimation of alpha v is important, as this parameter determines the shape of the frequency distribution of strongly selected substitutions.
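The two-step picture above (a sweep wipes out linked variation, then drift rebuilds it) can be made concrete in a hedged numerical sketch. The compound parameter alpha*v follows directly from its definition; for the recovery phase we use the standard neutral-drift result that heterozygosity returns as H(t) = H_eq * (1 - exp(-t / (2N))). All numbers below are illustrative, not values from the paper.

```python
import math

# Hedged sketch: the compound parameter alpha*v from the abstract, plus the
# standard neutral expectation for heterozygosity recovering after a sweep,
# H(t) = H_eq * (1 - exp(-t / (2N))).  All numbers are made-up examples.

N = 1_000_000        # population size (hypothetical)
s = 0.001            # selective advantage of the favored allele (hypothetical)
v = 1e-11            # per-site frequency of beneficial mutations (hypothetical)

alpha = 2 * N * s            # selection intensity, alpha = 2Ns
print(alpha * v)             # compound parameter alpha*v

H_eq = 0.01                  # equilibrium heterozygosity (hypothetical)
t = 2 * N                    # generations since the sweep
print(H_eq * (1 - math.exp(-t / (2 * N))))  # ~63% recovered after 2N generations
```

With these example values alpha*v comes out near the paper's lower bound of 1.3 × 10^-8, which is only a coincidence of the chosen inputs.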

3.
Interspecific morphological variation in animal genitalia has long attracted the attention of evolutionary biologists because of the role genital form may play in the generation and/or maintenance of species boundaries. Here we examine the origin and evolution of genital variation in rodents of the muroid genus Neotoma. We test the hypothesis that a relatively rare genital form has evolved only once in Neotoma. We use four mitochondrial and four nuclear markers to evaluate this hypothesis by establishing a phylogenetic framework in which to examine genital evolution. We find intron seven of the beta-fibrinogen gene to be a highly informative nuclear marker for the levels of differentiation that characterize Neotoma, with this locus evolving at a rate slower than cytochrome b but faster than 12S. We estimate phylogenetic relationships within Neotoma using both maximum parsimony and maximum likelihood-based Bayesian methods. Our Bayesian and parsimony reconstructions differ in significant ways, but we show that our parsimony analysis may be influenced by long-branch attraction. Furthermore, our estimate of Neotoma phylogeny remains consistent across various data partitioning strategies in the Bayesian analyses. Using ancestral state reconstruction, we find support for the monophyly of taxa that possess the relatively rare genital form. However, we also find support for the independent evolution of the common genital form and discuss possible underlying developmental shifts that may have contributed to our observed patterns of morphological evolution.

4.
The ability to generate large molecular datasets for phylogenetic studies benefits biologists, but such data expansion introduces numerous analytical problems. A typical molecular phylogenetic study implicitly assumes that sequences evolve under stationary, reversible and homogeneous conditions, but this assumption is often violated in real datasets. When an analysis of large molecular datasets results in unexpected relationships, it often reflects violation of phylogenetic assumptions, rather than a correct phylogeny. Molecular evolutionary phenomena such as base compositional heterogeneity and among-site rate variation are known to affect phylogenetic inference, resulting in incorrect phylogenetic relationships. The ability of methods to overcome such bias has not been measured on real and complex datasets. We investigated how base compositional heterogeneity and among-site rate variation affect phylogenetic inference in the context of a mitochondrial genome phylogeny of the insect order Coleoptera. We show statistically that our dataset is affected by base compositional heterogeneity regardless of how the data are partitioned or recoded. Among-site rate variation is shown by comparing topologies generated using models of evolution with and without a rate variation parameter in a Bayesian framework. When compared for their effectiveness in dealing with systematic bias, standard phylogenetic methods tend to perform poorly, and parsimony without any data transformation performs worst. Two methods designed specifically to overcome systematic bias, LogDet and a Bayesian method implementing variable composition vectors, can overcome some level of base compositional heterogeneity, but are still affected by among-site rate variation. A large degree of variation in both noise and phylogenetic signal among all three codon positions is observed. We caution and argue that more data exploration is imperative, especially when many genes are included in an analysis.
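One common way to "show statistically" that a dataset suffers from base compositional heterogeneity, as the abstract describes, is a chi-square homogeneity test of base counts across taxa. The sketch below uses invented counts and scipy, and is offered only as an illustration of the kind of test involved, not the paper's actual procedure.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hedged sketch: a chi-square homogeneity test of base composition across
# taxa, one common screen for compositional heterogeneity.  The counts are
# invented; rows = taxa, columns = counts of A, C, G, T.
counts = np.array([
    [250, 250, 250, 250],   # taxon 1: uniform composition
    [400, 100, 100, 400],   # taxon 2: strongly AT-rich
    [260, 240, 255, 245],   # taxon 3: roughly uniform
])
chi2, p, dof, expected = chi2_contingency(counts)
print(f"chi2 = {chi2:.1f}, dof = {dof}, p = {p:.3g}")
# A small p-value indicates the taxa do not share a single base composition.
```

In practice such tests are repeated for each partition or recoding scheme, which is what the authors report doing.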

5.
We conducted a simulation study of the phylogenetic methods UPGMA, neighbor joining, maximum parsimony, and maximum likelihood for a five-taxon tree under a molecular clock. The parameter space included a small region where maximum parsimony is inconsistent, so we tested inconsistency correction for parsimony and distance correction for neighbor joining. As expected, corrected parsimony was consistent. For these data, maximum likelihood with the clock assumption outperformed each of the other methods tested. The distance-based methods performed marginally better than did maximum parsimony and maximum likelihood without the clock assumption. Data correction was generally detrimental to accuracy, especially for short sequence lengths. We identified another region of the parameter space where, although consistent for a given method, some incorrect trees were each selected with up to twice the frequency of the correct (generating) tree for sequences of bounded length. These incorrect trees are those where the outgroup has been incorrectly placed. In addition to this problem, the placement of the outgroup sequence can have a confounding effect on the ingroup tree, whereby the ingroup is correct when using the ingroup sequences alone, but with the inclusion of the outgroup the ingroup tree becomes incorrect.

6.
To understand patterns and processes of the diversification of life, we require an accurate understanding of taxon interrelationships. Recent studies have suggested that analyses of morphological character data using the Bayesian and maximum likelihood Mk model provide phylogenies of higher accuracy compared to parsimony methods. This has proved controversial, particularly for studies simulating morphology data under Markov models that assume shared branch lengths for characters, as it is claimed this leads to bias favouring the Bayesian or maximum likelihood Mk model over parsimony models, which do not explicitly make this assumption. We avoid these potential issues by employing a simulation protocol in which character states are randomly assigned to tips, but datasets are constrained to an empirically realistic distribution of homoplasy as measured by the consistency index. Datasets were analysed with equal-weights and implied-weights parsimony, and with the maximum likelihood and Bayesian Mk model. We find that method choice is largely irrelevant for consistent (low homoplasy) datasets, as all methods then perform well; the largest discrepancies in accuracy occur with low consistency (high homoplasy) datasets. In such cases, the Bayesian Mk model is significantly more accurate than alternative models, and implied-weights parsimony never significantly outperforms the Bayesian Mk model. When poorly supported branches are collapsed, the Bayesian Mk model recovers trees with higher resolution than other methods. As it is not possible to assess homoplasy independently of a tree estimate, the Bayesian Mk model emerges as the most reliable approach for categorical morphological analyses.
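The consistency index used above to constrain simulated homoplasy has a simple definition: CI = M / S, where M is the minimum conceivable number of steps (one less than the number of states, summed over characters) and S is the number of steps the characters actually take on the tree. A small sketch with hypothetical step counts:

```python
# Minimal sketch of the ensemble consistency index CI = M / S.  M sums the
# minimum conceivable steps per character (number of states - 1); S sums the
# steps observed on the tree.  The counts below are hypothetical, as if read
# off a parsimony reconstruction.

def consistency_index(n_states, observed_steps):
    min_steps = sum(k - 1 for k in n_states)
    return min_steps / sum(observed_steps)

n_states = [2, 2, 3, 2]          # states per character
observed_steps = [1, 2, 4, 3]    # steps each character takes on the tree
print(round(consistency_index(n_states, observed_steps), 2))  # 0.5
```

CI = 1 means no homoplasy at all; the low-consistency datasets where the methods diverge are those with CI well below 1.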

7.
8.
Wong WS, Yang Z, Goldman N, Nielsen R. Genetics. 2004;168(2):1041-1051
The parsimony method of Suzuki and Gojobori (1999) and the maximum likelihood method developed from the work of Nielsen and Yang (1998) are two widely used methods for detecting positive selection in homologous protein coding sequences. Both methods consider an excess of nonsynonymous (replacement) substitutions as evidence for positive selection. Previously published simulation studies comparing the performance of the two methods show contradictory results. Here we conduct a more thorough simulation study to cover and extend the parameter space used in previous studies. We also reanalyzed an HLA data set that was previously proposed to cause problems when analyzed using the maximum likelihood method. Our new simulations and a reanalysis of the HLA data demonstrate that the maximum likelihood method has good power and accuracy in detecting positive selection over a wide range of parameter values. Previous studies reporting poor performance of the method appear to be due to numerical problems in the optimization algorithms and did not reflect the true performance of the method. The parsimony method has a very low rate of false positives but very little power for detecting positive selection or identifying positively selected sites.
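The shared logic of both methods above, treating an excess of nonsynonymous substitutions as evidence of positive selection, can be illustrated with a simple one-sided binomial test. This is only a hedged stand-in for the counting approach, not the Suzuki-Gojobori or Nielsen-Yang implementation, and the counts and neutral proportion are invented.

```python
from scipy.stats import binomtest

# Hedged sketch in the spirit of counting tests for positive selection: given
# the expected neutral fraction of nonsynonymous changes at a site, test
# whether the observed excess is significant.  All numbers are invented.
n_nonsyn, n_syn = 18, 2          # inferred substitutions at one codon site
p_neutral = 0.7                  # hypothetical neutral nonsynonymous fraction
res = binomtest(n_nonsyn, n_nonsyn + n_syn, p_neutral, alternative="greater")
print(f"p = {res.pvalue:.4f}")   # a small p suggests an excess of replacements
```

The power difference the abstract reports comes largely from how each method infers these counts and models rate variation, not from the final significance test itself.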

9.
Mathematical models have long been used for prediction of dynamics in biological systems. Recently, several efforts have been made to render these models patient specific. One way to do so is to employ techniques to estimate parameters that enable model based prediction of observed quantities. Knowledge of variation in parameters within and between groups of subjects has the potential to provide insight into biological function. Often it is not possible to estimate all parameters in a given model, in particular if the model is complex and the data are sparse. However, it may be possible to estimate a subset of model parameters, reducing the complexity of the problem. In this study, we compare three methods that allow identification of parameter subsets that can be estimated given a model and a set of data. These methods are used to estimate patient specific parameters in a model predicting baroreceptor feedback regulation of heart rate during head-up tilt. The three methods include: structured analysis of the correlation matrix, analysis via singular value decomposition followed by QR factorization, and identification of the subspace closest to the one spanned by eigenvectors of the model Hessian. Results showed that all three methods facilitate identification of a parameter subset. The "best" subset was obtained using the structured correlation method, though this method was also the most computationally intensive. Subsets obtained using the other two methods were easier to compute, but analysis revealed that the final subsets contained correlated parameters. In conclusion, to avoid lengthy computations, these three methods may be combined for efficient identification of parameter subsets.
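The second method above, SVD followed by QR factorization, can be sketched in a few lines: rank the columns (parameters) of a model sensitivity matrix by how much independent information the data carry about them. Here a random matrix with one deliberately redundant column stands in for the real sensitivity matrix d(output)/d(parameter); this is an illustration of the technique, not the paper's baroreceptor model.

```python
import numpy as np
from scipy.linalg import qr

# Hedged sketch of subset selection via SVD followed by QR with column
# pivoting.  S stands in for a sensitivity matrix (rows = data points,
# columns = parameters); column 3 is made almost identical to column 0 to
# mimic a pair of unidentifiable, correlated parameters.
rng = np.random.default_rng(0)
S = rng.standard_normal((50, 4))
S[:, 3] = S[:, 0] + 1e-6 * rng.standard_normal(50)

U, sing, Vt = np.linalg.svd(S, full_matrices=False)
k = int(np.sum(sing > 1e-2 * sing[0]))   # numerical rank -> subset size
_, _, piv = qr(Vt[:k], pivoting=True)    # QR with column pivoting
print(sorted(piv[:k]))                   # indices of an identifiable subset
```

As the abstract notes, this is cheap to compute but can still admit residual correlation among the selected parameters, which is why the authors suggest combining methods.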

10.
The statistical framework of maximum likelihood estimation is used to examine character weighting in inferring phylogenies. A simple probabilistic model of evolution is used, in which each character evolves independently among two states, and different lineages evolve independently. When different characters have different known probabilities of change, all sufficiently small, the proper maximum likelihood method of estimating phylogenies is a weighted parsimony method in which the weights are logarithmically related to the rates of change. When rates of change are taken extremely small, the weights become more equal and unweighted parsimony methods are obtained. When it is known that a few characters have very high rates of change and the rest very low rates, but it is not known which characters are the ones having the high rates, the maximum likelihood criterion supports use of compatibility methods. By varying the fraction of characters believed to have high rates of change one obtains a 'threshold method' whose behavior depends on the value of a parameter. By altering this parameter the method changes smoothly from being a parsimony method to being a compatibility method. This provides us with a spectrum of intermediates between these methods. These intermediate methods may be of use in analysing real data.
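The logarithmic weighting result above is easy to illustrate numerically: under the model, a character's parsimony weight is proportional to minus the log of its probability of change, so rare-change characters get large weights. The probabilities below are invented examples.

```python
import math

# Hedged numerical illustration: weighted-parsimony weights proportional to
# -log of each character's probability of change (probabilities invented).
change_probs = [0.01, 0.001, 0.0001]
weights = [-math.log(p) for p in change_probs]
print([round(w, 2) for w in weights])   # [4.61, 6.91, 9.21]
# As all rates shrink toward zero together, the *ratios* of these weights
# approach one, recovering unweighted parsimony, as the abstract states.
```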


12.
Key issues in protein science and computational biology are design and evaluation of algorithms aimed at detection of proteins that belong to a specific family, as defined by structural, evolutionary, or functional criteria. In this context, several validation techniques are often used to compare different parameter settings of the detector, and to subsequently select the setting that yields the smallest error rate estimate. A frequently overlooked problem associated with this approach is that this smallest error rate estimate may have a large optimistic bias. Based on computer simulations, we show that a detector's error rate estimate can be overly optimistic and propose a method to obtain unbiased performance estimates of a detector design procedure. The method is founded on an external 10-fold cross-validation (CV) loop that embeds an internal validation procedure used for parameter selection in detector design. The detectors designed in each of the 10 iterations are evaluated on held-out examples exclusively available in the external CV iterations. Notably, the average of these 10 performance estimates is not associated with a final detector, but rather with the average performance of the design procedure used. We apply the external CV loop to the particular problem of detecting potentially allergenic proteins, using a previously reported design procedure. Unbiased performance estimates of the allergen detector design procedure are presented together with information about which algorithms and parameter settings are most frequently selected.
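The external-CV-around-internal-validation scheme described above is what is often called nested cross-validation. The sketch below expresses it with scikit-learn on synthetic data; the SVM and its grid are hedged stand-ins for the paper's allergen detector and its parameter-selection procedure.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

# Hedged sketch of nested cross-validation.  The inner GridSearchCV plays
# the role of the parameter-selection procedure; the outer 10-fold loop
# estimates the performance of that whole design procedure, not of any one
# fitted detector.  Data and model are synthetic stand-ins.
X, y = make_classification(n_samples=200, random_state=0)
inner = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=5)
outer_scores = cross_val_score(inner, X, y, cv=10)
print(f"estimate of the design procedure: {outer_scores.mean():.2f}")
```

Reporting the best inner score instead of the outer average is exactly the optimistic bias the abstract warns about.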

13.
Elongation factor-1alpha (EF-1alpha) is a highly conserved nuclear coding gene that can be used to investigate recent divergences due to the presence of rapidly evolving introns. However, a universal feature of intron sequences is that even closely related species exhibit insertion and deletion events, which cause variation in the lengths of the sequences. Indels are frequently rich in evolutionary information, but most investigators ignore sites that fall within these variable regions, largely because the analytical tools and theory are not well developed. We examined this problem in the taxonomically problematic parasitoid wasp genus Pauesia (Hymenoptera: Braconidae: Aphidiinae) using congruence as a criterion for assessing a range of methods for aligning such variable-length EF-1alpha intron sequences. These methods included distance- and parsimony-based multiple-alignment programs (CLUSTAL W and MALIGN), direct optimization (POY), and two "by eye" alignment strategies. Furthermore, with one method (CLUSTAL W) we explored in detail the robustness of results to changes in the gap cost parameters. Phenetic-based alignments ("by eye" and CLUSTAL W) appeared, under our criterion, to perform as well as more readily defensible, but computationally more demanding, methods. In general, all of our alignment and tree-building strategies recovered the same basic topological structure, which means that an underlying phylogenetic signal remained regardless of the strategy chosen. However, several relationships between clades were sensitive both to alignment and to tree-building protocol. Further alignments, considering only sequences belonging to the same group, allowed us to infer a range of phylogenetic relationships that were highly robust to tree-building protocol. By comparing these topologies with those obtained by varying the CLUSTAL parameters, we generated the distribution area of congruence and taxonomic compatibility. 
Finally, we present the first robust estimate of the European Pauesia phylogeny by using two EF-1alpha introns and 38 taxa (plus 3 outgroups). This estimate conflicts markedly with the traditional subgeneric classification. We recommend that this classification be abandoned, and we propose a series of monophyletic species groups.

14.
Wu H, Xue H, Kumar A. Biometrics. 2012;68(2):344-352
Differential equations are extensively used for modeling dynamics of physical processes in many scientific fields such as engineering, physics, and biomedical sciences. Parameter estimation of differential equation models is a challenging problem because of high computational cost and high-dimensional parameter space. In this article, we propose a novel class of methods for estimating parameters in ordinary differential equation (ODE) models, which is motivated by HIV dynamics modeling. The new methods exploit the form of numerical discretization algorithms for an ODE solver to formulate estimating equations. First, a penalized-spline approach is employed to estimate the state variables, and the estimated state variables are then plugged into a discretization formula of an ODE solver to obtain the ODE parameter estimates via a regression approach. We consider three discretization methods of different orders: Euler's method, the trapezoidal rule, and the Runge-Kutta method. A higher-order numerical algorithm reduces numerical error in the approximation of the derivative, which produces a more accurate estimate, but its computational cost is higher. To balance the computational cost and estimation accuracy, we demonstrate, via simulation studies, that the trapezoidal discretization-based estimate is the best and is recommended for practical use. The asymptotic properties for the proposed numerical discretization-based estimators are established. Comparisons between the proposed methods and existing methods show a clear benefit of the proposed methods in regards to the trade-off between computational cost and estimation accuracy. We apply the proposed methods to an HIV study to further illustrate the usefulness of the proposed approaches.
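The recommended trapezoidal estimator can be sketched on a toy problem. For the linear ODE dx/dt = -k*x (not the paper's HIV model), the trapezoidal rule gives x[i+1] - x[i] = -k*(dt/2)*(x[i] + x[i+1]), which is linear in k, so k follows from least squares on the (smoothed) observed state. Everything below is an invented illustration of the plug-in-and-regress idea.

```python
import numpy as np

# Hedged sketch of a trapezoidal discretization-based ODE parameter estimate
# for the toy model dx/dt = -k*x.  The noisy exponential stands in for the
# penalized-spline state estimate described in the abstract.
rng = np.random.default_rng(1)
k_true, dt = 0.5, 0.1
t = np.arange(0, 5, dt)
x = np.exp(-k_true * t) + rng.normal(0, 0.001, t.size)

dx = np.diff(x)                          # left-hand side of the trapezoid rule
z = -(dt / 2) * (x[:-1] + x[1:])         # regressor multiplying k
k_hat = (z @ dx) / (z @ z)               # one-dimensional least squares
print(round(k_hat, 2))                   # close to the true value 0.5
```

Euler would use only x[i] in the regressor (cheaper, cruder); Runge-Kutta uses intermediate evaluations (more accurate, costlier), which is the trade-off the simulation studies quantify.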

15.
Haplotype information plays an important role in many genetic analyses. However, the identification of haplotypes based on sequencing methods is both expensive and time consuming. Current sequencing methods are only efficient at determining conflated data of haplotypes, that is, genotypes. This raises the need to develop computational methods to infer haplotypes from genotypes. Haplotype inference by pure parsimony is an NP-hard problem and remains a challenging task in bioinformatics. In this paper, we propose an efficient ant colony optimization (ACO) heuristic method, named ACOHAP, to solve the problem. The main idea is based on the construction of a binary tree structure through which ants can travel and resolve conflated data of all haplotypes from site to site. Experiments with both small and large data sets show that ACOHAP outperforms other state-of-the-art heuristic methods. ACOHAP is as good as the currently best exact method, RPoly, on small data sets. However, it is much better than RPoly on large data sets. These results demonstrate the efficiency of the ACOHAP algorithm for solving the haplotype inference by pure parsimony problem on both small and large data sets.

16.
In this study, we explored how the concept of the process partition may be applied to phylogenetic analysis. Sequence data were gathered from 23 species and subspecies of the swallowtail butterfly genus Papilio, as well as from two outgroup species from the genera Eurytides and Pachliopta. Sequence data consisted of 1,010 bp of the nuclear protein-coding gene elongation factor-1 alpha (EF-1 alpha) as well as the entire sequences (a total of 2,211 bp) of the mitochondrial protein-coding genes cytochrome oxidase I and cytochrome oxidase II (COI and COII). In order to examine the interaction between the nuclear and mitochondrial partitions in a combined analysis, we used a method of visualizing branch support as a function of partition weight ratios. We demonstrated how this method may be used to diagnose error at different levels of a tree in a combined maximum-parsimony analysis. Further, we assessed patterns of evolution within and between subsets of the data by implementing a multipartition maximum-likelihood model to estimate evolutionary parameters for various putative process partitions. COI third positions have an estimated average substitution rate more than 15 times that of EF-1 alpha, while COII third positions have an estimated average substitution rate more than 22 times that of EF-1 alpha. Ultimately, we found that although the mitochondrial and nuclear data were not significantly incongruent, homoplasy in the fast-evolving mitochondrial data confounded the resolution of basal relationships in the combined unweighted parsimony analysis despite the fact that there was relatively strong support for the relationships in the nuclear data. We conclude that there may be shortcomings to the methods of "total evidence" and "conditional combination" because they may fail to detect or accommodate the type of confounding bias we found in our data.

17.
Recent studies based on different types of data (i.e., morphology, molecules) have found strongly conflicting phylogenies for the genera of iguanid lizards but have been unable to explain the basis for this incongruence. We reanalyze published data from morphology and from the mitochondrial ND4, cytochrome b, 12S, and 16S genes to explore the sources of incongruence and resolve these conflicts. Much of the incongruence centers on the genus Cyclura, which is the sister taxon of Iguana, according to parsimony analyses of the morphology and the ribosomal genes, but is the sister taxon of all other Iguanini, according to the protein-coding genes. Maximum likelihood analyses show that there has been an increase in the rate of nucleotide substitution in Cyclura in the two protein-coding genes (ND4 and cytochrome b), although this increase is not as clear when parsimony is used to estimate branch lengths. Parametric simulations suggest that Cyclura may be misplaced by the protein-coding genes as a result of long-branch attraction; even when Cyclura and Iguana are sister taxa in a simulated phylogeny, Cyclura is still placed as the basal member of the Iguanini by parsimony analysis in 55% of the replicates. A similar long-branch attraction problem may also exist in the morphological data with regard to the placement of Sauromalus with the Galápagos iguanas (Amblyrhynchus and Conolophus). The results have many implications for the analysis of diverse data sets, the impact of long branches on parsimony and likelihood methods, and the use of certain protein-coding genes in phylogeny reconstruction.

18.
Memory and the efficient use of information
We consider the problem of how an animal's memory should be designed in order to cope with a stochastic and changing environment. In particular we consider the problem of forming the best estimate of an unknown and possibly changing environmental parameter. Under the simple model we consider, the effect of an observation is to update this estimate using a linear operator. Two models of a changing environment are analysed. For each model we show how estimates change as a function of time elapsed and observations taken. The effect of a regular sequence of observations is also considered, and it is shown that an exponential weighting of past observations is a sufficient statistic on which to base decisions. The weighting factors are different in the two model environments considered, but each is shown to be a function of the rate at which the environment is changing.
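The linear update with exponential weighting of past observations described above amounts to a running estimate that discounts history geometrically. A minimal sketch, with an invented observation stream and discount factor (in the paper the discount would be tied to how fast the environment changes):

```python
# Hedged sketch of a linear memory update: blend the old estimate with each
# new observation.  Unrolling the recursion shows the estimate is an
# exponentially weighted sum of past observations, i.e. the sufficient
# statistic described above.  lam and the observations are invented.

def update(estimate, observation, lam=0.8):
    """One memory update: keep fraction lam of the old estimate."""
    return lam * estimate + (1 - lam) * observation

obs = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]   # environment shifts midway (made up)
est = 0.0
for o in obs:
    est = update(est, o)
print(round(est, 3))   # partway toward the new value, 5.0
```

A smaller lam forgets faster, which is appropriate for a rapidly changing environment; a lam near 1 is appropriate when the parameter is nearly constant.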

19.
The relative efficiencies of the maximum-likelihood (ML), neighbor- joining (NJ), and maximum-parsimony (MP) methods in obtaining the correct topology and in estimating the branch lengths for the case of four DNA sequences were studied by computer simulation, under the assumption either that there is variation in substitution rate among different nucleotide sites or that there is no variation. For the NJ method, several different distance measures (Jukes-Cantor, Kimura two- parameter, and gamma distances) were used, whereas for the ML method three different transition/transversion ratios (R) were used. For the MP method, both the standard unweighted parsimony and the dynamically weighted parsimony methods were used. The results obtained are as follows: (1) When the R value is high, dynamically weighted parsimony is more efficient than unweighted parsimony in obtaining the correct topology. (2) However, both weighted and unweighted parsimony methods are generally less efficient than the NJ and ML methods even in the case where the MP method gives a consistent tree. (3) When all the assumptions of the ML method are satisfied, this method is slightly more efficient than the NJ method. However, when the assumptions are not satisfied, the NJ method with gamma distances is slightly better in obtaining the correct topology than is the ML method. In general, the two methods show more or less the same performance. The NJ method may give a correct topology even when the distance measures used are not unbiased estimators of nucleotide substitutions. (4) Branch length estimates of a tree with the correct topology are affected more easily than topology by violation of the assumptions of the mathematical model used, for both the ML and the NJ methods. Under certain conditions, branch lengths are seriously overestimated or underestimated. The MP method often gives serious underestimates for certain branches. 
(5) Distance measures that generate the correct topology, with high probability, do not necessarily give good estimates of branch lengths. (6) The likelihood-ratio test and the confidence-limit test, in Felsenstein's DNAML, for examining the statistical significance of branch length estimates are quite sensitive to violation of the assumptions and are generally too liberal to be used for actual data. Rzhetsky and Nei's branch length test is less sensitive to violation of the assumptions than is Felsenstein's test. (7) When the extent of sequence divergence is < or = 5% and when > or = 1,000 nucleotides are used, all three methods show essentially the same efficiency in obtaining the correct topology and in estimating branch lengths. (ABSTRACT TRUNCATED AT 400 WORDS)
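Two of the distance corrections compared above have closed forms worth recalling. Jukes-Cantor: d = -(3/4) ln(1 - (4/3) p), where p is the proportion of differing sites; Kimura two-parameter additionally separates transitions (P) from transversions (Q): d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q). The inputs below are illustrative.

```python
import math

# Standard Jukes-Cantor and Kimura two-parameter distance corrections; the
# proportions p, P, Q below are invented example values.

def jukes_cantor(p):
    return -0.75 * math.log(1 - (4 / 3) * p)

def kimura_2p(P, Q):
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)

print(round(jukes_cantor(0.1), 4))      # 0.1073: correction exceeds raw p
print(round(kimura_2p(0.08, 0.02), 4))  # 0.1094
```

The gamma distances mentioned in the abstract further modify these formulas to allow rate variation among sites, which is why they help when the ML model's assumptions are violated.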

20.
The role of a parsimony principle is unclear in most methods which have been claimed to be valid for the reconstruction of evolutionary kinship. There appear to be two reasons for this: first, the role of parsimony is generally uncertain in scientific method; second, the majority of methods proposed transform data and order them, but are not appropriate to the reconstruction of phylogenies. Commitment to a probabilistic model of evolutionary processes seems to be the essential component which may enable us justifiably to estimate phylogenies. An example is provided which emphasizes the importance of knowledge about the nature of the process before undertaking estimation of the pattern of kinship.
