20 similar articles found.
1.
Invariants are functions of the probabilities of state configurations among lineages, with expected values equal to zero under certain phylogenies. For two-state sequences, the existence of certain quadratic invariants requires a symmetric substitution model. For sequences with more than two states, the necessary condition for the existence of certain quadratic invariants in terms of independent events is much stronger than symmetry. For DNA sequences, only three parameters are allowed in the substitution model, which includes Kimura's two-parameter model as a special case.
2.
An analytical method is presented for constructing linear invariants. All linear invariants of a k-species tree can be derived from those of (k-1)-species trees using this method. The new method is simpler than that of Cavender, which relies on numerical computations. Moreover, the new method provides a convenient tool to study the relationships between linear invariants of the same tree or of different trees. All linear invariants of trees of up to five species are derived in this study. For four species, there are 16 independent linear invariants for each of the three possible unrooted trees, 14 of which are shared by two unrooted trees and 12 of these are shared by all three unrooted trees; the last types of linear invariants can be used to construct tests on the assumptions about nucleotide substitutions. The number of linear invariants for a tree is found to increase rapidly with the number of species.
3.
J A Cavender 《Mathematical biosciences》1991,103(1):69-75
It is known that if all the Markov transition matrices that govern the substitution of one nucleotide for another satisfy six linear constraints, then equations can be derived that permit one to infer evolutionary trees from nucleic acid sequences by the method of linear invariants. These sufficient conditions are also necessary. Any relaxation of them results in the loss of all linear invariants. Necessary conditions for any given set of linear invariants can be derived by examining conditions a matrix must satisfy to map a certain set of matrices into itself. To the extent that necessary conditions are incorrect, a method is not reliable. In a world where different parts of molecules evolve at different rates, the two-parameter model of Kimura may not be empirically distinguishable from the more general one treated here.
4.
This paper presents a necessary condition for the existence of a numerical quantity optimised by evolution by natural selection, which also turns out to be a sufficient condition under rather general conditions. As a corollary, a related criterion with a particularly intuitive graphical interpretation in terms of pairwise invadability plots is obtained.
5.
6.
Necessary and sufficient conditions for evolutionary suicide (Total citations: 4; self-citations: 0; citations by others: 4)
Evolutionary suicide is an evolutionary process where a viable population adapts in such a way that it can no longer persist.
It has already been found that a discontinuous transition to extinction is a necessary condition for suicide. Here we present
necessary and sufficient conditions, concerning the bifurcation point, for suicide to occur. Evolutionary suicide has been
found in structured metapopulation models. Here we show that suicide can occur also in unstructured population models. Moreover,
a structured model does not guarantee the possibility of suicide: we show that suicide cannot occur in age-structured population
models of the Gurtin-MacCamy type. The point is that the mutant’s fitness must explicitly depend not only on the environmental
interaction variable, but also on the resident strategy.
7.
In phylogenetic inference, an evolutionary model describes the substitution processes along each edge of a phylogenetic tree. Misspecification of the model has important implications for the analysis of phylogenetic data. Conventionally, however, the selection of a suitable evolutionary model is based on heuristics or relies on the choice of an approximate input tree. We introduce a method for model Selection in Phylogenetics based on linear INvariants (SPIn), which uses recent insights on linear invariants to characterize a model of nucleotide evolution for phylogenetic mixtures on any number of components. Linear invariants are constraints among the joint probabilities of the bases in the operational taxonomic units that hold irrespective of the tree topologies appearing in the mixtures. SPIn therefore requires no input tree and is designed to deal with nonhomogeneous phylogenetic data consisting of multiple sequence alignments showing different patterns of evolution, for example, concatenated genes, exons, and/or introns. Here, we report on the results of the proposed method evaluated on multiple sequence alignments simulated under a variety of single-tree and mixture settings for both continuous- and discrete-time models. In the simulations, SPIn successfully recovers the underlying evolutionary model and is shown to perform better than existing approaches.
8.
The method of invariants is an approach to the problem of reconstructing the phylogenetic tree of a collection of m taxa using nucleotide sequence data. Models for the respective probabilities of the $4^m$ possible vectors of bases at a given site will have unknown parameters that describe the random mechanism by which substitution occurs along the branches of a putative phylogenetic tree. An invariant is a polynomial in these probabilities that, for a given phylogeny, is zero for all choices of the substitution mechanism parameters. If the invariant is typically non-zero for another phylogenetic tree, then estimates of the invariant can be used as evidence to support one phylogeny over another. Previous work of Evans and Speed showed that, for certain commonly used substitution models, the problem of finding a minimal generating set for the ideal of invariants can be reduced to the linear algebra problem of finding a basis for a certain lattice (that is, a free Z-module). They also conjectured that the cardinality of such a generating set can be computed using a simple "degrees of freedom" formula. We verify this conjecture. Along the way, we explain in detail how the observations of Evans and Speed lead to a simple, computationally feasible algorithm for constructing a minimal generating set.
9.
Statistical models of evolution are algebraic varieties in the space of joint probability distributions on the leaf colorations of a phylogenetic tree. The phylogenetic invariants of a model are the polynomials which vanish on the variety. Several widely used models for biological sequences have transition matrices that can be diagonalized by means of the Fourier transform of an Abelian group. Their phylogenetic invariants form a toric ideal in the Fourier coordinates. We determine generators and Gröbner bases for these toric ideals. For the Jukes-Cantor and Kimura models on a binary tree, our Gröbner bases consist of certain explicitly constructed polynomials of degree at most four.
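The idea that invariants become simple binomials in Fourier coordinates can be illustrated on the smallest case. The sketch below is not taken from the paper: it assumes the two-state symmetric model (group Z_2) rather than Jukes-Cantor or Kimura, a four-leaf tree 12|34 with a uniform root, and made-up flip probabilities. The function names (`flip`, `joint_distribution`, `fourier`) are hypothetical. It builds the leaf distribution by brute force, transforms it, and checks that two binomials vanish on the generating tree while a binomial belonging to the tree 13|24 does not.

```python
from itertools import product

def flip(p):
    """2x2 symmetric transition matrix with flip probability p (assumed model)."""
    return [[1 - p, p], [p, 1 - p]]

def joint_distribution(p_leaf, p_internal):
    """Leaf-pattern probabilities on the tree 12|34.

    Leaves 0,1 hang off internal node u, leaves 2,3 off internal node v;
    u and v are joined by the internal edge. The root (u) is uniform.
    """
    M = [flip(p) for p in p_leaf]
    M5 = flip(p_internal)
    dist = {}
    for x in product((0, 1), repeat=4):
        total = 0.0
        for u, v in product((0, 1), repeat=2):
            total += (0.5 * M[0][u][x[0]] * M[1][u][x[1]] * M5[u][v]
                      * M[2][v][x[2]] * M[3][v][x[3]])
        dist[x] = total
    return dist

def fourier(dist, subset):
    """Fourier coordinate q_A = E[(-1)^(sum of leaf states in A)]."""
    return sum(p * (-1) ** sum(x[i] for i in subset) for x, p in dist.items())

# Illustrative flip probabilities (assumptions, not data).
dist = joint_distribution([0.1, 0.2, 0.15, 0.05], 0.1)

def q(*subset):
    return fourier(dist, subset)

# Binomials that vanish on the generating tree 12|34 (q of the empty set is 1).
inv_true_1 = q(0, 1) * q(2, 3) - q(0, 1, 2, 3)
inv_true_2 = q(0, 2) * q(1, 3) - q(0, 3) * q(1, 2)
# A binomial that vanishes on the tree 13|24 instead; it should be non-zero here.
inv_wrong = q(0, 2) * q(1, 3) - q(0, 1, 2, 3)

print(inv_true_1, inv_true_2, inv_wrong)
```

With real data the probabilities would be replaced by observed pattern frequencies, so the invariants would only be approximately zero and a statistical test would be needed to compare topologies.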
10.
Statistical models of evolution are algebraic varieties in the space of joint probability distributions on the leaf colorations of a phylogenetic tree. The phylogenetic invariants of a model are the polynomials which vanish on the variety. Several widely used models for biological sequences have transition matrices that can be diagonalized by means of the Fourier transform of an abelian group. Their phylogenetic invariants form a toric ideal in the Fourier coordinates. We determine generators and Gröbner bases for these toric ideals. For the Jukes-Cantor and Kimura models on a binary tree, our Gröbner bases consist of certain explicitly constructed polynomials of degree at most four.
11.
Sample size for a phylogenetic inference. (Total citations: 1; self-citations: 0; citations by others: 1)
The objective of this work is to describe sample-size calculations for the inference of a nonzero central branch length in an unrooted four-species phylogeny. Attention is restricted to independent binary characters, such as might be obtained from an alignment of the purine-pyrimidine sequences of a nucleic acid molecule. A statistical test based on a multinomial model for character-state configurations is described. The importance of including invariable sites in models for sequence change is demonstrated, and their effect on sample size is quantified. The methods are applied to a four-species alignment of small-subunit rRNA sequences derived from two archaebacteria, a eubacterium and a eukaryote. We conclude that the information in these sequences is not sufficient to resolve the branching order of this tree. Estimates of the number of aligned nucleotide positions required to provide a reasonably powerful test are given.
12.
Vrzheshch PV 《Biochemistry. Biokhimiia》2008,73(10):1114-1120
Steady-state kinetics of compulsory-ordered single-substrate irreversible and reversible enzyme reactions with two, three, and an arbitrary number of intermediates were observed. Necessary and sufficient conditions for application of the quasi-equilibrium assumption and restrictions of this assumption were found in cases of two and three intermediates in the equilibrium segment. For all cases, accuracy of the quasi-equilibrium assumption was evaluated.
13.
A central task in the study of molecular evolution is the reconstruction of a phylogenetic tree from sequences of current-day taxa. The most established approach to tree reconstruction is maximum likelihood (ML) analysis. Unfortunately, searching for the maximum likelihood phylogenetic tree is computationally prohibitive for large data sets. In this paper, we describe a new algorithm that uses Structural Expectation Maximization (EM) for learning maximum likelihood phylogenetic trees. This algorithm is similar to the standard EM method for edge-length estimation, except that during iterations of the Structural EM algorithm the topology is improved as well as the edge length. Our algorithm performs iterations of two steps. In the E-step, we use the current tree topology and edge lengths to compute expected sufficient statistics, which summarize the data. In the M-Step, we search for a topology that maximizes the likelihood with respect to these expected sufficient statistics. We show that searching for better topologies inside the M-step can be done efficiently, as opposed to standard methods for topology search. We prove that each iteration of this procedure increases the likelihood of the topology, and thus the procedure must converge. This convergence point, however, can be a suboptimal one. To escape from such "local optima," we further enhance our basic EM procedure by incorporating moves in the flavor of simulated annealing. We evaluate these new algorithms on both synthetic and real sequence data and show that for protein sequences even our basic algorithm finds more plausible trees than existing methods for searching maximum likelihood phylogenies. Furthermore, our algorithms are dramatically faster than such methods, enabling, for the first time, phylogenetic analysis of large protein data sets in the maximum likelihood framework.
14.
Counting phylogenetic invariants in some simple cases. (Total citations: 1; self-citations: 0; citations by others: 1)
J Felsenstein 《Journal of theoretical biology》1991,152(3):357-376
An informal degrees of freedom argument is used to count the number of phylogenetic invariants in cases where we have three or four species and can assume a Jukes-Cantor model of base substitution with or without a molecular clock. A number of simple cases are treated and in each the number of invariants can be found. Two new classes of invariants are found: non-phylogenetic cubic invariants testing independence of evolutionary events in different lineages, and linear phylogenetic invariants which occur when there is a molecular clock. Most of the linear invariants found by Cavender (1989, Molec. Biol. Evol. 6, 301-316) turn out in the Jukes-Cantor case to be simple tests of symmetry of the substitution model, and not phylogenetic invariants.
15.
16.
Traditionally, phylogenetic analyses over many genes combine data into a contiguous block. Under this concatenated model, all genes are assumed to evolve at the same rate. However, it is clear that genes evolve at very different rates and that accounting for this rate heterogeneity is important if we are to accurately infer phylogenies from heterogeneous multigene data sets. There remain open questions regarding how best to incorporate gene rate parameters into phylogenetic models and which properties of real data correlate with improved fit over the concatenated model. In this study, two methods of accounting for gene rate heterogeneity are compared: the n-parameter method, which allows for each of the n gene partitions to have a gene rate parameter, and the alpha-parameter method, which fits a distribution to the gene rates. Results demonstrate that the n-parameter method is both computationally faster and in general provides a better fit over the concatenated model than the alpha-parameter method. Furthermore, improved model fit over the concatenated model is highly correlated with the presence of a gene with a slow relative rate of evolution.
17.
18.
Dutilh BE, van Noort V, van der Heijden RT, Boekhout T, Snel B, Huynen MA 《Bioinformatics (Oxford, England)》2007,23(7):815-824
MOTIVATION: Phylogenomics integrates the vast amount of phylogenetic information contained in complete genome sequences, and is rapidly becoming the standard for reliably inferring species phylogenies. There are, however, fundamental differences between the ways in which phylogenomic approaches like gene content, superalignment, superdistance and supertree integrate the phylogenetic information from separate orthologous groups. Furthermore, they all depend on the method by which the orthologous groups are initially determined. Here, we systematically compare these four phylogenomic approaches, in parallel with three approaches for large-scale orthology determination: pairwise orthology, cluster orthology and tree-based orthology. RESULTS: Including various phylogenetic methods, we apply a total of 54 fully automated phylogenomic procedures to the fungi, the eukaryotic clade with the largest number of sequenced genomes, for which we retrieved a golden standard phylogeny from the literature. Phylogenomic trees based on gene content show, relative to the other methods, a bias in the tree topology that parallels convergence in lifestyle among the species compared, indicating convergence in gene content. CONCLUSIONS: Complete genomes are no guarantee for good or even consistent phylogenies. However, the large amounts of data in genomes enable us to carefully select the data most suitable for phylogenomic inference. In terms of performance, the superalignment approach, combined with restrictive orthology, is the most successful in recovering a fungal phylogeny that agrees with current taxonomic views, and allows us to obtain a high-resolution phylogeny. We provide solid support for what has grown to be a common practice in phylogenomics during its advance in recent years. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
19.
Ribosomal DNA: molecular evolution and phylogenetic inference. (Total citations: 79; self-citations: 0; citations by others: 79)
Ribosomal DNA (rDNA) sequences have been aligned and compared in a number of living organisms, and this approach has provided a wealth of information about phylogenetic relationships. Studies of rDNA sequences have been used to infer phylogenetic history across a very broad spectrum, from studies among the basal lineages of life to relationships among closely related species and populations. The reasons for the systematic versatility of rDNA include the numerous rates of evolution among different regions of rDNA (both among and within genes), the presence of many copies of most rDNA sequences per genome, and the pattern of concerted evolution that occurs among repeated copies. These features facilitate the analysis of rDNA by direct RNA sequencing, DNA sequencing (either by cloning or amplification), and restriction enzyme methodologies. Constraints imposed by secondary structure of rRNA and concerted evolution need to be considered in phylogenetic analyses, but these constraints do not appear to impede seriously the usefulness of rDNA. An analysis of aligned sequences of the four nuclear and two mitochondrial rRNA genes identified regions of these genes that are likely to be useful to address phylogenetic problems over a wide range of levels of divergence. In general, the small subunit nuclear sequences appear to be best for elucidating Precambrian divergences, the large subunit nuclear sequences for Paleozoic and Mesozoic divergences, and the organellar sequences of both subunits for Cenozoic divergences. Primer sequences were designed for use in amplifying the entire nuclear rDNA array in 15 sections by use of the polymerase chain reaction; these "universal" primers complement previously described primers for the mitochondrial rRNA genes. Pairs of primers can be selected in conjunction with the analysis of divergence of the rRNA genes to address systematic problems throughout the hierarchy of life.
20.
Recently, de-Camino-Beck and Lewis (Bull Math Biol 69:1341–1354, 2007) have presented a method that under certain restricted conditions allows computing the basic reproduction ratio $R_0$ in a simple manner from life cycle graphs, without, however, giving an explicit indication of these conditions. In this paper, we give various sets of sufficient and generically necessary conditions. To this end, we develop a fully algebraic counterpart of their graph-reduction method which we actually found more useful in concrete applications. Both methods, if they work, give a simple algebraic formula that can be interpreted as the sum of contributions of all fertility loops. This formula can be used in e.g. pest control and conservation biology, where it can complement sensitivity and elasticity analyses. The simplest of the necessary and sufficient conditions is that, for irreducible projection matrices, all paths from birth to reproduction have to pass through a common state. This state may be visible in the state representation for the chosen sampling time, but the passing may also occur in between sampling times, like a seed stage in the case of sampling just before flowering. Note that there may be more than one birth state, like when plants in their first year can already have different sizes at the sampling time. Also the common state may occur only later in life. However, in all cases $R_0$ allows a simple interpretation as the expected number of new individuals that in the next generation enter the common state deriving from a single individual in this state. We end by pointing to some alternative algebraically simple quantities with properties similar to those of $R_0$ that may sometimes be used to good effect in cases where no simple formula for $R_0$ exists.
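The "sum of contributions of all fertility loops" can be made concrete with a small worked case. The sketch below is an illustration under stated assumptions, not the authors' algorithm: it uses a hypothetical three-age-class life cycle with invented survival and fertility values, in which every newborn enters age class 1 (so the newborn class is the common state). It also assumes the standard next-generation definition of $R_0$ as the dominant eigenvalue of $F(I-T)^{-1}$, which in this special case reduces to a single matrix entry. The function names are hypothetical.

```python
# Hypothetical age-structured life cycle; all numbers are made up for illustration.
survival = [0.5, 0.8]         # s1: age 1 -> age 2, s2: age 2 -> age 3
fertility = [0.0, 1.5, 2.0]   # newborns (entering age 1) produced per individual

def r0_from_loops(survival, fertility):
    """Sum over fertility loops: birth -> survive to age i -> reproduce."""
    total, survivorship = 0.0, 1.0
    for i, f in enumerate(fertility):
        total += survivorship * f          # contribution of the loop through age i
        if i < len(survival):
            survivorship *= survival[i]    # probability of reaching the next age
    return total

def matmul(A, B):
    """Plain-Python product of two square matrices."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def r0_from_matrices(survival, fertility):
    """Cross-check via F (I - T)^-1, with (I - T)^-1 = I + T + T^2 + ...

    T is nilpotent here (individuals only age), so the series is finite.
    F has a single non-zero row, so the dominant eigenvalue of F (I - T)^-1
    equals its top-left entry.
    """
    n = len(fertility)
    T = [[0.0] * n for _ in range(n)]
    for i, s in enumerate(survival):
        T[i + 1][i] = s
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    N, power = identity, identity
    for _ in range(n - 1):
        power = matmul(power, T)
        N = [[N[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return sum(fertility[j] * N[j][0] for j in range(n))

print(r0_from_loops(survival, fertility))
print(r0_from_matrices(survival, fertility))
```

Here the two loops contribute 0.5 × 1.5 and 0.5 × 0.8 × 2.0, so both functions should agree on a value of about 1.55. For life cycles without a common state, the loop formula is not expected to apply.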