Similar literature
20 similar records found
1.
The necessary and sufficient conditions for the existence of linear invariants under semigroups of probability transition matrices are derived. It is found that a biologically meaningful nucleotide substitution model has linear invariants if and only if it is a submodel of one of the three most general models, which include the so-called balanced and unbalanced transversion models. Each of these three general models is a nucleotide substitution model with six parameters.

2.
It is known that if all the Markov transition matrices that govern the substitution of one nucleotide for another satisfy six linear constraints, then equations can be derived that permit one to infer evolutionary trees from nucleic acid sequences by the method of linear invariants. These sufficient conditions are also necessary. Any relaxation of them results in the loss of all linear invariants. Necessary conditions for any given set of linear invariants can be derived by examining conditions a matrix must satisfy to map a certain set of matrices into itself. To the extent that necessary conditions are incorrect, a method is not reliable. In a world where different parts of molecules evolve at different rates, the two-parameter model of Kimura may not be empirically distinguishable from the more general one treated here.
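
Kimura's two-parameter model, mentioned above, makes the closure idea concrete. The sketch below is a minimal illustration under some assumptions. It uses the base order A, G, C, T and the usual parameterization: a transition rate α and a rate β for each of the two transversions. The function names are ours. The code builds the closed-form transition matrix and checks that composing two branches gives another matrix of the same form, so this set of matrices is mapped into itself under multiplication.

```python
import numpy as np

# Assumed base order: A, G (purines), then C, T (pyrimidines).
PURINES = {0, 1}


def same_class(i: int, j: int) -> bool:
    """True when i -> j is a transition (purine<->purine or pyrimidine<->pyrimidine)."""
    return (i in PURINES) == (j in PURINES)


def k2p_rate_matrix(alpha: float, beta: float) -> np.ndarray:
    """Rate matrix: transitions at rate alpha, each of the two transversions at rate beta."""
    q = np.zeros((4, 4))
    for i in range(4):
        for j in range(4):
            if i != j:
                q[i, j] = alpha if same_class(i, j) else beta
        q[i, i] = -(alpha + 2 * beta)
    return q


def k2p_transition_matrix(alpha: float, beta: float, t: float) -> np.ndarray:
    """Closed-form P(t) = exp(Qt) for the two-parameter model."""
    e_tv = np.exp(-4 * beta * t)
    e_ts = np.exp(-2 * (alpha + beta) * t)
    p = np.zeros((4, 4))
    for i in range(4):
        for j in range(4):
            if i == j:
                p[i, j] = 0.25 + 0.25 * e_tv + 0.5 * e_ts
            elif same_class(i, j):
                p[i, j] = 0.25 + 0.25 * e_tv - 0.5 * e_ts
            else:
                p[i, j] = 0.25 - 0.25 * e_tv
    return p


def expm_symmetric(q: np.ndarray, t: float) -> np.ndarray:
    """exp(Qt) for a symmetric rate matrix Q, via its eigendecomposition."""
    w, v = np.linalg.eigh(q)
    return v @ np.diag(np.exp(w * t)) @ v.T


alpha, beta = 1.0, 0.3
p_a = k2p_transition_matrix(alpha, beta, 0.2)
p_b = k2p_transition_matrix(alpha, beta, 0.5)
# Chapman-Kolmogorov: composing two branches stays inside the model.
assert np.allclose(p_a @ p_b, k2p_transition_matrix(alpha, beta, 0.7))
# The closed form agrees with the matrix exponential of the rate matrix.
assert np.allclose(p_b, expm_symmetric(k2p_rate_matrix(alpha, beta), 0.5))
```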

3.
Statistical models of evolution are algebraic varieties in the space of joint probability distributions on the leaf colorations of a phylogenetic tree. The phylogenetic invariants of a model are the polynomials which vanish on the variety. Several widely used models for biological sequences have transition matrices that can be diagonalized by means of the Fourier transform of an abelian group. Their phylogenetic invariants form a toric ideal in the Fourier coordinates. We determine generators and Gröbner bases for these toric ideals. For the Jukes-Cantor and Kimura models on a binary tree, our Gröbner bases consist of certain explicitly constructed polynomials of degree at most four.
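
The diagonalization described above can be checked numerically for the Kimura three-parameter model. This sketch assumes a labelling of A, G, C, T by the elements 00, 01, 10, 11 of the abelian group Z2 × Z2, so a transition is the difference 01. The branch probabilities are hypothetical. Each transition probability depends only on the group difference of the two states. Conjugating by the group's character table therefore turns the matrix into a diagonal one.

```python
import numpy as np

# Assumed labelling of nucleotides by elements of Z2 x Z2 (as 2-bit integers):
# A = 0b00, G = 0b01, C = 0b10, T = 0b11, so i ^ j == 1 is a transition.
NUCLEOTIDES = "AGCT"


def group_based_matrix(f: np.ndarray) -> np.ndarray:
    """P[i, j] = f[i ^ j]: the probability depends only on the group difference."""
    return np.array([[f[i ^ j] for j in range(4)] for i in range(4)])


# Character table of Z2 x Z2: chi_k(g) = (-1) ** (number of 1-bits shared by k and g).
H = np.array([[(-1) ** bin(k & g).count("1") for g in range(4)] for k in range(4)],
             dtype=float)

# Hypothetical Kimura 3ST branch: no change, transition, and the two transversion classes.
f = np.array([0.70, 0.15, 0.10, 0.05])
P = group_based_matrix(f)

# H is symmetric and H @ H = 4 I, so H P H / 4 is P written in Fourier coordinates.
D = H @ P @ H / 4
assert np.allclose(D, np.diag(H @ f))  # off-diagonal entries vanish
```

The diagonal equals `H @ f`, here (1, 0.6, 0.7, 0.5). Multiplying branch matrices along a path only multiplies these diagonal entries. This is the reason the model's parameterization is monomial, hence toric, in Fourier coordinates.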

5.
6.
7.
Counting phylogenetic invariants in some simple cases   Cited by: 1 (self-citations: 0, other citations: 1)
An informal degrees of freedom argument is used to count the number of phylogenetic invariants in cases where we have three or four species and can assume a Jukes-Cantor model of base substitution with or without a molecular clock. A number of simple cases are treated and in each the number of invariants can be found. Two new classes of invariants are found: non-phylogenetic cubic invariants testing independence of evolutionary events in different lineages, and linear phylogenetic invariants which occur when there is a molecular clock. Most of the linear invariants found by Cavender (1989, Molec. Biol. Evol. 6, 301-316) turn out in the Jukes-Cantor case to be simple tests of symmetry of the substitution model, and not phylogenetic invariants.
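
A degrees-of-freedom argument starts by counting the distinct pattern frequencies the model allows. Under Jukes-Cantor, every permutation of the four bases leaves the pattern probabilities unchanged. Patterns therefore only need to be counted up to relabelling of bases, and the sketch below does that count. The final subtraction of branch lengths is a hypothetical, naive version of the argument, with no clock and all parameters assumed identifiable. It is not meant to reproduce the counts reported in the paper.

```python
from itertools import product


def canonical(pattern) -> tuple:
    """Relabel states by order of first appearance, e.g. 'GAGT' -> (0, 1, 0, 2).

    Under Jukes-Cantor, patterns with the same canonical form have equal
    expected frequency, because the model is symmetric under any permutation
    of the four bases."""
    seen = {}
    return tuple(seen.setdefault(s, len(seen)) for s in pattern)


def jc_pattern_classes(n_taxa: int) -> set:
    """All distinct pattern classes for n_taxa sequences over A, C, G, T."""
    return {canonical("".join(p)) for p in product("ACGT", repeat=n_taxa)}


def naive_invariant_count(n_taxa: int) -> int:
    """Hypothetical count: free class frequencies (they sum to 1) minus the
    2n - 3 branch lengths of an unrooted binary tree without a clock.
    Assumes every parameter is identifiable."""
    return len(jc_pattern_classes(n_taxa)) - 1 - (2 * n_taxa - 3)


print(len(jc_pattern_classes(3)), len(jc_pattern_classes(4)))  # -> 5 15
```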

8.
The Cavender-Felsenstein edge-length invariants for binary characters on 4-trees provide the starting point for the development of "customized" invariants for evaluating and comparing phylogenetic hypotheses. The binary character invariants may be generalized to k-valued characters without losing the quadratic nature of the invariants as functions of the theoretical frequencies f(UVXY) of observable character configurations (U at organism 1, V at 2, etc.). The key to the approach is that certain sets of these configurations constitute events which are probabilistically independent from other such sets, under the symmetric Markov change models studied. By introducing more complex sets of configurations, we find the quadratic invariants for 5-trees in the binary model and for individual edges in 6-trees or, indeed, in any size tree. The same technique allows us to formulate invariants for entire trees, but these are cubic functions for 6-trees and are higher-degree polynomials for larger trees. With k-valued characters and, especially, with large trees, the types of configuration sets (events) used in the simpler examples are too rare (i.e., their predicted frequencies are too low) to be useful, and the construction of meaningful pairs of independent events becomes an important and nontrivial task in designing invariants suited to testing specific hypotheses. In a very natural way, this approach fits in with well-known statistical methodology for contingency tables. We explore use of events such as "only transitions occur for character i (i.e., position i in a nucleic acid sequence) in subtree a" in analyzing a set of data on ribosomal RNA in the context of the controversy over the origins of archaebacteria, eubacteria, and eukaryotes.
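
Quadratic invariants of the kind described for binary characters on 4-trees can be sketched by exact enumeration. This example assumes the symmetric two-state model on the quartet 12|34, with states coded +1/-1 and hypothetical change probabilities per edge. It works with product moments rather than the paper's f(UVXY) coordinates. Each moment is a linear combination of the pattern frequencies, so the expressions below are still quadratic in f(UVXY). They illustrate the idea and are not necessarily the exact Cavender-Felsenstein formulas.

```python
import itertools
import math

# Quartet 12|34: leaves 1, 2 hang off internal node u; leaves 3, 4 off node v.
# States are coded +1/-1; p[e] is the probability of a change on edge e.


def edge_prob(a: int, b: int, p_change: float) -> float:
    return 1 - p_change if a == b else p_change


def quartet_distribution(p: dict) -> dict:
    """Exact leaf-pattern probabilities, summing over internal states u, v,
    with u uniform at the root (the symmetric model is stationary at 1/2, 1/2)."""
    dist = {}
    for x in itertools.product((1, -1), repeat=4):
        total = 0.0
        for u, v in itertools.product((1, -1), repeat=2):
            total += (0.5 * edge_prob(u, v, p["uv"])
                      * edge_prob(u, x[0], p["1"]) * edge_prob(u, x[1], p["2"])
                      * edge_prob(v, x[2], p["3"]) * edge_prob(v, x[3], p["4"]))
        dist[x] = total
    return dist


def moment(dist: dict, leaves: tuple) -> float:
    """E[product of the +/-1 states at the given leaves], linear in the frequencies."""
    return sum(prob * math.prod(x[i] for i in leaves) for x, prob in dist.items())


p = {"1": 0.10, "2": 0.30, "3": 0.05, "4": 0.25, "uv": 0.15}  # hypothetical
f = quartet_distribution(p)
m = {s: moment(f, s) for s in [(0, 1), (2, 3), (0, 2), (1, 3), (0, 3), (1, 2), (0, 1, 2, 3)]}

# Quadratic expressions that vanish on the tree 12|34 (leaves indexed from 0):
inv_a = m[(0, 2)] * m[(1, 3)] - m[(0, 3)] * m[(1, 2)]
inv_b = m[(0, 1)] * m[(2, 3)] - m[(0, 1, 2, 3)]
# The analogue of inv_b for the competing tree 13|24 does not vanish here.
rival = m[(0, 2)] * m[(1, 3)] - m[(0, 1, 2, 3)]
```

With θ_e = 1 - 2p_e, the pairwise moment of two leaves is the product of θ_e along the path joining them. Both invariants follow from that rule. The rival expression equals θ1θ2θ3θ4(θ_uv² - 1), which is nonzero whenever the internal edge has positive length.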

9.
Tests of applicability of several substitution models for DNA sequence data   Cited by: 8 (self-citations: 3, other citations: 5)
Using linear invariants for various models of nucleotide substitution, we developed test statistics for examining the applicability of a specific model to a given dataset in phylogenetic inference. The models examined are those developed by Jukes and Cantor (1969), Kimura (1980), Tajima and Nei (1984), Hasegawa et al. (1985), Tamura (1992), Tamura and Nei (1993), and a new model called the eight-parameter model. The first six models are special cases of the last model. The test statistics developed are independent of evolutionary time and phylogeny, although the variances of the statistics contain phylogenetic information. Therefore, these statistics can be used before a phylogenetic tree is estimated. Our objective is to find the simplest model that is applicable to a given dataset, keeping in mind that a simple model usually gives an estimate of evolutionary distance (number of nucleotide substitutions per site) with a smaller variance than a complicated model when the simple model is correct. We have also developed a statistical test of the homogeneity of nucleotide frequencies of a sample of several sequences that takes into account possible phylogenetic correlations. This test is used to examine the stationarity in time of the base frequencies in the sample. For Hasegawa et al.'s and the eight-parameter models, analytical formulas for estimating evolutionary distances are presented. Application of the above tests to several sets of real data has shown that the assumption of stationarity of base composition is usually acceptable when the sequences studied are closely related but otherwise it is rejected. Similarly, the simple models of nucleotide substitution are almost always rejected when actual genes are distantly related and/or the total number of nucleotides examined is large.
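
The distance formulas of the two simplest models in this nested family can be written down directly. This is a sketch of the standard Jukes-Cantor (1969) and Kimura (1980) distance formulas, not the paper's test statistics. The helper names are ours, and the input is assumed to be an aligned, gap-free pair of sequences.

```python
import math

PURINES = set("AG")


def substitution_proportions(seq1: str, seq2: str) -> tuple:
    """Proportions of sites differing by a transition (P) and by a transversion (Q).
    Assumes aligned, equal-length, gap-free sequences over A, C, G, T."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    transitions = transversions = 0
    for a, b in zip(seq1, seq2):
        if a == b:
            continue
        if (a in PURINES) == (b in PURINES):
            transitions += 1
        else:
            transversions += 1
    n = len(seq1)
    return transitions / n, transversions / n


def jc69_distance(p: float) -> float:
    """Jukes-Cantor: d = -3/4 ln(1 - 4p/3), p = proportion of differing sites."""
    return -0.75 * math.log(1 - 4 * p / 3)


def k80_distance(P: float, Q: float) -> float:
    """Kimura two-parameter: d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q)."""
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)


# One transition (A->G) and one transversion (G->T) in ten sites.
P, Q = substitution_proportions("ACGTACGTAC", "GCGTACTTAC")
print(jc69_distance(P + Q), k80_distance(P, Q))
```

Under Jukes-Cantor, the expected proportions are P = p/3 and Q = 2p/3. Substituting these makes the Kimura formula collapse to the Jukes-Cantor one, which mirrors the nesting of the simpler models inside the more general ones.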

10.
The geometric mean frequency of any EEG wave divides its frequency band B into sub-bands b and s. For the beta wave, s, b and B are consecutive terms of a geometric progression whose common ratio equals the golden-section invariant. It is hypothesized that all EEG waves are described by a system SG of recurrence equations, derived by generalizing the Fibonacci generating function. The theoretical invariants Iλ of the system and the experimental ratios b/s were found to agree with a root-mean-square error of 1%. The system SG predicts the existence of two EEG waves, rho and sigma (55-118 and 118-225 cycles per second), which have not yet been observed experimentally.
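
The golden-section claim can be checked with a short derivation. It rests on our reading of the abstract, which does not define the terms: the band runs from f_1 to f_2, s is the lower sub-band and b the upper one.

```latex
% Band [f_1, f_2]; geometric mean f_m = \sqrt{f_1 f_2}; put r = \sqrt{f_2 / f_1},
% so that f_m = r f_1 and f_2 = r^2 f_1.
s = f_m - f_1 = f_1 (r - 1), \qquad
b = f_2 - f_m = f_1\, r (r - 1), \qquad
B = f_2 - f_1 = f_1 (r^2 - 1) = s + b
\\
\frac{b}{s} = r, \qquad \frac{B}{b} = \frac{r + 1}{r}
\\
\frac{b}{s} = \frac{B}{b} \iff r^2 = r + 1 \iff r = \varphi = \frac{1 + \sqrt{5}}{2}
\quad\Longrightarrow\quad \frac{f_2}{f_1} = \varphi^2 = \varphi + 1 \approx 2.618
```

Under these assumptions, a band has the stated property exactly when its upper edge is φ² times its lower edge. The band limits used for beta are not given in the abstract.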

11.
We explore model-based techniques of phylogenetic tree inference exercising Markov invariants. Markov invariants are group invariant polynomials and are distinct from what is known in the literature as phylogenetic invariants, although we establish a commonality in some special cases. We show that the simplest Markov invariant forms the foundation of the Log–Det distance measure. We take as our primary tool group representation theory, and show that it provides a general framework for analyzing Markov processes on trees. From this algebraic perspective, the inherent symmetries of these processes become apparent, and focusing on plethysms, we are able to define Markov invariants and give existence proofs. We give an explicit technique for constructing the invariants, valid for any number of character states and taxa. For phylogenetic trees with three and four leaves, we demonstrate that the corresponding Markov invariants can be fruitfully exploited in applied phylogenetic studies.
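
The link between the determinant and the Log-Det distance can be illustrated with the paralinear form of that distance. This sketch assumes four states, aligned gap-free sequences and the paralinear normalization; other Log-Det normalizations exist, and this is not the paper's construction. The determinant is multiplicative, so the distance is additive along a path. The example checks this on exact but hypothetical joint-frequency matrices.

```python
import math
import numpy as np

BASES = "ACGT"


def divergence_matrix(x: str, y: str) -> np.ndarray:
    """J[a, b] = proportion of aligned sites with base a in x and base b in y."""
    j = np.zeros((4, 4))
    for a, b in zip(x, y):
        j[BASES.index(a), BASES.index(b)] += 1
    return j / len(x)


def paralinear_from_joint(j: np.ndarray) -> float:
    """-1/4 ln(det J / sqrt(det Dx det Dy)); Dx, Dy are the diagonal matrices of
    row and column sums (base frequencies). Requires det J > 0."""
    det_dx = np.prod(j.sum(axis=1))
    det_dy = np.prod(j.sum(axis=0))
    return -0.25 * math.log(np.linalg.det(j) / math.sqrt(det_dx * det_dy))


def paralinear_distance(x: str, y: str) -> float:
    return paralinear_from_joint(divergence_matrix(x, y))


rng = np.random.default_rng(1)


def near_identity_markov(weight: float = 0.7) -> np.ndarray:
    """Hypothetical branch: weight * I + (1 - weight) * random stochastic matrix
    (its determinant is positive because every eigenvalue has real part >= 0.4)."""
    r = rng.random((4, 4))
    r /= r.sum(axis=1, keepdims=True)
    return weight * np.eye(4) + (1 - weight) * r


# Additivity along a path x -> y -> z, using exact joint matrices diag(pi) @ P.
pi_x = np.array([0.1, 0.2, 0.3, 0.4])
p1, p2 = near_identity_markov(), near_identity_markov()
d_xy = paralinear_from_joint(np.diag(pi_x) @ p1)
d_yz = paralinear_from_joint(np.diag(pi_x @ p1) @ p2)
d_xz = paralinear_from_joint(np.diag(pi_x) @ p1 @ p2)
assert math.isclose(d_xz, d_xy + d_yz, rel_tol=1e-9)
```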

12.
The hepatitis B virus (HBV) has a circular DNA genome of about 3,200 base pairs. Economical use of the genome with overlapping reading frames may have led to severe constraints on nucleotide substitutions along the genome and to highly variable rates of substitution among nucleotide sites. Nucleotide sequences from 13 complete HBV genomes were compared to examine such variability of substitution rates among sites and to examine the phylogenetic relationships among the HBV variants. The maximum likelihood method was employed to fit models of DNA sequence evolution that can account for the complexity of the pattern of nucleotide substitution. Comparison of the models suggests that the rates of substitution are different in different genes and codon positions; for example, the third codon position changes at a rate over ten times higher than the second position. Furthermore, substantial variation of substitution rates was detected even after the effects of genes and codon positions were corrected; that is, rates are different at different sites of the same gene or at the same codon position. Such rates after the correction were also found to be positively correlated at adjacent sites, which indicated the existence of conserved and variable domains in the proteins encoded by the viral genome. A multiparameter model validates the earlier finding that the variation in nucleotide conservation is not random around the HBV genome. The test for the existence of a molecular clock suggests that substitution rates are more or less constant among lineages. The phylogenetic relationships among the viral variants were examined. Although the data do not seem to contain sufficient information to resolve the details of the phylogeny, it appears quite certain that the serotypes of the viral variants do not reflect their genetic relatedness. Correspondence to: Z. Yang

13.
Estimating the pattern of nucleotide substitution   Cited by: 43 (self-citations: 0, other citations: 43)
Knowledge of the pattern of nucleotide substitution is important both to our understanding of molecular sequence evolution and to reliable estimation of phylogenetic relationships. The method of parsimony analysis, which has been used to estimate substitution patterns in real sequences, has serious drawbacks and leads to results difficult to interpret. In this paper a model-based maximum likelihood approach is proposed for estimating substitution patterns in real sequences. Nucleotide substitution is assumed to follow a homogeneous Markov process, and the general reversible process model (REV) and the unrestricted model without the reversibility assumption are used. These models are also applied to examine the adequacy of the model of Hasegawa et al. (J. Mol. Evol. 1985;22:160–174) (HKY85). Two data sets are analyzed. For the -globin pseudogenes of six primate species, the REV model fits the data much better than HKY85, while, for a segment of mtDNA sequences from nine primates, REV cannot provide a significantly better fit than HKY85 when rate variation over sites is taken into account in the models. It is concluded that the use of the REV model in phylogenetic analysis can be recommended, especially for large data sets or for sequences with extreme substitution patterns, while HKY85 may be expected to provide a good approximation. The use of the unrestricted model does not appear to be worthwhile.
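
The REV model can be sketched as a rate matrix built from symmetric exchangeabilities and base frequencies. HKY85 is the special case with a single transition/transversion ratio κ. The parameterization q_ij = s_ij π_j and the scaling to one expected substitution per unit time are the usual conventions for this model and are assumed here. All numbers are hypothetical.

```python
import numpy as np


def rev_rate_matrix(s: np.ndarray, pi: np.ndarray) -> np.ndarray:
    """General reversible (REV/GTR) rate matrix: q_ij = s_ij * pi_j for i != j,
    with symmetric exchangeabilities s, scaled to unit mean substitution rate."""
    s = np.asarray(s, dtype=float)
    pi = np.asarray(pi, dtype=float)
    if not np.allclose(s, s.T):
        raise ValueError("exchangeabilities must be symmetric")
    q = s * pi[np.newaxis, :]
    np.fill_diagonal(q, 0.0)
    np.fill_diagonal(q, -q.sum(axis=1))
    mean_rate = -np.dot(pi, np.diag(q))
    return q / mean_rate


def hky85_exchangeabilities(kappa: float) -> np.ndarray:
    """HKY85 as a REV submodel. Base order assumed A, C, G, T, so the
    transitions are A<->G (0, 2) and C<->T (1, 3)."""
    s = np.ones((4, 4))
    s[0, 2] = s[2, 0] = s[1, 3] = s[3, 1] = kappa
    return s


pi = np.array([0.3, 0.2, 0.2, 0.3])      # hypothetical base frequencies
s = np.array([[0.0, 1.0, 4.0, 0.5],      # hypothetical exchangeabilities
              [1.0, 0.0, 0.8, 5.0],
              [4.0, 0.8, 0.0, 1.2],
              [0.5, 5.0, 1.2, 0.0]])
q = rev_rate_matrix(s, pi)
flux = pi[:, np.newaxis] * q
assert np.allclose(flux, flux.T)          # detailed balance: pi_i q_ij = pi_j q_ji
assert np.allclose(pi @ q, 0.0)           # pi is stationary
```

With κ = 1 and equal base frequencies, the HKY85 case reduces to Jukes-Cantor, with every off-diagonal rate equal to 1/3 after scaling.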

14.
The method of evolutionary parsimony (or operator invariants) is a technique of nucleic acid sequence analysis related to parsimony analysis and explicitly designed for determining evolutionary relationships among four distantly related taxa. The method is independent of substitution rates because it is derived from consideration of the group properties of substitution operators rather than from an analysis of the probabilities of substitution in branches of a tree. In both parsimony and evolutionary parsimony, three patterns of nucleotide substitution are associated one-to-one with the three topologically linked trees for four taxa. In evolutionary parsimony, the three quantities are operator invariants. These invariants are the remnants of substitutions that have occurred in the interior branch of the tree and are analogous to the substitutions assigned to the central branch by parsimony. The two invariants associated with the incorrect trees must equal zero (statistically), whereas only the correct tree can have a nonzero invariant. A χ² test is used to ascertain the nonzero invariant and the statistically favored tree. Examples, obtained using data calculated with evolutionary rates and branchings designed to camouflage the true tree, show that the method accurately predicts the tree, even when substitution rates differ greatly in neighboring peripheral branches (conditions under which parsimony will consistently fail). As the number of substitutions in peripheral branches becomes fewer, the parsimony and the evolutionary-parsimony solutions converge. The method is robust and easy to use.

15.
16.
We simulate the evolution of model protein sequences subject to mutations. A mutation is considered neutral if it conserves (1) the structure of the ground state, (2) its thermodynamic stability and (3) its kinetic accessibility. All other mutations are considered lethal and are rejected. We adopt a lattice model, amenable to a reliable solution of the protein folding problem. We prove the existence of extended neutral networks in sequence space: sequences can evolve until their similarity with the starting point is almost the same as for random sequences. Furthermore, we find that the rate of neutral mutations has a broad distribution in sequence space. As a consequence, the substitution process is overdispersed (the ratio between variance and mean is larger than 1). This result contrasts with the simplest model of neutral evolution, which assumes a Poisson process for substitutions, and agrees qualitatively with the biological data.
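
The overdispersion criterion is the variance-to-mean ratio of substitution counts, which equals 1 for a Poisson process. The sketch below computes that ratio for two cases: a constant-rate Poisson process, and a Poisson process whose rate varies across lineages. The gamma-distributed rates are a hypothetical stand-in for a broad rate distribution, not the paper's lattice-protein simulation.

```python
import numpy as np


def dispersion_index(counts) -> float:
    """Variance-to-mean ratio of substitution counts; about 1 for a Poisson process."""
    counts = np.asarray(counts, dtype=float)
    return counts.var(ddof=1) / counts.mean()


rng = np.random.default_rng(42)
n_lineages = 100_000

# Constant rate: counts are Poisson, so the ratio is close to 1.
poisson_counts = rng.poisson(lam=5.0, size=n_lineages)

# Hypothetical broad rate distribution (gamma with mean 5) feeding the Poisson process:
# the theoretical ratio is 1 + scale = 3.5, i.e. overdispersed.
rates = rng.gamma(shape=2.0, scale=2.5, size=n_lineages)
mixed_counts = rng.poisson(lam=rates)

print(dispersion_index(poisson_counts), dispersion_index(mixed_counts))
```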

17.
A phylogenetic invariant for a model of biological sequence evolution along a phylogenetic tree is a polynomial that vanishes on the expected frequencies of base patterns at the terminal taxa. While the use of these invariants for phylogenetic inference has long been of interest, explicitly constructing such invariants has been problematic. We construct invariants for the general Markov model of κ-base sequence evolution on an n-taxon tree, for any κ and n. The method depends primarily on the observation that certain matrices defined in terms of expected pattern frequencies must commute, and yields many invariants of degree κ+1, regardless of the value of n. We define strong and parameter-strong sets of invariants, and prove several theorems indicating that the set of invariants produced here has these properties on certain sets of possible pattern frequencies. Thus our invariants may be sufficient for phylogenetic applications.
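
The commuting-matrices observation can be checked numerically in its simplest setting. This is our reconstruction for a three-taxon tree with κ states; the paper's construction for n taxa may differ in detail, and every parameter below is hypothetical. Slice the pattern-frequency array as P_k = P[:, :, k] = Aᵀ diag(π ⊙ C[:, k]) B. Then P_k P_0⁻¹ P_l = Aᵀ D_k D_0⁻¹ D_l B is symmetric in k and l. Replacing P_0⁻¹ by the adjugate clears the denominator and gives polynomials of degree 1 + (κ − 1) + 1 = κ + 1.

```python
import numpy as np

rng = np.random.default_rng(7)
kappa = 4  # number of states


def random_markov(weight: float = 0.6) -> np.ndarray:
    """Hypothetical invertible Markov matrix: weight * I + (1 - weight) * random stochastic."""
    r = rng.random((kappa, kappa))
    r /= r.sum(axis=1, keepdims=True)
    return weight * np.eye(kappa) + (1 - weight) * r


# Three-taxon tree rooted at the central node (distribution pi), with edges
# to leaves 1, 2, 3 carrying Markov matrices A, B, C.
pi = rng.dirichlet(np.ones(kappa))
A, B, C = random_markov(), random_markov(), random_markov()
P = np.einsum("r,ri,rj,rk->ijk", pi, A, B, C)  # P[i, j, k] = expected pattern frequency


def adjugate(m: np.ndarray) -> np.ndarray:
    return np.linalg.det(m) * np.linalg.inv(m)


def commutator_invariant(P: np.ndarray, k: int, l: int) -> np.ndarray:
    """P_k adj(P_0) P_l - P_l adj(P_0) P_k: degree kappa + 1 in the frequencies."""
    p0_adj = adjugate(P[:, :, 0])
    pk, pl = P[:, :, k], P[:, :, l]
    return pk @ p0_adj @ pl - pl @ p0_adj @ pk


# Entries are tiny in absolute terms, so compare against the size of one term.
for k in range(1, kappa):
    for l in range(k + 1, kappa):
        scale = np.linalg.norm(P[:, :, k] @ adjugate(P[:, :, 0]) @ P[:, :, l])
        assert np.linalg.norm(commutator_invariant(P, k, l)) <= 1e-9 * scale
```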

18.
For a model of molecular evolution to be useful for phylogenetic inference, the topology of evolutionary trees must be identifiable. That is, from a joint distribution the model predicts, it must be possible to recover the tree parameter. We establish tree identifiability for a number of phylogenetic models, including a covarion model and a variety of mixture models with a limited number of classes. The proof is based on the introduction of a more general model, allowing more states at internal nodes of the tree than at leaves, and the study of the algebraic variety formed by the joint distributions to which it gives rise. Tree identifiability is first established for this general model through the use of certain phylogenetic invariants.

19.
We study to what degree patterns of amino acid substitution vary between genes using two models of protein-coding gene evolution. The first divides the amino acids into groups, with one substitution rate for pairs of residues in the same group and a second for those in differing groups. Unlike previous applications of this model, the groups themselves are estimated from data by simulated annealing. The second model makes substitution rates a function of the physical and chemical similarity between two residues. Because we model the evolution of coding DNA sequences as opposed to protein sequences, artifacts arising from the differing numbers of nucleotide substitutions required to bring about various amino acid substitutions are avoided. Using 10 alignments of related sequences (five of orthologous genes and five gene families), we do find differences in substitution patterns. We also find that, although patterns of amino acid substitution vary temporally within the history of a gene, variation is not greater in paralogous than in orthologous genes. Improved understanding of such gene-specific variation in substitution patterns may have implications for applications such as sequence alignment and phylogenetic inference.
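
The first (grouped, two-rate) model can be sketched as a rate matrix at the amino-acid level. This simplification ignores the codon structure the paper actually models and assumes equal residue frequencies. The six-group partition is a hypothetical example; the paper estimates its groups from data by simulated annealing.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical six-group partition, for illustration only.
GROUPS = ["AGPST", "C", "DENQ", "HKR", "ILMV", "FWY"]


def grouped_rate_matrix(groups, within: float, between: float) -> np.ndarray:
    """Two-rate model: rate `within` for pairs in the same group, `between`
    otherwise; equal residue frequencies assumed."""
    if sorted("".join(groups)) != sorted(AMINO_ACIDS):
        raise ValueError("groups must partition the 20 amino acids")
    group_of = {aa: g for g, members in enumerate(groups) for aa in members}
    n = len(AMINO_ACIDS)
    q = np.empty((n, n))
    for i, a in enumerate(AMINO_ACIDS):
        for j, b in enumerate(AMINO_ACIDS):
            q[i, j] = within if group_of[a] == group_of[b] else between
    np.fill_diagonal(q, 0.0)
    np.fill_diagonal(q, -q.sum(axis=1))  # rows of a rate matrix sum to zero
    return q


q = grouped_rate_matrix(GROUPS, within=2.0, between=0.5)
idx = AMINO_ACIDS.index
print(q[idx("I"), idx("L")], q[idx("I"), idx("D")])  # -> 2.0 0.5
```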

20.

Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417