Similar literature
1.
Mechanized derivation of linear invariants   Total citations: 1 (self-citations: 0, citations by others: 1)
Linear invariants, discovered by Lake, promise to provide a versatile way of inferring phylogenies on the basis of nucleic acid sequences (the method that he called "evolutionary parsimony"). A semigroup of Markov transition matrices embodies the assumptions underlying the method, and alternative semigroups exist. The set of all linear invariants may be derived from the semigroup by using an algorithm described here. Under assumptions no stronger than Lake's, there are more than 50 independent linear invariants for each of the 15 rooted trees linking four species.

2.
The necessary and sufficient conditions for the existence of linear invariants under semigroups of probability transition matrices are derived. It is found that a biologically meaningful nucleotide substitution model has linear invariants if and only if it is a submodel of one of the three most general models, which include the so-called balanced and unbalanced transversion models. Each of these three general models is a nucleotide substitution model with six parameters.

3.
An analytical method is presented for constructing linear invariants. All linear invariants of a k-species tree can be derived from those of (k-1)-species trees using this method. The new method is simpler than that of Cavender, which relies on numerical computations. Moreover, the new method provides a convenient tool to study the relationships between linear invariants of the same tree or of different trees. All linear invariants of trees of up to five species are derived in this study. For four species, there are 16 independent linear invariants for each of the three possible unrooted trees, 14 of which are shared by two unrooted trees and 12 of these are shared by all three unrooted trees; the last types of linear invariants can be used to construct tests on the assumptions about nucleotide substitutions. The number of linear invariants for a tree is found to increase rapidly with the number of species.

4.
5.
Classical dimensional analysis in its original form starts by expressing the units for derived quantities, such as force, in terms of power products of basic units. This suggests the use of toric ideal theory from algebraic geometry. Within this theory the Graver basis provides a unique primitive basis in a well-defined sense, which typically has more terms than the standard Buckingham approach. Some textbook examples are revisited and the full set of primitive invariants found. First, a worked example based on convection is introduced to recall the Buckingham method, but using computer algebra to obtain an integer matrix from the initial integer matrix holding the exponents for the derived quantities. The resulting matrix defines the dimensionless variables. Rather than using this integer linear algebra approach, it is shown how, by staying with the power product representation, the full set of invariants (dimensionless groups) is obtained directly from the toric ideal defined by the initial matrix. One candidate for the set of invariants is a simple basis of the toric ideal. This, although larger than the matrix rank, is typically not unique. However, the alternative Graver basis is unique and defines a maximal set of invariants, which are primitive in a simple sense. In addition to the running example, four examples are taken from: a windmill, convection, electrodynamics and the hydrogen atom. The method reveals some named invariants. A selection of computer algebra packages is used to show the considerable ease with which both a simple basis and a Graver basis can be found.
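The integer linear algebra step of the Buckingham method mentioned in this abstract can be sketched as follows. This is a minimal illustration only: it computes an integer basis of the null space of an exponent matrix, which is the classical approach and not the toric ideal or Graver basis computation the abstract advocates. The function name `integer_nullspace` and the pendulum example (period, length, gravity, mass) are assumptions chosen for illustration and do not come from the paper.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def integer_nullspace(matrix):
    """Return an integer basis of the null space of an integer matrix.

    Rows are basic units, columns are derived quantities, entries are
    exponents. Each basis vector gives the exponents of one
    dimensionless group (hypothetical helper, not from the paper).
    """
    rows = [[Fraction(v) for v in row] for row in matrix]
    n_cols = len(rows[0])
    pivots = []
    r = 0
    # Gauss-Jordan elimination to reduced row echelon form.
    for c in range(n_cols):
        if r == len(rows):
            break
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        pivot_value = rows[r][c]
        rows[r] = [v / pivot_value for v in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
    free_cols = [c for c in range(n_cols) if c not in pivots]
    basis = []
    for f in free_cols:
        vec = [Fraction(0)] * n_cols
        vec[f] = Fraction(1)
        for row_idx, p in enumerate(pivots):
            vec[p] = -rows[row_idx][f]
        # Clear denominators, then divide by the common factor.
        lcm = reduce(lambda a, b: a * b // gcd(a, b),
                     (v.denominator for v in vec), 1)
        ints = [int(v * lcm) for v in vec]
        common = reduce(gcd, (abs(v) for v in ints))
        basis.append([v // common for v in ints])
    return basis


# Illustrative pendulum example: columns are period, length, gravity, mass;
# rows are the exponents of mass, length and time in each quantity.
PENDULUM = [
    [0, 0, 0, 1],   # mass
    [0, 1, 1, 0],   # length
    [1, 0, -2, 0],  # time
]
groups = integer_nullspace(PENDULUM)
```

For the pendulum matrix the single basis vector corresponds to the group period² · gravity / length, with mass dropping out.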

6.
HOMOPLASY AND THE CHOICE AMONG CLADOGRAMS   Total citations: 6 (self-citations: 0, citations by others: 6)
Cladistic data are more decisive when the possible trees differ more in tree length. When all the possible dichotomous trees have the same length, no one tree is better supported than the others, and the data are completely undecisive. From a rule for recursively generating undecisive matrices for different numbers of taxa, formulas to calculate consistency, rescaled consistency and retention indices in undecisive matrices are derived. The least decisive matrices are not the matrices with the lowest possible consistency, rescaled consistency or retention indices (on the most parsimonious trees); those statistics do not directly vary with decisiveness. Decisiveness can be measured with a newly proposed statistic, $DD = (\bar{S} - S)/(\bar{S} - M)$ (where $S$ = length of the most parsimonious cladogram, $\bar{S}$ = mean length of all the possible cladograms for the data set and $M$ = observed variation). For any data set, $\bar{S}$ can be calculated exactly with simple formulas; it depends on the types of characters present, and not on their congruence. Despite some recent assertions to the contrary, the consistency index is an appropriate measure of homoplasy (= deviation from hierarchy). The retention index seems more appropriate for comparing the fit of different trees for the same data set.
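As a small arithmetic illustration of the decisiveness statistic, the sketch below takes it as DD = (S̄ − S)/(S̄ − M), which is an assumed reading of the formula, with S the length of the most parsimonious cladogram, S̄ the mean length over all possible cladograms and M the observed variation. The function name and the numbers are hypothetical.

```python
def data_decisiveness(s_best, s_mean, m_observed):
    """Compute DD = (S_mean - S_best) / (S_mean - M).

    s_best:     length of the most parsimonious cladogram (S)
    s_mean:     mean length of all possible cladograms (S-bar)
    m_observed: observed variation (M)
    The formula is an assumed reading of the abstract's statistic.
    """
    denominator = s_mean - m_observed
    if denominator == 0:
        raise ValueError("DD is undefined when the mean length equals M")
    return (s_mean - s_best) / denominator


# Hypothetical values: best tree 10 steps, mean tree 14 steps, M = 8.
dd_example = data_decisiveness(10, 14, 8)

# If every tree has the same length, the mean equals the best length
# and the statistic is zero, matching "completely undecisive" data.
dd_undecisive = data_decisiveness(12, 12, 8)
```

Under this reading DD ranges from 0, when all trees have the same length, to 1, when the best tree is as short as the observed variation allows.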

7.
B. Liao, T. Wang, K. Ding. Molecular Simulation, 2013, 39(14-15): 1063-1071
In this paper, we propose a seven-dimensional (7D) representation of ribonucleic acid (RNA) secondary structures. The use of the 7D representation is illustrated by constructing structure invariants. Comparisons with the similarity/dissimilarity results based on the 7D representation for a set of RNA 3 secondary structures at the 3′-terminus of different viruses are considered to illustrate the use of our structure invariants, which are based on the entries in derived sequence matrices restricted to a selected width of a band along the main diagonal.

8.
P. J. Kraulis, T. A. Jones. Proteins, 1987, 2(3): 188-201
A method to build a three-dimensional protein model from nuclear magnetic resonance (NMR) data using fragments from a data base of crystallographically determined protein structures is presented. The interproton distances derived from the nuclear Overhauser effect (NOE) data are compared to the precalculated distances in the known protein structures. An efficient search algorithm is used, which arranges the distances in matrices akin to a C alpha diagonal distance plot, and compares the NOE distance matrices for short sequential zones of the protein to the data base matrices. After cluster analysis of the fragments found in this way, the structure is built by aligning fragments in overlapping zones. The sequentially long-range NOEs cannot be used in the initial fragments search but are vital to discriminate between several possible combinations of different groups of fragments. The method has been tested on one simulated NOE data set derived from a crystal structure and one experimental NMR data set. The method produces models that have good local structure, but may contain larger global errors. These models can be used as the starting point for further refinement, e.g., by restrained molecular dynamics or interactive graphics.

9.
On a six-dimensional representation of RNA secondary structures   Total citations: 2 (self-citations: 0, citations by others: 2)
In this paper, we propose a 6-D representation of RNA secondary structures. The use of the 6-D representation is illustrated by constructing structure invariants. Comparisons with the similarity/dissimilarity results based on the 6-D representation for a set of RNA secondary structures are considered to illustrate the use of our structure invariants, which are based on the entries in derived sequence matrices restricted to a selected width of a band along the main diagonal.

10.
In this paper, we propose a nongraphical representation for protein secondary structures. By counting the frequency of occurrence of all possible four-tuples (i.e., four-letter words) of a protein secondary structure sequence, we construct a set of 3 × 3 matrices for the corresponding protein secondary structure sequence. Furthermore, the leading eigenvalues of these matrices are computed and considered as invariants for the protein secondary structure sequences. To illustrate the utility of our approach, we apply it to a set of real data to distinguish protein structural classes. The result indicates that it can be used to complement the classification of protein secondary structures.

11.
In this paper, we propose a 6-D representation of RNA secondary structures. The use of the 6-D representation is illustrated by constructing structure invariants. Comparisons with the similarity/dissimilarity results based on the 6-D representation for a set of RNA secondary structures are considered to illustrate the use of our structure invariants, which are based on the entries in derived sequence matrices restricted to a selected width of a band along the main diagonal.

12.
The Mantel test is widely used to test the linear or monotonic independence of the elements in two distance matrices. It is one of the few appropriate tests when the hypothesis under study can only be formulated in terms of distances; this is often the case with genetic data. In particular, the Mantel test has been widely used to test for spatial relationship between genetic data and spatial layout of the sampling locations. We describe the domain of application of the Mantel test and derived forms. Formula development demonstrates that the sum-of-squares (SS) partitioned in Mantel tests and regression on distance matrices differs from the SS partitioned in linear correlation, regression and canonical analysis. Numerical simulations show that in tests of significance of the relationship between simple variables and multivariate data tables, the power of linear correlation, regression and canonical analysis is far greater than that of the Mantel test and derived forms, meaning that the former methods are much more likely than the latter to detect a relationship when one is present in the data. Examples of difference in power are given for the detection of spatial gradients. Furthermore, the Mantel test does not correctly estimate the proportion of the original data variation explained by spatial structures. The Mantel test should not be used as a general method for the investigation of linear relationships or spatial structures in univariate or multivariate data. Its use should be restricted to tests of hypotheses that can only be formulated in terms of distances.
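For readers unfamiliar with the procedure, a simple Mantel test can be sketched as below. This is a minimal version under stated assumptions, not the authors' code: it uses the Pearson correlation between the upper triangles of two symmetric distance matrices, permutes the rows and columns of the second matrix jointly, and reports a one-sided p-value. The function names, the default of 999 permutations and the fixed seed are illustrative choices.

```python
import math
import random


def upper_triangle(matrix):
    """Flatten the strict upper triangle of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel_test(d1, d2, permutations=999, seed=0):
    """Return (r, p) for a one-sided simple Mantel test.

    d1, d2: symmetric distance matrices over the same n objects.
    The p-value is the share of permutations (plus the observed
    arrangement) whose correlation is at least the observed one.
    """
    rng = random.Random(seed)
    n = len(d1)
    x = upper_triangle(d1)
    observed = pearson(x, upper_triangle(d2))
    hits = 1  # the observed arrangement counts as one permutation
    order = list(range(n))
    for _ in range(permutations):
        rng.shuffle(order)
        # Permute rows and columns of d2 with the same ordering.
        permuted = [[d2[order[i]][order[j]] for j in range(n)]
                    for i in range(n)]
        if pearson(x, upper_triangle(permuted)) >= observed:
            hits += 1
    return observed, hits / (permutations + 1)


# Hypothetical data: five sites on a line, with the second matrix
# exactly proportional to the first.
coords = [0, 1, 3, 6, 10]
geo = [[abs(a - b) for b in coords] for a in coords]
gen = [[2 * v for v in row] for row in geo]
r_obs, p_value = mantel_test(geo, gen)
```

The rows and columns are permuted together because the entries of a distance matrix are not independent observations; shuffling individual distances would break the matrix structure.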

13.
The method of invariants is an approach to the problem of reconstructing the phylogenetic tree of a collection of $m$ taxa using nucleotide sequence data. Models for the respective probabilities of the $4^m$ possible vectors of bases at a given site will have unknown parameters that describe the random mechanism by which substitution occurs along the branches of a putative phylogenetic tree. An invariant is a polynomial in these probabilities that, for a given phylogeny, is zero for all choices of the substitution mechanism parameters. If the invariant is typically non-zero for another phylogenetic tree, then estimates of the invariant can be used as evidence to support one phylogeny over another. Previous work of Evans and Speed showed that, for certain commonly used substitution models, the problem of finding a minimal generating set for the ideal of invariants can be reduced to the linear algebra problem of finding a basis for a certain lattice (that is, a free Z-module). They also conjectured that the cardinality of such a generating set can be computed using a simple "degrees of freedom" formula. We verify this conjecture. Along the way, we explain in detail how the observations of Evans and Speed lead to a simple, computationally feasible algorithm for constructing a minimal generating set.

14.
We review the combinatorial optimization problems in calculating edit distances between genomes and phylogenetic inference based on minimizing gene order changes. With a view to avoiding the computational cost and the "long branches attract" artifact of some tree-building methods, we explore the probabilization of genome rearrangement models prior to developing a methodology based on branch-length invariants. We characterize probabilistically the evolution of the structure of the gene adjacency set for reversals on unsigned circular genomes and, using a nontrivial recurrence relation, reversals on signed genomes. Concepts from the theory of invariants developed for the phylogenetics of homologous gene sequences can be used to derive a complete set of linear invariants for unsigned reversals, as well as for a mixed rearrangement model for signed genomes, though not for pure transposition or pure signed reversal models. The invariants are based on an extended Jukes-Cantor semigroup. We illustrate the use of these invariants to relate mitochondrial genomes from a number of invertebrate animals.

15.
Different genes often have different phylogenetic histories. Even within regions having the same phylogenetic history, the mutation rates often vary. We investigate the prospects of phylogenetic reconstruction when all the characters are generated from the same tree topology, but the branch lengths vary (with possibly different tree shapes). Furthering work of Kolaczkowski and Thornton (2004, Nature 431: 980-984) and Chang (1996, Math. Biosci. 134: 189-216), we show examples where maximum likelihood (under a homogeneous model) is an inconsistent estimator of the tree. We then explore the prospects of phylogenetic inference under a heterogeneous model. In some models, there are examples where phylogenetic inference under any method is impossible - despite the fact that there is a common tree topology. In particular, there are nonidentifiable mixture distributions, i.e., multiple topologies generate identical mixture distributions. We address which evolutionary models have nonidentifiable mixture distributions and prove that the following duality theorem holds for most DNA substitution models. The model has either: (i) nonidentifiability - two different tree topologies can produce identical mixture distributions, and hence distinguishing between the two topologies is impossible; or (ii) linear tests - there exist linear tests which identify the common tree topology for character data generated by a mixture distribution. The theorem holds for models whose transition matrices can be parameterized by open sets, which includes most of the popular models, such as Tamura-Nei and Kimura's 2-parameter model. The duality theorem relies on our notion of linear tests, which are related to Lake's linear invariants.

16.
In this paper, we propose a nongraphical representation for protein secondary structures. By counting the frequency of occurrence of all possible four-tuples (i.e., four-letter words) of a protein secondary structure sequence, we construct a set of 3 × 3 matrices for the corresponding protein secondary structure sequence. Furthermore, the leading eigenvalues of these matrices are computed and considered as invariants for the protein secondary structure sequences. To illustrate the utility of our approach, we apply it to a set of real data to distinguish protein structural classes. The result indicates that it can be used to complement the classification of protein secondary structures.

17.
Counting phylogenetic invariants in some simple cases.   Total citations: 1 (self-citations: 0, citations by others: 1)
An informal degrees of freedom argument is used to count the number of phylogenetic invariants in cases where we have three or four species and can assume a Jukes-Cantor model of base substitution with or without a molecular clock. A number of simple cases are treated and in each the number of invariants can be found. Two new classes of invariants are found: non-phylogenetic cubic invariants testing independence of evolutionary events in different lineages, and linear phylogenetic invariants which occur when there is a molecular clock. Most of the linear invariants found by Cavender (1989, Molec. Biol. Evol. 6, 301-316) turn out in the Jukes-Cantor case to be simple tests of symmetry of the substitution model, and not phylogenetic invariants.

18.
A phylogenetic invariant for a model of biological sequence evolution along a phylogenetic tree is a polynomial that vanishes on the expected frequencies of base patterns at the terminal taxa. While the use of these invariants for phylogenetic inference has long been of interest, explicitly constructing such invariants has been problematic. We construct invariants for the general Markov model of κ-base sequence evolution on an n-taxon tree, for any κ and n. The method depends primarily on the observation that certain matrices defined in terms of expected pattern frequencies must commute, and yields many invariants of degree κ+1, regardless of the value of n. We define strong and parameter-strong sets of invariants, and prove several theorems indicating that the set of invariants produced here has these properties on certain sets of possible pattern frequencies. Thus our invariants may be sufficient for phylogenetic applications.

19.
Systems of two, three, and four linear non-homogeneous differential equations are examined with a view toward determining whether they can possibly serve as mathematical models to describe periodicities in the concentrations of substances which enhance or inhibit each other's rate of production (or dissipation). The nature of the model demands that the solutions of the differential equations be non-negative at all times, i.e., that all the steady states be positive. Conditions for periodicity and for positive steady states are derived, and it is shown that these conditions are not always compatible with each other. In particular it is shown that certain three- and four-hormone models proposed to account for the periodicities observed in the menstrual cycle cannot satisfy the above conditions for any values of the parameters and hence are inadequate.

20.
We address phylogenetic reconstruction when the data is generated from a mixture distribution. Such topics have gained considerable attention in the biological community with the clear evidence of heterogeneity of mutation rates. In our work we consider data coming from a mixture of trees which share a common topology, but differ in their edge weights (i.e., branch lengths). We first show the pitfalls of popular methods, including maximum likelihood and Markov chain Monte Carlo algorithms. We then determine in which evolutionary models reconstructing the tree topology under a mixture distribution is (im)possible. We prove that every model whose transition matrices can be parameterized by an open set of multilinear polynomials either has non-identifiable mixture distributions, in which case reconstruction is impossible in general, or there exist linear tests which identify the topology. This duality theorem relies on our notion of linear tests and uses ideas from convex programming duality. Linear tests are closely related to linear invariants, which were first introduced by Lake, and are natural from an algebraic geometry perspective.
