Similar articles
 Found 20 similar articles; search time 31 ms
1.
This paper describes two types of problems related to tree shapes, as well as algorithms that can be used to solve these problems. The first problem is that of comparing the similarity of the unlabelled shapes instead of merely their degree of balance, in a manner analogous to that routinely used to compare topologies for labelled trees. There are possible practical applications for this comparison, such as determining, based on tree shape similarity alone, whether the taxa in two phylogenies are likely to have a correspondence (e.g. hosts and parasites with high specificity). It is shown that tree balance is insufficient for this task and that standard measures of topological difference (Robinson–Foulds distances, SPR distances or retention indices of the matrices representing the trees, MRPs) can be easily adapted to the problem. The second type of problem is to determine whether taxa of uncertain matching unique to two different phylogenies could correspond to each other (e.g. the same species in larvae and adults of metamorphic animals, fossils known from different body parts). This second problem can be solved by either relabelling taxa in such a way that the number of consensus nodes is maximized, or relabelling taxa in such a way that the sum of the number of steps in the MRP of each tree mapped onto the other is minimum.

2.
Popular methods for exploring the space of rooted phylogenetic trees use rearrangement moves such as rooted Nearest Neighbour Interchange (rNNI) and rooted Subtree Prune and Regraft (rSPR). Recently, these moves were generalized to rooted phylogenetic networks, which are a more suitable representation of reticulate evolutionary histories, and it was shown that any two rooted phylogenetic networks of the same complexity are connected by a sequence of either rSPR or rNNI moves. Here, we show that this is possible using only tail moves, which are a restricted version of rSPR moves on networks that are more closely related to rSPR moves on trees. The connectedness still holds even when we restrict to distance-1 tail moves (a localized version of tail moves). Moreover, we give bounds on the number of (distance-1) tail moves necessary to turn one network into another, which in turn yield new bounds for rSPR, rNNI and SPR (i.e. the equivalent of rSPR on unrooted networks). The upper bounds are constructive, meaning that we can actually find a sequence with at most this length for any pair of networks. Finally, we show that finding a shortest sequence of tail or rSPR moves is NP-hard.

3.
MOTIVATION: Maximum likelihood (ML) methods have become very popular for constructing phylogenetic trees from sequence data. However, despite noticeable recent progress, with large and difficult datasets (e.g. multiple genes with conflicting signals) current ML programs still require huge computing time and can become trapped in bad local optima of the likelihood function. When this occurs, the resulting trees may still show some of the defects (e.g. long branch attraction) of starting trees obtained using fast distance or parsimony programs. METHODS: Subtree pruning and regrafting (SPR) topological rearrangements are usually sufficient to intensively search the tree space. Here, we propose two new methods to make SPR moves more efficient. The first method uses a fast distance-based approach to detect the least promising candidate SPR moves, which are then simply discarded. The second method locally estimates the change in likelihood for any remaining potential SPRs, as opposed to globally evaluating the entire tree for each possible move. These two methods are implemented in a new algorithm with a sophisticated filtering strategy, which efficiently selects potential SPRs and concentrates most of the likelihood computation on the promising moves. RESULTS: Experiments with real datasets comprising 35-250 taxa show that, while indeed greatly reducing the amount of computation, our approach provides likelihood values at least as good as those of the best-known ML methods so far and is very robust to poor starting trees. Furthermore, combining our new SPR algorithm with local moves such as PHYML's nearest neighbor interchanges, the time needed to find good solutions can sometimes be reduced even more.

4.
New examples are presented, showing that supertree methods such as matrix representation with parsimony, minimum flip trees, and compatibility analysis of the matrix representing the input trees, produce supertrees that cannot be interpreted as displaying the groups present in the majority of the input trees. These methods may produce a supertree displaying some groups present in the minority of the trees, and contradicted by the majority. Of the three methods, compatibility analysis is the least used, but it seems to be the one that differs the least from majority rule consensus. The three methods are similar in that they choose the supertree(s) that best fit the set of input trees (quantified as some measure of the fit to the matrix representation of the input trees); in the case of complete trees, it is argued that, for a supertree method to be equivalent to majority rule or frequency difference consensus, two necessary (but not sufficient) conditions must be met. First, the measure of fit between a supertree and an input tree must be symmetrical. Second, the fit for a character representing a group must be measured as absolute: either it fits or it does not fit. In the restricted case of complete and equally resolved input trees, compatibility analysis (unlike MRP and minimum flipping) fulfils these two conditions: it is symmetrical (i.e., as long as the trees have the same taxon sets and are equally resolved, the number of characters in the matrix representation of tree A that require homoplasy in tree B is always the same as the number of characters in the matrix representation of tree B that require homoplasy in tree A) and it measures fit as all‐or‐none. In the case of just two complete and equally resolved input trees, the two conditions (symmetry and absolute fit) are necessary and sufficient, which explains why the compatibility analysis of such trees behaves as majority consensus. 
With more than two such trees, these conditions are still necessary but no longer sufficient for the equivalence; in such cases, the compatibility supertree may differ significantly from the majority rule consensus, even when these conditions apply (as shown by example). MRP and minimum flipping are asymmetric and measure various degrees of fit for each character, which explains why they often behave very differently from majority rule procedures, and why they are very likely to have groups contradicted by each of the input trees, or groups supported by a minority of the input trees. © The Willi Hennig Society 2005.

5.
Resolution of the total evidence (i.e., character congruence) versus consensus (i.e., taxonomic congruence) debate has been impeded by (1) a failure to employ validation methods consistently across both tree-building and consensus analyses, (2) the incomparability of methods for constructing as opposed to those for combining trees, and (3) indifference to aspects of trees other than their topologies. We demonstrate a uniform, distance-based approach which allows for comparability among the results of character- and taxonomic-congruence studies, whether or not an identical suite of taxa has been included in all contributing data sets. Our results indicate that total-evidence and consensus trees differ little in topology if branch lengths are taken into account when combining two or more trees. In addition, when character-state data are converted to distances, our method permits their combination with information produced by techniques which generate distances directly. Moreover, treating all data sets or trees as distance matrices avoids the problem that different numbers of characters in contributing studies may confound the conclusions of a total-evidence or consensus analysis. Our protocol is illustrated with an example involving bats, in which the three component studies based on serology, DNA hybridization, and anatomy imply distinct phylogenies. However, the total-evidence and consensus trees support a fourth, somewhat different, topology resolved at all but one node and which conforms closely to the currently accepted higher category classification of Chiroptera.

6.

Background  

Phylogenies, i.e., the evolutionary histories of groups of taxa, play a major role in representing the interrelationships among biological entities. Many software tools for reconstructing and evaluating such phylogenies have been proposed, almost all of which assume the underlying evolutionary history to be a tree. While trees give a satisfactory first-order approximation for many families of organisms, other families exhibit evolutionary mechanisms that cannot be represented by trees. Processes such as horizontal gene transfer (HGT), hybrid speciation, and interspecific recombination, collectively referred to as reticulate evolutionary events, result in networks, rather than trees, of relationships. Various software tools have been recently developed to analyze reticulate evolutionary relationships, which include SplitsTree4, LatTrans, EEEP, HorizStory, and T-REX.

7.

Background  

Overlapping genes (OGs) are defined as adjacent genes whose coding sequences overlap partially or entirely. In fact, they are ubiquitous in microbial genomes and more conserved between species than non-overlapping genes. Based on this property, we have previously implemented a web server, named OGtree, that allows the user to reconstruct genome trees of some prokaryotes according to their pairwise OG distances. By analogy to the analyses of gene content and gene order, the OG distance between two genomes we defined was based on a measure of combining OG content (i.e., the normalized number of shared orthologous OG pairs) and OG order (i.e., the normalized OG breakpoint distance) in their whole genomes. A shortcoming of using the concept of breakpoints to define the OG distance is its inability to analyze the OG distance of multi-chromosomal genomes. In addition, the amount of overlapping coding sequences between some distantly related prokaryotic genomes may be limited so that it is hard to find enough OGs to properly evaluate their pairwise OG distances.

8.

Background  

Phylogenetic trees based on sequences from a set of taxa can be incongruent due to horizontal gene transfer (HGT). By identifying the HGT events, we can reconcile the gene trees and derive a taxon tree that adequately represents the species' evolutionary history. One HGT can be represented by a rooted Subtree Prune and Regraft (RSPR) operation and the number of RSPRs separating two trees corresponds to the minimum number of HGT events. Identifying the minimum number of RSPRs separating two trees is NP-hard, but the problem is fixed-parameter tractable. A number of heuristic and two exact approaches to identifying the minimum number of RSPRs have been proposed. This is the first implementation delivering an exact solution as well as the intermediate trees connecting the input trees.

9.
Comparing and computing distances between phylogenetic trees are important biological problems, especially for models where edge lengths play an important role. The geodesic distance measure between two phylogenetic trees with edge lengths is the length of the shortest path between them in the continuous tree space introduced by Billera, Holmes, and Vogtmann. This tree space provides a powerful tool for studying and comparing phylogenetic trees, both in exhibiting a natural distance measure and in providing a euclidean-like structure for solving optimization problems on trees. An important open problem is to find a polynomial time algorithm for finding geodesics in tree space. This paper gives such an algorithm, which starts with a simple initial path and moves through a series of successively shorter paths until the geodesic is attained.

10.
Relative-rate tests have previously been developed to compare the substitution rates of two sequences or two groups of sequences. These tests usually assume that the process of nucleotide substitution is stationary and the same for all lineages, i.e., uniform. In this study, we conducted simulations to assess the performance of the relative-rate tests when the molecular-clock (MC) hypothesis is true (i.e., there is no rate difference between lineages), but the stationarity and uniformity assumptions are violated. Kimura's and bias-corrected LogDet distances were used. We found that the computation of the variances and covariances of LogDet distances had to be modified, because the constraint that the sum of the frequencies of the 16 nucleotide pair types is equal to 1 must be imposed. Comparison of the rates of two single sequences (Wu and Li's test) or two groups of sequences (Li and Bousquet's test) gave similar results. When the sequences are long (≥ 500 nt), the test based on LogDet distances and their appropriate variances and covariances is appropriate even when the substitution process is not stationary and/or not uniform. That is, at the 5% significance level, the test rejects the MC hypothesis in about 5% of the simulation replicates. In contrast, if the sequences are short (≤ 200 bases) and highly divergent, the LogDet test is very conservative due to overestimation of the variances of the distances. When the uniformity assumption is violated, the relative-rate test based on Kimura's distances can be severely misleading because of differences in base composition between sequences. However, if the uniformity assumption held and so the base frequencies remained similar among sequences, the rate of rejection turned out to be close to 5%, especially with short sequences. Under such conditions, the test using Kimura's distances performs better than the LogDet test. The reason seems to be that these distances are less affected by a reduction in the number of sites than the LogDet distances because they depend on only two parameters.
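For readers unfamiliar with the "two parameters" mentioned above, the following is a minimal sketch of the standard Kimura two-parameter distance, computed from the observed proportions of transitions (P) and transversions (Q) as d = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)]. It only illustrates the distance itself, not the relative-rate tests or the variance computations of the study; the function name `k2p_distance` is a hypothetical choice for this example.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences.

    Hypothetical helper for illustration; sites with gaps or ambiguity
    codes are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous sites
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; everything else is a transversion
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    term1 = 1 - 2 * p - q
    term2 = 1 - 2 * q
    if term1 <= 0 or term2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term1 * math.sqrt(term2))


# One transition in ten sites: P = 0.1, Q = 0
print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"), 6))
```

Because the estimate depends only on P and Q, it needs far fewer observed pair-type frequencies than a LogDet distance, which is consistent with the abstract's remark about short sequences.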

11.
Given a collection of discrete characters (e.g., aligned DNA sites, gene adjacencies), a common measure of distance between taxa is the proportion of characters for which taxa have different character states. Tree reconstruction based on these (uncorrected) distances can be statistically inconsistent and can lead to trees different from those obtained using character-based methods such as maximum likelihood or maximum parsimony. However, in these cases the distance data often reveal their unreliability by some deviation from additivity, as indicated by conflicting support for more than one tree. We describe two results that show how uncorrected (and miscorrected) distance data can be simultaneously perfectly additive and misleading. First, multistate character data can be perfectly compatible and define one tree, and yet the uncorrected distances derived from these characters are perfectly treelike (and obey a molecular clock), only for a completely different tree. Second, under a Markov model of character evolution a similar phenomenon can occur; not only is there statistical inconsistency using uncorrected distances, but there is no evidence of this inconsistency because the distances look perfectly treelike (this does not occur in the classic two-parameter Felsenstein zone). We characterize precisely when uncorrected distances are additive on the true (and on a false) tree for four taxa. We also extend this result to a more general setting that applies to distances corrected according to an incorrect model.
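The two notions used in the abstract above, uncorrected distance and additivity for four taxa, can be sketched as follows. The uncorrected distance is the proportion of differing characters; additivity on a quartet is checked with the standard four-point condition (of the three pairwise sums, the two largest must be equal). The function names and the toy character data are assumptions made for this illustration and do not come from the paper.

```python
from itertools import combinations


def p_distance(chars_a, chars_b):
    """Uncorrected distance: proportion of characters whose states differ."""
    if len(chars_a) != len(chars_b) or not chars_a:
        raise ValueError("character strings must be non-empty and equally long")
    return sum(x != y for x, y in zip(chars_a, chars_b)) / len(chars_a)


def distance_table(characters):
    """Map each unordered pair of taxa to its uncorrected distance."""
    return {
        frozenset((s, t)): p_distance(characters[s], characters[t])
        for s, t in combinations(characters, 2)
    }


def is_additive_quartet(dist, taxa, tol=1e-9):
    """Four-point condition: the two largest pairwise sums must be equal."""
    a, b, c, d = taxa
    sums = sorted([
        dist[frozenset((a, b))] + dist[frozenset((c, d))],
        dist[frozenset((a, c))] + dist[frozenset((b, d))],
        dist[frozenset((a, d))] + dist[frozenset((b, c))],
    ])
    return abs(sums[2] - sums[1]) <= tol


# Hypothetical toy data: one character splits AB|CD, the rest are unique to one taxon
characters = {"A": "01000", "B": "00100", "C": "10010", "D": "10001"}
dist = distance_table(characters)
print(is_additive_quartet(dist, ["A", "B", "C", "D"]))  # prints True
```

Passing this check says only that the distances fit some tree exactly. As the abstract stresses, that tree need not be the one defined by the characters or the true one.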

12.
Reaction time (RT) and the number of correct estimations of time microintervals (10 and 180 ms) between two visual stimuli were recorded in healthy subjects. The 10 ms interval was estimated better when the stimuli were presented in the right visual field, i.e. when they were addressed directly to the left hemisphere. In contrast, correct estimations of the 180 ms interval were more numerous, and their RT shorter, when the stimuli were addressed directly to the right hemisphere. This points to different hemispheric mechanisms of time microinterval estimation. A study of the influence of different forms of verbal reinforcement on this learning showed that after positive reinforcement (the word "good") the number of correct estimations was on average 10% greater than after negative reinforcement (the word "error"). This may be connected with such processes as the isolation and identification of erroneous reactions.

13.
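Response of pear-jujube to soil water potential during the fruit growth period   Cited by: 1 (self-citations: 0; citations by others: 1)
Han Lixin, Wang Youke, Zhang Linlin. Acta Ecologica Sinica 2012, 32(7): 2004-2011
Using 4-year-old pear-jujube trees as the experimental material, four soil water potential levels were set during the fruit growth period to study how stem diameter growth, photosynthetic rate, transpiration rate, leaf relative water content, and fruit number responded to soil water potential under the different treatments, and to identify the suitable soil water potential range for pear-jujube during the fruit growth period. The results showed that: 1) during the slow fruit growth stage, stem diameter grew slowly, and a soil water potential above -84 kPa significantly reduced the fruit drop rate. 2) During the rapid fruit growth stage, the daily maximum stem diameter and leaf relative water content reflected the water status of pear-jujube; appropriately controlling soil water potential significantly improved leaf water use efficiency; and fruit set occurred during the rapid fruit growth stage when soil water potential was above -84 kPa. 3) A soil water potential as low as -461 kPa in the early part of the fruit growth period affected leaf function during the fruit growth period and later fruit set. Therefore, the suitable soil water potential range for pear-jujube during the fruit growth period is -41 to -84 kPa, which improved leaf water use efficiency and single-fruit weight without affecting yield.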

14.
Rooted phylogenetic networks are used to model non-treelike evolutionary histories. Such networks are often constructed by combining trees, clusters, triplets or characters into a single network that in some well-defined sense simultaneously represents them all. We review these four models and investigate how they are related. Motivated by the parsimony principle, one often aims to construct a network that contains as few reticulations (non-treelike evolutionary events) as possible. In general, the model chosen influences the minimum number of reticulation events required. However, when one obtains the input data from two binary (i.e. fully resolved) trees, we show that the minimum number of reticulations is independent of the model. The number of reticulations necessary to represent the trees, triplets, clusters (in the softwired sense) and characters (with unrestricted multiple crossover recombination) are all equal. Furthermore, we show that these results also hold when not the number of reticulations but the level of the constructed network is minimised. We use these unification results to settle several computational complexity questions that have been open in the field for some time. We also give explicit examples to show that already for data obtained from three binary trees the models begin to diverge.

15.
Mitochondrial cytochrome b sequence data from 15 species of herons (Aves: Ardeidae), representing 13 genera, were compared with DNA hybridization data of single-copy nuclear DNA (scnDNA) from the same species in a taxonomic congruence assessment of heron phylogeny. The two data sets produced a partially resolved, completely congruent estimate of phylogeny with the following basic structure: (Tigrisoma, Cochlearius, (((Zebrilus, (Ixobrychus, Botaurus)), (((Ardea, Casmerodius), Bubulcus), ((Egretta thula, Egretta caerulea, Egretta tricolor), Syrigma), Butorides, Nycticorax, Nyctanassa)))). Because congruence indicated similar phylogenetic information in the two data sets, we used the relatively unsaturated DNA hybridization distances as surrogates of time to examine graphically the patterns and rates of change in cytochrome b distances. Cytochrome b distances were computed either from whole sequences or from partitioned sequences consisting of transitions, transversions, specific codon site positions, or specific protein-coding regions. These graphical comparisons indicated that unpartitioned cytochrome b has evolved at 5-10 times the rate of scnDNA. Third-position transversions appeared to offer the most useful sequence partition for phylogenetic analysis because of their relatively fast rate of substitution (two times that of scnDNA) and negligible saturation. We also examined lineage-based rates of evolution by comparing branch length patterns between the nuclear and cytochrome b trees. The degree of correlation in corresponding branch lengths between cytochrome b and DNA hybridization trees depended on DNA sequence partitioning. When cytochrome b sequences were not partitioned, branch lengths in the cytochrome b and DNA hybridization trees were not correlated. However, when cytochrome b sequences were reduced to third-position transversions (i.e., unsaturated, relatively fast changing data), branch lengths were correlated. 
This finding suggests that lineage-based rates of DNA evolution in nuclear and mitochondrial genomes are influenced by common causes.

16.
Numerous simulation studies have investigated the accuracy of phylogenetic inference of gene trees under maximum parsimony, maximum likelihood, and Bayesian techniques. The relative accuracy of species tree inference methods under simulation has received less study. The number of analytical techniques available for inferring species trees is increasing rapidly, and in this paper, we compare the performance of several species tree inference techniques at estimating recent species divergences using computer simulation. Simulating gene trees within species trees of different shapes and with varying tree lengths (T) and population sizes, and evolving sequences on those gene trees, allows us to determine how phylogenetic accuracy changes in relation to different levels of deep coalescence and phylogenetic signal. When the probability of discordance between the gene trees and the species tree is high (i.e., T is small and/or the population size is large), Bayesian species tree inference using the multispecies coalescent (BEST) outperforms other methods. The performance of all methods improves as the total length of the species tree is increased, which reflects the combined benefits of decreasing the probability of discordance between species trees and gene trees and gaining more accurate estimates for gene trees. Decreasing the probability of deep coalescences by reducing the population size also leads to accuracy gains for most methods. Increasing the number of loci from 10 to 100 improves accuracy under difficult demographic scenarios (i.e., coalescent units ≤ 4N(e)), but 10 loci are adequate for estimating the correct species tree in cases where deep coalescence is limited or absent. In general, the correlation between the phylogenetic accuracy and the posterior probability values obtained from BEST is high, although posterior probabilities are overestimated when the prior distribution for the population size is misspecified.

17.
Takezaki N, Nei M. Genetics 2008, 178(1): 385-392
Microsatellite DNA loci or short tandem repeats (STRs) are abundant in eukaryotic genomes and are often used for constructing phylogenetic trees of closely related populations or species. These phylogenetic trees are usually constructed by using some genetic distance measure based on allele frequency data, and there are many distance measures that have been proposed for this purpose. In the past the efficiencies of these distance measures in constructing phylogenetic trees have been studied mathematically or by computer simulations. Recently, however, allele frequencies of 783 STR loci have been compiled from various human populations. We have therefore used these empirical data to investigate the relative efficiencies of different distance measures in constructing phylogenetic trees. The results showed that (1) the probability of obtaining the correct branching pattern of a tree (PC) is generally highest for DA distance; (2) FST*, standard genetic distance (DS), and FST/(1-FST) give similar PC-values, FST* being slightly better than the other two; and (3) (δμ)² shows PC-values much lower than the other distance measures. To have reasonably high PC-values for trees similar to ours, at least 30 loci with a minimum of 15 individuals are required when DA distance is used.

18.
A new approach for analyzing lipid-lipid transfer protein interactions is described. The transfer protein is genetically engineered for expression with a C-terminal biotinylated peptide extension (AviTag). This allows protein anchoring to a streptavidin-coated chip for surface plasmon resonance (SPR)-based assessment of lipid binding. Sterol carrier protein-2 (SCP-2), involved in the intracellular trafficking of cholesterol, fatty acids, and other lipids, was selected as the prototype. Biotinylated SCP-2 (bSCP-2) was expressed in Escherichia coli, purified to homogeneity by mutated streptavidin (SoftLink) affinity chromatography, and confirmed by mass spectrometry to contain one biotin group at the expected position. Intermembrane [14C]cholesterol transfer was strongly enhanced by bSCP-2, demonstrating that it was functional. Using bSCP-2 immobilized on a Biacore streptavidin chip, we determined on- and off-rate constants along with equilibrium dissociation constants for the following analytes: oleic acid, linoleic acid, cholesterol, and fluorophore (NBD)-derivatized cholesterol. The dissociation constant for NBD-cholesterol was similar to that determined by fluorescence titration for SCP-2 in solution, thereby validating the SPR approach. This method can be readily adapted to other transfer proteins and has several advantages over existing techniques for measuring lipid binding, including (i) the ability to study lipids in their natural states (i.e., without relatively large reporter groups) and (ii) the ability to measure on- and off-rate constants as well as equilibrium constants.

19.
Tertiary contact distance information of varying resolution for large biological molecules abounds in the literature. The results provided herein develop a framework by which information of this type can be used to reduce the allowable configuration space of a macromolecule. The approach combines graph theory and distance geometry. Large molecules are represented as simple, undirected graphs, with atoms, or groups, as vertices, and distances between them as edges. It is shown that determination of the exact structure of a molecule in three dimensions only requires the specification of all the distances in a single tetrahedron, and four distances to every other atom. This is 4N-10 distances which is a subset of the total N(N-1)/2 unique distances in a molecule consisting of N atoms. This requirement for only 4N-10 distances has serious implications for distance geometry implementations in which all N(N-1)/2 distances are specified by bounded random numbers. Such distance matrices represent overspecified systems which when solved lead to non-obvious distribution of any error caused by inherent contradictions in the input data. It is also shown that numerous valid subsets of 4N-10 distances can be constructed. It is thus possible to tailor a subset of distances using all known distances as degrees of freedom, and thereby reduce the configuration space of the molecule. Simple algebraic relationships are derived that relate sets of distances, and complicated rotations are avoided. These relationships are used to construct minimum, complete sets of distances necessary to specify the exact structure of the entire molecule in three dimensions from incomplete distance information, and to identify sets of inconsistent distances. The method is illustrated for the flexible structural types present in large ribosomal RNAs: 1.) A five-membered ring; 2.) a chemically bonded chain with its ends in contact (i.e., a hairpin loop); 3.) 
the spatial orientation of two separate molecules; and 4.) an RNA helix that can have variation in individual base pairs, giving rise to global deviation from standardized helical forms.
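The distance count quoted in the abstract above follows from simple arithmetic: the 6 pairwise distances of one tetrahedron plus 4 distances for each of the remaining N - 4 atoms give 6 + 4(N - 4) = 4N - 10. The sketch below compares that count with the full set of N(N-1)/2 distances; the function names are hypothetical and the code only restates the arithmetic, not the authors' method for choosing which distances to keep.

```python
def minimal_distance_count(n_atoms):
    """Distances needed per the abstract: one full tetrahedron plus 4 per extra atom."""
    if n_atoms < 4:
        raise ValueError("the count assumes at least one tetrahedron (4 atoms)")
    tetrahedron = 6      # all pairwise distances among 4 atoms
    per_extra_atom = 4   # distances tying each further atom to known atoms
    return tetrahedron + per_extra_atom * (n_atoms - 4)  # equals 4N - 10


def total_distance_count(n_atoms):
    """All unique pairwise distances among N atoms: N(N-1)/2."""
    return n_atoms * (n_atoms - 1) // 2


for n in (4, 10, 100):
    print(n, minimal_distance_count(n), total_distance_count(n))
```

For N = 100 this gives 390 distances out of 4950, which shows how heavily a full distance matrix overspecifies the structure.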

20.
Abstract

Tertiary contact distance information of varying resolution for large biological molecules abounds in the literature. The results provided herein develop a framework by which information of this type can be used to reduce the allowable configuration space of a macromolecule. The approach combines graph theory and distance geometry. Large molecules are represented as simple, undirected graphs, with atoms, or groups, as vertices, and distances between them as edges. It is shown that determination of the exact structure of a molecule in three dimensions only requires the specification of all the distances in a single tetrahedron, and four distances to every other atom. This is 4N-10 distances which is a subset of the total N(N-1)/2 unique distances in a molecule consisting of N atoms. This requirement for only 4N-10 distances has serious implications for distance geometry implementations in which all N(N-1)/2 distances are specified by bounded random numbers. Such distance matrices represent overspecified systems which when solved lead to non-obvious distribution of any error caused by inherent contradictions in the input data. It is also shown that numerous valid subsets of 4N-10 distances can be constructed. It is thus possible to tailor a subset of distances using all known distances as degrees of freedom, and thereby reduce the configuration space of the molecule. Simple algebraic relationships are derived that relate sets of distances, and complicated rotations are avoided. These relationships are used to construct minimum, complete sets of distances necessary to specify the exact structure of the entire molecule in three dimensions from incomplete distance information, and to identify sets of inconsistent distances. The method is illustrated for the flexible structural types present in large ribosomal RNAs: 1.) A five-membered ring; 2.) a chemically bonded chain with its ends in contact (i.e., a hairpin loop); 3.) the spatial orientation of two separate molecules; and 4.) an RNA helix that can have variation in individual base pairs, giving rise to global deviation from standardized helical forms.
