Similar literature
20 similar articles found (search time: 15 ms)
1.
Sequence logos are frequently used to illustrate the substrate preferences and specificity of proteases. Here, we used the compiled substrates of the MEROPS database to introduce a novel metric for comparing protease substrate preferences. The resulting similarity matrix of 62 proteases can be used to visualize similarities in protease substrate readout intuitively, via principal component analysis and the construction of protease specificity trees. Since our new metric is based solely on substrate data, the protease tree can include proteolytic enzymes of different evolutionary origin. Our analyses confirm pronounced overlaps in substrate recognition not only between proteases closely related at the sequence level but also between proteolytic enzymes of different evolutionary origin and catalytic type. To illustrate the applicability of our approach, we analyze the distribution of targets of small molecules from the ChEMBL database in our substrate-based protease specificity trees. We observe a striking clustering of annotated targets in tree branches, even though these grouped targets do not necessarily share similarity at the protein sequence level. This highlights the value and applicability of knowledge acquired from peptide substrates in small-molecule drug design, e.g., for the prediction of off-target effects or for drug repurposing. Consequently, our similarity metric makes it possible to map the degradome and its associated drug target network by comparing known substrate peptides. The substrate-driven view of protein-protein interfaces is not limited to proteases but can be applied to any target class for which a sufficient amount of substrate data is available.

2.
Majority-rule supertrees   Total citations: 1 (self-citations: 0; citations by others: 1)
Most supertree methods proposed to date are essentially ad hoc, rather than designed with particular properties in mind. Although the supertree problem remains difficult, one promising avenue is to develop from better understood consensus methods to the more general supertree setting. Here, we generalize the widely used majority-rule consensus method to the supertree setting. The majority-rule consensus tree is the strict consensus of the median trees under the symmetric-difference metric, so we can generalize the consensus method by generalizing this metric to trees with differing leaf sets. There are two different natural generalizations, based on pruning or grafting leaves to produce comparable trees, and these two generalizations produce two different, but related, majority-rule supertree methods.
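
As a rough illustration of the ordinary majority-rule consensus that this abstract takes as its starting point (not the supertree generalization itself), the sketch below keeps every cluster that occurs in more than half of the input trees. It assumes rooted trees on identical leaf sets, encoded as nested tuples of leaf names; the encoding and the function names `clusters` and `majority_rule_clusters` are illustrative choices, not taken from the paper.

```python
from collections import Counter


def clusters(tree):
    """Return the non-trivial clusters (leaf sets below internal nodes) of a
    rooted tree written as nested tuples, e.g. (("A", "B"), "C")."""
    found = set()

    def walk(node):
        if isinstance(node, str):          # a leaf
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    all_leaves = walk(tree)
    found.discard(all_leaves)              # the root cluster carries no information
    return found


def majority_rule_clusters(trees):
    """Clusters present in strictly more than half of the input trees."""
    counts = Counter()
    for tree in trees:
        counts.update(clusters(tree))
    return {c for c, k in counts.items() if k > len(trees) / 2}


# Hypothetical example: three trees on the leaves A..E.
t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
t3 = (("A", "B"), ("C", ("D", "E")))
consensus = majority_rule_clusters([t1, t2, t3])
```

Here `consensus` contains the clusters {A, B}, {A, B, C} and {D, E}; the clusters {A, C} and {C, D, E} each occur in only one tree and are dropped. Extending this to trees with differing leaf sets is exactly the step the abstract addresses, and is not attempted here.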

3.

Background

This paper is devoted to distance measures for leaf-labelled trees on a free leafset. A leaf-labelled tree is a tree data structure in which only the leaf (terminal) nodes are labelled. It is used in bioinformatics to model the evolutionary history of genes and species, and in linguistics to model the evolutionary history of languages. Many domain-specific problems arise that must be solved with tree postprocessing techniques such as distance measures.

Results

Here we introduce a tree edit distance designed for leaf-labelled trees on a free leafset, which turns out to be a metric. It is presented together with the notion of a tree edit consensus tree. We provide a statistical evaluation of the proposed measure against R-F, MAST and frequent-subsplit-based dissimilarity measures as reference measures.

Conclusions

The tree edit distance was proven to be a metric and has the advantage of using different costs for contraction and pruning, so its properties can be tuned to the needs of the user. Two of the presented methods have the most interesting properties. E(3,1) is very discriminative (it has a wide range of values) and has a very regular distance distribution, similar in shape to a normal distribution, which works well for both similar and dissimilar trees. NFC(2,1), on the other hand, is proportional or nearly proportional to the number of mutation operations used, irrespective of their type.

4.
It has been postulated that existing species have been linked in the past in a way that can be described using an additive tree structure. Any such tree structure reflecting species relationships is associated with a matrix of distances between the species considered which is called a distance matrix or a tree metric matrix. A circular order of elements of X corresponds to a circular (clockwise) scanning of the subset X of vertices of a tree drawn on a plane. This paper describes an optimal algorithm using circular orders to compare the topology of two trees given by their distance matrices. This algorithm allows us to compute the Robinson and Foulds topologic distance between two trees. It employs circular order tree reconstruction to compute an ordered bipartition table of the tree edges for both given distance matrices. These bipartition tables are then compared to determine the Robinson and Foulds topologic distance, known to be an important criterion of tree similarity. The described algorithm has optimal time complexity, requiring $O(n^2)$ time when performed on two $n \times n$ distance matrices. It can be generalized to get another optimal algorithm, which enables the strict consensus tree of $k$ unrooted trees, given their distance matrices, to be constructed in $O(kn^2)$ time.
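
The comparison of bipartition tables described above can be sketched in a few lines. The code below is not the circular-order algorithm of the paper and does not start from distance matrices; it is a plain set-based illustration, assuming each unrooted tree is given as nested tuples of leaf names with an arbitrary rooting. The names `bipartitions` and `robinson_foulds` are hypothetical.

```python
def bipartitions(tree):
    """Return (leaf set, non-trivial bipartitions) of an unrooted tree that is
    written as nested tuples with an arbitrary rooting."""
    sides = []

    def walk(node):
        if isinstance(node, str):          # a leaf
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        sides.append(leaves)
        return leaves

    all_leaves = walk(tree)
    ref = min(all_leaves)                  # fixed reference leaf for normalization
    splits = set()
    for side in sides:
        if ref in side:                    # always store the side without `ref`
            side = all_leaves - side
        if 2 <= len(side) <= len(all_leaves) - 2:   # skip trivial splits
            splits.add(side)
    return all_leaves, splits


def robinson_foulds(tree1, tree2):
    """Number of bipartitions present in exactly one of the two trees."""
    leaves1, splits1 = bipartitions(tree1)
    leaves2, splits2 = bipartitions(tree2)
    if leaves1 != leaves2:
        raise ValueError("trees must share the same leaf set")
    return len(splits1 ^ splits2)


# Hypothetical example trees on the leaves A..E.
t1 = ((("A", "B"), "C"), ("D", "E"))
t1_rerooted = ("A", ("B", ("C", ("D", "E"))))   # same unrooted topology as t1
t2 = ((("A", "C"), "B"), ("D", "E"))
```

With these inputs, `robinson_foulds(t1, t1_rerooted)` is 0 because the rooting does not affect the bipartitions, while `robinson_foulds(t1, t2)` is 2: each tree has one internal edge whose split the other lacks.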

5.
Estimating the reliability of evolutionary trees   Total citations: 9 (self-citations: 1; citations by others: 8)
Six protein sequences from the same 11 mammalian taxa were used to estimate the accuracy and reliability of phylogenetic trees using real, rather than simulated, data. A tree comparison metric was used to measure the increase in similarity of minimal trees as larger, randomly selected subsets of nucleotide positions were taken. The ratio of the observed to the expected number of incompatibilities for each nucleotide position (character) is a good predictor of the number of changes required at that position on the minimal (most-parsimonious) tree. This allows a higher weighting of nucleotide positions that have changed more slowly and should result in the minimal length tree converging to the correct tree as more sequences are obtained. An estimate was made of the smallest subset of trees that need to be considered to include the actual historical tree for a given set of data. It was concluded that it is possible to give a reasonable estimate of the reliability of the final tree, at least when several sequences are combined. With the present data, resolving the rodent-primate-lagomorph (rabbit) trichotomy is the least certain aspect of the final tree, followed by establishing the position of dog. In our opinion, it is unreasonable to publish an evolutionary tree derived from sequence data without giving an idea of the reliability of the tree.

6.
Comparing two or more phylogenetic trees is a fundamental task in computational biology. The simplest outcome of such a comparison is a pairwise measure of similarity, dissimilarity, or distance. A large number of such measures have been proposed, but so far all suffer from problems varying from computational cost to lack of robustness; many can be shown to behave unexpectedly under certain plausible inputs. For instance, the widely used Robinson-Foulds distance is poorly distributed and thus affords little discrimination, while also lacking robustness in the face of very small changes--reattaching a single leaf elsewhere in a tree of any size can instantly maximize the distance. In this paper, we introduce a new pairwise distance measure, based on matching, for phylogenetic trees. We prove that our measure induces a metric on the space of trees, show how to compute it in low polynomial time, verify through statistical testing that it is robust, and finally note that it does not exhibit unexpected behavior under the same inputs that cause problems with other measures. We also illustrate its usefulness in clustering trees, demonstrating significant improvements in the quality of hierarchical clustering as compared to the same collections of trees clustered using the Robinson-Foulds distance.

7.
The crossover or nearest neighbor interchange metric has been proposed for use in numerical taxonomy to obtain a quantitative measure of distance between classifications that are modeled as unrooted binary trees with labeled leaves. This metric seems difficult to compute and its properties are poorly understood. A variant called the closest partition distance measure has also been proposed, but no efficient algorithm for its computation has yet appeared and its relationship to the nearest neighbor interchange metric is incompletely understood. I investigate four conjectures concerning the nearest neighbor interchange and closest partition distance measures and establish their validity for trees with as many as seven labeled vertices. For trees in this size range the two distance measures are identical. If a certain decomposition property holds for the nearest neighbor interchange metric, then the two distance measures are also identical at small distances for trees of any size.

8.
Co-evolution and co-adaptation in protein networks   Total citations: 2 (self-citations: 0; citations by others: 2)
Juan D, Pazos F, Valencia A. FEBS Letters 2008, 582(8): 1225-1230
Interacting or functionally related proteins have been repeatedly shown to have similar phylogenetic trees. Two main hypotheses have been proposed to explain this fact. One involves compensatory changes between the two protein families (co-adaptation). The other states that the tree similarity may be an indirect consequence of the involvement of the two proteins in similar cellular processes, which in turn would be reflected by similar evolutionary pressure on the corresponding sequences. There are published data supporting both propositions, and currently the available information is compatible with both hypotheses being true, in a scenario in which both sets of forces shape the tree similarity at different levels.

9.
10.
Background and Aims

Crown shyness describes the phenomenon whereby tree crowns avoid growing into each other, producing a puzzle-like pattern of complementary tree crowns in the canopy. Previous studies found that tree slenderness plays a role in the development of crown shyness. Attempts to quantify crown shyness have largely been confined to 2-D approaches. This study aimed to expand the current set of metrics for crown shyness by quantifying the characteristic of 3-D surface complementarity between trees displaying crown shyness, using LiDAR-derived tree point clouds. Subsequently, the relationship between crown surface complementarity and slenderness of trees was assessed.

Methods

Fourteen trees were scanned using a laser scanning device. Individual tree point clouds were extracted semi-automatically and manually corrected where needed. A metric that quantifies the surface complementarity (Sc) of a pair of protein molecules was applied to point clouds of pairs of adjacent trees. Then 3-D tree crown surfaces were generated from point clouds by computing their α shapes.

Key Results

Tree pairs that were visually determined to have overlapping crowns scored significantly lower Sc values than pairs that did not overlap (n = 14, P < 0.01). Furthermore, average slenderness of pairs of trees correlated positively with their Sc score ($R^2$ = 0.484, P < 0.01), showing agreement with previous studies on crown shyness.

Conclusions

The characteristic of crown surface complementarity present in trees displaying crown shyness was successfully quantified using a 3-D surface complementarity metric adopted from molecular biology. Crown surface complementarity showed a positive relationship to tree slenderness, similar to other metrics used for measuring crown shyness. The 3-D metric developed in this study revealed how trees adapt the shape of their crowns to those of adjacent trees and how this is linked to the slenderness of the trees.

11.
Reconstructing evolution of sequences subject to recombination using parsimony   Total citations: 14 (self-citations: 0; citations by others: 14)
The parsimony principle states that a history of a set of sequences that minimizes the amount of evolution is a good approximation to the real evolutionary history of the sequences. This principle is applied to the reconstruction of the evolution of homologous sequences where recombinations or horizontal transfer can occur. First it is demonstrated that the appropriate structure to represent the evolution of sequences with recombinations is a family of trees each describing the evolution of a segment of the sequence. Two trees for neighboring segments will differ by exactly the transfer of a subtree within the whole tree. This leads to a metric between trees based on the smallest number of such operations needed to convert one tree into the other. An algorithm is presented that calculates this metric. This metric is used to formulate a dynamic programming algorithm that finds the most parsimonious history that fits a given set of sequences. The algorithm is potentially very practical, since many groups of sequences defy analysis by methods that ignore recombinations. These methods give ambiguous or contradictory results because the sequence history cannot be described by one phylogeny, but only a family of phylogenies that each describe the history of a segment of the sequences. The generalization of the algorithm to reconstruct gene conversions and the possibility for heuristic versions of the algorithm for larger data sets are discussed.

12.
Summary: In this paper we present an iterative character weighting method for the construction of phyletic trees. An initial tree is used to calculate the character weights, which are the number of mutations normalized so that the possible range is corrected for. The weights obtained are used to adjust the tree; this process is iterated until a stable tree is found. Using data generated according to a model tree, we show that the trees constructed by the iterative character weighting method converge to the true underlying tree. Using biological data, the trees become closer to the systematic classification of the species concerned, and patterns conflicting with the phylogenetic pattern can be singled out. The method involves a combination of minimal length methods and similarity methods, whereby the strict parsimony criterion is relaxed.

13.
Paralogy is a pervasive problem in trying to use nuclear gene sequences to infer species phylogenies. One strategy for dealing with this problem is to infer species phylogenies from gene trees using reconciled trees, rather than directly from the sequences themselves. In this approach, the optimal species tree is the tree that requires the fewest gene duplications to be invoked. Because reconciled trees can distinguish orthologous from paralogous sequences, there is no need to do this prior to the analysis. Multiple gene trees can be analyzed simultaneously; however, nonuniform gene sampling raises practical problems, which are discussed. In this paper the technique is applied to phylogenies for nine vertebrate genes (aldolase, alpha-fetoprotein, lactate dehydrogenase, prolactin, rhodopsin, trypsinogen, tyrosinase, vasopressin, and Wnt-7). The resulting species tree shows much similarity with currently accepted vertebrate relationships.
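
The duplication-counting criterion can be illustrated with the standard least-common-ancestor mapping between a gene tree and a candidate species tree. This is a minimal sketch under stated assumptions, not the software used in the paper: trees are nested tuples, every gene-tree leaf is labelled with the name of the species it was sampled from, and the function names and the three-species example are hypothetical.

```python
def species_clusters(species_tree):
    """All clusters (leaf sets below each node, leaves included) of a rooted
    species tree written as nested tuples."""
    out = []

    def walk(node):
        if isinstance(node, str):
            cluster = frozenset([node])
        else:
            cluster = frozenset().union(*(walk(child) for child in node))
        out.append(cluster)
        return cluster

    walk(species_tree)
    return out


def count_duplications(gene_tree, species_tree):
    """Count gene-tree nodes that map to the same species-tree node as one of
    their children under the least-common-ancestor mapping."""
    clusters = species_clusters(species_tree)

    def lca(species):
        # The smallest species-tree cluster containing all the given species.
        return min((c for c in clusters if species <= c), key=len)

    duplications = 0

    def walk(node):
        nonlocal duplications
        if isinstance(node, str):
            species = frozenset([node])
            return species, lca(species)
        results = [walk(child) for child in node]
        species = frozenset().union(*(s for s, _ in results))
        mapped = lca(species)
        if any(mapped == child_map for _, child_map in results):
            duplications += 1
        return species, mapped

    walk(gene_tree)
    return duplications


# Hypothetical species tree and gene trees (leaves carry species names).
species = (("human", "mouse"), "chicken")
congruent = (("human", "mouse"), "chicken")
two_copies = ((("human", "mouse"), ("human", "mouse")), "chicken")
conflicting = (("human", "chicken"), "mouse")
```

For this species tree, `congruent` needs no duplications, while `two_copies` and `conflicting` each need one. To compare candidate species trees in the spirit of the abstract, one would sum `count_duplications` over all gene trees for each candidate and prefer the candidate with the smallest total.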

14.
We study distorted metrics on binary trees in the context of phylogenetic reconstruction. Given a binary tree $T$ on $n$ leaves with a path metric $d$, consider the pairwise distances $\{d(u,v)\}$ between leaves. It is well known that these determine the tree and the $d$-length of all edges. Here, we consider distortions $\bar{d}$ of $d$ such that, for all leaves $u$ and $v$, it holds that $|d(u,v)-\bar{d}(u,v)|$ [...] such that the true tree $T$ may be obtained from that forest by adding $\alpha-1$ edges and $\alpha-1 \le 2^{-\Omega(M/g)}n$. Our distorted metric result implies a reconstruction algorithm of phylogenetic forests with a small number of trees from sequences of length logarithmic in the number of species. The reconstruction algorithm is applicable for the general Markov model. Both the distorted metric result and its applications to phylogeny are almost tight.

15.
In this paper the partition metric is used to compare binary trees deriving from (i) the study of the evolutionary relationships between aminoacyl-tRNA synthetases, (ii) the physicochemical properties of amino acids and (iii) the biosynthetic relationships between amino acids. If the tree defining the evolutionary relationships between aminoacyl-tRNA synthetases is assumed to be a manifestation of the mechanism that originated the organization of the genetic code, then the results appear to indicate the following: the hypothesis that regards the genetic code as a map of the biosynthetic relationships between amino acids seems to explain the organization of the genetic code at least as plausibly as the hypotheses that consider the physicochemical properties of amino acids to be the main adaptive theme that led to the structuring of the code.

16.
Katoh K, Miyata T. FEBS Letters 1999, 463(1-2): 129-132
Applying the tree bisection and reconnection (TBR) algorithm, we have developed a heuristic method (maximum likelihood (ML)-TBR) for inferring the ML tree based on tree topology search. For the initial trees from which the iterative processes start in ML-TBR, two cases were considered: one is 100 neighbor-joining (NJ) trees based on bootstrap resampling and the other is 100 randomly generated trees. The same ML tree was obtained in both cases. All the different iterative processes started from 100 independent initial trees ultimately converged on one optimum tree with the largest log-likelihood value, suggesting that a limited number of initial trees will be quite enough in ML-TBR. This also suggests that the optimum tree corresponds to the global optimum in tree topology space and thus probably coincides with the ML tree inferred by intact ML analysis. This method has been applied to the inference of the phylogenetic tree of the SOX family members. The mammalian testis-determining gene SRY is believed to have evolved from SOX-3, a member of the SOX family, based on several lines of evidence, including their sequence similarity, the location of SOX-3 on the X chromosome and some aspects of their expression. This model should be supported directly by the phylogenetic tree of the SOX family, but no evidence has been provided to date. A recently published NJ tree shows an implausibly remote origin of SRY, suggesting that a more sophisticated method is required for understanding this problem. The ML tree inferred by the present method showed that the SRYs of marsupial and placental mammals form a monophyletic cluster which diverged from the mammalian SOX-3 in the early evolution of mammals.

17.
18.
May AC. Proteins 1999, 37(1): 20-29
Recently, several hierarchical classifications of protein three-dimensional (3D) structures have been published. However, none of them assesses the validity of a hierarchical representation or tests the individual clusters it contains. In fact, testing of published trees here reveals that they vary in meaning. Protein structure similarity measures are then assessed in terms of the robustness of the resulting trees for 24 protein families. A meaningful tree is defined as one in which all the clusters are found to be reliable according to a jackknife test. Using this criterion, a previously published similarity measure described as a "better RMS" is shown in fact to be usually less suited to protein fold classification than normal RMS after superposition. Here the "best" protein structure similarity measure for hierarchical classification (the one that, after clustering, produces the highest number of meaningful trees: 20 for the 24 families) is found to be a new one. This measure includes information on the relationship of a distance at a given aligned position in a pair to the rest of the unique distances at that position in a protein family. There are only 2 families of the 24 tested, the globins (3 trees) and Kazal-type serine proteinase inhibitors (21 trees), in which the topology (branching order) of the meaningful 3D structure-based trees is constant. Thus, a new view of protein family sequence-structure relationships is afforded by comparing meaningful trees for each family. More generally, there is a need for care in interpreting the results of those molecular biology algorithms that force a tree structure on data without assessing its applicability. Proteins 1999;37:20-29.

19.
Tree structures are useful for describing and analyzing biological objects and processes. Consequently, there is a need to design metrics and algorithms to compare trees. A natural comparison metric is the "Tree Edit Distance," the number of simple edit (insert/delete) operations needed to transform one tree into the other. Rooted-ordered trees, where the order between the siblings is significant, can be compared in polynomial time. Rooted-unordered trees are used to describe processes or objects where the topology, rather than the order or the identity of each node, is important. For example, in immunology, rooted-unordered trees describe the process of immunoglobulin (antibody) gene diversification in the germinal center over time. Comparing such trees has been proven to be a difficult computational problem that belongs to the set of NP-Complete problems. Comparing two trees can be viewed as a search problem in graphs. A* is a search algorithm that explores the search space in an efficient order. Using a good lower bound estimation of the degree of difference between the two trees, A* can reduce search time dramatically. We have designed and implemented a variant of the A* search algorithm suitable for calculating tree edit distance. We show here that A* is able to perform an edit distance measurement in reasonable time for trees with dozens of nodes.

20.
Species interact in many ways. Potentially, the type of interaction, e.g. mutualistic, commensalistic or antagonistic, determines the structure of interaction networks, but this remains poorly tested. Here we investigate whether epiphytes and wood decomposers, having different types of interaction with their host trees, show different network properties. We also test whether the traits of host trees affect network architecture. We recorded presence/absence of organisms colonizing trees, and traits of host trees, in 102 forest plots. Epiphytic bryophytes (64 species) and lichens (119 species) were recorded on c. 2300 trees. Similarly, wood-inhabiting fungi (193 species) were recorded on c. 900 dead wood items. We studied the patterns of species aggregation on host trees by comparing network metrics of species specialization, nestedness and modularity. Next, we tested whether the prevalence of interactions was influenced by host tree traits. We found non-random interaction patterns between host trees and the three ecological groups (bryophytes, lichens and fungi), with nested and modular structures associated with high host specificity. A higher modularity and number of modules was found for fungi than for epiphytes, which is likely related to their trophic relationship with the host plant, whilst the stronger nestedness for epiphytes is likely reflecting the commensalistic nature of their interactions. For all three groups, the difference in prevalence of interaction across modules was determined by a gradient in interaction intimacy (i.e. host tree specialization), driven by host tree traits. We conclude that the type of interaction with host trees defines the properties of each network: while autotrophic epiphyte networks show similar properties to mutualistic networks, the heterotrophic wood decomposers show similarity with antagonistic networks.
