Similar articles
 20 similar articles found (search time: 31 ms)
1.

Background

We analyze methods for building phylogenetic trees from molecular sequences (PTMS). These methods base their construction solely on sequences, either coding DNA or amino acids.

Results

Our first result is a statistically significant evaluation of 176 PTMSs, done by comparing trees derived from 193138 orthologous groups of proteins using a new measure of quality between trees. This new measure, called the Intra measure, is very consistent between different groups of species and strong in the sense that it separates the methods with high confidence. The second result is the comparison of the trees against trees derived from accepted taxonomies, the Taxon measure. We consider the NCBI taxonomic classification and its derived topologies to be the most accepted biological consensus on phylogenies; they are also available in electronic form. The correlation between the two measures is remarkably high, which supports both measures simultaneously.

Conclusions

The big surprise of the evaluation is that the maximum likelihood methods do not score well; minimal evolution distance methods over MSA-induced alignments score consistently better. This comparison also allows us to rank different components of the tree building methods, such as MSAs, substitution matrices, ML tree builders, and distance methods. It is also clear that there is a difference between Metazoa and the rest, which suggests that evolution leaves different molecular traces. We also think that these measures of tree quality will motivate the design of new PTMSs, as it is now easier to evaluate them with certainty.

2.

Background

This paper is devoted to distance measures for leaf-labelled trees on a free leafset. A leaf-labelled tree is a data structure in which only the leaf (terminal) nodes are labelled. This data structure is used in bioinformatics to model the evolutionary history of genes and species, and in linguistics to model the evolutionary history of languages. Many domain-specific problems arise that need to be solved with the help of tree postprocessing techniques such as distance measures.

Results

Here we introduce a tree edit distance designed for leaf-labelled trees on a free leafset, which turns out to be a metric. It is presented together with the notion of a tree edit consensus tree. We provide a statistical evaluation of the proposed measure, using the R-F, MAST and frequent-subsplit-based dissimilarity measures as reference measures.

Conclusions

The tree edit distance was proven to be a metric and has the advantage of using different costs for contraction and pruning, so its properties can be tuned to the needs of the user. Two of the presented methods have the most interesting properties. E(3,1) is very discriminative (having a wide range of values) and has a very regular distance distribution, similar in shape to a normal distribution, which works well for both similar and dissimilar trees. NFC(2,1), on the other hand, is proportional or nearly proportional to the number of mutation operations used, irrespective of their type.

3.
We describe a method that will reconstruct an unrooted binary phylogenetic level-1 network on \(n\) taxa from the set of all quartets containing a certain fixed taxon, in \(O(n^3)\) time. We also present a more general method which can handle more diverse quartet data, but which takes \(O(n^6)\) time. Both methods proceed by solving a certain system of linear equations over the two-element field \(\mathrm{GF}(2)\). For a general dense quartet set, i.e. a set containing at least one quartet on every four taxa, our \(O(n^6)\) algorithm constructs a phylogenetic level-1 network consistent with the quartet set if such a network exists and returns an \(O(n^2)\)-sized certificate of inconsistency otherwise. This answers a question raised by Gambette, Berry and Paul regarding the complexity of reconstructing a level-1 network from a dense quartet set, and more particularly regarding the complexity of constructing a cyclic ordering of taxa consistent with a dense quartet set.
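The abstract above does not state which linear system the authors build from the quartets, so the following is only a generic sketch of the underlying primitive: Gaussian elimination over \(\mathrm{GF}(2)\), where addition is XOR. The function name `solve_gf2` and the list-of-lists matrix layout are assumptions made for illustration, not part of the cited method.

```python
def solve_gf2(rows, rhs):
    """Solve A x = b over GF(2) by Gauss-Jordan elimination.

    rows: list of equal-length lists of 0/1 (the matrix A, hypothetical layout)
    rhs:  list of 0/1 (the vector b)
    Returns one solution (free variables set to 0), or None if inconsistent.
    """
    n_vars = len(rows[0]) if rows else 0
    # Work on an augmented copy so the caller's data is not mutated.
    aug = [list(r) + [b] for r, b in zip(rows, rhs)]
    pivot_cols = []
    r = 0  # index of the next pivot row
    for c in range(n_vars):
        # Find a row at or below r with a 1 in column c.
        p = next((i for i in range(r, len(aug)) if aug[i][c]), None)
        if p is None:
            continue  # column c has no pivot; its variable is free
        aug[r], aug[p] = aug[p], aug[r]
        # Clear column c in every other row (addition in GF(2) is XOR).
        for i in range(len(aug)):
            if i != r and aug[i][c]:
                aug[i] = [x ^ y for x, y in zip(aug[i], aug[r])]
        pivot_cols.append(c)
        r += 1
    # Remaining rows have all-zero coefficients; a 1 on the right means 0 = 1.
    if any(row[-1] for row in aug[r:]):
        return None
    x = [0] * n_vars
    for i, c in enumerate(pivot_cols):
        x[c] = aug[i][-1]
    return x


# x + y = 1, y + z = 1, x + z = 0 (all arithmetic mod 2)
print(solve_gf2([[1, 1, 0], [0, 1, 1], [1, 0, 1]], [1, 1, 0]))  # [0, 1, 0]
```

This dense version takes cubic time in the matrix dimension; it is meant to show the idea, not to reproduce the running times quoted in the abstract.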

4.

Background

The MatrixMatchMaker algorithm was recently introduced to detect the similarity between phylogenetic trees and thus the coevolution between proteins. MMM finds the largest common submatrices between pairs of phylogenetic distance matrices, and has numerous advantages over existing methods of coevolution detection. However, these advantages came at the cost of a very long execution time.

Results

In this paper, we show that the problem of finding the maximum submatrix reduces to a multiple maximum clique subproblem on a graph of protein pairs. This allowed us to develop a new algorithm and program implementation, MMMvII, which achieved more than 600× speedup with comparable accuracy to the original MMM.

Conclusions

MMMvII will thus allow for more extensive and intricate analyses of coevolution.

Availability

An implementation of the MMMvII algorithm is available at: http://www.uhnresearch.ca/labs/tillier/MMMWEBvII/MMMWEBvII.php

5.

Background

Measuring similarities between tree structured data is important for analysis of RNA secondary structures, phylogenetic trees, glycan structures, and vascular trees. The edit distance is one of the most widely used measures for comparison of tree structured data. However, it is known that computation of the edit distance for rooted unordered trees is NP-hard. Furthermore, there is almost no available software tool that can compute the exact edit distance for unordered trees.

Results

In this paper, we present a practical method for computing the edit distance between rooted unordered trees. In this method, the edit distance problem for unordered trees is transformed into the maximum clique problem, and efficient solvers for the maximum clique problem are then applied. We applied the proposed method to similar-structure search for glycan structures. The results suggest that the proposed method can efficiently compute the edit distance for moderate-size unordered trees. They also suggest that its accuracy is comparable to that of the edit distance for ordered trees and of an existing method for glycan search.

Conclusions

The proposed method is simple but useful for computation of the edit distance between unordered trees. The object code is available upon request.

6.

Background  

A number of algorithms have been developed for calculating the quartet distance between two evolutionary trees on the same set of species. The quartet distance is the number of quartets (sub-trees induced by four leaves) that differ between the trees. Most of these algorithms are restricted to binary trees, but recently we have developed algorithms that work on trees of arbitrary degree.
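The definition of the quartet distance given above can be made concrete with a naive baseline. This is a sketch under stated assumptions, not any of the algorithms the abstract refers to: trees are passed as unrooted edge lists (a hypothetical representation), each quartet's topology is read off the four-point condition on unit-length path distances, and all \(O(n^4)\) quartets are enumerated. An unresolved (star) quartet is treated as different from any resolved one, which is how trees of arbitrary degree are handled here.

```python
from collections import deque
from itertools import combinations


def build_adjacency(edges):
    """Turn an undirected edge list into an adjacency dict."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def leaf_distances(adj, leaves):
    """Breadth-first search from every leaf; returns dist[leaf][node] in edges."""
    dist = {}
    for src in leaves:
        seen = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        dist[src] = seen
    return dist


def quartet_topology(dist, a, b, c, d):
    """Return 0 for ab|cd, 1 for ac|bd, 2 for ad|bc, None for a star quartet."""
    sums = (dist[a][b] + dist[c][d],
            dist[a][c] + dist[b][d],
            dist[a][d] + dist[b][c])
    smallest = min(sums)
    # Four-point condition: a resolved quartet has one strictly smallest sum.
    if sums.count(smallest) != 1:
        return None
    return sums.index(smallest)


def quartet_distance(edges1, edges2, leaves):
    """Count the four-leaf subsets whose induced topology differs."""
    leaves = sorted(leaves)
    d1 = leaf_distances(build_adjacency(edges1), leaves)
    d2 = leaf_distances(build_adjacency(edges2), leaves)
    return sum(quartet_topology(d1, *q) != quartet_topology(d2, *q)
               for q in combinations(leaves, 4))


# Two five-leaf trees that differ only by swapping leaves b and c.
t1 = [("a", "u"), ("b", "u"), ("u", "w"), ("c", "w"), ("w", "v"), ("d", "v"), ("e", "v")]
t2 = [("a", "u"), ("c", "u"), ("u", "w"), ("b", "w"), ("w", "v"), ("d", "v"), ("e", "v")]
print(quartet_distance(t1, t2, "abcde"))  # 2
```

Enumerating every quartet is only practical for small trees; it is useful mainly as a reference against which faster implementations can be checked.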

7.
Sayyari Erfan, Mirarab Siavash. BMC Genomics, 2016, 17(10): 783-113

Background

Inferring species trees from gene trees using the coalescent-based summary methods has been the subject of much attention, yet new scalable and accurate methods are needed.

Results

We introduce DISTIQUE, a new statistically consistent summary method for inferring species trees from gene trees under the coalescent model. We generalize our results to arbitrary phylogenetic inference problems; we show that two arbitrarily chosen leaves, called anchors, can be used to estimate relative distances between all other pairs of leaves by inferring relevant quartet trees. This results in a family of distance-based tree inference methods, with running times ranging from quadratic to quartic in the number of leaves.

Conclusions

We show in simulated studies that DISTIQUE has comparable accuracy to leading coalescent-based summary methods and reduced running times.

8.

Background

Many clustering procedures only allow the user to input a pairwise dissimilarity or distance measure between objects. We propose a clustering method that can input a multi-point dissimilarity measure d(i1, i2, ..., iP) where the number of points P can be larger than 2. The work is motivated by gene network analysis where clusters correspond to modules of highly interconnected nodes. Here, we define modules as clusters of network nodes with high multi-node topological overlap. The topological overlap measure is a robust measure of interconnectedness which is based on shared network neighbors. In previous work, we have shown that the multi-node topological overlap measure yields biologically meaningful results when used as input of network neighborhood analysis.

Findings

We adapt network neighborhood analysis for module detection. We propose the Module Affinity Search Technique (MAST), which is a generalized version of the Cluster Affinity Search Technique (CAST). MAST can accommodate a multi-node dissimilarity measure. Clusters grow around user-defined or automatically chosen seeds (e.g. hub nodes). We propose both local and global stopping rules for cluster growth. We use several simulations and a gene co-expression network application to argue that the MAST approach leads to biologically meaningful results. We compare MAST with hierarchical clustering and partitioning around medoids clustering.

Conclusion

Our flexible module detection method is implemented in the MTOM software which can be downloaded from the following webpage: http://www.genetics.ucla.edu/labs/horvath/MTOM/

9.
Recently, we have shown that calculating the minimum-temporal-hybridization number for a set ${\mathcal{P}}$ of rooted binary phylogenetic trees is NP-hard and have characterized this minimum number when ${\mathcal{P}}$ consists of exactly two trees. In this paper, we give the first characterization of the problem for ${\mathcal{P}}$ being arbitrarily large. The characterization is in terms of cherries and the existence of a particular type of sequence. Furthermore, in an online appendix to the paper, we show that this new characterization can be used to show that computing the minimum-temporal-hybridization number for two trees is fixed-parameter tractable.

10.

Background

Visualising the evolutionary history of a set of sequences is a challenge for molecular phylogenetics. One approach is to use undirected graphs, such as median networks, to visualise phylogenies where reticulate relationships such as recombination or homoplasy are displayed as cycles. Median networks contain binary representations of sequences as nodes, with edges connecting those sequences differing at one character; hypothetical ancestral nodes are invoked to generate a connected network which contains all most parsimonious trees. Quasi-median networks are a generalisation of median networks which are not restricted to binary data, although phylogenetic information contained within the multistate positions can be lost during the preprocessing of data. Where the history of a set of samples contains frequent homoplasies or recombination events, quasi-median networks will have a complex topology. Graph reduction or pruning methods have been used to reduce network complexity, but some of these methods are inapplicable to datasets in which recombination has occurred and others are procedurally complex and/or result in disconnected networks.

Results

We address the problems inherent in construction and reduction of quasi-median networks. We describe a novel method of generating quasi-median networks that uses all characters, both binary and multistate, without imposing an arbitrary ordering of the multistate partitions. We also describe a pruning mechanism which maintains at least one shortest path between observed sequences, displaying the underlying relations between all pairs of sequences while maintaining a connected graph.

Conclusion

Application of this approach to 5S rDNA sequence data from sea beet produced a pruned network within which genetic isolation between populations by distance was evident, demonstrating the value of this approach for exploration of evolutionary relationships.

11.

Key message

A hypergeometric model is proposed explicitly instead of two previous stochastic models (the Poisson model and Neyman-A model) to describe the topological relationship of trees and the influence of the exclusion distance on gap fraction and clumping index of forest plantation canopies.

Abstract

Gap fraction (GF) and clumping index (CI) play key roles in plant light interception, and therefore they have strong impacts on plant growth and canopy radiative transfer processes. Trees are usually assumed to be randomly distributed in natural forests in many previous studies. However, few studies have shown how trees are distributed in forest plantations and how these distribution patterns affect GF and CI in these forests. In this paper, a simple and general distance factor, the relative allowable shortest distance (RASD), defined as the allowable shortest distance between the centers of two adjacent crowns divided by the mean diameter of the crowns, is proposed to describe quantitatively the degree of mutual exclusion among trees in forest plantations of various tree distribution patterns. A hypergeometric model is proposed instead of two previous stochastic tree distribution models (the Poisson model and Neyman-A model) to describe the topological relationship of trees and the influences of the exclusion distance on the GF and CI of the forest plantation canopies. The results show that: (1) the hypergeometric model is more suitable than the Poisson model and Neyman-A model for describing the topological relationship of trees in forest plantations; (2) the exclusion distance has strong impacts on GF and CI: there are significant differences between the results of the hypergeometric model and the Poisson model. Larger RASD causes lower GF and larger CI. The simulations are verified by field measurements in four forest plantation stands. Similar impacts of RASD on GF and CI are also found for two other crown shapes (prolate and oblate ellipsoids).

12.

Background

Isometric gene tree reconciliation is a gene tree/species tree reconciliation problem where both the gene tree and the species tree include branch lengths, and these branch lengths must be respected by the reconciliation. The problem was introduced by Ma et al. in 2008 in the context of reconstructing evolutionary histories of genomes in the infinite sites model.

Results

In this paper, we show that the original algorithm by Ma et al. is incorrect, and we propose a modified algorithm that addresses the problems that we discovered. We have also improved the running time from \(O(N^2)\) to \(O(N\log N)\), where N is the total number of nodes in the two input trees. Finally, we examine two new variants of the problem: reconciliation of two unrooted trees and scaling of branch lengths of the gene tree during reconciliation of two rooted trees.

Conclusions

We provide several new algorithms for isometric reconciliation of trees. Some questions in this area remain open, most importantly extensions of the problem that allow for imprecise estimates of branch lengths.

13.

Key message

This study provides data necessary to develop mechanistic models of the failure of open-grown trees. The literature contains few such data. Some results contrast with previous studies on conifers.

Abstract

In cities and towns, tree failure can cause damage and injury. Few studies have considered large, open-grown trees when measuring parameters related to tree failure. To measure elastic modulus and maximum bending moment and stress, we winched red oaks (Quercus rubra L.), including some with co-dominant stems and others with extant decay. To simulate decay in a subsample of trees, we cut voids in the trunk before pulling trees to failure. Maximum bending moment was greatest for uprooted trees, but maximum bending and shear stresses were greatest for trees that failed in the crown in the vicinity of branches. The likelihood of failure at a void or area of extant decay increased as the loss in area moment of inertia increased. The moduli of elasticity and rupture of specimens taken from trees were greater than values measured on the trees themselves. Failure at the union of co-dominant stems only occurred when we pulled them apart, loading them perpendicular to the plane bifurcating the union. Some of the results are inconsistent with previous work on conifers; more data on open-grown trees are necessary to develop mechanistic models to predict tree failure.

14.

Aims

Dehesas are agroforestry systems characterized by scattered trees among pastures, crops and/or fallows. A study at a Spanish dehesa has been carried out to estimate the spatial distribution of the soil organic carbon stock and to assess the influence of the tree cover.

Methods

The soil organic carbon stock was estimated from the five uppermost cm of the mineral soil with high spatial resolution at two plots with different grazing intensities. The Universal Kriging technique was used to assess the spatial distribution of the soil organic carbon stocks, using tree coverage within a buffering area as an auxiliary variable.

Results

A significant positive correlation between tree presence and soil organic carbon stocks up to distances of around 8 m from the trees was found. The tree crown cover within a buffer up to a distance similar to the crown radius around the point absorbed 30 % of the variance in the model for both grazing intensities, but residual variance showed stronger spatial autocorrelation under regular grazing conditions.

Conclusions

Tree cover increases soil organic carbon stocks, and can be satisfactorily estimated by means of crown parameters. However, other factors are involved in the spatial pattern of the soil organic carbon distribution. Livestock plays an interactive role together with tree presence in soil organic carbon distribution.

15.

Background

The quantification of the spatial order of biological patterns or mosaics provides useful information, as many properties are determined by the spatial distribution of their constituent elements. These are usually characterised by methods based on nearest-neighbour distances, on the number of sides of cells, or on angles defined by the adjacent cells.

Methods

A measure of regularity in polygonal mosaics of different kinds in biological systems is proposed. It is based on the condition of eutacticity, expressed in terms of eutactic stars, which is closely related to regularity of polytopes. Thus it constitutes a natural measure of regularity. The proposed measure is tested with numerical and real data. Numerically, it is tested with a hexagonal lattice that is progressively distorted and with a non-periodic regular tiling. With real data, the distribution of oak trees in forests from three locations in the State of Querétaro, Mexico, and the spiral pattern of florets in a flowering plant are characterised.

Results

The proposed measure performs well and as expected when tested in a numerical experiment, as well as when applied to a known non-periodic tiling of the plane. Concerning real data, the measure is sensitive to the degree of perturbation observed in the distribution of oak trees and detects high regularity in the phyllotactic pattern studied.

Conclusions

The measure proposed here has a clear geometrical meaning, establishing what regularity means, and constitutes an advantageous general-purpose alternative for analysing spatial distributions, capable of indicating the degree of regularity of a mosaic or an array of points.

16.
17.

Key Message

An improved quantification of variations in bark microrelief is presented that uses wavelets on a circular domain from data acquired using the LaserBark™ automated tree measurement system.

Abstract

An important metric of canopy structure, bark microrelief affects both the hydrology and biogeochemistry of forests. Increased bark microrelief leads to reduced stemflow volumes and higher concentrations of stemflow leachates and nutrient-ions. Consequently, an improved representation of bark microrelief would be useful to describe the influence of various tree species on water and solute contributions to the forest floor. Most existing methods to quantify bark microrelief are ‘global’ measures; that is, they provide a single number that represents the overall bark microrelief of the entire perimeter of the tree. To remedy this, wavelet analysis of LaserBark™ automated tree measurement system data is proposed and described to quantify variations in bark microrelief around the perimeter of the tree. This measure describes the spatial differences in bark microrelief and allows representation of trees that exhibit directional variability in bark microrelief due to natural or anthropogenic effects. The results show that wavelet analysis is effective in quantifying both bark microrelief and large-scale tree asymmetry. The radial component highlights changes in the depth of bark microrelief while the tangential component relates to the distance between bark furrows in the bark cross section. Thus, wavelet analysis may be a useful tool for comparing bark structure that varies, for example, within- and between-tree species, at different stages of tree growth, and among trees grown under different environmental conditions.

18.

Background and aims

Vegetation can have direct and indirect effects on soil nutrients. To test the effects of trees on soils, we examined the patterns of soil nutrients and nutrient ratios at two spatial scales: at sites spanning the alpine tundra/subalpine forest ecotone (ecotone scale), and beneath and beyond individual tree canopies within the transitional krummholz zone (tree scale).

Methods

Soils were collected and analyzed for total carbon (C), nitrogen (N), and phosphorus (P) as well as available N and P on Niwot Ridge in the Colorado Rocky Mountains.

Results

Total C, N, and P were higher in the krummholz zone than the forest or tundra. Available P was also greatest in the krummholz zone while available N increased from the forest to the tundra. Throughout the krummholz zone, total soil nutrients and available P were higher downwind compared to upwind of trees.

Conclusions

The krummholz zone in general, and the area downwind of krummholz trees in particular, are zones of nutrient accumulation. This pattern indicates that the indirect effects of trees on soils are more important than the direct effects. The higher N:P ratios in the tundra suggest that nutrient dynamics there differ from those at the lower elevation sites. We propose that evaluating soil N and P simultaneously may provide a robust assay of ecosystem nutrient limitation.

19.

Background

The domestic goat is one of the important livestock species of India. In the present study we assess genetic diversity of Indian goats using 17 microsatellite markers. Breeds were sampled from their natural habitat, covering different agroclimatic zones.

Results

The mean number of alleles per locus (NA) ranged from 8.1 in Barbari to 9.7 in Jakhrana goats. The mean expected heterozygosity (He) ranged from 0.739 in Barbari to 0.783 in Jakhrana goats. Deviations from Hardy-Weinberg Equilibrium (HWE) were statistically significant (P < 0.05) for 5 locus-breed combinations. The DA measure of genetic distance between pairs of breeds indicated that the lowest distance was between Marwari and Sirohi (0.135). The highest distance was between Pashmina and Black Bengal. An analysis of molecular variance indicated that 6.59% of variance exists among the Indian goat breeds. Both a phylogenetic tree and Principal Component Analysis showed the distribution of breeds in two major clusters with respect to their geographic distribution.

Conclusion

Our study concludes that Indian goat populations can be classified into distinct genetic groups or breeds based on the microsatellites as well as mtDNA information.
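For readers unfamiliar with the expected heterozygosity (He) reported above, a minimal sketch of the standard single-locus definition, \(H_e = 1 - \sum_i p_i^2\) with \(p_i\) the frequency of allele \(i\), is given below. The abstract does not say which estimator the authors used (for example, whether a small-sample correction was applied), so the function name and the uncorrected form are assumptions, and the allele counts are made up for illustration.

```python
def expected_heterozygosity(allele_counts):
    """Uncorrected expected heterozygosity at one locus: 1 - sum(p_i ** 2).

    allele_counts: iterable of non-negative counts, one per allele
    (illustrative input format, not taken from the study).
    """
    counts = list(allele_counts)
    total = sum(counts)
    if total == 0:
        raise ValueError("at least one allele copy is required")
    return 1.0 - sum((c / total) ** 2 for c in counts)


# Four equally frequent alleles at a hypothetical locus.
print(expected_heterozygosity([5, 5, 5, 5]))  # 0.75
```

A per-breed mean He, as quoted in the abstract, would be the average of this quantity over the genotyped loci.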

20.

Background

We previously developed the DBRF-MEGN (difference-based regulation finding-minimum equivalent gene network) method, which deduces the most parsimonious signed directed graphs (SDGs) consistent with expression profiles of single-gene deletion mutants. However, until the present study, we have not presented the details of the method's algorithm or a proof of the algorithm.

Results

We describe in detail the algorithm of the DBRF-MEGN method and prove that the algorithm deduces all of the exact solutions of the most parsimonious SDGs consistent with expression profiles of gene deletion mutants.

Conclusions

The DBRF-MEGN method provides all of the exact solutions of the most parsimonious SDGs consistent with expression profiles of gene deletion mutants.
