Similar literature
1.

Background

Relationships between species, genes and genomes have been printed as trees for over a century. Whilst this may have been the best format for exchanging and sharing phylogenetic hypotheses during the 20th century, the worldwide web now provides faster and automated ways of transferring and sharing phylogenetic knowledge. However, novel software is needed to defrost these published phylogenies for the 21st century.

Results

TreeRipper is a simple website for the fully automated recognition of multifurcating phylogenetic trees (http://linnaeus.zoology.gla.ac.uk/~jhughes/treeripper/). The program accepts a range of input image formats (PNG, JPG/JPEG or GIF). The underlying command-line C++ program follows a number of cleaning steps to detect lines, remove node labels, patch up broken lines and corners, and detect line edges. The edge contour is then determined to detect the branch lengths, tip label positions and the topology of the tree. Optical Character Recognition (OCR) is used to convert the tip labels into text with the freely available tesseract-ocr software. Of the images meeting the prerequisites for TreeRipper, 32% were successfully recognised; the largest tree had 115 leaves.

Conclusions

Although the diversity of ways in which phylogenies have been illustrated makes the design of fully automated tree recognition software difficult, TreeRipper is a step towards automating the digitization of past phylogenies. We also provide a dataset of 100 tree images and associated tree files for training and/or benchmarking future software. TreeRipper is an open source project licensed under the GNU General Public Licence v3.

2.

Background

The duration of treatment for HCV infection is partly indicated by the genotype of the virus. For studies of disease transmission, vaccine design, and surveillance for novel variants, subtype-level classification is also needed. This study used the Shimodaira-Hasegawa test and related statistical techniques to compare phylogenetic trees obtained from coding and non-coding regions of a whole-genome alignment for the reliability of subtyping in different regions.

Results

Different regions of the HCV genome yield inconsistent phylogenies, which can lead to erroneous conclusions about classification of a given infection. In particular, the highly conserved 5' untranslated region (UTR) yields phylogenetic trees with topologies that differ from the HCV polyprotein and complete genome phylogenies. Phylogenetic trees from the NS5B gene reliably cluster related subtypes, and yield topologies consistent with those of the whole genome and polyprotein.

Conclusion

These results extend those from previous studies and indicate that, unlike the NS5B gene, the 5' UTR contains insufficient variation to resolve HCV classifications to the level of viral subtype, and fails to distinguish genotypes reliably. Use of the 5' UTR for clinical tests to characterize HCV infection should be replaced by a subtype-informative test.

3.

Background

Visualising the evolutionary history of a set of sequences is a challenge for molecular phylogenetics. One approach is to use undirected graphs, such as median networks, to visualise phylogenies where reticulate relationships such as recombination or homoplasy are displayed as cycles. Median networks contain binary representations of sequences as nodes, with edges connecting those sequences differing at one character; hypothetical ancestral nodes are invoked to generate a connected network which contains all most parsimonious trees. Quasi-median networks are a generalisation of median networks which are not restricted to binary data, although phylogenetic information contained within the multistate positions can be lost during the preprocessing of data. Where the history of a set of samples contains frequent homoplasies or recombination events, quasi-median networks will have a complex topology. Graph reduction or pruning methods have been used to reduce network complexity, but some of these methods are inapplicable to datasets in which recombination has occurred and others are procedurally complex and/or result in disconnected networks.
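
The median-network construction described above can be illustrated with a minimal sketch. It assumes sequences are already encoded as equal-length binary strings; the function names and the example data are hypothetical and are not taken from the paper. It links observed sequences that differ at exactly one character and computes the position-wise majority (the median) of three sequences, which is how a hypothetical ancestral node can be proposed.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def one_step_edges(sequences):
    """Edges between distinct sequences that differ at exactly one character."""
    unique = sorted(set(sequences))
    return [(a, b) for a, b in combinations(unique, 2) if hamming(a, b) == 1]


def median(a, b, c):
    """Position-wise majority of three binary strings (a candidate ancestral node)."""
    return "".join(
        "1" if (x, y, z).count("1") >= 2 else "0" for x, y, z in zip(a, b, c)
    )


# Hypothetical observed sequences, for illustration only.
observed = ["000", "100", "110", "011"]
edges = one_step_edges(observed)
ancestor = median("000", "110", "011")
```

In this toy data set "011" has no one-step neighbour among the observed sequences, so the graph is disconnected until the median "010" is added as a hypothetical node; it lies one step from "000", "110" and "011".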

Results

We address the problems inherent in construction and reduction of quasi-median networks. We describe a novel method of generating quasi-median networks that uses all characters, both binary and multistate, without imposing an arbitrary ordering of the multistate partitions. We also describe a pruning mechanism which maintains at least one shortest path between observed sequences, displaying the underlying relations between all pairs of sequences while maintaining a connected graph.

Conclusion

Application of this approach to 5S rDNA sequence data from sea beet produced a pruned network within which genetic isolation between populations by distance was evident, demonstrating the value of this approach for exploration of evolutionary relationships.

4.

Background

We analyze phylogenetic tree building methods from molecular sequences (PTMS). These are methods which base their construction solely on sequences, coding DNA or amino acids.

Results

Our first result is a statistically significant evaluation of 176 PTMSs done by comparing trees derived from 193138 orthologous groups of proteins using a new measure of quality between trees. This new measure, called the Intra measure, is very consistent between different groups of species and strong in the sense that it separates the methods with high confidence. The second result is the comparison of the trees against trees derived from accepted taxonomies, the Taxon measure. We consider the NCBI taxonomic classification and their derived topologies as the most accepted biological consensus on phylogenies, which are also available in electronic form. The correlation between the two measures is remarkably high, which supports both measures simultaneously.

Conclusions

The big surprise of the evaluation is that the maximum likelihood methods do not score well; minimal evolution distance methods over MSA-induced alignments score consistently better. This comparison also allows us to rank different components of the tree building methods, such as MSAs, substitution matrices, ML tree builders and distance methods. It is also clear that there is a difference between Metazoa and the rest, which points to evolution leaving different molecular traces. We also think that these measures of tree quality will motivate the design of new PTMSs, as it is now easier to evaluate them with certainty.

5.

Background

When inferring phylogenetic trees different algorithms may give different trees. To study such effects a measure for the distance between two trees is useful. Quartet distance is one such measure, and is the number of quartet topologies that differ between two trees.
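
To make the definition concrete, here is a naive sketch that counts differing quartet topologies by brute force. It is not the sub-cubic algorithm of the paper: it enumerates every set of four leaves, so it is only practical for small trees. The tree representation (an adjacency dictionary built from an edge list) and all names are assumptions made for illustration. The topology of each quartet is read from path lengths: the pairing with the strictly smallest distance sum is the resolved split, and three equal sums indicate an unresolved (star) quartet.

```python
from collections import deque
from itertools import combinations


def from_edges(edges):
    """Build an undirected adjacency dictionary from (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def leaf_distances(adj, leaves):
    """Breadth-first search from each leaf; dist[a][b] is the edge count a..b."""
    dist = {}
    for src in leaves:
        seen = {src: 0}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in seen:
                    seen[nxt] = seen[node] + 1
                    queue.append(nxt)
        dist[src] = seen
    return dist


def quartet_topology(dist, a, b, c, d):
    """Return 'ab|cd', 'ac|bd', 'ad|bc' or 'star' for the four leaves."""
    sums = {
        "ab|cd": dist[a][b] + dist[c][d],
        "ac|bd": dist[a][c] + dist[b][d],
        "ad|bc": dist[a][d] + dist[b][c],
    }
    if len(set(sums.values())) == 1:
        return "star"  # unresolved quartet
    return min(sums, key=sums.get)


def quartet_distance(adj1, adj2, leaves):
    """Number of leaf quartets whose topology differs between the two trees."""
    d1 = leaf_distances(adj1, leaves)
    d2 = leaf_distances(adj2, leaves)
    return sum(
        1
        for q in combinations(sorted(leaves), 4)
        if quartet_topology(d1, *q) != quartet_topology(d2, *q)
    )


# Hypothetical five-leaf trees; x, y, z are inner nodes. t2 swaps leaves B and C.
leaves = ["A", "B", "C", "D", "E"]
t1 = from_edges([("x", "A"), ("x", "B"), ("x", "y"), ("y", "C"),
                 ("y", "z"), ("z", "D"), ("z", "E")])
t2 = from_edges([("x", "A"), ("x", "C"), ("x", "y"), ("y", "B"),
                 ("y", "z"), ("z", "D"), ("z", "E")])
star = from_edges([("s", leaf) for leaf in leaves])
```

With these example trees, `quartet_distance(t1, t2, leaves)` counts the two quartets that contain A, B and C together, and comparing `t1` with the star tree counts all five quartets, because every quartet is unresolved in the star.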

Results

We have derived a new algorithm for computing the quartet distance between a pair of general trees, i.e. trees where inner nodes can have any degree ≥ 3. The time and space complexity of our algorithm is sub-cubic in the number of leaves and does not depend on the degree of the inner nodes. This makes it the fastest algorithm so far for computing the quartet distance between general trees independent of the degree of the inner nodes.

Conclusions

We have implemented our algorithm and two of the best competitors. Our new algorithm is significantly faster than the competition and seems to run in close to quadratic time in practice.

6.

Background

Most phylogenetic studies using molecular data treat gaps in multiple sequence alignments as missing data or even completely exclude alignment columns that contain gaps.

Results

Here we show that gap patterns in large-scale, genome-wide alignments are themselves phylogenetically informative and can be used to infer reliable phylogenies provided the gap data are properly filtered to reduce noise introduced by the alignment method. We introduce here the notion of split-inducing indels (splids) that define an approximate bipartition of the taxon set. We show both in simulated data and in case studies on real-life data that splids can be efficiently extracted from phylogenomic data sets.

Conclusions

Suitably processed gap patterns extracted from genome-wide alignments provide a surprisingly clear phylogenetic signal and allow the inference of accurate phylogenetic trees.

7.
TreeSnatcher is a GUI-driven Java application for the semi-automatic recognition of multifurcating phylogenetic trees in pixel images. The program accepts an image file as input and analyzes the topology and the metrics of the tree depicted. The analysis is carried out in a multiple-stage process using algorithms from image analysis. In the end, TreeSnatcher produces a Newick tree code that represents the tree structure, optionally including branch lengths. TreeSnatcher can process trees with 100 leaves or more in a few seconds. AVAILABILITY: TreeSnatcher was developed in Java under Mac OS X and is executable on UNIX/Linux, Windows and Mac OS X systems. The application and its documentation can be freely downloaded from http://www.cibiv.at/software/treesnatcher.

8.

Background

The MatrixMatchMaker algorithm was recently introduced to detect the similarity between phylogenetic trees and thus the coevolution between proteins. MMM finds the largest common submatrices between pairs of phylogenetic distance matrices, and has numerous advantages over existing methods of coevolution detection. However, these advantages came at the cost of a very long execution time.

Results

In this paper, we show that the problem of finding the maximum submatrix reduces to a multiple maximum clique subproblem on a graph of protein pairs. This allowed us to develop a new algorithm and program implementation, MMMvII, which achieved more than 600× speedup with comparable accuracy to the original MMM.

Conclusions

MMMvII will thus allow for more extensive and intricate analyses of coevolution.

Availability

An implementation of the MMMvII algorithm is available at: http://www.uhnresearch.ca/labs/tillier/MMMWEBvII/MMMWEBvII.php

9.

Aims

To assess whether the yew roots, which are able to provide a very constant environment due to their long life-span, can maintain the original arbuscular mycorrhizal (AM) fungal community during yew population decline.

Methods

The diversity of AM fungi (AMF) colonizing the roots of yew was analyzed by selecting the small subunit ribosomal RNA genes to construct a database of the overall community of AMF in the experimental area. A terminal restriction fragment length polymorphism (TRFLP) approach was used to identify the AMF communities present in yew roots. Physiological and environmental variables related to topology and soil and plant characteristics were determined as markers of habitat degradation.

Results

The AMF communities within yew roots were found to be dependent on soil, plant and topological variables indicative of habitat degradation surrounding the yew. The phylogenetic diversity of AMF associated with the yews was lower in habitats more exposed to degradation than in those better conserved.

Conclusions

The target yews can be grouped into two degradation levels. AMF communities were also affected by the degradation processes affecting their hosts. This finding rules out the role of these trees as refugia for their original AMF community, a fact that should be considered in plant reintroduction programs using AMF as bioenhancers.

10.
MrBayes is a program that uses a Bayesian framework for inferring phylogenetic relationships. As MrBayes is a command-line-driven program, users accustomed to programs with graphical user interfaces will not find it easy to operate, especially as it requires a complex input format for the data to be analysed. We thus developed siMBa (simple MrBayes), a simple graphical user interface for MrBayes. This tool gives the user interactive control over most of the parameters and also facilitates the input of a multiple sequence alignment, as any widely used format can be used. siMBa is coded in Perl using the Tk module. Executables are provided for Windows, Linux, and Macintosh. The Perl code, along with executables for the different operating systems, is freely available to download from [http://www.thines-lab.senckenberg.de/simba].

11.
12.

Background  

Distance matrix methods constitute a major family of phylogenetic estimation methods, and the minimum evolution (ME) principle (aiming at recovering the phylogeny with shortest length) is one of the most commonly used optimality criteria for estimating phylogenetic trees. The major difficulty for its application is that the number of possible phylogenies grows exponentially with the number of taxa analyzed and the minimum evolution principle is known to belong to the NP-hard class of problems.
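
The growth in the number of candidate phylogenies can be made tangible with a short sketch. It relies on the standard counting result, not stated in the abstract itself, that n labelled taxa admit (2n-5)!! = 3 × 5 × ... × (2n-5) distinct unrooted binary trees for n ≥ 3. The function name is hypothetical.

```python
def count_unrooted_binary_trees(n_taxa):
    """Number of unrooted binary tree topologies on n_taxa labelled leaves."""
    if n_taxa < 3:
        raise ValueError("at least three taxa are needed")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5 (empty product when n = 3).
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


# Print how quickly the search space grows for a few taxon counts.
for n in (4, 5, 10, 20):
    print(n, count_unrooted_binary_trees(n))
```

Even at ten taxa the count already exceeds two million, which is why exhaustive evaluation of an optimality criterion such as minimum evolution quickly becomes infeasible and heuristic searches are used instead.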

13.
14.

Background

Although it has proven to be an important foundation for investigations of carnivoran ecology, biology and evolution, the complete species-level supertree for Carnivora of Bininda-Emonds et al. is showing its age. Additional, largely molecular sequence data are now available for many species and the advancement of computer technology means that many of the limitations of the original analysis can now be avoided. We therefore sought to provide an updated estimate of the phylogenetic relationships within all extant Carnivora, again using supertree analysis to be able to analyze as much of the global phylogenetic database for the group as possible.

Results

In total, 188 source trees were combined, representing 114 trees from the literature together with 74 newly constructed gene trees derived from nearly 45,000 bp of sequence data from GenBank. The greater availability of sequence data means that the new supertree is almost completely resolved and also better reflects current phylogenetic opinion (for example, supporting a monophyletic Mephitidae, Eupleridae and Prionodontidae; placing Nandinia binotata as sister to the remaining Feliformia). Following an initial rapid radiation, diversification rate analyses indicate a downturn in the net speciation rate within the past three million years as well as a possible increase some 18.0 million years ago; numerous diversification rate shifts within the order were also identified.

Conclusions

Together, the two carnivore supertrees remain the only complete phylogenetic estimates for all extant species and the new supertree, like the old one, will form a key tool in helping us to further understand the biology of this charismatic group of carnivores.

15.

Background

The majority of experimentally verified molecular interaction and biological pathway data are present in the unstructured text of biomedical journal articles where they are inaccessible to computational methods. The Biomolecular interaction network database (BIND) seeks to capture these data in a machine-readable format. We hypothesized that the formidable task-size of backfilling the database could be reduced by using Support Vector Machine technology to first locate interaction information in the literature. We present an information extraction system that was designed to locate protein-protein interaction data in the literature and present these data to curators and the public for review and entry into BIND.

Results

Cross-validation estimated the support vector machine's test-set precision, accuracy and recall for classifying abstracts describing interaction information at 92%, 90% and 92%, respectively. We estimated that the system would be able to recall up to 60% of all non-high-throughput interactions present in another yeast protein interaction database. Finally, this system was applied to a real-world curation problem, and its use was found to reduce the task duration by 70%, thus saving 176 days.
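
For readers less familiar with the three reported measures, the sketch below shows how precision, recall and accuracy are derived from a binary confusion matrix. The counts are hypothetical, chosen only so that the ratios are easy to follow; the abstract does not give the study's actual confusion matrix, and the function name is an assumption.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (precision, recall, accuracy) from confusion-matrix counts."""
    precision = tp / (tp + fp)   # share of flagged abstracts that are relevant
    recall = tp / (tp + fn)      # share of relevant abstracts that were flagged
    accuracy = (tp + tn) / (tp + fp + tn + fn)  # share of all calls that are right
    return precision, recall, accuracy


# Hypothetical counts, not data from the study.
precision, recall, accuracy = classification_metrics(tp=92, fp=8, tn=52, fn=8)
print(f"precision={precision:.2f} recall={recall:.2f} accuracy={accuracy:.2f}")
```

Accuracy can differ from precision and recall because it also credits true negatives, so the balance between relevant and irrelevant abstracts in the test set affects it.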

Conclusions

Machine learning methods are useful as tools to direct interaction and pathway database back-filling; however, this potential can only be realized if these techniques are coupled with human review and entry into a factual database such as BIND. The PreBIND system described here is available to the public at http://bind.ca. Current capabilities allow searching for human, mouse and yeast protein-interaction information.

16.
17.

Background

Most clinical trial publications include figures, but there is little guidance on what results should be displayed as figures and how.

Purpose

To evaluate the current use of figures in trial reports, and to make constructive suggestions for future practice.

Methods

We surveyed all 77 reports of randomised controlled trials in five general medical journals during November 2006 to January 2007. The numbers and types of figures were determined, and then each figure was assessed for its style, content, clarity and suitability. Guidelines were then developed for presenting figures, both in general and for each specific common type of figure.

Results

Most trial reports contained one to three figures (mean 2.3 per article). The four main types were the flow diagram, Kaplan-Meier plot, forest plot (for subgroup analyses) and repeated measures over time: these accounted for 92% of all figures published. For each type of figure there is considerable diversity of practice in both style and content, which we illustrate with selected examples of both good and bad practice. Some pointers on what to do, and what to avoid, are derived from our critical evaluation of these articles' use of figures.

Conclusion

There is considerable scope for authors to improve their use of figures in clinical trial reports, as regards which figures to choose, their style of presentation and labelling, and their specific content. Particular improvements are needed for the four main types of figures commonly used.

18.

Background

Most studies inferring species phylogenies use sequences from single copy genes or sets of orthologs culled from gene families. For taxa such as plants, with very high levels of gene duplication in their nuclear genomes, this has limited the exploitation of nuclear sequences for phylogenetic studies, such as those available in large EST libraries. One rarely used method of inference, gene tree parsimony, can infer species trees from gene families undergoing duplication and loss, but its performance has not been evaluated at a phylogenomic scale for EST data in plants.

Results

A gene tree parsimony analysis based on EST data was undertaken for six angiosperm model species and Pinus, an outgroup. Although a large fraction of the tentative consensus sequences obtained from the TIGR database of ESTs was assembled into homologous clusters too small to be phylogenetically informative, some 557 clusters contained promising levels of information. Based on maximum likelihood estimates of the gene trees obtained from these clusters, gene tree parsimony correctly inferred the accepted species tree with strong statistical support. A slight variant of this species tree was obtained when maximum parsimony was used to infer the individual gene trees instead.

Conclusion

Despite the complexity of the EST data and the relatively small fraction eventually used in inferring a species tree, the gene tree parsimony method performed well in the face of very high apparent rates of duplication.

19.

Background

Reconstruction of evolutionary history of bacteriophages is a difficult problem because of fast sequence drift and lack of omnipresent genes in phage genomes. Moreover, losses and recombinational exchanges of genes are so pervasive in phages that the plausibility of phylogenetic inference in phage kingdom has been questioned.

Results

We compiled the profiles of presence and absence of 803 orthologous genes in 158 completely sequenced phages with double-stranded DNA genomes and used these gene content vectors to infer the evolutionary history of phages. There were 18 well-supported clades, mostly corresponding to accepted genera, but in some cases appearing to define new taxonomic groups. Conflicts between this phylogeny and trees constructed from sequence alignments of phage proteins were exploited to infer 294 specific acts of intergenome gene transfer.
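
A gene content vector can be compared between genomes with a simple set-based distance. The sketch below uses the Jaccard distance purely as an assumption for illustration: the abstract does not say which distance or tree-building method the authors applied. The phage and gene names are hypothetical.

```python
def jaccard_distance(genes_a, genes_b):
    """1 - |shared genes| / |genes present in either genome|."""
    a, b = set(genes_a), set(genes_b)
    union = a | b
    if not union:
        return 0.0  # two empty profiles are treated as identical
    return 1.0 - len(a & b) / len(union)


def distance_matrix(profiles):
    """Pairwise distances for a dict mapping genome name -> set of gene ids."""
    names = sorted(profiles)
    matrix = [
        [jaccard_distance(profiles[r], profiles[c]) for c in names] for r in names
    ]
    return names, matrix


# Hypothetical presence/absence profiles, not data from the study.
profiles = {
    "phage_1": {"g1", "g2", "g3"},
    "phage_2": {"g2", "g3", "g4"},
    "phage_3": {"g5"},
}
names, matrix = distance_matrix(profiles)
```

A matrix of this kind could then be passed to any distance-based tree-building method; genomes sharing no orthologous genes, such as `phage_3` here, sit at the maximum distance of 1.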

Conclusion

A notoriously reticulate evolutionary history of fast-evolving phages can be reconstructed in considerable detail by quantitative comparative genomics.

Open peer review

This article was reviewed by Eugene Koonin, Nicholas Galtier and Martijn Huynen.

20.
IJ_Rhizo: an open-source software to measure scanned images of root samples

Background and aims

This paper provides an overview of the measuring capabilities of IJ_Rhizo, an ImageJ macro that measures scanned images of washed root samples. IJ_Rhizo is open-source, platform-independent and offers a simple graphic user interface (GUI) for a main audience of non-programmer scientists. Being open-source based, it is also fully modifiable to accommodate the specific needs of the more computer-literate users. A comparison of IJ_Rhizo’s performance with that of the widely used commercial package WinRHIZO™ is discussed.

Methods

We compared IJ_Rhizo’s performance with that of the commercial package WinRHIZO™ using two sets of images, one comprising test-line images, the second consisting of images of root samples collected in the field. IJ_Rhizo and WinRHIZO™ estimates were compared by means of correlation and regression analysis.

Results

IJ_Rhizo “Kimura” and WinRHIZO™ “Tennant” were the length estimates that were best linearly correlated with each other. Correlation between average root diameter estimates was weaker, due to the sensitivity of this parameter to thresholding and filtering of image background noise.

Conclusions

Overall, IJ_Rhizo offers new opportunities for researchers who cannot afford the cost of commercial software packages to carry out automated measurement of scanned images of root samples, without sacrificing accuracy.
