Similar articles
20 similar articles found.
1.
Distance-based methods are popular for reconstructing evolutionary trees of protein sequences, mainly because of their speed and generality. A number of variants of the classical neighbor-joining (NJ) algorithm have been proposed, as well as a number of methods to estimate protein distances. We here present a large-scale assessment of performance in reconstructing the correct tree topology for the most popular algorithms. The programs BIONJ, FastME, Weighbor, and standard NJ were run using 12 distance estimators, producing 48 tree-building/distance estimation method combinations. These were evaluated on a test set based on real trees taken from 100 Pfam families. Each tree was used to generate multiple sequence alignments with the ROSE program using three evolutionary models. The accuracy of each method was analyzed as a function of both sequence divergence and location in the tree. We found that BIONJ produced the overall best results, although the average accuracy differed little between the tree-building methods (normally less than 1%). A noticeable trend was that FastME performed worse than the rest on long branches. Weighbor was several orders of magnitude slower than the other programs. Larger differences were observed when using different distance estimators. Protein-adapted Jukes-Cantor and Kimura distance correction produced clearly poorer results than the other methods, even worse than uncorrected distances. We also assessed the recently developed Scoredist measure, which performed as well as more complex methods.
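To make the distance estimators named in this abstract concrete, the following is a minimal sketch of an uncorrected distance and two textbook corrections. It assumes the commonly cited forms: the Jukes-Cantor formula with 20 equally likely states and Kimura's empirical formula. The function names are illustrative, and the exact implementations evaluated in the paper are not given here.

```python
import math


def p_distance(seq_a, seq_b):
    """Uncorrected distance: fraction of gap-free aligned columns that differ."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable columns")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor_protein(p):
    """Jukes-Cantor correction assuming 20 equally likely amino acid states."""
    arg = 1.0 - (20.0 / 19.0) * p
    if arg <= 0.0:
        return math.inf  # sequences are saturated; the distance is undefined
    return -(19.0 / 20.0) * math.log(arg)


def kimura_protein(p):
    """Kimura's empirical correction: d = -ln(1 - p - 0.2 * p^2)."""
    arg = 1.0 - p - 0.2 * p * p
    if arg <= 0.0:
        return math.inf  # outside the range where the formula applies
    return -math.log(arg)
```

Both corrections return a value larger than the raw `p_distance` for any nonzero `p`, because they try to account for multiple substitutions at the same site.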

2.
Shaw G. BioTechniques 2000, 28(6): 1198-1201
Biologists today make extensive use of word processing programs for the production of research reports, literature reviews and grant proposals. Frequently, such programs become the default platform for viewing and the later publication of protein and nucleic acid sequence data. Thus, researchers often switch between their word processor and more specialized programs designed to analyze protein and nucleic acid sequences. It would be more convenient to perform these simple sequence analyses using the word processor without switching to another program. The focus here is on the use of the Visual Basic programming language, which is built into all recent versions of Microsoft Word, to generate surprisingly complex and useful macros that can conveniently analyze several important features of protein and nucleic acid sequences. The standard Word interface can also be easily modified to display and run these macros from a pull-down menu. Several examples of this approach are provided.

3.
Multiple sequence alignment with the Clustal series of programs (total citations: 2; self-citations: 0; citations by others: 2)
The Clustal series of programs are widely used in molecular biology for the multiple alignment of both nucleic acid and protein sequences and for preparing phylogenetic trees. The popularity of the programs depends on a number of factors, including not only the accuracy of the results, but also the robustness, portability and user-friendliness of the programs. New features include NEXUS and FASTA format output, printing range numbers and faster tree calculation. Although Clustal was originally developed to run on a local computer, numerous Web servers have been set up, notably at the EBI (European Bioinformatics Institute) (http://www.ebi.ac.uk/clustalw/).

4.
The design of synthetic genes (total citations: 1; self-citations: 1; citations by others: 0)
Computer programs are described that aid in the design of synthetic genes coding for proteins that are targets of a research program in site-directed mutagenesis. These programs "reverse-translate" protein sequences into general nucleic acid sequences (those where codons have not yet been selected), map restriction sites into general DNA sequences, identify points in the synthetic gene where unique restriction sites can be introduced, and assist in the design of genes coding for hybrids and evolutionary intermediates between homologous proteins. Application of these programs therefore facilitates the use of modular mutagenesis to create variants of proteins, and the implementation of evolutionary guidance as a strategy for selecting mutants.
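The "reverse-translation" step described in this abstract can be illustrated with a short sketch. It assumes the standard genetic code and represents a "general" sequence simply as the list of candidate codons at each position; the names `CODONS_FOR`, `reverse_translate`, and `count_encodings` are hypothetical and do not come from the programs the abstract describes.

```python
from itertools import product
from math import prod

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each amino acid (or '*' for stop) to the list of codons that encode it.
CODONS_FOR = {}
for codon, aa in zip(("".join(c) for c in product(BASES, repeat=3)), AMINO):
    CODONS_FOR.setdefault(aa, []).append(codon)


def reverse_translate(protein):
    """Return, for each residue, every codon that could encode it."""
    return [CODONS_FOR[aa] for aa in protein.upper()]


def count_encodings(protein):
    """Number of distinct DNA sequences that encode the protein."""
    return prod(len(codons) for codons in reverse_translate(protein))
```

Keeping every candidate codon, rather than choosing one early, is what leaves room for later steps such as introducing a restriction site without changing the encoded protein.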

5.
The adequacy of various phenetic and phylogenetic estimation methods was evaluated using simulated data sets. Two parsimony programs were used to construct maximum parsimony trees (WAGNER 78 and HENNIG 86). The CAFCA program was used to perform group-compatibility analysis. Four UPGMA clustering strategies were employed. The simulation model GENESIS was used to generate data sets under different evolutionary conditions. The effects of input parameters and tree properties on the accuracy of the estimated trees were evaluated. UPGMA based on product moment correlations of unstandardized characters appeared to perform best, under all evolutionary conditions tested. The effect of input parameters on the accuracy was not very significant. Among the tree statistics the stemminess of the true tree appeared to be the most important estimator of accuracy.

6.
This study describes novel algorithms for searching for most parsimonious trees. These algorithms are implemented as a parsimony computer program, PARSIGAL, which performs well even with difficult data sets. For high level search, PARSIGAL uses an evolutionary optimization algorithm, which feeds good tree candidates to a branch-swapping local search procedure. This study also describes an extremely fast method of recomputing state sets for binary characters (additive or nonadditive characters with two states), based on packing 32 characters into a single memory word and recomputing the tree simultaneously for all 32 characters using fast bitwise logical operations. The operational principles of PARSIGAL are quite different from those previously published for other parsimony computer programs. Hence it is conceivable that PARSIGAL may be able to locate islands of trees that are different from those that are easily located with existing parsimony computer programs.
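The bit-packing idea in this abstract can be sketched with Fitch-style state sets. This is an illustration under stated assumptions rather than PARSIGAL's actual code: each node stores two bit masks (`has0`, `has1`), one bit per binary character, and trees are nested tuples of leaf names. All names here are hypothetical.

```python
WORD_BITS = 32  # number of binary characters packed into one word


def pack_leaf(states):
    """Pack up to 32 binary states (0 or 1) into the masks (has0, has1)."""
    if len(states) > WORD_BITS:
        raise ValueError("too many characters for one word")
    has0 = has1 = 0
    for i, state in enumerate(states):
        if state == 0:
            has0 |= 1 << i
        elif state == 1:
            has1 |= 1 << i
        else:
            raise ValueError("states must be 0 or 1")
    return has0, has1


def fitch_join(left, right, used_mask):
    """Combine two child state sets for all packed characters at once."""
    l0, l1 = left
    r0, r1 = right
    i0, i1 = l0 & r0, l1 & r1          # per-character intersection
    empty = ~(i0 | i1) & used_mask     # characters whose intersection is empty
    n0 = i0 | (empty & (l0 | r0))      # fall back to the union where empty
    n1 = i1 | (empty & (l1 | r1))
    return (n0, n1), bin(empty).count("1")  # each empty intersection costs a step


def fitch_length(tree, leaves, n_chars):
    """Parsimony length of a rooted binary tree given packed leaf states."""
    used_mask = (1 << n_chars) - 1

    def visit(node):
        if isinstance(node, str):
            return leaves[node], 0
        (ls, lc), (rs, rc) = visit(node[0]), visit(node[1])
        state, steps = fitch_join(ls, rs, used_mask)
        return state, lc + rc + steps

    return visit(tree)[1]
```

For example, with leaves `A=[0,0,1]`, `B=[0,1,1]`, `C=[1,1,0]`, `D=[1,0,0]`, the tree `(("A", "B"), ("C", "D"))` needs 4 steps and `(("A", "D"), ("B", "C"))` needs 5, each computed with a handful of bitwise operations per internal node instead of a loop over characters.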

7.
Museums play a vitally important role in supporting both informal and formal education and are important venues for fostering public understanding of evolution. The Yale Peabody Museum has implemented significant education programs on evolution for many decades, mostly focused on the museum’s extensive collections that represent the past and present tree of life. Twelve years ago, the Peabody began a series of new programs that explored biodiversity and evolution as they relate to human health. Modern evolutionary theory contributes significantly to our understanding of health and disease, and medical topics provide many excellent and relevant examples to explore evolutionary concepts. The Peabody developed a program on vector-borne diseases, specifically Lyme disease and West Nile virus, which have become endemic in the United States. Both of these diseases have complex transmission cycles involving an intricate interplay among the pathogen, host, and vector, each of which is subject to differing evolutionary pressures. Using these stories, the museum explored evolutionary concepts of adaptation (e.g., the evolution of blood feeding), coevolution (e.g., the “arms race” between host and vector), and variation and selection (e.g., antibiotic resistance) among others. The project included a temporary exhibition and the development of curriculum materials for middle and high school teachers and students. The popularity of the exhibit and some formal evaluation of student participants suggested that this educational approach has significant potential to engage wide audiences in evolutionary issues. In addition, it demonstrated how natural history museums can incorporate evolution into a broad array of programs.

8.
9.
Summary: TOPALi v2 simplifies and automates the use of several methods for the evolutionary analysis of multiple sequence alignments. Jobs are submitted from a Java graphical user interface as TOPALi web services to either run remotely on high-performance computing clusters or locally (with multiple cores supported). Methods available include model selection and phylogenetic tree estimation using the Bayesian inference and maximum likelihood (ML) approaches, in addition to recombination detection methods. The optimal substitution model can be selected for protein or nucleic acid (standard, or protein-coding using a codon position model) data using accurate statistical criteria derived from ML co-estimation of the tree and the substitution model. Phylogenetic software available includes PhyML, RAxML and MrBayes. Availability: Freely downloadable from http://www.topali.org for Windows, Mac OS X, Linux and Solaris. Contact: iain.milne{at}scri.ac.uk Associate Editor: Martin Bishop

10.
MOTIVATION: The programs currently available for the analysis of nucleic acid and protein sequences suffer from a variety of problems: Web-based programs often require inconvenient reformatting of sequences when proceeding from one analysis to the next, and commercial console-based programs are cost-prohibitive. Here, we report the development of DNASSIST, an inexpensive, multiple-document interface program for the fully integrated editing and analysis of nucleic acid and protein sequences in the familiar environment of Microsoft Windows.

11.
The method of evolutionary parsimony--or operator invariants--is a technique of nucleic acid sequence analysis related to parsimony analysis and explicitly designed for determining evolutionary relationships among four distantly related taxa. The method is independent of substitution rates because it is derived from consideration of the group properties of substitution operators rather than from an analysis of the probabilities of substitution in branches of a tree. In both parsimony and evolutionary parsimony, three patterns of nucleotide substitution are associated one-to-one with the three topologically linked trees for four taxa. In evolutionary parsimony, the three quantities are operator invariants. These invariants are the remnants of substitutions that have occurred in the interior branch of the tree and are analogous to the substitutions assigned to the central branch by parsimony. The two invariants associated with the incorrect trees must equal zero (statistically), whereas only the correct tree can have a nonzero invariant. The χ² test is used to ascertain the nonzero invariant and the statistically favored tree. Examples, obtained using data calculated with evolutionary rates and branchings designed to camouflage the true tree, show that the method accurately predicts the tree, even when substitution rates differ greatly in neighboring peripheral branches (conditions under which parsimony will consistently fail). As the number of substitutions in peripheral branches becomes fewer, the parsimony and the evolutionary-parsimony solutions converge. The method is robust and easy to use.

12.

Background  

Research in evolution requires software for visualizing and editing phylogenetic trees for increasingly large datasets, such as those arising in expression analysis or metagenomics. It would be desirable to have a program that provides these services in an efficient and user-friendly way, and that can be easily installed and run on all major operating systems. Although a large number of tree visualization tools are freely available, some as a part of more comprehensive analysis packages, all have drawbacks in one or more domains. They either lack some of the standard tree visualization techniques or basic graphics and editing features, or they are restricted to small trees containing only tens of thousands of taxa. Moreover, many programs are difficult to install or are not available for all common operating systems.

13.
14.
Efficient primer design algorithms (total citations: 5; self-citations: 0; citations by others: 5)
MOTIVATION: Primer design involves various parameters such as string-based alignment scores, melting temperature, primer length and GC content. This entails a design approach from multicriteria decision making. Values of some of the criteria are easy to compute while others require intense calculations. RESULTS: The reference point method was found to be tractable for trading-off between deviations from ideal values of all the criteria. Some criteria computations are based on dynamic programs with value iteration whose run time can be bounded by a low-degree polynomial. For designing standard PCR primers, the scheme offers a relative gain in computing speed of up to 50:1 over ad-hoc computational methods. Single PCR primer pairs have been used as model systems in order to simplify the quantization of the computational acceleration factors. The program has been structured so as to facilitate the analysis of large numbers of primer pairs with minor modifications. The scheme significantly increases primer design throughput which in turn facilitates the use of oligonucleotides in a wide range of applications including multiplex PCR and other nucleic acid-based amplification systems, as well as in zip code targeting, oligonucleotide microarrays and nucleic acid-based nanoengineering.
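Two of the easy-to-compute criteria mentioned in this abstract, GC content and melting temperature, can be sketched together with a simple reference-point score. Several assumptions apply: the melting temperature uses the Wallace rule of thumb (2 °C per A/T, 4 °C per G/C), which is only a rough estimate for short oligonucleotides, and the score is a Chebyshev-style maximum weighted deviation chosen for illustration. Neither is confirmed as the paper's own model, and all function names are hypothetical.

```python
def gc_content(primer):
    """Fraction of bases in the primer that are G or C."""
    primer = primer.upper()
    if not primer:
        raise ValueError("empty primer")
    return sum(base in "GC" for base in primer) / len(primer)


def wallace_tm(primer):
    """Rough melting temperature in degrees Celsius (Wallace rule)."""
    primer = primer.upper()
    at = sum(base in "AT" for base in primer)
    gc = sum(base in "GC" for base in primer)
    return 2 * at + 4 * gc


def reference_point_score(values, ideals, weights):
    """Largest weighted deviation from the ideal point; lower is better."""
    return max(weights[k] * abs(values[k] - ideals[k]) for k in ideals)


def score_primer(primer, ideals, weights):
    """Score one candidate primer against the ideal criteria values."""
    values = {
        "tm": wallace_tm(primer),
        "gc": gc_content(primer),
        "length": len(primer),
    }
    return reference_point_score(values, ideals, weights)
```

Ranking candidates by `score_primer` prefers primers whose worst criterion is still close to its ideal value, which is one way to trade off deviations across all criteria at once.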

15.
Many phylogenetic inference programs are available to infer evolutionary relationships among taxa using aligned sequences of characters, typically DNA or amino acids. These programs are often used to infer the evolutionary history of species. However, in most cases it is impossible to systematically verify the correctness of the tree returned by these programs, as the correct evolutionary history is generally unknown and unknowable. In addition, it is nearly impossible to verify whether any non-trivial tree is correct in accordance with the specification of the often complicated search and scoring algorithms. This difficulty is known as the oracle problem of software testing: there is no oracle that we can use to verify the correctness of the returned tree. This makes it very challenging to test the correctness of any phylogenetic inference program. Here, we demonstrate how to apply a simple software testing technique, called Metamorphic Testing, to alleviate the oracle problem in testing phylogenetic inference programs. We have used both real and randomly generated test inputs to evaluate the effectiveness of metamorphic testing, and found that metamorphic testing can detect failures effectively in faulty phylogenetic inference programs with both types of test inputs.
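The idea of metamorphic testing can be sketched without a real inference program. Instead of checking a tree against an unknown correct answer, one checks that related inputs give related outputs. The sketch below assumes two illustrative relations (reordering taxa and reordering alignment columns should not change the inferred clades) and uses a toy single-linkage clustering as a stand-in for the program under test; none of these names or relations are taken from the paper.

```python
import random


def hamming(a, b):
    """Number of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def toy_infer_clades(alignment):
    """Toy stand-in for an inference program: single-linkage clustering."""
    clusters = [frozenset([name]) for name in alignment]
    clades = set()
    while len(clusters) > 1:
        candidates = []
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(hamming(alignment[a], alignment[b])
                        for a in clusters[i] for b in clusters[j])
                # Break ties by taxon names so the result ignores input order.
                candidates.append((d, sorted(clusters[i] | clusters[j]), i, j))
        _, _, i, j = min(candidates)
        merged = clusters[i] | clusters[j]
        clades.add(merged)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clades


def permute_taxa(alignment, rng):
    """Follow-up input: same sequences, taxa listed in a different order."""
    names = list(alignment)
    rng.shuffle(names)
    return {name: alignment[name] for name in names}


def permute_columns(alignment, rng):
    """Follow-up input: the same column permutation applied to every sequence."""
    length = len(next(iter(alignment.values())))
    order = list(range(length))
    rng.shuffle(order)
    return {n: "".join(s[k] for k in order) for n, s in alignment.items()}


def check_relations(infer, alignment, trials=20, seed=0):
    """Return True if no metamorphic relation is violated in any trial."""
    rng = random.Random(seed)
    expected = infer(alignment)
    for _ in range(trials):
        if infer(permute_taxa(alignment, rng)) != expected:
            return False
        if infer(permute_columns(alignment, rng)) != expected:
            return False
    return True
```

A faulty program that, for instance, always groups the first two taxa it reads would give different clades for a reordered input, so the taxon-order relation exposes the fault even though the true tree is never consulted.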

16.
Multilocus genomic data sets can be used to infer a rich set of information about the evolutionary history of a lineage, including gene trees, species trees, and phylogenetic networks. However, user‐friendly tools to run such integrated analyses are lacking, and workflows often require tedious reformatting and handling time to shepherd data through a series of individual programs. Here, we present a tool written in Python, TREEasy, that performs automated sequence alignment (with MAFFT), gene tree inference (with IQ‐Tree), species tree inference from concatenated data (with IQ‐Tree and RaxML‐NG), species tree inference from gene trees (with ASTRAL, MP‐EST, and STELLS2), and phylogenetic network inference (with SNaQ and PhyloNet). The tool only requires FASTA files and nine parameters as inputs. The tool can be run from the command line or through a Graphical User Interface (GUI). As examples, we reproduced a recent analysis of staghorn coral evolution, and performed a new analysis on the evolution of the “WGD clade” of yeast. The latter revealed novel patterns that were not identified by previous analyses. TREEasy represents a reliable and simple tool to accelerate research in systematic biology ( https://github.com/MaoYafei/TREEasy ).

17.
Each amino acid in a protein is considered to be an individual, mutable characteristic of the species from which the protein is extracted. For a branching tree representing the evolutionary history of the known sequences in different species, our computer programs use majority logic and parsimony of mutations to determine the most likely ancestral amino acid for each position of the protein at each node of the tree. The number of mutations necessary between the ancestral and present species is summed for each branch and the entire tree. The programs then move branches to make many different configurations, from which we select the one with the minimum number of mutations as the most likely evolutionary history. We used this method to elucidate primate phylogeny from sequences of fibrinopeptides, carbonic anhydrase, and the hemoglobin beta, delta and alpha chains. All available sequences indicate that the early Pongidae had diverged into two lines before the divergence of an ancestor for the human line alone. We have constructed some probable ancestral sequences at major points during primate evolution and have developed tentative trees showing the order of divergences and evolutionary distances among primate groups. Further questions on primate evolution could be answered in the future by the determination of the appropriate sequences.

18.
Likelihood methods and methods using invariants are procedures for inferring the evolutionary relationships among species through statistical analysis of nucleic acid sequences. A likelihood-ratio test may be used to determine the feasibility of any tree for which the maximum likelihood can be computed. The method of linear invariants described by Cavender, which includes Lake's method of evolutionary parsimony as a special case, is essentially a form of the likelihood-ratio method. In the case of a small number of species (four or five), these methods may be used to find a confidence set for the correct tree. An exact version of Lake's asymptotic χ² test has been mentioned by Holmquist et al. Under very general assumptions, a one-sided exact test is appropriate, which greatly increases power.

19.
A comprehensive DNA analysis computer program was described in the second special issue of Nucleic Acids Research on the applications of computers to research on nucleic acids by Stone and Potter (1). Criteria used in designing the program were user friendliness, ability to handle large DNA sequences, low storage requirement, migratability to other computers and comprehensive analysis capability. The program has been used extensively in an industrial-research environment. This paper describes improvements to that program. These improvements include testing for methylation blockage of restriction enzyme recognition sites, homology analysis, RNA folding analysis, integration of a large DNA database (GenBank), a site-specific mutagenesis analysis, a protein database and protein searching programs. The original design of the DNA analysis program, using a command executive from which any analytical program can be called, has proven to be extremely versatile in integrating both developed and outside programs to the file management system employed.

20.
Computer programs for phylogenetic analysis have been important tools in systematics and evolutionary biology, but most have been designed primarily for the reconstruction of phylogenetic trees and not the interpretation of patterns of character evolution. Described here is the computer program MacClade, designed for interactive analysis of character evolution and phylogeny. For a given tree and a matrix of character data, MacClade displays its reconstruction of character evolution by shading the branches of the tree to indicate ancestral states. Trees can be manipulated, for instance, by picking up and moving branches. Assumptions underlying the reconstruction of character evolution can be varied extensively. With these manipulations and MacClade's graphical feedback, one can explore the relationships among phylogenetic trees, character data, assumptions and interpretations of character evolution. MacClade has extensive facilities for editing data, displaying various summaries of character evolution in charts and diagrams, and printing.
