20 similar documents found.
1.
ABSTRACT: BACKGROUND: A number of software packages are available to generate DNA multiple sequence alignments (MSAs) evolved under continuous-time Markov processes on phylogenetic trees. However, no methods exist for simulating DNA MSAs directly from transition matrices. Moreover, existing software is restricted to time-reversible models and is not optimized to generate nonhomogeneous data (i.e. data with distinct substitution rates on different lineages). RESULTS: We present the first package designed to generate MSAs evolving under discrete-time Markov processes on phylogenetic trees, directly from substitution probability matrices. Based on the input model and a phylogenetic tree in Newick format (with branch lengths measured as the expected number of substitutions per site), the algorithm produces DNA alignments of the desired length. GenNon-h is publicly available for download. CONCLUSION: The software presented here is an efficient tool for generating DNA MSAs on a given phylogenetic tree. GenNon-h provides the user with nonstationary or nonhomogeneous phylogenetic data that are well suited for testing complex biological hypotheses and for exploring the limits of reconstruction algorithms and their robustness to such models.
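The core idea of simulating an alignment directly from per-branch substitution probability matrices can be sketched in a few lines. This is a minimal illustration, not GenNon-h's implementation: it assumes a hypothetical nested-dict tree in which every branch carries its own 4×4 row-stochastic matrix (GenNon-h itself takes a model plus a Newick tree), and the helper `uniform_off_diagonal` is a made-up example matrix. Because each lineage has its own matrix, the process can be nonhomogeneous.

```python
import random

BASES = "ACGT"

def sample_index(probs, rng):
    """Draw an index from a discrete distribution given as a list of probabilities."""
    r = rng.random()
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return i
    return len(probs) - 1  # guard against floating-point round-off

def evolve(node, parent_states, rng, leaves):
    """Mutate parent_states along the branch leading to node, then recurse.

    node: {"name": str, "P": 4x4 row-stochastic matrix, "children": list}.
    Row s of P holds the probabilities of each base at the end of the branch
    given base s at its start (a discrete-time Markov step per branch).
    """
    states = [sample_index(node["P"][s], rng) for s in parent_states]
    if node["children"]:
        for child in node["children"]:
            evolve(child, states, rng, leaves)
    else:
        leaves[node["name"]] = "".join(BASES[s] for s in states)

def simulate_alignment(root_dist, root_children, length, seed=None):
    """Sample root states i.i.d. from root_dist, then evolve every site independently."""
    rng = random.Random(seed)
    root_states = [sample_index(root_dist, rng) for _ in range(length)]
    leaves = {}
    for child in root_children:
        evolve(child, root_states, rng, leaves)
    return leaves

def uniform_off_diagonal(a):
    """Hypothetical matrix: keep the base with prob 1-3a, else move to each other base with prob a."""
    return [[1 - 3 * a if i == j else a for j in range(4)] for i in range(4)]

# Hypothetical tree ((B, C)inner, A); lineage C evolves much faster than the others.
tree = [
    {"name": "A", "P": uniform_off_diagonal(0.01), "children": []},
    {"name": "inner", "P": uniform_off_diagonal(0.05), "children": [
        {"name": "B", "P": uniform_off_diagonal(0.02), "children": []},
        {"name": "C", "P": uniform_off_diagonal(0.20), "children": []},
    ]},
]
msa = simulate_alignment([0.25, 0.25, 0.25, 0.25], tree, length=60, seed=1)
for name, seq in msa.items():
    print(name, seq)
```

Sites are simulated independently and no insertions or deletions are modelled, so every leaf sequence has exactly `length` columns.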
2.
Holger Wagner Burkhard Morgenstern Andreas Dress 《Algorithms for molecular biology : AMB》2008,3(1):15
Background
Sequence-based phylogeny reconstruction is a fundamental task in bioinformatics. Practically all methods for phylogeny reconstruction are based on multiple alignments. The quality and stability of the underlying alignments are therefore crucial for phylogenetic analysis.
3.
Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis (cited 34 times; self-citations: 0; by others: 34)
Castresana J 《Molecular biology and evolution》2000,17(4):540-552
The use of some multiple-sequence alignments in phylogenetic analysis, particularly those that are not very well conserved, requires the elimination of poorly aligned positions and divergent regions, since they may not be homologous or may have been saturated by multiple substitutions. A computerized method that eliminates such positions and at the same time tries to minimize the loss of informative sites is presented here. The method is based on the selection of blocks of positions that fulfill a simple set of requirements with respect to the number of contiguous conserved positions, lack of gaps, and high conservation of flanking positions, making the final alignment more suitable for phylogenetic analysis. To illustrate the efficiency of this method, alignments of 10 mitochondrial proteins from several completely sequenced mitochondrial genomes belonging to diverse eukaryotes were used as examples. The percentages of removed positions were higher in the most divergent alignments. After removing divergent segments, the amino acid composition of the different sequences was more uniform, and pairwise distances became much smaller. Phylogenetic trees show that topologies can be different after removing conserved blocks, particularly when there are several poorly resolved nodes. Strong support was found for the grouping of animals and fungi but not for the position of more basal eukaryotes. The use of a computerized method such as the one presented here reduces to a certain extent the necessity of manually editing multiple alignments, makes the automation of phylogenetic analysis of large data sets feasible, and facilitates the reproduction of the final alignment by other researchers.
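A simplified version of block selection can illustrate the general idea: flag columns that are gap-free and sufficiently conserved, then keep only runs of flagged columns that are long enough. This is a sketch under stated assumptions, not the published method: the single identity threshold `min_identity` and the run length `min_block_len` are hypothetical parameters, and the real method's separate criteria for conserved and highly conserved positions and for flanking positions are not reproduced.

```python
from collections import Counter

def conserved_columns(alignment, min_identity=0.5, allow_gaps=False):
    """Flag columns whose most frequent residue covers at least min_identity of the sequences."""
    n = len(alignment)
    flags = []
    for col in zip(*alignment):
        if not allow_gaps and "-" in col:
            flags.append(False)
            continue
        residues = [c for c in col if c != "-"]
        top = Counter(residues).most_common(1)[0][1] if residues else 0
        flags.append(top / n >= min_identity)
    return flags

def select_blocks(alignment, min_block_len=3, **kwargs):
    """Return half-open (start, end) column ranges of conserved runs at least min_block_len long."""
    flags = conserved_columns(alignment, **kwargs)
    blocks, start = [], None
    for i, ok in enumerate(flags + [False]):  # sentinel closes a trailing run
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            if i - start >= min_block_len:
                blocks.append((start, i))
            start = None
    return blocks

def trim(alignment, blocks):
    """Concatenate the selected blocks for every sequence."""
    return ["".join(seq[s:e] for s, e in blocks) for seq in alignment]

aln = ["MKVLAT-GHW", "MKVLST-GHW", "MRVLATQGHY"]
print(select_blocks(aln))                   # [(0, 6), (7, 10)]
print(select_blocks(aln, min_block_len=4))  # [(0, 6)]
print(trim(aln, select_blocks(aln)))        # ['MKVLATGHW', 'MKVLSTGHW', 'MRVLATGHY']
```

Raising `min_block_len` discards short conserved islands, which trades informative sites for a stricter, more reliably aligned dataset.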
4.
Using multiple interdependency to separate functional from phylogenetic correlations in protein alignments (cited 3 times; self-citations: 0; by others: 3)
MOTIVATION: Multiple sequence alignments of homologous proteins are useful for inferring their phylogenetic history and for revealing functionally important regions in the proteins. Functional constraints may lead to co-variation of two or more amino acids in the sequence, such that a substitution at one site is accompanied by compensatory substitutions at another site. It is not sufficient to find the statistical correlations between sites in the alignment because these may be the result of several undetermined causes. In particular, phylogenetic clustering will lead to many strong correlations. RESULTS: A procedure is developed to detect statistical correlations stemming from functional interaction by removing the strong phylogenetic signal that leads to the correlations of each site with many others in the sequence. Our method relies upon the accuracy of the alignment but it does not require any assumptions about the phylogeny or the substitution process. The effectiveness of the method was verified using computer simulations and then applied to predict functional interactions between amino acids in the Pfam database of alignments.
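The abstract argues that raw correlation statistics between alignment columns are dominated by phylogenetic clustering. For reference, the sketch below computes one such raw statistic, the mutual information between two columns. It is a generic illustration of the kind of signal that needs filtering, not the paper's procedure, which the abstract does not specify.

```python
import math
from collections import Counter

def mutual_information(alignment, i, j):
    """Mutual information (in nats) between alignment columns i and j.

    Gaps are treated as an ordinary character; all frequencies are
    estimated directly from the sequences, with no pseudocounts.
    """
    xs = [seq[i] for seq in alignment]
    ys = [seq[j] for seq in alignment]
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (a, b), count in pxy.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi

covarying = ["AK", "AK", "DR", "DR"]    # A always pairs with K, D with R
independent = ["AK", "AR", "DK", "DR"]  # every combination occurs once
print(mutual_information(covarying, 0, 1))    # ln 2, about 0.693
print(mutual_information(independent, 0, 1))  # 0.0
```

Note that two columns which merely both split the sequences along the same clade boundary would score just as highly as the covarying pair above, which is exactly the phylogenetic confound the abstract describes.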
5.
Given a family of related sequences, one can first determine alignments between various pairs of those sequences, then construct a simultaneous alignment of all the sequences that is determined in a natural manner by the set of pairwise alignments. This approach is sometimes effective for exposing the existence and locations of conserved regions, which can then be aligned by more sensitive multiple-alignment methods. This paper presents an efficient algorithm for constructing a multiple alignment from a set of pairwise alignments.
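One common way to build a multiple alignment from pairwise alignments is the center-star merge: every sequence is aligned to one shared center sequence, and the gaps inserted into the center by any pairwise alignment are propagated to all rows ("once a gap, always a gap"). The sketch below implements that swapped-in technique under the assumption that all pairwise alignments share the same ungapped center; it is not necessarily the algorithm of the paper, which may accept a more general set of pairwise alignments.

```python
def gap_profile(center_aln, length):
    """Count gap columns in the aligned center before each center residue (index `length` = trailing)."""
    gaps = [0] * (length + 1)
    k = 0
    for ch in center_aln:
        if ch == "-":
            gaps[k] += 1
        else:
            k += 1
    return gaps

def merge_pairwise(center, pairwise):
    """Merge a non-empty list of (center_aligned, other_aligned) pairs into one MSA.

    Returns the center row first, then one row per pairwise alignment.
    """
    L = len(center)
    for c_aln, o_aln in pairwise:
        assert c_aln.replace("-", "") == center and len(c_aln) == len(o_aln)
    # Widest insertion slot before each center residue across all pairwise alignments.
    max_gaps = [max(g) for g in zip(*(gap_profile(c, L) for c, _ in pairwise))]

    def expand(c_aln, row):
        """Re-lay one aligned row so that every insertion slot is padded to max_gaps."""
        out, k, slot = [], 0, []
        for c_ch, r_ch in zip(c_aln, row):
            if c_ch == "-":
                slot.append(r_ch)  # column inserted relative to the center
            else:
                out.append("".join(slot).ljust(max_gaps[k], "-"))
                out.append(r_ch)   # column matched to center residue k
                slot, k = [], k + 1
        out.append("".join(slot).ljust(max_gaps[L], "-"))  # trailing insertions
        return "".join(out)

    rows = [expand(pairwise[0][0], pairwise[0][0])]  # the center itself
    rows += [expand(c, o) for c, o in pairwise]
    return rows

msa = merge_pairwise("ACGT", [("AC-GT", "ACTGT"), ("ACGT-", "A-GTA")])
print(msa)  # ['AC-GT-', 'ACTGT-', 'A--GTA']
```

The merge preserves every pairwise alignment exactly, but residues of two non-center sequences that sit in the same insertion slot are aligned only by their left-justified position, which is one reason such merged alignments are usually refined afterwards.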
6.
Liu K Warnow TJ Holder MT Nelesen SM Yu J Stamatakis AP Linder CR 《Systematic biology》2012,61(1):90-106
Highly accurate estimation of phylogenetic trees for large data sets is difficult, in part because multiple sequence alignments must be accurate for phylogeny estimation methods to be accurate. Coestimation of alignments and trees has been attempted but currently only SATé estimates reasonably accurate trees and alignments for large data sets in practical time frames (Liu K., Raghavan S., Nelesen S., Linder C.R., Warnow T. 2009b. Rapid and accurate large-scale coestimation of sequence alignments and phylogenetic trees. Science. 324:1561-1564). Here, we present a modification to the original SATé algorithm that improves upon SATé (which we now call SATé-I) in terms of speed and of phylogenetic and alignment accuracy. SATé-II uses a different divide-and-conquer strategy than SATé-I and so produces smaller, more closely related subsets than SATé-I; as a result, SATé-II produces more accurate alignments and trees, can analyze larger data sets, and runs more efficiently than SATé-I. Generally, SATé is a metamethod that takes an existing multiple sequence alignment method as an input parameter and boosts the quality of that alignment method. SATé-II-boosted alignment methods are significantly more accurate than their unboosted versions, and trees based upon these improved alignments are more accurate than trees based upon the original alignments. Because SATé-I used maximum likelihood (ML) methods that treat gaps as missing data to estimate trees and because we found a correlation between the quality of tree/alignment pairs and ML scores, we explored the degree to which SATé's performance depends on using ML with gaps treated as missing data to determine the best tree/alignment pair. We present two lines of evidence that using ML with gaps treated as missing data to optimize the alignment and tree produces very poor results.
First, we show that the optimization problem where a set of unaligned DNA sequences is given and the output is the tree and alignment of those sequences that maximize likelihood under the Jukes-Cantor model is uninformative in the worst possible sense. For all inputs, all trees optimize the likelihood score. Second, we show that a greedy heuristic that uses GTR+Gamma ML to optimize the alignment and the tree can produce very poor alignments and trees. Therefore, the excellent performance of SATé-II and SATé-I is not because ML is used as an optimization criterion for choosing the best tree/alignment pair but rather due to the particular divide-and-conquer realignment techniques employed.
7.
MOTIVATION: The program ESPript (Easy Sequencing in PostScript) allows the rapid visualization, via PostScript output, of sequences aligned with popular programs such as CLUSTAL-W or GCG PILEUP. It can read secondary structure files (such as that created by the program DSSP) to produce a synthesis of both sequence and structural information. RESULTS: ESPript can be run via a command file or a friendly html-based user interface. The program calculates a homology score by columns of residues and can sort this calculation by groups of sequences. It offers a palette of markers to highlight important regions in the alignment. ESPript can also paste information on residue conservation into coordinate files, for subsequent visualization with a graphics program. AVAILABILITY: ESPript can be accessed on its Web site at http://www.ipbs.fr/ESPript. Sources and helpfiles can be downloaded via anonymous ftp from ftp.ipbs.fr. A tar file is held in the directory pub/ESPript.
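A per-column similarity score, optionally restricted to one group of sequences, can be sketched as the fraction of sequence pairs whose residues are identical or belong to the same similarity class. This does not reproduce ESPript's own scoring scheme; the residue groups in `SIMILAR_GROUPS` are a hypothetical choice made for illustration.

```python
from itertools import combinations

# Hypothetical similarity classes; residues outside them match only themselves.
SIMILAR_GROUPS = ["ILVM", "FYW", "KRH", "DE", "ST", "NQ"]
GROUP_OF = {aa: g for g in SIMILAR_GROUPS for aa in g}

def similar(a, b):
    """True if two residues are identical or share a similarity class (gaps never match)."""
    if a == "-" or b == "-":
        return False
    return a == b or (a in GROUP_OF and GROUP_OF.get(a) == GROUP_OF.get(b))

def column_similarity(alignment, rows=None):
    """Per column, the fraction of sequence pairs (within `rows`, if given) with similar residues."""
    seqs = [alignment[i] for i in rows] if rows is not None else list(alignment)
    pairs = list(combinations(range(len(seqs)), 2))
    scores = []
    for col in zip(*seqs):
        hits = sum(similar(col[i], col[j]) for i, j in pairs)
        scores.append(hits / len(pairs) if pairs else 1.0)
    return scores

aln = ["MKLD", "MRID", "MKVE", "AKL-"]
print(column_similarity(aln))               # [0.5, 1.0, 1.0, 0.5]
print(column_similarity(aln, rows=[0, 3]))  # [0.0, 1.0, 1.0, 0.0]
```

Computing the score separately for each group of rows is one way to show, column by column, where a subgroup is more or less conserved than the alignment as a whole.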
8.
Lebrun E Santini JM Brugna M Ducluzeau AL Ouchane S Schoepp-Cothenet B Baymann F Nitschke W 《Molecular biology and evolution》2006,23(6):1180-1191
Previously published phylogenetic trees reconstructed on "Rieske protein" sequences frequently are at odds with each other, with those of other subunits of the parent enzymes and with small-subunit rRNA trees. These differences are shown to be at least partially if not completely due to problems in the reconstruction procedures. A major source of erroneous Rieske protein trees lies in the presence of a large, poorly conserved domain prone to accommodate very long insertions in well-defined structural hot spots, substantially hampering multiple alignments. The remaining smaller domain, in contrast, is too conserved to allow distant phylogenies to be deduced with sufficient confidence. Three-dimensional structures of representatives from this protein family are now available from phylogenetically distant species and from diverse enzymes. Multiple alignments can thus be refined on the basis of these structures. We show that structurally guided alignments of Rieske proteins from Rieske-cytochrome b complexes and arsenite oxidases strongly reduce conflicts between resulting trees and those obtained on their companion enzyme subunits. Further problems encountered during this work, mainly consisting of database errors such as wrong annotations and frameshifts, are described. The obtained results are discussed against the background of hypotheses stipulating pervasive lateral gene transfer in prokaryotes.
9.
10.
Wuster A Venkatakrishnan AJ Schertler GF Babu MM 《Bioinformatics (Oxford, England)》2010,26(22):2906-2907
MOTIVATION: Spial (Specificity in alignments) is a tool for the comparative analysis of two alignments of evolutionarily related sequences that differ in their function, such as two receptor subtypes. It highlights functionally important residues that are either specific to one of the two alignments or conserved across both alignments. It permits visualization of this information in three complementary ways: by colour-coding alignment positions, by sequence logos and optionally by colour-coding the residues of a protein structure provided by the user. This can aid in the detection of residues that are involved in the subtype-specific interaction with a ligand, other proteins or nucleic acids. Spial may also be used to detect residues that may be post-translationally modified in one of the two sets of sequences. AVAILABILITY: http://www.mrc-lmb.cam.ac.uk/genomes/spial/; supplementary information is available at http://www.mrc-lmb.cam.ac.uk/genomes/spial/help.html.
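The distinction between positions conserved across both subtypes and positions specific to one can be sketched by comparing the dominant residue of each column in two equal-width alignments. This is a simplified illustration, not Spial's scoring: the single `threshold` parameter and the three labels are assumptions, and the subtype sequences in the example are hypothetical.

```python
from collections import Counter

def dominant(col):
    """Most common non-gap residue in a column and its frequency over all sequences."""
    residues = [c for c in col if c != "-"]
    if not residues:
        return None, 0.0
    res, count = Counter(residues).most_common(1)[0]
    return res, count / len(col)

def classify_columns(aln_a, aln_b, threshold=0.8):
    """Label columns of two equal-width alignments.

    "conserved": both sets conserved (>= threshold) on the same residue;
    "specific":  both sets conserved, but on different residues;
    "-":         at least one set is not conserved at this column.
    """
    labels = []
    for col_a, col_b in zip(zip(*aln_a), zip(*aln_b)):
        res_a, f_a = dominant(col_a)
        res_b, f_b = dominant(col_b)
        if f_a >= threshold and f_b >= threshold:
            labels.append("conserved" if res_a == res_b else "specific")
        else:
            labels.append("-")
    return labels

subtype_a = ["DRY", "DRF", "DRY"]
subtype_b = ["ERY", "ERW", "ERH"]
print(classify_columns(subtype_a, subtype_b))  # ['specific', 'conserved', '-']
```

Columns labelled "specific" are the candidates for subtype-specific ligand or partner interactions described in the abstract.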
11.
AltAVisT: comparing alternative multiple sequence alignments (cited 2 times; self-citations: 0; by others: 2)
We introduce a WWW-based tool that is able to compare two alternative multiple alignments of a given sequence set. Regions where both alignments coincide are color-coded to visualize the local agreement between the two alignments and to identify those regions that can be considered to be reliably aligned. AVAILABILITY: http://bibiserv.techfak.uni-bielefeld.de/altavist/.
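Agreement between two alternative alignments of the same sequences can be made concrete by describing each column as the set of residues it aligns, identified by (sequence index, residue index) pairs. The sketch below flags each column of the first alignment that also occurs, with exactly the same residues, in the second. This exact-column criterion is a simplifying assumption for illustration, not necessarily the measure AltAVisT uses.

```python
def residue_columns(alignment):
    """Map each column to the frozenset of (sequence index, residue index) pairs it aligns."""
    counters = [0] * len(alignment)
    columns = []
    for col in zip(*alignment):
        cells = set()
        for s, ch in enumerate(col):
            if ch != "-":
                cells.add((s, counters[s]))
                counters[s] += 1
        columns.append(frozenset(cells))
    return columns

def column_agreement(aln1, aln2):
    """For each column of aln1, True if aln2 has a column aligning exactly the same residues."""
    # Both alignments must contain the same ungapped sequences, in the same order.
    assert [s.replace("-", "") for s in aln1] == [s.replace("-", "") for s in aln2]
    other = set(residue_columns(aln2))
    return [col in other for col in residue_columns(aln1)]

aln1 = ["AC-T", "ACGT"]
aln2 = ["A-CT", "ACGT"]
flags = column_agreement(aln1, aln2)
print(flags)                    # [True, False, False, True]
print(sum(flags) / len(flags))  # 0.5
```

Runs of `True` mark regions that both alignment programs agree on, which is the kind of reliably aligned region the tool is meant to expose.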
12.
Chakrabarti S Lanczycki CJ Panchenko AR Przytycka TM Thiessen PA Bryant SH 《Nucleic acids research》2006,34(9):2598-2606
Accurate multiple sequence alignments of proteins are very important to several areas of computational biology and help in understanding the phylogenetic history of domain families as well as in identifying and classifying them. This article presents a new algorithm, REFINER, that refines a multiple sequence alignment by iterative realignment of its individual sequences with the predetermined conserved core (block) model of a protein family. Realignment of each sequence can correct misalignments between a given sequence and the rest of the profile and at the same time preserve the family's overall block model. Large-scale benchmarking studies showed a noticeable improvement in alignment quality after refinement. This can be inferred from the increased alignment score and enhanced sensitivity for database searching using the sequence profiles derived from refined alignments compared with the original alignments. A standalone version of the program is available by ftp distribution (ftp://ftp.ncbi.nih.gov/pub/REFINER) and will be incorporated into the next release of the Cn3D structure/alignment viewer.
13.
S Henikoff 《The New biologist》1991,3(12):1148-1154
Block alignments of multiple amino acid sequences are useful representations of regions thought to share common ancestry and function. Often the block alignments are motivated by the expectation that a protein of interest is similar in function to members of a family of proteins. However, when alignments are forced by using ad hoc methods, it is often difficult to decide whether the proposed relationship is valid. Visual examination can be deceptive, especially when alignments are not carried out in the context of controls subjected to similar procedures. Even computer-aided methods can be misleading when biases are introduced. To illustrate some of the problems that can arise, a few examples from the literature are analyzed. It is concluded that when standard methods fail to find an interesting block alignment unaided by human intervention, then the result should be regarded with caution.
14.
SUMMARY: BLAST statistics have been shown to be extremely useful for searching for significant similarity hits, for amino acid and nucleotide sequences. Although these statistics are well understood for pairwise comparisons, there has been little success developing statistical scores for multiple alignments. In particular, there is no score for multiple alignment that is well founded and treated as a standard. We extend the BLAST theory to multiple alignments. Following some simple assumptions, we present and justify a significance score for multiple segments of a local multiple alignment. We demonstrate its usefulness in distinguishing high and moderate quality multiple alignments from low quality ones, with supporting experiments on orthologous vertebrate promoter sequences.
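For pairwise local alignments, the BLAST (Karlin-Altschul) statistics that the paper extends give the expected number of chance hits with score at least S between sequences of lengths m and n as E = K·m·n·e^(−λS), where λ and K depend on the scoring system. The sketch below covers only this pairwise formula and the equivalent bit score; the paper's multiple-alignment score is not reproduced, and the λ and K values in the example are hypothetical.

```python
import math

def evalue(score, m, n, lam, K):
    """Expected number of chance local alignments scoring >= score (Karlin-Altschul)."""
    return K * m * n * math.exp(-lam * score)

def bit_score(score, lam, K):
    """Normalized score S' = (lambda*S - ln K) / ln 2, so that E = m * n * 2**(-S')."""
    return (lam * score - math.log(K)) / math.log(2)

# Hypothetical scoring-system parameters and sequence/database lengths.
lam, K = 0.3, 0.05
m, n = 300, 10**6
for raw in (30, 60, 90):
    print(raw, round(bit_score(raw, lam, K), 1), evalue(raw, m, n, lam, K))
```

Because E falls exponentially with the raw score, doubling the score does far more than halve the expected number of chance hits; the bit score makes results comparable across scoring systems with different λ and K.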
15.
Background
The quality of multiple sequence alignments plays an important role in the accuracy of phylogenetic inference. It has been shown that removing ambiguously aligned regions, but also other sources of bias such as highly variable (saturated) characters, can improve the overall performance of many phylogenetic reconstruction methods. A current scientific trend is to build phylogenetic trees from a large number of sequence datasets (semi-)automatically extracted from numerous complete genomes. Because these approaches do not allow a precise manual curation of each dataset, there exists a real need for efficient bioinformatic tools dedicated to this alignment character trimming step.
16.
The partition matrix: exploring variable phylogenetic signals along nucleotide sequence alignments (cited 4 times; self-citations: 2; by others: 4)
The partition matrix is a graphical tool for comparative analysis of nucleotide sequences following alignment. It is particularly useful for investigating the divergent phylogenies of sequence regions undergoing reticulate evolution. A partition matrix is generated by determining the consistency of the parsimoniously informative sites in a set of aligned sequences with the binary partitions inferred from the sequences. Since the linear order of sites is maintained, the matrix can be used to assess whether the distribution of sites either supporting or conflicting with particular partitions changes along the length of the alignment. The usefulness of the matrix in allowing visual identification of differences in evolutionary history among regions depends on the order in which partitions are shown; several suitable ordering schemes are proposed. We demonstrate the use of the partition matrix in interpreting the evolution of the pseudoautosomal boundary region on the sex chromosome of catarrhine primates. Its routine use should help to avoid attempts to derive single phylogenies from sequences whose evolution has been reticulate and to identify the gene conversion or recombination events underlying the reticulation. The method is relatively fast. It is exploratory, and it can form the basis for more formal analysis, which we discuss.
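The construction described above can be sketched as follows: every binary site in which both states occur in at least two sequences defines a bipartition; each such partition becomes a row, and each informative site (kept in alignment order) becomes a column marked as supporting, compatible with, or conflicting with that partition. This is a minimal sketch under stated assumptions: only two-state sites are used, gaps would count as an ordinary state, and rows are ordered lexicographically rather than by any of the paper's ordering schemes.

```python
from collections import Counter

def site_split(column):
    """Bipartition (the side containing taxon 0) of a binary informative site, else None."""
    counts = Counter(column)
    if len(counts) != 2 or min(counts.values()) < 2:
        return None
    first = column[0]
    return frozenset(i for i, c in enumerate(column) if c == first)

def compatible(a, b, n):
    """Splits A|A' and B|B' are compatible if at least one of their four intersections is empty."""
    taxa = frozenset(range(n))
    a2, b2 = taxa - a, taxa - b
    return not (a & b and a & b2 and a2 & b and a2 & b2)

def partition_matrix(alignment):
    """Return (informative site indices, rows); row strings use '+' supports, '.' compatible, 'x' conflicts."""
    n = len(alignment)
    sites = [(j, site_split(col)) for j, col in enumerate(zip(*alignment))]
    sites = [(j, s) for j, s in sites if s is not None]
    partitions = sorted({s for _, s in sites}, key=sorted)
    rows = []
    for p in partitions:
        row = "".join("+" if s == p else ("." if compatible(p, s, n) else "x") for _, s in sites)
        rows.append((sorted(p), row))
    return [j for j, _ in sites], rows

seqs = ["AAGTC",   # taxon 0
        "AAGTT",   # taxon 1
        "GTCTC",   # taxon 2
        "GTCAT"]   # taxon 3
site_ids, rows = partition_matrix(seqs)
print(site_ids)  # [0, 1, 2, 4]
for taxa, row in rows:
    print(taxa, row)  # [0, 1] +++x  then  [0, 2] xxx+
```

Reading a row from left to right shows where along the alignment a given grouping is supported and where it is contradicted; a switch like the one above between sites 0-2 and site 4 is the kind of pattern that may point to recombination or gene conversion.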
17.
Genome-wide multiple sequence alignments (MSAs) are a necessary prerequisite for an increasingly diverse collection of comparative genomic approaches. Here we present a versatile method that generates high-quality MSAs for non-protein-coding sequences. The NcDNAlign pipeline combines pairwise BLAST alignments to create initial MSAs, which are then locally improved and trimmed. The program is optimized for speed and hence is particularly well-suited to pilot studies. We demonstrate the practical use of NcDNAlign in three case studies: the search for ncRNAs in gammaproteobacteria and the analysis of conserved noncoding DNA in nematodes and teleost fish, in the latter case focusing on the fate of duplicated ultra-conserved regions. Compared to the currently widely used genome-wide alignment program TBA, our program results in a 20- to 30-fold reduction of CPU time necessary to generate gammaproteobacterial alignments. A showcase application of bacterial ncRNA prediction based on alignments of both algorithms results in similar sensitivity, false discovery rates, and up to 100 putatively novel ncRNA structures. Similar findings hold for our application of NcDNAlign to the identification of ultra-conserved regions in nematodes and teleosts. Both approaches yield conserved sequences of unknown function, result in novel evolutionary insights into conservation patterns among these genomes, and manifest the benefits of an efficient and reliable genome-wide alignment package. The software is available under the GNU Public License at http://www.bioinf.uni-leipzig.de/Software/NcDNAlign/.
18.
CHROMA annotates multiple protein sequence alignments by consensus to produce formatted and coloured text suitable for incorporation into other documents for publication. The package is designed to be flexible and reliable, and has a simple-to-use graphical user interface running under Microsoft Windows. Both the executables and source code for CHROMA running under Windows and Linux (portable command-line only) are freely available at http://www.lg.ndirect.co.uk/chroma. Software enquiries should be directed to CHROMA@lg.ndirect.co.uk.
19.
Shah N Couronne O Pennacchio LA Brudno M Batzoglou S Bethel EW Rubin EM Hamann B Dubchak I 《Bioinformatics (Oxford, England)》2004,20(5):636-643
MOTIVATION: The power of multi-sequence comparison for biological discovery is well established. The need for new capabilities to visualize and compare cross-species alignment data is intensified by the growing number of genomic sequence datasets being generated for an ever-increasing number of organisms. To be efficient, these visualization algorithms must support the ability to accommodate consistently a wide range of evolutionary distances in a comparison framework based upon phylogenetic relationships. RESULTS: We have developed Phylo-VISTA, an interactive tool for analyzing multiple alignments by visualizing a similarity measure for multiple DNA sequences. The complexity of visual presentation is effectively organized using a framework based upon interspecies phylogenetic relationships. The phylogenetic organization supports rapid, user-guided interspecies comparison. To aid in navigation through large sequence datasets, Phylo-VISTA leverages concepts from VISTA that provide a user with the ability to select and view data at varying resolutions. Multiresolution data visualization and analysis, combined with the phylogenetic framework for interspecies comparison, produce a highly flexible and powerful tool for visual data analysis of multiple sequence alignments. AVAILABILITY: Phylo-VISTA is available at http://www-gsd.lbl.gov/phylovista. It requires an Internet browser with Java Plug-in 1.4.2 and it is integrated into the global alignment program LAGAN at http://lagan.stanford.edu
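A simple similarity measure of the kind such viewers plot is percent identity in sliding windows along the alignment, computed for each sequence against a chosen reference. The sketch below assumes that pairwise, VISTA-style identity curve; it does not reproduce Phylo-VISTA's own multi-sequence similarity measure or its phylogenetic organization, and the sequence names in the example are hypothetical.

```python
def window_identity(ref, other, window, step=1):
    """Percent identity between two aligned rows in sliding windows of alignment columns.

    A column counts as identical only if both rows carry the same non-gap character.
    Returns a list of (window start column, percent identity) pairs.
    """
    assert len(ref) == len(other)
    profile = []
    for start in range(0, len(ref) - window + 1, step):
        pairs = zip(ref[start:start + window], other[start:start + window])
        matches = sum(1 for a, b in pairs if a == b and a != "-")
        profile.append((start, 100.0 * matches / window))
    return profile

def identity_profiles(alignment, reference, window, step=1):
    """Identity curve of every other sequence against the reference row (dict name -> aligned seq)."""
    ref = alignment[reference]
    return {name: window_identity(ref, seq, window, step)
            for name, seq in alignment.items() if name != reference}

aln = {"human": "ACGTACGTAC", "mouse": "ACGTAC--TT"}
print(identity_profiles(aln, "human", window=5, step=5))
# {'mouse': [(0, 100.0), (5, 20.0)]}
```

Shrinking `window` and `step` gives a finer-resolution curve at the cost of more noise, which is the trade-off a multiresolution viewer lets the user explore interactively.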
20.