
1.

transAlign: using amino acids to facilitate the multiple alignment of protein-coding DNA sequences

Olaf R P Bininda-Emonds 《BMC bioinformatics》2005,6(1):156

Background

Alignments of homologous DNA sequences are crucial for comparative genomics and phylogenetic analysis. However, multiple alignment is a computationally difficult problem. For protein-coding DNA sequences, it is more advantageous in terms of both speed and accuracy to align the amino-acid sequences specified by the DNA sequences rather than the DNA sequences themselves. Many implementations of this concept of "translated alignments" are incomplete in the sense that they require the user to manually translate the DNA sequences and to perform the amino-acid alignment. As such, they are not well suited to large-scale automated alignments of large and/or numerous DNA data sets.
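The "translated alignment" idea described above can be sketched in a few lines of Python. This is a minimal illustration, not transAlign's implementation: the function names `translate` and `back_translate` are hypothetical, the standard genetic code is assumed, and the gapped protein rows are supplied by hand where a real pipeline would call an amino-acid aligner.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (NCBI table 1).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate an in-frame DNA sequence; trailing partial codons are ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def back_translate(aligned_protein, dna):
    """Thread the codons of `dna` onto a gapped protein row.

    Each residue consumes one codon; each gap becomes three gap characters.
    """
    codons = (dna[i:i + 3] for i in range(0, len(dna), 3))
    return "".join("---" if aa == "-" else next(codons) for aa in aligned_protein)


dna_a = "ATGGCTAAA"   # translates to MAK
dna_b = "ATGAAA"      # translates to MK
# Hand-made protein alignment (assumption: an aligner would produce this).
protein_rows = {"a": "MAK", "b": "M-K"}
dna_alignment = {
    "a": back_translate(protein_rows["a"], dna_a),
    "b": back_translate(protein_rows["b"], dna_b),
}
```

Because gaps are inserted in multiples of three, the resulting DNA alignment keeps every codon intact and in frame.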

2.

DNA reference alignment benchmarks based on tertiary structure of encoded proteins

Carroll H Beckstead W O'Connor T Ebbert M Clement M Snell Q McClellan D 《Bioinformatics (Oxford, England)》2007,23(19):2648-2649

MOTIVATION: Multiple sequence alignments (MSAs) are at the heart of bioinformatics analysis. Recently, a number of multiple protein sequence alignment benchmarks (i.e. BAliBASE, OXBench, PREFAB and SMART) have been released to evaluate new and existing MSA applications. These databases have been well received by researchers and help to quantitatively evaluate MSA programs on protein sequences. Unfortunately, analogous DNA benchmarks are not available, making evaluation of MSA programs difficult for DNA sequences. RESULTS: This work presents the first known multiple DNA sequence alignment benchmarks that are (1) composed of protein-coding portions of DNA and (2) based on biological features such as the tertiary structure of encoded proteins. These reference DNA databases contain a total of 3545 alignments, comprising 68 581 sequences. Two versions of the database are available: mdsa_100s and mdsa_all. The mdsa_100s version contains the alignments of the data sets for which TBLASTN found 100% sequence identity for each sequence. The mdsa_all version includes all hits with an E-value score above the threshold of 0.001. A primary use of these databases is to benchmark the performance of MSA applications on DNA data sets. The first such case study is included in the Supplementary Material.

3.

Protein sequence threading: Averaging over structures

Russell AJ Torda AE 《Proteins》2002,47(4):496-505

Multiple sequence alignments are a routine tool in protein fold recognition, but multiple structure alignments are computationally less cooperative. This work describes a method for protein sequence threading and sequence-to-structure alignments that uses multiple aligned structures, the aim being to improve models from protein threading calculations. Sequences are aligned into a field due to corresponding sites in homologous proteins. On the basis of a test set of more than 570 protein pairs, the procedure does improve alignment quality, although no more than averaging over sequences. For the force field tested, the benefit of structure averaging is smaller than that of adding sequence similarity terms or a contribution from secondary structure predictions. Although there is a significant improvement in the quality of sequence-to-structure alignments, this does not directly translate to an immediate improvement in fold recognition capability.

4.

Phylo-VISTA: interactive visualization of multiple DNA sequence alignments

Shah N Couronne O Pennacchio LA Brudno M Batzoglou S Bethel EW Rubin EM Hamann B Dubchak I 《Bioinformatics (Oxford, England)》2004,20(5):636-643

MOTIVATION: The power of multi-sequence comparison for biological discovery is well established. The need for new capabilities to visualize and compare cross-species alignment data is intensified by the growing number of genomic sequence datasets being generated for an ever-increasing number of organisms. To be efficient, these visualization algorithms must consistently accommodate a wide range of evolutionary distances in a comparison framework based upon phylogenetic relationships. RESULTS: We have developed Phylo-VISTA, an interactive tool for analyzing multiple alignments by visualizing a similarity measure for multiple DNA sequences. The complexity of visual presentation is effectively organized using a framework based upon interspecies phylogenetic relationships. The phylogenetic organization supports rapid, user-guided interspecies comparison. To aid in navigation through large sequence datasets, Phylo-VISTA leverages concepts from VISTA that provide a user with the ability to select and view data at varying resolutions. Multiresolution data visualization and analysis, combined with the phylogenetic framework for interspecies comparison, produce a highly flexible and powerful tool for visual data analysis of multiple sequence alignments. AVAILABILITY: Phylo-VISTA is available at http://www-gsd.lbl.gov/phylovista. It requires an Internet browser with Java Plug-in 1.4.2 and it is integrated into the global alignment program LAGAN at http://lagan.stanford.edu

5.

GenNon-h: Generating multiple sequence alignments on nonhomogeneous phylogenetic trees

AM Kedzierska M Casanellas 《BMC bioinformatics》2012,13(1):216

BACKGROUND: A number of software packages are available to generate DNA multiple sequence alignments (MSAs) evolved under continuous-time Markov processes on phylogenetic trees. On the other hand, methods of simulating the DNA MSA directly from the transition matrices do not exist. Moreover, existing software is restricted to time-reversible models and is not optimized to generate nonhomogeneous data (i.e. placing distinct substitution rates at different lineages). RESULTS: We present the first package designed to generate MSAs evolving under discrete-time Markov processes on phylogenetic trees, directly from probability substitution matrices. Based on the input model and a phylogenetic tree in the Newick format (with branch lengths measured as the expected number of substitutions per site), the algorithm produces DNA alignments of the desired length. GenNon-h is publicly available for download. CONCLUSION: The software presented here is an efficient tool to generate DNA MSAs on a given phylogenetic tree. GenNon-h provides the user with nonstationary or nonhomogeneous phylogenetic data that are well suited for testing complex biological hypotheses and for exploring the limits of reconstruction algorithms and their robustness to such models.
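To make the idea of generating an alignment "directly from probability substitution matrices" concrete, here is a rough Python sketch. It is not GenNon-h's code: the tree encoding as nested tuples, the helper names `evolve`, `simulate` and `uniform_change`, and the example matrices are all assumptions made for illustration, and the step of parsing a Newick tree or deriving matrices from branch lengths is skipped entirely.

```python
import random

NUCS = "ACGT"


def evolve(seq, matrix, rng):
    """Apply one edge's substitution matrix independently at every site.

    `matrix` maps each parent nucleotide to a list of four child probabilities.
    """
    return "".join(rng.choices(NUCS, weights=matrix[b])[0] for b in seq)


def simulate(tree, root_dist, length, seed=0):
    """Simulate leaf sequences on a rooted tree.

    A node is (name, matrix_on_incoming_edge, children); the root's matrix is None.
    Giving each edge its own matrix makes the process nonhomogeneous.
    """
    rng = random.Random(seed)
    root_seq = "".join(rng.choices(NUCS, weights=root_dist, k=length))
    leaves = {}

    def walk(node, parent_seq):
        name, matrix, children = node
        seq = parent_seq if matrix is None else evolve(parent_seq, matrix, rng)
        if not children:
            leaves[name] = seq
        for child in children:
            walk(child, seq)

    walk(tree, root_seq)
    return leaves


def uniform_change(p):
    """Hypothetical example matrix: change with probability p, spread evenly."""
    return {a: [1 - p if a == b else p / 3 for b in NUCS] for a in NUCS}


tree = ("root", None, [
    ("A", uniform_change(0.05), []),
    ("inner", uniform_change(0.10), [
        ("B", uniform_change(0.02), []),
        ("C", uniform_change(0.30), []),
    ]),
])
alignment = simulate(tree, root_dist=[0.25, 0.25, 0.25, 0.25], length=60)
```

Since only substitutions are simulated, all leaf sequences have the same length and can be stacked directly as alignment rows.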

6.

PromoSer: A large-scale mammalian promoter and transcription start site identification service

Halees AS Leyfer D Weng Z 《Nucleic acids research》2003,31(13):3554-3559


7.

Sigma-2: Multiple sequence alignment of non-coding DNA via an evolutionary model

Gayathri Jayaraman Rahul Siddharthan 《BMC bioinformatics》2010,11(1):464

Background

While most multiple sequence alignment programs expect that all or most of their input is known to be homologous, and penalise insertions and deletions, this is not a reasonable assumption for non-coding DNA, which is much less strongly conserved than protein-coding genes. Arguing that the goal of sequence alignment should be the detection of homology and not similarity, we incorporate an evolutionary model into a previously published multiple sequence alignment program for non-coding DNA, Sigma, as a sensitive likelihood-based way to assess the significance of alignments. Version 1 of Sigma was successful in eliminating spurious alignments but exhibited relatively poor sensitivity on synthetic data. Sigma 1 used a p-value (the probability under the "null hypothesis" of non-homology) to assess the significance of alignments, and, optionally, a background model that captured short-range genomic correlations. Sigma version 2, described here, retains these features, but calculates the p-value using a sophisticated evolutionary model that we describe here, and also allows for a transition matrix for different substitution rates from and to different nucleotides. Our evolutionary model takes separate account of mutation and fixation, and can be extended to allow for locally differing functional constraints on sequence.

8.

CSA: An efficient algorithm to improve circular DNA multiple alignment

Francisco Fernandes Luísa Pereira Ana T Freitas 《BMC bioinformatics》2009,10(1):230

Background

The comparison of homologous sequences from different species is an essential approach to reconstruct the evolutionary history of species and of the genes they harbour in their genomes. Several complete mitochondrial and nuclear genomes are now available, increasing the importance of using multiple sequence alignment algorithms in comparative genomics. MtDNA has long been used in phylogenetic analysis, and errors in the alignments can lead to errors in the interpretation of evolutionary information. Although a large number of multiple sequence alignment algorithms have been proposed to date, they all deal with linear DNA and cannot directly handle circular DNA. Researchers interested in aligning circular DNA sequences must first rotate them to the "right" place using an essentially manual process, before they can use multiple sequence alignment tools.
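The rotation step mentioned above can be illustrated with a small Python sketch. This is a deliberately simplified stand-in and not the CSA algorithm: it assumes the user already knows a short anchor string that occurs exactly once in every circular sequence, and the function name `rotate_to_anchor` is hypothetical.

```python
def rotate_to_anchor(seq, anchor):
    """Rotate circular `seq` so that it starts with `anchor` (assumed non-empty)."""
    # Append the first len(anchor) - 1 characters so an anchor that spans
    # the arbitrary linearisation point can still be found.
    doubled = seq + seq[:len(anchor) - 1]
    start = doubled.find(anchor)
    if start < 0:
        raise ValueError("anchor not found in circular sequence")
    return seq[start:] + seq[:start]


# Two linearisations of the same circular molecule, cut at different points.
first = "ACGTTACG"
second = "GTTACGAC"
rotated = [rotate_to_anchor(s, "ACGT") for s in (first, second)]
```

After rotation both sequences begin at the same position, so a linear multiple sequence alignment tool can be applied to them.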

9.

SCGPred: A Score-based Method for Gene Structure Prediction by Combining Multiple Sources of Evidence

Xiao Li Qingan Ren Yang Weng Haoyang Cai Yunmin Zhu Yizheng Zhang 《Genomics, Proteomics & Bioinformatics》2008,6(3):175-185

Predicting protein-coding genes remains a significant challenge. Although a variety of computational programs that commonly use machine learning methods have emerged, the accuracy of their predictions remains low when they are applied to large genomic sequences. Moreover, computational gene finding in newly sequenced genomes is an especially difficult task due to the absence of a training set of abundant validated genes. Here we present a new gene-finding program, SCGPred, to improve the accuracy of prediction by combining multiple sources of evidence. SCGPred can apply both a supervised method in previously well-studied genomes and an unsupervised one in novel genomes. In tests with datasets composed of large DNA sequences from human and a novel genome of Ustilago maydis, SCGPred shows a significant improvement in comparison to the popular ab initio gene predictors. We also demonstrate that SCGPred can significantly improve prediction in novel genomes by combining several foreign gene finders with similarity alignments, which is superior to other unsupervised methods. Therefore, SCGPred can serve as an alternative gene-finding tool for newly sequenced eukaryotic genomes. The program is freely available at http://bio.scu.edu.cn/SCGPred/.

10.

AftrRAD: a pipeline for accurate and efficient de novo assembly of RADseq data

Michael G. Sovic Anthony C. Fries H. Lisle Gibbs 《Molecular ecology resources》2015,15(5):1163-1171

An increase in studies using restriction site‐associated DNA sequencing (RADseq) methods has led to a need for both the development and assessment of novel bioinformatic tools that aid in the generation and analysis of these data. Here, we report the availability of AftrRAD, a bioinformatic pipeline that efficiently assembles and genotypes RADseq data, and outputs these data in various formats for downstream analyses. We use simulated and experimental data sets to evaluate AftrRAD's ability to perform accurate de novo assembly of loci, and we compare its performance with two other commonly used programs, stacks and pyrad. We demonstrate that AftrRAD is able to accurately assemble loci, while accounting for indel variation among alleles, in a more computationally efficient manner than currently available programs. AftrRAD run times are not strongly affected by the number of samples in the data set, making this program a useful tool when multicore systems are not available for parallel processing, or when data sets include large numbers of samples.

11.

HMM-Kalign: a tool for generating sub-optimal HMM alignments

Becker E Cotillard A Meyer V Madaoui H Guérois R 《Bioinformatics (Oxford, England)》2007,23(22):3095-3097

Recent development of strategies using multiple sequence alignments (MSA) or profiles to detect remote homologies between proteins has led to a significant increase in the number of proteins whose structures can be generated by comparative modeling methods. However, prediction of the optimal alignment between these highly divergent homologous proteins remains a difficult issue. We present a tool based on a generalized Viterbi algorithm that generates optimal and sub-optimal alignments between a sequence and a Hidden Markov Model. The tool is implemented as a new function within the HMMER package called hmmkalign.

12.

Increasing Sequence Search Sensitivity with Transitive Alignments

Ketil Malde Tomasz Furmanek 《PloS one》2013,8(2)

Sequence alignment is an important bioinformatics tool for identifying homology, but searching against the full set of available sequences is likely to result in many hits to poorly annotated sequences providing very little information. Consequently, we often want alignments against a specific subset of sequences: for instance, we are looking for sequences from a particular species, sequences that have known 3d-structures, sequences that have a reliable (curated) function annotation, and so on. Although such subset databases are readily available, they only represent a small fraction of all sequences. Thus, the likelihood of finding close homologs for query sequences is smaller, and the alignments will in general have lower scores. This makes it difficult to distinguish hits to homologous sequences from random hits to unrelated sequences. Here, we propose a method that addresses this problem by first aligning query sequences against a large database representing the corpus of known sequences, and then constructing indirect (or transitive) alignments by combining the results with alignments from the large database against the desired target database. We compare the results to direct pairwise alignments, and show that our method gives us higher sensitivity alignments against the target database.
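The core of an indirect (transitive) alignment is composing two coordinate mappings: query to intermediate, and intermediate to target. The Python sketch below shows only that composition step under simple assumptions; it does not reproduce the scoring or search procedure of the paper, and the names `column_map` and `compose` are hypothetical.

```python
def column_map(row_a, row_b):
    """Map residue indices of row_a to residue indices of row_b.

    Both rows come from one pairwise alignment and use '-' for gaps.
    Only columns where both rows hold a residue produce an entry.
    """
    mapping, i, j = {}, 0, 0
    for a, b in zip(row_a, row_b):
        if a != "-" and b != "-":
            mapping[i] = j
        i += a != "-"
        j += b != "-"
    return mapping


def compose(query_to_mid, mid_to_target):
    """Chain two residue mappings, dropping positions lost at either step."""
    return {q: mid_to_target[m] for q, m in query_to_mid.items() if m in mid_to_target}


# Query aligned to an intermediate sequence from the large database ...
query_to_mid = column_map("ACD-F", "ACDEF")
# ... and that intermediate sequence aligned to a target-database sequence.
mid_to_target = column_map("ACDEF", "-CDEF")
transitive = compose(query_to_mid, mid_to_target)
```

Positions of the query that pair with a gap in either step simply drop out of the transitive mapping.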

13.

OCPAT: an online codon-preserved alignment tool for evolutionary genomic analysis of protein coding sequences

Guozhen Liu Monica Uddin Munirul Islam Morris Goodman Lawrence I Grossman Roberto Romero Derek E Wildman 《Source code for biology and medicine》2007,2(1):5


14.

Browsing protein families via the 'Rich Family Description' format

Corpet F Gouzy J Kahn D 《Bioinformatics (Oxford, England)》1999,15(12):1020-1027


15.

A Novel Heuristic for Local Multiple Alignment of Interspersed DNA Repeats

Treangen T.J. Darling A.E. Achaz G. Ragan M.A. Messeguer X. Rocha E.P.C. 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2009,6(2):180-189

Pairwise local sequence alignment methods have been the prevailing technique to identify homologous nucleotides between related species. However, existing methods that identify and align all homologous nucleotides in one or more genomes have suffered from poor scalability and limited accuracy. We propose a novel method that couples a gapped extension heuristic with an efficient filtration method for identifying interspersed repeats in genome sequences. During gapped extension, we use the MUSCLE implementation of progressive global multiple alignment with iterative refinement. The resulting gapped extensions potentially contain alignments of unrelated sequence. We detect and remove such undesirable alignments using a hidden Markov model (HMM) to predict the posterior probability of homology. The HMM emission frequencies for nucleotide substitutions can be derived from any time-reversible nucleotide substitution matrix. We evaluate the performance of our method and previous approaches on a hybrid data set of real genomic DNA with simulated interspersed repeats. Our method outperforms a related method in terms of sensitivity, positive predictive value, and localizing boundaries of homology. The described methods have been implemented in freely available software, Repeatoire, available from: http://wwwabi.snv.jussieu.fr/public/Repeatoire.

16.

A pairwise alignment algorithm which favors clusters of blocks.

Elodie Nédélec Thomas Moncion Elisabeth Gassiat Bruno Bossard Guillemette Duchateau-Nguyen Alain Denise Michel Termier 《Journal of computational biology》2005,12(1):33-47

Pairwise sequence alignments aim to decide whether two sequences are related and, if so, to exhibit their related domains. Recent works have pointed out that a significant number of true homologous sequences are missed when using classical comparison algorithms. This is the case when two homologous sequences share several small blocks of homology, each too small to lead to a significant score. On the other hand, classical alignment algorithms, when detecting homologies, may fail to recognize all the significant biological signals. The aim of the paper is to give a solution to these two problems. We propose a new scoring method which tends to increase the score of an alignment when "blocks" are detected. This so-called Block-Scoring algorithm, which makes use of dynamic programming, is worth using as a complementary tool to classical exact alignment methods. We validate our approach by applying it to a large set of biological data. Finally, we give a limit theorem for the score statistics of the algorithm.

17.

DNAAlignEditor: DNA alignment editor tool

Hector Sanchez-Villeda Steven Schroeder Sherry Flint-Garcia Katherine E Guill Masanori Yamasaki Michael D McMullen 《BMC bioinformatics》2008,9(1):154

Background

With advances in DNA re-sequencing methods and Next-Generation parallel sequencing approaches, there has been a large increase in genomic efforts to define and analyze the sequence variability present among individuals within a species. For very polymorphic species such as maize, this has led to a need for intuitive, user-friendly software that aids the biologist, often with naïve programming capability, in tracking, editing, displaying, and exporting multiple individual sequence alignments. To fill this need we have developed a novel DNA alignment editor.

Results

We have generated a nucleotide sequence alignment editor (DNAAlignEditor) that provides an intuitive, user-friendly interface for manual editing of multiple sequence alignments with functions for input, editing, and output of sequence alignments. The color-coding of nucleotide identity and the display of associated quality scores aid in the manual alignment editing process. DNAAlignEditor works as a client/server tool having two main components: a relational database that collects the processed alignments and a user interface connected to the database through universal data access connectivity drivers. DNAAlignEditor can be used either as a stand-alone application or as a network application with multiple users concurrently connected.

Conclusion

We anticipate that this software will be of general interest to biologists and population geneticists in editing DNA sequence alignments and analyzing natural sequence variation regardless of species, and will be particularly useful for manual alignment editing of sequences in species with high levels of polymorphism.


18.

4P: fast computing of population genetics statistics from large DNA polymorphism panels


Andrea Benazzo Alex Panziera Giorgio Bertorelle 《Ecology and evolution》2015,5(1):172-175

Massive DNA sequencing has significantly increased the amount of data available for population genetics and molecular ecology studies. However, the parallel computation of simple statistics within and between populations from large panels of polymorphic sites is not yet available, making the exploratory analyses of a set or subset of data a very laborious task. Here, we present 4P (parallel processing of polymorphism panels), a stand‐alone software program for the rapid computation of genetic variation statistics (including the joint frequency spectrum) from millions of DNA variants in multiple individuals and multiple populations. It handles a standard input file format commonly used to store DNA variation from empirical or simulation experiments. The computational performance of 4P was evaluated using large SNP (single nucleotide polymorphism) datasets from human genomes or obtained by simulations. 4P was faster or much faster than other comparable programs, and the impact of parallel computing using multicore computers or servers was evident. 4P is a useful tool for biologists who need a simple and rapid computer program to run exploratory population genetics analyses in large panels of genomic data. It is also particularly suitable for analyzing multiple data sets produced in simulation studies. Unix, Windows, and macOS versions are provided, as well as the source code for easier pipeline implementations.

19.

GeneWiz browser: An Interactive Tool for Visualizing Sequenced Chromosomes

Hallin PF Stærfeldt HH Rotenberg E Binnewies TT Benham CJ Ussery DW 《Standards in genomic sciences》2009,1(2):204-215

We present an interactive web application for visualizing genomic data of prokaryotic chromosomes. The tool (GeneWiz browser) allows users to carry out various analyses such as mapping alignments of homologous genes to other genomes, mapping of short sequencing reads to a reference chromosome, and calculating DNA properties such as curvature or stacking energy along the chromosome. The GeneWiz browser produces an interactive graphic that enables zooming from a global scale down to single nucleotides, without changing the size of the plot. Its ability to disproportionally zoom provides optimal readability and increased functionality compared to other browsers. The tool allows the user to select the display of various genomic features, color setting and data ranges. Custom numerical data can be added to the plot allowing, for example, visualization of gene expression and regulation data. Further, standard atlases are pre-generated for all prokaryotic genomes available in GenBank, providing a fast overview of all available genomes, including recently deposited genome sequences. The tool is available online from http://www.cbs.dtu.dk/services/gwBrowser. Supplemental material including interactive atlases is available online at http://www.cbs.dtu.dk/services/gwBrowser/suppl/.

20.

ETOPE: Evolutionary test of predicted exons

Nekrutenko A Chung WY Li WH 《Nucleic acids research》2003,31(13):3564-3567

Since a large number of computationally predicted exons are not supported by existing sequence (e.g. ESTs) or experimental (e.g. expression analysis) data, they need to be validated by other methods. ETOPE is designed to test computational predictions by using signals that have not been included in any current computational prediction method. The test is based on the ratio of non-synonymous to synonymous substitution rates between sequences from different genomes. This ratio has previously been shown, by empirical data and computer simulation, to be a powerful criterion for identifying protein-coding regions. ETOPE is available at http://nekrut.uchicago.edu/etope/.