共查询到20条相似文献,搜索用时 15 毫秒
1.
PROMALS: towards accurate multiple sequence alignments of distantly related proteins 总被引:7,自引:0,他引:7
MOTIVATION: Accurate multiple sequence alignments are essential in protein structure modeling, functional prediction and efficient planning of experiments. Although the alignment problem has attracted considerable attention, preparation of high-quality alignments for distantly related sequences remains a difficult task. RESULTS: We developed PROMALS, a multiple alignment method that shows promising results for protein homologs with sequence identity below 10%, aligning close to half of the amino acid residues correctly on average. This is about three times more accurate than traditional pairwise sequence alignment methods. PROMALS algorithm derives its strength from several sources: (i) sequence database searches to retrieve additional homologs; (ii) accurate secondary structure prediction; (iii) a hidden Markov model that uses a novel combined scoring of amino acids and secondary structures; (iv) probabilistic consistency-based scoring applied to progressive alignment of profiles. Compared to the best alignment methods that do not use secondary structure prediction and database searches (e.g. MUMMALS, ProbCons and MAFFT), PROMALS is up to 30% more accurate, with improvement being most prominent for highly divergent homologs. Compared to SPEM and HHalign, which also employ database searches and secondary structure prediction, PROMALS shows an accuracy improvement of several percent. AVAILABILITY: The PROMALS web server is available at: http://prodata.swmed.edu/promals/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. 相似文献
2.
O'Sullivan O Suhre K Abergel C Higgins DG Notredame C 《Journal of molecular biology》2004,340(2):385-395
Most bioinformatics analyses require the assembly of a multiple sequence alignment. It has long been suspected that structural information can help to improve the quality of these alignments, yet the effect of combining sequences and structures has not been evaluated systematically. We developed 3DCoffee, a novel method for combining protein sequences and structures in order to generate high-quality multiple sequence alignments. 3DCoffee is based on TCoffee version 2.00, and uses a mixture of pairwise sequence alignments and pairwise structure comparison methods to generate multiple sequence alignments. We benchmarked 3DCoffee using a subset of HOMSTRAD, the collection of reference structural alignments. We found that combining TCoffee with the threading program Fugue makes it possible to improve the accuracy of our HOMSTRAD dataset by four percentage points when using one structure only per dataset. Using two structures yields an improvement of ten percentage points. The measures carried out on HOM39, a HOMSTRAD subset composed of distantly related sequences, show a linear correlation between multiple sequence alignment accuracy and the ratio of number of provided structure to total number of sequences. Our results suggest that in the case of distantly related sequences, a single structure may not be enough for computing an accurate multiple sequence alignment. 相似文献
3.
Background
In a computed protein multiple sequence alignment, the coreness of a column is the fraction of its substitutions that are in so-called core columns of the gold-standard reference alignment of its proteins. In benchmark suites of protein reference alignments, the core columns of the reference alignment are those that can be confidently labeled as correct, usually due to all residues in the column being sufficiently close in the spatial superposition of the known three-dimensional structures of the proteins. Typically the accuracy of a protein multiple sequence alignment that has been computed for a benchmark is only measured with respect to the core columns of the reference alignment. When computing an alignment in practice, however, a reference alignment is not known, so the coreness of its columns can only be predicted.Results
We develop for the first time a predictor of column coreness for protein multiple sequence alignments. This allows us to predict which columns of a computed alignment are core, and hence better estimate the alignment’s accuracy. Our approach to predicting coreness is similar to nearest-neighbor classification from machine learning, except we transform nearest-neighbor distances into a coreness prediction via a regression function, and we learn an appropriate distance function through a new optimization formulation that solves a large-scale linear programming problem. We apply our coreness predictor to parameter advising, the task of choosing parameter values for an aligner’s scoring function to obtain a more accurate alignment of a specific set of sequences. We show that for this task, our predictor strongly outperforms other column-confidence estimators from the literature, and affords a substantial boost in alignment accuracy.4.
5.
We offer a tool, denoted VISTAL, for two-dimensional visualization of protein structural alignments. VISTAL describes aligned structures as a series of matched secondary structure elements, colored according to the three-dimensional distance of their Calpha atoms. AVAILABILITY: VISTAL can be downloaded from http://trantor.bioc.columbia.edu/~kolodny/software.html. 相似文献
6.
Profile search methods based on protein domain alignments have proven to be useful tools in comparative sequence analysis. Domain alignments used by currently available search methods have been computed by sequence comparison. With the growth of the protein structure database, however, alignments of many domain pairs have also been computed by structure comparison. Here, we examine the extent to which information from these two sources agrees. We measure agreement with respect to identification of homologous regions in each protein, that is, with respect to the location of domain boundaries. We also measure agreement with respect to identification of homologous residue sites by comparing alignments and assessing the accuracy of the molecular models they predict. We find that domain alignments in publicly available collections based on sequence and structure comparison are largely consistent. However, the homologous regions identified by sequence comparison are often shorter than those identified by 3D structure comparison. In addition, when overall sequence similarity is low alignments from sequence comparison produce less accurate molecular models, suggesting that they less accurately identify homologous sites. These observations suggest that structure comparison results might be used to improve the overall accuracy of domain alignment collections and the performance of profile search methods based on them. 相似文献
7.
Greene SCPrimer: a rapid comprehensive tool for designing degenerate primers from multiple sequence alignments 总被引:2,自引:2,他引:2
Jabado OJ Palacios G Kapoor V Hui J Renwick N Zhai J Briese T Lipkin WI 《Nucleic acids research》2006,34(22):6605-6611
Polymerase chain reaction (PCR) is widely applied in clinical and environmental microbiology. Primer design is key to the development of successful assays and is often performed manually by using multiple nucleic acid alignments. Few public software tools exist that allow comprehensive design of degenerate primers for large groups of related targets based on complex multiple sequence alignments. Here we present a method for designing such primers based on tree building followed by application of a set covering algorithm, and demonstrate its utility in compiling Multiplex PCR primer panels for detection and differentiation of viral pathogens. 相似文献
8.
We develop a new threading algorithm MUSTER by extending the previous sequence profile-profile alignment method, PPA. It combines various sequence and structure information into single-body terms which can be conveniently used in dynamic programming search: (1) sequence profiles; (2) secondary structures; (3) structure fragment profiles; (4) solvent accessibility; (5) dihedral torsion angles; (6) hydrophobic scoring matrix. The balance of the weighting parameters is optimized by a grading search based on the average TM-score of 111 training proteins which shows a better performance than using the conventional optimization methods based on the PROSUP database. The algorithm is tested on 500 nonhomologous proteins independent of the training sets. After removing the homologous templates with a sequence identity to the target >30%, in 224 cases, the first template alignment has the correct topology with a TM-score >0.5. Even with a more stringent cutoff by removing the templates with a sequence identity >20% or detectable by PSI-BLAST with an E-value <0.05, MUSTER is able to identify correct folds in 137 cases with the first model of TM-score >0.5. Dependent on the homology cutoffs, the average TM-score of the first threading alignments by MUSTER is 5.1-6.3% higher than that by PPA. This improvement is statistically significant by the Wilcoxon signed rank test with a P-value < 1.0 x 10(-13), which demonstrates the effect of additional structural information on the protein fold recognition. The MUSTER server is freely available to the academic community at http://zhang.bioinformatics.ku.edu/MUSTER. 相似文献
9.
Pfam contains multiple alignments and hidden Markov model based profiles (HMM-profiles) of complete protein domains. The definition of domain boundaries, family members and alignment is done semi-automatically based on expert knowledge, sequence similarity, other protein family databases and the ability of HMM-profiles to correctly identify and align the members. Release 2.0 of Pfam contains 527 manually verified families which are available for browsing and on-line searching via the World Wide Web in the UK at http://www.sanger.ac.uk/Pfam/ and in the US at http://genome.wustl. edu/Pfam/ Pfam 2.0 matches one or more domains in 50% of Swissprot-34 sequences, and 25% of a large sample of predicted proteins from the Caenorhabditis elegans genome. 相似文献
10.
We present a novel method for the comparison of multiple protein alignments with assessment of statistical significance (COMPASS). The method derives numerical profiles from alignments, constructs optimal local profile-profile alignments and analytically estimates E-values for the detected similarities. The scoring system and E-value calculation are based on a generalization of the PSI-BLAST approach to profile-sequence comparison, which is adapted for the profile-profile case. Tested along with existing methods for profile-sequence (PSI-BLAST) and profile-profile (prof_sim) comparison, COMPASS shows increased abilities for sensitive and selective detection of remote sequence similarities, as well as improved quality of local alignments. The method allows prediction of relationships between protein families in the PFAM database beyond the range of conventional methods. Two predicted relations with high significance are similarities between various Rossmann-type folds and between various helix-turn-helix-containing families. The potential value of COMPASS for structure/function predictions is illustrated by the detection of an intricate homology between the DNA-binding domain of the CTF/NFI family and the MH1 domain of the Smad family. 相似文献
11.
Higa RH Cruz SA Kuser PR Yamagishi ME Fileto R Oliveira SR Mazoni I Santos EH Mancini AL Neshich G 《Genetics and molecular research : GMR》2006,5(1):127-137
Homology-derived secondary structure of proteins (HSSP) is a well-known database of multiple sequence alignments (MSAs) which merges information of protein sequences and their three-dimensional structures. It is available for all proteins whose structure is deposited in the PDB. It is also used by STING and (Java)Protein Dossier to calculate and present relative entropy as a measure of the degree of conservation for each residue of proteins whose structure has been solved and deposited in the PDB. However, if the STING and (Java)Protein Dossier are to provide support for analysis of protein structures modeled in computers or being experimentally solved but not yet deposited in the PDB, then we need a new method for building alignments having a flavor of HSSP alignments (myMSAr). The present study describes a new method and its corresponding databank (SH2QS--database of sequences homologue to the query [structure-having] sequence). Our main interest in making myMSAr was to measure the degree of residue conservation for a given query sequence, regardless of whether it has a corresponding structure deposited in the PDB. In this study, we compare the measurement of residue conservation provided by corresponding alignments produced by HSSP and SH2QS. As a case study, we also present two biologically relevant examples, the first one highlighting the equivalence of analysis of the degree of residue conservation by using HSSP or SH2QS alignments, and the second one presenting the degree of residue conservation for a structure modeled in a computer, which , as a consequence, does not have an alignment reported by HSSP. 相似文献
12.
Zhang Z Xiao J Wu J Zhang H Liu G Wang X Dai L 《Biochemical and biophysical research communications》2012,419(4):779-781
Constructing multiple homologous alignments for protein-coding DNA sequences is crucial for a variety of bioinformatic analyses but remains computationally challenging. With the growing amount of sequence data available and the ongoing efforts largely dependent on protein-coding DNA alignments, there is an increasing demand for a tool that can process a large number of homologous groups and generate multiple protein-coding DNA alignments. Here we present a parallel tool - ParaAT that is capable of parallelly constructing multiple protein-coding DNA alignments for a large number of homologs. As testified on empirical datasets, ParaAT is well suited for large-scale data analysis in the high-throughput era, providing good scalability and exhibiting high parallel efficiency for computationally demanding tasks. ParaAT is freely available for academic use only at http://cbb.big.ac.cn/software. 相似文献
13.
14.
Darío Guerrero Rocío Bautista David P Villalobos Francisco R Cantón M Gonzalo Claros 《Algorithms for molecular biology : AMB》2010,5(1):24
Background
Multiple sequence alignments are used to study gene or protein function, phylogenetic relations, genome evolution hypotheses and even gene polymorphisms. Virtually without exception, all available tools focus on conserved segments or residues. Small divergent regions, however, are biologically important for specific quantitative polymerase chain reaction, genotyping, molecular markers and preparation of specific antibodies, and yet have received little attention. As a consequence, they must be selected empirically by the researcher. AlignMiner has been developed to fill this gap in bioinformatic analyses. 相似文献15.
Daniel R Caffrey Paul H Dana Vidhya Mathur Marco Ocano Eun-Jong Hong Yaoyu E Wang Shyamal Somaroo Brian E Caffrey Shobha Potluri Enoch S Huang 《BMC bioinformatics》2007,8(1):381
Background
By virtue of their shared ancestry, homologous sequences are similar in their structure and function. Consequently, multiple sequence alignments are routinely used to identify trends that relate to function. This type of analysis is particularly productive when it is combined with structural and phylogenetic analysis. 相似文献16.
We describe a tool, THoR, that automatically creates and curates multiple sequence alignments representing protein domains. This exploits both PSI-BLAST and HMMER algorithms and provides an accurate and comprehensive alignment for any domain family. The entire process is designed for use via a web-browser, with simple links and cross-references to relevant information, to assist the assessment of biological significance. THoR has been benchmarked for accuracy using the SMART and pufferfish genome databases. 相似文献
17.
Valencia A 《Comparative and Functional Genomics》2003,4(4):424-427
Multiple sequence alignments have much to offer to the understanding of protein structure, evolution and function. We are developing approaches to use this information in predicting protein-binding specificity, intra-protein and protein-protein interactions, and in reconstructing protein interaction networks. 相似文献
18.
Motif3D is a web-based protein structure viewer designed to allow sequence motifs, and in particular those contained in the fingerprints of the PRINTS database, to be visualised on three-dimensional (3D) structures. Additional functionality is provided for the rhodopsin-like G protein-coupled receptors, enabling fingerprint motifs of any of the receptors in this family to be mapped onto the single structure available, that of bovine rhodopsin. Motif3D can be used via the web interface available at: http://www.bioinf.man.ac.uk/dbbrowser/motif3d/motif3d.html. 相似文献
19.
Stothard PM 《Bioinformatics (Oxford, England)》2001,17(2):198-199
COMBOSA3D is a program that allows sequence conservation to be viewed in its proper three-dimensional environment. It superimposes sequence alignment information onto a protein structure using a customizable color scheme, which is also applied to a textual sequence alignment for reference. AVAILABILITY: The program can be tested at http://www.bioinformatics.org/combosa3d/, and the source code is freely available. 相似文献
20.
AltAVisT: comparing alternative multiple sequence alignments 总被引:2,自引:0,他引:2
We introduce a WWW-based tool that is able to compare two alternative multiple alignments of a given sequence set. Regions where both alignments coincide are color-coded to visualize the local agreement between the two alignments and to identify those regions that can be considered to be reliably aligned. AVAILABILITY: http://bibiserv.techfak.uni-bielefeld.de/altavist/. 相似文献