共查询到20条相似文献,搜索用时 15 毫秒
1.
A structure-based method for protein sequence alignment 总被引:1,自引:0,他引:1
Kann MG Thiessen PA Panchenko AR Schäffer AA Altschul SF Bryant SH 《Bioinformatics (Oxford, England)》2005,21(8):1451-1456
MOTIVATION: With the continuing rapid growth of protein sequence data, protein sequence comparison methods have become the most widely used tools of bioinformatics. Among these methods are those that use position-specific scoring matrices (PSSMs) to describe protein families. PSSMs can capture information about conserved patterns within families, which can be used to increase the sensitivity of searches for related sequences. Certain types of structural information, however, are not generally captured by PSSM search methods. Here we introduce a program, Structure-based ALignment TOol (SALTO), that aligns protein query sequences to PSSMs using rules for placing and scoring gaps that are consistent with the conserved regions of domain alignments from NCBI's Conserved Domain Database. RESULTS: In most cases, the alignment scores obtained using the local alignment version follow an extreme value distribution. SALTO's performance in finding related sequences and producing accurate alignments is similar to or better than that of IMPALA; one advantage of SALTO is that it imposes an explicit gapping model on each protein family. AVAILABILITY: A stand-alone version of the program that can generate global or local alignments is available by ftp distribution (ftp://ftp.ncbi.nih.gov/pub/SALTO/), and has been incorporated to Cn3D structure/alignment viewer. CONTACT: bryant@ncbi.nlm.nih.gov. 相似文献
2.
Comparison of multiple protein structures has a broad range of applications in the analysis of protein structure, function and evolution. Multiple structure alignment tools (MSTAs) are necessary to obtain a simultaneous comparison of a family of related folds. In this study, we have developed a method for multiple structure comparison largely based on sequence alignment techniques. A widely used Structural Alphabet named Protein Blocks (PBs) was used to transform the information on 3D protein backbone conformation as a 1D sequence string. A progressive alignment strategy similar to CLUSTALW was adopted for multiple PB sequence alignment (mulPBA). Highly similar stretches identified by the pairwise alignments are given higher weights during the alignment. The residue equivalences from PB based alignments are used to obtain a three dimensional fit of the structures followed by an iterative refinement of the structural superposition. Systematic comparisons using benchmark datasets of MSTAs underlines that the alignment quality is better than MULTIPROT, MUSTANG and the alignments in HOMSTRAD, in more than 85% of the cases. Comparison with other rigid-body and flexible MSTAs also indicate that mulPBA alignments are superior to most of the rigid-body MSTAs and highly comparable to the flexible alignment methods. 相似文献
3.
Patterns of receptor-ligand interaction can be conserved in functionally equivalent proteins even in the absence of sequence homology. Therefore, structural comparison of ligand-binding pockets and their pharmacophoric features allow for the characterization of so-called "orphan" proteins with known three-dimensional structure but unknown function, and predict ligand promiscuity of binding pockets. We present an algorithm for rapid pocket comparison (PoLiMorph), in which protein pockets are represented by self-organizing graphs that fill the volume of the cavity. Vertices in these three-dimensional frameworks contain information about the local ligand-receptor interaction potential coded by fuzzy property labels. For framework matching, we developed a fast heuristic based on the maximum dispersion problem, as an alternative to techniques utilizing clique detection or geometric hashing algorithms. A sophisticated scoring function was applied that incorporates knowledge about property distributions and ligand-receptor interaction patterns. In an all-against-all virtual screening experiment with 207 pocket frameworks extracted from a subset of PDBbind, PoLiMorph correctly assigned 81% of 69 distinct structural classes and demonstrated sustained ability to group pockets accommodating the same ligand chemotype. We determined a score threshold that indicates "true" pocket similarity with high reliability, which not only supports structure-based drug design but also allows for sequence-independent studies of the proteome. 相似文献
4.
5.
6.
We propose a detailed protein structure alignment method named "MatAlign". It is a two-step algorithm. Firstly, we represent 3D protein structures as 2D distance matrices, and align these matrices by means of dynamic programming in order to find the initially aligned residue pairs. Secondly, we refine the initial alignment iteratively into the optimal one according to an objective scoring function. We compare our method against DALI and CE, which are among the most accurate and the most widely used of the existing structural comparison tools. On the benchmark set of 68 protein structure pairs by Fischer et al., MatAlign provides better alignment results, according to four different criteria, than both DALI and CE in a majority of cases. MatAlign also performs as well in structural database search as DALI does, and much better than CE does. MatAlign is about two to three times faster than DALI, and has about the same speed as CE. The software and the supplementary information for this paper are available at http://xena1.ddns.comp.nus.edu.sg/~genesis/MatAlign/. 相似文献
7.
MOTIVATION: Local structure segments (LSSs) are small structural units shared by unrelated proteins. They are extensively used in protein structure comparison, and predicted LSSs (PLSSs) are used very successfully in ab initio folding simulations. However, predicted or real LSSs are rarely exploited by protein sequence comparison programs that are based on position-by-position alignments. RESULTS: We developed a SEgment Alignment algorithm (SEA) to compare proteins described as a collection of predicted local structure segments (PLSSs), which is equivalent to an unweighted graph (network). Any specific structure, real or predicted corresponds to a specific path in this network. SEA then uses a network matching approach to find two most similar paths in networks representing two proteins. SEA explores the uncertainty and diversity of predicted local structure information to search for a globally optimal solution. It simultaneously solves two related problems: the alignment of two proteins and the local structure prediction for each of them. On a benchmark of protein pairs with low sequence similarity, we show that application of the SEA algorithm improves alignment quality as compared to FFAS profile-profile alignment, and in some cases SEA alignments can match the structural alignments, a feat previously impossible for any sequence based alignment methods. 相似文献
8.
MOTIVATION: In recent years, several methods have been proposed for aligning two protein sequence profiles, with reported improvements in alignment accuracy and homolog discrimination versus sequence-sequence methods (e.g. BLAST) and profile-sequence methods (e.g. PSI-BLAST). Profile-profile alignment is also the iterated step in progressive multiple sequence alignment algorithms such as CLUSTALW. However, little is known about the relative performance of different profile-profile scoring functions. In this work, we evaluate the alignment accuracy of 23 different profile-profile scoring functions by comparing alignments of 488 pairs of sequences with identity < or =30% against structural alignments. We optimize parameters for all scoring functions on the same training set and use profiles of alignments from both PSI-BLAST and SAM-T99. Structural alignments are constructed from a consensus between the FSSP database and CE structural aligner. We compare the results with sequence-sequence and sequence-profile methods, including BLAST and PSI-BLAST. RESULTS: We find that profile-profile alignment gives an average improvement over our test set of typically 2-3% over profile-sequence alignment and approximately 40% over sequence-sequence alignment. No statistically significant difference is seen in the relative performance of most of the scoring functions tested. Significantly better results are obtained with profiles constructed from SAM-T99 alignments than from PSI-BLAST alignments. AVAILABILITY: Source code, reference alignments and more detailed results are freely available at http://phylogenomics.berkeley.edu/profilealignment/ 相似文献
9.
Background
Accurate sequence alignments are essential for homology searches and for building three-dimensional structural models of proteins. Since structure is better conserved than sequence, structure alignments have been used to guide sequence alignments and are commonly used as the gold standard for sequence alignment evaluation. Nonetheless, as far as we know, there is no report of a systematic evaluation of pairwise structure alignment programs in terms of the sequence alignment accuracy. 相似文献10.
Tobi D 《Proteins》2012,80(4):1167-1176
A novel methodology for comparison of protein dynamics is presented. Protein dynamics is calculated using the Gaussian network model and the modes of motion are globally aligned using the dynamic programming algorithm of Needleman and Wunsch, commonly used for sequence alignment. The alignment is fast and can be used to analyze large sets of proteins. The methodology is applied to the four major classes of the SCOP database: "all alpha proteins," "all beta proteins," "alpha and beta proteins," and "alpha/beta proteins". We show that different domains may have similar global dynamics. In addition, we report that the dynamics of "all alpha proteins" domains are less specific to structural variations within a given fold or superfamily compared with the other classes. We report that domain pairs with the most similar and the least similar global dynamics tend to be of similar length. The significance of the methodology is that it suggests a new and efficient way of mapping between the global structural features of protein families/subfamilies and their encoded dynamics. 相似文献
11.
RNA structural motifs are recurrent three-dimensional (3D) components found in the RNA architecture. These RNA structural motifs play important structural or functional roles and usually exhibit highly conserved 3D geometries and base-interaction patterns. Analysis of the RNA 3D structures and elucidation of their molecular functions heavily rely on efficient and accurate identification of these motifs. However, efficient RNA structural motif search tools are lacking due to the high complexity of these motifs. In this work, we present RNAMotifScanX, a motif search tool based on a base-interaction graph alignment algorithm. This novel algorithm enables automatic identification of both partially and fully matched motif instances. RNAMotifScanX considers noncanonical base-pairing interactions, base-stacking interactions, and sequence conservation of the motifs, which leads to significantly improved sensitivity and specificity as compared with other state-of-the-art search tools. RNAMotifScanX also adopts a carefully designed branch-and-bound technique, which enables ultra-fast search of large kink-turn motifs against a 23S rRNA. The software package RNAMotifScanX is implemented using GNU C++, and is freely available from http://genome.ucf.edu/RNAMotifScanX. 相似文献
12.
The Smith-Waterman (SW) algorithm is a typical technique for local sequence alignment in computational biology. However, the SW algorithm does not consider the local behaviours of the amino acids, which may result in loss of some useful information. Inspired by the success of Markov Edit Distance (MED) method, this paper therefore proposes a novel Markov pairwise protein sequence alignment (MPPSA) method that takes the local context dependencies into consideration. The numerical results have shown its superiority to the SW for pairwise protein sequence comparison. 相似文献
13.
The β-lactamases enzymes cleave the amide bond in β-lactam ring, rendering β-lactam antibiotics harmless to bacteria. In this
communication we have studied structure-function relationship and phylogenies of class A, B and D beta-lactamases using
structure-based sequence alignment and phylip programs respectively. The data of structure-based sequence alignment suggests
that in different isolates of TEM-1, mutations did not occur at or near sequence motifs. Since deletions are reported to be lethal to
structure and function of enzyme. Therefore, in these variants antibiotic hydrolysis profile and specificity will be affected. The
alignment data of class A enzyme SHV-1, CTX-M-15, class D enzyme, OXA-10, and class B enzyme VIM-2 and SIM-1 show
sequence motifs along with other part of polypeptide are essentially conserved. These results imply that conformations of betalactamases
are close to native state and possess normal hydrolytic activities towards beta-lactam antibiotics. However, class B
enzyme such as IMP-1 and NDM-1 are less conserved than other class A and D studied here because mutation and deletions
occurred at critically important region such as active site. Therefore, the structure of these beta-lactamases will be altered and
antibiotic hydrolysis profile will be affected. Phylogenetic studies suggest that class A and D beta-lactamases including TOHO-1
and OXA-10 respectively evolved by horizontal gene transfer (HGT) whereas other member of class A such as TEM-1 evolved by
gene duplication mechanism. Taken together, these studies justify structure-function relationship of beta-lactamases and
phylogenetic studies suggest these enzymes evolved by different mechanisms. 相似文献
14.
Accurate sequence alignments are crucial for modelling and to provide an evolutionary picture of related proteins. It is well-known that alignments are hard to obtain during distant relationships. Three thousand and fifty-two alignments of 218 pairs of protein domain structural entries, with <40% sequence identity, belonging to different structural classes, of diverse domain sizes and length-rigid/variable domains were performed using 12 programs. Structural parameters such as root mean square deviation, secondary-structural content and equivalences were considered for critical assessment. Methods that compare fragments and permit twists and translations align well during distant relationships and length variations. 相似文献
15.
Sequence alignment programs such as BLAST and PSI-BLAST are used routinely in pairwise, profile-based, or intermediate-sequence-search (ISS) methods to detect remote homologies for the purposes of fold assignment and comparative modeling. Yet, the sequence alignment quality of these methods at low sequence identity is not known. We have used the CE structure alignment program (Shindyalov and Bourne, Prot Eng 1998;11:739) to derive sequence alignments for all superfamily and family-level related proteins in the SCOP domain database. CE aligns structures and their sequences based on distances within each protein, rather than on interprotein distances. We compared BLAST, PSI-BLAST, CLUSTALW, and ISS alignments with the CE structural alignments. We found that global alignments with CLUSTALW were very poor at low sequence identity (<25%), as judged by the CE alignments. We used PSI-BLAST to search the nonredundant sequence database (nr) with every sequence in SCOP using up to four iterations. The resulting matrix was used to search a database of SCOP sequences. PSI-BLAST is only slightly better than BLAST in alignment accuracy on a per-residue basis, but PSI-BLAST matrix alignments are much longer than BLAST's, and so align correctly a larger fraction of the total number of aligned residues in the structure alignments. Any two SCOP sequences in the same superfamily that shared a hit or hits in the nr PSI-BLAST searches were identified as linked by the shared intermediate sequence. We examined the quality of the longest SCOP-query/ SCOP-hit alignment via an intermediate sequence, and found that ISS produced longer alignments than PSI-BLAST searches alone, of nearly comparable per-residue quality. At 10-15% sequence identity, BLAST correctly aligns 28%, PSI-BLAST 40%, and ISS 46% of residues according to the structure alignments. We also compared CE structure alignments with FSSP structure alignments generated by the DALI program. In contrast to the sequence methods, CE and structure alignments from the FSSP database identically align 75% of residue pairs at the 10-15% level of sequence identity, indicating that there is substantial room for improvement in these sequence alignment methods. BLAST produced alignments for 8% of the 10,665 nonimmunoglobulin SCOP superfamily sequence pairs (nearly all <25% sequence identity), PSI-BLAST matched 17% and the double-PSI-BLAST ISS method aligned 38% with E-values <10.0. The results indicate that intermediate sequences may be useful not only in fold assignment but also in achieving more complete sequence alignments for comparative modeling. 相似文献
16.
Structure-based drug design for chemical molecules has been widely used in drug discovery in the last 30 years. Many successful applications have been reported, especially in the field of virtual screening based on molecular docking. Recently, there has been much progress in fragment-based as well as de novo drug discovery. As many protein-protein interactions can be used as key targets for drug design, one of the solutions is to design protein drugs based directly on the protein complexes or the target structure. Compared with protein-ligand interactions, protein-protein interactions are more complicated and present more challenges for design. Over the last decade, both sampling efficiency and scoring accuracy of protein-protein docking have increased significantly. We have developed several strategies for structure-based protein drug design. A grafting strategy for key interaction residues has been developed and successfully applied in designing erythropoietin receptor-binding proteins. Similarly to small-molecule design, we also tested de novo protein-binder design and a virtual screen of protein binders using protein-protein docking calculations. In comparison with the development of structure-based small-molecule drug design, we believe that structure-based protein drug design has come of age. 相似文献
17.
Background
Sequence similarity searching is an important and challenging task in molecular biology and next-generation sequencing should further strengthen the need for faster algorithms to process such vast amounts of data. At the same time, the internal architecture of current microprocessors is tending towards more parallelism, leading to the use of chips with two, four and more cores integrated on the same die. The main purpose of this work was to design an effective algorithm to fit with the parallel capabilities of modern microprocessors. 相似文献18.
MOTIVATION: A tool that simultaneously aligns multiple protein sequences, automatically utilizes information about protein domains, and has a good compromise between speed and accuracy will have practical advantages over current tools. RESULTS: We describe COBALT, a constraint based alignment tool that implements a general framework for multiple alignment of protein sequences. COBALT finds a collection of pairwise constraints derived from database searches, sequence similarity and user input, combines these pairwise constraints, and then incorporates them into a progressive multiple alignment. We show that using constraints derived from the conserved domain database (CDD) and PROSITE protein-motif database improves COBALT's alignment quality. We also show that COBALT has reasonable runtime performance and alignment accuracy comparable to or exceeding that of other tools for a broad range of problems. AVAILABILITY: COBALT is included in the NCBI C++ toolkit. A Linux executable for COBALT, and CDD and PROSITE data used is available at: ftp://ftp.ncbi.nlm.nih.gov/pub/agarwala/cobalt 相似文献
19.
We investigated and optimized a method for structure comparison which is based on rigid body superimposition. The method maximizes the number of structurally equivalent residues while keeping the root mean square deviation constant. The resulting number of equivalent residues then provides an adequate similarity measure, which is easy to interpret. We demonstrate that the approach is able to detect remote structural similarity. We show that the number of equivalent residues is a suitable measure for ranking database searches and that the results are in good agreement with expert knowledge protein structure classification. Structure comparison frequently has multiple solutions. The approach that we use provides a range of alternative alignments rather a single solution. We discuss the nature of alternative solutions on several examples. 相似文献
20.
This study presents a manually constructed alignment of nearly complete rRNA genes from most animal clades (371 taxa from ~33 of the ~36 metazoan phyla), expanded from the 197 sequences in a previous study. This thorough, taxon-rich alignment, available at http://www.wsu.edu/~jmallatt/research/rRNAalignment.html and in the Dryad Repository (doi: http://dx.doi.org/10.5061/dryad.1v62kr3q), is based rigidly on the secondary structure of the SSU and LSU rRNA molecules, and is annotated in detail, including labeling of the erroneous sequences (contaminants). The alignment can be used for future studies of the molecular evolution of rRNA. Here, we use it to explore if the larger number of sequences produces an improved phylogenetic tree of animal relationships. Disappointingly, the resolution did not improve, neither when the standard maximum-likelihood method was used, nor with more sophisticated methods that partitioned the rRNA into paired and unpaired sites (stem, loop, bulge, junction), or accounted for the evolution of the paired sites. For example, no doublet model of paired-site substitutions (16-state, 16A and 16B, 7A-F, or 6A-C models) corrected the placement of any rogue taxa or increased resolution. The following findings are from the simplest, standard, ML analysis. The 371-taxon tree only imperfectly supported the bilaterian clades of Lophotrochozoa and Ecdysozoa, and this problem remained after 17 taxa with unstably positioned sequences were omitted from the analysis. The problem seems to stem from base-compositional heterogeneity across taxa and from an overrepresentation of highly divergent sequences among the newly added taxa (e.g., sequences from Cephalopoda, Rotifera, Acoela, and Myxozoa). The rogue taxa continue to concentrate in two locations in the rRNA tree: near the base of Arthropoda and of Bilateria. The approximately uncertain (AU) test refuted the monophyly of Mollusca and of Chordata, probably due to long-branch attraction of the highly divergent cephalopod and urochordate sequences out of those clades. Unlikely to be correct, these refutations show for the first time that rRNA phylogeny can support some 'wrong' clades. Along with its weaknesses, the rRNA tree has strengths: It recovers many clades that are supported by independent evidence (e.g., Metazoa, Bilateria, Hexapoda, Nonoculata, Ambulacraria, Syndermata, and Thecostraca with Malacostraca) and shows good resolution within certain groups (e.g., in Platyhelminthes, Insecta, Cnidaria). As another strength, the newly added rRNA sequences yielded the first rRNA-based support for Carnivora and Cetartiodactyla (dolphin+llama) in Mammalia, for basic subdivisions of Bryozoa ('Gymnolaemata+Stenolaemata' versus Phylactolaemata), and for Oligostraca (ostracods+branchiurans+pentastomids+mystacocarids). Future improvement could come from better sequence-evolution models that account for base-compositional heterogeneity, and from combining rRNA with protein-coding genes in phylogenetic reconstruction. 相似文献