首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 61 毫秒
1.
Protein interfaces are thought to be distinguishable from the rest of the protein surface by their greater degree of residue conservation. We test the validity of this approach on an expanded set of 64 protein-protein interfaces using conservation scores derived from two multiple sequence alignment types, one of close homologs/orthologs and one of diverse homologs/paralogs. Overall, we find that the interface is slightly more conserved than the rest of the protein surface when using either alignment type, with alignments of diverse homologs showing marginally better discrimination. However, using a novel surface-patch definition, we find that the interface is rarely significantly more conserved than other surface patches when using either alignment type. When an interface is among the most conserved surface patches, it tends to be part of an enzyme active site. The most conserved surface patch overlaps with 39% (+/- 28%) and 36% (+/- 28%) of the actual interface for diverse and close homologs, respectively. Contrary to results obtained from smaller data sets, this work indicates that residue conservation is rarely sufficient for complete and accurate prediction of protein interfaces. Finally, we find that obligate interfaces differ from transient interfaces in that the former have significantly fewer alignment gaps at the interface than the rest of the protein surface, as well as having buried interface residues that are more conserved than partially buried interface residues.  相似文献   

2.
Lin HN  Notredame C  Chang JM  Sung TY  Hsu WL 《PloS one》2011,6(12):e27872
Most sequence alignment tools can successfully align protein sequences with higher levels of sequence identity. The accuracy of corresponding structure alignment, however, decreases rapidly when considering distantly related sequences (<20% identity). In this range of identity, alignments optimized so as to maximize sequence similarity are often inaccurate from a structural point of view. Over the last two decades, most multiple protein aligners have been optimized for their capacity to reproduce structure-based alignments while using sequence information. Methods currently available differ essentially in the similarity measurement between aligned residues using substitution matrices, Fourier transform, sophisticated profile-profile functions, or consistency-based approaches, more recently.In this paper, we present a flexible similarity measure for residue pairs to improve the quality of protein sequence alignment. Our approach, called SymAlign, relies on the identification of conserved words found across a sizeable fraction of the considered dataset, and supported by evolutionary analysis. These words are then used to define a position specific substitution matrix that better reflects the biological significance of local similarity. The experiment results show that the SymAlign scoring scheme can be incorporated within T-Coffee to improve sequence alignment accuracy. We also demonstrate that SymAlign is less sensitive to the presence of structurally non-similar proteins. In the analysis of the relationship between sequence identity and structure similarity, SymAlign can better differentiate structurally similar proteins from non- similar proteins. We show that protein sequence alignments can be significantly improved using a similarity estimation based on weighted n-grams. In our analysis of the alignments thus produced, sequence conservation becomes a better indicator of structural similarity. SymAlign also provides alignment visualization that can display sub-optimal alignments on dot-matrices. The visualization makes it easy to identify well-supported alternative alignments that may not have been identified by dynamic programming. SymAlign is available at http://bio-cluster.iis.sinica.edu.tw/SymAlign/.  相似文献   

3.
R B Russell  G J Barton 《Proteins》1992,14(2):309-323
An algorithm is presented for the accurate and rapid generation of multiple protein sequence alignments from tertiary structure comparisons. A preliminary multiple sequence alignment is performed using sequence information, which then determines an initial superposition of the structures. A structure comparison algorithm is applied to all pairs of proteins in the superimposed set and a similarity tree calculated. Multiple sequence alignments are then generated by following the tree from the branches to the root. At each branchpoint of the tree, a structure-based sequence alignment and coordinate transformations are output, with the multiple alignment of all structures output at the root. The algorithm encoded in STAMP (STructural Alignment of Multiple Proteins) is shown to give alignments in good agreement with published structural accounts within the dehydrogenase fold domains, globins, and serine proteinases. In order to reduce the need for visual verification, two similarity indices are introduced to determine the quality of each generated structural alignment. Sc quantifies the global structural similarity between pairs or groups of proteins, whereas Pij' provides a normalized measure of the confidence in the alignment of each residue. STAMP alignments have the quality of each alignment characterized by Sc and Pij' values and thus provide a reproducible resource for studies of residue conservation within structural motifs.  相似文献   

4.
Shatsky M  Nussinov R  Wolfson HJ 《Proteins》2006,62(1):209-217
Routinely used multiple-sequence alignment methods use only sequence information. Consequently, they may produce inaccurate alignments. Multiple-structure alignment methods, on the other hand, optimize structural alignment by ignoring sequence information. Here, we present an optimization method that unifies sequence and structure information. The alignment score is based on standard amino acid substitution probabilities combined with newly computed three-dimensional structure alignment probabilities. The advantage of our alignment scheme is in its ability to produce more accurate multiple alignments. We demonstrate the usefulness of the method in three applications: 1) computing more accurate multiple-sequence alignments, 2) analyzing protein conformational changes, and 3) computation of amino acid structure-sequence conservation with application to protein-protein docking prediction. The method is available at http://bioinfo3d.cs.tau.ac.il/staccato/.  相似文献   

5.
Cell-surface-anchored immunoglobulin superfamily (IgSF) proteins are widespread throughout the human proteome, forming crucial components of diverse biological processes including immunity, cell-cell adhesion, and carcinogenesis. IgSF proteins generally function through protein-protein interactions carried out between extracellular, membrane-bound proteins on adjacent cells, known as trans-binding interfaces. These protein-protein interactions constitute a class of pharmaceutical targets important in the treatment of autoimmune diseases, chronic infections, and cancer. A molecular-level understanding of IgSF protein-protein interactions would greatly benefit further drug development. A critical step toward this goal is the reliable identification of IgSF trans-binding interfaces. We propose a novel combination of structure and sequence information to identify trans-binding interfaces in IgSF proteins. We developed a structure-based binding interface prediction approach that can identify broad regions of the protein surface that encompass the binding interfaces and suggests that IgSF proteins possess binding supersites. These interfaces could theoretically be pinpointed using sequence-based conservation analysis, with performance approaching the theoretical upper limit of binding interface prediction accuracy, but achieving this in practice is limited by the current ability to identify an appropriate multiple sequence alignment for conservation analysis. However, an important contribution of combining the two orthogonal methods is that agreement between these approaches can estimate the reliability of the predictions. This approach was benchmarked on the set of 22 IgSF proteins with experimentally solved structures in complex with their ligands. Additionally, we provide structure-based predictions and reliability scores for the 62 IgSF proteins with known structure but yet uncharacterized binding interfaces.  相似文献   

6.
SUMMARY: Sequence-structure alignments are a common means for protein structure prediction in the fields of fold recognition and homology modeling, and there is a broad variety of programs that provide such alignments based on sequence similarity, secondary structure or contact potentials. Nevertheless, finding the best sequence-structure alignment in a pool of alignments remains a difficult problem. QUASAR (quality of sequence-structure alignments ranking) provides a unifying framework for scoring sequence-structure alignments that aids finding well-performing combinations of well-known and custom-made scoring schemes. Those scoring functions can be benchmarked against widely accepted quality scores like MaxSub, TMScore, Touch and APDB, thus enabling users to test their own alignment scores against 'standard-of-truth' structure-based scores. Furthermore, individual score combinations can be optimized with respect to benchmark sets based on known structural relationships using QUASAR's in-built optimization routines.  相似文献   

7.
Several recent publications illustrated advantages of using sequence profiles in recognizing distant homologies between proteins. At the same time, the practical usefulness of distant homology recognition depends not only on the sensitivity of the algorithm, but also on the quality of the alignment between a prediction target and the template from the database of known proteins. Here, we study this question for several supersensitive protein algorithms that were previously compared in their recognition sensitivity (Rychlewski et al., 2000). A database of protein pairs with similar structures, but low sequence similarity is used to rate the alignments obtained with several different methods, which included sequence-sequence, sequence-profile, and profile-profile alignment methods. We show that incorporation of evolutionary information encoded in sequence profiles into alignment calculation methods significantly increases the alignment accuracy, bringing them closer to the alignments obtained from structure comparison. In general, alignment quality is correlated with recognition and alignment score significance. For every alignment method, alignments with statistically significant scores correlate with both correct structural templates and good quality alignments. At the same time, average alignment lengths differ in various methods, making the comparison between them difficult. For instance, the alignments obtained by FFAS, the profile-profile alignment algorithm developed in our group are always longer that the alignments obtained with the PSI-BLAST algorithms. To address this problem, we develop methods to truncate or extend alignments to cover a specified percentage of protein lengths. In most cases, the elongation of the alignment by profile-profile methods is reasonable, adding fragments of similar structure. The examples of erroneous alignment are examined and it is shown that they can be identified based on the model quality.  相似文献   

8.
A method for simultaneous alignment of multiple protein structures   总被引:1,自引:0,他引:1  
Shatsky M  Nussinov R  Wolfson HJ 《Proteins》2004,56(1):143-156
Here, we present MultiProt, a fully automated highly efficient technique to detect multiple structural alignments of protein structures. MultiProt finds the common geometrical cores between input molecules. To date, most methods for multiple alignment start from the pairwise alignment solutions. This may lead to a small overall alignment. In contrast, our method derives multiple alignments from simultaneous superpositions of input molecules. Further, our method does not require that all input molecules participate in the alignment. Actually, it efficiently detects high scoring partial multiple alignments for all possible number of molecules in the input. To demonstrate the power of MultiProt, we provide a number of case studies. First, we demonstrate known multiple alignments of protein structures to illustrate the performance of MultiProt. Next, we present various biological applications. These include: (1) a partial alignment of hinge-bent domains; (2) identification of functional groups of G-proteins; (3) analysis of binding sites; and (4) protein-protein interface alignment. Some applications preserve the sequence order of the residues in the alignment, whereas others are order-independent. It is their residue sequence order-independence that allows application of MultiProt to derive multiple alignments of binding sites and of protein-protein interfaces, making MultiProt an extremely useful structural tool.  相似文献   

9.
Wang B  Chen P  Huang DS  Li JJ  Lok TM  Lyu MR 《FEBS letters》2006,580(2):380-384
This paper proposes a novel method that can predict protein interaction sites in heterocomplexes using residue spatial sequence profile and evolution rate approaches. The former represents the information of multiple sequence alignments while the latter corresponds to a residue's evolutionary conservation score based on a phylogenetic tree. Three predictors using a support vector machines algorithm are constructed to predict whether a surface residue is a part of a protein-protein interface. The efficiency and the effectiveness of our proposed approach is verified by its better prediction performance compared with other models. The study is based on a non-redundant data set of heterodimers consisting of 69 protein chains.  相似文献   

10.
Zhang SW  Zhang YL  Pan Q  Cheng YM  Chou KC 《Amino acids》2008,35(2):495-501
Evolutionary conservation derived from a multiple sequence alignment is a powerful indicator of the functional significance of a residue, and it can help to predict active sites, ligand-binding sites, and protein interaction interfaces. The results of the existing algorithms in identifying the residue's conservation strongly depend on the sequence alignment, making the results highly variable. Here, by introducing the amino acid similarity matrix, we propose a novel gap-treating approach by combining the evolutionary information and von Neumann entropies to compute the residue conservation scores. It is indicated through a series of tested results that the new approach is quite encouraging and promising and may become a useful tool in complementing the existing methods.  相似文献   

11.
Position-specific substitution matrices, known as profiles,derived from multiple sequence alignments are currently usedto search sequence databases for distantly related members ofprotein families. The performance of the database searches isenhanced by using (i) a sequence weighting scheme which assignshigher weights to more distantly related sequences based onbranch lengths derived from phylogenetic trees, (ii) exclusionof positions with mainly padding characters at sites of insertionsor deletions and (iii) the BLOSUM62 residue comparison matrix.A natural consequence of these modifications is an improvementin the alignment of new sequences to the profiles. However,the accuracy of the alignments can be further increased by employinga similarity residue comparison matrix. These developments areimplemented in a program called PROFILEWEIGHT which runs onUnix and Vax computers. The only input required by the programis the multiple sequence alignment. The output from PROFILEWEIGHTis a profile designed to be used by existing searching and alignmentprograms. Test results from database searches with four differentfamilies of proteins show the improved sensitivity of the weightedprofiles.  相似文献   

12.
Multiple sequence alignment is a fundamental tool in a number of different domains in modern molecular biology, including functional and evolutionary studies of a protein family. Multiple alignments also play an essential role in the new integrated systems for genome annotation and analysis. Thus, the development of new multiple alignment scores and statistics is essential, in the spirit of the work dedicated to the evaluation of pairwise sequence alignments for database searching techniques. We present here norMD, a new objective scoring function for multiple sequence alignments. NorMD combines the advantages of the column-scoring techniques with the sensitivity of methods incorporating residue similarity scores. In addition, norMD incorporates ab initio sequence information, such as the number, length and similarity of the sequences to be aligned. The sensitivity and reliability of the norMD objective function is demonstrated using structural alignments in the SCOP and BAliBASE databases. The norMD scores are then applied to the multiple alignments of the complete sequences (MACS) detected by BlastP with E-value<10, for a set of 734 hypothetical proteins encoded by the Vibrio cholerae genome. Unrelated or badly aligned sequences were automatically removed from the MACS, leaving a high-quality multiple alignment which could be reliably exploited in a subsequent functional and/or structural annotation process. After removal of unreliable sequences, 176 (24 %) of the alignments contained at least one sequence with a functional annotation. 103 of these new matches were supported by significant hits to the Interpro domain and motif database.  相似文献   

13.
The biological role, biochemical function, and structure of uncharacterized protein sequences is often inferred from their similarity to known proteins. A constant goal is to increase the reliability, sensitivity, and accuracy of alignment techniques to enable the detection of increasingly distant relationships. Development, tuning, and testing of these methods benefit from appropriate benchmarks for the assessment of alignment accuracy.Here, we describe a benchmark protocol to estimate sequence-to-sequence and sequence-to-structure alignment accuracy. The protocol consists of structurally related pairs of proteins and procedures to evaluate alignment accuracy over the whole set. The set of protein pairs covers all the currently known fold types. The benchmark is challenging in the sense that it consists of proteins lacking clear sequence similarity.Correct target alignments are derived from the three-dimensional structures of these pairs by rigid body superposition. An evaluation engine computes the accuracy of alignments obtained from a particular algorithm in terms of alignment shifts with respect to the structure derived alignments. Using this benchmark we estimate that the best results can be obtained from a combination of amino acid residue substitution matrices and knowledge-based potentials.  相似文献   

14.
Amino acid residues, which play important roles in protein function, are often conserved. Here, we analyze thermodynamic and structural data of protein-DNA interactions to explore a relationship between free energy, sequence conservation and structural cooperativity. We observe that the most stabilizing residues or putative hotspots are those which occur as clusters of conserved residues. The higher packing density of the clusters and available experimental thermodynamic data of mutations suggest cooperativity between conserved residues in the clusters. Conserved singlets contribute to the stability of protein-DNA complexes to a lesser extent. We also analyze structural features of conserved residues and their clusters and examine their role in identifying DNA-binding sites. We show that about half of the observed conserved residue clusters are in the interface with the DNA, which could be identified from their amino acid composition; whereas the remaining clusters are at the protein-protein or protein-ligand interface, or embedded in the structural scaffolds. In protein-protein interfaces, conserved residues are highly correlated with experimental residue hotspots, contributing dominantly and often cooperatively to the stability of protein-protein complexes. Overall, the conservation patterns of the stabilizing residues in DNA-binding proteins also highlight the significance of clustering as compared to single residue conservation.  相似文献   

15.

Background  

While the pairwise alignments produced by sequence similarity searches are a powerful tool for identifying homologous proteins - proteins that share a common ancestor and a similar structure; pairwise sequence alignments often fail to represent accurately the structural alignments inferred from three-dimensional coordinates. Since sequence alignment algorithms produce optimal alignments, the best structural alignments must reflect suboptimal sequence alignment scores. Thus, we have examined a range of suboptimal sequence alignments and a range of scoring parameters to understand better which sequence alignments are likely to be more structurally accurate.  相似文献   

16.
The programs described herein function as part of a suite ofprograms designed for pairwise alignment, multiple alignment,generation of randomized sequences, production of alignmentscores and a sorting routine for analysis of the alignmentsproduced. The sequence alignment programs penalize gaps (absencesof residues) within regions of protein secondary structure andhave the added option of ‘fingerprinting’ structurallyor functionally important protein residues. The multiple alignmentprogram is based upon the sequence alignment method of Needlemanand Wunsch and the multiple alignment extension of Barton andSternberg. Our application includes the feature of optionallyweighting active site, monomer-monomer, ligand contact or otherimportant template residues to bias the alignment toward matchingthese residues. A sum-score for the alignments is introduced,which is independent of gap penalties. This score more adequatelyreflects the character of the alignments for a given scoringmatrix than the gap-penalty-dependent total score describedpreviously in the literature. In addition, individual aminoacid similarity scores at each residue position in the alignmentsare printed with the alignment output to enable immediate quantitativeassessment of homology at key sections of the aligned chains.  相似文献   

17.
18.
Contact-based sequence alignment   总被引:2,自引:1,他引:1  
This paper introduces the novel method of contact-based protein sequence alignment, where structural information in the form of contact mutation probabilities is incorporated into an alignment routine using contact-mutation matrices (CAO: Contact Accepted mutatiOn). The contact-based alignment routine optimizes the score of matched contacts, which involves four (two per contact) instead of two residues per match in pairwise alignments. The first contact refers to a real side-chain contact in a template sequence with known structure, and the second contact is the equivalent putative contact of a homologous query sequence with unknown structure. An algorithm has been devised to perform a pairwise sequence alignment based on contact information. The contact scores were combined with PAM-type (Point Accepted Mutation) substitution scores after parameterization of gap penalties and score weights by means of a genetic algorithm. We show that owing to the structural information contained in the CAO matrices, significantly improved alignments of distantly related sequences can be obtained. This has allowed us to annotate eight putative Drosophila IGF sequences. Contact-based sequence alignment should therefore prove useful in comparative modelling and fold recognition.  相似文献   

19.
Although multiple sequence alignments (MSAs) are essential for a wide range of applications from structure modeling to prediction of functional sites, construction of accurate MSAs for distantly related proteins remains a largely unsolved problem. The rapidly increasing database of spatial structures is a valuable source to improve alignment quality. We explore the use of 3D structural information to guide sequence alignments constructed by our MSA program PROMALS. The resulting tool, PROMALS3D, automatically identifies homologs with known 3D structures for the input sequences, derives structural constraints through structure-based alignments and combines them with sequence constraints to construct consistency-based multiple sequence alignments. The output is a consensus alignment that brings together sequence and structural information about input proteins and their homologs. PROMALS3D can also align sequences of multiple input structures, with the output representing a multiple structure-based alignment refined in combination with sequence constraints. The advantage of PROMALS3D is that it gives researchers an easy way to produce high-quality alignments consistent with both sequences and structures of proteins. PROMALS3D outperforms a number of existing methods for constructing multiple sequence or structural alignments using both reference-dependent and reference-independent evaluation methods.  相似文献   

20.
Sequence comparison methods based on position-specific score matrices (PSSMs) have proven a useful tool for recognition of the divergent members of a protein family and for annotation of functional sites. Here we investigate one of the factors that affects overall performance of PSSMs in a PSI-BLAST search, the algorithm used to construct the seed alignment upon which the PSSM is based. We compare PSSMs based on alignments constructed by global sequence similarity (ClustalW and ClustalW-pairwise), local sequence similarity (BLAST), and local structure similarity (VAST). To assess performance with respect to identification of conserved functional or structural sites, we examine the accuracy of the three-dimensional molecular models predicted by PSSM-sequence alignments. Using the known structures of those sequences as the standard of truth, we find that model accuracy varies with the algorithm used for seed alignment construction in the pattern local-structure (VAST) > local-sequence (BLAST) > global-sequence (ClustalW). Using structural similarity of query and database proteins as the standard of truth, we find that PSSM recognition sensitivity depends primarily on the diversity of the sequences included in the alignment, with an optimum around 30-50% average pairwise identity. We discuss these observations, and suggest a strategy for constructing seed alignments that optimize PSSM-sequence alignment accuracy and recognition sensitivity.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号