Similar literature
 Found 20 similar articles (search time: 31 ms)
1.
In this study, we investigate the extent to which techniques for homology modeling that were developed for water-soluble proteins are appropriate for membrane proteins as well. To this end we present an assessment of current strategies for homology modeling of membrane proteins and introduce a benchmark data set of homologous membrane protein structures, called HOMEP. First, we use HOMEP to reveal the relationship between sequence identity and structural similarity in membrane proteins. This analysis indicates that homology modeling is at least as applicable to membrane proteins as it is to water-soluble proteins and that acceptable models (with Cα-RMSD values to the native of 2 Å or less in the transmembrane regions) may be obtained for template sequence identities of 30% or higher if an accurate alignment of the sequences is used. Second, we show that secondary-structure prediction algorithms that were developed for water-soluble proteins perform approximately as well for membrane proteins. Third, we provide a comparison of a set of commonly used sequence alignment algorithms as applied to membrane proteins. We find that high-accuracy alignments of membrane protein sequences can be obtained using state-of-the-art profile-to-profile methods that were developed for water-soluble proteins. Improvements are observed when weights derived from the secondary structure of the query and the template are used in the scoring of the alignment, a result which relies on the accuracy of the secondary-structure prediction of the query sequence. The most accurate alignments were obtained using template profiles constructed with the aid of structural alignments. In contrast, a simple sequence-to-sequence alignment algorithm, using a membrane protein-specific substitution matrix, shows no improvement in alignment accuracy. We suggest that profile-to-profile alignment methods should be adopted to maximize the accuracy of homology models of membrane proteins.
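
The 30% threshold above is stated in terms of template sequence identity. As a rough illustration, the sketch below computes percent identity from a pairwise alignment. It assumes one common convention (identical columns divided by columns where neither sequence has a gap); the abstract does not say which denominator the authors used, and the function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumption: gapped columns are excluded from the denominator. Other
    conventions (e.g. dividing by the shorter sequence length) also exist.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


if __name__ == "__main__":
    # Toy alignment: 4 identical residues out of 5 ungapped columns.
    print(percent_identity("ACDE-F", "ACDQGF"))
```

Because the result depends on the alignment itself, a poor alignment can understate the identity between two homologs, which is one reason the abstract stresses alignment accuracy.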

2.
Src homology 2 (SH2) regions are short (approximately 100 amino acids), non-catalytic domains conserved among a wide variety of proteins involved in cytoplasmic signaling induced by growth factors. It is thought that SH2 domains play an important role in the intracellular response to growth factor stimulation by binding to phosphotyrosine-containing proteins. In this paper we apply the techniques of multiple sequence alignment, secondary structure prediction and conservation analysis to 67 SH2 domain amino acid sequences. This combined approach predicts seven core secondary structure regions with the pattern beta-alpha-beta-beta-beta-beta-alpha, identifies those residues most likely to be buried in the hydrophobic core of the native SH2 domain, and highlights patterns of conservation indicative of secondary structural elements. Residues likely to be involved in phosphotyrosine binding are shown and orientations of the predicted secondary structures suggested which could enable such residues to cooperate in phosphate binding. We propose a consensus pattern that encapsulates the principal conserved features of the SH2 domains. Comparison of the proposed SH2 domain of akt to this pattern shows only 12/40 matches, suggesting that this domain may not exhibit SH2-like properties.

3.
In eukaryotes, the Src homology domain 3 (SH3) is a very important motif in signal transduction. SH3 domains recognize poly-proline-rich peptides and are involved in protein-protein interactions. Until now, the existence of SH3 domains has not been demonstrated in prokaryotes. However, the structure of the C-terminal domain of DtxR clearly shows that the fold of this domain is very similar to that of the SH3 domain. In addition, there is evidence that the C-terminal domain of DtxR binds to poly-proline-rich regions. Other bacterial proteins have domains that are structurally similar to the SH3 domain but whose functions are unknown or differ from that of the SH3 domain. The observed similarities between the structures of the C-terminal domain of DtxR and the SH3 domain constitute a perfect system to gain insight into their function and information about their evolution. Our results show that the C-terminal domain of DtxR shares a number of conserved key hydrophobic positions not recognizable from sequence comparison that might be responsible for the integrity of the SH3-like fold. Structural alignment of an ensemble of such domains from unrelated proteins shows a common structural core that seems to be conserved despite the lack of sequence similarity. This core constitutes the minimal requirements of protein architecture for the SH3-like fold.

4.
C Sander  R Schneider 《Proteins》1991,9(1):56-68
The database of known protein three-dimensional structures can be significantly increased by the use of sequence homology, based on the following observations. (1) The database of known sequences, currently at more than 12,000 proteins, is two orders of magnitude larger than the database of known structures. (2) The currently most powerful method of predicting protein structures is model building by homology. (3) Structural homology can be inferred from the level of sequence similarity. (4) The threshold of sequence similarity sufficient for structural homology depends strongly on the length of the alignment. Here, we first quantify the relation between sequence similarity, structure similarity, and alignment length by an exhaustive survey of alignments between proteins of known structure and report a homology threshold curve as a function of alignment length. We then produce a database of homology-derived secondary structure of proteins (HSSP) by aligning to each protein of known structure all sequences deemed homologous on the basis of the threshold curve. For each known protein structure, the derived database contains the aligned sequences, secondary structure, sequence variability, and sequence profile. Tertiary structures of the aligned sequences are implied, but not modeled explicitly. The database effectively increases the number of known protein structures by a factor of five to more than 1800. The results may be useful in assessing the structural significance of matches in sequence database searches, in deriving preferences and patterns for structure prediction, in elucidating the structural role of conserved residues, and in modeling three-dimensional detail by homology.

5.
MELDB: a database for microbial esterases and lipases   Total citations: 1 (self-citations: 0, citations by others: 1)
Kang HY  Kim JF  Kim MH  Park SH  Oh TK  Hur CG 《FEBS letters》2006,580(11):2736-2740
MELDB is a comprehensive protein database of microbial esterases and lipases, which are hydrolytic enzymes important in modern industry. Proteins in MELDB are clustered into groups according to their sequence similarities based on a local pairwise alignment algorithm and a graph clustering algorithm (TribeMCL). This differs from traditional approaches that use global pairwise alignment and joining methods. Our procedure was able to reduce the noise caused by dubious alignment in the distantly related or unrelated regions in the sequences. In the database, 883 esterase and lipase sequences derived from microbial sources are deposited and conserved parts of each protein are identified. HMM profiles of each cluster were generated to classify unknown sequences. Contents of the database can be keyword-searched and query sequences can be aligned to sequence profiles and sequences themselves.

6.
A general protein sequence alignment methodology for detecting a priori unknown common structural and functional regions is described. The method proposed in this paper is based on two basic requirements for a meaningful alignment. First, each sequence or segment of a sequence is characterized by a multivariate physicochemical profile. Second, the alignment is performed by considering all the sequences simultaneously, and the algorithm detects those regions that form a set of similar profiles. In order to test the structural meaning of the alignment obtained from the sequences, quantitative comparisons are performed with structurally conserved regions (SCR) determined from the X-ray structures of three serine proteases. Results suggest that the limits of the SCR may be predicted from the similarities between the physicochemical profiles of the sequences. The procedures are not completely automated. The final step requires a visual screening of alternative pathways in order to determine an optimal alignment.

7.
Bernsel A  Viklund H  Elofsson A 《Proteins》2008,71(3):1387-1399
Compared with globular proteins, transmembrane proteins are surrounded by a more intricate environment and, consequently, amino acid composition varies between the different compartments. Existing algorithms for homology detection are generally developed with globular proteins in mind and may not be optimal to detect distant homology between transmembrane proteins. Here, we introduce a new profile-profile based alignment method for remote homology detection of transmembrane proteins in a hidden Markov model framework that takes advantage of the sequence constraints placed by the hydrophobic interior of the membrane. We expect that, for distant membrane protein homologs, even if the sequences have diverged too far to be recognized, the hydrophobicity pattern and the transmembrane topology are better conserved. By using this information in parallel with sequence information, we show that both sensitivity and specificity can be substantially improved for remote homology detection in two independent test sets. In addition, we show that alignment quality can be improved for the most distant homologs in a public dataset of membrane protein structures. Applying the method to the Pfam domain database, we are able to suggest new putative evolutionary relationships for a few relatively uncharacterized protein domain families, of which several are confirmed by other methods. The method is called Searcher for Homology Relationships of Integral Membrane Proteins (SHRIMP) and is available for download at http://www.sbc.su.se/shrimp/.

8.
We have analyzed sequence covariation in an alignment of 266 non-redundant SH3 domain sequences using chi-squared statistical methods. Artifactual covariations arising from close evolutionary relationships among certain sequence subgroups were eliminated using empirically derived sequence diversity thresholds. This covariation detection method was able to predict residue-residue contacts (side-chain centres of mass within 8 Å) in the structure of the SH3 domain with an accuracy of 85%, which is greater than that achieved in many previous covariation studies. In examining the positions involved most frequently in covariations, we discovered a dramatic over-representation of a subset of five hydrophobic core positions. This covariation information was used to design second and third site substitutions that could compensate for highly destabilizing hydrophobic core substitutions in the Fyn SH3 domain, thus providing experimental data to validate the covariation analysis. The testing of our covariation detection method on 15 other alignments showed that the accuracy of contact prediction is highly variable depending on which sequence alignment is used, and useful levels of prediction accuracy were obtained with only approximately one-third of alignments. The results presented here provide insight into the difficulties inherent in covariation analysis, and suggest that it may have limited usefulness in tertiary structure prediction. On the other hand, our ability to use covariation analysis to design stabilizing combinations of hydrophobic core substitutions attests to its potential utility for gaining deeper insight into the stability determinants and functional mechanisms of proteins with known three-dimensional structures.
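
To make the chi-squared idea concrete, the sketch below scores covariation between two alignment columns with a plain Pearson chi-squared statistic on the residue-pair contingency table. This is only an assumed, simplified form: the abstract does not give the authors' exact statistic, and the sequence diversity thresholds they used to remove phylogenetic artifacts are not modeled here. The name `chi_squared_covariation` is hypothetical.

```python
from collections import Counter


def chi_squared_covariation(col_i: str, col_j: str) -> float:
    """Pearson chi-squared statistic for two alignment columns.

    col_i[k] and col_j[k] are the residues that sequence k carries at the
    two positions. A value of 0 means the observed residue pairs match the
    counts expected if the positions varied independently.
    """
    if len(col_i) != len(col_j):
        raise ValueError("columns must come from the same set of sequences")
    n = len(col_i)
    if n == 0:
        return 0.0
    count_i = Counter(col_i)
    count_j = Counter(col_j)
    pair_counts = Counter(zip(col_i, col_j))
    chi2 = 0.0
    for a, n_a in count_i.items():
        for b, n_b in count_j.items():
            expected = n_a * n_b / n       # expected count under independence
            observed = pair_counts[(a, b)]  # Counter returns 0 if absent
            chi2 += (observed - expected) ** 2 / expected
    return chi2


if __name__ == "__main__":
    print(chi_squared_covariation("AAVV", "LLFF"))  # perfectly coupled columns
    print(chi_squared_covariation("AAVV", "LFLF"))  # independent columns
```

In this toy case the coupled columns give a positive score and the independent ones give zero; with real alignments, closely related sequences inflate such scores, which is the artifact the abstract says had to be filtered out.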

9.
The SH3 domain, comprised of approximately 60 residues, is found within a wide variety of proteins, and is a mediator of protein-protein interactions. Due to the large number of SH3 domain sequences and structures in the databases, this domain provides one of the best available systems for the examination of sequence and structural conservation within a protein family. In this study, a large and diverse alignment of SH3 domain sequences was constructed, and the pattern of conservation within this alignment was compared to conserved structural features, as deduced from analysis of eighteen different SH3 domain structures. Seventeen SH3 domain structures solved in the presence of bound peptide were also examined to identify positions that are consistently most important in mediating the peptide-binding function of this domain. Although residues at the two most conserved positions in the alignment are directly involved in peptide binding, residues at most other conserved positions play structural roles, such as stabilizing turns or comprising the hydrophobic core. Surprisingly, several highly conserved side-chain to main-chain hydrogen bonds were observed in the functionally crucial RT-Src loop between residues with little direct involvement in peptide binding. These hydrogen bonds may be important for maintaining this region in the precise conformation necessary for specific peptide recognition. In addition, a previously unrecognized yet highly conserved beta-bulge was identified in the second beta-strand of the domain, which appears to provide a necessary kink in this strand, allowing it to hydrogen bond to both sheets comprising the fold.

10.
Wu S  Zhang Y 《Proteins》2008,72(2):547-556
We develop a new threading algorithm MUSTER by extending the previous sequence profile-profile alignment method, PPA. It combines various sequence and structure information into single-body terms which can be conveniently used in dynamic programming search: (1) sequence profiles; (2) secondary structures; (3) structure fragment profiles; (4) solvent accessibility; (5) dihedral torsion angles; (6) hydrophobic scoring matrix. The balance of the weighting parameters is optimized by a grading search based on the average TM-score of 111 training proteins which shows a better performance than using the conventional optimization methods based on the PROSUP database. The algorithm is tested on 500 nonhomologous proteins independent of the training sets. After removing the homologous templates with a sequence identity to the target >30%, in 224 cases, the first template alignment has the correct topology with a TM-score >0.5. Even with a more stringent cutoff by removing the templates with a sequence identity >20% or detectable by PSI-BLAST with an E-value <0.05, MUSTER is able to identify correct folds in 137 cases with the first model of TM-score >0.5. Dependent on the homology cutoffs, the average TM-score of the first threading alignments by MUSTER is 5.1-6.3% higher than that by PPA. This improvement is statistically significant by the Wilcoxon signed rank test with a P-value < 1.0 × 10⁻¹³, which demonstrates the effect of additional structural information on the protein fold recognition. The MUSTER server is freely available to the academic community at http://zhang.bioinformatics.ku.edu/MUSTER.
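
The TM-score cutoff of 0.5 quoted above can be illustrated with a minimal sketch of the standard TM-score formula for a single, fixed superposition. It assumes the distances between aligned residue pairs have already been computed; the full TM-score maximizes over superpositions, which is not attempted here. The function names and the 0.5 Å floor applied to d0 for short targets are assumptions of this sketch, not details taken from the abstract.

```python
def d0_scale(target_length: int) -> float:
    """Length-dependent distance scale d0 = 1.24 * (L - 15)^(1/3) - 1.8 (in Å).

    Assumption: for short targets, where the formula is undefined or very
    small, d0 is floored at 0.5 Å.
    """
    if target_length <= 15:
        return 0.5
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    return max(d0, 0.5)


def tm_score(distances, target_length: int) -> float:
    """TM-score contribution for one superposition.

    distances: distances in Å between aligned residue pairs.
    target_length: length of the target protein used for normalization.
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    d0 = d0_scale(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


if __name__ == "__main__":
    # A perfect match over half of a 100-residue target.
    print(tm_score([0.0] * 50, 100))
```

Normalizing by the target length rather than the number of aligned pairs is what lets the score penalize alignments that cover only part of the target.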

11.
The information required to generate a protein structure is contained in its amino acid sequence, but how three-dimensional information is mapped onto a linear sequence is still incompletely understood. Multiple structure alignments of similar protein structures have been used to investigate conserved sequence features but contradictory results have been obtained, due, in large part, to the absence of objective criteria to be used in the construction of sequence profiles and in the quantitative comparison of alignment results. Here, we report a new procedure for multiple structure alignment and use it to construct structure-based sequence profiles for similar proteins. The definition of "similar" is based on the structural alignment procedure and on the protein structural distance (PSD) described in paper I of this series, which offers an objective measure for protein structure relationships. Our approach is tested in two well-studied groups of proteins: serine proteases and Ig-like proteins. It is demonstrated that the quality of a sequence profile generated by a multiple structure alignment is quite sensitive to the PSD used as a threshold for the inclusion of proteins in the alignment. Specifically, if the proteins included in the aligned set are too distant in structure from one another, there will be a dilution of information and patterns that are relevant to a subset of the proteins are likely to be lost. In order to understand better how the same three-dimensional information can be encoded in seemingly unrelated sequences, structure-based sequence profiles are constructed for subsets of proteins belonging to nine superfolds. We identify patterns of relatively conserved residues in each subset of proteins. It is demonstrated that the most conserved residues are generally located in the regions where tertiary interactions occur and that are relatively conserved in structure. Nevertheless, the conservation patterns are relatively weak in all cases studied, indicating that structure-determining factors that do not require a particular sequential arrangement of amino acids, such as secondary structure propensities and hydrophobic interactions, are important in encoding protein fold information. In general, we find that similar structures can fold without having a set of highly conserved residue clusters or a well-conserved sequence profile; indeed, in some cases there is no apparent conservation pattern common to structures with the same fold. Thus, when a group of proteins exhibits a common and well-defined sequence pattern, it is more likely that these sequences have a close evolutionary relationship rather than the similarities having arisen from the structural requirements of a given fold.

12.
Sequence motifs specific for cytosine methyltransferases   Total citations: 2 (self-citations: 0, citations by others: 2)
J Pósfai  A S Bhagwat  R J Roberts 《Gene》1988,74(1):261-265
Using a new alignment method, the sequences of 13 m5C methyltransferases (MTases) have been examined. Five extremely well-conserved blocks of sequence have been detected and have been used as fixed points for the alignment of the 13 sequences. Following this initial alignment, five further blocks of similarity have been identified to give a total of ten recognizable blocks of sequence homology that are all arranged in a common order. The structures of these MTases consist of a variable-length N-terminal arm followed by eight well-conserved blocks each separated by small variable-length regions. A large variable-length segment of 90 to 270 amino acids (aa) then follows. After this are two blocks, and a variable-length C-terminal segment completes the sequence. Within the final alignment, 20 aa in the protein sequences, and 86 nucleotides in the nucleotide sequences are invariant. The strongest conservation is found in proximity to a suspected functional site that contains the dipeptide proline-cysteine. Consensus patterns can be defined for the five best conserved blocks and, when used as search motifs, are able to clearly distinguish between the m5C MTases and all other identified proteins in the PIR database. This suggests they may be of use in identifying putative MTases among protein sequences of unknown function.

13.
MOTIVATION: Position specific scoring matrices (PSSMs) corresponding to aligned sequences of homologous proteins are commonly used in homology detection. A PSSM is generated on the basis of one of the homologues as a reference sequence, which is the query in the case of PSI-BLAST searches. The reference sequence is chosen arbitrarily while generating PSSMs for reverse BLAST searches. In this work we demonstrate that the use of multiple PSSMs corresponding to a given alignment and variable reference sequences is more effective than using traditional single PSSMs and hidden Markov models. RESULTS: Searches for proteins with known 3-D structures have been made against three databases of protein family profiles corresponding to known structures: (1) One PSSM per family; (2) multiple PSSMs corresponding to an alignment and variable reference sequences for every family; and (3) hidden Markov models. A comparison of the performances of these three approaches suggests that the use of multiple PSSMs is most effective. CONTACT: ns@mbu.iisc.ernet.in.
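
The role of the reference sequence can be sketched in a few lines: a PSSM keeps only the alignment columns in which the reference has a residue, so each choice of reference yields a different matrix from the same alignment. The sketch below is a simplified illustration, not the procedure of PSI-BLAST or of the authors. It assumes a uniform background frequency, a flat pseudocount, and no sequence weighting; `build_pssm` and `score_sequence` are hypothetical names.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, reference_index=0, pseudocount=1.0):
    """Log-odds PSSM over the columns where the reference has a residue.

    Assumptions: uniform background (1/20), flat pseudocounts, no sequence
    weighting. Gaps and unknown symbols are ignored when counting.
    """
    if not alignment:
        raise ValueError("alignment must contain at least one sequence")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have equal length")
    reference = alignment[reference_index]
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        if reference[col] not in AMINO_ACIDS:
            continue  # column is a gap in the reference: drop it
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for seq in alignment:
            if seq[col] in counts:
                counts[seq[col]] += 1.0
        total = sum(counts.values())
        pssm.append({aa: math.log2((counts[aa] / total) / background)
                     for aa in AMINO_ACIDS})
    return pssm


def score_sequence(pssm, sequence):
    """Sum of per-position log-odds scores for an ungapped match."""
    return sum(column.get(res, 0.0) for column, res in zip(pssm, sequence))


if __name__ == "__main__":
    family = ["AC-D", "ACGD", "A-GD"]
    # One PSSM per reference sequence, as in the multiple-PSSM approach.
    pssms = [build_pssm(family, ref) for ref in range(len(family))]
    print([len(p) for p in pssms])
```

Searching a query against every matrix in `pssms` and keeping the best hit is one plausible reading of the "multiple PSSMs" strategy described above.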

14.
We describe a method to identify protein domain boundaries from sequence information alone based on the assumption that hydrophobic residues cluster together in space. SnapDRAGON is a suite of programs developed to predict domain boundaries based on the consistency observed in a set of alternative ab initio three-dimensional (3D) models generated for a given protein multiple sequence alignment. This is achieved by running a distance geometry-based folding technique in conjunction with a 3D-domain assignment algorithm. The overall accuracy of our method in predicting the number of domains for a non-redundant data set of 414 multiple alignments, representing 185 single and 231 multiple-domain proteins, is 72.4%. Using domain linker regions observed in the tertiary structures associated with each query alignment as the standard of truth, inter-domain boundary positions are delineated with an accuracy of 63.9% for proteins comprising continuous domains only, and 35.4% for proteins with discontinuous domains. Overall, domain boundaries are delineated with an accuracy of 51.8%. The prediction accuracy values are independent of the pair-wise sequence similarities within each of the alignments. These results demonstrate the capability of our method to delineate domains in protein sequences associated with a wide variety of structural domain organisation.

15.
Silva PJ 《Proteins》2008,70(4):1588-1594
Hydrophobic cluster analysis (HCA) has long been used as a tool to detect distant homologies between protein sequences, and to classify them into different folds. However, it relies on expert human intervention, and is sensitive to subjective interpretations of pattern similarities. In this study, we describe a novel algorithm to assess the similarity of hydrophobic amino acid distributions between two sequences. Our algorithm correctly identifies as misattributions several HCA-based proposals of structural similarity between unrelated proteins present in the literature. We have also used this method to identify the proper fold of a large variety of sequences, and to automatically select the most appropriate structure for homology modeling of several proteins with low sequence identity to any other member of the protein data bank. Automatic modeling of the target proteins based on these templates yielded structures with TM-scores (vs. experimental structures) above 0.60, even without further refinement. Besides enabling a reliable identification of the correct fold of an unknown sequence and the choice of suitable templates, our algorithm also shows that whereas most structural classes of proteins are very homogeneous in hydrophobic cluster composition, a tenth of the described families are compatible with a large variety of hydrophobic patterns. We have built a browsable database of every major representative hydrophobic cluster pattern present in each structural class of proteins, freely available at http://www2.ufp.pt/ pedros/HCA_db/index.htm.

16.
T-Coffee (Tree-based consistency objective function for alignment evaluation) is a versatile multiple sequence alignment (MSA) method suitable for aligning most types of biological sequences. The main strength of T-Coffee is its ability to combine third party aligners and to integrate structural (or homology) information when building MSAs. The series of protocols presented here show how the package can be used to multiply align proteins, RNA and DNA sequences. The protein section shows how users can select the most suitable T-Coffee mode for their data set. Detailed protocols include T-Coffee, the default mode, M-Coffee, a meta version able to combine several third party aligners into one, PSI (position-specific iterated)-Coffee, the homology extended mode suitable for remote homologs and Expresso, the structure-based multiple aligner. We then also show how the T-RMSD (tree based on root mean square deviation) option can be used to produce a functionally informative structure-based clustering. RNA alignment procedures are described for using R-Coffee, a mode able to use predicted RNA secondary structures when aligning RNA sequences. DNA alignments are illustrated with Pro-Coffee, a multiple aligner specific to promoter regions. We also present some of the many reformatting utilities bundled with T-Coffee. The package is an open-source freeware available from http://www.tcoffee.org/.

17.
A method for multiple sequence alignment with gaps   Total citations: 13 (self-citations: 0, citations by others: 13)
A method that performs multiple sequence alignment by cyclical use of the standard pairwise Needleman-Wunsch algorithm is presented. The required central processor unit time is of the same order of magnitude as the standard Needleman-Wunsch pairwise implementation. Comparison with the one known case where the optimal multiple sequence alignment has been rigorously determined shows that in practice the proposed method finds the mathematically optimal solution. The more interesting question of the biological usefulness of such multiple sequence alignment over pairwise approaches is assessed using protein families whose X-ray structures are known. The two such cases studied, the subdomains of the ricin B-chain and the S-domains of virus coat proteins, have low pairwise similarity and thus fail to align correctly under standard pairwise sequence comparison. In both cases the multiple sequence alignment produced by the proposed technique, apart from minor deviations at loop regions, correctly predicts the true structural alignment. Thus, given many sequences of low pairwise similarity, the proposed multiple sequence method can extract any familial similarity and so produce a sequence alignment consistent with the underlying structural homology.
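
For orientation, the pairwise building block named above can be sketched as follows. This is the textbook Needleman-Wunsch global alignment with a linear gap penalty, not the cyclical multiple-alignment procedure of the paper, which the abstract does not describe in enough detail to reproduce. The scoring values (match +1, mismatch -1, gap -1) are illustrative assumptions.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Global pairwise alignment by dynamic programming (linear gap penalty).

    Returns (score, aligned_a, aligned_b). Scoring values are assumptions
    chosen for illustration; real use would take a substitution matrix.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap  # leading gaps in b
    for j in range(1, cols):
        score[0][j] = j * gap  # leading gaps in a

    def sub(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the matrix: best of diagonal (align), up (gap in b), left (gap in a).
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))
```

The matrix fill costs time proportional to the product of the two sequence lengths, which is the baseline the abstract compares its multiple-alignment running time against.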

18.
Ubiquitin-like domains are present, apart from ubiquitin-like proteins themselves, in many multidomain proteins involved in different signal transduction processes. The sequence conservation for all ubiquitin superfold family members is rather poor, even between subfamily members, leading to mistakes in sequence alignments using conventional sequence alignment methods. However, a correct alignment is essential, especially for in silico methods that predict binding partners on the basis of sequence and structure. In this study, using 3D-structural information we have generated and manually corrected sequence alignments for proteins of the five ubiquitin superfold subfamilies. On the basis of this alignment, we suggest domains for which structural information will be useful to allow homology modelling. In addition, we have analysed the energetic and electrostatic properties of ubiquitin-like domains in complex with various functional binding proteins using the protein design algorithm FoldX. On the basis of an in silico alanine-scanning mutagenesis, we provide a detailed binding epitope mapping of the hotspots of the ubiquitin domain fold, involved in the interaction with different domains and proteins. Finally, we provide a consensus fingerprint sequence that identifies all sequences described to belong to the ubiquitin superfold family. It is possible that the method that we describe may be applied to other domain families sharing a similar fold but having low levels of sequence homology.

19.
The Protein Identification Resource (PIR) protein sequence data bank was searched for sequence similarity between known proteins and human DNA polymerase beta (Pol beta) or human terminal deoxynucleotidyltransferase (TdT). Pol beta and TdT were found to exhibit amino acid sequence similarity only with each other and not with any other of the 4750 entries in release 12.0 of the PIR data bank. Optimal amino acid sequence alignment of the entire 39-kDa Pol beta polypeptide with the C-terminal two thirds of TdT revealed 24% identical aa residues and 21% conservative aa substitutions. The Monte Carlo score of 12.6 for the entire aligned sequences indicates highly significant aa sequence homology. The hydropathicity profiles of the aligned aa sequences were remarkably similar throughout, suggesting structural similarity of the polypeptides. The most significant regions of homology are aa residues 39-224 and 311-333 of Pol beta vs. aa residues 191-374 and 484-506 of TdT. In addition, weaker homology was seen between a large portion of the 'nonessential' N-terminal end of TdT (aa residues 33-130) and the first region of strong homology between the two proteins (aa residues 31-128 of Pol beta and aa residues 183-280 of TdT), suggestive of genetic duplication within the ancestral gene. On the basis of nucleotide differences between conserved regions of Pol beta and TdT genes (aligned according to optimally aligned aa sequences) it was estimated that Pol beta and TdT diverged on the order of 250 million years ago, corresponding roughly to a time before radiation of mammals and birds.
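
A hydropathicity profile of the kind compared above is usually a sliding-window average of per-residue hydropathy values. The sketch below assumes the Kyte-Doolittle scale and a 9-residue window; the abstract names neither the scale nor the window size, so both are assumptions, as is the function name `hydropathy_profile`.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the abstract does not name one).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 9):
    """Sliding-window mean hydropathy along a protein sequence.

    Returns one value per window position (len(sequence) - window + 1 values),
    or an empty list if the window does not fit. Raises KeyError for
    residues outside the 20 standard amino acids.
    """
    if window < 1 or window > len(sequence):
        return []
    values = [KYTE_DOOLITTLE[res] for res in sequence]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


if __name__ == "__main__":
    # Hypothetical toy sequence: hydrophobic stretch followed by charged residues.
    print(hydropathy_profile("LLLLVVVVKKKKDDDD", window=4))
```

Two aligned proteins can then be compared position by position along their profiles, which is the sense in which the abstract calls the Pol beta and TdT profiles "remarkably similar".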

20.