Similar literature
20 similar articles found (search time: 46 ms)
1.
Identifying the amino acid positions that determine the specific interaction of proteins with small-molecule ligands is required for the search for pharmaceutical targets, drug design, and the solution of other biotechnology problems. We studied the applicability of an original method, SPrOS (specificity projection on sequence), developed to recognize functionally significant positions in amino acid sequences. The method allows residues specific to functional subgroups to be determined within a protein family based on their local surroundings in amino acid sequences. The efficiency of the method has been estimated on the protein kinase family. The residues associated with the protein specificity to inhibitors have been predicted. The results have been verified using 3D structures of protein-ligand complexes. Three small-molecule inhibitors have been tested. Residues predicted with SPrOS either contacted the inhibitor or influenced the conformation of the ligand-binding area. Excluding close homologues from the studied set makes it possible to decrease the number of difficult-to-interpret positions. The expediency of this procedure was determined by the relationship between an inhibitory spectrum and phylogenetic partition. Thus, the method efficiency has been confirmed by matching the prediction results with the protein 3D structures.

2.
We identified key residues from the structural alignment of families of protein domains from SCOP which we represented in the form of sparse protein signatures. A signature-generating algorithm (SigGen) was developed and used to automatically identify key residues based on several structural and sequence-based criteria. The capacity of the signatures to detect related sequences from the SWISSPROT database was assessed by receiver operating characteristic (ROC) analysis and jack-knife testing. Test signatures for families from each of the main SCOP classes are described in relation to the quality of the structural alignments, the SigGen parameters used, and their diagnostic performance. We show that automatically generated signatures are potently diagnostic for their family (ROC50 scores typically >0.8), consistently outperform random signatures, and can identify sequence relationships in the "twilight zone" of protein sequence similarity (<40%). Signatures based on 15%-30% of alignment positions occurred most frequently among the best-performing signatures. When alignment quality is poor, sparser signatures perform better, whereas signatures generated from higher-quality alignments of fewer structures require more positions to be diagnostic. Our validation of signatures from the Globin family shows that when sequences from the structural alignment are removed and new signatures generated, the omitted sequences are still detected. The positions highlighted by the signature often correspond (alignment specificity >0.7) to the key positions in the original (non-jack-knifed) alignment. We discuss potential applications of sparse signatures in sequence annotation and homology modeling.
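The ROC50 score quoted in this abstract is commonly computed as a truncated ROC area: the ranked hit list is scanned until the 50th false positive, and the number of true positives ranked above each false positive is accumulated and normalized. The sketch below follows that common definition and is not taken from the paper. The function name `roc_n` and the handling of lists with fewer than `n` false positives are assumptions made for illustration.

```python
def roc_n(ranked_labels, n=50):
    """Truncated ROC score for a ranked hit list.

    ranked_labels: booleans ordered from best score to worst,
                   True for a family member, False otherwise.
    n:             number of false positives at which to stop.
    """
    total_pos = sum(ranked_labels)
    if total_pos == 0:
        return 0.0

    true_seen = 0
    false_seen = 0
    area = 0
    for is_member in ranked_labels:
        if is_member:
            true_seen += 1
        else:
            false_seen += 1
            # Count the true positives ranked above this false positive.
            area += true_seen
            if false_seen == n:
                break

    # Assumption: if the list ends before n false positives are seen,
    # the missing ones are treated as ranked after every listed hit.
    area += (n - false_seen) * true_seen
    return area / (n * total_pos)
```

A score of 1.0 means every family member is ranked above the first false positive, and 0.0 means none is ranked above the first `n` false positives.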

3.
Automatic methods for predicting functionally important residues (cited 9 times in total: 0 self-citations, 9 citations by others)
Sequence analysis is often the first guide for the prediction of residues in a protein family that may have functional significance. A few methods have been proposed which use the division of protein families into subfamilies in the search for those positions that could have some functional significance for the whole family, but at the same time which exhibit the specificity of each subfamily ("Tree-determinant residues"). However, there are still many unsolved questions like the best division of a protein family into subfamilies, or the accurate detection of sequence variation patterns characteristic of different subfamilies. Here we present a systematic study in a significant number of protein families, testing the statistical meaning of the Tree-determinant residues predicted by three different methods that represent the range of available approaches. The first method takes as a starting point a phylogenetic representation of a protein family and, following the principle of Relative Entropy from Information Theory, automatically searches for the optimal division of the family into subfamilies. The second method looks for positions whose mutational behavior is reminiscent of the mutational behavior of the full-length proteins, by directly comparing the corresponding distance matrices. The third method is an automation of the analysis of distribution of sequences and amino acid positions in the corresponding multidimensional spaces using a vector-based principal component analysis. These three methods have been tested on two non-redundant lists of protein families: one composed of proteins that bind a variety of ligand groups, and the other composed of proteins with annotated functionally relevant sites. In most cases, the residues predicted by the three methods show a clear tendency to be close to bound ligands of biological relevance and to those amino acids described as participants in key aspects of protein function.
These three automatic methods provide a wide range of possibilities for biologists to analyze their families of interest, in a similar way to the one presented here for the family of proteins related to ras-p21.

4.
Correlated mutation analyses (CMA) on multiple sequence alignments are widely used for the prediction of the function of amino acids. The accuracy of CMA-based predictions is mainly determined by the number of sequences, by their evolutionary distances, and by the quality of the alignments. These criteria are best met in structure-based sequence alignments of large super-families. So far, CMA techniques have mainly been employed to study receptor interactions. The present work shows how a novel CMA tool, called Comulator, can be used to determine networks of functionally related residues in enzymes. These analyses provide leads for protein engineering studies that are directed towards modification of enzyme specificity or activity. As proof of concept, Comulator has been applied to four enzyme super-families: the isocitrate lyase/phosphoenolpyruvate mutase super-family, the hexokinase super-family, the RmlC-like cupin super-family, and the FAD-linked oxidases super-family. In each of those cases networks of functionally related residue positions were discovered that upon mutation influenced enzyme specificity and/or activity as predicted. We conclude that CMA is a powerful tool for redesigning enzyme activity and selectivity. Proteins 2009. © 2009 Wiley-Liss, Inc.

5.
The maintenance of protein function and structure constrains the evolution of amino acid sequences. This fact can be exploited to interpret correlated mutations observed in a sequence family as an indication of probable physical contact in three dimensions. Here we present a simple and general method to analyze correlations in mutational behavior between different positions in a multiple sequence alignment. We then use these correlations to predict contact maps for each of 11 protein families and compare the result with the contacts determined by crystallography. For the most strongly correlated residue pairs predicted to be in contact, the prediction accuracy ranges from 37 to 68% and the improvement ratio relative to a random prediction from 1.4 to 5.1. Predicted contact maps can be used as input for the calculation of protein tertiary structure, either from sequence information alone or in combination with experimental information. © 1994 John Wiley & Sons, Inc.
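The idea of correlated mutational behavior between two alignment positions can be illustrated with a minimal sketch. It is a simplification, not the published method: each column is described by a similarity score for every pair of sequences, and two columns are compared by the Pearson correlation of those scores. Using residue identity (1.0 or 0.0) as the similarity score, rather than a substitution matrix, and the function names below are assumptions chosen to keep the example short.

```python
from itertools import combinations
from math import sqrt


def pair_scores(column):
    """Identity score (1.0 or 0.0) for every pair of sequences at one column."""
    return [1.0 if a == b else 0.0 for a, b in combinations(column, 2)]


def pearson(x, y):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def column_correlation(alignment, i, j):
    """Correlation of the pairwise-similarity patterns of columns i and j."""
    col_i = [seq[i] for seq in alignment]
    col_j = [seq[j] for seq in alignment]
    return pearson(pair_scores(col_i), pair_scores(col_j))


# Toy alignment: columns 0 and 1 always change together.
toy_alignment = ["AKL", "AKL", "GEL", "GEV"]
print(column_correlation(toy_alignment, 0, 1))
```

In a contact-prediction setting, the most strongly correlated column pairs would be the candidates for physical contact. Fully conserved columns carry no signal, so this sketch scores them as 0.0.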

6.
The increasing number and diversity of protein sequence families requires new methods to define and predict details regarding function. Here, we present a method for analysis and prediction of functional sub-types from multiple protein sequence alignments. Given an alignment and set of proteins grouped into sub-types according to some definition of function, such as enzymatic specificity, the method identifies positions that are indicative of functional differences by comparison of sub-type specific sequence profiles, and analysis of positional entropy in the alignment. Alignment positions with significantly high positional relative entropy correlate with those known to be involved in defining sub-types for nucleotidyl cyclases, protein kinases, lactate/malate dehydrogenases and trypsin-like serine proteases. We highlight new positions for these proteins that suggest additional experiments to elucidate the basis of specificity. The method is also able to predict sub-type for unclassified sequences. We assess several variations on a prediction method, and compare them to simple sequence comparisons. For assessment, we remove close homologues to the sequence for which a prediction is to be made (by a sequence identity above a threshold). This simulates situations where a protein is known to belong to a protein family, but is not a close relative of another protein of known sub-type. Considering the four families above, and a sequence identity threshold of 30 %, our best method gives an accuracy of 96 % compared to 80 % obtained for sequence similarity and 74 % for BLAST. We describe the derivation of a set of sub-type groupings derived from an automated parsing of alignments from PFAM and the SWISSPROT database, and use this to perform a large-scale assessment. The best method gives an average accuracy of 94 % compared to 68 % for sequence similarity and 79 % for BLAST. 
We discuss implications for experimental design, genome annotation and the prediction of protein function and protein intra-residue distances.
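Comparing sub-type specific profiles by relative entropy can be sketched as follows. This is a minimal illustration, not the scoring used in the paper. For each alignment column it builds a residue frequency profile for one sub-type and one for all other sequences, then computes the relative entropy between them. The uniform pseudocount, the skipping of gap characters, and the function names are assumptions made for this example.

```python
from collections import Counter
from math import log2

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_profile(residues, pseudocount=1.0):
    """Residue frequencies at one column, smoothed with a uniform pseudocount."""
    # Gaps and non-standard symbols are ignored (an assumption of this sketch).
    counts = Counter(r for r in residues if r in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


def relative_entropy(p, q):
    """Relative entropy (Kullback-Leibler divergence) of p from q, in bits."""
    return sum(p[aa] * log2(p[aa] / q[aa]) for aa in AMINO_ACIDS)


def subtype_scores(alignment, labels, subtype):
    """Per-column relative entropy of one sub-type against all other sequences."""
    inside = [s for s, lab in zip(alignment, labels) if lab == subtype]
    outside = [s for s, lab in zip(alignment, labels) if lab != subtype]
    scores = []
    for pos in range(len(alignment[0])):
        p = column_profile([s[pos] for s in inside])
        q = column_profile([s[pos] for s in outside])
        scores.append(relative_entropy(p, q))
    return scores


# Toy example: only the middle column separates sub-type "x" from "y".
toy_alignment = ["AKL", "AKL", "AEL", "AEL"]
toy_labels = ["x", "x", "y", "y"]
print(subtype_scores(toy_alignment, toy_labels, "x"))
```

Columns with high scores are those where the sub-type's residue usage departs most from the rest of the family. Columns that are conserved across all sub-types score zero.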

7.
The sequence and structural analysis of cadherins allows us to find sequence determinants: a few positions in sequences whose residues are characteristic and specific for the structures of a given family. Comparison of the five extracellular domains of classic cadherins showed that they share the same sequence determinants despite only a nonsignificant sequence similarity between the N-terminal domain and other extracellular domains. This allowed us to predict secondary structures and propose three-dimensional structures for these domains that have not been structurally analyzed previously. A new method of assigning a sequence to its proper protein family is suggested: analysis of sequence determinants. The main advantage of this method is that it is not necessary to know all or almost all residues in a sequence as required for other traditional classification tools such as BLAST, FASTA, and HMM. Using the key positions only, that is, residues that serve as the sequence determinants, we found that all members of the classic cadherin family were unequivocally selected from among 80,000 examined proteins. In addition, we proposed a model for the secondary structure of the cytoplasmic domain of cadherins based on the principal relations between sequences and secondary structure multialignments. The patterns of the secondary structure of this domain can serve as the distinguishing characteristics of cadherins.

8.
Structural genomics projects are determining the three-dimensional structure of proteins without full characterization of their function. A critical part of the annotation process involves appropriate knowledge representation and prediction of functionally important residue environments. We have developed a method to extract features from sequence, sequence alignments, three-dimensional structure, and structural environment conservation, and used support vector machines to annotate homologous and nonhomologous residue positions based on a specific training set of residue functions. In order to evaluate this pipeline for automated protein annotation, we applied it to the challenging problem of prediction of catalytic residues in enzymes. We also ranked the features based on their ability to discriminate catalytic from noncatalytic residues. When applying our method to a well-annotated set of protein structures, we found that top-ranked features were a measure of sequence conservation, a measure of structural conservation, a degree of uniqueness of a residue's structural environment, solvent accessibility, and residue hydrophobicity. We also found that features based on structural conservation were complementary to those based on sequence conservation and that they were capable of increasing predictor performance. Using a family nonredundant version of the ASTRAL 40 v1.65 data set, we estimated that the true catalytic residues were correctly predicted in 57.0% of the cases, with a precision of 18.5%. When testing on proteins containing novel folds not used in training, the best features were highly correlated with the training on families, thus validating the approach to nonhomologous catalytic residue prediction in general. We then applied the method to 2781 coordinate files from the structural genomics target pipeline and identified both highly ranked and highly clustered groups of predicted catalytic residues.

9.
EFICAz (Enzyme Function Inference by Combined Approach) is an automatic engine for large-scale enzyme function inference that combines predictions from four different methods developed and optimized to achieve high prediction accuracy: (i) recognition of functionally discriminating residues (FDRs) in enzyme families obtained by a Conservation-controlled HMM Iterative procedure for Enzyme Family classification (CHIEFc), (ii) pairwise sequence comparison using a family specific Sequence Identity Threshold, (iii) recognition of FDRs in Multiple Pfam enzyme families, and (iv) recognition of multiple Prosite patterns of high specificity. For FDR (i.e. conserved positions in an enzyme family that discriminate between true and false members of the family) identification, we have developed an Evolutionary Footprinting method that uses evolutionary information from homofunctional and heterofunctional multiple sequence alignments associated with an enzyme family. The FDRs show a significant correlation with annotated active site residues. In a jackknife test, EFICAz shows high accuracy (92%) and sensitivity (82%) for predicting four EC digits in testing sequences that are <40% identical to any member of the corresponding training set. Applied to the Escherichia coli genome, EFICAz assigns more detailed enzymatic function than KEGG, and generates numerous novel predictions.

10.
Dihydrofolate reductase (DHFR) is of significant recent interest as a target for drugs against parasitic and opportunistic infections. Understanding factors which influence DHFR homolog inhibitor specificity is critical for the design of compounds that selectively target DHFRs from pathogenic organisms over the human homolog. This paper presents a novel approach for predicting residues involved in ligand discrimination in a protein family using DHFR as a model system. In this approach, the relationship between inhibitor specificity and amino acid composition for sets of protein homolog pairs is examined. Similar inhibitor specificity profiles correlate with increased sequence homology at specific alignment positions. Residue positions that exhibit the strongest correlations are predicted as specificity determinants. Correlation analysis requires a quantitative measure of similarity in inhibitor specificity (S(lig)) for a pair of homologs. To this end, a method of calculating S(lig) values using K(I) values for the two homologs against a set of inhibitors as input was developed. Correlation analysis of S(lig) values to amino acid sequence similarity scores, obtained via multiple sequence alignments, was performed for individual residue alignment positions and sets of residues on 13 DHFRs. Eighteen alignment positions were identified with a strong correlation of S(lig) to sequence similarity. Of these, three lie in the active site, four are located proximal to the active site, four are clustered together in the adenosine binding domain, and five lie on the βFβG loop. The validity of the method is supported by agreement between experimental findings and current predictions involving active site residues.

11.
12.
13.
We investigate the conservation of amino acid residue sequences in 21 DNA-binding protein families and study the effects that mutations have on DNA-sequence recognition. The observations are best understood by assigning each protein family to one of three classes: (i) non-specific, where binding is independent of DNA sequence; (ii) highly specific, where binding is specific and all members of the family target the same DNA sequence; and (iii) multi-specific, where binding is also specific, but individual family members target different DNA sequences. Overall, protein residues in contact with the DNA are better conserved than the rest of the protein surface, but there is a complex underlying trend of conservation for individual residue positions. Amino acid residues that interact with the DNA backbone are well conserved across all protein families and provide a core of stabilising contacts for homologous protein-DNA complexes. In contrast, amino acid residues that interact with DNA bases have variable levels of conservation depending on the family classification. In non-specific families, base-contacting residues are well conserved and interactions are always found in the minor groove where there is little discrimination between base types. In highly specific families, base-contacting residues are highly conserved and allow member proteins to recognise the same target sequence. In multi-specific families, base-contacting residues undergo frequent mutations and enable different proteins to recognise distinct target sequences. Finally, we report that interactions with bases in the target sequence often follow (though not always) a universal code of amino acid-base recognition and the effects of amino acid mutations can be most easily understood for these interactions.

14.
Protein interactions are fundamental to the functioning of cells, and high throughput experimental and computational strategies are sought to map interactions. Predicting interaction specificity, such as matching members of a ligand family to specific members of a receptor family, is largely an unsolved problem. Here we show that by using evolutionary relationships within such families, it is possible to predict their physical interaction specificities. We introduce the computational method of matrix alignment for finding the optimal alignment between protein family similarity matrices. A second method, 3D embedding, allows visualization of interacting partners via spatial representation of the protein families. These methods essentially align phylogenetic trees of interacting protein families to define specific interaction partners. Prediction accuracy depends strongly on phylogenetic tree complexity, as measured with information theoretic methods. These results, along with simulations of protein evolution, suggest a model for the evolution of interacting protein families in which interaction partners are duplicated in coupled processes. Using these methods, it is possible to successfully find protein interaction specificities, as demonstrated for >18 protein families.

15.
This paper discusses the benefit of mapping paired cysteine mutation patterns as a guide to identifying the positions of protein disulfide bonds. This information can facilitate the computer modeling of protein tertiary structure. First, a simple, paired natural-cysteine-mutation map is presented that identifies the positions of putative disulfide bonds in protein families. The method is based on the observation that if, during the process of evolution, a disulfide-bonded cysteine residue is not conserved, then it is likely that its counterpart will also be mutated. For each target protein, protein databases were searched for the primary amino acid sequences of all known members of distinct protein families. Primary sequence alignment was carried out using PileUp algorithms in the GCG package. To search for correlated mutations, we listed only the positions where cysteine residues were highly conserved and emphasized the mutated residues. In proteins of known three-dimensional structure, a striking pattern of paired cysteine mutations correlated with the positions of known disulfide bridges. For proteins of unknown architecture, the mutation maps showed several positions where disulfide bridging might occur.

16.
The Receptor Activity-Modifying Proteins (RAMPs) are a family of proteins consisting of a single N-terminal extracellular domain and a transmembrane region ending in a short cytoplasmic region. Due to their specific role in modulating the specificity of ligand binding in many class II G-Protein Coupled Receptors, these proteins are awaiting further characterization and elucidation of their structure. This was the aim of this study. We were able to find 13 new RAMP sequences including new protein sequences and predicted peptides from Expressed Sequence Tags and genomic DNA, all of them annotated in databases such as GenBank, EMBL, Swissprot and ENSEMBL. The predicted peptides came from an array of different organisms including Teleostei and Elasmobranchii species, of which the latter was the most ancient RAMP sequence found. It was also possible to efficiently predict the 1D structure of the extracellular RAMP domain and its 3D conformation was inferred through a combination of bioinformatic approaches such as threading. The 1D structure of the extracellular RAMP domain was predicted as a three alpha-helix domain. The most highly conserved residues in the RAMP family were found to be involved in critical functions. Bioinformatic data mining and multiple sequence alignment analysis were crucial for improving the characterization of RAMP proteins and prediction of their 1D and 3D configurations.

17.
MOTIVATION: The quality of a model structure derived from a comparative modeling procedure is dictated by the accuracy of the predicted sequence-template alignment. As the sequence-template pairs are increasingly remote in sequence relationship, the prediction of the sequence-template alignments becomes increasingly problematic with sequence alignment methods. Structural information of the template, used in connection with the sequence relationship of the sequence-template pair, could significantly improve the accuracy of the sequence-template alignment. In this paper, we describe a sequence-template alignment method that integrates sequence and structural information to enhance the accuracy of sequence-template alignments for distantly related protein pairs. RESULTS: The structure-dependent sequence alignment (SDSA) procedure was optimized for coverage and accuracy on a training set of 412 protein pairs; the structures for each of the training pairs are similar (RMSD < approximately 4 Å) but the sequence relationship is undetectable (average pair-wise sequence identity = 8%). The optimized SDSA procedure was then applied to extend PSI-BLAST local alignments by calculating the global alignments under the constraint of the residue pairs in the local alignments. This composite alignment procedure was assessed with a testing set of 1421 protein pairs, of which the pair-wise structures are similar (RMSD < approximately 4 Å) but the sequences are marginally related at best in each pair (average pair-wise sequence identity = 13%). The assessment showed that the composite alignment procedure predicted more aligned residue pairs, with an average 27% increase in correctly aligned residues over the standard PSI-BLAST alignments for the protein pairs in the testing set.

18.
19.
20.
Most recent protein secondary structure prediction methods use sequence alignments to improve the prediction quality. We investigate the relationship between the location of secondary structural elements, gaps, and variable residue positions in multiple sequence alignments. We further investigate how these relationships compare with those found in structurally aligned protein families. We show how such associations may be used to improve the quality of prediction of the secondary structure elements, using the Quadratic-Logistic method with profiles. Furthermore, we analyze the extent to which the number of homologous sequences influences the quality of prediction. The analysis of variable residue positions shows that, surprisingly, helical regions exhibit greater variability than do coil regions, which are generally thought to be the most common secondary structure elements in loops. However, the correlation between variability and the presence of helices does not significantly improve prediction quality. Gaps are a distinct signal for coil regions. Increasing the coil propensity for those residues occurring in gap regions enhances the overall prediction quality. Prediction accuracy increases initially with the number of homologues, but changes negligibly as the number of homologues exceeds about 14. The alignment quality affects the prediction more than other factors; hence a careful selection and alignment of even a small number of homologues can lead to significant improvements in prediction accuracy.
