Found 20 similar articles (search time: 0 ms)
1.
O. V. Galzitskaya, N. V. Dovidchenko, M. Yu. Lobanov, S. O. Garbuzynskiy. Molecular Biology, 2006, 40(1): 96-106
A database of 452 two-domain proteins with less than 25% homology was constructed. One half of the database was used to obtain statistics on the occurrence of amino acid residues at domain boundaries. Small and hydrophilic residues (proline, glycine, asparagine, glutamic acid, arginine, etc.) occurred more often at domain boundaries than in the proteins as a whole. Hydrophobic residues (tryptophan, methionine, phenylalanine, etc.) were rarer at domain boundaries than in the proteins as a whole. Probability scales of amino acid occurrence in boundary-flanking regions were constructed from these statistics and used to predict the domain boundaries in proteins of the other half of the database. The probability scale obtained by averaging the occurrence of amino acids over an 8-residue region (±4 residues from the real domain boundaries) yielded the best results: domain boundaries were predicted within 40 residues of the real boundary in 57% of proteins and within 20 residues of the real boundary in 41% of proteins. The probability scale was used to predict the domain boundaries in proteins with unknown structures (CASP6).
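The window-averaging idea in this abstract can be illustrated with a minimal Python sketch. The propensity values in `TOY_SCALE`, the neutral default, and the function names are hypothetical placeholders, not the published scale, and the ±4 window centred on each residue is only one possible reading of the 8-residue region.

```python
from typing import Dict, List

# Hypothetical boundary propensities (NOT the published scale): values above 1
# mark residues assumed to be enriched at domain boundaries, below 1 depleted.
TOY_SCALE: Dict[str, float] = {
    "P": 1.4, "G": 1.3, "N": 1.2, "E": 1.1, "R": 1.1,
    "W": 0.6, "M": 0.7, "F": 0.7,
}
DEFAULT_PROPENSITY = 1.0  # assumed neutral value for residues absent from the toy scale


def smoothed_profile(sequence: str, scale: Dict[str, float],
                     half_window: int = 4) -> List[float]:
    """Average per-residue propensities over +/- half_window positions."""
    raw = [scale.get(aa, DEFAULT_PROPENSITY) for aa in sequence]
    profile = []
    for i in range(len(raw)):
        lo = max(0, i - half_window)               # clip the window at the N-terminus
        hi = min(len(raw), i + half_window + 1)    # clip the window at the C-terminus
        window = raw[lo:hi]
        profile.append(sum(window) / len(window))
    return profile


def predict_boundary(sequence: str, scale: Dict[str, float],
                     half_window: int = 4) -> int:
    """Return the 0-based position with the highest smoothed propensity."""
    profile = smoothed_profile(sequence, scale, half_window)
    return max(range(len(profile)), key=profile.__getitem__)
```

Calling `predict_boundary(sequence, TOY_SCALE)` returns a single candidate position; a real predictor would need the measured scale and a rule for handling several peaks.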
2.
Protein sequence conservation is a powerful and widely used indicator for predicting catalytic residues from enzyme sequences.
To incorporate amino acid similarity into conservation measures, one approach is to group amino acids into disjoint
sets. In this paper, based on the overlapping amino acid classification proposed by Taylor, we define the relative entropy
of Venn diagram (RVD) and RVD2. In large-scale testing, we demonstrate that RVD and RVD2 perform better than many existing
conservation measures in identifying catalytic residues, in particular the commonly used relative entropy (RE) and Jensen–Shannon
divergence (JSD). To further improve RVD and RVD2, two new conservation measures are obtained by combining them with the classical
JSD. Experimental results suggest that these combined measures perform very well in identifying catalytic residues.
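For orientation, the two baseline measures named in this abstract, relative entropy (RE) and Jensen–Shannon divergence (JSD), can be sketched in Python for a single alignment column. This is a sketch under stated assumptions: the uniform background distribution and the function names are illustrative choices, and the RVD and RVD2 measures themselves are not reproduced here because the abstract does not define them.

```python
import math
from collections import Counter
from typing import Dict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumption: a uniform background; real studies typically use empirical frequencies.
UNIFORM_BACKGROUND: Dict[str, float] = {a: 1.0 / len(AMINO_ACIDS) for a in AMINO_ACIDS}


def column_distribution(column: str) -> Dict[str, float]:
    """Residue frequencies in one alignment column (gaps and unknowns are ignored)."""
    counts = Counter(c for c in column if c in AMINO_ACIDS)
    total = sum(counts.values())
    return {a: counts[a] / total for a in AMINO_ACIDS}


def relative_entropy(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Kullback-Leibler divergence D(p || q) in bits; terms with p = 0 contribute 0."""
    return sum(p[a] * math.log2(p[a] / q[a]) for a in p if p[a] > 0)


def jensen_shannon(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Jensen-Shannon divergence in bits: mean KL divergence to the midpoint m."""
    m = {a: 0.5 * (p[a] + q[a]) for a in p}
    return 0.5 * relative_entropy(p, m) + 0.5 * relative_entropy(q, m)
```

A fully conserved column scores higher than a mixed one under both measures, which is the property that conservation-based catalytic residue prediction relies on.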
3.
4.
Intrinsically disordered regions of proteins are known to have many functional roles in cell signaling and regulatory pathways. The altered expression of these proteins due to mutations is associated with various diseases. Currently, most of the available methods focus on predicting disordered proteins or the disordered regions in a protein. On the other hand, methods developed for predicting protein disorder upon mutation have shown poor performance, with a maximum accuracy of 70%. Hence, in this work, we have developed a novel method to classify disorder-related amino acid substitutions using amino acid properties, substitution matrices, and the effect of neighboring residues; it showed an accuracy of 90.0%, with a sensitivity and specificity of 94.9 and 80.6%, respectively, in 10-fold cross-validation. The method was evaluated with a test set of 20% of the data using 10 iterations, which showed an average accuracy of 88.9%. Furthermore, we systematically analyzed the features responsible for the better performance of our method and observed that neighboring residues play an important role in defining the disorder of a given residue in a protein sequence. We have developed a prediction server to identify disorder-related mutations, and it is available at http://www.iitm.ac.in/bioinfo/DIM_Pred/.
5.
Prediction of unfolded segments in a protein sequence based on amino acid composition (cited by: 1; self-citations: 0; citations by others: 1)
MOTIVATION: Partially and wholly unstructured proteins have now been identified in all kingdoms of life, more commonly in eukaryotic organisms. This intrinsic disorder is related to certain critical functions. Apart from their fundamental interest, unstructured regions in proteins may prevent crystallization. Therefore, the prediction of disordered regions is important for understanding protein function, and may also help in devising genetic constructs. RESULTS: In this paper we present a computational tool for the detection of unstructured regions in proteins based on two properties of unfolded fragments: (1) disordered regions have a biased composition and (2) they usually contain either small or no hydrophobic clusters. In order to quantify these two facts we first calculate the amino acid distributions in structured and unstructured regions. Using these distributions, we calculate for a given sequence fragment the probability of being part of either a structured or an unstructured region. For each amino acid, the distance to the nearest hydrophobic cluster is also computed. Using these three values along a protein sequence allows us to predict unstructured regions with very simple rules. This method requires only the primary sequence, and no multiple alignment, which makes it an adequate method for orphan proteins. AVAILABILITY: http://genomics.eu.org/
6.
Prediction of prolyl residues in cis-conformation in protein structures on the basis of the amino acid sequence (cited by: 2; self-citations: 0; citations by others: 2)
In proteins most peptide bonds are in trans-conformation: the torsion angle ω = 180°. Only a few show cis-conformation in known protein structures (ω = 0°). Most of them are prolyl residues. About 6% of about 4000 prolyl residues are in cis-conformation. Significant differences are observed between the sequences surrounding trans- and cis-prolyl residues. For example, aromatic residues are abundant on the N-terminal side of cis-prolyl residues, whereas for trans-prolyl residues more aromatic amino acids occur on the C-terminal side. In all cases, however, only complex patterns are indicative of cis- and trans-conformation, respectively. Considering the neighbours (±6 residues) of prolyl residues and their physicochemical properties, we find 6 different patterns which allow one to assign correctly about 75% of known cis-structured prolyl residues, with no false positives predicted.
7.
8.
Jia P, Qian Z, Zeng Z, Cai Y, Li Y. Biochemical and Biophysical Research Communications, 2007, 357(2): 366-370
Assigning subcellular localization (SL) to proteins is one of the major tasks of functional proteomics. Despite the impressive technical advances of the past decades, it is still time-consuming and laborious to determine SL experimentally on a high-throughput scale. Thus, computational predictions are the preferred method for large-scale assignment of protein SL, followed up by experimental studies where appropriate. In this report, using a machine learning approach, the Nearest Neighbor Algorithm (NNA), we developed a prediction system for protein SL in which we incorporated a protein functional domain profile. The overall accuracy achieved by this system is 93.96%. Furthermore, comparisons with other methods have been conducted to demonstrate the validity and efficiency of our prediction system. We also provide an implementation of our Subcellular Location Prediction System (SLPS), which is available at http://pcal.biosino.org.
9.
Prediction of protein subcellular multi-localization based on the general form of Chou's pseudo amino acid composition (cited by: 1; self-citations: 0; citations by others: 1)
Many proteins bear multi-locational characteristics, and this phenomenon is closely related to biological function. However, most of the existing methods can only deal with single-location proteins. Therefore, an automatic and reliable ensemble classifier for protein subcellular multi-localization is needed. We propose a new ensemble classifier combining the KNN (K-nearest neighbour) and SVM (support vector machine) algorithms to predict the subcellular localization of eukaryotic, Gram-negative bacterial and viral proteins based on the general form of Chou's pseudo amino acid composition, i.e., GO (gene ontology) annotations, dipeptide composition and AmPseAAC (amphiphilic pseudo amino acid composition). This ensemble classifier was developed by fusing many basic individual classifiers through a voting system. The overall prediction accuracies obtained by the KNN-SVM ensemble classifier are 95.22, 93.47 and 80.72% for the eukaryotic, Gram-negative bacterial and viral proteins, respectively. Our prediction accuracies are significantly higher than those of previous methods and reveal that our strategy better predicts subcellular locations of multi-location proteins.
10.
An empirical relation between the amino acid composition and three-dimensional folding pattern of several classes of proteins has been determined. Computer-simulated neural networks have been used to assign proteins to one of the following classes based on their amino acid composition and size: (1) 4α-helical bundles, (2) parallel (α/β)8 barrels, (3) nucleotide binding fold, (4) immunoglobulin fold, or (5) none of these. Networks trained on the known crystal structures as well as sequences of closely related proteins are shown to correctly predict folding classes of proteins not represented in the training set with an average accuracy of 87%. Other folding motifs can easily be added to the prediction scheme once larger databases become available. Analysis of the neural network weights reveals that amino acids favoring prediction of a folding class are usually overrepresented in that class and amino acids with unfavorable weights are underrepresented in composition. The neural networks utilize combinations of these multiple small variations in amino acid composition in order to make a prediction. The favorably weighted amino acids in a given class also form the most intramolecular interactions with other residues in proteins of that class. A detailed examination of the contacts of these amino acids reveals some general patterns that may help stabilize each folding class. © 1993 Wiley-Liss, Inc.
11.
The study of rat proteins is an indispensable task in experimental medicine and drug development. The function of a rat protein is closely related to its subcellular location. Based on this concept, we construct a benchmark dataset of rat proteins and develop a combined approach for predicting their subcellular localization. From the protein primary sequence, multiple sequence features are obtained using discrete Fourier analysis, a position conservation scoring function and the increment of diversity, and these features are used as input parameters of a support vector machine. In the jackknife test, the overall success rate of prediction is 95.6% on the rat protein dataset. When our method is applied to the apoptosis protein dataset and the Gram-negative bacterial protein dataset with the jackknife test, the overall success rates are 89.9% and 96.4%, respectively. These results indicate that our proposed method is quite promising and may play a complementary role to the existing predictors in this area.
12.
Detailed analyses of protein structures provide an opportunity to understand conformation and function in terms of amino acid sequence and composition. In this work, we have systematically analyzed the characteristic features of the amino acid residues found in alpha-helical coiled-coils and, in so doing, have developed indices for their properties, conformational parameters, surrounding hydrophobicity and flexibility. As expected, there is a preference for hydrophobic (Ala, Leu), positively (Lys, Arg) and negatively (Glu) charged residues in coiled-coil domains. However, the surrounding hydrophobicity of residues in coiled-coil domains is significantly less than that for residues in other regions of coiled-coil proteins. The analysis of temperature factors in coiled-coil proteins shows that the residues in these domains are more stable than those in other regions. Further, we have delineated the medium- and long-range contacts in coiled-coil domains and compared the results with those obtained for other (non-coiled-coil) parts of the same proteins and non-coiled-coil helical segments of globular proteins. The residues in coiled-coil domains are largely influenced by medium-range contacts, whereas long-range interactions play a dominant role in other regions of these same proteins as well as in non-coiled-coil helices. We have also revealed the preference of amino acid residues to form cation-pi interactions and we found that Arg is more likely to form such interactions than Lys. The parameters developed in this work can be used to understand the folding and stability of coiled-coil proteins in general.
13.
The amino acid composition and sequence in the primary structure of 180 proteins have been studied. It is shown that the distribution of amino acid residues is close to random, i.e. it is determined by the amino acid composition. The relation between the statistical and unique character of protein primary structures is discussed. The amino acid sequence is suggested to be unique in fibrous proteins. In contrast, the amino acid sequence in globular proteins is statistical. The statistical character of the amino acid distribution in globular proteins explains the possibility of sensible text generation under frame-shift mutations, deletions and insertions.
14.
We have performed a statistical analysis of unstructured amino acid residues in protein structures available in the databank of protein structures. Data on the occurrence of disordered regions at the ends and in the middle part of protein chains have been obtained: the regions near the ends (at a distance of less than 30 residues from the N- or C-terminus) contain 66% of the unstructured residues (38% near the N-terminus and 28% near the C-terminus), although these terminal regions include only 23% of the amino acid residues. The frequencies of occurrence of unstructured residues have been calculated for each of the 20 residue types at different positions in the protein chain. It has been shown that the relative frequencies of occurrence of unstructured residues of the 20 types at the termini of protein chains differ from those in the middle part of the chain; amino acid residues of the same type have different probabilities of being unstructured in the terminal regions and in the middle part of the protein chain. The obtained frequencies of occurrence of unstructured residues in the middle part of the protein chain have been used as a scale for predicting disordered regions from amino acid sequence using the method (FoldUnfold) previously developed by us. This scale of frequencies of occurrence of unstructured residues correlates with the contact scale (previously developed by us and used for the same purpose) at a level of 95%. Testing the new scale on a database of 427 unstructured proteins and 559 completely structured proteins has shown that this scale can be successfully used for the prediction of disordered regions in protein chains.
15.
Jen Tsi Yang. Journal of Protein Chemistry, 1996, 15(2): 185-191
The conformational parameters $P_k$ for each amino acid species ($j = 1$–$20$) of sequential peptides in proteins are presented as the product of $P_{i,k}$, where $i$ is the number of the sequential residues in the $k$th conformational state ($k$ = α-helix, β-sheet, β-turn, or unordered structure). Since the average parameter for an $n$-residue segment is related to the average probability of finding the segment in the $k$th state, it becomes a geometric mean, $(P_k)_{av} = \left(\prod_{i=1}^{n} P_{i,k}\right)^{1/n}$, with amino acid residue $i$ increasing from 1 to $n$. We then used $\ln (P_k)_{av}$ to convert a multiplicative process to a summation, i.e., $\ln (P_k)_{av} = (1/n) \sum_{i=1}^{n} \ln P_{i,k}$, for ease of operation. However, this is unlike the popular Chou-Fasman algorithm, which has the flaw of using the arithmetic mean for relative probabilities. The Chou-Fasman algorithm happens to be close to our calculations in many cases mainly because the difference between their $P_k$ and our $\ln P_k$ is nearly constant for about one-half of the 20 amino acids. When stronger conformation formers and breakers exist, the difference becomes larger and the prediction at the N- and C-terminal α-helix or β-sheet could differ. If the average conformational parameters of the overlapping segments of any two states are too close for a unique solution, our calculations could lead to a different prediction.
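The difference between the two averaging schemes contrasted in this abstract, a geometric mean computed through logarithms versus a plain arithmetic mean, can be shown with a short Python sketch. The function names and the numeric parameters in the usage note are illustrative assumptions, not values from the paper.

```python
import math
from typing import Sequence


def geometric_mean_parameter(params: Sequence[float]) -> float:
    """Geometric mean of per-residue parameters, computed as exp(mean(ln P))."""
    log_average = sum(math.log(p) for p in params) / len(params)
    return math.exp(log_average)


def arithmetic_mean_parameter(params: Sequence[float]) -> float:
    """Plain arithmetic mean, the averaging the abstract attributes to Chou-Fasman."""
    return sum(params) / len(params)
```

For a hypothetical two-residue segment with one strong former (2.0) and one strong breaker (0.5), the geometric mean is 1.0 while the arithmetic mean is 1.25, so the two schemes diverge most when formers and breakers are mixed, as the abstract notes.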
16.
Background
The identification of protein domains plays an important role in protein structure comparison. Domain query size and composition are critical to structure similarity search algorithms such as the Vector Alignment Search Tool (VAST), the method employed for computing related protein structures in the NCBI Entrez system. Currently, domains identified on the basis of structural compactness are used for VAST computations. In this study, we have investigated how alternative definitions of domains derived from conserved sequence alignments in the Conserved Domain Database (CDD) would affect the domain comparisons and structure similarity search performance of VAST.
17.
Thermophilic organisms produce proteins of exceptional stability. To understand protein thermostability at the molecular level we studied a pair of cold shock proteins, one of mesophilic and one of thermophilic origin, by systematic mutagenesis. Although the two proteins differ in sequence at 12 positions, two surface-exposed residues are responsible for the increase in stability of the thermophilic protein (by 15.8 kJ mol⁻¹ at 70 °C). Of this, 11.5 kJ mol⁻¹ originates from a predominantly electrostatic contribution of Arg 3 and 5.2 kJ mol⁻¹ from hydrophobic interactions of Leu 66 at the carboxy terminus. The mesophilic protein could be converted to a highly thermostable form by changing the Glu residues at positions 3 and 66 to Arg and Leu, respectively. The variation of surface residues may thus provide a simple and powerful approach for increasing the thermostability of a protein.
18.
Functional dissection of cdc37: characterization of domain structure and amino acid residues critical for protein kinase binding (cited by: 4; self-citations: 0; citations by others: 4)
Hsp90 and its co-chaperone Cdc37 facilitate the folding and activation of numerous protein kinases. In this report, we examine the structure-function relationships that regulate the interaction of Cdc37 with Hsp90 and with an Hsp90-dependent kinase, the heme-regulated eIF2alpha kinase (HRI). Limited proteolysis of native and recombinant Cdc37, in conjunction with MALDI-TOF mass spectrometry analysis of peptide fragments and peptide microsequencing, indicates that Cdc37 is comprised of three discrete domains. The N-terminal domain (residues 1-126) interacts with client HRI molecules. Cdc37's middle domain (residues 128-282) interacts with Hsp90, but does not bind to HRI. The C-terminal domain of Cdc37 (residues 283-378) does not bind Hsp90 or kinase, and no functions were ascribable to this domain. Functional assays did, however, suggest that residues S127-G163 of Cdc37 serve as an interdomain switch that modulates the ability of Cdc37 to sense Hsp90's conformation and thereby mediate Hsp90's regulation of Cdc37's kinase-binding activity. Additionally, scanning alanine mutagenesis identified four amino acid residues at the N-terminus of Cdc37 that are critical for high-affinity binding of Cdc37 to client HRI molecules. One mutation, Cdc37/W7A, also implicated this region as an interpreter of Hsp90's conformation. Results illuminate the specific Cdc37 motifs underlying the allosteric interactions that regulate binding of Hsp90-Cdc37 to immature kinase molecules.
19.
The multidimensional statistical technique of discriminant analysis is used to allocate amino acid sequences to one of four secondary structural classes: high α content, high β content, mixed α and β, low content of ordered structure. Discrimination is based on four attributes: estimates of percentages of α and β structures, and regular variations in the hydrophobic values of residues along the sequence, occurring with periods of 2 and 3.6 residues. The reliability of the method, estimated by classifying 138 sequences from the Brookhaven Protein Data Bank, is 80%, with no misallocations between α-rich and β-rich classes. The reliability can be increased to 84% by making no allocation for proteins classified with odds close to 1. Classification using previously developed secondary structural prediction methods is considerably less reliable, the best result being 64% obtained using predictions based on the Delphi method.
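Two of the four attributes in this abstract are periodic variations in hydrophobicity with periods of 2 and 3.6 residues. One common way to quantify such periodicity is the magnitude of a single Fourier component of the hydrophobicity profile, sketched below in Python. This is an assumption about how the attribute could be computed, not the authors' procedure; the two-letter toy scale and the function name are hypothetical, and the discriminant analysis step is not shown.

```python
import math
from typing import Dict

# Hypothetical two-value hydrophobicity scale used only for illustration.
TOY_HYDROPHOBICITY: Dict[str, float] = {"L": 1.0, "S": -1.0}


def periodicity_strength(sequence: str, scale: Dict[str, float],
                         period: float) -> float:
    """Length-normalized magnitude of the Fourier component at the given period.

    Residues missing from the scale are treated as neutral (0.0), an assumption.
    """
    values = [scale.get(aa, 0.0) for aa in sequence]
    mean = sum(values) / len(values)
    omega = 2.0 * math.pi / period  # angular step per residue
    real = sum((v - mean) * math.cos(omega * i) for i, v in enumerate(values))
    imag = sum((v - mean) * math.sin(omega * i) for i, v in enumerate(values))
    return math.hypot(real, imag) / len(values)
```

A strictly alternating hydrophobic/hydrophilic pattern, as expected for a β-strand with one buried face, gives a strong signal at period 2 and a weak one at period 3.6, whereas an amphipathic α-helix would show the opposite trend.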
20.
The ability to alter protein structure by site-directed mutagenesis has revolutionized biochemical research. Controlled mutations at the DNA level, before protein translation, are now routine. These techniques allow specific, high-fidelity interconversion largely between the 20 natural, proteinogenic amino acids. Nonetheless, there is a need to incorporate other amino acids, both natural and unnatural, that are not accessible using standard site-directed mutagenesis and expression systems. Post-translational chemistry offers access to these side chains. Nearly half a century ago, the idea of a 'chemical mutation' was proposed and the interconversion between amino acid side chains was demonstrated on select proteins. In these isolated examples, a powerful proof-of-concept was demonstrated. Here, we revive the idea of chemical mutagenesis and discuss the prospect of its general application in protein science. In particular, we consider amino acids that are chemical precursors to a functional set of other side chains. Among these, dehydroalanine has much potential. There are multiple methods available for dehydroalanine incorporation into proteins and this residue is an acceptor for a variety of nucleophiles. When used in conjunction with standard genetic techniques, chemical mutagenesis may allow access to natural, modified, and unnatural amino acid residues on translated, folded proteins.