Similar articles
 20 similar articles found
1.
Based on an analysis of how proton-proton distances depend on the conformational characteristics of L-amino acid residues, a correlation diagram relating the expected NOE cross-peak intensities to the regions of sterically allowed (phi, psi) space is proposed. A method for determining the dihedral angles phi and psi from NOE cross-peak intensities was developed. Using model NMR spectral parameters for bovine pancreatic trypsin inhibitor, it is shown that the accuracy of the phi and psi determination exceeds that of other methods for the structural interpretation of two-dimensional NMR data. The expected NMR spectral parameters were also analyzed for the different types of regular protein secondary structure and for beta-turns.

2.
3.
Lee S  Lee BC  Kim D 《Proteins》2006,62(4):1107-1114
Knowing a protein's structure and inferring its function from that structure is one of the main issues of computational structural biology, and the first step is often to study protein secondary structure. There have been many attempts to predict protein secondary structure content. Earlier attempts assumed that secondary structure content can be predicted successfully from the amino acid composition of a protein. Recent methods achieved remarkable prediction accuracy by using expanded composition information. The overall average error of the most successful method is 3.4%. Here, we demonstrate that even with the simple amino acid composition alone, prediction accuracy can be improved significantly if evolutionary information is included. The idea is motivated by the observation that evolutionarily related proteins share similar structures. After calculating the homolog-averaged amino acid composition of a protein, which can be easily obtained from the multiple sequence alignment produced by running PSI-BLAST, those 20 numbers are learned by a multiple linear regression, an artificial neural network, and a support vector regression. The overall average error of the support vector regression method is 3.3%. It is remarkable that we obtain comparable accuracy without using expanded composition information such as pair-coupled amino acid composition. This work again demonstrates that amino acid composition is a fundamental characteristic of a protein. We anticipate that this idea can be applied to many areas of protein bioinformatics where amino acid composition is used, such as subcellular localization prediction, enzyme subclass prediction, domain boundary prediction, signal sequence prediction, and prediction of unfolded segments in a protein sequence, to name a few.
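The homolog-averaged composition described in this abstract can be illustrated with a minimal sketch. It assumes an unweighted mean over the homologs and skips gap characters; the abstract does not state how the authors weighted sequences, and the function names and toy sequences below are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Return the 20 amino acid fractions of one sequence (gaps and unknowns skipped)."""
    counts = Counter(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard residues")
    return [counts[a] / total for a in AMINO_ACIDS]


def homolog_averaged_composition(sequences):
    """Unweighted mean of the per-sequence compositions (an assumption, see lead-in)."""
    comps = [composition(s) for s in sequences]
    n = len(comps)
    return [sum(column) / n for column in zip(*comps)]


# Hypothetical toy sequences standing in for aligned PSI-BLAST hits
homologs = ["ACDA", "AAGG"]
features = homolog_averaged_composition(homologs)
```

The resulting 20 numbers are the kind of feature vector that a regression model could then be trained on.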

4.
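This study systematically analyzed how acidic, alkaline, and neutral enzymes differ in the amino acid composition of their secondary structures. Acidic and alkaline enzymes showed different amino acid usage preferences when forming particular secondary structures. In addition, both acidic and alkaline enzymes contained markedly higher proportions of neutral amino acids and of amino acids with tiny side chains, which may be a general mechanism of adaptation to extreme pH. On this basis, a new method for extracting feature values from protein sequences was proposed; its accuracy under 10-fold cross-validation reached 80.3%. Compared with other common feature extraction methods, its accuracy was higher by 9.4% to 18.7%, and the random forest algorithm also achieved recognition accuracy 2.7% to 21.8% higher than other machine learning algorithms.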

5.
We present a new method, secondary structure prediction by deviation parameter (SSPDP), for predicting the secondary structure of proteins from amino acid sequence. Deviation parameters (DP) for amino acid singlets, doublets and triplets were computed with respect to secondary structural elements of proteins based on the dictionary of secondary structure prediction (DSSP)-generated secondary structure for 408 selected nonhomologous proteins. Amino acid triplets that are not found in the selected dataset are assigned a DP value of zero with respect to the secondary structural elements of proteins. The total number of parameters generated is 15,432 out of 25,260 possible parameters. The deviation parameter set is complete with respect to amino acid singlets and doublets, and partially complete with respect to amino acid triplets. These generated parameters were used to predict secondary structural elements from amino acid sequence. The secondary structure predicted by our method (SSPDP) was compared with that of single sequence (NNPREDICT) and multiple sequence (PHD) methods. The average prediction accuracy for α-helix by the SSPDP, NNPREDICT and PHD methods was found to be 57%, 44% and 69%, respectively, for the proteins in the selected dataset. For β-strand, the prediction accuracy is found to be 69%, 21% and 53%, respectively, by the SSPDP, NNPREDICT and PHD methods. This clearly indicates that secondary structure prediction by our method is as good as the PHD method and much better than the NNPREDICT method.

6.
We have used the occluded surface algorithm to estimate the packing of both buried and exposed amino acid residues in protein structures. This method works equally well for buried residues and solvent-exposed residues, in contrast to the commonly used Voronoi method that works directly only on buried residues. The atomic packing of individual globular proteins may vary significantly from the average packing of a large data set of globular proteins. Here, we demonstrate that these variations in protein packing are due to a complex combination of protein size, secondary structure composition and amino acid composition. Differences in protein packing are conserved in protein families of similar structure despite significant sequence differences. This conclusion indicates that quality assessments of packing in protein structures should include a consideration of various parameters including the packing of known homologous proteins. Also, modeling of protein structures based on homologous templates should take into account the packing of the template protein structure.

7.
8.
We present a new method, secondary structure prediction by deviation parameter (SSPDP), for predicting the secondary structure of proteins from amino acid sequence. Deviation parameters (DP) for amino acid singlets, doublets and triplets were computed with respect to secondary structural elements of proteins based on the dictionary of secondary structure prediction (DSSP)-generated secondary structure for 408 selected non-homologous proteins. Amino acid triplets that are not found in the selected dataset are assigned a DP value of zero with respect to the secondary structural elements of proteins. The total number of parameters generated is 15,432 out of 25,260 possible parameters. The deviation parameter set is complete with respect to amino acid singlets and doublets, and partially complete with respect to amino acid triplets. These generated parameters were used to predict secondary structural elements from amino acid sequence. The secondary structure predicted by our method (SSPDP) was compared with that of single sequence (NNPREDICT) and multiple sequence (PHD) methods. The average prediction accuracy for α-helix by the SSPDP, NNPREDICT and PHD methods was found to be 57%, 44% and 69%, respectively, for the proteins in the selected dataset. For β-strand, the prediction accuracy is found to be 69%, 21% and 53%, respectively, by the SSPDP, NNPREDICT and PHD methods. This clearly indicates that secondary structure prediction by our method is as good as the PHD method and much better than the NNPREDICT method.

9.
10.
A Iu Kuz'minov 《Biofizika》1987,32(2):206-209
This paper examines the relationship between the order of amino acids in comparatively short oligopeptides (tetrapeptides) and their conformational potential. It is shown that the spatial and conformational possibilities of tetrapeptides composed of the same amino acid residues are highly sensitive to the mutual arrangement of those residues, i.e., to the amino acid sequence. A detailed conformational analysis demonstrated that the difference in conformational possibilities is mainly determined by the different conditions under which inter-residue interactions are realized. It is shown that the energetic differences between the fragments are due to the different interaction contributions in each of the fragments considered.

11.
Correspondence analysis of 28 proteomes selected to span the entire realm of prokaryotes revealed universal biases in the proteins' amino acid distribution. Integral Inner Membrane Proteins always form an individual cluster, which can then be used to predict protein localisation in unknown proteomes, independently of the organism's biotope or kingdom. Orphan proteins are consistently rich in aromatic residues. Another bias is also ubiquitous: the amino acid composition is driven by the G + C content of the first codon position. An unexpected bias is driven, in many proteomes, by the AAN box of the genetic code, suggesting some functional biochemical relationship between asparagine and lysine. Less-significant biases are driven by the rare amino acids, cysteine and tryptophan. Some allow identification of species-specific functions or localisation such as surface or exported proteins. Errors in genome annotations are also revealed by correspondence analysis, making it useful for quality control and correction.

12.
13.
Understanding how enzymes adapt to pH extremes, and discriminating between such enzymes, is a challenging task that would help in designing stable enzymes. In this work, we systematically analyzed the secondary structure amino acid compositions of 105 acidic and 111 alkaline enzymes. We found that the propensity of individual residues to participate in different secondary structures might be a general stability mechanism for adaptation to pH extremes. Based on this, we present a secondary structure amino acid composition method for extracting useful features from sequence, and used it with random forest, an ensemble classifier. The overall prediction accuracy evaluated by 10-fold cross-validation reached 90.7%. Compared with other feature extraction methods, our method improved the overall prediction accuracy by 5.5% to 21.2%. The random forest algorithm also outperformed other machine learning techniques, with an improvement ranging from 3.2% to 19.9%.
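One plausible reading of a "secondary structure amino acid composition" feature is a separate 20-value composition for each secondary structure class. The sketch below follows that reading under several assumptions that the abstract does not confirm: three classes (H, E, C), normalization within each class, and an externally supplied secondary structure string. All names are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
STATES = "HEC"  # helix, strand, coil (assumed three-state alphabet)


def ss_composition(seq, ss):
    """Return 60 features: amino acid fractions within each secondary structure state."""
    if len(seq) != len(ss):
        raise ValueError("sequence and secondary structure must have equal length")
    counts = {state: dict.fromkeys(AMINO_ACIDS, 0) for state in STATES}
    for aa, state in zip(seq.upper(), ss.upper()):
        if state in counts and aa in counts[state]:
            counts[state][aa] += 1
    features = []
    for state in STATES:
        total = sum(counts[state].values())
        # A state that never occurs contributes 20 zeros
        features.extend(counts[state][a] / total if total else 0.0 for a in AMINO_ACIDS)
    return features


# Hypothetical toy input: two helical alanines, one strand glycine, one coil lysine
vector = ss_composition("AAGK", "HHEC")
```

A vector of this kind could then be passed to any classifier, including a random forest.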

14.
The amino acid compositions of proteins from halophilic archaea were compared with those from non-halophilic mesophiles and thermophiles, in terms of the protein surface and interior, on a genome-wide scale. As we previously reported for proteins from thermophiles, a biased amino acid composition also exists in halophiles, in which an abundance of acidic residues was found on the protein surface as compared to the interior. This general feature did not seem to depend on the individual protein structures, but was applicable to all proteins encoded within the entire genome. Unique protein surface compositions are common in both halophiles and thermophiles. Statistical tests have shown that significant surface compositional differences exist among halophiles, non-halophiles, and thermophiles, while the interior composition within each of the three types of organisms does not significantly differ. Although thermophilic proteins have an almost equal abundance of both acidic and basic residues, a large excess of acidic residues in halophilic proteins seems to be compensated by fewer basic residues. Aspartic acid, lysine, asparagine, alanine, and threonine significantly contributed to the compositional differences of halophiles from meso- and thermophiles. Among them, however, only aspartic acid deviated largely from the expected amount estimated from the dinucleotide composition of the genomic DNA sequence of the halophile, which has an extremely high G+C content (68%). Thus, the other residues with large deviations (Lys, Ala, etc.) from their non-halophilic frequencies could have arisen merely as "dragging effects" caused by the compositional shift of the DNA, which would have changed to increase principally the fraction of aspartic acid alone.

15.
Abstract: Intact neurofilaments were isolated from bovine spinal cord white matter, washed by sedimentation in 0.1 M NaCl, and extracted with 8 M urea. Solubilized neurofilament triplet proteins of molecular weights approximately 68,000 (P68), 150,000 (P150), and 200,000 (P200) were purified by preparative electrophoresis, using an LKB 7900 Uniphor apparatus. The method provides for an enhanced yield of purified protein and has markedly reduced admixture of electrophoresed protein with acrylamide and associated protein contaminants. Amino acid compositions of the purified neurofilament triplet proteins are reported and compared.

16.
The growth temperature adaptation of six model proteins has been studied in 42 microorganisms belonging to the eubacterial and archaeal kingdoms, covering optimum growth temperatures from 7 to 103 °C. The selected proteins include three elongation factors involved in translation, the enzymes glyceraldehyde-3-phosphate dehydrogenase and superoxide dismutase, and the cell division protein FtsZ. The common strategy of protein adaptation from cold to hot environments implies small changes in the amino acid composition, without altering the overall structure of the macromolecule. These continuous adjustments were investigated through parameters related to the amino acid composition of each protein. The average per-residue mass, volume and accessible surface area allowed an evaluation of the usage of bulky residues, whereas the average hydrophobicity reflected the usage of hydrophobic residues. The proportion of bulky and hydrophobic residues in each protein increased almost linearly with the growth temperature of the host microorganism. This finding agrees with the structural and functional properties exhibited by proteins from differently adapted sources, thus explaining the great compactness or the high flexibility exhibited by (hyper)thermophilic or psychrophilic proteins, respectively. Indeed, heat-adapted proteins incline toward heavier and more hydrophobic residues than mesophilic proteins, whereas cold-adapted macromolecules show the opposite behavior, with a certain preference for smaller and less hydrophobic residues. An investigation of how differently the bulky-residue content increases with growth temperature in the six model proteins suggests that the role and/or structural organization of protein domains may be relevant. The significance of the linear correlations between growth temperature and parameters related to the amino acid composition improved when the analysis was carried out collectively on all model proteins.
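A per-residue average such as the mean hydrophobicity mentioned in this abstract can be computed directly from a sequence. The sketch below assumes the Kyte-Doolittle hydropathy scale; the abstract does not say which scale the authors used, and the function name is hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the abstract does not name one)
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(seq):
    """Average hydropathy per residue, ignoring non-standard characters."""
    values = [KYTE_DOOLITTLE[c] for c in seq.upper() if c in KYTE_DOOLITTLE]
    if not values:
        raise ValueError("sequence contains no standard residues")
    return sum(values) / len(values)


# Hypothetical toy peptides: a hydrophobic one and a charged one
hydrophobic = mean_hydropathy("AILV")
charged = mean_hydropathy("DEKR")
```

Computing this value for the same protein across organisms and plotting it against optimum growth temperature would give the kind of trend the abstract describes; the same pattern applies to per-residue mass or volume with a different lookup table.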

17.
Summary: The relative abundances among amino acids that are functionally similar to one another were explained by a random partition of a unit interval.
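A random partition of the unit interval is commonly modeled as a "broken stick": n - 1 cut points drawn uniformly split the interval into n pieces, and the expected length of the k-th largest piece is (1/n) times the sum of 1/i for i from k to n. Whether the paper uses exactly this model is an assumption; the sketch below only illustrates the general idea, and the function names are hypothetical.

```python
import random


def broken_stick_expected(n):
    """Expected lengths of the n pieces, largest first, under uniform random cuts."""
    return [sum(1.0 / i for i in range(k, n + 1)) / n for k in range(1, n + 1)]


def broken_stick_sample(n, rng):
    """Draw one random partition of [0, 1] into n pieces, sorted largest first."""
    cuts = sorted(rng.random() for _ in range(n - 1))
    edges = [0.0] + cuts + [1.0]
    return sorted((b - a for a, b in zip(edges, edges[1:])), reverse=True)


rng = random.Random(0)  # fixed seed so the run is repeatable
expected = broken_stick_expected(5)
sample = broken_stick_sample(5, rng)
```

Ranked abundances within a group of similar amino acids could then be compared with `expected` for a group of the same size.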

18.
DNA-binding proteins play an important role in most cellular processes, such as gene regulation, recombination, repair, replication, and DNA modification. In this article, an optimal Chou's pseudo amino acid composition (PseAAC) based on the physicochemical characters of amino acids is proposed to represent proteins for identifying DNA-binding proteins. Six physicochemical characters of amino acids are used to generate the sequence features via the web server PseAAC. The optimal values of two important PseAAC parameters (correlation factor δ and weighting factor w) are determined to obtain an appropriate representation of proteins, which ultimately results in better prediction performance. Experimental results on the benchmark datasets using random forest show that our method is promising for predicting DNA-binding proteins and may at least be a useful supplementary tool to existing methods.

19.
The conformational parameters $P_k$ for each amino acid species ($j$ = 1–20) of sequential peptides in proteins are presented as the product of $P_{i,k}$, where $i$ is the number of the sequential residue in the $k$th conformational state ($k$ = α-helix, β-sheet, β-turn, or unordered structure). Since the average parameter for an $n$-residue segment is related to the average probability of finding the segment in the $k$th state, it becomes a geometric mean, $(P_k)_{av} = (\prod_{i=1}^{n} P_{i,k})^{1/n}$, with amino acid residue $i$ increasing from 1 to $n$. We then used $\ln (P_k)_{av}$ to convert the multiplicative process to a summation, i.e., $\ln (P_k)_{av} = (1/n) \sum_{i=1}^{n} \ln P_{i,k}$, for ease of operation. This is unlike the popular Chou-Fasman algorithm, which has the flaw of using the arithmetic mean for relative probabilities. The Chou-Fasman algorithm happens to be close to our calculations in many cases, mainly because the difference between their $P_k$ and our $\ln P_k$ is nearly constant for about one-half of the 20 amino acids. When stronger conformation formers and breakers exist, the difference becomes larger and the prediction at the N- and C-terminal ends of an α-helix or β-sheet could differ. If the average conformational parameters of the overlapping segments of any two states are too close for a unique solution, our calculations could lead to a different prediction.
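The contrast between the two averages can be made concrete with a short sketch. The parameter values below are hypothetical and are not taken from any published propensity table; the function names are likewise illustrative.

```python
import math


def log_geometric_mean(params):
    """ln of the geometric mean: (1/n) * sum(ln P_i), as in the abstract's summation form."""
    return sum(math.log(p) for p in params) / len(params)


def arithmetic_mean(params):
    """Plain arithmetic mean of the per-residue parameters."""
    return sum(params) / len(params)


# Hypothetical per-residue parameters for a 4-residue segment in one state:
# one strong former (1.5) and one strong breaker (0.5)
segment = [1.5, 1.2, 0.5, 1.0]

geometric = math.exp(log_geometric_mean(segment))
arithmetic = arithmetic_mean(segment)
```

By the inequality between arithmetic and geometric means, `geometric` never exceeds `arithmetic`, and the gap widens as the per-residue values become more spread out, which is consistent with the abstract's remark about strong formers and breakers.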

20.
Archaea, bacteria and eukaryotes represent the main kingdoms of life. Is there any trend in the amino acid compositions of proteins found in the full genomes of species from different kingdoms? What is the percentage of totally unstructured proteins in various proteomes? We obtained amino acid frequencies for different taxa using 195 known proteomes and all annotated sequences from the Swiss-Prot database. Investigation of the two databases (proteomes and Swiss-Prot) shows that the amino acid compositions of proteins differ substantially between kingdoms of life, and that this difference is larger between different proteomes than between different kingdoms of life. Our data demonstrate that there is surprisingly little selection on the amino acid composition of proteins in higher organisms (eukaryotes) and their viruses in comparison with the "random" frequency that follows from uniform usage of the codons of the universal genetic code. In contrast, lower organisms (bacteria and especially archaea) demonstrate enhanced selection of amino acids. Moreover, according to our estimates, 12%, 3% and 2% of the proteins in eukaryotic, bacterial and archaean proteomes are totally disordered, and long (> 41 residues) disordered segments are found in 16% of archaean, 20% of eubacterial and 43% of eukaryotic proteins for 19 archaean, 159 bacterial and 17 eukaryotic proteomes, respectively. Correlations between the amino acid compositions of proteins of various taxa show that the highest correlation is observed between eukaryotes and their viruses (correlation coefficient 0.98) and between bacteria and their viruses (correlation coefficient 0.96), while the correlation between eukaryotes and archaea is only 0.85.
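The "random" baseline and the correlation coefficients mentioned in this abstract can be sketched as follows. The sketch assumes that the baseline frequency of each amino acid is its number of codons divided by the 61 sense codons of the standard genetic code, and that the correlation is a Pearson coefficient over the 20 frequencies; the abstract does not spell out either detail, and the function names are hypothetical.

```python
import math

# Number of codons per amino acid in the standard genetic code (61 sense codons)
CODON_COUNTS = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def uniform_codon_frequencies():
    """Expected amino acid frequencies if every sense codon were used equally often."""
    total = sum(CODON_COUNTS.values())
    return {aa: n / total for aa, n in CODON_COUNTS.items()}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


baseline = uniform_codon_frequencies()
```

Given an observed composition as a dictionary keyed by the same one-letter codes, `pearson` applied to the two value lists in matching order measures how closely a taxon follows the baseline.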
