Similar articles
1.
Energy calculations have been used to study the hydration sites around the polar groups of serine, threonine and tyrosine side chains. These hydration sites depend not only on the hybridization of the polar group but also on the local secondary structure, the χ1 side chain torsion angle and the position of the hydroxyl hydrogen atom. For tyrosine side chains, two solvent sites are found approximately in the plane of the ring. Even for serine and threonine side chains, only two minimum energy sites are found in general, of which one is in the expected position within hydrogen-bonding distance of the hydroxyl hydrogen atom (unless this is blocked from interaction with solvent molecules by, for example, O(i-4) or O(i-3)). The position of the second of these sites depends not only on the position of the hydroxyl oxygen but also on neighbouring main chain atoms to which it can also hydrogen bond. There is good agreement with the solvent distributions obtained from crystallographic data.

2.
Lee S  Lee BC  Kim D 《Proteins》2006,62(4):1107-1114
Knowing protein structure and inferring its function from the structure is one of the main issues of computational structural biology, and often the first step is studying protein secondary structure. There have been many attempts to predict protein secondary structure content. Previous attempts assumed that the content of protein secondary structure can be predicted successfully using information on the amino acid composition of a protein. Recent methods achieved remarkable prediction accuracy by using expanded composition information. The overall average error of the most successful method is 3.4%. Here, we demonstrate that even if we use only the simple amino acid composition, it is possible to improve the prediction accuracy significantly if evolutionary information is included. The idea is motivated by the observation that evolutionarily related proteins share similar structure. After calculating the homolog-averaged amino acid composition of a protein, which can be easily obtained from the multiple sequence alignment produced by running PSI-BLAST, those 20 numbers are learned by a multiple linear regression, an artificial neural network and a support vector regression. The overall average error of the support vector regression method is 3.3%. It is remarkable that we obtain comparable accuracy without utilizing expanded composition information such as pair-coupled amino acid composition. This work again demonstrates that the amino acid composition is a fundamental characteristic of a protein. It is anticipated that our novel idea can be applied to many areas of protein bioinformatics where amino acid composition information is utilized, such as subcellular localization prediction, enzyme subclass prediction, domain boundary prediction, signal sequence prediction, and prediction of unfolded segments in a protein sequence, to name a few.
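The homolog-averaged composition described in this abstract can be illustrated with a minimal Python sketch. The abstract does not say how the homologs are weighted, so a plain unweighted mean over aligned sequences is assumed here; the function names and the toy alignment are hypothetical, and the regression step is not shown.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Fractions of the 20 amino acids in one sequence (gaps and other symbols ignored)."""
    counts = Counter(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def homolog_averaged_composition(aligned_seqs):
    """Unweighted mean of the per-sequence compositions (an assumed averaging scheme)."""
    if not aligned_seqs:
        raise ValueError("need at least one sequence")
    comps = [composition(s) for s in aligned_seqs]
    return [sum(c[j] for c in comps) / len(comps) for j in range(len(AMINO_ACIDS))]


# Toy alignment rows standing in for PSI-BLAST output (hypothetical data).
homologs = ["ACD-", "ACDA", "A-DD"]
features = homolog_averaged_composition(homologs)  # 20 numbers for a regressor
```

The resulting 20-element vector is the kind of input that a linear regression, neural network or support vector regression could then be trained on.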

3.
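This study systematically analyzed the differences in secondary-structure amino acid composition among acidic, alkaline and neutral enzymes. Acidic and alkaline enzymes were found to have different amino acid usage biases when forming particular secondary structures. In addition, acidic and alkaline enzymes contain markedly more neutral amino acids and amino acids with tiny side chains, which may be a general mechanism by which they adapt to extreme pH. On this basis, a new method for extracting feature values from protein sequences is proposed, whose 10-fold cross-validation accuracy reaches 80.3%. Compared with other common feature-extraction methods, its accuracy is higher by 9.4% to 18.7%, and the random forest algorithm achieves recognition accuracy 2.7% to 21.8% higher than other machine learning algorithms.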

4.
A complex, cascaded neural network designed to predict the secondary structure of globular proteins has been developed. Information about the local buried-unburied pattern and the average tendency of the particular types of amino acids to be buried inside the globule were used. Nonspecific information about long distance contact maps was also employed. These modifications result in a noticeable improvement (3-9%) of prediction accuracy. The best result for the average success ratio for the testing set of nonhomologous proteins was 68.3% (with corresponding Matthews' coefficients Cα, Cβ and Ccoil equal to 0.60, 0.47 and 0.43, respectively).

5.
The conformational parameters $P_k$ for each amino acid species ($j$ = 1–20) of sequential peptides in proteins are presented as the product of $P_{i,k}$, where $i$ is the number of the sequential residues in the $k$th conformational state ($k$ = α-helix, β-sheet, β-turn, or unordered structure). Since the average parameter for an $n$-residue segment is related to the average probability of finding the segment in the $k$th state, it becomes a geometric mean, $(P_k)_{av} = \left(\prod_{i=1}^{n} P_{i,k}\right)^{1/n}$, with amino acid residue $i$ increasing from 1 to $n$. We then used $\ln (P_k)_{av}$ to convert a multiplicative process to a summation, i.e., $\ln (P_k)_{av} = (1/n)\sum_{i=1}^{n} \ln P_{i,k}$, for ease of operation. However, this is unlike the popular Chou-Fasman algorithm, which has the flaw of using the arithmetic mean for relative probabilities. The Chou-Fasman algorithm happens to be close to our calculations in many cases mainly because the difference between their $P_k$ and our $\ln P_k$ is nearly constant for about one-half of the 20 amino acids. When stronger conformation formers and breakers exist, the difference becomes larger and the prediction at the N- and C-terminal α-helix or β-sheet could differ. If the average conformational parameters of the overlapping segments of any two states are too close for a unique solution, our calculations could lead to a different prediction.
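The contrast this abstract draws between a geometric mean (computed through a sum of logarithms) and an arithmetic mean can be sketched in Python as follows. The parameter values are hypothetical and are not taken from the paper or from the Chou-Fasman tables; the function names are illustrative only.

```python
import math


def geometric_mean_parameter(params):
    """(P_k)_av as exp of the mean log, i.e. the n-th root of the product."""
    if not params or any(p <= 0 for p in params):
        raise ValueError("parameters must be a non-empty list of positive numbers")
    return math.exp(sum(math.log(p) for p in params) / len(params))


def arithmetic_mean_parameter(params):
    """Plain average, the quantity the abstract attributes to Chou-Fasman."""
    return sum(params) / len(params)


# Hypothetical helix parameters for a 4-residue segment with one strong breaker.
segment = [1.5, 1.4, 0.5, 1.2]
geo = geometric_mean_parameter(segment)
arith = arithmetic_mean_parameter(segment)
```

Because the geometric mean never exceeds the arithmetic mean, a single strong breaker pulls `geo` down more than `arith`, which is consistent with the abstract's remark that the two approaches diverge when strong formers and breakers are present.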

6.
Predicting protein quaternary structure by pseudo amino acid composition
Chou KC  Cai YD 《Proteins》2003,53(2):282-289
In the protein universe, many proteins are composed of two or more polypeptide chains, generally referred to as subunits, that associate through noncovalent interactions and, occasionally, disulfide bonds. With the number of protein sequences entering into data banks rapidly increasing, we are confronted with a challenge: how to develop an automated method to identify the quaternary attribute for a new polypeptide chain (i.e., whether it is formed just as a monomer, or as a dimer, trimer, or any other oligomer). This is important, because the functions of proteins are closely related to their quaternary attribute. For example, some critical ligands only bind to dimers but not to monomers; some marvelous allosteric transitions only occur in tetramers but not other oligomers; and some ion channels are formed by tetramers, whereas others are formed by pentamers. To explore this problem, we adopted the pseudo amino acid composition originally proposed for improving the prediction of protein subcellular location (Chou, Proteins, 2001; 43:246-255). The advantage of using the pseudo amino acid composition to represent a protein is that it has paved a way that can take into account a considerable amount of sequence-order effects to significantly improve prediction quality. Results obtained by resubstitution, jack-knife, and independent data set tests, have indicated that the current approach might be quite promising in dealing with such an extremely complicated and difficult problem.

7.
8.
Shestopalov BV 《Tsitologiia》2003,45(7):707-713
In the previous paper (Shestopalov, 2003) we presented the amino acid code of protein secondary structure as a partial solution of the fundamental problem of the protein three-dimensional structure calculation from the amino acid sequence. Here a statistical model of the code is described. The model is based on the structural data from 2258 protein chains (417,112 amino acid residues used). 60 and 61% of the secondary structure, calculated using the model, coincide, respectively, with the observed secondary structure in the training subset and test subset (104 protein chains and 21,166 residues used). This is equal to the threshold value for all the secondary structure calculations, based on the models, where, as here, only the nearest and middle-range interactions are considered. Therefore the constructed model can be applied for the protein structure prediction from the amino acid sequence, especially when additional information is used along with expert analysis, as in the most successful prediction methods. The model can be used for analysis of the secondary structure changes during protein folding by comparison of the calculated and observed secondary structures. The information about the conformationally invariant segments can serve for the simulation of the supersecondary structure formation. One can try to obtain and examine the protein subset, in which the calculated and observed secondary structures are very similar.

9.
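Based on the nearest-neighbor algorithm and starting from the protein primary sequence, amino acid composition, dipeptide composition and mixed composition were used to classify proteins as monomers, dimers, trimers, tetramers, pentamers, hexamers and octamers. The results show that the dipeptide composition encoding gives the best predictions: the overall prediction accuracies in the jackknife test and the independent test set reach 90.83% and 95.48%, respectively, which is 12 and 15 percentage points higher than methods based on pseudo amino acid composition and component coupling on the same data set. For pentameric proteins in particular, the prediction accuracies improve by 90 and 50 percentage points, respectively. This indicates that dipeptide composition is a very effective feature-extraction method for classifying protein quaternary structure.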
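A minimal Python sketch of dipeptide-composition encoding combined with a nearest-neighbor classifier is given below. The abstract does not specify the distance measure, so squared Euclidean distance is assumed; the function names, the toy training set and its labels are hypothetical, and sequences are assumed to be uppercase one-letter codes.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs
INDEX = {dp: i for i, dp in enumerate(DIPEPTIDES)}


def dipeptide_composition(seq):
    """400-dimensional vector of overlapping dipeptide frequencies."""
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    valid = [p for p in pairs if p in INDEX]  # skip pairs with non-standard letters
    vec = [0.0] * len(DIPEPTIDES)
    for p in valid:
        vec[INDEX[p]] += 1.0
    if valid:
        vec = [v / len(valid) for v in vec]
    return vec


def nearest_neighbor_label(query, training):
    """Label of the training sequence closest to the query (assumed Euclidean metric)."""
    q = dipeptide_composition(query)

    def sq_dist(v):
        return sum((a - b) ** 2 for a, b in zip(q, v))

    best = min(training, key=lambda item: sq_dist(dipeptide_composition(item[0])))
    return best[1]


# Toy training set with made-up sequences and labels.
training = [("ACACACAC", "monomer"), ("GGKKGGKK", "dimer")]
predicted = nearest_neighbor_label("ACACAC", training)
```

Compared with the 20-dimensional amino acid composition, the 400-dimensional dipeptide vector retains some local sequence-order information, which may be one reason the study found it more discriminative.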

10.

Background  

A reliable prediction of the Xaa-Pro peptide bond conformation would be a useful tool for many protein structure calculation methods. We have analyzed the Protein Data Bank and show that the combined use of sequential and structural information has a predictive value for the assessment of the cis versus trans peptide bond conformation of Xaa-Pro within proteins. For the analysis of the data sets, different statistical methods such as the calculation of the Chou-Fasman parameters and occurrence matrices were used. Furthermore, we analyzed the relationship between the relative solvent accessibility and the relative occurrence of prolines in the cis and in the trans conformation.

11.
Growth rate dependence of global amino acid composition
The global amino acid composition of bacteria growing in different media has been studied. The data reveal significant changes in the amino acid composition in the growth rate range between 0.5 and 2.1 doublings per hour at 37 °C. The changes are consistent with a progressive simplification of the protein population and mRNA pools as the growth rates increase.

12.
To advance our understanding of protein tertiary structure, the development of the knob‐socket model is completed in an analysis of the packing in irregular coil and turn secondary structure as well as between mixed secondary structure. The knob‐socket model simplifies packing based on repeated patterns of two motifs: a three‐residue socket for packing within secondary (2°) structure and a four‐residue knob‐socket for tertiary (3°) packing. For coil and turn secondary structure, knob‐sockets allow identification of a correlation between amino acid composition and tertiary arrangements in space. Coil contributes almost as much as α‐helices to tertiary packing. In irregular sockets, Gly, Pro, Asp, and Ser are favored, while in irregular knobs, the preference order is Arg, Asp, Pro, Asn, Thr, Leu, and Gly. Cys, His, Met, and Trp are not favored in either. In mixed packing, the knob amino acid preferences are a function of the socket that they are packing into, whereas the amino acid composition of the sockets does not depend on the secondary structure of the knob. A unique motif of a coil knob with an XYZ β‐sheet socket may potentially function to inhibit β‐sheet extension. In addition, analysis of the preferred crossing angles for strands within a β‐sheet and mixed α‐helix/β‐sheet identifies canonical packing patterns useful in protein design. Lastly, the knob‐socket model abstracts the complexity of protein tertiary structure into an intuitive packing surface topology map. Proteins 2015; 83:2147–2161. © 2015 Wiley Periodicals, Inc.

13.
The elucidation of the domain content of a given protein sequence in the absence of determined structure or significant sequence homology to known domains is an important problem in structural biology. Here we address how successfully the delineation of continuous domains can be accomplished in the absence of sequence homology using simple baseline methods, an existing prediction algorithm (Domain Guess by Size), and a newly developed method (DomSSEA). The study was undertaken with a view to measuring the usefulness of these prediction methods in terms of their application to fully automatic domain assignment. Thus, the sensitivity of each domain assignment method was measured by calculating the number of correctly assigned top scoring predictions. We have implemented a new continuous domain identification method using the alignment of predicted secondary structures of target sequences against observed secondary structures of chains with known domain boundaries as assigned by Class Architecture Topology Homology (CATH). Taking top predictions only, the success rate of the method in correctly assigning domain number to the representative chain set is 73.3%. The top prediction for domain number and location of domain boundaries was correct for 24% of the multidomain set (±20 residues). These results have been put into context in relation to the results obtained from the other prediction methods assessed.

14.
Oxidative stress alters cell viability, from microorganism irradiation sensitivity to human aging and neurodegeneration. Deleterious effects of protein carbonylation by reactive oxygen species (ROS) make understanding molecular properties determining ROS susceptibility essential. The radiation‐resistant bacterium Deinococcus radiodurans accumulates less carbonylation than sensitive organisms, making it a key model for deciphering properties governing oxidative stress resistance. We integrated shotgun redox proteomics, structural systems biology, and machine learning to resolve properties determining protein damage by γ‐irradiation in Escherichia coli and D. radiodurans at multiple scales. Local accessibility, charge, and lysine enrichment accurately predict ROS susceptibility. Lysine, methionine, and cysteine usage also contribute to ROS resistance of the D. radiodurans proteome. Our model predicts proteome maintenance machinery, and proteins protecting against ROS are more resistant in D. radiodurans. Our findings substantiate that protein‐intrinsic protection impacts oxidative stress resistance, identifying causal molecular properties.

15.
Shi JY  Zhang SW  Pan Q  Zhou GP 《Amino acids》2008,35(2):321-327
In the Post Genome Age, there is an urgent need to develop reliable and effective computational methods to predict the subcellular localization of the rapidly growing number of newly found proteins. Here, a novel method of pseudo amino acid (PseAA) composition, the so-called “amino acid composition distribution” (AACD), is introduced. First, a protein sequence is divided equally into multiple segments. Then, the amino acid composition of each segment is calculated in series. After that, each protein sequence can be represented by a feature vector. Finally, the feature vectors of all sequences thus obtained are input into multi-class support vector machines to predict the subcellular localization. The results show that AACD is quite effective in representing protein sequences for the purpose of predicting protein subcellular localization.
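The segment-wise encoding described in this abstract can be sketched in Python as below. The abstract does not say how sequences whose length is not a multiple of the segment count are split, so integer boundaries at `i * length // n_segments` are an assumption; the function names are hypothetical, and the support vector machine stage is not shown.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def segment_composition(segment):
    """20 amino acid fractions for one segment (non-standard letters ignored)."""
    counts = Counter(c for c in segment if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def aacd_features(seq, n_segments):
    """Concatenate per-segment compositions into one 20 * n_segments vector."""
    if n_segments < 1 or len(seq) < n_segments:
        raise ValueError("sequence must have at least one residue per segment")
    length = len(seq)
    # Assumed splitting rule: near-equal segments with integer boundaries.
    bounds = [i * length // n_segments for i in range(n_segments + 1)]
    features = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        features.extend(segment_composition(seq[start:end]))
    return features


# Toy sequence: the first half is all Ala, the second half all Cys.
vector = aacd_features("AAAACCCC", 2)
```

With one segment the vector reduces to the ordinary amino acid composition; more segments add coarse positional information at the cost of a longer feature vector.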

16.
D E Goldsack 《Biopolymers》1969,7(3):299-313
A statistical analysis of the relation between the amino acid composition of proteins and the amount of helical secondary structure as characterized by the Moffitt b0 parameter has shown a high degree of correlation of the b0 parameter with those amino acids whose homopolymers can form helical structures. Using the data for 107 proteins, a linear relation was found between b0 and the sum of the residue percentages of alanine, arginine, aspartic acid, cysteine, glutamic acid, leucine, and lysine. A statistical analysis of the Moffitt a0 parameter, on the other hand, showed no statistically significant grouping of amino acids in relation to the amount of secondary β structure in a protein. A plot of b0 versus a0–a, where a represents the a0 parameter for a fully denatured protein, for 55 proteins showed distinct nonlinearity. This nonlinearity was postulated to be due to the presence of β structure, and a nomogram was constructed which allowed a semiquantitative estimate of the amount of helical and β-type secondary structures from the b0 versus a0–a plot.

17.
H.H. Yeoh  M.Y. Chew 《Phytochemistry》1976,15(11):1597-1599
On the basis of leaf dry wt, the protein content of six varieties of cassava varied from 29.3 to 38.6% and the estimated leaf protein production ranged from 242 to 953 kg per ha. On the basis of fr. wt of leaf, the total amino acids ranged from 8.42 to 9.4% while the essential amino acids averaged 4.21% and the sulphur-containing amino acids only 0.25%. The amino acid composition profiles for the six varieties were similar.

18.
《Phytochemistry》1986,25(3):641-644
Three wild species of lentil, Lens orientalis, L. ervoides and L. nigricans, were investigated for protein subunits of the albumin protein fraction (APF), globulin protein fraction (GPF) and for protein and free amino acid composition. The APF and GPF formed 12.7–16.8% and 34.7–49.0%, respectively, of the meal nitrogen. SDS-PAGE showed APF to contain 15 to 20 major and a similar number of minor protein subunits ranging in Mr at least from 14 400 to 94 000. The GPF was also heterogeneous and contained some subunits having Mr similar to APF subunits but none < 15 000. The three wild lentil species were distinguishable by their protein subunit composition. The protein amino acid composition of the wild species was identical and similar to that of the cultivated lentil. The wild species, like the cultivated species (L. culinaris), contained major amounts of free arginine, glutamic and aspartic acids, serine and a number of unidentified amino acids. L. orientalis, L. nigricans and the cultivated lentil contained two acidic and two basic unidentified amino acids. However, L. ervoides was distinctly different in that it contained only the two acidic plus one neutral unidentified amino acid, but none of the two basic unidentified amino acids.

19.
20.
Sarcoplasmic protein diffusion was studied under different conditions, using microinjection in combination with microspectrophotometry. Six globular proteins with molecular masses between 12 and 3700 kDa, with diameters from 3 to 30 nm, were used for the experiments. Proteins were injected into single, intact skeletal muscle fibers taken from either soleus or extensor digitorum longus (edl) muscle of adult rats. No correlation was found between sarcomere spacing and the sarcoplasmic diffusion coefficient (D) for all proteins studied. D of the smaller proteins cytochrome c (diameter 3.1 nm), myoglobin (diameter 3.5 nm), and hemoglobin (diameter 5.5 nm) amounted to only approximately 1/10 of their value in water and was not increased by auxotonic fiber contractions. D for cytochrome c and myoglobin was significantly higher in fibers from edl (mainly type II fibers) compared to fibers from soleus (mainly type I fibers). Measurements of D for myoglobin at 37 °C in addition to 22 °C led to a Q10 of 1.46 for this temperature range. For the larger proteins catalase (diameter 10.5 nm) and ferritin (diameter 12.2 nm), a decrease in D to approximately 1/20 and approximately 1/50 of that in water was observed, whereas no diffusive flux at all of earthworm hemoglobin (diameter 30 nm) along the fiber axis could be detected. We conclude that 1) sarcoplasmic protein diffusion is strongly impaired by the presence of the myofilamental lattice, which also gives rise to differences in diffusivity between different fiber types; 2) contractions do not cause significant convection in sarcoplasm and do not lead to increased diffusional transport; and 3) in addition to the steric hindrance that slows down the diffusion of smaller proteins, diffusion of large proteins is further hindered when their dimensions approach the interfilament distances. This molecular sieve property progressively reduces intracellular diffusion of proteins when the molecular diameter increases to more than approximately 10 nm.
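The temperature coefficient quoted in this abstract can be related to the change in D with a short Python sketch. It assumes the standard definition Q10 = (D2/D1)^(10/(T2 - T1)), which the abstract itself does not spell out; the function names are hypothetical and no diffusion coefficients from the paper are used.

```python
def q10(rate_low, rate_high, temp_low, temp_high):
    """Q10 from two rates (here diffusion coefficients) measured at two temperatures in °C."""
    return (rate_high / rate_low) ** (10.0 / (temp_high - temp_low))


def rate_ratio_from_q10(q10_value, temp_low, temp_high):
    """Fold change in the rate implied by a given Q10 over a temperature interval."""
    return q10_value ** ((temp_high - temp_low) / 10.0)


# Fold increase in D for myoglobin between 22 °C and 37 °C implied by Q10 = 1.46.
ratio = rate_ratio_from_q10(1.46, 22.0, 37.0)  # about 1.76
```

Under this definition, a Q10 of 1.46 over the 15 °C interval corresponds to D at 37 °C being roughly 1.76 times its value at 22 °C.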
