Similar Literature
1.
Dihedral angles of amino acids are of considerable importance in protein tertiary structure prediction because they define the backbone of a protein and hence largely determine the protein's entire conformation. Most ab initio protein structure prediction methods predict the secondary structure of a protein before predicting the tertiary structure, because the three-dimensional fold consists of repeating units of secondary structure. Hence, both dihedral angles and secondary structures are important in tertiary structure prediction. Here we describe a database called DASSD (Dihedral Angle and Secondary Structure Database of Short Amino acid Fragments) that contains the dihedral angle values and secondary structure details of short amino acid fragments of lengths 1, 3 and 5. The information stored in this database was extracted from a set of 5,227 non-redundant, high-resolution (better than 2 Å) protein structures. In total, DASSD stores details for about 733,000 fragments. The database can be applied to the development of ab initio protein structure prediction methods based on fragment libraries and fragment assembly techniques, and it is also useful for protein secondary structure prediction.
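A minimal Python sketch of how fixed-length fragments (lengths 1, 3 and 5) carrying per-residue dihedral angles and secondary structure labels might be cut from a single chain with a sliding window. The record layout (`Fragment`, `extract_fragments`) is hypothetical and is not DASSD's actual schema; terminal residues whose phi or psi is undefined are represented as `None` by assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Fragment:
    start: int                          # 0-based index of the first residue in the chain
    sequence: str                       # one-letter amino acid codes
    phi: Tuple[Optional[float], ...]    # backbone phi angles in degrees (None if undefined)
    psi: Tuple[Optional[float], ...]    # backbone psi angles in degrees (None if undefined)
    ss: str                             # per-residue secondary structure labels, e.g. "H", "E", "C"


def extract_fragments(sequence: str,
                      phi: Sequence[Optional[float]],
                      psi: Sequence[Optional[float]],
                      ss: str,
                      length: int) -> List[Fragment]:
    """Slide a window of `length` residues along one chain and collect every fragment."""
    if not (len(sequence) == len(phi) == len(psi) == len(ss)):
        raise ValueError("per-residue inputs must all have the same length")
    if length < 1:
        raise ValueError("fragment length must be positive")
    return [
        Fragment(start=i,
                 sequence=sequence[i:i + length],
                 phi=tuple(phi[i:i + length]),
                 psi=tuple(psi[i:i + length]),
                 ss=ss[i:i + length])
        for i in range(len(sequence) - length + 1)
    ]
```

For a chain of L residues this yields L, L - 2 and L - 4 overlapping fragments of lengths 1, 3 and 5; a 7-residue chain, for instance, gives 7, 5 and 3 fragments.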

Availability


2.
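SOX proteins contain a highly conserved HMG-box domain that binds DNA specifically. To study the molecular basis of the tertiary structure of the Amur tiger SOX protein, we used the MATLAB Bioinformatics Toolbox to download the tiger SOX protein sequence from GenBank and, taking SOX2 (whose tertiary structure is known) as the template, combined SwissPdbViewer with MATLAB to model and predict the SOX HMG-box by homology modeling; the three-dimensional structure of the predicted model was then analyzed with MATLAB's Visualization Tool. The results show that the HMG-box of the PtSox protein consists of three α-helices and two loop regions. Thermal-stability analysis indicates that the loop regions of PtSox are thermodynamically unstable. The surface electrostatic distribution reveals an N/C cavity in the middle of the C-terminal region of PtSox that may be a site of interaction with other small molecules or proteins. These spatial structural features may be related to the regulation of the protein's activity and function.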

3.
That the physicochemical properties of amino acids constrain the structure, function and evolution of proteins is not in doubt. However, principles derived from information theory may also set bounds on the structure (and thus also the evolution) of proteins. Here we analyze the global properties of the full set of proteins in release 13-11 of the SwissProt database, showing, by experimentally testing predictions from information theory, that their collective structure exhibits properties consistent with their being guided by a conservation principle. This principle (Conservation of Information) defines the global properties of systems composed of discrete components, each of which is in turn assembled from smaller discrete pieces. In the system of proteins, each protein is a component, and each protein is assembled from amino acids. Central to this principle is the relationship between the unique amino acid count and the total length of a protein, and its implications for both average protein length and the occurrence of proteins with specific unique amino acid counts. The unique amino acid count is simply the number of distinct amino acids (including post-translationally modified ones) that occur in a protein, independent of how many times each amino acid occurs in the sequence. Conservation of Information does not operate at the local level, where the influences of natural selection are manifest in the well-understood variety of protein structure and function; it is independent of the physicochemical properties of the amino acids. Rather, this analysis implies that Conservation of Information defines the global bounds within which the whole system of proteins is constrained; it thus appears to constrain evolution at a level different from natural selection, a conclusion that seems counter-intuitive but is supported by the studies described herein.
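A small Python sketch of the unique amino acid count as defined above. Residues may be passed as one-letter codes or as a list of residue labels; representing a post-translationally modified residue by its own label (for example "SEP" for phosphoserine) is an assumption made for illustration, as is the hypothetical helper `unique_count_histogram`, which tallies how many proteins share each count.

```python
from collections import Counter
from typing import Iterable


def unique_amino_acid_count(residues: Iterable[str]) -> int:
    """Number of distinct residue symbols, regardless of how often each one occurs."""
    return len({str(r).upper() for r in residues})


def unique_count_histogram(proteins: Iterable[Iterable[str]]) -> Counter:
    """Hypothetical helper: how many proteins have each unique amino acid count."""
    return Counter(unique_amino_acid_count(p) for p in proteins)
```

For example, "MKKLLA" has length 6 but a unique amino acid count of 4 (M, K, L, A), which illustrates why the count is decoupled from sequence length.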

5.
In this study, n-peptide compositions are used for protein vectorization in a discriminative remote homology detection framework based on support vector machines (SVMs). The size of the amino acid alphabet is gradually reduced as n increases, so that the method fits within the memory resources of conventional workstations. A hash structure is implemented to accelerate the search for n-peptides. The method was tested for its ability to classify proteins into families on a subset of the SCOP family database and compared against many existing homology detection methods, including the most popular generative methods (SAM-98 and PSI-BLAST) and recent SVM methods (SVM-Fisher, SVM-BLAST and SVM-Pairwise). The results demonstrate that the new method significantly outperforms SVM-Fisher, SVM-BLAST, SAM-98 and PSI-BLAST, while achieving accuracy comparable to SVM-Pairwise. In terms of efficiency, it performs much better than SVM-Pairwise. It is shown that n-peptide compositions over reduced amino acid alphabets provide an accurate and efficient means of protein vectorization for SVM-based sequence classification.
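A minimal Python sketch of n-peptide composition vectors over a reduced alphabet, using a hash table (`collections.Counter`) to count n-peptides. The six-letter grouping `HYPOTHETICAL_GROUPS` is an illustrative assumption, not the alphabet used in the study, and the SVM training step is not shown.

```python
from collections import Counter
from itertools import product
from typing import Dict, List

# Hypothetical 6-letter reduced alphabet grouping residues by broad physicochemical class.
# This grouping is an illustrative assumption, not the alphabet used in the study.
HYPOTHETICAL_GROUPS: Dict[str, str] = {
    "AVLIMC": "h",  # hydrophobic
    "FWY": "r",     # aromatic
    "STNQ": "p",    # polar, uncharged
    "KRH": "b",     # basic
    "DE": "a",      # acidic
    "GP": "s",      # special backbone (glycine, proline)
}
REDUCE: Dict[str, str] = {aa: label for members, label in HYPOTHETICAL_GROUPS.items() for aa in members}
LABELS: List[str] = sorted(set(REDUCE.values()))


def npeptide_counts(seq: str, n: int) -> Counter:
    """Count reduced-alphabet n-peptides; the Counter is a hash table keyed by n-peptide."""
    reduced = [REDUCE.get(aa) for aa in seq.upper()]
    counts: Counter = Counter()
    for i in range(len(reduced) - n + 1):
        window = reduced[i:i + n]
        if None not in window:  # skip windows containing unknown residues such as 'X'
            counts["".join(window)] += 1
    return counts


def npeptide_composition(seq: str, n: int) -> List[float]:
    """Fixed-length feature vector: frequency of every possible n-peptide, in a fixed order."""
    counts = npeptide_counts(seq, n)
    total = sum(counts.values())
    return [counts["".join(word)] / total if total else 0.0
            for word in product(LABELS, repeat=n)]
```

The vector has 6^n entries (36 for n = 2), which shows why shrinking the alphabet as n grows keeps memory use manageable; such vectors would then be passed to an SVM library of choice.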

6.
A three-dimensional Voronoi tessellation of folded proteins is used to analyze the geometrical and topological properties of a set of proteins. Each amino acid is associated with a central point surrounded by a Voronoi cell, and the Voronoi cells describe the packing of the amino acids. Special attention is given to reproducing the protein surface. Once the Voronoi cells are built, many tools from geometrical analysis can be applied to investigate protein structure; cell volume, number of faces per cell and number of sides per face are the usual signatures. A distinct difference has been observed between faces related to primary, secondary and tertiary structure. Faces threaded by the main chain have on average more than six edges, whereas those related to helical packing of the amino acid chain have fewer than five edges. Faces on the protein surface have on average five edges, within 1% error. The average number of faces on the protein surface for a given amino acid type offers a new perspective on characterizing solvent exposure and on classifying amino acids as hydrophilic or hydrophobic. It may also be a convenient tool for model validation.

7.
Although not the sole feature responsible, the packing of amino acid side chains in the interior of proteins is known to contribute to protein conformational specificity. While a number of amphipathic peptide sequences with optimized hydrophobic domains have been designed to fold into a desired aggregation state, the contribution of the amino acids located on the hydrophilic side of such peptides to the final packing has not been thoroughly investigated. A set of self-aggregating 18-mer peptides, designed previously to adopt a high level of alpha-helical conformation in benign buffer, is used here to evaluate how the nature of the amino acids on the hydrophilic face affects the packing of a four-alpha-helix bundle. These peptides differ from one another by only one to four amino acid mutations on the hydrophilic face of the helix and share the same hydrophobic core. The secondary and tertiary structures in the presence or absence of denaturants were determined by circular dichroism in the far- and near-UV regions, fluorescence, and nuclear magnetic resonance spectroscopy. Significant differences in folding ability and in chemical and thermal stability were found among the peptides studied. In particular, surface salt bridges may form that would increase both the stability and the extent of tertiary structure of the peptides. The structural behavior of the peptides may be related to their ability to catalyze the decarboxylation of oxaloacetate, with peptides that have a well-defined tertiary structure acting as true catalysts.

8.
Ohtsuki T  Manabe T  Sisido M 《FEBS letters》2005,579(30):6769-6774
The ability to introduce non-natural amino acids into proteins opens up new vistas for the study of protein structure and function. This approach requires suppressor tRNAs that deliver the non-natural amino acid to a ribosome associated with an mRNA containing an expanded codon. The suppressor tRNAs must be absolutely protected from aminoacylation by any of the aminoacyl-tRNA synthetases in the protein synthesizing system, or a natural amino acid will be incorporated instead of the non-natural amino acid. Here, we found that some tRNAs with non-standard structures could work as efficient four-base suppressors fulfilling the above orthogonal conditions. Using these tRNAs, we successfully demonstrated incorporation of three different non-natural amino acids into a single protein.

9.
Combining protein evolution and secondary structure
An evolutionary model that combines protein secondary structure and amino acid replacement is introduced. It allows likelihood analysis of aligned protein sequences and does not require the underlying secondary (or tertiary) structures of these sequences to be known. One component of the model describes the organization of secondary structure along a protein sequence and another specifies the evolutionary process for each category of secondary structure. A database of proteins with known secondary structures is used to estimate model parameters representing these two components. Phylogeny, the third component of the model, can be estimated from the data set of interest. As an example, we employ our model to analyze a set of sucrose synthase sequences. For the evolution of sucrose synthase, a parametric bootstrap approach indicates that our model is statistically preferable to one that ignores secondary structure.

10.
Mishra P  Pandey PN 《Bioinformation》2011,6(10):372-374
The number of amino acid sequences in protein databases such as Swiss-Prot, UniProt and PIR is increasing very rapidly, but structures are available in the Protein Data Bank for only a small fraction of them. An important problem in genomics is therefore to cluster homologous protein sequences automatically when only sequence information is available. Here, we use graph-theoretic techniques to cluster amino acid sequences. A similarity graph is defined, and clusters correspond to connected subgraphs of that graph. Cluster analysis seeks to group amino acid sequences into subsets based on a distance or similarity score between pairs of sequences. Our goal is to find disjoint subsets, called clusters, that satisfy two criteria: homogeneity (sequences in the same cluster are highly similar to each other) and separation (sequences in different clusters have low similarity to each other). We tested the method on several subsets of the SCOP (Structural Classification of Proteins) database, a gold standard for protein structure classification. The results show that, for a given set of proteins, the number of clusters obtained is close to the number of superfamilies in that set, that there are fewer singletons, and that the method correctly groups most remote homologs.
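Taking clusters to be the connected components of a thresholded similarity graph, as described above, a minimal union-find sketch in Python could look like this. The pairwise scores and the `threshold` value are hypothetical inputs (in practice they might come from alignment scores), and this is a generic connected-components routine rather than the authors' exact algorithm.

```python
from typing import Iterable, List, Tuple


def cluster_by_similarity(n_items: int,
                          scored_pairs: Iterable[Tuple[int, int, float]],
                          threshold: float) -> List[List[int]]:
    """Connected components of the graph whose edges are pairs scoring >= threshold."""
    parent = list(range(n_items))  # union-find forest: each item starts as its own root

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for i, j, score in scored_pairs:
        if score >= threshold:           # only sufficiently similar pairs become edges
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                parent[root_i] = root_j  # merge the two components

    clusters = {}
    for item in range(n_items):
        clusters.setdefault(find(item), []).append(item)
    return sorted(clusters.values())
```

Singletons are simply components of size one, so the threshold directly trades homogeneity (higher threshold) against the number of singletons (lower threshold).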

11.
A total of 220 cell envelope-associated proteins were extracted and separated from Trichoderma reesei mycelia that were actively synthesizing and secreting proteins and from mycelia with low protein secretion. Altogether, 56 spots were examined by nanoelectrospray tandem mass spectrometry, and amino acid sequence was obtained for 32 of them. Of these, 20 spots were identified by Advanced BLAST searches against all databases available to BLAST. The most abundant protein in both types of mycelia was HEX1, the major protein of the Woronin body, a structure unique to filamentous fungi. Other proteins identified were vacuolar protease A, enolase, glyceraldehyde-3-phosphate dehydrogenase, transaldolase, protein disulfide isomerase, mitochondrial outer membrane porin, diphosphate kinase and translation elongation factor beta. The short partial amino acid sequences obtained from some proteins were not sufficient to assign them to a specific database protein by BLAST search. In some cases, the tandem mass spectra were too complex to assign an amino acid sequence with certainty. The number of spots (12) that gave a clear signal but found no match in the databases suggests that a majority of the proteins associated with the filamentous fungal cell wall are novel. Some technical problems related to protein isolation are also discussed.

12.
We demonstrate the applicability of our previously developed Bayesian probabilistic approach for predicting residue solvent accessibility to the problem of predicting secondary structure. Using only single-sequence data, this method achieves a three-state accuracy of 67% over a database of 473 non-homologous proteins. This approach is more amenable to inspection and less likely to overlearn specifics of a dataset than "black box" methods such as neural networks. It is also conceptually simpler and less computationally costly. We also introduce a novel method for representing and incorporating multiple-sequence alignment information within the prediction algorithm, achieving 72% accuracy over a dataset of 304 non-homologous proteins. This is accomplished by creating a statistical model of the evolutionarily derived correlations between patterns of amino acid substitution and local protein structure. This model consists of parameter vectors, termed "substitution schemata," which probabilistically encode the structure-based heterogeneity in the distributions of amino acid substitutions found in alignments of homologous proteins. The model is optimized for structure prediction by maximizing the mutual information between the set of schemata and the database of secondary structures. Unlike "expert heuristic" methods, this approach has been demonstrated to work well over large datasets. Unlike the opaque neural network algorithms, this approach is physicochemically intelligible. Moreover, the model optimization procedure, the formalism for predicting one-dimensional structural features and our previously developed method for tertiary structure recognition all share a common Bayesian probabilistic basis. This consistency starkly contrasts with the hybrid and ad hoc nature of methods that have dominated this field in recent years.
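The three-state accuracy quoted above (Q3) is the fraction of residues whose predicted state (helix H, strand E or coil C) matches the observed state. A minimal Python sketch follows; pooling residues across the whole dataset in `pooled_q3`, rather than averaging per protein, is an assumption, since the abstract does not say which convention was used.

```python
from typing import Iterable, Tuple


def q3_accuracy(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted state (H, E or C) matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must be the same length")
    if not observed:
        raise ValueError("cannot score an empty sequence")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def pooled_q3(pairs: Iterable[Tuple[str, str]]) -> float:
    """Q3 pooled over all residues of a dataset (not averaged per protein)."""
    correct = total = 0
    for predicted, observed in pairs:
        if len(predicted) != len(observed):
            raise ValueError("length mismatch in a predicted/observed pair")
        correct += sum(p == o for p, o in zip(predicted, observed))
        total += len(observed)
    if total == 0:
        raise ValueError("cannot score an empty dataset")
    return 100.0 * correct / total
```

The two conventions can differ noticeably when a dataset mixes long and short chains, so a reported Q3 is most comparable across methods when the averaging scheme is stated.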

13.
The algorithm PLATON assigns sets of chemical shifts derived from a single residue to amino acid types together with their secondary structure (amino acid species). A subsequent ranking procedure, using either of two penalty functions, yields predictions of possible amino acid species for a given set of chemical shifts. This was demonstrated for the α-spectrin SH3 domain and applied to 9 further protein data sets taken from the BioMagRes database. A database of reference chemical shift patterns (reference CSPs) was generated from assigned chemical shifts of proteins with known 3D structure. In our approach, this reference CSP database is used to extract, by comparison with query CSPs, distributions of amino acid types with their most likely secondary structure elements (namely α-helix, β-sheet and coil) for single amino acids. Results obtained for the 10 investigated proteins indicate that, with the more favorable penalty function, the percentage of correct amino acid species among the first three positions of the ranking list ranges from 71.4% to 93.2%. When only the top result of the ranking list is considered, 36.5% to 83.1% of the amino acid species are correctly predicted for these 10 proteins. The main advantage of our approach over other methods that rely on average chemical shift values is the ability to expand the database by incorporating newly derived CSPs, and therefore to improve PLATON's performance over time.
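The evaluation above (correct species within the first three ranks versus at the top rank) is a top-k accuracy over ranked candidate lists. The Python sketch below ranks reference patterns with a hypothetical penalty, the mean squared deviation over shared nuclei; this is not either of PLATON's actual penalty functions, and the shift values in the usage note are toy numbers.

```python
from typing import Dict, List, Sequence

ShiftPattern = Dict[str, float]  # nucleus name -> chemical shift in ppm


def penalty(query: ShiftPattern, reference: ShiftPattern) -> float:
    """Hypothetical penalty: mean squared deviation over nuclei present in both patterns."""
    shared = query.keys() & reference.keys()
    if not shared:
        return float("inf")  # nothing to compare, rank last
    return sum((query[k] - reference[k]) ** 2 for k in shared) / len(shared)


def rank_species(query: ShiftPattern, references: Dict[str, ShiftPattern]) -> List[str]:
    """Order candidate amino acid species from lowest to highest penalty."""
    return sorted(references, key=lambda name: penalty(query, references[name]))


def top_k_accuracy(rankings: Sequence[List[str]], truths: Sequence[str], k: int) -> float:
    """Percentage of residues whose true species appears among the first k ranked candidates."""
    if len(rankings) != len(truths) or not truths:
        raise ValueError("need one non-empty ranking per true label")
    hits = sum(truth in ranking[:k] for ranking, truth in zip(rankings, truths))
    return 100.0 * hits / len(truths)
```

With toy references `{"A/helix": {"CA": 55.0, "CB": 18.0}, "A/sheet": {"CA": 51.0, "CB": 21.0}, "G/coil": {"CA": 45.0}}` and the query `{"CA": 54.5, "CB": 18.3}`, the ranking is A/helix, A/sheet, G/coil; if the true species were A/sheet, it would count as a hit for k = 3 but not for k = 1.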

14.
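The main problem in superposing protein three-dimensional structures is that the target proteins may have missing amino acid residues, whereas most multiple-structure superposition methods require complete amino acid sequences; the common practice is simply to delete the missing residues, which makes the superposition inaccurate. Because homologous proteins are structurally similar, a region missing from one protein structure may be present in another, homologous structure. On this basis, we propose a new, simple and effective method for superposing protein structures with missing data (ITEMDM). The method computes the superposition by iterating over the missing data, and obtains the rotation matrix and translation vector with an optimized least-squares algorithm combined with singular value decomposition (SVD). The method successfully superposed proteins of the cytochrome c family and proteins from the standard Fischer's database (67 protein pairs), and was compared with other methods. Numerical experiments show that the algorithm has the following advantages: (1) compared with THESEUS, it runs faster and needs fewer iterations; (2) compared with PSSM, it is more accurate and requires less computation time. The results show that the method superposes protein three-dimensional structures with missing data more effectively.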
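The abstract reduces each superposition step to a least-squares fit of a rotation matrix and translation vector via SVD. The NumPy sketch below is the standard Kabsch procedure, not ITEMDM itself: missing residues are assumed to be coded as NaN and are simply dropped before fitting, which is the deletion baseline the abstract criticizes, and ITEMDM's iterative treatment of missing data is not reproduced.

```python
import numpy as np


def superpose(mobile, target):
    """Least-squares rotation R and translation t such that R @ mobile[i] + t ~ target[i].

    Rows containing NaN in either structure (missing residues) are dropped before fitting.
    """
    P = np.asarray(mobile, dtype=float)
    Q = np.asarray(target, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("expected two (N, 3) coordinate arrays of equal shape")
    keep = ~(np.isnan(P).any(axis=1) | np.isnan(Q).any(axis=1))
    P, Q = P[keep], Q[keep]
    if len(P) < 3:
        raise ValueError("need at least three residues present in both structures")
    p_mean, q_mean = P.mean(axis=0), Q.mean(axis=0)
    H = (P - p_mean).T @ (Q - q_mean)          # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # -1 would indicate a reflection; correct it
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q_mean - R @ p_mean
    return R, t


def rmsd(a, b) -> float:
    """Root-mean-square deviation between two (N, 3) coordinate arrays."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))
```

Applying the fitted transform to every row is `P @ R.T + t`; an iterative missing-data scheme would wrap a call like `superpose` in a loop that re-estimates the missing coordinates from homologous structures between fits.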

15.
Approaching a complete classification of protein secondary structure
A complete classification of protein secondary structure types is developed on the basis of computer analysis of the crystallographic structural data deposited in the Protein Data Bank. The majority of amino acid residues fall into five conformation types. We conclude that the number of sequence variants of the torsion angles phi, psi in globular proteins is limited and is substantially smaller than the number of possible amino acid sequences of the same chain length. Along with the alpha-helix and beta-structure, the distribution analysis, which assigns every maximum in the distribution of amino acid conformations on the Ramachandran map to a certain secondary structure type, revealed a third, previously neglected type of secondary structure: an extended left-handed helical conformation, designated the mobile (M-) conformation. A full set of M-conformation fragments, which seem to play a major role in protein globule dynamics, was obtained, and a small correlation radius was demonstrated for the polypeptide chain in the M-conformation. This explains the prevalence of short segments of mobile conformation observed in globular proteins. The frequency of occurrence of amino acid residues was computed for each secondary structure type.

16.
The computational protein design protocol Rosetta has been applied successfully to a wide variety of protein engineering problems. Here the aim was to test its ability to design de novo a protein adopting the TIM-barrel fold, whose formation requires about twice as many residues as in the largest proteins successfully designed de novo to date. The designed protein, Octarellin VI, contains 216 residues. Its amino acid composition is similar to that of natural TIM-barrel proteins. When produced and purified, it showed a far-UV circular dichroism spectrum characteristic of folded proteins, with α-helical and β-sheet secondary structure. Its stable tertiary structure was confirmed by both tryptophan fluorescence and circular dichroism in the near UV. It proved heat stable up to 70°C. Dynamic light scattering experiments revealed a unique population of particles averaging 4 nm in diameter, in good agreement with our model. Although these data suggest the successful creation of an artificial α/β protein of more than 200 amino acids, Octarellin VI shows an apparent noncooperative chemical unfolding and low solubility.

17.
Due to advances in molecular biology the DNA sequences of structural genes coding for proteins are often known before a protein is characterized or even isolated. The function of a protein whose amino acid sequence has been deduced from a DNA sequence may not even be known. This has created greater interest in the development of methods to predict the tertiary structures of proteins. The a priori prediction of a protein's structure from its amino acid sequence is not yet possible. However, since proteins with similar amino acid sequences are observed to have similar three-dimensional structures, it is possible to use an analogy with a protein of known structure to draw some conclusions about the structure and properties of an uncharacterized protein. The process of predicting the tertiary structure of a protein relies very much upon computer modeling and analysis of the structure. The prediction of the structure of the bacteriophage 434 cro repressor is used as an example illustrating current procedures.

18.
The ability to predict protein function from structure is becoming increasingly important as the number of structures resolved is growing more rapidly than our capacity to study function. Current methods for predicting protein function are mostly reliant on identifying a similar protein of known function. For proteins that are highly dissimilar or are only similar to proteins also lacking functional annotations, these methods fail. Here, we show that protein function can be predicted as enzymatic or not without resorting to alignments. We describe 1178 high-resolution proteins in a structurally non-redundant subset of the Protein Data Bank using simple features such as secondary-structure content, amino acid propensities, surface properties and ligands. The subset is split into two functional groupings, enzymes and non-enzymes. We use the support vector machine-learning algorithm to develop models that are capable of assigning the protein class. Validation of the method shows that the function can be predicted to an accuracy of 77% using 52 features to describe each protein. An adaptive search of possible subsets of features produces a simplified model based on 36 features that predicts at an accuracy of 80%. We compare the method to sequence-based methods that also avoid calculating alignments and predict a recently released set of unrelated proteins. The most useful features for distinguishing enzymes from non-enzymes are secondary-structure content, amino acid frequencies, number of disulphide bonds and size of the largest cleft. This method is applicable to any structure as it does not require the identification of sequence or structural similarity to a protein of known function.

19.
A number of investigators have addressed the question of why certain protein structures are especially common by considering structure designability, defined as the number of sequences that would successfully fold into a particular native structure. One such approach, based on foldability, suggested that structures could be classified according to their maximum possible foldability and that this optimal foldability would be highly correlated with structure designability. Other approaches have focused on computing the designability of lattice proteins written with reduced two-letter amino acid alphabets. These different approaches suggested contrasting characteristics of the most designable structures. This report compares the designability of lattice proteins over a wide range of amino acid alphabets and foldability requirements. While all alphabets have a wide distribution of protein designabilities, the form of the distribution depends on how protein "viability" is defined. Furthermore, under increasing foldability requirements, the change in designability for all alphabets is in good agreement with the earlier conclusions of the foldability approach. Most importantly, structures that were highly designable with two-letter amino acid alphabets were found not to be especially designable with larger alphabets.


Copyright © Beijing Qinyun Technology Development Co., Ltd. (ICP registration 京ICP备09084417号)