Similar Articles
1.
    
Protein chemical shifts encode detailed structural information that is difficult and computationally costly to describe at a fundamental level. Statistical and machine learning approaches have been used to infer correlations between chemical shifts and secondary structure from experimental chemical shifts. These methods range from simple statistics such as the chemical shift index to complex methods using neural networks. Notwithstanding their higher accuracy, more complex approaches tend to obscure the relationship between secondary structure and chemical shift and often involve many parameters that need to be trained. We present hidden Markov models (HMMs) with Gaussian emission probabilities to model the dependence between protein chemical shifts and secondary structure. The continuous emission probabilities are modeled as conditional probabilities for a given amino acid and secondary structure type. Using these distributions as outputs of first‐ and second‐order HMMs, we achieve a prediction accuracy of 82.3%, which is competitive with existing methods for predicting secondary structure from protein chemical shifts. Incorporation of sequence‐based secondary structure prediction into our HMM improves the prediction accuracy to 84.0%. Our findings suggest that an HMM with correlated Gaussian distributions conditioned on the secondary structure provides an adequate generative model of chemical shifts. Proteins 2013; © 2012 Wiley Periodicals, Inc.
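The decoding step of a Gaussian-emission HMM like the one described above can be sketched as follows. This is a minimal illustration, not the authors' model: it assumes one secondary-shift value per residue (for example a C-alpha secondary shift in ppm), three states (H, E, C), a first-order chain, and made-up emission and transition parameters. The paper instead conditions the Gaussians on amino acid type, uses correlated distributions, and also considers second-order HMMs.

```python
import math

STATES = ("H", "E", "C")  # helix, strand, coil

# Hypothetical Gaussian (mean, sd) for one secondary-shift value per residue, in ppm.
EMISSION = {"H": (2.5, 1.0), "E": (-1.5, 1.0), "C": (0.0, 1.0)}

# Hypothetical first-order transition and initial probabilities.
TRANS = {
    "H": {"H": 0.90, "E": 0.02, "C": 0.08},
    "E": {"H": 0.02, "E": 0.88, "C": 0.10},
    "C": {"H": 0.10, "E": 0.10, "C": 0.80},
}
START = {s: 1 / len(STATES) for s in STATES}


def log_gauss(x, mean, sd):
    """Log density of a univariate normal distribution."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def viterbi(obs):
    """Most probable secondary-structure path for a sequence of shift values."""
    # score[s] = best log-probability of any path ending in state s
    score = {s: math.log(START[s]) + log_gauss(obs[0], *EMISSION[s]) for s in STATES}
    back = []
    for x in obs[1:]:
        ptr, new = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            ptr[s] = prev
            new[s] = score[prev] + math.log(TRANS[prev][s]) + log_gauss(x, *EMISSION[s])
        back.append(ptr)
        score = new
    # Trace back from the best final state.
    path = [max(STATES, key=score.get)]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return "".join(reversed(path))


print(viterbi([2.8, 2.1, 3.0, 2.4, -1.8, -2.2, -1.6, -2.0]))  # HHHHEEEE
```

Working in log space avoids numerical underflow on long sequences; a real model would fit the means, variances, and transitions from assigned structures rather than fixing them by hand.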

2.
    
The detection of outer membrane proteins (OMPs) in whole genomes is a topical question, and their sequence characteristics have therefore been studied intensively. This class of protein displays a common beta-barrel architecture formed by adjacent antiparallel strands. However, owing to the lack of available structures, few structural studies have been made of this class of proteins. Here we propose a novel investigation of OMP local structure based on a structural alphabet approach, i.e., the decomposition of 3D structures using a library of four-residue protein fragments. The optimal decomposition of structures using a hidden Markov model results in a specific structural alphabet of 20 fragments, six of which are dedicated to the decomposition of beta-strands. This optimal alphabet, called SA20-OMP, is analyzed in detail in terms of local structures and transitions between fragments. It highlights a particular and strong organization of beta-strands as series of regular canonical structural fragments. Comparison with alphabets learned on globular structures indicates that the internal organization of OMP structures is more constrained than that of globular structures. The analysis of OMP structures using SA20-OMP reveals some recurrent structural patterns. The preferred location of fragments in the distinct regions of the membrane is investigated. The study of pairwise specificity of fragments reveals that some contacts between structural fragments in beta-sheets are clearly favored whereas others are avoided. This contact specificity is stronger in OMPs than in globular structures. Moreover, SA20-OMP also captures sequence information, which can be integrated into a scoring function for ranking structural models, with very promising results.

3.
We designed a simple position-specific hidden Markov model to predict protein structure. Our new framework is naturally iterative and converges to a final target, combining fragment assembly, clustering, target selection, refinement, and consensus in a single process. Our initial implementation of this theory converges to within 6 Å of the native structure for 100% of decoys on all six standard benchmark proteins used in ROSETTA (discussed by Simons and colleagues in a recent paper), whereas ROSETTA achieved only 14% to 94% on the same data. The quality of both the best decoys and the final decoys to which our method converges is also notably better.

4.
    
We consider hidden Markov models as a versatile class of models for weakly dependent random phenomena. The topic of the present paper is likelihood-ratio testing for hidden Markov models, and we show that, under appropriate conditions, the standard asymptotic theory of likelihood-ratio tests is valid. Such tests are crucial in the specification of multivariate Gaussian hidden Markov models, which we use to illustrate the applicability of our general results. Finally, the methodology is illustrated by means of a real data set.
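The quantity at the heart of such a test, 2(ℓ_alt − ℓ_null), needs the HMM log-likelihood, which the forward algorithm provides. The sketch below is an illustration under stated assumptions, not the paper's procedure: it uses univariate Gaussian emissions, takes the two parameter sets as given (in practice both would be maximum-likelihood estimates, e.g. from EM, under nested models with a fixed number of states), and compares the statistic with the 5% chi-square critical value for one restricted parameter.

```python
import math

CHI2_95_DF1 = 3.841  # upper 5% point of the chi-square distribution with 1 degree of freedom


def log_norm_pdf(x, mean, sd):
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def hmm_loglik(obs, start, trans, means, sds):
    """Forward algorithm in log space: log p(obs) under a Gaussian-emission HMM."""
    k = len(start)
    alpha = [math.log(start[i]) + log_norm_pdf(obs[0], means[i], sds[i]) for i in range(k)]
    for x in obs[1:]:
        alpha = [
            logsumexp([alpha[i] + math.log(trans[i][j]) for i in range(k)])
            + log_norm_pdf(x, means[j], sds[j])
            for j in range(k)
        ]
    return logsumexp(alpha)


def lr_test(obs, null_params, alt_params, critical=CHI2_95_DF1):
    """LR statistic 2 * (loglik_alt - loglik_null) and whether it exceeds the critical value.

    Both parameter tuples should be maximum-likelihood estimates under nested models;
    the default critical value assumes exactly one restricted parameter.
    """
    stat = 2.0 * (hmm_loglik(obs, *alt_params) - hmm_loglik(obs, *null_params))
    return stat, stat > critical


# Placeholder parameters, not fitted values: H0 forces equal emission sds, H1 frees them.
obs = [0.1, -0.4, 0.3, 4.2, 3.1, 5.0, 0.0, -0.2]
trans = [[0.8, 0.2], [0.3, 0.7]]
null = ([0.5, 0.5], trans, [0.0, 4.0], [1.0, 1.0])
alt = ([0.5, 0.5], trans, [0.0, 4.0], [0.3, 0.8])
stat, reject = lr_test(obs, null, alt)
print(f"LR = {stat:.2f}, reject H0 at 5%: {reject}")
```

Tests on the number of hidden states are a known non-standard case, so this sketch only illustrates restrictions on parameters within a model of fixed size.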

5.
Prediction of the tertiary structure of a 34-residue N-terminal fragment of parathyroid-hormone-related protein was carried out using the island model. This peptide is known as a major causative agent of humoral hypercalcemia of malignancy, but no structure determined by X-ray diffraction has been reported. We adopted the secondary structure determined by NMR and packed the secondary-structure elements on the basis of the island model of protein folding that we developed. The predicted structure is discussed in connection with the interaction of the active sites.

6.
    
A novel method for predicting the secondary structures of proteins from amino acid sequence is presented. Protein secondary structure seqlets, which are analogous to words in natural language, are extracted. These seqlets capture the relationship between amino acid sequence and protein secondary structure and together form a protein secondary structure dictionary; specifically, the dictionary is organism-specific. Protein secondary structure prediction is formulated as an integrated word segmentation and part-of-speech tagging problem. A word lattice is used to represent the results of word segmentation, and a maximum entropy model is used to calculate the probability of a seqlet being tagged as a given secondary structure type. The method is Markovian in the seqlets, permitting efficient exact calculation of the posterior probability distribution over all possible word segmentations and their tags with the Viterbi algorithm. The optimal segmentations and their tags are taken as the result of protein secondary structure prediction. The method is applied separately to predict the secondary structures of proteins from four organisms and is compared with the PHD method. The results show that this method outperforms PHD by about 3.9% in Q3 accuracy and 4.6% in SOV accuracy. Combining it with locally similar protein sequences obtained by BLAST gives better predictions. The method is also tested on the 50 CASP5 target proteins, achieving a Q3 accuracy of 78.9% and an SOV accuracy of 77.1%. A web server for protein secondary structure prediction has been constructed and is available at http://www.insun.hit.edu.cn:81/demos/biology/index.html.
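Joint segmentation and tagging over a word lattice can be decoded with a Viterbi-style dynamic program. The sketch below is hypothetical throughout: the seqlet dictionary, the tag probabilities (which the described method obtains from a maximum entropy model), and the tag-to-tag transition scores are invented for illustration, and unknown single residues fall back to a uniform tag distribution so that a segmentation always exists.

```python
import math

TAGS = ("H", "E", "C")

# Hypothetical organism-specific seqlet dictionary: seqlet -> P(tag | seqlet).
SEQLETS = {
    "AELLK": {"H": 0.80, "E": 0.05, "C": 0.15},
    "VTVKV": {"H": 0.05, "E": 0.85, "C": 0.10},
    "GDG": {"H": 0.05, "E": 0.05, "C": 0.90},
    "PG": {"H": 0.05, "E": 0.05, "C": 0.90},
}
MAX_LEN = max(len(w) for w in SEQLETS)
# Fallback so that every residue can always be segmented as a one-letter word.
UNIFORM = {t: 1 / len(TAGS) for t in TAGS}
# Hypothetical log-probability of a seqlet's tag given the previous seqlet's tag.
TRANS = {a: {b: math.log(0.6 if a == b else 0.2) for b in TAGS} for a in TAGS}


def segment_and_tag(seq):
    """Jointly choose a segmentation of seq into seqlets and one tag per seqlet."""
    n = len(seq)
    # best[i][t] = (log score, start of last word, previous tag) for seq[:i] ending in tag t
    best = [{} for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(max(0, i - MAX_LEN), i):
            word = seq[j:i]
            probs = SEQLETS.get(word) or (UNIFORM if len(word) == 1 else None)
            if probs is None:
                continue  # not a dictionary word: no lattice edge from j to i
            for t in TAGS:
                if j == 0:
                    cand = (math.log(probs[t]), 0, None)
                else:
                    prev = max(best[j], key=lambda p: best[j][p][0] + TRANS[p][t])
                    score = best[j][prev][0] + TRANS[prev][t] + math.log(probs[t])
                    cand = (score, j, prev)
                if t not in best[i] or cand[0] > best[i][t][0]:
                    best[i][t] = cand
    # Trace back from the best-scoring tag at the end of the sequence.
    i, t = n, max(best[n], key=lambda k: best[n][k][0])
    words = []
    while i > 0:
        _, j, prev = best[i][t]
        words.append((seq[j:i], t))
        i, t = j, prev
    return words[::-1]


print(segment_and_tag("AELLKPGVTVKV"))
```

This returns only the single best segmentation and tagging; computing the full posterior over segmentations would replace the max operations with sums (a forward-backward pass over the same lattice).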

7.
    
Structural bioinformatics of membrane proteins is still in its infancy, and the picture of their fold space is only beginning to emerge. Because only a handful of three-dimensional structures are available, sequence comparison and structure prediction remain the main tools for investigating sequence-structure relationships in membrane protein families. Here we present a comprehensive analysis of the structural families corresponding to α-helical membrane proteins with at least three transmembrane helices. The new version of our CAMPS database (CAMPS 2.0) covers nearly 1300 eukaryotic, prokaryotic, and viral genomes. Using an advanced classification procedure, which is based on high-order hidden Markov models and considers both sequence similarity as well as the number of transmembrane helices and loop lengths, we identified 1353 structurally homogeneous clusters roughly corresponding to membrane protein folds. Only 53 clusters are associated with experimentally determined three-dimensional structures, and for these clusters CAMPS is in reasonable agreement with structure-based classification approaches such as SCOP and CATH. We therefore estimate that ~1300 structures would need to be determined to provide a sufficient structural coverage of polytopic membrane proteins. CAMPS 2.0 is available at http://webclu.bio.wzw.tum.de/CAMPS2.0/.

9.
    
Accurate free energy estimation is essential for RNA structure prediction. The widely used Turner's energy model works well for nested structures. For pseudoknotted RNAs, however, there is no effective rule for estimation of loop entropy and free energy. In this work we present a new free energy estimation method, termed the pseudoknot predictor in three-dimensional space (pk3D), which goes beyond Turner's model. Our approach treats nested and pseudoknotted structures alike in one unifying physical framework, regardless of how complex the RNA structures are. We first test the ability of pk3D in selecting native structures from a large number of decoys for a set of 43 pseudoknotted RNA molecules, with lengths ranging from 23 to 113. We find that pk3D performs slightly better than the Dirks and Pierce extension of Turner's rule. We then test pk3D for blind secondary structure prediction, and find that pk3D gives the best sensitivity and comparable positive predictive value (related to specificity) in predicting pseudoknotted RNA secondary structures, when compared with other methods. A unique strength of pk3D is that it also generates spatial arrangement of structural elements of the RNA molecule. Comparison of three-dimensional structures predicted by pk3D with the native structure measured by nuclear magnetic resonance or X-ray experiments shows that the predicted spatial arrangement of stems and loops is often similar to that found in the native structure. These close-to-native structures can be used as starting points for further refinement to derive accurate three-dimensional structures of RNA molecules, including those with pseudoknots.

10.
Prediction of the three-dimensional structure of human growth hormone
F. E. Cohen, I. D. Kuntz. Proteins, 1987, 2(2): 162-166
In recent years, the protein-folding problem has attracted the attention of molecular biologists. Efforts have focused on developing heuristic and energy-based algorithms to predict the three-dimensional structure of a protein from its amino acid sequence. We have applied a series of heuristic algorithms to the sequence of human growth hormone. A family of five structures, all generically right-handed four-helix (fourfold alpha-helical) bundles, is found from an investigation of approximately 10^8 structures. A plausible receptor binding site is suggested. Independent crystallographic analysis confirms some aspects of these predictions. These methods deal only with the "core" structure, and the conformations of many residues are not defined. Further work is required to identify a unique set of coordinates and to clarify the topological alternatives available to alpha-helical proteins.

11.
Databases of multiple sequence alignments are a valuable aid to protein sequence classification and analysis. One of the main challenges when constructing such a database is to simultaneously satisfy the conflicting demands of completeness on the one hand and quality of alignment and domain definitions on the other. The latter properties are best dealt with by manual approaches, whereas completeness in practice is only amenable to automatic methods. Herein we present a database based on hidden Markov model profiles (HMMs), which combines high quality and completeness. Our database, Pfam, consists of parts A and B. Pfam-A is curated and contains well-characterized protein domain families with high quality alignments, which are maintained by using manually checked seed alignments and HMMs to find and align all members. Pfam-B contains sequence families that were generated automatically by applying the Domainer algorithm to cluster and align the remaining protein sequences after removal of Pfam-A domains. By using Pfam, a large number of previously unannotated proteins from the Caenorhabditis elegans genome project were classified. We have also identified many novel family memberships in known proteins, including new Kazal, fibronectin type III, and response regulator receiver domains. Pfam-A families have permanent accession numbers and form a library of HMMs available for searching and automatic annotation of new protein sequences. Proteins: 28:405–420, 1997. © 1997 Wiley-Liss, Inc.

12.
Rebuilding flavodoxin from C alpha coordinates: a test study
L. S. Reid, J. M. Thornton. Proteins, 1989, 5(2): 170-182
The tertiary structure of flavodoxin has been model-built from only the X-ray crystallographic alpha-carbon coordinates. Main-chain atoms were generated from a dictionary of backbone structures. Side-chain conformations were initially set according to observed statistical distributions, clashes were resolved with reference to other knowledge-based parameters, and finally, energy minimization was applied. The all-atom RMSD of the model from the native structure was 1.7 Å. Regular secondary structural elements were modeled more accurately than other regions. About 40% of the χ1 torsion angles were modeled correctly. Packing of side chains in the core was energetically stable but diverged significantly from the native structure in some regions. The modeling of protein structures is increasing in popularity, but relatively few checks have been applied to determine the accuracy of the approach. In this work a variety of parameters have been examined. Close contacts and hydrogen-bonding patterns were found to identify poorly packed residues. These tests, however, did not indicate which residues had a conformation different from the native structure or how to move such residues to bring them into agreement. To assist in the modeling of interacting side chains, a database of known interactions has been prepared.
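A χ1 comparison of this kind needs a torsion-angle function and a criterion for "correct". The sketch below computes a dihedral angle from four atom positions (for χ1: N, CA, CB, and the gamma atom) and counts how many model angles fall within a tolerance of the native ones. The ±40° tolerance is a hypothetical choice for illustration; the criterion used in the study is not stated here.

```python
import math


def _sub(a, b):
    return [a[k] - b[k] for k in range(3)]


def _dot(a, b):
    return sum(a[k] * b[k] for k in range(3))


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees (between -180 and 180) defined by four 3D points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = [c / norm for c in b1]
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, [_dot(b0, b1) * c for c in b1])
    w = _sub(b2, [_dot(b2, b1) * c for c in b1])
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


def angle_diff(a, b):
    """Smallest absolute difference between two angles in degrees, handling wrap-around."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def fraction_chi1_correct(model, native, tolerance=40.0):
    """Fraction of residues whose model chi1 lies within `tolerance` degrees of native (hypothetical cutoff)."""
    hits = sum(angle_diff(m, n) <= tolerance for m, n in zip(model, native))
    return hits / len(native)


print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))  # 90.0
print(fraction_chi1_correct([60, -60, 180, 170], [65, 60, -175, 100]))  # 0.5
```

The wrap-around handling matters because angles such as 180° and -175° are only 5° apart even though their numeric difference is 355°.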

13.
    
Methylated non-CpGs (mCpHs) in mammalian cells yield weak enrichment signals and colocalize with methylated CpGs (mCpGs), and have thus been considered byproducts of hyperactive methyltransferases. However, mCpHs are cell type-specific and associated with epigenetic regulation, although their dependency on mCpGs remains to be elucidated. In this study, we demonstrated that mCpHs colocalize with mCpGs in pluripotent stem cells, but not in brain cells. In addition, profiling genome-wide methylation patterns using a hidden Markov model revealed abundant genomic regions in which CpGs and CpHs are differentially methylated in the brain. These regions were frequently located in putative enhancers, and mCpHs within these enhancers increased in correlation with brain age. The enhancers with hypermethylated CpHs were associated with genes functionally enriched in immune responses, and some of these genes were related to neuroinflammation and degeneration. This study provides insight into the roles of non-CpG methylation as an epigenetic code in the mammalian brain genome.

14.
    
Selection for new favorable variants can lead to selective sweeps. However, such sweeps might be rare in the evolution of different species for which polygenic adaptation or selection on standing variation might be more common. Still, strong selective sweeps have been described in domestic species such as chicken lines or dog breeds. The goal of our study was to use a panel of individuals from 12 different cattle breeds genotyped at high density (800K SNPs) to perform a whole‐genome scan for selective sweeps defined as unexpectedly long stretches of reduced heterozygosity. To that end, we developed a hidden Markov model in which one of the hidden states corresponds to regions of reduced heterozygosity. Some unexpectedly long regions were identified. Among those, six contained genes known to affect traits with simple genetic architecture such as coat color or horn development. However, there was little evidence for sweeps associated with genes underlying production traits.
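A two-state HMM of this kind, with one state for background heterozygosity and one for reduced heterozygosity, can be decoded along a chromosome and collapsed into candidate regions. The sketch below is a simplification under stated assumptions: it treats each SNP call as a binary observation (1 = heterozygous), ignores inter-marker distances and pooling across individuals, and uses invented emission and transition probabilities rather than anything estimated in the study.

```python
import math

# Hypothetical parameters. State 0 = background, state 1 = reduced heterozygosity.
P_HET = (0.30, 0.02)  # P(SNP call is heterozygous | state)
STAY = (0.95, 0.90)   # P(same state at the next SNP | current state)
STATES = (0, 1)


def log_trans(a, b):
    return math.log(STAY[a] if a == b else 1.0 - STAY[a])


def log_emit(state, het):
    p = P_HET[state]
    return math.log(p if het else 1.0 - p)


def reduced_het_segments(calls):
    """Viterbi-decode 0/1 calls (1 = heterozygous); return [start, end) runs of state 1."""
    score = [math.log(0.5) + log_emit(s, calls[0]) for s in STATES]
    back = []
    for c in calls[1:]:
        ptr, new = [], []
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + log_trans(p, s))
            ptr.append(prev)
            new.append(score[prev] + log_trans(prev, s) + log_emit(s, c))
        back.append(ptr)
        score = new
    path = [max(STATES, key=lambda s: score[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    # Collapse the decoded path into runs of the reduced-heterozygosity state.
    segments, start = [], None
    for i, s in enumerate(path + [0]):  # sentinel 0 closes a run at the end
        if s == 1 and start is None:
            start = i
        elif s == 0 and start is not None:
            segments.append((start, i))
            start = None
    return segments


calls = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1] + [0] * 40 + [1, 1, 0, 1, 1, 0, 1, 1]
print(reduced_het_segments(calls))
```

Short runs of homozygous calls stay in the background state because entering and leaving the reduced state each carry a transition penalty; only runs long enough to outweigh that penalty are reported, which mirrors the study's focus on unexpectedly long stretches.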

15.
A probabilistic graphical model is proposed in order to detect the coevolution between different sites in biological sequences. The model extends the continuous-time Markov process of sequence substitution for single nucleic or amino acids and imposes general constraints regarding simultaneous changes on the substitution rate matrix. Given a multiple sequence alignment for each molecule of interest and a phylogenetic tree, the model can predict potential interactions within or between nucleic acids and proteins. Initial validation of the model is carried out using tRNA and 16S rRNA sequence data. The model accurately identifies the secondary interactions of tRNA as well as several known tertiary interactions. In addition, results on 16S rRNA data indicate this general and simple coevolutionary model outperforms several other parametric and nonparametric methods in predicting secondary interactions. Furthermore, the majority of the putative predictions exhibit either direct contact or proximity of the nucleotide pairs in the 3-dimensional structure of the Thermus thermophilus ribosomal small subunit. The results on RNA data suggest a general model of coevolution might be applied to other types of interactions between protein, DNA, and RNA molecules.
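The central object in such a model is a substitution rate matrix over pairs of sites. The sketch below, which requires NumPy, builds a hypothetical 16-state matrix in which single-site substitutions occur at rate `mu` and simultaneous double substitutions are allowed (rate `lam`) only between Watson-Crick pairs, then computes the transition probabilities P(t) = exp(Qt). The specific constraint and rates are illustrative assumptions, not the constraints used by the authors, and the likelihood calculation over a phylogenetic tree is not shown.

```python
import itertools

import numpy as np

NUC = "ACGU"
PAIRS = ["".join(p) for p in itertools.product(NUC, repeat=2)]  # 16 pair states
WATSON_CRICK = {"AU", "UA", "GC", "CG"}


def pair_rate_matrix(mu=1.0, lam=0.0):
    """Hypothetical symmetric 16x16 rate matrix for two sites.

    mu:  rate of a single-site substitution (the other site unchanged).
    lam: rate of a simultaneous double substitution, permitted here only
         between Watson-Crick pairs (an illustrative constraint).
    """
    n = len(PAIRS)
    q = np.zeros((n, n))
    for i, a in enumerate(PAIRS):
        for j, b in enumerate(PAIRS):
            diffs = sum(x != y for x, y in zip(a, b))
            if diffs == 1:
                q[i, j] = mu
            elif diffs == 2 and a in WATSON_CRICK and b in WATSON_CRICK:
                q[i, j] = lam
    np.fill_diagonal(q, -q.sum(axis=1))  # rows of a rate matrix sum to zero
    return q


def transition_matrix(q, t):
    """P(t) = expm(Q t); this Q is symmetric, so its eigendecomposition suffices."""
    w, v = np.linalg.eigh(q)
    return (v * np.exp(w * t)) @ v.T


p = transition_matrix(pair_rate_matrix(mu=1.0, lam=2.0), 0.1)
au = PAIRS.index("AU")
print(p[au, PAIRS.index("GC")], p[au, PAIRS.index("GA")])
```

With `lam=0` the two sites evolve independently and P(t) factorizes into the Kronecker product of two single-site Jukes-Cantor-like matrices; a positive `lam` makes compensatory changes such as AU to GC far more likely over short branches than other double changes.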

19.
Protein structures are stabilized by both local and long-range interactions. In this work, we analyzed the importance of long-range interactions in (α/β)8 barrel proteins in terms of residue distances. We found that residues 21–30 residues apart contribute most to long-range contacts. Indeed, about 50% of successive strands in these proteins occur at a sequential distance of 21–30 residues. The aromatic residues Phe, Trp, and Tyr prefer the 4–10 range, and all other residues prefer the 21–30 range. Hydrophobic-hydrophobic residue pairs are the most preferred for long-range interactions and may play a key role in the folding and stabilization of (α/β)8 barrel proteins.
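A tally of long-range contacts by sequence separation, as in this analysis, can be sketched as below. The contact definition is an assumption made for illustration (C-alpha atoms within 8 Å); the abstract names the 4–10 and 21–30 ranges, and the 11–20 and 31+ bins are added here only to cover the remaining separations.

```python
import math
from collections import Counter

# 4-10 and 21-30 are the ranges named in the abstract; 11-20 and 31+ are assumed fill-ins.
BINS = ((4, 10), (11, 20), (21, 30), (31, None))


def separation_bin(sep):
    for lo, hi in BINS:
        if sep >= lo and (hi is None or sep <= hi):
            return (lo, hi)
    return None  # separations below 4 are treated as short-range


def contacts_by_separation(ca_coords, cutoff=8.0):
    """Count residue pairs with C-alpha atoms within `cutoff` angstroms, binned by |i - j| >= 4."""
    counts = Counter()
    n = len(ca_coords)
    for i in range(n):
        for j in range(i + 4, n):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                counts[separation_bin(j - i)] += 1
    return counts


# Toy chain: 30 residues on a straight line, with residue 25 folded back next to residues 0 and 1.
coords = [(3.8 * k, 0.0, 0.0) for k in range(30)]
coords[25] = (0.0, 5.0, 0.0)
print(contacts_by_separation(coords))
```

Per-residue preferences (for example aromatic versus other residues) would follow by also keying the counter on the residue types of each contacting pair.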
