Similar literature
20 similar articles found
1.
The S subunits of type I DNA restriction/modification enzymes are responsible for recognising the DNA target sequence for the enzyme. They contain two domains of approximately 150 amino acids, each of which is responsible for recognising one half of the bipartite asymmetric target. In the absence of any known tertiary structure for type I enzymes or recognisable DNA recognition motifs in the highly variable amino acid sequences of the S subunits, it has previously not been possible to predict which amino acids are responsible for sequence recognition. Using a combination of sequence alignment and secondary structure prediction methods to analyse the sequences of S subunits, we predict that all of the 51 known target recognition domains (TRDs) have the same tertiary structure. Furthermore, this structure is similar to the structure of the TRD of the C5-cytosine methyltransferase, Hha I, which recognises its DNA target via interactions with two short polypeptide loops and a beta strand. Our results predict the location of these sequence recognition structures within the TRDs of all type I S subunits.

2.
Tom Defay, Fred E. Cohen. Proteins 1995, 23(3): 431-445
The results of a protein structure prediction contest are reviewed. Twelve different groups entered predictions on 14 proteins of known sequence whose structures had been determined but not yet disseminated to the scientific community. Thus, these represent true tests of the current state of structure prediction methodologies. From this work, it is clear that accurate tertiary structure prediction is not yet possible. However, protein fold and motif prediction are possible when the motif is recognizably similar to another known structure. Internal symmetry and the information inherent in an aligned family of homologous sequences facilitate predictive efforts. Novel folds remain a major challenge for prediction efforts. © 1995 Wiley-Liss, Inc.

3.
The secondary and tertiary structures of interferon were predicted from four homologous amino acid sequences. Three methods of secondary structure prediction gave differing results that were interpreted to suggest that there might be four α-helices that are important in the tertiary fold. The validity of this interpretation was assessed by the application of the methods to predict the secondary structures of two proteins known to consist of four α-helices. A possible tertiary model for interferon is then proposed in which the four α-helices pack into a right-handed bundle similar to that observed in several known protein structures. This model was shown to be stereochemically feasible by an α-helix docking algorithm. One of the resultant structures is shown to be compatible with the known disulphide linkages in interferon. Certain residues that are conserved between the different sequences lie near each other in our model and these residues might form a functional site. In the absence of a crystal structure for interferon, a predicted tertiary model will help further structural and functional studies.

4.
Measurements of protein sequence-structure correlations   Total citations: 1 (self-citations: 0, citations by others: 1)
Crooks GE, Wolfe J, Brenner SE. Proteins 2004, 57(4): 804-810
Correlations between protein structures and amino acid sequences are widely used for protein structure prediction. For example, secondary structure predictors generally use correlations between a secondary structure sequence and corresponding primary structure sequence, whereas threading algorithms and similar tertiary structure predictors typically incorporate interresidue contact potentials. To investigate the relative importance of these sequence-structure interactions, we measured the mutual information among the primary structure, secondary structure and side-chain surface exposure, both for adjacent residues along the amino acid sequence and for tertiary structure contacts between residues distantly separated along the backbone. We found that local interactions along the amino acid chain are far more important than non-local contacts and that correlations between proximate amino acids are essentially uninformative. This suggests that knowledge-based contact potentials may be less important for structure prediction than is generally believed.
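As an illustration of the quantity measured in this abstract, the following minimal sketch estimates the mutual information between two discrete variables (for example, residue type and secondary structure state) from observed pairs. The function name and the toy data are assumptions made for illustration; the abstract does not describe the authors' estimator, which may include corrections for finite sample size.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Plug-in estimate of mutual information (in bits) from (x, y) observations."""
    n = len(pairs)
    if n == 0:
        raise ValueError("at least one observation is required")
    joint = Counter(pairs)                  # counts of each (x, y) combination
    count_x = Counter(x for x, _ in pairs)  # marginal counts of x
    count_y = Counter(y for _, y in pairs)  # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) / (p(x) * p(y)) simplifies to c * n / (count_x * count_y)
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


# Toy data (hypothetical): residue type paired with secondary structure state.
correlated = [("A", "H"), ("A", "H"), ("G", "C"), ("G", "C")]
independent = [("A", "H"), ("A", "C"), ("G", "H"), ("G", "C")]
print(mutual_information(correlated))
print(mutual_information(independent))
```

In the first toy set the residue type fully determines the state, so the estimate is one bit; in the second the two variables are independent and the estimate is zero.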

5.
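SOX proteins contain a highly conserved HMG-box domain that binds DNA specifically. To study the molecular basis of the tertiary structure of Amur tiger SOX proteins, the sequences of these proteins were downloaded from GenBank with the MATLAB Bioinformatics tools, and the HMG-box was modelled and predicted by homology modelling, using SOX2, whose tertiary structure is known, as the template and combining SwissPdbViewer with MATLAB. The three-dimensional structure of the predicted model was analysed with the MATLAB Visualization Tool. The results show that the HMG-box of the PtSox proteins consists of three α-helices and two loop regions. Thermal stability analysis indicates that the loop regions of the PtSox proteins are thermodynamically unstable. The surface electrostatic distribution reveals an N/C cavity in the middle of the C-terminus of the PtSox proteins that may be a site of interaction with other small molecules or proteins. These spatial features may be related to the regulation of the proteins' activity and function.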

6.
Eleven basic proline-rich proteins were purified from the parotid saliva of a single individual. The complete amino acid sequences of six of these were determined by conventional protein sequence methodology, bringing to nine the number of known primary structures of nonglycosylated basic proline-rich proteins from the same individual. The partial sequence of one additional protein is also reported. All of the basic proline-rich proteins studied contain segments with identical or very similar sequences, but with two possible exceptions, none of the proteins is derived from another secreted proline-rich protein. The amino acid sequences of nine nonglycosylated basic proline-rich proteins were compared with primary structures deduced from published nucleotide sequences of DNA coding for human parotid proline-rich proteins. The sequences align well, in general, but differences also exist pointing to the complexity of the genetics of these proteins. Seven secretory basic proline-rich proteins appear to be formed from three larger precursors by selective posttranslational proteolyses of arginyl bonds. One of the basic proline-rich proteins appears to derive from human acidic proline-rich proteins. The remaining two proteins studied do not conform to any DNA structure as yet reported. Two of the basic proline-rich proteins studied are phosphoproteins and exhibit abilities to inhibit hydroxyapatite formation in vitro.

7.
Thompson J, Baker D. Proteins 2011, 79(8): 2380-2388
Prediction of protein structures from sequences is a fundamental problem in computational biology. Algorithms that attempt to predict a structure from sequence primarily use two sources of information. The first source is physical in nature: proteins fold into their lowest energy state. Given an energy function that describes the interactions governing folding, a method for constructing models of protein structures, and the amino acid sequence of a protein of interest, the structure prediction problem becomes a search for the lowest energy structure. Evolution provides an orthogonal source of information: proteins of similar sequences have similar structure, and therefore proteins of known structure can guide modeling. The relatively successful Rosetta approach takes advantage of the first, but not the second source of information during model optimization. Following the classic work by Andrej Sali and colleagues, we develop a probabilistic approach to derive spatial restraints from proteins of known structure using advances in alignment technology and the growth in the number of structures in the Protein Data Bank. These restraints define a region of conformational space that is high-probability, given the template information, and we incorporate them into Rosetta's comparative modeling protocol. The combined approach performs considerably better on a benchmark based on previous CASP experiments. Incorporating evolutionary information into Rosetta is analogous to incorporating sparse experimental data: in both cases, the additional information eliminates large regions of conformational space and increases the probability that energy-based refinement will home in on the deep energy minimum at the native state.

8.
Structure-based prediction of DNA target sites by regulatory proteins   Total citations: 15 (self-citations: 0, citations by others: 15)
Kono H, Sarai A. Proteins 1999, 35(1): 114-131
Regulatory proteins play a critical role in controlling complex spatial and temporal patterns of gene expression in higher organisms, by recognizing multiple DNA sequences and regulating multiple target genes. Increasing amounts of structural data on protein-DNA complexes provide clues for the mechanism of target recognition by regulatory proteins. The analyses of the propensities of base-amino acid interactions observed in those structural data show that there is no one-to-one correspondence in the interaction, but clear preferences exist. On the other hand, the analysis of spatial distribution of amino acids around bases shows that even those amino acids with strong base preference such as Arg with G are distributed in a wide space around bases. Thus, amino acids with many different geometries can form a similar type of interaction with bases. The redundancy and structural flexibility in the interaction suggest that there are no simple rules in the sequence recognition, and its prediction is not straightforward. However, the spatial distributions of amino acids around bases indicate a possibility that the structural data can be used to derive empirical interaction potentials between amino acids and bases. Such information extracted from structural databases has been successfully used to predict amino acid sequences that fold into particular protein structures. We surmised that the structures of protein-DNA complexes could be used to predict DNA target sites for regulatory proteins, because determining DNA sequences that bind to a particular protein structure should be similar to finding amino acid sequences that fold into a particular structure. Here we demonstrate that the structural data can be used to predict DNA target sequences for regulatory proteins. Pairwise potentials that determine the interaction between bases and amino acids were empirically derived from the structural data. These potentials were then used to examine the compatibility between DNA sequences and the protein-DNA complex structure in a combinatorial "threading" procedure. We applied this strategy to the structures of protein-DNA complexes to predict DNA binding sites recognized by regulatory proteins. To test the applicability of this method in target-site prediction, we examined the effects of cognate and noncognate binding, cooperative binding, and DNA deformation on the binding specificity, and predicted binding sites in real promoters and compared with experimental data. These results show that target binding sites for several regulatory proteins are successfully predicted, and our data suggest that this method can serve as a powerful tool for predicting multiple target sites and target genes for regulatory proteins.
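The combinatorial "threading" idea in this abstract can be sketched in a deliberately simplified form: every candidate DNA sequence is scored against a fixed set of base-amino acid contacts using a pairwise potential, and the candidates are ranked by score. Everything concrete below is an assumption for illustration: the function names, the contact list and the potential values are made up, and the sketch ignores the spatial dependence of the potentials described in the abstract.

```python
from itertools import product


def threading_score(dna, contacts, potential):
    """Sum pairwise base-amino acid energies over the contacts of a complex.

    contacts: list of (position_in_dna, amino_acid) pairs taken from a structure.
    potential: dict mapping (base, amino_acid) to an energy; missing pairs count as 0.
    """
    return sum(potential.get((dna[pos], aa), 0.0) for pos, aa in contacts)


def rank_sequences(length, contacts, potential):
    """Thread every DNA sequence of the given length and sort by energy (lowest first)."""
    candidates = ("".join(bases) for bases in product("ACGT", repeat=length))
    return sorted(candidates, key=lambda s: threading_score(s, contacts, potential))


# Hypothetical two-base site: an Arg contacts position 0, an Asn contacts position 1.
toy_contacts = [(0, "Arg"), (1, "Asn")]
toy_potential = {("G", "Arg"): -2.0, ("A", "Asn"): -1.5}  # illustrative values only
best = rank_sequences(2, toy_contacts, toy_potential)[0]
print(best, threading_score(best, toy_contacts, toy_potential))
```

Exhaustive enumeration grows as 4 to the power of the site length, so a sketch like this is only practical for short sites.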

9.
In principle, structural information of protein sequences with no detectable homology to a protein of known structure could be obtained by predicting the arrangement of their secondary structural elements. Although some ab initio methods for protein structure prediction have been reported, the long-range interactions required to accurately predict tertiary structures of β-sheet containing proteins are still difficult to simulate. To remedy this problem and facilitate de novo prediction of β-sheet containing protein structures, we developed a support vector machine (SVM) approach that classifies the parallel and antiparallel orientation of β-strands by using information on interstrand amino acid pairing preferences. Based on second-order statistics of the relative frequencies of each possible interstrand amino acid pair, we defined an average amino acid pairing encoding matrix (APEM) for encoding β-strands as input to the prediction model. A prediction accuracy of 86.89% and a Matthews correlation coefficient of 0.71 were achieved through 7-fold cross-validation on a non-redundant protein dataset from PISCES. Although several issues remain to be studied, the method indicates the important contribution of amino acid pairs to β-strand orientation and could be combined with other algorithms for a full ‘identification’ of β-strands.
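For readers unfamiliar with the second metric reported in this abstract, the sketch below computes the Matthews correlation coefficient from the four cells of a binary confusion matrix (here, parallel versus antiparallel). The function name and the example counts are assumptions for illustration and are not taken from the study.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary classifier.

    Returns a value in [-1, 1]; by convention 0.0 is returned when any
    marginal total is zero and the coefficient is undefined.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts: perfect, chance-level and fully inverted predictions.
print(matthews_corrcoef(50, 50, 0, 0))
print(matthews_corrcoef(25, 25, 25, 25))
print(matthews_corrcoef(0, 0, 50, 50))
```

Unlike plain accuracy, the coefficient stays near zero for a classifier that always predicts the majority class, which is why it is often reported alongside accuracy for unbalanced datasets.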

10.
A novel method for predicting the secondary structures of proteins from amino acid sequence is presented. Protein secondary structure seqlets, which are analogous to the words of a natural language, are extracted. These seqlets capture the relationship between amino acid sequence and protein secondary structure and together form a protein secondary structure dictionary. More specifically, the dictionary is organism-specific. Protein secondary structure prediction is formulated as an integrated word segmentation and part-of-speech tagging problem. A word lattice is used to represent the results of the word segmentation, and a maximum entropy model is used to calculate the probability that a seqlet is tagged as a certain secondary structure type. The method is Markovian in the seqlets, permitting efficient exact calculation of the posterior probability distribution over all possible word segmentations and their tags by the Viterbi algorithm. The optimal segmentations and their tags are computed as the results of protein secondary structure prediction. The method is applied to predict the secondary structures of proteins from four organisms and is compared with the PHD method. The results show that the performance of this method is higher than that of PHD by about 3.9% in Q3 accuracy and 4.6% in SOV accuracy. Combining the method with locally similar protein sequences obtained by BLAST gives better predictions. The method is also tested on the 50 CASP5 target proteins, with a Q3 accuracy of 78.9% and an SOV accuracy of 77.1%. A web server for protein secondary structure prediction has been constructed and is available at http://www.insun.hit.edu.cn:81/demos/biology/index.html.
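The Q3 accuracy quoted in this abstract is the percentage of residues whose three-state label (helix H, strand E, coil C) is predicted correctly. A minimal sketch, assuming both structures are given as equal-length strings over H, E and C; the function name and the example strings are hypothetical.

```python
def q3_accuracy(predicted, observed):
    """Percentage of residues whose three-state label is predicted correctly."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    if not observed:
        raise ValueError("at least one residue is required")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Hypothetical eight-residue example with a single mismatch at the sixth residue.
print(q3_accuracy("HHHEECCC", "HHHEEECC"))
```

Q3 treats every residue independently, whereas the SOV score also reported in the abstract rewards correctly placed segments; SOV is not implemented here.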

11.
Prediction of protein tertiary structures from amino acid sequence and understanding the mechanisms of how proteins fold, collectively known as “the protein folding problem,” has been a grand challenge in molecular biology for over half a century. Theories have been developed that provide us with an unprecedented understanding of protein folding mechanisms. However, computational simulation of protein folding is still difficult, and prediction of protein tertiary structure from amino acid sequence is an unsolved problem. Progress toward a satisfying solution has been slow due to challenges in sampling the vast conformational space and deriving sufficiently accurate energy functions. Nevertheless, several techniques and algorithms have been adopted to overcome these challenges, and the last two decades have seen exciting advances in enhanced sampling algorithms, computational power and tertiary structure prediction methodologies. This review aims at summarizing these computational techniques, specifically conformational sampling algorithms and energy approximations that have been frequently used to study protein-folding mechanisms or to de novo predict protein tertiary structures. We hope that this review can serve as an overview on how the protein-folding problem can be studied computationally and, in cases where experimental approaches are prohibitive, help the researcher choose the most relevant computational approach for the problem at hand. We conclude with a summary of current challenges faced and an outlook on potential future directions.

12.
One of the classical DNA-binding proteins, bacteriophage lambda Cro, forms a homodimer with a unique fold of alpha-helices and beta-sheets. We have computationally designed an artificial sequence of 60 amino acid residues to stabilize the backbone tertiary structure of the lambda Cro dimer by simulated annealing using knowledge-based structure-sequence compatibility functions. The designed amino acid sequence has 25% identity with that of natural lambda Cro and preserves Phe58, which is important for formation of the stably folded structure of lambda Cro. The designed dimer protein and its monomeric variant, which was redesigned by the insertion of a beta-hairpin sequence at the C-terminal region to prevent dimerization, were synthesized and biochemically characterized to be well folded. The designed protein was monomeric under a wide range of protein concentrations and its solution structure was determined by NMR spectroscopy. The solved structure is similar to that of a monomeric variant of natural lambda Cro with a root-mean-square deviation of the polypeptide backbones at 2.1 Å and has a well-packed protein core. Thus, our knowledge-based functions provide approximate but essential relationships between amino acid sequences and protein structures, and are useful for finding novel sequences that are foldable into a given target structure.

13.
1 Introduction The prediction of protein structure and function from amino acid sequences is one of the most important problems in molecular biology. This problem is becoming more pressing as the number of known protein sequences is explored as a result of genome and other sequencing projects, and the protein sequence-structure gap is widening rapidly[1]. Therefore, computational tools to predict protein structures are needed to narrow the widening gap. Although the prediction of three dim…

14.
Gordon M. Crippen. Proteins 1996, 26(2): 167-171
To calculate the tertiary structure of a protein from its amino acid sequence, the thermodynamic approach requires a potential function of sequence and conformation that has its global minimum at the native conformation for many different proteins. Here we study the behavior of such functions for the simplest model system that still has some of the features of the protein folding problem, namely two-dimensional square lattice chain configurations involving two residue types. First we show that even the given contact potential, which by definition is used to identify the folding sequences and their unique native conformations, cannot always correctly select which sequences will fold to a given structure. Second, we demonstrate that the given contact potential is not always able to favor the native alignment of a native sequence on its own native conformation over other gapped alignments of different folding sequences onto that same conformation. Because of these shortcomings, even in this simple model system in which all conformations and all native sequences are known and determined directly by the given potential, we must reexamine our expectations for empirical potentials used for inverse folding and gapped alignment on more realistic representations of proteins. © 1996 Wiley-Liss, Inc.
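The model system in this abstract, a chain of two residue types on a two-dimensional square lattice scored by a contact potential, can be sketched as follows. The residue labels H and P and the potential values are assumptions borrowed from the common HP lattice model for illustration; the abstract does not state the potential actually used.

```python
def contact_energy(sequence, coords, potential):
    """Contact energy of a chain conformation on a 2D square lattice.

    sequence: string of residue types, one per chain position.
    coords: list of (x, y) lattice sites, one per residue, in chain order.
    potential: dict mapping a sorted pair of residue types to an energy.
    Only non-bonded residues on adjacent lattice sites contribute.
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    index_at = {site: i for i, site in enumerate(coords)}
    energy = 0.0
    for i, (x, y) in enumerate(coords):
        # Look only right and up so that each adjacent pair is counted once.
        for neighbour in ((x + 1, y), (x, y + 1)):
            j = index_at.get(neighbour)
            if j is not None and abs(i - j) > 1:
                pair = tuple(sorted((sequence[i], sequence[j])))
                energy += potential[pair]
    return energy


# Illustrative HP-style potential (assumed): only H-H contacts are favourable.
hp_potential = {("H", "H"): -1.0, ("H", "P"): 0.0, ("P", "P"): 0.0}
square = [(0, 0), (1, 0), (1, 1), (0, 1)]    # chain ends touch
straight = [(0, 0), (1, 0), (2, 0), (3, 0)]  # no non-bonded contacts
print(contact_energy("HPPH", square, hp_potential))
print(contact_energy("HPPH", straight, hp_potential))
```

Because short chains on a lattice have few enough conformations to enumerate exhaustively, a function like this is enough to check which conformation is the global minimum for a given sequence, which is the kind of test the abstract describes.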

15.
Yo Matsuo, Ken Nishikawa. Proteins 1995, 23(3): 370-375
A protein fold recognition method was tested by the blind prediction of the structures of a set of proteins. The method evaluates the compatibility of an amino acid sequence with a three-dimensional structure using the four evaluation functions: side-chain packing, solvation, hydrogen-bonding, and local conformation functions. The structures of 14 proteins containing 19 sequences were predicted. The predictions were compared with the experimental structures. The experimental results showed that 9 of the 19 target sequences have known folds or portions of known folds. Among them, the folds of Klebsiella aerogenes urease β subunit (KAUB) and pyruvate phosphate dikinase domain 4 (PPDK4) were successfully recognized; our method predicted that KAUB and PPDK4 would adopt the folds of macromomycin (Ig-fold) and phosphoribosylanthranilate isomerase:indoleglycerol-phosphate synthase (TIM barrel), respectively, and the experimental structure revealed that they actually adopt the predicted folds. The predictions for the other targets were not successful, but they often gave secondary structural patterns similar to those of the experimental structures. © 1995 Wiley-Liss, Inc.

16.
Designing a protein sequence that will fold into a predefined structure is of both practical and fundamental interest. Many successful computational designs in the last decade resulted from improved understanding of hydrophobic and polar interactions between side chains of amino acid residues in stabilizing protein tertiary structures. However, the coupling between main-chain backbone structure and local sequence has yet to be fully addressed. Here, we attempt to account for such coupling by using a sequence profile derived from the sequences of five-residue fragments in a fragment library that are structurally matched to the five-residue segments contained in a target structure. We further introduced a term to reduce low-complexity regions of designed sequences. These two terms, together with optimized reference states for amino-acid residues, were implemented in the RosettaDesign program. The new method, called RosettaDesign-SR, increases by 12 percentage points (from 34% to 46%) the fraction of proteins whose designed sequences are more than 35% identical to wild-type sequences. Meanwhile, it reduces by 8 percentage points (from 22% to 14%) the proportion of designed sequences that are not homologous to any known protein sequence according to PSI-BLAST. More importantly, the sequences designed by RosettaDesign-SR have 2–3% more polar residues at the surface and core regions of proteins, and these surface and core polar residues have about 4% higher sequence identity to wild-type sequences than those designed by RosettaDesign. Thus, the proteins designed by RosettaDesign-SR should be less likely to aggregate and more likely to have unique structures due to more specific polar interactions. Proteins 2010. © 2010 Wiley-Liss, Inc.

17.
All acyl carrier protein primary and tertiary structures were gathered into the ThYme database. They are classified into 16 families by amino acid sequence similarity, with members of the different families having sequences with statistically highly significant differences. These classifications are supported by tertiary structure superposition analysis. Tertiary structures from a number of families are very similar, suggesting that these families may come from a single distant ancestor. Normal vibrational mode analysis was conducted on experimentally determined freestanding structures, showing greater fluctuations at chain termini and loops than in most helices. Their modes overlap more so within families than between different families. The tertiary structures of three acyl carrier protein families that lacked any known structures were predicted as well.

18.
The primary structure of ribosomal protein L12 from Methanococcus vannielii has been determined by direct amino acid sequence analysis with automated liquid phase Edman degradation of the entire protein and manual 4-N,N'-dimethylaminoazobenzene-4'-isothiocyanate/phenylisothiocyanate sequencing of fragments obtained by enzymatic digestion and by partial acid hydrolysis. The knowledge of the amino acid sequences of these various fragments allowed the synthesis of two oligonucleotide probes complementary to the 5'- and the 3'-end of the gene, and they were used for hybridization with digested M. vannielii chromosomal DNA. Both oligonucleotide probes gave similar and clear hybridization signals. The plasmid pMvaX1 containing the entire gene of protein L12 was obtained. The nucleotide sequence complemented the partial amino acid sequence, and it is in full agreement with the protein sequence and the amino acid analysis. Comparison of secondary structural elements and hydrophobicity plots of the M. vannielii protein L12 with the known L12 sequences derived from other archaebacterial and eukaryotic sources shows strong homologies among these sequences. They contain an exceptionally highly conserved hydrophilic sequence area in the C-terminal part of the proteins. In comparison with eubacterial L12 proteins, the conservation is reduced to single amino acid residues. However, the eubacterial L12 proteins have hydrophilic regions similar to those of L12 from M. vannielii. These regions are predicted to be located at the surface of the proteins, as has been proven to be the case in crystallized Escherichia coli L12 protein. It is possible that the strongly conserved hydrophilic sequence regions form part of the factor-binding domain.

19.
Klepeis JL, Wei Y, Hecht MH, Floudas CA. Proteins 2005, 58(3): 560-570
Ab initio structure prediction and de novo protein design are two problems at the forefront of research in the fields of structural biology and chemistry. The goal of ab initio structure prediction of proteins is to correctly characterize the 3D structure of a protein using only the amino acid sequence as input. De novo protein design involves the production of novel protein sequences that adopt a desired fold. In this work, the results of a double-blind study are presented in which a new ab initio method was successfully used to predict the 3D structure of a protein designed through an experimental approach using binary patterned combinatorial libraries of de novo sequences. The predicted structure, which was produced before the experimental structure was known and without consideration of the design goals, and the final NMR analysis both characterize this protein as a 4-helix bundle. The similarity of these structures is evidenced by both small RMSD values between the coordinates of the two structures and a detailed analysis of the helical packing.

20.
Caenorhabditis elegans can serve as a model system to study telomere functions due to its similarity to higher organisms in telomere structures. We report here the identification of the nematode homeodomain protein CEH-37 as a telomere-binding protein using a yeast one-hybrid screen. The predicted three-dimensional model of the homeodomain of CEH-37, which has a typical helix-loop-helix structure, was similar to that of the Myb domain of known telomere-binding proteins, which is also a helix-loop-helix protein, despite little amino acid sequence similarity. We demonstrated the specific binding of CEH-37 to the nematode telomere sequences in vitro by competition assays. We determined that CEH-37 binding required at least 1.5 repeats of TTAGGC and that the core sequence for binding was GGCTTA. We found that CEH-37 had an ability to bend telomere sequence-containing DNA, which is the case for other known telomere-binding proteins such as TRF1 and RAP1, indicating that CEH-37 may be involved in establishing or maintaining a secondary structure of the telomeres in vivo. We also demonstrated that CEH-37 was primarily co-localized to the chromosome ends in vivo, indicating that CEH-37 may play roles in telomere functions. Consistent with this, a ceh-37 mutation resulting in a truncated protein caused a weak high incidence of male phenotype, which may have been caused by chromosome instability. The identification of CEH-37 as a telomere-binding protein may represent an evolutionary conservation of telomere-binding proteins in terms of tertiary protein structure rather than primary amino acid sequence.
