Similar articles
 20 similar articles found (search time: 15 ms)
1.
2.
Discovering structural correlations in alpha-helices.   Total citations: 5; self-citations: 2; citations by others: 3
We have developed a new representation for structural and functional motifs in protein sequences based on correlations between pairs of amino acids and applied it to alpha-helical and beta-sheet sequences. Existing probabilistic methods for representing and analyzing protein sequences have traditionally assumed conditional independence of evidence. In other words, amino acids are assumed to have no effect on each other. However, analyses of protein structures have repeatedly demonstrated the importance of interactions between amino acids in conferring both structure and function. Using Bayesian networks, we are able to model the relationships between amino acids at distinct positions in a protein sequence in addition to the amino acid distributions at each position. We have also developed an automated program for discovering sequence correlations using standard statistical tests and validation techniques. In this paper, we test this program on sequences from secondary structure motifs, namely alpha-helices and beta-sheets. In each case, the correlations our program discovers correspond well with known physical and chemical interactions between amino acids in structures. Furthermore, we show that, using different chemical alphabets for the amino acids, we discover structural relationships based on the same chemical principle used in constructing the alphabet. This new representation of 3-dimensional features in protein motifs, such as those arising from structural or functional constraints on the sequence, can be used to improve sequence analysis tools including pattern analysis and database search.
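The idea of testing pairs of sequence positions for dependence can be illustrated with a small sketch. The abstract says only that "standard statistical tests" were used, so the choice of a Pearson chi-square statistic here is an assumption, as are the function names and the toy sequences; the Bayesian-network modeling itself is not reproduced.

```python
from collections import Counter
from itertools import combinations


def chi_square(seqs, i, j):
    """Pearson chi-square statistic for independence of columns i and j."""
    n = len(seqs)
    pair = Counter((s[i], s[j]) for s in seqs)
    left = Counter(s[i] for s in seqs)
    right = Counter(s[j] for s in seqs)
    stat = 0.0
    for a in left:
        for b in right:
            # Count expected if the two positions were independent.
            expected = left[a] * right[b] / n
            observed = pair.get((a, b), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


def ranked_pairs(seqs):
    """Score every pair of positions and sort from most to least correlated."""
    length = len(seqs[0])
    scores = {
        (i, j): chi_square(seqs, i, j)
        for i, j in combinations(range(length), 2)
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical aligned fragments: positions 0 and 2 always carry opposite
# charges (E with K), while position 1 varies independently of both.
toy = ["EAK", "EAK", "KAE", "KAE", "ELK", "KLE"]
best_pair, best_score = ranked_pairs(toy)[0]
```

On this toy alignment the top-ranked pair is positions 0 and 2. A real analysis would also need a significance threshold and a correction for testing many pairs, which this sketch leaves out.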

3.
Screening of functional proteins from a random‐sequence library has been used to evolve novel proteins in the field of evolutionary protein engineering. However, random‐sequence proteins consisting of the 20 natural amino acids tend to aggregate, and the occurrence rate of functional proteins in a random‐sequence library is low. From the viewpoint of the origin of life, it has been proposed that primordial proteins consisted of a limited set of amino acids that could have been abundantly formed early during chemical evolution. We have previously found that members of a random‐sequence protein library constructed with five primitive amino acids show high solubility (Doi et al., Protein Eng Des Sel 2005;18:279–284). Although such a library is expected to be appropriate for finding functional proteins, the functionality may be limited, because they have no positively charged amino acid. Here, we constructed three libraries of 120‐amino acid, random‐sequence proteins using alphabets of 5, 12, and 20 amino acids by preselection using mRNA display (to eliminate sequences containing stop codons and frameshifts) and characterized and compared the structural properties of random‐sequence proteins arbitrarily chosen from these libraries. We found that random‐sequence proteins constructed with the 12‐member alphabet (including five primitive amino acids and positively charged amino acids) have higher solubility than those constructed with the 20‐member alphabet, though other biophysical properties are very similar in the two libraries. Thus, a library of moderate complexity constructed from 12 amino acids may be a more appropriate resource for functional screening than one constructed from 20 amino acids.

4.
The prediction of functional sites in newly solved protein structures is a challenge for computational structural biology. Most methods for approaching this problem use evolutionary conservation as the primary indicator of the location of functional sites. However, sequence conservation reflects not only evolutionary selection at functional sites to maintain protein function, but also selection throughout the protein to maintain the stability of the folded state. To disentangle sequence conservation due to protein functional constraints from sequence conservation due to protein structural constraints, we use all-atom computational protein design methodology to predict sequence profiles expected under solely structural constraints, and to compute the free energy difference between the naturally occurring amino acid and the lowest free energy amino acid at each position. We show that functional sites are more likely than non-functional sites to have computed sequence profiles which differ significantly from the naturally occurring sequence profiles and to have residues with sub-optimal free energies, and that incorporation of these two measures improves sequence-based prediction of protein functional sites. The combined sequence- and structure-based functional site prediction method has been implemented in a publicly available web server.

5.
6.
Protein folding involves the formation of secondary structural elements from the primary sequence and their association with tertiary assemblies. The relation of this primary sequence to a specific folded protein structure remains a central question in structural biology. An increasing body of evidence suggests that variations in homologous sequence ranging from point mutations to substantial insertions or deletions can yield stable proteins with markedly different folds. Here we report the structural characterization of domain IV (D4) and ΔD4 (polypeptides with 222 and 160 amino acids, respectively) that differ by virtue of an N-terminal deletion of 62 amino acids (28% of the overall D4 sequence). The high-resolution crystal structures of the monomeric D4 and the dimeric ΔD4 reveal substantially different folds despite an overall conservation of secondary structure. These structures show that the formation of tertiary structures, even in extended polypeptide sequences, can be highly context dependent, and they serve as a model for structural plasticity in protein isoforms.

7.
A detailed knowledge of a protein's functional site is an absolute prerequisite for understanding its mode of action at the molecular level. However, the rapid pace at which sequence and structural information is being accumulated for proteins greatly exceeds our ability to determine their biochemical roles experimentally. As a result, computational methods are required which allow for the efficient processing of the evolutionary information contained in this wealth of data, in particular that related to the nature and location of functionally important sites and residues. The method presented here, referred to as conserved functional group (CFG) analysis, relies on a simplified representation of the chemical groups found in amino acid side-chains to identify functional sites from a single protein structure and a number of its sequence homologues. We show that CFG analysis can fully or partially predict the location of functional sites in approximately 96% of the 470 cases tested and that, unlike other methods available, it is able to tolerate wide variations in sequence identity. In addition, we discuss its potential in a structural genomics context, where automation, scalability and efficiency are critical, and an increasing number of protein structures are determined with no prior knowledge of function. This is exemplified by our analysis of the hypothetical protein Ydde_Ecoli, whose structure was recently solved by members of the North East Structural Genomics consortium. Although the proposed active site for this protein needs to be validated experimentally, this example illustrates the scope of CFG analysis as a general tool for the identification of residues likely to play an important role in a protein's biochemical function. Thus, our method offers a convenient solution to rapidly and automatically process the vast amounts of data that are beginning to emerge from structural genomics projects.

8.
Structure-based prediction of DNA target sites by regulatory proteins   Total citations: 15; self-citations: 0; citations by others: 15
Kono H, Sarai A. Proteins 1999, 35(1): 114-131
Regulatory proteins play a critical role in controlling complex spatial and temporal patterns of gene expression in higher organisms, by recognizing multiple DNA sequences and regulating multiple target genes. Increasing amounts of structural data on protein-DNA complexes provide clues to the mechanism of target recognition by regulatory proteins. The analyses of the propensities of base-amino acid interactions observed in those structural data show that there is no one-to-one correspondence in the interaction, but clear preferences exist. On the other hand, the analysis of the spatial distribution of amino acids around bases shows that even those amino acids with strong base preference, such as Arg with G, are distributed in a wide space around bases. Thus, amino acids with many different geometries can form a similar type of interaction with bases. The redundancy and structural flexibility in the interaction suggest that there are no simple rules in the sequence recognition, and its prediction is not straightforward. However, the spatial distributions of amino acids around bases indicate a possibility that the structural data can be used to derive empirical interaction potentials between amino acids and bases. Such information extracted from structural databases has been successfully used to predict amino acid sequences that fold into particular protein structures. We surmised that the structures of protein-DNA complexes could be used to predict DNA target sites for regulatory proteins, because determining DNA sequences that bind to a particular protein structure should be similar to finding amino acid sequences that fold into a particular structure. Here we demonstrate that the structural data can be used to predict DNA target sequences for regulatory proteins. Pairwise potentials that determine the interaction between bases and amino acids were empirically derived from the structural data. These potentials were then used to examine the compatibility between DNA sequences and the protein-DNA complex structure in a combinatorial "threading" procedure. We applied this strategy to the structures of protein-DNA complexes to predict DNA binding sites recognized by regulatory proteins. To test the applicability of this method in target-site prediction, we examined the effects of cognate and noncognate binding, cooperative binding, and DNA deformation on the binding specificity, and predicted binding sites in real promoters and compared them with experimental data. These results show that target binding sites for several regulatory proteins are successfully predicted, and our data suggest that this method can serve as a powerful tool for predicting multiple target sites and target genes for regulatory proteins.

9.
10.
Dokholyan NV. Proteins 2004, 54(4): 622-628
Selecting a protein sequence that corresponds to a specific three-dimensional protein structure is known as the protein design problem. One principal bottleneck in solving this problem is our lack of knowledge of precise atomic interactions. Using a simple model of amino acid interactions, we determine three crucial factors that are important for solving the protein design problem. Among these factors is the protein alphabet, a set of sequence elements that encodes protein structure. Our model predicts that alphabet size is independent of protein length, suggesting the possibility of designing a protein of arbitrary length with the natural protein alphabet. We also find that protein alphabet size is governed by protein structural properties and the energetic properties of the protein alphabet units. We discover that the usage of average types of amino acid in proteins is less than expected if amino acids were chosen randomly with naturally occurring frequencies. We propose three possible scenarios that account for amino acid underusage in proteins. These scenarios suggest the possibility that amino acids themselves might not constitute the alphabet of natural proteins.

11.
As "the most abundant protein in the world," ribulose-1,5-bisphosphate carboxylase (RuBisCO) attracts the attention of genetic engineers and plant phylogeneticists. The active site, which is responsible for almost all carbon fixation on earth, is in the large subunit (LSU). Over 30% of the 476 amino acids in the LSU are involved in intermolecular associations. Using available sequence data, we find that 105 (22%) of the residues are absolutely conserved across 499 seed plants, with an additional 110 demonstrating only one change. Our analyses show that conserved domains are not fully explained by current structural data. This has several implications for systematic studies. First, the number of potentially variable sites is likely to be slightly over 1000, rather than 1428. Second, rates of change can vary greatly across the molecule; functional constraints on amino acids and codon biases greatly increase the potential for homoplasy. Third, some changes are correlated, and thus might be down-weighted accordingly. Fourth, some of the variation in RuBisCO may be adaptive and present insights into the nature of evolutionary change in response to the environment.

12.
La D, Kihara D. Proteins 2012, 80(1): 126-141
Protein-protein binding events mediate many critical biological functions in the cell. Typically, functionally important sites in proteins can be well identified by considering sequence conservation. However, protein-protein interaction sites exhibit higher sequence variation than other functional regions, such as catalytic sites of enzymes. Consequently, the mutational behavior leading to weak sequence conservation poses significant challenges to protein-protein interaction site prediction. Here, we present a phylogenetic framework to capture critical sequence variations that favor the selection of residues essential for protein-protein binding. Through the comprehensive analysis of diverse protein families, we show that protein binding interfaces exhibit distinct amino acid substitution patterns compared with other surface residues. On the basis of this analysis, we have developed a novel method, BindML, which utilizes the substitution models to predict protein-protein binding sites of proteins with unknown interacting partners. BindML estimates the likelihood that a phylogenetic tree of a local surface region in a query protein structure follows the substitution patterns of protein binding interfaces and nonbinding surfaces. BindML is shown to perform well compared to alternative methods for protein binding interface prediction. The methodology developed in this study is versatile in the sense that it can be generally applied for predicting other types of functional sites, such as DNA, RNA, and membrane binding sites in proteins.

13.
We use flexible backbone protein design to explore the sequence and structure neighborhoods of naturally occurring proteins. The method samples sequence and structure space in the vicinity of a known sequence and structure by alternately optimizing the sequence for a fixed protein backbone using rotamer-based sequence search, and optimizing the backbone for a fixed amino acid sequence using atomic-resolution structure prediction. We find that such a flexible backbone design method better recapitulates protein family sequence variation than sequence optimization on fixed backbones or randomly perturbed backbone ensembles for ten diverse protein structures. For the SH3 domain, the backbone structure variation in the family is also better recapitulated than in randomly perturbed backbones. The potential application of this method as a model of protein family evolution is highlighted by a concerted transition to the amino acid sequence in the structural core of one SH3 domain starting from the backbone coordinates of a homologous structure.

14.
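cDNA sequence analysis and higher-order protein structure prediction of alkaline phosphatase from Japanese flounder   Total citations: 1; self-citations: 0; citations by others: 1
To investigate the role of alkaline phosphatase (EC 3.1.3.1; ALP) in the development and metamorphosis of Japanese flounder (Paralichthys olivaceus), the full-length cDNA of the flounder ALP gene was cloned by RACE; the nucleotide sequence was then analyzed and the protein structure predicted using bioinformatics methods. The results show that the flounder ALP cDNA is 1,811 bp long and encodes a protein of 476 amino acids with a molecular weight of 52,293.1 and an isoelectric point of 7.67. The GC content of the coding region varies considerably among ALP homologs, being markedly higher in vertebrates than in invertebrates and bacteria. Molecular phylogenetic analysis shows that flounder ALP is highly homologous to the tissue-nonspecific ALPs of the green spotted pufferfish (Tetraodon nigroviridis) and zebrafish (Danio rerio), and that the molecular phylogenetic tree agrees with the species tree. Several important functional sites in the protein sequence, including metal-ion binding sites, N-glycosylation sites, and serine phosphorylation sites, are highly conserved. Flounder ALP and human placental ALP (PALP) share 43% similarity at the protein sequence level, and their 3D structures are very close. A comparison of amino acid spatial positions shows that the cysteines at positions 141 and 203 of flounder ALP correspond to the cysteines at positions 121 and 183 of human PALP and are inferred to form a disulfide bond. In the active centers of the two enzymes, the residues that bind the three metal ions are highly conserved: 2 of the 9 amino acids around the Zn ions differ, and only 2 of the 7 amino acids around the Mg ion differ, including a pair of similar residues, serine 155 and threonine 175.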

15.
Mooney SD, Liang MH, DeConde R, Altman RB. Proteins 2005, 61(4): 741-747
A primary challenge for structural genomics is the automated functional characterization of protein structures. We have developed a sequence-independent method called S-BLEST (Structure-Based Local Environment Search Tool) for the annotation of previously uncharacterized protein structures. S-BLEST encodes the local environment of an amino acid as a vector of structural property values. It has been applied to all amino acids in a nonredundant database of protein structures to generate a searchable structural resource. Given a query amino acid from an experimentally determined or modeled structure, S-BLEST quickly identifies similar amino acid environments using a K-nearest neighbor search. In addition, the method gives an estimation of the statistical significance of each result. We validated S-BLEST on X-ray crystal structures from the ASTRAL 40 nonredundant dataset. We then applied it to 86 crystallographically determined proteins in the protein data bank (PDB) with unknown function and with no significant sequence neighbors in the PDB. S-BLEST was able to associate 20 proteins with at least one local structural neighbor and identify the amino acid environments that are most similar between those neighbors.
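A K-nearest neighbor search over environment vectors, as described in the abstract above, can be sketched in a few lines. This is a brute-force illustration under stated assumptions: the Euclidean metric, the three-component vectors, and the residue labels are all hypothetical, and neither S-BLEST's actual structural properties nor its significance estimate is reproduced.

```python
import heapq
import math


def k_nearest(query, database, k=3):
    """Return the k (distance, label) pairs closest to query.

    Distance is plain Euclidean distance (an assumption of this sketch).
    """
    scored = (
        (math.dist(query, vector), label)
        for label, vector in database.items()
    )
    return heapq.nsmallest(k, scored)


# Hypothetical environment vectors keyed by invented residue labels.
environments = {
    "protA:SER195": (0.9, 0.1, 0.4),
    "protB:SER221": (1.0, 0.0, 0.5),
    "protC:LEU40": (0.1, 0.9, 0.2),
    "protD:GLY12": (0.5, 0.5, 0.9),
}

hits = k_nearest((1.0, 0.0, 0.4), environments, k=2)
```

A linear scan like this costs one distance computation per database entry, so a resource covering every residue in a structure database would normally sit behind a spatial index rather than a dictionary.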

16.
The specific functional structure of natural proteins is determined by the way in which amino acids are sequentially connected in the polypeptide. The tight sequence/structure relationship governing protein folding does not seem to apply to amyloid fibril formation because many proteins without any sequence relationship have been shown to assemble into very similar β-sheet-enriched structures. Here, we have characterized the aggregation kinetics, seeding ability, morphology, conformation, stability, and toxicity of amyloid fibrils formed by a 20-residue domain of the islet amyloid polypeptide (IAPP), as well as of a backward and scrambled version of this peptide. The three IAPP peptides readily aggregate into ordered, β-sheet-enriched, amyloid-like fibrils. However, the mechanism of formation and the structural and functional properties of aggregates formed from these three peptides are different in such a way that they do not cross-seed each other despite sharing a common amino acid composition. The results confirm that, as for globular proteins, highly specific polypeptide sequential traits govern the assembly pathway, final fine structure, and cytotoxic properties of amyloid conformations.

17.
18.
Earlier studies of a group of monoclonal antibody-resistant (mar) mutants of herpes simplex virus type 1 glycoprotein C (gC) operationally defined two distinct antigenic sites on this molecule, each consisting of numerous overlapping epitopes. In this report, we further define epitopes of gC by sequence analysis of the mar mutant gC genes. In 18 mar mutants studied, the mar phenotype was associated with a single nucleotide substitution and a single predicted amino acid change. The mutations were localized to two regions within the coding sequence of the external domain of gC and correlated with the two previously defined antigenic sites. The predicted amino acid substitutions of site I mutants resided between residues Gln-307 and Pro-373, whereas those of site II mutants occurred between amino acids Arg-129 and Glu-247. Of the 12 site II mutations, 9 induced amino acid substitutions within an arginine-rich segment of 8 amino acids extending from residues 143 to 151. The clustering of the majority of substituted residues suggests that they contribute to the structure of the affected sites. Moreover, the patterns of substitutions which affected recognition by antibodies with similar epitope specificities provided evidence that epitope structures are physically linked and overlap within antigenic sites. Of the nine epitopes defined on the basis of mutations, three were located within site I and six were located within site II. Substituted residues affecting the site I epitopes did not overlap substituted residues of site II, supporting our earlier conclusion that sites I and II reside in spatially distinct antigenic domains. A computer analysis of the distribution of charged residues and the predicted secondary structural features of wild-type gC revealed that the two antigenic sites reside within the most hydrophilic regions of the molecule and that the antigenic residues are likely to be organized as beta sheets which loop out from the surface of the molecule. Together, these data and our previous studies support the conclusion that the mar mutations identified by sequence analysis very likely occur within or near the epitope structures themselves. Thus, two highly antigenic regions of gC have now been physically and genetically mapped to well-defined domains of the protein molecule.

19.
Solis AD, Rackovsky S. Proteins 2000, 38(2): 149-164
In an effort to quantify loss of information in the processing of protein bioinformatic data, we examine how representations of amino acid sequence and backbone conformation affect the quantity of accessible structural information from local sequence. We propose a method to extract the maximum amount of peptide backbone structural information available in local sequence fragments, given a finite structural data set. Using methods of information theory, we develop an unbiased measure of local structural information that gauges changes in structural distributions when different representations of secondary structure and local sequence are used. We find that the manner in which backbone structure is represented affects the amount and quality of structural information that may be extracted from local sequence. Representations based on virtual bonds capture more structural information from local sequence than a three-state assignment scheme (helix/strand/loop). Furthermore, we find that amino acids show significant kinship with respect to the backbone structural information they carry, so that a collapse of the amino acid alphabet can be accomplished without severely affecting the amount of extractable information. This strategy is critical in optimizing the utility of a limited database of experimentally solved protein structures. Finally, we discuss the similarities within and differences between groups of amino acids in their roles in the local folding code and recognize specific amino acids critical in the formation of local structure.
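The claim that an amino acid alphabet can be collapsed without losing much structural information can be made concrete with mutual information. The sketch below uses the simple plug-in estimator, which is biased on small samples and is therefore not the unbiased measure the abstract refers to; the function names, the residue grouping, and the toy observations are assumptions made for illustration.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Plug-in estimate, in bits, of I(X; Y) from a list of (x, y) pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum(
        (count / n) * math.log2(count * n / (px[x] * py[y]))
        for (x, y), count in joint.items()
    )


def collapse(pairs, groups):
    """Replace each residue with its group label; ungrouped residues are kept."""
    return [(groups.get(residue, residue), state) for residue, state in pairs]


# Hypothetical (residue, structure state) observations: H = helix, C = coil.
observations = [("A", "H"), ("L", "H"), ("G", "C"), ("P", "C")]

# Hypothetical two-letter alphabet that merges residues with the same behavior.
reduced = {"A": "h", "L": "h", "G": "c", "P": "c"}

full_bits = mutual_information(observations)
reduced_bits = mutual_information(collapse(observations, reduced))
```

In this toy data the merged residues behave identically, so the reduced alphabet carries the same 1 bit as the full one. With real data the gap between the two values measures what a given grouping throws away.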

20.