Similar articles
1.
In this work, we discovered a fundamental connection between selection for protein stability and emergence of preferred structures of proteins. Using a standard exact three-dimensional lattice model we evolve sequences starting from random ones and determine the exact native structure after each mutation. Acceptance of mutations is biased to select for stable proteins. We found that certain structures, "wonderfolds", are independently discovered numerous times as native states of stable proteins in many unrelated runs of selection. The strong dependence of lattice fold usage on the structural determinant of designability quantitatively reproduces uneven fold usage in natural proteins. Diversity of sequences that fold into wonderfold structures gives rise to superfamilies, i.e. sets of dissimilar sequences that fold into the same or very similar structures. The present work establishes a model of pre-biotic structure selection, which identifies dominant structural patterns emerging upon optimization of proteins for survival in a hot environment. Convergently discovered pre-biotic initial superfamilies with wonderfold structures could have served as a seed for subsequent biological evolution involving gene duplications and divergence.

2.
What are the selective pressures on protein sequences during evolution? Amino acid residues may be highly conserved for functional or structural (stability) reasons. Theoretical studies have proposed that residues involved in the folding nucleus may also be highly conserved. To test this we are using an experimental "fold approach" to the study of protein folding. This compares the folding and stability of a number of proteins that share the same fold, but have no common amino acid sequence or biological activity. The fold selected for this study is the immunoglobulin-like beta-sandwich fold, which is a fold that has no specifically conserved function. Four model proteins are used from two distinct superfamilies that share the immunoglobulin-like fold, the fibronectin type III and immunoglobulin superfamilies. Here, the fold approach and protein engineering are used to question the role of a highly conserved tyrosine in the "tyrosine corner" motif that is found ubiquitously and exclusively in Greek key proteins. In the four model beta-sandwich proteins characterised here, the tyrosine is the only residue that is absolutely conserved at equivalent sites. By mutating this position to phenylalanine, we show that the tyrosine hydroxyl is not required to nucleate folding in the immunoglobulin superfamily, whereas it is involved to some extent in early structure formation in the fibronectin type III superfamily. The tyrosine corner is important for stability: mutation to phenylalanine costs between 1.5 and 3 kcal mol⁻¹. We propose that the high level of conservation of the tyrosine is related to the structural restraints of the loop connecting the beta-sheets, representing an evolutionary "cul-de-sac".

3.
Evolution of protein sequences and structures. (Cited by: 9 total; 0 self-citations; 9 by others)
The relationship between sequence similarity and structural similarity has been examined in 36 protein families with five or more diverse members whose structures are known. The structural similarity within a family (as determined with the DALI structure comparison program) is linearly related to sequence similarity (as determined by a Smith-Waterman search of the protein sequences in the structure database). The correlation between structural similarity and sequence similarity is very high; 18 of the 36 families had linear correlation coefficients r ≥ 0.878, and only nine had correlation coefficients r …

4.
The amino acid sequence identity and potential structural similarity between the subunits of bacterial luciferase and the recently determined structure of the luxF molecule are examined. The unique beta/alpha barrel fold found in luxF appears to be conserved in part in the luciferase subunits. From secondary structural predictions of both luciferase subunits, and from structural comparisons between the protein product of the luxF gene, NFP, and glycolate oxidase, we propose that it is feasible for both luciferase subunits to adopt a (beta alpha)8 barrel fold with at least 2 excursions from the (beta alpha)8 topology. Amino acids conserved between NFP and the luciferase subunits cluster together in 3 distinct "pockets" of NFP, which are located at hydrophobic interfaces between the beta-strands and alpha-helices. Several tight turns joining the C-termini of beta-strands and the N-termini of alpha-helices are found as key components of these conserved regions. Helix start and end points are easily demarcated in the luciferase subunit protein sequences; the N-cap residues are the most strongly conserved structural features. A partial model of the luciferase beta subunit from Photobacterium leiognathi has been built based on our crystallographically determined structure of luxF at 1.6 Å resolution.

5.
Histone Sequence Database: new histone fold family members. (Cited by: 2 total; 0 self-citations; 2 by others)
Searches of the major public protein databases with core and linker chicken and human histone sequences have resulted in the compilation of an annotated set of histone protein sequences. In addition, new database searches with two distinct motif search algorithms have identified several members of the histone fold family, including human DRAP1 and yeast CSE4. Database resources include information on conflicts between similar sequence entries in different source databases, multiple sequence alignments, links to the Entrez integrated information retrieval system, structures for histone and histone fold proteins, and the ability to visualize structural data through Cn3D. The database currently contains >1000 protein sequences, which are searchable by protein type, accession number, organism name, or any other free text appearing in the definition line of the entry. All sequences and alignments in this database are available through the World Wide Web at http://www.nhgri.nih.gov/DIR/GTB/HISTONES or http://www.ncbi.nlm.nih.gov/Baxevani/HISTONES

6.
Homology detection and protein structure prediction are central themes in bioinformatics. Establishing relationships between protein sequences, or predicting their structure, by sequence comparison methods is limited when sequence similarity is low. Recent work demonstrates that the use of profiles improves homology detection and protein structure prediction. Profiles can be inferred from protein multiple alignments using different approaches. The "Conservatism-of-Conservatism" is an effective profile analysis method to identify structural features between proteins having the same fold but no detectable sequence similarity. The information obtained from protein multiple alignments varies according to the amino acid classification employed to calculate the profile. In this work, we calculated entropy profiles from PSI-BLAST-derived multiple alignments and used different amino acid classifications summarizing almost 500 different attributes. These entropy profiles were converted into pseudocodes which were compared using the FASTA program with an ad-hoc matrix. We tested the performance of our method to identify relationships between proteins with similar fold using a nonredundant subset of sequences having less than 40% identity. We then compared our results, using Coverage Versus Error per query curves, to those obtained by methods like PSI-BLAST, COMPASS and HHSEARCH. Our method, named HIP (Homology Identification with Profiles), showed higher accuracy in detecting relationships between proteins with the same fold. The use of different amino acid classifications reflecting a large number of amino acid attributes improved the recognition of distantly related folds. We propose the use of pseudocodes representing profile information as a fast and powerful tool for homology detection, fold assignment and analysis of evolutionary information enclosed in protein profiles.
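
The entropy-profile idea in this abstract can be illustrated with a minimal sketch. It maps each alignment column onto a reduced amino acid alphabet and computes the Shannon entropy of the resulting class distribution. The four-class grouping `GROUPS`, the function names, and the toy alignment are assumptions made for illustration; they are not the classifications, pseudocode encoding, or data used by HIP.

```python
import math
from collections import Counter

# Hypothetical four-class grouping, chosen only for illustration.
GROUPS = {
    "hydrophobic": "AVLIMFWC",
    "polar": "STNQYGP",
    "positive": "KRH",
    "negative": "DE",
}
CLASS_OF = {aa: name for name, members in GROUPS.items() for aa in members}


def column_entropy(column):
    """Shannon entropy (bits) of one column after class mapping; gaps are ignored."""
    classes = [CLASS_OF[aa] for aa in column if aa in CLASS_OF]
    if not classes:
        return 0.0
    counts = Counter(classes)
    n = len(classes)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def entropy_profile(alignment):
    """One entropy value per column of an equal-length multiple alignment."""
    return [column_entropy(column) for column in zip(*alignment)]


# Toy alignment (invented): four aligned sequences, '-' marks a gap.
alignment = ["AKDL", "VKEL", "IRDS", "L-EM"]
profile = entropy_profile(alignment)
print([round(value, 3) for value in profile])
```

Columns whose residues all fall into one class get an entropy of zero even when the residues differ, which is why the choice of classification changes the information a profile carries.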

7.
8.
Here, we provide an analysis of molecular evolution of five of the most populated protein folds: immunoglobulin fold, oligonucleotide-binding fold, Rossman fold, alpha/beta plait, and TIM barrels. In order to distinguish between "historic", functional and structural reasons for amino acid conservations, we consider proteins that acquire the same fold and have no evident sequence homology. For each fold we identify positions that are conserved within each individual family and coincide when non-homologous proteins are structurally superimposed. As a baseline for statistical assessment we use the conservatism expected based on the solvent accessibility. The analysis is based on a new concept of "conservatism-of-conservatism". This approach allows us to identify the structural features that are stabilized in all proteins having a given fold, despite the fact that actual interactions that provide such stabilization may vary from protein to protein. Comparison with experimental data on thermodynamics, folding kinetics and function of the proteins reveals that such universally conserved clusters correspond to either: (i) super-sites (common location of active site in proteins having common tertiary structures but not function) or (ii) folding nuclei whose stability is an important determinant of folding rate, or both (in the case of Rossman fold). The analysis also helps to clarify the relation between folding and function that is apparent for some folds.

9.
Protein fold and structure in the truncated (2/2) globin family (Cited by: 1 total; 0 self-citations; 1 by others)
Nardini M, Pesce A, Milani M, Bolognesi M. Gene 2007, 398(1-2): 2-11
Analysis of amino acid sequences and protein folds has recently unraveled the structural bases and details of several proteins from the recently discovered "truncated hemoglobin" family. The analysis presented here, in agreement with previous surveys, shows that truncated hemoglobins can be classified in three main groups, based on their structural properties. Crystallographic analyses have shown that all three groups adopt a 2-on-2 alpha-helical sandwich fold, resulting from apparent editing of the classical 3-on-3 alpha-helical sandwich of vertebrate and invertebrate conventional globins. Specific structural features distinguish each of the three groups. Among these, a protein matrix tunnel system is typical of group I, a Trp residue at the G8 topological site is conserved in groups II and III, and TyrB10 is almost invariant through the three groups. A strongly intertwined network of hydrogen bonds stabilizes the heme bound ligand, despite variability of the heme distal residues observed in the different proteins considered. Details of ligand recognition in the three groups are discussed in the light of residue conservation and of differing ligand diffusion pathways to the heme. Based on structural analyses of the family-specific fold, we endorse a recent proposal to abandon the term "truncated hemoglobins", which does not properly represent the observed 2-on-2 alpha-helical sandwich fold, and to adopt the simple term "2/2Hb" to concisely address this protein family.

10.
11.
Accurately assigning folds for divergent protein sequences is a major obstacle to structural studies. Herein, we outline an effective method for fold recognition using sets of PSSMs, each of which is constructed for different protein folds. Our analyses demonstrate that FSL (Fold-specific Position Specific Scoring Matrix Libraries) can predict and relate the structures of highly divergent proteins given only their amino acid sequences. This ability to detect distant relationships is dependent on low-identity sequence alignments obtained from FSL. Results from our experiments demonstrate that FSL performs well in recognizing folds from the "twilight-zone" SABmark dataset. Further, this method is capable of accurate fold prediction in newly determined structures. We suggest that by building complete PSSM libraries for all unique folds within the Protein Data Bank (PDB), FSL can be used to rapidly and reliably annotate a large subset of protein folds at the proteomic level. The related programs and fold-specific PSSMs for our FSL are publicly available at: http://ccp.psu.edu/download/FSLv1.0/.
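
As a rough illustration of fold recognition with one scoring matrix per fold, the sketch below builds a log-odds PSSM from a small ungapped alignment and assigns a query to the fold whose matrix scores it highest. This is a simplified stand-in, not the FSL programs: the uniform background, the pseudocount, the function names, and the two toy "fold" alignments are all assumptions, and real PSSMs are typically derived from iterative database searches and gapped alignments.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Log-odds matrix (one dict per column) from an ungapped alignment.

    Assumes a uniform background frequency, purely for simplicity.
    """
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for column in zip(*alignment):
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2(((column.count(aa) + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score(pssm, sequence):
    """Sum of per-position log-odds for a sequence of the same length as the PSSM."""
    return sum(column[aa] for column, aa in zip(pssm, sequence))


def best_fold(libraries, sequence):
    """Name of the fold whose PSSM gives the query the highest score."""
    return max(libraries, key=lambda name: score(libraries[name], sequence))


# Invented toy alignments standing in for two fold-specific libraries.
libraries = {
    "fold_a": build_pssm(["ACDE", "ACDE", "ACDF"]),
    "fold_b": build_pssm(["WYKR", "WYKK", "WFKR"]),
}
print(best_fold(libraries, "ACDE"))
```

Keeping one matrix per fold means a query is compared against fold-level residue preferences rather than against any single sequence, which is the property the abstract relies on for divergent proteins.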

12.
Proteins that share even low sequence homologies are known to adopt similar folds. The beta-propeller structural motif is one such example. Identifying sequences that adopt a beta-propeller fold is useful to annotate protein structure and function. Often, tandem sequence repeats provide the necessary signal for identifying beta-propellers in proteins. In our recent analysis to identify cell surface proteins in archaeal and bacterial genomes, we identified some proteins that contain novel tandem repeats "LVIVD", "RIVW" and "LGxL". In this work, based on protein fold predictions and three-dimensional comparative modeling methods, we predicted that these repeat types fold as beta-propellers. Further, the evolutionary trace analysis of all proteins constituting amino acid sequence repeats in beta-propellers suggests that the novel repeats have diverged from a common ancestor.

13.
Many dissimilar protein sequences fold into similar structures. A central and persistent challenge facing protein structural analysis is the discrimination between homology and convergence for structurally similar domains that lack significant sequence similarity. Classic examples are the OB-fold and SH3 domains, both small, modular beta-barrel protein superfolds. The similarities among these domains have variously been attributed to common descent or to convergent evolution. Using a sequence profile-based phylogenetic technique, we analyzed all structurally characterized OB-fold, SH3, and PDZ domains with less than 40% mutual sequence identity. An all-against-all, profile-versus-profile analysis of these domains revealed many previously undetectable significant interrelationships. The matrices of scores were used to infer phylogenies based on our derivation of the relationships between sequence similarity E-values and evolutionary distances. The resulting clades of domains correlate remarkably well with biological function, as opposed to structural similarity, indicating that the functionally distinct sub-families within these superfolds are homologous. This method extends phylogenetics into the challenging "twilight zone" of sequence similarity, providing the first objective resolution of deep evolutionary relationships among distant protein families.

14.
Kinases are a ubiquitous group of enzymes that catalyze the phosphoryl transfer reaction from a phosphate donor (usually ATP) to a receptor substrate. Although all kinases catalyze essentially the same phosphoryl transfer reaction, they display remarkable diversity in their substrate specificity, structure, and the pathways in which they participate. In order to learn the relationship between structural fold and functional specificities in kinases, we have done a comprehensive survey of all available kinase sequences (>17,000) and classified them into 30 distinct families based on sequence similarities. Of these families, 19, covering nearly 98% of all sequences, fall into seven general structural folds for which three-dimensional structures are known. These fold groups include some of the most widespread protein folds, such as Rossmann fold, ferredoxin fold, ribonuclease H fold, and TIM beta/alpha-barrel. On the basis of this classification system, we examined the shared substrate binding and catalytic mechanisms as well as variations of these mechanisms in the same fold groups. Cases of convergent evolution of identical kinase activities occurring in different folds are discussed.

15.
Silva PJ. Proteins 2008, 70(4): 1588-1594
Hydrophobic cluster analysis (HCA) has long been used as a tool to detect distant homologies between protein sequences, and to classify them into different folds. However, it relies on expert human intervention, and is sensitive to subjective interpretations of pattern similarities. In this study, we describe a novel algorithm to assess the similarity of hydrophobic amino acid distributions between two sequences. Our algorithm correctly identifies as misattributions several HCA-based proposals of structural similarity between unrelated proteins present in the literature. We have also used this method to identify the proper fold of a large variety of sequences, and to automatically select the most appropriate structure for homology modeling of several proteins with low sequence identity to any other member of the protein data bank. Automatic modeling of the target proteins based on these templates yielded structures with TM-scores (vs. experimental structures) above 0.60, even without further refinement. Besides enabling a reliable identification of the correct fold of an unknown sequence and the choice of suitable templates, our algorithm also shows that whereas most structural classes of proteins are very homogeneous in hydrophobic cluster composition, a tenth of the described families are compatible with a large variety of hydrophobic patterns. We have built a browsable database of every major representative hydrophobic cluster pattern present in each structural class of proteins, freely available at http://www2.ufp.pt/ pedros/HCA_db/index.htm.

16.
Modeling the inherent flexibility of the protein backbone as part of computational protein design is necessary to capture the behavior of real proteins and is a prerequisite for the accurate exploration of protein sequence space. We present the results of a broad exploration of sequence space, with backbone flexibility, through a novel approach: large-scale protein design to structural ensembles. A distributed computing architecture has allowed us to generate hundreds of thousands of diverse sequences for a set of 253 naturally occurring proteins, allowing exciting insights into the nature of protein sequence space. Designing to a structural ensemble produces a much greater diversity of sequences than previous studies have reported, and homology searches using profiles derived from the designed sequences against the Protein Data Bank show that the relevance and quality of the sequences is not diminished. The designed sequences have greater overall diversity than corresponding natural sequence alignments, and no direct correlations are seen between the diversity of natural sequence alignments and the diversity of the corresponding designed sequences. For structures in the same fold, the sequence entropies of the designed sequences cluster together tightly. This tight clustering of sequence entropies within a fold and the separation of sequence entropy distributions for different folds suggest that the diversity of designed sequences is primarily determined by a structure's overall fold, and that the designability principle postulated from studies of simple models holds in real proteins. This has important implications for experimental protein design and engineering, as well as providing insight into protein evolution.

17.
The local environment of an amino acid in a folded protein determines the acceptability of mutations at that position. In order to characterize and quantify these structural constraints, we have made a comparative analysis of families of homologous proteins. Residues in each structure are classified according to amino acid type, secondary structure, accessibility of the side chain, and existence of hydrogen bonds from the side chains. Analysis of the pattern of observed substitutions as a function of local environment shows that there are distinct patterns, especially for buried polar residues. The substitution data tables are available on diskette with Protein Science. Given the fold of a protein, one is able to predict sequences compatible with the fold (profiles or templates) and potentially to discriminate between a correctly folded and misfolded protein. Conversely, analysis of residue variation across a family of aligned sequences in terms of substitution profiles can allow prediction of secondary structure or tertiary environment.

18.
To facilitate swift structural characterizations, structural genomic/proteomic projects need to divide large multi-domain proteins into structural domains and to determine their structures separately. Thus, the assignment of structural domains based solely on sequence information, especially on the physico-chemical properties of the amino acid sequences, could be very helpful for such projects. In this study, we examined the characteristics of domain linker sequences, which are loop sequences connecting two structural domains. To this end, we prepared a set of 101 non-redundant multi-domain protein sequences with known structures, and performed an analysis of the linker sequences. The analysis revealed that the frequencies of five (Pro, Gly, Asp, Asn, Lys) amino acid residues differed significantly between the linker and non-linker loop sequences. Moreover, we observed a similar deviation for the residue pair frequencies between the two types of loop sequences. Finally, we describe an automated method, based on the above analysis, to detect loops that have high probabilities of being domain linkers in a protein sequence.
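
The frequency comparison described in this abstract can be sketched as a per-residue log-ratio between linker and non-linker loops, which is then averaged over a candidate loop to score it. This is an illustrative assumption about how such a comparison might be coded, not the authors' automated method; the function names, the pseudocount, and the toy loop sequences are invented and do not come from the 101-protein data set.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def residue_frequencies(sequences, pseudocount=1.0):
    """Smoothed single-residue frequencies over a set of loop sequences."""
    counts = Counter("".join(sequences))
    total = sum(counts[aa] for aa in AMINO_ACIDS) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


def linker_propensity(linkers, other_loops):
    """log2 ratio of linker to non-linker frequency for each residue type."""
    f_linker = residue_frequencies(linkers)
    f_other = residue_frequencies(other_loops)
    return {aa: math.log2(f_linker[aa] / f_other[aa]) for aa in AMINO_ACIDS}


def loop_score(propensity, loop):
    """Mean propensity of a loop; higher means more linker-like under this toy model."""
    if not loop:
        return 0.0
    return sum(propensity.get(aa, 0.0) for aa in loop) / len(loop)


# Invented toy loops, for demonstration only.
linkers = ["PPGSP", "GPPTP"]
other_loops = ["GNGDK", "DGNKG"]
propensity = linker_propensity(linkers, other_loops)
print(round(loop_score(propensity, "PPP"), 3), round(loop_score(propensity, "DNK"), 3))
```

The same counting scheme extends to residue pairs by counting adjacent two-letter windows instead of single residues.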

19.
New equations are derived to estimate the number of amino acid substitutions per site between two homologous proteins from the root mean square (RMS) deviation between two spatial structures and from the fraction of identical residues between two sequences. The equations are based on evolutionary models, analyzing predominantly structural changes and not sequence changes. Evolution of spatial structure is treated as a diffusion in an elastic force field. Diffusion accounts for structural changes caused by amino acid substitutions, and elastic force reflects selection, which preserves protein fold. Obtained equations are supported by analysis of protein spatial structures. Received: 21 September 1995 / Accepted: 19 May 1997

20.
Structure-based prediction of DNA target sites by regulatory proteins (Cited by: 15 total; 0 self-citations; 15 by others)
Kono H, Sarai A. Proteins 1999, 35(1): 114-131
Regulatory proteins play a critical role in controlling complex spatial and temporal patterns of gene expression in higher organisms, by recognizing multiple DNA sequences and regulating multiple target genes. Increasing amounts of structural data on the protein-DNA complex provide clues for the mechanism of target recognition by regulatory proteins. The analyses of the propensities of base-amino acid interactions observed in those structural data show that there is no one-to-one correspondence in the interaction, but clear preferences exist. On the other hand, the analysis of spatial distribution of amino acids around bases shows that even those amino acids with strong base preference such as Arg with G are distributed in a wide space around bases. Thus, amino acids with many different geometries can form a similar type of interaction with bases. The redundancy and structural flexibility in the interaction suggest that there are no simple rules in the sequence recognition, and its prediction is not straightforward. However, the spatial distributions of amino acids around bases indicate a possibility that the structural data can be used to derive empirical interaction potentials between amino acids and bases. Such information extracted from structural databases has been successfully used to predict amino acid sequences that fold into particular protein structures. We surmised that the structures of protein-DNA complexes could be used to predict DNA target sites for regulatory proteins, because determining DNA sequences that bind to a particular protein structure should be similar to finding amino acid sequences that fold into a particular structure. Here we demonstrate that the structural data can be used to predict DNA target sequences for regulatory proteins. Pairwise potentials that determine the interaction between bases and amino acids were empirically derived from the structural data. These potentials were then used to examine the compatibility between DNA sequences and the protein-DNA complex structure in a combinatorial "threading" procedure. We applied this strategy to the structures of protein-DNA complexes to predict DNA binding sites recognized by regulatory proteins. To test the applicability of this method in target-site prediction, we examined the effects of cognate and noncognate binding, cooperative binding, and DNA deformation on the binding specificity, and predicted binding sites in real promoters and compared them with experimental data. These results show that target binding sites for several regulatory proteins are successfully predicted, and our data suggest that this method can serve as a powerful tool for predicting multiple target sites and target genes for regulatory proteins.
