Found 20 similar articles (search time: 15 ms)
1.
Engineered, highly reactive substrates of microbial transglutaminase enable protein labeling within various secondary structure elements
Natalie M. Rachel Daniela Quaglia Éric Lévesque André B. Charette Joelle N. Pelletier 《Protein science : a publication of the Protein Society》2017,26(11):2268-2279
Microbial transglutaminase (MTG) is a practical tool to enzymatically form isopeptide bonds between peptide or protein substrates. This natural approach to crosslinking the side-chains of reactive glutamine and lysine residues is solidly rooted in food and textile processing. More recently, MTG's tolerance for various primary amines in lieu of lysine has revealed its potential for site-specific protein labeling with aminated compounds, including fluorophores. Importantly, MTG can label glutamines at accessible positions in the body of a target protein, setting it apart from most labeling enzymes that react exclusively at protein termini. To expand its applicability as a labeling tool, we engineered the B1 domain of Protein G (GB1) to probe the selectivity and enhance the reactivity of MTG toward its glutamine substrate. We built a GB1 library where each variant contained a single glutamine at positions covering all secondary structure elements. The most reactive and selective variants displayed a >100-fold increase in incorporation of a recently developed aminated benzo[a]imidazo[2,1,5-cd]indolizine-type fluorophore, relative to native GB1. None of the variants were destabilized. Our results demonstrate that MTG can react readily with glutamines in α-helical, β-sheet, and unstructured loop elements and does not favor one type of secondary structure. Introducing point mutations within MTG's active site further increased reactivity toward the most reactive substrate variant, I6Q-GB1, enhancing MTG's capacity to fluorescently label an engineered, highly reactive glutamine substrate. This work demonstrates that MTG-reactive glutamines can be readily introduced into a protein domain for fluorescent labeling.
2.
Constraint-based assembly of tertiary protein structures from secondary structure elements
A challenge in computational protein folding is to assemble secondary structure elements (helices and strands) into well-packed tertiary structures. Particularly difficult is the formation of beta-sheets from strands, because it involves a large conformational search at the same time as precise packing and hydrogen bonding. Here we describe a method, called Geocore-2, that (1) grows chains one monomer or secondary structure at a time; (2) disconnects the loops and performs a fast rigid-body docking step to achieve canonical packings; (3) in the case of intrasheet strand packing, adjusts the side-chain rotamers; and (4) reattaches the loops. Computational efficiency is enhanced by using a branch-and-bound search in which pruning rules aim to achieve a hydrophobic core and satisfactory hydrogen bonding patterns. We show that the pruning rules reduce computational time by 10³- to 10⁵-fold, and that this strategy is computationally practical at least for molecules up to about 100 amino acids long.
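The branch-and-bound idea in the abstract above can be illustrated with a much simpler toy problem. The sketch below is not Geocore-2: it assumes a two-letter (H = hydrophobic, P = polar) chain on a 2D square lattice, grows the chain one monomer at a time, and prunes any partial chain whose optimistic bound on hydrophobic contacts cannot beat the best complete fold found so far. The function name `best_hh_contacts`, the lattice model, and the bound are all illustrative assumptions.

```python
def best_hh_contacts(seq):
    """Maximum number of non-bonded H-H lattice contacts for an HP chain.

    Toy branch-and-bound on a 2D square lattice (an assumed model,
    not the Geocore-2 algorithm itself).
    """
    n = len(seq)
    if n == 0:
        return 0
    moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]

    # cap[i]: optimistic number of contacts residues i..n-1 could still add.
    # A newly placed residue has 4 lattice neighbours; one is its bonded
    # predecessor and, unless it is the last residue, one must stay free
    # for its successor. So it can gain at most 2 contacts (3 if last).
    cap = [0] * (n + 1)
    for i in range(n - 1, -1, -1):
        gain = 0
        if seq[i] == "H":
            gain = 3 if i == n - 1 else 2
        cap[i] = cap[i + 1] + gain

    best = 0
    occupied = {(0, 0): 0}  # lattice site -> residue index

    def grow(i, pos, score):
        nonlocal best
        if i == n:  # complete chain: record its score
            best = max(best, score)
            return
        if score + cap[i] <= best:  # bound: this branch cannot improve
            return
        for dx, dy in moves:
            nxt = (pos[0] + dx, pos[1] + dy)
            if nxt in occupied:  # keep the walk self-avoiding
                continue
            gained = 0
            if seq[i] == "H":
                for ex, ey in moves:
                    j = occupied.get((nxt[0] + ex, nxt[1] + ey))
                    # count only non-bonded hydrophobic neighbours
                    if j is not None and j < i - 1 and seq[j] == "H":
                        gained += 1
            occupied[nxt] = i
            grow(i + 1, nxt, score + gained)
            del occupied[nxt]

    grow(1, (0, 0), 0)
    return best
```

For example, `best_hh_contacts("HHPPHH")` explores folds of a six-residue chain and discards partial chains as soon as their bound falls to the current best. A tighter bound prunes more aggressively; the one used here is deliberately loose so that its validity is easy to check.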
3.
Furnham N Sillitoe I Holliday GL Cuff AL Laskowski RA Orengo CA Thornton JM 《PLoS computational biology》2012,8(3):e1002403
In order to understand the evolution of enzyme reactions and to gain an overview of biological catalysis we have combined sequence and structural data to generate phylogenetic trees in an analysis of 276 structurally defined enzyme superfamilies, and used these to study how enzyme functions have evolved. We describe in detail the analysis of two superfamilies to illustrate different paradigms of enzyme evolution. Gathering together data from all the superfamilies supports and develops the observation that they have all evolved to act on a diverse set of substrates, whilst the evolution of new chemistry is much less common. Despite that, by bringing together so much data, we can provide a comprehensive overview of the most common and rare types of changes in function. Our analysis demonstrates, on a larger scale than previously studied, that modifications in overall chemistry still occur, with all possible changes at the primary level of the Enzyme Commission (E.C.) classification observed to a greater or lesser extent. The phylogenetic trees map out the evolutionary route taken within a superfamily, as well as all the possible changes within a superfamily. This has been used to generate a matrix of observed exchanges from one enzyme function to another, revealing the scale and nature of enzyme evolution and that some types of exchanges between and within E.C. classes are more prevalent than others. Surprisingly, a large proportion (71%) of all known enzyme functions are performed by this relatively small set of 276 superfamilies. This reinforces the hypothesis that relatively few ancient enzymatic domain superfamilies were progenitors for most of the chemistry required for life.
4.
GeMMA (Genome Modelling and Model Annotation) is a new approach to automatic functional subfamily classification within families and superfamilies of protein sequences. A major advantage of GeMMA is its ability to subclassify very large and diverse superfamilies with tens of thousands of members, without the need for an initial multiple sequence alignment. Its performance is shown to be comparable to the established high-performance method SCI-PHY. GeMMA follows an agglomerative clustering protocol that uses existing software for sensitive and accurate multiple sequence alignment and profile–profile comparison. The produced subfamilies are shown to be equivalent in quality whether whole protein sequences are used or just the sequences of component predicted structural domains. A faster, heuristic version of GeMMA that also uses distributed computing is shown to maintain the performance levels of the original implementation. The use of GeMMA to increase the functional annotation coverage of functionally diverse Pfam families is demonstrated. It is further shown how GeMMA clusters can help to predict the impact of experimentally determining a protein domain structure on comparative protein modelling coverage, in the context of structural genomics.
5.
6.
MOTIVATION: The prediction of protein domains is a crucial task for functional classification, homology-based structure prediction and structural genomics. In this paper, we present the SSEP-Domain protein domain prediction approach, which is based on the application of secondary structure element alignment (SSEA) and profile-profile alignment (PPA) in combination with InterPro pattern searches. SSEA allows rapid screening for potential domain regions while PPA provides us with the necessary specificity for selecting significant hits. The combination with InterPro patterns allows finding domain regions without solved structural templates if sequence family definitions exist. RESULTS: A preliminary version of SSEP-Domain was ranked among the top-performing domain prediction servers in the CASP 6 and CAFASP 4 experiments. Evaluation of the final version shows further improvement over these results together with a significant speed-up. AVAILABILITY: The server is available at http://www.bio.ifi.lmu.de/SSEP/
7.
Background
In the process of protein evolution, sequence variations within protein families can cause changes in protein structures and functions. However, structures tend to be more conserved than sequences and functions. This leads to an intriguing question: what is the evolutionary mechanism by which sequence variations produce structural changes? To investigate this question, we focused on the most common types of sequence variations: amino acid substitutions and insertions/deletions (indels). Here their combined effects on protein structure evolution within protein families are studied.
Results
Sequence-structure correlation analysis on 75 homologous structure families (from SCOP) that contain 20 or more non-redundant structures shows that in most of these families there is, statistically, a bilinear correlation between the amount of substitutions and indels and the degree of structure variation. Bilinear regression of percent sequence non-identity (PNI) and standardized number of gaps (SNG) versus RMSD was performed. The coefficients from the regression analysis could be used to estimate the structure changes caused by each unit of substitution (structural substitution sensitivity, SSS) and by each unit of indel (structural indel sensitivity, SIDS). An analysis of 52 families with high bilinear fitting multiple correlation coefficients and statistically significant regression coefficients showed that SSS is mainly constrained by disulfide bonds, which have almost no effect on SIDS.
Conclusions
Structural changes in homologous protein families could be rationally explained by a bilinear model combining amino acid substitutions and indels. These results may further improve our understanding of the evolutionary mechanisms of protein structures.
8.
《Journal of molecular graphics》1994,12(2):146-152
A method of graduating (i.e., least-squares fitting) a smooth polynomial curve through long elements of protein secondary structure is described. It uses the Chebyshev polynomials of a discrete (integer) variable with several restraints to prevent artifactual curvatures. A new recursion formula is given which allows the evaluation of the polynomials on rational-number points as well as on the integer points. High-order splines suitable for interpolation between integer points are also discussed. The new method finds applications in graphics and in structural analysis.
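As a rough illustration of least-squares fitting with polynomials that are orthogonal on integer points, the sketch below builds monic orthogonal polynomials on the nodes 0..n-1 with a standard three-term recurrence (these agree with the discrete Chebyshev polynomials up to scaling) and projects one coordinate series onto them. It does not reproduce the paper's restraints, its new recursion formula, or the spline interpolation; the function name `discrete_orthogonal_fit` is an assumption made for this example.

```python
def discrete_orthogonal_fit(y, degree):
    """Least-squares polynomial fit of y[0..n-1] sampled at x = 0..n-1.

    Uses polynomials orthogonal under uniform weight on the integer
    nodes, generated by p_{k+1} = (x - a_k) p_k - b_k p_{k-1}.
    Returns the fitted values at the integer nodes.
    """
    n = len(y)
    if not 0 <= degree < n:
        raise ValueError("degree must satisfy 0 <= degree < len(y)")
    x = list(range(n))

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    p_prev = [0.0] * n   # p_{-1}, identically zero
    p = [1.0] * n        # p_0, the constant polynomial
    norm_prev = 1.0
    fitted = [0.0] * n

    for k in range(degree + 1):
        norm = dot(p, p)
        coeff = dot(y, p) / norm          # projection of the data onto p_k
        fitted = [f + coeff * pi for f, pi in zip(fitted, p)]

        # recurrence coefficients for the next polynomial
        a = dot([xi * pi for xi, pi in zip(x, p)], p) / norm
        b = norm / norm_prev if k > 0 else 0.0
        p_next = [(xi - a) * pi - b * qi for xi, pi, qi in zip(x, p, p_prev)]
        p_prev, p, norm_prev = p, p_next, norm

    return fitted
```

For a helix or strand axis, one would presumably apply the fit separately to the x, y, and z coordinates of successive residues. Because the basis is orthogonal, raising the degree adds one term without changing the lower-order coefficients.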
9.
In this work, we discovered a fundamental connection between selection for protein stability and emergence of preferred structures of proteins. Using a standard exact three-dimensional lattice model we evolve sequences starting from random ones and determine the exact native structure after each mutation. Acceptance of mutations is biased to select for stable proteins. We found that certain structures, "wonderfolds", are independently discovered numerous times as native states of stable proteins in many unrelated runs of selection. The strong dependence of lattice fold usage on the structural determinant of designability quantitatively reproduces uneven fold usage in natural proteins. Diversity of sequences that fold into wonderfold structures gives rise to superfamilies, i.e. sets of dissimilar sequences that fold into the same or very similar structures. The present work establishes a model of pre-biotic structure selection, which identifies dominant structural patterns emerging upon optimization of proteins for survival in a hot environment. Convergently discovered pre-biotic initial superfamilies with wonderfold structures could have served as a seed for subsequent biological evolution involving gene duplications and divergence.
10.
Eighteen subclasses of S-adenosyl-l-methionine (AdoMet) radical proteins have been aligned in the first bioinformatics study of the AdoMet radical superfamily to utilize crystallographic information. The recently resolved X-ray structure of biotin synthase (BioB) was used to guide the multiple sequence alignment, and the recently resolved X-ray structure of coproporphyrinogen III oxidase (HemN) was used as the control. Despite the low 9% sequence identity between BioB and HemN, the multiple sequence alignment correctly predicted all but one of the core helices in HemN, and correctly predicted the residues in the enzyme active site. This alignment further suggests that the AdoMet radical proteins may have evolved from half-barrel structures (αβ)4 to three-quarter-barrel structures (αβ)6 to full-barrel structures (αβ)8. It predicts that anaerobic ribonucleotide reductase (RNR) activase, an ancient enzyme that, it has been suggested, serves as a link between the RNA and DNA worlds, will have a half-barrel structure, whereas the three-quarter barrel, exemplified by HemN, will be the most common architecture for AdoMet radical enzymes, and fewer members of the superfamily will join BioB in using a complete (αβ)8 TIM-barrel fold to perform radical chemistry. These differences in barrel architecture also explain how AdoMet radical enzymes can act on substrates that range in size from 10 atoms to 608-residue proteins.
11.
Electrostatic interactions play a key role in enzyme catalytic function. At long range, electrostatics steer the incoming ligand/substrate to the active site, and at short distances, electrostatics provide the specific local interactions for catalysis. In cases in which electrostatics determine enzyme function, orthologs should share the electrostatic properties to maintain function. Often, electrostatic potential maps are employed to depict how conserved surface electrostatics preserve function. We expand on previous efforts to explain conservation of function, using novel electrostatic sequence and structure analyses of four enzyme families and one enzyme superfamily. We show that the spatial charge distribution is conserved within each family and superfamily. Conversely, phylogenetic analysis of key electrostatic residues provides the evolutionary origins of functionality.
12.
We describe a web server that provides easy access to the SLoop database of loop conformations connecting elements of protein secondary structure. The loops are classified according to their length, the type of bounding secondary structures and the conformation of the main chain. The current release of the database consists of over 8000 loops of up to 20 residues in length. A loop prediction method, which selects conformers on the basis of the sequence and the positions of the elements of secondary structure, is also implemented. These web pages are freely accessible over the internet at http://www-cryst.bioc.cam.ac.uk/~sloop.
13.
MOTIVATION: Protein structure comparison (PSC) has been used widely in studies of structural and functional genomics. However, PSC is computationally expensive and as a result almost all of the PSC methods currently in use look only for the optimal alignment and ignore many alternative alignments that are statistically significant and that may provide insight into protein evolution or folding. RESULTS: We have developed a new PSC method with efficiency to detect potentially viable alternative alignments in all-against-all database comparisons. The efficiency of the new PSC method derives from the ability to directly home in on a limited number of viable and ranked alignment solutions based on intuitively derived SSE (secondary structure element)-matching probabilities.
14.
M O Dayhoff 《Federation proceedings》1976,35(10):2132-2138
The organization of proteins into superfamilies based primarily on their sequences is introduced: examples are given of the methods used to cluster the related sequences and to elucidate the evolutionary history of the corresponding genes within each superfamily. Within the framework of this organization, the amount of sequence information currently and potentially available in all living forms can be discussed. The 116 superfamilies already sampled reflect possibly 10% of the total number. There are related proteins from many species in all of these superfamilies, suggesting that the origin of a new superfamily is rare indeed. The proteins so far sequenced are so rigorously conserved by the evolutionary process that we would expect to recognize as related descendants of any protein found in the ancestral vertebrate. The evolutionary history of the thyrotropin-gonadotropin beta chain superfamily is discussed in detail as an example. Some proteins are so constrained in structure that related forms can be recognized in prokaryotes and eukaryotes. Evolution in these superfamilies can be traced back close to the origin of life itself. From the evolutionary tree of the c-type cytochromes the identity of the prokaryote types involved in the symbiotic origin of mitochondria and chloroplasts begins to emerge.
15.
Evolution of protein superfamilies and bacterial genome size
We present the structural annotation of 56 different bacterial species based on the assignment of genes to 816 evolutionary superfamilies in the CATH domain structure database. These assignments have enabled us to analyse the recurrence of specific superfamilies within and across the genomes. We have selected the superfamilies that have a very broad representation and therefore appear to be universally distributed in a significant number of bacterial lineages. Occurrence profiles of these universally distributed superfamilies are compared with genome size in order to estimate the correlation between superfamily duplication and the increase in proteome size. This distinguishes between those size-dependent superfamilies where frequency of occurrence is highly correlated with increase in genome size, and size-independent superfamilies where no correlation is observed. Consideration of the size correlation and the ratio between the mean and the standard deviations for all the superfamily profiles allows more detailed subdivisions and classification of superfamilies. For example, within the size-independent superfamilies, we distinguished a group that is distributed evenly amongst all the genomes. Within the size-dependent superfamilies we differentiated two groups: linearly distributed and non-linearly distributed. Functional annotation using the COG database was performed for all superfamilies in each of these groups, and this revealed significant differences amongst the three sets of superfamilies. Evenly distributed, size-independent domains are shown to be involved primarily in protein translation and biosynthesis. For the size-dependent superfamilies, linearly distributed superfamilies are involved mainly in metabolism, and non-linearly distributed superfamily domains are involved principally in gene regulation.
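The comparison of occurrence profiles with genome size described above can be sketched as a plain correlation test. The example below computes a Pearson correlation between genome sizes and per-genome superfamily counts and applies a cutoff to label a superfamily as size-dependent or size-independent. The abstract does not state which correlation statistic or threshold was used, so the choice of Pearson's r, the cutoff of 0.7, and the function names are assumptions for illustration only.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no size signal
    return sxy / sqrt(sxx * syy)


def classify_superfamily(genome_sizes, counts, threshold=0.7):
    """Label an occurrence profile by its correlation with genome size.

    `threshold` is a hypothetical cutoff, not a value from the study.
    """
    r = pearson(genome_sizes, counts)
    return "size-dependent" if r >= threshold else "size-independent"
```

A superfamily whose copy number doubles whenever genome size doubles would be labelled size-dependent under this scheme, while one present at a constant copy number in every genome would be labelled size-independent. Separating linear from non-linear size dependence, as the abstract does, would need an additional model comparison that is not shown here.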
16.
Combining protein evolution and secondary structure
An evolutionary model that combines protein secondary structure and amino
acid replacement is introduced. It allows likelihood analysis of aligned
protein sequences and does not require the underlying secondary (or
tertiary) structures of these sequences to be known. One component of the
model describes the organization of secondary structure along a protein
sequence and another specifies the evolutionary process for each category
of secondary structure. A database of proteins with known secondary
structures is used to estimate model parameters representing these two
components. Phylogeny, the third component of the model, can be estimated
from the data set of interest. As an example, we employ our model to
analyze a set of sucrose synthase sequences. For the evolution of sucrose
synthase, a parametric bootstrap approach indicates that our model is
statistically preferable to one that ignores secondary structure.
17.
Zhenxing Feng Xiuzhen Hu Zhuo Jiang Hangyu Song Muhammad Aqeel Ashraf 《Saudi Journal of Biological Sciences》2016,23(2):189-197
The recognition of protein folds is an important step in the prediction of protein structure and function. Recently, an increasing number of researchers have sought to improve the methods for protein fold recognition. Following the construction of a dataset consisting of 27 protein fold classes by Ding and Dubchak in 2001, prediction algorithms, parameters and the construction of new datasets have improved for the prediction of protein folds. In this study, we reorganized a dataset consisting of 76 fold classes constructed by Liu et al. and used the values of the increment of diversity, average chemical shifts of secondary structure elements and secondary structure motifs as feature parameters in the recognition of multi-class protein folds. With the combined feature vector as the input parameter for the Random Forests algorithm and ensemble classification strategy, we propose a novel method to identify the 76 protein fold classes. The overall accuracy of the test dataset using an independent test was 66.69%; when the training and test sets were combined, with 5-fold cross-validation, the overall accuracy was 73.43%. This method was further used to predict the test dataset and the corresponding structural classification of the first 27-class protein fold dataset, resulting in overall accuracies of 79.66% and 93.40%, respectively. Moreover, when the training set and test sets were combined, the accuracy using 5-fold cross-validation was 81.21%. Additionally, this approach resulted in improved prediction results using the 27-class protein fold dataset constructed by Ding and Dubchak.
18.
Analysis of nucleotide substitutions and amino acid conservation in the Drosophila Adh genomic region
The homologous genomic region that contains two paralogous genes, Adh and Adh-dup, was compared in several Drosophila species. Sequences were analyzed as follows: a) At the nucleotide level, Ka and Ks values were determined for each pair of species. Ka-Adh and Ka-Adh-dup are not significantly different. However, Ks-Adh values are significantly lower than Ks-Adh-dup, which are more variable. In agreement with other reports, lower Ks values for Adh correlate with a high level of gene expression and relatively high percentage of G+C content in the third codon position, while the opposite applies to Adh-dup. b) At the protein level, amino acid comparisons reveal conserved regions shared by ADH and ADH-DUP, which have been assigned to known functional domains. Key residues for dehydrogenase function are also found in ADH-DUP, thus pointing to a dehydrogenase activity for ADH-DUP, albeit very different from that of ADH.
19.
We have developed an automatic algorithm STRIDE for protein secondary structure assignment from atomic coordinates based on the combined use of hydrogen bond energy and statistically derived backbone torsional angle information. Parameters of the pattern recognition procedure were optimized using designations provided by the crystallographers as a standard-of-truth. Comparison to the currently most widely used technique DSSP by Kabsch and Sander (Biopolymers 22:2577-2637, 1983) shows that STRIDE and DSSP assign secondary structural states in 58 and 31% of 226 protein chains in our data sample, respectively, in greater agreement with the specific residue-by-residue definitions provided by the discoverers of the structures while in 11% of the chains, the assignments are the same. STRIDE delineates every 11th helix and every 32nd strand more in accord with published assignments. © 1995 Wiley-Liss, Inc.
20.
Mathia Colwell Melissa Drown Kelly Showel Chelsea Drown Amanda Palowski 《Epigenetics》2018,13(1):49-60
Ultraconserved noncoding elements (UCNEs) constitute less than 1 Mb of vertebrate genomes and are impervious to accumulating mutations. About 4000 UCNEs exist in vertebrate genomes, each at least 200 nucleotides in length, sharing greater than 95% sequence identity between human and chicken. Despite extreme sequence conservation over 400 million years of vertebrate evolution, we show both ordered interspecies and within-species interindividual variation in DNA methylation in these regions. Here, we surveyed UCNEs with high CpG density in 56 species, finding half to be intermediately methylated and the remainder near 0% or 100%. Intermediately methylated UCNEs displayed a greater range of methylation between mouse tissues. In a human population, most UCNEs showed greater variation than the LINE1 transposon, a frequently used epigenetic biomarker. Global methylation was found to be inversely correlated to hydroxymethylation across 60 vertebrates. Within UCNEs, DNA methylation is flexible, conserved between related species, and relaxed from the underlying sequence selection pressure, while remaining heritable through speciation.