Similar literature
 20 similar documents found (search time: 78 ms)
1.
The construction of fitness landscapes has broad implications for understanding molecular evolution, cellular epigenetic state, and protein structures. We studied the problem of constructing the fitness landscape of inverse protein folding, or protein design, with the aim of generating amino acid sequences that would fold into an a priori determined structural fold, which would enable the engineering of novel or enhanced biochemistry. For this task, an effective fitness function should allow identification of correct sequences that would fold into the desired structure. In this study, we showed that a nonlinear fitness function for protein design can be constructed using a rectangular kernel with a basis set of proteins and decoys chosen a priori. The full landscape for a large number of protein folds can be captured using only 480 native proteins and 3,200 non-protein decoys via a finite Newton method. A blind test of a simplified version of the fitness function for sequence design was carried out to discriminate simultaneously 428 native sequences not homologous to any training proteins from 11 million challenging protein-like decoys. This simplified function correctly classified 408 native sequences (20 misclassifications, 95% correct rate), outperforming several other statistical linear scoring functions and optimized linear functions. Our results further suggested that, for the task of global sequence design of 428 selected proteins, the search space of protein shape and sequence can be effectively parametrized with a carefully chosen basis set of only about 3,680 proteins and decoys, and we showed in addition that the overall landscape is not overly sensitive to the specific choice of this set. Our results can be generalized to construct other types of fitness landscapes.

2.
Diverse peptide sequences recognizing the lambda boxB RNA hairpin were previously isolated from a library encoding the 22-residue lambda N peptide with random amino acids at positions 13-22 using mRNA display. We have statistically analyzed amino acid distributions in 65 unique sequences from rounds 11 and 12 of this selection and evaluated the resulting structural and functional predictions by alanine-scanning mutagenesis and circular dichroism spectrometry. This artificial sequence family has a consensus structure that continues the bent alpha helix of lambda N up to position 17 when bound to lambda boxB. A charge pair (E(14)R(15)) and hydrophobic patch (A(21)L(22) or V(21)L(22)) have important functional roles in this context. Notably, amino acid covariance reveals six specific pairs of random region positions with >95% significant linkage and strong overall helical (i+1, i+3, and i+4) couplings. The covariance analysis suggests that (1) the sequence context of every residue in each insert has been optimized, (2) selected sequences are local optima on a rugged fitness landscape, and (3) it is possible to detect more subtle structural features with artificial protein sequence families than natural homologs. Our results provide a framework for investigating the structures of in vitro selected proteins by functional minimization, reselection, and covariance analysis.

3.
Protein function classification via support vector machine approach   (total citations: 2; self-citations: 0; citations by others: 2)
Support vector machine (SVM) is introduced as a method for the classification of proteins into functionally distinguished classes. Studies are conducted on a number of protein classes including RNA-binding proteins, protein homodimers, proteins responsible for drug absorption, proteins involved in drug distribution and excretion, and drug metabolizing enzymes. Testing accuracy for the classification of these protein classes is found to be in the range of 84-96%. This suggests the usefulness of SVM in the classification of protein functional classes and its potential application in protein function prediction.

4.
In plant genomes, the function of a substantial percentage of the putative protein-coding open reading frames (ORFs) is unknown. These ORFs have no significant sequence similarity to known proteins, which complicates the task of functional study of these proteins. Efforts are being made to explore methods that are complementary to, or may be used in combination with, sequence alignment and clustering methods. A web-based protein functional class prediction software, SVMProt, has shown some capability for predicting functional class of distantly related proteins. Here the usefulness of SVMProt for functional study of novel plant proteins is evaluated. To test SVMProt, 49 plant proteins (without a sequence homolog in the Swiss-Prot protein database, not in the SVMProt training set, and with functional indications provided in the literature) were selected from a comprehensive search of MEDLINE abstracts and Swiss-Prot databases in 1999-2004. These represent unique proteins the function of which, at present, cannot be confidently predicted by sequence alignment and clustering methods. The predicted functional class of 31 proteins was consistent, and that of four other proteins was weakly consistent, with published functions. Overall, the functional class of 71.4% of these proteins was consistent, or weakly consistent, with functional indications described in the literature. SVMProt shows a certain level of ability to provide useful hints about the functions of novel plant proteins with no similarity to known proteins.

5.
T Palzkill  D Botstein 《Proteins》1992,14(1):29-44
A new analytical mutagenesis technique is described that involves randomizing the DNA sequence of a short stretch of a gene (3-6 codons) and determining the percentage of all possible random sequences that produce a functional protein. A low percentage of functional random sequences in a complete library of random substitutions indicates that the region mutagenized is important for the structure and/or function of the protein. Repeating the mutagenesis over many regions throughout a protein gives a global perspective of which amino acid sequences in a protein are critical. We applied this method to 66 codons of the gene encoding TEM-1 beta-lactamase in 19 separate experiments. We found that TEM-1 beta-lactamase is extremely tolerant of amino acid substitutions: on average, 44% of all mutants with random substitutions function and 20% of the substitutions are expressed, secreted, and fold well enough to function at levels similar to those for the wild-type enzyme. We also found a few exceptional regions where only a few random sequences function. Examination of the X-ray structures of homologous beta-lactamases indicates that the regions most sensitive to substitution are in the vicinity of the active site pocket or buried in the hydrophobic core of the protein. DNA sequence analysis of functional random sequences has been used to obtain more detailed information about the amino acid sequence requirements for several regions and this information has been compared to sequence conservation among several related beta-lactamases.

6.
In multi‐domain proteins, the domains typically run end‐to‐end, that is, one domain follows the C‐terminus of another domain. However, approximately 10% of multi‐domain proteins are formed by insertion of one domain sequence into that of another domain. Detecting such insertions within protein sequences is a fundamental challenge in structural biology. The haloacid dehalogenase superfamily (HADSF) serves as a challenging model system wherein a variable cap domain (~5–200 residues in length) accessorizes the ubiquitous Rossmann‐fold core domain, with variations in insertion site and topology corresponding to different classes of cap types. Herein, we describe a comprehensive computational strategy, CapPredictor, for determining large, variable domain insertions in protein sequences. Using a novel sequence‐alignment algorithm in conjunction with a structure‐guided sequence profile from 154 core‐domain‐only structures, more than 40,000 HADSF member sequences were assigned cap types. The resulting data set afforded insight into HADSF evolution. Notably, a similar distribution of cap‐type classes across different phyla was observed, indicating that all cap types existed in the last universal common ancestor. In addition, comparative analyses of the predicted cap‐type and functional assignments showed that different cap types carry out similar chemistries. Thus, while cap domains play a role in substrate recognition and chemical reactivity, cap‐type does not strictly define functional class. Through this example, we have shown that CapPredictor is an effective new tool for the study of form and function in protein families where domain insertion occurs. Proteins 2014; 82:1896–1906. © 2014 Wiley Periodicals, Inc.

7.
Practical limits of function prediction   (total citations: 15; self-citations: 0; citations by others: 15)
Devos D  Valencia A 《Proteins》2000,41(1):98-107

8.
Although protein prenylation is widely studied, there are few good methods for isolating prenylated proteins from their nonprenylated relatives. We report that crosslinked agarose (e.g., Sepharose) chromatography medium that has been chemically functionalized with β-cyclodextrin (β-CD) is extremely effective in affinity chromatography of prenylated proteins. In this study, a variety of proteins with C-terminal prenylation target (“CAAX box”) sequences were enzymatically prenylated in vitro with natural and nonnatural prenyl diphosphate substrates. The prenylated protein products could then be isolated from starting materials by gravity chromatography or fast protein liquid chromatography (FPLC) on a β-CD-Sepharose column. One particular prenylation reaction, farnesylation of an mCherry-CAAX fusion construct, was studied in detail. In this case, purified farnesylated product was unambiguously identified by electrospray mass spectrometry. In addition, when mCherry-CAAX was prenylated with a nonnatural, functional isoprenoid substrate, the functional group was maintained by chromatography on β-CD-Sepharose, such that the resulting protein could be selectively bound at its C terminus to complementary functionality on a solid substrate. Finally, β-CD-Sepharose FPLC was used to isolate prenylated mCherry-CAAX from crude HeLa cell lysate as a model for purifying prenylated proteins from cell extracts. We propose that this method could be generally useful to the community of researchers studying protein prenylation.

9.
Kinch LN  Grishin NV 《Proteins》2002,48(1):75-84
Nitrogen regulatory (PII) proteins are signal transduction molecules involved in controlling nitrogen metabolism in prokaryotes. PII proteins integrate the signals of intracellular nitrogen and carbon status into the control of enzymes involved in nitrogen assimilation. Using elaborate sequence similarity detection schemes, we show that five clusters of orthologs (COGs) and several small divergent protein groups belong to the PII superfamily and predict their structure to be a (βαβ)2 ferredoxin-like fold. Proteins from the newly emerged PII superfamily are present in all major phylogenetic lineages. The PII homologs are quite diverse, with below random (as low as 1%) pairwise sequence identities between some members of distant groups. Despite this sequence diversity, evidence suggests that the different subfamilies retain the PII trimeric structure important for ligand-binding site formation and maintain a conservation of conservations at residue positions important for PII function. Because most of the orthologous groups within the PII superfamily are composed entirely of hypothetical proteins, our remote homology-based structure prediction provides the only information about them. Analogous to structural genomics efforts, such prediction gives clues to the biological roles of these proteins and allows us to hypothesize about locations of functional sites on model structures or rationalize about available experimental information. For instance, conserved residues in one of the families map in close proximity to each other on the PII structure, allowing for a possible metal-binding site in the proteins coded by the locus known to affect sensitivity to divalent metal ions. The presented analysis pushes the limits of sequence similarity searches and exemplifies one of the extreme cases of reliable sequence-based structure prediction.
In conjunction with structural genomics efforts to shed light on protein function, our strategies make it possible to detect homology between highly diverse sequences and are aimed at understanding the most remote evolutionary connections in the protein world.

10.
11.
Naturally occurring proteins comprise a special subset of all plausible sequences and structures selected through evolution. Simulating protein evolution with simplified and all-atom models has shed light on the evolutionary dynamics of protein populations, the nature of evolved sequences and structures, and the extent to which today's proteins are shaped by selection pressures on folding, structure and function. Extensive mapping of the native structure, stability and folding rate in sequence space using lattice proteins has revealed organizational principles of the sequence/structure map important for evolutionary dynamics. Evolutionary simulations with lattice proteins have highlighted the importance of fitness landscapes, evolutionary mechanisms, population dynamics and sequence space entropy in shaping the generic properties of proteins. Finally, evolutionary-like simulations with all-atom models, in particular computational protein design, have helped identify the dominant selection pressures on naturally occurring protein sequences and structures.

12.
Oliveira L  Paiva PB  Paiva AC  Vriend G 《Proteins》2003,52(4):544-552
We introduce sequence entropy-variability plots as a method of analyzing families of protein sequences, and demonstrate this for three well-known sequence families: globins, ras-like proteins, and serine-proteases. The location of an aligned residue position in the entropy-variability plot correlates with structural characteristics, and with known facts about the roles of individual amino acids in the function of these proteins. The large numbers of known sequences in these families allowed us to introduce new filtering methods for variability patterns. The results are discussed in terms of a simple evolutionary model for functional proteins.
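The coordinates of such a plot can be sketched as follows. This is only an illustration: the abstract does not define either quantity, so the sketch assumes that entropy is the Shannon entropy (natural logarithm) of the residue frequencies in an alignment column and that variability is the number of distinct residue types seen in that column. The function name `entropy_variability` and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def entropy_variability(alignment):
    """Return one (entropy, variability) pair per alignment column.

    Assumptions (not taken from the abstract): entropy is the Shannon
    entropy in nats, variability is the count of distinct residue types,
    and gap characters ('-') are ignored.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    points = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment if seq[col] != "-")
        total = sum(counts.values())
        entropy = 0.0
        for n in counts.values():
            p = n / total
            entropy -= p * math.log(p)
        points.append((entropy, len(counts)))
    return points


# Toy alignment: column 0 is fully conserved, column 2 is fully variable.
toy_alignment = ["ACD", "ACE", "AGF", "ACG"]
points = entropy_variability(toy_alignment)
```

Each pair in `points` corresponds to one aligned position; plotting entropy against variability for all positions would give the kind of scatter the abstract describes.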

13.
A new computer program (CORE) is described that predicts core hydrophobic sequences of predetermined target protein structures. A novel scoring function is employed, which for the first time incorporates parameters directly correlated to free energies of unfolding (ΔGu), melting temperatures (Tm), and cooperativity. Metropolis-driven simulated annealing and low-temperature Monte Carlo sampling are used to optimize this score, generating sequences predicted to yield uniquely folded, stable proteins with cooperative unfolding transitions. The hydrophobic core residues of four natural proteins were predicted using CORE with the backbone structure and solvent-exposed residues as input. In the two smaller proteins tested (Gβ1, 11 core amino acids; 434 cro, 10 core amino acids), the native sequence was regenerated as well as the sequences of known thermally stable variants that exhibit cooperative denaturation transitions. Previously designed sequences of variants with lower thermal stability and weaker cooperativity were not predicted. In the two larger proteins tested (myoglobin, 32 core amino acids; methionine aminopeptidase, 63 core amino acids), sequences with corresponding side-chain conformations remarkably similar to those of the native proteins were predicted.

14.
Mammalian odorant binding proteins   (total citations: 13; self-citations: 0; citations by others: 13)
Odorant binding proteins (OBPs) pertain to one of the most abundant classes of proteins found in the olfactory apparatus. OBPs are a sub-class of lipocalins, defined by their property of reversibly binding volatile chemicals, that we call 'odorants'. Numerous sequences of OBPs are now available, derived from protein sequencing from nasal mucus material, or from DNA sequences. The structural knowledge of OBPs has been improved too in recent years, with the availability of two X-ray structures. The physiological role of OBPs remains, however, essentially hypothetical, and most probably, not linked to a function of odor transport. The present knowledge on OBP biochemistry, sequence and structure will be examined here in relation to the different functional hypotheses proposed for OBPs.

15.
Protein sequences can be represented as binary patterns of polar (○) and nonpolar (●) amino acids. These binary sequence patterns are categorized into two classes: Class A patterns match the structural repeat of an idealized amphiphilic α-helix (3.6 residues per turn), and class B patterns match the structural repeat of an idealized amphiphilic β-strand (2 residues per turn). The difference between these two classes of sequence patterns has led to a strategy for de novo protein design based on binary patterning of polar and nonpolar amino acids. Here we ask whether similar binary patterning is incorporated in the sequences and structures of natural proteins. Analysis of the Protein Data Bank demonstrates the following. (1) Class A sequence patterns occur considerably more frequently in the sequences of natural proteins than would be expected at random, but class B patterns occur less often than expected. (2) Each pattern is found predominantly in the secondary structure expected from the binary strategy for protein design. Thus, class A patterns are found more frequently in α-helices than in β-strands, and class B patterns are found more frequently in β-strands than in α-helices. (3) Among the α-helices of natural proteins, the most commonly used binary patterns are indeed the class A patterns. (4) Among all β-strands in the database, the most commonly used binary patterns are not the expected class B patterns. (5) However, for solvent-exposed β-strands, the correlation is striking: All β-strands in the database that contain the class B patterns are exposed to solvent. (6) The bias of class A patterns for α-structure over β-structure and the bias of class B patterns for β-structure over α-structure are significant, not merely when compared to other binary patterns of polar (○) and nonpolar (●) amino acids, but also when compared to the full range of sequences in the database. The implications for the design of novel proteins are discussed.
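A rough illustration of binary patterning: the sketch below converts a sequence to a polar/nonpolar string and scores how well that string matches a given structural repeat. It rests on assumptions that do not come from the abstract: the residue set `NONPOLAR` is one illustrative convention, `P`/`N` stand in for the ○/● symbols, and the periodicity score is a simple hydrophobic-moment-style sum evaluated at 100° per residue (3.6 residues per turn) or 180° per residue (2 residues per turn). The authors' actual pattern definitions may differ.

```python
import cmath
import math

# Illustrative assumption: which residues count as nonpolar.
NONPOLAR = set("ACFILMVW")


def binary_pattern(sequence):
    """Map each residue to 'N' (nonpolar) or 'P' (polar)."""
    return "".join("N" if aa in NONPOLAR else "P" for aa in sequence.upper())


def periodicity_score(pattern, degrees_per_residue):
    """Score from 0 to 1 for how strongly the pattern repeats at one angle.

    Nonpolar positions contribute +1 and polar positions -1, each rotated
    by its position times the angle; the summed vector length is normalized
    by the pattern length.
    """
    if not pattern:
        return 0.0
    theta = math.radians(degrees_per_residue)
    total = sum(
        (1 if symbol == "N" else -1) * cmath.exp(1j * k * theta)
        for k, symbol in enumerate(pattern)
    )
    return abs(total) / len(pattern)


helix_like = "PNPPNNPPNPPNNP"   # nonpolar every three to four positions
strand_like = "NPNPNP"          # strict alternation

helix_at_100 = periodicity_score(helix_like, 100.0)
helix_at_180 = periodicity_score(helix_like, 180.0)
strand_at_100 = periodicity_score(strand_like, 100.0)
strand_at_180 = periodicity_score(strand_like, 180.0)
```

With these toy patterns, the helix-like string scores higher at 100° than at 180°, and the alternating string does the opposite, which is the distinction between class A and class B patterns that the abstract draws.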

16.
It is commonly believed that similarities between the sequences of two proteins imply similarities between their structures. Sequence alignments reliably recognize pairs of proteins with similar structures provided that the percentage sequence identity between their two sequences is sufficiently high. This distinction, however, is statistically less reliable when the percentage sequence identity is lower than 30%, and little is known then about the detailed relationship between the two measures of similarity. Here, we investigate the inverse correlation between structural similarity and sequence similarity on 12 protein structure families. We define the structure similarity between two proteins as the cRMS distance between their structures. The sequence similarity for a pair of proteins is measured as the mean distance between the sequences in the subsets of sequence space compatible with their structures. We obtain an approximation of the sequence space compatible with a protein by designing a collection of protein sequences both stable and specific to the structure of that protein. Using these measures of sequence and structure similarities, we find that structural changes within a protein family are linearly related to changes in sequence similarity.

17.
18.
Understanding the coupling specificity between G protein-coupled receptors (GPCRs) and specific classes of G proteins is important for further elucidation of receptor functions within a cell. Increasing information on GPCR sequences and the G protein family would facilitate prediction of the coupling properties of GPCRs. In this study, we describe a novel approach for predicting the coupling specificity between GPCRs and G proteins. This method uses not only GPCR sequences but also the functional knowledge generated by natural language processing, and can achieve 92.2% prediction accuracy by using the C4.5 algorithm. Furthermore, rules related to GPCR-G protein coupling are generated. The combination of sequence analysis and text mining improves the prediction accuracy for GPCR-G protein coupling specificity, and also provides clues for understanding GPCR signaling.

19.
MOTIVATION: Protein design aims to identify sequences compatible with a given protein fold but incompatible with any alternative fold. To select the correct sequences and to guide the search process, a design scoring function is critically important. Such a scoring function should be able to characterize the global fitness landscape of many proteins simultaneously. RESULTS: To find optimal design scoring functions, we introduce two geometric views and propose a formulation using a mixture of non-linear Gaussian kernel functions. We aim to solve a simplified protein sequence design problem. Our goal is to distinguish each native sequence for a major portion of representative protein structures from a large number of alternative decoy sequences, each a fragment from proteins of different folds. Our scoring function discriminates perfectly a set of 440 native proteins from 14 million sequence decoys. We show that no linear scoring function can succeed in this task. In a blind test of unrelated proteins, our scoring function misclassifies only 13 native proteins out of 194. This compares favorably with about three to four times more misclassifications when optimal linear functions reported in the literature are used. We also discuss how to develop a protein folding scoring function.
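To make the idea of a scoring function built from Gaussian kernels concrete, here is a minimal sketch. The abstract gives no formula, so the generic kernel expansion below, a weighted sum of Gaussian kernels centred on basis vectors, is an assumption, as are the names `gaussian_kernel` and `kernel_score`, the kernel width `sigma`, and the two-dimensional toy features. How the weights are obtained (the optimization step) is not shown.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel between two equal-length feature vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def kernel_score(x, basis, weights, bias=0.0, sigma=1.0):
    """Weighted sum of kernels between x and each basis vector.

    Hypothetical convention: a positive score means 'native-like',
    a negative score means 'decoy-like'.
    """
    return bias + sum(
        w * gaussian_kernel(x, b, sigma) for w, b in zip(weights, basis)
    )


# Toy basis: one native-like point (weight +1) and one decoy-like point
# (weight -1). Real feature vectors would describe a sequence on a structure.
basis = [(0.0, 0.0), (3.0, 4.0)]
weights = [1.0, -1.0]

near_native = kernel_score((0.1, 0.0), basis, weights)
near_decoy = kernel_score((3.0, 3.9), basis, weights)
```

Because the score depends on distances to basis points rather than on a single weight vector, the decision boundary can be curved, which is the property a linear scoring function lacks.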

20.
A robust tool is proposed for the rapid at-line verification of the identity and integrity of (recombinant) proteins, namely the hyphenation of multidimensional chromatography and mass spectrometry (MS). A recombinant human antibody produced in Chinese hamster ovary cells is taken as a pertinent example. The recombinant human antibody is first captured from the production environment by affinity chromatography (rProtein A, isolation/concentration of the target molecule) and automatically transferred to an enzyme reactor (immobilized trypsin column) for digestion, thereby yielding different peptides corresponding to the protein sequence. The peptides are then separated on a reversed-phase column before being analyzed and identified by MS. This step does not require a fine resolution since the mass spectrometer can identify a variety of substances at the same time. The results are then analyzed in silico with suitable bio-informatic tools. When the gene sequence of the protein product is known, proteolytic cleavages can be predicted and the exact mass and hence the amino acid sequence of each peptide can thereby be deduced. Fitting experimental data and reference peptide sequences then provides important information about the integrity of the protein and more particularly about its sequence. In our case, the integrity of 45% of the light and 75% of the heavy chain sequences of the antibody could be verified within minutes.
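The in silico step of predicting proteolytic cleavages and peptide masses can be sketched as below. The abstract does not name the tools used, so this is an illustration under stated assumptions: trypsin is modelled with the common rule of cleaving after K or R unless the next residue is P, missed cleavages and modifications are ignored, masses are unmodified monoisotopic masses, and the function names and the short test sequence are hypothetical.

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MASS = 18.01056


def tryptic_digest(sequence):
    """Split a sequence after K or R, except when followed by P.

    Simplifying assumption: complete digestion, no missed cleavages.
    """
    peptides = []
    start = 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def monoisotopic_mass(peptide):
    """Neutral monoisotopic mass: residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER_MASS


# Hypothetical sequence chosen only to exercise the cleavage rule.
peptides = tryptic_digest("AKRPGKC")
reference_masses = {p: monoisotopic_mass(p) for p in peptides}
```

Comparing measured peptide masses against a table like `reference_masses`, within an instrument-dependent tolerance, is one simple way to check which stretches of a known sequence are covered by the observed peptides.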
