Similar Articles
 Found 20 similar articles (search time: 31 ms)
1.
To study local structures in proteins, we previously developed an autoassociative artificial neural network (autoANN) and clustering tool to discover intrinsic features of macromolecular structures. The hidden unit activations computed by the trained autoANN are a convenient low-dimensional encoding of the local protein backbone structure. Clustering these activation vectors results in a unique classification of protein local structural features called Structural Building Blocks (SBBs). Here we describe the application of this method to a larger database of proteins, verification of its applicability to structure classification, and a subsequent analysis of amino acid frequencies and several commonly occurring patterns of SBBs. The SBB classification method has several interesting properties: 1) it identifies the regular secondary structures, α helix and β strand; 2) it consistently identifies other local structure features (e.g., helix caps and strand caps); 3) strong amino acid preferences are revealed at some positions in some SBBs; and 4) distinct patterns of SBBs occur in the "random coil" regions of proteins. Analysis of these patterns identifies interesting structural motifs in the protein backbone structure, indicating that SBBs can be used as "building blocks" in the analysis of protein structure. This type of pattern analysis should increase our understanding of the relationship between protein sequence and local structure, especially for the prediction of protein structures. © 1997 Wiley-Liss, Inc.
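The SBB step described above clusters low-dimensional hidden-unit activation vectors. As a minimal sketch, assuming the activations are already available as tuples (the trained autoANN itself is not reproduced here), plain k-means is used below as a stand-in; the abstract does not say which clustering algorithm the authors' tool uses, and the function name `kmeans` is illustrative.

```python
import math

def kmeans(points, k, iters=100):
    """Lloyd's k-means on a list of equal-length tuples (e.g. hidden-unit
    activation vectors). Returns (labels, centers)."""
    # Deterministic initialization: k points spread evenly through the input.
    step = max(1, len(points) // k)
    centers = [points[i * step] for i in range(k)]
    labels = []
    for _ in range(iters):
        # Assignment step: each vector goes to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centers.append(centers[c])  # an empty cluster keeps its old center
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers
```

Each resulting cluster label would play the role of one SBB class; a sequence of labels along the chain is then the kind of SBB pattern the abstract analyzes.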

2.
MOTIVATION: Characterization of a protein family by its distinct sequence domains is crucial for functional annotation and correct classification of newly discovered proteins. Conventional methods based on Multiple Sequence Alignment (MSA) run into difficulties when faced with heterogeneous groups of proteins. Moreover, even many protein families that do share a common domain contain instances of several other domains, without any common underlying linear ordering. Ignoring this modularity may lead to poor or even false classification results. An automated method that can decompose a group of proteins into the sequence domains it contains is therefore highly desirable. RESULTS: We apply a novel method to the problem of protein domain detection. The method takes as input an unaligned group of protein sequences. It segments them and clusters the segments into groups sharing the same underlying statistics. A Variable Memory Markov (VMM) model is built using a Prediction Suffix Tree (PST) data structure for each group of segments. Refinement is achieved by letting the PSTs compete over the segments, and a deterministic annealing framework infers the number of underlying PST models while avoiding many inferior solutions. We show that regions of similar statistics correlate well with protein sequence domains, by matching a unique signature to each domain. This is done in a fully automated manner and does not require or attempt an MSA. Several representative cases are analyzed. We identify a protein fusion event, refine an HMM superfamily classification into the underlying families the HMM cannot separate, and detect all 12 instances of a short domain in a group of 396 sequences. CONTACT: jill@cs.huji.ac.il; tishby@cs.huji.ac.il.
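To make the VMM/PST idea concrete, here is a minimal sketch, assuming a simplified model: it counts next-symbol frequencies for every context up to a fixed depth and predicts with the longest context seen often enough. It omits the authors' tree pruning, segment competition, and deterministic annealing; the class name `SimplePST` and its parameters are hypothetical.

```python
import math
from collections import defaultdict

class SimplePST:
    """Toy variable-memory Markov model. It counts which symbol follows every
    context (suffix) of length 0..max_depth and predicts with the longest
    context that was seen at least min_count times."""

    def __init__(self, alphabet, max_depth=3, min_count=2, pseudocount=0.5):
        self.alphabet = sorted(set(alphabet))
        self.max_depth = max_depth
        self.min_count = min_count
        self.pseudocount = pseudocount
        self.counts = defaultdict(lambda: defaultdict(int))  # context -> symbol -> count

    def fit(self, sequences):
        for seq in sequences:
            for i, sym in enumerate(seq):
                for d in range(min(self.max_depth, i) + 1):
                    self.counts[seq[i - d:i]][sym] += 1
        return self

    def _context(self, history):
        # Walk from the longest candidate suffix down to the empty context.
        for d in range(min(self.max_depth, len(history)), 0, -1):
            node = self.counts.get(history[-d:])
            if node and sum(node.values()) >= self.min_count:
                return history[-d:]
        return ""

    def prob(self, history, sym):
        node = self.counts.get(self._context(history), {})
        total = sum(node.values())
        # Additive smoothing so unseen symbols keep a small probability.
        return (node.get(sym, 0) + self.pseudocount) / (total + self.pseudocount * len(self.alphabet))

    def log_likelihood(self, seq):
        return sum(math.log(self.prob(seq[:i], s)) for i, s in enumerate(seq))
```

Scoring a segment under several such models and assigning it to the highest log-likelihood is a crude hard-assignment stand-in for the soft competition between PSTs described in the abstract.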

3.
Sawada Y  Honda S 《Biophysical journal》2006,91(4):1213-1223
The local structures of protein segments were classified and their distribution was analyzed to explore the structural diversity of proteins. Representative proteins were divided into short segments using a sliding L-residue window. Each set of local structures consisting of L = 1 to 31 consecutive amino acids was classified using a single-pass clustering method. The results demonstrate that the local structures of proteins are very unevenly distributed in the protein universe. The distribution of local structures of relatively long segments shows a power-law behavior that is well described by Zipf's law, implying that a protein structure possesses recursive and fractal characteristics. The degree of effective conformational freedom per residue, as well as the structure entropy per residue, decreases gradually with increasing L and then converges to constant values. This suggests that the number of protein conformations lies between 1.2^L and 1.5^L and that 10- to 20-residue segments are already proteinlike in terms of their structural diversity.
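The windowing and single-pass clustering steps can be sketched as follows, assuming each residue is represented by a numeric feature (for example a torsion angle) and that segments are compared by Euclidean distance against a fixed threshold; the paper's actual structural distance and threshold are not given in the abstract, and the function names are illustrative.

```python
import math

def sliding_windows(values, L):
    """All length-L windows of a per-residue feature list (one per start position)."""
    return [tuple(values[i:i + L]) for i in range(len(values) - L + 1)]

def single_pass_cluster(segments, threshold):
    """Leader clustering: each segment joins the first existing cluster whose
    representative lies within `threshold`; otherwise it founds a new cluster."""
    leaders, sizes, labels = [], [], []
    for seg in segments:
        for c, lead in enumerate(leaders):
            if math.dist(seg, lead) <= threshold:
                sizes[c] += 1
                labels.append(c)
                break
        else:
            leaders.append(seg)
            sizes.append(1)
            labels.append(len(leaders) - 1)
    return labels, sizes
```

Sorting `sizes` in decreasing order gives the rank-frequency data to which a Zipf-type power law would be fitted.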

4.
The construction of a realistic theoretical model of proteins is crucial for improving computational simulations of their structural and functional aspects. Modeling proteins as a network of non-covalent connections between the atoms of amino acid residues has yielded valuable insights into these macromolecules. The energy-related properties of protein structures are known to be very important in molecular dynamics. However, these same properties have been neglected when protein structures are modeled as networks of atoms and amino acid residues. A new approach for constructing protein models based on a network of atoms is presented. This method, based on interatomic interactions, takes into account energetic and geometric aspects of protein structures that were not used before, such as atomic occlusion inside the protein, solvation, and energy potentials for estimating the energies of interatomic non-covalent contacts. As a result, we obtained a more realistic network model of proteins. This model is also more robust to the various unknown variables that are usually estimated arbitrarily. We were able to determine the most connected residues of all the proteins studied, so we are now in a better position to study their structural role.
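Finding the most connected residues reduces to computing node degrees in a residue network. The minimal sketch below assumes a plain interatomic distance cutoff (4.5 Å here, a hypothetical value) instead of the energy potentials, occlusion, and solvation terms the abstract describes, and it does not exclude covalently bonded neighbors; the function names are illustrative.

```python
import math
from collections import defaultdict
from itertools import combinations

def residue_contact_degrees(atoms, cutoff=4.5):
    """atoms: list of (residue_id, (x, y, z)). Two residues are linked if any
    of their atoms lie within `cutoff` angstroms. Returns residue -> degree."""
    neighbors = defaultdict(set)
    for (ra, pa), (rb, pb) in combinations(atoms, 2):
        if ra != rb and math.dist(pa, pb) <= cutoff:
            neighbors[ra].add(rb)
            neighbors[rb].add(ra)
    residues = {r for r, _ in atoms}
    return {r: len(neighbors[r]) for r in residues}

def most_connected(degrees):
    """Residues sharing the maximum degree, sorted by id."""
    top = max(degrees.values())
    return sorted(r for r, d in degrees.items() if d == top)
```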

5.
G protein-coupled receptors (GPCRs) are ubiquitous and essential in modulating virtually all physiological processes. These receptors share a similar structural design consisting of seven transmembrane alpha-helical segments. The active conformations of the receptors are stabilized by an agonist and couple to structurally highly conserved heterotrimeric G proteins. One of the most important unanswered questions is how GPCRs couple to their cognate G proteins. Phototransduction represents an excellent model system for understanding G protein signaling, owing to the high expression of rhodopsin in rod photoreceptors and the multidisciplinary experimental approaches used to study this GPCR. Here, we describe how a G protein (transducin) docks onto an oligomeric GPCR (rhodopsin), revealing structural details of this critical interface in the signal transduction process. This conceptual model takes into account recent structural information on the receptor and G protein, as well as oligomeric states of GPCRs.

6.
Discovery of local packing motifs in protein structures   (Total citations: 1; self-citations: 0; citations by others: 1)
We present a language for describing structural patterns of residues in protein structures and a method for the discovery of such patterns that recur in a set of protein structures. The patterns impose restrictions on the spatial position of each residue, their order along the amino acid chain, and which amino acids are allowed in each position. Unlike other methods for comparing sets of protein structures, our method is not based on pairwise structure comparisons, which are often time-consuming and can produce inconsistent results. Instead, the method simultaneously takes into account information from all structures in the search for conserved structure patterns, which are potential structure motifs. The method is based on describing the spatial neighborhood of each residue in each structure as a string and applying a sequence pattern discovery method to find patterns common to subsets of these strings. Finally, it is checked whether the similarities between the neighborhood strings correspond to spatially similar substructures. We apply the method to analyze sets of very disparate proteins from four different protein families: serine proteases, cuprodoxins, cysteine proteinases, and ferredoxins. The motifs found by the method correspond well to the site and motif information given in the annotation of these proteins in PDB, Swiss-Prot, and PROSITE. Furthermore, the motifs are confirmed by using the motif data to constrain the structural alignment of the proteins obtained with the program SAP. This gave the best superposition/alignment of the proteins given the motif assignment.
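The encoding step, describing each residue's spatial neighborhood as a string, might look like the sketch below. It assumes one representative coordinate per residue and a hypothetical 6.5 Å radius, and it keeps only the chain order of the neighbors, whereas the authors' pattern language also constrains spatial positions. The resulting strings would then be fed to a sequence pattern discovery tool, which is not shown.

```python
import math

def neighborhood_strings(residues, radius=6.5):
    """residues: list of (one_letter_code, (x, y, z)) in chain order.
    For each residue, return the amino acids whose coordinates lie within
    `radius` of it (itself included), written in chain order."""
    out = []
    for _, center in residues:
        out.append("".join(aa for aa, pos in residues if math.dist(center, pos) <= radius))
    return out
```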

7.
A specific treatment of recurrent structural motifs that represent local bias information has proven to be an important ingredient in de novo protein structure prediction. A large majority of methods for local structure prediction are based on building blocks, which still suffer from their inherently discrete nature. Instead of using building blocks, this work presents a new protocol framework for local structural motif prediction based on direct location along the protein sequence and probabilistic sampling in a continuous (φ, ψ) space. The protein sequence was first scanned with a sliding-window algorithm of variable length (7 to 19 residues) to match local segments to one of 82 motif patterns in the fragment library. Identified segments were then labeled, and the correlations of their backbone torsion angles were modeled with a mixture of bivariate cosine distributions in continuous (φ, ψ) space. 3D conformations of the corresponding segments were finally sampled by applying a backtracking algorithm to the hidden Markov model with a single (φ, ψ) output. For local motifs in the 50 proteins of the test set, about 62% of the eight-residue segments located with high confidence were predicted by the method to within 1.5 Å of their native structures. The majority of local structural motifs were identified and sampled, which indicates that the proposed protocol may at least serve as a foundation for better protein tertiary structure prediction.
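The continuous (φ, ψ) model can be illustrated with a mixture of bivariate von Mises cosine-model components, assuming the usual parameterization exp(κ1 cos(φ − μ) + κ2 cos(ψ − ν) − κ3 cos(φ − μ − ψ + ν)). This sketch normalizes each component numerically on a grid and samples grid cells; it does not implement the paper's HMM backtracking, and the parameter values in the usage are hypothetical.

```python
import math
import random

def cosine_log_kernel(phi, psi, mu, nu, k1, k2, k3):
    """Unnormalized log-density of the bivariate von Mises cosine model."""
    return (k1 * math.cos(phi - mu) + k2 * math.cos(psi - nu)
            - k3 * math.cos(phi - mu - psi + nu))

def grid_mixture(components, weights, n=72):
    """Discretize a mixture of cosine-model components on an n x n (phi, psi)
    grid over [-pi, pi); each component is normalized numerically on the grid."""
    step = 2 * math.pi / n
    angles = [-math.pi + (i + 0.5) * step for i in range(n)]
    cells = [(a, b) for a in angles for b in angles]
    probs = [0.0] * len(cells)
    for w, params in zip(weights, components):
        dens = [math.exp(cosine_log_kernel(a, b, *params)) for a, b in cells]
        z = sum(dens)
        for i, d in enumerate(dens):
            probs[i] += w * d / z
    return cells, probs

def sample_phi_psi(cells, probs, k, seed=0):
    """Draw k (phi, psi) grid cells in proportion to their probability."""
    rng = random.Random(seed)
    return rng.choices(cells, weights=probs, k=k)
```

For example, a single helix-like component centered near (φ, ψ) = (−1.1, −0.75) radians with κ1 = κ2 = 10 and κ3 = 0 concentrates its mass in the grid cell nearest that center.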

8.
Inherent difficulties in growing protein crystals are major concerns within structural biology and particularly in structural proteomics. Here, we describe a novel approach of engineering target proteins by surface mutagenesis to increase the odds of crystallizing the molecules. To this end, we have exploited our recent triad-hypothesis using proteins with crystallographically defined beta-structures as the principal models. Crystal packing analyses of 182 protein structures belonging to 21 different superfamilies implied that the propensities to crystallize could be engineered into target proteins by replacing short segments, 5-6 residues, of their beta-strands with 'cassettes' of suitable packing motifs. These packing motifs will generate specific crystal packing interactions that promote crystallization. Key features of the primary and tertiary structures of such packing motifs have been identified for immunoglobulins. Further, packing motifs have been engineered successfully into six model antibodies without disturbing their capabilities to be produced, their immunoreactivity and their overall structure. Preliminary crystallization analyses have also been performed. Taken together, the procedures outline a rational protocol for crystallizing proteins by surface mutagenesis. The importance of these findings is discussed in relation to the crystallization of proteins in general.

9.
Here our goal is to carry out nanotube design using naturally occurring protein building blocks. Inspection of the protein structural database reveals the richness of the conformations of proteins, their parts, and their chemistry. Given a target functional nanotube geometry, our strategy involves scanning a library of candidate building blocks, combinatorially assembling them into the target shape, and testing its stability. Since self-assembly takes place on time scales not affordable for computations, here we propose a strategy for the very first step in protein nanotube design: we map the candidate building blocks onto a planar sheet and wrap the sheet around a cylinder with the target dimensions. We provide examples of three nanotubes, two peptide and one protein, in atomistic detail, for which experimental data exist. The nanotube models can be used to verify a nanostructure observed by low-resolution experiments and to study the mechanism of tube formation.
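The sheet-wrapping step is a simple geometric map. The sketch below assumes sheet coordinates (x, y, z) with x running around the future circumference, y along the tube axis, and z the height above the sheet; arc length along x becomes the angle x/R. The helper `closing_radius` is a hypothetical addition expressing that n repeats of width w close seamlessly when 2πR = n·w.

```python
import math

def wrap_onto_cylinder(points, radius):
    """Map planar sheet coordinates (x, y, z) onto a cylinder of `radius`."""
    wrapped = []
    for x, y, z in points:
        theta = x / radius  # arc length along the circumference -> angle
        r = radius + z      # height above the sheet adds to the radius
        wrapped.append((r * math.cos(theta), r * math.sin(theta), y))
    return wrapped

def closing_radius(n_units, unit_width):
    """Radius at which n_units sheet repeats of width unit_width close the tube."""
    return n_units * unit_width / (2 * math.pi)
```

Points lying in the sheet plane (z = 0) keep their arc-length spacing after wrapping, so the building-block geometry is distorted only by curvature.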

10.
A hybrid system (hidden neural network) based on a hidden Markov model (HMM) and neural networks (NN) was trained to predict the bonding states of cysteines in proteins starting from their residue sequences. Training was performed using 4136 cysteine-containing segments extracted from 969 non-homologous proteins of well-resolved 3D structure and without chain breaks. After a 20-fold cross-validation procedure, the prediction efficiency reaches 80% using neural networks based on evolutionary information. When the whole protein is taken into account by means of an HMM, a hybrid system is generated whose emission probabilities are computed from the NN output (hidden neural networks). In this case, the predictor accuracy increases to 88%. Further, when tested on a per-protein basis, the hybrid system correctly predicts 84% of the chains in the data set, a gain of at least 27% over the NN predictor.
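The gain from adding an HMM on top of per-cysteine NN outputs comes from decoding all cysteines of a chain jointly. This minimal sketch assumes a two-state (free/bonded) HMM whose emission score for "bonded" is the NN probability itself, with hypothetical transition and start probabilities; it is a Viterbi decoder, not the authors' trained hidden neural network, and it does not enforce an even number of bonded cysteines.

```python
import math

STATES = ("free", "bonded")

def viterbi_cysteines(nn_probs, trans, start, eps=1e-6):
    """nn_probs: per-cysteine probability of being bonded (from some classifier).
    trans[s][t], start[s]: hypothetical HMM probabilities. Returns best path."""
    def log_emit(state, p):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        return math.log(p if state == "bonded" else 1.0 - p)

    V = [{s: math.log(start[s]) + log_emit(s, nn_probs[0]) for s in STATES}]
    back = []
    for p in nn_probs[1:]:
        scores, ptr = {}, {}
        for t in STATES:
            best = max(STATES, key=lambda s: V[-1][s] + math.log(trans[s][t]))
            scores[t] = V[-1][best] + math.log(trans[best][t]) + log_emit(t, p)
            ptr[t] = best
        V.append(scores)
        back.append(ptr)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: V[-1][s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

With "sticky" transitions that favor staying in the same state, a borderline cysteine (NN probability 0.45) between confidently bonded neighbors is decoded as bonded; with flat transitions the decoder reduces to thresholding each NN output at 0.5.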

11.
Prediction of membrane segments in the sequences of membrane proteins is a well-known and important problem. The accuracy of methods that do not use a homology search in an additional databank can still be improved. Test data in this area are scarce because few real structures of membrane proteins are available. In this work, we create a test set of structural alignments of membrane proteins in which the positioning of the membrane segments is consistent with the known 3D structures of the aligned proteins. We propose a method for predicting the positions of membrane segments in a multiple alignment based on the forward-backward algorithm from HMM theory. This method not only predicts the positions of membrane segments but also produces a probabilistic membrane profile, which can be used in multiple alignment methods that take secondary structure information about the sequences into account. The method is implemented in a computer program available on the World Wide Web at http://bioinf.fbb.msu.ru/fwdbck/. It gives better results than the MEMSAT method, which is nearly the only tool for predicting membrane segments in multiple alignments without an additional homology search.
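The probabilistic membrane profile is the per-position posterior produced by the forward-backward algorithm. Below is a minimal scaled forward-backward sketch for a single sequence, assuming a toy two-state HMM (M = membrane, N = non-membrane) over a two-letter alphabet ("h" hydrophobic, "p" polar) with hypothetical parameters; the authors' method works on alignment columns with its own trained model.

```python
def forward_backward(obs, states, start, trans, emit):
    """Scaled forward-backward for a discrete HMM. Returns a list of dicts:
    posterior[i][s] = P(state s at position i | whole sequence)."""
    n = len(obs)
    alpha, scale = [], []
    for i, o in enumerate(obs):
        if i == 0:
            a = {s: start[s] * emit[s][o] for s in states}
        else:
            a = {t: sum(alpha[-1][s] * trans[s][t] for s in states) * emit[t][o]
                 for t in states}
        c = sum(a.values())  # per-position scaling factor prevents underflow
        scale.append(c)
        alpha.append({s: v / c for s, v in a.items()})
    beta = [None] * n
    beta[-1] = {s: 1.0 for s in states}
    for i in range(n - 2, -1, -1):
        b = {s: sum(trans[s][t] * emit[t][obs[i + 1]] * beta[i + 1][t] for t in states)
             for s in states}
        beta[i] = {s: v / scale[i + 1] for s, v in b.items()}
    posterior = []
    for i in range(n):
        g = {s: alpha[i][s] * beta[i][s] for s in states}
        z = sum(g.values())
        posterior.append({s: v / z for s, v in g.items()})
    return posterior
```

Reading off `posterior[i]["M"]` for every position gives a membrane-probability profile rather than a single hard segmentation.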

12.
We introduce a method for identifying elements of a protein structure that can be shuffled to make chimeric proteins from two or more homologous parents. Formulating recombination as a graph‐partitioning problem allows us to identify noncontiguous segments of the sequence that should be inherited together in the progeny proteins. We demonstrate this noncontiguous recombination approach by constructing a chimera of β‐glucosidases from two different kingdoms of life. Although the protein's alpha–beta barrel fold has no obvious subdomains for recombination, noncontiguous SCHEMA recombination generated a functional chimera that takes approximately half its structure from each parent. The X‐ray crystal structure shows that the structural blocks that make up the chimera maintain the backbone conformations found in their respective parental structures. Although the chimera has lower β‐glucosidase activity than the parent enzymes, the activity was easily recovered by directed evolution. This simple method, which does not rely on detailed atomic models, can be used to design chimeras that take structural, and functional, elements from distantly‐related proteins.
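The quantity a SCHEMA-style partition tries to keep small is the number of broken contacts. This sketch assumes the commonly used definition: a contact (i, j) counts as disrupted when the chimera's amino acid pair at (i, j) occurs at those positions in no parent. Sequences are assumed aligned, the contact list is given (typically from a distance cutoff in a parent structure), and the graph-partitioning search itself is not shown; the toy sequences are hypothetical.

```python
def schema_disruption(chimera, parents, contacts):
    """Count contacts (i, j) whose residue pair in the chimera is found at
    (i, j) in none of the aligned parent sequences."""
    E = 0
    for i, j in contacts:
        pair = (chimera[i], chimera[j])
        if all((p[i], p[j]) != pair for p in parents):
            E += 1
    return E

def make_chimera(parents, assignment):
    """assignment[k] = index of the parent supplying position k; positions
    inherited from one parent need not be contiguous in sequence."""
    return "".join(parents[a][k] for k, a in enumerate(assignment))
```

In the toy case parents = ["AKLV", "SRMI"] with contacts (0, 3) and (1, 2), the contiguous split [0, 0, 1, 1] breaks both contacts, while the noncontiguous assignment [0, 1, 1, 0] keeps each contacting pair from a single parent and breaks none.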

13.
Prediction of transmembrane (TM) segments in the amino acid sequences of membrane proteins is a well-known and very important problem. The accuracy of its solution can be improved for approaches that do not use a homology search in an additional databank. There is a lack of test data in this area of research, because information on the structure of membrane proteins is scarce. In this work we created a test sample of structural alignments for membrane proteins. The TM segments of these proteins were mapped according to the aligned 3D structures resolved for these proteins. A method for predicting TM segments in an alignment was developed on the basis of the forward-backward algorithm from HMM theory. This method allows a user not only to predict TM segments but also to create a probabilistic membrane profile, which can be employed in multiple alignment procedures that take the secondary structure of proteins into account. The method was implemented in a computer program available at http://bioinf.fbb.msu.ru/fwdbck/. It provides better results than the MEMSAT method, which is nearly the only tool for predicting TM segments in multiple alignments without a homology search.

14.
Cooperative unfolding penalties are calculated by statistically evaluating an ensemble of denatured states derived from native structures. The ensemble of denatured states is determined by dividing the native protein into short contiguous segments and defining all possible combinations of native, i.e., interacting, and non-native, i.e., non-interacting, segments. We use a novel knowledge-based scoring function, derived from a set of non-homologous proteins in the Protein Data Bank, to describe the interactions among residues. This procedure is used for the structural identification of cooperative folding cores for four globular proteins: bovine pancreatic trypsin inhibitor, horse heart cytochrome c, French bean plastocyanin, and staphylococcal nuclease. The theoretical folding units are shown to correspond to regions that exhibit enhanced stability against denaturation as determined from experimental hydrogen exchange protection factors. Using a sequence similarity score for related sequences, we show that, in addition to residues necessary for enzymatic function, those amino acids comprising structurally important folding cores are also preferentially conserved during evolution. This implies that the identified folding cores may be part of an array of fundamental structural folding units.
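The denatured-state ensemble described above can be enumerated exhaustively for a small number of segments. This minimal sketch assumes a toy energy: a pairwise interaction energy between every pair of native segments, minus a per-segment gain for each non-native segment, weighted by exp(−E/kT) with kT ≈ 0.593 kcal/mol (298 K). The knowledge-based scoring function of the paper is not reproduced, and all energies are hypothetical.

```python
import math
from itertools import product

def native_probabilities(pair_energy, unfold_gain, kT=0.593):
    """Enumerate all 2**n native (1) / non-native (0) assignments of n segments.
    E(state) = sum of pair_energy[i][j] (i < j) over pairs of native segments
               - sum of unfold_gain[i] over non-native segments.
    Returns P(segment i is native) under Boltzmann weights exp(-E/kT)."""
    n = len(unfold_gain)
    Z = 0.0
    native_weight = [0.0] * n
    for state in product((0, 1), repeat=n):
        E = sum(pair_energy[i][j] for i in range(n) for j in range(i + 1, n)
                if state[i] and state[j])
        E -= sum(unfold_gain[i] for i in range(n) if not state[i])
        w = math.exp(-E / kT)
        Z += w
        for i in range(n):
            if state[i]:
                native_weight[i] += w
    return [w / Z for w in native_weight]
```

Segments that interact strongly with each other stay native together across most of the ensemble, which is the sense in which they form a cooperative folding core; an isolated segment is mostly non-native. The 2**n enumeration is only practical for a handful of segments.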

15.
Structural genomics projects envision almost routine protein structure determinations, which are currently imaginable only for small proteins with molecular weights below 25,000 Da. For larger proteins, structural insight can be obtained by breaking them into small segments of amino acid sequences that can fold into native structures, even when isolated from the rest of the protein. Such segments are autonomously folding units (AFU) and have sizes suitable for fast structural analyses. Here, we propose to expand an intuitive procedure often employed for identifying biologically important domains to an automatic method for detecting putative folded protein fragments. The procedure is based on the recognition that large proteins can be regarded as a combination of independent domains conserved among diverse organisms. We thus have developed a program that reorganizes the output of BLAST searches and detects regions with a large number of similar sequences. To automate the detection process, it is reduced to a simple geometrical problem of recognizing rectangular shaped elevations in a graph that plots the number of similar sequences at each residue of a query sequence. We used our program to quantitatively corroborate the premise that segments with conserved sequences correspond to domains that fold into native structures. We applied our program to a test data set composed of 99 amino acid sequences containing 150 segments with structures listed in the Protein Data Bank, and thus known to fold into native structures. Overall, the fragments identified by our program have an almost 50% probability of forming a native structure, and comparable results are observed with sequences containing domain linkers classified in SCOP. Furthermore, we verified that our program identifies AFU in libraries from various organisms, and we found a significant number of AFU candidates for structural analysis, covering an estimated 5 to 20% of the genomic databases.
Altogether, these results argue that methods based on sequence similarity can be useful for dissecting large proteins into small autonomously folding domains, and such methods may provide an efficient support to structural genomics projects.
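Detecting "rectangular elevations" in the per-residue count of similar sequences can be approximated by finding long runs above a height threshold. The sketch below assumes BLAST hits are already reduced to half-open (start, end) ranges on the query, and that a plateau is any run of at least `min_length` residues with coverage of at least `min_height`; both thresholds and the function names are hypothetical simplifications of the authors' program.

```python
def coverage_from_hits(length, hits):
    """hits: (start, end) half-open query ranges covered by each similar sequence."""
    cov = [0] * length
    for s, e in hits:
        for k in range(s, e):
            cov[k] += 1
    return cov

def plateau_regions(coverage, min_height, min_length):
    """Return (start, end) half-open ranges where coverage stays at or above
    min_height for at least min_length consecutive residues."""
    regions, start = [], None
    for i, c in enumerate(list(coverage) + [float("-inf")]):  # sentinel closes a final run
        if c >= min_height:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_length:
                regions.append((start, i))
            start = None
    return regions
```

Each returned range is a candidate autonomously folding unit to be checked against known structures.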

16.
RNA motifs can be defined broadly as recurrent structural elements containing multiple intramolecular RNA-RNA interactions, as observed in atomic-resolution RNA structures. They constitute the modular building blocks of RNA architecture, which is organized hierarchically. Recent work has focused on analyzing RNA backbone conformations to identify, define and search for new instances of recurrent motifs in X-ray structures. One current view asserts that recurrent RNA strand segments with characteristic backbone configurations qualify as independent motifs. Other considerations indicate that, to characterize modular motifs, one must take into account the larger structural context of such strand segments. This follows the biologically relevant motivation, which is to identify RNA structural characteristics that are subject to sequence constraints and that thus relate RNA architectures to sequences.

17.
Protein families and RNA recognition   (Total citations: 1; self-citations: 0; citations by others: 1)
Chen Y  Varani G 《The FEBS journal》2005,272(9):2088-2097
This minireview series examines the structural principles underlying the biological function of RNA-binding proteins. The structural work of the last decade has elucidated the structures of essentially all the major RNA-binding protein families; it has also demonstrated how RNA recognition takes place. The ribosome structures have further integrated this knowledge into principles for the assembly of complex ribonucleoproteins. Structural and biochemical work has unexpectedly revealed that several RNA-binding proteins bind to other proteins in addition to, or instead of, RNA. This tremendous increase in structural knowledge has not only expanded our understanding of the principles of RNA recognition but has also provided new insight into the biological function of these proteins and has helped in designing better experiments to understand their biological roles.

18.
Currently there is increasing interest in nanostructures and their design. Nanostructure design involves the ability to predictably manipulate the properties of the self-assembly of autonomous units. Autonomous units have preferred conformational states. The units can be synthetic material science-based or derived from functional biological macromolecules. Autonomous biological building blocks with available structures provide an extremely rich and useful resource for design. For proteins, the structural databases contain large libraries of protein molecules and their building blocks with a range of shapes, surfaces, and chemical properties. The introduction of engineered synthetic residues or short peptides into these can expand the available chemical space and enhance the desired properties. Here we focus on the principles of nanostructure design with protein building blocks.

19.
Experimental conditions or the presence of interacting components can lead to variations in the structural models of macromolecules. However, the role of these factors in conformational selection is often omitted by in silico methods to extract dynamic information from protein structural models. Structures of small peptides, considered building blocks for larger macromolecular structural models, can substantially differ in the context of a larger protein. This limitation is more evident in the case of modeling large multi-subunit macromolecular complexes using structures of the individual protein components. Here we report an analysis of variations in structural models of proteins with high sequence similarity. These models were analyzed for sequence features of the protein, the role of scaffolding segments including interacting proteins or affinity tags and the chemical components in the experimental conditions. Conformational features in these structural models could be rationalized by conformational selection events, perhaps induced by experimental conditions. This analysis was performed on a non-redundant dataset of protein structures from different SCOP classes. The sequence-conformation correlations that we note here suggest additional features that could be incorporated by in silico methods to extract dynamic information from protein structural models.

20.
A model for the topological coding of proteins is proposed. The model is based on the capacity of hydrogen bonds (the property of connectivity) to fix the conformations of protein molecules. The protein chain is modeled by an n-arc graph with the following elements: vertices (alpha-carbon atoms), structural edges (peptide bonds), and connectivity edges (virtual edges connecting non-adjacent atoms). It was shown that the 64 conformations of the 4-arc graph can be described in the binary system by matrices of six variables, which form a supermatrix containing four blocks. On the basis of correspondences between the pairs of variables in the matrices and the four letters of the genetic code, the matrices and the supermatrix are converted, respectively, into the triplets and the table of the genetic code. An algorithm amenable to computer programming is proposed for coding the n-arc graph and the protein chain. Connectivity operators (polar amino acids) are assigned to the blocks of triplets coding for cyclic conformations (G or A in the second position), while anti-connectivity operators (non-polar amino acids) correspond to the blocks of triplets coding for open conformations (C or U in the second position). Amino acids coded by triplets differing in the first base have different structures. The third base is degenerate within the pairs C, U and G, A. Properties of the real genetic code are in full agreement with the model. The model provides insight into the topological nature of the genetic code and can be used to develop algorithms for protein structure prediction.
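The count of 64 conformations for the 4-arc graph is consistent with a simple reading: a 4-arc chain has 5 vertices, 4 structural edges join consecutive vertices, and the remaining 10 − 4 = 6 vertex pairs are candidate connectivity edges, each present or absent, giving 2^6 = 64. That identification of the six variables with the six non-adjacent pairs is our assumption, not stated in the abstract; the sketch below only checks the counting, not the mapping onto codons.

```python
from itertools import combinations, product

def connectivity_pairs(n_arcs):
    """Vertices 0..n_arcs (alpha-carbons); structural edges join i and i+1,
    so candidate connectivity edges join every non-adjacent pair."""
    vertices = range(n_arcs + 1)
    return [(i, j) for i, j in combinations(vertices, 2) if j - i > 1]

def enumerate_conformations(n_arcs):
    """Each conformation = one binary value (edge absent/present) per pair."""
    pairs = connectivity_pairs(n_arcs)
    return pairs, list(product((0, 1), repeat=len(pairs)))
```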


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417