Similar literature
 20 similar articles found (search time: 31 ms)
1.
An increasing number of structural studies of large macromolecular complexes, both in X-ray crystallography and cryo-electron microscopy, have resulted in intermediate-resolution (5-10 Å) density maps. Despite being limited in resolution, significant structural and functional information may be extractable from these maps. To aid in the analysis and annotation of these complexes, we have developed SSEhunter, a tool for the quantitative detection of alpha helices and beta sheets. Based on density skeletonization, local geometry calculations, and a template-based search, SSEhunter has been tested and validated on a variety of simulated and authentic subnanometer-resolution density maps. The result is a robust, user-friendly approach that allows users to quickly visualize, assess, and annotate intermediate-resolution density maps. Beyond secondary structure element identification, the skeletonization algorithm in SSEhunter provides secondary structure topology, which is potentially useful in leading to structural models of individual molecular components directly from the density.

2.
Prediction of topological representations of proteins that are geometrically invariant can contribute towards the solution of fundamental open problems in structural genomics like folding. In this paper we focus on coarse grained protein contact maps, a representation that describes the spatial neighborhood relation between secondary structure elements such as helices, beta sheets, and random coils. Our methodology is based on searching the graph space. The search algorithm is guided by an adaptive evaluation function computed by a specialized noncausal recursive connectionist architecture. The neural network is trained using candidate graphs generated during examples of successful searches. Our results demonstrate the viability of the approach for predicting coarse contact maps.

3.
The distance geometry approach for computing the tertiary structure of globular proteins emphasized in this series of papers (Goel et al., J. theor. Biol. 99, 705–757, 1982) is developed further. This development includes incorporation of some secondary structure information (the location of alpha helices in the primary sequence) in the algorithm to compute the tertiary structure of alpha helical globular proteins. An algorithm is developed which estimates the interresidue distances between chain-proximate helices. These distances, in conjunction with the global statistical average distances obtainable from a database of real proteins and determined by the primary sequence of the protein under study, are used to determine the tertiary structure. Five proteins, parvalbumin, hemerythrin, human hemoglobin, lamprey hemoglobin, and sperm whale myoglobin, are investigated. The root mean square (RMS) errors between the calculated structures and those determined by X-ray diffraction range from 4.78 to 7.56 Å. These RMSs are 0.21–2.76 Å lower than those estimated without the secondary structure information. Contact maps and three-dimensional backbone representations also show considerable improvements with the introduction of secondary structure information.
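
The RMS error quoted in this abstract can be illustrated with a minimal sketch. The abstract does not say whether the structures were superimposed first or whether the error was computed on coordinates or on distances, so the function below assumes two coordinate sets that are already superimposed and listed in the same residue order. The function name and the toy coordinates are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equal-length lists of (x, y, z) points.

    Assumption: both structures are already superimposed and the i-th point
    in each list refers to the same residue.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Toy data (not from the paper): every point is shifted by 3 Å along x.
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
reference = [(3.0, 0.0, 0.0), (6.8, 0.0, 0.0), (10.6, 0.0, 0.0)]
print(rmsd(model, reference))
```

A production comparison would normally find the optimal rigid-body superposition before measuring the deviation; that step is left out here to keep the sketch short.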

4.
Electron density maps of membrane proteins or large macromolecular complexes are frequently only determined at medium resolution between 4 Å and 10 Å, either by cryo-electron microscopy or X-ray crystallography. In these density maps, the general arrangement of secondary structure elements (SSEs) is revealed, whereas their directionality and connectivity remain elusive. We demonstrate that the topology of proteins with up to 250 amino acids can be determined from such density maps when combined with a computational protein folding protocol. Furthermore, we accurately reconstruct atomic detail in loop regions and amino acid side chains not visible in the experimental data. The EM-Fold algorithm assembles the SSEs de novo before atomic detail is added using Rosetta. In a benchmark of 27 proteins, the protocol consistently and reproducibly achieves models with root mean square deviation values <3 Å.

5.
In this paper, we develop a segmental semi-Markov model (SSMM) for protein secondary structure prediction which incorporates multiple sequence alignment profiles with the purpose of improving the predictive performance. The segmental model is a generalization of the hidden Markov model where a hidden state generates segments of various length and secondary structure type. A novel parameterized model is proposed for the likelihood function that explicitly represents multiple sequence alignment profiles to capture the segmental conformation. Numerical results on benchmark data sets show that incorporating the profiles results in substantial improvements and the generalization performance is promising. By incorporating the information from long range interactions in β-sheets, this model is also capable of carrying out inference on contact maps. This is an important advantage of probabilistic generative models over the traditional discriminative approach to protein secondary structure prediction. The Web server of our algorithm and supplementary materials are available at http://public.kgi.edu/-wild/bsm.html.

6.
Electron cryo-microscopy is a fast advancing biophysical technique to derive three-dimensional structures of large protein complexes. Using this technique, many density maps have been generated at intermediate resolution, such as 6-10 Å. Although it is challenging to derive the backbone of the protein directly from such density maps, secondary structure elements such as helices and β-sheets can be computationally detected. Our work in this paper provides an approach to enumerate the top-ranked possible topologies instead of enumerating the entire population of the topologies. This approach is particularly practical for large proteins. We developed a directed weighted graph, the topology graph, to represent the secondary structure assignment problem. We prove that the problem of finding the valid topology with the minimum cost is NP-hard. We developed an O(N^2 2^N) dynamic programming algorithm to identify the topology with the minimum cost. The test of 15 proteins suggests that our dynamic programming approach is feasible to work with proteins of much larger size than we could before. The largest protein in the test contains 18 helical sticks detected from the density map out of 33 helices in the protein.
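
The abstract gives the complexity of the dynamic program but not its recurrence. As a rough illustration of how a bound of this form can arise, the sketch below uses a generic subset dynamic program (in the style of Held-Karp) to find the cheapest ordering of all nodes in a small directed weighted graph. It is not the authors' algorithm: the real topology graph also has to encode stick direction and sequence order, and the cost matrix here is invented for the example.

```python
def min_cost_ordering(cost):
    """Cheapest path that visits every node exactly once (needs at least one node).

    cost[i][j] is the (hypothetical) cost of placing node j right after node i.
    Runs in O(N^2 * 2^N) time: 2^N subsets, N end nodes, N extensions each.
    """
    n = len(cost)
    inf = float("inf")
    # best[mask][last]: cheapest path covering the nodes in `mask`, ending at `last`.
    best = [[inf] * n for _ in range(1 << n)]
    parent = [[-1] * n for _ in range(1 << n)]
    for i in range(n):
        best[1 << i][i] = 0.0
    for mask in range(1 << n):
        for last in range(n):
            current = best[mask][last]
            if current == inf:
                continue
            for nxt in range(n):
                if mask & (1 << nxt):
                    continue  # node already used in this partial path
                new_mask = mask | (1 << nxt)
                candidate = current + cost[last][nxt]
                if candidate < best[new_mask][nxt]:
                    best[new_mask][nxt] = candidate
                    parent[new_mask][nxt] = last
    full = (1 << n) - 1
    end = min(range(n), key=lambda i: best[full][i])
    total = best[full][end]
    # Follow the parent pointers backwards to recover the ordering.
    order, mask, node = [], full, end
    while node != -1:
        order.append(node)
        previous = parent[mask][node]
        mask ^= 1 << node
        node = previous
    return total, order[::-1]

# Invented 3-node cost matrix, for illustration only.
toy_cost = [
    [0, 1, 5],
    [4, 0, 2],
    [3, 6, 0],
]
print(min_cost_ordering(toy_cost))
```

The table has 2^N rows, so memory as well as time grows exponentially; this is consistent with the abstract's focus on enumerating only the top-ranked topologies rather than all of them.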

7.
We present computational solutions to two problems of macromolecular structure interpretation from reconstructed three-dimensional electron microscopy (3D-EM) maps of large bio-molecular complexes at intermediate resolution (5 Å-15 Å). The two problems addressed are: (a) 3D structural alignment (matching) between identified and segmented 3D maps of structure units (e.g. trimeric configuration of proteins), and (b) the secondary structure identification of a segmented protein 3D map (i.e. locations of α-helices, β-sheets). For problem (a), we present an efficient algorithm to correlate spatially (and structurally) two 3D maps of structure units. Besides providing a similarity score between structure units, the algorithm yields an effective technique for resolution refinement of repeated structure units, by 3D alignment and averaging. For problem (b), we present an efficient algorithm to compute eigenvalues and link eigenvectors of a Gaussian convoluted structure tensor derived from the protein 3D Map, thereby identifying and locating secondary structural motifs of proteins. The efficiency and performance of our approach is demonstrated on several experimentally reconstructed 3D maps of virus capsid shells from single-particle cryo-EM, as well as computationally simulated protein structure density 3D maps generated from protein model entries in the Protein Data Bank.

8.
We develop and test machine learning methods for the prediction of coarse 3D protein structures, where a protein is represented by a set of rigid rods associated with its secondary structure elements (alpha-helices and beta-strands). First, we employ cascades of recursive neural networks derived from graphical models to predict the relative placements of segments. These are represented as discretized distance and angle maps, and the discretization levels are statistically inferred from a large and curated dataset. Coarse 3D folds of proteins are then assembled starting from topological information predicted in the first stage. Reconstruction is carried out by minimizing a cost function taking the form of a purely geometrical potential. We show that the proposed architecture outperforms simpler alternatives and can accurately predict binary and multiclass coarse maps. The reconstruction procedure proves to be fast and often leads to topologically correct coarse structures that could be exploited as a starting point for various protein modeling strategies. The fully integrated rod-shaped protein builder (predictor of contact maps + reconstruction algorithm) can be accessed at http://distill.ucd.ie/.

9.
One of the main barriers to accurate computational protein structure prediction is searching the vast space of protein conformations. Distance restraints or inter‐residue contacts have been used to reduce this search space, easing the discovery of the correct folded state. It has been suggested that about 1 contact for every 12 residues may be sufficient to predict structure at fold level accuracy. Here, we use coarse‐grained structure‐based models in conjunction with molecular dynamics simulations to examine this empirical prediction. We generate sparse contact maps for 15 proteins of varying sequence lengths and topologies and find that given perfect secondary‐structural information, a small fraction of the native contact map (5%‐10%) suffices to fold proteins to their correct native states. We also find that different sparse maps are not equivalent and we make several observations about the type of maps that are successful at such structure prediction. Long range contacts are found to encode more information than shorter range ones, especially for α and αβ‐proteins. However, this distinction reduces for β‐proteins. Choosing contacts that are a consensus from successful maps gives predictive sparse maps as does choosing contacts that are well spread out over the protein structure. Additionally, the folding of proteins can also be used to choose predictive sparse maps. Overall, we conclude that structure‐based models can be used to understand the efficacy of structure‐prediction restraints and could, in future, be tuned to include specific force‐field interactions, secondary structure errors and noise in the sparse maps.

10.
Single-stranded RNA from the bacteriophage MS2 was cleaved into two unequal fragments using the Escherichia coli endonuclease RNase IV. The fragments were purified by sucrose gradient centrifugation and secondary structure maps of the purified fragments were prepared after spreading the RNAs in 0.5 mM MgCl2. Comparison of these maps with those of native RNA permitted the identification of the 5′ and 3′ ends of the maps of native single-stranded RNA. In addition, the location of the cleavage site with respect to the secondary and tertiary structure of the RNA suggests that the conformation of the RNA around this site may be important in determining the specificity of cleavage by the enzyme. The approximate location of individual viral genes within the secondary structure map has been obtained by comparing the map of native RNA with known sequence data. A new model is proposed to explain the role of secondary structure, as seen in the electron microscope, in the regulation of the synthesis of coat protein and the viral subunit of the MS2 replicase.

11.
Si D, Ji S, Nasr KA, He J. Biopolymers, 2012, 97(9): 698-708
The accuracy of the secondary structure element (SSE) identification from volumetric protein density maps is critical for de-novo backbone structure derivation in electron cryo-microscopy (cryoEM). It is still challenging to detect the SSE automatically and accurately from the density maps at medium resolutions (~5-10 Å). We present a machine learning approach, SSELearner, to automatically identify helices and β-sheets by using the knowledge from existing volumetric maps in the Electron Microscopy Data Bank. We tested our approach using 10 simulated density maps. The averaged specificity and sensitivity for the helix detection are 94.9% and 95.8%, respectively, and those for the β-sheet detection are 86.7% and 96.4%, respectively. We have developed a secondary structure annotator, SSID, to predict the helices and β-strands from the backbone Cα trace. With the help of SSID, we tested our SSELearner using 13 experimentally derived cryo-EM density maps. The machine learning approach shows the specificity and sensitivity of 91.8% and 74.5%, respectively, for the helix detection and 85.2% and 86.5% respectively for the β-sheet detection in cryoEM maps of Electron Microscopy Data Bank. The reduced detection accuracy reveals the challenges in SSE detection when the cryoEM maps are used instead of the simulated maps. Our results suggest that it is effective to use one cryoEM map for learning to detect the SSE in another cryoEM map of similar quality.
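
Specificity and sensitivity, the two figures reported throughout this abstract, can be computed from per-element labels as in the sketch below. The abstract does not state the unit of evaluation (voxel, residue, or Cα atom), so the labels here are a hypothetical list of booleans, one per evaluated element, where `True` means "helix".

```python
def confusion_counts(predicted, actual):
    """Count true/false positives and negatives for two equal-length boolean lists."""
    if len(predicted) != len(actual):
        raise ValueError("label lists must have the same length")
    tp = fp = tn = fn = 0
    for p, a in zip(predicted, actual):
        if p and a:
            tp += 1
        elif p and not a:
            fp += 1
        elif not p and a:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def sensitivity(tp, fn):
    """Fraction of real positives that were detected: TP / (TP + FN)."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Fraction of real negatives that were correctly rejected: TN / (TN + FP)."""
    return tn / (tn + fp)

# Hypothetical labels for ten elements (True = helix), not data from the paper.
actual_labels    = [True, True, True, True, False, False, False, False, False, False]
predicted_labels = [True, True, True, False, True, False, False, False, False, False]

tp, fp, tn, fn = confusion_counts(predicted_labels, actual_labels)
print(sensitivity(tp, fn), specificity(tn, fp))
```

Reporting both numbers matters because a detector that labels everything as helix reaches perfect sensitivity with zero specificity.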

12.
MOTIVATION: Protein structure comparison is a fundamental problem in structural biology and bioinformatics. Two-dimensional maps of distances between residues in the structure contain sufficient information to restore the 3D representation, while maps of contacts reveal characteristic patterns of interactions between secondary and super-secondary structures and are very attractive for visual analysis. The overlap of 2D maps of two structures can be easily calculated, providing a sensitive measure of protein structure similarity. PROTMAP2D is a software tool for calculation of contact and distance maps based on user-defined criteria, quantitative comparison of pairs or series of contact maps (e.g. alternative models of the same protein, model versus native structure, different trajectories from molecular dynamics simulations, etc.) and visualization of the results. AVAILABILITY: PROTMAP2D for Windows / Linux / MacOSX is freely available for academic users from http://genesilico.pl/protmap2d.htm
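
The distance map, contact map, and map overlap described in this abstract can be sketched in a few lines. This is not PROTMAP2D's code or its scoring formula: the 8 Å cutoff, the minimum sequence separation, and the use of the Jaccard index as the overlap measure are all assumptions chosen for illustration, as are the function names and toy coordinates.

```python
import math

def distance_map(coords):
    """Full matrix of pairwise Euclidean distances between residue positions."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]

def contact_map(coords, cutoff=8.0, min_separation=3):
    """Set of residue pairs (i, j), i < j, closer than `cutoff` Å.

    Pairs fewer than `min_separation` positions apart in sequence are skipped,
    since chain neighbours are always close. Both defaults are assumptions.
    """
    dmap = distance_map(coords)
    n = len(coords)
    return {
        (i, j)
        for i in range(n)
        for j in range(i + min_separation, n)
        if dmap[i][j] <= cutoff
    }

def overlap(contacts_a, contacts_b):
    """Jaccard index of two contact sets (1.0 means identical maps)."""
    union = contacts_a | contacts_b
    return len(contacts_a & contacts_b) / len(union) if union else 1.0

# Toy Cα traces (hypothetical): a small hairpin and a model with the last residue misplaced.
native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
model = native[:4] + [(0.0, 20.0, 0.0)]

print(sorted(contact_map(native)))
print(overlap(contact_map(native), contact_map(model)))
```

Because contact maps depend only on internal distances, the two structures do not need to be superimposed before they are compared.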

13.

Background

The analysis of correlation in alignments generates a matrix of predicted contacts between positions in the structure and, while these can arise for many reasons, the simplest explanation is that the pair of residues are in contact in a three-dimensional structure and are affecting each other's selection pressure. To analyse these data, a dynamic programming algorithm was developed for parsing secondary structure interactions in predicted contact maps.

Results

The non-local nature of the constraints required an iterated approach (using a “frozen approximation”) but with good starting definitions, a single pass was usually sufficient. The method was shown to be effective when applied to the transmembrane class of protein and error tolerant even when the signal becomes degraded. In the globular class of protein, where the extent of interactions are more limited and more complex, the algorithm still behaved well, classifying most of the important interactions correctly in both a small and a large test case. For the larger protein, this involved examples of the algorithm apportioning parts of a single large secondary structure element between two different interactions.

Conclusions

It is expected that the method will be useful as a pre-processor to coarse-grained modelling methods to extend the range of protein tertiary structure prediction to larger proteins or to data that is currently too ’noisy’ to be used by current residue-based methods.

14.
Prediction of contact maps with neural networks and correlated mutations.   Total citations: 1 (self-citations: 0, citations by others: 1)
Contact maps of proteins are predicted with neural network-based methods, using as input codings of increasing complexity including evolutionary information, sequence conservation, correlated mutations and predicted secondary structures. Neural networks are trained on a data set comprising the contact maps of 173 non-homologous proteins as computed from their well resolved three-dimensional structures. Proteins are selected from the Protein Data Bank database provided that they align with at least 15 similar sequences in the corresponding families. The predictors are trained to learn the association rules between the covalent structure of each protein and its contact map with a standard back propagation algorithm and tested on the same protein set with a cross-validation procedure. Our results indicate that the method can assign protein contacts with an average accuracy of 0.21 and with an improvement over a random predictor of a factor >6, which is higher than that previously obtained with methods only based either on neural networks or on correlated mutations. Furthermore, filtering the network outputs with a procedure based on the residue coordination numbers, the accuracy of predictions increases up to 0.25 for all the proteins, with an 8-fold deviation from a random predictor. These scores are the highest reported so far for predicting protein contact maps.
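
The two scores in this abstract, accuracy and improvement over a random predictor, can be made concrete with a small sketch. The abstract does not define either quantity, so the definitions below are assumptions: accuracy is taken as the fraction of predicted contacts that are real, and the random baseline as the fraction of all eligible residue pairs that are in contact. All names and numbers are hypothetical.

```python
def contact_accuracy(predicted, native):
    """Fraction of predicted contacts that are present in the native map (assumed definition)."""
    return len(predicted & native) / len(predicted)

def random_accuracy(native, n_residues, min_separation=1):
    """Expected accuracy of picking pairs at random: native contacts / eligible pairs."""
    span = n_residues - min_separation
    possible_pairs = span * (span + 1) // 2  # pairs (i, j) with j - i >= min_separation
    return len(native) / possible_pairs

def improvement_over_random(predicted, native, n_residues, min_separation=1):
    """Ratio of the predictor's accuracy to the random baseline."""
    return contact_accuracy(predicted, native) / random_accuracy(native, n_residues, min_separation)

# Hypothetical 10-residue protein with 9 native contacts out of 45 possible pairs.
native_contacts = {(0, 4), (0, 5), (1, 5), (1, 6), (2, 6), (2, 7), (3, 7), (3, 8), (4, 9)}
predicted_contacts = {(0, 4), (1, 6), (0, 9), (2, 9)}  # two of the four are correct

print(contact_accuracy(predicted_contacts, native_contacts))
print(improvement_over_random(predicted_contacts, native_contacts, n_residues=10))
```

Under these assumed definitions, an accuracy of 0.21 with a more than six-fold improvement would correspond to a random baseline of roughly 0.03, which reflects how sparse native contact maps are.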

15.
An algorithm for modeling the evolution of the regulatory signals involving the interaction with RNA secondary structure is proposed. The algorithm implies that the species phylogenetic tree is known and is based on the assumption that the considered signals have a conserved secondary structure. The input data are the extant primary structure of a signal for all leaves of the phylogenetic tree; the algorithm computes the signal primary and secondary structures at all the nodes. Concurrently, the algorithm constructs a multiple alignment of the extant (in leaves) sites of a regulatory signal taking into account its secondary structure. The results of successful testing of the algorithm for three main types of attenuation regulation in bacteria, namely classic attenuation (threonine and leucine biosyntheses in Gammaproteobacteria), T-box (in Actinobacteria), and RFN-mediated (in Eubacteria) regulation, are described.

16.
Accurate prediction of pseudoknotted nucleic acid secondary structure is an important computational challenge. Prediction algorithms based on dynamic programming aim to find a structure with minimum free energy according to some thermodynamic ("sum of loop energies") model that is implicit in the recurrences of the algorithm. However, a clear definition of what exactly are the loops in pseudoknotted structures, and their associated energies, has been lacking. In this work, we present a complete classification of loops in pseudoknotted nucleic secondary structures, and describe the Rivas and Eddy and other energy models as sum-of-loops energy models. We give a linear time algorithm for parsing a pseudoknotted secondary structure into its component loops. We give two applications of our parsing algorithm. The first is a linear time algorithm to calculate the free energy of a pseudoknotted secondary structure. This is useful for heuristic prediction algorithms, which are widely used since (pseudoknotted) RNA secondary structure prediction is NP-hard. The second application is a linear time algorithm to test the generality of the dynamic programming algorithm of Akutsu for secondary structure prediction. Together with previous work, we use this algorithm to compare the generality of state-of-the-art algorithms on real biological structures.

17.
18.
EM-Fold was used to build models for nine proteins in the maps of GroEL (7.7 Å resolution) and ribosome (6.4 Å resolution) in the ab initio modeling category of the 2010 cryo-electron microscopy modeling challenge. EM-Fold assembles predicted secondary structure elements (SSEs) into regions of the density map that were identified to correspond to either α-helices or β-strands. The assembly uses a Monte Carlo algorithm where loop closure, density-SSE length agreement, and strength of connecting density between SSEs are evaluated. Top-scoring models are refined by translating, rotating, and bending SSEs to yield better agreement with the density map. EM-Fold produces models that contain backbone atoms within SSEs only. The RMSD values of the models with respect to native range from 2.4 to 3.5 Å for six of the nine proteins. EM-Fold failed to predict the correct topology in three cases. Subsequently, Rosetta was used to build loops and side chains for the very best scoring models after EM-Fold refinement. The refinement within Rosetta's force field is driven by a density agreement score that calculates a cross-correlation between a density map simulated from the model and the experimental density map. All-atom RMSDs as low as 3.4 Å are achieved in favorable cases. Values above 10.0 Å are observed for two proteins with low overall content of secondary structure and hence particularly complex loop modeling problems. RMSDs over residues in secondary structure elements range from 2.5 to 4.8 Å.
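
The density agreement score mentioned in this abstract is a cross-correlation between a simulated and an experimental map. The abstract does not give the exact normalization, so the sketch below assumes a mean-centred, normalized (Pearson) correlation over voxels, with each map flattened to a list of density values on the same grid. It is not Rosetta's implementation; the function name and voxel values are hypothetical.

```python
import math

def cross_correlation(map_a, map_b):
    """Pearson correlation between two density maps given as flat lists of voxel values.

    Assumption: both maps are sampled on the same grid, so the i-th value
    in each list refers to the same voxel.
    """
    if len(map_a) != len(map_b) or not map_a:
        raise ValueError("maps must be non-empty and of equal size")
    n = len(map_a)
    mean_a = sum(map_a) / n
    mean_b = sum(map_b) / n
    covariance = sum((a - mean_a) * (b - mean_b) for a, b in zip(map_a, map_b))
    var_a = sum((a - mean_a) ** 2 for a in map_a)
    var_b = sum((b - mean_b) ** 2 for b in map_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant map")
    return covariance / math.sqrt(var_a * var_b)

# Toy voxel values: the "experimental" map is a rescaled, offset copy of the simulated one.
simulated = [0.0, 0.2, 0.9, 1.0, 0.3, 0.0]
experimental = [2.0 * v + 0.5 for v in simulated]
inverted = [-v for v in simulated]

print(cross_correlation(simulated, experimental))
print(cross_correlation(simulated, inverted))
```

Centring and normalizing makes the score insensitive to the arbitrary scale and offset of experimental density values, so only the shape of the density is compared.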

19.
Cryo-electron microscopy (cryo-EM) enables the imaging of macromolecular complexes in near-native environments at resolutions that often permit the visualization of secondary structure elements. For example, alpha helices frequently show consistent patterns in volumetric maps, exhibiting rod-like structures of high density. Here, we introduce VolTrac (Volume Tracer) – a novel technique for the annotation of alpha-helical density in cryo-EM data sets. VolTrac combines a genetic algorithm and a bidirectional expansion with a tabu search strategy to trace helical regions. Our method takes advantage of the stochastic search by using a genetic algorithm to identify optimal placements for a short cylindrical template, avoiding exploration of already characterized tabu regions. These placements are then utilized as starting positions for the adaptive bidirectional expansion that characterizes the curvature and length of the helical region. The method reliably predicted helices with seven or more residues in experimental and simulated maps at intermediate (4–10 Å) resolution. The observed success rates, ranging from 70.6% to 100%, depended on the map resolution and validation parameters. For successful predictions, the helical axes were located within 2 Å from known helical axes of atomic structures.

20.
1 Introduction. The three-dimensional (3D) structure of a protein is perhaps the most important of all its features, since it determines completely how the protein functions and interacts with other molecules. Most biological mechanisms at the protein level are based on shape-complementarity, so that proteins present particular concavities and convexities that allow them to bind to each other and form complex structures, and tendon. For this reason, for instance, the drug design problem consists primarily in th…
