Similar Articles (20 results)
1.
With the advance of experimental procedures, obtaining chemical crosslinking information is becoming a fast and routine practice. Information on crosslinks can greatly enhance the accuracy of protein structure modeling. Here, we review the current state of the art in modeling protein structures with the assistance of experimentally determined chemical crosslinks within the framework of the 13th meeting of Critical Assessment of Structure Prediction approaches. This largest-to-date blind assessment reveals the benefits of using data assistance in difficult-to-model cases of protein structure prediction. However, in a broader context, it also suggests that, given the unprecedented recent advances in contact prediction accuracy, experimental crosslinks will be useful only if their specificity and accuracy are further improved and they are better integrated into computational workflows.
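A minimal illustration of how crosslink data can assist modeling is to score candidate models by restraint satisfaction. The sketch below is not the assessed protocol; the 30 Å Cα–Cα cutoff (a typical value for a BS3-like crosslinker) and all coordinates are illustrative assumptions.

```python
import math

def crosslink_satisfaction(coords, crosslinks, max_dist=30.0):
    """Fraction of crosslinks whose Ca-Ca distance is within max_dist.

    coords: {residue_id: (x, y, z)}; crosslinks: [(res_i, res_j), ...]
    """
    satisfied = sum(1 for i, j in crosslinks
                    if math.dist(coords[i], coords[j]) <= max_dist)
    return satisfied / len(crosslinks)

# Toy model: three residues on a line, 12 A apart; all pairs fall within the cutoff
model = {1: (0.0, 0.0, 0.0), 2: (12.0, 0.0, 0.0), 3: (24.0, 0.0, 0.0)}
links = [(1, 2), (1, 3), (2, 3)]
print(crosslink_satisfaction(model, links))  # 1.0
```

Candidate models can then be ranked by this fraction, typically in combination with a physics- or contact-based score.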

2.
Capturing conformational changes in proteins or protein-protein complexes is a challenge for both experimentalists and computational biologists. Solution nuclear magnetic resonance (NMR) is unique in that it permits structural studies of proteins under greatly varying conditions, and thus allows us to monitor induced structural changes. Paramagnetic effects are increasingly used to study protein structures as they give ready access to rich structural information of orientation and long-range distance restraints from the NMR signals of backbone amides, and reliable methods have become available to tag proteins with paramagnetic metal ions site-specifically and at multiple sites. In this study, we show how sparse pseudocontact shift (PCS) data can be used to computationally model conformational states in a protein system, by first identifying core structural elements that are not affected by the environmental change, and then computationally completing the remaining structure based on experimental restraints from PCS. The approach is demonstrated on a 27 kDa two-domain NS2B-NS3 protease system of the dengue virus serotype 2, for which distinct closed and open conformational states have been observed in crystal structures. By changing the input PCS data, the observed conformational states in the dengue virus protease are reproduced without modifying the computational procedure. This data-driven Rosetta protocol enables identification of conformational states of a protein system that are otherwise difficult to obtain either experimentally or computationally.
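For reference, the PCS restraint underlying such protocols has a closed form. The sketch below implements the standard pseudocontact shift equation; it assumes coordinates are already expressed in the tensor principal frame, and the tensor magnitudes are illustrative, not values from the dengue protease study.

```python
import math

def pcs_ppm(x, y, z, dchi_ax, dchi_rh):
    """PCS (ppm) for a nucleus at (x, y, z) in metres relative to the metal,
    in the tensor principal frame, with axial/rhombic anisotropies in m^3:
    delta = 1e6/(12*pi*r^3) * [dchi_ax*(3cos^2 t - 1) + 1.5*dchi_rh*sin^2 t*cos(2 phi)]
    """
    r = math.sqrt(x * x + y * y + z * z)
    cos_t = z / r
    sin2_t = 1.0 - cos_t * cos_t
    phi = math.atan2(y, x)
    geom = dchi_ax * (3 * cos_t ** 2 - 1) + 1.5 * dchi_rh * sin2_t * math.cos(2 * phi)
    return 1e6 * geom / (12 * math.pi * r ** 3)

# Nucleus 10 A along the tensor z-axis: only the axial term contributes
print(round(pcs_ppm(0.0, 0.0, 10e-10, dchi_ax=10e-32, dchi_rh=0.0), 3))  # 5.305
```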

3.
The currently available data on protein sequences largely exceed the experimental capabilities to annotate their function, so annotation in silico, i.e., using computational methods, becomes increasingly important. Such annotation is inevitably a prediction, but it can be an important starting point for further experimental studies. Here we present a method for the prediction of protein functional sites, SDPsite, based on the identification of protein specificity determinants. Taking as input a protein sequence alignment and a phylogenetic tree, the algorithm predicts conserved positions and specificity determinants, maps them onto the protein's 3D structure, and searches for clusters of the predicted positions. Comparison of the obtained predictions with experimental data, and with the performance of several other methods for the prediction of functional sites, reveals that SDPsite agrees well with experiment and outperforms most of the previously available methods. SDPsite is publicly available at http://bioinf.fbb.msu.ru/SDPsite.
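The conservation step of such a pipeline can be sketched simply: score alignment columns by Shannon entropy, where low entropy marks conserved positions. The toy alignment below is invented; SDPsite additionally uses the phylogenetic tree and 3D clustering, which are omitted here.

```python
import math

def column_entropies(alignment):
    """Shannon entropy (bits) of each column of equal-length aligned sequences."""
    out = []
    for c in range(len(alignment[0])):
        col = [seq[c] for seq in alignment]
        ent = 0.0
        for aa in set(col):
            p = col.count(aa) / len(col)
            ent -= p * math.log2(p)
        out.append(ent)
    return out

aln = ["ACDK", "ACEK", "ACDR", "ACER"]
print(column_entropies(aln))  # [0.0, 0.0, 1.0, 1.0]: first two columns conserved
```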

4.
Inferring regulatory networks from experimental data via probabilistic graphical models is a popular framework to gain insights into biological systems. However, the inherent noise in experimental data coupled with a limited sample size reduces the performance of network reverse engineering. Prior knowledge from existing sources of biological information can address this low signal-to-noise problem by biasing the network inference towards biologically plausible network structures. Although integrating various sources of information is desirable, their heterogeneous nature makes this task challenging. We propose two computational methods to incorporate various information sources into a probabilistic consensus structure prior for use in graphical model inference. Our first model, called Latent Factor Model (LFM), assumes a high degree of correlation among external information sources and reconstructs a hidden variable as a common source in a Bayesian manner. The second model, a Noisy-OR, picks up the strongest support for an interaction among information sources in a probabilistic fashion. Our extensive computational studies on KEGG signaling pathways as well as on gene expression data from breast cancer and yeast heat shock response reveal that both approaches can significantly enhance the reconstruction accuracy of Bayesian Networks compared to other competing methods as well as to the situation without any prior. Our framework allows for using diverse information sources, such as pathway databases, GO terms, and protein domain data, and is flexible enough to integrate new sources as they become available.
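The Noisy-OR combination can be stated in a few lines. In this sketch, the per-source reliabilities and the leak probability are invented values, not parameters estimated in the study:

```python
def noisy_or(supports, reliabilities, leak=0.01):
    """P(interaction) when each supporting source independently 'fires'.

    supports: 0/1 flags per source; reliabilities: P(edge | source k alone).
    """
    p_none = 1.0 - leak
    for s, r in zip(supports, reliabilities):
        if s:
            p_none *= 1.0 - r
    return 1.0 - p_none

# Two of three sources (e.g. a pathway database, GO co-annotation, a shared
# protein domain) support the candidate edge
print(round(noisy_or([1, 1, 0], [0.8, 0.5, 0.6]), 3))  # 0.901
```

The resulting edge probabilities can then be assembled into a structure prior over networks for Bayesian network inference.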

5.
The production of waste creates both direct and indirect environmental impacts. A range of strategies are available to reduce the generation of waste by industry and households, and to select waste treatment approaches that minimize environmental harm. However, evaluating these strategies requires reliable and detailed data on waste production and treatment. Unfortunately, published Australian waste data are typically highly aggregated, published by a variety of entities in different formats, and do not form a complete time‐series. We demonstrate a technique for constructing a multi‐regional waste supply‐use (MRWSU) framework for Australia using information from numerous waste data sources. This is the first MRWSU framework to be constructed (to the authors' knowledge) and the first sub‐national waste input‐output framework to be constructed for Australia. We construct the framework using the Industrial Ecology Virtual Laboratory (IELab), a cloud‐hosted computational platform for building Australian multi‐regional input‐output tables. The structure of the framework complies with the System of Environmental‐Economic Accounting (SEEA). We demonstrate the use of the MRWSU framework by calculating waste footprints that enumerate the full supply chain waste production for Australian consumers.
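The footprint calculation mentioned at the end follows the standard input-output identity f = s (I − A)⁻¹ y. The two-sector technical coefficients and waste intensities below are invented for illustration, not IELab data:

```python
import numpy as np

A = np.array([[0.1, 0.2],      # inter-industry technical coefficients
              [0.3, 0.1]])
s = np.array([5.0, 2.0])       # waste intensity: kg of waste per $ of output
y = np.array([100.0, 50.0])    # final demand in $

L = np.linalg.inv(np.eye(2) - A)  # Leontief inverse
x = L @ y                         # total (direct + indirect) output per sector
footprint = s @ x                 # full supply-chain waste attributed to y
print(round(footprint, 1))        # 866.7
```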

6.
7.
Manfred J. Sippl. Proteins 1993, 17(4):355-362
A major problem in the determination of the three-dimensional structure of proteins concerns the quality of the structural models obtained from the interpretation of experimental data. New developments in X-ray crystallography and nuclear magnetic resonance spectroscopy have accelerated the process of structure determination, and the biological community is confronted with a steadily increasing number of experimentally determined protein folds. However, in the recent past several experimentally determined protein structures have been proven to contain major errors, indicating that in some cases the interpretation of experimental data is difficult and may yield incorrect models. Such problems can be avoided when computational methods are employed which complement experimental structure determinations. A prerequisite of such computational tools is that they are independent of the parameters obtained from a particular experiment. In addition, such techniques are able to support and accelerate experimental structure determinations. Here we present techniques based on knowledge-based mean fields which can be used to judge the quality of protein folds. The methods can be used to identify misfolded structures as well as faulty parts of structural models. The techniques are even applicable in cases where only the Cα trace of a protein conformation is available. The capabilities of the technique are demonstrated using correct and incorrect protein folds. © 1993 Wiley-Liss, Inc.
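The core idea of a knowledge-based mean field is the inverse Boltzmann relation E = −kT ln(f_obs/f_ref). The sketch below turns toy distance-bin counts into pseudo-energies; the counts, pseudocount, and kT = 1 are illustrative assumptions, not the paper's parameterization.

```python
import math

def mean_field_energy(obs_counts, ref_counts, kT=1.0, pseudo=1.0):
    """E(bin) = -kT * ln(f_obs / f_ref), with pseudocounts for sparse bins."""
    n_obs = sum(obs_counts) + pseudo * len(obs_counts)
    n_ref = sum(ref_counts) + pseudo * len(ref_counts)
    return [-kT * math.log(((o + pseudo) / n_obs) / ((r + pseudo) / n_ref))
            for o, r in zip(obs_counts, ref_counts)]

# Bins where a residue pair occurs more (negative E) or less (positive E)
# often than the reference state expects
e = mean_field_energy([30, 10, 5], [15, 15, 15])
print([round(x, 2) for x in e])  # [-0.66, 0.37, 0.98]
```

Summing such pseudo-energies over all residue pairs of a model yields the kind of global quality score that flags misfolded structures or faulty regions.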

8.
Selected reaction monitoring (SRM) is a targeted mass spectrometry technique that provides sensitive and accurate protein detection and quantification in complex biological mixtures. Statistical and computational tools are essential for the design and analysis of SRM experiments, particularly in studies with large sample throughput. Currently, most such tools focus on the selection of optimized transitions and on processing signals from SRM assays. Little attention is devoted to protein significance analysis, which combines the quantitative measurements for a protein across isotopic labels, peptides, charge states, transitions, samples, and conditions, and detects proteins that change in abundance between conditions while controlling the false discovery rate. We propose a statistical modeling framework for protein significance analysis. It is based on linear mixed-effects models and is applicable to most experimental designs for both isotope label-based and label-free SRM workflows. We illustrate the utility of the framework in two studies: one with a group comparison experimental design and the other with a time course experimental design. We further verify the accuracy of the framework in two controlled data sets, one from the NCI-CPTAC reproducibility investigation and the other from an in-house spike-in study. The proposed framework is sensitive and specific, produces accurate results in broad experimental circumstances, and helps to optimally design future SRM experiments. The statistical framework is implemented in an open-source R-based software package SRMstats, and can be used by researchers with a limited statistics background as a stand-alone tool or in integration with the existing computational pipelines.
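As a simplified stand-in for the mixed-effects models described here, a fixed-effects linear model (log-intensity ~ condition + peptide) already illustrates how peptide-level measurements roll up to a protein-level effect. The data below are invented, and a real SRM analysis would add random effects, variance estimation, and FDR control.

```python
import numpy as np

def condition_effect(log_intensity, condition, peptide):
    """Least-squares estimate of the log fold-change between two conditions,
    adjusting for peptide-specific intensity levels."""
    n = len(log_intensity)
    peptides = sorted(set(peptide))
    # Design: intercept, condition flag, peptide indicator contrasts
    X = np.column_stack(
        [np.ones(n), np.asarray(condition, float)]
        + [(np.asarray(peptide) == p).astype(float) for p in peptides[1:]]
    )
    beta, *_ = np.linalg.lstsq(X, np.asarray(log_intensity, float), rcond=None)
    return beta[1]

y = [10.0, 10.1, 12.0, 12.1,   # peptide A: control, control, treated, treated
     9.0, 9.1, 11.0, 11.1]     # peptide B: control, control, treated, treated
cond = [0, 0, 1, 1, 0, 0, 1, 1]
pep = ["A", "A", "A", "A", "B", "B", "B", "B"]
print(round(condition_effect(y, cond, pep), 3))  # 2.0
```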

9.
A computational method for NMR-constrained protein threading.
Protein threading provides an effective method for fold recognition and backbone structure prediction. However, its application is currently limited by its level of prediction accuracy and scope of applicability. One way to significantly improve its usefulness is through the incorporation of underconstrained (or partial) NMR data. It is well known that the NMR method for protein structure determination applies only to small proteins and that its effectiveness decreases rapidly as the protein mass increases beyond about 30 kDa. We present, in this paper, a computational framework for applying underconstrained NMR data (that alone are insufficient for structure determination) as constraints in protein threading and also in all-atom model construction. In this study, we consider both secondary structure assignments from chemical shifts and NOE distance restraints. Our results have shown that both secondary structure assignments and a small number of long-range NOEs can significantly improve the threading quality in both fold recognition and threading-alignment accuracy, and can possibly extend threading's scope of applicability from homologs to analogs. An accurate backbone structure generated by NMR-constrained threading can then provide a great amount of structural information, equivalent to that provided by many NMR data, and hence can help reduce the number of NMR data typically required for an accurate structure determination. This new technique can potentially accelerate current NMR structure determination processes and possibly expand NMR's capability to larger proteins.

10.
Understanding the forces that stabilize membrane proteins in their native states is one of the contemporary challenges of biophysics. To date, estimates of side chain partitioning free energies from water to the lipid environment show disparate values between experimental and computational measures. Resolving the disparities is particularly important for understanding the energetic contributions of polar and charged side chains to membrane protein function because of the roles these residue types play in many cellular functions. In general, computational free energy estimates of charged side chain partitioning into bilayers are much larger than experimental measurements. However, the lack of a protein-based experimental system that uses bilayers against which to vet these computational predictions has traditionally been a significant drawback. Moon & Fleming recently published a novel hydrophobicity scale that was derived experimentally by using a host-guest strategy to measure the side chain energetic perturbation due to mutation in the context of a native membrane protein inserted into a phospholipid bilayer. These values are still approximately an order of magnitude smaller than computational estimates derived from molecular dynamics calculations from several independent groups. Here we address this discrepancy by showing that the free energy differences between experiment and computation become much smaller if the appropriate comparisons are drawn, which suggests that the two fields may in fact be converging. In addition, we present an initial computational characterization of the Moon & Fleming experimental system used for the hydrophobicity scale: OmpLA in DLPC bilayers. The hydrophobicity scale used OmpLA position 210 as the guest site, and our preliminary results demonstrate that this position is buried in the center of the DLPC membrane, validating its usage in the experimental studies. We further showed that the introduction of charged Arg at position 210 is well tolerated in OmpLA and that the DLPC bilayers accommodate this perturbation by creating a water dimple that allows the Arg side chain to remain hydrated. Lipid head groups visit the dimple and can hydrogen bond with Arg, but these interactions are transient. Overall, our study demonstrates the unique advantages of this molecular system because it can be interrogated by both computational and experimental practitioners, and it sets the stage for free energy calculations in a system for which there is unambiguous experimental data. This article is part of a Special Issue entitled: Membrane protein structure and function.

11.
RNA secondary structure prediction (sometimes referred to as "RNA folding"), one of the earliest problems in computational biology, has attracted attention again thanks to the recent discoveries of many novel non-coding RNA molecules. The two common approaches to this problem are de novo prediction of RNA secondary structure based on energy minimization and the consensus folding approach (computing the common secondary structure for a set of unaligned RNA sequences). Consensus folding algorithms work well when the correct seed alignment is part of the input to the problem. However, the seed alignment itself is a challenging problem for diverged RNA families. In this paper, we propose a novel framework to predict the common secondary structure for unaligned RNA sequences. By matching putative stacks in RNA sequences, we make use of both primary sequence information and thermodynamic stability for prediction at the same time. We show that our method can predict the correct common RNA secondary structures even when given only a limited number of unaligned RNA sequences, and that it outperforms current algorithms in sensitivity and accuracy.
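The paper's stack-matching method is more involved, but the baseline it builds on is easy to state: the classic Nussinov dynamic program, which maximizes nested base pairs for a single sequence. A minimal sketch:

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def nussinov_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs with hairpin loops >= min_loop."""
    n = len(seq)
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]                  # j unpaired
            for k in range(i, j - min_loop):     # j paired with some k
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]

print(nussinov_pairs("GGGAAAUCCC"))  # 3: a stem of three G-C pairs
```

Energy-minimization folders replace the pair count with thermodynamic stacking energies, and consensus methods additionally couple the recursion across sequences.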

12.
In this paper we present a novel framework for protein secondary structure prediction. First, we propose a novel parameterized semi-probability profile, which effectively combines the single sequence with evolutionary information. Second, different semi-probability profiles are each applied as network input to predict protein secondary structure, and the resulting predictions are compared. Finally, naïve Bayes approaches are used to combine these predictions in order to obtain better performance than any individual prediction. The experimental results show that our proposed framework can indeed improve prediction accuracy.
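A naïve Bayes combiner of the kind described takes only a few lines once per-predictor confusion probabilities are available. In this sketch the confusion matrix and priors are invented, and conditional independence of the predictors is assumed:

```python
import math

STATES = ("H", "E", "C")  # helix, strand, coil

def nb_combine(predictions, cond_prob, prior):
    """Return argmax_s P(s) * prod_k P(prediction_k | s)."""
    best, best_lp = None, -math.inf
    for s in STATES:
        lp = math.log(prior[s])
        for k, p in enumerate(predictions):
            lp += math.log(cond_prob[k][s][p])
        if lp > best_lp:
            best, best_lp = s, lp
    return best

# Toy model: each predictor is right 70% of the time, errors split evenly
conf = {s: {t: 0.7 if t == s else 0.15 for t in STATES} for s in STATES}
cond_prob = [conf, conf]
prior = {"H": 0.35, "E": 0.25, "C": 0.40}
print(nb_combine(["H", "H"], cond_prob, prior))  # H
```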

13.
In nature, proteins partake in numerous protein–protein interactions that mediate their functions. Moreover, proteins have been shown to be physically stable in multiple structures, induced by cellular conditions, small ligands, or covalent modifications. Understanding how protein sequences achieve this structural promiscuity at the atomic level is a fundamental step in the drug design pipeline and a critical question in protein physics. One way to investigate this subject is to computationally predict protein sequences that are compatible with multiple states, i.e., multiple target structures or binding to distinct partners. The goal of engineering such proteins has been termed multispecific protein design. We develop a novel computational framework to efficiently and accurately perform multispecific protein design. This framework utilizes recent advances in probabilistic graphical modeling to predict sequences with low energies in multiple target states. Furthermore, it is also geared to specifically yield positional amino acid probability profiles compatible with these target states. Such profiles can be used as input to randomly bias high‐throughput experimental sequence screening techniques, such as phage display, thus providing an alternative avenue for elucidating the multispecificity of natural proteins and the synthesis of novel proteins with specific functionalities. We prove the utility of such multispecific design techniques in better recovering amino acid sequence diversities similar to those resulting from millions of years of evolution. We then compare the approaches of prediction of low energy ensembles and of amino acid profiles and demonstrate their complementarity in providing more robust predictions for protein design. Proteins 2010. © 2009 Wiley‐Liss, Inc.

14.
The biclustering method can be a very useful analysis tool when some genes have multiple functions and experimental conditions are diverse in gene expression measurement. This is because the biclustering approach, in contrast to the conventional clustering techniques, focuses on finding a subset of the genes and a subset of the experimental conditions that together exhibit coherent behavior. However, the biclustering problem is inherently intractable, and it is often computationally costly to find biclusters with high levels of coherence. In this work, we propose a novel biclustering algorithm that exploits the zero-suppressed binary decision diagrams (ZBDDs) data structure to cope with the computational challenges. Our method can find all biclusters that satisfy specific input conditions, and it is scalable to practical gene expression data. We also present experimental results confirming the effectiveness of our approach.
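The ZBDD machinery exists because naive enumeration explodes combinatorially; the brute-force sketch below shows the underlying task on a tiny binary matrix (rows = genes, columns = conditions), finding all-ones submatrices. The matrix and size thresholds are invented for illustration.

```python
from itertools import combinations

def all_one_biclusters(matrix, min_rows=2, min_cols=2):
    """All (row set, column set) pairs whose submatrix is all ones,
    with the row set maximal for each chosen column set."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    found = set()
    for k in range(min_cols, n_cols + 1):
        for cols in combinations(range(n_cols), k):
            rows = tuple(r for r in range(n_rows)
                         if all(matrix[r][c] for c in cols))
            if len(rows) >= min_rows:
                found.add((rows, cols))
    return found

m = [[1, 1, 0],
     [1, 1, 1],
     [0, 1, 1]]
for rows, cols in sorted(all_one_biclusters(m)):
    print(rows, cols)  # (0, 1) (0, 1)  then  (1, 2) (1, 2)
```

The column loop alone is exponential in the number of conditions, which is exactly the blow-up that a compressed representation such as a ZBDD is designed to tame.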

15.
One possible route to a complete solution of the problem of determining the three-dimensional structure of proteins from their amino acid sequence is to simulate the formation of protein three-dimensional structure. For this task it is suggested to use the code-based physics method developed by the author. In this article, a simulation of α-helix and β-hairpin formation in water-soluble proteins is described as the start of the realization of this plan. Results of the simulation are compared with experimental data for 14 proteins of no more than 50 amino acids, and therefore with a small number of α-helices and β-strands (to meet the limits of the simulation process), and with secondary structure predictions by the best current methods of protein secondary structure prediction, PSIpred, PORTER, and PROFsec. The secondary structure of proteins, obtained as a result of simulating α-helix and β-hairpin formation by the code-based physics method, agrees completely with the experiment, while the secondary structure predicted by the PSIpred, PORTER, and PROFsec methods contains significant differences from the experimental data.

16.
Chen C, Zhou X, Tian Y, Zou X, Cai P. Analytical Biochemistry 2006, 357(1):116-121
Because a priori knowledge of a protein's structural class can provide useful information about its overall structure, the determination of protein structural class is a meaningful topic in protein science. However, with the rapid increase in newly found protein sequences entering databanks, it is both time-consuming and expensive to do so based solely on experimental techniques. It is therefore vitally important to develop a computational method for predicting the protein structural class quickly and accurately. To address this challenge, this article presents a dual-layer support vector machine (SVM) fusion network whose distinguishing feature is the use of a different pseudo-amino acid composition (PseAA). The PseAA here contains much information related to the sequence order of a protein and the distribution of the hydrophobic amino acids along its chain. As a showcase, the rigorous jackknife cross-validation test was performed on the two benchmark data sets constructed by Zhou. A significant enhancement in success rates was observed, indicating that the current approach may serve as a powerful complementary tool to other existing methods in this area.
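A generic PseAA feature vector combines 20 composition terms with λ sequence-order correlation terms. The sketch below uses an invented hydrophobicity index and weight; the paper's actual PseAA differs in its choice of indices and normalization.

```python
AA = "ACDEFGHIKLMNPQRSTVWY"
# Invented, monotone index standing in for a real normalized hydrophobicity scale
HYDRO = {a: (i - 9.5) / 9.5 for i, a in enumerate(AA)}

def pseaa(seq, lam=2, w=0.05):
    """20 composition terms + lam sequence-order correlation terms."""
    comp = [seq.count(a) for a in AA]
    thetas = []
    for d in range(1, lam + 1):
        diffs = [(HYDRO[seq[i]] - HYDRO[seq[i + d]]) ** 2
                 for i in range(len(seq) - d)]
        thetas.append(sum(diffs) / len(diffs))
    denom = sum(comp) + w * sum(thetas)
    return [c / denom for c in comp] + [w * t / denom for t in thetas]

v = pseaa("ACDEFACDEF")
print(len(v))  # 22 features; the vector sums to 1 by construction
```

Vectors like this are what the dual-layer SVM consumes: sequences of different lengths map to fixed-length features that still retain some order information.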

17.
Understanding and characterising biochemical processes inside single cells requires experimental platforms that allow one to perturb and observe the dynamics of such processes as well as computational methods to build and parameterise models from the collected data. Recent progress with experimental platforms and optogenetics has made it possible to expose each cell in an experiment to an individualised input and automatically record cellular responses over days with fine time resolution. However, methods to infer parameters of stochastic kinetic models from single-cell longitudinal data have generally been developed under the assumption that experimental data is sparse and that responses of cells to at most a few different input perturbations can be observed. Here, we investigate and compare different approaches for calculating parameter likelihoods of single-cell longitudinal data based on approximations of the chemical master equation (CME) with a particular focus on coupling the linear noise approximation (LNA) or moment closure methods to a Kalman filter. We show that, as long as cells are measured sufficiently frequently, coupling the LNA to a Kalman filter allows one to accurately approximate likelihoods and to infer model parameters from data even in cases where the LNA provides poor approximations of the CME. Furthermore, the computational cost of filtering-based iterative likelihood evaluation scales advantageously in the number of measurement times and different input perturbations and is thus ideally suited for data obtained from modern experimental platforms. To demonstrate the practical usefulness of these results, we perform an experiment in which single cells, equipped with an optogenetic gene expression system, are exposed to various different light-input sequences and measured at several hundred time points and use parameter inference based on iterative likelihood evaluation to parameterise a stochastic model of the system.
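In the linear-Gaussian case, the filtering idea reduces to the textbook Kalman filter likelihood. The scalar sketch below evaluates the log-likelihood of a time series under x_t = a·x_{t−1} + N(0, q), y_t = x_t + N(0, r); the dynamics and data are invented, and the coupling to the LNA is omitted.

```python
import math

def kalman_loglik(ys, a, q, r, m0=0.0, p0=1.0):
    """Exact log-likelihood of observations ys for a scalar linear-Gaussian model."""
    m, p, ll = m0, p0, 0.0
    for y in ys:
        m, p = a * m, a * a * p + q          # predict
        s = p + r                            # innovation variance
        ll += -0.5 * (math.log(2 * math.pi * s) + (y - m) ** 2 / s)
        k = p / s                            # Kalman gain
        m, p = m + k * (y - m), (1 - k) * p  # update
    return ll

data = [0.2, 0.1, -0.3, 0.05]
# Parameter inference would maximize this quantity over (a, q, r)
print(round(kalman_loglik(data, a=0.9, q=0.1, r=0.05), 3))
```

Because each observation contributes one cheap predict/update step, the cost grows linearly in the number of measurement times, which is the scaling advantage the abstract highlights.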

18.
Inferential structure determination uses Bayesian theory to combine experimental data with prior structural knowledge into a posterior probability distribution over protein conformational space. The posterior distribution encodes everything one can say objectively about the native structure in the light of the available data and additional prior assumptions and can be searched for structural representatives. Here an analogy is drawn between the posterior distribution and the canonical ensemble of statistical physics. A statistical mechanics analysis assesses the complexity of a structure calculation globally in terms of ensemble properties. Analogs of the free energy and density of states are introduced; partition functions evaluate the consistency of prior assumptions with data. Critical behavior is observed with dwindling restraint density, which impairs structure determination with too sparse data. However, prior distributions with improved realism ameliorate the situation by lowering the critical number of observations. An in-depth analysis of various experimentally accessible structural parameters and force field terms will facilitate a statistical approach to protein structure determination with sparse data that avoids bias as much as possible.
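A toy version of sampling such a posterior: one structural parameter (a distance), a Gaussian prior, and a single noisy restraint, explored with a random-walk Metropolis sampler. All numbers are invented; real inferential structure determination posteriors live over full conformational space.

```python
import math
import random

def metropolis(logpost, x0, steps=20000, scale=0.2, seed=1):
    """Random-walk Metropolis sampling of a 1-D log-posterior."""
    random.seed(seed)
    x, lp = x0, logpost(x0)
    samples = []
    for _ in range(steps):
        xn = x + random.gauss(0.0, scale)
        lpn = logpost(xn)
        if math.log(random.random()) < lpn - lp:
            x, lp = xn, lpn
        samples.append(x)
    return samples

obs, sigma = 5.0, 0.5               # one NOE-like distance restraint
prior_mean, prior_sigma = 6.0, 1.0  # prior knowledge of the distance

def logpost(d):
    return (-0.5 * ((d - obs) / sigma) ** 2
            - 0.5 * ((d - prior_mean) / prior_sigma) ** 2)

s = metropolis(logpost, x0=6.0)
print(round(sum(s) / len(s), 2))  # close to the analytic posterior mean, 5.2
```

The product of the two Gaussians makes the posterior mean a precision-weighted average, so the samples concentrate between the prior and the data, exactly the compromise the Bayesian formulation encodes.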

19.
Crystallography and NMR system (CNS) is currently a widely used method for fragment-free ab initio protein folding from inter-residue distance or contact maps. Despite its widespread use in protein structure prediction, CNS is a decade-old macromolecular structure determination system that was originally developed for solving macromolecular geometry from experimental restraints, as opposed to predictive modeling driven by interaction map data. As such, the adaptation of the CNS experimental structure determination protocol for ab initio protein folding is intrinsically anomalous and may undermine the folding accuracy of computational protein structure prediction. In this paper, we propose a new CNS-free hierarchical structure modeling method called DConStruct for folding both soluble and membrane proteins driven by distance and contact information. Rigorous experimental validation shows that DConStruct attains much better reconstruction accuracy than CNS when tested with the same input contact map at varying contact thresholds. The hierarchical modeling with iterative self-correction employed in DConStruct scales at a much higher degree of folding accuracy than CNS with the increase in contact thresholds, ultimately approaching near-optimal reconstruction accuracy at higher-thresholded contact maps. The folding accuracy of DConStruct can be further improved by exploiting distance-based hybrid interaction maps at tri-level thresholding, as demonstrated by the better performance of our method in folding free modeling targets from the 12th and 13th rounds of the Critical Assessment of techniques for protein Structure Prediction (CASP) experiments compared to popular CNS- and fragment-based approaches and energy-minimization protocols, some of which even use much finer-grained distance maps than ours. Additional large-scale benchmarking shows that DConStruct can significantly improve the folding accuracy of membrane proteins compared to a CNS-based approach. These results collectively demonstrate the feasibility of greatly improving the accuracy of ab initio protein folding by optimally exploiting the information encoded in inter-residue interaction maps beyond what is possible by CNS.
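When a complete and exact distance matrix is available, coordinates can be recovered in closed form by classical multidimensional scaling, which makes a useful sanity check for reconstruction pipelines. DConStruct itself handles sparse, thresholded, noisy maps; this sketch assumes an idealized full matrix.

```python
import numpy as np

def mds_embed(D, dim=3):
    """Classical MDS: coordinates from an n x n Euclidean distance matrix."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J               # double-centered Gram matrix
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:dim]           # top eigenpairs
    return V[:, idx] * np.sqrt(np.clip(w[idx], 0.0, None))

# Four corners of a unit square are recovered up to rotation/translation
pts = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], float)
D = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
X = mds_embed(D, dim=2)
D2 = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
print(np.allclose(D, D2, atol=1e-8))  # True
```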

20.
X-ray crystallography is a critical tool in the study of biological systems. It is able to provide information that has been a prerequisite to understanding the fundamentals of life. It is also a method that is central to the development of new therapeutics for human disease. Significant time and effort are required to determine and optimize many macromolecular structures because of the need for manual interpretation of complex numerical data, often using many different software packages, and the repeated use of interactive three-dimensional graphics. The Phenix software package has been developed to provide a comprehensive system for macromolecular crystallographic structure solution with an emphasis on automation. This has required the development of new algorithms that minimize or eliminate subjective input in favor of built-in expert-systems knowledge, the automation of procedures that are traditionally performed by hand, and the development of a computational framework that allows a tight integration between the algorithms. The application of automated methods is particularly appropriate in the field of structural proteomics, where high throughput is desired. Features in Phenix for the automation of experimental phasing with subsequent model building, molecular replacement, structure refinement and validation are described and examples given of running Phenix from both the command line and graphical user interface.
