Similar literature
 20 similar documents found; search time 31 ms
1.

Background

Obtaining a protein structure remains challenging given the known limitations of both experimental and theoretical techniques, so a fast and accurate method for evaluating protein structures is still needed to narrow the large gap between the number of known sequences and the number of known structures. Among the theoretical techniques in current use, homology modelling followed by molecular dynamics based optimization appears to be the most popular. However, when a set of models is predicted for a particular target protein, different validation parameters can give contradictory indications. For example, a model may have a high Ramachandran score, making it acceptable, while its potential energy is not low enough, making it unacceptable, and vice versa. To resolve this problem, the main objective of this study was to use a simple experimentally derived quantity, the Surface Roughness Index of the protein of unknown structure, as an intervening agent; this index can be obtained from ordinary microscopic images of heat-denatured aggregates of the same protein.

Result

It was intriguing to observe that direct experimental knowledge of the protein concerned, however simple, can give insight into the acceptability of a particular structural model out of a confusion set of models generated by a database-driven comparative technique for structure prediction. Results obtained for proteins from widely varying structural classes indicated that the speed of protein structure evaluation can be further enhanced, without compromising accuracy, by recruiting a simple experimental output.

Conclusion

In this work, a semi-empirical methodological approach was provided for improving protein structure evaluation. It showed that, once structural models of a protein have been obtained by homology modelling, the problem of selecting the best model from a confusion set of Pareto-optimal structures can be resolved by employing a structural agent obtained directly by experiment on the same protein. Overall, where a reasonably accurate protein structure is needed for pathogens that cause epidemics or are used in biological warfare, such an approach could be a plausible route to fast drug design.
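The "confusion set of Pareto-optimal structures" can be illustrated with a minimal sketch. It assumes two validation parameters per model, a Ramachandran score (higher is better) and a potential energy (lower is better); the model names and values are hypothetical and are not taken from the study.

```python
def pareto_front(models):
    """Return names of models not dominated by any other model.

    models: list of (name, ramachandran_score, potential_energy) tuples.
    A model is dominated if another model is at least as good on both
    criteria and strictly better on at least one of them.
    """
    front = []
    for name, rama, energy in models:
        dominated = any(
            r2 >= rama and e2 <= energy and (r2 > rama or e2 < energy)
            for _, r2, e2 in models
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical homology models: (name, Ramachandran score in %, energy).
candidates = [("A", 95.0, -100.0), ("B", 90.0, -150.0), ("C", 85.0, -120.0)]
print(pareto_front(candidates))
```

Models A and B each win on one criterion, so neither can be preferred from these two scores alone; this is the situation in which an independent experimental quantity such as the Surface Roughness Index would act as a tie-breaker.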

2.
The score statistics of a recently introduced 'hybrid alignment' algorithm are studied in detail numerically. An extensive survey across the 2216 models of protein domains contained in the Pfam v5.4 database (Bateman et al., Nucleic Acids Res., 28, 263-266, 2000) verifies the theoretical predictions: for the position-specific scoring functions used in the Pfam models, the score statistics of hybrid alignment obey the Gumbel distribution, with the key Gumbel parameter lambda taking on the asymptotic value 1 universally for all models. Thus, the use of hybrid alignment eliminates the time-consuming computer simulations normally needed to assign p-values to alignment scores, freeing users to experiment with different scoring parameters and functions. The performance of the hybrid algorithm in detecting sequence homology is also studied. For protein sequences from the SCOP database (Murzin et al., J. Mol. Biol., 247, 536-540, 1995) using uniform scoring functions, the performance is found to be comparable to the best of the existing methods. Preliminary results using the PfamA database suggest that the hybrid algorithm achieves performance similar to that of existing methods for position-specific scoring systems as well. Hybrid alignment is thereby established as a high-performance alignment algorithm with well-characterized, universal statistics.
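Why a universal lambda removes the need for simulation can be seen from the usual Gumbel tail formula, P(S >= s) ≈ 1 - exp(-K·M·N·e^(-lambda·s)). The sketch below assumes that standard form; the prefactor `k` and the sequence lengths are hypothetical placeholders, not values from the paper.

```python
import math


def gumbel_pvalue(score, query_len, db_len, k=0.1, lam=1.0):
    """P(S >= score) under a Gumbel (extreme-value) null distribution.

    lam defaults to 1.0, the asymptotic value reported for hybrid
    alignment; k is an assumed, model-dependent prefactor.
    """
    expected_hits = k * query_len * db_len * math.exp(-lam * score)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for tiny x.
    return -math.expm1(-expected_hits)


print(gumbel_pvalue(score=25.0, query_len=200, db_len=100_000))
```

With lambda fixed, only the prefactor remains to be determined for a new scoring scheme, which is far cheaper than estimating the full distribution by shuffling sequences.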

3.
Reliable prediction of model accuracy is an important unsolved problem in protein structure modeling. To address this problem, we studied 24 individual assessment scores, including physics-based energy functions, statistical potentials, and machine learning-based scoring functions. Individual scores were also used to construct approximately 85,000 composite scoring functions using support vector machine (SVM) regression. The scores were tested for their abilities to identify the most native-like models from a set of 6000 comparative models of 20 representative protein structures. Each of the 20 targets was modeled using a template of <30% sequence identity, corresponding to challenging comparative modeling cases. The best SVM score outperformed all individual scores by decreasing the average RMSD difference between the model identified as the best of the set and the model with the lowest RMSD (ΔRMSD) from 0.63 Å to 0.45 Å, while having a higher Pearson correlation coefficient to RMSD (r=0.87) than any other tested score. The most accurate score is based on a combination of the DOPE non-hydrogen atom statistical potential; surface, contact, and combined statistical potentials from MODPIPE; and two PSIPRED/DSSP scores. It was implemented in the SVMod program, which can now be applied to select the final model in various modeling problems, including fold assignment, target-template alignment, and loop modeling.
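The two evaluation measures used in this abstract, ΔRMSD and the Pearson correlation between score and RMSD, are simple to compute. The following is a sketch with hypothetical scores and RMSD values, assuming an energy-like score where lower is better.

```python
import math


def delta_rmsd(scores, rmsds, lower_is_better=True):
    """RMSD of the model the score ranks first, minus the lowest RMSD in the set."""
    pick = min if lower_is_better else max
    best_idx = pick(range(len(scores)), key=scores.__getitem__)
    return rmsds[best_idx] - min(rmsds)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical models of one target: energy-like scores and RMSD in Å.
scores = [-10.0, -12.0, -8.0]
rmsds = [2.0, 3.5, 5.0]
print(delta_rmsd(scores, rmsds), pearson_r(scores, rmsds))
```

A ΔRMSD of zero means the score picked the most native-like model in the set; averaging ΔRMSD over targets gives the kind of figure quoted above.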

4.
Overdispersion is a common phenomenon in Poisson modeling, and the negative binomial (NB) model is frequently used to account for overdispersion. Testing approaches (Wald test, likelihood ratio test (LRT), and score test) for overdispersion in the Poisson regression versus the NB model are available. Because the generalized Poisson (GP) model is similar to the NB model, we consider the former as an alternate model for overdispersed count data. The score test has an advantage over the LRT and the Wald test in that the score test only requires that the parameter of interest be estimated under the null hypothesis. This paper proposes a score test for overdispersion based on the GP model and compares the power of the test with the LRT and Wald tests. A simulation study indicates that the score test based on the asymptotic standard normal distribution is more appropriate in practical applications because of its higher empirical power; however, it underestimates the nominal significance level, especially in small samples. Examples illustrate the results of comparing the candidate tests between the Poisson and GP models. A bootstrap test is also proposed to adjust for the underestimation of the nominal level by the score statistic when the sample size is small. The simulation study indicates that the bootstrap test has a significance level closer to the nominal size and has uniformly greater power than the score test based on the asymptotic standard normal distribution. From a practical perspective, if the score test gives even a weak indication that the Poisson model is inappropriate, say at the 0.10 significance level, we advise using the more accurate bootstrap procedure to test whether the GP model is more appropriate than the Poisson model. Finally, the Vuong test is illustrated for choosing between the GP and NB2 models for the same dataset.

5.
The predictive utility of polygenic scores is increasing, and many polygenic scoring methods are available, but it is unclear which method performs best. This study evaluates the predictive utility of polygenic scoring methods within a reference-standardized framework, which uses a common set of variants and reference-based estimates of linkage disequilibrium and allele frequencies to construct scores. Eight polygenic score methods were tested: p-value thresholding and clumping (pT+clump), SBLUP, lassosum, LDpred1, LDpred2, PRScs, DBSLMM and SBayesR, evaluating their performance to predict outcomes in UK Biobank and the Twins Early Development Study (TEDS). Strategies to identify optimal p-value thresholds and shrinkage parameters were compared, including 10-fold cross-validation, pseudovalidation and infinitesimal models (with no validation sample), and multi-polygenic score elastic net models. LDpred2, lassosum and PRScs performed strongly using 10-fold cross-validation to identify the most predictive p-value threshold or shrinkage parameter, giving a relative improvement of 16–18% over pT+clump in the correlation between observed and predicted outcome values. Using pseudovalidation, the best methods were PRScs, DBSLMM and SBayesR. PRScs pseudovalidation was only 3% worse than the best polygenic score identified by 10-fold cross-validation. Elastic net models containing polygenic scores based on a range of parameters consistently improved prediction over any single polygenic score. Within a reference-standardized framework, the best polygenic prediction was achieved using LDpred2, lassosum and PRScs, modeling multiple polygenic scores derived using multiple parameters. This study will help researchers performing polygenic score studies to select the most powerful and predictive analysis methods.
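The baseline method, pT+clump, reduces to a weighted sum of allele dosages over variants that pass a p-value threshold. The sketch below shows only the thresholding step and assumes the variants have already been clumped; all dosages, effect sizes and p-values are hypothetical.

```python
def polygenic_score(dosages, effects, pvalues, p_threshold):
    """Sum of effect size x allele dosage over variants with p <= p_threshold.

    dosages: effect-allele counts (0, 1 or 2) for one individual.
    effects: per-variant effect sizes from GWAS summary statistics.
    pvalues: per-variant GWAS p-values (variants assumed already clumped).
    """
    return sum(
        dosage * beta
        for dosage, beta, p in zip(dosages, effects, pvalues)
        if p <= p_threshold
    )


# One hypothetical individual scored at several candidate thresholds.
dosages = [0, 1, 2]
effects = [0.5, -0.2, 0.1]
pvalues = [1e-8, 0.03, 0.4]
for threshold in (5e-8, 0.05, 1.0):
    print(threshold, polygenic_score(dosages, effects, pvalues, threshold))
```

Cross-validation, as compared in the abstract, would then select the threshold whose scores correlate best with the observed outcome in held-out folds.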

6.
Critical Assessment of PRedicted Interactions (CAPRI) has proven to be a catalyst for the development of docking algorithms. An essential step in docking is the scoring of predicted binding modes in order to identify stable complexes. In 2005, CAPRI introduced the scoring experiment, where upon completion of a prediction round, a larger set of models predicted by different groups and comprising both correct and incorrect binding modes is made available to all participants for testing new scoring functions independently from docking calculations. Here we present an expanded benchmark data set for testing scoring functions, which comprises the consolidated ensemble of predicted complexes made available in the CAPRI scoring experiment since its inception. This consolidated scoring benchmark contains predicted complexes for 15 published CAPRI targets. These targets were subjected to 23 CAPRI assessments, owing to the existence of multiple binding modes for some targets. The benchmark contains more than 19,000 protein complexes. About 10% of the complexes represent docking predictions of acceptable quality or better; the remainder represent incorrect solutions (decoys). The benchmark set contains models predicted by 47 different predictor groups including web servers, which use different docking and scoring procedures, and is arguably as diverse as one may expect, representing the state of the art in protein docking. The data set is publicly available at the following URL: http://cb.iri.univ‐lille1.fr/Users/lensink/Score_set . Proteins 2014; 82:3163–3169. © 2014 Wiley Periodicals, Inc.

7.
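Accurate prediction of protein folding rates is important for understanding the mechanism of protein folding. Starting from the pseudo-amino acid composition approach, this paper proposes using the oscillation of sequence hydrophobicity values to extract sequence-order information from a protein's amino acids, and builds a linear regression model to predict folding rates. The method requires no secondary structure, tertiary structure, or structural class information, and predicts the folding rate directly from the sequence. On a data set of 62 proteins, jackknife cross-validation gave a correlation coefficient of 0.804, indicating good correlation between predicted and experimental folding rates and showing that amino acid sequence information has an important influence on protein folding rate. Compared with other methods, the proposed method is computationally simple and requires few input parameters.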
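The jackknife validation mentioned here can be sketched as leave-one-out linear regression: each protein is predicted by a model fitted to all the others, and the predictions are then correlated with the measured values. The sketch uses a single hypothetical sequence-derived feature and made-up folding rates, not the paper's hydrophobicity-oscillation features or its 62-protein data set.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def jackknife_predictions(xs, ys):
    """Predict each point from a line fitted to all the other points."""
    preds = []
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        preds.append(slope * xs[i] + intercept)
    return preds


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cov / math.sqrt(
        sum((x - mean_x) ** 2 for x in xs) * sum((y - mean_y) ** 2 for y in ys)
    )


# Hypothetical feature values and log folding rates for five proteins.
feature = [0.2, 0.5, 0.9, 1.4, 1.8]
log_rate = [4.1, 3.2, 2.5, 1.1, 0.4]
predicted = jackknife_predictions(feature, log_rate)
print(pearson_r(predicted, log_rate))
```

Because every prediction comes from a model that never saw the held-out protein, the resulting correlation is a less optimistic estimate than the correlation of a single fit to all the data.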

8.
Qiu J  Sheffler W  Baker D  Noble WS 《Proteins》2008,71(3):1175-1182
Protein structure prediction is an important problem of both intellectual and practical interest. Most protein structure prediction approaches generate multiple candidate models first, and then use a scoring function to select the best model among these candidates. In this work, we develop a scoring function using support vector regression (SVR). Both consensus-based features and features from individual structures are extracted from a training data set containing native protein structures and predicted structural models submitted to CASP5 and CASP6. The SVR learns a scoring function that is a linear combination of these features. We test this scoring function on two data sets. First, when used to rank server models submitted to CASP7, the SVR score selects predictions that are comparable to the best performing server in CASP7, Zhang-Server, and significantly better than all the other servers. Even if the SVR score is not allowed to select Zhang-Server models, the SVR score still selects predictions that are significantly better than all the other servers. In addition, the SVR is able to select significantly better models and yield significantly better Pearson correlation coefficients than the two best Quality Assessment groups in CASP7, QA556 (LEE), and QA634 (Pcons). Second, this work aims to improve the ability of the Robetta server to select best models, and hence we evaluate the performance of the SVR score on ranking the Robetta server template-based models for the CASP7 targets. The SVR selects significantly better models than the Robetta K*Sync consensus alignment score.

9.
Purpose

This study evaluated the plan quality of CyberKnife MLC-based treatment planning in comparison to the Iris collimator for abdominal and pelvic SBRT. Multiple dosimetric parameters were considered together with a global scoring index validated by clinical scoring.

Methods and materials

Iris and MLC plans were created for 28 liver, 15 pancreas and 13 prostate cases including a wide range of PTV sizes (24–643 cm³). Plans were compared in terms of coverage, conformity (nCI), dose gradient (R50%), homogeneity (HI), OAR doses, PTV gEUD, MU, and treatment time, both estimated by the TPS (tTPS) and measured. A global plan quality score index was calculated for Iris and MLC solutions and validated by a clinical score given independently by two observers.

Results

Compared to Iris, MLC achieved equivalent coverage and conformity without compromising OAR sparing, and improved R50% (p < 0.001). MLC gEUD was slightly lower than Iris (p < 0.05) for abdominal cases. MLC significantly reduced MU (−15%) and tTPS (−22%). The time reduction was partially lost when measured. The global score index was significantly higher for MLC solutions, which were selected in 73% and 64% of cases by the first and second observer, respectively.

Conclusion

Comparing Iris and MLC was not straightforward when based on multiple dosimetric parameters. The use of a mathematical overall score index combined with clinical scoring was essential to confirm the advantages of MLC plans over Iris solutions.

10.
Agro‐Land Surface Models (agro‐LSM) combine detailed crop models and large‐scale vegetation models (DGVMs) to model the spatial and temporal distribution of energy, water, and carbon fluxes within the soil–vegetation–atmosphere continuum worldwide. In this study, we identify and optimize parameters controlling leaf area index (LAI) in the agro‐LSM ORCHIDEE‐STICS developed for sugarcane. Using the Morris method to identify the key parameters impacting LAI at eight different sugarcane field trial sites in Australia and La Reunion island, we determined that the three most important parameters for simulating LAI are (i) the maximum predefined rate of LAI increase during the early crop development phase, (ii) a parameter that defines a plant density threshold below which individual plants do not compete for growing their LAI, and (iii) a parameter defining a threshold for nitrogen stress on LAI. A multisite calibration of these three parameters is performed using three different scoring functions. The impact of the choice of a particular scoring function on the optimized parameter values is investigated by testing scoring functions defined from the model‐data RMSE, the figure of merit and a Bayesian quadratic model‐data misfit function. The robustness of the calibration is evaluated for each of the three scoring functions with a systematic cross‐validation method to find the most satisfactory one. Our results show that the figure of merit scoring function is the most robust metric for establishing the best parameter values controlling the LAI. The multisite average figure of merit scoring function is improved from 67% of agreement to 79%. The residual error in LAI simulation after the calibration is discussed.

11.
We describe protein-protein recognition within the frame of the random energy model of statistical physics. We simulate, by docking the component proteins, the process of association of two proteins that form a complex. We obtain the energy spectrum of a set of protein-protein complexes of known three-dimensional structure by performing docking in random orientations and scoring the models thus generated. We use a coarse protein representation where each amino acid residue is replaced by its Voronoi cell, and derive a scoring function by applying the evolutionary learning program ROGER to a set of parameters measured on that representation. Taking the scores of the docking models to be interaction energies, we obtain energy spectra for the complexes and fit them to a Gaussian distribution, from which we derive physical parameters such as a glass transition temperature and a specificity transition temperature.
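How a glass transition temperature follows from a Gaussian energy spectrum can be sketched with the textbook random energy model relation T_g = σ / sqrt(2 ln N), in units where k_B = 1. Whether the paper uses exactly this expression is an assumption, as is taking N to be the number of sampled docking models; the energies below are hypothetical.

```python
import math
import statistics


def rem_glass_temperature(energies):
    """Textbook random-energy-model glass temperature (k_B = 1).

    Fits a Gaussian to the energies via their standard deviation and
    applies T_g = sigma / sqrt(2 ln N), with N taken (as an assumption)
    to be the number of sampled states.
    """
    sigma = statistics.pstdev(energies)
    n_states = len(energies)
    return sigma / math.sqrt(2.0 * math.log(n_states))


# Hypothetical docking scores treated as interaction energies.
energies = [-1.0, 1.0] * 50
print(rem_glass_temperature(energies))
```

The relation comes from the point where the REM entropy, ln N - E²/(2σ²), vanishes; below T_g the system is frozen into its few lowest-energy states.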

12.
13.
14.
Major advances have been made in the prediction of soluble protein structures, led by the knowledge-based modeling methods that extract useful structural trends from known protein structures and incorporate them into scoring functions. The same cannot be reported for the class of transmembrane proteins, primarily because of the lack of high-resolution structural data for transmembrane proteins, which renders many of the knowledge-based methods unreliable or invalid. We have developed a method that harnesses the vast structural knowledge available in soluble protein data for use in the modeling of transmembrane proteins. At the core of the method is a collection of transmembrane protein decoy sets that allows us to filter features recognized from soluble proteins and train them into a set of scoring functions for transmembrane protein modeling. We have demonstrated that structures of soluble proteins can provide significant insight into transmembrane protein structures. A complementary novel two-stage modeling/selection process that mimics the two-stage folding of helical membrane proteins was developed. Combined with the scoring function, the method was successfully applied to model 5 transmembrane proteins. The root mean square deviations of the predicted models ranged from 5.0 to 8.8 Å to the native structures.

15.
Chemical shifts contain substantial information about protein local conformations. We present a method to assign individual protein backbone dihedral angles into specific regions on the Ramachandran map based on the amino acid sequences and the chemical shifts of backbone atoms of tripeptide segments. The method uses a scoring function derived from the Bayesian probability for the central residue of a query tripeptide segment to have a particular conformation. The Ramachandran map is partitioned into representative regions at two levels of resolution. The lower resolution partitioning is equivalent to the conventional definitions of different secondary structure regions on the map. At the higher resolution level, the α and β regions are further divided into subregions. Predictions are attempted at both levels of resolution. We compared our method with TALOS using the original TALOS database, and obtained comparable results. Although TALOS may produce the best results with currently available databases which are much enlarged, the Bayesian-probability-based approach can provide a quantitative measure for the reliability of predictions.

16.
The use of procedures for the automated scoring of amplified fragment length polymorphism (AFLP) fragments has recently increased. The corresponding software not only scores the presence or absence of AFLP fragments automatically, but also allows an evaluation of how different settings of scoring parameters influence subsequent population genetic analyses. In this study, we used the automated scoring package RawGeno to evaluate how five scoring parameters influence the number of polymorphic bins and estimates of pairwise genetic differentiation between populations (Fst). Steps were implemented in R to automatically run the scoring process in RawGeno for a set of different parameter combinations. While we found the scoring parameters minimum bin width and minimum number of samples per bin to have only a weak influence on pairwise Fst values, maximum bin width and bin reproducibility had much stronger effects. The minimum average bin fluorescence scoring parameter affected Fst values only moderately. Over a range of scoring parameters around the default settings of RawGeno, the number of polymorphic bins as well as pairwise Fst values stayed rather constant. This study thus shows that the particularities of AFLP scoring, whether manual or automatic, can have profound effects on subsequent population genetic analysis.

17.
Two commonly employed angular-mobility models for describing amino-acid side-chain χ₁ torsion conformation, the staggered-rotamer jump and the normal probability density, are discussed and performance differences in applications to scalar-coupling data interpretation highlighted. Both models differ in their distinct statistical concepts, representing discrete and continuous angle distributions, respectively. Circular statistics, introduced for describing torsion-angle distributions by using a universal circular order parameter central to all models, suggest another distribution of the continuous class, here referred to as the elliptic model. Characteristic of the elliptic model is that order parameter and circular variance form complementary moduli. Transformations between the parameter sets that describe the probability density functions underlying the different models are provided. Numerical aspects of parameter optimization are considered. The issues are typified by using a set of χ₁-related ³J coupling constants available for FK506-binding protein. The discrete staggered-rotamer model is found generally to produce lower order parameters, implying elevated rotatory variability in the amino-acid side chains, whereas continuous models tend to give higher order parameters that suggest comparatively less variation in angle conformations. The differences perceived regarding angular mobility are attributed to conceptually different features inherent to the models.
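A circular order parameter for a discrete rotamer distribution can be sketched as the mean resultant length of the torsion angles, |Σ p_k·exp(i·χ_k)|, which is the standard definition in circular statistics. Equating it with the paper's order parameter is an assumption, and the rotamer populations used are hypothetical.

```python
import cmath
import math


def circular_order_parameter(angles_deg, populations=None):
    """Mean resultant length of a set of torsion angles (0 = uniform, 1 = rigid).

    angles_deg: torsion angles in degrees, e.g. the staggered rotamers.
    populations: optional weights summing to 1; equal weights if omitted.
    """
    if populations is None:
        populations = [1.0 / len(angles_deg)] * len(angles_deg)
    resultant = sum(
        p * cmath.exp(1j * math.radians(angle))
        for p, angle in zip(populations, angles_deg)
    )
    return abs(resultant)


staggered = [60.0, 180.0, -60.0]
print(circular_order_parameter(staggered))                   # equal populations
print(circular_order_parameter(staggered, [0.8, 0.1, 0.1]))  # one dominant rotamer
```

Equal populations of the three staggered rotamers cancel exactly, while a single dominant rotamer pushes the value toward 1, which illustrates why the discrete model can report low order for data that a narrow continuous distribution would describe as well ordered.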

18.
Huang SY  Zou X 《Proteins》2011,79(9):2648-2661
In this study, we have developed a statistical mechanics-based iterative method to extract statistical atomic interaction potentials from known, nonredundant protein structures. Our method circumvents the long-standing reference state problem in deriving traditional knowledge-based scoring functions, by using rapid iterations through a physical, global convergence function. The rapid convergence of this physics-based method, unlike other parameter optimization methods, warrants the feasibility of deriving distance-dependent, all-atom statistical potentials to keep the scoring accuracy. The derived potentials, referred to as ITScore/Pro, have been validated using three diverse benchmarks: the high-resolution decoy set, the AMBER benchmark decoy set, and the CASP8 decoy set. Significant improvement in performance has been achieved. Finally, comparisons between the potentials of our model and potentials of a knowledge-based scoring function with a randomized reference state have revealed the reason for the better performance of our scoring function, which could provide useful insight into the development of other physical scoring functions. The potentials developed in this study are generally applicable for structural selection in protein structure prediction.

19.
Current approaches to RNA structure prediction range from physics-based methods, which rely on thousands of experimentally measured thermodynamic parameters, to machine-learning (ML) techniques. While the methods for parameter estimation are successfully shifting toward ML-based approaches, the model parameterizations have so far remained fairly constant. We study the potential contribution of increasing the amount of information utilized by RNA folding prediction models to the improvement of their prediction quality. This is achieved by proposing novel models, which refine previous ones by examining more types of structural elements, and larger sequential contexts for these elements. Our proposed fine-grained models are made practical thanks to the availability of large training sets, advances in machine-learning, and recent accelerations to RNA folding algorithms. We show that the application of more detailed models indeed improves prediction quality, while the corresponding running time of the folding algorithm remains fast. An additional important outcome of this experiment is a new RNA folding prediction model (coupled with a freely available implementation), which results in a significantly higher prediction quality than that of previous models. This final model has about 70,000 free parameters, several orders of magnitude more than previous models. Being trained and tested over the same comprehensive data sets, our model achieves a score of 84% according to the F₁-measure over correctly-predicted base-pairs (i.e., 16% error rate), compared to the previously best reported score of 70% (i.e., 30% error rate). That is, the new model yields an error reduction of about 50%. Trained models and source code are available at www.cs.bgu.ac.il/~negevcb/contextfold.
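The F₁-measure over base pairs and the quoted error reduction can be reproduced with a short sketch. It assumes that a base pair is represented as an (i, j) index tuple and that "error rate" means 1 - F₁, as the abstract's own figures imply; the example structures are hypothetical.

```python
def base_pair_f1(predicted, reference):
    """F1 (harmonic mean of precision and recall) over sets of base pairs."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


def relative_error_reduction(old_error, new_error):
    """Fraction of the old error rate that the new model removes."""
    return (old_error - new_error) / old_error


# Hypothetical predicted and reference structures as (i, j) pairs.
predicted = [(1, 10), (2, 9), (3, 8)]
reference = [(1, 10), (2, 9), (4, 7), (5, 6)]
print(base_pair_f1(predicted, reference))
print(relative_error_reduction(0.30, 0.16))
```

Going from a 30% to a 16% error rate removes 14 of 30 percentage points, roughly 47%, which matches the abstract's "about 50%".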

20.
A wastewater biofiltration model is used to assess its capacity to reproduce the treatment behaviour of a plant-sized tertiary nitrifying biofilter unit. It is calibrated on two different types of datasets collected at the Seine-Aval biofiltration plant (Achères, France): grab samples at several heights inside the media bed and a long-term daily plant monitoring over a 1-year period. The model parameters are first calibrated to fit the dynamics observed in the media bed, after which the model is compared to the second dataset. Further parameter changes are then made if necessary and the model is once again compared to both datasets to ensure its ability to predict the treatment behaviour on both size scales. The calibrated model provides correct predictions for most observed nutrient variables for both datasets. An overestimation of the oxygen transfer under a summer, low ammonia load period however leads to a slight underestimation of the nitrifying efficiency of the biofilters. Statistical score computation corroborates the model accuracy as the mean error scores usually remain low. They also point to a certain weakness of the model regarding the suspended solids filtration. Both datasets are overall correctly modelled using a single parameter set. Most of this parameter set is close to or contained in value ranges found in the literature. The parameters related to aeration, however, seem to be slightly higher than what is reported elsewhere.
