Similar articles
20 similar articles found.
1.
The accuracy of comparative models of proteins is addressed here. A set of 12732 single-template models of sequences of known high-resolution structures was built by an automated procedure. Accuracy of several structure-derived properties, such as surface area, residue accessibility, presence of pockets, electrostatic potential and others, was determined as a function of template:target sequence identity by comparing models with their corresponding experimental structures. As expected, the average accuracy of structure-derived properties always increases with higher template:target sequence identity, but the exact shape of this relationship can differ from one property to another. A comparison of structure-derived properties measured from NMR and X-ray structures of the same protein shows that for most properties, the NMR/X-ray difference is of the same order as the error in models based on ~40% template:target sequence identity. The exact sequence identity at which properties reach that accuracy varies between 25 and 50%, depending on the property being analyzed. A general characteristic of simple comparative models is that their surface has increased area as a consequence of being more rugged than that of experimental structures. This suggests that including solvent effects during model building or refinement could significantly improve the accuracy of surface properties in comparative models.
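Since accuracy is reported as a function of template:target sequence identity, a minimal sketch of how that quantity might be computed from a pairwise alignment is given here. The function name `percent_identity` and the convention of dividing by the number of ungapped columns are assumptions for illustration; the abstract does not say which normalization the authors used.

```python
def percent_identity(aligned_target: str, aligned_template: str) -> float:
    """Percent identity over the ungapped columns of a pairwise alignment.

    Assumption: both strings come from the same alignment, use '-' for
    gaps, and identity is normalized by the number of ungapped columns.
    """
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    aligned = 0
    identical = 0
    for t, s in zip(aligned_target, aligned_template):
        if t == "-" or s == "-":
            continue  # columns containing a gap are not counted
        aligned += 1
        if t.upper() == s.upper():
            identical += 1
    return 100.0 * identical / aligned if aligned else 0.0


# Toy alignment: 6 of the 7 ungapped columns are identical.
print(round(percent_identity("MKT-AYIAK", "MRTLAY-AK"), 1))
```

Other normalizations (for example, dividing by the length of the shorter sequence) are also in use and would give different values for the same alignment.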

2.

Background  

For successful protein structure prediction by comparative modeling, in addition to identifying a good template protein with known structure, obtaining an accurate sequence alignment between a query protein and a template protein is critical. It is known that alignment accuracy can vary significantly depending on the choice of alignment parameters such as gap opening penalty and gap extension penalty. Because the accuracy of a sequence alignment is typically measured by comparing it with its corresponding structure alignment, there is no good way of evaluating alignment accuracy without knowing the structure of the query protein, which is obviously not available at the time of structure prediction. Moreover, there is no universal alignment parameter option that would always yield the optimal alignment.
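To illustrate how the gap opening and gap extension penalties enter an alignment, the following is a minimal sketch of global alignment scoring with affine gaps (a Gotoh-style dynamic program). It is not the method of the paper; the function name, the simple match/mismatch scoring, and the default penalty values are assumptions chosen for illustration.

```python
import math


def affine_global_score(a, b, match=1.0, mismatch=-1.0,
                        gap_open=2.0, gap_extend=0.5):
    """Best global alignment score with affine gap penalties.

    A gap of length k costs gap_open + (k - 1) * gap_extend.
    M: last column aligns a[i-1] with b[j-1];
    X: last column aligns a[i-1] with a gap;
    Y: last column aligns b[j-1] with a gap.
    """
    n, m = len(a), len(b)
    neg = -math.inf
    M = [[neg] * (m + 1) for _ in range(n + 1)]
    X = [[neg] * (m + 1) for _ in range(n + 1)]
    Y = [[neg] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = -(gap_open + (i - 1) * gap_extend)
    for j in range(1, m + 1):
        Y[0][j] = -(gap_open + (j - 1) * gap_extend)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            X[i][j] = max(M[i - 1][j] - gap_open,
                          X[i - 1][j] - gap_extend,
                          Y[i - 1][j] - gap_open)
            Y[i][j] = max(M[i][j - 1] - gap_open,
                          Y[i][j - 1] - gap_extend,
                          X[i][j - 1] - gap_open)
    return max(M[n][m], X[n][m], Y[n][m])


# The same pair of sequences scored under two penalty settings.
print(affine_global_score("ACGT", "AT", gap_open=2.0, gap_extend=1.0))
print(affine_global_score("ACGT", "AT", gap_open=5.0, gap_extend=0.0))
```

Only the score is returned here; recovering the alignment itself would require a traceback through the three matrices, and it is the traceback path that can change when the penalties change.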

3.
Added-value is the additional information that a model carries with respect to the template structure used for model building. Thousands of single-template models, corresponding to proteins of known structure, were analyzed. The accuracy of structure-derived properties, such as residue accessibility, surface area, electrostatic potential, and others, was determined as a function of template:target sequence identity by comparing the models with their corresponding experimental structures. Added-value was determined by comparing the accuracy in models with that from templates. Geometry-dependent properties such as neighborhood of buried residues and accessible surface area showed low added-value. Properties that also depend on the protein sequence, such as presence of polar areas and electrostatic potential, showed high added-value. In general added-value increases when template:target sequence identity decreases, but it is also affected by alignment errors. This study justifies the use of models instead of the use of templates to estimate structure-derived properties of a target protein.

4.

Background

Many antibody crystal structures have been solved. Structural modeling programs have been developed that utilize this information to predict 3-D structures of an antibody based upon its sequence. Because of the problem of self-reference, the accuracy and utility of these predictions can only be tested when a new structure has not yet been deposited in the Protein Data Bank.

Methods

We have solved the crystal structure of the Fab fragment of RAC18, a protective anti-ricin mAb, to 1.9 Å resolution. We have also modeled the Fv structure of RAC18 using publicly available Ab modeling tools Prediction of Immunoglobulin Structures (PIGS), RosettaAntibody, and Web Antibody Modeling (WAM). The model structures underwent energy minimization. We compared results to the crystal structure on the basis of root-mean-square deviation (RMSD), template modeling score (TM-score), Z-score, and MolProbity analysis.

Findings

The crystal structure showed a pocket formed mainly by AA residues in each of the heavy chain complementarity determining regions (CDRs). There were differences between the crystal structure and structures predicted by the modeling tools, particularly in the CDRs. There were also differences among the predicted models, although the differences were small and within experimental error. No one modeling program was clearly superior to the others. In some cases, choosing structures based only on sequence homology to the crystallized Ab yielded RMSDs comparable to the models.

Conclusions

Molecular modeling programs accurately predict the structure of most regions of antibody variable domains of RAC18. The hypervariable CDRs proved most difficult to model, particularly H chain CDR3. Because CDR3 is most often involved in contact with antigen, this defect must be considered when using models to identify potential contacts between antibody and antigen. Because this study represents only a single case, the results cannot be generalized. Rather they highlight the utility and limitations of modeling programs.
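The Methods above compare models with the crystal structure by RMSD and TM-score. A minimal sketch of both measures follows, assuming the two coordinate lists contain the same residues (for example C-alpha atoms) in the same order and are already superimposed. The function names are hypothetical, and the lower bound of 0.5 Å on d0 is a guard added for this sketch. A full TM-score calculation also searches for the superposition that maximizes the score, which is not done here.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two pre-superimposed coordinate sets."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


def tm_score_fixed(coords_model, coords_native, target_length=None):
    """TM-score of one fixed superposition (no search over superpositions).

    d0 = 1.24 * (L - 15)^(1/3) - 1.8 is the length-dependent distance scale;
    the 0.5 lower bound is a guard for short chains (assumption of this sketch).
    """
    if len(coords_model) != len(coords_native) or not coords_native:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    length = target_length or len(coords_native)
    d0 = 0.5
    if length > 15:
        d0 = max(1.24 * (length - 15) ** (1.0 / 3.0) - 1.8, 0.5)
    total = sum(1.0 / (1.0 + (math.dist(p, q) / d0) ** 2)
                for p, q in zip(coords_model, coords_native))
    return total / length


# Toy example: a 100-residue "native" chain and a model displaced by 3 Å in y.
native = [(float(i), 0.0, 0.0) for i in range(100)]
model = [(x, y + 3.0, z) for x, y, z in native]
print(rmsd(model, native), tm_score_fixed(model, native))
```

RMSD weights every residue pair equally, so a few badly placed loop residues (such as CDR H3) can dominate it, whereas the TM-score down-weights large deviations.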

5.
Peng J  Xu J 《Proteins》2011,79(6):1930-1939
Most threading methods predict the structure of a protein using only a single template. Due to the increasing number of solved structures, a protein without a solved structure is very likely to have more than one similar template structure. Therefore, a natural question to ask is whether we can improve modeling accuracy using multiple templates. This article describes a new multiple-template threading method to answer this question. At the heart of this multiple-template threading method is a novel probabilistic-consistency algorithm that can accurately align a single protein sequence simultaneously to multiple templates. Experimental results indicate that our multiple-template method can improve pairwise sequence-template alignment accuracy and generate models of better quality than single-template models, even if they are built from the best single templates (P-value < 10^-6), while many popular multiple sequence/structure alignment tools fail to do so. The underlying reason is that our probabilistic-consistency algorithm can generate accurate multiple sequence/template alignments. In other words, without an accurate multiple sequence/template alignment, the modeling accuracy cannot be improved by simply using multiple templates to increase alignment coverage. Blindly tested on the CASP9 targets with more than one good template structure, our method outperforms all other CASP9 servers except two (Zhang-Server and QUARK of the same group). Our probabilistic-consistency algorithm can possibly be extended to align multiple protein/RNA sequences and structures.

6.

Background  

Selecting the highest quality 3D model of a protein structure from a number of alternatives remains an important challenge in the field of structural bioinformatics. Many Model Quality Assessment Programs (MQAPs) have been developed which adopt various strategies in order to tackle this problem, ranging from the so-called "true" MQAPs capable of producing a single energy score based on a single model, to methods which rely on structural comparisons of multiple models or additional information from meta-servers. However, it is clear that no current method can separate the highest accuracy models from the lowest consistently. In this paper, a number of the top performing MQAP methods are benchmarked in the context of the potential value that they add to protein fold recognition. Two novel methods are also described: ModSSEA, which is based on the alignment of predicted secondary structure elements, and ModFOLD, which combines several true MQAP methods using an artificial neural network.

7.
Zhu M  Li M 《Molecular bioSystems》2012,8(6):1686-1693
G-protein coupled receptors (GPCRs) are recognized as the largest family of membrane proteins. Because far fewer crystal structures than amino acid sequences are available, homology modeling offers a reasonable and feasible route to theoretical GPCR coordinates. Using recently solved crystal structures, we examined how to choose them as templates for homology modeling in four respects: (1) various sequence alignment methods; (2) protein weight matrix; (3) different sets of multiple templates; (4) active and inactive states of templates. The accuracy of the models was evaluated by comparing their three-dimensional conformation and molecular docking results with those of the experimental structure of the Meleagris gallopavo β1-adrenergic receptor (Mg_Adrb1), which we used as an example. Our results suggest that: (1) Cobalt and MAFFT, two sequence alignment algorithms, are suitable for single- and multiple-template modeling, respectively; (2) Blosum30 is applicable for aligning sequences in the case of low sequence identity; (3) multiple-template modeling is not always better than single-template modeling; (4) the state of the template is also an influential factor in simulating GPCR structures.

8.

Aim

To measure the effects of including biotic interactions on climate‐based species distribution models (SDMs) used to predict distribution shifts under climate change. We evaluated the performance of distribution models for an endangered marsupial, the northern bettong (Bettongia tropica), comparing models that used only climate variables with models that also took into account biotic interactions.

Location

North‐east Queensland, Australia.

Methods

We developed separate climate‐based distribution models for the northern bettong, its two main resources and a competitor species. We then constructed models for the northern bettong by including climate suitability estimates for the resources and competitor as additional predictor variables to make climate + resource and climate + resource + competition models. We projected these models onto seven future climate scenarios and compared predictions of northern bettong distribution made by these differently structured models, using a ‘global’ metric, the I similarity statistic, to measure overlap in distribution and a ‘local’ metric to identify where predictions differed significantly.

Results

Inclusion of food resource biotic interactions improved model performance. Over moderate climate changes, up to 3.0 °C of warming, the climate‐only model for the northern bettong gave similar predictions of distribution to the more complex models including interactions, with differences only at the margins of predicted distributions. For climate changes beyond 3.0 °C, model predictions diverged significantly. The interactive model predicted less contraction of distribution than the simpler climate‐only model.

Main conclusions

Distribution models that account for interactions with other species, in particular direct resources, improve model predictions in the present‐day climate. For larger climate changes, shifts in distribution of interacting species cause predictions of interactive models to diverge from climate‐only models. Incorporating interactions with other species in SDMs may be needed for long‐term prediction of changes in distribution of species under climate change, particularly for specialized species strongly dependent on a small number of biotic interactions.
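The Methods above use the I similarity statistic as a 'global' measure of overlap between predicted distributions. A minimal sketch is given here, assuming the commonly used Hellinger-distance form I = 1 - 0.5 * sum((sqrt(p_i) - sqrt(q_i))^2), where p and q are the two suitability surfaces normalized to sum to 1 over the same grid cells. The function name and the flat-list representation of the grid are assumptions; the abstract does not give the formula.

```python
import math


def i_similarity(suitability_x, suitability_y):
    """I similarity between two suitability surfaces over the same grid cells.

    Each surface is normalized to sum to 1, then
    I = 1 - 0.5 * sum((sqrt(p_i) - sqrt(q_i))**2).
    I is 1 for identical surfaces and 0 for surfaces with no overlap.
    """
    if len(suitability_x) != len(suitability_y):
        raise ValueError("surfaces must cover the same grid cells")
    total_x, total_y = sum(suitability_x), sum(suitability_y)
    if total_x <= 0 or total_y <= 0:
        raise ValueError("each surface must have positive total suitability")
    squared_hellinger = sum(
        (math.sqrt(x / total_x) - math.sqrt(y / total_y)) ** 2
        for x, y in zip(suitability_x, suitability_y)
    )
    return 1.0 - 0.5 * squared_hellinger


# Toy grids of four cells: a climate-only and an interaction-aware prediction.
climate_only = [0.9, 0.6, 0.1, 0.0]
with_resources = [0.8, 0.7, 0.3, 0.1]
print(i_similarity(climate_only, with_resources))
```

Because the statistic is computed on normalized surfaces, it compares the shape of two predictions rather than their absolute suitability, which is why a separate 'local' metric is useful for locating where they differ.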

9.

Background  

Template selection and target-template alignment are critical steps for template-based modeling (TBM) methods. Identifying a template in the twilight zone of 15-25% sequence similarity between target and template remains difficult for template-based protein structure prediction. This study presents the (PS)2-v2 server, based on our original server with numerous enhancements and modifications, to improve reliability and applicability.

10.

Background  

In the area of protein structure prediction, recently a lot of effort has gone into the development of Model Quality Assessment Programs (MQAPs). MQAPs distinguish high quality protein structure models from inferior models. Here, we propose a new method to use an MQAP to improve the quality of models. With a given target sequence and template structure, we construct a number of different alignments and corresponding models for the sequence. The quality of these models is scored with an MQAP and used to choose the most promising model. An SVM-based selection scheme is suggested for combining MQAP partial potentials, in order to optimize for improved model selection.

11.

Background

Combinatorial complexity is a challenging problem for the modeling of cellular signal transduction, since the association of a few proteins can give rise to an enormous number of feasible protein complexes. The layer-based approach is an approximate but accurate method for the mathematical modeling of signaling systems with inherent combinatorial complexity. The number of variables in the simulation equations is greatly reduced, and the resulting dynamic models show pronounced modularity. Layer-based modeling allows for the modeling of systems not accessible previously.

Results

ALC (Automated Layer Construction) is a computer program that greatly simplifies the building of reduced modular models according to the layer-based approach. The model is defined using a simple but powerful rule-based syntax that supports the concepts of modularity and macrostates. ALC performs consistency checks on the model definition and provides the model output in different formats (C MEX, MATLAB, Mathematica and SBML) as ready-to-run simulation files. ALC also provides additional documentation files that simplify the publication or presentation of the models. The tool can be used offline or via a form on the ALC website.

Conclusion

ALC allows for a simple rule-based generation of layer-based reduced models. The model files are given in different formats as ready-to-run simulation files.

12.
13.

Background  

Accurate and sensitive performance evaluation is crucial both for the effective development of better structure prediction methods based on sequence similarity and for the comparative analysis of existing methods. To date, there has been no satisfactory comprehensive evaluation method that (i) is based on a large and statistically unbiased set of proteins with clearly defined relationships; and (ii) covers all performance aspects of sequence-based structure predictors, such as sensitivity and specificity, alignment accuracy and coverage, and structure template quality.

14.

Background

Agent-based models are valuable for examining systems where large numbers of discrete individuals interact with each other, or with some environment. Diabetic Veterans seeking eye care at a Veterans Administration hospital represent one such cohort.

Objective

The objective of this study was to develop an agent-based template to be used as a model for a patient with diabetic retinopathy (DR). This template may be replicated arbitrarily many times in order to generate a large cohort which is representative of a real-world population, upon which in-silico experimentation may be conducted.

Methods

Agent-based template development was performed in the Java-based computer simulation suite AnyLogic Professional 6.6. The model was informed by medical data abstracted from 535 patient records representing a retrospective cohort of current patients of the VA St. Louis Healthcare System Eye clinic. Logistic regression was performed to determine the predictors associated with advancing stages of DR. Predicted probabilities obtained from logistic regression were used to generate the stage of DR in the simulated cohort.

Results

The simulated cohort of DR patients exhibited no significant deviation from the test population of real-world patients in proportion of stage of DR, duration of diabetes mellitus (DM), or the other abstracted predictors. Simulated patients after 10 years were significantly more likely to exhibit proliferative DR (P<0.001).

Conclusions

Agent-based modeling is an emerging platform, capable of simulating large cohorts of individuals based on manageable data abstraction efforts. The modeling method described may be useful in simulating many different conditions where course of disease is described in categorical stages.
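The Methods above state that predicted probabilities from logistic regression were used to generate the stage of DR for each simulated patient. A minimal sketch of that sampling step follows. It does not reproduce the AnyLogic model: the stage labels, the function names, and the example probabilities are all hypothetical, and the probabilities are assumed to have been computed beforehand.

```python
import random

# Hypothetical stage labels; the abstract does not list the categories used.
STAGES = ("no DR", "mild NPDR", "moderate NPDR", "severe NPDR", "PDR")


def sample_stage(stage_probabilities, rng):
    """Draw one DR stage for one simulated patient.

    stage_probabilities: one predicted probability per entry of STAGES
    (assumed to come from a previously fitted regression model).
    """
    if len(stage_probabilities) != len(STAGES):
        raise ValueError("need one probability per stage")
    return rng.choices(STAGES, weights=stage_probabilities, k=1)[0]


def simulate_cohort(per_patient_probabilities, seed=0):
    """Replicate the patient template once per probability vector."""
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    return [sample_stage(p, rng) for p in per_patient_probabilities]


# Made-up probabilities for three simulated patients.
cohort = simulate_cohort([
    [0.70, 0.15, 0.10, 0.04, 0.01],
    [0.20, 0.30, 0.30, 0.15, 0.05],
    [0.05, 0.10, 0.25, 0.30, 0.30],
])
print(cohort)
```

Comparing the stage proportions of a large simulated cohort with those of the real patient records is the kind of check the Results describe.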

15.

Background  

Protein secondary structure prediction methods based on probabilistic models such as the hidden Markov model (HMM) appeal to many because they provide meaningful information relevant to the sequence-structure relationship. However, at present, the prediction accuracy of pure HMM-type methods is much lower than that of machine learning-based methods such as neural networks (NN) or support vector machines (SVM).

16.

Background

Knottins are small, diverse and stable proteins with important drug design potential. They can be classified into 30 families which cover a wide range of sequences (1621 sequenced), three-dimensional structures (155 solved) and functions (> 10). Inter-knottin similarity lies mainly between 15% and 40% sequence identity and 1.5 to 4.5 Å backbone deviations, although they all share a tightly knotted disulfide core. This important variability is likely to arise from the highly diverse loops which connect the successive knotted cysteines. The prediction of structural models for all knottin sequences would open new directions for the analysis of interaction sites and provide a better understanding of the structural and functional organization of proteins sharing this scaffold.

Results

We have designed an automated modeling procedure for predicting the three-dimensional structure of knottins. The different steps of the homology modeling pipeline were carefully optimized relative to a test set of knottins with known structures: template selection and alignment, extraction of structural constraints and model building, model evaluation and refinement. After optimization, the accuracy of predicted models was shown to lie between 1.50 and 1.96 Å from native structures at 50% and 10% maximum sequence identity levels, respectively. These average model deviations represent an improvement varying between 0.74 and 1.17 Å over a basic homology modeling derived from a unique template. A database of 1621 structural models for all known knottin sequences was generated and is freely accessible from our web server at http://knottin.cbs.cnrs.fr. Models can also be interactively constructed from any knottin sequence using the structure prediction module Knoter1D3D available from our protein analysis toolkit PAT at http://pat.cbs.cnrs.fr.

Conclusions

This work explores different directions for a systematic homology modeling of a diverse family of protein sequences. In particular, we have shown that the accuracy of the models constructed at a low level of sequence identity can be improved by 1) a careful optimization of the modeling procedure, 2) the combination of multiple structural templates and 3) the use of conserved structural features as modeling restraints.

17.

Background  

The study of biological networks has led to the development of increasingly large and detailed models. Computer tools are essential for the simulation of the dynamical behavior of the networks from the model. However, as the size of the models grows, it becomes infeasible to manually verify the predictions against experimental data or identify interesting features in a large number of simulation traces. Formal verification based on temporal logic and model checking provides promising methods to automate and scale the analysis of the models. However, a framework that tightly integrates modeling and simulation tools with model checkers is currently missing, on both the conceptual and the implementational level.

18.

Background

We describe the E-RFE method for gene ranking, which is useful for the identification of markers in the predictive classification of array data. The method supports a practical modeling scheme designed to avoid the construction of classification rules based on the selection of too small gene subsets (an effect known as the selection bias, in which the estimated predictive errors are too optimistic due to testing on samples already considered in the feature selection process).

Results

With E-RFE, we speed up the recursive feature elimination (RFE) with SVM classifiers by eliminating chunks of uninteresting genes using an entropy measure of the SVM weights distribution. An optimal subset of genes is selected according to a two-strata model evaluation procedure: modeling is replicated by an external stratified-partition resampling scheme, and, within each run, an internal K-fold cross-validation is used for E-RFE ranking. Also, the optimal number of genes can be estimated according to the saturation of Zipf's law profiles.

Conclusions

Without a decrease of classification accuracy, E-RFE allows a speed-up factor of 100 with respect to standard RFE, while improving on alternative parametric RFE reduction strategies. Thus, a process for gene selection and error estimation is made practical, ensuring control of the selection bias, and providing additional diagnostic indicators of gene importance.

19.

Background  

A new algorithm for assessing similarity between primer and template has been developed based on the hypothesis that annealing of primer to template is an information transfer process.

20.

Background

Predicting complete protein-coding genes in human DNA remains a significant challenge. Though a number of promising approaches have been investigated, an ideal suite of tools has yet to emerge that can provide near perfect levels of sensitivity and specificity at the level of whole genes. As an incremental step in this direction, it is hoped that controlled gene finding experiments in the ENCODE regions will provide a more accurate view of the relative benefits of different strategies for modeling and predicting gene structures.

Results

Here we describe our general-purpose eukaryotic gene finding pipeline and its major components, as well as the methodological adaptations that we found necessary in accommodating human DNA in our pipeline, noting that a similar level of effort may be necessary by ourselves and others with similar pipelines whenever a new class of genomes is presented to the community for analysis. We also describe a number of controlled experiments involving the differential inclusion of various types of evidence and feature states into our models and the resulting impact these variations have had on predictive accuracy.

Conclusion

While in the case of the non-comparative gene finders we found that adding model states to represent specific biological features did little to enhance predictive accuracy, for our evidence-based 'combiner' program the incorporation of additional evidence tracks tended to produce significant gains in accuracy for most evidence types, suggesting that improved modeling efforts at the hidden Markov model level are of relatively little value. We relate these findings to our current plans for future research.
