Similar articles
20 similar articles found.
1.
Pairwise structure alignment commonly uses root mean square deviation (RMSD) to measure structural similarity, and methods for optimizing RMSD are well established. We extend RMSD to weighted RMSD for multiple structures. By using multiplicative weights, we show that weighted RMSD for all pairs is the same as weighted RMSD to an average of the structures. Thus, using RMSD or weighted RMSD implies that the average is a consensus structure. We show that, in general, the two tasks of finding the optimal translations and rotations for minimizing weighted RMSD cannot be separated for multiple structures as they can for pairs, an inherent difficulty ignored by previous work. Nevertheless, we develop a near-linear iterative algorithm that converges weighted RMSD to a local minimum. In 10,000 experiments of gapped alignment on each of 23 protein families from HOMSTRAD (where each structure starts with a random translation and rotation), the algorithm converges rapidly to the same minimum. Finally, we propose a heuristic method to iteratively remove the effect of outliers and find well-aligned positions that determine the structurally conserved region, by modeling B-factors and deviations from the average positions as weights and iteratively assigning higher weights to better-aligned atoms.
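
The equivalence between all-pairs and to-average deviations can be illustrated with a classical identity for one aligned position observed in several structures: with per-structure weights w_i, the sum over pairs of w_i·w_j·|x_i − x_j|² equals W·Σ w_i·|x_i − x̄|², where W is the total weight and x̄ the weighted average. The sketch below checks this numerically. The function names, the random test data, and the restriction to a single position are assumptions made for illustration; the abstract does not give the paper's exact weighting scheme.

```python
import random


def weighted_pairwise_ss(points, weights):
    """Sum over pairs i<j of w_i * w_j * |x_i - x_j|^2."""
    total = 0.0
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            total += weights[i] * weights[j] * d2
    return total


def weighted_ss_to_average(points, weights):
    """W * sum_i w_i * |x_i - xbar|^2, where xbar is the weighted average."""
    w_sum = sum(weights)
    dim = len(points[0])
    xbar = [sum(w * p[k] for w, p in zip(weights, points)) / w_sum
            for k in range(dim)]
    dev = sum(w * sum((p[k] - xbar[k]) ** 2 for k in range(dim))
              for w, p in zip(weights, points))
    return w_sum * dev


# Hypothetical data: one aligned atom position seen in 5 structures.
rng = random.Random(0)
points = [[rng.uniform(-10, 10) for _ in range(3)] for _ in range(5)]
weights = [rng.uniform(0.1, 2.0) for _ in range(5)]

pairwise = weighted_pairwise_ss(points, weights)
to_average = weighted_ss_to_average(points, weights)
print(pairwise, to_average)  # the two sums agree up to rounding
```

Because the two sums agree for any coordinates, minimizing one over rigid motions is the same as minimizing the other, which is consistent with the abstract's reading of the average as a consensus structure.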

2.
MOTIVATION: Even the best sequence alignment methods frequently fail to correctly identify the framework regions for which backbones can be copied from the template into the target structure. Since the underprediction and, more significantly, the overprediction of these regions reduce the quality of the final model, it is of prime importance to recover as much as possible of the true structural alignment between target and template. RESULTS: We have developed an algorithm called Consensus that consistently provides a high-quality alignment for comparative modeling. The method follows from a benchmark analysis of the 3D models generated by ten alignment techniques for a set of 79 homologous protein structure pairs. For 20 to 40% of the targets, these methods yield models with at least 6 Å root mean square deviation (RMSD) from the native structure. We selected the top five performing methods and developed a consensus algorithm to generate an improved alignment. By building on the individual strength of each method, a set of criteria was implemented to remove the alignment segments that are likely to correspond to structurally dissimilar regions. The automated algorithm was validated on a different set of 48 protein pairs, resulting in 2.2 Å average RMSD for the predicted models, and only four cases in which the RMSD exceeded 3 Å. The average length of the alignments was about 75% of that found by standard structural superposition methods. The performance of Consensus was consistent from 2 to 32% target-template sequence identity, and hence it can be used for accurate prediction of framework regions in homology modeling.

3.
An appropriate structural superposition identifies similarities and differences between homologous proteins that are not evident from sequence alignments alone. We have coupled our Gaussian-weighted RMSD (wRMSD) tool with a sequence aligner and seed extension (SE) algorithm to create a robust technique for overlaying structures and aligning sequences of homologous proteins (HwRMSD). HwRMSD overcomes errors in the initial sequence alignment that would normally propagate into a standard RMSD overlay. SE can generate a corrected sequence alignment from the improved structural superposition obtained by wRMSD. HwRMSD's robust performance and its superiority over standard RMSD are demonstrated over a range of homologous proteins. Its better overlay results in corrected sequence alignments with good agreement to HOMSTRAD. Finally, HwRMSD is compared to established structural alignment methods: FATCAT, secondary-structure matching, combinatorial extension, and DaliLite. Most methods are comparable at placing residue pairs within 2 Å, but HwRMSD places many more residue pairs within 1 Å, providing a clear advantage. Such high accuracy is essential in drug design, where small distances can have a large impact on computational predictions. This level of accuracy is also needed to correct sequence alignments in an automated fashion, especially for omics-scale analysis. HwRMSD can align homologs with low sequence identity and large conformational differences, cases where both sequence-based and structure-based methods may fail. The HwRMSD pipeline overcomes the dependency of structural overlays on initial sequence pairing and removes the need to determine the best sequence-alignment method, substitution matrix, and gap parameters for each unique pair of homologs. Proteins 2012. © 2012 Wiley Periodicals, Inc.

4.
Lee SY, Skolnick J. Proteins 2007, 68(1): 39-47
To improve the accuracy of TASSER models, especially in the limit where threading-provided template alignments are of poor quality, we have developed the TASSER(iter) algorithm, which uses the templates and contact restraints from TASSER-generated models for iterative structure refinement. We apply TASSER(iter) to a large benchmark set of 2,773 nonhomologous single-domain proteins that are ≤ 200 residues in length and that cover the PDB at the level of 35% pairwise sequence identity. Overall, TASSER(iter) models have a smaller global average RMSD of 5.48 Å compared to the 5.81 Å RMSD of the original TASSER models. Classifying the targets by the level of prediction difficulty (where Easy targets have a good template with a corresponding good threading alignment, Medium targets have a good template but a poor alignment, and Hard targets have an incorrectly identified template), TASSER(iter) (TASSER) models have an average RMSD of 4.15 Å (4.35 Å) for the Easy set and 9.05 Å (9.52 Å) for the Hard set. The largest reduction of average RMSD is for the Medium set, where the TASSER(iter) models have an average global RMSD of 5.67 Å compared to 6.72 Å for the TASSER models. Seventy percent of the Medium set TASSER(iter) models have a smaller RMSD than the TASSER models, while 63% of the Easy and 60% of the Hard TASSER models are improved by TASSER(iter). For the foldable cases, where the targets have an RMSD to the native of < 6.5 Å, TASSER(iter) shows obvious improvement over TASSER models: for the Medium set, it improves the success rate from 57.0 to 67.2%, followed by the Hard targets, where the success rate improves from 32.0 to 34.8%, with the smallest improvement in the Easy targets, from 82.6 to 84.0%. These results suggest that TASSER(iter) can provide more reliable predictions for targets of Medium difficulty, a range that had resisted improvement in the quality of protein structure predictions.

5.
Shape information about macromolecules is increasingly available but is difficult to use in modeling efforts. We demonstrate that shape information alone can often distinguish structural models of biological macromolecules. By using a data structure called a surface envelope (SE) to represent the shape of the molecule, we propose a method that generates a fitness score for the shape of a particular molecular model. This score correlates well with the root mean square deviation (RMSD) of the model to the known test structures and can be used to filter models in decoy sets. The scoring method requires both alignment of the model to the SE in three-dimensional space and assessment of the degree to which atoms in the model fill the SE. Alignment combines a hybrid algorithm using principal components and a previously published iterated closest point algorithm. We test our method against models generated from random atom perturbation from crystal structures, published decoy sets used in structure prediction, and models created from the trajectories of atoms in molecular modeling runs. We also test our alignment algorithm against experimental electron microscopic data from rice dwarf virus. The alignment performance is reliable, and we show a high correlation between model RMSD and score function. This correlation is stronger for molecular models with greater oblong character (as measured by the ratio of largest to smallest principal component).

6.
Apart from playing key roles in drug metabolism and adverse drug-drug interactions, CYPs are potential drug targets for treating a variety of diseases. Intervention in the overexpression of P450 1A1 (CYP1A1) in tumor cells has been identified as a novel strategy for anticancer therapy. We investigated three isoforms of the CYP1 family (CYP1A1, CYP1A2, and CYP1B1) for their substrate specificity. Understanding the macromolecular features that govern substrate specificity is required to understand the interplay between protein function and dynamics. This can help in the design of new antitumor molecules that are specifically metabolized by CYP1A1 to mediate their antitumor activity. In the present study, we carried out a comparative protein structure analysis of the three isoforms. Sequence alignment, root mean square deviation (RMSD) analysis, and B-factor analysis were performed to give a better understanding of the macromolecular features involved in substrate specificity and of the interplay between protein dynamics and function, which will have important implications for the rational design of anticancer drugs. We identified differences in amino acid residues among the three isoforms of the CYP1 family, which may account for differential substrate specificity. Six putative substrate recognition sequences are characterized along with the regions they form in the protein structure. Further, the RMSD and B-factor analysis provides information about the identified residues having the maximum RMSD and B-factor deviations.

7.
Rangwala H, Karypis G. Proteins 2008, 72(3): 1005-1018
The effectiveness of comparative modeling approaches for protein structure prediction can be substantially improved by incorporating predicted structural information in the initial sequence-structure alignment. Motivated by the approaches used to align protein structures, this article focuses on developing machine learning approaches for estimating the RMSD value of a pair of protein fragments. These estimated fragment-level RMSD values can be used to construct the alignment, assess the quality of an alignment, and identify high-quality alignment segments. We present algorithms to solve this fragment-level RMSD prediction problem using a supervised learning framework based on support vector regression and classification that incorporates protein profiles, predicted secondary structure, effective information encoding schemes, and novel second-order pairwise exponential kernel functions. Our comprehensive empirical study shows superior results compared with the profile-to-profile scoring schemes. We also show that for protein pairs with low sequence similarity (less than 12% sequence identity), these new local structural features, alone or in conjunction with profile-based information, lead to alignments that are considerably more accurate than those obtained by schemes that use only profile and/or predicted secondary structure information.

8.
MOTIVATION: We introduce the iRMSD, a new type of RMSD that is independent of any structure superposition and suitable for evaluating sequence alignments of proteins with known structures. RESULTS: We demonstrate that the iRMSD is equivalent to the standard RMSD although much simpler to compute, and we also show that it is suitable for comparing sequence alignments and benchmarking multiple sequence alignment methods. We tested the iRMSD score on 6 established multiple sequence alignment packages and found the results to be consistent with those obtained using an established reference alignment collection such as Prefab. AVAILABILITY: The iRMSD is part of the T-Coffee package and is distributed as open source freeware (http://www.tcoffee.org/).

9.
The root mean square deviation (RMSD) and the least RMSD (lRMSD) are two widely used similarity measures in structural bioinformatics. Yet, they stem from global comparisons, possibly obliterating locally conserved motifs. We correct these limitations with the so-called combined RMSD, which mixes independent lRMSD measures, each computed with its own rigid motion. The combined RMSD is relevant in two main scenarios, namely to compare (quaternary) structures based on motifs defined from the sequence (domains and SSE) and to compare structures based on structural motifs yielded by local structural alignment methods. We illustrate the benefits of combined RMSD over the usual RMSD on three problems, namely (a) the assignment of quaternary structures for hemoglobin (scenario #1), (b) the calculation of structural phylogenies (case study: class II fusion proteins; scenario #1), and (c) the analysis of conformational changes based on combined RMSD of rigid structural motifs (case study: one class II fusion protein; scenario #2). Based on these illustrations, we argue that the combined RMSD is a tool of choice to perform positive and negative discrimination of degrees of freedom, with applications to the design of move sets and collective coordinates. Executables to compute combined RMSD are available within the Structural Bioinformatics Library (http://sbl.inria.fr).
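
One plausible way to mix independent per-motif lRMSD values into a single number is a size-weighted quadratic mean. The sketch below assumes that form, sqrt(Σ n_i·lRMSD_i² / Σ n_i), with n_i the number of atoms in motif i. The formula, the function name `combined_rmsd`, and the example values are assumptions for illustration; the abstract does not state the definition, so this should not be read as the library's implementation.

```python
import math


def combined_rmsd(motifs):
    """Size-weighted quadratic mean of per-motif lRMSD values.

    motifs: list of (size, lrmsd) pairs, one per motif, where each lrmsd
    is assumed to have been computed with its own optimal rigid motion.
    Assumed form: sqrt(sum_i size_i * lrmsd_i^2 / sum_i size_i).
    """
    total_size = sum(size for size, _ in motifs)
    if total_size <= 0:
        raise ValueError("motif sizes must sum to a positive number")
    weighted_sq = sum(size * lrmsd ** 2 for size, lrmsd in motifs)
    return math.sqrt(weighted_sq / total_size)


# Hypothetical input: a 100-atom motif at 1.0 Å and a 50-atom motif at 2.0 Å.
example = combined_rmsd([(100, 1.0), (50, 2.0)])
print(round(example, 4))
```

With this form, motifs that each superimpose well yield a low combined value even when their rigid motions differ, which is the situation a single global superposition tends to penalize.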

10.
Strength, or maximum joint torque, is a fundamental factor governing human movement, and is regularly assessed for clinical and rehabilitative purposes as well as for research into human performance. This study aimed to identify the most appropriate protocol for fitting a maximum voluntary torque function to experimental joint torque data. Three participants performed maximum isometric and concentric-eccentric knee extension trials on an isovelocity dynamometer, and a separate experimental protocol was used to estimate maximum knee extension angular velocity. A nine-parameter maximum voluntary torque function, which included angle, angular velocity and neural inhibition effects, was fitted to the experimental torque data, and three aspects of this fitting protocol were investigated. Using an independent experimental estimate of maximum knee extension angular velocity gave lower variability in the high concentric velocity region of the maximum torque function compared to using dynamometer measurements alone. A weighted root mean square difference (RMSD) score function that forced the majority (73-92%) of experimental data beneath the maximum torque function was found to best account for the one-sided noise in experimental torques resulting from sub-maximal effort by the participants. The suggested protocol (an appropriately weighted RMSD score function and an independent estimate of maximum knee extension angular velocity) gave a weighted RMSD of between 11 and 13 Nm (4-5% of maximum isometric torque). It is recommended that this protocol be used in generating maximum voluntary joint torque functions in all torque-based modelling of dynamic human movement.
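
A score that pushes most data beneath a fitted curve can be sketched as an RMS difference with asymmetric weights: residuals where the measured torque lies above the curve are penalized more heavily than those below it. The weights (10 and 1), the normalization by the weight sum, and the function name are hypothetical choices made for illustration; the abstract does not specify the weighting actually used.

```python
import math


def one_sided_weighted_rmsd(measured, predicted,
                            weight_above=10.0, weight_below=1.0):
    """Weighted RMS difference between measured torques and a fitted curve.

    A point whose measured value exceeds the curve gets weight_above;
    every other point gets weight_below. Both weights are illustrative.
    """
    num = 0.0
    den = 0.0
    for m, p in zip(measured, predicted):
        w = weight_above if m > p else weight_below
        num += w * (m - p) ** 2
        den += w
    if den == 0.0:
        raise ValueError("no data points or all weights are zero")
    return math.sqrt(num / den)


# Hypothetical torques in Nm: one point 5 Nm above the curve, one 15 Nm below.
measured = [100.0, 80.0]
curve = [95.0, 95.0]
print(one_sided_weighted_rmsd(measured, curve))            # asymmetric weights
print(one_sided_weighted_rmsd(measured, curve, 1.0, 1.0))  # plain RMSD
```

An optimizer minimizing the asymmetric score would raise the curve until few points remain above it, which matches the idea that sub-maximal effort produces noise on one side only.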

11.
Shah SB, Sahinidis NV. PLoS ONE 2012, 7(5): e37493
Protein structure alignment is the problem of determining an assignment between the amino-acid residues of two given proteins in a way that maximizes a measure of similarity between the two superimposed protein structures. By identifying geometric similarities, structure alignment algorithms provide critical insights into protein functional similarities. Existing structure alignment tools adopt a two-stage approach to structure alignment by decoupling and iterating between the assignment evaluation and structure superposition problems. We introduce a novel approach, SAS-Pro, which addresses the assignment evaluation and structure superposition simultaneously by formulating the alignment problem as a single bilevel optimization problem. The new formulation does not require the sequentiality constraints, thus generalizing the scope of the alignment methodology to include non-sequential protein alignments. We employ derivative-free optimization methodologies to search for the global optimum of the highly nonlinear and non-differentiable RMSD function encountered in the proposed model. Alignments obtained with SAS-Pro have better RMSD values and larger lengths than those obtained from other alignment tools. For non-sequential alignment problems, SAS-Pro leads to alignments with a high degree of similarity to known reference alignments. The source code of SAS-Pro is available for download at http://eudoxus.cheme.cmu.edu/saspro/SAS-Pro.html.

12.
Quantification of statistical significance is essential for the interpretation of protein structural similarity. To address this, a random model for protein structure comparison was developed. The novelty of the model is threefold. First, a sample of random structure comparisons is restricted to molecules of the same size and shape as the superposition of interest. Second, careful selection of the sample and accurate modeling of shape allow approximation of the root mean square deviation (RMSD) distribution of random comparisons with a Nakagami probability density function. Third, through convolution, a second probability density function is obtained that describes the coordinate difference vector projections underlying the random distribution of RMSD. This last feature allows sampling random distributions of not only RMSD, but also any similarity score that depends on difference vector projections, such as the GDT_TS score, TM score, and LiveBench 3D score. Probabilities estimated from the method correlate well with common measures of structural similarity, such as the Dali Z-score and the GDT_TS score. As a result, the p-value for a given superposition can be calculated using simple formulae depending on RMSD, radius of gyration, and thinnest molecular dimension. In addition to scoring structural similarity, p-values computed by this method can be applied to the evaluation of homology modeling techniques, providing a statistically sound alternative to scores used in reference-independent evaluation of alignment quality.
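
If random-comparison RMSD values follow a Nakagami distribution, a p-value for an observed RMSD can be read off as the lower tail of that distribution. The sketch below uses the standard Nakagami density, f(x) = 2·m^m / (Γ(m)·Ω^m) · x^(2m−1) · exp(−m·x²/Ω), and integrates it numerically. The parameter values in the example are hypothetical, and so is the choice of the lower tail as the p-value; the abstract's formulae linking the parameters to radius of gyration and molecular thickness are not reproduced here.

```python
import math


def nakagami_pdf(x, m, omega):
    """Standard Nakagami density with shape m >= 0.5 and spread omega > 0."""
    if m < 0.5 or omega <= 0:
        raise ValueError("require m >= 0.5 and omega > 0")
    if x <= 0:
        return 0.0
    # Work in log space to avoid overflow for large m.
    log_pdf = (math.log(2.0) + m * math.log(m) - math.lgamma(m)
               - m * math.log(omega) + (2.0 * m - 1.0) * math.log(x)
               - m * x * x / omega)
    return math.exp(log_pdf)


def nakagami_cdf(x, m, omega, steps=2000):
    """P(X <= x), by midpoint-rule integration of the density on [0, x]."""
    if x <= 0:
        return 0.0
    h = x / steps
    return h * sum(nakagami_pdf((i + 0.5) * h, m, omega) for i in range(steps))


# Hypothetical parameters for the random-comparison RMSD distribution (Å).
m_shape, omega_spread = 4.0, 144.0
observed_rmsd = 3.0
p_value = nakagami_cdf(observed_rmsd, m_shape, omega_spread)
print(p_value)  # chance that a random comparison scores this low or lower
```

For m = 1 the Nakagami density reduces to the Rayleigh density, whose cumulative distribution 1 − exp(−x²/Ω) gives a convenient check on the numerical integration.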

13.
Finding structural similarities in distantly related proteins can reveal functional relationships that cannot be identified using sequence comparison. Given two proteins A and B and a threshold ε, we develop an algorithm, TRiplet-based Iterative ALignment (TRIAL), for computing the transformation of B that maximizes the number of aligned residues such that the root mean square deviation (RMSD) of the alignment is at most ε. Our algorithm is designed with the specific goal of effectively handling proteins with low similarity in primary structure, where existing algorithms perform particularly poorly. Experiments show that our method outperforms existing methods. TRIAL alignment brings the secondary structures of distantly related proteins to similar orientations. It also finds a larger number of secondary structure matches at lower RMSD values and increased overall alignment lengths. Its classification accuracy is up to 63 percent better than other methods, including CE and DALI. TRIAL successfully aligns 83 percent of the residues from the smaller protein in reasonable time, while other methods align only 29 to 65 percent of the residues for the same set of proteins.
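
The objective described here (as many aligned residues as possible with RMSD at most ε) has a simple solution once the transformation is fixed: sort the candidate pair distances and take the longest prefix whose RMSD stays within ε. The sketch below covers only that selection step and is not the TRIAL algorithm itself; the function name and example distances are hypothetical, and the search over transformations and any sequence-order constraints are left out.

```python
import math


def max_pairs_within_rmsd(distances, epsilon):
    """Largest k such that the k smallest distances have RMSD <= epsilon.

    distances: distances (Å) between candidate residue pairs after a fixed
    superposition. The running RMSD of an ascending-sorted prefix never
    decreases, so the scan can stop at the first prefix that exceeds epsilon.
    """
    best = 0
    sum_sq = 0.0
    for k, d in enumerate(sorted(distances), start=1):
        sum_sq += d * d
        if math.sqrt(sum_sq / k) <= epsilon:
            best = k
        else:
            break
    return best


# Hypothetical pair distances in Å with a threshold of 1.2 Å.
count = max_pairs_within_rmsd([4.0, 0.5, 1.5, 1.0], 1.2)
print(count)  # 3
```

Taking the smallest distances first is optimal for a fixed superposition because any other subset of the same size has a sum of squares at least as large.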

14.
The analysis of structural mobility in molecular dynamics plays a key role in data interpretation, particularly in the simulation of biomolecules. The most common mobility measures computed from simulations are the Root Mean Square Deviation (RMSD) and Root Mean Square Fluctuations (RMSF) of the structures. These are computed after the alignment of atomic coordinates in each trajectory step to a reference structure. This rigid-body alignment is not robust, in the sense that if a small portion of the structure is highly mobile, the RMSD and RMSF increase for all atoms, possibly resulting in poor quantification of the structural fluctuations and, often, in overlooking important fluctuations associated with biological function. The motivation of this work is to provide a robust measure of structural mobility that is practical and easy to interpret. We propose a Low-Order-Value-Optimization (LOVO) strategy for the robust alignment of the least mobile substructures in a simulation. These substructures are automatically identified by the method. The algorithm consists of the iterative superposition of the fraction of the structure displaying the smallest displacements. Therefore, the least mobile substructures are identified, providing a clearer picture of the overall structural fluctuations. Examples are given to illustrate the interpretative advantages of this strategy. The software for performing the alignments was named MDLovoFit and is available as free software at: http://leandro.iqm.unicamp.br/mdlovofit

15.
Mestres J. Proteins 2005, 58(3): 596-609
The recent availability of crystal structures for several diverse cytochromes P450 (CYPs) offers the possibility of performing an up-to-date comparative analysis to identify the degree of structure conservation among this superfamily of enzymes, which is especially relevant because of their involvement in drug metabolism and toxicity. A set of 9 CYPs sharing between 10% and 27% sequence identity was selected, including 7 class I (CYP 101, 107, 108, 119, 121, 51, and 55) and 2 class II (CYP 102 and 2C5) structures. After obtaining a multiprotein structure superimposition, a structure-based sequence alignment was derived. Mapping the level of three-dimensional structural conservation onto the sequence alignment revealed that over 28% of the alignment positions have the Cα carbons of their residues within a root-mean-square deviation (RMSD) of 2 Å. This degree of structure conservation is found to be generally preserved, even when the structure undergoes dramatic conformational changes. Performing the analysis on 4 members of the CYP2 family (CYP 2B4, 2C5, 2C8, and 2C9), the percentage of alignment positions within 2 Å RMSD amounted to 73%, increasing to over 85% when only structures in a closed conformation are considered. The present findings suggest that it should be plausible to derive models of overall good quality for the major CYP2 metabolizing forms (CYP 2A6, 2C19, 2D6, and 2E1), whereas high levels of uncertainty are still likely to be expected in models for the remaining 2 major P450 metabolizing forms (CYP 1A2 and 3A4), with the corresponding implications for their potential applicability in drug design activities.
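
The conservation figures quoted above are fractions of alignment positions whose Cα atoms fall within an RMSD cutoff. A minimal sketch of that bookkeeping follows. It assumes the per-position RMSD values have already been computed from a multi-structure superimposition and that the cutoff is inclusive; the function name and the example values are hypothetical.

```python
def fraction_within(per_position_rmsd, cutoff=2.0):
    """Fraction of alignment positions with per-position RMSD <= cutoff (Å)."""
    if not per_position_rmsd:
        raise ValueError("need at least one alignment position")
    conserved = sum(1 for r in per_position_rmsd if r <= cutoff)
    return conserved / len(per_position_rmsd)


# Hypothetical per-position Cα RMSD values in Å.
values = [0.5, 1.9, 2.0, 3.5]
print(fraction_within(values))  # 0.75
```

Reporting the same fraction at more than one cutoff can show whether conservation is concentrated in a tight core or spread across loosely matching regions.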

16.
Reliable prediction of model accuracy is an important unsolved problem in protein structure modeling. To address this problem, we studied 24 individual assessment scores, including physics-based energy functions, statistical potentials, and machine learning-based scoring functions. Individual scores were also used to construct approximately 85,000 composite scoring functions using support vector machine (SVM) regression. The scores were tested for their abilities to identify the most native-like models from a set of 6000 comparative models of 20 representative protein structures. Each of the 20 targets was modeled using a template of <30% sequence identity, corresponding to challenging comparative modeling cases. The best SVM score outperformed all individual scores by decreasing the average RMSD difference between the model identified as the best of the set and the model with the lowest RMSD (ΔRMSD) from 0.63 Å to 0.45 Å, while having a higher Pearson correlation coefficient to RMSD (r = 0.87) than any other tested score. The most accurate score is based on a combination of the DOPE non-hydrogen atom statistical potential; surface, contact, and combined statistical potentials from MODPIPE; and two PSIPRED/DSSP scores. It was implemented in the SVMod program, which can now be applied to select the final model in various modeling problems, including fold assignment, target-template alignment, and loop modeling.
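
The ΔRMSD measure defined above is the RMSD of the model a score selects minus the lowest RMSD available in the set. A minimal sketch of that definition follows. The function name, the `lower_is_better` flag, and the example numbers are hypothetical, and ties are broken by taking the first best-scoring model.

```python
def delta_rmsd(scores, rmsds, lower_is_better=True):
    """RMSD of the model picked by the score minus the lowest RMSD in the set."""
    if not scores or len(scores) != len(rmsds):
        raise ValueError("scores and rmsds must be non-empty and equal in length")
    indices = range(len(scores))
    if lower_is_better:
        pick = min(indices, key=lambda i: scores[i])
    else:
        pick = max(indices, key=lambda i: scores[i])
    return rmsds[pick] - min(rmsds)


# Hypothetical decoy set: energy-like scores (lower is better) and RMSDs in Å.
scores = [-10.0, -12.0, -8.0]
rmsds = [2.0, 2.5, 3.0]
print(delta_rmsd(scores, rmsds))  # 0.5
```

A value of zero means the score picked the most native-like model; averaging this quantity over targets gives the kind of summary figure quoted in the abstract.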

17.
Chen H, Kihara D. Proteins 2008, 71(3): 1255-1274
Error in protein tertiary structure prediction is unavoidable, but it is not explicitly shown in most current prediction algorithms. The estimated error of a predicted structure is crucial information for experimental biologists who use the prediction model for the design and interpretation of experiments. Here, we propose a method to estimate errors in predicted structures based on the stability of the optimal target-template alignment when compared with a set of suboptimal alignments. The stability of the optimal alignment is quantified by an index named the SuboPtimal Alignment Diversity (SPAD). We implemented SPAD in a profile-based threading algorithm and investigated how well SPAD can indicate errors in threading models using a large benchmark dataset of 5232 alignments. SPAD shows a very good correlation not only with alignment shift errors but also with structure-level errors, namely the root mean square deviation (RMSD) of predicted structure models to the native structures (i.e. global errors) and local errors at each residue position. We have further compared SPAD with seven other quality measures, six sequence alignment-based measures and one atomic statistical potential, discrete optimized protein energy (DOPE), in terms of the correlation coefficient to the global and local structure-level errors. In terms of the correlation to the RMSD of structure models, when a target and a template are in the same SCOP family, the sequence identity showed the best correlation to the RMSD; at the superfamily level, SPAD was the best; and at the fold level, DOPE was best. However, in a head-to-head comparison, SPAD wins over the other measures. Next, SPAD is compared with three other measures of local errors. In this comparison, SPAD was best at all of the family, superfamily and fold levels. Using the discovered correlation, we have also predicted the global and local errors of our predicted structures of CASP7 targets using SPAD. Finally, we propose a sausage representation of predicted tertiary structures, which intuitively indicates the predicted structure and the estimated error range of the structure simultaneously.

18.
The building of protein structures from alpha-carbon coordinates   (Cited 3 times: 0 self-citations, 3 citations by others)
P E Correa. Proteins 1990, 7(4): 366-377
A procedure for the construction of complete protein structures from only alpha-carbon coordinates is described. This involves building the backbone by sequential addition of Pro, Gly, or Ala residues. This main chain structure is then refined using molecular dynamics. Side chains are constructed by sequential addition of atoms with intermediate molecular dynamics refinement. For alpha lytic protease (a structure that is mostly beta sheet), a backbone root mean square deviation (RMSD) of 0.19 Å and an overall RMSD of 1.24 Å from the crystallographic coordinates are attained. For troponin C (67% alpha-helix), where the coordinates are available only for the alpha-carbons, a backbone RMSD of 0.41 Å and an overall RMSD of 1.68 Å are attained (fits kindly provided by Dr. Michael James and Natalie Strynadka). For flavodoxin, a backbone RMSD of 0.49 Å and an overall RMSD of 1.64 Å were attained.

19.
Similarity of protein structures has been analyzed using three-dimensional Delaunay triangulation patterns derived from the backbone representation. It has been found that structurally related proteins have a common spatial invariant part, a set of tetrahedrons, mathematically described as a common spatial subgraph volume of the three-dimensional contact graph derived from Delaunay tessellation (DT). Based on this property of protein structures, we present a novel common volume superimposition (TOPOFIT) method to produce structural alignments. Structural alignments are usually evaluated by the number of equivalent (aligned) positions (N(e)) with the corresponding root mean square deviation (RMSD). The superimposition of the DT patterns allows one to uniquely identify a maximal common number of equivalent residues in the structural alignment. In other words, TOPOFIT identifies a feature point on the RMSD N(e) curve, a topomax point, up to which the topologies of two structures correspond to each other, including backbone and interresidue contacts, whereas a growing number of mismatches between the DT patterns occurs at larger RMSD (N(e)) after the topomax point. It has been found that the topomax point is present in all alignments from different protein structural classes; therefore, the TOPOFIT method identifies common, invariant structural parts between proteins. The alignments produced by the TOPOFIT method have a good correlation with alignments produced by other current methods. This novel method opens new opportunities for the comparative analysis of protein structures and for more detailed studies on understanding the molecular principles of tertiary structure organization and functionality. The TOPOFIT method also helps to detect conformational changes and topological differences in variable parts, which are particularly important for studies of variations in active/binding sites and for protein classification.

20.
Irving JA, Whisstock JC, Lesk AM. Proteins 2001, 42(3): 378-382
Structural genomics (the systematic solution of the structures of the proteins of an organism) will increasingly often produce molecules of unknown function with no close relative of known function. Prediction of protein function from structure has thereby become a challenging problem of computational molecular biology. The strong conservation of active site conformations in homologous proteins suggests a method for identifying them. This depends on the relationship between size and goodness-of-fit of aligned substructures in homologous proteins. For all pairs of proteins studied, the root-mean-square deviation (RMSD) as a function of the number of residues aligned varies exponentially for large common substructures and linearly for small common substructures. The exponent of the dependence at large common substructures is well correlated with the RMSD of the core as originally calculated by Chothia and Lesk (EMBO J 1986;5:823-826), affording the possibility of reconciling different structural alignment procedures. In the region of small common substructures, reduced aligned subsets define active sites and can be used to suggest the locations of active sites in homologous proteins.
