Similar Articles
1.

Background

Popular bioinformatics approaches for studying protein functional dynamics include comparisons of crystallographic structures, molecular dynamics simulations and normal mode analysis. However, determining how observed displacements and predicted motions from these traditionally separate analyses relate to each other, as well as to the evolution of sequence, structure and function within large protein families, remains a considerable challenge. This is in part due to the general lack of tools that integrate information on molecular structure, dynamics and evolution.

Results

Here, we describe the integration of new methodologies for evolutionary sequence, structure and simulation analysis into the Bio3D package. This major update includes unique high-throughput normal mode analysis for examining and contrasting the dynamics of related proteins with non-identical sequences and structures, as well as new methods for quantifying dynamical couplings and their residue-wise dissection from correlation network analysis. These new methodologies are integrated with major biomolecular databases as well as established methods for evolutionary sequence and comparative structural analysis. New functionality for directly comparing results derived from normal modes, molecular dynamics and principal component analysis of heterogeneous experimental structure distributions is also included. We demonstrate these integrated capabilities with example applications to dihydrofolate reductase and heterotrimeric G-protein families along with a discussion of the mechanistic insight provided in each case.
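Bio3D itself is an R package. As a language-neutral illustration of the kind of quantity behind "dynamical couplings" and correlation networks, the Python sketch below computes a normalized cross-correlation matrix from an ensemble of already superposed frames and keeps strongly coupled residue pairs as network edges. The function names and the 0.6 cutoff are hypothetical choices for this example, not Bio3D's API or defaults.

```python
import math

def cross_correlation(frames):
    """Normalized cross-correlation C_ij = <dr_i . dr_j> / sqrt(<|dr_i|^2> <|dr_j|^2>).

    frames: list of conformations, each a list of (x, y, z) tuples for N atoms,
    assumed to be superposed on a common reference already.
    """
    n_frames = len(frames)
    n_atoms = len(frames[0])
    # Mean position of every atom over the ensemble
    mean = [
        tuple(sum(f[i][k] for f in frames) / n_frames for k in range(3))
        for i in range(n_atoms)
    ]
    # Displacement of every atom from its mean, frame by frame
    disp = [
        [tuple(f[i][k] - mean[i][k] for k in range(3)) for i in range(n_atoms)]
        for f in frames
    ]

    def avg_dot(i, j):
        return sum(sum(d[i][k] * d[j][k] for k in range(3)) for d in disp) / n_frames

    var = [avg_dot(i, i) for i in range(n_atoms)]
    return [
        [avg_dot(i, j) / math.sqrt(var[i] * var[j]) if var[i] and var[j] else 0.0
         for j in range(n_atoms)]
        for i in range(n_atoms)
    ]

def coupling_edges(corr, cutoff=0.6):
    """Pairs (i, j) with |C_ij| >= cutoff become edges of a correlation network."""
    n = len(corr)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if abs(corr[i][j]) >= cutoff]
```

Both positively (concerted) and negatively (anticorrelated) coupled pairs become edges here because the test uses the absolute value.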

Conclusions

The integration of structural dynamics and evolutionary analysis in Bio3D enables researchers to go beyond a prediction of single protein dynamics to investigate dynamical features across large protein families. The Bio3D package is distributed with full source code and extensive documentation as a platform independent R package under a GPL2 license from http://thegrantlab.org/bio3d/.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-014-0399-6) contains supplementary material, which is available to authorized users.

2.
How to compare the structures of an ensemble of protein conformations is a fundamental problem in structural biology. As has been previously observed, the widely used RMSD measure due to Kabsch, in which a rigid‐body superposition minimizing the least‐squares positional deviations is performed, has its drawbacks when comparing and visualizing a set of flexible protein structures. Here, we develop a method, fleximatch, of protein structure comparison that takes flexibility into account. Based on a distance matrix measure of flexibility, a weighted superposition of distance matrices rather than of atomic coordinates is performed. Subsequently, this allows a consistent determination of (a) a superposition of structures for visualization, (b) a partitioning of the protein structure into rigid molecular components (core atoms), and (c) an atomic mobility measure. The method is suitable for highlighting both particularly flexible and rigid parts of a protein from structures derived from NMR, X‐ray diffraction or molecular simulation. Proteins 2015; 83:820–826. © 2015 Wiley Periodicals, Inc.
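The abstract does not give fleximatch's weighting scheme, so the following is only a generic sketch of the idea it contrasts with Kabsch RMSD: comparing two conformations through their internal distance matrices, which requires no rigid-body superposition. The `weights` argument is a hypothetical input that could down-weight atom pairs involving flexible regions.

```python
import math

def distance_matrix(coords):
    """All pairwise Euclidean distances for a list of (x, y, z) tuples."""
    return [[math.dist(a, b) for b in coords] for a in coords]

def weighted_drmsd(coords_a, coords_b, weights=None):
    """Weighted RMS difference between the distance matrices of two conformations.

    Internal distances are unchanged by rotation and translation, so no
    superposition is needed. weights[i][j] (hypothetical) scales pair (i, j).
    """
    da, db = distance_matrix(coords_a), distance_matrix(coords_b)
    n = len(coords_a)
    num = den = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            w = 1.0 if weights is None else weights[i][j]
            num += w * (da[i][j] - db[i][j]) ** 2
            den += w
    return math.sqrt(num / den)
```

A rotated and translated copy of a structure scores zero, while any internal rearrangement scores above zero unless the affected pairs are given zero weight.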

3.
We present an automated method incorporated into a software package, FOLDER, to fold a protein sequence on a given three-dimensional (3D) template. Starting with the sequence alignment of a family of homologous proteins, tertiary structures are modeled using the known 3D structure of one member of the family as a template. Homologous interatomic distances from the template are used as constraints. For nonhomologous regions in the model protein, the lower and the upper bounds for the interatomic distances are imposed by steric constraints and the globular dimensions of the template, respectively. Distance geometry is used to embed an ensemble of structures consistent with these distance bounds. Structures are selected from this ensemble based on minimal distance error criteria, after a penalty function optimization step. These structures are then refined using energy optimization methods. The method is tested by simulating the alpha-chain of horse hemoglobin using the alpha-chain of human hemoglobin as the template and by comparing the generated models with the crystal structure of the alpha-chain of horse hemoglobin. We also test the packing efficiency of this method by reconstructing the atomic positions of the interior side chains beyond the Cβ atoms of a protein domain from a known 3D structure. In both test cases, models retain the template constraints and any additionally imposed constraints while the packing of the interior residues is optimized with no short contacts or bond deformations. To demonstrate the use of this method in simulating structures of proteins with nonhomologous disulfides, we construct a model of murine interleukin (IL)-4 using the NMR structure of human IL-4 as the template. The resulting geometry of the nonhomologous disulfide in the model structure for murine IL-4 is consistent with standard disulfide geometry.
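For a complete and exact distance matrix, distance geometry embedding reduces to classical metric-matrix embedding: double-center the squared distances into a Gram matrix and take its three leading eigenvectors. The sketch below assumes such a matrix. In a bounds-based workflow like the one described, trial distances would first be chosen between the lower and upper bounds, a step omitted here, and this is not FOLDER's own code. Eigenpairs are found by power iteration with deflation to keep the example dependency-free.

```python
import math
import random

def _matvec(mat, v):
    return [sum(a * b for a, b in zip(row, v)) for row in mat]

def embed(dist, dim=3, iters=500, seed=0):
    """Metric-matrix embedding of a complete, exact distance matrix.

    Builds the Gram matrix G = -1/2 * J D^2 J (J is the centering matrix),
    extracts its `dim` largest eigenpairs by power iteration with deflation,
    and returns coordinates x_i[k] = sqrt(lambda_k) * v_k[i].
    """
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # Double centering (D is symmetric, so row means equal column means).
    gram = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
             for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * dim for _ in range(n)]
    for k in range(dim):
        v = [rng.random() - 0.5 for _ in range(n)]
        for _ in range(iters):
            w = _matvec(gram, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(a * b for a, b in zip(v, _matvec(gram, v)))
        if lam <= 0.0:
            break  # no positive spectrum left; remaining coordinates stay 0
        for i in range(n):
            coords[i][k] = math.sqrt(lam) * v[i]
        # Deflate so the next power iteration finds the next eigenpair.
        gram = [[gram[i][j] - lam * v[i] * v[j] for j in range(n)]
                for i in range(n)]
    return coords
```

The embedded coordinates reproduce the input distances up to an arbitrary rotation or reflection, which is why a separate selection step against distance errors and chirality is still needed in practice.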

4.
Proteins, 2018, 86(5): 501-514
The structural variations of multidomain proteins with flexible parts mediate many biological processes, and a structure ensemble can be determined by selecting a weighted combination of representative structures from a simulated structure pool, producing the best fit to experimental constraints such as interatomic distances. In this study, a hybrid structure‐based and physics‐based atomistic force field with an efficient sampling strategy is adopted to simulate a model di‐domain protein against experimental paramagnetic relaxation enhancement (PRE) data that correspond to distance constraints. The molecular dynamics simulations produce a wide range of conformations depicted on a protein energy landscape. Subsequently, a conformational ensemble composed of low‐energy structures and satisfying the minimum‐size restraint is identified in good agreement with experimental PRE rates, and the result is also supported by chemical shift perturbations and small‐angle X‐ray scattering data. It is illustrated that the regularizations of energy and ensemble size prevent an arbitrary interpretation of protein conformations. Moreover, energy is found to serve as a critical control to refine the structure pool and prevent data overfitting, because the absence of energy regularization exposes ensemble construction to the noise from high‐energy structures and causes a more ambiguous representation of protein conformations. Finally, we perform structure‐ensemble optimizations with a topology‐based structure pool to better understand how the ensemble results depend on the source of pool candidates.
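PRE rates scale, to first approximation, with the ensemble average of r^-6, so a weighted ensemble is compared with distance restraints through the effective distance <r^-6>^(-1/6). The sketch below scores equal-weight subsets of a structure pool against target distances and returns the smallest subset whose reduced chi-square falls within a tolerance. Equal weights, the tolerance, and the exhaustive subset search are simplifying assumptions, and the energy regularization described in the abstract is not modeled.

```python
from itertools import combinations

def effective_distance(distances, weights):
    """Ensemble-averaged <r^-6>^(-1/6); short distances dominate the average."""
    total = sum(weights)
    mean_r6 = sum(w * d ** -6 for d, w in zip(distances, weights)) / total
    return mean_r6 ** (-1.0 / 6.0)

def chi_square(pool, weights, targets, sigmas):
    """pool[k][m] is the distance for restraint m in structure k."""
    chi = 0.0
    for m, (target, sigma) in enumerate(zip(targets, sigmas)):
        r_eff = effective_distance([conf[m] for conf in pool], weights)
        chi += ((r_eff - target) / sigma) ** 2
    return chi

def smallest_ensemble(pool, targets, sigmas, max_size=3, tolerance=1.0):
    """Smallest equal-weight subset whose reduced chi^2 is within tolerance, or None."""
    for size in range(1, max_size + 1):
        def score(idx):
            return chi_square([pool[i] for i in idx], [1.0] * size, targets, sigmas)
        best = min(combinations(range(len(pool)), size), key=score)
        if score(best) / len(targets) <= tolerance:
            return best, score(best)
    return None
```

Searching sizes in increasing order is one simple way to express a minimum-size preference; exhaustive enumeration is only practical for small pools.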

5.
The multiconformer nature of solution nuclear magnetic resonance (NMR) structures of proteins results from the effects of intramolecular dynamics, spin diffusion and an uneven distribution of structural restraints throughout the molecule. A delineation of the former from the latter two contributions is attempted in this work for an ensemble of 15 NMR structures of the protein Escherichia coli ribonuclease HI (RNase HI). Exploration of the dynamic information content of the NMR ensemble is carried out through correlation with data from two crystal structures and a 1.7‐ns molecular dynamics (MD) trajectory of RNase HI in explicit solvent. Assessment of the consistency of the crystal and mean MD structures with nuclear Overhauser effect (NOE) data showed that the NMR ensemble is overall more compatible with the high‐resolution (1.48 Å) crystal structure than with either the lower‐resolution (2.05 Å) crystal structure or the MD simulation. Furthermore, the NMR ensemble is found to span more conformational space than the MD simulation for both the backbone and the sidechains of RNase HI. Nonetheless, the backbone conformational variability of both the NMR ensemble and the simulation is especially consistent with NMR relaxation measurements of two loop regions that are putative sites of substrate recognition. Plausible side‐chain dynamic information is extracted from the NMR ensemble on the basis of (i) rotamericity and syn‐pentane character of variable torsion angles, (ii) comparison of the magnitude of atomic mean‐square fluctuations (msf) with those deduced from crystallographic thermal factors, and (iii) comparison of torsion angle conformational behavior in the NMR ensemble and the simulation. Several heterogeneous torsion angles, while adopting non‐rotameric/syn‐pentane conformations in the NMR ensemble, exist in a unique conformation in the simulation and display low X‐ray thermal factors. 
These torsions are identified as sites whose variability is likely to be an artifact of the NMR structure determination procedure. A number of other torsions show a close correspondence between the conformations sampled in the NMR and MD ensembles, as well as significant correlations among crystallographic thermal factors and atomic msf calculated from the NMR ensemble and the simulation. These results indicate that a significant amount of dynamic information is contained in the NMR ensemble. The relevance of the present findings for the biological function of RNase HI, protein recognition studies, and previous investigations of the motional content of protein NMR structures is discussed. Proteins 1999;36:87–110. © 1999 Wiley‐Liss, Inc.
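The comparison between NMR-ensemble fluctuations and crystallographic thermal factors rests on the isotropic relation B = (8π²/3)<Δr²>. A minimal sketch, assuming the models are already superposed and that atomic motion is isotropic and harmonic:

```python
import math

def mean_square_fluctuations(ensemble):
    """Per-atom msf <|r - <r>|^2> over superposed models (lists of (x, y, z))."""
    n_models = len(ensemble)
    n_atoms = len(ensemble[0])
    msf = []
    for i in range(n_atoms):
        mean = [sum(m[i][k] for m in ensemble) / n_models for k in range(3)]
        msf.append(sum(
            sum((m[i][k] - mean[k]) ** 2 for k in range(3)) for m in ensemble
        ) / n_models)
    return msf

def msf_to_bfactor(msf):
    """Isotropic B = (8 * pi^2 / 3) * msf; in A^2 when msf is in A^2."""
    return [8.0 * math.pi ** 2 / 3.0 * x for x in msf]
```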

6.
An efficient new method is presented for the characterization of motional correlations derived from a set of protein structures without requiring the separation of overall and internal motion. In this method, termed isotropically distributed ensemble (IDE) analysis, each structure is represented by an ensemble of isotropically distributed replicas corresponding to the situation found in an isotropic protein solution. This leads to a covariance matrix of the Cartesian atomic positions with elements proportional to the ensemble average of scalar products of the position vectors with respect to the center of mass. Diagonalization of the covariance matrix yields eigenmodes and amplitudes that describe concerted motions of atoms, including overall rotational and intramolecular dynamics. It is demonstrated that this covariance matrix naturally distinguishes between "rigid" and "mobile" parts without necessitating a priori selection of a reference structure and an atom set for the orientational alignment process. The method was applied to the analysis of a 5-ns molecular dynamics trajectory of native ubiquitin and a 40-ns trajectory of a partially folded state of ubiquitin. The results were compared with essential dynamics analysis. By taking advantage of the spherical symmetry of the IDE covariance matrix, more than a 10-fold speedup is achieved for the computation of eigenmodes and mode amplitudes. IDE analysis is particularly suitable for studying the correlated dynamics of flexible and large molecules.
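Averaging a position vector over uniformly distributed orientations gives <(R r_i)(R r_j)^T> = (r_i · r_j / 3) I_3, so under the isotropic replica construction the 3N × 3N covariance factorizes into an N × N matrix times the 3 × 3 identity; this is presumably the spherical symmetry that makes the eigenproblem cheaper. The sketch below builds that N × N matrix from center-of-mass-centered structures (equal atomic masses assumed) and extracts the dominant mode by power iteration. It illustrates the stated idea and is not the authors' implementation.

```python
import math

def ide_matrix(structures):
    """N x N matrix M_ij = (1/3) <(r_i - r_cm) . (r_j - r_cm)> over structures.

    Equal atomic masses are assumed when computing the center of mass.
    """
    n = len(structures[0])
    m = [[0.0] * n for _ in range(n)]
    for s in structures:
        cm = [sum(p[k] for p in s) / n for k in range(3)]
        c = [[p[k] - cm[k] for k in range(3)] for p in s]
        for i in range(n):
            for j in range(n):
                m[i][j] += sum(c[i][k] * c[j][k] for k in range(3))
    scale = 1.0 / (3.0 * len(structures))
    return [[x * scale for x in row] for row in m]

def dominant_mode(mat, iters=200):
    """Largest eigenvalue and unit eigenvector by power iteration."""
    n = len(mat)
    v = [1.0 + 0.1 * i for i in range(n)]  # arbitrary non-uniform start vector
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v
        v = [x / norm for x in w]
    lam = sum(v[i] * sum(mat[i][j] * v[j] for j in range(n)) for i in range(n))
    return lam, v
```

Each eigenvalue of the N × N matrix corresponds to a threefold-degenerate eigenvalue of the full Cartesian covariance.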

7.
Liisa Holm, Chris Sander. Proteins, 1994, 19(3): 165-173
The number of protein structures known in atomic detail has increased from one in 1960 (Kendrew, J. C., Strandberg, B. E., Hart, R. G., Davies, D. R., Phillips, D. C., Shore, V. C. Nature (London) 185:422–427, 1960) to more than 1000 in 1994. The rate at which new structures are being published exceeds one a day as a result of recent advances in protein engineering, crystallography, and spectroscopy. More and more frequently, a newly determined structure is similar in fold to a known one, even when no sequence similarity is detectable. A new generation of computer algorithms has now been developed that allows routine comparison of a protein structure with the database of all known structures. Such structure database searches are already used daily and they are beginning to rival sequence database searches as a tool for discovering biologically interesting relationships. © 1994 Wiley-Liss, Inc.

8.
Statistical potentials that embody torsion angle probability densities in databases of high‐quality X‐ray protein structures supplement the incomplete structural information of experimental nuclear magnetic resonance (NMR) datasets. Biasing the conformational search during structure calculation toward highly populated regions in the database yields protein structures with better validation criteria and accuracy. Here, a new statistical torsion angle potential is developed using adaptive kernel density estimation to extract probability densities from a large database of more than 10⁶ quality‐filtered amino acid residues. Incorporated into the Xplor‐NIH software package, the new implementation clearly outperforms an older potential, widely used in NMR structure elucidation, in that it exhibits simultaneously smoother and sharper energy surfaces, and results in protein structures with improved conformation, nonbonded atomic interactions, and accuracy.
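The Xplor-NIH potential works on multidimensional torsion-angle densities, and its exact estimator is not described here. As a one-dimensional illustration of adaptive kernel density estimation on a periodic variable, the sketch below uses Abramson's square-root rule: a fixed-width pilot estimate sets a per-sample bandwidth, so kernels narrow in densely populated regions and widen in sparse ones, and the resulting density is turned into an energy E = -kT ln p. The values of `h0`, `alpha`, and `h_max` are hypothetical.

```python
import math

def _wrapped_gauss(x, mu, h):
    """Gaussian kernel (degrees) wrapped onto the circle via the nearest periodic images."""
    total = 0.0
    for shift in (-360.0, 0.0, 360.0):
        z = (x - mu + shift) / h
        total += math.exp(-0.5 * z * z)
    return total / (h * math.sqrt(2.0 * math.pi))

def adaptive_kde(samples, h0=20.0, alpha=0.5, h_max=60.0):
    """Abramson-style adaptive KDE for torsion angles in degrees within [-180, 180).

    Per-sample bandwidth h_i = h0 * (pilot_i / g) ** -alpha, where g is the
    geometric mean of the fixed-width pilot density at the samples.
    """
    n = len(samples)
    pilot = [sum(_wrapped_gauss(x, s, h0) for s in samples) / n for x in samples]
    g = math.exp(sum(math.log(p) for p in pilot) / n)
    widths = [min(h0 * (p / g) ** -alpha, h_max) for p in pilot]

    def density(x):
        return sum(_wrapped_gauss(x, s, h) for s, h in zip(samples, widths)) / n
    return density

def torsion_energy(density, angle, kt=1.0):
    """Knowledge-based energy E = -kT ln p (in units of kT by default)."""
    return -kt * math.log(density(angle))
```

Capping the bandwidth at `h_max` keeps the three-image wrapping accurate; a production potential would also need multidimensional kernels (for example, joint φ/ψ densities).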

9.
It is commonly believed that similarity between the sequences of two proteins implies similarity between their structures. Sequence alignments reliably recognize pairs of proteins with similar structures provided that the percentage sequence identity between the two sequences is sufficiently high. This distinction, however, is statistically less reliable when the percentage sequence identity is below 30%, and little is then known about the detailed relationship between the two measures of similarity. Here, we investigate the inverse correlation between structural similarity and sequence similarity on 12 protein structure families. We define the structural similarity between two proteins as the cRMS distance between their structures. The sequence similarity for a pair of proteins is measured as the mean distance between the sequences in the subsets of sequence space compatible with their structures. We obtain an approximation of the sequence space compatible with a protein by designing a collection of protein sequences that are both stable in and specific to the structure of that protein. Using these measures of sequence and structure similarity, we find that structural changes within a protein family are linearly related to changes in sequence similarity.
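The abstract does not state which sequence distance is used, so this sketch assumes gapless aligned sequences and a normalized Hamming distance. It computes the mean distance between two sets of (hypothetical) designed sequences, plus an ordinary least-squares line of the kind used to test whether structural and sequence distances are linearly related.

```python
def identity(a, b):
    """Fraction of identical positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def mean_set_distance(set_a, set_b):
    """Mean normalized Hamming distance over all cross pairs of two sequence sets."""
    total = sum(1.0 - identity(a, b) for a in set_a for b in set_b)
    return total / (len(set_a) * len(set_b))

def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx
```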

10.
The ability to consistently distinguish real protein structures from computationally generated model decoys is not yet a solved problem. One route to distinguish real protein structures from decoys is to delineate the important physical features that specify a real protein. For example, it has long been appreciated that the hydrophobic cores of proteins contribute significantly to their stability. We used two sources to obtain datasets of decoys to compare with real protein structures: submissions to the biennial Critical Assessment of protein Structure Prediction competition, in which researchers attempt to predict the structure of a protein knowing only its amino acid sequence, and also decoys generated by 3DRobot, which have user‐specified global root‐mean‐squared deviations from experimentally determined structures. Our analysis revealed that both sets of decoys possess cores that do not recapitulate the key features that define real protein cores. In particular, the model structures appear more densely packed (because of energetically unfavorable atomic overlaps), contain too few residues in the core, and have improper distributions of hydrophobic residues throughout the structure. Based on these observations, we developed a feed‐forward neural network, which incorporates key physical features of protein cores, to predict how well a computational model recapitulates the real protein structure without knowledge of the structure of the target sequence. By identifying the important features of protein structure, our method is able to rank decoy structures with similar accuracy to that obtained by state‐of‐the‐art methods that incorporate many additional features. The small number of physical features makes our model interpretable, emphasizing the importance of protein packing and hydrophobicity in protein structure prediction.

11.
12.
The nature of flexibility in the helix‐turn‐helix region of E. coli trp aporepressor has been unexplained for many years. The original ensemble of nuclear magnetic resonance (NMR) structures showed apparent disorder, but chemical shift and relaxation measurements indicated a helical region. Nuclear Overhauser effect (NOE) data for a temperature‐sensitive mutant showed more helical character in its helix‐turn‐helix region, but nevertheless also led to an apparently disordered ensemble. However, conventional NMR structure determination methods require all structures in the ensemble to be consistent with every NOE simultaneously. This work uses an alternative approach in which some structures of the ensemble are allowed to violate some NOEs to permit modeling of multiple conformational states that are in dynamic equilibrium. Newly measured NOE data for wild‐type aporepressor are used as time‐averaged distance restraints in molecular dynamics simulations to generate an ensemble of helical conformations that is more consistent with the observed NMR data than the apparent disorder in the previously reported NMR structures. The results indicate the presence of alternating helical conformations that provide a better explanation for the flexibility of the helix‐turn‐helix region of trp aporepressor. Structures representing these conformations have been deposited in the PDB (ID: 5TM0). Proteins 2017; 85:731–740. © 2016 Wiley Periodicals, Inc.
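Time-averaged restraints are commonly implemented by averaging r^-3 with an exponentially decaying memory, so a restraint only needs to be satisfied on average over the trajectory rather than at every instant. The sketch below assumes that common form; the exponent, time step, and memory time `tau` are hypothetical, since the abstract does not give the values used for trp aporepressor.

```python
import math

def time_averaged_distances(distances, dt, tau, power=3):
    """Running <r^-p>^(-1/p) with exponential memory tau (same time units as dt).

    Discrete update: A(t) = exp(-dt/tau) * A(t - dt) + (1 - exp(-dt/tau)) * r(t) ** -p,
    started from the first instantaneous value.
    """
    decay = math.exp(-dt / tau)
    avg = distances[0] ** -power
    history = [distances[0]]
    for r in distances[1:]:
        avg = decay * avg + (1.0 - decay) * r ** -power
        history.append(avg ** (-1.0 / power))
    return history

def upper_bound_violation(r_avg, upper):
    """Flat-bottom violation: zero at or below the NOE upper bound."""
    return max(0.0, r_avg - upper)
```

Because r^-3 weighting favors short distances, a restraint can be satisfied on average even when the instantaneous distance spends much of its time well above the bound, which is what lets alternating conformations coexist.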

13.
We report the application of an integrated computational approach for biomolecular structure determination at low resolution. In particular, a neural network is trained to predict the spatial proximity of C-alpha atoms that are less than a given threshold apart, while a Kalman filter algorithm is employed to outline the biomolecular fold using a constraint set that includes these pairwise atomic distances together with the distances and angles that define the structure as it is known from the protein's sequence. The results for Crambin demonstrate that this integrated approach is useful for molecular structure prediction at low resolution and may also complement existing experimental distance data for protein structure determination. © 1996 John Wiley & Sons, Inc.
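The network's training targets can be pictured as a binary Cα contact map. The sketch below derives one from coordinates and converts the contacts into upper-bound distance restraints of the kind a fold-building step could consume. The 8 Å threshold and the minimum sequence separation are hypothetical, because the abstract only speaks of "a given threshold"; neither the neural network nor the Kalman filter is shown.

```python
import math

def ca_contact_map(ca_coords, threshold=8.0):
    """1 where two distinct C-alpha atoms are closer than threshold (A), else 0."""
    n = len(ca_coords)
    return [[1 if i != j and math.dist(ca_coords[i], ca_coords[j]) < threshold else 0
             for j in range(n)] for i in range(n)]

def contacts_as_restraints(contact_map, min_separation=3, upper=8.0):
    """(i, j, upper_bound) for contacts at least min_separation residues apart."""
    n = len(contact_map)
    return [(i, j, upper) for i in range(n) for j in range(i + min_separation, n)
            if contact_map[i][j]]
```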

14.
Chao Fang, Yi Shang, Dong Xu. Proteins, 2018, 86(5): 592-598
Protein secondary structure prediction can provide important information for protein 3D structure prediction and protein function. Deep learning offers a new opportunity to significantly improve prediction accuracy. In this article, a new deep neural network architecture, named the Deep inception‐inside‐inception (Deep3I) network, is proposed for protein secondary structure prediction and implemented as a software tool, MUFOLD‐SS. The input to MUFOLD‐SS is a carefully designed feature matrix corresponding to the primary amino acid sequence of a protein, which consists of a rich set of information derived from individual amino acids as well as from the context of the protein sequence. Specifically, the feature matrix is a composition of physicochemical properties of amino acids, the PSI‐BLAST profile, and the HHBlits profile. MUFOLD‐SS is composed of a sequence of nested inception modules and maps the input matrix to either eight states or three states of secondary structure. The architecture of MUFOLD‐SS enables effective processing of local and global interactions between amino acids in making accurate predictions. In extensive experiments on multiple datasets, MUFOLD‐SS significantly outperformed the best existing methods and other deep neural networks. MUFOLD‐SS can be downloaded from http://dslsrv8.cs.missouri.edu/~cf797/MUFoldSS/download.html .
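Eight-state predictions are often reduced to three states before computing Q3 accuracy. The mapping below (H, G, I to helix; E, B to strand; everything else to coil) is one widely used convention for DSSP codes and is an assumption here; the article may use a different reduction.

```python
# One common DSSP 8-to-3 reduction (assumed; other conventions exist).
EIGHT_TO_THREE = {"H": "H", "G": "H", "I": "H",
                  "E": "E", "B": "E",
                  "T": "C", "S": "C", "-": "C", " ": "C", "C": "C"}

def to_three_state(ss8):
    """Map an 8-state DSSP string to helix (H), strand (E), or coil (C)."""
    return "".join(EIGHT_TO_THREE[c] for c in ss8)

def q_accuracy(predicted, observed):
    """Per-residue accuracy (Q3 or Q8 depending on the alphabet used)."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)
```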

15.
With the rapid increase in the size of the genome sequence database, computational analysis of RNA will become increasingly important in revealing structure-function relationships and potential drug targets. RNA secondary structure prediction for a single sequence is 73% accurate on average for a large database of known secondary structures. This level of accuracy provides a good starting point for determining a secondary structure either by comparative sequence analysis or by the interpretation of experimental studies. Dynalign is a new computer algorithm that improves the accuracy of structure prediction by combining free energy minimization and comparative sequence analysis to find a low free energy structure common to two sequences without requiring any sequence identity. It uses a dynamic programming construct suggested by Sankoff. Dynalign, however, restricts the maximum distance, M, allowed between aligned nucleotides in the two sequences. This makes the calculation tractable because the complexity is reduced to O(M³N³), where N is the length of the shorter sequence.

The accuracy of Dynalign was tested with sets of 13 tRNAs, seven 5S rRNAs, and two R2 3' UTR sequences. On average, Dynalign predicted 86.1% of known base pairs in the tRNAs, compared with 59.7% for free energy minimization alone. For the 5S rRNAs, the average accuracy improves from 47.8% to 86.4%. The secondary structure of the R2 3' UTR from Drosophila takahashii is poorly predicted by standard free energy minimization. With Dynalign, however, the structure predicted in tandem with the sequence from Drosophila melanogaster nearly matches the structure determined by comparative sequence analysis.
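The accuracy figures above are the fraction of known base pairs that appear in the prediction (sensitivity). A minimal sketch, assuming pseudoknot-free structures written in dot-bracket notation:

```python
def pairs_from_dot_bracket(db):
    """Base pairs (i, j), 0-based, from a pseudoknot-free dot-bracket string."""
    stack, pairs = [], []
    for i, ch in enumerate(db):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.append((stack.pop(), i))
    return pairs

def _normalize(pairs):
    return {(min(i, j), max(i, j)) for i, j in pairs}

def pair_sensitivity(predicted, known):
    """Fraction of known base pairs that also occur in the prediction."""
    known_set = _normalize(known)
    return len(_normalize(predicted) & known_set) / len(known_set)
```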

16.
We propose a new approach for calculating the three-dimensional (3D) structure of a protein from distance and dihedral angle constraints derived from experimental data. We suggest that such constraints can be obtained from experiments such as tritium planigraphy, chemical or enzymatic cleavage of the polypeptide chain, paramagnetic perturbation of nuclear magnetic resonance (NMR) spectra, measurement of hydrogen-exchange rates, mutational studies, mass spectrometry, and electron paramagnetic resonance. These can be supplemented with constraints from theoretical prediction of secondary structures and of buried/exposed residues. We report here distance geometry calculations to generate the structures of a test protein, Staphylococcal nuclease (STN), and of the HIV-1 rev protein (REV), whose structure is unknown. From the available 3D atomic coordinates of STN, we set up simulated data sets consisting of varying numbers and quality of constraints, and used our group's Self-Correcting Distance Geometry (SECODG) program DIAMOD to generate structures. We could generate the correct tertiary fold from qualitative (approximate) as well as precise distance constraints. The root mean square deviations of backbone atoms from the native structure were in the range of 2.0 Å to 8.3 Å, depending on the number of constraints used. We could also generate the correct fold starting from a subset of atoms that are on the surface and those that are buried. When we used data sets containing a small fraction of incorrect distance constraints, the SECODG technique was able to detect and correct them. In the case of REV, we used a combination of constraints obtained from mutagenic data and structure predictions. DIAMOD generated helix-loop-helix models, which, after four self-correcting cycles, populated one family exclusively. The features of the energy-minimized model are consistent with the available data on REV-RNA interaction. 
Our method could thus be an attractive alternative for calculating protein 3D structures, especially in cases where the traditional methods of X-ray crystallography and multidimensional NMR spectroscopy have been unsuccessful.
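The abstract does not describe how SECODG detects incorrect constraints. One simple heuristic in the same spirit, shown here as a hypothetical illustration rather than the DIAMOD procedure, is to flag any constraint that most structures in a generated ensemble violate beyond a tolerance. The `max_fraction` and `tol` values are hypothetical.

```python
import math

def violated(coords, i, j, lower, upper, tol=0.0):
    """True if the i-j distance lies outside [lower - tol, upper + tol]."""
    d = math.dist(coords[i], coords[j])
    return d < lower - tol or d > upper + tol

def suspicious_constraints(ensemble, constraints, max_fraction=0.5, tol=0.5):
    """Indices of (i, j, lower, upper) constraints violated in most structures."""
    flagged = []
    for idx, (i, j, lo, up) in enumerate(constraints):
        n_bad = sum(violated(s, i, j, lo, up, tol) for s in ensemble)
        if n_bad / len(ensemble) > max_fraction:
            flagged.append(idx)
    return flagged
```

Flagged constraints could then be relaxed or removed before the next round of structure generation.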

17.
18.
SUMMARY: OTUbase is an R package designed to facilitate the analysis of operational taxonomic unit (OTU) data and sequence classification (taxonomic) data. Currently there are programs that will cluster sequence data into OTUs and/or classify sequence data into known taxonomies. However, there is a need for software that can take the summarized output of these programs and organize it into easily accessed and manipulated formats. OTUbase provides this structure and organization within R, to allow researchers to easily manipulate the data with the rich library of R packages currently available for additional analysis. AVAILABILITY: OTUbase is an R package available through Bioconductor. It can be found at http://www.bioconductor.org/packages/release/bioc/html/OTUbase.html.

19.
Abstract

Structures and functions of proteins play various essential roles in biological processes. The functions of newly discovered proteins can be predicted by comparing their structures with those of proteins of known function. Many approaches have been proposed for measuring protein structure similarity, such as the template-modeling (TM)-score method, the GRaphlet (GR)-Align method, and the commonly used root-mean-square deviation (RMSD) measures. However, alignment-based comparison of protein structures is time-consuming on large datasets, and its accuracy still has room for improvement. In this study, we introduce a new three-dimensional (3D) Yau–Hausdorff distance between any two 3D objects. The 3D Yau–Hausdorff distance can be used in particular to measure the similarity/dissimilarity of two proteins of any size and does not require aligning and superimposing the two structures. We apply structural similarity to study functional similarity and perform phylogenetic analysis on several datasets. The results show that the 3D Yau–Hausdorff distance could serve as a more precise and effective method than other structure-comparison methods for discovering biological relationships between proteins.

Communicated by Ramaswamy H. Sarma
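For reference, the ordinary Hausdorff distance between two point clouds (for example, Cα coordinates) is sketched below. Unlike the Yau–Hausdorff distance described above, it depends on how the two structures are placed in space, so on its own it does not remove the need for superposition; it is shown only to make the underlying max-min construction concrete.

```python
import math

def directed_hausdorff(a, b):
    """max over p in a of the distance from p to its nearest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two finite 3D point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))
```

This brute-force version costs O(|a|·|b|) distance evaluations, which is one reason alignment-free, cheaper alternatives are attractive for large datasets.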

20.
M. F. Thorpe, S. Banu Ozkan. Proteins, 2015, 83(12): 2279-2292
The most successful protein structure prediction methods to date have been template‐based modeling (TBM) or homology modeling, which predicts protein structure based on experimental structures. These high‐accuracy predictions sometimes retain structural errors due to incorrect templates or a lack of accurate templates in the case of low sequence similarity, making these structures inadequate for drug‐design studies or molecular dynamics simulations. We have developed a new physics‐based approach to the protein refinement problem by mimicking the mechanism of chaperones that rehabilitate misfolded proteins. The template structure is unfolded by selectively (targeted) pulling on different portions of the protein using the geometry‐based technique FRODA, and then refolded using hierarchically restrained replica exchange molecular dynamics simulations (hr‐REMD). FRODA unfolding is used to create a diverse set of topologies for surveying near‐native structures from a template and to provide a set of persistent contacts to be employed during refolding. We have tested our approach on 13 previous CASP targets and observed that this method of folding an ensemble of partially unfolded structures, through the hierarchical addition of contact restraints (that is, first local and then nonlocal interactions), leads to a refolding of the structure along with refinement in most cases (12/13). Although this approach yields refined models through advancement in sampling, the task of blind selection of the best refined models still needs to be solved. Overall, the method can be useful for improved sampling for low‐resolution models where certain portions of the structure are incorrectly modeled. Proteins 2015; 83:2279–2292. © 2015 Wiley Periodicals, Inc.
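Replica exchange MD couples replicas at different temperatures through a Metropolis swap test, accepting an exchange with probability min(1, exp[(β_i - β_j)(E_i - E_j)]). The sketch below shows only this standard temperature-exchange criterion, with energies assumed to be in kcal/mol; the hierarchical restraints of hr-REMD and the FRODA unfolding step are not modeled.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)

def swap_probability(e_i, e_j, t_i, t_j, k_b=K_B):
    """Metropolis acceptance for exchanging configurations of replicas at T_i and T_j."""
    beta_i, beta_j = 1.0 / (k_b * t_i), 1.0 / (k_b * t_j)
    delta = (beta_i - beta_j) * (e_i - e_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)

def attempt_swap(e_i, e_j, t_i, t_j, rng=random):
    """Randomly accept or reject one exchange attempt."""
    return rng.random() < swap_probability(e_i, e_j, t_i, t_j)
```

A swap that moves the lower-energy configuration to the colder replica is always accepted; the reverse move is accepted with a Boltzmann-weighted probability, which preserves detailed balance across the temperature ladder.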

