Similar Documents
20 similar documents found
1.
The function of a protein is primarily dictated by its structure. It is therefore far more logical to look for functional clues in the protein's overall three-dimensional fold, or global structure. In this paper, we have developed a novel Support Vector Machine (SVM)-based model for the functional classification and prediction of proteins, using features extracted from their global structure with fragment libraries. Fragment libraries have previously been used for ab initio modelling of proteins and for protein structure comparison. The query protein structure is broken down into a collection of short, contiguous backbone fragments, and this collection is discretized using a library of fragments. The input feature vector is a frequency vector that counts how often each library fragment occurs in the query's collection of fragments, as determined by all-to-all fragment comparisons. SVM models were trained and optimised to obtain the best 10-fold cross-validation (CV) accuracy for classification. As an example, the method was applied to the prediction and classification of cell adhesion molecules (CAMs). Thirty-four different fragment libraries, with sizes ranging from 4 to 400 fragments and fragment lengths ranging from 4 to 12 residues, were evaluated to obtain the best prediction model. The best 10-fold CV accuracy, 95.25%, was obtained for the library of 400 fragments of length 10. An accuracy of 87.5% was obtained on an unseen test dataset consisting of 20 CAMs and 20 non-CAMs. This shows that protein structure can be accurately and uniquely described using 400 representative fragments of length 10.
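A minimal sketch of the feature-extraction step, assuming the structure is reduced to a Cα trace and that each window is assigned to its single closest library fragment. The function names are hypothetical, and the distance used here (the RMS difference of internal Cα–Cα distances) is a simple rotation-invariant stand-in for the superposition RMSD that a real fragment-library comparison would more likely use.

```python
import math
from itertools import combinations


def internal_distances(fragment):
    """All pairwise Cα–Cα distances within a fragment (invariant to rotation and translation)."""
    return [math.dist(a, b) for a, b in combinations(fragment, 2)]


def fragment_distance(frag_a, frag_b):
    """RMS difference between the internal distances of two equal-length fragments.
    Assumed here as a stand-in for superposition RMSD."""
    da, db = internal_distances(frag_a), internal_distances(frag_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(da, db)) / len(da))


def fragment_frequency_vector(ca_trace, library):
    """Cut the Cα trace into overlapping windows of the library's fragment length,
    assign each window to its closest library fragment, and count the assignments."""
    length = len(library[0])
    counts = [0] * len(library)
    for start in range(len(ca_trace) - length + 1):
        window = ca_trace[start:start + length]
        best = min(range(len(library)), key=lambda k: fragment_distance(window, library[k]))
        counts[best] += 1
    return counts
```

For a 400-fragment library of length 10 the returned list has 400 entries; whether and how it is normalised before being passed to an SVM is not specified in the abstract.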

2.
There have been steady improvements in protein structure prediction during the past two decades. However, current methods are still far from consistently predicting accurate structural models with the computing power accessible to common users. Toward more accurate and efficient structure prediction, we developed a number of novel methods and integrated them into a software package, MUFOLD. First, a systematic protocol was developed to identify useful templates and fragments from the Protein Data Bank for a given target protein. Then, an efficient process was applied for iterative coarse-grained model generation and evaluation at the Cα or backbone level. In this process, we use multidimensional scaling to construct models from inter-residue spatial restraints derived from alignments, evaluate and select models through clustering and static scoring functions, and iteratively improve the selected models by integrating spatial restraints and previous models. Finally, the full-atom models were evaluated using molecular dynamics simulations based on structural changes under simulated heating. We have continuously improved the performance of MUFOLD using a benchmark of 200 proteins from the Astral database, in which no template with >25% sequence identity to any target protein was included. The average root-mean-square deviation of the best models from the native structures is 4.28 Å, a significant and systematic improvement over our previous methods. The computing time of MUFOLD is much shorter than that of many other tools, such as Rosetta. MUFOLD demonstrated some success in CASP8, the 2008 community-wide experiment for protein structure prediction. Proteins 2010. © 2009 Wiley-Liss, Inc.
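The model-construction step can be illustrated with textbook classical (Torgerson) multidimensional scaling. This sketch assumes a complete, consistent matrix of inter-residue distances, which alignment-derived restraints generally will not be; how MUFOLD handles missing or conflicting restraints is not described here, and `classical_mds` is a hypothetical helper. Eigenvectors are found by power iteration with deflation so the example needs only the standard library.

```python
import math


def classical_mds(dist, dims=3, iters=2000):
    """Classical (Torgerson) MDS: coordinates in `dims` dimensions whose pairwise
    distances reproduce the full distance matrix `dist` as closely as possible."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # Double centring: B = -1/2 * J D^2 J, the Gram matrix of the centred coordinates
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        v = [1.0 + 0.1 * i for i in range(n)]      # deterministic start vector
        for _ in range(iters):                      # power iteration for the top eigenvector
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
        # Deflate so the next pass finds the next-largest eigenpair
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][k] = v[i] * scale
    return coords
```

With noisy or incomplete distances the double-centred matrix can have negative eigenvalues, so a practical version would use a proper symmetric eigensolver followed by restraint-weighted refinement rather than this bare power iteration.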

3.
We have examined how the hydrogen bond geometry in three different proteins is affected when structural restraints based on measurements of residual dipolar couplings are included in the structure calculations. The study shows that including restraints based solely on ¹H(N)–¹⁵N residual dipolar couplings has a pronounced impact on the backbone rmsd and the Ramachandran plot, but does not improve the hydrogen bond geometry. In the case of chymotrypsin inhibitor 2, adding ¹³CO–¹³Cα and ¹⁵N–¹³CO one-bond dipolar couplings as restraints in the structure calculations improved the hydrogen bond geometry to a quality comparable to that of the 1.8 Å resolution X-ray structure of this protein. A systematic restraint study was performed in which four types of restraints (residual dipolar couplings, hydrogen bonds, TALOS angles and NOEs) were each allowed in two states. This study revealed the importance of using several types of residual dipolar couplings to obtain good hydrogen bond geometry. The study also showed that using a small set of NOEs derived only from the amide protons, together with a full set of residual dipolar couplings, resulted in structures of very high quality. When the NOE set is reduced, it is mainly the side-chain to side-chain NOEs that are removed. Despite this, the effect on side-chain packing is very small when a reduced NOE set is used, which implies that the overall fold of a protein structure is mainly determined by the correct folding of the backbone.
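Why a single coupling type leaves orientational ambiguity can be seen from the standard relation between an RDC and bond orientation. This sketch assumes one common convention: the alignment tensor is described by an axial component `Da` (in Hz, with the bond-specific dipolar prefactor folded in) and a rhombicity `R`, and the bond vector is given in the tensor's principal-axis frame. A continuum of orientations (a cone when R = 0) yields the same value, which is why combining several coupling types constrains geometry better than one type alone. The function name is hypothetical.

```python
import math


def rdc(bond_vector, da, rhombicity):
    """Residual dipolar coupling for a bond vector in the alignment principal-axis frame:
    D = Da * [(3 cos^2(theta) - 1) + 1.5 * R * sin^2(theta) * cos(2 phi)]."""
    x, y, z = bond_vector
    r2 = x * x + y * y + z * z
    cos2_theta = z * z / r2
    sin2_theta_cos2phi = (x * x - y * y) / r2   # sin^2(theta) cos(2 phi) in Cartesian form
    return da * ((3.0 * cos2_theta - 1.0) + 1.5 * rhombicity * sin2_theta_cos2phi)
```

A bond along the principal z axis gives 2·Da, a bond in the xy plane with R = 0 gives −Da, and a bond at the magic angle (cos²θ = 1/3) with R = 0 gives zero.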

4.
Coarse-grained (CG) methods for sampling protein conformational space have the potential to increase computational efficiency by reducing the number of degrees of freedom. However, the gain in computational efficiency of CG methods often comes at the expense of non-protein-like local conformational features, which could cause problems when transitioning to full-atom models in a hierarchical framework. Here, a CG potential energy function was validated by applying it to the problem of loop prediction. A novel method to sample the conformational space of backbone atoms was benchmarked using a standard test set of 351 distinct loops. This method used a sequence-independent CG potential energy function that represents the protein by its α-carbon positions only, and sampled conformations with a protocol based on Monte Carlo simulated annealing. Backbone atoms were then added using a previously described method and gradient-minimised in the Rosetta force field. Despite the CG potential energy function being sequence-independent, the method performed similarly to methods that explicitly use either fragments of known protein backbones with similar sequences or residue-specific φ/ψ maps to restrict the search space. The method was also able to predict, with sub-ångström accuracy, two out of seven loops from recently solved crystal structures of proteins with low sequence and structure similarity to structures previously deposited in the PDB. Sampling realistic loop conformations directly from a potential energy function makes it possible to incorporate additional geometric restraints and more advanced sampling methods, which is not easily done with fragment replacement methods; it also enables multi-scale simulations for protein design and protein structure prediction. These restraints could be derived from experimental data, or could be design restraints in the case of computational protein design.
C++ source code is available for download from http://www.sbg.bio.ic.ac.uk/phyre2/PD2/.
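The sampling protocol is described only as Monte Carlo simulated annealing on an α-carbon representation. The generic Metropolis loop below illustrates the acceptance rule and a geometric cooling schedule; the energy function, the `perturb(state, rng)` move, the temperatures and the step count are all placeholders, not the paper's CG potential or protocol.

```python
import math
import random


def simulated_annealing(energy, x0, perturb, t_start=10.0, t_end=0.01, steps=5000, seed=0):
    """Metropolis Monte Carlo with geometric cooling.
    Returns the lowest-energy state visited and its energy."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    alpha = (t_end / t_start) ** (1.0 / steps)   # per-step cooling factor
    t = t_start
    for _ in range(steps):
        cand = perturb(x, rng)
        e_cand = energy(cand)
        # Always accept downhill moves; accept uphill moves with Boltzmann probability
        if e_cand <= e or rng.random() < math.exp(-(e_cand - e) / t):
            x, e = cand, e_cand
            if e < best_e:
                best_x, best_e = x, e
        t *= alpha
    return best_x, best_e
```

With a toy energy such as `lambda x: (x - 3.0) ** 2` and Gaussian moves, the returned state ends up close to 3. For loop modelling the state would instead be a list of Cα coordinates and `perturb` a local move (an assumption; the paper's move set is not given in the abstract).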

5.
Gunnar Jeschke. Proteins, 2016, 84(4): 544-560
Conformational ensembles of intrinsically disordered peptide chains are not fully determined by experimental observations. Uncertainty due to a lack of experimental restraints and uncertainty due to intrinsic disorder can be distinguished if distance distribution restraints are available. Such restraints can be obtained from pulsed dipolar electron paramagnetic resonance (EPR) spectroscopy applied to pairs of spin labels. Here, we introduce a Monte Carlo approach for generating conformational ensembles that are consistent with a set of distance distribution restraints, with backbone dihedral angle statistics in known protein structures and, optionally, with secondary structure propensities or membrane immersion depths. The approach is tested with simulated restraints for a terminal loop, an internal loop and a 69-residue protein, using sets of sparse restraints for underlying well-defined conformations and for published ensembles of a premolten globule-like and a coil-like intrinsically disordered protein. Proteins 2016; 84:544–560. © 2016 Wiley Periodicals, Inc.
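Judging how well an ensemble matches a measured distance distribution requires a numerical score. The abstract does not specify one, so this sketch assumes the simple overlap of two normalised histograms (1 for identical, 0 for disjoint); the function names, the binning and the metric itself are illustrative choices rather than the published method.

```python
def normalised_histogram(values, lo, hi, bins):
    """Histogram of distances in [lo, hi), normalised to sum to 1."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        if lo <= v < hi:
            counts[min(int((v - lo) / width), bins - 1)] += 1   # guard against rounding at hi
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * bins


def overlap(p, q):
    """Overlap of two normalised distributions on the same bins."""
    return sum(min(a, b) for a, b in zip(p, q))
```

In a Monte Carlo ensemble generator, such a score could be evaluated for each spin-label pair and used to accept or reweight candidate conformations (a hypothetical use).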

6.
Recently, several experimental techniques have emerged for probing RNA structures based on high-throughput sequencing. However, most secondary structure prediction tools that incorporate probing data are designed and optimized for particular types of experiments. For example, RNAstructure-Fold is optimized for SHAPE data, while SeqFold is optimized for PARS data. Here, we report a new RNA secondary structure prediction method, restrained MaxExpect (RME), which can incorporate multiple types of experimental probing data and is based on a free energy model and an MEA (maximizing expected accuracy) algorithm. We first demonstrated that RME substantially improved secondary structure prediction with perfect restraints (base pair information of known structures). Next, we collected structure-probing data from diverse experiments (e.g. SHAPE, PARS and DMS-seq) and transformed them into a unified set of pairing probabilities with a posterior probabilistic model. By using the probability scores as restraints in RME, we compared its secondary structure prediction performance with two other well-known tools, RNAstructure-Fold (based on a free energy minimization algorithm) and SeqFold (based on a sampling algorithm). For SHAPE data, RME and RNAstructure-Fold performed better than SeqFold, because they markedly altered the energy model with the experimental restraints. For high-throughput data (e.g. PARS and DMS-seq) with lower probing efficiency, the secondary structure prediction performances of the tested tools were comparable, with performance improvements for only a portion of the tested RNAs. However, when the effects of tertiary structure and protein interactions were removed, RME showed the highest prediction accuracy in the DMS-accessible regions by incorporating in vivo DMS-seq data.
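The MEA step can be sketched as a Nussinov-style recursion in the spirit of MaxExpect: a base pair (i, j) scores 2γ·P(i, j) and an unpaired nucleotide scores its unpaired probability. This sketch assumes the pair-probability matrix is already given (in RME it comes from a partition function perturbed by probing-derived restraints, which is not reproduced here) and ignores pseudoknots and the energy model; `mea_fold` and `min_loop` are hypothetical names.

```python
import math


def mea_fold(pair_prob, gamma=1.0, min_loop=3):
    """Maximum-expected-accuracy secondary structure (no pseudoknots).
    pair_prob[i][j] is the symmetric probability that i and j pair.
    Returns (best score, sorted list of base pairs)."""
    n = len(pair_prob)
    if n == 0:
        return 0.0, []
    unpaired = [1.0 - sum(row) for row in pair_prob]
    m = [[0.0] * n for _ in range(n)]

    def score(i, j):
        # Best score of subsequence i..j; an empty subsequence scores 0
        return m[i][j] if i <= j else 0.0

    def pair_term(i, j):
        return score(i + 1, j - 1) + 2.0 * gamma * pair_prob[i][j]

    for span in range(n):
        for i in range(n - span):
            j = i + span
            best = score(i + 1, j) + unpaired[i]            # i left unpaired
            if j - i > min_loop:
                best = max(best, pair_term(i, j))            # i pairs with j
            for k in range(i + 1, j):                        # split into i..k and k+1..j
                best = max(best, m[i][k] + m[k + 1][j])
            m[i][j] = best

    pairs, stack = [], [(0, n - 1)]
    while stack:                                             # traceback
        i, j = stack.pop()
        if i >= j:
            continue
        if math.isclose(m[i][j], score(i + 1, j) + unpaired[i], abs_tol=1e-12):
            stack.append((i + 1, j))
        elif j - i > min_loop and math.isclose(m[i][j], pair_term(i, j), abs_tol=1e-12):
            pairs.append((i, j))
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if math.isclose(m[i][j], m[i][k] + m[k + 1][j], abs_tol=1e-12):
                    stack.extend([(i, k), (k + 1, j)])
                    break
    return m[0][n - 1], sorted(pairs)
```

Raising `gamma` favours predicting more pairs; probing restraints would influence the result only through the probabilities passed in.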

7.
Determination of the accurate three-dimensional structure of large proteins by NMR remains challenging due to a loss in the density of experimental restraints resulting from the often prerequisite perdeuteration. Solution small-angle scattering, which carries long-range translational information, presents an opportunity to enhance the structural accuracy of derived models when used in combination with global orientational NMR restraints such as residual dipolar couplings (RDCs) and residual chemical shift anisotropies (RCSAs). We have quantified the improvements in accuracy that can be obtained using this strategy for the 82 kDa enzyme Malate Synthase G (MSG), currently the largest single-chain protein solved by solution NMR. Joint refinement against NMR and scattering data leads to an improvement in structural accuracy, as evidenced by a decrease from approximately 4.5 Å to approximately 3.3 Å in the backbone rmsd between the derived model and the high-resolution X-ray structure, PDB code 1D8C. This improvement results primarily from medium-angle scattering data, which encode the overall molecular shape, rather than the lowest-angle data that principally determine the radius of gyration and the maximum particle dimension. The effect of the higher-angle data, which are dominated by internal density fluctuations, while beneficial, is also found to be relatively small. Our results demonstrate that joint NMR/SAXS refinement can yield significantly improved accuracy in solution structure determination and will be especially well suited for the study of systems with limited NMR restraints, such as large proteins, oligonucleotides, or their complexes.
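The lowest-angle scattering data are summarised by the radius of gyration, which in the Guinier region enters as I(q) ≈ I(0)·exp(−q²Rg²/3). For a model, a geometric Rg can be computed from atomic coordinates as below. Note the assumption of unit weights by default, whereas an Rg comparable with SAXS would weight atoms by excess scattering density and account for the hydration shell; the function name is hypothetical.

```python
import math


def radius_of_gyration(coords, weights=None):
    """Rg = sqrt( sum_i w_i |r_i - r_c|^2 / sum_i w_i ), with r_c the weighted centroid.
    Unit weights by default, which is a simplification."""
    if weights is None:
        weights = [1.0] * len(coords)
    total = sum(weights)
    centre = [sum(w * c[k] for w, c in zip(weights, coords)) / total for k in range(3)]
    msd = sum(w * sum((c[k] - centre[k]) ** 2 for k in range(3))
              for w, c in zip(weights, coords)) / total
    return math.sqrt(msd)
```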

8.
Howell SC, Mesleh MF, Opella SJ. Biochemistry, 2005, 44(13): 5196-5206
The three-dimensional backbone structure of a membrane protein with two transmembrane helices in micelles was determined using solution NMR methods that rely on the measurement of backbone ¹H–¹⁵N residual dipolar couplings (RDCs) from samples of two different constructs that align differently in stressed polyacrylamide gels. Dipolar wave fitting to the ¹H–¹⁵N RDCs determines the helical boundaries based on periodicity and was utilized in the generation of supplemental dihedral restraints for the helical segments. The ¹H–¹⁵N RDCs and supplemental dihedral restraints enable the determination of the structure of the helix-loop-helix core domain of the mercury transport membrane protein MerF with a backbone RMSD of 0.58 Å. Moreover, the fold of this polypeptide demonstrates that the two vicinal pairs of cysteine residues, shown to be involved in the transport of Hg(II) across the membrane, are exposed to the cytoplasm. This finding differs from earlier structural and mechanistic models that were based primarily on the somewhat atypical hydropathy plot for MerF and related transport proteins.
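Dipolar wave fitting exploits the roughly 3.6-residue periodicity of RDCs along an α-helix. As a simplified illustration, assuming the helical RDCs can be approximated by a sinusoid with a fixed period, the fit reduces to linear least squares in an offset and cosine/sine coefficients. The real dipolar wave is derived from ideal helix geometry and the alignment tensor, so this sketch shows only the periodicity fit; the function names are hypothetical.

```python
import math


def solve_linear(a, b):
    """Solve a small dense system a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit_dipolar_wave(residues, couplings, period=3.6):
    """Least-squares fit of D(i) = a + b*cos(2*pi*i/period) + c*sin(2*pi*i/period).
    Returns (a, A, rho) such that D(i) = a + A*cos(2*pi*i/period + rho)."""
    rows = [(1.0, math.cos(2 * math.pi * i / period), math.sin(2 * math.pi * i / period))
            for i in residues]
    # Normal equations (X^T X) beta = X^T y
    ata = [[sum(r[p] * r[q] for r in rows) for q in range(3)] for p in range(3)]
    aty = [sum(r[p] * y for r, y in zip(rows, couplings)) for p in range(3)]
    a, b, c = solve_linear(ata, aty)
    # b*cos(x) + c*sin(x) = A*cos(x + rho) with A = hypot(b, c), rho = atan2(-c, b)
    return a, math.hypot(b, c), math.atan2(-c, b)
```

Residues whose couplings deviate strongly from the fitted wave would be candidates for helix ends, which is how periodicity can locate helical boundaries.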

9.
The solution structure of a novel 69-residue proteinase inhibitor, Linum usitatissimum trypsin inhibitor (LUTI), was determined using a method based on computer-aided assignment of nuclear Overhauser enhancement spectroscopy (NOESY) data. The approach uses the program NOAH/DYANA for automatic assignment of NOESY cross-peaks. Calculations were carried out using two unassigned NOESY peak lists and a set of determined dihedral angle restraints. In addition, hydrogen bonds involving amide protons were identified during the calculations using geometrical criteria and values of HN temperature coefficients. Stereospecific assignment of β-methylene protons was carried out using a standard procedure based on nuclear Overhauser enhancement intensities and ³Jαβ coupling constants. Further stereospecific assignments of methylene protons and diastereotopic methyl groups were established using the structure-based method available in the program GLOMSA together with chemical shift calculations. The algorithm allowed us to assign 1968 of 2164 peaks (91%) derived from NOESY spectra recorded in H₂O and ²H₂O. The final experimental input consisted of 1609 interproton distance restraints, 88 restraints for 44 hydrogen bonds, 63 torsion angle restraints and 32 stereospecifically assigned methylene proton pairs and methyl groups. The algorithm allowed the calculation of a high-precision protein structure without laborious manual assignment of NOESY cross-peaks. For the 20 best conformers selected from 40 refined in the program CNS, the average pairwise rmsd values for residues 3 to 69 were 0.38 Å (backbone atoms) and 1.02 Å (all heavy atoms). The three-dimensional LUTI structure consists of a mixed parallel and antiparallel β-sheet and a single α-helix, and shows the fold of the potato 1 family of proteinase inhibitors.
Compared with known structures of the family, LUTI contains Arg and Trp residues at positions P6' and P8', respectively, instead of the two Arg residues involved in stabilizing the proteinase binding loop. A consequence of the Arg→Trp substitution at P8' is a slightly more compact conformation of the loop relative to the protein core.
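NOESY cross-peak intensities are commonly converted into distance restraints with the isolated-spin-pair approximation, I ∝ r⁻⁶. The sketch below assumes a single reference peak of known distance; the `slack` factor and the `max_bound` cap are hypothetical illustrative values, and NOAH/DYANA and CNS apply their own calibration schemes.

```python
def noe_distance(intensity, ref_intensity, ref_distance):
    """Isolated-spin-pair approximation: I ∝ r^-6, so r = r_ref * (I_ref / I)^(1/6)."""
    return ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)


def upper_bounds(intensities, ref_intensity, ref_distance, slack=1.1, max_bound=6.0):
    """Hypothetical calibration: loosen each estimated distance by `slack`, then cap it."""
    return [min(noe_distance(i, ref_intensity, ref_distance) * slack, max_bound)
            for i in intensities]
```

Because of the sixth-root dependence, a 64-fold weaker peak corresponds to only a doubling of the distance, which is why NOE-derived restraints are usually expressed as generous upper bounds.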

10.
Alexandrescu AT. Proteins, 2004, 56(1): 117-129
Introductory biochemistry texts often note that the fold of a protein is completely defined when the dihedral angles φ and ψ are known for each amino acid. This assertion was examined with torsion angle dynamics and simulated annealing (TAD/SA) calculations of protein G using only dihedral angle restraints. When all dihedral angles were restrained to within 1° of the values in the X-ray structure, the TAD/SA structures gave a backbone root-mean-square deviation from the target of 4 Å. Factors that contributed to divergence from the correct solution include deviations of peptide bonds from planarity, internal conflicts resulting from the nonuniform energies of different φ, ψ combinations, and relaxation to extended conformations in the absence of long-range constraints. Simulations including hydrogen-bond restraints showed that even a few long-range contacts constrain the fold better than a complete set of accurate dihedral restraints. A procedure is described for TAD/SA calculations using hydrogen-bond restraints, idealized dihedral restraints for residues in regular secondary structures, and "hydrophobic distance restraints" derived from the positions of hydrophobic residues in the amino acid sequence. The hydrogen-bond restraints are treated as inviolable, whereas violated hydrophobic restraints are removed after the restraint upper bounds are reduced from 2 to 1 times the predicted radius of gyration. The strategy was tested with simulated restraints from X-ray structures of proteins from different fold classes, and with NMR data for cold shock protein A that included only backbone chemical shifts and hydrogen bonds obtained from a long-range HNCO experiment.
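Since the argument turns on φ and ψ, it may help to recall how a backbone dihedral is computed from coordinates: φ(i) uses atoms C(i−1), N(i), Cα(i), C(i), and ψ(i) uses N(i), Cα(i), C(i), N(i+1). The function below is a standard signed-dihedral formula written as a self-contained sketch; it is not taken from the TAD/SA software used in the paper.

```python
import math


def _sub(a, b):
    return [x - y for x, y in zip(a, b)]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle p0-p1-p2-p3 in degrees, in the range [-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n1 = math.sqrt(_dot(b1, b1))
    b1 = [x / n1 for x in b1]                           # unit vector along the central bond
    v = _sub(b0, [_dot(b0, b1) * x for x in b1])        # b0 projected perpendicular to b1
    w = _sub(b2, [_dot(b2, b1) * x for x in b1])        # b2 projected perpendicular to b1
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))
```

Computing φ/ψ this way from a model and comparing them with the targets is a quick check that restrained dihedrals were satisfied, even when, as the abstract reports, the global fold still deviates.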

11.
In electron crystallography, membrane protein structure is determined from two-dimensional crystals in which the protein is embedded in a membrane. Once large and well-ordered 2D crystals are grown, one of the bottlenecks in electron crystallography is the collection of image data to directly provide experimental phases to high resolution. Here, we describe an approach to bypass this bottleneck, eliminating the need for high-resolution imaging. We use the strengths of electron crystallography in rapidly obtaining accurate experimental phase information from low-resolution images and accurate high-resolution amplitude information from electron diffraction. The low-resolution experimental phases were used for the placement of α-helix fragments and extended to high resolution using phases from the fragments. Phases were further improved by density modification, followed by fragment expansion and structure refinement against the high-resolution diffraction data. Using this approach, structures of three membrane proteins were determined rapidly and accurately to atomic resolution without high-resolution image data.

12.
The influence of the stereospecific assignments of β-methylene protons and the classification of χ1 torsion angles on the definition of the three-dimensional structures of proteins determined from NMR data is investigated using the sea anemone protein BDS-I (43 residues) as a model system. Two sets of structures are computed. The first set comprises 42 converged structures (denoted STEREO structures) calculated on the basis of the complete list of restraints derived from the NMR data, consisting of 489 interproton and 24 hydrogen-bonding distance restraints, supplemented by 23 φ backbone and 21 χ1 side-chain torsion angle restraints. The second set comprises 31 converged structures (denoted NOSTEREO structures) calculated from a reduced data set in which the restraints arising from stereospecific assignments, and the corresponding χ1 torsion angle restraints, are explicitly omitted. The results show that the inclusion of the stereospecific restraints leads to a significant improvement in the definition of the structure of BDS-I, with respect to both the backbone and the detailed arrangement of the side chains. Average atomic rms differences between the individual structures and the mean structures for the backbone atoms are 0.67 ± 0.12 Å and 0.93 ± 0.16 Å for the STEREO and NOSTEREO structures, respectively; the corresponding values for all atoms are 0.90 ± 0.17 Å and 1.17 ± 0.17 Å, respectively. In addition, while the overall fold remains unchanged, there is a small but significant atomic displacement between the two sets of structures.
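The "average atomic rms difference to the mean structure" reported above can be computed as below. This is a minimal sketch assuming the models have already been superimposed onto a common frame (no fitting is performed) and that the spread is reported as a population standard deviation; the paper may use different fitting and averaging conventions, and the function names are hypothetical.

```python
import math


def mean_structure(ensemble):
    """Atom-by-atom average of a list of models, each a list of (x, y, z) tuples."""
    n_models = len(ensemble)
    return [tuple(sum(model[a][k] for model in ensemble) / n_models for k in range(3))
            for a in range(len(ensemble[0]))]


def rmsd(a, b):
    """RMS difference between two equal-length coordinate lists (no superposition)."""
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))


def rms_to_mean(ensemble):
    """Mean and population standard deviation of each model's rmsd to the mean structure."""
    mean = mean_structure(ensemble)
    values = [rmsd(model, mean) for model in ensemble]
    avg = sum(values) / len(values)
    sd = math.sqrt(sum((v - avg) ** 2 for v in values) / len(values))
    return avg, sd
```

Restricting each model to backbone atoms before calling `rms_to_mean` gives the backbone figure; passing all atoms gives the all-atom figure.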

13.
An automatic procedure is described for building a protein polyalanine backbone from Cα positions and 'spare parts' retrieved from a database of 66 high-resolution protein structures. Protein backbones are constructed from overlapping fragments of variable length, which allows the backbone of regular secondary structure elements to be built in one block. The procedure is shown to yield backbones that compare very favourably with those from highly refined X-ray structures (r.m.s. deviation between generated and crystal structures less than 1 Å). Furthermore, the method is quite insensitive to experimental errors in Cα positions as well as to the size of the database, and it yields valuable insight into the relationships between sequence and 3-D structure: one example, on triose phosphate isomerase, a β-barrel protein, shows that βα loops can be considered structurally more uncommon than αβ loops. The 'spare parts' approach is also useful for general-purpose modelling of local structural changes produced by insertion or deletion of residues. It should, however, be used with caution. Crude selection criteria based solely on fragment length and geometric fit to the loop base regions yield realistic backbones in about two-thirds of the test cases (r.m.s. deviations from the refined crystal structure approximately 1 Å). In the remaining cases, sequence information, in particular the presence of glycine residues, which tend to adopt more unusual backbone conformations, must be considered to obtain comparable results.

14.
The extent of rapid (picosecond) backbone motions within the glucocorticoid receptor DNA-binding domain (GR DBD) has been investigated using proton-detected heteronuclear NMR spectroscopy on uniformly ¹⁵N-labeled protein fragments containing the GR DBD. Sequence-specific ¹⁵N resonance assignments, based on two- and three-dimensional heteronuclear NMR spectra, are reported for 65 of 69 backbone amides within the segment C440–A509 of the rat GR in a protein fragment containing a total of 82 residues (MW = 9200). Individual backbone ¹⁵N spin-lattice relaxation times (T1), rotating-frame spin-lattice relaxation times (T1ρ), and steady-state {¹H}–¹⁵N nuclear Overhauser effects (NOEs) have been measured at 11.74 T for a majority of the backbone amide nitrogens within the segment C440–N506. T1 relaxation times and NOEs are interpreted in terms of a generalized order parameter (S²) and an effective correlation time (τe) characterizing internal motions of each backbone amide, using an optimized value for the correlation time for isotropic rotational motion of the protein (τR = 6.3 ns). Average S² order parameters are found to be similar (approximately 0.86 ± 0.07) for the various functional domains of the DBD. Qualitative inspection as well as quantitative analysis of the relaxation and NOE data suggests that the picosecond flexibility of the DBD backbone is limited and uniform over the entire protein, with the possible exception of residues S448–H451 of the first zinc domain and a few residues for which relaxation and NOE parameters were not obtained. In particular, we find no evidence for extensive rapid backbone motions within the second zinc domain. Our results therefore suggest that the second zinc domain is not disordered in the uncomplexed state of the DBD, although the possibility of slowly exchanging (ordered) conformational states cannot be excluded by the present analysis.
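For reference, the Lipari–Szabo spectral density that links S², τe and τR to the measured relaxation parameters is sketched below for isotropic tumbling. The relaxation rates and the NOE are computed from J(ω) evaluated at 0, ωN, ωH and ωH ± ωN (each observable uses a subset); those combinations and their physical constants are omitted here, and the parameter values in the usage line are illustrative.

```python
import math


def model_free_j(omega, s2, tau_r, tau_e):
    """Lipari–Szabo spectral density J(omega) for isotropic overall tumbling.

    omega in rad/s; tau_r (overall) and tau_e (internal) correlation times in s;
    s2 is the generalized order parameter S^2."""
    tau = 1.0 / (1.0 / tau_r + 1.0 / tau_e)          # 1/tau = 1/tau_R + 1/tau_e
    overall = s2 * tau_r / (1.0 + (omega * tau_r) ** 2)
    internal = (1.0 - s2) * tau / (1.0 + (omega * tau) ** 2)
    return 0.4 * (overall + internal)                 # 2/5 prefactor


# Example: 15N at 11.74 T (about 50.7 MHz), S^2 = 0.86, tau_R = 6.3 ns, tau_e = 50 ps (illustrative)
j_n = model_free_j(2 * math.pi * 50.7e6, 0.86, 6.3e-9, 50e-12)
```

With S² = 1 the internal term vanishes and the expression reduces to the rigid-rotor spectral density, which is why S² near 0.86 indicates a fairly rigid backbone.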

15.
Dong Xu, Yang Zhang. Proteins, 2013, 81(2): 229-239
Fragment assembly using structural motifs excised from other solved proteins has been shown to be an efficient method for ab initio protein structure prediction. However, how to construct accurate fragments, how to derive optimal restraints from fragments, and what the best fragment length is are basic issues that have yet to be systematically examined. In this work, we developed a gapless-threading method to generate position-specific structure fragments. Distance profiles and torsion angle pairs are then derived from the fragments by statistical consistency analysis, which achieved accuracy comparable to that of machine-learning-based methods, although the fragments were taken from unrelated proteins. Measured by the accuracies of both the derived distance profiles and the torsion angle pairs, we reach a consistent conclusion: the optimal fragment length for structural assembly is around 10, and at least 100 fragments at each location are needed to achieve optimal structure assembly. The distance profiles and torsion angle pairs derived from the fragments have been successfully used in QUARK for ab initio protein structure assembly and are provided by the QUARK online server at http://zhanglab.ccmb.med.umich.edu/QUARK/ . Proteins 2013. © 2012 Wiley Periodicals, Inc.
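Deriving a consensus torsion angle from many fragments requires circular statistics, because the arithmetic mean of 170° and −170° is 0° rather than 180°. A minimal sketch, assuming the per-position angles are simply pooled from the fragments; the mean resultant length R is used here as a hypothetical consistency measure and is not QUARK's statistical consistency analysis.

```python
import math


def circular_mean(angles_deg):
    """Circular mean (degrees, in [-180, 180]) and mean resultant length R in [0, 1].
    R close to 1 means the angles agree; close to 0 means they are scattered."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg) / len(angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg) / len(angles_deg)
    return math.degrees(math.atan2(s, c)), math.hypot(s, c)
```

Applied to the φ (or ψ) values that, say, 100 fragments propose for one position, a high R would flag a reliable torsion restraint and a low R an unreliable one.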

16.
Proteins, 2018, 86(3): 273-278
Unusual local arrangements of proteins in Ramachandran space are not well represented by the standard geometry tools used either in protein structure refinement with simple harmonic geometry restraints or in protein simulations with molecular mechanics force fields. In contrast, quantum chemical computations on small polypeptide models can predict accurate geometries for any well-defined backbone Ramachandran orientation. For conformations along transition regions (ϕ from −60° to 60°), very good agreement with representative high-resolution (≤1.5 Å) X-ray protein structures is obtained for both the backbone C(−1)–N–Cα angle and the nonbonded O(−1)…C distance, whereas "standard geometry" leads to clashing of the O…C atoms and Amber FF99SB predicts distances that are too large by about 0.15 Å. These results confirm that quantum chemistry computations add valuable support for detailed analysis of local structural arrangements in proteins, providing improved or missing data for less understood high-energy or unusual regions.
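The two geometric quantities compared in this study can be measured from coordinates as follows. This is a sketch with hypothetical function names; for residue i, the angle would be `bond_angle(C[i-1], N[i], CA[i])` and the nonbonded distance `nonbonded_distance(O[i-1], C[i])`, where `C`, `N`, `CA` and `O` are assumed per-residue coordinate lists.

```python
import math


def bond_angle(a, b, c):
    """Angle a-b-c in degrees, with b as the vertex atom."""
    u = [x - y for x, y in zip(a, b)]
    v = [x - y for x, y in zip(c, b)]
    cos_t = sum(p * q for p, q in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))   # clamp rounding noise


def nonbonded_distance(a, b):
    """Euclidean distance between two atoms, e.g. O(i-1) and C(i)."""
    return math.dist(a, b)
```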

17.
The production and analysis of individual structural domains is a common strategy for studying large or complex proteins, which may be experimentally intractable in their full-length form. However, identifying domain boundaries is challenging if there is little structural information concerning the protein target. One experimental procedure for mapping domains is to screen a library of random protein fragments for solubility, since truncation of a domain will typically expose hydrophobic groups, leading to poor fragment solubility. We have coupled fragment solubility screening with global data analysis to develop an effective method for identifying structural domains within a protein. A gene fragment library is generated using mechanical shearing, or by uracil doping of the gene and a uracil-specific enzymatic digest. A split green fluorescent protein (GFP) assay is used to screen the corresponding protein fragments for solubility when expressed in Escherichia coli. The soluble fragment data are then analyzed using two complementary approaches. Fragmentation “hotspots” indicate possible interdomain regions. Clustering algorithms are used to group related fragments, and concomitantly predict domain location. The effectiveness of this Domain Seeking procedure is demonstrated by application to the well-characterized human protein p85α.

18.
The structure of the human protein HSPC034 has been determined by both solution nuclear magnetic resonance (NMR) spectroscopy and X-ray crystallography. Refinement of the NMR structure ensemble using a Rosetta protocol in the absence of NMR restraints resulted in significant improvements not only in structure quality but also in molecular replacement (MR) performance with the raw X-ray diffraction data, using MOLREP and Phaser. This method has recently been shown to be generally applicable, with improved MR performance demonstrated for eight NMR structures refined using Rosetta (Qian et al., Nature 2007;450:259-264). Additionally, NMR structures of HSPC034 calculated by standard methods that include NMR restraints show improvements in the RMSD to the crystal structure and in MR performance in the order DYANA, CYANA, XPLOR-NIH, and CNS with explicit water refinement (CNSw). Further Rosetta refinement of the CNSw structures, perhaps owing to more thorough conformational sampling and/or a superior force field, was capable of finding alternative low-energy protein conformations that were equally consistent with the NMR data according to the Recall, Precision, and F-measure (RPF) scores. On further examination, the additional MR-performance shortfall of the NMR-refined structures compared with the X-ray structure was attributed, in part, to crystal-packing effects, real structural differences, and inferior hydrogen bonding in the NMR structures. A good correlation between a decrease in the number of buried unsatisfied hydrogen-bond donors and improved MR performance demonstrates the importance of hydrogen-bond terms in the force field for improving NMR structures. The superior hydrogen-bond network in Rosetta-refined structures demonstrates that correct identification of hydrogen bonds should be a critical goal of NMR structure refinement.
Inclusion of nonbivalent hydrogen bonds identified from Rosetta structures as additional restraints in the structure calculation results in NMR structures with improved MR performance.
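The RPF scores combine recall R and precision P into an F-measure, F = 2PR/(P + R). The set-based sketch below shows only that arithmetic, assuming NOESY peaks and model contacts are represented as unordered atom pairs; the published RPF method additionally matches peaks to proton pairs using chemical-shift tolerances and applies distance cutoffs, which are not modelled here. The function name is hypothetical.

```python
def rpf_scores(observed_peaks, model_contacts):
    """Set-based recall, precision and F-measure.

    observed_peaks: atom pairs with an NOE in the data.
    model_contacts: atom pairs that are close in the model (cutoff applied upstream).
    Pairs are treated as unordered."""
    observed = {frozenset(p) for p in observed_peaks}
    predicted = {frozenset(p) for p in model_contacts}
    hits = len(observed & predicted)
    recall = hits / len(observed) if observed else 0.0
    precision = hits / len(predicted) if predicted else 0.0
    total = recall + precision
    f_measure = 2.0 * recall * precision / total if total else 0.0
    return recall, precision, f_measure
```

Two alternative models can score equally well under such a measure while differing in detail, which is consistent with the abstract's observation that Rosetta found alternative conformations equally compatible with the NMR data.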

19.
Residual dipolar couplings provide significant structural information for proteins in the solution state, which makes them attractive for the rapid determination of protein folds. Unfortunately, dipolar couplings contain inherent structural ambiguities which make them difficult to use in the absence of additional information. In this paper, we describe an approach to the construction of protein backbone folds using experimental dipolar couplings based on a bounded tree search through a structural database. We filter out false positives via an overlap similarity measure that insists that protein fragments assigned to overlapping regions of the sequence must have self-consistent structures. This allows us to determine a backbone fold (including the correct C-C bond orientations) using only residual dipolar coupling data obtained from one ordering medium. We demonstrate the applicability of the method using experimental data for ubiquitin.

20.
We have derived a quartic equation for computing the direction of an internuclear vector from residual dipolar couplings (RDCs) measured in two aligning media, and two simple trigonometric equations for computing the backbone (φ, ψ) angles from two backbone vectors in consecutive peptide planes. These equations make it possible to compute, exactly and in constant time, the backbone (φ, ψ) angles for a residue from RDCs in two media on any single backbone vector type. Building upon these exact solutions, we have designed a novel algorithm for determining a protein backbone substructure consisting of alpha-helices and beta-sheets. Our algorithm employs a systematic search technique to refine the conformation of both alpha-helices and beta-sheets and to determine their orientations using exclusively the angular restraints from RDCs. The algorithm computes the backbone substructure employing very sparse distance restraints between pairs of alpha-helices and beta-sheets refined by the systematic search. The algorithm has been demonstrated on the protein human ubiquitin using only backbone NH RDCs, plus twelve hydrogen bonds and four NOE distance restraints. Further, our results show that both the global orientations and the conformations of alpha-helices and beta-strands can be determined with high accuracy using only two RDCs per residue. The algorithm requires, as its input, backbone resonance assignments, the identification of alpha-helices and beta-sheets as well as sparse NOE distance and hydrogen bond restraints.


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP Registration No. 09084417