Similar documents
1.
Predicting a protein's secondary structure from its amino acid sequence is an important step toward predicting its three-dimensional structure. However, the accuracy of ab initio secondary structure prediction from sequence is currently about 80 %, which is still far from satisfactory. In this study, we propose a novel method for predicting protein three-state secondary structure that uses the binomial distribution to optimize tetrapeptide structural words and uses the increment of diversity with a quadratic discriminant to perform the prediction. A benchmark dataset of 2,640 proteins with less than 25 % sequence identity was used to train and test the method. Ten-fold cross-validation gave an overall accuracy of 87.8 %. Moreover, the accuracy of the predicted secondary structures ranges from 84 to 89 % at the residue level. These results suggest that the feature selection technique can identify the optimized tetrapeptide structural words that affect the accuracy of predicted secondary structures.
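Three-state accuracy (Q3), the usual headline figure for such predictors, is the percentage of residues whose predicted state matches the reference assignment. The sketch below is a generic illustration, not the authors' evaluation code; it assumes both assignments are equal-length strings over {H, E, C}, and the function names are hypothetical.

```python
def q3(predicted: str, observed: str) -> float:
    """Three-state accuracy: percentage of residues whose predicted state matches."""
    if len(predicted) != len(observed):
        raise ValueError("assignments must have equal length")
    if not observed:
        raise ValueError("empty assignment")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def per_state_accuracy(predicted: str, observed: str) -> dict:
    """For each state s, the percentage of residues observed in s that were predicted as s."""
    result = {}
    for state in "HEC":
        positions = [i for i, o in enumerate(observed) if o == state]
        if positions:
            hits = sum(predicted[i] == state for i in positions)
            result[state] = 100.0 * hits / len(positions)
    return result


# 6 of the 7 residues agree; the single error is an observed E predicted as C.
print(q3("HHHECCC", "HHHEECC"))
print(per_state_accuracy("HHHECCC", "HHHEECC"))
```

Reporting per-state values alongside Q3 shows which state drives the errors, which a single overall percentage hides.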

2.
Magic-angle-spinning (MAS) solid-state ¹³C NMR spectroscopy is useful for the structural analysis of non-crystalline proteins. However, signal assignment and structural analysis are often hampered by signal overlap, caused primarily by minor structural heterogeneities, especially in uniformly ¹³C,¹⁵N-labeled samples. To overcome this problem, we present a method, named reconstruction of spectra using protein local structures (RESPLS), that assigns ¹³C chemical shifts and secondary structures from unresolved two-dimensional ¹³C–¹³C MAS NMR spectra by spectral fitting. The spectral fitting used databases of protein fragment structures related to ¹³Cα, ¹³Cβ, and ¹³C′ chemical shifts and cross-peak intensities. The experimental ¹³C–¹³C inter- and intra-residue correlation spectra of uniformly isotope-labeled ubiquitin in the lyophilized state showed only a few broad peaks. Fitting these spectra provided sequence-specific Cα, Cβ, and C′ chemical shifts with an accuracy of about 1.5 ppm, which enabled assignment of the secondary structures with an accuracy of 79 %. The results also reveal the structural heterogeneity of lyophilized ubiquitin. Tests of RESPLS on simulated spectra of five different types of proteins indicated that the method determines secondary structure with an accuracy of about 80 % for proteins of 50–200 residues. These results demonstrate that the RESPLS approach expands the applicability of NMR to non-crystalline proteins that exhibit unresolved ¹³C NMR spectra, such as lyophilized proteins, amyloids, membrane proteins, and proteins in living cells.

3.
Chemical shift prediction has an underappreciated power to guide backbone resonance assignment in cases where the protein structure is known. Here we describe Resonance Assignment by chemical Shift Prediction (RASP), a method that exploits this power to derive protein backbone resonance assignments from chemical shift predictions. Robust assignments can be obtained from a minimal set of only the most sensitive triple-resonance experiments, even for spectroscopically challenging proteins. Over a test set of 154 proteins, RASP assigns 88 % of residues with an accuracy of 99.7 %, using only information available from HNCO and HNCA spectra. Applied to experimental data from a challenging 34 kDa protein, RASP assigns 90 % of the manually assigned residues using only 40 % of the experimental data required for the manual assignment. RASP has the potential to significantly accelerate backbone assignment for a wide range of proteins for which structural information is available, including those for which conventional assignment strategies are not feasible.

4.
A method for predicting type I and II β-turns using nuclear magnetic resonance (NMR) chemical shifts is proposed. Chemical-shift data for isolated β-turns were collected from 1,798 protein chains. One-dimensional statistical analyses of the chemical-shift data for three classes of β-turn (types I, II, and VIII) showed different distributions at the four positions, (i) to (i + 3). For the central two residues of type I β-turns, the mean C′, Cα, HN, and NH chemical shifts were generally larger at (i + 1) than at (i + 2), whereas the mean Cβ and Hα chemical shifts were smaller at (i + 1) than at (i + 2). The central two residues of type II and VIII β-turns could also be distinguished by trends in their chemical shift values. Two-dimensional cluster analyses of the chemical-shift data showed the positional distributions more clearly. Based on these position-dependent chemical shift propensities, rules were derived using scoring matrices over four consecutive residues to predict type I and II β-turns. The proposed method achieves overall prediction accuracies of 83.2 and 84.2 %, with Matthews correlation coefficients of 0.317 and 0.632 for type I and II β-turns, respectively, indicating higher accuracy for type II turn prediction. The results show that it is feasible to use NMR chemical shifts to predict β-turn types in proteins. The proposed method can be incorporated into other chemical-shift-based protein secondary structure prediction methods.
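The gap between an overall accuracy of 83.2 % and an MCC of only 0.317 for type I turns is typical when the positive class is rare: accuracy rewards correct negatives, while the Matthews correlation coefficient balances all four cells of the confusion matrix. A generic sketch of both metrics follows; the counts are invented for illustration and are not taken from the study.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all residues (turn and non-turn) classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_cc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from a binary confusion matrix.

    Returns 0.0 when any marginal sum is zero, a common convention.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# A predictor that never predicts a turn scores 90 % accuracy but MCC 0
# on a set with 10 turn residues and 90 non-turn residues.
print(accuracy(0, 90, 0, 10), matthews_cc(0, 90, 0, 10))
```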

5.
Protein chemical shifts have long been used by NMR spectroscopists to assist with secondary structure assignment and to provide useful distance and torsion angle constraints for structure determination. One of the most widely used methods for secondary structure identification is the Chemical Shift Index (CSI). The CSI method applies a simple digital chemical shift filter to backbone ¹³C and ¹H chemical shifts to locate secondary structures along the protein chain. Although the CSI method is simple to use and easy to implement, it is only about 75–80 % accurate. Here we describe a significantly improved version, CSI 2.0, that uses machine-learning techniques to combine all six backbone chemical shifts (¹³Cα, ¹³Cβ, ¹³C′, ¹⁵N, ¹HN, ¹Hα) with sequence-derived features for far more accurate secondary structure identification. Our tests indicate that CSI 2.0 achieved an average identification accuracy (Q3) of 90.56 % for a training set of 181 proteins in repeated ten-fold cross-validation and 89.35 % for a test set of 59 proteins. This is a significant improvement over other state-of-the-art chemical-shift-based methods. In particular, the performance of CSI 2.0 equals that of standard methods, such as DSSP and STRIDE, that identify secondary structures from 3D coordinate data. This suggests that CSI 2.0 could be used both to provide accurate NMR constraint data in the early stages of protein structure determination and to define secondary structure locations in the final protein model(s). A CSI 2.0 web server (http://csi.wishartlab.com) is available for submitting secondary structure identification queries.
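The "digital filter" of the original CSI (not the machine-learning CSI 2.0) can be sketched in two steps: map each residue's shift deviation from a random-coil reference to +1, −1 or 0, then label sufficiently long runs. This simplified sketch uses Hα shifts only, a ±0.1 ppm threshold, runs of at least four −1 values for helix and at least three +1 values for strand; it requires strictly contiguous runs, whereas the published rules also tolerate some interrupting zeros. Random-coil values must be supplied by the caller, and all names are hypothetical.

```python
def csi_index(shifts, random_coil, threshold=0.1):
    """Map each residue's H-alpha shift (ppm) to +1, -1 or 0 relative to its random-coil value."""
    index = []
    for observed, reference in zip(shifts, random_coil):
        delta = observed - reference
        if delta > threshold:
            index.append(1)
        elif delta < -threshold:
            index.append(-1)
        else:
            index.append(0)
    return index


def csi_assign(index, helix_min=4, strand_min=3):
    """Label contiguous runs of -1 as helix (H), runs of +1 as strand (E), the rest coil (C)."""
    labels = ["C"] * len(index)
    i = 0
    while i < len(index):
        j = i
        while j < len(index) and index[j] == index[i]:
            j += 1  # extend the run of identical index values
        run = j - i
        if index[i] == -1 and run >= helix_min:
            labels[i:j] = ["H"] * run
        elif index[i] == 1 and run >= strand_min:
            labels[i:j] = ["E"] * run
        i = j
    return "".join(labels)


print(csi_assign([-1, -1, -1, -1, 0, 1, 1, 1, 0, -1]))
```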

6.
X-ray diffraction and nuclear magnetic resonance (NMR) spectroscopy are the staple methods for revealing the atomic structures of proteins. Because crystals of biomolecular assemblies and membrane proteins often diffract weakly, and such large systems encroach upon the molecular tumbling limit of solution NMR, new methods are essential for determining the structures of such systems at high resolution. Here we present a method that incorporates solid-state NMR restraints alongside X-ray reflections into the conventional model building and refinement steps of structure calculation. Applied to the 3.7 Å crystal structure of the integral membrane protein complex DsbB-DsbA as a test case, the method yielded a significantly improved backbone precision of 0.92 Å in the transmembrane region, a 58% enhancement over using X-ray reflections alone. Furthermore, adding solid-state NMR restraints greatly improved the overall quality of the structure, moving 22% of the DsbB transmembrane residues into the most favored regions of Ramachandran space compared with the crystal structure. This method is widely applicable to any protein system for which X-ray data are available, and is particularly useful for the study of weakly diffracting crystals.

7.
Exponential growth in the number of available protein sequences is unmatched by the slower growth in the number of structures. As a result, the development of efficient and fast protein secondary structure prediction methods is essential for broad understanding of protein structures. Computational methods that can efficiently determine secondary structure can in turn facilitate protein tertiary structure prediction, since most tertiary methods rely initially on secondary structure predictions. Recently, we developed a fast learning optimized prediction methodology (FLOPRED) for predicting protein secondary structure (Saraswathi et al. in JMM 18:4275, 2012). Data are generated using knowledge-based potentials combined with structure information from the CATH database. A neural-network-based extreme learning machine (ELM) and advanced particle swarm optimization (PSO) are applied to these data to obtain better and faster convergence to more accurate secondary structure predictions. A five-fold cross-validated testing accuracy of 83.8 % and a segment overlap (SOV) score of 78.3 % are obtained in this study. Secondary structure predictions and their accuracy are usually presented for three secondary structure elements (α-helix, β-strand, and coil), but the results have rarely been analyzed with respect to their constituent amino acids. In this paper, we use the results obtained with FLOPRED to describe in detail how different amino acid types behave in secondary structure prediction. We investigate the influence of the composition, physicochemical properties, and position-specific occurrence preferences of amino acids within secondary structure elements. In addition, we identify the correlation between these properties and prediction accuracy. These detailed results suggest several ways in which secondary structure prediction could be improved in the future, which might lead to improved protein design and engineering.

8.
We have investigated some of the basic principles that influence the generation of protein structures with a fragment-based, random insertion method. We tested buildup methods and fragment library quality for accuracy in constructing a set of known structures. The parameters most influential in the construction procedure are bond and torsion angles; minor inaccuracies in bond angles alone cause a Cα RMSD of more than 6 Å for a 150-residue protein. Idealization to a standard set of values corrects this problem, but it changes the torsion angles and does not work for every structure. Alternatively, we found that using Cartesian coordinates instead of torsion angles did not reduce performance and can potentially increase speed and accuracy. Under conditions simulating ab initio structure prediction, fragment library quality can be suboptimal and still produce near-native structures. Using various clustering criteria, we created a number of libraries and used them to predict a set of native structures from nonnative fragments. The local Cα RMSD fit of fragments, library size, and takeoff/landing angle criteria only weakly influence the accuracy of the models. Based on each fragment's minimal perturbation upon insertion into a known structure, we created a seminative fragment library that produced more accurate structures than the other sets, even though its fragments were less similar to native fragments. These results suggest that fragments need only contain native-like subsections, which, when correctly overlapped, can recreate a native-like model. For fragment-based, random insertion methods used in protein structure prediction and design, our findings help define the parameters these methods need to generate near-native structures.

9.
The β-turn is a type of protein secondary structure that plays an important role in protein conformation and function. Here, we introduce a β-turn prediction approach that uses the support vector machine (SVM) algorithm combined with predicted secondary structure information. The secondary structure information was obtained with E-SSpred, a new protein secondary structure prediction method. Seven-fold cross-validation on a benchmark dataset of 426 non-homologous protein chains was used to evaluate the performance of our method. The predictions broke the 80% Qtotal barrier, achieving Qtotal = 80.9% and MCC = 0.44, with Qpredicted 0.9 percentage points higher than that of the best existing method. Our results are consistent with the conclusion that β-turn prediction accuracy can be improved by including secondary structure information.
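Seven-fold cross-validation over 426 chains can be organized at the chain level, so that residues from one chain never appear in both the training and test sets of a fold. The sketch below is a generic stdlib-only splitter under that assumption; the chain identifiers are placeholders, and the seed and round-robin dealing are illustrative choices rather than the authors' protocol.

```python
import random


def k_fold_splits(items, k=7, seed=0):
    """Yield (train, test) lists for k-fold cross-validation.

    Items are shuffled once with a fixed seed, then dealt round-robin into
    k folds, so fold sizes differ by at most one.
    """
    if k < 2 or k > len(items):
        raise ValueError("k must be between 2 and len(items)")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


# Placeholder chain IDs standing in for a 426-chain benchmark.
chains = [f"chain{i}" for i in range(426)]
for train, test in k_fold_splits(chains, k=7):
    print(len(train), len(test))
```

Splitting by chain rather than by residue avoids leaking near-identical local sequence context from training into testing.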

10.
Matsuo K, Watanabe H, Gekko K. Proteins, 2008, 73(1): 104–112
Synchrotron-radiation vacuum-ultraviolet circular dichroism (VUVCD) spectroscopy can significantly improve the predictive accuracy of the contents and segment numbers of protein secondary structures by extending the short-wavelength limit of the spectra. In the present study, we combined VUVCD spectra down to 160 nm with a neural-network (NN) method to improve the sequence-based prediction of protein secondary structures. The secondary structures of 30 target proteins (test set) were assigned to α-helices, β-strands, and others by the DSSP program based on their X-ray crystal structures. Combining the α-helix and β-strand contents estimated from the VUVCD spectra of the target proteins improved the overall sequence-based predictive accuracy Q3 for the three secondary-structure components from 59.5 to 60.7%. Incorporating the position-specific scoring matrix into the NN method improved the predictive accuracy from 70.9 to 72.1% when combining the secondary-structure contents, to 72.5% when combining the numbers of segments, and finally to 74.9% when filtering the VUVCD data. The improvement in sequence-based secondary structure prediction was also apparent in two other indices of overall performance: the correlation coefficient (C) and the segment overlap value (SOV). These results suggest that VUVCD data could raise the predictive accuracy above 80% when combined with the current best sequence-based prediction algorithms, greatly expanding the applicability of VUVCD spectroscopy to protein structural biology.

11.
The structure-based design of protein–ligand interfaces for different small molecules is of great significance in the discovery of functional proteins. Statistical analysis of a set of protein–ligand complex structures showed that water-mediated hydrogen bonding at the protein–ligand interface plays a crucial role in governing binding between the protein and the ligand. Based on these statistical results, a solvated ligand rotamer approach was developed to explicitly describe the key water molecules at the protein–ligand interface, and a water-mediated hydrogen bonding model was applied in the computational protein design context to complement the continuum solvent model. The solvated ligand rotamer approach produces only one additional solvated rotamer for each rotamer in the ligand rotamer library and does not change the number of side-chain rotamers at each protein design site. This greatly reduces the total combinatorial number in sequence selection for protein design, and the accuracy of the model was confirmed by two tests. In the water placement test, 61 % of the crystal water molecules were predicted correctly in five protein–ligand complex structures. In the sequence recapitulation test, 44.7 % of the amino acid identities were recovered using the solvated ligand rotamer approach and the water-mediated hydrogen bonding model, whereas only 30.4 % were recovered when the explicitly bound waters were removed. These results indicate that the solvated ligand rotamer approach is promising for functional protein design targeting novel protein–ligand interactions.

12.

Background

A key advantage of recombinant antibody technology is the ability to optimize and tailor reagents. Single domain antibodies (sdAbs), the recombinantly produced variable domains derived from camelid and shark heavy chain antibodies, offer advantages in stability and solubility and can be further engineered to enhance their properties. In this study, we generated sdAbs specific for the Ebola virus envelope glycoprotein (GP) and increased their stability to expand their utility in austere locales. Ebola virus is extremely virulent and causes fatal hemorrhagic fever in ~ 50 percent of cases. The viral GP binds to host cell receptors to facilitate viral entry and thus plays a critical role in pathogenicity.

Results

An immune phage display library containing more than 10⁷ unique clones was developed from a llama immunized with a combination of killed Ebola virus and recombinantly produced GP. We panned the library to obtain GP-binding sdAbs and isolated sdAbs from 5 distinct sequence families. Three GP binders, with dissociation constants ranging from ~ 2 to 20 nM and melting temperatures from ~ 57 to 72 °C, were selected for protein engineering to increase their stability through a combination of consensus sequence mutagenesis and the addition of a non-canonical disulfide bond. These changes increased the melting temperatures of the sdAbs by 15–17 °C. In addition, fusing a short, positively charged tail to the C-terminus provided ideal sites for chemical modification of these sdAbs and, when they served as tracer antibodies, improved the limits of detection for GP and Ebola virus-like particles.

Conclusions

SdAbs specific for Ebola GP were selected and their stability and functionality were improved utilizing protein engineering. Thermal stability of antibody reagents may be of particular importance when operating in austere locations that lack reliable refrigeration. Future efforts can evaluate the potential of these isolated sdAbs as candidates for diagnostic or therapeutic applications for Ebola.

13.
14.
Protein structure determination using nuclear magnetic resonance (NMR) spectroscopy can be both time-consuming and labor-intensive. Here we demonstrate how chemical shift threading can permit rapid, robust, and accurate protein structure determination using only chemical shift data. Threading is a relatively old bioinformatics technique that combines sequence information with predicted (or experimentally acquired) low-resolution structural data to generate high-resolution 3D protein structures. The key motivations for using NMR chemical shifts in protein threading are that they are easy to measure, they are available before 3D structure determination, and they contain vital structural information. Our method uses not only sequence and chemical shift similarity but also chemical-shift-derived secondary structure, super-secondary structure, and accessible surface area to generate a high-quality protein structure regardless of its sequence similarity (or lack thereof) to any known structure already in the PDB. The method, called E-Thrifty, was found to be very fast (often < 10 min/structure) and to significantly outperform other shift-based or threading-based structure determination methods in terms of top template model accuracy, with an average TM-score of 0.68 (vs. 0.50–0.62 for other methods). Coupled with recent developments in chemical shift refinement, these results suggest that protein structure determination using only NMR chemical shifts is becoming increasingly practical and reliable. E-Thrifty is available as a web server at http://ethrifty.ca.
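TM-score, the metric quoted above, rewards aligned residue pairs with a weight that decays with their distance d, normalized by the reference length L: the sum of 1 / (1 + (d/d0)²) divided by L, with d0 = 1.24·(L − 15)^(1/3) − 1.8 Å. The full score maximizes this over superpositions; the sketch below only evaluates it for one fixed superposition, given precomputed distances, so it is a lower bound on the true TM-score. Clamping d0 at 0.5 Å for short chains is an assumption mirroring common implementations.

```python
def tm_score_fixed(distances, target_length):
    """TM-score sum for a single, fixed superposition.

    distances: C-alpha to C-alpha distances (in Å) for aligned residue pairs.
    target_length: residue count of the reference structure (the normalizer).
    """
    if target_length > 15:
        d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    else:
        d0 = 0.5
    d0 = max(d0, 0.5)  # assumed floor for very short chains
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


# Perfectly superposed model covering half of a 100-residue target: TM-score 0.5.
print(tm_score_fixed([0.0] * 50, 100))
```

Because unaligned residues contribute nothing while the normalizer stays fixed, partial models are penalized, which makes the score comparable across proteins of different sizes.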

15.
ABSTRACT

A fully synthetic humanized phage antibody library has advantages including minimized immunogenicity, a problem that frequently arises in hybridoma cell-based antibody production. In this paper, using a constructed library of diverse complementarity-determining region genes with germline genes as the backbone, we constructed eight single-chain antibody libraries and a combinatorial antibody library with a large capacity of 1.41 × 10¹⁰. M13EEA helper phage, engineered from M13KO7, was used to prepare the phage antibody library. Eukaryotically expressed T-cell immunoreceptor with Ig and ITIM domains (TIGIT) was used as the target antigen for screening. Antigen-specific single-chain Fc-fused proteins were screened by evaluating their binding affinity with ELISA. IgG antibodies were then prepared from the selected single-chain proteins. Finally, the CB3 antibody was identified, which exhibits the highest binding affinity for TIGIT, with a Kd of 8.155 × 10⁻¹⁰ M.

16.
Chemical shift frequencies represent a time average over all the conformational states populated by a protein. Thus, chemical shift prediction programs based on sequence and database analysis achieve higher accuracy for rigid than for flexible protein segments. Here we show that prediction accuracy can be significantly improved by averaging over an ensemble of structures predicted solely from the amino acid sequence with the Rosetta program. This approach to chemical shift and structure prediction has the potential to guide resonance assignments, especially in solid-state NMR structural studies of membrane proteins in proteoliposomes.

17.
Paul Mach, Patrice Koehl. Proteins, 2013, 81(9): 1556–1570
It is well known that protein fold recognition can be greatly improved if models of the underlying evolutionary history of the folds are taken into account. The improvement, however, exists only if such evolutionary information is available. To circumvent this limitation for protein families with only a small number of representatives in current sequence databases, we follow an alternative approach in which the benefits of including evolutionary information are recreated using sequences generated by computational protein design algorithms. We explore this strategy on a large database of 1747 protein templates from different protein families. An automated method is used to design sequences for these templates. We use the backbones from the experimental structures as fixed templates, thread sequences onto these backbones using a self‐consistent mean field approach, and score the fitness of the corresponding models using a semi‐empirical physical potential. The sequences designed for each template are translated into a hidden Markov model‐based profile. We describe the implementation of this method, the optimization of its parameters, and its performance. When the native sequences of the protein templates were tested against the library of these profiles, the class, fold, and family memberships of a large majority (>90%) of these sequences were correctly recognized at an E‐value threshold of 1. In contrast, when homologous sequences were tested against the same library, a much smaller fraction (35%) of the sequences were recognized; however, the structural classification of the protein families corresponding to these sequences was correctly recognized (with an accuracy of >88%). Proteins 2013; © 2013 Wiley Periodicals, Inc.

18.
Locating sequences compatible with a protein structural fold is the well‐known inverse protein‐folding problem. Although significant progress has been made, the success rate of protein design remains low. As a result, a library of designed sequences or a sequence profile is currently employed to guide experimental screening or directed evolution. Sequence profiles can be predicted computationally either by iteratively mutating a random sequence to produce energy‐optimized sequences or by combining the sequences of structurally similar fragments in a template library. The latter approach is computationally more efficient but yields less accurate profiles than the former because it lacks tertiary structural information. Here we present a method called SPIN that predicts Sequence Profiles by Integrated Neural network based on fragment‐derived sequence profiles and structure‐derived energy profiles. SPIN improves on the fragment‐derived profile by 6.7 percentage points (from 23.6 to 30.3%) in sequence identity between predicted and wild‐type sequences. The method also reduces the number of residues in low‐complexity regions by 15.7% and achieves a significantly better balance of hydrophilic and hydrophobic residues at the protein surface. The accuracy of the resulting sequence profiles is comparable to that of profiles generated by the protein design program RosettaDesign 3.5. This highly efficient method for predicting sequence profiles from structures will be useful as a single‐body scoring term for improving the scoring functions used in protein design and fold recognition. It also complements protein design programs in guiding the experimental design of sequence libraries for screening and directed evolution of designed sequences. The SPIN server is available at http://sparks‐lab.org . Proteins 2014; 82:2565–2573. © 2014 Wiley Periodicals, Inc.

19.
Chen H, Zhou HX. Proteins, 2005, 61(1): 21–35
The number of structures of protein-protein complexes deposited in the Protein Data Bank is growing rapidly. These structures contain important information for predicting the structures of new protein complexes. This motivated us to develop the PPISP method for predicting interface residues in protein-protein complexes. In PPISP, sequence profiles and the solvent accessibility of spatially neighboring surface residues were used as input to a neural network. The network was trained on native interface residues collected from the Protein Data Bank. The prediction accuracy at the time was 70%, with 47% coverage of native interface residues. We have now extensively improved PPISP. The training set now consists of 1156 nonhomologous protein chains. Testing on a set of 100 nonhomologous protein chains showed that the prediction accuracy has increased to 80%, with 51% coverage. To address the over-prediction and under-prediction associated with individual neural network models, we developed a consensus method that combines predictions from multiple models with different levels of accuracy and coverage. Applied to a benchmark set of 68 proteins for protein-protein docking, the consensus approach outperformed the best individual models by 3–8 percentage points in accuracy. To demonstrate the predictive power of cons-PPISP, we tested eight complex-forming proteins whose interfaces have been characterized by NMR. These proteins are nonhomologous to the training set and have a total of 144 interface residues identified by chemical shift perturbation. cons-PPISP predicted 174 interface residues with 69% accuracy and 47% coverage, and promises to complement experimental techniques in characterizing protein-protein interfaces.
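Interface predictors are typically scored with two complementary fractions; the sketch below assumes accuracy means the fraction of predicted residues that are true interface residues and coverage means the fraction of native interface residues that were predicted. It also shows one simple way to combine several models, a vote threshold; this is an illustrative stand-in and not necessarily the combination rule used in cons-PPISP. Residue numbers are placeholders and the function names are hypothetical.

```python
def accuracy_coverage(predicted, native):
    """Return (accuracy, coverage) for sets of residue identifiers.

    accuracy = |predicted & native| / |predicted|
    coverage = |predicted & native| / |native|
    """
    predicted, native = set(predicted), set(native)
    hits = len(predicted & native)
    acc = hits / len(predicted) if predicted else 0.0
    cov = hits / len(native) if native else 0.0
    return acc, cov


def consensus(model_predictions, min_votes):
    """Keep residues predicted as interface by at least `min_votes` models."""
    votes = {}
    for prediction in model_predictions:
        for residue in set(prediction):
            votes[residue] = votes.get(residue, 0) + 1
    return {residue for residue, count in votes.items() if count >= min_votes}


# Three hypothetical models; raising min_votes trades coverage for accuracy.
models = [{1, 2, 3, 4}, {2, 3, 5}, {3, 4, 6}]
combined = consensus(models, min_votes=2)
print(sorted(combined), accuracy_coverage(combined, {2, 3, 7, 8}))
```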

20.
We present the ProCS method for the rapid and accurate prediction of protein backbone amide proton chemical shifts, which are sensitive probes of the geometry of the key hydrogen bonds that determine protein structure. ProCS is parameterized against quantum mechanical (QM) calculations and reproduces high-level QM results obtained for a small protein with an RMSD of 0.25 ppm (r = 0.94). ProCS is interfaced with the PHAISTOS protein simulation program and is used to infer statistical protein ensembles that reflect experimentally measured amide proton chemical shift values. Such chemical-shift-based structural refinements, starting from high-resolution X-ray structures of Protein G, ubiquitin, and the SMN Tudor domain, yield average chemical shifts, hydrogen bond geometries, and trans-hydrogen bond (h3JNC′) spin-spin coupling constants in excellent agreement with experiment. We show that the structural sensitivity of the QM-based amide proton chemical shift predictions is needed to obtain this agreement. The ProCS method thus offers a powerful new tool for refining hydrogen bonding networks to high accuracy, with many potential applications, such as studying protein flexibility in ligand binding.
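Agreement between predicted and reference chemical shifts is reported above with two numbers, an RMSD in ppm and a Pearson correlation r, and they capture different things. A generic stdlib-only sketch of both follows; the shift values in the example are invented for illustration.

```python
import math


def rmsd(xs, ys):
    """Root-mean-square deviation between paired values (same units as the input, e.g. ppm)."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty lists of equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


def pearson_r(xs, ys):
    """Pearson correlation coefficient between paired values."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# A prediction offset by a constant 1 ppm correlates perfectly (r = 1.0)
# yet still has an RMSD of 1 ppm, which is why both metrics are reported.
reference = [7.0, 8.0, 9.0]
predicted = [8.0, 9.0, 10.0]
print(rmsd(reference, predicted), pearson_r(reference, predicted))
```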
