Similar articles
1.
Error-tolerant backbone resonance assignment is the cornerstone of the NMR structure determination process. Although a variety of assignment approaches have been developed, none works sufficiently well on noisy, fully automatically picked peaks to enable the subsequent automatic structure determination steps. We have designed an integer linear programming (ILP) based assignment system (IPASS) that has enabled fully automatic protein structure determination for four test proteins. IPASS employs probabilistic spin system typing based on chemical shifts and secondary structure predictions. Furthermore, IPASS extracts connectivity information from the inter-residue information and the (automatically picked) (15)N-edited NOESY peaks, which are then used to fix reliable fragments. When applied to automatically picked peaks for real proteins, IPASS achieves an average precision and recall of 82% and 63%, respectively. In contrast, the next best method, MARS, achieves an average precision and recall of 77% and 36%, respectively. The assignments generated by IPASS are then fed into our protein structure calculation system, FALCON-NMR, to determine the 3D structures without human intervention. The final models have backbone RMSDs of 1.25 Å, 0.88 Å, 1.49 Å, and 0.67 Å to the reference native structures for proteins TM1112, CASKIN, VRAR, and HACS1, respectively. The web server is publicly available at http://monod.uwaterloo.ca/nmr/ipass.
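
The precision and recall figures quoted in this abstract can be read with a small worked example. This is a minimal sketch, assuming the usual definitions (precision = correct assignments / assignments made; recall = correct assignments / assignments in the reference); the abstract does not spell out its own definitions, and the function name and toy data below are hypothetical.

```python
def precision_recall(predicted, reference):
    """predicted, reference: dicts mapping spin-system id -> residue number.

    Assumed definitions (not taken from the abstract):
      precision = correct / number of assignments made
      recall    = correct / number of assignments in the reference
    """
    correct = sum(1 for k, v in predicted.items() if reference.get(k) == v)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(reference) if reference else 0.0
    return precision, recall


# Toy data, invented for illustration only.
reference = {"s1": 10, "s2": 11, "s3": 12, "s4": 13, "s5": 14}
predicted = {"s1": 10, "s2": 11, "s3": 99}  # two correct, one wrong, two left unassigned
p, r = precision_recall(predicted, reference)
```

A method that declines to assign uncertain spin systems tends to raise precision at the cost of recall, which is the trade-off the two numbers capture.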

2.
The significant biological role of RNA has further highlighted the need for improving the accuracy, efficiency and the reach of methods for investigating RNA structure and function. Nuclear magnetic resonance (NMR) spectroscopy is vital to furthering the goals of RNA structural biology because of its distinctive capabilities. However, the dispersion pattern in the NMR spectra of RNA makes automated resonance assignment, a key step in NMR investigation of biomolecules, remarkably challenging. Herein we present RNA Probabilistic Assignment of Imino Resonance Shifts (RNA-PAIRS), a method for the automated assignment of RNA imino resonances with synchronized verification and correction of predicted secondary structure. RNA-PAIRS represents an advance in modeling the assignment paradigm because it seeds the probabilistic network for assignment with experimental NMR data, and predicted RNA secondary structure, simultaneously and from the start. Subsequently, RNA-PAIRS sets in motion a dynamic network that reverberates between predictions and experimental evidence in order to reconcile and rectify resonance assignments and secondary structure information. The procedure is halted when assignments and base pairings are deemed to be most consistent with observed crosspeaks. The current implementation of RNA-PAIRS uses an initial peak list derived from proton-nitrogen heteronuclear multiple quantum correlation (1H–15N 2D HMQC) and proton–proton nuclear Overhauser enhancement spectroscopy (1H–1H 2D NOESY) experiments. We have evaluated the performance of RNA-PAIRS by using it to analyze NMR datasets from 26 previously studied RNAs, including a 111-nucleotide complex. For moderately sized RNA molecules, and over a range of comparatively complex structural motifs, the average assignment accuracy exceeds 90%, while the average base pair prediction accuracy exceeds 93%.
RNA-PAIRS yielded accurate assignments and base pairings consistent with imino resonances for a majority of the NMR resonances, even when the initial predictions were only modestly accurate. RNA-PAIRS is available as a public web server.

3.
The recent expansion of structural genomics has increased the demands for quick and accurate protein structure determination by NMR spectroscopy. The conventional strategy without an automated protocol can no longer satisfy the needs of high-throughput application to a large number of proteins, with each data set including many NMR spectra, chemical shifts, NOE assignments, and calculated structures. We have developed the new software KUJIRA, a package of integrated modules for the systematic and interactive analysis of NMR data, which is designed to reduce the tediousness of organizing and manipulating a large number of NMR data sets. In combination with CYANA, the program for automated NOE assignment and structure determination, we have established a robust and highly optimized strategy for comprehensive protein structure analysis. An application of KUJIRA in accordance with our new strategy was carried out by a non-expert in NMR structure analysis, demonstrating that the accurate assignment of the chemical shifts and a high-quality structure of a small protein can be completed in a few weeks. The high completeness of the chemical shift assignment and the NOE assignment achieved by the systematic analysis using KUJIRA and CYANA led, in practice, to increased reliability of the determined structure.

4.
MOTIVATION: Accurate multiple sequence alignments are essential in protein structure modeling, functional prediction and efficient planning of experiments. Although the alignment problem has attracted considerable attention, preparation of high-quality alignments for distantly related sequences remains a difficult task. RESULTS: We developed PROMALS, a multiple alignment method that shows promising results for protein homologs with sequence identity below 10%, aligning close to half of the amino acid residues correctly on average. This is about three times more accurate than traditional pairwise sequence alignment methods. The PROMALS algorithm derives its strength from several sources: (i) sequence database searches to retrieve additional homologs; (ii) accurate secondary structure prediction; (iii) a hidden Markov model that uses a novel combined scoring of amino acids and secondary structures; (iv) probabilistic consistency-based scoring applied to progressive alignment of profiles. Compared to the best alignment methods that do not use secondary structure prediction and database searches (e.g. MUMMALS, ProbCons and MAFFT), PROMALS is up to 30% more accurate, with the improvement being most prominent for highly divergent homologs. Compared to SPEM and HHalign, which also employ database searches and secondary structure prediction, PROMALS shows an accuracy improvement of several percent. AVAILABILITY: The PROMALS web server is available at: http://prodata.swmed.edu/promals/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

5.
ABACUS [Grishaev et al. (2005) Proteins 61:36-43] is a novel protocol for automated protein structure determination via NMR. ABACUS starts from molecular fragments defined by unassigned J-coupled spin-systems and involves a Monte Carlo stochastic search in assignment space, probabilistic sequence selection, and assembly of fragments into structures that are used to guide the stochastic search. Here, we report further development of the two main algorithms that increases the flexibility and robustness of the method. Performance of the BACUS [Grishaev and Llinás (2004) J Biomol NMR 28:1-101] algorithm was significantly improved through use of sequential connectivities available from through-bond correlated 3D-NMR experiments, and a new set of likelihood probabilities derived from a database of 56 ultra high resolution X-ray structures. A Multicanonical Monte Carlo procedure, Fragment Monte Carlo (FMC), was developed for sequence-specific assignment of spin-systems. It relies on enhanced assignment sampling and provides the uncertainty of assignments in a quantitative manner. The efficiency of the protocol was validated on data from four proteins of 68-116 residues, yielding 100% accuracy in sequence-specific assignment of backbone and side chain resonances.

6.
A significant number of protein sequences in a given proteome have no obvious evolutionarily related protein in the database of solved protein structures, the PDB. Under these conditions, ab initio or template-free modeling methods are the sole means of predicting protein structure. To assess its expected performance on proteomes, the TASSER structure prediction algorithm is benchmarked in the ab initio limit on a representative set of 1129 nonhomologous sequences ranging from 40 to 200 residues that cover the PDB at 30% sequence identity and which adopt alpha, alpha + beta, and beta secondary structures. For sequences in the 40-100 (100-200) residue range, as assessed by their root mean square deviation from native, RMSD, the best of the top five ranked models of TASSER has a global fold that is significantly close to the native structure for 25% (16%) of the sequences, and with a correct identification of the structure of the protein core for 59% (36%). In the absence of a native structure, the structural similarity among the top five ranked models is a moderately reliable predictor of folding accuracy. If we classify the sequences according to their secondary structure content, then 64% (36%) of alpha, 43% (24%) of alpha + beta, and 20% (12%) of beta sequences in the 40-100 (100-200) residue range have a significant TM-score (TM-score ≥ 0.4). TASSER performs best on helical proteins because there are fewer secondary structural elements to arrange in a helical protein than in a beta protein of equal length, since the average length of a helix is longer than that of a strand. In addition, helical proteins have shorter loops and dangling tails. If we exclude these flexible fragments, then TASSER has similar accuracy for sequences containing the same number of secondary structural elements, irrespective of whether they are helices and/or strands.
Thus, it is the effective configurational entropy of the protein that dictates the average likelihood of correctly arranging the secondary structure elements.

7.
Previous studies by Wishart et al. [Wishart, D. S., Sykes, B. D., & Richards, F. M. (1991) J. Mol. Biol. (in press)] have demonstrated that 1H NMR chemical shifts are strongly dependent on the character and nature of protein secondary structure. In particular, it has been found that the 1H NMR chemical shift of the alpha-CH proton of all 20 naturally occurring amino acids experiences an upfield shift (with respect to the random coil value) when in a helical configuration and a comparable downfield shift when in a beta-strand extended configuration. On the basis of these observations, a technique is described for rapidly and quantitatively determining the identity, extent, and location of secondary structural elements in proteins based on the simple inspection of the alpha-CH 1H resonance assignments. A number of examples are provided to demonstrate both the simplicity and the accuracy of the technique. This new method is found to be almost as accurate as the more traditional NOE-based methods of determining secondary structure and could prove to be particularly useful in light of the recent development of sequential assignment techniques which are now almost NOE-independent [Ikura, M., Kay, L. E., & Bax, A. (1990) Biochemistry 29, 4659-4667]. We suggest that this new procedure should not necessarily be seen as a substitute for existing rigorous methods for secondary structure determination but, rather, should be viewed as a complement to these approaches.
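
The idea in this abstract (upfield alpha-CH shifts suggest helix, downfield shifts suggest strand) can be sketched in a few lines. This is a minimal illustration, not the authors' published procedure: the 0.1 ppm cutoff, the minimum run lengths of 4 and 3 residues, the function names, and all shift values are assumptions made for the example.

```python
def shift_index(observed, random_coil, threshold=0.1):
    """Return -1 (upfield), +1 (downfield) or 0 for each residue.

    threshold (ppm) is an assumed cutoff, not stated in the abstract.
    """
    indices = []
    for obs, rc in zip(observed, random_coil):
        delta = obs - rc  # negative = upfield of the random coil value
        if delta < -threshold:
            indices.append(-1)
        elif delta > threshold:
            indices.append(1)
        else:
            indices.append(0)
    return indices


def find_runs(indices, value, min_length):
    """Return inclusive (start, end) pairs for runs of `value`
    that span at least `min_length` consecutive residues."""
    runs, start = [], None
    for i, idx in enumerate(indices + [None]):  # sentinel closes a final run
        if idx == value:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_length:
                runs.append((start, i - 1))
            start = None
    return runs


# Invented alpha-CH shifts (ppm) and a single invented random coil value.
observed = [4.10, 4.05, 4.00, 4.12, 4.35, 4.80, 4.75, 4.90]
random_coil = [4.35] * 8
indices = shift_index(observed, random_coil)
helices = find_runs(indices, -1, 4)  # assumed: >= 4 upfield residues in a row
strands = find_runs(indices, 1, 3)   # assumed: >= 3 downfield residues in a row
```

In practice the random coil value differs by amino acid type, so `random_coil` would hold one reference value per residue rather than a constant.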

8.
9.
10.
MOTIVATION: What constitutes a baseline level of success for protein fold recognition methods? As fold recognition benchmarks are often presented without any thought to the results that might be expected from a purely random set of predictions, an analysis of fold recognition baselines is long overdue. Given varying amounts of basic information about a protein (ranging from the length of the sequence to a knowledge of its secondary structure), to what extent can the fold be determined by intelligent guesswork? Can simple methods that make use of secondary structure information assign folds more accurately than purely random methods, and could these methods be used to construct viable hierarchical classifications? EXPERIMENTS PERFORMED: A number of rapid automatic methods which score similarities between protein domains were devised and tested. These methods ranged from those that incorporated no secondary structure information, such as measuring absolute differences in sequence lengths, to more complex alignments of secondary structure elements. Each method was assessed for accuracy by comparison with the Class Architecture Topology Homology (CATH) classification. Methods were rated against both a random baseline fold assignment method as a lower control and FSSP as an upper control. Similarity trees were constructed in order to evaluate the accuracy of optimum methods at producing a classification of structure. RESULTS: Using a rigorous comparison of methods with CATH, the random fold assignment method set a lower baseline of 11% true positives allowing for 3% false positives, and FSSP set an upper benchmark of 47% true positives at 3% false positives. The optimum secondary structure alignment method used here achieved 27% true positives at 3% false positives. Using a less rigorous Critical Assessment of Structure Prediction (CASP)-like sensitivity measurement, the random assignment achieved 6%, FSSP 59%, and the optimum secondary structure alignment method 32%.
Similarity trees produced by the optimum method illustrate that these methods cannot be used alone to produce a viable protein structural classification system. CONCLUSIONS: Simple methods that use perfect secondary structure information to assign folds cannot produce an accurate protein taxonomy; however, they do provide useful baselines for fold recognition. In terms of a typical CASP assessment, our results suggest that approximately 6% of targets with folds in the databases could be assigned correctly by randomly guessing, and as many as 32% could be recognised by trivial secondary structure comparison methods, given knowledge of their correct secondary structures.

11.
M M Teeter  M Whitlow 《Proteins》1988,4(4):262-273
Methods that analyze protein circular dichroism (CD) spectra for fractions of secondary structure are evaluated for the plant protein crambin, which has a known high-resolution crystal structure. In addition, a two-step secondary structure prediction scheme is presented and used for the toxins homologous to crambin, shown by others to have secondary structures similar to crambin. The test of CD spectral analysis methods with the protein crambin employed two computer programs and several CD basis sets. Crambin's crystal structure, known to 0.945 Å resolution (Hendrickson, W.A., Teeter, M.M. Nature 290:107-113, 1981), allows accurate evaluation of results. Analysis with the protein spectra basis sets (Provencher, S.W., Glöckner, J. Biochemistry 20:33-37, 1981) as modified (Manavalan, P., Johnson, W.C., Jr. Anal. Biochem. 167:76-85, 1987) agreed most closely with crambin's crystal structure. This method was then applied to the CD spectra of the membrane-active toxins homologous to crambin (alpha 1- and beta-purothionin, phoratoxin A and B, and viscotoxin A3 and B). The new program SEQ (pronounced "seek") was developed to assign the secondary structure along the protein chain in a hierarchical fashion and applied to the plant toxins. The method constrained the secondary structure fractions to those from CD analysis and combined standard statistical methods with amphipathic helix location. Both CD-derived secondary structure percentages and sequence assignment indicate that the viscotoxins are structurally most similar to crambin. Purothionin's secondary structure was predicted to be fundamentally similar to crambin's, with a difference at the start of the first helix. This assignment agreed with Raman and NMR analyses of purothionin and lends validity to the method presented here. Differences from the NMR in the CD secondary structure fraction analysis for phoratoxin suggest interference in the CD from tryptophan residues.

12.
High-throughput NMR structural biology can play an important role in structural genomics. We report an automated procedure for high-throughput NMR resonance assignment for a protein of known structure, or of a homologous structure. These assignments are a prerequisite for probing protein-protein interactions, protein-ligand binding, and dynamics by NMR. Assignments are also the starting point for structure determination and refinement. A new algorithm, called Nuclear Vector Replacement (NVR), is introduced to compute assignments that optimally correlate experimentally measured NH residual dipolar couplings (RDCs) to a given a priori whole-protein 3D structural model. The algorithm requires only uniform (15)N-labeling of the protein and processes unassigned H(N)-(15)N HSQC spectra, H(N)-(15)N RDCs, and sparse H(N)-H(N) NOEs (d(NN)s), all of which can be acquired in a fraction of the time needed to record the traditional suite of experiments used to perform resonance assignments. NVR runs in minutes and efficiently assigns the (H(N),(15)N) backbone resonances as well as the d(NN)s of the 3D (15)N-NOESY spectrum, in O(n^3) time. The algorithm is demonstrated on NMR data from a 76-residue protein, human ubiquitin, matched to four structures, including one mutant (homolog), determined either by x-ray crystallography or by different NMR experiments (without RDCs). NVR achieves an assignment accuracy of 92-100%. We further demonstrate the feasibility of our algorithm for different and larger proteins, using NMR data for hen lysozyme (129 residues, 97-100% accuracy) and streptococcal protein G (56 residues, 100% accuracy), matched to a variety of 3D structural models. Finally, we extend NVR to a second application, 3D structural homology detection, and demonstrate that NVR is able to identify structural homologies between proteins with remote amino acid sequences using a database of structural models.

13.
Desulforedoxin is a simple dimeric protein isolated from Desulfovibrio gigas containing a distorted rubredoxin-like center with one iron coordinated by four cysteinyl residues (7.9 kDa with a 36-amino-acid monomer). 1H NMR spectra of the oxidized Dx(Fe3+) and reduced Dx(Fe2+) forms were analyzed. The spectra show substantial line broadening due to the paramagnetism of iron. However, very low-field-shifted resonances, assigned to Hβ protons, were observed in the reduced state and their temperature dependence analyzed. The active site of Dx was reconstituted with zinc, and its solution structure was determined using 2D NMR methods. This diamagnetic form gave high-resolution NMR data enabling the identification of all the amino acid spin systems. Sequential assignment and the determination of secondary structural elements were attempted using 2D NOESY experiments. However, because of the symmetrical dimer nature of the protein, standard NMR sequential assignment methods could not resolve all cross peaks due to inter- and intra-chain effects. The X-ray structure enabled the spatial relationship between the monomers to be obtained, and resolved the assignment problems. Secondary structural features could be identified from the NMR data: an antiparallel β-sheet running from D5 to V18 with a well-defined β-turn around cysteines C9 and C12. The section G22 to T25 is poorly defined by the NMR data and is followed by a turn around V27-C29. The C-terminus ends up near residues V6 and Y7. Distance geometry (DG) calculations allowed families of structures to be generated from the NMR data. A family of structures with a low target function violation for the Dx monomer and dimer was found to have secondary structural elements identical to those seen in the X-ray structure.
The G4, D5, G13 and L11 NH and Q14 NHε amide protons, H-bonded in the X-ray structure, were not seen by NMR as slowly exchanging, while structural disorder at the N-terminus, for the backbone at E10 and for the section G22–T25, was observed. Comparison between the Fe and Zn forms of Dx suggests that metal substitution does not have an effect on the structure of the protein.

14.
A procedure for automated protein structure determination is presented that is based on an iterative procedure during which the NOESY peak list assignment and the structure calculation are performed simultaneously. The input consists of a list of NOESY peak positions and a list of chemical shifts as obtained from sequence-specific resonance assignment. For the present applications of this approach the previously introduced NOAH routine was implemented in the distance geometry program DIANA. As an illustration, experimental 2D and 3D NOESY cross-peak lists of six proteins have been analyzed, for which complete sequence-specific 1H assignments are available for the polypeptide backbone and the amino acid side chains. The automated method assigned 70–90% of all NOESY cross peaks, which is on average 10% less than with the interactive approach, and only between 0.8% and 2.4% of the automatically assigned peaks had a different assignment than in the corresponding manually assigned peak lists. The structures obtained with NOAH/DIANA are in close agreement with those from manually assigned peak lists, and with both approaches the residual constraint violations correspond to high-quality NMR structure determinations. Systematic comparisons of the bundles of conformers that represent corresponding automatically and interactively determined structures document the absence of significant bias in either approach, indicating that an important step has been made towards automation of structure determination from NMR spectra.

15.
ASCAN is a new algorithm for automatic sequence-specific NMR assignment of amino acid side-chains in proteins, which uses as input the primary structure of the protein, chemical shift lists of (1)H(N), (15)N, (13)C(alpha), (13)C(beta) and possibly (1)H(alpha) from the previous polypeptide backbone assignment, and one or several 3D (13)C- or (15)N-resolved [(1)H,(1)H]-NOESY spectra. ASCAN has also been laid out for the use of TOCSY-type data sets as supplementary input. The program assigns new resonances based on comparison of the NMR signals expected from the chemical structure with the experimentally observed NOESY peak patterns. The core parts of the algorithm are a procedure for generating expected peak positions, which is based on variable combinations of assigned and unassigned resonances that arise for the different amino acid types during the assignment procedure, and a corresponding set of acceptance criteria for assignments based on the NMR experiments used. Expected patterns of NOESY cross peaks involving unassigned resonances are generated using the list of previously assigned resonances, and tentative chemical shift values for the unassigned signals taken from the BMRB statistics for globular proteins. Use of this approach with the 101-amino acid residue protein FimD(25-125) resulted in 84% of the hydrogen atoms and their covalently bound heavy atoms being assigned with a correctness rate of 90%. Use of these side-chain assignments as input for automated NOE assignment and structure calculation with the ATNOS/CANDID/DYANA program suite yielded structure bundles of comparable quality, in terms of precision and accuracy of the atomic coordinates, as those of a reference structure determined with interactive assignment procedures. 
A rationale for the high quality of the ASCAN-based structure determination results from an analysis of the distribution of the assigned side chains, which revealed near-complete assignments in the core of the protein, with most of the incompletely assigned residues located at or near the protein surface.

16.
Battiste JL  Wagner G 《Biochemistry》2000,39(18):5355-5365
To test whether distances derived from paramagnetic broadening of (15)N heteronuclear single quantum coherence (HSQC) resonances could be used to determine the global fold of a large, perdeuterated protein, we used site-directed spin-labeling of 5 amino acids on the surface of (15)N-labeled eukaryotic translation initiation factor 4E (eIF4E). eIF4E is a 25 kDa translation initiation protein, whose solution structure was previously solved in a 3-[(3-cholamidopropyl)dimethylammonio]-1-propanesulfonate hydrate (CHAPS) micelle of total molecular mass approximately 45-50 kDa. Distance-dependent line broadening consistent with the three-dimensional structure of eIF4E was observed for all spin-label substitutions. The paramagnetic broadening effects (PBEs) were converted into distances for modeling by a simple method comparing peak heights in (15)N-HSQC spectra before and after reduction of the nitroxide spin label with ascorbic acid. The PBEs, in combination with HN-HN nuclear Overhauser effects (NOEs) and chemical shift index (CSI) angle restraints, correctly determined the global fold of eIF4E with a backbone precision of 2.3 Å (1.7 Å for secondary structure elements). The global fold was not correctly determined with the HN-HN NOEs and CSI angles alone. The combination of PBEs with simulated restraints from another nuclear magnetic resonance (NMR) method for global fold determination of large proteins (methyl-protonated, highly deuterated samples) improved the quality of calculated structures. In addition, the combination of the two methods simulated from a crystal structure of an all alpha-helical protein (40 kDa farnesyl diphosphate synthase) correctly determined the global fold where neither method individually was successful. These results show the potential feasibility of obtaining medium-resolution structures for proteins in the 40-100 kDa range via NMR.

17.
In NMR protein structure determination, after the resonance peaks have been identified and chemical shifts from peaks across multiple spectra have been grouped into spin systems, associating these spin systems with their host residues is the key to successful extraction of structural information and thus to the success of the structure calculation. To achieve a sufficiently accurate structure calculation, a near-complete and accurate assignment is a prerequisite. Two pieces of information can be used in the assignment: the adjacency information among the spin systems, and the signature information of the spin systems. The signature information reflects the fact that, generally speaking, for one type of amino acid residing in a specific local structural environment, the chemical shifts for the atoms inside the amino acid fall into some very narrow distinct ranges. In most of the existing work, normal distributions are assumed, with means and standard deviations statistically collected from the available data. In this paper, we followed a simple yet effective histogram-based way to estimate, for every spin system, the probability that its host is a certain type of amino acid residing in a certain type of secondary structure. We used two combinations of chemical shifts to demonstrate the effectiveness of this type of histogram-based scoring scheme.
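
A histogram-based score of the kind described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: it uses a single chemical shift rather than the combinations the paper mentions, and the bin width, the pseudocount smoothing, the function names, and the training values are all hypothetical.

```python
from collections import Counter, defaultdict


def build_histograms(training, bin_width=0.5):
    """training: iterable of (amino_acid, secondary_structure, shift_ppm).

    Returns {(amino_acid, secondary_structure): Counter of bin index -> count}.
    """
    histograms = defaultdict(Counter)
    for aa, ss, shift in training:
        histograms[(aa, ss)][int(shift // bin_width)] += 1
    return histograms


def class_probabilities(histograms, shift, bin_width=0.5, pseudocount=1.0):
    """Score one shift against every class, then normalise to sum to 1."""
    b = int(shift // bin_width)
    scores = {}
    for cls, hist in histograms.items():
        total = sum(hist.values())
        n_bins = len(hist) + 1  # observed bins plus one slot for "unseen"
        scores[cls] = (hist[b] + pseudocount) / (total + pseudocount * n_bins)
    norm = sum(scores.values())
    return {cls: s / norm for cls, s in scores.items()}


# Invented C-alpha shifts (ppm), for illustration only.
training = [
    ("ALA", "helix", 54.8), ("ALA", "helix", 55.1), ("ALA", "helix", 54.6),
    ("ALA", "strand", 51.2), ("ALA", "strand", 50.9),
    ("GLY", "coil", 45.1), ("GLY", "coil", 45.3),
]
histograms = build_histograms(training)
probs = class_probabilities(histograms, 54.9)
best = max(probs, key=probs.get)
```

Unlike a fitted normal distribution, the histogram makes no assumption about the shape of the shift distribution, at the cost of needing enough data per bin; the pseudocount here is one simple way to avoid zero probabilities for sparsely populated bins.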

18.
Summary: Simulated neural networks are described which aid the assignment of protein NMR spectra. A network trained to recognize amino acid type from TOCSY data was trained on 148 assigned spin systems from E. coli acyl carrier proteins (ACPs) and tested on spin systems from spinach ACP, which has a 37% sequence homology with E. coli ACP and a similar secondary structure. The output unit corresponding to the correct amino acid is one of the four most activated units in 83% of the spin systems tested. The utility of this information is illustrated by a second network which uses a constraint satisfaction algorithm to find the best fit of the spin systems to the amino acid sequence. Application to a stretch of 20 amino acids in spinach ACP results in 75% correct sequential assignment. Since the output of the amino acid type identification network can be coupled with a variety of sequential assignment strategies, the approach offers substantial potential for expediting assignment of protein NMR spectra.
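
The "one of the four most activated units" figure in this abstract is a top-k accuracy with k = 4. A minimal sketch of that metric, with a hypothetical function name and invented activations, might look like this:

```python
def top_k_accuracy(activations, true_labels, k=4):
    """activations: list of dicts {amino_acid: output activation};
    true_labels: the correct amino acid for each spin system."""
    hits = 0
    for scores, truth in zip(activations, true_labels):
        # Rank output units from most to least activated and keep the first k.
        top_k = sorted(scores, key=scores.get, reverse=True)[:k]
        if truth in top_k:
            hits += 1
    return hits / len(true_labels)


# Invented activations over a five-letter alphabet, for illustration only.
activations = [
    {"A": 0.9, "G": 0.1, "S": 0.3, "T": 0.2, "V": 0.05},
    {"A": 0.2, "G": 0.1, "S": 0.8, "T": 0.6, "V": 0.5},
]
true_labels = ["A", "G"]
accuracy = top_k_accuracy(activations, true_labels, k=4)
```

Reporting top-4 rather than top-1 accuracy fits the way the output is used in the abstract: the shortlist of candidate residue types is passed on to a second stage that resolves the ambiguity against the sequence.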

19.
A consensus approach for the assignment of structural domains in proteins is presented. The approach combines a number of previously published algorithms, and takes advantage of the elevated accuracy obtained when assignments from the individual algorithms are in agreement. The consensus approach is tested on a data set of 55 protein chains, for which domain assignments from four automated methods were known, and for which crystallographers' assignments had been reported in the literature. Accuracy was found to increase in this test from 72% using individual algorithms to 100% when all four methods were in agreement. However, a consensus prediction using all four methods was only possible for 52% of the dataset. The consensus approach [using three publicly available domain assignment algorithms (PUU, DETECTIVE, DOMAK)] was then used to make domain assignments for a data set of 787 protein chains from the Protein Data Bank. Analysis of the assignments showed that 55.7% of assignments could be made automatically, and of these, 13.5% were multi-domain proteins. Of the remaining 44.3% that could not be assigned by the consensus procedure, 90.4% had their domain boundaries assigned correctly by at least one of the algorithms. Once identified, these domains were analyzed for trends in their size and secondary structure class. In addition, the discontinuity of each domain along the protein chain was considered.
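
The consensus idea in this abstract (accept an assignment only when the individual algorithms agree) can be sketched as below. The agreement criterion used here, identical domain counts, is an assumption made for brevity; the abstract does not state how agreement was judged, and the function names and per-chain values are invented.

```python
def consensus_assignment(assignments):
    """assignments: {method_name: number_of_domains} for one protein chain.

    Returns the agreed number of domains, or None when the methods disagree.
    Agreement on domain count is an assumed criterion for this sketch.
    """
    counts = set(assignments.values())
    return counts.pop() if len(counts) == 1 else None


def consensus_coverage(chains):
    """Fraction of chains for which a consensus could be reached."""
    agreed = [c for c in chains.values() if consensus_assignment(c) is not None]
    return len(agreed) / len(chains)


# Invented results for two chains, using the three method names from the abstract.
chains = {
    "chainA": {"PUU": 2, "DETECTIVE": 2, "DOMAK": 2},
    "chainB": {"PUU": 1, "DETECTIVE": 2, "DOMAK": 1},
}
coverage = consensus_coverage(chains)
```

This mirrors the trade-off reported in the abstract: requiring agreement raises accuracy on the chains that pass, but leaves a share of the data set without an automatic assignment.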

20.
W Eberle  W Klaus  G Cesareni  C Sander  P Rösch 《Biochemistry》1990,29(32):7402-7407
The complete resonance assignment of the ColE1 rop (rom) protein at pH 2.3 was obtained by two-dimensional (2D) proton nuclear magnetic resonance spectroscopy (1H NMR) at 500 and 600 MHz using through-bond and through-space connectivities. Sequential assignments and elements of regular secondary structure were deduced by analysis of nuclear Overhauser enhancement spectroscopy (NOESY) experiments and 3JHN alpha coupling constants. One 7.2-kDa monomer of the homodimer consists of two antiparallel helices connected by a hairpin loop at residue 31. The C-terminal peptide consisting of amino acids 59-63 shows no stable conformation. The dimer forms a four-helix bundle with opposite polarization of neighboring elements in agreement with the X-ray structure.

