Similar articles
20 similar articles found
1.
Nuclear magnetic resonance (NMR) spectroscopy allows scientists to study protein structure, dynamics and interactions in solution. A necessary first step for such applications is determining the resonance assignment, mapping spectral data to atoms and residues in the primary sequence. Automated resonance assignment algorithms rely on information regarding connectivity (e.g., through-bond atomic interactions) and amino acid type, typically using the former to determine strings of connected residues and the latter to map those strings to positions in the primary sequence. Significant ambiguity exists in both connectivity and amino acid type information. This paper focuses on the information content available in connectivity alone and develops a novel random-graph theoretic framework and algorithm for connectivity-driven NMR sequential assignment. Our random graph model captures the structure of chemical shift degeneracy, a key source of connectivity ambiguity. We then give a simple and natural randomized algorithm for finding optimal assignments as sets of connected fragments in NMR graphs. The algorithm naturally and efficiently reuses substrings while exploring connectivity choices; it overcomes local ambiguity by enforcing global consistency of all choices. By analyzing our algorithm under our random graph model, we show that it can provably tolerate relatively large ambiguity while still giving expected optimal performance in polynomial time. We present results from practical applications of the algorithm to experimental datasets from a variety of proteins and experimental set-ups. We demonstrate that our approach is able to overcome significant noise and local ambiguity in identifying significant fragments of sequential assignments.
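The paper's randomized algorithm is not reproduced here. As a hedged illustration of what "strings of connected residues" in an NMR graph means, the sketch below enumerates every simple directed path of a fixed length by depth-first search. The adjacency format (`graph[u]` lists spin systems that could follow `u`) and all names are assumptions made for this example.

```python
from typing import Dict, List


def enumerate_fragments(graph: Dict[str, List[str]], length: int) -> List[List[str]]:
    """Return every simple directed path that visits exactly `length` spin systems."""
    fragments: List[List[str]] = []

    def extend(path: List[str]) -> None:
        if len(path) == length:
            fragments.append(list(path))
            return
        for nxt in graph.get(path[-1], []):
            if nxt not in path:  # a residue cannot appear twice in one fragment
                path.append(nxt)
                extend(path)
                path.pop()

    for start in graph:
        extend([start])
    return fragments


# Hypothetical connectivity: degenerate shifts give spin system "B" two candidate successors.
nmr_graph = {"A": ["B"], "B": ["C", "D"], "C": ["E"], "D": [], "E": []}
print(enumerate_fragments(nmr_graph, 3))
# [['A', 'B', 'C'], ['A', 'B', 'D'], ['B', 'C', 'E']]
```

Each ambiguous edge multiplies the number of candidate fragments, so exhaustive enumeration like this grows quickly with degeneracy; that growth is the local ambiguity the abstract's global-consistency approach is designed to handle.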

2.
Reliable automated NOE assignment and structure calculation based on a largely complete, assigned input chemical shift list and a list of unassigned NOESY cross peaks has recently become feasible for routine NMR protein structure calculation and has been shown to yield results equivalent to those of the conventional, manual approach. However, these algorithms rely on the availability of a virtually complete chemical shift list. This paper investigates the influence of incomplete chemical shift assignments on the reliability of NMR structures obtained with automated NOESY cross peak assignment. The program CYANA was used for combined automated NOESY assignment with the CANDID algorithm and structure calculation with torsion angle dynamics at various degrees of completeness of the chemical shift assignment, which was simulated by random omission of entries from the experimental 1H chemical shift lists that had been used for the earlier, conventional structure determinations of two proteins. Sets of structure calculations were performed with the omitted chemical shifts chosen randomly either among all assigned hydrogen atoms or among aromatic hydrogen atoms. For comparison, automated NOESY assignment and structure calculations were performed with the complete experimental chemical shift list but with random omission of NOESY cross peaks. When heteronuclear-resolved three-dimensional NOESY spectra are available, the current CANDID algorithm yields reliable NOE assignments and three-dimensional structures that deviate by less than 2 Å from the reference structure (obtained using all experimental chemical shift assignments) even when up to about 10% of the experimental 1H chemical shifts are missing. In contrast, the algorithm can accommodate the omission of up to 50% of the cross peaks in heteronuclear-resolved NOESY spectra without producing structures with an RMSD of more than 2 Å to the reference structure.
When only homonuclear NOESY spectra are available, the algorithm is slightly more susceptible to missing data and can tolerate the absence of up to about 7% of the experimental 1H chemical shifts or of up to 30% of the NOESY peaks.
Abbreviations: BmPBPA – Bombyx mori pheromone binding protein form A; CYANA – combined assignment and dynamics algorithm for NMR applications; NMR – nuclear magnetic resonance; NOE – nuclear Overhauser effect; NOESY – NOE spectroscopy; RMSD – root-mean-square deviation; WmKT – Williopsis mrakii killer toxin
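The incomplete-assignment scenarios above come from randomly deleting entries of an otherwise complete shift list. A minimal sketch of that simulation step, assuming a dictionary from atom name to shift in ppm and assuming the omitted fraction refers to the whole list, might look like this; the function name, the `eligible` filter (for example "aromatic protons only") and the atom names are hypothetical.

```python
import random
from typing import Callable, Dict


def omit_shifts(
    shifts: Dict[str, float],
    fraction: float,
    seed: int = 0,
    eligible: Callable[[str], bool] = lambda atom: True,
) -> Dict[str, float]:
    """Return a copy of `shifts` with a random subset of entries removed.

    The number removed is `fraction` of the whole list (an assumption), drawn
    only from atoms for which `eligible(atom)` is True, e.g. aromatic protons.
    """
    rng = random.Random(seed)  # seeded so each simulated data set is reproducible
    pool = [atom for atom in shifts if eligible(atom)]
    n_drop = min(len(pool), round(fraction * len(shifts)))
    dropped = set(rng.sample(pool, n_drop))
    return {atom: ppm for atom, ppm in shifts.items() if atom not in dropped}


shifts = {f"H{i}": 1.0 + 0.1 * i for i in range(20)}  # placeholder atom names and values
print(len(omit_shifts(shifts, 0.10, seed=1)))  # 18
```

Calling the function with several seeds gives an ensemble of randomly degraded lists, one per structure calculation.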

3.
Selective methyl labeling is an extremely powerful approach to study the structure, dynamics and function of biomolecules by NMR. Despite spectacular progress in the field, such studies remain rather limited in number. One of the main obstacles remains the assignment of the methyl resonances, which is labor intensive and error prone. Typically, NOESY crosspeak patterns are manually correlated with the available crystal structure or an in silico template model of the protein. Here, we propose methyl assignment by graphing inference construct, an exhaustive search algorithm that does not require a predefined peak network. To overcome the combinatorial problem, the exhaustive search is performed locally, i.e. for a small number of methyls connected through space according to experimental 3D methyl NOESY data. The local network approach drastically reduces the search space. Only the best local assignments are combined to provide the final output. Assignments that match the data with comparable scores are made available to the user for cross-validation by additional experiments such as methyl-amide NOEs. Several NMR datasets for proteins in the 25–50 kDa range were used during development and for performance evaluation against the manually assigned data. We show that the algorithm is robust and reliable and greatly speeds up the methyl assignment task.
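To make "exhaustive search performed locally" concrete, here is a hedged sketch, not the published algorithm: every injective mapping of a small observed peak network onto candidate methyls is scored by how many observed NOEs land on short-distance contacts in the structure, and all tied best mappings are kept, echoing the abstract's point that equally good assignments are reported to the user. The scoring rule, data layout and residue labels are assumptions.

```python
from itertools import permutations
from typing import Dict, FrozenSet, List, Set, Tuple


def best_local_assignments(
    peaks: List[str],
    noes: Set[FrozenSet[str]],
    methyls: List[str],
    contacts: Set[FrozenSet[str]],
) -> Tuple[int, List[Dict[str, str]]]:
    """Score every injective mapping of a small peak network onto candidate methyls.

    A mapping earns one point per observed peak-peak NOE whose mapped methyl
    pair is a structural contact; all mappings tied for the best score are kept.
    """
    best_score = -1
    best_maps: List[Dict[str, str]] = []
    for combo in permutations(methyls, len(peaks)):
        mapping = dict(zip(peaks, combo))
        score = sum(1 for noe in noes if frozenset(mapping[p] for p in noe) in contacts)
        if score > best_score:
            best_score, best_maps = score, [mapping]
        elif score == best_score:
            best_maps.append(mapping)
    return best_score, best_maps


# Hypothetical local network: p2 sees both p1 and p3; L20 contacts both I10 and V30.
peaks = ["p1", "p2", "p3"]
noes = {frozenset({"p1", "p2"}), frozenset({"p2", "p3"})}
methyls = ["I10", "L20", "V30"]
contacts = {frozenset({"I10", "L20"}), frozenset({"L20", "V30"})}
score, maps = best_local_assignments(peaks, noes, methyls, contacts)
print(score, maps)
```

Here two mappings tie (p1 and p3 can swap between I10 and V30), which is exactly the kind of ambiguity an extra experiment would need to resolve. Because the cost grows factorially with network size, the search only stays tractable on small local networks.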

4.
MOTIVATION: Liquid-state nuclear magnetic resonance (NMR) spectroscopy is now well established as a method for RNA tertiary structure determination. Most of the steps involved in determining RNA structures are performed using computer programs. However, these programs do not cover resonance assignment, which is the starting point of the whole procedure. We propose a tabu search algorithm as a tool for automating this step. The nuclear Overhauser effect (NOE) pathway, which determines the assignment, is constructed by analyzing possible connections between resonances within the aromatic/anomeric region of a two-dimensional NOESY spectrum obtained from an appropriate NMR experiment. RESULTS: Computational tests on experimental and simulated spectral data for RNA molecules demonstrate the superior performance of the tabu search algorithm compared with an exact enumerative approach and a genetic procedure. AVAILABILITY: The software package can be obtained upon request from Marta Szachniuk.
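A generic tabu search over resonance orderings can be sketched as follows. This is not the authors' implementation: it assumes a hypothetical weight matrix `w[a][b]` expressing how plausible it is that resonance `b` follows `a` in the NOE pathway, uses pairwise swaps as moves, and keeps a short tabu list of recent swaps with the usual aspiration criterion.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def path_score(order: Sequence[int], w: Sequence[Sequence[float]]) -> float:
    """Sum of connection weights between consecutive resonances in the pathway."""
    return sum(w[a][b] for a, b in zip(order, order[1:]))


def tabu_search(w: Sequence[Sequence[float]], iterations: int = 100,
                tenure: int = 5) -> Tuple[List[int], float]:
    n = len(w)
    current = list(range(n))
    best, best_score = list(current), path_score(current, w)
    tabu: List[Tuple[int, int]] = []  # recently used swap moves (pairs of positions)
    for _ in range(iterations):
        candidates = []
        for i, j in combinations(range(n), 2):
            neighbour = list(current)
            neighbour[i], neighbour[j] = neighbour[j], neighbour[i]
            score = path_score(neighbour, w)
            # Aspiration criterion: a tabu move is allowed if it beats the best so far.
            if (i, j) not in tabu or score > best_score:
                candidates.append((score, (i, j), neighbour))
        if not candidates:
            break
        # Move to the best admissible neighbour, even if it is worse than `current`.
        score, move, current = max(candidates, key=lambda c: c[0])
        tabu.append(move)
        if len(tabu) > tenure:
            tabu.pop(0)
        if score > best_score:
            best, best_score = list(current), score
    return best, best_score


w = [[0, 1, 5, 0],   # hypothetical connection weights between four resonances
     [0, 0, 0, 4],
     [0, 0, 0, 3],
     [2, 0, 0, 0]]
order, score = tabu_search(w)
print(order, score)
```

Accepting the best non-tabu neighbour even when it is worse is what lets tabu search leave local optima, while the tabu list stops it from immediately undoing that move.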

5.
We present an RNA-As-Graphs (RAG) based inverse folding algorithm, RAG-IF, to design novel RNA sequences that fold onto target tree graph topologies. The algorithm can be used to enhance our recently reported computational design pipeline (Jain et al., NAR 2018). The RAG approach represents RNA secondary structures as tree and dual graphs, where RNA loops and helices are coarse-grained as vertices and edges, opening the usage of graph theory methods to study, predict, and design RNA structures. Our recently developed computational pipeline for design utilizes graph partitioning (RAG-3D) and atomic fragment assembly (F-RAG) to design sequences to fold onto RNA-like tree graph topologies; the atomic fragments are taken from existing RNA structures that correspond to tree subgraphs. Because F-RAG may not produce the target folds for all designs, automated mutations by RAG-IF algorithm enhance the candidate pool markedly. The crucial residues for mutation are identified by differences between the predicted and the target topology. A genetic algorithm then mutates the selected residues, and the successful sequences are optimized to retain only the minimal or essential mutations. Here we evaluate RAG-IF for 6 RNA-like topologies and generate a large pool of successful candidate sequences with a variety of minimal mutations. We find that RAG-IF adds robustness and efficiency to our RNA design pipeline, making inverse folding motivated by graph topology rather than secondary structure more productive.

6.
The directed Hamiltonian path (DHP) problem is one of the hard computational problems for which no practical algorithm is available on a conventional computer. Many problems, including the traveling salesperson problem and the longest path problem, can be translated into the DHP problem, which implies that an algorithm for DHP can also solve all the translated problems. To study the robustness of the laboratory protocol of the pioneering DNA computation for the DHP problem performed by Leonard Adleman (1994), we investigated how the graph size, the multiplicity of Hamiltonian paths, and the size of the oligonucleotides that encode the vertices affect the laboratory procedures. We applied Adleman's protocol with an 18-mer oligonucleotide per node to a graph with 8 vertices and 14 edges containing two Hamiltonian paths (Adleman used 20-mer oligonucleotides for a graph with 7 nodes, 14 edges and one Hamiltonian path). We found that, depending on graph characteristics such as the number of short cycles, as well as on the oligonucleotide size and the hybridization conditions used to encode the graph, the protocol should be executed with parameters different from Adleman's.
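For comparison with the wet-lab protocol, the same question (is there a directed path from a fixed start vertex to a fixed end vertex that visits every vertex exactly once?) can be answered in silico by brute force. This minimal sketch tries all (n - 2)! orderings of the inner vertices, which makes the combinatorial explosion that motivates DNA parallelism easy to see. The toy graph is hypothetical and, like the graph in the study, has two Hamiltonian paths.

```python
from itertools import permutations
from typing import List, Set, Tuple


def hamiltonian_paths(vertices: List[str], edges: Set[Tuple[str, str]],
                      start: str, end: str) -> List[Tuple[str, ...]]:
    """Return all directed paths from `start` to `end` that visit every vertex once."""
    inner = [v for v in vertices if v not in (start, end)]
    found = []
    for middle in permutations(inner):  # (n - 2)! candidate orderings
        path = (start, *middle, end)
        if all((a, b) in edges for a, b in zip(path, path[1:])):
            found.append(path)
    return found


vertices = ["s", "a", "b", "t"]
edges = {("s", "a"), ("a", "b"), ("b", "t"), ("s", "b"), ("b", "a"), ("a", "t")}
print(hamiltonian_paths(vertices, edges, "s", "t"))
# [('s', 'a', 'b', 't'), ('s', 'b', 'a', 't')]
```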

7.
Novel algorithms are presented for automated NOESY peak picking and NOE signal identification in homonuclear 2D and heteronuclear-resolved 3D [1H,1H]-NOESY spectra during de novo protein structure determination by NMR; they have been implemented in the new software ATNOS (automated NOESY peak picking). The input for ATNOS consists of the amino acid sequence of the protein, chemical shift lists from the sequence-specific resonance assignment, and one or several 2D or 3D NOESY spectra. In the present implementation, ATNOS performs multiple cycles of NOE peak identification in concert with automated NOE assignment with the software CANDID and protein structure calculation with the program DYANA. In the second and subsequent cycles, the intermediate protein structures are used as an additional guide for the interpretation of the NOESY spectra. By incorporating the analysis of the raw NMR data into the process of automated de novo protein NMR structure determination, ATNOS enables direct feedback between the protein structure, the NOE assignments and the experimental NOESY spectra. The main elements of the algorithms for NOESY spectral analysis are techniques for local baseline correction and evaluation of local noise level amplitudes, automated determination of spectrum-specific threshold parameters, the use of symmetry relations, and the inclusion of the chemical shift information and the intermediate protein structures in the process of distinguishing between NOE peaks and artifacts. The ATNOS procedure has been validated with experimental NMR data sets of three proteins, for which high-quality NMR structures had previously been obtained by interactive interpretation of the NOESY spectra. The ATNOS-based structures coincide closely with those obtained with interactive peak picking. Overall, we present the algorithms used in this paper as a further important step towards objective and efficient de novo protein structure determination by NMR.

8.
A molecular algorithm for the matching problem based on plasmid DNA   Total citations: 7, self-citations: 0, citations by others: 7
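Given an undirected graph, the minimum maximal matching problem is to find the smallest among the maximal sets of pairwise non-adjacent edges; it is a well-known NP-complete problem. In 1994, Dr. Adleman first proposed using DNA computing to solve NP-complete problems: encoded DNA sequences serve as the objects of computation, and hard mathematical problems are solved through molecular-biology operations, which opens up a possible route to solving NP-complete problems. We propose a plasmid-DNA-based molecular-biology algorithm for the maximum matching problem on undirected graphs. Candidate solutions are generated, and the final solution is separated, by restriction endonuclease digestion and gel electrophoresis. Given the molecular-biology techniques it relies on, the algorithm is effective and feasible.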
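To make the problem statement concrete, a brute-force sketch of the minimum maximal matching on a conventional computer could look like this. It checks edge subsets in order of increasing size, so its running time is exponential in the number of edges; that cost is what massively parallel DNA computation is meant to sidestep. The graph encoding and function names are assumptions for illustration.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Edge = Tuple[int, int]


def is_matching(edges: Sequence[Edge]) -> bool:
    """True if no two edges share a vertex."""
    used = set()
    for u, v in edges:
        if u in used or v in used:
            return False
        used.update((u, v))
    return True


def is_maximal(matching: Sequence[Edge], graph: Sequence[Edge]) -> bool:
    """True if every edge of the graph touches a matched vertex, so none can be added."""
    covered = {x for edge in matching for x in edge}
    return all(u in covered or v in covered for u, v in graph)


def minimum_maximal_matching(graph: Sequence[Edge]) -> List[Edge]:
    # Try edge subsets in order of increasing size; exponential in len(graph).
    for k in range(len(graph) + 1):
        for subset in combinations(graph, k):
            if is_matching(subset) and is_maximal(subset, graph):
                return list(subset)
    return []


print(minimum_maximal_matching([(1, 2), (2, 3), (3, 4)]))  # [(2, 3)]
```

On the path 1-2-3-4, both {(1, 2), (3, 4)} and {(2, 3)} are maximal matchings. Only the second is minimum, which shows how this problem differs from the maximum matching problem.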

9.
Cross-referencing experimental data with our current knowledge of signaling network topologies is one central goal of mathematical modeling of cellular signal transduction networks. We present a new methodology for data-driven interrogation and training of signaling networks. While most published methods for signaling network inference operate on Bayesian, Boolean, or ODE models, our approach uses integer linear programming (ILP) on interaction graphs to encode constraints on the qualitative behavior of the nodes. These constraints are imposed by the network topology, and their formulation as an ILP allows us to predict the possible qualitative changes (up, down, no effect) of the activation levels of the nodes for a given stimulus. We provide four basic operations to detect and remove inconsistencies between measurements and predicted behavior: (i) find a topology-consistent explanation for responses of signaling nodes measured in a stimulus-response experiment (if none exists, find the closest explanation); (ii) determine a minimal set of nodes that need to be corrected to make an inconsistent scenario consistent; (iii) determine the optimal subgraph of the given network topology that best reflects measurements from a set of experimental scenarios; (iv) find possibly missing edges that would most improve the consistency of the graph with a set of experimental scenarios. We demonstrate the applicability of the proposed approach by interrogating a manually curated interaction graph model of EGFR/ErbB signaling against a library of high-throughput phosphoproteomic data measured in primary hepatocytes. Our methods detect interactions that are likely to be inactive in hepatocytes and suggest new interactions that, if included, would significantly improve the goodness of fit. Our framework is highly flexible, and the underlying model requires only easily accessible biological knowledge.
All related algorithms were implemented in the freely available toolbox SigNetTrainer, making the approach appealing for various applications.
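The core idea behind operation (i), checking whether measured up/down changes can be explained by signed edges, can be illustrated with a much-simplified, rule-based check. This is not the ILP formulation. It assumes that a measured change at a non-stimulated node is explained when at least one incoming edge, with sign +1 (activation) or -1 (inhibition), maps its source's change onto it. It also treats unmeasured nodes as unchanged, which the ILP does not. Node names are placeholders.

```python
from typing import Dict, List, Set, Tuple

SignedEdge = Tuple[str, str, int]  # (source, target, +1 activation / -1 inhibition)


def unexplained_nodes(edges: List[SignedEdge], changes: Dict[str, int],
                      stimuli: Set[str]) -> List[str]:
    """Return measured nodes whose change (+1 up, -1 down) no incoming edge explains.

    Simplification: nodes missing from `changes` count as unchanged (0), whereas
    an ILP formulation would treat unmeasured nodes as free variables.
    """
    unexplained = []
    for node, change in changes.items():
        if change == 0 or node in stimuli:
            continue  # 'no effect' and directly stimulated nodes need no upstream cause
        if not any(tgt == node and sign * changes.get(src, 0) == change
                   for src, tgt, sign in edges):
            unexplained.append(node)
    return unexplained


edges = [("S", "A", +1), ("A", "B", +1), ("B", "C", -1)]
changes = {"S": +1, "A": +1, "B": +1, "C": +1}
print(unexplained_nodes(edges, changes, stimuli={"S"}))  # ['C']
```

Node C rises even though its only input is an inhibitory edge from a rising node. That inconsistency would point either to a measurement problem or to a missing edge, which operations (ii) and (iv) address.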

10.
MOTIVATION: Clustering of protein sequences is widely used for the functional characterization of proteins. However, it is still not easy to cluster distantly related proteins, which share only regional similarity among their sequences. It is therefore necessary to develop an algorithm for clustering such distantly related proteins. RESULTS: We have developed a time- and space-efficient clustering algorithm. It uses a graph representation whose vertices denote proteins and whose edges denote sequence similarities above a certain cutoff score. It repeatedly partitions the graph by removing edges with small weights, which correspond to low sequence similarities. To find appropriate partitions, we introduce a score combining the normalized cut and locally minimal cut capacities. Our method was applied to all 40,703 human proteins in SWISS-PROT and TrEMBL. The resulting clusters show a 76% recall (20,529 proteins) of the 26,917 proteins classified by InterPro. The method also finds relationships not found by other clustering methods. AVAILABILITY: The complete result of our algorithm for all the human proteins in SWISS-PROT and TrEMBL, and other supplementary information, are available at http://motif.ics.es.osaka-u.ac.jp/Ncut-KL/
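The normalized cut referred to above is the standard quantity Ncut(A, B) = cut(A, B)/assoc(A, V) + cut(A, B)/assoc(B, V), where cut sums the weights of edges crossing the partition and assoc sums the weights of all edges incident to one side. The sketch below evaluates only this normalized-cut term for a given bipartition. The paper's combined score with locally minimal cut capacities is not reproduced, and the edge-dictionary input format is an assumption.

```python
from collections import defaultdict
from typing import Dict, Hashable, Set, Tuple


def normalized_cut(weights: Dict[Tuple[Hashable, Hashable], float],
                   part_a: Set[Hashable]) -> float:
    """Ncut(A, B) = cut(A, B) / assoc(A, V) + cut(A, B) / assoc(B, V)."""
    degree: Dict[Hashable, float] = defaultdict(float)
    cut = 0.0
    for (u, v), w in weights.items():
        degree[u] += w
        degree[v] += w
        if (u in part_a) != (v in part_a):  # edge crosses the partition
            cut += w
    assoc_a = sum(d for node, d in degree.items() if node in part_a)
    assoc_b = sum(d for node, d in degree.items() if node not in part_a)
    return cut / assoc_a + cut / assoc_b


# Two triangles of strong similarity joined by one weak edge (hypothetical scores).
weights = {(0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0,
           (3, 4): 1.0, (3, 5): 1.0, (4, 5): 1.0, (2, 3): 0.1}
print(normalized_cut(weights, {0, 1, 2}))  # about 0.033: the weak edge is the natural split
print(normalized_cut(weights, {0, 1}))     # about 0.744: cutting inside a triangle costs more
```

Dividing by assoc penalizes partitions that split off a tiny group, which is why Ncut is preferred over the raw cut weight for this kind of graph partitioning.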

11.
The solution structure of a novel 69-residue proteinase inhibitor, Linum usitatissimum trypsin inhibitor (LUTI), was determined using a method based on computer-aided assignment of nuclear Overhauser enhancement spectroscopy (NOESY) data. The approach uses the program NOAH/DYANA for automatic assignment of NOESY cross-peaks. Calculations were carried out using two unassigned NOESY peak lists and a set of determined dihedral angle restraints. In addition, hydrogen bonds involving amide protons were identified during the calculations using geometrical criteria and values of HN temperature coefficients. Stereospecific assignment of β-methylene protons was carried out using a standard procedure based on nuclear Overhauser enhancement intensities and ³J(αβ) coupling constants. Further stereospecific assignments of methylene protons and diastereotopic methyl groups were established using the structure-based method available in the program GLOMSA and chemical shift calculations. The applied algorithm allowed us to assign 1968 out of 2164 peaks (91%) derived from NOESY spectra recorded in H₂O and ²H₂O. The final experimental data input consisted of 1609 interproton distance restraints, 88 restraints for 44 hydrogen bonds, 63 torsion angle restraints and 32 stereospecifically assigned methylene proton pairs and methyl groups. The algorithm allowed the calculation of a high-precision protein structure without laborious manual assignment of NOESY cross-peaks. For the 20 best conformers selected out of 40 refined in the program CNS, the average pairwise RMSD values for residues 3 to 69 were 0.38 Å (backbone atoms) and 1.02 Å (all heavy atoms). The three-dimensional LUTI structure consists of a mixed parallel and antiparallel β-sheet and a single α-helix, and shows the fold of the potato 1 family of proteinase inhibitors.
Compared with known structures of the family, LUTI contains Arg and Trp residues at positions P6' and P8', respectively, instead of the two Arg residues involved in stabilizing the proteinase-binding loop. A consequence of the Arg→Trp substitution at P8' is a slightly more compact conformation of the loop relative to the protein core.
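The precision figures quoted above are averages of pairwise RMSD values over a bundle of conformers. A minimal sketch of that average follows. It assumes the conformers have already been superimposed (for example on the backbone atoms of residues 3 to 69) and that each conformer is given as an equal-length list of matching atom coordinates in Å. Superposition itself is not shown, and the toy coordinates are hypothetical.

```python
from itertools import combinations
from math import sqrt
from typing import List, Tuple

Coord = Tuple[float, float, float]


def rmsd(a: List[Coord], b: List[Coord]) -> float:
    """Root-mean-square deviation between two already superimposed atom lists."""
    if len(a) != len(b):
        raise ValueError("conformers must contain the same atoms")
    sq = sum((x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
             for (x1, y1, z1), (x2, y2, z2) in zip(a, b))
    return sqrt(sq / len(a))


def mean_pairwise_rmsd(conformers: List[List[Coord]]) -> float:
    """Average RMSD over all unordered pairs of conformers in the bundle."""
    pairs = list(combinations(conformers, 2))
    if not pairs:
        raise ValueError("need at least two conformers")
    return sum(rmsd(p, q) for p, q in pairs) / len(pairs)


c1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
c2 = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]  # every atom displaced by 1 Å
c3 = list(c1)
print(mean_pairwise_rmsd([c1, c2, c3]))  # about 0.667
```

Restricting the calculation to backbone atoms or to all heavy atoms only changes which coordinates go into each list. That is why a single bundle yields the two separate values (0.38 Å and 1.02 Å).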

12.
13.
NMR spectra of large RNAs are difficult to assign because of extensive spectral overlap and unfavorable relaxation properties. Here we present a new approach to facilitate assignment of RNA spectra using a suite of four 2D-filtered/edited NOESY experiments in combination with base-type-specific isotopically labeled RNA. The filtering method was developed for use in 3D filtered NOESY experiments (Zwahlen et al., 1997), but the 2D versions are both more sensitive and easier to interpret for larger RNAs than their 3D counterparts. These experiments are also useful for identifying intermolecular NOEs in RNA-protein complexes. Applications to NOE assignment of larger RNAs and an RNA-protein complex are presented.

14.
In liquid chromatography-mass spectrometry (LC-MS), parts of LC peaks are often corrupted by co-eluting peptides, which increases quantification variance. In this paper, we propose applying accurate LC peak boundary detection to remove the corrupted parts of LC peaks. Accurate LC peak boundary detection is achieved by checking the consistency of intensity patterns within peptide elution time ranges. In addition, we remove peptides with erroneous mass assignments through a model-fit check, which compares observed intensity patterns with theoretically constructed ones. The proposed algorithm can significantly improve the accuracy and precision of peptide ratio measurements.

15.
Repeated games and direct reciprocity under active linking   Total citations: 2, self-citations: 1, citations by others: 1
Direct reciprocity relies on repeated encounters between the same two individuals. Here we examine the evolution of cooperation under direct reciprocity in dynamically structured populations. Individuals occupy the vertices of a graph, undergoing repeated interactions with their partners via the edges of the graph. Unlike the traditional approach to evolutionary game theory, where individuals meet at random and have no control over the frequency or duration of interactions, we consider a model in which individuals differ in the rate at which they seek new interactions. Moreover, once a link between two individuals has formed, the productivity of this link is evaluated. Links can be broken off at different rates. Whenever the active dynamics of links is sufficiently fast, population structure leads to a simple transformation of the payoff matrix, effectively changing the game under consideration, and hence paving the way for reciprocators to dominate defectors. We derive analytical conditions for evolutionary stability.

16.
A multivariate data representation of a portion of the H-NOESY spectrum of an RNA octamer duplex was used to explore the possibility of using Principal Component Analysis and Partial Least Squares Discrimination for pattern recognition. In this case, the methods are found to be able to: (i) distinguish slices containing signal from those containing only noise, (ii) locate slices containing overlapping signals, and (iii) in some cases, segregate slices with unique aspects, such as those from terminal nucleotides, overlapping signals, and purine-H8, pyrimidine-H6 and adenine-H2 containing slices. These properties can easily be included in a scheme to automate spectral analysis. The formulation described here does not distinguish the patterns needed to automate sequential assignment of resonances in NOESY spectra of RNA.

17.
18.

Background  

We consider the problem of identifying the dynamic interactions in biochemical networks from noisy experimental data. Typically, approaches for solving this problem make use of an estimation algorithm such as the well-known linear Least-Squares (LS) estimation technique. We demonstrate that when time-series measurements are corrupted by white noise and/or drift noise, more accurate and reliable identification of network interactions can be achieved by employing an estimation algorithm known as Constrained Total Least Squares (CTLS). The Total Least Squares (TLS) technique is a generalised least squares method to solve an overdetermined set of equations whose coefficients are noisy. The CTLS is a natural extension of TLS to the case where the noise components of the coefficients are correlated, as is usually the case with time-series measurements of concentrations and expression profiles in gene networks.
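The difference between LS and TLS is easiest to see for a one-parameter model y = a·x. LS assumes that only y is noisy and minimizes vertical residuals. TLS assumes that both x and y are noisy and minimizes orthogonal distances to the line. The sketch below shows plain TLS in closed form for this scalar case only. It is not the constrained (CTLS) variant discussed above, which additionally models correlations between noise terms.

```python
from math import sqrt
from typing import Sequence


def ls_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Ordinary least squares for y = a*x: only y is assumed noisy."""
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    return sxy / sxx


def tls_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Total least squares for y = a*x: both x and y are assumed noisy.

    Minimises sum((y_i - a*x_i)**2) / (1 + a**2), the squared orthogonal distances
    to the line. Setting the derivative to zero gives
    sxy*a**2 + (sxx - syy)*a - sxy = 0, whose '+' root is the minimiser.
    """
    sxx = sum(xi * xi for xi in x)
    syy = sum(yi * yi for yi in y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    if sxy == 0:
        raise ValueError("slope is undetermined in this sketch when sxy == 0")
    return (syy - sxx + sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)


x, y = [1.0, 2.0, 3.0], [1.0, 3.0, 2.0]
print(ls_slope(x, y), tls_slope(x, y))  # 0.928... and 1.0
```

With noise-free data both estimators agree. When the regressor itself is noisy, LS systematically shrinks the slope toward zero, and TLS is designed to remove that bias.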

19.
Bu D  Zhao Y  Cai L  Xue H  Zhu X  Lu H  Zhang J  Sun S  Ling L  Zhang N  Li G  Chen R 《Nucleic acids research》2003,31(9):2443-2450
Interaction detection methods have led to the discovery of thousands of interactions between proteins, and discerning relevance within large-scale data sets is important to present-day biology. Here, a spectral method derived from graph theory was introduced to uncover hidden topological structures (i.e. quasi-cliques and quasi-bipartites) of complicated protein-protein interaction networks. Our analyses suggest that these hidden topological structures consist of biologically relevant functional groups. This result motivates a new method to predict the function of uncharacterized proteins based on the classification of known proteins within topological structures. Using this spectral analysis method, 48 quasi-cliques and six quasi-bipartites were isolated from a network involving 11,855 interactions among 2617 proteins in budding yeast, and 76 uncharacterized proteins were assigned functions.
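Spectral methods of this kind work with eigenvectors of the network's adjacency matrix. As a minimal, hedged sketch of the underlying intuition, not of the paper's procedure for extracting multiple quasi-cliques and quasi-bipartites, the code below computes the leading eigenvector by power iteration and shows that its largest components pick out a densely connected group. The toy network is hypothetical.

```python
from math import sqrt
from typing import Dict, Set


def leading_eigenvector(adj: Dict[int, Set[int]], iterations: int = 200) -> Dict[int, float]:
    """Power iteration on A + I for a connected, undirected adjacency structure.

    Adding I leaves the eigenvectors unchanged but keeps the dominant eigenvalue
    strictly largest in magnitude, so the iteration cannot oscillate on
    bipartite graphs, where A has eigenvalues +λ and -λ.
    """
    x = {v: 1.0 for v in adj}
    for _ in range(iterations):
        y = {v: x[v] + sum(x[u] for u in adj[v]) for v in adj}
        norm = sqrt(sum(val * val for val in y.values()))
        x = {v: val / norm for v, val in y.items()}
    return x


# Hypothetical network: a 4-clique (0-3) with a short tail 3-4-5.
adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2, 4}, 4: {3, 5}, 5: {4}}
vec = leading_eigenvector(adj)
print(sorted(vec, key=vec.get, reverse=True)[:4])  # the clique members 0, 1, 2 and 3
```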

20.
Combined automated NOE assignment and structure determination module (CANDID) is new software for efficient NMR structure determination of proteins by automated assignment of NOESY spectra. CANDID uses an iterative approach with multiple cycles of NOE cross-peak assignment and protein structure calculation using the fast DYANA torsion angle dynamics algorithm, so that the result of each CANDID cycle consists of exhaustive, possibly ambiguous NOE cross-peak assignments in all available spectra and a three-dimensional protein structure represented by a bundle of conformers. The input for the first CANDID cycle consists of the amino acid sequence, the chemical shift list from the sequence-specific resonance assignment, and listings of the cross-peak positions and volumes in one or several two-, three- or four-dimensional NOESY spectra. The input for the second and subsequent CANDID cycles contains the three-dimensional protein structure from the previous cycle, in addition to the complete input used for the first cycle. CANDID includes two new elements that make it robust with respect to the presence of artifacts in the input data, i.e. network-anchoring and constraint-combination, which play a key role in de novo protein structure determinations for the successful generation of the correct polypeptide fold by the first CANDID cycle. Network-anchoring makes use of the fact that any network of correct NOE cross-peak assignments forms a self-consistent set; the initial, chemical shift-based assignments for each individual NOE cross-peak are therefore weighted by the extent to which they can be embedded into the network formed by all other NOE cross-peak assignments.
Constraint-combination reduces the deleterious impact of artifact NOE upper distance constraints in the input for a protein structure calculation by combining the assignments for two or several peaks into a single upper limit distance constraint, which lowers the probability that the presence of an artifact peak will influence the outcome of the structure calculation. CANDID test calculations were performed with NMR data sets of four proteins for which high-quality structures had previously been solved by interactive protocols, and they yielded results comparable to these reference structure determinations with regard to both the residual constraint violations and the precision and accuracy of the atomic coordinates. The CANDID approach has further been validated by de novo NMR structure determinations of four additional proteins. The experience gained in these calculations shows that, once nearly complete sequence-specific resonance assignments are available, the automated CANDID approach greatly enhances the efficiency of NOESY spectral analysis. The fact that the correct fold is obtained in cycle 1 of a de novo structure calculation is the single most important advance achieved with CANDID, compared with previously proposed automated NOESY assignment methods that do not use network-anchoring and constraint-combination.
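The intuition behind network-anchoring (a correct assignment between protons a and b tends to be supported by other assignments that connect a and b indirectly) can be illustrated with a toy count of indirect paths a-c-b. This is a simplified illustration only and not CANDID's actual scoring function. The set-of-pairs input format and the proton labels are hypothetical.

```python
from typing import Dict, FrozenSet, Set


def network_support(assignments: Set[FrozenSet[str]]) -> Dict[FrozenSet[str], int]:
    """Count, for each assigned proton pair {a, b}, the third protons c such that
    {a, c} and {c, b} are also assigned, i.e. indirect paths a-c-b in the network."""
    neighbours: Dict[str, Set[str]] = {}
    for pair in assignments:
        a, b = tuple(pair)  # each pair is assumed to hold two distinct protons
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    support: Dict[FrozenSet[str], int] = {}
    for pair in assignments:
        a, b = tuple(pair)
        support[pair] = len(neighbours[a] & neighbours[b])
    return support


# Hypothetical assignments: three mutually consistent NOEs plus one isolated NOE.
assignments = {frozenset({"HA1", "HN2"}), frozenset({"HA1", "HB1"}),
               frozenset({"HB1", "HN2"}), frozenset({"HN5", "HA9"})}
support = network_support(assignments)
print(support[frozenset({"HA1", "HN2"})], support[frozenset({"HN5", "HA9"})])  # 1 0
```

An isolated assignment receives no support from the rest of the network, so a weighting of this kind would down-rank it. That is the sense in which a self-consistent network of correct assignments helps suppress artifacts.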


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417