Similar Literature
1.
We develop an iterative relaxation algorithm called RIBRA for NMR protein backbone assignment. RIBRA applies nearest neighbor and weighted maximum independent set algorithms to solve the problem. To deal with noisy NMR spectral data, RIBRA is executed iteratively according to the quality of the spectral peaks: we first produce spin system pairs from the data without missing peaks, then from the data group with one missing peak, and finally from the data group with two missing peaks. We test RIBRA on two real NMR datasets, hbSBD and hbLBD, on perfect BMRB data (902 proteins), and on four synthetic BMRB datasets that simulate four kinds of errors. The accuracy of RIBRA on hbSBD and hbLBD is 91.4% and 83.6%, respectively. The average accuracy of RIBRA on the perfect BMRB datasets is 98.28%, and on the four kinds of synthetic datasets it is 98.28%, 95.61%, 98.16%, and 96.28%, respectively.
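
The combination of a weighted independent set with quality-ordered relaxation can be pictured on candidate spin-system links. Each link is a (from, to, weight) tuple, and two links conflict if they share a source or a target. The sketch below is a minimal illustration under those assumptions, not RIBRA's actual procedure. It replaces exact maximum-weight independent set (which is NP-hard) with a greedy highest-weight-first rule, and it processes the data groups in order of decreasing peak quality. The tuple format and the weights are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical candidate link: (from_spin_system, to_spin_system, weight).
Link = Tuple[str, str, float]


def iterative_relaxation(groups: Sequence[List[Link]]) -> List[Link]:
    """Greedy stand-in for weighted maximum independent set, applied group by group.

    groups is ordered by data quality, e.g. [no_missing, one_missing, two_missing];
    links accepted from higher-quality groups constrain later groups.
    Cycles are not checked in this sketch.
    """
    chosen: List[Link] = []
    used_sources = set()  # spin systems that already have a successor
    used_targets = set()  # spin systems that already have a predecessor
    for group in groups:
        # Within a group, consider the highest-scoring links first.
        for src, dst, weight in sorted(group, key=lambda link: link[2], reverse=True):
            if src in used_sources or dst in used_targets:
                continue  # conflicts with an already accepted link
            chosen.append((src, dst, weight))
            used_sources.add(src)
            used_targets.add(dst)
    return chosen
```

In this sketch a link from a lower-quality group can only fill gaps that the cleaner data left open. That is the intuition behind relaxing the missing-peak requirement step by step.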

2.
MOTIVATION: Datasets resulting from metabolomics or metabolic profiling experiments are becoming increasingly complex. Such datasets may contain underlying factors, such as time (time-resolved or longitudinal measurements), doses or combinations thereof. Currently used biostatistics methods do not take the structure of such complex datasets into account. However, incorporating this structure into the data analysis is important for understanding the biological information in these datasets. RESULTS: We describe ASCA, a new method that can deal with complex multivariate datasets containing an underlying experimental design, such as metabolomics datasets. It is a direct generalization of analysis of variance (ANOVA) for univariate data to the multivariate case. The method allows for easy interpretation of the variation induced by the different factors of the design. The method is illustrated with a dataset from a metabolomics experiment with time and dose factors.
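
The ANOVA-style step that ASCA generalizes can be sketched for a balanced two-factor design such as time × dose. The mean-centred data matrix is split into effect matrices for factor A, factor B, their interaction, and a residual. This is a minimal illustration assuming a balanced design. It shows only the decomposition. The subsequent component analysis (PCA/SCA) of each effect matrix is omitted, and the function names are hypothetical.

```python
from statistics import fmean
from typing import Dict, Hashable, List, Sequence

Matrix = List[List[float]]


def column_means(rows: Sequence[Sequence[float]]) -> List[float]:
    return [fmean(col) for col in zip(*rows)]


def asca_effects(X: Matrix, a: Sequence[str], b: Sequence[str]) -> Dict[str, Matrix]:
    """Split mean-centred X into factor A, factor B, interaction and residual matrices."""
    overall = column_means(X)
    centred = [[x - m for x, m in zip(row, overall)] for row in X]

    def level_means(labels: Sequence[Hashable]) -> Dict[Hashable, List[float]]:
        groups: Dict[Hashable, List[List[float]]] = {}
        for row, lab in zip(centred, labels):
            groups.setdefault(lab, []).append(row)
        return {lab: column_means(rows) for lab, rows in groups.items()}

    mean_a = level_means(a)
    mean_b = level_means(b)
    mean_ab = level_means(list(zip(a, b)))  # cell means of the design

    XA = [mean_a[la] for la in a]
    XB = [mean_b[lb] for lb in b]
    # Interaction: cell mean minus both main effects.
    XAB = [[c - p - q for c, p, q in zip(mean_ab[(la, lb)], mean_a[la], mean_b[lb])]
           for la, lb in zip(a, b)]
    E = [[x - p - q - r for x, p, q, r in zip(row, ra, rb, rab)]
         for row, ra, rb, rab in zip(centred, XA, XB, XAB)]
    return {"centred": centred, "A": XA, "B": XB, "AB": XAB, "residual": E}


def sum_of_squares(M: Matrix) -> float:
    return sum(v * v for row in M for v in row)
```

For a balanced design the effect matrices are mutually orthogonal, so their sums of squares add up to that of the centred data. The size of each effect's share is what makes the per-factor variation easy to interpret.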

3.
SUMMARY: The development of NMR in structural proteomics requires automatic structure determination methods. Researchers validating such methods are commonly confronted with a lack of raw datasets. To increase testing possibilities, the NMRb website offers a database of raw NMR datasets, ordered by spectral characteristics. AVAILABILITY: NMRb is available from: http://nmrb.cbs.cnrs.fr. SUPPLEMENTARY INFORMATION: A figure of the general organization of NMRb, the relational model organization, and XML structure files are available from http://nmrb.cbs.cnrs.fr/nmrb-doc.html.

4.
1H NMR spectra from urine can yield information-rich data sets that offer important insights into many biological and biochemical phenomena. However, the quality and utility of these insights can be profoundly affected by how the NMR spectra are processed and interpreted. For instance, if the NMR spectra are incorrectly referenced or inconsistently aligned, the identification of many compounds will be incorrect. If the NMR spectra are mis-phased or if the baseline correction is flawed, the estimated concentrations of many compounds will be systematically biased. Furthermore, because NMR permits the measurement of concentrations spanning up to five orders of magnitude, several problems can arise with data analysis. For instance, signals originating from the most abundant metabolites may prove to be the least biologically relevant while signals arising from the least abundant metabolites may prove to be the most important but hardest to accurately and precisely measure. As a result, a number of data processing techniques such as scaling, transformation and normalization are often required to address these issues. Therefore, proper processing of NMR data is a critical step to correctly extract useful information in any NMR-based metabolomic study. In this review we highlight the significance, advantages and disadvantages of different NMR spectral processing steps that are common to most NMR-based metabolomic studies of urine. These include: chemical shift referencing, phase and baseline correction, spectral alignment, spectral binning, scaling and normalization. We also provide a set of recommendations for best practices regarding spectral and data processing for NMR-based metabolomic studies of biofluids, with a particular focus on urine.
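
Two of the listed steps, spectral binning and normalization, can be sketched in a few lines. The normalization shown is probabilistic quotient normalization (PQN), chosen here as one common option for urine because it compensates for overall dilution. The review covers normalization more broadly. The bin width and function names are hypothetical, and the sketch assumes the spectra are already referenced, phased and baseline-corrected.

```python
from statistics import median
from typing import List, Sequence


def bin_spectrum(ppm: Sequence[float], intensity: Sequence[float],
                 start: float, stop: float, width: float) -> List[float]:
    """Sum intensities into equal-width bins covering [start, stop)."""
    n_bins = int(round((stop - start) / width))
    bins = [0.0] * n_bins
    for x, y in zip(ppm, intensity):
        if start <= x < stop:
            idx = int((x - start) / width)
            if idx < n_bins:  # guard against floating-point spill at the upper edge
                bins[idx] += y
    return bins


def pqn_normalize(spectra: List[List[float]]) -> List[List[float]]:
    """Probabilistic quotient normalization against the median spectrum.

    Each spectrum is divided by the median of its bin-wise quotients
    relative to the reference (median) spectrum. Raises StatisticsError
    if a spectrum has no positive bins in common with the reference.
    """
    reference = [median(col) for col in zip(*spectra)]
    normalized = []
    for s in spectra:
        quotients = [v / r for v, r in zip(s, reference) if r > 0 and v > 0]
        factor = median(quotients)  # most probable dilution factor
        normalized.append([v / factor for v in s])
    return normalized
```

Using the median quotient rather than the total area makes the dilution estimate less sensitive to a few very abundant metabolites. This is one of the concentration-range issues raised above.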

5.
Nuclear magnetic resonance (NMR) and mass spectrometry (MS) are the two most common analytical techniques employed in metabolomics. The large spectral datasets generated by NMR and MS are often analyzed using data reduction techniques like Principal Component Analysis (PCA). Although rapid, these methods are susceptible to solvent and matrix effects, high rates of false positives, lack of reproducibility and limited data transferability from one platform to the next. Given these limitations, a growing trend in both NMR and MS-based metabolomics is towards targeted profiling or "quantitative" metabolomics, wherein compounds are identified and quantified via spectral fitting prior to any statistical analysis. Despite the obvious advantages of this method, targeted profiling is hindered by the time required to perform manual or computer-assisted spectral fitting. In an effort to increase data analysis throughput for NMR-based metabolomics, we have developed an automatic method for identifying and quantifying metabolites in one-dimensional (1D) proton NMR spectra. This new algorithm uses carefully constructed reference spectra and optimizes thousands of variables to reconstruct experimental NMR spectra of biofluids, using rules and concepts derived from physical chemistry and NMR theory. The automated profiling program has been tested against spectra of synthetic mixtures as well as biological spectra of urine, serum and cerebrospinal fluid (CSF). Our results indicate that the algorithm can correctly identify compounds with high fidelity in each biofluid sample (except for urine). Furthermore, the metabolite concentrations exhibit a very high correlation with both simulated and manually-detected values.
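
The core idea of reconstructing an experimental spectrum from reference spectra can be reduced to a linear least-squares problem. The observed spectrum is modelled as a weighted sum of reference spectra, and the weights are solved for. This sketch assumes fixed Lorentzian reference lineshapes on a shared ppm grid and solves the normal equations directly. The described algorithm also optimizes peak positions, widths and other variables, which this does not attempt. The weights map to concentrations only if each reference spectrum is scaled per unit concentration. All names are hypothetical.

```python
from typing import List, Sequence


def lorentzian(x: float, center: float, width: float) -> float:
    """Unit-height Lorentzian line with half-width `width` (ppm)."""
    return 1.0 / (1.0 + ((x - center) / width) ** 2)


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_weights(observed: Sequence[float], references: List[Sequence[float]]) -> List[float]:
    """Unconstrained least-squares weights w with sum_k w[k] * references[k] ≈ observed."""
    k = len(references)
    # Normal equations: (R^T R) w = R^T y
    rtr = [[sum(p * q for p, q in zip(references[i], references[j])) for j in range(k)]
           for i in range(k)]
    rty = [sum(p * y for p, y in zip(references[i], observed)) for i in range(k)]
    return solve(rtr, rty)
```

A real profiler would also constrain the weights to be non-negative and let the peaks shift slightly between samples. Both are omitted here to keep the linear-algebra step visible.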

6.
BACKGROUND: Infrared spectroscopy probes the chemical composition and molecular structure of complex systems such as tissue and cells. Infrared spectroscopic imaging combines this spectral information with lateral resolution near the single-cell level. We analyzed whether this method is competitive with classic immunohistochemical methods for immunologic tissue and cells. METHODS: We recorded infrared microspectroscopic mapping datasets with a 90 × 90 μm² aperture from a 3 × 3 mm² unstained tissue area of human spleen. A secondary follicle containing a germinal center and a T zone were studied in more detail by infrared microspectroscopic imaging with lateral resolution near 5 μm. The results were compared with consecutive sections stained by immunoglobulin D antibodies. T and B lymphocytes were extracted from human blood and served as independent test samples. RESULTS: Cluster analysis of infrared datasets produced images that distinguished anatomical features such as primary and secondary follicles, T zones, arteries, and spleen red pulp. The assignments could be confirmed in consecutive sections by immunohistochemical staining. The main spectral differences between T and B lymphocytes in high-resolution measurements were attributed to specific spectral contributions of DNA and cytosol. CONCLUSIONS: Sensitivity and specificity of the infrared-based methods are comparable to those of standard staining procedures for identification of B and T cells. However, infrared spectroscopic imaging can offer advantages in speed, data throughput, and standardization because of minimal sample preparation. The results emphasize the potential of infrared spectroscopy as an innovative tool for the distinction of cell types, in particular in immunologic tissue.
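
The abstract does not say which cluster analysis was used. This minimal sketch uses k-means as a stand-in for turning per-pixel spectra into cluster labels, which are then displayed as an image. Hierarchical clustering is another common choice in infrared imaging. The farthest-first initialization and the function name are assumptions made to keep the example deterministic.

```python
import math
from typing import List, Sequence


def kmeans(spectra: List[Sequence[float]], k: int, max_iter: int = 100) -> List[int]:
    """Lloyd's k-means on spectra; returns one cluster label per spectrum (pixel)."""
    # Farthest-first initialization keeps the sketch deterministic.
    centroids = [list(spectra[0])]
    while len(centroids) < k:
        farthest = max(spectra, key=lambda s: min(math.dist(s, c) for c in centroids))
        centroids.append(list(farthest))

    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [min(range(k), key=lambda c: math.dist(s, centroids[c])) for s in spectra]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: each centroid becomes the mean spectrum of its members.
        for c in range(k):
            members = [s for s, lab in zip(spectra, labels) if lab == c]
            if members:
                centroids[c] = [sum(vals) / len(members) for vals in zip(*members)]
    return labels
```

Reshaping the returned labels back onto the pixel grid gives a false-colour map. In the study, such maps were compared against immunohistochemically stained consecutive sections.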

7.
The application of data mining to proteomic mass spectrometry data has gained much interest in recent years. Advances in proteomics and mass spectrometry have produced a considerable amount of data that cannot be easily visualized or interpreted. Mass spectral proteomic datasets are typically high-dimensional but have small sample sizes. Consequently, advanced artificial intelligence and machine learning algorithms are increasingly being used for knowledge discovery from such datasets. Their overall goal is to extract useful information that leads to the identification of protein biomarker candidates. Such biomarkers could potentially have diagnostic value as tools for early detection, diagnosis, and prognosis of many diseases. The purpose of this review is to focus on the current trends in mining mass spectral proteomic data. Special emphasis is placed on the critical steps involved in the analysis of surface-enhanced laser desorption/ionization mass spectrometry proteomic data. Examples are drawn from previously published studies, and relevant data mining terminology and techniques are explained.
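
A simple baseline for high-dimensional, small-sample mass spectral data is a univariate filter. Each m/z feature is ranked by how strongly it separates cases from controls, and the top-ranked peaks are kept as biomarker candidates. This sketch uses Welch's t statistic as an illustrative choice, not a method taken from the review. It omits multiple-testing correction and cross-validation, both of which matter with so few samples.

```python
import math
from statistics import fmean, variance
from typing import List, Sequence, Tuple


def welch_t(x: Sequence[float], y: Sequence[float]) -> float:
    """Welch's t statistic for two independent samples (each needs >= 2 values)."""
    diff = fmean(x) - fmean(y)
    se2 = variance(x) / len(x) + variance(y) / len(y)
    if se2 == 0:
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / math.sqrt(se2)


def rank_features(cases: List[Sequence[float]],
                  controls: List[Sequence[float]]) -> List[Tuple[int, float]]:
    """Return (feature_index, t) pairs sorted by |t|, most discriminating first."""
    n_features = len(cases[0])
    scores = []
    for j in range(n_features):
        t = welch_t([row[j] for row in cases], [row[j] for row in controls])
        scores.append((j, t))
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)
```

Filters like this are fast, but they ignore interactions between peaks. That limitation is one reason multivariate machine learning methods are used for knowledge discovery on these datasets.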

8.
Nmrglue, an open source Python package for working with multidimensional NMR data, is described. When used in combination with other Python scientific libraries, nmrglue provides a highly flexible and robust environment for spectral processing, analysis and visualization and includes a number of common utilities such as linear prediction, peak picking and lineshape fitting. The package also enables existing NMR software programs to be readily tied together, currently facilitating the reading, writing and conversion of data stored in Bruker, Agilent/Varian, NMRPipe, Sparky, SIMPSON, and Rowland NMR Toolkit file formats. In addition to standard applications, the versatility offered by nmrglue makes the package particularly suitable for tasks that include manipulating raw spectrometer data files, automated quantitative analysis of multidimensional NMR spectra with irregular lineshapes such as those frequently encountered in the context of biomacromolecular solid-state NMR, and rapid implementation and development of unconventional data processing methods such as covariance NMR and other non-Fourier approaches. Detailed documentation, install files and source code for nmrglue are freely available at http://nmrglue.com. The source code can be redistributed and modified under the New BSD license.

9.
Nuclear magnetic resonance (NMR) spectroscopy allows scientists to study protein structure, dynamics and interactions in solution. A necessary first step for such applications is determining the resonance assignment, mapping spectral data to atoms and residues in the primary sequence. Automated resonance assignment algorithms rely on information regarding connectivity (e.g., through-bond atomic interactions) and amino acid type, typically using the former to determine strings of connected residues and the latter to map those strings to positions in the primary sequence. Significant ambiguity exists in both connectivity and amino acid type information. This paper focuses on the information content available in connectivity alone and develops a novel random-graph theoretic framework and algorithm for connectivity-driven NMR sequential assignment. Our random graph model captures the structure of chemical shift degeneracy, a key source of connectivity ambiguity. We then give a simple and natural randomized algorithm for finding optimal assignments as sets of connected fragments in NMR graphs. The algorithm naturally and efficiently reuses substrings while exploring connectivity choices; it overcomes local ambiguity by enforcing global consistency of all choices. By analyzing our algorithm under our random graph model, we show that it can provably tolerate relatively large ambiguity while still giving expected optimal performance in polynomial time. We present results from practical applications of the algorithm to experimental datasets from a variety of proteins and experimental set-ups. We demonstrate that our approach is able to overcome significant noise and local ambiguity in identifying significant fragments of sequential assignments.

10.
A nearly complete sequential resonance assignment is a key factor leading to successful protein structure determination via NMR spectroscopy. Assuming the availability of a set of NMR spectral peak lists, most existing assignment algorithms first use the differences between chemical shift values for common nuclei across multiple spectra as evidence that some pairs of peaks should be assigned to sequentially adjacent amino acid residues in the target protein. They then use these connectivities as constraints to produce a sequential assignment. With varying degrees of success, these algorithms typically generate a large number of potential connectivity constraints, and this number grows exponentially as the quality of the spectral data decreases. A key observation used in our sequential assignment program, CISA, is that chemical shift residual signature information can be used to improve connectivity determination, and thus to dramatically decrease the number of predicted connectivity constraints. Fewer connectivity constraints lead to fewer ambiguities in the sequential assignment. Extensive simulation studies on several large test datasets demonstrated that CISA is efficient and effective compared with three recently proposed sequential resonance assignment programs: RANDOM, PACES, and MARS.
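
The generic first step described above, proposing sequential links from chemical shift differences, can be sketched as tolerance matching. A spin system's own CA/CB shifts are compared with another spin system's preceding-residue CA/CB shifts. This shows only that common starting point. CISA's residual-signature filtering is not reproduced. The record keys and tolerances are hypothetical, and glycine (no CB) is not handled.

```python
from typing import Dict, List, Tuple

# Hypothetical spin-system record: own CA/CB and preceding-residue CA/CB shifts (ppm).
SpinSystem = Dict[str, float]


def candidate_links(systems: Dict[str, SpinSystem],
                    tol_ca: float = 0.2, tol_cb: float = 0.4) -> List[Tuple[str, str]]:
    """Return (i, j) pairs where spin system i may directly precede spin system j."""
    links = []
    for i, si in systems.items():
        for j, sj in systems.items():
            if i == j:
                continue
            # i's own shifts should reappear as j's "previous residue" shifts.
            if (abs(si["CA"] - sj["CA_prev"]) <= tol_ca
                    and abs(si["CB"] - sj["CB_prev"]) <= tol_cb):
                links.append((i, j))
    return links
```

Widening the tolerances to cope with noisy data quickly inflates the number of candidate links. Methods that prune connectivity constraints aim to counter that effect.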

11.
We present a computational environment for Fast Analysis of multidimensional NMR DAta Sets (FANDAS) that allows assembling multidimensional data sets from a variety of input parameters and facilitates comparing and modifying such "in silico" data sets during the various stages of NMR data analysis. The input parameters can vary from (partial) NMR assignments directly obtained from experiments to values retrieved from in silico prediction programs. The resulting predicted data sets enable a rapid evaluation of sample labeling in light of spectral resolution and structural content, using standard NMR software such as Sparky. In addition, direct comparison to experimental data sets can be used to validate NMR assignments, distinguish different molecular components, and refine structural models or other parameters derived from NMR data. The method is demonstrated in the context of solid-state NMR data obtained for the cyclic nucleotide binding domain of a bacterial cyclic nucleotide-gated channel and on membrane-embedded sensory rhodopsin II. FANDAS is freely available as a web portal under WeNMR (http://www.wenmr.eu/services/FANDAS).
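
The basic operation behind an "in silico" data set, turning per-residue assignments into predicted cross-peak positions, can be sketched for a short-mixing-time 13C–13C correlation spectrum. In that spectrum, all carbons within a residue correlate. The input dictionary and peak labels below are hypothetical and do not reflect FANDAS's input or output formats.

```python
from itertools import permutations
from typing import Dict, List, Tuple

Peak = Tuple[float, float, str]  # (shift in dimension 1, shift in dimension 2, label)


def predict_intraresidue_cc(assignments: Dict[str, Dict[str, float]]) -> List[Peak]:
    """Predicted intra-residue 13C-13C cross peaks, on both sides of the diagonal."""
    peaks: List[Peak] = []
    for residue, shifts in assignments.items():
        # Keep carbon atoms only (atom names starting with "C", e.g. C, CA, CB).
        carbons = [(atom, ppm) for atom, ppm in shifts.items() if atom.startswith("C")]
        for (a1, p1), (a2, p2) in permutations(carbons, 2):
            peaks.append((p1, p2, f"{residue}{a1}-{a2}"))
    return peaks
```

Restricting the input to residues that a given labeling scheme would actually label shows how much spectral crowding that scheme removes. That is the kind of sample-labeling evaluation described above.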

12.
Recent technological advances and experimental techniques have contributed to an increasing number and size of NMR datasets. In order to scale up productivity, laboratory information management systems for handling these extensive data need to be designed and implemented. The SPINS (Standardized ProteIn Nmr Storage) Laboratory Information Management System (LIMS) addresses these needs by providing an interface for archival of complete protein NMR structure determinations, together with functionality for depositing these data to the public BioMagResBank (BMRB). The software tracks intermediate files during each step of an NMR structure-determination process, including: data collection, data processing, resonance assignments, resonance assignment validation, structure calculation, and structure validation. The underlying SPINS data dictionary allows for the integration of various third party NMR data processing and analysis software, enabling users to launch programs they are accustomed to using for each step of the structure determination process directly out of the SPINS user interface.

13.
14.
Chemical shifts reflect the structural environment of a given nucleus and can be used to extract structural and dynamic information. Proper calibration is indispensable for extracting such information from chemical shifts. Whereas a variety of procedures exist to verify the chemical shift calibration for proteins, no such procedure has been available for RNAs to date. We present here a procedure to analyze and correct the calibration of 13C NMR data of RNAs. Our procedure uses five 13C chemical shifts as a reference, each of which falls in a narrow shift range in most datasets deposited in the Biological Magnetic Resonance Bank. In 49 datasets we could evaluate the 13C calibration and detect errors or inconsistencies in RNA 13C chemical shifts based on these chemical shift reference values. More than half of the datasets (27 out of those 49) were found to be improperly referenced or to contain inconsistencies. This high inconsistency rate may explain why no clear structure–13C chemical shift relationship has emerged for RNA so far. We were able to recalibrate or correct 17 datasets, resulting in 39 usable 13C datasets. Six new datasets from our lab were used to verify our method, increasing the database to 45 usable datasets. We can now search for structure–chemical shift relationships with this improved list of 13C chemical shift data. This is demonstrated by a clear relationship between ribose 13C shifts and the sugar pucker, which can be used to predict a C2′- or C3′-endo conformation of the ribose with high accuracy. The improved quality of the chemical shift data allows statistical analysis with the potential to facilitate assignment procedures and the extraction of restraints for structure calculations of RNA.
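
One way to turn a small set of reference shifts into a calibration check is sketched below. It compares each observed reference-atom shift with its expected value and takes the median difference as the referencing offset. The median keeps a single inconsistent atom from dominating the estimate. The abstract does not say how the offset is computed, so the median rule is an assumption. The reference keys and values below are hypothetical placeholders, not the five reference shifts used in the paper.

```python
from statistics import median
from typing import Dict

# HYPOTHETICAL placeholder reference values (ppm), not the paper's five reference shifts.
REFERENCE_SHIFTS: Dict[str, float] = {
    "ref1": 100.0, "ref2": 80.0, "ref3": 70.0, "ref4": 60.0, "ref5": 140.0,
}


def estimate_offset(observed: Dict[str, float],
                    reference: Dict[str, float] = REFERENCE_SHIFTS) -> float:
    """Median of (observed - expected) over the reference atoms present in a dataset."""
    diffs = [observed[key] - reference[key] for key in reference if key in observed]
    if not diffs:
        raise ValueError("no reference atoms present in this dataset")
    return median(diffs)


def recalibrate(shifts: Dict[str, float], offset: float) -> Dict[str, float]:
    """Subtract a global 13C referencing offset from every shift in a dataset."""
    return {atom: ppm - offset for atom, ppm in shifts.items()}
```

A dataset whose offset is near zero but whose individual differences scatter widely would point to internal inconsistencies rather than a simple referencing error. These two failure modes are distinguished in the analysis above.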

15.
Clean absorption mode NMR data acquisition is presented, based on mirrored time domain sampling and the widely used time-proportional phase incrementation (TPPI) for quadrature detection. The resulting NMR spectra are devoid of dispersive frequency domain peak components. Such components complicate peak identification and shift peak maxima, and thus impede automated spectral analysis. The new approach is also of unique value for obtaining clean absorption mode reduced-dimensionality projection NMR spectra, which can rapidly provide high-dimensional spectral information for high-throughput NMR structure determination.

16.
Tandem mass spectrometry (MS/MS) experiments yield multiple, nearly identical spectra of the same peptide in various laboratories, but proteomics researchers typically do not leverage the unidentified spectra produced in other labs to decode spectra they generate. We propose a spectral archives approach that clusters MS/MS datasets, representing similar spectra by a single consensus spectrum. Spectral archives extend spectral libraries by analyzing both identified and unidentified spectra in the same way and maintaining information about peptide spectra that are common across species and conditions. Thus archives offer both traditional library spectrum similarity-based search capabilities along with new ways to analyze the data. By developing a clustering tool, MS-Cluster, we generated a spectral archive from ~1.18 billion spectra that greatly exceeds the size of existing spectral repositories. We advocate that publicly available data should be organized into spectral archives rather than be analyzed as disparate datasets, as is mostly the case today.
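
The archive idea of grouping similar spectra and representing each group by a consensus spectrum can be sketched with cosine similarity on binned peaks and a single greedy pass. This is not MS-Cluster's actual similarity measure or consensus procedure. The binned-dictionary representation and the threshold are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Spectrum = Dict[int, float]  # hypothetical: m/z bin index -> intensity


def cosine(a: Spectrum, b: Spectrum) -> float:
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def consensus(members: List[Spectrum]) -> Spectrum:
    """Mean intensity per bin across cluster members (missing bins count as 0)."""
    keys = set().union(*members)
    return {k: sum(m.get(k, 0.0) for m in members) / len(members) for k in keys}


def greedy_cluster(spectra: List[Spectrum],
                   threshold: float = 0.8) -> Tuple[List[List[Spectrum]], List[Spectrum]]:
    """Single pass: join the most similar existing cluster, or start a new one."""
    clusters: List[List[Spectrum]] = []
    centers: List[Spectrum] = []
    for s in spectra:
        best_idx, best_sim = None, threshold
        for idx, center in enumerate(centers):
            sim = cosine(s, center)
            if sim >= best_sim:
                best_idx, best_sim = idx, sim
        if best_idx is None:
            clusters.append([s])
            centers.append(dict(s))
        else:
            clusters[best_idx].append(s)
            centers[best_idx] = consensus(clusters[best_idx])
    return clusters, centers
```

Clustering treats identified and unidentified spectra the same way, so a cluster can carry a consensus spectrum even when no peptide has been assigned to it. This is the property that distinguishes archives from libraries.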

17.
Experimental biologists are often left alone with the task of downloading, processing, and analyzing big datasets in order to perform correlation or other simpler analyses. To address these issues, we introduce EviCor, a handy toolbox for exploring data from large public resources such as The Cancer Genome Atlas and The Cancer Cell Line Encyclopedia, complemented with follow-up information on the same samples, which couples omics datasets with drug response profiles (https://www.evicor.org/). The data were processed for easy retrieval from the server-side database and include pre-computed drug-feature correlation tables. Using information from multiple independent sources, the task-oriented web interface presents relations between phenotype, single-molecule, and pathway variables with graphical, statistical, and network analysis tools. Building custom multivariate models is enabled via a user-friendly web interface and programmatic access via a REST interface. Project code is available at https://github.com/aveviort/HyperSet.
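
A pre-computed drug-feature correlation table amounts to one correlation coefficient per (molecular feature, drug) pair, computed over the samples both share. This sketch uses Pearson correlation and sorts by absolute strength. EviCor's actual statistics and storage format are not described here, and the feature and drug names in the usage are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; NaN if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)


def correlation_table(features: Dict[str, Sequence[float]],
                      responses: Dict[str, Sequence[float]]) -> List[Tuple[str, str, float]]:
    """(feature, drug, r) rows over aligned samples, strongest |r| first; NaN rows dropped."""
    rows = [(f, d, pearson(fv, dv))
            for f, fv in features.items() for d, dv in responses.items()]
    rows = [row for row in rows if not math.isnan(row[2])]
    return sorted(rows, key=lambda row: abs(row[2]), reverse=True)
```

Pre-computing such a table server-side means that a web query only has to look up and filter rows. It does not need to recompute correlations over thousands of samples.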

18.
Information obtained from Nuclear Magnetic Resonance (NMR) experiments is encoded as a set of constraint lists when calculating three-dimensional structures for a protein. With the amount of constraint data now available from the Worldwide Protein Data Bank (wwPDB), it is possible to perform a global, large-scale analysis using only information from the constraints, without taking the coordinate information into account. This article describes such an analysis of distance constraints from NOE data based on a set of 1834 NMR PDB entries containing 1909 protein chains. In order to best represent the quality and extent of the data that is currently deposited at the wwPDB, only the original data as deposited by the authors was used, and no attempt was made to ‘clean up’ and further interpret this information. Because the constraint lists provide a single set of data, and not an ensemble of structural solutions, they are easier to analyse and provide a reduced form of structural information that is relevant for NMR analysis only. The online resource resulting from this analysis () makes it possible to check, for example, how often a particular contact occurs when assigning NOESY spectra, or to find out whether a particular sequence fragment is likely to be difficult to assign. In this respect it formalises information that scientists with experience in spectrum analysis are aware of but cannot necessarily quantify. The analysis described here illustrates the importance of depositing constraints (and all other possible NMR-derived information) along with the structure coordinates, as this type of information can greatly assist the NMR community.
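
The question "how often does a particular contact occur?" reduces to counting order-independent contact types across deposited distance constraints. This sketch assumes a simplified, hypothetical constraint tuple of (residue name, atom, residue name, atom). Real deposited constraint files come in several formats and carry much more information, which this ignores.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical simplified distance constraint: (res_name_1, atom_1, res_name_2, atom_2)
Constraint = Tuple[str, str, str, str]
ContactKey = Tuple[Tuple[str, str], Tuple[str, str]]


def contact_key(c: Constraint) -> ContactKey:
    """Order-independent key, so A-B and B-A count as the same contact type."""
    a, b = (c[0], c[1]), (c[2], c[3])
    return (a, b) if a <= b else (b, a)


def contact_frequencies(constraints: Iterable[Constraint]) -> Counter:
    """Count how often each residue/atom contact type appears across constraint lists."""
    return Counter(contact_key(c) for c in constraints)
```

Normalizing these counts by how often each residue pair occurs in the sequences would turn raw frequencies into an estimate of how likely a contact is to be observed. That is closer to the kind of question an assigner asks.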

19.
20.

Background  

Microarray technology is generating huge amounts of data about the expression levels of thousands of genes, or even whole genomes, across different experimental conditions. To extract biological knowledge and fully understand such datasets, it is essential to incorporate external biological information about genes and gene products into the analysis of expression data. However, most current approaches to analyzing microarray datasets focus mainly on the experimental data, and external biological information is incorporated only as a subsequent step.


Copyright © Beijing Qinyun Technology Development Co., Ltd. (Beijing ICP Registration No. 09084417)