Similar articles
Found 20 similar articles (search time: 15 ms)
1.
2.
Codon usage frequencies for each of the 257 468 complete protein-coding sequences (CDSs) have been compiled from the taxonomical divisions of the GenBank DNA sequence database. The sum of the codons used by 8792 organisms has also been calculated. The data files can be obtained from the anonymous ftp sites of DDBJ, Kazusa and EBI. A list of the codon usage of genes and the sum of the codons used by each organism can be obtained through the web site http://www.kazusa.or.jp/codon/ . The present study also reports recent developments on the WWW site. The new web interface provides data in the CodonFrequency-compatible format as well as in the traditional table format. Use of the database is facilitated by keyword-based search and by the availability of codon usage tables for selected genes from each species. These new tools will allow users to further analyze variations in codon usage among different genomes.
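
As a rough illustration of the kind of tabulation described above (not the database's own code), codon usage for a single in-frame CDS can be counted and scaled to a per-thousand frequency. The function name `codon_usage`, the per-thousand scaling, and the example sequence are assumptions made for this sketch.

```python
from collections import Counter


def codon_usage(cds):
    """Return codon frequencies (per thousand codons) for one in-frame CDS."""
    seq = cds.upper().replace("U", "T")
    # Ignore a trailing partial codon, if any.
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(codons)
    total = sum(counts.values())
    return {codon: 1000.0 * n / total for codon, n in sorted(counts.items())}


# Illustrative five-codon sequence: ATG GCT GCT AAA TAA
usage = codon_usage("ATGGCTGCTAAATAA")
```

Summing such per-gene counts over all CDSs of an organism would give a per-organism table of the sort the abstract mentions.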

3.
4.
Most existing Mass Spectra (MS) analysis programs are automatic and provide limited opportunity for editing during the interpretation. Furthermore, they rely entirely on publicly available databases for interpretation. VEMS (Virtual Expert Mass Spectrometrist) is a program for interactive analysis of peptide MS/MS spectra imported in text file format. Peaks are annotated, the monoisotopic peaks retained, and the b- and y-ion series identified in an interactive manner. The called peptide sequence is searched against a local protein database for sequence identity and peptide mass. The report compares the calculated and the experimental mass spectrum of the called peptide. The program package includes four accessory programs. VEMStrans creates protein databases in FASTA format from EST or cDNA sequence files. VEMSdata creates a virtual peptide database from FASTA files. VEMSdist displays the distribution of masses up to 5000 Da. VEMSmaldi searches singly charged peptide masses against the local database.
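
To make the b- and y-ion series mentioned above concrete, the following sketch computes theoretical singly charged fragment m/z values for an unmodified peptide. It is not taken from VEMS; the function name `by_ions` is hypothetical, and the constants are standard monoisotopic masses rounded to five or six decimals.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # mass of a proton (Da)
WATER = 18.010565   # monoisotopic mass of H2O (Da)


def by_ions(peptide):
    """Return (b_ions, y_ions): singly charged m/z lists, b1..b(n-1) and y1..y(n-1)."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    running = 0.0
    for m in masses[:-1]:            # N-terminal fragments
        running += m
        b_ions.append(running + PROTON)
    running = 0.0
    for m in reversed(masses[1:]):   # C-terminal fragments
        running += m
        y_ions.append(running + WATER + PROTON)
    return b_ions, y_ions


b_series, y_series = by_ions("GASK")
```

Comparing such calculated values with the observed peaks is, in spirit, the comparison of calculated and experimental spectra that the report performs.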

5.
In metabolomics, the rapid identification of quantitative differences between multiple biological samples remains a major challenge. While capillary electrophoresis–mass spectrometry (CE–MS) is a powerful tool to simultaneously quantify charged metabolites, reliable and easy-to-use software that is well suited to analyze CE–MS metabolic profiles is still lacking. Optimized software tools for CE–MS are needed because of the sometimes large variation in migration time between runs and the wider variety of peak shapes in CE–MS data compared with LC–MS or GC–MS. Therefore, we implemented a stand-alone application named JDAMP (Java application for Differential Analysis of Metabolite Profiles), which allows users to identify the metabolites that vary between two groups. The main features include fast calculation modules and a file converter using an original compact file format, baseline subtraction, dataset normalization and alignment, visualization on 2D plots (m/z and time axis) with matching metabolite standards, and the detection of significant differences between metabolite profiles. Moreover, it features an easy-to-use graphical user interface that requires only a few mouse actions to complete the analysis. The interface also enables the analyst to evaluate the semiautomatic processes and interactively tune options and parameters depending on the input datasets. Findings can be confirmed in a list of overlaid electropherograms, which is ranked using a novel difference-evaluation function that accounts for peak size and distortion as well as statistical criteria for accurate difference detection. Overall, the JDAMP software complements other metabolomics data processing tools and permits easy and rapid detection of significant differences between multiple complex CE–MS profiles.

6.

Introduction

The Metabolomics Workbench Data Repository is a public repository of mass spectrometry and nuclear magnetic resonance data and metadata derived from a wide variety of metabolomics studies. The data and metadata for each study are deposited, stored, and accessed via files in the domain-specific ‘mwTab’ flat file format.

Objectives

In order to improve the accessibility, reusability, and interoperability of the data and metadata stored in ‘mwTab’ formatted files, we implemented a Python library and package. This Python package, named ‘mwtab’, is a parser for the domain-specific ‘mwTab’ flat file format, which provides facilities for reading, accessing, and writing ‘mwTab’ formatted files. Furthermore, the package provides facilities to validate both the format and required metadata elements of a given ‘mwTab’ formatted file.

Methods

To develop the ‘mwtab’ package, we used the official ‘mwTab’ format specification. We used Git version control along with a Python unit-testing framework and a continuous integration service to run those tests on multiple versions of Python. Package documentation was developed using the Sphinx documentation generator.

Results

The ‘mwtab’ package provides both Python programmatic library interfaces and command-line interfaces for reading, writing, and validating ‘mwTab’ formatted files. Data and associated metadata are stored within Python dictionary- and list-based data structures, enabling straightforward, ‘pythonic’ access and manipulation of data and metadata. Also, the package provides facilities to convert ‘mwTab’ files into a JSON formatted equivalent, enabling easy reusability of the data by all modern programming languages that implement JSON parsers. The ‘mwtab’ package implements its metadata validation functionality based on a pre-defined JSON schema that can be easily specialized for specific types of metabolomics studies. The library also provides a command-line interface for interconversion between ‘mwTab’ and JSONized formats in raw text and a variety of compressed binary file formats.
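
The dictionary-based storage, JSON conversion, and required-metadata check described above can be sketched with the standard library alone. This is not the ‘mwtab’ package API: the record layout, the `REQUIRED` table, and the `missing_fields` helper are hypothetical stand-ins chosen only to show the idea.

```python
import json

# Hypothetical, simplified stand-in for a parsed file:
# section names map to key/value dictionaries.
record = {
    "PROJECT": {"PROJECT_TITLE": "Example study"},
    "SUBJECT": {"SUBJECT_TYPE": "Plant"},
}

# Hypothetical required keys per section (not the package's actual schema).
REQUIRED = {
    "PROJECT": ["PROJECT_TITLE"],
    "SUBJECT": ["SUBJECT_TYPE", "SUBJECT_SPECIES"],
}


def missing_fields(rec, required):
    """List 'SECTION.KEY' entries that the required table names but rec lacks."""
    problems = []
    for section, keys in required.items():
        block = rec.get(section, {})
        for key in keys:
            if key not in block:
                problems.append(section + "." + key)
    return problems


as_json = json.dumps(record, indent=2)   # JSON text any language can parse
round_tripped = json.loads(as_json)      # back to dictionaries
problems = missing_fields(round_tripped, REQUIRED)
```

Keeping the required keys in a plain data table rather than in code is one way a check like this could be specialized for particular study types.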

Conclusions

The ‘mwtab’ package is an easy-to-use Python package that provides FAIRer utilization of the Metabolomics Workbench Data Repository. The source code is freely available on GitHub and via the Python Package Index. Documentation includes a ‘User Guide’, ‘Tutorial’, and ‘API Reference’. The GitHub repository also provides ‘mwtab’ package unit-tests via a continuous integration service.

7.
mzTab is the most recent standard format developed by the Proteomics Standards Initiative. mzTab is a flexible tab-delimited file that can capture identification and quantification results coming from MS-based proteomics and metabolomics approaches. We here present an open-source Java application programming interface for mzTab called jmzTab. The software allows the efficient processing of mzTab files, providing read and write capabilities, and is designed to be embedded in other software packages. The second key feature of the jmzTab model is that it provides a flexible framework to maintain the logical integrity between the metadata and the table-based sections in the mzTab files. In this article, as two example implementations, we also describe two stand-alone tools that can be used to validate mzTab files and to convert PRIDE XML files to mzTab. The library is freely available at http://mztab.googlecode.com .

8.
We here present the jmzReader library: a collection of Java application programming interfaces (APIs) to parse the most commonly used peak list and XML-based mass spectrometry (MS) data formats: DTA, MS2, MGF, PKL, mzXML, mzData, and mzML (based on the already existing API jmzML). The library is optimized to be used in conjunction with mzIdentML, the recently released standard data format for reporting protein and peptide identifications, developed by the HUPO Proteomics Standards Initiative (PSI). mzIdentML files do not contain spectra data but contain references to different kinds of external MS data files. As a key functionality, all parsers implement a common interface that supports the various methods used by mzIdentML to reference external spectra. Thus, when developing software for mzIdentML, programmers no longer have to support multiple MS data file formats but only this one interface. The library (which includes a viewer) is open source and, together with detailed documentation, can be downloaded from http://code.google.com/p/jmzreader/.
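
The "one common interface, many formats" design described above can be sketched in Python, with a small MGF reader as the single concrete parser. This is not the jmzReader API (which is Java); `Spectrum`, `SpectrumReader`, and `MgfReader` are hypothetical names, and only the basic `BEGIN IONS` / `TITLE=` / `PEPMASS=` / `END IONS` layout of MGF is handled.

```python
import io
from typing import Iterator, NamedTuple, Protocol


class Spectrum(NamedTuple):
    title: str
    precursor_mz: float
    peaks: list  # list of (m/z, intensity) tuples


class SpectrumReader(Protocol):
    """Common interface: every format-specific reader yields Spectrum objects."""

    def spectra(self) -> Iterator[Spectrum]: ...


class MgfReader:
    """Minimal reader for MGF peak lists; unknown KEY=value lines are ignored."""

    def __init__(self, handle):
        self.handle = handle

    def spectra(self):
        title, precursor, peaks, inside = "", 0.0, [], False
        for raw in self.handle:
            line = raw.strip()
            if line == "BEGIN IONS":
                title, precursor, peaks, inside = "", 0.0, [], True
            elif line == "END IONS" and inside:
                inside = False
                yield Spectrum(title, precursor, peaks)
            elif not inside or not line:
                continue
            elif line.startswith("TITLE="):
                title = line[len("TITLE="):]
            elif line.startswith("PEPMASS="):
                precursor = float(line[len("PEPMASS="):].split()[0])
            elif line[0].isdigit():
                mz, intensity = line.split()[:2]
                peaks.append((float(mz), float(intensity)))


def count_peaks(reader: SpectrumReader) -> int:
    """Client code depends only on the shared interface, not on the file format."""
    return sum(len(s.peaks) for s in reader.spectra())


example = "BEGIN IONS\nTITLE=scan 1\nPEPMASS=445.12 1000\nCHARGE=2+\n100.0 10\n200.5 20\nEND IONS\n"
parsed = list(MgfReader(io.StringIO(example)).spectra())
```

A reader for another format would only need to provide the same `spectra()` method for `count_peaks` to work unchanged.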

9.
The application of mass spectrometry imaging (MS imaging) is rapidly growing with a constantly increasing number of different instrumental systems and software tools. The data format imzML was developed to allow the flexible and efficient exchange of MS imaging data between different instruments and data analysis software. imzML data are divided into two files, which are linked by a universally unique identifier (UUID). Experimental details are stored in an XML file which is based on the HUPO-PSI format mzML. Information is provided in the form of a 'controlled vocabulary' (CV) in order to unequivocally describe the parameters and to avoid redundancy in nomenclature. Mass spectral data are stored in a binary file in order to allow efficient storage. imzML is supported by a growing number of software tools. Users will no longer be limited to proprietary software, but are able to use the processing software best suited for a specific question or application. MS imaging data from different instruments can be converted to imzML and displayed with identical parameters in one software package for easier comparison. All technical details necessary to implement imzML and additional background information are available at www.imzml.org.
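
The UUID link between the two files can be illustrated with a small consistency check. The sketch assumes that the binary file begins with the 16 raw bytes of the UUID and that the XML file records the same identifier as text; both points should be confirmed against the specification at www.imzml.org, and the function name `ibd_matches` is hypothetical.

```python
import uuid


def ibd_matches(xml_uuid, ibd_bytes):
    """Compare the UUID text from the XML file with the binary file's first 16 bytes."""
    if len(ibd_bytes) < 16:
        return False
    # uuid.UUID accepts hyphenated or brace-wrapped hexadecimal text.
    return uuid.UUID(xml_uuid) == uuid.UUID(bytes=ibd_bytes[:16])


# Simulated pair of files: a UUID header followed by placeholder spectral data.
file_id = uuid.uuid4()
binary_part = file_id.bytes + b"\x00" * 8
linked = ibd_matches(str(file_id), binary_part)
```

A check of this kind would let a reader refuse to combine an XML file with a binary file from a different acquisition.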

10.
After gas chromatography-mass spectrometry (GC-MS) analysis, data processing, including retention time correction, spectral deconvolution, peak alignment, and normalization prior to statistical analysis, is an important step in metabolomics. Several commercial or free software packages have been introduced for data processing, but most of them are vendor dependent. To design a simple method for Agilent GC/MS data processing, we developed an in-house program, "CompExtractor", using Microsoft Visual Basic. We tailored the macro modules of an Agilent Chemstation and embedded them in the program. To verify the performance of CompExtractor processing, 30 samples from three species of the genus Papaver were analyzed with an Agilent 5973 MSD GC-MS. The results of CompExtractor processing were compared with those of AMDIS-SpectConnect processing by hierarchical cluster analysis (HCA) and principal component analysis (PCA). The two methods showed good classification according to species in HCA. The PC1+PC2 scores were 54.32-63.62% for AMDIS-SpectConnect and 56.65-85.92% for CompExtractor in PCA. Although the CompExtractor processing method is an Agilent GC-MS-specific application and the target compounds must be selected first, it can extract the target compounds more precisely from the raw data file in batch mode and simultaneously assemble the matrix text file.

11.
We present the collection of 4-color fluorescent genotyping data from capillary array electrophoresis microchip devices and its conversion to a format that is easily and rapidly analyzed by Genetic Profiler genotyping software. Microchip fluorescence intensity data are acquired and stored as 4-color tab-delimited text. These files are converted to electrophoretic signal data (ESD) files using a utility program (TEXT-to-ESD) written in C. TEXT-to-ESD generates an ESD file by converting text data to binary data and then appending a 632-byte ESD-file trailer. Up to 96 ESD files are then assembled into a run folder and imported into Genetic Profiler, where data are reduced to 4-color electropherograms and analyzed. In this manner, DNA fragment sizing data acquired with our high-speed electrophoretic microchip devices can be rapidly analyzed using robust commercial software. Additionally, the conversion program allows sizing of data with Genetic Profiler that have been preprocessed using other third-party software, such as BaseFinder.
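
The conversion step (text to binary, then a fixed-size trailer) can be sketched as follows. Only the 632-byte trailer length comes from the abstract; the little-endian 16-bit sample encoding and the all-zero trailer contents are assumptions made for illustration, since the actual ESD layout is not described here. The original utility is written in C; Python is used only to keep the sketch short.

```python
import struct

TRAILER_SIZE = 632  # trailer length stated in the abstract


def text_to_binary(text):
    """Pack tab-delimited integer intensities and append a placeholder trailer."""
    payload = bytearray()
    for line in text.splitlines():
        if not line.strip():
            continue
        values = [int(v) for v in line.split("\t")]
        # Assumed encoding: little-endian signed 16-bit integers, row by row.
        payload += struct.pack("<%dh" % len(values), *values)
    # Placeholder only: the real trailer contents are not given in the abstract.
    trailer = bytes(TRAILER_SIZE)
    return bytes(payload) + trailer


# Two time points, four color channels each.
blob = text_to_binary("1\t2\t3\t4\n5\t6\t7\t8\n")
```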

12.
Nuclear magnetic resonance (NMR) and liquid chromatography-mass spectrometry (LCMS) are frequently used as technological platforms for metabolomics applications. In this study, the metabolic profiles of ripe fruits from 50 different tomato cultivars, including beef, cherry and round types, were recorded by both 1H NMR and accurate mass LC-quadrupole time-of-flight (QTOF) MS. The two profiling techniques showed different analytical selectivities. In fact, NMR and LCMS provided complementary data, as the metabolites detected belong to essentially different metabolic pathways. Yet, upon unsupervised multivariate analysis, both NMR and LCMS datasets revealed a clear segregation of, on the one hand, the cherry tomatoes and, on the other hand, the beef and round tomatoes. Intra-method (NMR–NMR, LCMS–LCMS) and inter-method (NMR–LCMS) correlation analyses were performed, enabling the annotation of metabolites from highly correlating metabolite signals. Signals belonging to the same metabolite or to chemically related metabolites are among the highest correlations found. Inter-method correlation analysis produced highly informative and complementary information for the identification of metabolites, even in the case of low-abundance NMR signals. The applied approach appears to be a promising strategy for extending the analytical capacities of these metabolomics techniques with regard to the discovery and identification of biomarkers and as yet unknown metabolites.
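
A correlation analysis of this kind can be illustrated with a Pearson coefficient between two signals measured across the same samples. The abstract does not state which coefficient was used, so Pearson is an assumption here, and the intensity values are invented for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Invented intensities of one NMR signal and one LCMS signal in five samples.
nmr_signal = [1.0, 2.0, 3.0, 4.0, 5.0]
lcms_signal = [2.1, 3.9, 6.2, 8.0, 9.8]
r = pearson(nmr_signal, lcms_signal)
```

A coefficient close to 1 across cultivars would suggest that the two signals arise from the same or a closely related metabolite, which is the reasoning the annotation step relies on.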

13.
Environmental metabolomics can be described as the study of the interactions of living organisms with their natural environments at the metabolic level. Until recently, nuclear magnetic resonance (NMR) spectroscopy has been the primary bioanalytical tool for measuring metabolite levels in this field. While NMR has some specific advantages, the higher sensitivity offered by mass spectrometry (MS) is beginning to revolutionise our ability to probe environmental metabolomes. This review provides the first comprehensive overview of the use and capabilities of MS within environmental metabolomics. Its primary aims are to introduce environmental scientists to the range of MS approaches used in metabolomics and to highlight the breadth and diversity of environmental and ecological research conducted, from ecophysiology and ecotoxicology to chemical ecology. The review is structured around MS approaches: non-targeted gas chromatography–MS, non-targeted directed infusion MS, and both non-targeted and targeted liquid chromatography–MS. Each section begins with a brief introduction to the analytical method, including some advantages and limitations in the context of metabolomics research, and then exemplifies the use of that technique in environmental metabolomics. The review concludes with a discussion on some of the challenges that remain in MS based environmental metabolomics and provides recommendations for the path ahead.

14.
This article introduces an algorithm for the lossless compression of DNA files, which contain annotation text besides the nucleotide sequence. First, a grammar is specifically designed to capture the regularities of the annotation text. A reversible transformation uses the grammar rules to equivalently represent the original file as a collection of parsed segments and a sequence of decisions made by the grammar parser. This decomposition enables the efficient use of state-of-the-art encoders for processing the parsed segments. The output size of the decision-making process of the grammar is optimized by extending the states to account for high-order Markovian dependencies. The practical implementation of the algorithm achieves a significant improvement when compared to the general-purpose methods currently used for DNA files.

15.


16.
17.

Background  

The Distributed Annotation System (DAS) allows merging of DNA sequence annotations from multiple sources and provides a single annotation view. A straightforward way to establish a DAS annotation server is to use the "Lightweight DAS" server (LDAS). Onto this type of server, annotations can be uploaded as flat text files in a defined format. The popular Ensembl ContigView uses the same format for the transient upload and display of user data.

18.
SPLICE, a software tool for the extraction of sequences from files in GenBank tape format, has been developed. The program can analyze the features table in this format and use any of the information provided to write the corresponding sequences into a standard sequence file format suitable for use with sequence analysis programs. Sequences that are present as several subsequent fragments in a single GenBank file, such as those encoding a peptide, can be spliced together by the program. Further, sequences that are present in more than one GenBank file, such as an exon which spans several different files, can also be spliced into one sequence. SPLICE runs under the MS/DOS and Unix operating systems, can be called as a sub-process by other programs and can process batches of files. Received on December 26, 1989; accepted on May 30, 1990
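
The splicing of several fragments into one sequence can be sketched as follows. This is not the SPLICE program: the sketch assumes a `join(start..end,...)` location string with 1-based inclusive coordinates, as used in present-day GenBank feature tables, and the function name `splice` is hypothetical. Complement strands and cross-file references are not handled.

```python
import re


def splice(sequence, location):
    """Concatenate the 1-based, inclusive ranges listed in a location string."""
    spans = re.findall(r"(\d+)\.\.(\d+)", location)
    return "".join(sequence[int(start) - 1:int(end)] for start, end in spans)


# Two fragments (positions 1-3 and 7-9) joined into one coding sequence.
spliced = splice("ATGCCCTAAGGG", "join(1..3,7..9)")
```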

19.
The HUPO Proteomics Standards Initiative has developed several standardized data formats to facilitate data sharing in mass spectrometry (MS)-based proteomics. These allow researchers to report their complete results in a unified way. However, at present, there is no format to describe the final qualitative and quantitative results for proteomics and metabolomics experiments in a simple tabular format. Many downstream analysis use cases are only concerned with the final results of an experiment and require an easily accessible format, compatible with tools such as Microsoft Excel or R. We developed the mzTab file format for MS-based proteomics and metabolomics results to meet this need. mzTab is intended as a lightweight supplement to the existing standard XML-based file formats (mzML, mzIdentML, mzQuantML), providing a comprehensive summary, similar in concept to the supplemental material of a scientific publication. mzTab files can contain protein, peptide, and small molecule identifications together with experimental metadata and basic quantitative information. The format is not intended to store the complete experimental evidence but provides mechanisms to report results at different levels of detail. These range from a simple summary of the final results to a representation of the results including the experimental design. This format is ideally suited to make MS-based proteomics and metabolomics results available to a wider biological community outside the field of MS. Several software tools for proteomics and metabolomics have already adopted the format as an output format. The comprehensive mzTab specification document and extensive additional documentation can be found online.

Mass spectrometry (MS) has become a major analysis tool in the life sciences (1). It is currently used in different modes for several “omics” approaches, proteomics and metabolomics being the most prominent. In both disciplines, one major burden in the exchange, communication, and large-scale (re-) analysis of MS-based data is the significant number of software pipelines and, consequently, heterogeneous file formats used to process, analyze, and store these experimental results, including both identification and quantification data. Publication guidelines from scientific journals and funding agencies' requirements for public data availability have led to an increasing amount of MS-based proteomics and metabolomics data being submitted to public repositories, such as those of the ProteomeXchange consortium (2) or, in the case of metabolomics, the resources from the nascent COSMOS (Coordination of Standards in Metabolomics) initiative (3).

In the past few years, the Human Proteome Organization Proteomics Standards Initiative (PSI) has developed several vendor-neutral standard data formats to overcome the representation heterogeneity. The Human Proteome Organization PSI promotes the usage of three XML file formats to fully report the data coming from MS-based proteomics experiments (including related metadata): mzML (4) to store the “primary” MS data (the spectra and chromatograms), mzIdentML (5) to report peptide identifications and inferred protein identifications, and mzQuantML (6) to store quantitative information associated with these results.

Even though the existence of the PSI standard data formats represents a huge step forward, these formats cannot address all use cases related to proteomics and metabolomics data exchange and sharing equally well. During the development of mzML, mzIdentML, and mzQuantML, the main focus lay on providing an exact and comprehensive representation of the gathered results. All three formats can be used within analysis pipelines and as interchange formats between independent analysis tools. It is thus vital that these formats be capable of storing the full data and analysis that led to the results. Therefore, all three formats result in relatively complex schemas, a clear necessity for adequate representation of the complexity found in MS-based data.

An inevitable drawback of this approach is that data consumers can find it difficult to quickly retrieve the required information. Several application programming interfaces (APIs) have been developed to simplify software development based on these formats (7–9), but profound proteomics and bioinformatics knowledge is still required in order to use them efficiently and take full advantage of the comprehensive information contained.

The new file format presented here, mzTab, aims to describe the qualitative and quantitative results for MS-based proteomics and metabolomics experiments in a consistent, simpler tabular format, abstracting from the mass spectrometry details. The format contains identifications, basic quantitative information, and related metadata. With mzTab's flexible design, it is possible to report results at different levels ranging from a simple summary or subset of the complete information (e.g. the final results) to a fairly comprehensive representation of the results including the experimental design. Many downstream analysis use cases are only concerned with the final results of an experiment in an easily accessible format that is compatible with tools such as Microsoft Excel® or R (10) and can easily be adapted by existing bioinformatics tools. Therefore, mzTab is ideally suited to make MS proteomics and metabolomics results available to the wider biological community, beyond the field of MS.

mzTab follows a similar philosophy as the other tab-delimited format recently developed by the PSI to represent molecular interaction data, MITAB (11). MITAB is a simpler tab-delimited format, whereas PSI-MI XML (12), the more detailed XML-based format, holds the complete evidence. The microarray community makes wide use of the format MAGE-TAB (13), another example of such a solution that can cover the main use cases and, for the sake of simplicity, is often preferred to the XML standard format MAGE-ML (14). Additionally, in MS-based proteomics, several software packages, such as Mascot (15), OMSSA (16), MaxQuant (17), OpenMS/TOPP (18, 19), and SpectraST (20), also support the export of their results in a tab-delimited format next to a more complete and complex default format. These simple formats do not contain the complete information but are nevertheless sufficient for the most frequent use cases.

mzTab has been designed with the same purpose in mind. It can be used alone or in conjunction with mzML (or other related MS data formats such as mzXML (21) or text-based peak list formats such as MGF), mzIdentML, and/or mzQuantML. Several highly successful concepts taken from the development process of mzIdentML and mzQuantML were adapted to the text-based nature of mzTab.

In addition, there is a trend to perform more integrated experimental workflows involving both proteomics and metabolomics data. Thus, we developed a standard format that can represent both types of information in a single file.
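
Because the format is plain tab-delimited text, a few lines of code are enough to group a file's rows by section. The sketch below assumes the three-letter line prefixes of the mzTab specification (for example `MTD` for metadata, `PRH` and `PRT` for the protein header and rows, `COM` for comments); these prefixes are not stated in the text above, and the function name `read_sections` and the example content are illustrative.

```python
import csv
import io
from collections import defaultdict


def read_sections(text):
    """Group tab-delimited rows by their line prefix, skipping comments and blanks."""
    sections = defaultdict(list)
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row or row[0] == "COM":
            continue
        sections[row[0]].append(row[1:])
    return sections


example = (
    "MTD\tmzTab-version\t1.0.0\n"
    "COM\tillustrative content only\n"
    "PRH\taccession\tdescription\n"
    "PRT\tP12345\tExample protein\n"
)
sections = read_sections(example)
# Pair the protein header with each protein row to get dictionaries.
proteins = [dict(zip(sections["PRH"][0], row)) for row in sections["PRT"]]
```

The same rows could equally be opened in a spreadsheet or read into R, which is the accessibility argument made for the tabular design.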

20.
Flux distribution in central metabolic pathways of Desulfovibrio vulgaris Hildenborough was examined using 13C tracer experiments. Consistent with the current genome annotation and independent evidence from enzyme activity assays, the isotopomer results from both gas chromatography-mass spectrometry (GC-MS) and Fourier transform-ion cyclotron resonance mass spectrometry (FT-ICR MS) indicate the lack of an oxidatively functional tricarboxylic acid (TCA) cycle and an incomplete pentose phosphate pathway. Results from this study suggest that fluxes through both pathways are limited to biosynthesis. The data also indicate that >80% of the lactate was converted to acetate and that the reactions involved are the primary route of energy production [NAD(P)H and ATP production]. Independently of the TCA cycle, direct cleavage of acetyl coenzyme A to CO and 5,10-methyl tetrahydrofuran also leads to production of NADH and ATP. Although the genome annotation implicates a ferredoxin-dependent oxoglutarate synthase, isotopic evidence does not support flux through this reaction in either the oxidative or the reductive mode; therefore, the TCA cycle is incomplete. FT-ICR MS was used to locate the labeled carbon distribution in aspartate and glutamate and confirmed the presence of an atypical enzyme for citrate formation suggested in previous reports [the citrate synthesized by this enzyme is the isotopic antipode of the citrate synthesized by the (S)-citrate synthase]. These findings enable a better understanding of the relation between genome annotation and actual metabolic pathways in D. vulgaris and also demonstrate that FT-ICR MS is a powerful tool for isotopomer analysis, overcoming the problems with both GC-MS and nuclear magnetic resonance spectroscopy.
