
1.

Fast and Efficient XML Data Access for Next-Generation Mass Spectrometry

Hannes L. Röst Uwe Schmitt Ruedi Aebersold Lars Malmström 《PloS one》2015,10(4)

Motivation

In mass spectrometry-based proteomics, XML formats such as mzML and mzXML provide an open and standardized way to store and exchange the raw data (spectra and chromatograms) of mass spectrometric experiments. These file formats are being used by a multitude of open-source and cross-platform tools which allow the proteomics community to access algorithms in a vendor-independent fashion and perform transparent and reproducible data analysis. Recent improvements in mass spectrometry instrumentation have increased the data size produced in a single LC-MS/MS measurement and put substantial strain on open-source tools, particularly those that are not equipped to deal with XML data files that reach dozens of gigabytes in size.

Results

Here we present a fast and versatile parsing library for mass spectrometric XML formats available in C++ and Python, based on the mature OpenMS software framework. Our library implements an API for obtaining spectra and chromatograms under memory constraints using random access or sequential access functions, allowing users to process datasets that are much larger than system memory. For fast access to the raw data structures, small XML files can also be completely loaded into memory. In addition, we have improved the parsing speed of the core mzML module by over 4-fold (compared to OpenMS 1.11), making our library suitable for a wide variety of algorithms that need fast access to dozens of gigabytes of raw mass spectrometric data.
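The sequential, memory-bounded access pattern described above can be illustrated with a minimal sketch. It uses only Python's standard-library `xml.etree.ElementTree.iterparse` rather than the OpenMS API, so it is an illustration of the general technique, not the authors' implementation. The function names `local_name` and `iter_spectra` and the tiny sample document are hypothetical; real mzML files carry additional metadata and encoded peak arrays.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace prefix, e.g. '{uri}spectrum' -> 'spectrum'."""
    return tag.rsplit("}", 1)[-1]


def iter_spectra(source):
    """Yield (id, defaultArrayLength) for each <spectrum>, one at a time."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) == "spectrum":
            yield elem.get("id"), int(elem.get("defaultArrayLength", "0"))
            # Drop this spectrum's children and attributes before reading on,
            # so processed spectra do not accumulate in memory.
            elem.clear()


# Hypothetical, heavily simplified mzML-like document (assumption for the demo).
SAMPLE = b"""<mzML xmlns="http://psi.hupo.org/ms/mzml">
  <run id="demo">
    <spectrumList count="2">
      <spectrum index="0" id="scan=1" defaultArrayLength="3"/>
      <spectrum index="1" id="scan=2" defaultArrayLength="5"/>
    </spectrumList>
  </run>
</mzML>"""

spectra = list(iter_spectra(io.BytesIO(SAMPLE)))
```

Because the generator hands back one spectrum at a time, a caller can process a file far larger than memory by consuming `iter_spectra(open(path, "rb"))` in a loop instead of building a list as the demo does.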

Availability

Our C++ and Python implementations are available for the Linux, Mac, and Windows operating systems. All proposed modifications to the OpenMS code have been merged into the OpenMS mainline codebase and are available to the community at https://github.com/OpenMS/OpenMS.

2.

Scavager: A Versatile Postsearch Validation Algorithm for Shotgun Proteomics Based on Gradient Boosting

Mark V. Ivanov Lev I. Levitsky Julia A. Bubis Mikhail V. Gorshkov 《Proteomics》2019,19(3)

Shotgun proteomics workflows for database protein identification typically include a combination of search engines and postsearch validation software based mostly on machine learning algorithms. Here, a new postsearch validation tool called Scavager is presented. It employs CatBoost, an open‐source gradient boosting library, and shows improved efficiency compared with other popular algorithms, such as Percolator, PeptideProphet, and Q‐ranker. The comparison is done using multiple data sets and search engines, including MSGF+, MSFragger, X!Tandem, Comet, and the recently introduced IdentiPy. Implemented in the Python programming language, Scavager is open‐source and freely available at https://bitbucket.org/markmipt/scavager.

3.

DEER-PREdict: Software for efficient calculation of spin-labeling EPR and NMR data from conformational ensembles

Giulio Tesei João M. Martins Micha B. A. Kunze Yong Wang Ramon Crehuet Kresten Lindorff-Larsen 《PLoS computational biology》2021,17(1)


4.

Open source clustering software (total citations: 20; self-citations: 0; citations by others: 20)

de Hoon MJ Imoto S Nolan J Miyano S 《Bioinformatics (Oxford, England)》2004,20(9):1453-1454


5.

PanClassif: Improving pan cancer classification of single cell RNA-seq gene expression data using machine learning

《Genomics》2022,114(2):110264

Cancer is one of the major causes of human death each year. In recent years, cancer identification and classification using machine learning have gained momentum due to the availability of high throughput sequencing data. With RNA-seq, cancer research is advancing rapidly, and new insights into cancer and related treatments are coming to light. In this paper, we propose PanClassif, a method that requires only a few effective genes to detect cancer from RNA-seq data and provides a performance gain across a wide range of machine learning classifiers. We took samples of 22 cancer types from The Cancer Genome Atlas (TCGA), comprising 8287 cancer samples and 680 normal samples. First, PanClassif uses k-Nearest Neighbour (k-NN) smoothing to smooth the samples and handle noise in the data. Effective genes are then selected by an ANOVA-based test. To balance the training data, PanClassif applies an oversampling method, SMOTE. We performed comprehensive experiments on the datasets using several classification algorithms. Experimental results show that PanClassif outperforms existing state-of-the-art methods and shows consistent performance on two single cell RNA-seq datasets taken from Gene Expression Omnibus (GEO). PanClassif improves the performance of a wide variety of classifiers for both binary cancer prediction and multi-class cancer classification. PanClassif is available as a Python package (https://pypi.org/project/panclassif/). All the source code and materials of PanClassif are available at https://github.com/Zwei-inc/panclassif.
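The oversampling step mentioned in the abstract can be sketched in a few lines. This is a simplified, SMOTE-style illustration in pure Python, not the PanClassif code or a reference SMOTE implementation: the function name `smote_like`, its parameters, and the toy data are assumptions made for the example. The core idea it shows is that each synthetic sample is placed at a random position on the line segment between a minority-class sample and one of its nearest minority-class neighbours.

```python
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy minority class: the four corners of the unit square (hypothetical data).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_like(minority, n_new=6)
```

Since every synthetic point is a convex combination of two existing minority samples, the new points stay inside the region already spanned by that class, which is why this kind of oversampling is usually applied to the training split only.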

6.

multiplierz v2.0: A Python‐based ecosystem for shared access and analysis of native mass spectrometry data


William M. Alexander Scott B. Ficarro Guillaume Adelmant Jarrod A. Marto 《Proteomics》2017,17(15-16)

The continued evolution of modern mass spectrometry instrumentation and associated methods represents a critical component in efforts to decipher the molecular mechanisms which underlie normal physiology and understand how dysregulation of biological pathways contributes to human disease. The increasing scale of these experiments combined with the technological diversity of mass spectrometers presents several challenges for community‐wide data access, analysis, and distribution. Here we detail a redesigned version of multiplierz, our Python software library which leverages our common application programming interface (mzAPI) for analysis and distribution of proteomic data. New features include support for a wider range of native mass spectrometry file types, interfaces to additional database search engines, compatibility with new reporting formats, and high‐level tools to perform post‐search proteomic analyses. A GUI desktop environment, mzDesktop, provides access to multiplierz functionality through a user friendly interface. multiplierz is available for download from https://github.com/BlaisProteomics/multiplierz and mzDesktop is available for download from https://sourceforge.net/projects/multiplierz/

7.

Libsequence: a C++ class library for evolutionary genetic analysis

Thornton K 《Bioinformatics (Oxford, England)》2003,19(17):2325-2327

A C++ class library is available to facilitate the implementation of software for genomics and sequence polymorphism analysis. The library implements methods for data manipulation and the calculation of several statistics commonly used to analyze SNP data. The object-oriented design of the library is intended to be extensible, allowing users to design custom classes for their own needs. In addition, routines are provided to process samples generated by a widely used coalescent simulation. AVAILABILITY: The source code (in C++) is available from http://www.molpopgen.org

8.

OMSSA Parser: An open‐source library to parse and extract data from OMSSA MS/MS search results

Harald Barsnes Steffen Huber Albert Sickmann Ingvar Eidhammer Lennart Martens 《Proteomics》2009,9(14):3772-3774

Protein identification by MS is an important technique in both gel‐based and gel‐free proteome studies. The Open Mass Spectrometry Search Algorithm (OMSSA) ( http://pubchem.ncbi.nlm.nih.gov/omssa ) is an open‐source search engine that can be used to identify MS/MS spectra acquired in these experiments. We here present a lightweight, open‐source Java software library, OMSSA Parser ( http://code.google.com/p/omssa‐parser ), which parses OMSSA omx result files into easily accessible and fully functional object models. In addition, we also provide examples illustrating the usage of our library.

9.

The Quetzal Coalescence template library: A C++ programmers resource for integrating distributional, demographic and coalescent models

Arnaud Becheler Camille Coron Stéphane Dupas 《Molecular ecology resources》2019,19(3):788-793

Genetic samples can be used to understand and predict the behaviour of species living in a fragmented and temporally changing environment. In this regard, models of coalescence conditioned to an environment through an explicit modelling of population growth and migration have been developed in recent years, and simulators implementing these models have been developed, enabling biologists to estimate parameters of interest with Approximate Bayesian Computation techniques. However, model choice remains limited, and developing new coalescence simulators is extremely time consuming because code re‐use is limited. We present Quetzal, a C++ library composed of re‐usable components, which is sufficiently general to efficiently implement a wide range of spatially explicit coalescence‐based environmental models of population genetics and to embed the simulation in an Approximate Bayesian Computation framework. Quetzal is not a simulation program, but a toolbox for programming simulators aimed at the community of scientific coders and research software engineers in molecular ecology and phylogeography. This new code resource is open‐source and available at https://becheler.github.io/pages/quetzal.html along with other documentation resources.

10.

ESS++: a C++ objected-oriented algorithm for Bayesian stochastic search model exploration

Bottolo L Chadeau-Hyam M Hastie DI Langley SR Petretto E Tiret L Tregouet D Richardson S 《Bioinformatics (Oxford, England)》2011,27(4):587-588

SUMMARY: ESS++ is a C++ implementation of a fully Bayesian variable selection approach for single and multiple response linear regression. ESS++ works well both when the number of observations is larger than the number of predictors and in the 'large p, small n' case. In the current version, ESS++ can handle several hundred observations, thousands of predictors and a few responses simultaneously. The core engine of ESS++ for the selection of relevant predictors is based on Evolutionary Monte Carlo. Our implementation is open source, allowing community-based alterations and improvements. AVAILABILITY: C++ source code and documentation including compilation instructions are available under GNU licence at http://bgx.org.uk/software/ESS.html.

11.

SIMLR: A Tool for Large‐Scale Genomic Analyses by Multi‐Kernel Learning


Bo Wang Daniele Ramazzotti Luca De Sano Junjie Zhu Emma Pierson Serafim Batzoglou 《Proteomics》2018,18(2)

SIMLR (Single‐cell Interpretation via Multi‐kernel LeaRning), an open‐source tool that implements a novel framework to learn a sample‐to‐sample similarity measure from expression data observed for heterogeneous samples, is presented here. SIMLR can be effectively used to perform tasks such as dimension reduction, clustering, and visualization of heterogeneous populations of samples. SIMLR was benchmarked against state‐of‐the‐art methods for these three tasks on several public datasets, showing it to be scalable and capable of greatly improving clustering performance, as well as providing valuable insights by making the data more interpretable via better visualization. SIMLR is available on GitHub at https://github.com/BatzoglouLabSU/SIMLR in both R and MATLAB implementations. Furthermore, it is also available as an R package on http://bioconductor.org

12.

TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders (total citations: 5; self-citations: 0; citations by others: 5)

Majoros WH Pertea M Salzberg SL 《Bioinformatics (Oxford, England)》2004,20(16):2878-2879

We describe two new Generalized Hidden Markov Model implementations for ab initio eukaryotic gene prediction. The C/C++ source code for both is available as open source and is highly reusable due to their modular and extensible architectures. Unlike most of the currently available gene-finders, the programs are re-trainable by the end user. They are also re-configurable and include several types of probabilistic submodels which can be independently combined, such as Maximal Dependence Decomposition trees and interpolated Markov models. Both programs have been used at TIGR for the annotation of the Aspergillus fumigatus and Toxoplasma gondii genomes. AVAILABILITY: Source code and documentation are available under the open source Artistic License from http://www.tigr.org/software/pirate

13.

XTandem Parser: An open‐source library to parse and analyse X!Tandem MS/MS search results

Thilo Muth Marc Vaudel Harald Barsnes Lennart Martens Albert Sickmann 《Proteomics》2010,10(7):1522-1524

Identification of proteins by MS plays an important role in proteomics. A crucial step concerns the identification of peptides from MS/MS spectra. The X!Tandem Project ( http://www.thegpm.org/tandem ) supplies an open‐source search engine for this purpose. In this study, we present an open‐source Java library called XTandem Parser that parses X!Tandem XML result files into an easily accessible and fully functional object model ( http://xtandem‐parser.googlecode.com ). In addition, a graphical user interface is provided that functions as a usage example and an end‐user visualization tool.

14.

Python as a federation tool for GENESIS 3.0

Cornelis H Rodriguez AL Coop AD Bower JM 《PloS one》2012,7(1):e29018


15.

SDMdata: A Web-Based Software Tool for Collecting Species Occurrence Records

Xiaoquan Kong Minyi Huang Renyan Duan 《PloS one》2015,10(6)

It is important to easily and efficiently obtain high quality species distribution data for predicting the potential distribution of species using species distribution models (SDMs). There is a need for a powerful software tool to automatically or semi-automatically assist in identifying and correcting errors. Here, we use Python to develop a web-based software tool (SDMdata) to easily collect occurrence data from the Global Biodiversity Information Facility (GBIF) and check species names and the accuracy of coordinates (latitude and longitude). It is open source software (licensed under the GNU Affero General Public License, AGPL), allowing anyone to access and manipulate the source code. SDMdata is available online free of charge from <http://www.sdmserialsoftware.org/sdmdata/>.

16.

pseudoQC: A Regression‐Based Simulation Software for Correction and Normalization of Complex Metabolomics and Proteomics Datasets

Shisheng Wang Hao Yang 《Proteomics》2019,19(19)

Various types of unwanted and uncontrollable signal variations in MS‐based metabolomics and proteomics datasets severely disturb the accuracies of metabolite and protein profiling. Therefore, pooled quality control (QC) samples are often employed in quality management processes, which are indispensable to the success of metabolomics and proteomics experiments, especially in high‐throughput cases and long‐term projects. However, data consistency and QC sample stability are still difficult to guarantee because of the experimental operation complexity and differences between experimenters. To make things worse, numerous proteomics projects do not take QC samples into consideration at the beginning of experimental design. Herein, a powerful and interactive web‐based software, named pseudoQC, is presented to simulate QC sample data for actual metabolomics and proteomics datasets using four different machine learning‐based regression methods. The simulated data are used for correction and normalization of the two published datasets, and the obtained results suggest that nonlinear regression methods perform better than linear ones. Additionally, the above software is available as a web‐based graphical user interface and can be utilized by scientists without a bioinformatics background. pseudoQC is open‐source software and freely available at https://www.omicsolution.org/wukong/pseudoQC/ .

17.

EXFI: Exon and splice graph prediction without a reference genome

Jorge Langa Andone Estonba Darrell Conklin 《Ecology and evolution》2020,10(16):8880-8893


18.

MR-Tandem: parallel X!Tandem using Hadoop MapReduce on Amazon Web Services

Pratt B Howbert JJ Tasman NI Nilsson EJ 《Bioinformatics (Oxford, England)》2012,28(1):136-137


19.

Retracted: Bisindole‐oxadiazole hybrids, T3P® mediated synthesis and appraisal of their apoptotic, antimetastatic and computational Bcl‐2 binding potential


《Journal of biochemical and molecular toxicology》2017,31(11)

Retraction: Kamath PR, Joseph MM, Ajees AA, et al. Bisindole‐oxadiazole hybrids, T3P mediated synthesis and appraisal of their apoptotic, antimetastatic and computational Bcl‐2 binding potential. J Biochem Mol Toxicol. 2017;31:e21962. https://doi.org/10.1002/jbt.21962 The above article from the Journal of Biochemical and Molecular Toxicology, published online on 19 July 2017 in Wiley Online Library ( https://onlinelibrary.wiley.com/doi/abs/10.1002/jbt.21962 ) and in Volume 31, Issue 11, has been retracted by agreement of the Journal Editor‐in‐Chief, Dr Hari Bhat, and Wiley Periodicals, Inc. The retraction has been agreed due to the absence of access to the original data needed to answer questions about the reliability of some of the findings presented in the paper.

20.

OLS Client and OLS Dialog: Open Source Tools to Annotate Public Omics Datasets


Yasset Perez‐Riverol Tobias Ternent Maximilian Koch Harald Barsnes Olga Vrousgou Simon Jupp Juan Antonio Vizcaíno 《Proteomics》2017,17(19)

The availability of user‐friendly software to annotate biological datasets and experimental details is becoming essential in data management practices, both in local storage systems and in public databases. The Ontology Lookup Service (OLS, http://www.ebi.ac.uk/ols ) is a popular centralized service to query, browse and navigate biomedical ontologies and controlled vocabularies. Recently, the OLS framework has been completely redeveloped (version 3.0), including enhancements in the data model, like the added support for Web Ontology Language based ontologies, among many other improvements. However, the new OLS is not backwards compatible and new software tools are needed to enable access to this widely used framework now that the previous version is no longer available. We here present the OLS Client as a free, open‐source Java library to retrieve information from the new version of the OLS. It enables rapid tool creation by providing a robust, pluggable programming interface and common data model to programmatically access the OLS. The library has already been integrated and is routinely used by several bioinformatics resources and related data annotation tools. Secondly, we also introduce an updated version of the OLS Dialog (version 2.0), a Java graphical user interface that can be easily plugged into Java desktop applications to access the OLS. The software and related documentation are freely available at https://github.com/PRIDE-Utilities/ols-client and https://github.com/PRIDE-Toolsuite/ols-dialog .