Similar Articles
20 similar articles found.
1.
The increasing role played by liquid chromatography-mass spectrometry (LC-MS)-based proteomics in biological discovery has led to a growing need for quality control (QC) of LC-MS systems. While numerous QC tools have been developed to track the performance of LC-MS systems using a pre-defined set of performance factors (e.g., mass error, retention time), the precise influence and contribution of these factors, and how well they generalize to different biological samples, are not as well characterized. Here, a web-based application (QCMAP) is developed for interactive diagnosis and prediction of the performance of LC-MS systems across different biological sample types. Leveraging a standardized HeLa cell sample run as QC within a multi-user facility, predictive models are trained on a panel of commonly used performance factors to pinpoint the conditions leading to (un)satisfactory performance in three LC-MS systems. It is demonstrated that the learned model can be applied to predict LC-MS system performance for brain samples generated from an independent study. By compiling these predictive models into the web application, QCMAP allows users to benchmark the performance of their LC-MS systems using their own samples and to identify key factors for instrument optimization. QCMAP is freely available from: http://shiny.maths.usyd.edu.au/QCMAP/ .
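As an illustration of the modeling idea described above, the sketch below trains a generic classifier on per-run QC performance factors and ranks their importance. This is a minimal example in Python with scikit-learn, not the actual QCMAP implementation; the factor names and the synthetic labeling rule are assumptions.

```python
# Illustrative sketch (not QCMAP itself): predict (un)satisfactory LC-MS runs
# from per-run QC performance factors and rank the factors by importance.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Synthetic stand-ins for per-run QC metrics; the names are assumptions.
factors = ["mass_error_ppm", "rt_drift_min", "peak_width_s", "msms_id_rate"]
X = rng.normal(size=(200, len(factors)))
# Toy rule: runs with large mass error or RT drift are "unsatisfactory" (0).
y = ((np.abs(X[:, 0]) < 1.0) & (np.abs(X[:, 1]) < 1.0)).astype(int)

model = RandomForestClassifier(n_estimators=200, random_state=0)
print("CV accuracy:", cross_val_score(model, X, y, cv=5).mean())
model.fit(X, y)
for name, imp in zip(factors, model.feature_importances_):
    print(f"{name}: {imp:.2f}")  # rank the factors driving performance
```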

2.
The recent improvements in mass spectrometry instruments and new analytical methods are increasing the intersection between proteomics and big data science. In addition, bioinformatics analysis is becoming increasingly complex, involving multiple algorithms and tools. A wide variety of methods and software tools have been developed for computational proteomics and metabolomics in recent years, and this trend is likely to continue. However, most computational proteomics and metabolomics tools are designed as single-tiered software applications in which the analytics tasks cannot be distributed, limiting the scalability and reproducibility of the data analysis. In this paper, the key steps of metabolomics and proteomics data processing are summarized, including the main tools and software used to perform the data analysis. The combination of software containers with workflow environments for large-scale metabolomics and proteomics analysis is discussed. Finally, a new approach for reproducible and large-scale data analysis based on BioContainers and two of the most popular workflow environments, Galaxy and Nextflow, is introduced to the proteomics and metabolomics communities.
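A minimal sketch of the container-based execution that such workflows build on: each analysis step runs inside a pinned container image, so the software environment is reproducible across machines. This is illustrative Python wrapped around the docker CLI; the commented image name is an assumption, not a verified BioContainers tag.

```python
# Sketch, assuming Docker is installed: run one analysis step inside a
# pinned container image, mounting the working directory for inputs/outputs.
import subprocess

def run_in_container(image: str, command: list[str], workdir: str) -> None:
    """Execute `command` inside `image`, with `workdir` mounted at /data."""
    subprocess.run(
        ["docker", "run", "--rm",
         "-v", f"{workdir}:/data", "-w", "/data",
         image, *command],
        check=True,
    )

# Hypothetical usage; in practice, pin an exact BioContainers tag or digest:
# run_in_container("quay.io/biocontainers/openms:<pinned-tag>",
#                  ["FileInfo", "-in", "sample.mzML"], "/path/to/run1")
```

Workflow environments such as Nextflow and Galaxy provide this same isolation declaratively, and add scheduling, scaling, and provenance tracking on top of it.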

3.
High-throughput MS-based proteomic experiments generate large volumes of complex data and necessitate bioinformatics tools to facilitate their handling. Needs include means to archive data, to disseminate them to the scientific community, and to organize and annotate them to facilitate their interpretation. We present here an evolution of PROTICdb, a database software package that now handles MS data, including quantification. PROTICdb has been developed to be as independent as possible from the tools used to produce the data. Biological samples and proteomics data are described using ontology terms. A Taverna workflow is embedded, permitting automatic retrieval of information related to identified proteins by querying external databases. Stored data can be displayed graphically, and a "Query Builder" allows users to make sophisticated queries without knowledge of the underlying database structure. All resources can be accessed programmatically using a Java client API or RESTful web services, allowing the integration of PROTICdb into any portal. An example application is presented, in which proteins extracted from a maize leaf sample by four different methods were compared using a label-free shotgun method. Data are available at http://moulon.inra.fr/protic/public . PROTICdb thus provides means for the storage, enrichment, and dissemination of proteomics data.
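Because PROTICdb exposes RESTful web services, programmatic integration from any language reduces to plain HTTP calls. The sketch below is a hypothetical Python example: the endpoint path and response shape are assumptions for illustration only, so consult the PROTICdb documentation for the actual API.

```python
# Hedged sketch of REST access to a PROTICdb-style service; the /api/...
# path below is hypothetical and only illustrates the integration pattern.
import requests

BASE_URL = "http://moulon.inra.fr/protic/public"  # public instance cited above

def fetch_protein(protein_id: str) -> dict:
    """Retrieve annotations for one identified protein (hypothetical endpoint)."""
    resp = requests.get(f"{BASE_URL}/api/proteins/{protein_id}", timeout=30)
    resp.raise_for_status()
    return resp.json()
```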

4.
5.
MOTIVATION: Two practical realities constrain the analysis of microarray data, mass spectra from proteomics, and biomedical infrared or magnetic resonance spectra. One is the 'curse of dimensionality': the number of features characterizing these data is in the thousands or tens of thousands. The other is the 'curse of dataset sparsity': the number of samples is limited. The consequences of these two curses are far-reaching when such data are used to classify the presence or absence of disease. RESULTS: Using very simple classifiers, we show for several publicly available microarray and proteomics datasets how these curses influence classification outcomes. In particular, even if the sample-per-feature ratio is increased to the recommended 5-10 by feature extraction/reduction methods, dataset sparsity can render any classification result statistically suspect. In addition, several 'optimal' feature sets are typically identifiable for sparse datasets, all producing perfect classification results for both the training and independent validation sets. This non-uniqueness leads to interpretational difficulties and casts doubt on the biological relevance of any of these 'optimal' feature sets. We suggest an approach to assess the relative quality of apparently equally good classifiers.
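The sparsity problem is easy to reproduce. The sketch below, on pure-noise data, selects the features most correlated with random labels using the full dataset (a common but flawed practice) and obtains apparently excellent accuracy from meaningless features.

```python
# Sketch of the "curse of dataset sparsity": on 20 samples with 5000 noise
# features, selecting 'optimal' features before validation yields classifiers
# that look excellent despite the labels carrying no signal.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n_samples, n_features = 20, 5000
X = rng.normal(size=(n_samples, n_features))
y = np.array([0] * 10 + [1] * 10)          # labels carry no real signal

# Flawed but common: select features on the FULL dataset before validation.
Xc = X - X.mean(axis=0)
yc = y - y.mean()
corr = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
top = np.argsort(corr)[-10:]               # the 10 'best' noise features

clf = LogisticRegression(max_iter=1000)
print("training accuracy:", clf.fit(X[:, top], y).score(X[:, top], y))
print("5-fold CV accuracy:", cross_val_score(clf, X[:, top], y, cv=5).mean())
```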

6.
Predicting protein-protein interactions (PPIs) is a challenging task that is essential for constructing protein interaction networks, which in turn are important for understanding the mechanisms of biological systems. Although a number of high-throughput technologies have been developed to detect PPIs, they have unavoidable shortcomings, including high cost, time intensity, and inherently high false positive rates. For these reasons, many computational methods have been proposed for predicting PPIs; however, the problem is still far from solved. In this article, we propose a novel computational method called RVM-BiGP that combines the relevance vector machine (RVM) model and bi-gram probabilities (BiGP) for PPI detection from protein sequences. The major improvements include: (1) protein sequences are represented using a BiGP feature representation built on a Position Specific Scoring Matrix (PSSM), which captures protein evolutionary information; (2) to reduce the influence of noise, Principal Component Analysis (PCA) is used to reduce the dimension of the BiGP vector; and (3) the powerful and robust RVM algorithm is used for classification. Five-fold cross-validation experiments were executed on yeast and Helicobacter pylori datasets and achieved high accuracies of 94.57% and 90.57%, respectively, significantly better than previous methods. To further evaluate the proposed method, we compared it with a state-of-the-art support vector machine (SVM) classifier on the yeast dataset; the results demonstrate that RVM-BiGP is significantly better than the SVM-based method. In addition, we achieved 97.15% accuracy on the imbalanced yeast dataset, higher than on the balanced yeast dataset. These promising results show the efficiency and robustness of the proposed method, which can serve as an automated decision-support tool for future proteomics research. To facilitate such studies, we developed a freely available web server, RVM-BiGP-PPIs, in Hypertext Preprocessor (PHP) for predicting PPIs. The web server, including source code and datasets, is available at http://219.219.62.123:8888/BiGP/ .
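To make the feature construction concrete, here is a minimal sketch of steps (1) and (2) under stated assumptions: a 400-dimensional bi-gram probability vector computed from a row-normalized PSSM, followed by PCA. The RVM classifier of step (3) is not shown, as it is not part of common Python libraries, and the random PSSMs are stand-ins for real ones.

```python
# Hedged sketch of the BiGP feature step: bi-gram probabilities from a PSSM,
# then PCA for noise reduction. Random matrices stand in for real PSSMs.
import numpy as np
from sklearn.decomposition import PCA

def bigram_features(pssm: np.ndarray) -> np.ndarray:
    """pssm: (L, 20) per-position probabilities -> (400,) bi-gram vector."""
    return (pssm[:-1].T @ pssm[1:]).ravel()  # B[i, j] = sum_k P[k,i] * P[k+1,j]

rng = np.random.default_rng(0)
# Stand-in PSSMs for 100 proteins of varying length, row-normalized.
pssms = [rng.random((rng.integers(50, 300), 20)) for _ in range(100)]
pssms = [p / p.sum(axis=1, keepdims=True) for p in pssms]

X = np.array([bigram_features(p) for p in pssms])   # shape (100, 400)
X_reduced = PCA(n_components=50).fit_transform(X)   # de-noised features
print(X_reduced.shape)
```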

7.

Background

Quality assurance (QA) and quality control (QC) are two quality management processes that are integral to the success of metabolomics, and to the acquisition of high-quality data in any high-throughput analytical chemistry laboratory. QA comprises all the planned and systematic activities implemented before samples are collected, to provide confidence that the subsequent analytical process will fulfil predetermined requirements for quality. QC comprises the operational techniques and activities used to measure and report these quality requirements after data acquisition.

Aim of review

This tutorial review guides the reader through the use of system suitability and QC samples, explaining why these samples should be applied and how the quality of data can be reported.

Key scientific concepts of review

System suitability samples are applied to assess the operation, and lack of contamination, of the analytical platform prior to sample analysis. Isotopically-labelled internal standards are applied to assess system stability for each sample analysed. Pooled QC samples are applied to condition the analytical platform, to perform intra-study reproducibility measurements (QC), and to correct mathematically for systematic errors. Standard reference materials and long-term reference QC samples are applied for inter-study and inter-laboratory assessment of data.
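As a concrete illustration of the last point, the sketch below applies one widely used QC-based strategy (a LOWESS fit through the pooled-QC injections, often called QC-RLSC) to correct signal drift for a single metabolite. It is a minimal example on synthetic data; the smoothing fraction and the correction formula are illustrative choices, not a prescription.

```python
# Sketch of pooled-QC drift correction for one metabolite: fit the drift on
# the QC injections only, interpolate to every injection, then divide it out.
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(0)
order = np.arange(1, 61)                    # injection order within the batch
is_qc = (order % 10 == 0)                   # pooled QC every 10th injection
signal = 1000 * np.exp(-0.005 * order) + rng.normal(0, 10, order.size)

fit = lowess(signal[is_qc], order[is_qc], frac=0.9, return_sorted=True)
drift = np.interp(order, fit[:, 0], fit[:, 1])
corrected = signal / drift * np.median(signal[is_qc])
print("QC RSD before: %.1f%%" % (100 * signal[is_qc].std() / signal[is_qc].mean()))
print("QC RSD after : %.1f%%" % (100 * corrected[is_qc].std() / corrected[is_qc].mean()))
```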

8.
The design and analysis of experiments using gene expression microarrays is a topic of considerable current research, and work is beginning to appear on the analysis of proteomics and metabolomics data from mass spectrometry and NMR spectroscopy. The literature in this area is evolving rapidly; commercial software for the analysis of array or proteomics data is rarely up to date, and is essentially nonexistent for metabolomics data. In this paper, I review some of the issues that should concern any biologist planning to use such high-throughput biological assay data in an experimental investigation. Technical details are kept to a minimum and may be found in the referenced literature, as well as in the many excellent papers that space limitations prevent me from describing. There are usually a number of viable options for the design and analysis of such experiments, but unfortunately there are even more non-viable ones that have been used, even in the published literature. This is an area in which up-to-date knowledge of the literature is indispensable for efficient and effective design and analysis of these experiments. In general, I concentrate on relatively simple analyses, often focusing on identifying differentially expressed genes and the comparable issues in mass spectrometry and NMR spectroscopy (consistent differences in peak heights or areas, for example). Complex multivariate and pattern recognition methods also need much attention, but the issues described in this paper must be dealt with first. The literature on the analysis of proteomics and metabolomics data is as yet sparse, so the main focus of this paper is on methods devised for the analysis of gene expression data that generalize to proteomics and metabolomics, with some specific comments near the end on the analysis of metabolomics data by mass spectrometry and NMR spectroscopy.
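For the simple analyses emphasized here, the core computation generalizes directly from gene expression values to MS or NMR peak intensities: a per-feature test followed by multiple-testing correction. A minimal sketch on synthetic data:

```python
# Per-feature t-tests with Benjamini-Hochberg correction; the same pattern
# applies to expression values, MS peak areas, or NMR peak heights.
import numpy as np
from scipy.stats import ttest_ind
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)
control = rng.normal(0, 1, size=(5, 2000))   # 5 replicates x 2000 features
treated = rng.normal(0, 1, size=(5, 2000))
treated[:, :50] += 2.0                       # 50 truly changed features

_, pvals = ttest_ind(control, treated, axis=0)
reject, qvals, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
print("features called significant:", reject.sum())
```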

9.
With rapid progress in metabolomics and sequencing technologies, more data on the metabolomes of single microbes and their communities are becoming available, revealing the potential of microorganisms to metabolize a broad range of chemical compounds. The analysis of microbial metabolomics datasets remains challenging: it inherits the technical challenges of metabolomics analysis, such as compound identification and annotation, while also posing challenges of data interpretation, such as distinguishing metabolite sources in mixed samples. This review outlines recent advances in computational methods for analyzing primary microbial metabolism: knowledge-based approaches that take advantage of metabolic and molecular networks, and data-driven approaches that employ machine/deep learning algorithms in combination with large-scale datasets. These methods aim to improve metabolite identification and to disentangle reciprocal interactions between microbes and metabolites. We also discuss the prospect of combining these approaches and the further developments required to advance the investigation of primary metabolism in mixed microbial samples.

10.
11.
We present a suite of software for the complete and easy deposition of NMR data to the PDB and BMRB. This suite uses the CCPN framework and introduces a freely downloadable, graphical desktop application called the CcpNmr Entry Completion Interface (ECI) for the secure editing of experimental information and associated datasets throughout the lifetime of an NMR project. CCPN projects can be created within the CcpNmr Analysis software or by importing existing NMR data files using the CcpNmr FormatConverter. After further data entry and checking with the ECI, the project can be rapidly deposited to the PDBe using AutoDep, or exported as a complete deposition NMR-STAR file. In full CCPN projects created with the ECI, it is straightforward to select chemical shift lists, restraint datasets, structural ensembles, and all relevant associated experimental collection details, all of which are, or will become, mandatory when depositing to the PDB. Instructions and download information for the ECI are available from the PDBe web site at http://www.ebi.ac.uk/pdbe/nmr/deposition/eci.html.

12.
The Laurentian Great Lakes are undergoing intensive ecological restoration in Canada and the United States. In the United States, an interagency committee was formed to facilitate implementation of quality practices for federally funded restoration projects in the Great Lakes basin. The Committee's responsibilities include developing a guidance document that will provide a common approach to the application of quality assurance and quality control (QA/QC) practices for restoration projects. The document will serve as a "how-to" guide for ensuring data quality during each aspect of ecological restoration projects. In addition, the document will provide suggestions on linking QA/QC data with the routine project data and hints on creating detailed supporting documentation. Finally, the document will advocate integrating all components of the project, including QA/QC applications, into an overarching decision-support framework. The guidance document is expected to be released by the U.S. EPA Great Lakes National Program Office in 2017.

13.
Labeling-based proteomics is a powerful method for the detection of differentially expressed proteins (DEPs). Current data analysis platforms typically rely on protein-level ratios, obtained by summarizing the peptide-level ratios for each protein. In shotgun proteomics, however, some proteins are quantified with more peptides than others, and this reproducibility information is not incorporated into the differential expression (DE) analysis. Here, we propose a novel probabilistic framework, EBprot, that directly models the peptide-protein hierarchy and rewards proteins with reproducible evidence of DE across multiple peptides. To evaluate its performance with known DE states, we conducted a simulation study showing that the peptide-level analysis of EBprot provides better receiver operating characteristics and more accurate estimation of false discovery rates than methods based on protein-level ratios. We also demonstrate the superior classification performance of peptide-level EBprot analysis on a spike-in dataset. To illustrate the wide applicability of EBprot across experimental designs, we applied it to a lung cancer subtype dataset with biological replicates and to a time-course phosphoproteome dataset of EGF-stimulated HeLa cells with multiplexed labeling. Through these examples, we show that the peptide-level analysis of EBprot is a robust alternative to existing statistical methods for the DE analysis of labeling-based quantitative datasets. The software suite is freely available from the SourceForge website at http://ebprot.sourceforge.net/ . All MS data have been deposited in ProteomeXchange with identifier PXD001426 ( http://proteomecentral.proteomexchange.org/dataset/PXD001426/ ).
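The hierarchy argument can be illustrated with a much simpler stand-in than EBprot's empirical Bayes model: score each protein directly from its peptide-level log-ratios, so that consistent evidence across many peptides is rewarded while single-peptide proteins remain weakly supported. The data below are invented for illustration.

```python
# Hedged stand-in for the idea behind peptide-level DE analysis (not the
# actual EBprot model): reward reproducible evidence across peptides.
import numpy as np
from scipy.stats import ttest_1samp

peptide_log_ratios = {
    "P1": [1.1, 0.9, 1.2, 1.0],    # many concordant peptides -> strong call
    "P2": [1.5],                   # one peptide -> weak evidence
    "P3": [0.1, -0.2, 0.05],       # no consistent change
}
for prot, ratios in peptide_log_ratios.items():
    r = np.asarray(ratios, dtype=float)
    if r.size > 1:
        _, p = ttest_1samp(r, popmean=0.0)
    else:
        p = float("nan")           # one peptide cannot show reproducibility
    print(f"{prot}: mean log-ratio {r.mean():+.2f}, p = {p:.3g}")
```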

14.
15.

Introduction

The Metabolomics Society Data Quality Task Group (DQTG) developed a questionnaire regarding quality assurance (QA) and quality control (QC) to provide baseline information about current QA and QC practices applied in the international metabolomics community.

Objectives

The DQTG has a long-term goal of promoting robust QA and QC in the metabolomics community through increased awareness via communication, outreach and education, and through the promotion of best working practices. An assessment of current QA and QC practices will serve as a foundation for future activities and development of appropriate guidelines.

Method

QA was defined as the set of procedures performed in advance of sample analysis that are used to improve data quality. QC was defined as the set of activities that a laboratory performs during or immediately after analysis to demonstrate the quality of the project data. A questionnaire was developed that included 70 questions covering demographic information, QA approaches, and QC approaches, and allowed respondents to answer a subset or all of the questions.

Result

The DQTG questionnaire received 97 individual responses from 84 institutions in all fields of metabolomics covering NMR, LC-MS, GC-MS, and other analytical technologies.

Conclusion

Responses concerning the use of QA and QC approaches varied widely, indicating limited availability of suitable training, a lack of Standard Operating Procedures (SOPs) for reviewing and making decisions on quality, and limited use of standard reference materials (SRMs) as QC materials. The DQTG QA/QC questionnaire has demonstrated, for the first time, that QA and QC usage is not uniform across metabolomics laboratories. Here we present recommendations on how to address these issues concerning QA and QC measurement and reporting in metabolomics.

16.
The assay for transposase-accessible chromatin with high-throughput sequencing (ATAC-seq) is a technique widely used to investigate genome-wide chromatin accessibility. The recently published Omni-ATAC-seq protocol substantially improves the signal-to-noise ratio and reduces the required input cell number. High-quality data are critical to ensure accurate analysis. Several tools have been developed for assessing sequencing quality and insert size distribution in ATAC-seq data; however, key quality control (QC) metrics have not yet been established to accurately determine the quality of ATAC-seq data. Here, we optimized the analysis strategy for ATAC-seq and defined a series of QC metrics, including the reads-under-peak ratio (RUPr), background (BG), promoter enrichment (ProEn), subsampling enrichment (SubEn), and other measurements. We incorporated these QC tests into our recently developed ATAC-seq Integrative Analysis Package (AIAP) to provide a complete ATAC-seq analysis system, including quality assurance, improved peak calling, and downstream differential analysis. We demonstrate a significant improvement in sensitivity (20%-60%) in both peak calling and differential analysis when paired-end ATAC-seq datasets are processed with AIAP. AIAP is compiled into Docker/Singularity and can be executed with a single command line to generate a comprehensive QC report. We used ENCODE ATAC-seq data to benchmark and generate QC recommendations, and we developed qATACViewer for user-friendly interaction with the QC report. The software, source code, and documentation for AIAP are freely available at https://github.com/Zhang-lab/ATAC-seq_QC_analysis.
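To make one of these metrics concrete, the sketch below computes a reads-under-peak ratio from toy inputs: the fraction of read positions falling inside called peak intervals. This is a simplified stand-in for AIAP's actual implementation, which operates on BAM alignments and peak files.

```python
# Toy reads-under-peak ratio (RUPr): fraction of read midpoints that fall
# inside sorted, non-overlapping peak intervals.
from bisect import bisect_right

peaks = [(100, 200), (500, 650), (900, 1000)]       # sorted (start, end)
reads = [105, 150, 300, 510, 640, 700, 950, 1200]   # read midpoints

starts = [s for s, _ in peaks]

def in_peak(pos: int) -> bool:
    i = bisect_right(starts, pos) - 1               # rightmost peak starting <= pos
    return i >= 0 and peaks[i][0] <= pos < peaks[i][1]

rupr = sum(in_peak(r) for r in reads) / len(reads)
print(f"RUPr = {rupr:.2f}")                          # 5/8 reads under peaks here
```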

17.
Hi-C is a sample preparation method that enables high-throughput sequencing to capture genome-wide spatial interactions between DNA molecules. The technique has been successfully applied to challenging problems such as the 3D structural analysis of chromatin, the scaffolding of large genome assemblies and, more recently, the accurate resolution of metagenome-assembled genomes (MAGs). Despite continued refinements, however, preparing a Hi-C library remains a complex laboratory protocol. To avoid costly failures and maximise the odds of a successful outcome, diligent quality management is recommended. Current wet-lab methods provide only a crude assay of Hi-C library quality, while the key post-sequencing quality indicators have thus far relied upon reference-based read mapping. Even when a reference is available, this reliance is itself a quality concern, since an incomplete or inexact reference skews the resulting indicators. We propose a new, reference-free approach that infers the total fraction of read pairs that are the product of proximity ligation. This quantification of Hi-C library quality requires only a modest amount of sequencing data and is independent of other application-specific criteria. The algorithm builds on the observation that proximity ligation events are likely to create k-mers that would not naturally occur in the sample. Our software tool (qc3C) is, to our knowledge, the first to implement reference-free Hi-C QC; it also provides reference-based QC, enabling Hi-C to be applied more easily to non-model organisms and environmental samples. We characterise the accuracy of the new algorithm on simulated and real datasets and compare it with reference-based methods.
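The k-mer observation can be sketched in a few lines. Assuming a DpnII/Sau3AI digest (GATC), fill-in and blunt ligation duplicates the cut site, so the junction sequence GATCGATC appears in chimeric reads but only rarely in the native genome; counting reads that contain it gives a crude proximity-ligation fraction. The real qc3C algorithm is considerably more careful (k-mer frequency modeling, sequencing-error handling), so this is illustration only.

```python
# Toy version of the junction-k-mer intuition behind reference-free Hi-C QC:
# estimate the proximity-ligation fraction by counting junction-bearing reads.
junction = "GATCGATC"   # duplicated GATC cut site created by fill-in ligation

reads = [
    "ACCTGATCGATCAGGT",   # contains a ligation junction
    "ACCTGATCTTTTAGGT",   # ordinary genomic read
    "TTGATCGATCCCAAAT",   # contains a ligation junction
]
ligated = sum(junction in r for r in reads)
print(f"proximity-ligation fraction ~ {ligated / len(reads):.2f}")
```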

18.
In high-throughput mass spectrometry proteomics, peptides and proteins are not simply identified as present or absent in a sample; rather, identifications are associated with differing levels of confidence. The false discovery rate (FDR) has emerged as an accepted means of measuring the confidence associated with identifications. We have developed the Systematic Protein Investigative Research Environment (SPIRE) for the purpose of integrating the best available proteomics methods. Two successful approaches to estimating the FDR for MS protein identifications are MAYU and our current SPIRE method. We present here a method that combines these two approaches into an integrated protein model (IPM). We illustrate the high-quality performance of IPM through testing on two large, publicly available proteomics datasets. MAYU and SPIRE show remarkable consistency in identifying proteins in these datasets. Still, IPM results in a more robust FDR estimation approach and additional identifications, particularly among low-abundance proteins. IPM is now implemented as part of the SPIRE system.
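As a hedged sketch of the combination idea (not the actual IPM model), one can merge per-protein posterior error probabilities (PEPs) from two methods and convert the combined scores to an FDR using the standard relation that the FDR at a threshold is the mean PEP of everything accepted:

```python
# Illustrative combination of two per-protein error estimates, followed by
# the usual PEP -> FDR conversion; the input PEPs here are synthetic.
import numpy as np

rng = np.random.default_rng(0)
pep_a = rng.beta(0.5, 5, size=1000)                        # method A (MAYU-like)
pep_b = np.clip(pep_a + rng.normal(0, 0.05, 1000), 0, 1)   # method B (SPIRE-like)

pep = (pep_a + pep_b) / 2                # naive average, for illustration only
order = np.argsort(pep)                  # accept most confident proteins first
fdr = np.cumsum(pep[order]) / np.arange(1, pep.size + 1)
print("proteins accepted at 1% FDR:", int((fdr <= 0.01).sum()))
```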

19.
20.