期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Discussion on common data analysis strategies used in MS-based proteomics

Matthiesen R Azevedo L Amorim A Carvalho AS 《Proteomics》2011,11(4):604-619

Current proteomics technology is limited in resolving the proteome complexity of biological systems. The main issue at stake is to increase throughput and spectra quality so that spatiotemporal dimensions, population parameters and the complexity of protein modifications on a quantitative scale can be considered. MS-based proteomics and protein arrays are the main players in large-scale proteome analysis and an integration of these two methodologies is powerful but presently not sufficient for detailed quantitative and spatiotemporal proteome characterization. Improvements of instrumentation for MS-based proteomics have been achieved recently resulting in data sets of approximately one million spectra which is a large step in the right direction. The corresponding raw data range from 50 to 100?Gb and are frequently made available. Multidimensional LC-MS data sets have been demonstrated to identify and quantitate 2000-8000 proteins from whole cell extracts. The analysis of the resulting data sets requires several steps from raw data processing, to database-dependent search, statistical evaluation of the search result, quantitative algorithms and statistical analysis of quantitative data. A large number of software tools have been proposed for the above-mentioned tasks. However, it is not the aim of this review to cover all software tools, but rather discuss common data analysis strategies used by various algorithms for each of the above-mentioned steps in a non-redundant approach and to argue that there are still some areas which need improvements. 相似文献

2.

基于质谱的定量蛋白质组学策略和方法研究进展

下载免费PDF全文

常乘朱云平《中国科学:生命科学》2015,45(5):425-438

定量蛋白质组学已经成为组学领域研究的热点之一.相关实验技术和计算方法的不断创新极大地促进了定量蛋白质组学的飞速发展.常用的定量蛋白质组学策略按照是否需要稳定同位素标记可以分为无标定量和有标定量两大类.每类策略又产生了众多定量方法和工具,它们一方面推动了定量蛋白质组学的深入发展;另一方面,也在实验策略与技术的发展过程中不断更新.因此对这些定量实验策略和方法进行系统总结和归纳将有助于定量蛋白质组学的研究.本文主要从方法学角度全面归纳了目前定量蛋白质组学研究的相关策略和算法,详述了无标定量和有标定量的具体算法流程并比较了各自特点,还对以研究蛋白质绝对丰度为目标的绝对定量算法进行了总结,列举了常用的定量软件和工具,最后概述了定量结果的质量控制方法,对定量蛋白质组学方法发展的前景进行了展望. 相似文献

3.

Bioinformatic challenges in targeted proteomics

D Reker L Malmström 《Journal of proteome research》2012,11(9):4393-4402

Selected reaction monitoring mass spectrometry is an emerging targeted proteomics technology that allows for the investigation of complex protein samples with high sensitivity and efficiency. It requires extensive knowledge about the sample for the many parameters needed to carry out the experiment to be set appropriately. Most studies today rely on parameter estimation from prior studies, public databases, or from measuring synthetic peptides. This is efficient and sound, but in absence of prior data, de novo parameter estimation is necessary. Computational methods can be used to create an automated framework to address this problem. However, the number of available applications is still small. This review aims at giving an orientation on the various bioinformatical challenges. To this end, we state the problems in classical machine learning and data mining terms, give examples of implemented solutions and provide some room for alternatives. This will hopefully lead to an increased momentum for the development of algorithms and serve the needs of the community for computational methods. We note that the combination of such methods in an assisted workflow will ease both the usage of targeted proteomics in experimental studies as well as the further development of computational approaches. 相似文献

4.

TOPP--the OpenMS proteomics pipeline

Kohlbacher O Reinert K Gröpl C Lange E Pfeifer N Schulz-Trieglaff O Sturm M 《Bioinformatics (Oxford, England)》2007,23(2):e191-e197

MOTIVATION: Experimental techniques in proteomics have seen rapid development over the last few years. Volume and complexity of the data have both been growing at a similar rate. Accordingly, data management and analysis are one of the major challenges in proteomics. Flexible algorithms are required to handle changing experimental setups and to assist in developing and validating new methods. In order to facilitate these studies, it would be desirable to have a flexible 'toolbox' of versatile and user-friendly applications allowing for rapid construction of computational workflows in proteomics. RESULTS: We describe a set of tools for proteomics data analysis-TOPP, The OpenMS Proteomics Pipeline. TOPP provides a set of computational tools which can be easily combined into analysis pipelines even by non-experts and can be used in proteomics workflows. These applications range from useful utilities (file format conversion, peak picking) over wrapper applications for known applications (e.g. Mascot) to completely new algorithmic techniques for data reduction and data analysis. We anticipate that TOPP will greatly facilitate rapid prototyping of proteomics data evaluation pipelines. As such, we describe the basic concepts and the current abilities of TOPP and illustrate these concepts in the context of two example applications: the identification of peptides from a raw dataset through database search and the complex analysis of a standard addition experiment for the absolute quantitation of biomarkers. The latter example demonstrates TOPP's ability to construct flexible analysis pipelines in support of complex experimental setups. AVAILABILITY: The TOPP components are available as open-source software under the lesser GNU public license (LGPL). Source code is available from the project website at www.OpenMS.de 相似文献

5.

Scalable Data Analysis in Proteomics and Metabolomics Using BioContainers and Workflows Engines

Yasset Perez‐Riverol Pablo Moreno 《Proteomics》2020,20(9)

The recent improvements in mass spectrometry instruments and new analytical methods are increasing the intersection between proteomics and big data science. In addition, bioinformatics analysis is becoming increasingly complex and convoluted, involving multiple algorithms and tools. A wide variety of methods and software tools have been developed for computational proteomics and metabolomics during recent years, and this trend is likely to continue. However, most of the computational proteomics and metabolomics tools are designed as single‐tiered software application where the analytics tasks cannot be distributed, limiting the scalability and reproducibility of the data analysis. In this paper the key steps of metabolomics and proteomics data processing, including the main tools and software used to perform the data analysis, are summarized. The combination of software containers with workflows environments for large‐scale metabolomics and proteomics analysis is discussed. Finally, a new approach for reproducible and large‐scale data analysis based on BioContainers and two of the most popular workflow environments, Galaxy and Nextflow, is introduced to the proteomics and metabolomics communities. 相似文献

6.

Protein and peptide identification algorithms using MS for use in high-throughput, automated pipelines

Shadforth I Crowther D Bessant C 《Proteomics》2005,5(16):4082-4095

Current proteomics experiments can generate vast quantities of data very quickly, but this has not been matched by data analysis capabilities. Although there have been a number of recent reviews covering various aspects of peptide and protein identification methods using MS, comparisons of which methods are either the most appropriate for, or the most effective at, their proposed tasks are not readily available. As the need for high-throughput, automated peptide and protein identification systems increases, the creators of such pipelines need to be able to choose algorithms that are going to perform well both in terms of accuracy and computational efficiency. This article therefore provides a review of the currently available core algorithms for PMF, database searching using MS/MS, sequence tag searches and de novo sequencing. We also assess the relative performances of a number of these algorithms. As there is limited reporting of such information in the literature, we conclude that there is a need for the adoption of a system of standardised reporting on the performance of new peptide and protein identification algorithms, based upon freely available datasets. We go on to present our initial suggestions for the format and content of these datasets. 相似文献

7.

Normalization and quantification of differential expression in gene expression microarrays

Steinhoff C Vingron M 《Briefings in bioinformatics》2006,7(2):166-177

Array-based gene expression studies frequently serve to identify genes that are expressed differently under two or more conditions. The actual analysis of the data, however, may be hampered by a number of technical and statistical problems. Possible remedies on the level of computational analysis lie in appropriate preprocessing steps, proper normalization of the data and application of statistical testing procedures in the derivation of differentially expressed genes. This review summarizes methods that are available for these purposes and provides a brief overview of the available software tools. 相似文献

8.

A computational study of Shewanella oneidensis MR-1: structural prediction and functional inference of hypothetical proteins

Yost C Hauser L Larimer F Thompson D Beliaev A Zhou J Xu Y Xu D 《Omics : a journal of integrative biology》2003,7(2):177-191

The genomes of many organisms have been sequenced in the last 5 years. Typically about 30% of predicted genes from a newly sequenced genome cannot be given functional assignments using sequence comparison methods. In these situations three-dimensional structural predictions combined with a suite of computational tools can suggest possible functions for these hypothetical proteins. Suggesting functions may allow better interpretation of experimental data (e.g., microarray data and mass spectroscopy data) and help experimentalists design new experiments. In this paper, we focus on three hypothetical proteins of Shewanella oneidensis MR-1 that are potentially related to iron transport/metabolism based on microarray experiments. The threading program PROSPECT was used for protein structural predictions and functional annotation, in conjunction with literature search and other computational tools. Computational tools were used to perform transmembrane domain predictions, coiled coil predictions, signal peptide predictions, sub-cellular localization predictions, motif prediction, and operon structure evaluations. Combined computational results from all tools were used to predict roles for the hypothetical proteins. This method, which uses a suite of computational tools that are freely available to academic users, can be used to annotate hypothetical proteins in general. 相似文献

9.

Microarray probe expression measures, data normalization and statistical validation

Saviozzi S Calogero RA 《Comparative and Functional Genomics》2003,4(4):442-446

DNA microarray technology is a high-throughput method for gaining information on gene function. Microarray technology is based on deposition/synthesis, in an ordered manner, on a solid surface, of thousands of EST sequences/genes/oligonucleotides. Due to the high number of generated datapoints, computational tools are essential in microarray data analysis and mining to grasp knowledge from experimental results. In this review, we will focus on some of the methodologies actually available to define gene expression intensity measures, microarray data normalization, and statistical validation of differential expression. 相似文献

10.

A phase synchronization clustering algorithm for identifying interesting groups of genes from cell cycle expression data

Chang Sik Kim Cheol Soo Bae Hong Joon Tcha 《BMC bioinformatics》2008,9(1):1-22

Background

Quantitative proteomics holds great promise for identifying proteins that are differentially abundant between populations representing different physiological or disease states. A range of computational tools is now available for both isotopically labeled and label-free liquid chromatography mass spectrometry (LC-MS) based quantitative proteomics. However, they are generally not comparable to each other in terms of functionality, user interfaces, information input/output, and do not readily facilitate appropriate statistical data analysis. These limitations, along with the array of choices, present a daunting prospect for biologists, and other researchers not trained in bioinformatics, who wish to use LC-MS-based quantitative proteomics.

Results

We have developed Corra, a computational framework and tools for discovery-based LC-MS proteomics. Corra extends and adapts existing algorithms used for LC-MS-based proteomics, and statistical algorithms, originally developed for microarray data analyses, appropriate for LC-MS data analysis. Corra also adapts software engineering technologies (e.g. Google Web Toolkit, distributed processing) so that computationally intense data processing and statistical analyses can run on a remote server, while the user controls and manages the process from their own computer via a simple web interface. Corra also allows the user to output significantly differentially abundant LC-MS-detected peptide features in a form compatible with subsequent sequence identification via tandem mass spectrometry (MS/MS). We present two case studies to illustrate the application of Corra to commonly performed LC-MS-based biological workflows: a pilot biomarker discovery study of glycoproteins isolated from human plasma samples relevant to type 2 diabetes, and a study in yeast to identify in vivo targets of the protein kinase Ark1 via phosphopeptide profiling.

Conclusion

The Corra computational framework leverages computational innovation to enable biologists or other researchers to process, analyze and visualize LC-MS data with what would otherwise be a complex and not user-friendly suite of tools. Corra enables appropriate statistical analyses, with controlled false-discovery rates, ultimately to inform subsequent targeted identification of differentially abundant peptides by MS/MS. For the user not trained in bioinformatics, Corra represents a complete, customizable, free and open source computational platform enabling LC-MS-based proteomic workflows, and as such, addresses an unmet need in the LC-MS proteomics field. 相似文献

11.

计算系统生物学:理论、方法及在药物研发中的应用

刁妍妍蔡超前蒋华良李洪林《生命科学》2010,(10):1035-1042

计算系统生物学是一个多学科交叉的新兴领域,旨在通过整合海量数据建立其生物系统相互作用的复杂网络。数据的整合和模型的建立需要发展合适的数学方法和软件工具,这也是计算系统生物学的主要任务。生物系统模型有助于从整体上理解生物体的内在功能和特性。同时,生物网络模型在药物研发中的应用也越来越受到制药企业以及新药研发机构的重视,如用于特异性药物作用靶点的预测和药物毒性评估等。该文简要介绍计算系统生物学的常见网络和计算模型,以及建立模型所用的研究方法,并阐述其在建模和分析中的作用及面临的问题和挑战。相似文献

12.

Algorithmic approaches to clonal reconstruction in heterogeneous cell populations

Wazim Mohammed Ismail Etienne Nzabarushimana Haixu Tang 《Quantitative Biology.》2019,7(4):255

Background: The reconstruction of clonal haplotypes and their evolutionary history in evolving populations is a common problem in both microbial evolutionary biology and cancer biology. The clonal theory of evolution provides a theoretical framework for modeling the evolution of clones.Results: In this paper, we review the theoretical framework and assumptions over which the clonal reconstruction problem is formulated. We formally define the problem and then discuss the complexity and solution space of the problem. Various methods have been proposed to find the phylogeny that best explains the observed data. We categorize these methods based on the type of input data that they use (space-resolved or time-resolved), and also based on their computational formulation as either combinatorial or probabilistic. It is crucial to understand the different types of input data because each provides essential but distinct information for drastically reducing the solution space of the clonal reconstruction problem. Complementary information provided by single cell sequencing or from whole genome sequencing of randomly isolated clones can also improve the accuracy of clonal reconstruction. We briefly review the existing algorithms and their relationships. Finally we summarize the tools that are developed for either directly solving the clonal reconstruction problem or a related computational problem.Conclusions: In this review, we discuss the various formulations of the problem of inferring the clonal evolutionary history from allele frequeny data, review existing algorithms and catergorize them according to their problem formulation and solution approaches. We note that most of the available clonal inference algorithms were developed for elucidating tumor evolution whereas clonal reconstruction for unicellular genomes are less addressed. We conclude the review by discussing more open problems such as the lack of benchmark datasets and comparison of performance between available tools. 相似文献

13.

RIBAR and xRIBAR: Methods for reproducible relative MS/MS-based label-free protein quantification

Colaert N Gevaert K Martens L 《Journal of proteome research》2011,10(7):3183-3189

Mass spectrometry-driven proteomics is increasingly relying on quantitative analyses for biological discoveries. As a result, different methods and algorithms have been developed to perform relative or absolute quantification based on mass spectrometry data. One of the most popular quantification methods are the so-called label-free approaches, which require no special sample processing, and can even be applied retroactively to existing data sets. Of these label-free methods, the MS/MS-based approaches are most often applied, mainly because of their inherent simplicity as compared to MS-based methods. The main application of these approaches is the determination of relative protein amounts between different samples, expressed as protein ratios. However, as we demonstrate here, there are some issues with the reproducibility across replicates of these protein ratio sets obtained from the various MS/MS-based label-free methods, indicating that the existing methods are not optimally robust. We therefore present two new methods (called RIBAR and xRIBAR) that use the available MS/MS data more effectively, achieving increased robustness. Both the accuracy and the precision of our novel methods are analyzed and compared to the existing methods to illustrate the increased robustness of our new methods over existing ones. 相似文献

14.

Analysis of large-scale gene expression data. 总被引：10，自引：0，他引：10

G Sherlock 《Briefings in bioinformatics》2001,2(4):350-362

DNA microarray technology has resulted in the generation of large complex data sets, such that the bottleneck in biological investigation has shifted from data generation, to data analysis. This review discusses some of the algorithms and tools for the analysis and organisation of microarray expression data, including clustering methods, partitioning methods, and methods for correlating expression data to other biological data. 相似文献

15.

Metabolomics- and proteomics-assisted genome annotation and analysis of the draft metabolic network of Chlamydomonas reinhardtii

May P Wienkoop S Kempa S Usadel B Christian N Rupprecht J Weiss J Recuenco-Munoz L Ebenhöh O Weckwerth W Walther D 《Genetics》2008,179(1):157-166

We present an integrated analysis of the molecular repertoire of Chlamydomonas reinhardtii under reference conditions. Bioinformatics annotation methods combined with GCxGC/MS-based metabolomics and LC/MS-based shotgun proteomics profiling technologies have been applied to characterize abundant proteins and metabolites, resulting in the detection of 1069 proteins and 159 metabolites. Of the measured proteins, 204 currently do not have EST sequence support; thus a significant portion of the proteomics-detected proteins provide evidence for the validity of in silico gene models. Furthermore, the generated peptide data lend support to the validity of a number of proteins currently in the proposed model stage. By integrating genomic annotation information with experimentally identified metabolites and proteins, we constructed a draft metabolic network for Chlamydomonas. Computational metabolic modeling allowed an identification of missing enzymatic links. Some experimentally detected metabolites are not producible by the currently known and annotated enzyme set, thus suggesting entry points for further targeted gene discovery or biochemical pathway research. All data sets are made available as supplementary material as well as web-accessible databases and within the functional context via the Chlamydomonas-adapted MapMan annotation platform. Information of identified peptides is also available directly via the JGI-Chlamydomonas genomic resource database (http://genome.jgi-psf.org/Chlre3/Chlre3.home.html). 相似文献

16.

Computational systems approach towards phosphoproteomics and their downstream regulation

Di Xiao Carissa Chen Pengyi Yang 《Proteomics》2023,23(3-4):2200068

相似文献

17.

Protein function from sequence and structure data

Domingues FS Lengauer T 《Applied bioinformatics》2003,2(1):3-12

With the large amount of genomics and proteomics data that we are confronted with, computational support for the elucidation of protein function becomes more and more pressing. Many different kinds of biological data harbour signals of protein function, but these signals are often concealed. Computational methods that use protein sequence and structure data can be used for discovering these signals. They provide information that can substantially speed up experimental function elucidation. In this review we concentrate on such methods. 相似文献

18.

Basic microarray analysis: grouping and feature reduction 总被引：10，自引：0，他引：10

Raychaudhuri S Sutphin PD Chang JT Altman RB 《Trends in biotechnology》2001,19(5):189-193

DNA microarray technologies are useful for addressing a broad range of biological problems - including the measurement of mRNA expression levels in target cells. These studies typically produce large data sets that contain measurements on thousands of genes under hundreds of conditions. There is a critical need to summarize this data and to pick out the important details. The most common activities, therefore, are to group together microarray data and to reduce the number of features. Both of these activities can be done using only the raw microarray data (unsupervised methods) or using external information that provides labels for the microarray data (supervised methods). We briefly review supervised and unsupervised methods for grouping and reducing data in the context of a publicly available suite of tools called CLEAVER, and illustrate their application on a representative data set collected to study lymphoma. 相似文献

19.

Analysis and validation of proteomic data generated by tandem mass spectrometry 总被引：1，自引：0，他引：1

Nesvizhskii AI Vitek O Aebersold R 《Nature methods》2007,4(10):787-797

The analysis of the large amount of data generated in mass spectrometry-based proteomics experiments represents a significant challenge and is currently a bottleneck in many proteomics projects. In this review we discuss critical issues related to data processing and analysis in proteomics and describe available methods and tools. We place special emphasis on the elaboration of results that are supported by sound statistical arguments. 相似文献

20.

A face in the crowd: recognizing peptides through database search

Eng JK Searle BC Clauser KR Tabb DL 《Molecular & cellular proteomics : MCP》2011,10(11):R111.009522

Peptide identification via tandem mass spectrometry sequence database searching is a key method in the array of tools available to the proteomics researcher. The ability to rapidly and sensitively acquire tandem mass spectrometry data and perform peptide and protein identifications has become a commonly used proteomics analysis technique because of advances in both instrumentation and software. Although many different tandem mass spectrometry database search tools are currently available from both academic and commercial sources, these algorithms share similar core elements while maintaining distinctive features. This review revisits the mechanism of sequence database searching and discusses how various parameter settings impact the underlying search. 相似文献