Similar Articles (20 results)
1.
High-throughput genome sequencing continues to accelerate the rate at which complete genomes are available for biological research. Many of these new genome sequences have little or no genome annotation currently available and hence rely upon computational predictions of protein coding genes. Evidence of translation from proteomic techniques could facilitate experimental validation of protein coding genes, but the techniques for whole genome searching with MS/MS data have not been adequately developed to date. Here we describe GENQUEST, a novel method using peptide isoelectric focusing and accurate mass to greatly reduce the peptide search space, making fast, accurate, and sensitive whole human genome searching possible on common desktop computers. In an initial experiment, almost all exonic peptides identified in a protein database search were identified when searching genomic sequence. Many peptides identified exclusively in the genome searches were incorrectly identified or could not be experimentally validated, highlighting the importance of orthogonal validation. Experimentally validated peptides exclusive to the genomic searches can be used to reannotate protein coding genes. GENQUEST represents an experimental tool that can be used by the proteomics community at large for validating computational approaches to genome annotation.
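To make the search-space reduction concrete, here is a minimal Python sketch of a two-dimensional peptide filter of the kind GENQUEST describes: candidate peptides from a translated genome are indexed by monoisotopic mass and predicted isoelectric point, so a query carrying an accurate precursor mass and an observed pI fraction touches only a small slice of the whole-genome peptide space. The charge model, pKa values, and tolerances below are illustrative assumptions, not the published algorithm.

```python
from bisect import bisect_left, bisect_right

# Monoisotopic residue masses (Da); water is added once per peptide.
RESIDUE_MASS = {
    'G': 57.02146, 'A': 71.03711, 'S': 87.03203, 'P': 97.05276,
    'V': 99.06841, 'T': 101.04768, 'C': 103.00919, 'L': 113.08406,
    'I': 113.08406, 'N': 114.04293, 'D': 115.02694, 'Q': 128.05858,
    'K': 128.09496, 'E': 129.04259, 'M': 131.04049, 'H': 137.05891,
    'F': 147.06841, 'R': 156.10111, 'Y': 163.06333, 'W': 186.07931,
}
WATER = 18.01056

# Illustrative side-chain pKa values for a Henderson-Hasselbalch pI estimate.
PKA = {'D': 3.65, 'E': 4.25, 'C': 8.3, 'Y': 10.07, 'H': 6.0, 'K': 10.53, 'R': 12.48}

def peptide_mass(seq: str) -> float:
    return sum(RESIDUE_MASS[a] for a in seq) + WATER

def net_charge(seq: str, ph: float) -> float:
    charge = 1.0 / (1.0 + 10 ** (ph - 9.0))       # N-terminus
    charge -= 1.0 / (1.0 + 10 ** (3.55 - ph))     # C-terminus
    for aa in seq:
        if aa in ('K', 'R', 'H'):
            charge += 1.0 / (1.0 + 10 ** (ph - PKA[aa]))
        elif aa in ('D', 'E', 'C', 'Y'):
            charge -= 1.0 / (1.0 + 10 ** (PKA[aa] - ph))
    return charge

def predict_pi(seq: str) -> float:
    lo, hi = 0.0, 14.0
    while hi - lo > 0.01:                         # bisection on net charge
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if net_charge(seq, mid) > 0 else (lo, mid)
    return (lo + hi) / 2

def build_index(candidates):
    """Sort candidate peptides by mass for fast tolerance lookups."""
    indexed = sorted((peptide_mass(p), predict_pi(p), p) for p in candidates)
    return indexed, [m for m, _, _ in indexed]

def query(indexed, masses, obs_mass, obs_pi, ppm=10.0, pi_tol=0.5):
    """Return only candidates inside the mass tolerance AND the pI window."""
    tol = obs_mass * ppm / 1e6
    lo = bisect_left(masses, obs_mass - tol)
    hi = bisect_right(masses, obs_mass + tol)
    return [p for m, pi, p in indexed[lo:hi] if abs(pi - obs_pi) <= pi_tol]

# Example: index a few hypothetical candidates, then query one observation.
idx, masses = build_index(["PEPTIDER", "SAMPLEK", "GENQMESTK"])
print(query(idx, masses, obs_mass=peptide_mass("SAMPLEK"), obs_pi=predict_pi("SAMPLEK")))
```

Because both filters are applied before any spectrum scoring, the expensive MS/MS matching step only ever sees the handful of peptides that survive, which is what makes desktop-scale whole-genome searching plausible.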

2.
Mass spectrometry is a technique widely employed for the identification and characterization of proteins. Bioinformatics plays a fundamental role in processing mass spectrometry data because of the sheer volume of data the technique can produce. To process these data efficiently, new software packages and algorithms are continuously being developed to improve protein identification and characterization in terms of throughput and statistical accuracy. However, many limitations remain in the bioinformatic processing of spectral data. This review aims to critically cover recent and future developments in bioinformatics approaches to mass spectrometry data analysis for proteomics studies.

4.
Proteomic analysis of biological samples plays an increasing role in modern research. Although the application of proteomics technologies varies across many disciplines, proteomics is largely a discovery tool that leads to novel hypotheses. In recent years, new methods and technologies have been developed and applied in many areas of proteomics, and there is a strong push towards using proteomics in a quantitative manner. Indeed, mass spectrometry-based quantitative proteomics approaches have been applied with great success in a variety of biochemical studies. In particular, quantitative proteomics provides new insights into protein complexes and post-translational modifications, two biochemical systems of central importance.

5.
The recent improvements in mass spectrometry instruments and new analytical methods are increasing the intersection between proteomics and big data science. In addition, bioinformatics analysis is becoming increasingly complex, involving multiple algorithms and tools. A wide variety of methods and software tools have been developed for computational proteomics and metabolomics during recent years, and this trend is likely to continue. However, most computational proteomics and metabolomics tools are designed as single-tier software applications in which the analysis tasks cannot be distributed, limiting the scalability and reproducibility of the data analysis. In this paper the key steps of metabolomics and proteomics data processing, including the main tools and software used to perform the data analysis, are summarized. The combination of software containers with workflow environments for large-scale metabolomics and proteomics analysis is discussed. Finally, a new approach for reproducible and large-scale data analysis based on BioContainers and two of the most popular workflow environments, Galaxy and Nextflow, is introduced to the proteomics and metabolomics communities.
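As a minimal illustration of the container idea, the Python sketch below shells out to Docker so that one analysis step runs inside a pinned image, which is what makes the step reproducible across machines. The image tag and tool command line are hypothetical placeholders, not a real BioContainers invocation.

```python
import subprocess
from pathlib import Path

def run_in_container(image: str, command: list, workdir: Path) -> None:
    """Run one analysis step inside a pinned container image so the exact
    tool version travels with the workflow (the idea behind BioContainers)."""
    subprocess.run(
        ["docker", "run", "--rm",
         "-v", f"{workdir.resolve()}:/data", "-w", "/data",
         image, *command],
        check=True,
    )

# Hypothetical image tag and tool invocation, for illustration only.
run_in_container(
    image="biocontainers/some-search-engine:v1.0",
    command=["search", "--input", "spectra.mzML", "--db", "human.fasta"],
    workdir=Path("."),
)
```

Workflow engines such as Galaxy and Nextflow play the role of the orchestrating script here, chaining many such containerized steps and distributing them across compute resources.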

6.
Recently a number of computational approaches have been developed for the prediction of protein–protein interactions. Complete genome sequencing projects have provided the vast amount of information needed for these analyses. These methods utilize the structural, genomic, and biological context of proteins and genes in complete genomes to predict protein interaction networks and functional linkages between proteins. Given that experimental techniques remain expensive, time-consuming, and labor-intensive, these methods represent an important advance in proteomics. Some of these approaches utilize sequence data alone to predict interactions, while others combine multiple computational and experimental datasets to accurately build protein interaction maps for complete genomes. These methods represent a complementary approach to current high-throughput projects whose aim is to delineate protein interaction maps in complete genomes. We will describe a number of computational protocols for protein interaction prediction based on the structural, genomic, and biological context of proteins in complete genomes, and detail methods for protein interaction network visualization and analysis.
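One genomic-context signal of the kind these methods exploit is phylogenetic profiling: proteins that are consistently present or absent together across complete genomes tend to be functionally linked. The sketch below scores toy presence/absence profiles with Jaccard similarity; the protein names and profiles are invented for illustration.

```python
from itertools import combinations

# Presence (1) / absence (0) of each protein across a panel of sequenced
# genomes -- toy data standing in for real ortholog-detection results.
profiles = {
    "proteinA": [1, 1, 0, 1, 0, 1, 1, 0],
    "proteinB": [1, 1, 0, 1, 0, 1, 0, 0],
    "proteinC": [0, 0, 1, 0, 1, 0, 0, 1],
}

def jaccard(p, q):
    both = sum(1 for a, b in zip(p, q) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(p, q) if a == 1 or b == 1)
    return both / either if either else 0.0

# Rank protein pairs by profile similarity; high scores suggest a
# functional linkage worth testing experimentally.
pairs = sorted(
    ((jaccard(profiles[x], profiles[y]), x, y)
     for x, y in combinations(profiles, 2)),
    reverse=True,
)
for score, x, y in pairs:
    print(f"{x} -- {y}: {score:.2f}")
```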

7.
The objective of proteomics is to obtain an overview of the proteins expressed at a given point in time in a given tissue and to connect that profile to the biochemical status of the tissue. Sample throughput and analysis time are therefore important issues in proteomics. Proteomics aims to pinpoint the identity of proteins of interest, but the overall relationships among proteins must also be explained. Classical proteomics consists of separation and characterization, based on two-dimensional electrophoresis, trypsin digestion, mass spectrometry, and database searching. Characterization involves labor-intensive work to manage, handle, and analyze data. The field of classical proteomics should therefore be extended to include objective handling of large datasets. The separation obtained by two-dimensional electrophoresis and mass spectrometry generates huge amounts of data. We present a multivariate approach to data handling in proteomics with the advantage that protein patterns can be spotted at an early stage, so the proteins selected for sequencing can be chosen intelligently. These methods can also be applied to other data-generating protein analysis methods, such as mass spectrometry and near-infrared spectroscopy, and examples of application to these techniques are also presented. Multivariate data analysis can unravel complicated data structures and may thereby relieve the characterization phase of classical proteomics. Traditional statistical methods are not suitable for analyzing data in which the number of variables exceeds the number of objects; multivariate data analysis, on the other hand, can uncover the hidden structures present in such data. This study takes its starting point in classical proteomics and shows how multivariate data analysis can lead to faster ways of finding interesting proteins. Multivariate analysis has shown interesting results as a supplement to classical proteomics and has added a new dimension to the field.
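A common multivariate workhorse for this "more variables than objects" regime is principal component analysis. The sketch below uses a simulated gel-spot intensity matrix as a stand-in for real 2-DE data and shows how sample scores and spot loadings could be used to pick proteins worth sequencing; the data and dimensions are assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Toy 2-DE spot-intensity matrix: 12 gels (objects) x 500 spots (variables),
# i.e. far more variables than objects, as the abstract describes.
rng = np.random.default_rng(0)
X = rng.lognormal(mean=2.0, sigma=0.5, size=(12, 500))

pca = PCA(n_components=3)
scores = pca.fit_transform(StandardScaler().fit_transform(X))

# Score coordinates reveal sample groupings; loadings point to the spots
# (candidate proteins) that drive those groupings and merit sequencing.
print("variance explained:", np.round(pca.explained_variance_ratio_, 3))
print("top 5 spots on PC1:", np.argsort(-np.abs(pca.components_[0]))[:5])
```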

8.
9.
Nesvizhskii AI. Proteomics 2012, 12(10):1639-1655
Analysis of protein interaction networks and protein complexes using affinity purification and mass spectrometry (AP/MS) is among the most commonly used and successful applications of proteomics technologies. One of the foremost challenges of AP/MS data is the large number of false-positive protein interactions present in unfiltered data sets. Here we review computational and informatics strategies for detecting specific protein interaction partners in AP/MS experiments, with a focus on incomplete (as opposed to genome-wide) interactome mapping studies. These strategies range from standard statistical approaches, to empirical scoring schemes optimized for a particular type of data, to advanced computational frameworks. The common denominator among these methods is the use of label-free quantitative information, such as spectral counts or integrated peptide intensities, that can be extracted from AP/MS data. We also discuss related issues such as combining multiple biological or technical replicates and dealing with data generated using different tagging strategies. Computational approaches for benchmarking scoring methods are discussed, and the need to generate reference AP/MS data sets is highlighted. Finally, we discuss the possibility of more extended modeling of experimental AP/MS data, including integration with external information such as protein interaction predictions based on functional genomics data.
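In the spirit of the empirical, label-free scoring schemes reviewed here (but not reproducing any specific published tool), the sketch below ranks prey proteins by the log2 fold change of their mean spectral counts in bait purifications over negative-control purifications, so that sticky background proteins score near zero. All counts and protein names are hypothetical.

```python
import math

def enrichment_score(bait_counts, control_counts, pseudo=0.5):
    """Score each prey by log2 fold change of mean spectral counts in bait
    pull-downs over controls -- a simplified empirical scoring scheme."""
    scores = {}
    for prey in set(bait_counts) | set(control_counts):
        bait = bait_counts.get(prey, [])
        ctrl = control_counts.get(prey, [])
        mean_bait = sum(bait) / max(len(bait), 1)
        mean_ctrl = sum(ctrl) / max(len(ctrl), 1)
        scores[prey] = math.log2((mean_bait + pseudo) / (mean_ctrl + pseudo))
    return scores

# Hypothetical spectral counts across replicate pull-downs.
bait = {"preyX": [24, 31, 19], "preyY": [3, 2, 4], "HSP70": [40, 38, 45]}
ctrl = {"preyY": [2, 3], "HSP70": [41, 39]}   # sticky background proteins

for prey, s in sorted(enrichment_score(bait, ctrl).items(), key=lambda kv: -kv[1]):
    print(f"{prey}: {s:+.2f}")
```

Here preyX, absent from the controls, scores highly, while the abundant but nonspecific HSP70 collapses to roughly zero, which is exactly the separation these scoring schemes aim for.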

10.
Mass spectrometry-driven proteomics is increasingly relying on quantitative analyses for biological discoveries. As a result, different methods and algorithms have been developed to perform relative or absolute quantification based on mass spectrometry data. Among the most popular quantification methods are the so-called label-free approaches, which require no special sample processing and can even be applied retroactively to existing data sets. Of these label-free methods, the MS/MS-based approaches are most often applied, mainly because of their inherent simplicity compared to MS-based methods. The main application of these approaches is the determination of relative protein amounts between different samples, expressed as protein ratios. However, as we demonstrate here, the protein ratio sets obtained from the various MS/MS-based label-free methods suffer from reproducibility problems across replicates, indicating that the existing methods are not optimally robust. We therefore present two new methods (called RIBAR and xRIBAR) that use the available MS/MS data more effectively, achieving increased robustness. Both the accuracy and the precision of our novel methods are analyzed and compared to the existing methods to illustrate the increased robustness of our new methods over existing ones.
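For readers unfamiliar with the MS/MS-based label-free family, the sketch below computes one standard member of it, normalized spectral abundance factors (NSAF), and the between-sample protein ratios whose reproducibility the abstract examines. This is a generic illustration, not the RIBAR or xRIBAR algorithm; the counts and protein lengths are invented.

```python
def nsaf(spectral_counts, lengths):
    """Normalized spectral abundance factor: spectral counts scaled by
    protein length, then normalized so the NSAF values sum to 1 per sample."""
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    return {p: v / total for p, v in saf.items()}

# Hypothetical counts for two samples and protein lengths (residues).
lengths = {"P1": 450, "P2": 900, "P3": 210}
sample_a = nsaf({"P1": 30, "P2": 60, "P3": 5}, lengths)
sample_b = nsaf({"P1": 10, "P2": 62, "P3": 20}, lengths)

# Protein ratios between samples: the quantity whose replicate-to-replicate
# reproducibility the abstract examines.
for p in lengths:
    print(f"{p}: ratio A/B = {sample_a[p] / sample_b[p]:.2f}")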

11.
Liquid chromatography (LC) coupled to electrospray mass spectrometry (MS) is well established in high-throughput proteomics. The technology enables the identification of large numbers of proteins in a relatively short time. Comparative quantification of identified proteins from different samples is often regarded as the next step in proteomics experiments, enabling the comparison of protein expression in different proteomes. Differential labeling of samples using stable isotope incorporation or conjugation is commonly used to compare protein levels between samples, but these procedures are difficult to carry out in the laboratory, especially for large numbers of samples. Recently, comparative quantification of label-free LC(n)-MS proteomics data has emerged as an alternative approach. In this review, we discuss different computational approaches for extracting comparative quantitative information from label-free LC(n)-MS proteomics data. The procedure for computationally recovering the quantitative information is described. Furthermore, statistical tests used to evaluate the relevance of the results are also discussed.
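Once features have been aligned across runs, the statistical step the review mentions can be as simple as a two-sample test on log-transformed feature intensities. The sketch below applies Welch's t-test to hypothetical intensities for a single LC-MS feature measured in replicate runs of two conditions.

```python
import math
from scipy import stats

# Hypothetical aligned intensities for one LC-MS feature (one peptide ion)
# measured in replicate runs of two conditions.
condition_a = [1.2e6, 1.4e6, 1.1e6, 1.3e6]
condition_b = [2.6e6, 2.9e6, 2.4e6, 2.8e6]

# Welch's t-test on log2 intensities -- one common way to judge whether a
# label-free feature differs between the two proteomes.
log_a = [math.log2(x) for x in condition_a]
log_b = [math.log2(x) for x in condition_b]
t_stat, p_value = stats.ttest_ind(log_a, log_b, equal_var=False)
fold = sum(log_b) / len(log_b) - sum(log_a) / len(log_a)
print(f"log2 fold change = {fold:.2f}, p = {p_value:.4f}")
```

In practice this test is run per feature across thousands of features, which is why multiple-testing correction accompanies it in real pipelines.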

12.
The emerging field of systems biology seeks to develop novel approaches to integrate heterogeneous data sources for effective analysis of complex living systems. Systemic studies of mitochondria have generated a large number of proteomic data sets in numerous species, including yeast, plant, mouse, rat, and human. Beyond component identification, mitochondrial proteomics is recognized as a powerful tool for diagnosing and characterizing complex diseases associated with these organelles. Various proteomic techniques for isolation and purification of proteins have been developed, each tailored to preserve protein properties relevant to the study of a particular disease type. Examples of such techniques include immunocapture, which minimizes loss of posttranslational modifications; 4-iodobutyltriphenylphosphonium labeling, which quantifies protein redox states; and surface-enhanced laser desorption ionization-time-of-flight mass spectrometry, which allows sequence-specific binding. With the rapidly increasing number of discovered molecular components, computational models are also being developed to facilitate the organization and analysis of such data. Computational models of mitochondria have been built with top-down and bottom-up approaches and have been steadily improved in size and scope. Results from top-down methods tend to be more qualitative but are unbiased by prior knowledge about the system. Bottom-up methods often require the incorporation of a large amount of existing data but provide more rigorous and quantitative information, which can serve as hypotheses for subsequent experimental studies. Successes and limitations of the studies reviewed here present opportunities and challenges that must be addressed to facilitate the application of systems biology to larger systems.

Keywords: constraint-based modeling; kinetics-based modeling; data integration; standards; bioinformatics

13.
Shen C, Li L, Chen JY. Proteins 2006, 64(2):436-443
Experimental processes to collect and process proteomics data are increasingly complex, while the computational methods to assess the quality and significance of these data remain unsophisticated. These challenges have led to many biological oversights and computational misconceptions. We developed an empirical Bayes model to analyze multiprotein complex (MPC) proteomics data derived from peptide mass spectrometry detections in purified protein complex pull-down experiments. Using our model and two yeast proteomics data sets, we estimated that there should be an average of about 20 true associations per MPC, almost 10 times as many as previously estimated. For data sets generated to mimic a real proteome, our model achieved on average 80% sensitivity in detecting true associations, compared with the 3% sensitivity of previous work, while maintaining a comparable false discovery rate of 0.3%. Cross-examination of our results with protein complexes confirmed by various experimental techniques demonstrates that our method identifies many true associations that the previous approach cannot.
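The flavor of such a model can be conveyed with a toy two-component binomial mixture fit by EM: each candidate protein is detected in k of n pull-downs, and the posterior weight for the high-rate component plays the role of the probability of a true association. This is a stand-in for, not a reproduction of, the paper's empirical Bayes model; all counts are invented.

```python
import math

def binom_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def em_two_binomial(counts, n, iters=200):
    """Fit a two-component binomial mixture to detection counts: a background
    component (chance co-purification) and a signal component (true
    association). A toy stand-in for the paper's empirical Bayes model."""
    pi, p0, p1 = 0.5, 0.05, 0.5                   # initial guesses
    for _ in range(iters):
        # E-step: posterior that each protein is a true association.
        post = [pi * binom_pmf(k, n, p1) /
                (pi * binom_pmf(k, n, p1) + (1 - pi) * binom_pmf(k, n, p0))
                for k in counts]
        # M-step: update the mixing weight and the two detection rates.
        pi = sum(post) / len(post)
        p1 = sum(w * k for w, k in zip(post, counts)) / (n * sum(post) + 1e-12)
        p0 = (sum((1 - w) * k for w, k in zip(post, counts))
              / (n * (len(post) - sum(post)) + 1e-12))
    return pi, p0, p1, post

# Hypothetical: how often each protein was detected in n = 10 repeated
# pull-downs of the same complex.
counts = [0, 1, 0, 2, 1, 0, 9, 8, 10, 1, 0, 7]
pi, p0, p1, post = em_two_binomial(counts, n=10)
for k, w in zip(counts, post):
    print(f"detected {k}/10 -> P(true association) = {w:.2f}")
```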

14.

Background

Quantitative proteomics holds great promise for identifying proteins that are differentially abundant between populations representing different physiological or disease states. A range of computational tools is now available for both isotopically labeled and label-free liquid chromatography mass spectrometry (LC-MS) based quantitative proteomics. However, these tools are generally not comparable to each other in terms of functionality, user interface, or information input/output, and they do not readily facilitate appropriate statistical data analysis. These limitations, along with the array of choices, present a daunting prospect for biologists and other researchers not trained in bioinformatics who wish to use LC-MS-based quantitative proteomics.

Results

We have developed Corra, a computational framework and set of tools for discovery-based LC-MS proteomics. Corra extends and adapts existing algorithms used for LC-MS-based proteomics, as well as statistical algorithms originally developed for microarray data analysis that are appropriate for LC-MS data. Corra also adapts software engineering technologies (e.g. Google Web Toolkit, distributed processing) so that computationally intense data processing and statistical analyses run on a remote server, while the user controls and manages the process from their own computer via a simple web interface. Corra also allows the user to output significantly differentially abundant LC-MS-detected peptide features in a form compatible with subsequent sequence identification via tandem mass spectrometry (MS/MS). We present two case studies to illustrate the application of Corra to commonly performed LC-MS-based biological workflows: a pilot biomarker discovery study of glycoproteins isolated from human plasma samples relevant to type 2 diabetes, and a study in yeast to identify in vivo targets of the protein kinase Ark1 via phosphopeptide profiling.

Conclusion

The Corra computational framework leverages computational innovation to let biologists and other researchers process, analyze and visualize LC-MS data using what would otherwise be a complex and unfriendly suite of tools. Corra enables appropriate statistical analyses, with controlled false-discovery rates, ultimately to inform subsequent targeted identification of differentially abundant peptides by MS/MS. For the user not trained in bioinformatics, Corra represents a complete, customizable, free and open-source computational platform enabling LC-MS-based proteomic workflows, and as such it addresses an unmet need in the LC-MS proteomics field.

15.
Function prediction of uncharacterized protein sequences generated by genome projects has emerged as an important focus for computational biology. We have categorized several approaches beyond traditional sequence similarity that utilize the overwhelmingly large amounts of available data for computational function prediction, including structure-, association (genomic context)-, interaction (cellular context)-, process (metabolic context)-, and proteomics-experiment-based methods. Because they incorporate structural and experimental data that is not used in sequence-based methods, they can provide additional accuracy and reliability to protein function prediction. Here, first we review the definition of protein function. Then the recent developments of these methods are introduced with special focus on the type of predictions that can be made. The need for further development of comprehensive systems biology techniques that can utilize the ever-increasing data presented by the genomics and proteomics communities is emphasized. For the readers' convenience, tables of useful online resources in each category are included. The role of computational scientists in the near future of biological research and the interplay between computational and experimental biology are also addressed.

16.
Hjerrild M, Gammeltoft S. FEBS Letters 2006, 580(20):4764-4770
Protein phosphorylation is important for the regulation of most biological functions, and up to 50% of all proteins are thought to be modified by protein kinases. Increased knowledge about the potential phosphorylation of a protein may improve our understanding of the molecular processes in which it takes part. Despite the importance of protein phosphorylation, the identification of phosphoproteins and the localization of phosphorylation sites are still major challenges in proteomics. However, high-throughput methods for the identification of phosphoproteins are being developed, in particular within the fields of bioinformatics and mass spectrometry. In this review, we present a toolbox of current technology applied in phosphoproteomics, including computational prediction, chemical approaches, and mass spectrometry-based analysis, and propose an integrated strategy for experimental phosphoproteomics.
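The computational-prediction corner of that toolbox can be illustrated at its simplest by scanning a sequence for kinase consensus motifs. The regex-based sketch below looks for proline-directed (S/T-P) and PKA-like (R-R-x-S/T) motifs; real predictors use far richer models, and the substrate sequence here is invented.

```python
import re

# Toy consensus motifs for two familiar kinase classes. Lookaheads allow
# overlapping matches; this only illustrates the idea of motif scanning.
MOTIFS = {
    "proline-directed (CDK/MAPK-like) S/T-P": re.compile(r"(?=([ST]P))"),
    "basophilic (PKA-like) R-R-x-S/T":        re.compile(r"(?=(RR.[ST]))"),
}

def scan_phosphosites(seq: str):
    hits = []
    for name, motif in MOTIFS.items():
        for m in motif.finditer(seq):
            # Report the 1-based position where the motif begins.
            hits.append((m.start() + 1, name))
    return sorted(hits)

# Hypothetical substrate sequence.
for pos, kinase in scan_phosphosites("MKRRLSTPEEARRGSLV"):
    print(f"position {pos}: candidate site, {kinase}")
```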

17.
Reliable statistical validation of peptide and protein identifications is a top priority in large-scale mass spectrometry based proteomics. PeptideProphet is one of the computational tools commonly used for assessing the statistical confidence in peptide assignments to tandem mass spectra obtained using database search programs such as SEQUEST, MASCOT, or X! TANDEM. We present two flexible methods, the variable component mixture model and the semiparametric mixture model, that remove the restrictive parametric assumptions of the mixture modeling approach of PeptideProphet. Using a control protein mixture data set generated on a linear ion trap Fourier transform (LTQ-FT) mass spectrometer, we demonstrate that both methods improve on parametric models in terms of the accuracy of probability estimates and the power to detect correct identifications while controlling the false discovery rate to the same degree. The statistical approaches presented here require that the data set contain a sufficient number of decoy (known to be incorrect) peptide identifications, which can be obtained using the target-decoy database search strategy.
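The target-decoy strategy the authors rely on can be sketched in a few lines: above any score threshold, the number of decoy hits estimates the number of incorrect target hits, so the decoy/target ratio estimates the FDR. The scores below are invented, and the simple ratio (without monotonization or the +1 correction some tools apply) is an illustrative assumption.

```python
def fdr_threshold(target_scores, decoy_scores, fdr_max=0.01):
    """Find the lowest score threshold at which the estimated FDR
    (decoy hits / target hits above threshold) stays under fdr_max --
    the target-decoy logic underlying the validation methods described."""
    best = None
    for t in sorted(set(target_scores) | set(decoy_scores), reverse=True):
        n_target = sum(s >= t for s in target_scores)
        n_decoy = sum(s >= t for s in decoy_scores)
        if n_target and n_decoy / n_target <= fdr_max:
            best = t
    return best

# Hypothetical search-engine scores for target and decoy peptide hits.
targets = [5.1, 4.8, 4.2, 3.9, 3.1, 2.7, 2.2, 1.9, 1.4, 1.1]
decoys  = [2.4, 2.0, 1.8, 1.5, 1.2, 1.0, 0.9, 0.8, 0.7, 0.6]
print("score cutoff at 10% FDR:", fdr_threshold(targets, decoys, fdr_max=0.10))
```

The mixture models the abstract proposes go a step further, converting such decoy counts into per-peptide posterior probabilities rather than a single global cutoff.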

18.
In the past several years, proteomics and its subdiscipline clinical proteomics have been engaged in the discovery of the next generation of protein biomarkers. As the effort, and the intensive debate it has sparked, continue, it is becoming apparent that a paradigm shift is needed in proteomics in order to truly comprehend the complexity of the human proteome and assess its subtle variations among individuals. This review introduces the concept of population proteomics as a future direction in proteomics research. Population proteomics is the study of protein diversity in human populations. High-throughput, top-down mass spectrometric approaches are employed to investigate, define and understand protein diversity and modulations across and within populations. Population proteomics is a discovery-oriented endeavor whose goal is to establish the incidence of protein structural variations and the quantitative regulation of these modifications. Assessing human protein variation among and within populations is viewed as a paramount undertaking that can facilitate clinical proteomics' effort to discover and validate protein features that can be used as markers for early diagnosis of disease, monitoring of disease progression and assessment of therapy. This review outlines the growing need for analyzing individuals' proteomes and describes the approaches that are likely to be applied in such a population proteomics endeavor.

19.
Since the publication of the human genome, two key points have emerged. First, it is still not certain which regions of the genome code for proteins. Second, the number of discrete protein-coding genes is far fewer than the number of different proteins. Proteomics has the potential to address some of these postgenomic issues if the obstacles that we face can be overcome in our efforts to combine proteomic and genomic data. There are many challenges associated with high-throughput and high-output proteomic technologies. Consequently, for proteomics to continue at its current growth rate, new approaches must be developed to ease data management and data mining. Initiatives have been launched to develop standard data formats for exchanging mass spectrometry proteomic data, including the Proteomics Standards Initiative formed by the Human Proteome Organization. Databases such as SwissProt and Uniprot are publicly available repositories for protein sequences annotated for function, subcellular location and known potential post-translational modifications. The availability of bioinformatics solutions is crucial for proteomics technologies to fulfil their promise of adding further definition to the functional output of the human genome. The aim of the Oxford Genome Anatomy Project is to provide a framework for integrating molecular, cellular, phenotypic and clinical information with experimental genetic and proteomics data. This perspective also discusses models to make the Oxford Genome Anatomy Project accessible and beneficial for academic and commercial research and development.

20.
Manual analysis of mass spectrometry data is a current bottleneck in high throughput proteomics. In particular, the need to manually validate the results of mass spectrometry database searching algorithms can be prohibitively time-consuming. The development of software tools that attempt to quantify the confidence in the assignment of a protein or peptide identity to a mass spectrum is an area of active interest. We sought to extend work in this area by investigating the potential of recent machine learning algorithms to improve the accuracy of these approaches and to serve as a flexible framework for accommodating new data features. Specifically, we demonstrated the ability of boosting and random forest approaches to improve the discrimination of true hits from false-positive identifications in the results of mass spectrometry database search engines, compared with thresholding and other machine learning approaches. We accommodated additional attributes obtainable from database search results, including a factor addressing proton mobility. Performance was evaluated using publicly available electrospray data and a new collection of MALDI data generated from purified human reference proteins.
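A minimal version of this discrimination task is sketched below: a random forest is trained on a simulated feature matrix whose columns mimic the kinds of attributes the abstract names (search score, delta score to the runner-up, precursor mass error, a proton-mobility factor). The simulated distributions are assumptions made purely so the example runs end to end.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Simulated search-result features for correct (1) and incorrect (0) hits:
# columns = search score, delta score, mass error (ppm), mobility factor.
rng = np.random.default_rng(1)
n = 400
correct = np.column_stack([rng.normal(4.0, 0.8, n), rng.normal(0.4, 0.10, n),
                           rng.normal(0.0, 2.0, n), rng.normal(1.0, 0.30, n)])
incorrect = np.column_stack([rng.normal(2.0, 0.8, n), rng.normal(0.1, 0.05, n),
                             rng.normal(0.0, 8.0, n), rng.normal(0.5, 0.30, n)])
X = np.vstack([correct, incorrect])
y = np.array([1] * n + [0] * n)

# A random forest discriminates true hits from false positives; the paper
# compares this family of learners with boosting and simple thresholding.
clf = RandomForestClassifier(n_estimators=200, random_state=0)
print("cross-validated accuracy:", cross_val_score(clf, X, y, cv=5).mean().round(3))
```

Because tree ensembles accept heterogeneous features without rescaling, adding a new attribute such as the proton-mobility factor is just one more column, which is the flexibility the paper emphasizes.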
