Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
The Human Proteome Organization's Proteomics Standards Initiative (PSI) promotes the development of exchange standards to improve data integration and interoperability. PSI specifies the suitable level of detail required when reporting a proteomics experiment (via the Minimum Information About a Proteomics Experiment), and provides extensible markup language (XML) exchange formats and dedicated controlled vocabularies (CVs) that must be combined to generate a standard‐compliant document. The framework presented here tackles the issue of checking that experimental data reported using a specific format, CVs and public bio‐ontologies (e.g. Gene Ontology, NCBI taxonomy) are compliant with the Minimum Information About a Proteomics Experiment recommendations. The semantic validator not only checks the XML syntax but also enforces rules regarding the use of an ontology class or CV terms by checking that the terms exist in the resource and that they are used in the correct location of a document. Moreover, this framework is extremely fast, even on sizable data files, and flexible, as it can be adapted to any standard by customizing the parameters it requires: an XML Schema Definition, one or more CVs or ontologies, and a mapping file describing in a formal way how the semantic resources and the format are interrelated. As such, the validator provides a general solution to a common problem in data exchange: how to validate the correct usage of a data standard beyond simple XML Schema Definition validation. The framework source code and its various applications can be found at http://psidev.info/validator.
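The two semantic checks described in this abstract (the term exists in the vocabulary, and the term is used in the correct location) can be illustrated with a minimal sketch. It is not the PSI validator's code or file format: the accessions (`EX:0001`, `EX:0002`), the element names, and the dictionary-based mapping rules are all hypothetical stand-ins for a real CV and mapping file.

```python
import xml.etree.ElementTree as ET

# Hypothetical controlled vocabulary: accession -> term name (illustrative only).
CV_TERMS = {"EX:0001": "instrument model", "EX:0002": "dissociation method"}

# Hypothetical mapping rules: element path -> accessions allowed at that location.
MAPPING_RULES = {
    "./instrument/cvParam": {"EX:0001"},
    "./activation/cvParam": {"EX:0002"},
}

def validate(xml_text):
    """Return messages for CV terms that are unknown or used in the wrong place."""
    root = ET.fromstring(xml_text)
    messages = []
    for path, allowed in MAPPING_RULES.items():
        for element in root.findall(path):
            accession = element.get("accession")
            if accession not in CV_TERMS:
                # Check 1: the term must exist in the vocabulary.
                messages.append(f"{path}: unknown term {accession}")
            elif accession not in allowed:
                # Check 2: the term must be permitted at this location.
                messages.append(f"{path}: term {accession} not allowed here")
    return messages
```

An XML Schema check would accept any well-formed `cvParam` element; the sketch adds the layer on top, which is the part the abstract calls semantic validation.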

2.
Scherl A, Tsai YS, Shaffer SA, Goodlett DR. Proteomics 2008, 8(14):2791-2797
Although mass spectrometers are capable of providing high mass accuracy data, assignment of true monoisotopic precursor ion mass is complicated during data-dependent ion selection for LC-MS/MS analysis of complex mixtures. The complication arises when chromatographic peak widths for a given analyte exceed the time required to acquire a precursor ion mass spectrum. The result is that many measured monoisotopic masses are misassigned due to calculation from a single mass spectrum with poor ion statistics based on only a fraction of the total available ions for a given analyte. Such data in turn produce errors in automated database searches, where precursor m/z value is one search parameter. We propose here a postacquisition approach to correct misassigned monoisotopic m/z values that involves peak detection over the entire elution profile and correction of the precursor ion monoisotopic mass. As a result of using this approach to reprocess shotgun proteomic data we increased peptide sequence assignments by 10% while reducing the estimated false positive ratio from 1% to 0.2%. We also show that 4% of the salvaged identifications may be accounted for by correction of mixed tandem mass spectra resulting from fragmentation of multiple peptides simultaneously, a situation which we refer to as accidental CID.

3.
We introduce the computer tool “Know Your Samples” (KYSS) for assessment and visualisation of large scale proteomics datasets, obtained by mass spectrometry (MS) experiments. KYSS facilitates the evaluation of sample preparation protocols, LC peptide separation, and MS and MS/MS performance by monitoring the number of missed cleavages, precursor ion charge states, number of protein identifications and peptide mass error in experiments. KYSS generates several different protein profiles based on protein abundances, and allows for comparative analysis of multiple experiments. KYSS was adapted for blood plasma proteomics and provides concentrations of identified plasma proteins. We demonstrate the utility of the KYSS tool for MS-based proteome analysis of blood plasma and for assessment of hydrogel particles for depletion of abundant proteins in plasma. The KYSS software is open source and is freely available at http://kyssproject.github.io/.

4.
Haw R, Hermjakob H, D'Eustachio P, Stein L. Proteomics 2011, 11(18):3598-3613
Reactome (http://www.reactome.org) is an open-source, expert-authored, peer-reviewed, manually curated database of reactions, pathways and biological processes. We provide an intuitive web-based user interface to pathway knowledge and a suite of data analysis tools. The Pathway Browser is a Systems Biology Graphical Notation-like visualization system that supports manual navigation of pathways by zooming, scrolling and event highlighting, and that exploits PSI Common Query Interface web services to overlay pathways with molecular interaction data from the Reactome Functional Interaction Network and interaction databases such as IntAct, ChEMBL and BioGRID. Pathway and expression analysis tools employ web services to provide ID mapping, pathway assignment and over-representation analysis of user-supplied data sets. By applying Ensembl Compara to curated human proteins and reactions, Reactome generates pathway inferences for 20 other species. The Species Comparison tool provides a summary of results for each of these species as a table showing numbers of orthologous proteins found by pathway from which users can navigate to inferred details for specific proteins and reactions. Reactome's diverse pathway knowledge and suite of data analysis tools provide a platform for data mining, modeling and analysis of large-scale proteomics data sets. This Tutorial is part of the International Proteomics Tutorial Programme (IPTP 8).

5.
Normalized spectral index quantification was recently presented as an accurate method of label‐free quantitation, which improved spectral counting by incorporating the intensities of peptide MS/MS fragment ions into the calculation of protein abundance. We present SINQ, a tool implementing this method within the framework of existing analysis software, our freely available central proteomics facilities pipeline (CPFP). We demonstrate, using data sets of protein standards acquired on a variety of mass spectrometers, that SINQ can rapidly provide useful estimates of the absolute quantity of proteins present in a medium‐complexity sample. In addition, relative quantitation of standard proteins spiked into a complex lysate background and run without pre‐fractionation produces accurate results at amounts above 1 fmol on column. We compare quantitation performance to various precursor intensity‐ and identification‐based methods, including the normalized spectral abundance factor (NSAF), exponentially modified protein abundance index (emPAI), MaxQuant, and Progenesis LC‐MS. We anticipate that the SINQ tool will be a useful asset for core facilities and individual laboratories that wish to produce quantitative MS data, but lack the necessary manpower to routinely support more complicated software workflows. SINQ is freely available to obtain and use as part of the central proteomics facilities pipeline, which is released under an open‐source license.
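The idea of folding fragment ion intensities into a protein abundance estimate can be sketched as follows. The abstract does not give the formula, so the normalization used here is an assumption: each protein's summed fragment intensity is divided by the total over all proteins in the run and then by protein length. The function and argument names are hypothetical and this is not SINQ's code.

```python
def normalized_spectral_index(fragment_intensities, protein_lengths):
    """Sketch of an intensity-based abundance index.

    fragment_intensities: protein -> list of summed fragment ion intensities,
                          one value per MS/MS spectrum assigned to that protein.
    protein_lengths:      protein -> sequence length in residues.
    """
    # Spectral index: total fragment intensity observed for each protein.
    spectral_index = {p: sum(v) for p, v in fragment_intensities.items()}
    # Assumed normalization: fraction of the run total, then per residue.
    run_total = sum(spectral_index.values())
    return {
        p: spectral_index[p] / run_total / protein_lengths[p]
        for p in spectral_index
    }
```

Plain spectral counting would score a protein with two weak spectra above one with a single intense spectrum; an intensity-weighted index does not have to.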

6.
Proteomics has become dominated by large amounts of experimental data and interpreted results. These experimental data cannot be effectively used without understanding the fundamental structure of their information content and representing that information in such a way that knowledge can be extracted from it. This review explores the structure of this information with regard to three fundamental issues: the extraction of relevant information from raw data, the scale of the projects involved and the statistical significance of protein identification results.

7.
The Protein Information and Property Explorer 2 (PIPE2) is an enhanced software program and updated web application that aims to provide the proteomic researcher with a simple, intuitive user interface through which to begin inquiry into the biological significance of a list of proteins typically produced by MS/MS proteomic processing software. PIPE2 includes an improved interface, new data visualization options, and new data analysis methods for combining disparate, but related, data sets. In particular, PIPE2 has been enhanced to handle multi-dimensional data such as protein abundance, gene expression, and/or interaction data. The current architecture of PIPE2, modeled after that of Gaggle (a programming infrastructure for interoperability between separately developed software tools), contains independent functional units that can be instantiated and pieced together at the user's discretion to form a pipelined analysis workflow. Among these functional units is the Network Viewer component, which adds rich network analysis capabilities to the suite of existing proteomic web resources. Additionally, PIPE2 implements a framework within which new analysis procedures can be easily deployed and distributed over the World Wide Web. PIPE2 is available as a web service at http://pipe2.systemsbiology.net/.

8.
Absolute protein concentration determination is becoming increasingly important in a number of fields including diagnostics, biomarker discovery and systems biology modeling. The recently introduced quantification concatamer methodology provides a novel approach to performing such determinations, and it has been applied to both microbial and mammalian systems. While a number of software tools exist for performing analyses of quantitative data generated by related methodologies such as SILAC, there is currently no analysis package dedicated to the quantification concatamer approach. Furthermore, most tools that are currently available in the field of quantitative proteomics do not manage storage and dissemination of such data sets.

9.
Jorda J, Baudrand T, Kajava AV. Proteomics 2012, 12(9):1333-1336
Rapidly increasing genomic data present new challenges for scientists: making sense of millions of amino acid sequences requires a systematic approach and information about their 3D structure, function, and evolution. Over the last decade, numerous studies demonstrated the fundamental importance of protein tandem repeats and their involvement in human diseases. Bioinformatics analysis of these regions requires special computer programs and databases, since the conventional approaches predominantly developed for globular domains have limited success. To perform a global comparative analysis of protein tandem repeats, we developed the Protein Tandem Repeat DataBase (PRDB). PRDB is a curated database that includes the protein tandem repeats found in sequence databanks by the T‐REKS program. The database is available at http://bioinfo.montp.cnrs.fr/?r=repeatDB

10.
Tandem proteomic strategies based on large‐scale and high‐resolution mass spectrometry have been widely applied in various biomedical studies. However, protein sequence databases and proteomic software are continuously updated. Proteomic studies should not end with a fixed list of proteins; it is necessary and beneficial to revise the results regularly. Moreover, the original proteomic studies usually focus on a limited aspect of protein information, and valuable information may remain undiscovered in the raw spectra. Several studies have reported novel findings by reanalyzing previously published raw data. However, there are still no standard guidelines for comprehensive reanalysis. In the present study, we propose the concept and a draft framework for complementary proteomics, which aims to revise protein lists or mine new discoveries by revisiting published data.

11.
Selecting proteins with significant differential abundance is the cornerstone of many relative quantitative proteomics experiments. To do so, a trade‐off between p‐value thresholding and fold‐change thresholding can be tuned through a specific parameter, named the fudge factor and classically denoted s0. We have observed that this fudge factor is routinely diverted from its original (and statistically valid) use, leading to substantial distortion in the distribution of p‐values, jeopardizing the protein differential analysis, as well as the subsequent biological conclusion. In this article, we provide a comprehensive viewpoint on this issue, as well as some guidelines to circumvent it.
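How a fudge factor trades p‐value thresholding against fold‐change thresholding can be seen in a small sketch. It assumes the statistic in question has the SAM-style form d = (mean_a - mean_b) / (s + s0), where s is the pooled standard error; the abstract does not spell out the formula, and the function name and example values below are illustrative.

```python
import math
from statistics import mean, variance

def moderated_statistic(group_a, group_b, s0=0.0):
    """Two-sample statistic d = (mean_a - mean_b) / (s + s0).

    With s0 = 0 this is the ordinary equal-variance t-statistic. As s0 grows,
    the denominator is dominated by s0 and the ranking of proteins approaches
    a ranking by raw difference (fold change on a log scale).
    """
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    s = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / (s + s0)

# Log-abundances for one protein in two conditions (made-up values).
condition_a = [10.0, 10.1, 9.9]
condition_b = [9.0, 9.1, 8.9]
print(moderated_statistic(condition_a, condition_b))          # plain t-statistic
print(moderated_statistic(condition_a, condition_b, s0=1.0))  # shrunk by s0
```

Because a nonzero s0 changes the statistic itself, p‐values read from the usual t distribution no longer apply to it, which is one way the distortion described in the abstract can arise.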

12.
Proteomic studies involve the identification as well as qualitative and quantitative comparison of proteins expressed under different conditions, and elucidation of their properties and functions, usually in a large-scale, high-throughput format. The high dimensionality of data generated from these studies will require the development of improved bioinformatics tools and data-mining approaches for efficient and accurate data analysis of biological specimens from healthy and diseased individuals. Mining large proteomics data sets provides a better understanding of the complexities between the normal and abnormal cell proteome of various biological systems, including environmental hazards, infectious agents (bioterrorism) and cancers. This review will shed light on recent developments in bioinformatics and data-mining approaches, and their limitations when applied to proteomics data sets, in order to strengthen the interdependence between proteomic technologies and bioinformatics tools.

13.
The plenary session of the Proteomics Standards Initiative (PSI) of the Human Proteome Organization at the Tenth annual HUPO World Congress updated the delegates on the ongoing activities of this group. The Molecular Interactions workgroup described the success of the PSICQUIC web service, which enables users to access multiple interaction resources with a single query. One user instance is the IMEx Consortium, which uses the service to enable users to access a non-redundant set of protein-protein interaction records. The mass spectrometry data formats, mzML for mass spectrometer output files and mzIdentML for the output of search engines, are now successfully established with increasing numbers of implementations. A format for the output of quantitative proteomics data, mzQuantML, and also TraML, for SRM/MRM transition lists, are both currently nearing completion. The corresponding MIAPE documents are being updated in line with advances in the field, as is the shared controlled vocabulary PSI-MS. In addition, the mzTab format was introduced, as a simpler way to report MS proteomics and metabolomics results. Finally, the ProteomeXchange Consortium, which will supply a single entry point for the submission of MS proteomics data to multiple data resources including PRIDE and PeptideAtlas, is currently being established.

14.
With its predicted proteome of 1550 proteins (data set Etalon), Helicobacter pylori 26695 represents a perfect model system of medium complexity for investigating basic questions in proteomics. We analyzed urea‐solubilized proteins by 2‐DE/MS (data set 2‐DE) and by 1‐DE‐LC/MS (Supprot); proteins insoluble in 9 M urea but solubilized by SDS (Pellet); proteins precipitating in the Sephadex layer at the application side of IEF (Sephadex) by 1‐DE‐LC/MS; and proteins precipitating close to the application side within the IEF gel by LC/MS (Startline). The experimental proteomics data of H. pylori comprising 567 proteins (protein coverage: 36.6%) were stored in the Proteome Database System for Microbial Research (http://www.mpiib-berlin.mpg.de/2D-PAGE/), which gives access to raw mass spectra (MALDI‐TOF/TOF) in T2D format, as well as to text files of peak lists. For data mining the protein mapping and comparison tool PROMPT (http://webclu.bio.wzw.tum.de/prompt/) was used. The percentage of proteins with transmembrane regions, relative to all proteins detected, was 0, 0.2, 0, 0.5, 3.8 and 6.3% for 2‐DE, Supprot, Startline, Sephadex, Pellet, and Etalon, respectively. 2‐DE does not separate membrane proteins because they are insoluble in 9 M urea/70 mM DTT and 2% CHAPS. SDS solubilizes a considerable portion of the urea‐insoluble proteins and makes them accessible for separation by SDS‐PAGE and LC. The 2‐DE/MS analysis with urea‐solubilized proteins and the 1‐DE‐LC/MS analysis with the urea‐insoluble protein fraction (Pellet) are complementary procedures in the pursuit of a complete proteome analysis. Access to the PROMPT‐generated diagrams in the Proteome Database allows the mining of experimental data with respect to other functional aspects.
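The reported protein coverage follows directly from the two counts given in the abstract (567 identified proteins out of 1550 predicted). A short check, with a hypothetical helper name:

```python
def protein_coverage(identified, predicted):
    """Percentage of the predicted proteome that was identified experimentally."""
    return 100.0 * identified / predicted

# 567 identified proteins against the 1550-protein predicted proteome.
print(f"{protein_coverage(567, 1550):.1f}%")  # prints "36.6%"
```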

15.
pyOpenMS is an open‐source, Python‐based interface to the C++ OpenMS library, providing facile access to a feature‐rich, open‐source algorithm library for MS‐based proteomics analysis. It contains Python bindings that allow raw access to the data structures and algorithms implemented in OpenMS, specifically those for file access (mzXML, mzML, TraML, mzIdentML among others), basic signal processing (smoothing, filtering, de‐isotoping, and peak‐picking) and complex data analysis (including label‐free, SILAC, iTRAQ, and SWATH analysis tools). pyOpenMS thus allows fast prototyping and efficient workflow development in a fully interactive manner (using the interactive Python interpreter) and is also ideally suited for researchers not proficient in C++. In addition, our code to wrap a complex C++ library is completely open‐source, allowing other projects to create similar bindings with ease. The pyOpenMS framework is freely available at https://pypi.python.org/pypi/pyopenms while the autowrap tool to create Cython code automatically is available at https://pypi.python.org/pypi/autowrap (both released under the 3‐clause BSD licence).

16.
17.
Mostafavi S, Morris Q. Proteomics 2012, 12(10):1687-1696
In this article, we review how interaction networks can be used alone or in combination in an automated fashion to provide insight into gene and protein function. We describe the concept of a "gene-recommender system" that can be applied to any large collection of interaction networks to make predictions about gene or protein function based on a query list of proteins that share a function of interest. We discuss these systems in general and focus on one specific system, GeneMANIA, that has unique features and uses different algorithms from the majority of other systems.

18.
19.
Mayer U. Proteomics 2008, 8(1):42-44
Proteomic studies often produce sets of hundreds of proteins. Bioinformatic information for these large protein sets must be collected from multiple online resources. Protein Information Crawler (PIC) automatically bulk-collects such data from multiple databases and prediction servers, based on National Center for Biotechnology Information (NCBI) gi numbers or accession numbers, and summarizes them in a Microsoft Excel spreadsheet and/or HTML table. PIC greatly accelerates information procurement, helps to build customized protein information databases and drastically reduces manual database investigation in extensive proteomic studies. Availability: http://www.zoo.uni-heidelberg.de/mfa/PIC.

20.
Comparative proteomics was applied to three vegetative organs of Brassica napus (leaf, stem, and root) using 2‐DE. Among the >1600 analyzed spots, 43% were found to be common to all three organs, suggesting the existence of a “basal” or ubiquitous proteome composed of housekeeping proteins. The green organs, leaf and stem, were closely related (~80% common spots) while the root displayed more organ‐specific polypeptides (~10%). Reference maps were established using MS, allowing the identification of 93, 385, and 266 proteins in leaf, stem, and root proteomes, respectively. Bioinformatic analyses were also performed; in silico functional categorization and cellular localization provide a precise picture of the cell molecular network within vegetative organs. These proteome maps can be explored using the PROTICdb software at the following address: http://bioinformatique.moulon.inra.fr/proticdb/web_view/.
