Similar literature
1.
The past decade has witnessed an explosion in the growth of proteomics. The completion of numerous genome sequences, the development of powerful protein analytical technologies, as well as the design of innovative bioinformatics tools have marked the beginning of a new post-genomic era. Proteomics, the large-scale analysis of proteins in an organism, organ or organelle, encompasses different aspects: (1) the identification, analysis of post-translational modifications and quantification of proteins; (2) the study of protein-protein interactions; and (3) the functional analysis of interactome networks. Here, we briefly summarize the emerging analytical tools and databases that are paving the way for studying Drosophila development by proteomic approaches.

2.
We present a Java application programming interface (API), jmzIdentML, for the Human Proteome Organisation (HUPO) Proteomics Standards Initiative (PSI) mzIdentML standard for peptide and protein identification data. The API combines the power of Java Architecture for XML Binding (JAXB) and an XPath-based random-access indexer to allow a fast and efficient mapping of extensible markup language (XML) elements to Java objects. The internal references in the mzIdentML files are resolved in an on-demand manner, where the whole file is accessed as a random-access swap file, and only the relevant piece of XML is selected for mapping to its corresponding Java object. The API is highly efficient in its memory usage and can handle files of arbitrary sizes. The API follows the official release of the mzIdentML (version 1.1) specifications and is available in the public domain under a permissive licence at http://www.code.google.com/p/jmzidentml/.

3.
High-throughput MS-based proteomic experiments generate large volumes of complex data and necessitate bioinformatics tools to facilitate their handling. Needs include means to archive data, to disseminate them to the scientific communities, and to organize and annotate them to facilitate their interpretation. We present here an evolution of PROTICdb, database software that now handles MS data, including quantification. PROTICdb has been developed to be as independent as possible from tools used to produce the data. Biological samples and proteomics data are described using ontology terms. A Taverna workflow is embedded, allowing information related to identified proteins to be retrieved automatically by querying external databases. Stored data can be displayed graphically and a “Query Builder” allows users to make sophisticated queries without knowledge of the underlying database structure. All resources can be accessed programmatically using a Java client API or RESTful web services, allowing the integration of PROTICdb in any portal. An example application is presented, where proteins extracted from a maize leaf sample by four different methods were compared using a label-free shotgun method. Data are available at http://moulon.inra.fr/protic/public. PROTICdb thus provides means for data storage, enrichment, and dissemination of proteomics data.

4.
MS, the reference technology for proteomics, routinely produces large numbers of protein lists whose fast comparison would prove very useful. Unfortunately, most software tools allow comparison of only two or three lists at once. We introduce here nwCompare, a simple tool for n-way comparison of several protein lists without any query language, and exemplify its use with differential and shared cancer cell proteomes. As the software compares character strings, it can be applied to any type of data mining, such as genomic or metabolomic data lists.
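The idea of an n-way comparison of string lists can be illustrated with a minimal Python sketch. This is not nwCompare's implementation, which the abstract does not describe; the function name `n_way_compare`, the list names and the accession strings are all hypothetical. The sketch groups every item by the exact set of lists in which it occurs, so shared and list-specific entries fall out of a single pass.

```python
from collections import defaultdict


def n_way_compare(lists):
    """Group items by the exact combination of lists they appear in.

    `lists` maps a list name to an iterable of strings (hypothetical input
    format). Returns a dict: frozenset of list names -> set of items.
    """
    membership = defaultdict(set)
    for name, items in lists.items():
        for item in set(items):  # ignore duplicates within one list
            membership[item].add(name)

    groups = defaultdict(set)
    for item, names in membership.items():
        groups[frozenset(names)].add(item)
    return dict(groups)


# Made-up example: three protein lists with partial overlap.
example = {
    "cell_line_A": ["P04637", "P38398", "Q9Y6K9"],
    "cell_line_B": ["P04637", "Q9Y6K9", "P51587"],
    "cell_line_C": ["P04637", "P51587"],
}
result = n_way_compare(example)
shared_by_all = result[frozenset(example)]
```

Because only string equality is used, the same sketch would apply to gene or metabolite identifiers, which matches the abstract's remark that the approach is not specific to proteins.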

5.
The mzQuantML standard from the HUPO Proteomics Standards Initiative has recently been released, capturing quantitative data about peptides and proteins, following analysis of MS data. We present a Java application programming interface (API) for mzQuantML called jmzQuantML. The API provides robust bridges between Java classes and elements in mzQuantML files and allows random access to any part of the file. The API provides read and write capabilities, and is designed to be embedded in other software packages, enabling mzQuantML support to be added to proteomics software tools (http://code.google.com/p/jmzquantml/). The mzQuantML standard is designed around a multilevel validation system to ensure that files are structurally and semantically correct for different proteomics quantitative techniques. In this article, we also describe a Java software tool (http://code.google.com/p/mzquantml-validator/) for validating mzQuantML files, which is a formal part of the data standard.

6.
In the cell, the majority of proteins exist in complexes. Most of these complexes have a constant stoichiometry and thus can be used as internal standards. In this rapid communication, we show that it is possible to calculate a correlation coefficient that reflects the reproducibility of the analytical approach used. The abundance of one subunit in a heterodimer is plotted against the abundance of the other, and this is repeated for all subunits in all heteromers found in the data set. The correlation coefficient obtained (the “heteromer score”) is a new bioinformatic tool that is independent of the method used to collect the data, requires no special sample preparation and can be used retrospectively on old datasets. It can be used for quality control, to indicate when a change becomes significant or identify complexes whose stoichiometry has been perturbed during the experiment.
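A correlation of this kind could be computed roughly as in the following Python sketch. Several points are assumptions rather than details from the abstract: the Pearson coefficient is used, abundances are taken on their raw scale, every pair of measured subunits within a complex contributes one point, and the names `heteromer_score`, `abundance` and `complexes` are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def heteromer_score(abundance, complexes):
    """Correlate the abundances of subunit pairs that share a complex.

    `abundance` maps subunit name -> measured abundance; `complexes` is a
    list of tuples of subunit names. Subunits without a measurement are
    skipped. Note that the order within each pair is arbitrary here.
    """
    xs, ys = [], []
    for subunits in complexes:
        measured = [s for s in subunits if s in abundance]
        for first, second in combinations(measured, 2):
            xs.append(abundance[first])
            ys.append(abundance[second])
    return pearson(xs, ys)


# Made-up data: three 1:1 heterodimers measured without any noise.
abundance = {"A1": 10.0, "A2": 10.0, "B1": 50.0, "B2": 50.0,
             "C1": 200.0, "C2": 200.0}
complexes = [("A1", "A2"), ("B1", "B2"), ("C1", "C2")]
score = heteromer_score(abundance, complexes)
```

With noise-free 1:1 data the score is essentially 1; measurement noise or a perturbed stoichiometry would pull it lower, which is the property the abstract proposes to use for quality control.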

7.
We introduce the computer tool “Know Your Samples” (KYSS) for assessment and visualisation of large scale proteomics datasets, obtained by mass spectrometry (MS) experiments. KYSS facilitates the evaluation of sample preparation protocols, LC peptide separation, and MS and MS/MS performance by monitoring the number of missed cleavages, precursor ion charge states, number of protein identifications and peptide mass error in experiments. KYSS generates several different protein profiles based on protein abundances, and allows for comparative analysis of multiple experiments. KYSS was adapted for blood plasma proteomics and provides concentrations of identified plasma proteins. We demonstrate the utility of the KYSS tool for MS-based proteome analysis of blood plasma and for assessment of hydrogel particles for depletion of abundant proteins in plasma. The KYSS software is open source and is freely available at http://kyssproject.github.io/.
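Two of the monitored quantities, missed cleavages and peptide mass error, are simple enough to sketch in Python. This is not KYSS code. It assumes tryptic digestion with the common rule that cleavage occurs after K or R unless the next residue is P, and it expresses mass error in parts per million; the function names and peptide sequences are hypothetical.

```python
from collections import Counter


def missed_cleavages(peptide):
    """Count internal tryptic sites (K or R not followed by P) in a peptide.

    The C-terminal residue is excluded because it is the cleavage site that
    produced the peptide, not a missed one.
    """
    count = 0
    for i in range(len(peptide) - 1):
        if peptide[i] in "KR" and peptide[i + 1] != "P":
            count += 1
    return count


def mass_error_ppm(observed, theoretical):
    """Relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


# Made-up peptide sequences for illustration only.
peptides = ["PEPTIDEK", "AKPEPTIDER", "AKDEFRGHIK"]
distribution = Counter(missed_cleavages(p) for p in peptides)
```

A distribution skewed towards many missed cleavages would point at incomplete digestion, which is the kind of sample preparation check the abstract describes.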

8.
With recent advances in mass spectrometry techniques, it is now possible to investigate proteins over a wide range of molecular weights in small biological specimens. This advance has generated data-analytic challenges in proteomics, similar to those created by microarray technologies in genetics, namely, discovery of 'signature' protein profiles specific to each pathologic state (e.g. normal vs. cancer) or differential profiles between experimental conditions (e.g. treated by a drug of interest vs. untreated) from high-dimensional data. We propose a data-analytic strategy for discovering protein biomarkers based on such high-dimensional mass spectrometry data. A real biomarker-discovery project on prostate cancer is taken as a concrete example throughout the paper: the project aims to identify proteins in serum that distinguish cancer, benign hyperplasia, and normal states of prostate using the Surface Enhanced Laser Desorption/Ionization (SELDI) technology, a recently developed mass spectrometry technique. Our data-analytic strategy takes properties of the SELDI mass spectrometer into account: the SELDI output of a specimen contains about 48,000 (x, y) points where x is the protein mass divided by the number of charges introduced by ionization and y is the protein intensity of the corresponding mass per charge value, x, in that specimen. Given high coefficients of variation and other characteristics of protein intensity measures (y values), we reduce the measures of protein intensities to a set of binary variables that indicate peaks in the y-axis direction in the nearest neighborhoods of each mass per charge point in the x-axis direction. We then account for a shifting (measurement error) problem of the x-axis in SELDI output. After this pre-analysis processing of data, we combine the binary predictors to generate classification rules for cancer, benign hyperplasia, and normal states of prostate. 
Our approach is to apply the boosting algorithm to select binary predictors and construct a summary classifier. We empirically evaluate sensitivity and specificity of the resulting summary classifiers with a test dataset that is independent from the training dataset used to construct the summary classifiers. The proposed method performed nearly perfectly in distinguishing cancer and benign hyperplasia from normal. In the classification of cancer vs. benign hyperplasia, however, an appreciable proportion of the benign specimens were classified incorrectly as cancer. We discuss practical issues associated with our proposed approach to the analysis of SELDI output and its application in cancer biomarker discovery.
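The reduction of intensities to binary peak indicators, described earlier in this abstract, might look roughly like the Python sketch below. The abstract does not give the neighbourhood size or the tie-breaking rule, so `half_window` and the requirement of a strict, unique local maximum are assumptions, and the function name is hypothetical. The x-axis shift correction and the boosting step are not shown.

```python
def peak_indicators(intensities, half_window=2):
    """Return 1 where a point is the unique maximum of its neighbourhood.

    The neighbourhood spans `half_window` points on each side (an assumed
    parameter) and is truncated at the ends of the spectrum.
    """
    n = len(intensities)
    flags = []
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        window = intensities[lo:hi]
        is_peak = intensities[i] == max(window) and window.count(intensities[i]) == 1
        flags.append(1 if is_peak else 0)
    return flags


# Made-up intensity trace; each position corresponds to one m/z point.
trace = [1, 3, 2, 1, 5, 4, 1, 1, 2]
flags = peak_indicators(trace, half_window=1)
```

The resulting 0/1 vector per specimen is the kind of binary predictor set that a boosting procedure could then select from.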

9.
Fungal communities associated with plants and soil influence plant fitness and ecosystem functioning. They are frequently studied by metabarcoding approaches targeting the ribosomal internal transcribed spacer (ITS), but there is no consensus concerning the most appropriate bioinformatic approach for the analysis of these data. We sequenced an artificial fungal community composed of 189 strains covering a wide range of Ascomycota and Basidiomycota, to compare the performance of 360 software and parameter combinations. The most sensitive approaches, based on the USEARCH and VSEARCH clustering algorithms, detected almost all fungal strains but greatly overestimated the total number of strains. By contrast, approaches using DADA2 to detect amplicon sequence variants were the most effective for recovering the richness and composition of the fungal community. Our results suggest that analyzing single forward (R1) sequences with DADA2 and no filter other than the removal of low-quality and chimeric sequences is a good option for fungal community characterization.

10.
The Protein Information and Property Explorer 2 (PIPE2) is an enhanced software program and updated web application that aims at providing the proteomic researcher a simple, intuitive user interface through which to begin inquiry into the biological significance of a list of proteins typically produced by MS/MS proteomic processing software. PIPE2 includes an improved interface, new data visualization options, and new data analysis methods for combining disparate, but related, data sets. In particular, PIPE2 has been enhanced to handle multi-dimensional data such as protein abundance, gene expression, and/or interaction data. The current architecture of PIPE2, modeled after that of Gaggle (a programming infrastructure for interoperability between separately developed software tools), contains independent functional units that can be instantiated and pieced together at the user's discretion to form a pipelined analysis workflow. Among these functional units is the Network Viewer component, which adds rich network analysis capabilities to the suite of existing proteomic web resources. Additionally, PIPE2 implements a framework within which new analysis procedures can be easily deployed and distributed over the World Wide Web. PIPE2 is available as a web service at http://pipe2.systemsbiology.net/.

11.
Diverse biological events are regulated through protein phosphorylation mediated by protein kinases. Some of these protein kinases are known to be involved in the pathogenesis of various diseases. Although 518 protein kinase genes were identified in the human genome, it remains unclear how many and what kind of protein kinases are expressed and activated in cells and tissues under varying situations. To investigate cellular signaling by protein kinases, we developed monoclonal antibodies, designated as Multi-PK antibodies, that can recognize multiple protein kinases in various biological species. These Multi-PK antibodies can be used to profile the kinases expressed in cells and tissues, identify the kinases of special interest, and analyze protein kinase expression and phosphorylation state. Here we introduce some applications of Multi-PK antibodies to identify and characterize the protein kinases involved in epigenetics, glucotoxicity in type 2 diabetes, and pathogenesis of ulcerative colitis. In this review, we focus on the recently developed technologies for kinomics studies using the powerful analytical tools of Multi-PK antibodies.

12.
The Stanford Microarray Database (SMD; http://genome-www.stanford.edu/microarray/) serves as a microarray research database for Stanford investigators and their collaborators. In addition, SMD functions as a resource for the entire scientific community, by making freely available all of its source code and providing full public access to data published by SMD users, along with many tools to explore and analyze those data. SMD currently provides public access to data from 3500 microarrays, including data from 85 publications, and this total is increasing rapidly. In this article, we describe some of SMD's newer tools for accessing public data, assessing data quality and for data analysis.

13.
Joshua L Heazlewood 《BBA》2003,1604(3):159-169
The NADH:ubiquinone oxidoreductase of the mitochondrial respiratory chain is a large multisubunit complex in eukaryotes containing 30-40 different subunits. Analysis of this complex using blue-native gel electrophoresis coupled to tandem mass spectrometry (MS) has identified a series of 30 different proteins from the model dicot plant, Arabidopsis, and 24 different proteins from the model monocot plant, rice. These proteins have been linked back to genes from plant genome sequencing and comparison of this dataset made with predicted orthologs of complex I components in these plants. This analysis reveals that plants contain the series of 14 highly conserved complex I subunits found in other eukaryotic and related prokaryotic enzymes and a small set of 9 proteins widely found in eukaryotic complexes. A significant number of the proteins present in bovine complex I but absent from fungal complex I are also absent from plant complex I and are not encoded in plant genomes. A series of plant-specific nuclear-encoded complex I associated subunits were identified, including a series of ferripyochelin-binding protein-like subunits and a range of small proteins of unknown function. This represents a post-genomic and large-scale analysis of complex I composition in higher plants.

14.
The degree to which variation in plant community composition (beta-diversity) is predictable from environmental variation, relative to other spatial processes, is of considerable current interest. We addressed this question in Costa Rican rain forest pteridophytes (1,045 plots, 127 species). We also tested the effect of data quality on the results, which has largely been overlooked in earlier studies. To do so, we compared two alternative spatial models [polynomial vs. principal coordinates of neighbour matrices (PCNM)] and ten alternative environmental models (all available environmental variables vs. four subsets, and including their polynomials vs. not). Of the environmental data types, soil chemistry contributed most to explaining pteridophyte community variation, followed in decreasing order of contribution by topography, soil type and forest structure. Environmentally explained variation increased moderately when polynomials of the environmental variables were included. Spatially explained variation increased substantially when the multi-scale PCNM spatial model was used instead of the traditional, broad-scale polynomial spatial model. The best model combination (PCNM spatial model and full environmental model including polynomials) explained 32% of pteridophyte community variation, after correcting for the number of sampling sites and explanatory variables. Overall evidence for environmental control of beta-diversity was strong, and the main floristic gradients detected were correlated with environmental variation at all scales encompassed by the study (c. 100–2,000 m). Depending on model choice, however, total explained variation differed more than fourfold, and the apparent relative importance of space and environment could be reversed. Therefore, we advocate a broader recognition of the impacts that data quality has on analysis results. 
A general understanding of the relative contributions of spatial and environmental processes to species distributions and beta-diversity requires that methodological artefacts are separated from real ecological differences.
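The abstract reports explained variation "after correcting for the number of sampling sites and explanatory variables" without naming the correction. One common choice for such a correction is the adjusted coefficient of determination, sketched below in Python as an assumption; the function name and the numbers in the example are hypothetical and are not taken from the study.

```python
def adjusted_r2(r2, n_sites, n_predictors):
    """Adjusted R-squared: 1 - (1 - R2) * (n - 1) / (n - p - 1).

    `n_sites` is the number of sampling units and `n_predictors` the number
    of explanatory variables; requires n_sites > n_predictors + 1.
    """
    if n_sites <= n_predictors + 1:
        raise ValueError("need more sites than predictors plus one")
    return 1.0 - (1.0 - r2) * (n_sites - 1) / (n_sites - n_predictors - 1)


# Made-up illustration: the same raw R2 shrinks more with more predictors.
few_predictors = adjusted_r2(0.5, n_sites=101, n_predictors=10)
many_predictors = adjusted_r2(0.5, n_sites=101, n_predictors=50)
```

The example shows why the choice of spatial and environmental model matters for comparisons: models with many terms, such as multi-scale spatial eigenfunctions, are penalised more heavily than compact polynomial models at the same raw fit.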

15.

Background

In the absence of consolidated pipelines to archive biological data electronically, information dispersed in the literature must be captured by manual annotation. Unfortunately, manual annotation is time consuming and the coverage of published interaction data is therefore far from complete. The use of text-mining tools to identify relevant publications and to assist in the initial information extraction could help to improve the efficiency of the curation process and, as a consequence, the database coverage of data available in the literature. The 2006 BioCreative competition was aimed at evaluating text-mining procedures in comparison with manual annotation of protein-protein interactions.

Results

To aid the BioCreative protein-protein interaction task, IntAct and MINT (Molecular INTeraction) provided both the training and the test datasets. Data from both databases are comparable because they were curated according to the same standards. During the manual curation process, the major cause of data loss in mining the articles for information was ambiguity in the mapping of the gene names to stable UniProtKB database identifiers. It was also observed that most of the information about interactions was contained only within the full-text of the publication; hence, text mining of protein-protein interaction data will require the analysis of the full-text of the articles and cannot be restricted to the abstract.

Conclusion

The development of text-mining tools to extract protein-protein interaction information may increase the literature coverage achieved by manual curation. To support the text-mining community, databases will highlight those sentences within the articles that describe the interactions. These will supply data-miners with a high quality dataset for algorithm development. Furthermore, the dictionary of terms created by the BioCreative competitors could enrich the synonym list of the PSI-MI (Proteomics Standards Initiative-Molecular Interactions) controlled vocabulary, which is used by both databases to annotate their data content.

16.
Knowledge of the 3D structure of glycans is a prerequisite for a complete understanding of the biological processes glycoproteins are involved in. However, due to a lack of standardised nomenclature, carbohydrate compounds are difficult to locate within the Protein Data Bank (PDB). Using an algorithm that detects carbohydrate structures only requiring element types and atom coordinates, we were able to detect 1663 entries containing a total of 5647 carbohydrate chains. The majority of chains are found to be N-glycosidically bound. Noncovalently bound ligands are also frequent, while O-glycans form a minority. About 30% of all carbohydrate containing PDB entries comprise one or several errors. The automatic assignment of carbohydrate structures in PDB entries will improve the cross-linking of glycobiology resources with genomic and proteomic data collections, which will be an important issue of the upcoming glycomics projects. By aiding in detection of erroneous annotations and structures, the algorithm might also help to increase database quality.

17.
18.
19.
Tandem mass spectrometry (MS/MS) is frequently used in the identification of peptides and proteins. Typical proteomic experiments rely on algorithms such as SEQUEST and MASCOT to compare thousands of tandem mass spectra against the theoretical fragment ion spectra of peptides in a database. The probabilities that these spectrum-to-sequence assignments are correct can be determined by statistical software such as PeptideProphet or through estimations based on reverse or decoy databases. However, many of the software applications that assign probabilities for MS/MS spectra to sequence matches were developed using training data sets from 3D ion-trap mass spectrometers. Given the variety of types of mass spectrometers that have become commercially available over the last 5 years, we sought to generate a data set of reference data covering multiple instrumentation platforms to facilitate both the refinement of existing computational approaches and the development of novel software tools. We analyzed the proteolytic peptides in a mixture of tryptic digests of 18 proteins, named the "ISB standard protein mix", using 8 different mass spectrometers. These include linear and 3D ion traps, two quadrupole time-of-flight platforms (qq-TOF), and two MALDI-TOF-TOF platforms. The resulting data set, which has been named the Standard Protein Mix Database, consists of over 1.1 million spectra in 150+ replicate runs on the mass spectrometers. The data were inspected for quality of separation and searched using SEQUEST. All data, including the native raw instrument and mzXML formats and the PeptideProphet validated peptide assignments, are available at http://regis-web.systemsbiology.net/PublicDatasets/.
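The decoy-based estimation mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes the simple estimator in which the error rate at a score threshold is the number of accepted decoy matches divided by the number of accepted target matches; other variants exist, and this is not the procedure of SEQUEST, MASCOT or PeptideProphet. The function name, the tuple layout and the scores are hypothetical.

```python
def decoy_error_rate(matches, threshold):
    """Estimate the false-match rate among target hits above a threshold.

    `matches` is a list of (score, is_decoy) tuples, one per spectrum match.
    Returns decoys / targets among matches with score >= threshold, or 0.0
    when no target match passes.
    """
    targets = sum(1 for score, is_decoy in matches
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in matches
                 if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


# Made-up scores; True marks a match to a reversed (decoy) sequence.
matches = [(4.1, False), (3.8, False), (3.5, True), (3.2, False),
           (2.9, False), (2.5, True), (2.1, False)]
strict = decoy_error_rate(matches, threshold=3.6)
loose = decoy_error_rate(matches, threshold=2.0)
```

Raising the threshold trades identifications for a lower estimated error rate, and a reference data set with known protein content, such as the one described here, allows such estimates to be checked against the truth.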

20.
MOTIVATION: Mass spectrometry experiments in the field of proteomics produce lists containing tens to thousands of identified proteins. With the protein information and property explorer (PIPE), the biologist can acquire functional annotations for these proteins and explore the enrichment of the list, or fraction thereof, with respect to functional classes. These protein lists may be saved for access at a later time or different location. The PIPE is interoperable with the Firegoose and the Gaggle, permitting wide-ranging data exploration and analysis. The PIPE is a rich-client web application which uses AJAX capabilities provided by the Google Web Toolkit, and server-side data storage using Hibernate. AVAILABILITY: http://pipe.systemsbiology.net.
