Similar literature
 20 similar articles found; search took 31 ms
1.
Nesvizhskii AI 《Proteomics》2012,12(10):1639-1655
Analysis of protein interaction networks and protein complexes using affinity purification and mass spectrometry (AP/MS) is among the most commonly used and successful applications of proteomics technologies. One of the foremost challenges of AP/MS data is the large number of false-positive protein interactions present in unfiltered data sets. Here we review computational and informatics strategies for detecting specific protein interaction partners in AP/MS experiments, with a focus on incomplete (as opposed to genome-wide) interactome mapping studies. These strategies range from standard statistical approaches, to empirical scoring schemes optimized for a particular type of data, to advanced computational frameworks. The common denominator among these methods is the use of label-free quantitative information such as spectral counts or integrated peptide intensities that can be extracted from AP/MS data. We also discuss related issues such as combining multiple biological or technical replicates, and dealing with data generated using different tagging strategies. Computational approaches for benchmarking of scoring methods are discussed, and the need for generation of reference AP/MS data sets is highlighted. Finally, we discuss the possibility of more extended modeling of experimental AP/MS data, including integration with external information such as protein interaction predictions based on functional genomics data.

2.
3.
Correct phosphorylation site assignment is a critical aspect of phosphoproteomic analysis. Large-scale phosphopeptide data sets that are generated through liquid chromatography-coupled tandem mass spectrometry (LC-MS/MS) analysis often contain hundreds or thousands of phosphorylation sites that require validation. To this end, we have created PhosphoScore, an open-source assignment program that is compatible with phosphopeptide data from multiple MS levels (MS(n)). The algorithm takes into account both the match quality and normalized intensity of observed spectral peaks compared to a theoretical spectrum. PhosphoScore produced >95% correct MS(2) assignments from known synthetic data, >98% agreement with an established MS(2) assignment algorithm (Ascore), and >92% agreement with visual inspection of MS(3) and MS(4) spectra.

4.
5.
We present MassSieve, a Java‐based platform for visualization and parsimony analysis of single and comparative LC‐MS/MS database search engine results. The success of mass spectrometric peptide sequence assignment algorithms has led to the need for a tool to merge and evaluate the increasing data set sizes that result from LC‐MS/MS‐based shotgun proteomic experiments. MassSieve supports reports from multiple search engines with differing search characteristics, which can increase peptide sequence coverage and/or identify conflicting or ambiguous spectral assignments.

6.
DBParser: web-based software for shotgun proteomic data analyses   Total citations: 1, self-citations: 0, citations by others: 1
We describe a web-based program called 'DBParser' for rapidly culling, merging, and comparing sequence search engine results from multiple LC-MS/MS peptide analyses. DBParser employs the principle of parsimony to consolidate redundant protein assignments and derive the most concise set of proteins consistent with all of the assigned peptide sequences observed in an experiment or series of experiments. The resulting reports summarize peptide and protein identifications from multidimensional experiments that may contain a single data set or combine data from a group of data sets, all related to a single analytical sample. Additionally, the results of multiple experiments, each of which may contain several data sets, can be compared in reports that identify features that are common or different. DBParser actively links to the primary mass spectral data and to public online databases such as NCBI, GO, and Swiss-Prot in order to structure contextually specific reports for biologists and biochemists.
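
The parsimony principle described in this abstract can be illustrated with a small sketch. The abstract does not say how DBParser computes its protein list, so the greedy set-cover heuristic below is an assumption chosen for brevity, and the function name `parsimonious_proteins` and the example data are hypothetical. It repeatedly picks the protein that explains the most still-unexplained peptides until every observed peptide is accounted for.

```python
def parsimonious_proteins(protein_to_peptides):
    """Greedy set cover (an assumed heuristic, not DBParser's documented method).

    protein_to_peptides maps a protein ID to the set of peptides assigned to it.
    Returns a short list of proteins that together explain every peptide.
    """
    uncovered = set().union(*protein_to_peptides.values())
    chosen = []
    while uncovered:
        # Pick the protein explaining the most still-unexplained peptides;
        # iterating in sorted order makes tie-breaking deterministic.
        best = max(sorted(protein_to_peptides),
                   key=lambda p: len(protein_to_peptides[p] & uncovered))
        chosen.append(best)
        uncovered -= protein_to_peptides[best]
    return chosen


# Hypothetical example: P2's peptides are a subset of P1's, so P2 is redundant.
example = {
    "P1": {"AAK", "CCR", "DDK"},
    "P2": {"AAK", "CCR"},
    "P3": {"EEK"},
}
minimal_set = parsimonious_proteins(example)
```

A greedy cover is not guaranteed to be the smallest possible set, but it is a common, easily inspected approximation of "the most concise set of proteins consistent with all of the assigned peptide sequences".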

7.
Kebing Yu  Arthur R. Salomon 《Proteomics》2010,10(11):2113-2122
Recent advances in the speed and sensitivity of mass spectrometers and in analytical methods, the exponential acceleration of computer processing speeds, and the availability of genomic databases from an array of species and protein information databases have led to a deluge of proteomic data. The development of a lab‐based automated proteomic software platform for the automated collection, processing, storage, and visualization of expansive proteomic data sets is critically important. The high‐throughput autonomous proteomic pipeline described here is designed from the ground up to provide critically important flexibility for diverse proteomic workflows and to streamline the total analysis of a complex proteomic sample. This tool is composed of software that controls the acquisition of mass spectral data along with automation of post‐acquisition tasks such as peptide quantification, clustered MS/MS spectral database searching, statistical validation, and data exploration within a user‐configurable lab‐based relational database. The software design of the high‐throughput autonomous proteomic pipeline focuses on accommodating diverse workflows and providing missing software functionality to a wide range of proteomic researchers to accelerate the extraction of biological meaning from immense proteomic data sets. Although individual software modules in our integrated technology platform may have some similarities to existing tools, the true novelty of the approach described here is in the synergistic and flexible combination of these tools to provide an integrated and efficient analysis of proteomic samples.

8.
Metabolomics spectral formatting, alignment and conversion tools (MSFACTs)   Total citations: 13, self-citations: 0, citations by others: 13
MOTIVATION: The amplified interest in metabolic profiling has generated the need for additional tools to assist in the rapid analysis of complex data sets. RESULTS: A new program, metabolomics spectral formatting, alignment and conversion tools (MSFACTs), is described here for the automated import, reformatting, alignment, and export of large chromatographic data sets to allow more rapid visualization and interrogation of metabolomic data. MSFACTs incorporates two tools: one for the alignment of integrated chromatographic peak lists and another for extracting information from raw chromatographic ASCII formatted data files. MSFACTs is illustrated in the processing of GC/MS metabolomic data from different tissues of the model legume plant, Medicago truncatula. The results document that various tissues such as roots, stems, and leaves from the same plant can be easily differentiated based on metabolite profiles. Further, similar types of tissues within the same plant, such as the first to eleventh internodes of stems, could also be differentiated based on metabolite profiles. AVAILABILITY: Freely available upon request for academic and non-commercial use. Commercial use is available through a licensing agreement: http://www.noble.org/PlantBio/MS/MSFACTs/MSFACTs.html.

9.
Here we present the Coon OMSSA Proteomic Analysis Software Suite (COMPASS): a free and open-source software pipeline for high-throughput analysis of proteomics data, designed around the Open Mass Spectrometry Search Algorithm. We detail a synergistic set of tools for protein database generation, spectral reduction, peptide false discovery rate analysis, peptide quantitation via isobaric labeling, protein parsimony and protein false discovery rate analysis, and protein quantitation. We strive for maximum ease of use, utilizing graphical user interfaces and working with data files in the original instrument vendor format. Results are stored in plain text comma-separated value files, which are easy to view and manipulate with a text editor or spreadsheet program. We illustrate the operation and efficacy of COMPASS through the use of two LC-MS/MS data sets. The first is a data set of a highly annotated mixture of standard proteins and manually validated contaminants that exhibits the identification workflow. The second is a data set of yeast peptides, labeled with isobaric stable isotope tags and mixed in known ratios, to demonstrate the quantitative workflow. For these two data sets, COMPASS performs equivalently to or better than the current de facto standard, the Trans-Proteomic Pipeline.

10.
Beyond specific applications, such as the relative or absolute quantification of peptides in targeted proteomic experiments, synthetic spike‐in peptides are not yet systematically used as internal standards in bottom‐up proteomics. A number of retention time standards have been reported that enable chromatographic alignment of multiple LC–MS/MS experiments. However, only a few peptides are typically included in such sets, limiting the analytical parameters that can be monitored. Here, we describe PROCAL (ProteomeTools Calibration Standard), a set of 40 synthetic peptides that span the entire hydrophobicity range of tryptic digests, enabling not only accurate determination of retention time indices but also monitoring of chromatographic separation performance over time. The fragmentation characteristics of the peptides can also be used to calibrate and compare collision energies between mass spectrometers. The sequences of all selected peptides do not occur in any natural protein, thus eliminating the need for stable isotope labeling. We anticipate that this set of peptides will be useful for multiple purposes in individual laboratories and will also aid the transfer of data acquisition and analysis methods between laboratories, notably the use of spectral libraries.

11.

Introduction  

Raw spectral data from matrix-assisted laser desorption/ionisation time-of-flight (MALDI-TOF) MS profiling techniques usually contain complex information that does not readily provide biological insight into disease. Associating features identified within raw data with a known peptide is extremely difficult. Data preprocessing to remove uncertainty characteristics in the data is normally required before performing any further analysis. This study proposes an alternative yet simple solution to preprocess raw MALDI-TOF-MS data for identification of candidate marker ions. Two in-house MALDI-TOF-MS data sets from two different sample sources (melanoma serum and cord blood plasma) are used in our study.

12.
We present a statistical method, SAINT-MS1, for scoring protein-protein interactions based on the label-free MS1 intensity data from affinity purification-mass spectrometry (AP-MS) experiments. The method is an extension of Significance Analysis of INTeractome (SAINT), a model-based method previously developed for spectral count data. We reformulated the statistical model for log-transformed intensity data, including adequate treatment of missing observations, that is, interactions identified in some but not all replicate purifications. We demonstrate the performance of SAINT-MS1 using two recently published data sets: a small LTQ-Orbitrap data set with three replicate purifications of a single human bait protein and control purifications, and a larger Drosophila data set targeting the insulin receptor/target of rapamycin signaling pathway generated using an LTQ-FT instrument. Using the Drosophila data set, we also compare and discuss the performance of SAINT analysis based on spectral count and MS1 intensity data in terms of the recovery of orthologous and literature-curated interactions. Given rapid advances in high mass accuracy instrumentation and intensity-based label-free quantification software, we expect that SAINT-MS1 will become a useful tool allowing improved detection of protein interactions in label-free AP-MS data, especially in the low abundance range.

13.
14.
Label-free quantification of high mass resolution LC-MS data has emerged as a promising technology for proteome analysis. Computational methods are required for the accurate extraction of peptide signals from LC-MS data and the tracking of these features across the measurements of different samples. We present here an open source software tool, SuperHirn, that comprises a set of modules to process LC-MS data acquired on a high resolution mass spectrometer. The program includes newly developed functionalities to analyze LC-MS data such as feature extraction and quantification, LC-MS similarity analysis, LC-MS alignment of multiple datasets, and intensity normalization. These program routines extract profiles of measured features and comprise tools for clustering and classification analysis of the profiles. SuperHirn was applied in an MS1-based profiling approach to a benchmark LC-MS dataset of complex protein mixtures with defined concentration changes. We show that the program automatically detects profiling trends in an unsupervised manner and is able to associate proteins with their correct theoretical dilution profile.

15.
Recent studies have revealed a relationship between protein abundance and sampling statistics, such as sequence coverage, peptide count, and spectral count, in label-free liquid chromatography-tandem mass spectrometry (LC-MS/MS) shotgun proteomics. The use of sampling statistics offers a promising method of measuring relative protein abundance and detecting differentially expressed or coexpressed proteins. We performed a systematic analysis of various approaches to quantifying differential protein expression in eukaryotic Saccharomyces cerevisiae and prokaryotic Rhodopseudomonas palustris label-free LC-MS/MS data. First, we showed that, among three sampling statistics, the spectral count has the highest technical reproducibility, followed by the less-reproducible peptide count and relatively nonreproducible sequence coverage. Second, we used spectral count statistics to measure differential protein expression in pairwise experiments using five statistical tests: Fisher's exact test, G-test, AC test, t-test, and LPE test. Given the S. cerevisiae data set with spiked proteins as a benchmark and the false positive rate as a metric, our evaluation suggested that the Fisher's exact test, G-test, and AC test can be used when the number of replications is limited (one or two), whereas the t-test is useful with three or more replicates available. Third, we generalized the G-test to increase the sensitivity of detecting differential protein expression under multiple experimental conditions. Out of 1622 identified R. palustris proteins in the LC-MS/MS experiment, the generalized G-test detected 1119 differentially expressed proteins under six growth conditions. Finally, we studied correlated expression of these 1119 proteins by analyzing pairwise expression correlations and by delineating protein clusters according to expression patterns. 
Through pairwise expression correlation analysis, we demonstrated that proteins co-located in the same operon were much more strongly coexpressed than those from different operons. Combining cluster analysis with existing protein functional annotations, we identified six protein clusters with known biological significance. In summary, the proposed generalized G-test using spectral count sampling statistics is a viable methodology for robust quantification of relative protein abundance and for sensitive detection of biologically significant differential protein expression under multiple experimental conditions in label-free shotgun proteomics.
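
For readers unfamiliar with the G-test mentioned in this abstract, the following is a minimal sketch of the textbook pairwise (two-sample) likelihood-ratio form applied to spectral counts. The abstract gives no formulas, so this is an assumption about the basic test only; it does not reproduce the authors' generalized multi-condition G-test. The function name `g_test` and its parameters are hypothetical. The p-value uses the identity that a chi-square survival function with one degree of freedom equals `erfc(sqrt(g / 2))`.

```python
import math


def g_test(count_a, count_b, total_a, total_b):
    """Pairwise G-test on one protein's spectral counts in samples A and B.

    count_a, count_b: spectral counts of the protein in each sample.
    total_a, total_b: total spectral counts in each sample, used to scale
    the expected counts under the null hypothesis of equal abundance.
    Returns (G statistic, p-value with 1 degree of freedom).
    """
    pooled = count_a + count_b
    expected = (pooled * total_a / (total_a + total_b),
                pooled * total_b / (total_a + total_b))
    # Zero observed counts contribute nothing (o * ln(o / e) -> 0 as o -> 0).
    g = 2.0 * sum(o * math.log(o / e)
                  for o, e in zip((count_a, count_b), expected) if o > 0)
    g = max(g, 0.0)  # guard against tiny negative rounding error
    p_value = math.erfc(math.sqrt(g / 2.0))
    return g, p_value


# Hypothetical counts: 30 vs 5 spectra in two runs of equal total size.
g_stat, p_val = g_test(30, 5, 1000, 1000)
```

Because the test relies only on the counts themselves, it can be run without replicates, which is consistent with the abstract's remark that the G-test is usable when only one or two replicates are available.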

16.
Overdispersion or extra-Poisson variation is very common for count data. This phenomenon arises when the variability of the counts greatly exceeds the mean under the Poisson assumption, resulting in substantial bias for the parameter estimates. To detect whether count data are overdispersed in the Poisson regression setting, various tests have been proposed and among them, the score tests derived by Dean (1992) are popular and easy to implement. However, such tests can be sensitive to anomalous or extreme observations. In this paper, diagnostic measures are proposed for assessing the sensitivity of Dean's score test for overdispersion in Poisson regression. Applications to the well-known fabric faults and Ames salmonella assay data sets illustrate the usefulness of the diagnostics in analyzing overdispersed count data.
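
The idea of a score test for overdispersion, and of its sensitivity to single observations, can be sketched as follows. The abstract does not state the test statistic, so the form used here, the sum of (y − μ)² − y divided by sqrt(2 Σ μ²), is an assumption: it is a commonly quoted score statistic against a negative-binomial-type alternative and is approximately standard normal under the Poisson null. The leave-one-out function is only a simple illustration of sensitivity, not the diagnostic measures proposed in the paper. All function names and the example counts are hypothetical.

```python
import math


def dean_score_statistic(counts, fitted_means=None):
    """Score-type statistic for overdispersion (assumed form, see lead-in).

    With fitted_means=None an intercept-only Poisson model is assumed,
    so every fitted mean equals the sample mean.
    Large positive values suggest overdispersion.
    """
    if fitted_means is None:
        mean = sum(counts) / len(counts)
        fitted_means = [mean] * len(counts)
    numerator = sum((y - mu) ** 2 - y for y, mu in zip(counts, fitted_means))
    denominator = math.sqrt(2.0 * sum(mu ** 2 for mu in fitted_means))
    return numerator / denominator if denominator else 0.0


def leave_one_out_changes(counts):
    """Change in the statistic when each observation is dropped in turn."""
    full = dean_score_statistic(counts)
    return [full - dean_score_statistic(counts[:i] + counts[i + 1:])
            for i in range(len(counts))]


# Hypothetical counts with one extreme observation (the final 30).
counts = [2, 3, 1, 4, 2, 30]
changes = leave_one_out_changes(counts)
```

In this toy data set, dropping the extreme count changes the statistic far more than dropping any other observation, which is the kind of sensitivity the abstract warns about.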

17.
Orive ME  Asmussen MA 《Genetics》2000,155(2):833-854
A new maximum-likelihood method is developed for estimating unidirectional pollen and seed flow in mixed-mating plant populations from counts of joint nuclear-cytoplasmic genotypes. Data may include multiple unlinked nuclear markers with a single maternally or paternally inherited cytoplasmic marker, or with two cytoplasmic markers inherited through opposite parents, as in many conifer species. Migration rate estimates are based on fitting the equilibrium genotype frequencies under continent-island models of plant gene flow to the data. Detailed analysis of their equilibrium structures indicates when each of the three nuclear-cytoplasmic systems allows gene flow estimation and shows that, in general, it is easier to estimate seed than pollen migration. Three-locus nuclear-dicytoplasmic data only increase the conditions allowing seed migration estimates; however, the additional dicytonuclear disequilibria allow more accurate estimates of both forms of gene flow. Estimates and their confidence limits for simulated data sets confirm that two-locus data with paternal cytoplasmic inheritance provide better estimates than those with maternal inheritance, while three-locus dicytonuclear data with three modes of inheritance generally provide the most reliable estimates for both types of gene flow. Similar results are obtained for hybrid zones receiving pollen and seed flow from two source populations. An estimation program is available upon request.

18.
Detecting differentially expressed proteins is a key goal of proteomics. We describe a label-free method, the spectral index, for analyzing relative protein abundance in large-scale data sets derived from biological samples by shotgun proteomics. The spectral index comprises two biochemically plausible features: relative protein abundance (assessed by spectral counts) and the number of samples within a group with detectable peptides. We combined the spectral index with permutation analysis to establish confidence intervals for assessing differential protein expression in bronchoalveolar lavage fluid from cystic fibrosis and control subjects. Significant differences in protein abundance determined by the spectral index agreed well with independent biochemical measurements. When used to analyze simulated data sets, the spectral index outperformed four other statistical tests (Student's t-test, G-test, Bayesian t-test, and Significance Analysis of Microarrays) by correctly identifying the largest number of differentially expressed proteins. Correspondence analysis and functional annotation analysis indicated that the spectral index improves the identification of enriched proteins corresponding to clinical phenotypes. The spectral index is easily implemented and statistically robust, and its results are readily interpreted graphically. Therefore, it should be useful for biomarker discovery and comparisons of protein expression between normal and disease states.

19.
A notable inefficiency of shotgun proteomics experiments is the repeated rediscovery of the same identifiable peptides by sequence database searching methods, which often are time-consuming and error-prone. A more precise and efficient method, in which previously observed and identified peptide MS/MS spectra are catalogued and condensed into searchable spectral libraries to allow new identifications by spectral matching, is seen as a promising alternative. To that end, an open-source, functionally complete, high-throughput and readily extensible MS/MS spectral searching tool, SpectraST, was developed. A high-quality spectral library was constructed by combining the high-confidence identifications of millions of spectra taken from various data repositories and searched using four sequence search engines. The resulting library consists of over 30,000 spectra for Saccharomyces cerevisiae. Using this library, SpectraST vastly outperforms the sequence search engine SEQUEST in terms of speed and the ability to discriminate good and bad hits. A unique advantage of SpectraST is its full integration into the popular Trans Proteomic Pipeline suite of software, which facilitates user adoption and provides important functionalities such as peptide and protein probability assignment, quantification, and data visualization. This method of spectral library searching is especially suited for targeted proteomics applications, offering superior performance to traditional sequence searching.
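
Spectral matching, as opposed to sequence database searching, compares a query spectrum directly with library spectra. The abstract does not describe SpectraST's scoring function, so the sketch below is a generic illustration under stated assumptions: peaks are binned at a fixed m/z width and compared by cosine similarity. The function names, the 1.0 bin width, and the example peak lists are all hypothetical.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Sum intensities of (m/z, intensity) peaks into fixed-width m/z bins."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz / bin_width)] += intensity
    return binned


def cosine_similarity(peaks_a, peaks_b, bin_width=1.0):
    """Cosine of the angle between two binned spectra (0 = unrelated, 1 = identical shape)."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def best_library_match(query, library):
    """Return the (name, score) of the library spectrum most similar to the query."""
    return max(((name, cosine_similarity(query, peaks))
                for name, peaks in library.items()),
               key=lambda item: item[1])


# Hypothetical two-entry library and a query resembling the first entry.
library = {
    "PEPTIDE_A": [(175.1, 100.0), (288.2, 50.0), (401.3, 20.0)],
    "PEPTIDE_B": [(147.1, 80.0), (260.2, 90.0)],
}
query = [(175.1, 95.0), (288.2, 55.0)]
match_name, match_score = best_library_match(query, library)
```

Because each comparison is a sparse dot product rather than an enumeration of candidate sequences, this style of search is one plausible reason library searching can be much faster than sequence searching.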

20.
Separation of proteins by two-dimensional electrophoresis followed by mass spectrometry (MS) is now a conventional technique for proteomic analysis. For proteomic analysis of a tissue for which only limited information on the primary structures of proteins is available, we have developed an analytical system for peptide mass fingerprinting of gene products in the testis of the ascidian Ciona intestinalis. Ciona sperm proteins were separated by two-dimensional gel electrophoresis and the tryptic fragments were subjected to MALDI-TOF/MS. The mass pattern was searched against on-line databases, but few of these proteins were identified. We have constructed an MS database from Ciona testis ESTs and the genome draft sequence, along with a newly devised, Perl-based search program, PerMS, for peptide mass fingerprinting. This system could identify more than 80% of Ciona sperm proteins, suggesting that it could be widely applied to proteomic analysis of specific tissues for which little genomic information is available.
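
Peptide mass fingerprinting compares observed peptide masses with masses predicted from an in silico digest of candidate protein sequences. The abstract does not describe how PerMS works internally, so the sketch below is a generic illustration in Python rather than a reconstruction of that program. It assumes the usual trypsin rule (cleave after K or R unless the next residue is P), standard monoisotopic residue masses, neutral peptide masses, and a hypothetical 0.2 Da tolerance; the function names and example sequence are also hypothetical.

```python
import re

# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def tryptic_peptides(sequence):
    """Cleave after K or R, except when the next residue is P (no missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def peptide_mass(peptide):
    """Neutral monoisotopic mass: residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def match_masses(observed_masses, sequence, tolerance=0.2):
    """Return the tryptic peptides whose mass lies within tolerance of an observed mass."""
    return [pep for pep in tryptic_peptides(sequence)
            if any(abs(peptide_mass(pep) - m) <= tolerance
                   for m in observed_masses)]


# Hypothetical protein sequence and one observed neutral mass.
hits = match_masses([352.12], "AAKPGGRCCK")
```

A real MALDI-TOF measurement reports singly protonated ions, so observed m/z values would need the proton mass (about 1.00728 Da) subtracted before being compared with the neutral masses computed here.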
