Similar articles
 20 similar articles found (search time: 15 ms)
1.
Mixtures of known proteins have been very useful in the assessment and validation of methods for high-throughput (HTP) MS (MS/MS) proteomics experiments. However, these test mixtures have generally consisted of few proteins at near equal concentration or of a single protein at varied concentrations. Such mixtures are too simple to effectively assess the validity of error rates for protein identification and differential expression in HTP MS/MS studies. This work aimed at overcoming these limitations and simulating studies of complex biological samples. We introduced a pair of 54-protein standard mixtures of variable concentrations with up to a 1000-fold dynamic range in concentration and up to ten-fold expression ratios with additional negative controls (infinite expression ratios). These test mixtures comprised 16 off-the-shelf Sigma-Aldrich proteins and 38 Shewanella oneidensis proteins produced in-house. The standard proteins were systematically distributed into three main concentration groups (high, medium, and low) and then the concentrations were varied differently for each mixture within the groups to generate different expression ratios. The mixtures were analyzed with both low mass accuracy LCQ and high mass accuracy FT-LTQ instruments. In addition, these 54 standard proteins closely follow the molecular weight distributions of both bacterial and human proteomes. As a result, these new standard mixtures allow for a much more realistic assessment of approaches for protein identification and label-free differential expression than previous mixtures. Finally, the methodology and experimental design developed in this work can be readily applied in the future to the development of more complex standard mixtures for HTP proteomics studies.

2.
Accurate protein identification in large-scale proteomics experiments relies upon a detailed, accurate protein catalogue, which is derived from predictions of open reading frames based on genome sequence data. Integration of mass spectrometry-based proteomics data with computational proteome predictions from environmental metagenomic sequences has been challenging because of the variable overlap between proteomic datasets and corresponding short-read nucleotide sequence data. In this study, we have benchmarked several strategies for increasing microbial peptide spectral matching in metaproteomic datasets using protein predictions generated from matched metagenomic sequences from the same human fecal samples. Additionally, we investigated the impact of mass spectrometry-based filters (high mass accuracy, delta correlation), and de novo peptide sequencing on the number and robustness of peptide-spectrum assignments in these complex datasets. In summary, we find that high mass accuracy peptide measurements searched against non-assembled reads from DNA sequencing of the same samples significantly increased identifiable proteins without sacrificing accuracy.

3.
Recent studies have revealed a relationship between protein abundance and sampling statistics, such as sequence coverage, peptide count, and spectral count, in label-free liquid chromatography-tandem mass spectrometry (LC-MS/MS) shotgun proteomics. The use of sampling statistics offers a promising method of measuring relative protein abundance and detecting differentially expressed or coexpressed proteins. We performed a systematic analysis of various approaches to quantifying differential protein expression in eukaryotic Saccharomyces cerevisiae and prokaryotic Rhodopseudomonas palustris label-free LC-MS/MS data. First, we showed that, among three sampling statistics, the spectral count has the highest technical reproducibility, followed by the less-reproducible peptide count and relatively nonreproducible sequence coverage. Second, we used spectral count statistics to measure differential protein expression in pairwise experiments using five statistical tests: Fisher's exact test, G-test, AC test, t-test, and LPE test. Given the S. cerevisiae data set with spiked proteins as a benchmark and the false positive rate as a metric, our evaluation suggested that the Fisher's exact test, G-test, and AC test can be used when the number of replications is limited (one or two), whereas the t-test is useful with three or more replicates available. Third, we generalized the G-test to increase the sensitivity of detecting differential protein expression under multiple experimental conditions. Out of 1622 identified R. palustris proteins in the LC-MS/MS experiment, the generalized G-test detected 1119 differentially expressed proteins under six growth conditions. Finally, we studied correlated expression of these 1119 proteins by analyzing pairwise expression correlations and by delineating protein clusters according to expression patterns. Through pairwise expression correlation analysis, we demonstrated that proteins co-located in the same operon were much more strongly coexpressed than those from different operons. Combining cluster analysis with existing protein functional annotations, we identified six protein clusters with known biological significance. In summary, the proposed generalized G-test using spectral count sampling statistics is a viable methodology for robust quantification of relative protein abundance and for sensitive detection of biologically significant differential protein expression under multiple experimental conditions in label-free shotgun proteomics.
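To make the G-test on spectral counts concrete, the following is a minimal sketch of a standard goodness-of-fit G statistic for one protein across several conditions. It is not taken from the article: the function names, the choice of expected counts (proportional to the total spectra acquired in each condition), and the example numbers are all assumptions made for illustration, and the article's generalized G-test may differ in detail.

```python
import math


def g_statistic(counts, totals):
    """Return (G, degrees_of_freedom) for one protein.

    counts[j]: spectral count of the protein in condition j.
    totals[j]: total spectral count of all proteins in condition j.
    Expected counts assume the protein's share of spectra is the
    same in every condition (an assumption of this sketch).
    """
    pooled = sum(counts)
    grand_total = sum(totals)
    g = 0.0
    for observed, total in zip(counts, totals):
        expected = pooled * total / grand_total
        if observed > 0:  # the term 0 * ln(0) is taken as 0
            g += observed * math.log(observed / expected)
    return 2.0 * g, len(counts) - 1


def chi2_sf_one_df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


# Hypothetical protein: 20 spectra in one run, 5 in the other, equal run sizes.
g, df = g_statistic([20, 5], [1000, 1000])
p = chi2_sf_one_df(g)  # valid only because df == 1 here
```

With more than two conditions the statistic is compared against a chi-square distribution with `len(counts) - 1` degrees of freedom; the closed-form tail used above applies only to the two-condition case.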

4.
Spectral counting has become a commonly used approach for measuring protein abundance in label-free shotgun proteomics. At the same time, the development of data analysis methods has lagged behind. Currently most studies utilizing spectral counts rely on simple data transforms and post hoc corrections of conventional signal-to-noise ratio statistics. However, these adjustments can neither handle the bias toward high abundance proteins nor deal with the drawbacks due to the limited number of replicates. We present a novel statistical framework (QSpec) for the significance analysis of differential expression with extensions to a variety of experimental design factors and adjustments for protein properties. Using synthetic and real experimental data sets, we show that the proposed method outperforms conventional statistical methods that search for differential expression for individual proteins. We illustrate the flexibility of the model by analyzing a data set with a complicated experimental design involving cellular localization and time course.

5.
Mass spectrometry (MS)-based shotgun proteomics allows protein identifications even in complex biological samples. Protein abundances can then be estimated from the counts of tandem MS (MS/MS) spectra attributable to each protein, provided one accounts for differential MS detectability of contributing peptides. We developed a method, APEX, which calculates Absolute Protein EXpression levels based upon learned correction factors, MS/MS spectral counts and each protein's probability of correct identification. This protocol describes APEX-based calculations in three parts. (i) Using training data, peptide sequences and their sequence properties, a model is built to estimate MS detectability (O(i)) for any given protein. (ii) Absolute protein abundances are calculated from spectral counts, identification probabilities and the learned O(i)-values. (iii) Simple statistics allow calculation of differential expression in two distinct biological samples, i.e., measuring relative protein abundances. APEX-based protein abundances span 3-4 orders of magnitude and are applicable to mixtures of 100s to 1,000s of proteins.
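Step (ii) above can be illustrated with a short sketch. It assumes the abundance of protein i is proportional to n_i * p_i / O_i (spectral count times identification probability, divided by the learned detectability), normalized over all proteins and scaled by a constant. The abstract does not spell out this formula, so the exact form, the function name `apex_abundances`, the scaling parameter `total_molecules`, and the protein names are assumptions for illustration only.

```python
def apex_abundances(spectral_counts, id_probs, detectability, total_molecules=1.0):
    """Estimate abundances from counts, identification probabilities and O(i).

    All three inputs are dicts keyed by protein name.
    total_molecules is a hypothetical scaling constant; with the default of
    1.0 the result is each protein's fraction of the total.
    """
    weights = {}
    for protein, count in spectral_counts.items():
        o_i = detectability[protein]
        if o_i <= 0:
            raise ValueError(f"detectability must be positive for {protein}")
        # Correct the observed count for identification confidence and
        # for how readily the protein's peptides are detected.
        weights[protein] = count * id_probs[protein] / o_i
    normalizer = sum(weights.values())
    if normalizer == 0:
        raise ValueError("no spectra observed")
    return {protein: total_molecules * w / normalizer for protein, w in weights.items()}


# Two hypothetical proteins with equal counts; protB is twice as detectable,
# so the same count implies half the abundance.
fractions = apex_abundances(
    {"protA": 10, "protB": 10},
    {"protA": 1.0, "protB": 1.0},
    {"protA": 1.0, "protB": 2.0},
)
```

Dividing by O(i) is what separates this from plain spectral counting: a protein whose peptides are easy to detect needs more spectra to be credited with the same abundance.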

6.
Proteome analysis, utilizing high-throughput proteomics approaches, involves studying proteins that a whole organism (or specific tissue or cellular compartment) expresses under certain conditions. Intrinsic difficulties of these studies, as well as the enormous volumes of data they typically produce, make the proteome analysis and interpretation very difficult. As with any high-throughput approach, proteomics experiments should be carefully designed, analyzed, and verified. In addition to computational standards, experimental standards (simple and complex mixtures of known proteins) for high-throughput proteomics have to be developed and utilized. This article discusses such experimental standards and their implementations.

7.
Bandeira N. BioTechniques 2007, 42(6):687, 689, 691 passim
Significant technological advances have accelerated high-throughput proteomics to the automated generation of millions of tandem mass spectra on a daily basis. In such a setup, the desire for greater sequence coverage combines with standard experimental procedures to commonly yield multiple tandem mass spectra from overlapping peptides; typical observations include peptides differing by one or two terminal amino acids and spectra from modified and unmodified variants of the same peptides. In a departure from the traditional spectrum identification algorithms that analyze each tandem mass spectrum in isolation, spectral networks define a new computational approach that instead finds and simultaneously interprets sets of spectra from overlapping peptides. In shotgun protein sequencing, spectral networks capitalize on the redundant sequence information in the aligned spectra to deliver the longest and most accurate de novo sequences ever reported for ion trap data. Also, by combining spectra from multiple modified and unmodified variants of the same peptides, spectral networks are able to bypass the dominant guess/confirm approach to the identification of posttranslational modifications and alternatively discover modifications and highly modified peptides directly from experimental data. Open-source implementations of these algorithms may be downloaded from peptide.ucsd.edu.

8.
In recent years, genomics has been extended to functional genomics. Toward the characterization of organisms or species on the genome level, changes on the metabolite and protein level have been shown to be essential to assign functions to genes and to describe the dynamic molecular phenotype. Gas chromatography (GC) and liquid chromatography coupled to mass spectrometry (GC- and LC-MS) are well suited for the fast and comprehensive analysis of ultracomplex metabolite samples. For the integration of metabolite profiles with quantitative protein profiles, a high throughput (HTP) shotgun proteomics approach using LC-MS and label-free quantification of unique proteins in a complex protein digest is described. Multivariate statistics are applied to examine sample pattern recognition based on data-dimensionality reduction and biomarker identification in plant systems biology. The integration of the data reveals multiple correlative biomarkers providing evidence for an increase of information in such holistic approaches. With computational simulation of metabolic networks and experimental measurements, it can be shown that biochemical regulation is reflected by metabolite network dynamics measured in a metabolomics approach. Examples in molecular plant physiology are presented to substantiate the integrative approach.

9.
Tryptic digestion of proteins continues to be a workhorse of proteomics. Traditional tryptic digestion requires several hours to generate an adequate protein digest. A number of enhanced accelerated digestion protocols have been developed in recent years. Nonetheless, a need still exists for new digestion strategies that meet the demands of proteomics for high-throughput and rapid detection and identification of proteins. We performed an evaluation of direct tryptic digestion of proteins on a MALDI target plate and the potential for integrating RP HPLC separation of proteins with on-target tryptic digestion in order to achieve a rapid and effective identification of proteins in complex biological samples. To this end, we used a Tempo HPLC/MALDI target plate deposition hybrid instrument (ABI). The technique was evaluated using a number of soluble and membrane proteins and an MRC5 cell lysate. We demonstrated that direct deposition of proteins on a MALDI target plate after reverse-phase HPLC separation and subsequent tryptic digestion of the proteins on the target followed by MALDI TOF/TOF analysis provided substantial data (intact protein mass, peptide mass and peptide fragment mass) that allowed a rapid and unambiguous identification of proteins. The rapid protein separation and direct deposition of fractions on a MALDI target plate provided by the RP HPLC combined with off-line interfacing with the MALDI MS is a unique platform for rapid protein identification with improved sequence coverage. This simple and robust approach significantly reduces the sample handling and potential loss in large-scale proteomics experiments. This approach allows combination of peptide mass fingerprinting (PMF), MS/MS peptide fragment fingerprinting (PPF) and whole protein MS for both protein identification and structural analysis of proteins.

10.
Ideally, shotgun proteomics would facilitate the identification of an entire proteome with 100% protein sequence coverage. In reality, the large dynamic range and complexity of cellular proteomes result in oversampling of abundant proteins, while peptides from low abundance proteins are undersampled or remain undetected. We tested the proteome equalization technology, ProteoMiner, in conjunction with Multidimensional Protein Identification Technology (MudPIT) to determine how the equalization of protein dynamic range could improve shotgun proteomics methods for the analysis of cellular proteomes. Our results suggest low abundance protein identifications were improved by two mechanisms: (1) depletion of high abundance proteins freed ion trap sampling space usually occupied by high abundance peptides and (2) enrichment of low abundance proteins increased the probability of sampling their corresponding more abundant peptides. Both mechanisms also contributed to dramatic increases in the quantity of peptides identified and the quality of MS/MS spectra acquired due to increases in precursor intensity of peptides from low abundance proteins. From our large data set of identified proteins, we categorized the dominant physicochemical factors that facilitate proteome equalization with a hexapeptide library. These results illustrate that equalization of the dynamic range of the cellular proteome is a promising methodology to improve low abundance protein identification confidence, reproducibility, and sequence coverage in shotgun proteomics experiments, opening a new avenue of research for improving proteome coverage.

11.
Genes that encode glycosylphosphatidylinositol anchored proteins (GPI-APs) constitute an estimated 1-2% of eukaryote genomes. Current computational methods for the prediction of GPI-APs are sensitive and specific; however, the analysis of the processing site (ω-site, or omega-site) of GPI-APs is still challenging. Only 10% of the proteins that are annotated as GPI-APs have the omega-site experimentally verified. We describe an integrated computational and experimental proteomics approach for the identification and characterization of GPI-APs that provides the means to identify GPI-APs and the derived GPI-anchored peptides in LC-MS/MS data sets. The method takes advantage of sequence features of GPI-APs and the known core structure of the GPI-anchor. The first stage of the analysis encompasses LC-MS/MS based protein identification. The second stage involves prediction of the processing sites of the identified GPI-APs and prediction of the corresponding terminal tryptic peptides. The third stage calculates possible GPI structures on the peptides from stage two. The fourth stage calculates the scores by comparing the theoretical spectra of the predicted GPI-peptides against the observed MS/MS spectra. Automated identification of C-terminal GPI-peptides from porcine membrane dipeptidase, folate receptor and CD59 in complex LC-MS/MS data sets demonstrates the sensitivity and specificity of this integrated computational and experimental approach.

12.
Recently a number of computational approaches have been developed for the prediction of protein–protein interactions. Complete genome sequencing projects have provided the vast amount of information needed for these analyses. These methods utilize the structural, genomic, and biological context of proteins and genes in complete genomes to predict protein interaction networks and functional linkages between proteins. Given that experimental techniques remain expensive, time-consuming, and labor-intensive, these methods represent an important advance in proteomics. Some of these approaches utilize sequence data alone to predict interactions, while others combine multiple computational and experimental datasets to accurately build protein interaction maps for complete genomes. These methods represent a complementary approach to current high-throughput projects whose aim is to delineate protein interaction maps in complete genomes. We will describe a number of computational protocols for protein interaction prediction based on the structural, genomic, and biological context of proteins in complete genomes, and detail methods for protein interaction network visualization and analysis.

13.
Detecting differentially expressed proteins is a key goal of proteomics. We describe a label-free method, the spectral index, for analyzing relative protein abundance in large-scale data sets derived from biological samples by shotgun proteomics. The spectral index comprises two biochemically plausible features: relative protein abundance (assessed by spectral counts) and the number of samples within a group with detectable peptides. We combined the spectral index with permutation analysis to establish confidence intervals for assessing differential protein expression in bronchoalveolar lavage fluid from cystic fibrosis and control subjects. Significant differences in protein abundance determined by the spectral index agreed well with independent biochemical measurements. When used to analyze simulated data sets, the spectral index outperformed four other statistical tests (Student's t-test, G-test, Bayesian t-test, and Significance Analysis of Microarrays) by correctly identifying the largest number of differentially expressed proteins. Correspondence analysis and functional annotation analysis indicated that the spectral index improves the identification of enriched proteins corresponding to clinical phenotypes. The spectral index is easily implemented and statistically robust, and its results are readily interpreted graphically. Therefore, it should be useful for biomarker discovery and comparisons of protein expression between normal and disease states.
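The two features named above can be combined in more than one way, and the abstract does not give the formula. The sketch below assumes one plausible form: each group's share of the mean spectral count is weighted by the fraction of that group's samples in which the protein was detected, and the two weighted shares are subtracted, giving a value between -1 and 1. It also uses label permutation to produce an empirical p-value, which is a simplification of the confidence intervals mentioned in the abstract. All names and parameters here are hypothetical.

```python
import random


def spectral_index(group_a, group_b):
    """Assumed spectral index for one protein; inputs are per-sample spectral counts."""
    mean_a = sum(group_a) / len(group_a)
    mean_b = sum(group_b) / len(group_b)
    denom = mean_a + mean_b
    if denom == 0:
        return 0.0  # protein never observed in either group
    # Fraction of samples in each group with at least one spectrum.
    detected_a = sum(1 for c in group_a if c > 0) / len(group_a)
    detected_b = sum(1 for c in group_b if c > 0) / len(group_b)
    return (mean_a / denom) * detected_a - (mean_b / denom) * detected_b


def permutation_p_value(group_a, group_b, n_perm=2000, seed=0):
    """Empirical two-sided p-value obtained by shuffling sample labels."""
    rng = random.Random(seed)
    observed = abs(spectral_index(group_a, group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(spectral_index(pooled[:n_a], pooled[n_a:])) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


# Hypothetical protein seen in every disease sample and in no control sample.
spi = spectral_index([5, 6, 7], [0, 0, 0])
p_value = permutation_p_value([5, 6, 7], [0, 0, 0])
```

With only three samples per group there are just 20 distinct label assignments, so the smallest achievable p-value is limited by group size rather than by the number of permutations drawn.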

14.
15.
The recent surge in microbial genomic sequencing, combined with the development of high-throughput liquid chromatography-mass spectrometry (LC/LC-MS/MS)-based proteomics, has raised the question of the extent to which genomic information of one strain or environmental sample can be used to profile proteomes of related strains or samples. Even with decreasing sequencing costs, it remains impractical to obtain genomic sequence for every strain or sample analyzed. Here, we evaluate how shotgun proteomics is affected by amino acid divergence between the sample and the genomic database using a probability-based model and a random mutation simulation model constrained by experimental data. To assess the effects of nonrandom distribution of mutations, we also evaluated identification levels using in silico peptide data from sequenced isolates with average amino acid identities (AAI) varying between 76 and 98%. We compared the predictions to experimental protein identification levels for a sample that was evaluated using a database that included genomic information for the dominant organism and for a closely related variant (95% AAI). The range of models set the boundaries at which half of the proteins in a proteomic experiment can be identified to be 77-92% AAI between orthologs in the sample and database. Consistent with this prediction, experimental data indicated loss of half the identifiable proteins at 90% AAI. Additional analysis indicated a 6.4% reduction of the initial protein coverage per 1% amino acid divergence and total identification loss at 86% AAI. Consequently, shotgun proteomics is capable of cross-strain identifications but avoids most cross-species false positives.
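The idea behind a probability-based model of this kind can be sketched under strong simplifying assumptions that are not taken from the article: substitutions are independent and uniform along the sequence, a peptide matches the database only if none of its residues differs, and a protein counts as identified when at least `min_peptides` of its peptides still match. The function names and the example peptide lengths are hypothetical, and the sketch is not expected to reproduce the article's figures.

```python
def peptide_survival(aai, length):
    """Probability that a peptide of the given length carries no substitution."""
    return aai ** length


def protein_identification_probability(aai, peptide_lengths, min_peptides=2):
    """Probability that at least min_peptides peptides remain unmutated."""
    # dist[k] = probability that exactly k of the peptides seen so far survive.
    dist = [1.0]
    for length in peptide_lengths:
        p = peptide_survival(aai, length)
        updated = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            updated[k] += prob * (1.0 - p)   # this peptide is lost
            updated[k + 1] += prob * p       # this peptide still matches
        dist = updated
    return sum(dist[min_peptides:])


# Hypothetical protein with five detectable tryptic peptides.
lengths = [8, 10, 12, 15, 9]
curve = {aai: protein_identification_probability(aai, lengths) for aai in (0.98, 0.90, 0.80)}
```

Because survival falls as `aai ** length`, longer peptides are lost first, which is why identification drops steeply over a fairly narrow range of sequence identity.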

16.
The in‐depth analysis of complex proteome samples requires fractionation of the sample into subsamples prior to LC‐MS/MS in shotgun proteomics experiments. We have established a 3D workflow for shotgun proteomics that relies on protein separation by 1D PAGE, gel fractionation, trypsin digestion, and peptide separation by in‐gel IEF, prior to RP‐HPLC‐MS/MS. Our results show that applying peptide IEF can significantly increase the number of proteins identified from PAGE subfractionation. This method delivers deeper proteome coverage and provides a large degree of flexibility in experimentally approaching highly complex mixtures by still relying on protein separation according to molecular weight in the first dimension.

17.
Shen C, Li L, Chen JY. Proteins 2006, 64(2):436-443
Experimental processes to collect and process proteomics data are increasingly complex, and the computational methods to assess the quality and significance of these data remain unsophisticated. These challenges have led to many biological oversights and computational misconceptions. We developed an empirical Bayes model to analyze multiprotein complex (MPC) proteomics data derived from peptide mass spectrometry detections of purified protein complex pull-down experiments. Using our model and two yeast proteomics data sets, we estimated that there should be an average of about 20 true associations per MPC, almost 10 times as high as was previously estimated. For data sets generated to mimic a real proteome, our model achieved on average 80% sensitivity in detecting true associations, as compared with the 3% sensitivity in previous work, while maintaining a comparable false discovery rate of 0.3%. Cross-examination of our results with protein complexes confirmed by various experimental techniques demonstrates that many true associations that cannot be identified by the previous approach are identified by our method.

18.
19.
The formation of proteins into stable protein complexes plays a fundamental role in the operation of the cell. The study of the degree of evolutionary conservation of protein complexes between species and the evolution of protein-protein interactions has been hampered by the lack of comprehensive coverage of the high-throughput (HTP) technologies that measure the interactome. We show that new high-throughput datasets on protein co-purification in yeast have a substantially lower false negative rate than previous datasets when compared to known complexes. These datasets are therefore more suitable to estimate the conservation of protein complex membership than hitherto possible. We perform comparative genomics between curated protein complexes from human and the HTP data in Saccharomyces cerevisiae to study the evolution of co-complex memberships. This analysis revealed that out of the 5,960 protein pairs that are part of the same complex in human, 2,216 are absent because both proteins lack an ortholog in S. cerevisiae, while for 1,828 the co-complex membership is disrupted because one of the two proteins lacks an ortholog. For the remaining 1,916 protein pairs, only 10% were never co-purified in the large-scale experiments. This implies a conservation level of co-complex membership of 90% when the genes coding for the protein pairs that participate in the same protein complex are also conserved. We conclude that the evolutionary dynamics of protein complexes are, by and large, not the result of network rewiring (i.e. acquisition or loss of co-complex memberships), but mainly due to genomic acquisition or loss of genes coding for subunits. We thus reveal evidence for the tight interrelation of genomic and network evolution.

20.
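In proteomics, a central task in processing liquid chromatography-mass spectrometry (LC-MS) spectral data is to discover and analyze differences among complex peptide or protein samples in the search for biomarkers, and aligning the elution-time peak signals (LC peaks) that peptides produce across repeated runs of the same sample is the key to quantification and differential analysis. At present, replicate runs are usually aligned by using liquid chromatography-tandem mass spectrometry (LC-MS/MS) identifications to mark the time features of LC peaks in the replicate data sets and then aligning those time features with a warping function. Because elution-time errors across replicate data arise randomly, applying a single warping function uniformly can introduce large errors. To address this problem, this study focuses on a time-alignment algorithm for LC peaks across multiple replicate runs. We selected two replicate data sets and took a machine-learning approach: peptides detected in the LC-MS/MS data of both runs were treated as trusted data, with part used as a training set and part as a test set. We built a statistical model and proposed a new alignment algorithm; evaluating the model on the test set showed an accuracy above 95%. We then applied the model to all LC-MS/MS peptide identifications in the two data sets to increase the coverage of identifications across runs, reaching a coverage above 85%.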
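The train-and-test workflow described in this abstract can be sketched as follows. The abstract does not specify the statistical model, so an ordinary least-squares line mapping retention times in one run onto the other is used purely as a stand-in; the tolerance, the function names, and the synthetic retention times are likewise assumptions for illustration.

```python
def fit_linear(pairs):
    """Least-squares fit of rt_b = slope * rt_a + intercept from (rt_a, rt_b) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    if sxx == 0:
        raise ValueError("training retention times must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def alignment_accuracy(model, pairs, tolerance):
    """Fraction of held-out pairs whose predicted time falls within the tolerance."""
    slope, intercept = model
    hits = sum(1 for x, y in pairs if abs(slope * x + intercept - y) <= tolerance)
    return hits / len(pairs)


# Synthetic shared peptides: retention times (minutes) in run A and run B.
shared = [(t, 1.02 * t + 0.5) for t in (10.0, 20.0, 30.0, 40.0, 50.0, 60.0)]
train, test = shared[::2], shared[1::2]  # alternate pairs for training and testing
model = fit_linear(train)
accuracy = alignment_accuracy(model, test, tolerance=0.1)
```

Holding out some of the shared peptides gives an accuracy estimate that does not depend on the points used for fitting, which mirrors the split between training and test sequences described in the abstract.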
