Similar articles
1.
An important aim of proteogenomics, which combines data from high-throughput nucleic acid and protein analysis, is to reliably identify single amino acid substitutions, a main type of coding genome variant. Exact knowledge of deviations from the consensus genome can be used in several biomedical fields, such as studies of the expression of mutated proteins in cancer, deciphering heterozygosity mechanisms, identification of neoantigens in anticancer vaccine production, and the search for RNA editing sites at the level of the proteome. Generating this new knowledge requires processing large data arrays from high-resolution mass spectrometry, from which information on single-point protein variation is often difficult to extract. Accordingly, a significant problem in proteogenomic analysis is the high level of false positive results for variant-containing peptides. Here we review recently suggested approaches to processing high-quality proteomics data that may provide more reliable identification of single amino acid substitutions, especially in distinguishing them from residue modifications occurring in vitro and in vivo. Optimized methods for assessing the false discovery rate save the instrumental and computational time spent on validating interesting findings of amino acid polymorphism by orthogonal methods.
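The abstract above does not say how the false discovery rate is assessed. As a minimal sketch, assuming the widely used target-decoy strategy (an assumption, not a method taken from the abstract), the FDR at a score threshold can be estimated as the ratio of decoy to target matches that pass it. The function name `estimate_fdr` and the `(score, is_decoy)` tuple layout are hypothetical, chosen only for illustration.

```python
from typing import Iterable, Tuple


def estimate_fdr(psms: Iterable[Tuple[float, bool]], threshold: float) -> float:
    """Estimate FDR as (#decoy matches) / (#target matches) at or above threshold.

    Each item is (score, is_decoy). This layout is an illustrative assumption.
    """
    targets = 0
    decoys = 0
    for score, is_decoy in psms:
        if score >= threshold:
            if is_decoy:
                decoys += 1
            else:
                targets += 1
    # With no accepted target matches there is nothing to call false.
    if targets == 0:
        return 0.0
    return decoys / targets


# Toy peptide-spectrum matches: (search score, matched a decoy sequence?)
example_psms = [
    (90.0, False),
    (85.0, False),
    (80.0, True),
    (70.0, False),
    (60.0, True),
    (50.0, False),
]

if __name__ == "__main__":
    print(estimate_fdr(example_psms, threshold=65.0))
```

In practice, variant-containing peptides are often assessed separately from reference peptides, because a global estimate can understate the error rate within the much smaller variant class.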

2.
Laura Y. Zhou  Fei Zou  Wei Sun 《Biometrics》2023,79(3):2664-2676
Cancer (treatment) vaccines that are made of neoantigens, or peptides unique to tumor cells due to somatic mutations, have emerged as a promising method to reinvigorate the immune response against cancer. A key step to prioritizing neoantigens for cancer vaccines is computationally predicting which neoantigens are presented on the cell surface by a human leukocyte antigen (HLA). We propose to address this challenge by training a neural network using mass spectrometry (MS) data composed of peptides presented by at least one of several HLAs of a subject. We embed the neural network within a mixture model and train the neural network by maximizing the likelihood of the mixture model. After evaluating our method using data sets where the peptide presentation status was known, we applied it to analyze somatic mutations of 60 melanoma patients and identified a group of neoantigens more immunogenic in tumor cells than in normal cells. Moreover, neoantigen burden estimated by our method was significantly associated with a measurement of the immune system activity, suggesting these neoantigens could induce an immune response.

3.
4.
Introduction: Mass spectrometry (MS)-based proteomics has become an indispensable tool for the characterization of the proteome and its post-translational modifications (PTMs). In addition to standard protein sequence databases, proteogenomics strategies search the spectral data against theoretical spectra obtained from customized protein sequence databases. To date, there are no published proteogenomics studies on acute myeloid leukemia (AML) samples.

Areas covered: Proteogenomics involves the understanding of genomic and proteomic data. The intersection of both datatypes requires advanced bioinformatics skills. A standard proteogenomics workflow that could be used for the study of AML samples is described. The generation of customized protein sequence databases as well as bioinformatics tools and pipelines commonly used in proteogenomics are discussed in detail.

Expert commentary: Drawing on evidence from recent cancer proteogenomics studies and taking into account the public availability of AML genomic data, the interpretation of present and future MS-based AML proteomic data using AML-specific protein sequence databases could discover new biological mechanisms and targets in AML. However, proteogenomics workflows including bioinformatics guidelines can be challenging for the wide AML research community. It is expected that further automation and simplification of the bioinformatics procedures might attract AML investigators to adopt the proteogenomics strategy.


5.
6.
Molecular-assisted precision oncology gained tremendous ground with high-throughput next-generation sequencing (NGS), supported by robust bioinformatics. The quest for genomics-based cancer medicine set the foundations for improved patient stratification, while unveiling a wide array of neoantigens for immunotherapy. Upfront pre-clinical and clinical studies have successfully used tumor-specific peptides in vaccines with minimal off-target effects. However, the low mutational burden presented by many lesions challenges the generalization of these solutions, requiring the diversification of neoantigen sources. Oncoproteogenomics utilizing customized databases for protein annotation by mass spectrometry (MS) is a powerful tool toward this end. Expanding the concept toward exploring proteoforms originating from post-translational modifications (PTMs) will be decisive to improve molecular subtyping and provide potentially targetable functional nodes with increased cancer specificity. Walking through the path of systems biology, we highlight that alterations in protein glycosylation at the cell surface not only have functional impact on cancer progression and dissemination but also give rise to unique molecular fingerprints for targeted therapeutics. Moreover, we discuss the outstanding challenges that must be addressed to accommodate glycoproteomics in oncoproteogenomics platforms. We envisage that such rationale may flag a rather neglected research field, generating novel paradigms for precision oncology and immunotherapy.

7.
8.
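With the rapid development of high-throughput DNA sequencing technology, more and more species have had their genomes sequenced. Locating protein-coding genes and determining their structure are the basic tasks of genome annotation, yet previous genome annotation methods have relied mainly on DNA and RNA sequence information. To interpret sequenced genomes more precisely, multiple types of omics data need to be integrated into genome annotation. In recent years, proteomics based on tandem mass spectrometry has matured and achieved high coverage of the proteome, making it possible to use tandem mass spectrometry data for genome annotation. Such data can, on the one hand, verify the expression of annotated genes and, on the other hand, correct existing gene annotations and reveal novel genes, thereby achieving re-annotation of genome sequences. This is precisely the subject of proteogenomics, a field that is currently advancing rapidly. Using this approach to systematically annotate sequenced genomes has become an important complement to genome interpretation. This article reviews the main research topics and methods of proteogenomics and discusses future directions for the field.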

9.
Database search tools identify peptides by matching tandem mass spectra against a protein database. We study an alternative approach in which all plausible de novo interpretations of a spectrum (a spectral dictionary) are generated and then quickly matched against the database. We present a new MS-Dictionary algorithm for efficiently generating spectral dictionaries and demonstrate that MS-Dictionary can identify spectra that are missed in the database search. We argue that MS-Dictionary enables proteogenomics searches in six-frame translations of genomic sequences that may be prohibitively time-consuming for existing database search approaches. We show that such searches allow one to correct sequencing errors and find programmed frameshifts.

10.
An encouraging approach to the diagnosis and effective therapy of immunological pathologies, including cancer, is the identification of proteins and phosphorylated proteins. Disease proteomics, in particular, is a potentially useful method for this purpose. Protein phosphorylation plays a key role in the regulation of normal immunology disorders, and cancer cells and protein kinases are the targets of several new cancer drugs and drug candidates. Protein phosphorylation is a highly dynamic process. The functioning of new drugs is of major importance, as is the selection of those patients who would respond best to a specific treatment regime. Signalling networks are key elements in all major aspects of cellular life and play a major role in inter- and intracellular communication. They are involved in diverse processes such as cell-cycle progression, cellular metabolism, cell-cell communication and appropriate response to the cellular environment. The latter comprises a whole range of networks involved in the regulation of cell development, differentiation, proliferation, apoptosis, and immunologic responses. It is therefore necessary to understand and monitor kinase signalling pathways in order to understand many immunology pathologies. Enrichment of phosphorylated proteins or peptides from tissue or bodily fluid samples is required. The application of technologies such as immunoproteomic techniques, phosphoenrichments and mass spectrometry (MS) is crucial for the identification and quantification of protein phosphorylation sites in order to advance clinical research. Pharmacodynamic readouts of disease states and cellular drug responses in tumour samples will be provided as the field develops. We aim to detail the current and most useful techniques, with research examples, for isolating phosphorylated proteins and peptides and carrying out clinical phosphoproteomic studies that may be helpful for immunology and cancer research.
Different phosphopeptide enrichment and quantitative techniques need to be combined to achieve good phosphopeptide recovery and reliable studies of up- and down-regulation of protein phosphorylation.

11.
Venter E  Smith RD  Payne SH 《PloS one》2011,6(11):e27587
Experimental evidence is increasingly being used to reassess the quality and accuracy of genome annotation. Proteomics data used for this purpose, called proteogenomics, can alleviate many of the problematic areas of genome annotation, e.g. short protein validation and start site assignment. We performed a proteogenomic analysis of 46 genomes spanning eight bacterial and archaeal phyla across the tree of life. These diverse datasets facilitated the development of a robust approach for proteogenomics that is functional across genomes varying in %GC, gene content, proteomic sampling depth, phylogeny, and genome size. In addition to finding evidence for 682 novel proteins, 1336 new start sites, and numerous dubious genes, we discovered sites of post-translational maturation in the form of proteolytic cleavage of 1175 signal peptides. The number of novel proteins per genome is highly variable (median 7, mean 15, stdev 20). Moreover, comparison of novel genes with the current genes did not reveal any consistent abnormalities. Thus, we conclude that proteogenomics fulfills a yet to be understood deficiency in gene prediction. With the adoption of new sequencing technologies which have higher error rates than Sanger-based methods and the advances in proteomics, proteogenomics may become even more important in the future.

12.
陈莹  徐平  戴二黑  张瑶 《微生物学报》2023,63(8):2948-2966
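Tuberculosis (TB) is a chronic infectious disease caused by infection with Mycobacterium tuberculosis (MTB) and is the second leading cause of death from a single infectious agent, after the ongoing outbreak of coronavirus disease 2019 (COVID-19). The COVID-19 pandemic has had a devastating impact on TB diagnosis and treatment, and global progress toward the goal of ending TB has gone off track. Early diagnosis and early treatment therefore remain the key to controlling the spread of TB. Accurate TB diagnosis has long been limited by the specificity of MTB antigens and by the specificity and sensitivity of detection techniques, so there is an urgent need to discover highly specific new antigens and to develop new detection techniques. With the rapid development of proteogenomics and mass spectrometry, the trend in TB diagnosis and treatment is toward efficient, precise, targeted detection of the expression of known and even novel MTB-specific antigens in clinical body-fluid and tissue samples, together with monitoring of dynamic changes in antigen expression during treatment. Among the 4,008 annotated genes of the MTB reference strain H37Rv (NC_000962.3, NCBI), more than 140 annotated antigens have been reported in China and abroad, but only a very few are used in TB screening and auxiliary diagnosis, which remains far from the diagnostic standards of the World Health Organization (WHO). This article, drawing on reported MTB antigens and ...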

13.
Advances in proteogenomics research in prokaryotes   Total citations: 1 (self-citations: 0, citations by others: 1)
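With the continuing development of genome sequencing technology, large numbers of microbial genome sequences can be determined accurately in a short time. To further explore genome structure and function, genome annotation algorithms based on sequence features and homology are widely applied to newly sequenced species. However, because of limitations in genome sequencing quality and the relatively low accuracy of the algorithms themselves, existing genome annotations contain a considerable proportion of false genes and annotation errors, especially errors in the annotation of protein N-termini. To compensate for these shortcomings, transcriptome sequencing technologies centered on microarrays or RNA-seq, and proteome sequencing technologies centered on tandem mass spectrometry, can measure the transcription and translation products of genes precisely and at high throughput, thereby providing experimental validation of predicted gene structures. However, the large amount of non-coding RNA present in prokaryotic cells introduces contaminating data into transcriptome sequencing, which limits its application to genome annotation. By comparison, proteomic sequencing based on tandem mass spectrometry can identify large numbers of proteins in an organism in a short time, enabling validation and even correction of annotated genes. It has become an important basis for genome annotation and re-annotation and has given rise to the new research direction of "proteogenomics". This article first introduces traditional genome annotation algorithms based on sequence prediction and homology alignment and points out their deficiencies. On this basis, considering the technical characteristics of transcriptomics and proteomics, it analyzes the advantages of proteomics for prokaryotic genome annotation and summarizes the current progress of large-scale proteogenomics studies. Finally, from an informatics perspective, it identifies the problems in using proteomic data for genome re-annotation and the corresponding solutions, and then discusses future directions for proteogenomics.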

14.
Over the last 5 years proteogenomics (using mass spectrometry to identify proteins predicted from genomic sequences) has emerged as a promising approach to the high-throughput identification of protein N-termini, which remains a problem in genome annotation. Comparison of the experimentally determined N-termini with those predicted by sequence analysis tools allows identification of the signal peptides and therefore conclusions on the cytoplasmic or extracytoplasmic (periplasmic or extracellular) localization of the respective proteins. We present here the results of a proteogenomic study of the signal peptides in Escherichia coli K-12 and compare its results with the available experimental data and predictions by software tools such as SignalP and Phobius. A single proteogenomics experiment recovered more than a third of all signal peptides that had been experimentally determined during the past three decades and confirmed at least 31 additional signal peptides, mostly in the known exported proteins, which had been previously predicted but not validated. Filtering putative signal peptides by peptide length and by the presence of an eight-residue hydrophobic patch and a typical signal peptidase cleavage site proved sufficient to eliminate the false-positive hits. Surprisingly, the results of this proteogenomics study, as well as a re-analysis of the E. coli genome with the latest version of the SignalP program, show that the fraction of proteins containing signal peptides is only about 10%, or half of previous estimates.
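The three filters named in the abstract above (peptide length, an eight-residue hydrophobic patch, and a typical signal peptidase cleavage site) can be sketched as a simple predicate. The abstract gives no thresholds, so everything concrete here is an assumption for illustration: the length bounds, the set of residues treated as hydrophobic, the requirement that all eight residues in the window be hydrophobic, and the use of small residues at positions -3 and -1 as the cleavage-site pattern. The function names are hypothetical.

```python
# Illustrative assumptions, not values taken from the abstract.
HYDROPHOBIC = set("AVLIMFWC")   # residues counted as hydrophobic
SMALL = set("AGS")              # residues accepted at positions -3 and -1


def has_hydrophobic_patch(peptide: str, window: int = 8, min_hydrophobic: int = 8) -> bool:
    """Return True if some window of `window` residues has enough hydrophobic ones."""
    for start in range(len(peptide) - window + 1):
        segment = peptide[start:start + window]
        if sum(residue in HYDROPHOBIC for residue in segment) >= min_hydrophobic:
            return True
    return False


def looks_like_signal_peptide(peptide: str, min_len: int = 15, max_len: int = 40) -> bool:
    """Apply the length, hydrophobic-patch and cleavage-site filters in turn."""
    if not (min_len <= len(peptide) <= max_len):
        return False
    if not has_hydrophobic_patch(peptide):
        return False
    # The candidate is the cleaved-off peptide, so its last residue is position -1.
    return peptide[-3] in SMALL and peptide[-1] in SMALL


if __name__ == "__main__":
    # Both sequences are made-up toy examples, not real E. coli peptides.
    print(looks_like_signal_peptide("MKKTLLALLLLAAGSSAQA"))
    print(looks_like_signal_peptide("MKKTDEKRDEKRSSGNQDAQA"))
```

Keeping each filter as its own small function makes it easy to loosen one criterion (for example, lowering `min_hydrophobic`) and see how many candidate peptides it rescues or rejects.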

15.
Proteogenomics     
Renuse S  Chaerkady R  Pandey A 《Proteomics》2011,11(4):620-630
The ability to sequence DNA rapidly, inexpensively and in a high-throughput fashion provides a unique opportunity to sequence whole genomes of a large number of species. The cataloging of protein-coding genes from these species, however, remains a non-trivial task, with the majority of initial genome annotation dependent on the use of gene prediction algorithms. Recent advances in mass spectrometry-based proteomics now enable generation of accurate and comprehensive protein sequence data for tissues and organisms. Proteogenomics allows us to harness the wealth of information available at the proteome level and apply it to the available genomic information of organisms. This includes identifying novel genes and splice isoforms, assigning correct start sites and validating predicted exons and genes. It is also possible to use proteogenomics to identify protein variants that could cause diseases, to identify protein biomarkers and to study genome variation. We anticipate that proteogenomics will become a powerful approach that will be routinely employed by 'Genome and Proteome Centers' of the future.

16.
Grapevine is an important perennial fruit crop for the wine industry, and has implications for the health industry with some causative agents proven to reduce heart disease. Since the sequencing and assembly of grapevine cultivar Pinot Noir, several studies have contributed to its genome annotation. This new study further contributes toward genome annotation efforts by conducting a proteogenomics analysis using the latest genome annotation from CRIBI, a legacy proteomics dataset from cultivar Cabernet Sauvignon and a large RNA-seq dataset. A total of 341 novel annotation events are identified, consisting of five frame-shifts, 37 translated UTRs, 15 exon boundaries, one novel splice, nine novel exons, 159 gene boundaries, 112 reverse strands, and one novel gene event in 213 genes and 323 proteins. From this proteogenomics evidence, the Augustus gene prediction tool predicted 52 novel and revised genes (54 protein isoforms), 11 genes of which are associated with key traits such as stress tolerance and floral and fruity wine characteristics. This study also highlights a likely over-assembly of the genome, particularly on chromosome 7.

17.
Biological therapy is currently being investigated in the treatment of a number of malignancies. The hypothesis for the use of this therapeutic modality involves an attempt to stimulate an already existent but perhaps suboptimal immune response to foreign protein, including tumor. Immunologic therapy appears to work best against small-volume disease, as indicated from animal studies. This condition is potentially achievable in advanced ovarian cancer, where surgery is capable of producing multi-log reduction in tumor mass, and thus immunotherapy may be an option in this disease. The attraction of biologic therapy in patients with ovarian cancer is the potential to treat relatively localized but often chemotherapy-resistant disease. In cervical cancer, the rationale for the use of interferon is somewhat different in that this disease may be a manifestation of a virally induced proliferative lesion. Thus, the antiviral properties of interferon are being investigated in both limited and advanced cervical cancer. Both of these hypotheses have pre-clinical data to support them. This paper presents the pre-clinical and clinical work currently available for consideration of future use.

18.
With the onset of modern DNA sequencing technologies, genomics is experiencing a revolution in terms of quantity and quality of sequencing data. Rapidly growing numbers of sequenced genomes and metagenomes present a tremendous challenge for bioinformatics tools that predict protein-coding regions. Experimental evidence of expressed genomic regions, both at the RNA and protein level, is becoming invaluable for genome annotation and training of gene prediction algorithms. Evidence of gene expression at the protein level using mass spectrometry-based proteomics is increasingly used in refinement of raw genome sequencing data. In a typical "proteogenomics" experiment, the whole proteome of an organism is extracted, digested into peptides and measured by a mass spectrometer. The peptide fragmentation spectra are identified by searching against a six-frame translation of the raw genomic assembly, thus enabling the identification of hitherto unpredicted protein-coding genomic regions. Application of mass spectrometry to genome annotation presents a range of challenges to the standard workflows in proteomics, especially in terms of proteome coverage and database search strategies. Here we provide an overview of the field and argue that the latest mass spectrometry technologies that enable high mass accuracy at high acquisition rates will prove to be especially well suited for proteogenomics applications.
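The six-frame translation mentioned in the abstract above can be illustrated with a short sketch: the DNA is translated in three reading frames on the forward strand and three on the reverse complement, and the resulting protein strings form the search database. This is a minimal illustration that assumes the standard genetic code and plain A/C/G/T input; the function names are hypothetical and it is not the pipeline of any particular tool.

```python
# Standard genetic code, laid out in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper-case A/C/G/T sequence."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons; '*' marks a stop and 'X' an unknown codon."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translation(dna: str) -> dict:
    """Map frame labels (+1..+3, -1..-3) to the translated protein strings."""
    dna = dna.upper()
    reverse = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame_translation("ATGGCCTAA").items():
        print(frame, protein)
```

In a real search, each frame would typically be split at stop codons into open reading frames before digestion in silico, which is one reason six-frame databases become much larger than annotated protein databases.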

19.
Proteogenomics, the integrative analysis of the proteome and the genome, increasingly provides protein-level insights about the regulation of gene expression and protein translation. Armengaud et al. (Proteomics 2017, 17, 1700211) nicely illustrate this trend with the first in-depth proteomic analysis of the eukaryotic and unicellular intestinal parasite Blastocystis sp. Not only does this work constitute an important milestone toward the proteogenomics profile of this human pathogen, but it also demonstrates at the protein level the occurrence of a specific mechanism of mRNA decoding. GU-rich motifs located downstream of mRNA polyadenylation sites create termination codons that ultimately result in the synthesis of proteins with lower molecular weight than predicted from the gene sequence. Thus, the scope of proteogenomics now extends to the regulation of mRNA translation into proteins, providing a proof of concept for future studies in multicellular eukaryotes such as humans and plants.

20.
Cotranslational protein N-terminal modifications, including proteolytic maturation such as initiator methionine excision by methionine aminopeptidases and N-terminal blocking, occur universally. Protein alpha-N-acetylation, or the transfer of the acetyl moiety of acetyl-coenzyme A to nascent protein N-termini, catalysed by multisubunit N-terminal acetyltransferase complexes, generally takes place during protein translation. Nearly all protein modifications are known to influence different protein aspects such as folding, stability, activity and localization, and several studies have indicated similar functions for protein alpha-N-acetylation. However, until recently, protein alpha-N-acetylation remained poorly explored, mainly due to the absence of targeted proteomics technologies. The recent emergence of N-terminomics technologies that allow isolation of protein N-terminal peptides, together with proteogenomics efforts combining experimental and informational content, has greatly boosted the field of alpha-N-acetylation. In this review, we report on such emerging technologies as well as on breakthroughs in our understanding of protein N-terminal biology.
