Similar literature
 20 similar articles found (search time: 15 ms)
1.
Grapevine is an important perennial fruit to the wine industry, and has implications for the health industry with some causative agents proven to reduce heart disease. Since the sequencing and assembly of grapevine cultivar Pinot Noir, several studies have contributed to its genome annotation. This new study further contributes toward genome annotation efforts by conducting a proteogenomics analysis using the latest genome annotation from CRIBI, legacy proteomics dataset from cultivar Cabernet Sauvignon and a large RNA‐seq dataset. A total of 341 novel annotation events are identified consisting of five frame‐shifts, 37 translated UTRs, 15 exon boundaries, one novel splice, nine novel exons, 159 gene boundaries, 112 reverse strands, and one novel gene event in 213 genes and 323 proteins. From this proteogenomics evidence, the Augustus gene prediction tool predicted 52 novel and revised genes (54 protein isoforms), 11 genes of which are associated with key traits such as stress tolerance and floral and fruity wine characteristics. This study also highlights a likely over‐assembly with the genome, particularly on chromosome 7.

2.
Venter E  Smith RD  Payne SH 《PloS one》2011,6(11):e27587
Experimental evidence is increasingly being used to reassess the quality and accuracy of genome annotation. Proteomics data used for this purpose, called proteogenomics, can alleviate many of the problematic areas of genome annotation, e.g. short protein validation and start site assignment. We performed a proteogenomic analysis of 46 genomes spanning eight bacterial and archaeal phyla across the tree of life. These diverse datasets facilitated the development of a robust approach for proteogenomics that is functional across genomes varying in %GC, gene content, proteomic sampling depth, phylogeny, and genome size. In addition to finding evidence for 682 novel proteins, 1336 new start sites, and numerous dubious genes, we discovered sites of post-translational maturation in the form of proteolytic cleavage of 1175 signal peptides. The number of novel proteins per genome is highly variable (median 7, mean 15, stdev 20). Moreover, comparison of novel genes with the current genes did not reveal any consistent abnormalities. Thus, we conclude that proteogenomics fulfills a yet to be understood deficiency in gene prediction. With the adoption of new sequencing technologies which have higher error rates than Sanger-based methods and the advances in proteomics, proteogenomics may become even more important in the future.

3.
4.
5.
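Advances in prokaryotic proteogenomics   Total citations: 1, self-citations: 0, citations by others: 1
With the continuing development of genome sequencing technology, the genome sequences of large numbers of microorganisms can be determined accurately in a short time. To further explore genome structure and function, genome annotation algorithms based on sequence features and homology are widely applied to newly sequenced species. However, owing to limited sequencing quality and the low accuracy of the algorithms themselves, existing genome annotations contain a considerable proportion of spurious genes and annotation errors, particularly errors in annotated protein N-termini. To compensate for these deficiencies, transcriptome profiling based on microarrays or RNA-seq and proteome profiling based on tandem mass spectrometry can measure the transcription and translation products of genes precisely and at high throughput, thereby providing experimental validation of predicted gene structures. However, the large amount of non-coding RNA in prokaryotic cells introduces contaminating data into transcriptome sequencing and limits its use for genome annotation. By comparison, tandem mass spectrometry-based proteomics can identify large numbers of proteins in an organism in a short time, enabling validation and even correction of annotated genes. It has become an important basis for genome annotation and re-annotation and has given rise to the new research field of "proteogenomics". This review first introduces traditional genome annotation algorithms based on sequence prediction and homology alignment and points out their shortcomings. It then considers the technical characteristics of transcriptomics and proteomics, analyzes the advantages of proteomics for prokaryotic genome annotation, and summarizes the progress of current large-scale proteogenomic studies. Finally, from an informatics perspective, it identifies the problems in re-annotating genomes with proteomic data and the corresponding solutions, and discusses future directions for proteogenomics.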

6.
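With the rapid development of high-throughput DNA sequencing, the genomes of more and more species have been sequenced. Locating protein-coding genes and determining their structures are the basic tasks of genome annotation, yet past genome annotation methods have relied mainly on DNA and RNA sequence information. To interpret sequenced genomes more precisely, multiple types of omics data need to be integrated for genome annotation. In recent years, tandem mass spectrometry-based proteomics has matured and achieved high proteome coverage, making it possible to annotate genomes with tandem mass spectrometry data. Such data can verify the expression of annotated genes and can also correct existing gene annotations and reveal new genes, enabling re-annotation of genome sequences. This is the subject of proteogenomics, a field that is currently advancing rapidly. Systematic annotation of sequenced genomes with this approach has become an important complement to genome interpretation. This article reviews the main research topics and methods of proteogenomics and considers the future development of the field.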

7.
Vitis vinifera has been an emblematic plant for humans since the Neolithic period. Human civilization has been shaped by its domestication as both its medicinal and nutritional values were exploited. It is now cultivated on all habitable continents, and more than 5000 varieties have been developed. A global passion for the art of wine fuels innovation and a profound desire for knowledge on this plant. The genome sequence of a homozygotic cultivar and several RNA‐seq datasets on other varieties have been released paving the way to gaining further insight into its biology and tailoring improvements to varieties. However, its genome annotation remains unpolished. In this issue of Proteomics, Chapman and Bellgard (Proteomics 2017, 17, 1700197) discuss how proteogenomics can help improve genome annotation. By mining shotgun proteomics data, they defined new protein‐coding genes, refined gene structures, and corrected numerous mRNA splicing events. This stimulating study shows how large international consortia could work together to improve plant and animal genome annotation on a large scale. To achieve this aim, time should be invested to generate comprehensive, high‐quality experimental datasets for a wide range of well‐defined lineages and exploit them with pipelines capable of handling giant datasets.

8.
Proteogenomics     
Renuse S  Chaerkady R  Pandey A 《Proteomics》2011,11(4):620-630
The ability to sequence DNA rapidly, inexpensively and in a high-throughput fashion provides a unique opportunity to sequence whole genomes of a large number of species. The cataloging of protein-coding genes from these species, however, remains a non-trivial task with the majority of initial genome annotation dependent on the use of gene prediction algorithms. Recent advances in mass spectrometry-based proteomics now enable generation of accurate and comprehensive protein sequence of tissues and organisms. Proteogenomics allows us to harness the wealth of information available at the proteome level and apply it to the available genomic information of organisms. This includes identifying novel genes and splice isoforms, assigning correct start sites and validating predicted exons and genes. It is also possible to use proteogenomics to identify protein variants that could cause diseases, to identify protein biomarkers and to study genome variation. We anticipate proteogenomics to become a powerful approach that will be routinely employed by 'Genome and Proteome Centers' of the future.

9.
10.

Background  

Proteogenomics aims to utilize experimental proteome information for refinement of genome annotation. Since mass spectrometry-based shotgun proteomics approaches provide large-scale peptide sequencing data with high throughput, a data repository for shotgun proteogenomics would represent a valuable source of gene expression evidence at the translational level for genome re-annotation.

11.
Xing XB  Li QR  Sun H  Fu X  Zhan F  Huang X  Li J  Chen CL  Shyr Y  Zeng R  Li YX  Xie L 《Genomics》2011,98(5):343-351
Identifying protein-coding genes in eukaryotic genomes remains a challenge in the post-genome era due to complex gene models. We applied a proteogenomics strategy to detect un-annotated protein-coding regions in the mouse genome. High-accuracy tandem mass spectrometry (MS/MS) data from diverse mouse samples were generated in house on an LTQ-Orbitrap mass spectrometer. Two searchable diagnostic proteomic datasets were constructed, one with all possible encoding exon junctions, and the other with all putative encoding exons, for the discovery of novel exon splicing events and novel uninterrupted protein-coding regions. Altogether 29,586 unique peptides were identified. Aligning backwards to the mouse genome, the translation of 4471 annotated genes was validated by the known peptides; and 172 genic events were defined in the mouse genome by the novel peptides. The approach in the current work can provide substantial evidence for the annotation of protein-coding genes in eukaryote genomes.

12.
With the onset of modern DNA sequencing technologies, genomics is experiencing a revolution in terms of quantity and quality of sequencing data. Rapidly growing numbers of sequenced genomes and metagenomes present a tremendous challenge for bioinformatics tools that predict protein-coding regions. Experimental evidence of expressed genomic regions, both at the RNA and protein level, is becoming invaluable for genome annotation and training of gene prediction algorithms. Evidence of gene expression at the protein level using mass spectrometry-based proteomics is increasingly used in refinement of raw genome sequencing data. In a typical "proteogenomics" experiment, the whole proteome of an organism is extracted, digested into peptides and measured by a mass spectrometer. The peptide fragmentation spectra are identified by searching against a six-frame translation of the raw genomic assembly, thus enabling the identification of hitherto unpredicted protein-coding genomic regions. Application of mass spectrometry to genome annotation presents a range of challenges to the standard workflows in proteomics, especially in terms of proteome coverage and database search strategies. Here we provide an overview of the field and argue that the latest mass spectrometry technologies that enable high mass accuracy at high acquisition rates will prove to be especially well suited for proteogenomics applications.
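
The six-frame translation mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes the standard genetic code, a plain DNA string as input, and hypothetical function names (`six_frame_translation`, `translate`, `reverse_complement`) chosen only for this illustration. A real search database would additionally split each frame at stop codons and digest the resulting stretches into peptides.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order (assumption:
# the organism uses the standard table; many bacteria and organelles do not).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W"
         "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR"
         "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    """Translate complete codons only; unknown codons (e.g. with N) become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(dna: str) -> dict:
    """Translate three offsets on each strand; keys are '+1'..'+3', '-1'..'-3'."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    # Hypothetical toy sequence, not taken from any genome in the listing.
    for name, protein in six_frame_translation("ATGGCCTAA").items():
        print(name, protein)
```

Translating every frame on both strands is what lets peptides match regions that no gene model covers, at the cost of a much larger search space than an annotated protein database.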

13.
Proteomics data can supplement genome annotation efforts, for example being used to confirm gene models or correct gene annotation errors. Here, we present a large‐scale proteogenomics study of two important apicomplexan pathogens: Toxoplasma gondii and Neospora caninum. We queried proteomics data against a panel of official and alternate gene models generated directly from RNA‐Seq data, using several newly generated and some previously published MS datasets for this meta‐analysis. We identified a total of 201 996 and 39 953 peptide‐spectrum matches for T. gondii and N. caninum, respectively, at a 1% peptide FDR threshold. This equated to the identification of 30 494 distinct peptide sequences and 2921 proteins (matches to official gene models) for T. gondii, and 8911 peptides/1273 proteins for N. caninum following stringent protein‐level thresholding. We have also identified 289 and 140 loci for T. gondii and N. caninum, respectively, which mapped to RNA‐Seq‐derived gene models used in our analysis and apparently absent from the official annotation (release 10 from EuPathDB) of these species. We present several examples in our study where the RNA‐Seq evidence can help in correction of the current gene model and can help in discovery of potential new genes. The findings of this study have been integrated into the EuPathDB. The data have been deposited to the ProteomeXchange with identifiers PXD000297 and PXD000298.

14.
Over the last 5 years proteogenomics (using mass spectroscopy to identify proteins predicted from genomic sequences) has emerged as a promising approach to the high‐throughput identification of protein N‐termini, which remains a problem in genome annotation. Comparison of the experimentally determined N‐termini with those predicted by sequence analysis tools allows identification of the signal peptides and therefore conclusions on the cytoplasmic or extracytoplasmic (periplasmic or extracellular) localization of the respective proteins. We present here the results of a proteogenomic study of the signal peptides in Escherichia coli K‐12 and compare its results with the available experimental data and predictions by such software tools as SignalP and Phobius. A single proteogenomics experiment recovered more than a third of all signal peptides that had been experimentally determined during the past three decades and confirmed at least 31 additional signal peptides, mostly in the known exported proteins, which had been previously predicted but not validated. The filtering of putative signal peptides for the peptide length and the presence of an eight‐residue hydrophobic patch and a typical signal peptidase cleavage site proved sufficient to eliminate the false‐positive hits. Surprisingly, the results of this proteogenomics study, as well as a re‐analysis of the E. coli genome with the latest version of SignalP program, show that the fraction of proteins containing signal peptides is only about 10%, or half of previous estimates.

15.
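[Objective] Marinomonas sp. FW-1 is a strain that has been shown to yield highly active arylsulfatase. To study the mechanism of arylsulfatase production in strain FW-1 in depth and to further screen for highly active arylsulfatase gene fragments, the whole-genome sequence of FW-1 needs to be determined. [Methods] High-throughput sequencing was used to sequence the whole genome of FW-1, and relevant software was used for genome assembly, gene prediction and functional annotation, and COG cluster analysis. Heterologous expression was used to analyze the arylsulfatase activity produced by different gene fragments. [Results] Whole-genome sequencing showed a genome size of 3,964,876 bp and a GC content of 44.03%, with 3590 protein-coding genes, 78 tRNAs and 25 rRNA operons. Twenty-two genes with possible arylsulfatase activity were found in the genome sequence; heterologous expression of four of them showed that FW-1 contains at least three genes with arylsulfatase activity, all of which contain the arylsulfatase-specific amino acid motif C-X-P-X-R. [Conclusion] This study reports for the first time the whole-genome sequence of FW-1, a strain containing multiple arylsulfatase gene sequences, and analyzes the basic features of the genome, providing ideas for the further application of arylsulfatase.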
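
As a rough illustration of how a motif such as C-X-P-X-R can be located in a protein sequence, the sketch below scans for it with a regular expression. It is only an assumption about one possible screening step, not the method used in the study; the function name `find_cxpxr` and the example sequence are hypothetical, and X is taken to mean any single uppercase amino acid letter.

```python
import re

# C-X-P-X-R, where X is any one residue. The lookahead lets overlapping
# occurrences be reported as well.
CXPXR = re.compile(r"(?=(C[A-Z]P[A-Z]R))")


def find_cxpxr(protein: str) -> list:
    """Return (0-based start, matched 5-mer) for each C-X-P-X-R occurrence."""
    return [(m.start(), m.group(1)) for m in CXPXR.finditer(protein.upper())]


if __name__ == "__main__":
    # Hypothetical sequence, not a real FW-1 protein.
    print(find_cxpxr("MKSCAPSRGG"))
```

A motif hit alone does not establish enzyme activity, which is why the abstract pairs sequence analysis with heterologous expression.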

16.
17.
Proteogenomics has emerged as a field at the junction of genomics and proteomics. It is a loose collection of technologies that allow the search of tandem mass spectra against genomic databases to identify and characterize protein-coding genes. Proteogenomic peptides provide invaluable information for gene annotation, which is difficult or impossible to ascertain using standard annotation methods. Examples include confirmation of translation, reading-frame determination, identification of gene and exon boundaries, evidence for post-translational processing, identification of splice-forms including alternative splicing, and also, prediction of completely novel genes. For proteogenomics to deliver on its promise, however, it must overcome a number of technological hurdles, including speed and accuracy of peptide identification, construction and search of specialized databases, correction of sampling bias, and others. This article reviews the state of the art of the field, focusing on the current successes, and the role of computation in overcoming these challenges. We describe how technological and algorithmic advances have already enabled large-scale proteogenomic studies in many model organisms, including arabidopsis, yeast, fly, and human. We also provide a preview of the field going forward, describing early efforts in tackling the problems of complex gene structures, searching against genomes of related species, and immunoglobulin gene reconstruction.

18.
Proteogenomics, the integrative analysis of the proteome and the genome, increasingly provides protein‐level insights about the regulation of gene expression and protein translation. Armengaud et al. (Proteomics 2017, 17, 1700211) nicely illustrate this trend with the first in‐depth proteomic analysis of the eukaryotic and unicellular intestinal parasite Blastocystis sp. Not only does this work constitute an important milestone toward the proteogenomics profile of this human pathogen, but it also demonstrates at the protein level the occurrence of a specific mechanism of mRNA decoding. GU‐rich motifs located downstream of mRNA polyadenylation sites create termination codons that ultimately result in the synthesis of proteins with lower molecular weight than predicted from gene sequence. Thus, the scope of proteogenomics now extends to the regulation of mRNA translation into proteins, providing a proof of concept for future studies in multicellular eukaryotes such as humans and plants.

19.
20.
The aim of this study is to provide protein‐based evidence upon which to reannotate the genome of Coccidioides posadasii, one of two closely related species of Coccidioides, a dimorphic fungal pathogen that causes coccidioidomycosis, also called Valley fever. Proteins present in lysates and filtrates of in vitro grown mycelia and parasitic phase spherules from C. posadasii strain Silveira are analyzed using a GeLC‐MS/MS method. Acquired spectra are processed with a proteogenomics workflow comprising a Silveira proteome database, a six‐frame translation of the Silveira genome and an ab initio gene prediction tool prior to validation against published ESTs. This study provides evidence for 837 genes expressed at the protein level, of which 169 proteins (20.2%) are putative proteins and 103 (12.3%) are not annotated in the Silveira genome. Additionally, 275 novel peptides are derived from intragenic regions of the genome and 13 from intergenic regions, resulting in 172 gene refinements. We are also the first group to report translationally active retrotransposon elements in a Coccidioides spp. Our study reveals that the currently annotated genome of C. posadasii str. Silveira needs refinement, which is likely to be the case for many nonmodel organisms.
