Similar articles
 19 similar articles found (search time: 140 ms)
1.
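With the rapid development of high-throughput DNA sequencing, the genomes of more and more species have been sequenced. Locating protein-coding genes and determining their structures are the basic tasks of genome annotation, yet conventional annotation methods rely mainly on DNA and RNA sequence information. To interpret sequenced genomes more accurately, multiple types of omics data need to be integrated into genome annotation. In recent years, proteomics based on tandem mass spectrometry has matured and achieved high proteome coverage, making it feasible to annotate genomes with tandem mass spectrometry data. Such data can confirm the expression of annotated genes, and can also correct existing gene annotations and reveal novel genes, enabling re-annotation of the genome sequence. This is the subject of proteogenomics, a field that is currently advancing rapidly. Systematic annotation of sequenced genomes with this approach has become an important complement to genome interpretation. This paper reviews the main research topics and methods of proteogenomics and discusses future directions of the field.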

2.
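Conventional software-based gene annotation has shortcomings, and accurate genome annotation remains a challenging task. This study aimed to improve the genome annotation of Shigella flexneri using proteogenomics. Whole-cell proteins were extracted from Shigella flexneri 2a strain 301 (Sf2a301); the peptide mixture produced by trypsin digestion was separated by two-dimensional liquid chromatography and analyzed by online ESI tandem mass spectrometry. The mass spectra were searched against a six-reading-frame database of Sf2a301, and the identifications were further examined by bioinformatic analysis and experimental validation. In total, the protein products of 729 annotated Sf2a301 genes were validated, and the distributions of molecular weight, isoelectric point, and hydrophobicity of the identified proteins followed the same trends as those of the annotated proteins of the Sf2a301 genome. Six novel, previously unannotated genes were discovered and further validated at the transcriptional level by RT-PCR. Proteogenomics can effectively improve the genome annotation of Shigella: it validates annotated genes and also finds novel genes that supplement the existing annotation. The strategy is expected to be extended to the genome annotation of other sequenced organisms.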
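The trypsin digestion step mentioned in this abstract is usually mirrored in silico when a search database is prepared. The following is a minimal sketch, assuming the common simplified rule (cleave after K or R unless the next residue is P); the function name `tryptic_digest` and the `missed_cleavages` parameter are illustrative and are not taken from the study.

```python
import re
from typing import List


def tryptic_digest(protein: str, missed_cleavages: int = 0) -> List[str]:
    """Return in silico tryptic peptides of a protein sequence.

    Assumption: simplified trypsin rule, i.e. cleave C-terminal to K or R
    unless the following residue is P.
    """
    # Split at zero-width positions that follow K/R and do not precede P.
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", protein) if f]
    peptides = []
    for i in range(len(fragments)):
        # Join up to `missed_cleavages` additional neighbouring fragments.
        last = min(i + missed_cleavages, len(fragments) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(fragments[i:j + 1]))
    return peptides


if __name__ == "__main__":
    for peptide in tryptic_digest("MKRPAKGG", missed_cleavages=1):
        print(peptide)
```

Real search engines apply further filters (peptide length, mass range, modifications) that this sketch leaves out.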

3.
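Yi Chuan (Hereditas, Beijing), 2020, (7)
As sequencing technology has advanced, massive amounts of genome sequencing data have been produced, greatly enriching public genetic data resources. To cope with this volume, algorithms and tools for genome comparison and annotation are continually updated, making it possible to combine several annotation tools to obtain more accurate annotations of protein-coding genes. Some prokaryotic genome sequences and assemblies in public databases are more than 10 years old and contain many predicted coding genes of unknown function. To improve the annotation quality of genomes in the National Center for Biotechnology Information (NCBI) database, this study re-annotated 1587 bacterial and archaeal genomes by combining several prokaryotic gene-finding algorithms and programs with gene expression data. First, using 33 Z-curve variables, 3092 sequences over-annotated as protein-coding genes were identified in the original annotations of 177 genomes. Second, specific functions were assigned by homology search to 4447 protein-coding genes of unknown function in 939 genomes. Finally, missed genes were sought by combining ZCURVE 3.0, Glimmer 3.02, and Prodigal, three accurate, widely used gene finders whose different underlying algorithms complement one another. In the end, 2003 protein-coding genes missing from the annotation were found in 9 genomes; these genes belong to multiple clusters of orthologous groups of proteins (COG). By applying newer tools together with multi-omics data to re-annotate bacterial and archaeal genomes sequenced early on, this study provides a methodological reference for annotating newly sequenced strains, and the re-annotated bacterial gene sequences should also aid subsequent basic research.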
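The abstract does not list its 33 Z-curve variables. As a rough illustration of the idea only, the sketch below computes the nine phase-specific single-nucleotide components of the Z-curve transform as it is commonly defined (x: purine vs. pyrimidine, y: amino vs. keto, z: weak vs. strong hydrogen bonding). The remaining dinucleotide-based variables are omitted, and the function name `zcurve_phase_components` is hypothetical.

```python
from typing import List


def zcurve_phase_components(orf: str) -> List[float]:
    """Return 9 Z-curve components: (x, y, z) for each of the 3 codon positions.

    Assumption: standard Z-curve definitions on base frequencies a, c, g, t:
      x = (a + g) - (c + t)   purine vs. pyrimidine
      y = (a + c) - (g + t)   amino vs. keto
      z = (a + t) - (g + c)   weak vs. strong hydrogen bonding
    """
    orf = orf.upper()
    components: List[float] = []
    for phase in range(3):
        bases = orf[phase::3]  # every third base, starting at this codon position
        n = len(bases)
        a, c, g, t = (bases.count(b) / n if n else 0.0 for b in "ACGT")
        components.extend([(a + g) - (c + t), (a + c) - (g + t), (a + t) - (g + c)])
    return components


if __name__ == "__main__":
    print(zcurve_phase_components("ATGGCCAAAGGTTAA"))
```

A gene finder would feed such vectors to a classifier trained on known coding and non-coding sequences; that step is outside this sketch.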

4.
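[Objective] To improve the genome annotation of the Chinese oak silkworm, Antheraea pernyi, and to broaden its use in comparative genomics and breeding research. [Methods] Full-length transcriptome sequencing was performed on A. pernyi. Full-length transcripts were aligned to the reference genome to identify novel genes and novel transcripts, which were then functionally annotated and screened for long non-coding RNAs (lncRNAs). The large set of protein-coding transcripts and lncRNAs was used to revise gene structures in the A. pernyi genome, and a corrected genome annotation was then produced. [Results] A total of 1 997 novel protein-coding genes and 3 399 lncRNA genes were found, supported by 2 402 and 3 574 full-length transcripts, respectively. The A. pernyi genome was found to contain 25 021 genes, of which 19 825 are protein-coding, including 7 juvenile hormone acid methyltransferase genes. [Conclusion] This study improves our understanding of gene annotation in the A. pernyi genome and provides a useful data resource for functional and comparative genomics of A. pernyi and related species.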

5.
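Advances in de novo sequencing algorithms for tandem mass spectra   Cited by: 1 (self-citations: 0; citations by others: 1)
In recent years, high-throughput proteomics based on mass spectrometry has developed rapidly, and identifying proteins from tandem mass spectra is a basic and important step in its data processing. Because it does not require a protein sequence database, de novo sequencing can analyze tandem mass spectrometry data from novel species or species whose genomes have not been sequenced, an advantage that database searching cannot replace. This review briefly introduces the problem of de novo sequencing of high-throughput tandem mass spectra and the current state of research, identifies several typical computational strategies and analyzes their strengths and weaknesses, summarizes commonly used de novo sequencing algorithms and software, describes the metrics and benchmark datasets used to evaluate algorithms, outlines the characteristics of each algorithm, and looks ahead to possible future research directions.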

6.
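Preface to the 2014 special issue on proteomics   Cited by: 2 (self-citations: 0; citations by others: 2)
Proteomics is one of the most important areas of functional genomics in the post-genomic era and is closely tied to medical biology, chemistry, physics, informatics, and modern technology. This special issue on proteomics was organized to survey some important recent advances in proteomics in China and abroad, explore its possible applications, discuss its problems, and look ahead to its prospects. The issue comprises reviews and research articles and mainly covers proteomic studies of different species (including humans, mammals, prokaryotes, and actinomycetes), important proteomic methods and techniques (including tandem mass spectrometry, preservation of urinary proteins on membranes, quantitative proteomic analysis, and meta-analysis), and functional and applied proteome research (including the spider-toxin proteome, the phosphoproteome, the oocyte and early-embryo proteome, the liver fibrosis proteome, and the proteome of drug-resistant mycobacteria).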

7.
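[Objective] To characterize the mitochondrial genome of Hirsutella rhossiliensis strain OWVT-1, verify errors in the published mitochondrial genome of strain USA-87-5, annotate the correct mitochondrial genome sequence of H. rhossiliensis, and carry out comparative mitochondrial genomics across Hirsutella species. [Methods] The OWVT-1 mitochondrial genome was assembled from high-throughput DNA sequencing data, supplemented with Sanger sequencing where necessary. Sequence differences between OWVT-1 and the published USA-87-5 mitochondrial genome were checked by PCR. The mitochondrial genome of H. rhossiliensis was analyzed and annotated with several bioinformatic methods. [Results] The published mitochondrial genome of H. rhossiliensis USA-87-5 contains several sequence errors, including 3 long indels and multiple short indels. The mitochondrial genome sequences of USA-87-5 and OWVT-1 are in fact identical. The mitochondrial genome is 62949 bp long, with 13 introns inserted in 7 genes; some introns and intergenic regions show signs of sequence degeneration. The mitochondrial genomes of H. rhossiliensis, H. minnesotensis, and H. vermicola are strongly syntenic. Apart from some free-standing ORFs, the order of core protein-coding genes, rRNA genes, and tRNA genes is highly conserved. The length of intergenic regions is the main factor affecting mitochondrial genome size among the three Hirsutella species. [Conclusion] The published mitochondrial genome of H. rhossiliensis USA-87-5 contains sequence errors. This paper reports the mitochondrial genome of strain OWVT-1, together with its annotation and a comparative mitochondrial genomic analysis.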

8.
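Establishment of a bioinformatics research platform for microbial genomes   Cited by: 1 (self-citations: 0; citations by others: 1)
With the Human Genome Project and other sequencing efforts progressing smoothly, a large number of gene sequences have been obtained. Elucidating the function and significance of these sequences is the main task of functional genomics, and bioinformatics and comparative genomics provide useful tools for accelerating the process. This study built a bioinformatics research platform for the genomes of 25 bacterial species that have been completely or partially sequenced, offered as a web service (http://202.116.74.108). The whole-genome protein sequences of the 25 species can be downloaded from NCBI at ftp://ftp.ncbi.nlm.nih.gov/genbank/genomes/bacteria. The system supports queries by gene sequence accession number, function, and genus or species name. Each gene was classified automatically and manually according to the function code table of the National Center for Biotechnology Information (NCBI), and the classification can be queried. On this basis, an application was built for cross-annotating homologous genes among several closely related species.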

9.
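Advances in plant proteomics I. Key techniques of proteomics   Cited by: 10 (self-citations: 0; citations by others: 10)
Ruan Songlin, Ma Huasheng, Wang Shiheng, Xin Ya, Qian Lihua, Tong Jianxin, Zhao Hangping, Wang Jie. Yi Chuan (Hereditas, Beijing), 2006, 28(11): 1472-1486
With the genome sequencing of the model plants Arabidopsis thaliana and rice completed in succession, plant genomics has moved into the era of functional genomics, which laid a solid foundation for the emergence and development of proteomics. This article introduces the concept of proteomics, the background to its emergence, and its key techniques, which include two-dimensional electrophoresis, high-performance liquid chromatography, protein chips, mass spectrometry, proteomics databases, quantitative proteomics, tag-based affinity purification of protein complexes, and the yeast two-hybrid system. The challenges currently facing proteomic techniques and their prospects are also discussed.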

10.
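Biological mass spectrometry and proteomics   Cited by: 4 (self-citations: 0; citations by others: 4)
Proteomics is one of the most closely watched research fields of the post-genomic era. Biological mass spectrometry, its core identification technology, has in recent years made qualitative leaps in instrument design and in identification throughput, resolution, and sensitivity. These gains have driven rapid progress in protein expression profiling, quantitative proteome analysis, subcellular organelle proteome mapping, protein post-translational modification, protein-protein interactions, and other areas of proteome research. This article reviews the latest advances in biological mass spectrometry and its applications in proteomics research.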

11.
With the onset of modern DNA sequencing technologies, genomics is experiencing a revolution in terms of quantity and quality of sequencing data. Rapidly growing numbers of sequenced genomes and metagenomes present a tremendous challenge for bioinformatics tools that predict protein-coding regions. Experimental evidence of expressed genomic regions, both at the RNA and protein level, is becoming invaluable for genome annotation and training of gene prediction algorithms. Evidence of gene expression at the protein level using mass spectrometry-based proteomics is increasingly used in refinement of raw genome sequencing data. In a typical "proteogenomics" experiment, the whole proteome of an organism is extracted, digested into peptides and measured by a mass spectrometer. The peptide fragmentation spectra are identified by searching against a six-frame translation of the raw genomic assembly, thus enabling the identification of hitherto unpredicted protein-coding genomic regions. Application of mass spectrometry to genome annotation presents a range of challenges to the standard workflows in proteomics, especially in terms of proteome coverage and database search strategies. Here we provide an overview of the field and argue that the latest mass spectrometry technologies that enable high mass accuracy at high acquisition rates will prove to be especially well suited for proteogenomics applications.
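The six-frame translation mentioned in this abstract can be sketched as follows. This is a minimal illustration, assuming the standard genetic code and an unambiguous A/C/G/T input; the function names and frame labels (`+1` to `-3`) are illustrative choices, not part of any cited workflow.

```python
from itertools import product
from typing import Dict

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons; unknown codons become 'X', stops become '*'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_translation(dna: str) -> Dict[str, str]:
    """Translate a DNA sequence in three forward and three reverse frames."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame_translation("ATGGCCTAA").items():
        print(frame, protein)
```

In a search database each frame would normally be split at stop codons and short fragments discarded before peptides are matched against spectra.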

12.
Venter E, Smith RD, Payne SH. PLoS ONE, 2011, 6(11): e27587
Experimental evidence is increasingly being used to reassess the quality and accuracy of genome annotation. The use of proteomics data for this purpose, called proteogenomics, can alleviate many of the problematic areas of genome annotation, e.g. short protein validation and start site assignment. We performed a proteogenomic analysis of 46 genomes spanning eight bacterial and archaeal phyla across the tree of life. These diverse datasets facilitated the development of a robust approach for proteogenomics that is functional across genomes varying in %GC, gene content, proteomic sampling depth, phylogeny, and genome size. In addition to finding evidence for 682 novel proteins, 1336 new start sites, and numerous dubious genes, we discovered sites of post-translational maturation in the form of proteolytic cleavage of 1175 signal peptides. The number of novel proteins per genome is highly variable (median 7, mean 15, stdev 20). Moreover, comparison of novel genes with the current genes did not reveal any consistent abnormalities. Thus, we conclude that proteogenomics fulfills a yet to be understood deficiency in gene prediction. With the adoption of new sequencing technologies which have higher error rates than Sanger-based methods and the advances in proteomics, proteogenomics may become even more important in the future.

13.
14.
The genome sequencing of H37Rv strain of Mycobacterium tuberculosis was completed in 1998 followed by the whole genome sequencing of a clinical isolate, CDC1551 in 2002. Since then, the genomic sequences of a number of other strains have become available making it one of the better studied pathogenic bacterial species at the genomic level. However, annotation of its genome remains challenging because of high GC content and dissimilarity to other model prokaryotes. To this end, we carried out an in-depth proteogenomic analysis of the M. tuberculosis H37Rv strain using Fourier transform mass spectrometry with high resolution at both MS and tandem MS levels. In all, we identified 3176 proteins from Mycobacterium tuberculosis representing ~80% of its total predicted gene count. In addition to protein database search, we carried out a genome database search, which led to identification of ~250 novel peptides. Based on these novel genome search-specific peptides, we discovered 41 novel protein coding genes in the H37Rv genome. Using peptide evidence and alternative gene prediction tools, we also corrected 79 gene models. Finally, mass spectrometric data from N terminus-derived peptides confirmed 727 existing annotations for translational start sites while correcting those for 33 proteins. We report creation of a high confidence set of protein coding regions in Mycobacterium tuberculosis genome obtained by high resolution tandem mass-spectrometry at both precursor and fragment detection steps for the first time. This proteogenomic approach should be generally applicable to other organisms whose genomes have already been sequenced for obtaining a more accurate catalogue of protein-coding genes.

15.
16.
17.
BG7 is a new system for de novo bacterial, archaeal and viral genome annotation based on a new approach specifically designed for annotating genomes sequenced with next generation sequencing technologies. The system is versatile and able to annotate genes even at the preliminary assembly stage of the genome. It is especially efficient at detecting unexpected genes horizontally acquired from distant bacterial or archaeal genomes, phages, plasmids, and mobile elements. From the initial phases of the gene annotation process, BG7 exploits the massive availability of annotated protein sequences in databases. BG7 predicts ORFs and infers their function based on protein similarity with a wide set of reference proteins, integrating ORF prediction and functional annotation phases in just one step. BG7 is especially tolerant to sequencing errors in start and stop codons, to frameshifts, and to assembly or scaffolding errors. The system is also tolerant to the high level of gene fragmentation which is frequently found in not fully assembled genomes. The current version of BG7, which is developed in Java, takes advantage of Amazon Web Services (AWS) cloud computing features, but it can also be run locally on any operating system. BG7 is a fast, automated and scalable system that can cope with the challenge of analyzing the huge number of genomes that are being sequenced with NGS technologies. Its capabilities and efficiency were demonstrated in the 2011 EHEC Germany outbreak, in which BG7 was used to produce the first annotations the day after the first entero-hemorrhagic E. coli genome sequences were made publicly available. The suitability of BG7 for genome annotation has been demonstrated for Illumina, 454, Ion Torrent, and PacBio sequencing technologies. Moreover, thanks to its plasticity, the system could be very easily adapted to work with new technologies in the future.

18.
The use and development of post-genomic tools naturally depends on large-scale genome sequencing projects. The usefulness of post-genomic applications is dependent on the accuracy of genome annotations, for which the correct identification of intron-exon borders in complex genomes of eukaryotic organisms is often an error-prone task. Although automated algorithms for predicting intron-exon structures are available, supporting exon evidence is necessary to achieve comprehensive genome annotation. Besides cDNA and EST support, peptides identified via MS/MS can be used as extrinsic evidence in a proteogenomic approach. We describe an improved version of the Genomic Peptide Finder (GPF), which aligns de novo predicted amino acid sequences to the genomic DNA sequence of an organism while correcting for peptide sequencing errors and accounting for the possibility of splicing. We have coupled GPF and the gene finding program AUGUSTUS in a way that provides automatic structural annotations of the Chlamydomonas reinhardtii genome, using highly unbiased GPF evidence. A comparison of the AUGUSTUS gene set incorporating GPF evidence to the standard JGI FM4 (Filtered Models 4) gene set reveals 932 GPF peptides that are not contained in the Filtered Models 4 gene set. Furthermore, the GPF evidence improved the AUGUSTUS gene models by altering 65 gene models and adding three previously unidentified genes.

19.