Similar articles
 19 similar articles found (search time: 187 ms)
1.
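Advances in prokaryotic proteogenomics   Total citations: 1, self-citations: 0, citations by others: 1
With the continuing development of genome sequencing technology, large numbers of microbial genome sequences can be determined accurately in a short time. To further explore genome structure and function, genome annotation algorithms based on sequence features and homology are widely applied to newly sequenced species. However, owing to limited sequencing quality and the low accuracy of the algorithms themselves, existing genome annotations contain a considerable proportion of pseudogenes and annotation errors, particularly errors in the annotation of protein N-termini. To compensate for these deficiencies, transcriptome profiling centered on microarrays or RNA-seq and proteome profiling centered on tandem mass spectrometry can measure the transcription and translation products of genes precisely and at high throughput, and thereby validate predicted gene structures experimentally. However, the abundant non-coding RNA present in prokaryotic cells introduces contaminating data into transcriptome sequencing, which limits its use for genome annotation. By comparison, proteomics based on tandem mass spectrometry can identify a large number of proteins in an organism in a short time, enabling the validation and even the correction of annotated genes. It has become an important basis for genome annotation and re-annotation and has given rise to the new research field of "proteogenomics". This review first introduces the traditional genome annotation algorithms based on sequence prediction and homology alignment and points out their shortcomings. On this basis, and considering the technical characteristics of transcriptomics and proteomics, it analyzes the advantages of proteomics for prokaryotic genome annotation and summarizes the current progress of large-scale proteogenomics studies. Finally, from an informatics perspective, it identifies the problems in using proteomic data for genome re-annotation together with corresponding solutions, and discusses future directions for proteogenomics.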

2.
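Yi Chuan (Hereditas) 2010,(5)
Edited by Xue Qingzhong et al.; 2nd edition, 2009; published by Science Press. With many genome sequencing projects now complete, the greatest challenge we face is how to analyze and annotate DNA and protein data scientifically. This book explains gene databases and web tools at three levels: at the genomics level, it focuses on the use of the sequence alignment tools BLAST and ClustalX, eukaryotic…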

3.
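DNA and Protein Sequence Data Analysis Tools (3rd edition). Authors: Xue Qingzhong et al. With many genome sequencing projects now complete, the greatest challenge we face is how to analyze and annotate DNA and protein data scientifically. DNA and Protein Sequence Data Analysis Tools (3rd edition) explains gene databases and web tools at three levels: at the genomics level, it focuses on sequence alignment…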

4.
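Conventional software-based gene annotation has shortcomings, and accurate genome annotation remains a challenging task. This study aimed to improve the genome annotation of Shigella flexneri using a proteogenomics approach. Whole-cell proteins were extracted from Shigella flexneri 2a strain 301 (Sf2a301); the tryptic peptide mixture was separated by two-dimensional liquid chromatography and analyzed by online ESI tandem mass spectrometry. The mass spectrometry data were searched against a six-reading-frame database of Sf2a301, and the identifications were further subjected to bioinformatic analysis and experimental validation. In total, the protein-coding products of 729 annotated Sf2a301 genes were verified, and the distributions of the identified proteins in physicochemical properties such as molecular weight, isoelectric point and hydrophobicity were consistent with those of the proteins annotated in the Sf2a301 genome. Six previously unannotated genes were discovered and were further verified at the transcriptional level by RT-PCR. Proteogenomics can effectively improve the genome annotation of Shigella: it not only validates annotated genes but also discovers new genes that supplement the existing annotation. This strategy is expected to be extended to the genome annotation of other sequenced organisms.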
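The digestion step described in this abstract can be illustrated in silico. The following is a minimal Python sketch, assuming the commonly used trypsin rule (cleave after K or R unless the next residue is P); the function name `tryptic_digest`, its parameters, and the example sequence are hypothetical and are not taken from the study.

```python
import re


def tryptic_digest(protein: str, missed_cleavages: int = 0, min_len: int = 6) -> list[str]:
    """Return in silico tryptic peptides of a protein sequence.

    Assumed rule: cleave C-terminal to K or R, except when the next
    residue is P. Peptides shorter than min_len are discarded.
    """
    # Zero-width split: position preceded by K/R and not followed by P.
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", protein) if f]
    peptides = []
    for i in range(len(fragments)):
        # Join up to `missed_cleavages` additional adjacent fragments.
        last = min(i + missed_cleavages, len(fragments) - 1)
        for j in range(i, last + 1):
            peptide = "".join(fragments[i:j + 1])
            if len(peptide) >= min_len:
                peptides.append(peptide)
    return peptides


# Hypothetical example sequence (not from the study).
print(tryptic_digest("MKTAYIAKQRPLVK", missed_cleavages=1, min_len=1))
```

In a search against a six-reading-frame database, peptides generated this way from each translated frame would be the candidates matched against the measured spectra.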

5.
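Genome sequences provide rich data resources for insect molecular biology and have driven the vigorous development of systems biology within the long-established discipline of entomology. Insect genomics has become a current research focus. At present, 494 insect genome sequencing projects are registered at NCBI; raw sequencing data have been submitted for 225 insect species, genome assemblies have been completed for 215 species, gene annotations are available for 65 species, and 43 insect genome papers have been published. This paper reviews the history of sequencing technology and its role in advancing insect genome research, the assembly and annotation of insect genomes and the associated problems, progress in insect genome sequencing, the development of insect genome databases and basic ideas and strategies for gene data mining, and the prospects for applying large-scale insect gene data in pest control and in the utilization of resource insects.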

6.
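[Objective] To optimize the genome annotation of the Chinese oak silkworm, Antheraea pernyi, and better extend its application in comparative genomics and breed improvement research. [Methods] Full-length transcriptome sequencing was performed on A. pernyi. Full-length transcripts were aligned to the reference genome to identify novel genes and novel transcripts, which were then functionally annotated and used to predict long non-coding RNAs (lncRNAs). Gene structures in the A. pernyi genome were revised using the large number of protein-coding transcripts and lncRNAs, and a corrected gene annotation of the A. pernyi genome was then created. [Results] A total of 1 997 novel protein-coding genes and 3 399 novel lncRNA genes were found, supported by 2 402 and 3 574 full-length transcripts, respectively. The A. pernyi genome was found to contain 25 021 genes, of which 19 825 are protein-coding genes, including seven juvenile hormone acid methyltransferase genes. [Conclusion] This study improves our understanding of the gene annotation of the A. pernyi genome and provides a useful data resource for functional and comparative genomics research on A. pernyi and related species.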

7.
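Rice genome sequencing and identification of gene function   Total citations: 6, self-citations: 0, citations by others: 6
Liu Qingpo, Xue Qingzhong. Acta Genetica Sinica (《遗传学报》) 2006,33(8):669-677
Rice is an important food crop. As a model monocot, rice is of great theoretical and practical significance as a target of large-scale genome sequencing. High-quality genome data are now available for the indica cultivar "93-11" and the japonica cultivar "Nipponbare", which facilitates genome-level study of the genetic mechanisms underlying growth, development, disease resistance and high yield, and thus offers a new opportunity for addressing the world food crisis. With the successful completion of the rice genome project, the research focus has shifted from structural genomics, which centered on building high-resolution genetic, physical and transcript maps, to the study of gene function. The large amount of sequence data obtained by structural genomics opens broad prospects for revealing and exploiting functional genes. To date, many genes associated with important agronomic traits of rice, such as resistance to disease, insects, stress and lodging, high yield, and good quality, have been isolated successfully by map-based cloning, in silico cloning and other methods, which is of great significance for breeding new rice varieties and promoting sustainable agriculture. Rice is estimated to have at least 37 000 protein-coding genes not related to transposable elements. Therefore, now that the whole genome sequence has been determined, identifying the functions of important genes has become the main goal of current genomics research. Strategies such as reverse genetics, large-scale gene expression profiling and proteomics have played an important role in studying the functions of important rice genes. This article reviews the current status of rice genome sequencing and gene function research, comments on methods for discovering new genes and annotating gene function, and is intended as a reference for rice genetic engineering and breeding practice.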

8.
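Yi Chuan (Hereditas) 2020,(7)
The continuing development of sequencing technology has produced massive amounts of genome sequencing data and greatly enriched public genetic data resources. To cope with this volume of data, genome comparison and annotation algorithms and tools have been updated continually, making it possible to combine several annotation tools to obtain more accurate annotation of protein-coding genes. Some of the prokaryotic genome sequences and assemblies currently in public databases are more than 10 years old and contain many predicted coding genes of unknown function. To improve the annotation quality of genomes in the National Center for Biotechnology Information (NCBI) database, this study combined several prokaryotic gene-finding algorithms and software packages with gene expression data to re-annotate 1587 bacterial and archaeal genomes. First, using the 33 variables of the Z-curve, 3092 sequences that had been over-annotated as protein-coding genes were identified in the original annotations of 177 genomes. Second, specific functions were assigned by homology search to 4447 protein-coding genes of unknown function in 939 genomes. Finally, ZCURVE 3.0, Glimmer 3.02 and Prodigal, three high-accuracy, widely used gene finders that are based on different algorithms and are therefore complementary, were used jointly to search for missed genes. In the end, 2003 missed protein-coding genes were found in 9 genomes; these genes belong to multiple clusters of orthologous groups of proteins (COG). By re-annotating early-sequenced bacterial and archaeal genomes with new tools combined with multi-omics data, this study not only provides a reference annotation method for newly sequenced strains; the bacterial gene sequences obtained from the re-annotation will also be useful for subsequent basic research.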
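The idea of combining several complementary gene finders to look for missed genes can be sketched as a simple vote. The Python example below is illustrative only and rests on stated assumptions: predictions are matched by contig, strand and 3' end (the stop coordinate), and a candidate must be reported by a minimum number of tools and be absent from the existing annotation. The abstract does not give the study's actual criteria, and the names `Gene`, `stop_key` and `candidate_missed_genes`, as well as the coordinates, are hypothetical.

```python
from collections import Counter
from typing import NamedTuple


class Gene(NamedTuple):
    contig: str
    strand: str  # "+" or "-"
    start: int   # coordinate of the 5' end (start codon)
    stop: int    # coordinate of the 3' end (stop codon)


def stop_key(gene: Gene) -> tuple:
    # Assumption: two predictions describe the same ORF if they share
    # contig, strand and 3' end, even when the predicted starts differ.
    return (gene.contig, gene.strand, gene.stop)


def candidate_missed_genes(annotated, predictions, min_support=3):
    """Genes reported by at least min_support tools but not annotated."""
    annotated_keys = {stop_key(g) for g in annotated}
    support = Counter()
    representative = {}
    for tool_genes in predictions.values():
        # Count each ORF at most once per tool.
        for key in {stop_key(g) for g in tool_genes}:
            support[key] += 1
        for g in tool_genes:
            representative.setdefault(stop_key(g), g)
    return sorted(
        representative[key]
        for key, n in support.items()
        if n >= min_support and key not in annotated_keys
    )


# Hypothetical toy data; the tool names are used only as dictionary labels.
annotated = [Gene("c1", "+", 100, 400)]
predictions = {
    "ZCURVE": [Gene("c1", "+", 100, 400), Gene("c1", "-", 900, 500)],
    "Glimmer": [Gene("c1", "+", 130, 400), Gene("c1", "-", 870, 500)],
    "Prodigal": [Gene("c1", "-", 900, 500), Gene("c1", "+", 1000, 1300)],
}
print(candidate_missed_genes(annotated, predictions))
```

Matching on the 3' end rather than on both ends is a design choice: gene finders often disagree on the start codon while agreeing on the reading frame and stop.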

9.
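Advances in de novo sequencing algorithms for tandem mass spectra   Total citations: 1, self-citations: 0, citations by others: 1
In recent years, high-throughput proteomics based on mass spectrometry has developed rapidly, and identifying proteins from tandem mass spectra is a basic yet important step in its data processing. Because it does not rely on a protein sequence database, de novo sequencing can analyze tandem mass spectrometry data from new species or from species whose genomes have not been sequenced, an advantage that database search methods cannot replace. This paper briefly introduces the problem of de novo sequencing of high-throughput tandem mass spectra and the current state of research, outlines several typical computational strategies and analyzes their strengths and weaknesses, summarizes commonly used de novo sequencing algorithms and software, introduces the metrics and benchmark datasets commonly used for algorithm evaluation, characterizes the individual algorithms, and looks ahead to possible directions for future research.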

10.
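Research on microbial functional genomics   Total citations: 5, self-citations: 0, citations by others: 5
Since the genome sequence of Haemophilus influenzae was completed in 1995[1], the genomes of 75 microbial species (strains) have been sequenced, and sequencing of more than 160 others is under way[2]. As microbial genome sequencing projects are completed and sequence information accumulates, the focus of microbial genomics has shifted from structural genomics to functional genomics. Microbial functional genomics seeks not only to clarify the role or function of every gene in a microbial genome but also to study gene regulation and expression profiles, and thereby to understand microbial life as a whole in terms of the structure, function and mechanisms of the entire genome and its full set of protein products, to reveal previously unknown principles of the microbial world, and to put them to use for people and society. Although microbial genomes are relatively simple compared with those of eukaryotes, microbial genomics is nevertheless of great scientific and economic importance. Bacterial genomes contain genes encoding enzymes that catalyze reactions in extreme environments as well as genes encoding enzymes that degrade chemical pollutants; such genes are not present in eukaryotic cells. Microbial functional genomics can also uncover drug targets and vaccine antigens, and results on microbial gene function and expression can inform the study of gene function in complex organisms. In recent years microbial functional genomics has received widespread attention. Japan has organized more than ten universities and research institutes in a plan to complete functional genomic research on Escherichia coli within 5 years[3], and is also working jointly with Europe on the functional genomics of Bacillus subtilis[4]. Functional genomics studies of other microorganisms are also in progress. Because microorganisms are so diverse and functional genomics covers so much ground, a comprehensive account of the field is difficult. This paper gives only a brief review of four aspects: identification of genes of unknown function, research on drug targets and vaccine antigens, research on pathogenic mechanisms, and research on biological function maps.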

11.
Xing XB, Li QR, Sun H, Fu X, Zhan F, Huang X, Li J, Chen CL, Shyr Y, Zeng R, Li YX, Xie L. Genomics 2011,98(5):343-351
Identifying protein-coding genes in eukaryotic genomes remains a challenge in the post-genome era because of their complex gene models. We applied a proteogenomics strategy to detect un-annotated protein-coding regions in the mouse genome. High-accuracy tandem mass spectrometry (MS/MS) data from diverse mouse samples were generated in house on an LTQ-Orbitrap mass spectrometer. Two searchable diagnostic proteomic datasets were constructed, one with all possible encoding exon junctions and the other with all putative encoding exons, for the discovery of novel exon splicing events and novel uninterrupted protein-coding regions. Altogether, 29,586 unique peptides were identified. Aligning these back to the mouse genome, the translation of 4471 annotated genes was validated by the known peptides, and 172 genic events were defined in the mouse genome by the novel peptides. The approach in the current work can provide substantial evidence for the annotation of protein-coding genes in eukaryotic genomes.

12.
With the onset of modern DNA sequencing technologies, genomics is experiencing a revolution in terms of quantity and quality of sequencing data. Rapidly growing numbers of sequenced genomes and metagenomes present a tremendous challenge for bioinformatics tools that predict protein-coding regions. Experimental evidence of expressed genomic regions, both at the RNA and protein level, is becoming invaluable for genome annotation and training of gene prediction algorithms. Evidence of gene expression at the protein level using mass spectrometry-based proteomics is increasingly used in refinement of raw genome sequencing data. In a typical "proteogenomics" experiment, the whole proteome of an organism is extracted, digested into peptides and measured by a mass spectrometer. The peptide fragmentation spectra are identified by searching against a six-frame translation of the raw genomic assembly, thus enabling the identification of hitherto unpredicted protein-coding genomic regions. Application of mass spectrometry to genome annotation presents a range of challenges to the standard workflows in proteomics, especially in terms of proteome coverage and database search strategies. Here we provide an overview of the field and argue that the latest mass spectrometry technologies that enable high mass accuracy at high acquisition rates will prove to be especially well suited for proteogenomics applications.
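The six-frame translation mentioned in this abstract can be sketched in a few lines of Python. This is a minimal illustration using the standard genetic code, not the pipeline of any particular study; the function names (`six_frame_translation`, `orf_peptides`), the frame labels and the `min_len` threshold are assumptions made for the example.

```python
from itertools import product

# Standard genetic code in the compact TCAG ordering ("*" marks a stop codon).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    # Codons containing ambiguous bases (e.g. N) are translated as "X".
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_translation(dna: str) -> dict[str, str]:
    """Translate both strands in all three reading frames."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


def orf_peptides(dna: str, min_len: int = 10):
    """Yield (frame, segment) for stop-to-stop segments of at least min_len."""
    for frame, protein in six_frame_translation(dna).items():
        for segment in protein.split("*"):
            if len(segment) >= min_len:
                yield frame, segment


print(six_frame_translation("ATGGCCTAA"))
```

The stop-to-stop segments produced by `orf_peptides` are the kind of entries that would populate a six-frame search database; because they do not depend on any gene prediction, spectra can match regions that no annotation covers.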

13.
Venter E, Smith RD, Payne SH. PLoS ONE 2011,6(11):e27587
Experimental evidence is increasingly being used to reassess the quality and accuracy of genome annotation. Proteomics data used for this purpose, called proteogenomics, can alleviate many of the problematic areas of genome annotation, e.g. short protein validation and start site assignment. We performed a proteogenomic analysis of 46 genomes spanning eight bacterial and archaeal phyla across the tree of life. These diverse datasets facilitated the development of a robust approach for proteogenomics that is functional across genomes varying in %GC, gene content, proteomic sampling depth, phylogeny, and genome size. In addition to finding evidence for 682 novel proteins, 1336 new start sites, and numerous dubious genes, we discovered sites of post-translational maturation in the form of proteolytic cleavage of 1175 signal peptides. The number of novel proteins per genome is highly variable (median 7, mean 15, stdev 20). Moreover, comparison of novel genes with the current genes did not reveal any consistent abnormalities. Thus, we conclude that proteogenomics fulfills a yet to be understood deficiency in gene prediction. With the adoption of new sequencing technologies which have higher error rates than Sanger-based methods and the advances in proteomics, proteogenomics may become even more important in the future.

14.
Large-scale prokaryotic gene prediction and comparison to genome annotation   Total citations: 4, self-citations: 0, citations by others: 4
MOTIVATION: Prokaryotic genomes are sequenced and annotated at an increasing rate. The methods of annotation vary between sequencing groups. This makes genome comparison difficult and may lead to propagation of errors when questionable assignments are adapted from one genome to another. Genome comparison, on either a large or a small scale, would be facilitated by using a single standard for annotation, one that makes transparent why an open reading frame (ORF) is considered to be a gene. RESULTS: A total of 143 prokaryotic genomes were scored with an updated version of the prokaryotic genefinder EasyGene. Comparison of the GenBank and RefSeq annotations with the EasyGene predictions reveals that in some genomes up to approximately 60% of the genes may have been annotated with a wrong start codon, especially in the GC-rich genomes. The fractional difference between annotated and predicted genes confirms that too many short genes are annotated in numerous organisms. Furthermore, genes might be missing in the annotation of some of the genomes. We predict 41 of 143 genomes to be over-annotated by >5%, meaning that too many ORFs are annotated as genes. We also predict that 12 of 143 genomes are under-annotated. These results are based on the difference between the number of annotated genes not found by EasyGene and the number of predicted genes that are not annotated in GenBank. We argue that the average performance of our standardized and fully automated method is slightly better than the annotation.

15.
16.
17.
BG7 is a new system for de novo bacterial, archaeal and viral genome annotation based on a new approach specifically designed for annotating genomes sequenced with next generation sequencing technologies. The system is versatile and able to annotate genes even at the preliminary assembly stage of the genome. It is especially efficient at detecting unexpected genes horizontally acquired from distant bacterial or archaeal genomes, phages, plasmids, and mobile elements. From the initial phases of the gene annotation process, BG7 exploits the massive availability of annotated protein sequences in databases. BG7 predicts ORFs and infers their function based on protein similarity with a wide set of reference proteins, integrating the ORF prediction and functional annotation phases in a single step. BG7 is especially tolerant of sequencing errors in start and stop codons, of frameshifts, and of assembly or scaffolding errors. The system is also tolerant of the high level of gene fragmentation that is frequently found in incompletely assembled genomes. The current version of BG7, which is developed in Java, takes advantage of Amazon Web Services (AWS) cloud computing features, but it can also be run locally on any operating system. BG7 is a fast, automated and scalable system that can cope with the challenge of analyzing the huge number of genomes that are being sequenced with NGS technologies. Its capabilities and efficiency were demonstrated in the 2011 EHEC outbreak in Germany, in which BG7 was used to obtain the first annotations the day after the first entero-hemorrhagic E. coli genome sequences were made publicly available. The suitability of BG7 for genome annotation has been demonstrated for Illumina, 454, Ion Torrent, and PacBio sequencing technologies. In addition, thanks to its plasticity, the system could very easily be adapted to work with new technologies in the future.

18.
19.
There are currently 151 plants with draft genomes available, but levels of functional annotation for putative protein products are low. Therefore, accurate computational predictions are essential to annotate genomes in the first instance, and to provide focus for the more costly and time-consuming functional assays that follow. DNA-binding proteins are an important class of proteins that require annotation, but current computational methods are not applicable for genome-wide predictions in plant species. Here, we explore the use of species- and lineage-specific models for the prediction of DNA-binding proteins in plants. We show that a species-specific support vector machine model based on Arabidopsis sequence data is more accurate (accuracy 81%) than a generic model (74%), and based on this we develop a plant-specific model for predicting DNA-binding proteins. We apply this model to the tomato proteome and demonstrate its ability to perform accurate high-throughput prediction of DNA-binding proteins. In doing so, we have annotated 36 currently uncharacterised proteins by assigning a putative DNA-binding function. Our model is publicly available and we propose it be used in combination with existing tools to help increase annotation levels of DNA-binding proteins encoded in plant genomes.
