Similar literature
Found 19 similar articles; search took 203 ms
1.
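Advances in prokaryotic proteogenomics   Total citations: 1, self-citations: 0, citations by others: 1
As genome sequencing technology continues to advance, large numbers of microbial genome sequences can be determined accurately in a short time. To further explore genome structure and function, genome annotation algorithms based on sequence features and homology are widely applied to newly sequenced species. However, because of limited sequencing quality and the modest accuracy of the algorithms themselves, existing genome annotations contain a considerable proportion of pseudogenes and annotation errors, especially errors in protein N-termini. To compensate for these shortcomings, transcriptome profiling based on microarrays or RNA-seq and proteome profiling based on tandem mass spectrometry can measure the transcription and translation products of genes accurately and at high throughput, providing experimental validation of predicted gene structures. However, the abundant non-coding RNA in prokaryotic cells introduces contaminating data into transcriptome sequencing and limits its use in genome annotation. By comparison, tandem mass spectrometry-based proteomics can identify a large number of proteins in an organism in a short time, allowing annotated genes to be validated and even corrected. It has become an important basis for genome annotation and reannotation and has given rise to the new research field of "proteogenomics". This review first introduces traditional genome annotation algorithms based on sequence prediction and homology alignment and points out their shortcomings. On this basis, it considers the technical characteristics of transcriptomics and proteomics, analyzes the advantages of proteomics for prokaryotic genome annotation, and summarizes current progress in large-scale proteogenomic studies. Finally, from an informatics perspective, it identifies problems in using proteomic data for genome reannotation along with corresponding solutions, and discusses future directions for proteogenomics.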

2.
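MGAP, a microbial genome annotation system   Total citations: 6, self-citations: 0, citations by others: 6
A microbial genome annotation system (Microbial genome annotation package, MGAP) was developed using bioinformatics methods and tools and applied to the genome annotation of the cyanobacterium PCC7002. The system consists of two parts: a genome annotation system and a Web-based user interface. The genome annotation system integrates multiple programs for gene identification, function prediction, and sequence analysis, together with protein sequence databases, a protein information resource system, and databases of orthologous protein families. The user interface includes a circular genome map display, maps of the distribution of genes and open reading frames on the chromosome, and a tool for searching annotation information. The system runs on a PC under Linux, uses MySQL as the database management system and Apache as the Web server, and its application programming interface is written in Perl; all of this software is freely available.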

3.
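Establishment of a bioinformatics research platform for microbial genomes   Total citations: 1, self-citations: 0, citations by others: 1
With the smooth progress of the Human Genome Project and other sequencing efforts, a large number of gene sequences have been obtained. Elucidating the function and significance of these sequences is the main task of functional genomics, and bioinformatics and comparative genomics provide powerful tools for accelerating this process. This study established a bioinformatics research platform for the genomes of 25 bacteria that have been completely or partially sequenced, offered as a Web service (http://202.116.74.108). The whole-genome protein sequences of the 25 bacteria can be downloaded from NCBI at ftp://ftp.ncbi.nlm.nih.gov/genbank/genomes/bacteria. The system supports querying gene sequences by gene accession number, function, and species or genus name. Each gene was classified automatically and manually according to the functional code table of the U.S. National Center for Biotechnology Information (NCBI), and the classification can be queried. On this basis, an application was built in which homologous genes from several closely related species annotate one another's functions.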

4.
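Genome functional annotation in the post-genomic era   Total citations: 25, self-citations: 0, citations by others: 25
Functional annotation of genomes is a hot topic in functional genomics research in the post-genomic era. Starting from the content and methods of genome functional annotation research, this review focuses on the methodological advances that bioinformatics has brought to the field and discusses prospects for future development.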

5.
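Mu Shaohua, Li Juan, Li Xueping, Gao Jian. Guihaia (《广西植物》), 2022, 42(8): 1383-1393
Moso bamboo is an important economic bamboo species in China, and abundant variation has arisen during its long history of cultivation and adaptation. To reveal the genome-wide mutation types underlying culm-variant forms of Moso bamboo, four forms, Huangpi Maozhu (黄皮毛竹), Jinsi Maozhu (金丝毛竹), Lüpi Hua Maozhu (绿皮花毛竹), and Hua Maozhu (花毛竹), were used as experimental material. Whole-genome sequences were obtained by high-throughput resequencing; single nucleotide polymorphisms (SNPs), small insertions and deletions (InDels), and structural variations (SVs) were detected and annotated; and the variant genes were functionally annotated. The results showed that the Hua Maozhu genome had the most variant genes detected, 12,555, and the Jinsi Maozhu sample had the fewest variant sites, 11,923; in all four samples, more than 7,000 variant genes received functional annotation. GO annotation covered 56 functional groups across the three gene function classification systems: cellular component, molecular function, and biological process. Under cellular component, 2,431 genes were related to chlorophyll synthesis; under biological process, 75 genes were involved in carotenoid synthesis, and 80 genes were involved in regulating anthocyanin synthesis and in anthocyanin accumulation in tissues under ultraviolet light. COG classification showed 369 genes involved in replication, recombination, and repair, 291 in signal transduction mechanisms, and 222 in transcription. The KEGG database was used to systematically analyze the metabolic and biosynthetic pathways, such as those for flavonoids and carotenoids, in which the variant genes participate. In-depth study of the regulatory pathways of these differential genes, explaining the mechanism of culm variation at the DNA level, can provide data to support further research on the rich intraspecific polymorphism and genetic variation of Moso bamboo and help elucidate the genetic basis, including gene families and functional genes, of the different variant types.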

6.
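With the rapid development of high-throughput DNA sequencing, the genomes of more and more species have been sequenced. Locating protein-coding genes and determining their structure are basic tasks of genome annotation, yet previous annotation methods have relied mainly on DNA and RNA sequence information. To interpret sequenced genomes more accurately, multiple types of omics data need to be integrated into genome annotation. In recent years, proteomics based on tandem mass spectrometry has matured and achieved high coverage of the proteome, making it possible to annotate genomes using tandem mass spectrometry data. Such data can, on the one hand, confirm the expression of annotated genes and, on the other, correct existing gene annotations and uncover new genes, thereby reannotating the genome sequence. This is the subject of proteogenomics, a field that is currently advancing rapidly. Using this approach to systematically annotate sequenced genomes has become an important complement to genome interpretation. This article reviews the main research content and methods of proteogenomics and discusses future directions for the field.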

7.
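[Background] Streptomyces sampsonii KJ40 is an actinomycete with both disease-suppressing and plant growth-promoting functions and has potential as a biopesticide. No whole-genome sequence of S. sampsonii has been reported so far, which limits research on its functional genes, metabolite biosynthetic pathways, and comparative genomics. [Objective] To resolve the genome sequence of S. sampsonii KJ40 in order to study the strain's disease-suppression and growth-promotion mechanisms in depth and to mine gene resources for secondary metabolites. [Methods] The whole genome of strain KJ40 was sequenced on the Illumina HiSeq high-throughput platform, and relevant software was used for genome assembly, gene prediction and functional annotation, prediction of secondary metabolite biosynthetic gene clusters, and synteny analysis. [Results] The final assembly comprised 9 scaffolds and 578 contigs with a total length of 7,261,502 bp and an average G+C content of 73.41%. It was predicted to contain 6,605 genes, 1,260 tandem repeats, 804 minisatellites, 67 microsatellites, 90 tRNAs, 9 rRNAs, and 19 sRNAs. Of these genes, 2,429, 3,765, 2,890, 6,063, and 1,911 could be annotated from the COG, GO, KEGG, NR, and Swiss-Prot databases, respectively. In addition, 21 secondary metabolite biosynthetic gene clusters were predicted. The genome sequence data were submitted to NCBI under GenBank accession number LORI00000000. The genomes of S. sampsonii KJ40, Streptomyces coelicolor A3(2), and Streptomyces griseus subsp. griseus NBRC 13350 show genome rearrangements such as inversions and translocations, and the three genomes share 1,711 protein clusters. [Conclusion] This study provides baseline data for explaining, at the genome level, why strain KJ40 is effective at promoting growth and suppressing disease, offers reference information for understanding secondary metabolite biosynthetic pathways in Streptomyces, and is of importance for subsequent research on S. sampsonii.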

8.
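Composition, distribution, and molecular evolution of pseudogenes   Total citations: 5, self-citations: 0, citations by others: 5
A pseudogene is a DNA sequence in the genome that resembles a normal gene but lacks function. Sequence homology searches can be used to collect information on the population characteristics, chromosomal distribution, and homologous families of pseudogenes in a genome. Pseudogenes preserve a molecular record of ancestral genes in the genome from millions of years ago and are regarded as "gene fossils", so they are an important resource for evolutionary and comparative genomics. Comparing pseudogenes with genes makes it possible to investigate the evolutionary history of genes and genome stability, for example by using the Ka/Ks ratio to determine the selective pressure on pseudogenes, species relationships, and evolutionary distances, by analyzing the evolutionary trends of pseudogenes themselves, and by exploring the causes of DNA mutation.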
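
The Ka/Ks ratio mentioned in this abstract compares the nonsynonymous substitution rate (Ka) with the synonymous rate (Ks). A ratio well below 1 suggests purifying selection, a ratio near 1 suggests neutral evolution (as expected for a sequence that has lost function), and a ratio above 1 suggests positive selection. The following is a minimal sketch of that interpretation only; the function names, the tolerance band `tol`, and the example values are assumptions made for illustration and are not taken from the cited article.

```python
def ka_ks_ratio(ka: float, ks: float) -> float:
    """Return Ka/Ks (nonsynonymous over synonymous substitution rate)."""
    if ks <= 0:
        raise ValueError("Ks must be positive for the ratio to be defined")
    return ka / ks


def selection_regime(ka: float, ks: float, tol: float = 0.1) -> str:
    """Classify selection from Ka/Ks; `tol` is an arbitrary illustrative band."""
    omega = ka_ks_ratio(ka, ks)
    if omega < 1 - tol:
        return "purifying"
    if omega > 1 + tol:
        return "positive"
    return "neutral"


# Hypothetical values, not taken from any cited study.
print(selection_regime(0.02, 0.20))  # a gene under functional constraint
print(selection_regime(0.19, 0.20))  # a sequence evolving without constraint
```

In practice Ka and Ks are estimated from codon alignments with dedicated software, and a fixed tolerance band is no substitute for a statistical test.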

9.
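Genome sequencing of many horticultural plants has now been completed or is nearing completion, and their genome sequences and annotation data have greatly advanced functional genomics research. To give researchers a platform for batch downloading of sequences and annotations for specified genomic regions, the authors developed a bioinformatics tool called OBRRP. OBRRP can extract genome sequences and annotation data for 12 plants: grape (Vitis vinifera), peach (Prunus persica), strawberry (Fragaria vesca), cucumber (Cucumis sativus), watermelon (Citrullus lanatus), tomato (Solanum lycopersicum), sweet orange (Citrus sinensis), apple (Malus x domestica), kiwifruit (Actinidia chinensis), potato (Solanum tuberosum), banana (Musa acuminata), and Arabidopsis (Arabidopsis thaliana). It can also be extended to other databases built on the Gbrowser architecture. Tests showed that OBRRP is a fast, simple tool for online, batch, real-time extraction; it is available at http://bioinfo.jit.edu.cn/OBRRP/.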

10.
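Advances in microbial genome research   Total citations: 6, self-citations: 1, citations by others: 5
This article reviews the basic methods of microbial whole-genome sequencing, data collection and assembly, filling of sequence gaps, and annotation of whole-genome sequences. It also briefly outlines the current state and significance of microbial genome research.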

11.
12.
The DOE-JGI Microbial Annotation Pipeline (DOE-JGI MAP) supports gene prediction and/or functional annotation of microbial genomes towards comparative analysis with the Integrated Microbial Genome (IMG) system. DOE-JGI MAP annotation is applied on nucleotide sequence datasets included in the IMG-ER (Expert Review) version of IMG via the IMG ER submission site. Users can submit the sequence datasets consisting of one or more contigs in a multi-fasta file. DOE-JGI MAP annotation includes prediction of protein coding and RNA genes, as well as repeats and assignment of product names to these genes.
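
The multi-fasta input mentioned in this abstract can be illustrated with a small parser. This is a minimal sketch rather than the pipeline's own code: the function name `read_multi_fasta` and the example contig IDs are hypothetical, and the IMG ER submission site may impose format rules that are not covered here.

```python
from typing import Dict, Iterable, List


def read_multi_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse multi-FASTA text into {contig_id: sequence}."""
    contigs: Dict[str, List[str]] = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError("header line without a contig ID")
            current = fields[0]  # the ID is the first token of the header
            if current in contigs:
                raise ValueError(f"duplicate contig ID: {current}")
            contigs[current] = []
        elif current is None:
            raise ValueError("sequence data before the first header")
        else:
            contigs[current].append(line.upper())
    return {name: "".join(parts) for name, parts in contigs.items()}


# Hypothetical two-contig input.
example = """>contig_1 hypothetical example
ACGTAC
GTTA
>contig_2
ggccaa
"""
contigs = read_multi_fasta(example.splitlines())
for name, seq in contigs.items():
    print(name, len(seq))
```

Sequence lines are joined per contig and upper-cased, so wrapped and lower-case input produce the same result.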

13.
While genome sequencing efforts reveal the basic building blocks of life, a genome sequence alone is insufficient for elucidating biological function. Genome annotation, the process of identifying genes and assigning function to each gene in a genome sequence, provides the means to elucidate biological function from sequence. Current state-of-the-art high-throughput genome annotation uses a combination of comparative (sequence similarity data) and non-comparative (ab initio gene prediction algorithms) methods to identify protein-coding genes in genome sequences. Because approaches used to validate the presence of predicted protein-coding genes are typically based on expressed RNA sequences, they cannot independently and unequivocally determine whether a predicted protein-coding gene is translated into a protein. With the ability to directly measure peptides arising from expressed proteins, high-throughput liquid chromatography-tandem mass spectrometry-based proteomics approaches can be used to verify coding regions of a genomic sequence. Here, we highlight several ways in which high-throughput tandem mass spectrometry-based proteomics can improve the quality of genome annotations and suggest that it could be efficiently applied during the gene calling process so that the improvements are propagated through the subsequent functional annotation process.
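
The idea of using measured peptides to verify coding regions can be sketched as a six-frame translation followed by a peptide lookup. This is an illustrative simplification under stated assumptions: it uses the standard genetic code and exact substring matching. Real proteogenomic workflows match tandem mass spectra against translated databases with dedicated search engines and handle ambiguities that are ignored here. All function names and the example sequence are hypothetical.

```python
from typing import List, Tuple

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, with codons enumerated in TCAG order.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons; unknown codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame(dna: str) -> List[Tuple[str, str]]:
    """Return (frame label, protein) for the three frames on each strand."""
    dna = dna.upper()
    frames = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames.append((f"{strand}{offset + 1}", translate(seq[offset:])))
    return frames


def find_peptide(dna: str, peptide: str) -> List[str]:
    """Return the frame labels whose translation contains the peptide."""
    return [label for label, protein in six_frame(dna) if peptide in protein]


# Hypothetical 12-nt sequence: ATG GCC AAA TAA translates to M A K stop.
genome = "ATGGCCAAATAA"
print(find_peptide(genome, "MAK"))
print(find_peptide(genome, "LFGH"))
```

A peptide found in a frame with no annotated gene is the kind of evidence that would prompt a closer look at that region; a peptide found only inside an annotated gene supports the existing call.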

14.
The Genome Warehouse (GWH) is a public repository housing genome assembly data for a wide range of species and delivering a series of web services for genome data submission, storage, release, and sharing. As one of the core resources in the National Genomics Data Center (NGDC), part of the China National Center for Bioinformation (CNCB; https://ngdc.cncb.ac.cn), GWH accepts both full and partial (chloroplast, mitochondrion, and plasmid) genome sequences with different assembly levels, as well as updates of existing genome assemblies. For each assembly, GWH collects detailed genome-related metadata of biological project, biological sample, and genome assembly, in addition to genome sequence and annotation. To archive high-quality genome sequences and annotations, GWH is equipped with a uniform and standardized procedure for quality control. Besides basic browse and search functionalities, all released genome sequences and annotations can be visualized with JBrowse. By May 21, 2021, GWH had received 19,124 direct submissions covering a diversity of 1108 species and had released 8772 of them. Collectively, GWH serves as an important resource for genome-scale data management and provides free and publicly accessible data to support research activities throughout the world. GWH is publicly accessible at https://ngdc.cncb.ac.cn/gwh.

15.
We have developed a rice (Oryza sativa) genome annotation database (Osa1) that provides structural and functional annotation for this emerging model species. Using the sequence of O. sativa subsp. japonica cv Nipponbare from the International Rice Genome Sequencing Project, pseudomolecules, or virtual contigs, of the 12 rice chromosomes were constructed. Our most recent release, version 3, represents our third build of the pseudomolecules and is composed of 98% finished sequence. Genes were identified using a series of computational methods developed for Arabidopsis (Arabidopsis thaliana) that were modified for use with the rice genome. In release 3 of our annotation, we identified 57,915 genes, of which 14,196 are related to transposable elements. Of these 43,719 non-transposable element-related genes, 18,545 (42.4%) were annotated with a putative function, 5,777 (13.2%) were annotated as encoding an expressed protein with no known function, and the remaining 19,397 (44.4%) were annotated as encoding a hypothetical protein. Multiple splice forms (5,873) were detected for 2,538 genes, resulting in a total of 61,250 gene models in the rice genome. We incorporated experimental evidence into 18,252 gene models to improve the quality of the structural annotation. A series of functional data types has been annotated for the rice genome that includes alignment with genetic markers, assignment of gene ontologies, identification of flanking sequence tags, alignment with homologs from related species, and syntenic mapping with other cereal species. All structural and functional annotation data are available through interactive search and display windows as well as through download of flat files. To integrate the data with other genome projects, the annotation data are available through a Distributed Annotation System and a Genome Browser. All data can be obtained through the project Web pages at http://rice.tigr.org.
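
The counts reported in this abstract are internally consistent, which a few lines of arithmetic can confirm. The sketch below uses only the figures quoted above. The way the gene-model total is derived (each alternatively spliced gene contributes its extra splice forms on top of one model per gene) is an assumed reading that happens to reproduce the reported number; the abstract does not state it.

```python
total_genes = 57_915
te_related = 14_196
non_te = total_genes - te_related  # genes not related to transposable elements

categories = {
    "putative function": 18_545,
    "expressed protein, unknown function": 5_777,
    "hypothetical protein": 19_397,
}

splice_forms = 5_873   # splice forms detected across alternatively spliced genes
spliced_genes = 2_538  # genes with multiple splice forms
# Assumed reading: one model per gene plus the extra forms of spliced genes.
gene_models = total_genes + (splice_forms - spliced_genes)

print(non_te)
for name, count in categories.items():
    print(f"{name}: {100 * count / non_te:.1f}%")
print(gene_models)
```

Under that reading, the three functional categories sum to the 43,719 non-transposable-element genes, their shares round to the reported 42.4%, 13.2%, and 44.4%, and the gene-model total comes to 61,250.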

16.
Genome reannotation aims for complete and accurate characterization of gene models and thus is of critical significance for in-depth exploration of gene function. Although the availability of massive RNA-seq data provides great opportunities for gene model refinement, few efforts have been made to adopt these precious data in rice genome reannotation. Here we reannotate the rice (Oryza sativa L. ssp. japonica) genome based on integration of large-scale RNA-seq data and release a new annotation system IC4R-2.0. In general, IC4R-2.0 significantly improves the completeness of gene structure, identifies a number of novel genes, and integrates a variety of functional annotations. Furthermore, long non-coding RNAs (lncRNAs) and circular RNAs (circRNAs) are systematically characterized in the rice genome. Performance evaluation shows that compared to previous annotation systems, IC4R-2.0 achieves higher integrity and quality, primarily attributable to massive RNA-seq data applied in genome annotation. Consequently, we incorporate the improved annotations into the Information Commons for Rice (IC4R), a database integrating multiple omics data of rice, and accordingly update IC4R by providing more user-friendly web interfaces and implementing a series of practical online tools. Together, the updated IC4R, which is equipped with the improved annotations, bears great promise for comparative and functional genomic studies in rice and other monocotyledonous species. The IC4R-2.0 annotation system and related resources are freely accessible at http://ic4r.org/.

17.
Naturally occurring DNA sequence variation within a species underlies evolutionary adaptation and can give rise to phenotypic changes that provide novel insight into biological questions. This variation exists in laboratory populations just as in wild populations and, in addition to being a source of useful alleles for genetic studies, can impact efforts to identify induced mutations in sequence-based genetic screens. The Western clawed frog Xenopus tropicalis (X. tropicalis) has been adopted as a model system for studying the genetic control of embryonic development and a variety of other areas of research. Its diploid genome has been extensively sequenced and efforts are underway to isolate mutants by phenotype- and genotype-based approaches. Here, we describe a study of genetic polymorphism in laboratory strains of X. tropicalis. Polymorphism was detected in the coding and non-coding regions of developmental genes distributed widely across the genome. Laboratory strains exhibit unexpectedly high frequencies of genetic polymorphism, with alleles carrying a variety of synonymous and non-synonymous codon substitutions and nucleotide insertions/deletions. Inter-strain comparisons of polymorphism uncover a high proportion of shared alleles between Nigerian and Ivory Coast strains, in spite of their distinct geographical origins. These observations will likely influence the design of future sequence-based mutation screens, particularly those using DNA mismatch-based detection methods which can be disrupted by the presence of naturally occurring sequence variants. The existence of a significant reservoir of alleles also suggests that existing laboratory stocks may be a useful source of novel alleles for mapping and functional studies.

18.
The phenotypic effects of random mutations depend on both the architecture of the genome and the gene-trait relationships. Both levels thus play a key role in the mutational variability of the phenotype, and hence in the long-term evolutionary success of the lineage. Here, by simulating the evolution of organisms with flexible genomes, we show that the need for an appropriate phenotypic variability induces a relationship between the deleteriousness of gene mutations and the quantity of non-coding sequences maintained in the genome. The more deleterious the gene mutations, the shorter the intergenic sequences. Indeed, in a shorter genome, fewer genes are affected by rearrangements (duplications, deletions, inversions, translocations) at each replication, which compensates for the higher impact of each gene mutation. This spontaneous adjustment of genome structure allows the organisms to retain the same average fitness loss per replication, despite the higher impact of single gene mutations. These results show how evolution can generate unexpected couplings between distinct organization levels.

19.
Noncoding sequence contains pathogenic mutations. Yet, compared with mutations in protein-coding sequence, pathogenic regulatory mutations are notoriously difficult to recognize. Most fundamentally, we are not yet adept at recognizing the sequence stretches in the human genome that are most important in regulating the expression of genes. For this reason, it is difficult to apply to the regulatory regions the same kinds of analytical paradigms that are being successfully applied to identify mutations among protein-coding regions that influence risk. To determine whether dosage sensitive genes have distinct patterns among their noncoding sequence, we present two primary approaches that focus solely on a gene’s proximal noncoding regulatory sequence. The first approach is a regulatory sequence analogue of the recently introduced residual variation intolerance score (RVIS), termed noncoding RVIS, or ncRVIS. The ncRVIS compares observed and predicted levels of standing variation in the regulatory sequence of human genes. The second approach, termed ncGERP, reflects the phylogenetic conservation of a gene’s regulatory sequence using GERP++. We assess how well these two approaches correlate with four gene lists that use different ways to identify genes known or likely to cause disease through changes in expression: 1) genes that are known to cause disease through haploinsufficiency, 2) genes curated as dosage sensitive in ClinGen’s Genome Dosage Map, 3) genes judged likely to be under purifying selection for mutations that change expression levels because they are statistically depleted of loss-of-function variants in the general population, and 4) genes judged unlikely to cause disease based on the presence of copy number variants in the general population. We find that both noncoding scores are highly predictive of dosage sensitivity using any of these criteria. 
In a similar way to ncGERP, we assess two ensemble-based predictors of regional noncoding importance, ncCADD and ncGWAVA, and find both scores are significantly predictive of human dosage sensitive genes and appear to carry information beyond conservation, as assessed by ncGERP. These results highlight that the intolerance of noncoding sequence stretches in the human genome can provide a critical complementary tool to other genome annotation approaches to help identify the parts of the human genome increasingly likely to harbor mutations that influence risk of disease.
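
The abstract describes ncRVIS as comparing observed and predicted levels of standing variation in a gene's regulatory sequence. One simple way to express such a comparison is to regress the count of common variants on the count of all variants per gene and score each gene by its residual. The sketch below does exactly that and nothing more. The abstract does not specify the variables or the residual type, so the choice of regression inputs, the scaling by the residual standard deviation, the function name, and the toy counts are all assumptions for illustration, not the authors' method.

```python
from statistics import mean, pstdev
from typing import Dict, Tuple


def residual_scores(counts: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Score genes by scaled residuals of common ~ total variant counts.

    `counts` maps gene -> (total_variants, common_variants). A negative score
    means fewer common variants than the fitted line predicts.
    """
    xs = [x for x, _ in counts.values()]
    ys = [y for _, y in counts.values()]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("total variant counts must vary across genes")
    # Ordinary least-squares fit of y on x.
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = {g: y - (intercept + slope * x) for g, (x, y) in counts.items()}
    scale = pstdev(list(residuals.values())) or 1.0  # avoid division by zero
    return {g: r / scale for g, r in residuals.items()}


# Toy counts for hypothetical genes; not real data.
toy = {
    "GENE_A": (100, 10),
    "GENE_B": (200, 20),
    "GENE_C": (300, 12),
    "GENE_D": (400, 40),
}
scores = residual_scores(toy)
for gene, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(gene, round(score, 2))
```

In this toy set GENE_C carries far fewer common variants than its total count predicts, so it receives the most negative score and would be read as the least tolerant of variation.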
