Similar literature
20 similar documents found (search time: 171 ms)
1.
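In this study, the Illumina HiSeq TM 2500 sequencing platform was used for transcriptome sequencing and bioinformatic analysis of larvae of the Altai ghost moth Hepialus altaicola Wang. Sequence assembly yielded 100133 Unigenes with a total length of 86319112 bp, a mean length of 862 bp and an N50 of 1628 bp. Alignment of the Unigenes against the NR, COG/KOG, Pfam, Swiss-Prot, GO and KEGG databases annotated 38198 Unigenes; the NR database annotated the most, 32381 (32.34%). In the GO functional classification, 13216 Unigenes were annotated in 57 branches of the three main categories: cellular component, molecular function and biological process. In the KEGG pathway analysis, 15058 Unigenes were annotated and assigned to 305 metabolic pathways. CDS prediction found that 54002 sequences could be coding, 53.93% of all genes. Gene annotation further identified 311 metabolic regulatory genes related to cold adaptation, and their expression levels were estimated with FPKM values. The transcriptome data and analyses obtained here lay a molecular foundation for further study of gene function and low-temperature ecological adaptation in H. altaicola.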
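
The N50 reported in this abstract is a length-weighted summary of an assembly. A minimal sketch of how it is usually computed, assuming the common definition (the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of all assembled bases). The function name `n50` and the toy lengths are illustrative and are not taken from the study:

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp)."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)  # longest sequences first
    half_total = sum(ordered) / 2            # half of all assembled bases
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:            # first length that crosses the halfway mark
            return length


# Toy assembly (hypothetical lengths): total 5500 bp, so the halfway mark is 2750 bp.
toy_lengths = [2000, 1500, 1000, 500, 300, 200]
print(n50(toy_lengths))  # prints 1500
```

Because N50 weights long sequences more heavily, it is normally larger than the mean length, as with the 1628 bp N50 and 862 bp mean quoted above.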

2.
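[Objective] Zele chlorophthalmus is one of the important natural enemies of larvae of the meadow moth Loxostege sticticalis. This study constructed an antennal transcriptome database for Z. chlorophthalmus and mined its olfaction-related protein genes, to provide a theoretical basis for making better use of this parasitoid's biocontrol potential against the meadow moth. [Methods] Using the Illumina Novaseq 6000 high-throughput sequencing platform, the antennal transcriptome of Z. chlorophthalmus was sequenced, assembled and analyzed bioinformatically, and the olfaction-related genes of the antennae were identified. [Results] A transcriptome database for Z. chlorophthalmus was successfully constructed, containing 65228 unigenes with an N50 of 3882 bp. The antennal unigenes were compared with BLAST against the Pfam, Swiss-Prot, NR, COG, KEGG and GO databases and functionally annotated; 18662 genes were annotated, 28.61% of the total. The NR database gave the most annotations, 15863 (24.61% of the total), and the KEGG database the fewest, 9612 (14.91%); the others were Pfam with 12164 (18.86%), COG with 15584 (24.17%) and GO with 11634 (18.05%), while Swiss-Prot annotations reached 11634, 18.86% of the total. Based on the GO annotation, unigene functions fell into three main categories (molecular function, cellular component and biological process), subdivided into 49 branches. Screening by annotated gene function found 151 olfaction-related protein genes: 3 sensory neuron membrane protein genes, 22 ionotropic receptor genes, 23 gustatory receptor genes, 83 odorant receptor genes, 6 chemosensory protein genes and 14 odorant-binding protein genes. [Conclusion] Antennal transcriptome data for Z. chlorophthalmus were successfully collected and olfaction-related proteins were identified, providing a theoretical basis for in-depth study of gene function and the molecular mechanisms of olfaction.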

3.
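Arthrobacter aurescens TC1 and Pseudomonas sp. ADP are currently the model strains of atrazine-degrading bacteria. Microbacterium sp. HBT4 was isolated by screening, with the aim of exploring the similarities and differences in the genomic biological information of these three bacteria from different genera and of predicting important genes. Pan-genome sequencing was performed on the Illumina Hiseq 4000 platform using small-insert DNA library preparation and sequencing. Software was used for genome component analysis, gene function annotation, detection of variation between genomes and comparative genomics, and the isolated Microbacterium HBT4 was compared with the model strains in nucleotide composition, synteny and between-strain variation. The genome of the strain is about 3.53 Mb; strain HBT4 was predicted to have 3 397 coding genes, a repeat content of 1.33% and 63 non-coding RNAs, with 3 324 genes functionally annotated in general-purpose databases and 1 149 in specialized databases. Analysis of variation between strains found SNPs, small InDels and horizontally transferred genes, but no structural variation genes. The numbers and proportions of GO-annotated strain-specific genes in cellular component, molecular function and biological process were obtained, and the KEGG pathway enrichment map showed that dihydrolipoyllysine-residue succinyltransferase, encoded by a strain-specific gene, lies between α-ketoglutarate and succinyl-CoA in the tricarboxylic acid cycle. The distribution of core and dispensable genome proportions, a phylogenetic tree and the synteny relationships of the three strains were obtained; the three share 986 gene families, and strain HBT4 has 1 171 specific gene families. Compared with the two model strains, the gene families of strain HBT4 show both commonalities and substantial differences.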

4.
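Transcriptome of Castanea henryi kernels and expression analysis of genes for enzymes of starch and sucrose metabolism   Total citations: 2, self-citations: 0, citations by others: 2
In this study, high-throughput sequencing was used to sequence the transcriptome of Castanea henryi kernels at the peak of starch accumulation. The resulting Unigenes were functionally annotated, classified and subjected to metabolic pathway analysis, and real-time quantitative PCR was used to analyze the expression of 7 genes for enzymes of starch and sucrose metabolism during kernel development. De novo assembly yielded 53629 Unigene sequences with a mean length of 746 bp. Blast searches against other nucleic acid and protein databases annotated 26739 Unigene sequences, 49.86% of All-Unigene. The 14413 Unigene sequences annotated against the COG database were divided into 25 classes; the 33926 Unigenes with GO functional annotation were divided into 58 branches in three main categories (cellular component, molecular function and biological process); the 5277 Unigene sequences annotated against the KEGG database were assigned to 116 metabolic pathways. Among the 7 genes for enzymes of starch and sucrose metabolism, the expression of two, CH.29636 (starch synthase, SS) and CH.11971 (starch branching enzyme, SBE), rose gradually as the kernel developed, while that of the other 5 (CH.31302, CH.33690, CH.19238, CH.30128, CH.13088) first rose and then fell, consistent with the changes in starch and sucrose during kernel development. This information provides an important basis for research on functional genes related to fruit quality formation in C. henryi.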

5.
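Nosema ceranae is a unicellular fungus that parasitizes the midgut epithelial cells of honeybees; it seriously harms bee health and causes considerable losses to beekeeping worldwide. Based on previously obtained transcriptome data from N. ceranae spores, this study optimized the structures of annotated genes and predicted and analyzed unannotated genes. By aligning the clean reads to the reference genome and reconstructing transcripts, the 5' or 3' ends of 10 annotated N. ceranae genes were extended. Comparing the reconstructed transcripts with the reference genome using Cuffcompare identified 27 novel genes. Nine randomly selected novel genes were tested by RT-PCR, and all amplified fragments of the expected size, indicating that the predicted novel genes are real. Six novel genes could be annotated in the GO database and six in the KEGG database. Further analysis showed that these novel genes were annotated to 10 GO terms, including cell, and may have important functions in the life activities of N. ceranae. The results usefully supplement the gene structure and functional annotation information for N. ceranae and lay a foundation for functional studies of the novel genes.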

6.
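Hao L, Li HP, Yan L. 《遗传》 (Hereditas (Beijing)), 2011, 33(4): 371-377
By random sequencing of a cDNA library from the antler tip tissue of the northeast sika deer (Cervus nippon hortulorum), 906 high-quality ESTs were obtained. After assembly, the 906 ESTs represented 701 Unigenes, comprising 86 contigs and 615 singletons. Blast analysis showed 580 genes (82.7%) with known or putative functions. The 580 functional genes were annotated by Gene Ontology (GO) classification at three levels (molecular function, biological process and cellular component), and further screening and analysis based on the BLAST annotation identified 39 genes related to the growth and development of antler tip tissue. The construction of the cDNA library and the EST analysis fill a gap in genomic information on cervids in the NCBI public databases and provide an important theoretical basis for the scientific development and use of sika deer resources.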

7.
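Gene expression profiles from a limited number of experimental conditions can only support effective prediction of the gene functional classes related to those conditions, so the range of predictable functional classes needs to be restricted. Accordingly, functional classes that are enriched in differentially expressed genes and related to the experimental conditions are first selected based on Gene Ontology (GO). A support vector machine classifier is then used to predict, at a finer level, whether genes so far annotated only to the parent node of a condition-related functional class belong to that class. Applied to a set of yeast gene expression profiles, and after removal of highly imbalanced training sets, the mean precision and mean recall reached at least 71% and 47%, respectively.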
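
The two evaluation figures quoted in this abstract are precision and recall. A minimal sketch of how they are computed for one functional class, assuming binary labels (1 = gene belongs to the class); the function name and the toy labels are hypothetical, and the study's own averaging over classes is not reproduced here:

```python
def precision_recall(true_labels, predicted_labels):
    """Return (precision, recall) for binary labels, where 1 marks class membership."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t and p)          # predicted member, really a member
    fp = sum(1 for t, p in pairs if not t and p)      # predicted member, not a member
    fn = sum(1 for t, p in pairs if t and not p)      # missed member
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # share of predictions that are right
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # share of true members recovered
    return precision, recall


# Toy example (hypothetical data): 4 true members, 3 predicted members, 2 in common.
truth = [1, 1, 1, 0, 0, 1]
predicted = [1, 0, 1, 1, 0, 0]
print(precision_recall(truth, predicted))
```

In this toy case precision is 2/3 and recall is 2/4, which shows how a classifier can be fairly reliable in what it predicts while still missing many true members, the same pattern as the 71% and 47% reported above.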

8.
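Cultivated passion fruit is a fruit tree widely grown in southern China, but its genome information remains unclear, which severely restricts molecular genetic research on it. In this study, 14.1 Gb of raw data from high-throughput sequencing and 165.7 Mb of sequence assembled to the scaffold level, representing the cultivated passion fruit genome, were analyzed bioinformatically. The results show that the passion fruit genome contains a large number of simple sequence repeats (SSR). By alignment with the cassava and peach genomes, 23 053 genes were predicted in the passion fruit genome. Using the NR, Swiss-Prot, KEGG, InterPro, Pfam and GO databases, the predicted genes could be aligned to 282 plant genomes. Using GO, the functions of the annotated genes were classified as biological process, cellular component and molecular function and further divided into 41 secondary functions; most genes are related to the metabolism of carbohydrates, organic acids, lipids and similar compounds. KEGG pathway enrichment divided gene functions into 5 main categories and 19 secondary functions; many genes are related to metabolic pathways, the largest class being genes related to carbohydrate metabolism. By gene family clustering, 12 767 cultivated passion fruit genes clustered into 9 868 gene families, an average of 1.29 genes per family, with 291 specific gene families. Phylogenetically, cultivated passion fruit is closely related to Populus trichocarpa and castor bean. This study lays a foundation for gene function research and molecular breeding in passion fruit.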

9.
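[Aim] Notonecta chinensis, distributed in China and Okinawa, Japan, is an important aquatic predatory insect that can be used for the biological control of mosquitoes. This study aimed to establish a transcriptome database for N. chinensis and mine its gene information. [Methods] The transcriptome of N. chinensis was sequenced on the Illumina NextSeq500 high-throughput platform, assembled de novo and analyzed bioinformatically. New SSR molecular markers were screened from the transcriptome unigenes with MISA, and SSR polymorphism was detected by capillary electrophoresis. [Results] In total, 34782282 clean reads were obtained (NCBI SRA accession: SRR13259254) and assembled into 37801 unigenes with an N50 of 913 bp. Functional annotation by comparison with known databases annotated 36474, 32470, 27781, 35079 and 5638 sequences in the nr, Swiss-Prot, GO, eggNOG and KEGG databases, respectively. By GO annotation, unigene functions fell into three main categories (biological process, cellular component and molecular function), with large proportions of unigenes involved in cell, cell part and binding. eggNOG annotation assigned the 37801 unigenes to 25 gene families, the largest being function unknown. KEGG pathway enrichment analysis showed 5638 unigenes annotated to 245 pathways, with ribosome the most numerous. In addition, MISA found 3124 SSR loci among the 37801 unigenes (8.26% of all unigenes), with an occurrence frequency of 7.07%. Sixteen SSR loci were selected by PCR. In 7 geographic populations of N. chinensis, the polymorphism information content (PIC) of three loci, NcCF/NcCR, NcKF/NcKR and NcLF/NcLR, was 0.870, 0.902 and 0.857, respectively, indicating high polymorphism. [Conclusion] Transcriptome data for N. chinensis were successfully obtained, providing a molecular basis for analysis of its gene functions; the newly developed SSR markers provide richer candidate molecular markers for genetic diversity analysis, cryptic species identification and genetic map construction in N. chinensis.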
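
The polymorphism information content (PIC) values quoted in this abstract summarize how informative a marker locus is. A minimal sketch, assuming the widely used formula of Botstein et al., PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j², where p_i are allele frequencies. The abstract does not state which formula or software was used, so this is an illustration only, and the allele frequencies below are hypothetical:

```python
from itertools import combinations


def pic(allele_freqs):
    """Polymorphism information content of one locus from its allele frequencies."""
    if abs(sum(allele_freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p * p for p in allele_freqs)
    # Correction term over all unordered pairs of distinct alleles.
    cross = sum(2 * (p * p) * (q * q) for p, q in combinations(allele_freqs, 2))
    return 1.0 - homozygosity - cross


# Hypothetical loci: two equally common alleles versus ten equally common alleles.
print(round(pic([0.5, 0.5]), 3))  # prints 0.375
print(round(pic([0.1] * 10), 3))  # prints 0.891
```

PIC rises with the number of alleles and with the evenness of their frequencies, so values near 0.9, like those reported for the three loci, imply many alleles at comparable frequencies.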

10.
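GESTs (gene expression similarity and taxonomy similarity) is a new method for function prediction that combines gene expression similarity with a measure of similarity between functional concepts in the Gene Ontology (GO) functional classification. This prediction algorithm is extended here to protein-protein interaction data, and several methods are proposed for selecting the neighbors of a protein of unknown function in the protein interaction network. Unlike other existing protein function prediction methods, the new method automatically selects, during learning, the most suitable and most specific functional class among the classes in the functional taxonomy, and uses interacting neighbor proteins annotated to nearby functional classes to support the prediction of that specific class. Using yeast protein interaction information from MIPS and a set of gene expression profiles, prediction of protein function in the biological process branch of GO was evaluated with three measures designed specifically for the hierarchical structure of GO. The results show that the method can predict fine-grained protein functions on a large scale. In addition, the method was used to predict functions for proteins that had unknown function in Gene Ontology at the end of 2004, and some of these predictions were confirmed in the SGD annotation data released in April 2006.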

11.
12.
Next‐generation technologies generate an overwhelming amount of gene sequence data. Efficient annotation tools are required to make these data amenable to functional genomics analyses. The Mercator pipeline automatically assigns functional terms to protein or nucleotide sequences. It uses the MapMan ‘BIN’ ontology, which is tailored for functional annotation of plant ‘omics’ data. The classification procedure performs parallel sequence searches against reference databases, compiles the results and computes the most likely MapMan BINs for each query. In the current version, the pipeline relies on manually curated reference classifications originating from the three reference organisms (Arabidopsis, Chlamydomonas, rice), various other plant species that have a reviewed SwissProt annotation, and more than 2000 protein domain and family profiles at InterPro, CDD and KOG. Functional annotations predicted by Mercator achieve accuracies above 90% when benchmarked against manual annotation. In addition to mapping files for direct use in the visualization software MapMan, Mercator provides graphical overview charts, detailed annotation information in a convenient web browser interface and a MapMan‐to‐GO translation table to export results as GO terms. Mercator is available free of charge via http://mapman.gabipd.org/web/guest/app/Mercator .

13.
14.
Despite recent advances, accurate gene function prediction remains an elusive goal, with very few methods directly applicable to the plant Arabidopsis thaliana. In this study, we present GO‐At (gene ontology prediction in A. thaliana), a method that combines five data types (co‐expression, sequence, phylogenetic profile, interaction and gene neighbourhood) to predict gene function in Arabidopsis. Using a simple, yet powerful two‐step approach, GO‐At first generates a list of genes ranked in descending order of probability of functional association with the query gene. Next, a prediction score is automatically assigned to each function in this list based on the assumption that functions appearing most frequently at the top of the list are most likely to represent the function of the query gene. In this way, the second step provides an effective alternative to simply taking the ‘best hit’ from the first list, and achieves success rates of up to 79%. GO‐At is applicable across all three GO categories: molecular function, biological process and cellular component, and can assign functions at multiple levels of annotation detail. Furthermore, we demonstrate GO‐At’s ability to predict functions of uncharacterized genes by identifying ten putative golgins/Golgi‐associated proteins amongst 8219 genes of previously unknown cellular component and present independent evidence to support our predictions. A web‐based implementation of GO‐At ( http://www.bioinformatics.leeds.ac.uk/goat ) is available, providing a unique resource for plant researchers to make predictions for uncharacterized genes and predict novel functions in Arabidopsis.
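
The second step described in this abstract, scoring functions by how often they appear near the top of the ranked gene list, can be illustrated with a small sketch. The abstract does not give GO‐At's actual scoring formula, so the 1/rank weighting, the `top_k` cutoff, the function name and the gene and function labels below are all assumptions made for illustration:

```python
from collections import defaultdict


def score_functions(ranked_genes, annotations, top_k=10):
    """Score each function by rank-weighted votes from the top of a ranked gene list.

    ranked_genes: gene IDs, most likely functional associate of the query first.
    annotations:  dict mapping gene ID -> set of function labels.
    The 1/rank weight is an illustrative choice, not GO-At's published formula.
    """
    scores = defaultdict(float)
    for rank, gene in enumerate(ranked_genes[:top_k], start=1):
        for function in annotations.get(gene, ()):
            scores[function] += 1.0 / rank  # higher-ranked genes count for more
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical data: the best hit carries function_A, but function_B recurs below it.
ranked = ["gene1", "gene2", "gene3", "gene4"]
known = {
    "gene1": {"function_A"},
    "gene2": {"function_B"},
    "gene3": {"function_B"},
    "gene4": {"function_B"},
}
for function, score in score_functions(ranked, known):
    print(function, round(score, 3))
```

Here function_B outscores function_A even though the single best hit is annotated with function_A, which is the kind of case where aggregating over the list differs from taking the best hit alone.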

15.
DAtA: database of Arabidopsis thaliana annotation   Total citations: 1, self-citations: 0, citations by others: 1
The Database of Arabidopsis thaliana Annotation (DAtA) was created to enable easy access to and analysis of all the Arabidopsis genome project annotation. The database was constructed using the completed A. thaliana genomic sequence data currently in GenBank. An automated annotation process was used to predict coding sequences for GenBank records that do not include annotation. DAtA also contains protein motifs and protein similarities derived from searches of the proteins in DAtA with motif databases and the non-redundant protein database. The database is routinely updated to include new GenBank submissions for Arabidopsis genomic sequences and new Blast and protein motif search results. A web interface to DAtA allows coding sequences to be searched by name, comment, blast similarity or motif field. In addition, browse options present lists of either all the protein names or identified motifs present in the sequenced A. thaliana genome. The database can be accessed at http://baggage.stanford.edu/group/arabprotein/

16.
OntoBlast allows one to find information about potential functions of proteins by presenting a weighted list of ontology entries associated with similar sequences from completely sequenced genomes identified in a BLAST search. It combines, in a single analysis step, the search for sequence similarities in several species with the association of information stored in ontologies. From each identified ontology term a list of genes, which share the functional annotation, can be retrieved. The OntoBlast function is an integral part of the 'Ontologies TO GenomeMatrix' tool which provides an alternative entry point from ontology terms to the Genome-Matrix database. OntoBlast's web interface is accessible on the 'Ontologies TO GenomeMatrix Gate' page at http://functionalgenomics.de/ontogate/.

17.
This research analyzes some aspects of the relationship between gene expression, gene function, and gene annotation. Many recent studies are implicitly based on the assumption that gene products that are biologically and functionally related would maintain this similarity both in their expression profiles as well as in their gene ontology (GO) annotation. We analyze how accurate this assumption proves to be using real publicly available data. We also aim to validate a measure of semantic similarity for GO annotation. We use the Pearson correlation coefficient and its absolute value as a measure of similarity between expression profiles of gene products. We explore a number of semantic similarity measures (Resnik, Jiang, and Lin) and compute the similarity between gene products annotated using the GO. Finally, we compute correlation coefficients to compare gene expression similarity against GO semantic similarity. Our results suggest that the Resnik similarity measure outperforms the others and seems better suited for use in gene ontology. We also deduce that there seems to be correlation between semantic similarity in the GO annotation and gene expression for the three GO ontologies. We show that this correlation is negligible up to a certain semantic similarity value; then, for higher similarity values, the relationship trend becomes almost linear. These results can be used to augment the knowledge provided by clustering algorithms and in the development of bioinformatic tools for finding and characterizing gene products.
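
The two kinds of similarity compared in this abstract can be sketched as follows. Pearson correlation is computed between two expression profiles, and Resnik similarity between two ontology terms is taken, under its standard definition, as the information content (-log p) of their most informative common ancestor. The tiny ontology, the term probabilities and the expression values are hypothetical toy data, not GO itself:

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def resnik(term_a, term_b, ancestors, probability):
    """Resnik similarity: information content of the most informative common ancestor.

    ancestors:   dict term -> set of its ancestors, including the term itself.
    probability: dict term -> fraction of annotated gene products at or below the term.
    """
    common = ancestors[term_a] & ancestors[term_b]
    return max(-math.log(probability[term]) for term in common)


# Toy ontology (hypothetical): root -> metabolism -> {lipid, sugar}.
ancestors = {
    "lipid": {"lipid", "metabolism", "root"},
    "sugar": {"sugar", "metabolism", "root"},
}
probability = {"root": 1.0, "metabolism": 0.5, "lipid": 0.1, "sugar": 0.2}

print(pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))          # prints 1.0
print(resnik("lipid", "sugar", ancestors, probability))   # equals log(2), about 0.693
```

A comparison like the one in the abstract would compute both numbers for many pairs of gene products and then correlate the two resulting series.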

18.
With numerous whole genomes now in hand, and experimental data about genes and biological pathways on the increase, a systems approach to biological research is becoming essential. Ontologies provide a formal representation of knowledge that is amenable to computational as well as human analysis, an obvious underpinning of systems biology. Mapping function to gene products in the genome consists of two, somewhat intertwined enterprises: ontology building and ontology annotation. Ontology building is the formal representation of a domain of knowledge; ontology annotation is association of specific genomic regions (which we refer to simply as 'genes', including genes and their regulatory elements and products such as proteins and functional RNAs) to parts of the ontology. We consider two complementary representations of gene function: the Gene Ontology (GO) and pathway ontologies. GO represents function from the gene's eye view, in relation to a large and growing context of biological knowledge at all levels. Pathway ontologies represent function from the point of view of biochemical reactions and interactions, which are ordered into networks and causal cascades. The more mature GO provides an example of ontology annotation: how conclusions from the scientific literature and from evolutionary relationships are converted into formal statements about gene function. Annotations are made using a variety of different types of evidence, which can be used to estimate the relative reliability of different annotations.

19.
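With the rapid growth of influenza virus genome sequencing data, mining the biological information contained in these large datasets has become a research focus. Building an automated, integrated and informatized sequence database system on the basis of data on the epidemic characteristics of influenza viruses in China is of considerable practical value for batch, rapid translation, annotation, storage, query and analysis of influenza virus genomes. By integrating a series of software tools and toolkits with other functions developed in-house, and by adding fine-grained screening rules for the best match of translation and annotation information on top of 2 key reference datasets maintained at the bottom layer, our group built an automated system that supports storage of influenza virus genome information, automated translation, accurate annotation of protein sequences, homologous sequence alignment and phylogenetic tree analysis. The results show that, for influenza virus gene sequences submitted in fasta format through the Web front end, the system runs a Blast homology search against the reference segment dataset (blastdb.fasta) and identifies the virus type (A, B or C), subtype and gene segment (segments 1~8). On this basis, it queries the underlying reference dataset of gene segments used for translation and annotation to obtain a set of peptides, and then calls ProSplign in a loop to make predictions with them. With the fine-grained screening rules, the translated product that best matches the input sequence is selected as its predicted protein and written out in common formats such as gbk, asn and fasta, together with the sequence length, whether the sequence is full length, the virus type, subtype, segment and other information. Building on this work, additional functions such as phylogenetic tree analysis and display and genome data storage were developed in-house, resulting in a Web-service-based system for automated translation and annotation of influenza virus genomes. This study suggests that the system, which tightly integrates a series of software tools with its own annotation and translation database files, covers functions from sequence storage, translation and annotation through to sequence analysis and display, and can meet China's needs for shared, localized, integrated and automated handling of high-throughput gene detection data.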

20.
Automated Gene Ontology annotation for anonymous sequence data   Total citations: 9, self-citations: 1, citations by others: 9


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  京ICP备09084417号