MGAP: a microbial genome annotation system
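Using bioinformatics methods and tools, we developed the Microbial Genome Annotation Package (MGAP) and applied it to annotating the genome of the cyanobacterium PCC7002. The system has two parts: a genome annotation system and a Web-based user interface. The genome annotation system integrates multiple programs for gene identification, function prediction, and sequence analysis, together with protein sequence databases, a protein information resource system, and a database of orthologous protein families. The user interface provides a circular genome map, a map of the distribution of genes and open reading frames along the chromosome, and tools for searching the annotation. The system runs on a PC under Linux, with MySQL as the database management system, Apache as the Web server, and application programming interfaces written in Perl; all of this software is freely available.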

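Objective: To determine the complete genome sequences of three enterovirus 71 (EV71) strains and to carry out a preliminary analysis of their evolutionary characteristics and genotypes. Methods: Viral RNA was extracted and reverse-transcribed into cDNA, and six overlapping fragments covering the full-length viral sequence (excluding the poly(A) tail) were amplified by PCR. The fragment sequences of the three EV71 strains were assembled, edited, and corrected with software, then translated into amino acids and compared; phylogenetic trees were constructed with MEGA 4.1. Results: Full-length sequences were obtained for all three strains. The GDV103 genome is 7404 nt long, comprising a 741-nt 5' untranslated region (UTR), a 6582-nt open reading frame (ORF), and an 81-nt 3' UTR. The Anhui strain (Anhui2007) genome is 7405 nt long, comprising a 742-nt 5' UTR, a 6582-nt ORF, and an 81-nt 3' UTR. The VR1432 genome is 7408 nt long, comprising a 743-nt 5' UTR, a 6582-nt ORF, and an 83-nt 3' UTR. Homology comparison and phylogenetic analysis confirmed that GDV103 and the Anhui strain belong to subgenotype C4 of genotype C, whereas VR1432 belongs to subgenotype C2 of genotype C. Conclusion: The full-length genome sequences of three EV71 strains were obtained and their genotypes further examined, laying a foundation for subsequent interferon protection experiments.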
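
The segment lengths reported above can be cross-checked with simple arithmetic. The following is a minimal sketch, not part of the original study: it takes the figures from the abstract and checks that each genome's 5' UTR, ORF, and 3' UTR add up to the stated total and that the ORF is a whole number of codons. The function name `check_genome` is a hypothetical helper introduced here for illustration.

```python
# Segment lengths (nt) as reported in the abstract:
# (5' UTR, ORF, 3' UTR, stated genome length).
GENOMES = {
    "GDV103": (741, 6582, 81, 7404),
    "Anhui2007": (742, 6582, 81, 7405),
    "VR1432": (743, 6582, 83, 7408),
}


def check_genome(utr5, orf, utr3, total):
    """Return True when the segments sum to the total and the ORF is a multiple of 3."""
    return utr5 + orf + utr3 == total and orf % 3 == 0


# Hypothetical helper applied to every strain listed above.
results = {name: check_genome(*parts) for name, parts in GENOMES.items()}
print(results)
```

All three reported genomes pass both checks, so the figures in the abstract are internally consistent.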

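Genome annotation is the process of identifying the functional elements in a genome sequence. It assigns biological meaning directly to the sequence, making it easier for researchers to explore and analyze genome function. Annotation helps researchers understand a genome at three levels. The first is nucleotide-level annotation, which determines the physical positions of elements such as genes, RNAs, and repeats in the DNA sequence, including specific positions such as transcription start sites, translation start sites, and exon boundaries. Annotation at this level can also capture variation in ...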

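To study the genetic evolution and variation of H1N1 subtype swine influenza virus (SIV), the eight gene segments of A/Swine/Guangdong/LM/2004(H1N1) were each amplified by RT-PCR, cloned into the pMD18-T vector, and sequenced to obtain the complete genome. Nucleotide sequencing showed no nucleotide insertions or deletions in any gene segment of the LM strain. The amino acid sequence at the HA cleavage site is IPSIQSR↓G, which does not match the molecular signature of highly pathogenic H1N1 subtype SIV strains. The HA gene contains six potential N-glycosylation sites: four in HA1, at positions 11, 23, 87, and 276, and two additional sites in HA2, at positions 154 and 213. The NA gene carries the highly conserved N-glycosylation sites at positions 58, 63, 68, 88, and 146 and has two additional potential sites at positions 44 and 235, which may be a molecular feature of recent H1N1 subtype SIV. In nucleotide homology comparisons, the HA gene was highly homologous (99%) to influenza virus isolates of the human-like lineage, whereas all other genes were most homologous (87%-98%) to isolates of the classical swine lineage. The phylogenetic trees for each segment and the nucleotide homology results suggest that the HA gene of this strain probably derives from a human-like lineage influenza virus and the other genes from classical swine lineage influenza viruses.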
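
Potential N-glycosylation sites like those counted above are commonly located by scanning a protein for the sequon N-X-S/T, where X is any residue except proline. The abstract does not say which tool the authors used, so the following is only a minimal sketch of that scan; the function name `find_sequons` and the toy peptide are assumptions made for illustration and are not taken from the LM strain.

```python
import re

# N-X-S/T sequon, where X is any residue except proline.
# The lookahead lets overlapping sequons (e.g. "NNTS") all be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein):
    """Return the 1-based position of the Asn in each potential N-glycosylation sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


# Toy peptide (hypothetical), not a real HA or NA sequence.
toy = "MKNGTAANPSLLNNTSQ"
print(find_sequons(toy))
```

Positions reported this way are relative to the start of whatever sequence is passed in, so they match published numbering only if the same numbering scheme (for example, mature HA1 without the signal peptide) is used.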

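This work designs a Web-based database model for storing and annotating massive amounts of DNA data. The process has three parts: first, building the database framework; second, annotating the raw genome sequence data in batches and exporting the results in a valid format for import into the database; and third, providing a user-friendly interactive interface for reading, querying, and annotating genome data online. The database is designed to address the efficient storage and management of the large volume of genome sequences that are being generated and await analysis.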
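
The three parts described above (schema, batch import, query) can be sketched in a few lines. This is only an illustrative sketch under stated assumptions: the abstract does not give the actual schema or database engine, so the table and column names below are hypothetical, and SQLite is used purely because it ships with Python.

```python
import sqlite3


def build_db():
    """Part 1: create a minimal two-table schema (hypothetical layout)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE sequence (
            seq_id   INTEGER PRIMARY KEY,
            name     TEXT UNIQUE NOT NULL,
            residues TEXT NOT NULL
        );
        CREATE TABLE annotation (
            ann_id    INTEGER PRIMARY KEY,
            seq_id    INTEGER NOT NULL REFERENCES sequence(seq_id),
            start_pos INTEGER NOT NULL,
            end_pos   INTEGER NOT NULL,
            strand    TEXT CHECK (strand IN ('+', '-')),
            feature   TEXT NOT NULL,
            note      TEXT
        );
    """)
    return conn


def load_annotations(conn, name, residues, rows):
    """Part 2: import one sequence and its batch of annotation rows."""
    cur = conn.execute(
        "INSERT INTO sequence (name, residues) VALUES (?, ?)", (name, residues)
    )
    seq_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO annotation (seq_id, start_pos, end_pos, strand, feature, note) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        [(seq_id, *row) for row in rows],
    )
    conn.commit()
    return seq_id


def features_overlapping(conn, name, start, end):
    """Part 3: query the features that overlap a 1-based inclusive interval."""
    query = """
        SELECT a.feature, a.start_pos, a.end_pos, a.strand
        FROM annotation a JOIN sequence s ON s.seq_id = a.seq_id
        WHERE s.name = ? AND a.start_pos <= ? AND a.end_pos >= ?
        ORDER BY a.start_pos
    """
    return conn.execute(query, (name, end, start)).fetchall()


conn = build_db()
load_annotations(
    conn,
    "contig1",  # toy sequence name
    "ATGAAATAGCCCATGGGGTAA",
    [(1, 9, "+", "CDS", "toy gene A"), (13, 21, "+", "CDS", "toy gene B")],
)
print(features_overlapping(conn, "contig1", 5, 12))
```

Keeping sequences and annotations in separate tables lets annotation be re-run and re-imported without touching the stored sequence data; a Web front end would call functions like `features_overlapping` behind its query forms.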

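With the successful completion of the International Rice Genome Sequencing Project (IRGSP), rice gene research has entered the post-genomic era. Annotating and analyzing rice microarray data is an important part of functional genomics and helps in understanding the biological significance of rice genes. We developed a Web-based platform for annotating and analyzing rice microarray data (RiceChip), which is more comprehensive and faster than comparable annotation databases. The platform consists of five functional modules. BioChip provides quick and advanced searches of rice gene expression data by fields such as Probe Set ID, Locus ID, and Analysis Name. BioAnno integrates multiple biological databases to annotate rice genes with information on gene function, protein structure, metabolic pathways, and transcriptional regulation. BioSeq collects rice genome sequence information and supports sequence queries for rice genes and microarray probes. BioView is the core graphical visualization module and provides a user-friendly interface and result output. BioAnaly uses the R/Bioconductor statistical tools to provide online analysis of high-throughput microarray data. Together, the modules provide data retrieval, gene annotation, sequence analysis, data visualization, and data analysis; the comprehensiveness of the platform's data collection and the strength of its analysis functions stand out among comparable platforms for annotating and analyzing rice microarray data.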

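As sequencing technology has advanced, massive amounts of genome sequencing data have been produced, greatly enriching public genetic data resources. To cope with this volume of genomic data, algorithms and tools for genome comparison and annotation are continually updated, making it possible to combine multiple annotation tools to obtain more accurate annotation of protein-coding genes. Some prokaryotic genomes in public databases were sequenced and assembled more than 10 years ago and contain many predicted coding genes of unknown function. To improve the annotation quality of genomes in the National Center for Biotechnology Information (NCBI) database, this study combined several prokaryotic gene-finding algorithms and programs with gene expression data to re-annotate 1587 bacterial and archaeal genomes. First, using the 33 variables of the Z-curve, 3092 sequences that had been over-annotated as protein-coding genes were identified in the original annotations of 177 genomes. Second, homology searches assigned specific functions to 4447 protein-coding genes of unknown function in 939 genomes. Finally, three gene-finding programs, ZCURVE 3.0, Glimmer 3.02, and Prodigal, which are accurate, widely used, and complementary because they rely on different algorithms, were combined to search for missed genes. In total, 2003 protein-coding genes missing from the annotation were found in 9 genomes; these genes belong to multiple clusters of orthologous groups of proteins (COG). By using newer tools together with multi-omics data to re-annotate early-sequenced bacterial and archaeal genomes, this study offers a reference annotation method for newly sequenced strains, and the re-annotated gene sequences should also benefit subsequent basic research.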
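
One simple way to combine several gene finders when looking for missed genes is a voting rule. The abstract does not describe how the outputs of the three programs were actually merged, so the following is a sketch under explicit assumptions: genes are keyed by (strand, stop coordinate), because predictors often agree on the stop codon while differing on the start, and a candidate is reported only when at least `min_tools` predictors call it and the existing annotation lacks it. The tool names and coordinates are placeholders, not real program output.

```python
from collections import Counter


def candidate_missed_genes(annotated, predictions, min_tools=3):
    """Return (strand, stop) keys called by at least min_tools predictors but not annotated.

    annotated:   iterable of (strand, stop) keys from the existing annotation
    predictions: dict mapping a tool name to its list of (strand, stop) keys
    """
    votes = Counter()
    for calls in predictions.values():
        votes.update(set(calls))  # each tool votes at most once per gene
    known = set(annotated)
    return sorted(
        key for key, n in votes.items() if n >= min_tools and key not in known
    )


# Toy data (hypothetical): two annotated genes and three predictors.
annotated = [("+", 1200), ("-", 3400)]
predictions = {
    "tool_a": [("+", 1200), ("+", 5600), ("-", 7800)],
    "tool_b": [("+", 1200), ("+", 5600)],
    "tool_c": [("-", 3400), ("+", 5600), ("-", 7800)],
}
print(candidate_missed_genes(annotated, predictions))
```

Requiring agreement from all tools trades sensitivity for precision; lowering `min_tools` to 2 would admit more candidates, which would then need stronger downstream evidence such as homology or expression data.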

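Using the RefSeq database and sequenced genomes as templates, large-scale computation produced a "standard transcript database" that represents information at each level of transcription, and Common Gateway Interface (CGI) technology was used to build a Web service for standard transcript datasets of human and model organisms. Users can submit a RefSeq accession number or free-text annotation terms to retrieve all information on a sequence and compute gene structure online. The system currently covers six species (human, Arabidopsis, rice, rat, mouse, and zebrafish) and holds more than 180,000 records. It provides an important tool for in-depth study of the transcriptomes of human and other species and a solid data foundation for further analysis of alternative splicing in eukaryotic genes.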

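Objective: To clarify the relationship between H3 subtype duck influenza viruses and influenza viruses of other subtypes. Methods: The polymerase PB1 genes of three H3N8 subtype duck-origin influenza viruses isolated from live poultry markets were sequenced and analyzed. Results: The PB1 genes of the three viruses shared 99.9% nucleotide homology with one another, 96.31%-96.44% with an H9N2 subtype influenza virus (DK/ST/2143/00), and 88.65%-88.79% with an H3N8 subtype duck influenza virus (Mal/Alberta/279/98). Phylogenetic analysis showed that the three viruses in this study fall in the same branch and cluster with the H9N2 subtype avian influenza viruses represented by A/duck/Hong Kong/Y439, indicating that the three H3N8 viruses acquired an H9N2 avian influenza virus gene segment by reassortment. Conclusion: Reassortment between avian influenza viruses of different subtypes in reservoir hosts, and the new properties of the reassortants, such as the pathogenicity of duck H3N8 influenza viruses for poultry, deserve close attention.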

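In November 2009, scientists from the United States, the United Kingdom, and other countries announced the first draft genome of the domestic pig. In the past two years, as whole-genome sequence has been progressively released and more sequenced fragments have been correctly assembled, annotating and analyzing porcine functional genes at the whole-genome level has become pressing. Taking the annotation of the cofilin 1 (CFL1) gene as an example, this article describes the use of Otterlace, software developed at the Sanger Institute, for manual analysis and annotation of immune gene sequences across the pig genome. It explains in detail how to use the three annotation tools Zmap, Blixem, and Dotter and lists the main steps of the annotation process, in the hope of encouraging wider use of Otterlace. Otterlace was used to analyze 243 immune-related genes, of which 180 were fully or partially annotated, laying a foundation for further functional studies of these genes.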

The program phase is widely used for Bayesian inference of haplotypes from diploid genotypes; however, manually creating phase input files from sequence alignments is an error-prone and time-consuming process, especially when dealing with numerous variable sites and/or individuals. Here, a web tool called seqphase is presented that generates phase input files from fasta sequence alignments and converts phase output files back into fasta. During the production of the phase input file, several consistency checks are performed on the dataset and suitable command line options to be used for the actual phase data analysis are suggested. seqphase was written in perl and is freely accessible over the Internet at the address http://www.mnhn.fr/jfflot/seqphase.

Shi W, Lei F, Zhu C, Sievers F, Higgins DG. PLoS ONE 2010, 5(12): e14454


More and more nucleotide sequences of type A influenza virus are available in public databases. Although these sequences have been the focus of many molecular epidemiological and phylogenetic analyses, most studies only deal with a few representative sequences. In this paper, we present a complete analysis of all Haemagglutinin (HA) and Neuraminidase (NA) gene sequences available to allow large scale analyses of the evolution and epidemiology of type A influenza.

Methodology/Principal Findings

This paper describes an analysis and complete classification of all HA and NA gene sequences available in public databases using multivariate and phylogenetic methods.


We analyzed 18975 HA sequences and divided them into 280 subgroups according to multivariate and phylogenetic analyses. Similarly, we divided 11362 NA sequences into 202 subgroups. Compared to previous analyses, this work is more detailed and comprehensive, especially for the bigger datasets. Therefore, it can be used to show the full and complex phylogenetic diversity and provides a framework for studying the molecular evolution and epidemiology of type A influenza virus. More than 85% of the type A influenza HA and NA sequences in GenBank are categorized in one unambiguous and unique group. Therefore, our results are a kind of genetic and phylogenetic annotation for influenza HA and NA sequences. In addition, sequences of swine influenza viruses come from 56 HA and 45 NA subgroups. Most of these subgroups also include viruses from other hosts, indicating cross-species transmission of the viruses between pigs and other hosts. Furthermore, the phylogenetic diversity of swine influenza viruses from Eurasia is greater than that of North American strains, and both are becoming more diverse. Apart from viruses from humans, pigs, birds and horses, viruses from other species show very low phylogenetic diversity. This might indicate that viruses have not become established in these species. Based on current evidence, there is no simple pattern of inter-hemisphere transmission of avian influenza viruses, and it appears to happen sporadically. However, for H6 subtype avian influenza viruses, such transmissions might have happened very frequently, and multiple and bidirectional transmission events might exist.

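An H3N2 subtype swine influenza virus, A/Swine/Guangdong/01/2005(H3N2), was isolated from pigs with suspected influenza in Guangdong Province. Each of its genes was cloned and sequenced and compared with the corresponding genes of other swine, avian, and human influenza viruses in GenBank. The complete HA gene shared more than 99% nucleotide sequence homology with H3N2 swine influenza strains isolated in Guangdong in 2003-2004 and more than 98.5% with H3N2 human influenza strains isolated in New York in the late 1990s. The NA gene shared more than 99% nucleotide sequence homology with H3N2 human influenza strains isolated in New York in 1998-2000. The NS and M genes were highly homologous to those of the H1N1 subtype swine influenza strain A/swine/HongKong/273/1994(H1N1) (97.9% and 98.4%, respectively) and shared 96.7% and 97.1% homology, respectively, with the American strain A/swine/Iowa/17672/1988(H1N1). The nucleotide sequences of the remaining genes were highly homologous to those of H3N2 human influenza strains. The M and NS genes are therefore inferred to derive from H1N1 subtype swine influenza virus, and HA, NA, and the other genes from H3N2 subtype human influenza virus. This indicates that the H3N2 swine influenza virus is a reassortant produced by gene reassortment between H3N2 subtype human influenza virus and H1N1 subtype swine influenza virus.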

The DOE-JGI Microbial Annotation Pipeline (DOE-JGI MAP) supports gene prediction and/or functional annotation of microbial genomes towards comparative analysis with the Integrated Microbial Genome (IMG) system. DOE-JGI MAP annotation is applied on nucleotide sequence datasets included in the IMG-ER (Expert Review) version of IMG via the IMG ER submission site. Users can submit the sequence datasets consisting of one or more contigs in a multi-fasta file. DOE-JGI MAP annotation includes prediction of protein coding and RNA genes, as well as repeats and assignment of product names to these genes.

SNUFER is software for the automatic localization of single nucleotide polymorphisms (SNPs) and the generation of tables for presenting them. After input of a fasta file containing the sequences to be analyzed, a multiple sequence alignment is generated using ClustalW run inside SNUFER. The ClustalW output file is then used to generate a table which displays the SNPs detected in the aligned sequences and their degree of similarity. This table can be exported to Microsoft Word, Microsoft Excel or as a single text file, permitting further editing for publication. The software was written using Delphi 7 for programming and FireBird 2.0 for sequence database management. It is freely available for noncommercial use and can be downloaded from http://www.heranza.com.br/bioinformatica2.htm.
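
The core step described above, turning an alignment into a table of variable positions, can be sketched briefly. This is not SNUFER's code (SNUFER is a Delphi application); it is a minimal Python illustration that assumes the sequences are already aligned to equal length and that, as a simplification chosen here, skips any column containing a gap or an N. The function name `find_snps` and the toy alignment are hypothetical.

```python
def find_snps(alignment):
    """Return (1-based column, {name: base}) for each column with more than one base.

    alignment: dict mapping sequence names to aligned sequences of equal length.
    Columns containing a gap ('-') or N are skipped in this sketch.
    """
    names = list(alignment)
    seqs = [alignment[name].upper() for name in names]
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    snps = []
    for col, bases in enumerate(zip(*seqs), start=1):
        if "-" in bases or "N" in bases:
            continue
        if len(set(bases)) > 1:
            snps.append((col, dict(zip(names, bases))))
    return snps


# Toy alignment (hypothetical data).
aln = {"s1": "ACGTAC-T", "s2": "ACCTACGT", "s3": "ACGTTCGT"}
for col, bases in find_snps(aln):
    print(col, bases)
```

Each returned row holds a column number and the base carried by every sequence, which is the shape needed to write the result out as a spreadsheet or text table.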

Ten influenza virus isolates were obtained from infected pigs from different places in Shandong province showing clinical symptoms from October 2002 to January 2003. All 10 isolates were identified in China's National Influenza Research Center as influenza A virus of H9N2 subtype. The complete genome of one isolate, designated A/Swine/Shandong/1/2003(H9N2), was sequenced and compared with sequences available in GenBank. The results of analyses indicated that the sequence of A/Swine/Shandong/1/2003(H9N2) was similar to those of several chicken influenza viruses and duck influenza viruses recently prevalent in South China. According to phylogenetic analysis of the complete gene sequences, A/Swine/Shandong/1/2003(H9N2) possibly originated from the reassortment of chicken influenza viruses and duck influenza viruses. It was found that the amino acid sequence at the HA cleavage site in Sw/SD/1/2003 is R-S-L-R-G, differing clearly from that of other H9N2 subtype isolates of swine influenza and avian influenza, which is R-S-S-R-G.

The evolutionary classification of influenza genes into lineages is a first step in understanding their molecular epidemiology and can inform the subsequent implementation of control measures. We introduce a novel approach called Lineage Assignment By Extended Learning (LABEL) to rapidly determine cladistic information for any number of genes without the need for time-consuming sequence alignment, phylogenetic tree construction, or manual annotation. Instead, LABEL relies on hidden Markov model profiles and support vector machine training to hierarchically classify gene sequences by their similarity to pre-defined lineages. We assessed LABEL by analyzing the annotated hemagglutinin genes of highly pathogenic (H5N1) and low pathogenicity (H9N2) avian influenza A viruses. Using the WHO/FAO/OIE H5N1 evolution working group nomenclature, the LABEL pipeline quickly and accurately identified the H5 lineages of uncharacterized sequences. Moreover, we developed an updated clade nomenclature for the H9 hemagglutinin gene and show a similarly fast and reliable phylogenetic assessment with LABEL. While this study was focused on hemagglutinin sequences, LABEL could be applied to the analysis of any gene and shows great potential to guide molecular epidemiology activities, accelerate database annotation, and provide a data sorting tool for other large-scale bioinformatic studies.

Viruses are major factors of human infectious diseases. Understanding of the structure-function correlation in viruses is important for the identification of potential anti-viral inhibitors and vaccine targets. In virology research, virus-related databases and bioinformatic analysis tools are essential for discerning relationships within complex datasets about viruses and host-virus interactions. Bioinformatic analyses on viruses include the identification of open reading frames, gene prediction, homology searching, sequence alignment, and motif and epitope recognition. The predictions of features such as transmembrane domains, glycosylation sites, and protein secondary and tertiary structure are important for analyzing the structure-function relationship of proteins encoded in viral genomes. Biochemical pathway analysis can help elucidate information at the biological systems level. Microarray analysis provides methods for high throughput screening and gene expression profiling. Virus-related bioinformatics databases include those concerned with viral sequences, taxonomy, homologous protein families, structures, or dedicated to specific viruses such as influenza and herpes simplex virus (HSV). This review provides a guide and overview of computational programs for these analyses as a resource for genomics and proteomics studies in virology research. These resources are useful for understanding viral diseases, as well as for the design and development of anti-viral agents.

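To understand at the molecular level the genetic variation and epidemiology of H9 subtype avian influenza virus (AIV) in China, this study assembled 23 H9 subtype AIV strains isolated in recent years from diseased chicken flocks in 12 provinces, municipalities, and autonomous regions of China, and obtained the HA gene cDNA nucleotide sequences of the 23 strains by RT-PCR and nucleotide sequencing. Comparison of the nucleotide and deduced amino acid sequences showed that the HA genes of these strains share 94.1%-100% nucleotide homology and 95.4%-100% amino acid homology. When the 23 strains were compared with the HA gene cDNA sequences of 31 other strains from Asia and elsewhere in the world, the Hong Kong isolate HK170499 was found to be closely related to two Japanese strains. Amino acid sequence analysis showed that five strains (CKGS199, CKTJ196, CKTJ296, CKSH300, and CKBJ197) had each lost one potential glycosylation site. Analysis of the amino acid sequences encoded by nucleotides 55-1152 of the HA genes of the 54 H9 subtype AIV strains showed that, although there are 10 cleavage-site motifs, the 23 strains in this study and recent isolates from mainland China and Hong Kong all have RSSR↓GLF. The amino acid at position 191, which forms part of the receptor-binding site, follows a pattern: all mainland Chinese strains and some Hong Kong strains have N, and all other strains have H. The glycosylation site at residues 141-143 follows a similar pattern: every strain with N at position 191 has NVS there (except CKBJ194), and every strain with H at position 191 has NVT. Phylogenetic analysis places the mainland Chinese strains in the first branch of the Eurasian lineage. These results suggest that recent H9N2 subtype avian influenza infections in Chinese chicken flocks may share a common origin, which provides an important scientific basis for effective measures to control this subtype.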

Expressed sequence tags (ESTs) are generated and deposited in the public domain, as redundant, unannotated, single-pass reactions, with virtually no biological content. PipeOnline automatically analyses and transforms large collections of raw DNA-sequence data from chromatograms or FASTA files by calling the quality of bases, screening and removing vector sequences, assembling and rewriting consensus sequences of redundant input files into a unigene EST data set and finally through translation, amino acid sequence similarity searches, annotation of public databases and functional data. PipeOnline generates an annotated database, retaining the processed unigene sequence, clone/file history, alignments with similar sequences, and proposed functional classification, if available. Functional annotation is automatic and based on a novel method that relies on homology of amino acid sequence multiplicity within GenBank records. Records are examined through a function ordered browser or keyword queries with automated export of results. PipeOnline offers customization for individual projects (MyPipeOnline), automated updating and alert service. PipeOnline is available at http://stress-genomics.org.

