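Genome-Wide Analysis of Homologous Sequences Encoding Disease Resistance Genes in Rice   Total citations: 1 (self-citations: 1; citations by others: 0)
Using a fuzzy search approach, 565 homologous sequences encoding disease resistance proteins were identified in the TIGR rice Nipponbare genome database (TIGR Rice Genome Annotation-Release5). BLASTP alignment of these 565 resistance protein sequences against the indica rice genome database identified 320 corresponding alleles. Online bioinformatics tools were used to identify the conserved domains, conserved motifs, and transposon elements within the DNA sequences of these 565 resistance genes; 14 of the resistance gene homologues were found to be misannotated. The genomic distribution of these genes was also mapped. Based on homology tree analysis and the physical distribution of the genes in the genome, the authors propose that local (in situ) and distant duplication events produced the present distribution and diversity of resistance genes, with transposons playing an important role in the duplication process. These findings are relevant to research on disease resistance mechanisms, the evolution of resistance genes, and the transfer of resistance genes through breeding.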

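Genome-Wide Detection and Analysis of Alternative Splicing of NBS-LRR Genes in Rice   Total citations: 1 (self-citations: 0; citations by others: 1)
Gu Lianfeng (顾连峰), Guo Rongfa (郭荣发). Acta Genetica Sinica (《遗传学报》), 2007, 34(3): 247-257
Alternative splicing is a major mechanism that increases genome complexity and proteome diversity, yet no genome-wide analysis of alternative splicing of rice NBS-LRR sequences had been reported. A hidden Markov model search of the TIGR database yielded 855 sequences encoding NBS-LRR motifs. These sequences were used for homology searches against three databases (KOME, the TIGR Gene Indices, and UniProt) to obtain homologous full-length cDNA sequences, tentative consensus sequences, and protein sequences. The Spidey and SIM4 programs were then used to align the full-length cDNA sequences and tentative consensus sequences to the corresponding BAC sequences in order to predict alternative splicing. Protein sequences were aligned to genomic sequences with tBLASTn. Among these 875 NBS-LRR genes, 119 showed alternative splicing, comprising 71 intron retention events, 20 exon skipping events, 25 alternative initiation events, 16 alternative termination events, 12 alternative 5′ splicing events, and 16 alternative 3′ splicing events. Most alternative splicing events were supported by two or more transcripts. The data can be queried at http://www.bioinfor.org. Bioinformatic analysis of splice junctions showed that the ‘GT…AG’ rule is less conserved in exon skipping and intron retention events than in constitutive splicing, which suggests that different regulatory mechanisms direct the formation of these splice variants. Analysis of the effect of intron retention on proteins showed that alternatively spliced proteins tend to have altered C-terminal amino acid sequences. Finally, analysis of the tissue distribution and protein localization of alternative splicing showed that the largest tissue classes were root and callus, and that more than one third of the splice variants were localized to the plasma membrane and cytoplasm. These alternatively spliced proteins may play an important role in disease resistance signal transduction.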

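Annotation and Analysis of the Fall Armyworm Genome   Total citations: 2 (self-citations: 0; citations by others: 2)
The fall armyworm, Spodoptera frugiperda, has spread rapidly in China in recent years, causing major economic losses and attracting public attention. Its genome sequence is very important for in-depth study of traits such as migration, invasion, and insecticide resistance. Five versions of the genome sequence have been published so far, but three of them lack genome annotation. Apart from the version that used the Sf9 cell line as its DNA source, the other versions have small scaffold N50 values and relatively low assembly quality. This study therefore selected the Sf9 cell line genome, which has the largest scaffold N50, for annotation of protein-coding genes. Repetitive sequences account for 28.1% of this genome version. CEGMA assessment showed that this version covers 93.6% of core genes, and BUSCO assessment showed coverage of 90.8% of core genes. The OMIGA annotation pipeline predicted 25,699 protein-coding genes; detailed gene sequences are available from the InsectBase website (http://www.insect-genome.com/FAW/). Of these, 15,623 genes have GO annotations and 9,213 genes have KEGG annotations. Comparative genomic analysis of 12 lepidopteran insects showed that the fall armyworm is most closely related to Spodoptera litura, with the two species diverging about 12.84 million years ago. Homology analysis of the protein-coding genes of the 12 lepidopteran insects identified, in the fall armyworm, 2,490 single-copy genes, 891 Lepidoptera-specific genes, 2,360 species-specific expanded genes, and 4,180 species-specific genes. GO enrichment analysis showed that the 2,360 species-specific expanded genes are mainly involved in biological processes related to DNA integration and metabolism, while the 4,180 species-specific genes are mainly involved in enzyme activity, photoreception, and carbohydrate metabolism. KEGG pathway enrichment showed that fall armyworm-specific genes are mainly involved in amino acid metabolism, carbohydrate metabolism, and the Wnt signaling pathway. These results enrich the gene information available for the fall armyworm and provide guidance for further understanding its biology and developing new environmentally friendly control methods.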

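Advances in Prokaryotic Proteogenomics   Total citations: 1 (self-citations: 0; citations by others: 1)
With the continuing development of genome sequencing technology, large numbers of microbial genome sequences can be determined accurately in a short time. To explore genome structure and function further, genome annotation algorithms based on sequence features and homology are widely applied to newly sequenced species. However, because of limited sequencing quality and the relatively low accuracy of the algorithms themselves, existing genome annotations contain a considerable proportion of pseudogenes and annotation errors, especially errors in protein N-terminal annotation. To compensate for these shortcomings, transcriptome profiling based on gene chips or RNA-seq and proteome profiling based on tandem mass spectrometry can measure the transcription and translation products of genes accurately and at high throughput, providing experimental validation of predicted gene structures. In prokaryotic cells, however, the abundance of non-coding RNA introduces contaminating data into transcriptome sequencing and limits its use in genome annotation. By comparison, proteomics based on tandem mass spectrometry can identify a large number of proteins in an organism in a short time, allowing annotated genes to be validated and even corrected. It has therefore become an important basis for genome annotation and reannotation, and has given rise to the new research field of "proteogenomics". This review first introduces traditional genome annotation algorithms based on sequence prediction and homology alignment and points out their shortcomings. On this basis, it considers the technical characteristics of transcriptomics and proteomics, analyzes the advantages of proteomics for prokaryotic genome annotation, and summarizes the current progress of large-scale proteogenomic studies. Finally, from an informatics perspective, it identifies current problems in using proteomic data for genome reannotation together with corresponding solutions, and discusses future directions for proteogenomics.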

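The continuing development of sequencing technology has produced massive amounts of genome sequencing data and greatly enriched public genetic data resources. In response, algorithms and tools for genome comparison and annotation have been updated continually, making it possible to combine several annotation tools to obtain more accurate annotation of protein-coding genes. Some of the prokaryotic genome sequences and assemblies in public databases are more than 10 years old and contain many predicted coding genes of unknown function. To improve the annotation quality of genomes in the National Center for Biotechnology Information (NCBI) database, this study combined several prokaryotic gene-finding algorithms and software tools with gene expression data to reannotate 1587 bacterial and archaeal genomes. First, using 33 Z-curve variables, 3092 sequences that had been over-annotated as protein-coding genes were identified in the original annotations of 177 genomes. Second, homology alignment was used to assign specific functions to 4447 protein-coding genes of unknown function in 939 genomes. Finally, missed genes were sought by jointly applying ZCURVE 3.0, Glimmer 3.02, and Prodigal, three highly accurate and widely used gene-finding programs that are complementary because they rely on different algorithms. In total, 2003 protein-coding genes missing from the annotation were found in 9 genomes; these genes belong to multiple clusters of orthologous groups of proteins (COG). By using new tools together with multi-omics data to reannotate bacterial and archaeal genomes sequenced in earlier years, this study provides a reference annotation approach for newly sequenced strains, and the resulting reannotated bacterial gene sequences should also benefit subsequent basic research.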

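Bioinformatic Analysis of the 14-3-3 Protein Family in Rice   Total citations: 12 (self-citations: 0; citations by others: 12)
Jin Gulei (金谷雷), Wang Xusheng (汪旭升), Zhu Jun (朱军). Acta Genetica Sinica (《遗传学报》), 2005, 32(7): 726-732
A hidden Markov model (HMM) search of the protein database of the japonica rice (Oryza sativa L. ssp. japonica) genome yielded eight homologous sequences of 14-3-3 proteins, four of which were new genes. The DNA sequences of all japonica rice 14-3-3 proteins were then compared with various expressed sequence tags (ESTs), which provided EST evidence for the 14-3-3 proteins. The results show that these genes are expressed under different treatments and in different parts of the rice plant, and that expression patterns differ considerably among family members. Multiple sequence alignment of the proteins indicated possible functionally polymorphic sites. Analysis of gene structure and chromosomal location confirmed that the rice genome contains two classes of 14-3-3 proteins, E-like and non-E-like. In addition, a preliminary evolutionary analysis was carried out on the 14-3-3 families currently known in plants.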

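Analysis and Homology Alignment of EST-SNPs and Candidate Genes for Chestnut Blight Resistance in Chinese Chestnut   Total citations: 1 (self-citations: 0; citations by others: 1)
Unigene sequences of Chinese chestnut, together with the EST-SNP data developed from those Unigene sequences, were downloaded from the Fagaceae genome database. Analysis showed that EST-SNPs occur in Chinese chestnut at a frequency of 4 per kb. Among base substitution types, C-T substitutions were the most frequent and C-G substitutions the least frequent, and the ratio of transitions to transversions was 1.74:1. Gene annotation and protein domain analysis were performed on 211 genes differentially expressed between healthy tissue and chestnut blight-infected tissue of Chinese chestnut. The results showed that genes involved in protein metabolism and in responses to stress and to biotic and abiotic stimuli made up a relatively large proportion, and a large number of transmembrane structures, signal peptides, coiled coils, and protein kinase-related domains were found. The 211 candidate genes associated with chestnut blight resistance were aligned against the Unigene sequences, and the EST-SNPs located on homologous Unigene sequences were compiled, giving a total of 3023 EST-SNP markers. These EST-SNP markers provide an important foundation for future candidate gene-based population genomics and association mapping studies of chestnut blight resistance in Chinese chestnut.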
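The transition-to-transversion ratio quoted in the abstract above (1.74:1) is a simple count ratio. The following is only a minimal sketch of how such a ratio can be computed from a list of (reference, alternate) base pairs; the function names and the toy SNP list are hypothetical and are not taken from the study.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def classify_substitution(ref, alt):
    """Return 'transition' or 'transversion' for a single-base substitution."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in BASES or alt not in BASES or ref == alt:
        raise ValueError(f"not a valid single-base substitution: {ref}>{alt}")
    # Transition: purine<->purine or pyrimidine<->pyrimidine.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def ts_tv_ratio(snps):
    """Ratio of transitions to transversions over (ref, alt) pairs."""
    counts = Counter(classify_substitution(r, a) for r, a in snps)
    if counts["transversion"] == 0:
        raise ZeroDivisionError("no transversions in input")
    return counts["transition"] / counts["transversion"]


# Toy data (illustrative only): 3 transitions, 2 transversions.
toy_snps = [("C", "T"), ("A", "G"), ("C", "G"), ("G", "A"), ("A", "T")]
ratio = ts_tv_ratio(toy_snps)
```

With this definition, C-T and A-G changes count as transitions, while C-G changes count as transversions.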

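The genomes of closely related species retain a large amount of ancestral information and are well conserved. Comparing the genome sequences of closely related species reveals large collinear (syntenic) regions, and these regions contain rich homology information that can be used to discover unknown genes and improve the quality of genome annotation. In this study, the genomes of wild and cultivated cucumber were first annotated using the same genome annotation platform. Collinearity information was then obtained by comparing the two whole-genome sequences and used to improve the annotation of both genomes. As a result, 909 genes were newly annotated in wild cucumber and 853 in cultivated cucumber. By incorporating transcriptome data from wild and cultivated cucumber, 87 cases were found in wild cucumber in which a gene with a long open reading frame (ORF) had been misannotated as several short-ORF genes, and 40 cases in which several short-ORF genes had been over-predicted as a single long-ORF gene; the corresponding numbers of misannotations identified in cultivated cucumber were 166 and 36.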

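With the rapid development of high-throughput DNA sequencing, genome sequencing has been completed for a growing number of species. Locating coding genes and determining their structure are the basic tasks of genome annotation, but earlier genome annotation methods relied mainly on DNA and RNA sequence information. To interpret sequenced genomes more accurately, multiple types of omics data need to be integrated into genome annotation. In recent years, proteomics based on tandem mass spectrometry has matured and achieved high proteome coverage, making it possible to use tandem mass spectrometry data for genome annotation. Such data can validate the expression of annotated genes and can also correct existing gene annotations and reveal new genes, allowing genome sequences to be reannotated. This is the subject of proteogenomics, a field that is currently advancing quickly. Using this approach to annotate sequenced genomes systematically has become an important complement to genome interpretation. This article reviews the main research topics and methods of proteogenomics and considers its future development.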

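MGAP, a Microbial Genome Annotation System   Total citations: 6 (self-citations: 0; citations by others: 6)
A microbial genome annotation system, the Microbial genome annotation package (MGAP), was developed using bioinformatics methods and tools and applied to the genome annotation of the cyanobacterium PCC7002. The system consists of two parts: a genome annotation system and a Web-based user interface. The genome annotation system integrates several programs for gene identification, function prediction, and sequence analysis, together with a protein sequence database, a protein information resource system, and a database of orthologous protein families. The user interface includes a circular genome map display, maps of the distribution of genes and open reading frames on the chromosome, and tools for retrieving annotation information. The system runs on PCs under the Linux operating system, uses MySQL as the database management system and Apache as the Web server, and implements its application programming interface in the Perl scripting language; all of this software is freely available.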

Two plasmids, of 28,878 bp and 28,012 bp, were isolated from Leptospirillum ferrooxidans ATCC 49879. In total, 67 open reading frames (ORFs) were identified on the two plasmids, of which 32 had predicted products with high homology to proteins of known function, while 11 ORFs had predicted products with homology to previously identified proteins of unknown function. Twenty-four ORFs had products with no homologues in the GenBank/NCBI database. An analysis of the ORFs and other features of the two plasmids, the first to be isolated from a bacterium of the genus Leptospirillum, is presented.

The genes located within the odvp-6e/odv-e56 region of the Choristoneura fumiferana granulovirus (ChfuGV) were identified by sequencing the 11 kb BamHI restriction fragment of the ChfuGV genome. The overall GC content calculated for this genomic region was 34.96%. The open reading frames (ORFs) located within the odvp-6e/odv-e56 region are presented and compared with the equivalent ORFs located in the same region in other GVs. This region is composed of 14 ORFs, including three ORFs that are unique to ChfuGV with no obvious homologues in other baculoviruses, as well as eleven ORFs with homologues to granuloviral ORFs, such as granulin, CfORF2, pk-1, ie-1, odv-e18, p49, and odvp-6e/odv-e56. In this study, the conceptual products of seven major conserved ORFs (granulin, CfORF2, IE-1, ODV-E18, p49 and ODVP-6E/ODV-E56) were used to construct phylogenetic trees. Our results show that granuloviruses can be divided into two distinct groups. Group I: Choristoneura fumiferana granulovirus (ChfuGV), Cydia pomonella granulovirus (CpGV), Phthorimaea operculella granulovirus (PhopGV), and Adoxophyes orana granulovirus (AoGV). Group II: Xestia c-nigrum granulovirus (XcGV), Plutella xylostella granulovirus (PxGV), and Trichoplusia ni granulovirus (TnGV). The ChfuGV conserved proteins are most closely related to those of CpGV, PhopGV, and AoGV. Comparative studies of gene arrangements within this region of the genomes demonstrated that three GVs from group I maintain similar gene arrangements.
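A GC content figure such as the 34.96% reported above is the share of G and C among the bases of a sequence. The snippet below is a minimal sketch under the assumption that ambiguous symbols (for example N) are excluded from the denominator; the function name and the test sequences are hypothetical, and the abstract does not say how the authors handled such symbols.

```python
def gc_content(seq):
    """Percentage of G and C among the unambiguous bases (A, C, G, T) of seq."""
    # Assumption: anything other than A/C/G/T (e.g. N, gaps) is ignored.
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


# Toy sequence (illustrative only): 4 of 6 bases are G or C.
example = gc_content("ATGCGC")
```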

Comparisons of the 6213 predicted Saccharomyces cerevisiae open reading frame (ORF) products with sequences from organisms of other biological phyla differentiate genes commonly conserved in evolution from 'maverick' genes which have no homologue in phyla other than the Ascomycetes. We show that a majority of the 'maverick' genes have homologues among other yeast species and thus define a set of 1892 genes that, from sequence comparisons, appear 'Ascomycetes-specific'. We estimate, retrospectively, that the S. cerevisiae genome contains 5651 actual protein-coding genes, 50 of which were identified for the first time in this work, and that the present public databases contain 612 predicted ORFs that are not real genes. Interestingly, the sequences of the 'Ascomycetes-specific' genes tend to diverge more rapidly in evolution than that of other genes. Half of the 'Ascomycetes-specific' genes are functionally characterized in S. cerevisiae, and a few functional categories are over-represented in them.

Herpesviruses or herpesviral sequences have been identified in various bat species. Here, we report the isolation, cell tropism, and complete genome sequence of a novel betaherpesvirus from the bat Miniopterus schreibersii (MsHV). In primary cell culture, MsHV causes cytopathic effects (CPE) and reaches peak virus production 2 weeks after infection. MsHV was found to infect and replicate less efficiently in a feline kidney cell, CRFK, and failed to replicate in 13 other cell lines tested. Sequencing of the MsHV genome using the 454 system, with a 224-fold coverage, revealed a genome size of 222,870 bp. The genome was extensively analyzed in comparison to those of related viruses. Of the 190 predicted open reading frames (ORFs), 40 were identified as herpesvirus core genes. Among 93 proteins with identifiable homologues in tree shrew herpesvirus (THV), human cytomegalovirus (HCMV), or rat cytomegalovirus (RCMV), most had highest sequence identities with THV counterparts. However, the MsHV genome organization is colinear with that of RCMV rather than that of THV. The following unique features were discovered in the MsHV genome. One predicted protein, B125, is similar to human herpesvirus 6 (HHV-6) U94, a homologue of the parvovirus Rep protein. For the unique ORFs, 7 are predicted to encode major histocompatibility complex (MHC)-related proteins, 2 to encode MHC class I homologues, and 3 to encode MHC class II homologues; 4 encode the homologues of C-type lectin- or natural killer cell lectin-like receptors; and the products of a unique gene family, the b149 family, of 16 members, have no significant sequence identity with known proteins but exhibit immunoglobulin-like beta-sandwich domains revealed by three-dimensional (3D) structural prediction. To our knowledge, MsHV is the first virus genome known to encode MHC class II homologues.

Several species of tsetse flies can be infected by the Glossina pallidipes salivary gland hypertrophy virus (GpSGHV). Infection causes salivary gland hypertrophy and also significantly reduces the fecundity of the infected flies. To better understand the molecular basis underlying the pathogenesis of this unusual virus, we sequenced and analyzed its genome. The GpSGHV genome is a double-stranded circular DNA molecule of 190,032 bp containing 160 nonoverlapping open reading frames (ORFs), which are distributed equally on both strands with a gene density of one per 1.2 kb. It has a high A+T content of 72%. About 3% of the GpSGHV genome is composed of 15 sequence repeats, distributed throughout the genome. Although sharing the same morphological features (enveloped rod-shaped nucleocapsid) as baculoviruses, nudiviruses, and nimaviruses, analysis of its genome revealed that GpSGHV differs significantly from these viruses at the level of its genes. Sequence comparisons indicated that only 23% of GpSGHV genes displayed moderate homologies to genes from other invertebrate viruses, principally baculoviruses and entomopoxviruses. Most strikingly, the GpSGHV genome encodes homologues to the four baculoviral per os infectivity factors (p74 [pif-0], pif-1, pif-2, and pif-3). The DNA polymerase encoded by GpSGHV is of type B and appears to be phylogenetically distant from all DNA polymerases encoded by large double-stranded DNA viruses. The majority of the remaining ORFs could not be assigned by sequence comparison. Furthermore, no homologues to DNA-dependent RNA polymerase subunits were detected. Taken together, these data indicate that GpSGHV is the prototype member of a novel group of insect viruses.
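The gene density quoted above (one ORF per 1.2 kb) follows from the two figures given in the same abstract, a genome of 190,032 bp and 160 ORFs. A small sketch of that arithmetic, with a hypothetical helper name:

```python
def mean_orf_spacing_kb(genome_bp, n_orfs):
    """Average genome length per ORF, in kilobases."""
    if n_orfs <= 0:
        raise ValueError("n_orfs must be positive")
    return genome_bp / n_orfs / 1000


# Figures taken from the abstract: 190,032 bp and 160 ORFs.
spacing = mean_orf_spacing_kb(190_032, 160)  # about 1.19 kb, i.e. roughly 1.2 kb
```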

Members of the Deinococcaceae (e.g., Thermus, Meiothermus, Deinococcus) contain A/V-ATPases typically found in Archaea or Eukaryotes which were probably acquired by horizontal gene transfer. Two methods were used to quantify the extent to which archaeal or eukaryotic genes have been acquired by this lineage. Screening of a Meiothermus ruber library with probes made against Thermoplasma acidophilum DNA yielded a number of clones which hybridized more strongly than background. One of these contained the prolyl tRNA synthetase (RS) gene. Phylogenetic analysis shows the M. ruber and D. radiodurans prolyl RS to be more closely related to archaeal and eukaryal forms of this gene than to the typical bacterial type. Using a bioinformatics approach, putative open reading frames (ORFs) from the prerelease version of the D. radiodurans genome were screened for genes more closely related to archaeal or eukaryotic genes. Putative ORFs were searched against representative genomes from each of the three domains using automated BLAST. ORFs showing the highest matches against archaeal and eukaryotic genes were collected and ranked. Among the top-ranked hits were the A/V-ATPase catalytic and noncatalytic subunits and the prolyl RS genes. Using phylogenetic methods, ORFs were analyzed and trees assessed for evidence of horizontal gene transfer. Of the 45 genes examined, 20 showed topologies in which D. radiodurans homologues clearly group with eukaryotic or archaeal homologues, and 17 additional trees were found to show probable evidence of horizontal gene transfer. Compared to the total number of ORFs in the genome, those that can be identified as having been acquired from Archaea or Eukaryotes are relatively few (approximately 1%), suggesting that interdomain transfer is rare.

The origin of eukaryotic cell nuclei by symbiosis of Archaea in Bacteria was proposed on the basis of the phylogenetic topologies of genes. However, it was not possible to conclude whether or not the genes involved were authentic representative genes. Furthermore, using the BLAST and FASTA programs, the similarity of open reading frame (ORF) groups between three domains (Eukarya, Archaea and Bacteria) was estimated at one threshold. Therefore, their similarities at other thresholds could not be clarified. Here we use our newly developed 'homology-hit analysis' method, which uses multiple thresholds, to determine the origin of the nucleus. We removed mitochondria-related ORFs from yeast ORFs, and determined the number of yeast orthologous ORFs in each functional category to the ORFs in six Archaea and nine Bacteria at several thresholds (E-values) using the BLAST. Our results indicate that yeast ORFs related to the nucleus may share their origins with archaeal ORFs, whereas ORFs that are related to the cytoplasm may share their origins with bacterial ORFs. Our results thus strongly support the idea of nucleus symbiosis.
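The idea of counting hits at several E-value thresholds rather than one can be illustrated with a short sketch. This is not the authors' 'homology-hit analysis' method, whose details are not given in the abstract; it only assumes that each query ORF has already been reduced to its best-hit E-value against a target genome. The function name, ORF identifiers, and E-values are hypothetical.

```python
def hits_per_threshold(best_evalues, thresholds):
    """Count query ORFs whose best-hit E-value is at or below each threshold.

    best_evalues maps an ORF identifier to its best-hit E-value,
    or to None when the search returned no hit at all.
    """
    return {
        t: sum(1 for e in best_evalues.values() if e is not None and e <= t)
        for t in thresholds
    }


# Toy best-hit table (illustrative only, not real BLAST output).
toy_best_hits = {"orfA": 1e-50, "orfB": 1e-12, "orfC": 1e-3, "orfD": None}
counts = hits_per_threshold(toy_best_hits, [1e-40, 1e-10, 1e-2])
```

Running the same count for archaeal and bacterial target genomes, per functional category, would give the kind of threshold-by-threshold comparison the abstract describes.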

The complete nucleotide sequence (62.8 kb) of pGS18, the largest sequenced plasmid to date from the species Geobacillus stearothermophilus, was determined. Computational analysis of sequence data revealed 65 putative open reading frames (ORFs); 38 were carried on one strand and 27 were carried on the other. These ORFs comprised 84.1% of the pGS18 sequence. Twenty-five ORFs (38.4%) were assigned to putative functions; four ORFs (6.2%) were annotated as pseudogenes. The amino acid sequences obtained from 29 ORFs (44.6%) had the highest similarity to hypothetical proteins of the other microorganisms, and seven (10.8%) had no significant similarity to any genes present in the current open databases. Plasmid replication region, strongly resembling that of the theta-type replicon, and genes encoding three different plasmid maintenance systems were identified, and a putative discontinuous transfer region was localized. In addition, we also found several mobile genetic elements and genes, responsible for DNA repair, distributed along the whole sequence of pGS18. The alignment of pGS18 with two other large indigenous plasmids of the genus Geobacillus highlighted the presence of well-conserved segments and has provided a framework that can be exploited to formulate hypotheses concerning the molecular evolution of these three plasmids.
