Similar literature
Found 20 similar articles (search time: 31 ms)
1.
A large number of complete microorganism genomes have been sequenced and submitted to the public database and then incorporated into our complete genome database, Genome Information Broker (GIB, http://gib.genes.nig.ac.jp/). However, when comparative genomics is carried out, researchers must be aware that there are protein-coding genes not confirmed by homology or motif search and that reliable protein-coding genes are missing. Therefore, we developed a protocol (Gene Trek in Prokaryote Space, GTPS) for finding possible protein-coding genes in bacterial genomes. GTPS assigns a degree of reliability to predicted protein-coding genes. We first systematically applied the protocol to the complete genomes of all 123 bacterial species and strains that were publicly available as of July 2003, and then to those of 183 species and strains available as of September 2004. We found a number of incorrect genes and several new ones in the genome data in question. We also found a way to estimate the total number of orthologous genes in the bacterial world.

2.
Large-scale prokaryotic gene prediction and comparison to genome annotation   Total citations: 4, self-citations: 0, citations by others: 4
MOTIVATION: Prokaryotic genomes are sequenced and annotated at an increasing rate. The methods of annotation vary between sequencing groups. This makes genome comparison difficult and may lead to propagation of errors when questionable assignments are adapted from one genome to another. Genome comparison either on a large or small scale would be facilitated by using a single standard for annotation, which incorporates a transparency of why an open reading frame (ORF) is considered to be a gene. RESULTS: A total of 143 prokaryotic genomes were scored with an updated version of the prokaryotic genefinder EasyGene. Comparison of the GenBank and RefSeq annotations with the EasyGene predictions reveals that in some genomes up to approximately 60% of the genes may have been annotated with a wrong start codon, especially in the GC-rich genomes. The fractional difference between annotated and predicted genes confirms that too many short genes are annotated in numerous organisms. Furthermore, genes might be missing in the annotation of some of the genomes. We predict 41 of 143 genomes to be over-annotated by >5%, meaning that too many ORFs are annotated as genes. We also predict that 12 of 143 genomes are under-annotated. These results are based on the difference between the number of annotated genes not found by EasyGene and the number of predicted genes that are not annotated in GenBank. We argue that the average performance of our standardized and fully automated method is slightly better than the annotation.

3.
《遗传学报》 (Journal of Genetics and Genomics), 2020, 47(1): 49-60
Noncoding RNAs (ncRNAs) play important roles in many biological processes and provide materials for evolutionary adaptations beyond protein-coding genes, such as in the arms race between the host and pathogen. However, currently, a comprehensive high-resolution analysis of primate genomes that includes the latest annotated ncRNAs is not available. Here, we developed a computational pipeline to estimate the selection that acts on noncoding regions based on comparisons with a large number of reference sequences in introns adjacent to the regions of interest. Our method yields results comparable with those of the established codon-based method and phyloP method for coding genes; thus, it provides a holistic framework for estimating the selection on the entire genome. We further showed that fast-evolving protein-coding genes and their corresponding 5′ UTRs have a significantly lower frequency of CpG dinucleotides than those evolving at an average pace, and these fast-evolving genes are enriched in the process of immunity and host defense. We also identified fast-evolving miRNAs with antiviral functions in cells. Our results provide a resource for high-resolution evolution analysis of the primate genomes.
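
The abstract above compares CpG dinucleotide frequency between fast-evolving genes and genes evolving at an average pace, but it does not say how that frequency was normalized. As a minimal sketch, assuming the simplest definition (the fraction of adjacent base pairs in a sequence that read "CG"), the quantity could be computed as below. The function name `cpg_frequency` and the example sequence are illustrative and do not come from the paper.

```python
def cpg_frequency(seq):
    """Fraction of adjacent dinucleotides in seq that are 'CG'.

    Assumption: plain dinucleotide frequency, not an observed/expected
    ratio; the abstract does not state which normalization was used.
    """
    seq = seq.upper()
    if len(seq) < 2:
        return 0.0
    n_dinucleotides = len(seq) - 1
    n_cpg = sum(1 for i in range(n_dinucleotides) if seq[i:i + 2] == "CG")
    return n_cpg / n_dinucleotides


if __name__ == "__main__":
    # Toy sequence: dinucleotides are AC, CG, GC, CG, GT.
    print(cpg_frequency("ACGCGT"))
```

In practice one would apply such a function to each gene or 5′ UTR and then compare the two groups statistically; that comparison is outside the scope of this sketch.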

4.
One of the most remarkable observations stemming from the sequencing of genomes of diverse species is that the number of protein-coding genes in an organism does not correlate with its overall cellular complexity. Alternative splicing, a key mechanism for generating protein complexity, has been suggested as one of the major explanations for this discrepancy between the number of genes and genome complexity. Determining the extent and importance of alternative splicing required the confluence of critical advances in data acquisition, improved understanding of biological processes and the development of fast and accurate computational analysis tools. Although many model organisms have now been completely sequenced, we are still very far from understanding the exact frequency of alternative splicing from these sequenced genomes. This paper will highlight some recent progress and future challenges for functional genomics and bioinformatics in this rapidly developing area.

5.
6.
Yu JF  Xiao K  Jiang DK  Guo J  Wang JH  Sun X 《DNA research》2011,18(6):435-449
Falsely annotated protein-coding genes have been deemed one of the major causes of annotation errors in public databases. Although many filtering approaches have been designed for over-annotated protein-coding genes, some are questionable due to the resultant increase in false negatives. Furthermore, there is no webserver or software specifically devised for the problem of over-annotation. In this study, we propose an integrative algorithm for detecting over-annotated protein-coding genes in microorganisms. Overall, an average accuracy of 99.94% is achieved over 61 microbial genomes. The extremely high accuracy indicates that the presented algorithm efficiently differentiates protein-coding genes from non-coding open reading frames. Extensive analyses show that the prediction results are reliable and that the integrative algorithm is robust and convenient. Our analysis also indicates that over-annotated protein-coding genes can cause false positives in the detection of horizontal gene transfers. The webserver of the proposed algorithm is freely accessible at www.cbi.seu.edu.cn/RPGM.

7.
Metazoan genomes are being sequenced at an increasingly rapid rate. For each new genome, the number of protein-coding genes it encodes and the amount of functional DNA it contains are known only inaccurately. Nevertheless, there have been considerable recent advances in identifying protein-coding and non-coding sequences that have remained constrained in diverse species. However, these approaches struggle to pinpoint genomic sequences that are functional in some species but that are absent or not functional in others. Yet it is here, encoded in lineage-specific and functional sequence, that we expect physiological differences between species to be most concentrated.

8.
The availability of a large number of complete genome sequences raises the question of how many genes are essential for cellular life. Trying to reconstruct the core of the protein-coding gene set for a hypothetical minimal bacterial cell, we have performed a computational comparative analysis of eight bacterial genomes. Six of the analyzed genomes are very small due to a dramatic genome size reduction process, while the other two, corresponding to free-living relatives, are larger. The available data from several systematic experimental approaches to define all the essential genes in some completely sequenced bacterial genomes were also considered, and a reconstruction of a minimal metabolic machinery necessary to sustain life was carried out. The proposed minimal genome contains 206 protein-coding genes with all the genetic information necessary for self-maintenance and reproduction in the presence of a full complement of essential nutrients and in the absence of environmental stress. The main features of such a minimal gene set, as well as the metabolic functions that must be present in the hypothetical minimal cell, are discussed.

9.
Liu Yuping  Lü Ting  Zhu Di  Zhou Yonghui  Liu Tao  Su Xu 《植物研究》 (Bulletin of Botanical Research), 2018, 38(4): 518-525
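Littledalea tibetica is a perennial alpine endemic species of the tribe Bromeae (Poaceae) with important ecological value, distributed mainly on the Qinghai-Tibet Plateau and adjacent regions. In this study, the chloroplast genome of L. tibetica, a species endemic to the Qinghai-Tibet Plateau, was sequenced with Illumina MiSeq, a second-generation high-throughput sequencing platform, establishing for the first time a standard sequencing workflow for Bromeae species. The chloroplast genome sequence was then assembled using that of a closely related species, Lolium perenne, as the reference. The results showed that the chloroplast genome of L. tibetica is 136,852 bp long with a GC content of 38.5% and has a typical quadripartite structure, in which the large single-copy (LSC) and small single-copy (SSC) regions are 80,970 bp and 12,876 bp, respectively, and the inverted repeat (IR) region is 21,503 bp. A total of 141 genes were annotated, comprising 95 protein-coding genes, 38 tRNA genes and 8 rRNA genes, located mainly in the large and small single-copy regions. In addition, a phylogenetic tree built from the complete chloroplast genome sequences of L. tibetica and 30 other Poaceae species showed that L. tibetica is closely related to plants of the tribe Triticeae in the subfamily Pooideae.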
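
The region sizes reported in the abstract above can be checked against the stated total length. A quadripartite chloroplast genome consists of one LSC, one SSC and two copies of the IR, so the total should equal LSC + SSC + 2 × IR. The sketch below performs that check, together with the gene-count sum; all numbers are taken from the abstract, while the function and variable names are illustrative.

```python
def quadripartite_length(lsc, ssc, ir, ir_copies=2):
    """Total length (bp) of a plastome with one LSC, one SSC and ir_copies IRs."""
    return lsc + ssc + ir_copies * ir


# Values reported in the abstract (bp).
LSC, SSC, IR = 80_970, 12_876, 21_503
REPORTED_TOTAL = 136_852

# Gene counts reported in the abstract.
PROTEIN_CODING, TRNA, RRNA = 95, 38, 8
REPORTED_GENES = 141

if __name__ == "__main__":
    total = quadripartite_length(LSC, SSC, IR)
    print("computed length:", total, "matches:", total == REPORTED_TOTAL)
    genes = PROTEIN_CODING + TRNA + RRNA
    print("computed genes:", genes, "matches:", genes == REPORTED_GENES)
```

Both sums agree with the reported figures, which also confirms that the 21,503 bp value refers to a single IR copy.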

10.
BACKGROUND: The annotation of genomes from next-generation sequencing platforms needs to be rapid, high-throughput, and fully integrated and automated. Although a few Web-based annotation services have recently become available, they may not be the best solution for researchers who need to annotate a large number of genomes, possibly including proprietary data, and store them locally for further analysis. To address this need, we developed a standalone software application, the Annotation of microbial Genome Sequences (AGeS) system, which incorporates publicly available and in-house-developed bioinformatics tools and databases, many of which are parallelized for high-throughput performance. METHODOLOGY: The AGeS system supports three main capabilities. The first is the storage of input contig sequences and the resulting annotation data in a central, customized database. The second is the annotation of microbial genomes using an integrated software pipeline, which first analyzes contigs from high-throughput sequencing by locating genomic regions that code for proteins, RNA, and other genomic elements through the Do-It-Yourself Annotation (DIYA) framework. The identified protein-coding regions are then functionally annotated using the in-house-developed Pipeline for Protein Annotation (PIPA). The third capability is the visualization of annotated sequences using GBrowse. To date, we have implemented these capabilities for bacterial genomes. AGeS was evaluated by comparing its genome annotations with those provided by three other methods. Our results indicate that the software tools integrated into AGeS provide annotations that are in general agreement with those provided by the compared methods. This is demonstrated by a >94% overlap in the number of identified genes, a significant number of identical annotated features, and a >90% agreement in enzyme function predictions.

11.
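With the rapid development of high-throughput DNA sequencing, the genomes of more and more species have been sequenced. Locating protein-coding genes and determining their structures are the basic tasks of genome annotation, yet earlier annotation methods relied mainly on DNA and RNA sequence information. To interpret sequenced genomes more accurately, multiple types of omics data need to be integrated into genome annotation. In recent years, proteomics based on tandem mass spectrometry has matured and achieved high coverage of the proteome, making it possible to annotate genomes with tandem mass spectrometry data. Such data can, on the one hand, validate the expression of annotated genes and, on the other hand, correct existing gene annotations and reveal new genes, thereby re-annotating the genome sequence. This is the subject of proteogenomics, a field that is currently advancing quickly. Systematic annotation of sequenced genomes with this approach has become an important complement to genome interpretation. This paper reviews the main research topics and methods of proteogenomics and discusses future directions for the field.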

12.
13.
To explore the mitochondrial genes of the Cruciferae family, the mitochondrial genome of Raphanus sativus (sat) was sequenced and annotated. The circular mitochondrial genome of sat is 239,723 bp and includes 33 protein-coding genes, three rRNA genes and 17 tRNA genes. The mitochondrial genome also contains a pair of large repeat sequences 5.9 kb in length, which may mediate genome reorganization into two sub-genomic circles, with predicted sizes of 124.8 kb and 115.0 kb, respectively. Furthermore, gene evolution of mitochondrial genomes within the Cruciferae family was analyzed using sat mitochondrial type (mitotype), together with six other reported mitotypes. The cruciferous mitochondrial genomes have maintained almost the same set of functional genes. Compared with Cycas taitungensis (a representative gymnosperm), the mitochondrial genomes of the Cruciferae have lost nine protein-coding genes and seven mitochondrial-like tRNA genes, but acquired six chloroplast-like tRNAs. Among the Cruciferae, to maintain the same set of genes that are necessary for mitochondrial function, the exons of the genes have changed at the lowest rates, as indicated by the numbers of single nucleotide polymorphisms. The open reading frames (ORFs) of unknown function in the cruciferous genomes are not conserved. Evolutionary events, such as mutations, genome reorganizations and sequence insertions or deletions (indels), have resulted in the non-conserved ORFs in the cruciferous mitochondrial genomes, which are becoming significantly different among mitotypes. This work represents the first phylogenetic explanation of the evolution of genes of known function in the Cruciferae family. It revealed significant variation in ORFs and the causes of such variation.

14.
Recognition of protein-coding genes, a classical bioinformatics problem, is an essential step in annotating newly sequenced genomes. The Z-curve algorithm, one of the most effective methods for this problem, has been successfully applied in annotating or re-annotating many genomes, including those of bacteria, archaea and viruses. Two Z-curve based ab initio gene-finding programs have been developed: ZCURVE (for bacteria and archaea) and ZCURVE_V (for viruses and phages). ZCURVE_C (for 57 bacteria) and Zfisher (for any bacterium) are web servers for re-annotation of bacterial and archaeal genomes. The above four tools can be used for genome annotation or re-annotation, either independently or combined with other gene-finding programs. In addition to recognizing protein-coding genes and exons, Z-curve algorithms are also effective in recognizing promoters and translation start sites. Here, we summarize the applications of Z-curve algorithms in gene finding and genome annotation.
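
The abstract above names the Z-curve but does not define it. As a rough illustration, the sketch below computes the standard cumulative Z-curve coordinates of a DNA sequence, where at each position n the three components contrast purines with pyrimidines (x), amino with keto bases (y), and weakly with strongly hydrogen-bonded bases (z). This is only the underlying coordinate transform, not the ZCURVE gene finder or any of the other tools listed; the function name `z_curve` and the choice to skip ambiguous bases are assumptions made for this example.

```python
def z_curve(seq):
    """Return the cumulative Z-curve points (x_n, y_n, z_n) of a DNA sequence.

    x_n = (A_n + G_n) - (C_n + T_n)   purine vs pyrimidine
    y_n = (A_n + C_n) - (G_n + T_n)   amino vs keto
    z_n = (A_n + T_n) - (G_n + C_n)   weak vs strong hydrogen bonding
    where A_n, C_n, G_n, T_n are base counts in the first n valid bases.
    Characters other than A, C, G, T are skipped (an assumption of this sketch).
    """
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    points = []
    for base in seq.upper():
        if base not in counts:
            continue
        counts[base] += 1
        a, c, g, t = counts["A"], counts["C"], counts["G"], counts["T"]
        points.append(((a + g) - (c + t), (a + c) - (g + t), (a + t) - (g + c)))
    return points


if __name__ == "__main__":
    for point in z_curve("ACGT"):
        print(point)
```

Gene-finding programs built on this idea typically derive features from such coordinates (for example per codon position) and feed them to a classifier; those steps are not shown here.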

15.
16.
17.
Pérez-Brocal V  Latorre A  Gil R  Moya A 《Gene》2005,345(1):73-80
Preliminary analysis of two selected genomic regions of Buchnera aphidicola BCc, the primary endosymbiont of the cedar aphid Cinara cedri, has revealed a number of interesting features when compared with the corresponding homologous regions of the three previously sequenced B. aphidicola genomes, which are associated with different aphid species. Both regions exhibit a significant reduction in length and gene number in B. aphidicola BCc, as could be expected since it possesses the smallest bacterial genome. However, the observed genome reduction is not even across the two regions, as it appears to depend on the nature of their gene content. The region fpr-trxA, which contains mainly metabolic genes, has lost almost half of its genes (45.6%) and its length has been reduced by 52.9%. The reductive process in the region rrl-aroK, which contains mainly ribosomal protein genes, is less dramatic, since it has lost 9.3% of its genes and its length has been reduced by 15.5%. Length reduction is mainly due to the loss of protein-coding genes, not to the shortening of ORFs or intergenic regions. In both regions, G+C content is about 4% lower in BCc than in the other B. aphidicola strains. However, when only conserved genes and intergenic regions of the four B. aphidicola strains are compared, the G+C reduction is higher in the fpr-trxA region.

18.
19.
Most bacterial genomes have a single chromosome. The species Burkholderia cenocepacia, a Gram-negative β-proteobacterium, is one of the exceptions. Genomes of four strains of the species have been sequenced and each has three circular chromosomes. In the genus Burkholderia, there are another seven sequenced strains that have three chromosomes. In this paper, the numbers of essential genes and tRNA genes among the 11 strains of the genus Burkholderia are compared. Interestingly, it is found that the shortest chromosome of B. cenocepacia AU-1054 has many more (over three times as many) essential genes and tRNA genes than the corresponding chromosomes in the other 10 strains. However, no significant difference has been found in the two longer chromosomes among the 11 strains. Non-homologous chromosomal translocation between chromosomes I and III in the species B. cenocepacia is found to be responsible for the unusual distribution of essential genes. The present work may contribute to the understanding of how the secondary chromosomes of multipartite bacterial genomes originate and evolve. The computer program, DEG_match, for comparatively identifying essential genes in any annotated bacterial genome is freely available at http://cobi.uestc.edu.cn/resource/AU1054/.

20.