Similar articles
1.
2.

Background

Genome annotation is one way of summarizing the existing knowledge about genomic characteristics of an organism. There has been an increased interest during the last several decades in computer-based structural and functional genome annotation. Many methods for this purpose have been developed for eukaryotes and prokaryotes. Our study focuses on comparison of functional annotations of prokaryotic genomes. To the best of our knowledge there is no fully automated system for detailed comparison of functional genome annotations generated by different annotation methods (AMs).

Results

The growing number of AMs, and the continuing development of new ones, creates a need to (a) compare different annotations of a single genome and (b) generate an annotation by combining individual ones. To address these needs we developed an Automated Tool for Bacterial GEnome Annotation ComparisON (BEACON) that benefits both AM developers and annotation analysers. BEACON provides a detailed comparison of gene function annotations of prokaryotic genomes obtained by different AMs and generates extended annotations by combining individual ones. To illustrate BEACON’s utility, we compare multiple annotations generated for four genomes and show with these examples that the extended annotation can increase the number of genes annotated with putative functions by up to 27%, while reducing the number of genes without any function assignment.
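The idea of comparing annotations and combining them into an extended one can be illustrated with a minimal Python sketch. This is not BEACON's implementation: the function names, the set of labels treated as "no function", the gene identifiers, and the rule of taking the first informative label are all assumptions made for illustration.

```python
from collections import defaultdict

# Labels treated as "no function assigned" (an assumption for this sketch).
UNINFORMATIVE = {"", "hypothetical protein", "unknown function"}


def has_function(label):
    """Return True when a label names a putative function."""
    return label.strip().lower() not in UNINFORMATIVE


def compare_and_extend(annotations):
    """Combine annotations given as {method_name: {gene_id: function_label}}.

    Returns (extended, conflicts): one label per gene, plus the genes for
    which different methods assigned different informative labels.
    """
    labels_per_gene = defaultdict(dict)
    for method, genes in annotations.items():
        for gene_id, label in genes.items():
            labels_per_gene[gene_id][method] = label

    extended, conflicts = {}, {}
    for gene_id, by_method in labels_per_gene.items():
        informative = {m: l for m, l in by_method.items() if has_function(l)}
        distinct = {l.strip().lower() for l in informative.values()}
        if len(distinct) > 1:
            conflicts[gene_id] = informative
        # Hypothetical rule: keep the first informative label in method order.
        extended[gene_id] = next(iter(informative.values()), "hypothetical protein")
    return extended, conflicts


def count_annotated(genes):
    """Count genes whose label names a putative function."""
    return sum(has_function(label) for label in genes.values())


# Toy input: two hypothetical methods annotating three hypothetical genes.
annotations = {
    "method_a": {"g1": "DNA polymerase III", "g2": "hypothetical protein",
                 "g3": "hypothetical protein"},
    "method_b": {"g1": "DNA polymerase III", "g2": "ABC transporter permease",
                 "g3": "hypothetical protein"},
}
extended, conflicts = compare_and_extend(annotations)
print(count_annotated(annotations["method_a"]), count_annotated(extended))
```

In this toy case the extended annotation gains a function for `g2` from the second method, while `g3` stays unannotated because no method assigned it a function.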

Conclusions

We developed BEACON, a fast tool for automated and systematic comparison of different annotations of single genomes. The extended annotation assigns putative functions to many genes with unknown functions. BEACON is available under the GNU General Public License version 3.0 and is accessible at: http://www.cbrc.kaust.edu.sa/BEACON/.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1826-4) contains supplementary material, which is available to authorized users.

3.
4.
With recent advances in genotyping and sequencing technologies, many disease susceptibility loci have been identified. However, much of the genetic heritability remains unexplained and the replication rate between independent studies is still low. Meanwhile, there have been increasing efforts on functional annotations of the entire human genome, such as the Encyclopedia of DNA Elements (ENCODE) project and other similar projects. It has been shown that incorporating these functional annotations to prioritize genome-wide association signals may help identify true association signals. However, to our knowledge, the extent of the improvement when functional annotation data are considered has not been studied in the literature. In this article, we propose a statistical framework to estimate the improvement in replication rate with annotation data, and apply it to Crohn’s disease and DNase I hypersensitive sites. The results show that with cell line specific functional annotations, the expected replication rate is improved, but only at a modest level.

5.
Next‐generation technologies generate an overwhelming amount of gene sequence data. Efficient annotation tools are required to make these data amenable to functional genomics analyses. The Mercator pipeline automatically assigns functional terms to protein or nucleotide sequences. It uses the MapMan ‘BIN’ ontology, which is tailored for functional annotation of plant ‘omics’ data. The classification procedure performs parallel sequence searches against reference databases, compiles the results and computes the most likely MapMan BINs for each query. In the current version, the pipeline relies on manually curated reference classifications originating from the three reference organisms (Arabidopsis, Chlamydomonas, rice), various other plant species that have a reviewed SwissProt annotation, and more than 2000 protein domain and family profiles at InterPro, CDD and KOG. Functional annotations predicted by Mercator achieve accuracies above 90% when benchmarked against manual annotation. In addition to mapping files for direct use in the visualization software MapMan, Mercator provides graphical overview charts, detailed annotation information in a convenient web browser interface and a MapMan‐to‐GO translation table to export results as GO terms. Mercator is available free of charge via http://mapman.gabipd.org/web/guest/app/Mercator.
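The step of compiling search results and picking the most likely category for a query can be sketched as a score-weighted vote. This is only an illustration and not Mercator's actual scoring rule: the function name, the `min_score` threshold, the tuple layout, and the BIN codes in the example are hypothetical.

```python
from collections import Counter


def most_likely_bin(hits, min_score=50.0):
    """Pick a category from compiled search hits.

    hits: list of (reference_id, bin_code, score) tuples gathered from all
    searches for one query. Scores at or above min_score are summed per BIN
    and the BIN with the highest total is returned, or None if no hit passes.
    """
    totals = Counter()
    for _reference_id, bin_code, score in hits:
        if score >= min_score:
            totals[bin_code] += score
    if not totals:
        return None
    # Iterate in sorted order so that ties resolve deterministically.
    return max(sorted(totals), key=lambda code: totals[code])


# Hypothetical hits for one query sequence.
hits = [
    ("ref1", "1.1", 120.0),
    ("ref2", "1.1", 80.0),
    ("ref3", "29.4", 150.0),
    ("ref4", "35.2", 30.0),  # below the threshold, ignored
]
print(most_likely_bin(hits))
```

Summing scores rather than counting hits lets two moderate hits to the same category outweigh a single stronger hit to another, which is one of several plausible design choices.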

6.
Mycobacterium leprae has undergone extensive degenerative evolution, with a large number of pseudogenes. It is also the organism with the greatest divergence between gene annotations from independent institutes. Therefore, M. leprae is a good model to verify the currently predicted coding sequence regions between different annotations, to identify new ones and to investigate the expression of pseudogenes. We submitted a total extract of the bacteria isolated from armadillo to Gel‐LC‐MS/MS using a linear quadrupole ion trap‐Orbitrap mass spectrometer. Spectra were analyzed using the Leproma (1614 genes and 1133 pseudogenes) and TIGR (5446 genes) databases and a database containing the full genome translation. We identified a total of 1046 proteins, including five proteins encoded by previously predicted pseudogenes, which upon closer inspection appeared to be proper genes. Only 11 of the additional annotations by TIGR were verified. We also identified six tryptic peptides from five proteins from regions not considered to be coding sequences, in addition to peptides from two unannotated gene candidates that overlap with other genes. Our data show that the Leproma annotation of M. leprae is quite accurate, and there were no peptide observations corresponding to true pseudogenes, except for a new gene candidate, overlapping with an essential enolase on the complementary strand.
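A database containing the full genome translation is commonly built by translating the sequence in all six reading frames, so that peptides can match regions outside annotated genes. The Python sketch below shows that idea on a toy sequence. It is not the pipeline used in the study: the function names and the example sequence are assumptions, and a real search matches mass spectra against in silico digested peptides rather than looking up substrings.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; unknown codons become 'X', stops '*'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_translation(genome):
    """Translate a sequence in three forward and three reverse frames."""
    genome = genome.upper()
    frames = {}
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


def find_peptide(peptide, frames):
    """List the frames whose translation contains the peptide."""
    return [name for name, protein in frames.items() if peptide in protein]


# Toy sequence (hypothetical): frame +1 reads ATG GCC AAA TAA.
frames = six_frame_translation("ATGGCCAAATAA")
print(frames["+1"], find_peptide("MAK", frames))
```

A peptide that is found only in a reverse frame, or in a region without an annotated gene, is the kind of observation that points to a missed or mis-annotated coding sequence.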

7.
8.
9.
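Advances in prokaryotic proteogenomics   Total citations: 1, self-citations: 0, citations by others: 1
With the continued development of genome sequencing technology, large numbers of microbial genome sequences can be determined accurately within a short time. To further explore genome structure and function, genome annotation algorithms based on sequence features and homology are widely applied to newly sequenced species. However, owing to limited sequencing quality and the relatively low accuracy of the algorithms themselves, existing genome annotations contain a considerable proportion of pseudogenes and annotation errors, especially errors in the annotation of protein N-termini. To compensate for these shortcomings, transcriptome profiling technologies centered on microarrays or RNA-seq and proteome profiling technologies centered on tandem mass spectrometry can measure the transcription and translation products of genes accurately and at high throughput, thereby providing experimental validation of predicted gene structures. However, the large amount of non-coding RNA in prokaryotic cells introduces contaminating data into transcriptome sequencing and limits its application to genome annotation. In contrast, proteomics based on tandem mass spectrometry can identify a large number of proteins in an organism within a short time, enabling the validation and even the correction of annotated genes. It has therefore become an important source of evidence for genome annotation and re-annotation, and has given rise to the new research field of "proteogenomics". This review first introduces traditional genome annotation algorithms based on sequence prediction and homology alignment and points out their shortcomings. On this basis, considering the technical characteristics of transcriptomics and proteomics, it analyzes the advantages of proteomics for prokaryotic genome annotation and summarizes current progress in large-scale proteogenomic studies. Finally, from an informatics perspective, it identifies the problems in using proteomic data for genome re-annotation together with corresponding solutions, and discusses future directions for proteogenomics.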

10.
11.
12.
The sequence and genome annotations of Drosophila melanogaster were initially published in late 1999 and early 2000. Since then, the Berkeley Drosophila Genome Project (BDGP) and FlyBase have improved the quality of the sequence and reviewed the annotations by hand, respectively, to produce an account of the fruit fly genome that is of the highest quality. This review discusses the main features of this process, both from the point of view of the biology revealed in the end result and in the development of software that has been central to this genome sequencing and annotation project.

13.
Neomegalonema perideroedes (formerly Meganema perideroedes) str. G1 is the type strain and only described isolate of the genus Neomegalonema (formerly Meganema) which belongs to the Alphaproteobacteria. N. perideroedes is distinguished by the ability to accumulate high amounts of polyhydroxyalkanoates and has been associated with bulking problems in wastewater treatment plants due to its filamentous morphology. In 2013, its genome was sequenced as part of the Genomic Encyclopedia of Bacteria and Archaea (GEBA), which aims to improve the sequencing coverage of the poorly represented regions of the bacterial and archaeal branches of the tree of life. As N. perideroedes str. G1 is relatively distantly related to well described species—being the only sequenced member of its proposed family—the in silico prediction of genes by nucleotide homology to reference genes might be less reliable. Here, a proteomic dataset for the refinement of the N. perideroedes genome annotations is generated which clearly indicates the shortcomings of high‐throughput in silico genome annotation.

14.
Vitis vinifera has been an emblematic plant for humans since the Neolithic period. Human civilization has been shaped by its domestication as both its medicinal and nutritional values were exploited. It is now cultivated on all habitable continents, and more than 5000 varieties have been developed. A global passion for the art of wine fuels innovation and a profound desire for knowledge on this plant. The genome sequence of a homozygotic cultivar and several RNA‐seq datasets on other varieties have been released paving the way to gaining further insight into its biology and tailoring improvements to varieties. However, its genome annotation remains unpolished. In this issue of Proteomics, Chapman and Bellgard (Proteomics 2017, 17, 1700197) discuss how proteogenomics can help improve genome annotation. By mining shotgun proteomics data, they defined new protein‐coding genes, refined gene structures, and corrected numerous mRNA splicing events. This stimulating study shows how large international consortia could work together to improve plant and animal genome annotation on a large scale. To achieve this aim, time should be invested to generate comprehensive, high‐quality experimental datasets for a wide range of well‐defined lineages and exploit them with pipelines capable of handling giant datasets.

15.
The equine genome sequence enables the use of high-throughput genomic technologies in equine research, but accurate identification of expressed gene products and interpreting their biological relevance require additional structural and functional genome annotation. Here, we employ the equine genome sequence to identify predicted and known proteins using proteomics and model these proteins into biological pathways, identifying 582 proteins in normal cell-free equine bronchoalveolar lavage fluid (BALF). We improved structural and functional annotation by directly confirming the in vivo expression of 558 (96%) proteins, which were computationally predicted previously, and adding Gene Ontology (GO) annotations for 174 proteins, 108 of which lacked functional annotation. Bronchoalveolar lavage is commonly used to investigate equine respiratory disease, leading us to model the associated proteome and its biological functions. Modelling of protein functions using Ingenuity Pathway Analysis identified carbohydrate metabolism, cell-to-cell signalling, cellular function, inflammatory response, organ morphology, lipid metabolism and cellular movement as key biological processes in normal equine BALF. Comparative modelling of protein functions in normal cell-free bronchoalveolar lavage proteomes from horse, human, and mouse, performed by grouping GO terms sharing common ancestor terms, confirms conservation of functions across species. Ninety-one of 92 human GO categories and 105 of 109 mouse GO categories were conserved in the horse. Our approach confirms the utility of the equine genome sequence to characterize protein networks without antibodies or mRNA quantification, highlights the need for continued structural and functional annotation of the equine genome and provides a framework for equine researchers to aid in the annotation effort.

16.
With the availability of a new highly contiguous Bos taurus reference genome assembly (ARS-UCD1.2), it is the opportune time to upgrade the bovine gene set by seeking input from researchers. Furthermore, advances in graphical genome annotation tools now make it possible for researchers to leverage sequence data generated with the latest technologies to collaboratively curate genes. For many years the Bovine Genome Database (BGD) has provided tools such as the Apollo genome annotation editor to support manual bovine gene curation. The goal of this paper is to explain the reasoning behind the decisions made in the manual gene curation process while providing examples using the existing BGD tools. We will describe the sources of gene annotation evidence provided at the BGD, including RNA-seq and Iso-Seq data. We will also explain how to interpret various data visualizations when curating gene models, and will demonstrate the value of manual gene annotation. The process described here can be applied to manual gene curation for other species with similar tools. With a better understanding of manual gene annotation, researchers will be encouraged to edit gene models and contribute to the enhancement of livestock gene sets.

17.
Plants produce a myriad of specialized metabolites to overcome their sessile habit and combat biotic as well as abiotic stresses. Evolution has shaped the diversity of specialized metabolites, which then drives many other aspects of plant biodiversity. However, until recently, large‐scale studies investigating the diversity of specialized metabolites in an evolutionary context have been limited by the impossibility of identifying chemical structures of hundreds to thousands of compounds in a time‐feasible manner. Here we introduce a workflow for large‐scale, semi‐automated annotation of specialized metabolites and apply it to over 1000 metabolites of the cosmopolitan plant family Rhamnaceae. We enhance the putative annotation coverage dramatically, from 2.5% based on spectral library matches alone to 42.6% of total MS/MS molecular features, extending annotations from well‐known plant compound classes into dark plant metabolomics. To gain insights into substructural diversity within this plant family, we also extract patterns of co‐occurring fragments and neutral losses, so‐called Mass2Motifs, from the dataset; for example, only the Ziziphoid clade developed the triterpenoid biosynthetic pathway, whereas the Rhamnoid clade predominantly developed diversity in flavonoid glycosides, including 7‐O‐methyltransferase activity. Our workflow provides the foundations for the automated, high‐throughput chemical identification of massive metabolite spaces, and we expect it to revolutionize our understanding of plant chemoevolutionary mechanisms.

18.
19.
20.
The Genome Warehouse (GWH) is a public repository housing genome assembly data for a wide range of species and delivering a series of web services for genome data submission, storage, release, and sharing. As one of the core resources in the National Genomics Data Center (NGDC), part of the China National Center for Bioinformation (CNCB; https://ngdc.cncb.ac.cn), GWH accepts both full and partial (chloroplast, mitochondrion, and plasmid) genome sequences with different assembly levels, as well as updates of existing genome assemblies. For each assembly, GWH collects detailed genome-related metadata on the biological project, biological sample, and genome assembly, in addition to the genome sequence and annotation. To archive high-quality genome sequences and annotations, GWH is equipped with a uniform and standardized procedure for quality control. Besides basic browse and search functionalities, all released genome sequences and annotations can be visualized with JBrowse. As of May 21, 2021, GWH had received 19,124 direct submissions covering 1108 species and had released 8772 of them. Collectively, GWH serves as an important resource for genome-scale data management and provides free and publicly accessible data to support research activities throughout the world. GWH is publicly accessible at https://ngdc.cncb.ac.cn/gwh.
