Found 20 similar articles; search took 15 ms.
1.
2.
Rajkumar Sasidharan Tamás Nepusz David Swarbreck Eva Huala Alberto Paccanaro 《Nucleic acids research》2012,40(19):e152
We have developed GFam, a platform for automatic annotation of gene/protein families. GFam provides a framework for genome initiatives and model organism resources to build domain-based families and derive meaningful functional labels, and offers a seamless approach to propagating functional annotation across periodic genome updates. GFam is a hybrid approach that uses a greedy algorithm to chain component domains from InterPro annotation provided by its 12 member resources, followed by a sequence-based connected component analysis of un-annotated sequence regions, to derive a consensus domain architecture for each sequence and subsequently generate families based on common architectures. Our integrated approach increases sequence coverage by 7.2 percentage points and residue coverage by 14.6 percentage points relative to the best single-constituent database within InterPro for the proteome of Arabidopsis. The true power of GFam lies in maximizing annotation provided by the different InterPro data sources that offer resource-specific coverage for different regions of a sequence. GFam’s capability to capture higher sequence and residue coverage can be useful for genome annotation, comparative genomics and functional studies. GFam is general-purpose software and can be used for any collection of protein sequences. The software is open source and can be obtained from http://www.paccanarolab.org/software/gfam/.
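The greedy chaining step described in this abstract can be pictured as picking a set of non-overlapping domain hits along one sequence. The Python sketch below is only an illustration under stated assumptions: the abstract does not say how hits are ranked, so preferring longer hits is an assumed rule, and the names `greedy_architecture` and `residue_coverage` and the placeholder labels are hypothetical, not part of GFam.

```python
from typing import List, Tuple

# (start, end, label): 1-based inclusive coordinates on one protein sequence.
Hit = Tuple[int, int, str]


def greedy_architecture(hits: List[Hit]) -> List[Hit]:
    """Greedily keep non-overlapping hits, longest first (assumed ranking)."""
    chosen: List[Hit] = []
    # Longest hits are considered first; ties go to the leftmost hit.
    for hit in sorted(hits, key=lambda h: (-(h[1] - h[0] + 1), h[0])):
        start, end, _ = hit
        # Accept the hit only if it lies entirely left or right of every kept hit.
        if all(end < c_start or start > c_end for c_start, c_end, _ in chosen):
            chosen.append(hit)
    # Report the architecture in sequence order.
    return sorted(chosen)


def residue_coverage(chosen: List[Hit], seq_len: int) -> float:
    """Fraction of residues covered by the chosen, non-overlapping hits."""
    covered = sum(end - start + 1 for start, end, _ in chosen)
    return covered / seq_len


# Toy hits with placeholder labels; "B" overlaps "A" and is dropped.
hits = [(1, 100, "A"), (90, 150, "B"), (160, 200, "C")]
architecture = greedy_architecture(hits)
coverage = residue_coverage(architecture, seq_len=200)
```

In this toy case the architecture keeps hits "A" and "C", which together cover 141 of 200 residues. Sequences sharing the same ordered list of labels would then fall into one family, which is the grouping idea the abstract describes.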
3.
Meyer F Goesmann A McHardy AC Bartels D Bekel T Clausen J Kalinowski J Linke B Rupp O Giegerich R Pühler A 《Nucleic acids research》2003,31(8):2187-2195
The flood of sequence data resulting from the large number of current genome projects has increased the need for a flexible, open source genome annotation system, which so far has not existed. To account for the individual needs of different projects, such a system should be modular and easily extensible. We present a genome annotation system for prokaryote genomes, which is well tested and readily adaptable to different tasks. The modular system was developed using an object-oriented approach, and it relies on a relational database backend. Using a well-defined application programming interface (API), the system can be linked easily to other systems. GenDB supports manual as well as automatic annotation strategies. The software is currently in use in more than a dozen microbial genome annotation projects. In addition to its use as a production genome annotation system, it can be employed as a flexible framework for the large-scale evaluation of different annotation strategies. The system is open source.
4.
metaSHARK: software for automated metabolic network prediction from DNA sequence and its application to the genomes of Plasmodium falciparum and Eimeria tenella (cited 1 time in total: 0 self-citations, 1 by others)
The metabolic SearcH And Reconstruction Kit (metaSHARK) is a new fully automated software package for the detection of enzyme-encoding genes within unannotated genome data and their visualization in the context of the surrounding metabolic network. The gene detection package (SHARKhunt) runs on a Linux system and requires only a set of raw DNA sequences (genomic, expressed sequence tag and/or genome survey sequence) as input. Its output may be uploaded to our web-based visualization tool (SHARKview) for exploring and comparing data from different organisms. We first demonstrate the utility of the software by comparing its results for the raw Plasmodium falciparum genome with the manual annotations available at the PlasmoDB and PlasmoCyc websites. We then apply SHARKhunt to the unannotated genome sequences of the coccidian parasite Eimeria tenella and observe that, at an E-value cut-off of 10^-20, our software makes 142 additional assertions of enzymatic function compared with a recent annotation package working with translated open reading frame sequences. The ability of the software to cope with low levels of sequence coverage is investigated by analyzing assemblies of the E. tenella genome at estimated coverages from 0.5× to 7.5×. Lastly, as an example of how metaSHARK can be used to evaluate the genomic evidence for specific metabolic pathways, we present a study of coenzyme A biosynthesis in P. falciparum and E. tenella.
5.
《Bioscience, biotechnology, and biochemistry》2013,77(3):670-672
We developed a semi-automated genome analysis system called GAMBLER to support the current whole-genome sequencing project focusing on alkaliphilic Bacillus halodurans C-125. GAMBLER was designed to reduce the human intervention required and to reduce the complications in annotating thousands of ORFs in the microbial genome. GAMBLER automates three major routines: analyzing assembly results provided by genome assembler software, assigning ORFs, and homology searching. GAMBLER is equipped with an interface for convenience of annotation. All processes and options can be controlled through a WWW browser, which lets scientists share their genome analysis results regardless of computer platform.
6.
7.
8.
Lewis SE Searle SM Harris N Gibson M Lyer V Richter J Wiel C Bayraktaroglir L Birney E Crosby MA Kaminker JS Matthews BB Prochnik SE Smithy CD Tupy JL Rubin GM Misra S Mungall CJ Clamp ME 《Genome biology》2002,3(12):research0082.1-8214
The well-established inaccuracy of purely computational methods for annotating genome sequences necessitates an interactive tool to allow biological experts to refine these approximations by viewing and independently evaluating the data supporting each annotation. Apollo was developed to meet this need, enabling curators to inspect genome annotations closely and edit them. FlyBase biologists successfully used Apollo to annotate the Drosophila melanogaster genome and it is increasingly being used as a starting point for the development of customized annotation editing tools for other genome projects.
9.
Gene and SNP annotation are among the first and most important steps in analyzing a genome. As the number of sequenced genomes continues to grow, a key question is: how does the quality of the assembled sequence affect the annotations? We compared the gene and SNP annotations for two different Bos taurus genome assemblies built from the same data but with significant improvements in the later assembly. The same annotation software was used for annotating both sequences. While some annotation differences are expected even between high-quality assemblies such as these, we found that a staggering 40% of the genes (>9,500) varied significantly between assemblies, due in part to the availability of new gene evidence but primarily to genome mis-assembly events and local sequence variations. For instance, although the later assembly is generally superior, 660 protein coding genes in the earlier assembly are entirely missing from the later genome's annotation, and approximately 3,600 (15%) of the genes have complex structural differences between the two assemblies. In addition, 12–20% of the predicted proteins in both assemblies have relatively large sequence differences when compared to their RefSeq models, and 6–15% of bovine dbSNP records are unrecoverable in the two assemblies. Our findings highlight the consequences of genome assembly quality on gene and SNP annotation and argue for continued improvements in any draft genome sequence. We also found that tracking a gene between different assemblies of the same genome is surprisingly difficult, due to the numerous changes, both small and large, that occur in some genes. As a side benefit, our analyses helped us identify many specific loci for improvement in the Bos taurus genome assembly.
10.
11.
12.
Brian J Haas Jennifer R Wortman Catherine M Ronning Linda I Hannick Roger K Smith Jr Rama Maiti Agnes P Chan Chunhui Yu Maryam Farzad Dongying Wu Owen White Christopher D Town 《BMC biology》2005,3(1):1-19
Background
Since the initial publication of its complete genome sequence, Arabidopsis thaliana has become more important than ever as a model for plant research. However, the initial genome annotation was submitted by multiple centers using inconsistent methods, making the data difficult to use for many applications.
Results
Over the course of three years, TIGR has completed its effort to standardize the structural and functional annotation of the Arabidopsis genome. Using both manual and automated methods, Arabidopsis gene structures were refined and gene products were renamed and assigned to Gene Ontology categories. We present an overview of the methods employed, tools developed, and protocols followed, summarizing the contents of each data release with special emphasis on our final annotation release (version 5).
Conclusion
Over the entire period, several thousand new genes and pseudogenes were added to the annotation. Approximately one third of the originally annotated gene models were significantly refined yielding improved gene structure annotations, and every protein-coding gene was manually inspected and classified using Gene Ontology terms.
13.
Protein interaction mapping on a functional shotgun sequence of Rickettsia sibirica (cited 4 times in total: 1 self-citation, 3 by others)
Malek JA Wierzbowski JM Tao W Bosak SA Saranga DJ Doucette-Stamm L Smith DR McEwan PJ McKernan KJ 《Nucleic acids research》2004,32(3):1059-1064
Protein interaction maps can reveal novel pathways and functional complexes, allowing ‘guilt by association’ annotation of uncharacterized proteins. To address the need for large-scale protein interaction analyses, a bacterial two-hybrid system was coupled with a whole genome shotgun sequencing approach for microbial genome analysis. We report the first large-scale proteomics study using this system, integrating de novo genome sequencing with functional interaction mapping and annotation in a high-throughput format. We apply the approach by shotgun sequencing and annotating the genome of Rickettsia sibirica strain 246, an obligate intracellular human pathogen among the Spotted Fever Group rickettsiae. The bacteria invade endothelial cells and cause lysis after large amounts of progeny have accumulated. Little is known about specific Rickettsial virulence factors and their mode of pathogenicity. Analysis of the combined genomic sequence and protein–protein interaction data for a set of virulence related Type IV secretion system (T4SS) proteins revealed over 250 interactions and will provide insight into the mechanism of Rickettsial pathogenicity.
14.
In the post-genome era, insufficient functional annotation of predicted genes greatly restricts the potential of mining genome data. We demonstrate that an evolutionary approach, which is independent of functional annotation, has great potential as a tool for genome analysis. We chose the genome of the model filamentous fungus Neurospora crassa as an example. Phylogenetic distribution of each predicted protein coding gene (PCG) in the N. crassa genome was used to classify genes into six mutually exclusive lineage specificity (LS) groups, i.e. Eukaryote/Prokaryote-core, Dikarya-core, Ascomycota-core, Pezizomycotina-specific, N. crassa-orphans and Others. Functional category analysis revealed that only ∼23% of PCGs in the two most highly lineage-specific groups, Pezizomycotina-specific and N. crassa-orphans, have functional annotation. In contrast, ∼76% of PCGs in the remaining four LS groups have functional annotation. Analysis of chromosomal localization of N. crassa-orphan PCGs and genes encoding secreted proteins showed enrichment in subtelomeric regions. The origin of N. crassa-orphans is not known. We found that 11% of N. crassa-orphans have paralogous N. crassa-orphan genes. Of the paralogous N. crassa-orphan gene pairs, 33% were tandemly located in the genome, implying a duplication origin of N. crassa-orphan PCGs in the past. LS grouping is thus a useful tool to explore and understand genome organization, evolution and gene function in fungi.
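The tandem-location statistic reported in this abstract is the share of paralogous gene pairs whose members sit next to each other on a chromosome. A minimal Python sketch of that calculation follows. The abstract does not define "tandemly located", so treating a pair as tandem when both genes are on the same chromosome and at most `max_gap` positions apart in gene order is an assumption, and the gene identifiers and the function name `tandem_fraction` are hypothetical.

```python
from typing import Dict, List, Tuple

# gene -> (chromosome, rank of the gene along that chromosome); toy layout.
Position = Tuple[str, int]


def tandem_fraction(
    pairs: List[Tuple[str, str]],
    positions: Dict[str, Position],
    max_gap: int = 1,
) -> float:
    """Share of gene pairs that are neighbours on the same chromosome."""
    if not pairs:
        return 0.0
    tandem = 0
    for gene_a, gene_b in pairs:
        chrom_a, rank_a = positions[gene_a]
        chrom_b, rank_b = positions[gene_b]
        # Assumed criterion: same chromosome and adjacent in gene order.
        if chrom_a == chrom_b and abs(rank_a - rank_b) <= max_gap:
            tandem += 1
    return tandem / len(pairs)


positions = {
    "g1": ("I", 10), "g2": ("I", 11),    # adjacent: counted as tandem
    "g3": ("II", 4), "g4": ("I", 40),    # different chromosomes
    "g5": ("III", 7), "g6": ("III", 9),  # same chromosome, one gene apart
}
pairs = [("g1", "g2"), ("g3", "g4"), ("g5", "g6")]
fraction = tandem_fraction(pairs, positions)
```

With these toy positions only one of the three pairs is tandem. Raising `max_gap` to 2 would also count the third pair, which shows how sensitive such a figure is to the adjacency threshold chosen.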
15.
Background
The current progress in sequencing projects calls for rapid, reliable and accurate function assignments of gene products. A variety of methods has been designed to annotate sequences on a large scale. However, these methods can either only be applied for specific subsets, or their results are not formalised, or they do not provide precise confidence estimates for their predictions.
Results
We have developed a large-scale annotation system that tackles all of these shortcomings. In our approach, annotation was provided through Gene Ontology terms by applying multiple Support Vector Machines (SVM) for the classification of correct and false predictions. The general performance of the system was benchmarked with a large dataset. An organism-wise cross-validation was performed to define confidence estimates, resulting in an average precision of 80% for 74% of all test sequences. The validation results show that the prediction performance was organism-independent and could reproduce the annotation of other automated systems as well as high-quality manual annotations. We applied our trained classification system to Xenopus laevis sequences, yielding functional annotation for more than half of the known expressed genome. Compared to the currently available annotation, we provided more than twice the number of contigs with good quality annotation, and additionally we assigned a confidence value to each predicted GO term.
Conclusions
We present a complete automated annotation system that overcomes many of the usual problems by applying a controlled vocabulary of Gene Ontology and an established classification method on large and well-described sequence data sets. In a case study, the function for Xenopus laevis contig sequences was predicted and the results are publicly available at ftp://genome.dkfz-heidelberg.de/pub/agd/gene_association.agd_Xenopus.
16.
Antony Kaspi Mark Ziemann Samuel T Keating Ishant Khurana Timothy Connor Briana Spolding Adrian Cooper Ross Lazarus Ken Walder Paul Zimmet Assam El-Osta 《Epigenetics》2014,9(10):1329-1338
Current computational methods used to analyze changes in DNA methylation and chromatin modification rely on sequenced genomes. Here we describe a pipeline for the detection of these changes from short-read sequence data that does not require a reference genome. Open source software packages were used for sequence assembly, alignment, and measurement of differential enrichment. The method was evaluated by comparing results with reference-based results showing a strong correlation between chromatin modification and gene expression. We then used our de novo sequence assembly to build the DNA methylation profile for the non-referenced Psammomys obesus genome. The pipeline described uses open source software for fast annotation and visualization of unreferenced genomic regions from short-read data.
17.
Sepsid flies (Diptera: Sepsidae) are important model insects for sexual selection research. In order to develop mitochondrial (mt) genome data for this significant group, we sequenced the first complete mt genome of the sepsid fly Nemopoda mamaevi Ozerov, 1997. The circular 15,878 bp mt genome is typical of Diptera, containing all 37 genes usually present in bilaterian animals. We discovered inaccurate annotations of fly mt genomes previously deposited in GenBank and thus re-annotated all published mt genomes of Cyclorrhapha. These re-annotations were based on comparative analysis of homologous genes, and provide a statistical analysis of start and stop codon positions. We further detected two conserved 18 bp intergenic sequences (tRNAGlu-tRNAPhe and ND1-tRNASer(UCN)) across Cyclorrhapha, which are the mtTERM binding site motifs. Additionally, we compared the automated annotation software MITOS with manual annotation. Phylogenetic trees based on the mt genome data from Cyclorrhapha were inferred by Maximum-likelihood and Bayesian methods and strongly supported a close relationship between Sepsidae and the Tephritoidea.
18.
Seán S. Ó hÉigeartaigh David Armisén Kevin P. Byrne Kenneth H. Wolfe 《Journal of bacteriology》2014,196(11):2030-2042
We report the development of SearchDOGS Bacteria, software to automatically detect missing genes in annotated bacterial genomes by combining BLAST searches with comparative genomics. Having successfully applied the approach to yeast genomes, we redeveloped SearchDOGS to function as a standalone, downloadable package, requiring only a set of GenBank annotation files as input. The software automatically generates a homology structure using reciprocal BLAST and a synteny-based method; this is followed by a scan of the entire genome of each species for unannotated genes. Results are provided in an HTML interface, providing coordinates, BLAST results, syntenic location, omega values (Ka/Ks, where Ks is the number of synonymous substitutions per synonymous site and Ka is the number of nonsynonymous substitutions per nonsynonymous site) for protein conservation estimates, and other information for each candidate gene. Using SearchDOGS Bacteria, we identified 155 gene candidates in the Shigella boydii sb227 genome, including 56 candidates of length < 60 codons. SearchDOGS Bacteria has two major advantages over currently available annotation software. First, it outperforms current methods in terms of sensitivity and is highly effective at identifying small or highly diverged genes. Second, as a freely downloadable package, it can be used with unpublished or confidential data.
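Two ideas named in this abstract, reciprocal BLAST and the omega ratio, are simple to illustrate. The Python sketch below is not SearchDOGS code. It assumes similarity scores have already been parsed into nested dictionaries, takes "reciprocal" to mean mutual best hits (one common reading, not confirmed by the abstract), and uses hypothetical gene identifiers and function names.

```python
from typing import Dict, List, Optional, Tuple

# query gene -> {subject gene: similarity score}; assumed pre-parsed input.
Scores = Dict[str, Dict[str, float]]


def reciprocal_best_hits(a_to_b: Scores, b_to_a: Scores) -> List[Tuple[str, str]]:
    """Pairs (a, b) where b is a's top-scoring hit and a is b's top-scoring hit."""
    best_ab = {a: max(hits, key=hits.get) for a, hits in a_to_b.items() if hits}
    best_ba = {b: max(hits, key=hits.get) for b, hits in b_to_a.items() if hits}
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


def omega(ka: float, ks: float) -> Optional[float]:
    """Ka/Ks as defined in the abstract; None when Ks is zero (ratio undefined)."""
    return ka / ks if ks > 0 else None


# Toy scores: a2's best hit is b1, but b1 prefers a1, so only (a1, b1) pairs up.
a_to_b = {"a1": {"b1": 200.0, "b2": 50.0}, "a2": {"b1": 90.0, "b2": 80.0}}
b_to_a = {"b1": {"a1": 210.0, "a2": 95.0}, "b2": {"a2": 85.0}}
pairs = reciprocal_best_hits(a_to_b, b_to_a)
ratio = omega(ka=0.05, ks=0.5)
```

An omega value well below 1, as in this toy case, is conventionally read as purifying selection on the protein sequence, which is why the ratio is useful as a conservation estimate for a candidate gene.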
19.
Peter Bakke Nick Carney Will DeLoache Mary Gearing Kjeld Ingvorsen Matt Lotz Jay McNair Pallavi Penumetcha Samantha Simpson Laura Voss Max Win Laurie J. Heyer A. Malcolm Campbell 《PloS one》2009,4(7)
Genome annotations are accumulating rapidly and depend heavily on automated annotation systems. Many genome centers offer annotation systems but no one has compared their output in a systematic way to determine accuracy and inherent errors. Errors in the annotations are routinely deposited in databases such as NCBI and used to validate subsequent annotation errors. We submitted the genome sequence of halophilic archaeon Halorhabdus utahensis to be analyzed by three genome annotation services. We have examined the output from each service in a variety of ways in order to compare the methodology and effectiveness of the annotations, as well as to explore the genes, pathways, and physiology of the previously unannotated genome. The annotation services differ considerably in gene calls, features, and ease of use. We had to manually identify the origin of replication and the species-specific consensus ribosome-binding site. Additionally, we conducted laboratory experiments to test H. utahensis growth and enzyme activity. Current annotation practices need to improve in order to more accurately reflect a genome's biological potential. We make specific recommendations that could improve the quality of microbial annotation projects.
20.