Similar articles
1.
REGANOR     
With >1,000 prokaryotic genome sequencing projects ongoing or already finished, comprehensive comparative analysis of the gene content of these genomes has become viable. To allow for a meaningful comparative analysis, gene prediction of the various genomes should be as accurate as possible. It is clear that improving the state of genome annotation requires automated gene identification methods to cope with the influence of artifacts, such as genomic GC content. There is currently still room for improvement in the state of annotations. We present a web server and a database of high-quality gene predictions. The web server is a resource for gene identification in prokaryote genome sequences. It implements our previously described, accurate gene finding method REGANOR. We also provide novel gene predictions for 241 complete, or almost complete, prokaryotic genomes. We demonstrate with several examples how this resource can be used to identify promising candidates for genes currently missing from genome annotations. All data sets are available online. AVAILABILITY: The gene finding server is accessible via https://www.cebitec.uni-bielefeld.de/groups/brf/software/reganor/cgi-bin/reganor_upload.cgi. The server software is available with the GenDB genome annotation system (version 2.2.1 onwards) under the GNU general public license. The software can be downloaded from https://sourceforge.net/projects/gendb/. More information on installing GenDB and REGANOR and the system requirements can be found on the GenDB project page http://www.cebitec.uni-bielefeld.de/groups/brf/software/wiki/GenDBWiki/AdministratorDocumentation/GenDBInstallation

2.
《Genomics》2020,112(1):286-288
Synteny and collinearity analysis is a standard investigative strategy in many comparative genomic studies to understand genomic conservation and evolution. Currently, most visualization toolkits for synteny and collinearity do not emphasize the graphical representation of the results; in particular, they lack extensible vector graphics output formats. This limitation becomes more apparent as 3rd generation sequencing brings high-throughput data, requiring relatively higher resolution for the resulting images. We developed VGSC2, the 2nd version of the web-based vector graph toolkit for genome synteny and collinearity analysis. The updated version enables four types of plots for synteny and collinearity, and three types of plots for gene family evolutionary research. Using web-based technologies, VGSC2 provides an easy-to-use user interface to render homologous genomic results as vector graphics such as SVG, EPS, and PDF, as well as an online editor. VGSC2 is open source and freely available for use online through the web server available at http://bio.njfu.edu.cn/vgsc2.

3.

Background  

The SEED integrates many publicly available genome sequences into a single resource. The database contains accurate and up-to-date annotations based on the subsystems concept that leverages clustering between genomes and other clues to accurately and efficiently annotate microbial genomes. The backend is used as the foundation for many genome annotation tools, such as the Rapid Annotation using Subsystems Technology (RAST) server for whole genome annotation, the metagenomics RAST server for random community genome annotations, and the annotation clearinghouse for exchanging annotations from different resources. In addition to a web user interface, the SEED also provides a Web services-based API for programmatic access to the data in the SEED, allowing the development of third-party tools and mash-ups.

4.
SLAM is a program that simultaneously aligns and annotates pairs of homologous sequences. The SLAM web server integrates SLAM with repeat masking tools and the AVID alignment program to allow for rapid alignment and gene prediction in user submitted sequences. Along with annotations and alignments for the submitted sequences, users obtain a list of predicted conserved non-coding sequences (and their associated alignments). The web site also links to whole genome annotations of the human, mouse and rat genomes produced with the SLAM program. The server can be accessed at http://bio.math.berkeley.edu/slam.

5.
Gilligan P  Brenner S  Venkatesh B 《Gene》2002,294(1-2):35-44
The compact genome of the pufferfish, Fugu rubripes, has been proposed as a 'reference' genome to aid in annotating and analysing the human genome. We have annotated and compared 85 kb of Fugu sequence containing 17 genes with its homologous loci in the human draft genome and identified three 'novel' human genes that were missed or incompletely predicted by the previous gene prediction methods. Two of the novel genes contain zinc finger domains and are designated ZNF366 and ZNF367. They map to human chromosomes 5q13.2 and 9q22.32, respectively. The third novel gene, designated C9orf21, maps to chromosome 9q22.32. This gene is unique to vertebrates, and the protein encoded by it does not contain any known domains. We could not find human homologs for two Fugu genes, a novel chemokine gene and a kinase gene. These genes are either specific to teleosts or lost in the human lineage. The Fugu-human comparison identified several conserved non-coding sequences in the promoter and intronic regions. These sequences, conserved during 450 million years of vertebrate evolution, are likely to be involved in gene regulation. The 85 kb Fugu locus is dispersed over four human loci, occupying about 1.5 Mb. Contiguity is conserved in the human genome between six out of 16 Fugu gene pairs. These contiguous chromosomal segments should share a common evolutionary history dating back to the common ancestor of mammals and teleosts. We propose contiguity as strong evidence to identify orthologous genes in distant organisms. This study confirms the utility of the Fugu as a supplementary tool to uncover and confirm novel genes and putative gene regulatory regions in the human genome.
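
The contiguity count mentioned in this abstract (17 genes in a row give 16 adjacent gene pairs, some of which remain neighbours in the second genome) can be illustrated with a minimal Python sketch. This is not the authors' method: the function name `conserved_adjacent_pairs`, the representation of the second genome as a gene-to-(locus, rank) mapping, the definition of contiguity as "immediate neighbours on the same locus", and the toy gene names are all assumptions made for illustration.

```python
def conserved_adjacent_pairs(reference_order, other_positions):
    """Return adjacent gene pairs from reference_order that are also
    immediate neighbours on the same locus in the other genome.

    other_positions maps gene -> (locus, rank); genes missing from the
    mapping are treated as having no known homolog (an assumption).
    """
    conserved = []
    # Walk over consecutive pairs in the reference gene order.
    for a, b in zip(reference_order, reference_order[1:]):
        pos_a = other_positions.get(a)
        pos_b = other_positions.get(b)
        if pos_a is None or pos_b is None:
            continue  # no homolog found for one gene of the pair
        same_locus = pos_a[0] == pos_b[0]
        neighbours = abs(pos_a[1] - pos_b[1]) == 1
        if same_locus and neighbours:
            conserved.append((a, b))
    return conserved


# Toy data only; these are not the genes or loci from the study.
fugu = ["g1", "g2", "g3", "g4", "g5"]
human = {
    "g1": ("locusA", 0),
    "g2": ("locusA", 1),
    "g3": ("locusB", 0),
    "g5": ("locusB", 1),
}
print(conserved_adjacent_pairs(fugu, human))  # [('g1', 'g2')]
```

In the toy data only the g1/g2 pair counts as contiguous: g2 and g3 fall on different loci, and g4 has no homolog, so the pairs involving it are skipped.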

6.
During microbial evolution, genome rearrangement increases with increasing sequence divergence. If the relationship between synteny and sequence divergence can be modeled, gene clusters in genomes of distantly related organisms exhibiting anomalous synteny can be identified and used to infer functional conservation. We applied the phylogenetic pairwise comparison method to establish and model a strong correlation between synteny and sequence divergence in all 634 available Archaeal and Bacterial genomes from the NCBI database and four newly assembled genomes of uncultivated Archaea from an acid mine drainage (AMD) community. In parallel, we established and modeled the trend between synteny and functional relatedness in the 118 genomes available in the STRING database. By combining these models, we developed a gene functional annotation method that weights evolutionary distance to estimate the probability of functional associations of syntenous proteins between genome pairs. The method was applied to the hypothetical proteins and poorly annotated genes in newly assembled acid mine drainage Archaeal genomes to add or improve gene annotations. This is the first method to assign possible functions to poorly annotated genes through quantification of the probability of gene functional relationships based on synteny at a significant evolutionary distance, and has the potential for broad application.

7.
We herein present and discuss the services and content which are available on the web server of IBM's Bioinformatics and Pattern Discovery group. The server is operational around the clock and provides access to a variety of methods that have been published by the group's members and collaborators. The available tools correspond to applications ranging from the discovery of patterns in streams of events and the computation of multiple sequence alignments, to the discovery of genes in nucleic acid sequences and the interactive annotation of amino acid sequences. Additionally, annotations for more than 70 archaeal, bacterial, eukaryotic and viral genomes are available on-line and can be searched interactively. The tools and code bundles can be accessed beginning at http://cbcsrv.watson.ibm.com/Tspd.html whereas the genomics annotations are available at http://cbcsrv.watson.ibm.com/Annotations/.

8.
Of the sequence comparison methods, profile-based methods perform with greater selectivity than those that use pairwise comparisons. Of the profile methods, hidden Markov models (HMMs) are apparently the best. The first part of this paper describes calculations that (i) improve the performance of HMMs and (ii) determine a good procedure for creating HMMs for sequences of proteins of known structure. For a family of related proteins, more homologues are detected using multiple models built from diverse single seed sequences than from one model built from a good alignment of those sequences. A new procedure is described for detecting and correcting those errors that arise at the model-building stage of the procedure. These two improvements greatly increase selectivity and coverage. The second part of the paper describes the construction of a library of HMMs, called SUPERFAMILY, that represent essentially all proteins of known structure. The sequences of the domains in proteins of known structure, that have identities less than 95%, are used as seeds to build the models. Using the current data, this gives a library with 4894 models. The third part of the paper describes the use of the SUPERFAMILY model library to annotate the sequences of over 50 genomes. The models match twice as many target sequences as are matched by pairwise sequence comparison methods. For each genome, close to half of the sequences are matched in all or in part and, overall, the matches cover 35% of eukaryotic genomes and 45% of bacterial genomes. On average roughly 15% of genome sequences are labelled as being hypothetical yet homologous to proteins of known structure. The annotations derived from these matches are available from a public web server at: http://stash.mrc-lmb.cam.ac.uk/SUPERFAMILY. This server also enables users to match their own sequences against the SUPERFAMILY model library.

9.
Goode DK  Snell P  Smith SF  Cooke JE  Elgar G 《Genomics》2005,86(2):172-181
Comparative genomic analysis reveals an exceptionally large section of conserved shared synteny between the human 7q36 chromosomal region and the pufferfish (Fugu rubripes) genome. Remarkably, this conservation extends not only to gene order across 16 genes, but also to the position and orientation of a number of prominent conserved noncoding elements (CNEs). A functional assay using zebrafish has shown that most of the CNEs have reproducible and specific enhancer activity. This enhancer activity is often detected in a subset of tissues which reflect the endogenous expression pattern of a proximal gene, though some CNEs may act over a long range. We propose that the distribution of CNEs, and their probable association with a number of genes throughout the region, imposes a critical constraint on genome architecture, resulting in the maintenance of such a large section of conserved synteny across the vertebrate lineage.

10.
A prerequisite to understanding the evolution of the human X chromosome is the analysis of synteny of X-linked genes in different species. We have focused on the spermine synthase gene in human Xp22.1. We show that whereas the human gene spans a genomic region of 54 kb, the Fugu rubripes gene is encompassed in a 4.7-kb region. However, we could not find conserved synteny between this region of human Xp22 and the equivalent F. rubripes region. A cosmid clone containing the F. rubripes gene does not contain other X-linked genes. Instead we identified homologs of human genes that are autosomally localized: the ryanodine receptor type I (RYRI), which is implicated in malignant hyperthermia and central core disease, and the HE6 gene. Comparison of the F. rubripes, Tetraodon fluviatilis, mouse, human, and Danio rerio 5'UTRs of spermine synthase highlights conserved sequences potentially involved in regulation. Interestingly, pseudogenes of this gene that are present in the human and mouse genomes seem to be absent in the compact F. rubripes genome. Analysis of a D. rerio PAC clone containing spermine synthase shows an intermediate genomic size in this fish. Sequence analysis of this PAC clone did not reveal other known genes: neither the RYRI gene, nor the HE6 gene, nor other human Xp22 genes were identified.

11.
12.
The Japanese pufferfish Fugu rubripes has a 400 Mb genome with high gene density and minimal non-coding complexity, and is therefore an ideal vertebrate model for sequence comparison. The identification of regions of conserved synteny between Fugu and humans would greatly accelerate the mapping and ordering of genes. Fugu C9 was cloned and sequenced as a first step in an attempt to characterize the region in Fugu homologous to human chromosome 5p13. The 11 exons of the Fugu C9 gene share 33% identity with human C9 and span 2.9 kb of genomic DNA. By comparison, human C9 spans 90 kb, representing a 30-fold difference in size. We have also determined by cosmid sequence scanning that DOC-2, a tumour suppressor gene which also maps to human 5p13, lies 6–7 kb from C9 in a head-to-head or 5′ to 5′ orientation. These results demonstrate that the Fugu C9/DOC-2 locus is a region of conserved synteny. Sequence scanning of overlapping cosmids has identified two other genes, GAS-1 and FBP, both of which map to human chromosome 9q22, and lie adjacent to the Fugu C9/DOC-2 locus, indicating the boundary between two syntenic regions.

13.
ParIS Genome Rearrangement server   Total citations: 2, self-citations: 0, citations by others: 2
SUMMARY: ParIS Genome Rearrangement is a web server for a Bayesian analysis of unichromosomal genome pairs. The underlying model allows inversions, transpositions and inverted transpositions. The server generates a Markov chain using a Partial Importance Sampler technique, and samples trajectories of mutations from this chain. The user can specify several marginalizations to the posterior: the posterior distribution of number of mutations needed to transform one genome into another, length distribution of mutations, number of mutations that have occurred at a given site. Both text and graphical outputs are available. We provide a limited server, a downloadable unlimited server that can be installed locally on any Linux/Unix operating system, and a database of mitochondrial gene orders.
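
The three mutation types named in this summary (inversion, transposition, inverted transposition) can be illustrated on a signed gene order, where the sign of each integer stands for the strand of the gene. The Python sketch below only shows what each operation does to a gene order; it is not the ParIS sampler and says nothing about its Markov chain. The function names and the half-open index convention are assumptions made for illustration.

```python
def inversion(order, i, j):
    """Reverse the segment order[i:j] and flip the sign (strand) of each gene."""
    return order[:i] + [-g for g in reversed(order[i:j])] + order[j:]


def transposition(order, i, j, k):
    """Cut the segment order[i:j] and reinsert it, unchanged, at index k
    of the remaining gene order."""
    segment = order[i:j]
    rest = order[:i] + order[j:]
    return rest[:k] + segment + rest[k:]


def inverted_transposition(order, i, j, k):
    """Cut the segment order[i:j], reverse it with flipped signs, and
    reinsert it at index k of the remaining gene order."""
    segment = [-g for g in reversed(order[i:j])]
    rest = order[:i] + order[j:]
    return rest[:k] + segment + rest[k:]


genome = [1, 2, 3, 4, 5]
print(inversion(genome, 1, 4))                  # [1, -4, -3, -2, 5]
print(transposition(genome, 1, 3, 3))           # [1, 4, 5, 2, 3]
print(inverted_transposition(genome, 1, 3, 2))  # [1, 4, -3, -2, 5]
```

Each function returns a new list and leaves its input untouched, so a trajectory of mutations can be represented as a simple sequence of gene orders.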

14.
RNAmmer: consistent and rapid annotation of ribosomal RNA genes   Total citations: 7, self-citations: 0, citations by others: 7
The publication of a complete genome sequence is usually accompanied by annotations of its genes. In contrast to protein coding genes, genes for ribosomal RNA (rRNA) are often poorly or inconsistently annotated. This makes comparative studies based on rRNA genes difficult. We have therefore created computational predictors for the major rRNA species from all kingdoms of life and compiled them into a program called RNAmmer. The program uses hidden Markov models trained on data from the 5S ribosomal RNA database and the European ribosomal RNA database project. A pre-screening step makes the method fast with little loss of sensitivity, enabling the analysis of a complete bacterial genome in less than a minute. Results from running RNAmmer on a large set of genomes indicate that the location of rRNAs can be predicted with a very high level of accuracy. Novel, unannotated rRNAs are also predicted in many genomes. The software as well as the genome analysis results are available at the CBS web server.

15.
The Zebrafish Information Network (zfin.org) is the central repository for Danio rerio genetic and genomic data. The Zebrafish Information Network has served the zebrafish research community since 1994, expertly curating, integrating, and displaying zebrafish data. Key data types available at the Zebrafish Information Network include, but are not limited to, genes, alleles, human disease models, gene expression, phenotype, and gene function. The Zebrafish Information Network makes zebrafish research data Findable, Accessible, Interoperable, and Reusable through nomenclature, curatorial and annotation activities, web interfaces, and data downloads. Recently, the Zebrafish Information Network and 6 other model organism knowledgebases have collaborated to form the Alliance of Genome Resources, aiming to develop sustainable genome information resources that enable the use of model organisms to understand the genetic and genomic basis of human biology and disease. Here, we provide an overview of the data available at the Zebrafish Information Network including recent updates to the gene page to provide access to single-cell RNA sequencing data, links to Alliance web pages, ribbon diagrams to summarize the biological systems and Gene Ontology terms that have annotations, and data integration with the Alliance of Genome Resources.

16.
The major histocompatibility complex (MHC) region in fish has been subjected to piecemeal analysis centering on the in-depth characterization of single genes. The emphasis has been on those genes proven to be involved in the immune response such as the class I and class II antigen presenting genes and the complement genes. The Fugu genome data presents the opportunity to examine the short-range linkage of potentially all the human MHC orthologues and examine conserved synteny with the human and, to a more limited extent, zebrafish genomes. Analysis confirms the existence of a limited MHC locus in Fugu comprising the MHC class Ia genes and associated class II region genes involved in class I antigen presentation. Identification of additional human MHC orthologues indicates the completely dispersed nature of this region in fish, with a maximum of six MHC genes maintained within close proximity in any one contig. The majority of the other genes are present in the genome data as either singletons or pairs. Comparison with zebrafish substantiates previously observed linkages between class III region orthologues and hints at an ancient conserved class III region.

17.
18.
SUMMARY: The visualization-aided exploration of complex datasets will allow the research community to formulate novel functional hypotheses leading to a better understanding of biological processes at all levels. Therefore, we have developed a web resource termed VIS-O-BAC designed for the functional investigation of expression data for model systems, such as bacterial pathogens based on a graphical display. Genome-scale datasets derived from typical 'omic' approaches can directly be explored with respect to three biologically relevant aspects, the genome structure (operon organization), the organization of genes in pathways (KEGG) and the gene function with Gene Ontology (GO) terms. The integrated viewers can be used in parallel and combine expression data and functional annotations from different external data repositories. The graphical visualizations evidently accelerate both the validation of regulatory information and the detection of affected biological processes. AVAILABILITY: http://leger2.gbf.de/cgi-bin/vis-o-bac.pl. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

19.
Automatic annotation of organellar genomes with DOGMA   Total citations: 17, self-citations: 0, citations by others: 17
The Dual Organellar GenoMe Annotator (DOGMA) automates the annotation of organellar (plant chloroplast and animal mitochondrial) genomes. It is a Web-based package that allows the use of BLAST searches against a custom database, and conservation of basepairing in the secondary structure of animal mitochondrial tRNAs to identify and annotate genes. DOGMA provides a graphical user interface for viewing and editing annotations. Annotations are stored on our password-protected server to enable repeated sessions of working on the same genome. Finished annotations can be extracted for direct submission to GenBank.

20.