Similar articles
20 similar articles found (search time: 46 ms)
1.
ASAP: the Alternative Splicing Annotation Project   Total citations: 2, self-citations: 0, citations by others: 2
Recently, genomics analyses have demonstrated that alternative splicing is widespread in mammalian genomes (30-60% of genes reported to have multiple isoforms), and may be one of their most important mechanisms of functional regulation. However, by comparison with other genomics data such as genome annotation, SNPs, or gene expression, there exists relatively little database infrastructure for the study of alternative splicing. We have constructed an online database ASAP (the Alternative Splicing Annotation Project) for biologists to access and mine the enormous wealth of alternative splicing information coming from genomics and proteomics. ASAP is based on genome-wide analyses of alternative splicing in human (30 793 alternative splice relationships found) from detailed alignment of expressed sequences onto the genomic sequence. ASAP provides precise gene exon-intron structure, alternative splicing, tissue specificity of alternative splice forms, and protein isoform sequences resulting from alternative splicing. Moreover, it can help biologists design probe sequences for distinguishing specific mRNA isoforms. ASAP is intended to be a community resource for collaborative annotation of alternative splice forms, their regulation, and biological functions. The URL for ASAP is http://www.bioinformatics.ucla.edu/ASAP.

2.
Functional and structural genomics using PEDANT   Total citations: 11, self-citations: 0, citations by others: 11
MOTIVATION: Enormous demand for fast and accurate analysis of biological sequences is fuelled by the pace of genome analysis efforts. There is also an acute need for reliable up-to-date genomic databases integrating both functional and structural information. Here we describe the current status of the PEDANT software system for high-throughput analysis of large biological sequence sets and the genome analysis server associated with it. RESULTS: The principal features of PEDANT are: (i) completely automatic processing of data using a wide range of bioinformatics methods, (ii) manual refinement of annotation, (iii) automatic and manual assignment of gene products to a number of functional and structural categories, (iv) extensive hyperlinked protein reports, and (v) advanced DNA and protein viewers. The system is easily extensible and allows custom methods, databases, and categories to be included with minimal or no programming effort. PEDANT is actively used as a collaborative environment to support several on-going genome sequencing projects. The main purpose of the PEDANT genome database is to quickly disseminate well-organized information on completely sequenced and unfinished genomes. It currently includes 80 genomic sequences and in many cases serves as the only source of exhaustive information on a given genome. The database also acts as a vehicle for a number of research projects in bioinformatics. Using SQL queries, it is possible to correlate a large variety of pre-computed properties of gene products encoded in complete genomes with each other and compare them with data sets of special scientific interest. In particular, the availability of structural predictions for over 300 000 genomic proteins makes PEDANT the most extensive structural genomics resource available on the web.

3.
4.
SUMMARY: GenColors is a new web-based software/database system aimed at an improved and accelerated annotation of prokaryotic genomes, considering information on related genomes and making extensive use of genome comparison. It offers a seamless integration of data from ongoing sequencing projects and annotated genomic sequences obtained from GenBank. The genome comparison tools determine, for example, best-bidirectional hits, gene conservation, syntenies and gene core sets. Swiss-Prot/TrEMBL hits allow annotations in an effective manner. To further support annotation, base-specific quality data can also be displayed if available. With GenColors, dedicated genome browsers containing a group of related genomes can be easily set up and maintained. It has been efficiently used for Borrelia garinii and is currently applied to various ongoing genome projects. AVAILABILITY: Detailed information on GenColors is available at http://gencolors.imb-jena.de. Online usage of GenColors-based genome browsers is the preferred application mode. The system is also available upon request for local installation.
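The abstract above lists best-bidirectional hits among the genome comparison results. As a rough illustration of that general idea (not GenColors' own implementation, which the abstract does not describe), the sketch below pairs genes from two genomes when each is the other's top-scoring hit. The function names, the score dictionaries, and the gene identifiers are all hypothetical.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring subject gene.

    `scores` maps (query, subject) pairs to a similarity score; this input
    shape is an assumption made for the example.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Made-up scores for genes of genome A searched against genome B, and back.
a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 50, ("a2", "b2"): 120, ("a3", "b1"): 90}
b_vs_a = {("b1", "a1"): 195, ("b2", "a2"): 118, ("b2", "a1"): 40}

pairs = bidirectional_best_hits(a_vs_b, b_vs_a)
print(pairs)  # [('a1', 'b1'), ('a2', 'b2')]
```

Gene a3 is left unpaired here: its best hit is b1, but b1's best hit is a1, so the relationship is not reciprocal.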

5.
KEGG: Kyoto Encyclopedia of Genes and Genomes   Total citations: 85, self-citations: 3, citations by others: 82
KEGG (Kyoto Encyclopedia of Genes and Genomes) is a knowledge base for systematic analysis of gene functions, linking genomic information with higher order functional information. The genomic information is stored in the GENES database, which is a collection of gene catalogs for all the completely sequenced genomes and some partial genomes with up-to-date annotation of gene functions. The higher order functional information is stored in the PATHWAY database, which contains graphical representations of cellular processes, such as metabolism, membrane transport, signal transduction and cell cycle. The PATHWAY database is supplemented by a set of ortholog group tables for the information about conserved subpathways (pathway motifs), which are often encoded by positionally coupled genes on the chromosome and which are especially useful in predicting gene functions. A third database in KEGG is LIGAND for the information about chemical compounds, enzyme molecules and enzymatic reactions. KEGG provides Java graphics tools for browsing genome maps, comparing two genome maps and manipulating expression maps, as well as computational tools for sequence comparison, graph comparison and path computation. The KEGG databases are updated daily and made freely available (http://www.genome.ad.jp/kegg/).

6.
Magnifying Genomes (MaGe) is a microbial genome annotation system based on a relational database containing information on bacterial genomes, as well as a web interface for carrying out genome annotation projects. Our system allows one to initiate the annotation of a genome at the early stage of the finishing phase. MaGe's main features are (i) integration of annotation data from bacterial genomes enhanced by a gene coding re-annotation process using accurate gene models, (ii) integration of results obtained with a wide range of bioinformatics methods, including exploration of gene context by searching for conserved synteny and reconstruction of metabolic pathways, and (iii) an advanced web interface allowing multiple users to refine the automatic assignment of gene product functions. MaGe is also linked to numerous well-known biological databases and systems. Our system has been thoroughly tested during the annotation of complete bacterial genomes (Acinetobacter baylyi ADP1, Pseudoalteromonas haloplanktis, Frankia alni) and is currently used in the context of several new microbial genome annotation projects. In addition, MaGe allows for annotation curation and exploration of already published genomes from various genera (e.g. Yersinia, Bacillus and Neisseria). MaGe can be accessed at http://www.genoscope.cns.fr/agc/mage.

7.
Functional annotation of proteins encoded in newly sequenced genomes can be expected to meet two conflicting objectives: (i) provide as much information as possible, and (ii) avoid erroneous functional assignments and over-predictions. The continuing exponential growth of the number of sequenced genomes makes the quality of sequence annotation a critical factor in the efforts to utilize this new information. When dubious functional assignments are used as a basis for subsequent predictions, they tend to proliferate, leading to "database explosion". It is therefore important to identify the common factors that hamper functional annotation. As a first step towards that goal, we have compared the annotations of the Mycoplasma genitalium and Methanococcus jannaschii genomes produced in several independent studies. The most common causes of questionable predictions appear to be: (i) non-critical use of annotations from existing database entries; (ii) taking into account only the annotation of the best database hit; (iii) insufficient masking of low complexity regions (e.g. non-globular domains) in protein sequences, resulting in spurious database hits obscuring relevant ones; (iv) ignoring multi-domain organization of the query proteins and/or the database hits; (v) non-critical functional inferences on the basis of the functions of neighboring genes in an operon; (vi) non-orthologous gene displacement, i.e. involvement of structurally unrelated proteins in the same function. These observations suggest that case-by-case validation of functional annotation by expert biologists remains crucial for productive genome analysis.

8.
REGANOR     
With >1,000 prokaryotic genome sequencing projects ongoing or already finished, comprehensive comparative analysis of the gene content of these genomes has become viable. To allow for a meaningful comparative analysis, gene prediction of the various genomes should be as accurate as possible. It is clear that improving the state of genome annotation requires automated gene identification methods to cope with the influence of artifacts, such as genomic GC content. There is currently still room for improvement in the state of annotations. We present a web server and a database of high-quality gene predictions. The web server is a resource for gene identification in prokaryote genome sequences. It implements our previously described, accurate gene finding method REGANOR. We also provide novel gene predictions for 241 complete, or almost complete, prokaryotic genomes. Using several examples, we demonstrate how this resource can easily be utilised to identify promising candidates for genes currently missing from genome annotations. All data sets are available online. AVAILABILITY: The gene finding server is accessible via https://www.cebitec.uni-bielefeld.de/groups/brf/software/reganor/cgi-bin/reganor_upload.cgi. The server software is available with the GenDB genome annotation system (version 2.2.1 onwards) under the GNU general public license. The software can be downloaded from https://sourceforge.net/projects/gendb/. More information on installing GenDB and REGANOR and the system requirements can be found on the GenDB project page http://www.cebitec.uni-bielefeld.de/groups/brf/software/wiki/GenDBWiki/AdministratorDocumentation/GenDBInstallation

9.
10.
The remarkable advance in sequencing technology and the rising interest in medical and environmental microbiology, biotechnology, and synthetic biology resulted in a deluge of published microbial genomes. Yet, genome annotation, comparison, and modeling remain a major bottleneck to the translation of sequence information into biological knowledge, hence computational analysis tools are continuously being developed for rapid genome annotation and interpretation. Among the earliest, most comprehensive resources for prokaryotic genome analysis, the SEED project, initiated in 2003 as an integration of genomic data and analysis tools, now contains >5,000 complete genomes, a constantly updated set of curated annotations embodied in a large and growing collection of encoded subsystems, a derived set of protein families, and hundreds of genome-scale metabolic models. Until recently, however, maintaining current copies of the SEED code and data at remote locations has been a pressing issue. To allow high-performance remote access to the SEED database, we developed the SEED Servers (http://www.theseed.org/servers): four network-based servers intended to expose the data in the underlying relational database, support basic annotation services, offer programmatic access to the capabilities of the RAST annotation server, and provide access to a growing collection of metabolic models that support flux balance analysis. The SEED servers offer open access to regularly updated data, the ability to annotate prokaryotic genomes, the ability to create metabolic reconstructions and detailed models of metabolism, and access to hundreds of existing metabolic models. This work offers and supports a framework upon which other groups can build independent research efforts. 
Large integrations of genomic data represent one of the major intellectual resources driving research in biology, and programmatic access to the SEED data will provide significant utility to a broad collection of potential users.

11.
One challenge presented by large-scale genome sequencing efforts is effective display of uniform information to the scientific community. The Comprehensive Microbial Resource (CMR) contains robust annotation of all complete microbial genomes and allows for a wide variety of data retrievals. The bacterial information has been placed on the Web at http://www.tigr.org/CMR for retrieval using standard web browsing technology. Retrievals can be based on protein properties such as molecular weight or hydrophobicity, GC-content, functional role assignments and taxonomy. The CMR also has special web-based tools to allow data mining using pre-run homology searches, whole genome dot-plots, batch downloading and traversal across genomes using a variety of datatypes.

12.
PlasmoDB (http://PlasmoDB.org) is the official database of the Plasmodium falciparum genome sequencing consortium. This resource incorporates the recently completed P. falciparum genome sequence and annotation, as well as draft sequence and annotation emerging from other Plasmodium sequencing projects. PlasmoDB currently houses information from five parasite species and provides tools for intra- and inter-species comparisons. Sequence information is integrated with other genomic-scale data emerging from the Plasmodium research community, including gene expression analysis from EST, SAGE and microarray projects and proteomics studies. The relational schema used to build PlasmoDB, GUS (Genomics Unified Schema), employs a highly structured format to accommodate the diverse data types generated by sequence and expression projects. A variety of tools allow researchers to formulate complex, biologically based queries of the database. A stand-alone version of the database is also available on CD-ROM (P. falciparum GenePlot), facilitating access to the data in situations where internet access is difficult (e.g. by malaria researchers working in the field). The goal of PlasmoDB is to facilitate utilization of the vast quantities of genomic-scale data produced by the global malaria research community. The software used to develop PlasmoDB has been used to create a second Apicomplexan parasite genome database, ToxoDB (http://ToxoDB.org).

13.
Rational classification of proteins encoded in sequenced genomes is critical for making the genome sequences maximally useful for functional and evolutionary studies. The family of DNA-binding proteins is one of the most populated and studied amongst the various genomes of bacteria, archaea and eukaryotes, and the Web-based system presented here is an approach to their classification. The DnaProt resource is an annotated and searchable collection of protein sequences for the families of DNA-binding proteins. The database contains 3238 full-length sequences (retrieved from the SWISS-PROT database, release 38) that include at least one DNA-binding domain. Sequence entries are organized into families defined by PROSITE patterns, PRINTS motifs and de novo excised signatures. Combining global similarities and functional motifs into a single classification scheme, DNA-binding proteins are classified into 33 unique classes, which helps to reveal comprehensive family relationships. To maximize family information retrieval, DnaProt contains a collection of multiple alignments for each DNA-binding family, while the recognized motifs can be used as diagnostically functional fingerprints. All available structural class representatives have been referenced. The resource was developed as a Web-based management system for free online access to customized data sets. Entries are fully hyperlinked to facilitate easy retrieval of the original records from the source databases, while functional and phylogenetic annotation will be applied to newly sequenced genomes. The database is freely available for online search of a library containing specific patterns of the identified DNA-binding protein classes and retrieval of individual entries from our WWW server (http://kronos.biol.uoa.gr/~mariak/dbDNA.html).

14.
MOTIVATION: Despite increased availability of genome annotation data, a comprehensive resource for in-depth analysis of splice signal distributions and alternative splicing (AS) patterns in eukaryote genomes is still lacking. To meet this need, we have developed EuSplice, a unique splice-centric database which provides reliable splice signal and AS information for 23 eukaryotes. RESULTS: The EuSplice database contains 95,822 AS events and 2.1 million splice signals associated with over 270,000 protein-coding genes. The intuitive, user-friendly EuSplice web interface has powerful data mining and graphics capabilities for inter-genomic comparative analysis of splice signals, putative cryptic splice sites and AS events. Moreover, the seamless integration of splicing data with extensive gene-specific annotations, such as homolog annotations, functional information, mutations and sequence details, makes EuSplice a powerful one-stop information resource for investigating the molecular mechanisms of complex splicing events, disease associations and the evolution of splicing in eukaryotes. AVAILABILITY: http://66.170.16.154/EuSplice. SUPPLEMENTARY INFORMATION: Supplementary tables and figures at Bioinfo online.

15.
Sequencing of microbial genomes is important because of the antibiotic and pathogenic activities that microbes carry. However, even with the help of new assembling software, finishing a whole genome is a time-consuming task. In most bacteria, pathogenic or antibiotic genes are carried in genomic islands. Therefore, a quick genomic island (GI) prediction method is useful for genomes whose sequencing is still ongoing. In this work, we built a Web server called GI-POP (http://gipop.life.nthu.edu.tw) which integrates a sequence assembling tool, a functional annotation pipeline, and a high-performance GI predicting module based on a support vector machine (SVM) method called genomic island genomic profile scanning (GI-GPS). The draft genomes of ongoing genome projects, in contigs or scaffolds, can be submitted to our Web server, which provides the functional annotation and highly probable GI predictions. GI-POP is a comprehensive annotation Web server designed for ongoing genome project analysis. Researchers can perform annotation and obtain pre-analytic information, including possible GIs, coding/non-coding sequences and functional analysis, from their draft genomes. This pre-analytic system can provide useful information for finishing a genome sequencing project.

16.
17.
The KEGG pathway maps are widely used as a reference data set for inferring high-level functions of the organism or the ecosystem from its genome or metagenome sequence data. The KEGG modules, which are tighter functional units often corresponding to subpathways in the KEGG pathway maps, are designed for better automation of genome interpretation. Each KEGG module is represented by a simple Boolean expression of KEGG Orthology (KO) identifiers (K numbers), enabling automatic evaluation of the completeness of genes in the genome. Here we focus on metabolic functions and introduce reaction modules for improving annotation and signature modules for inferring metabolic capacity. We also describe how genome annotation is performed in KEGG using the manually created KO database and the computationally generated SSDB database. The resulting KEGG GENES database with KO (K number) annotation is a reference sequence database to be compared for automated annotation and interpretation of newly determined genomes.
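The abstract above says that each KEGG module is a Boolean expression of K numbers that can be evaluated automatically against a genome. A minimal sketch of that evaluation follows. It assumes a nested-tuple representation invented for this example rather than KEGG's own definition syntax, and the K numbers shown are placeholders, not a real module.

```python
from typing import Set, Union

# A module expression is either a K number (str) or a tuple of the form
# ("and" | "or", operand, operand, ...). This nested-tuple form is a
# hypothetical stand-in for KEGG's own module definition syntax.
Expr = Union[str, tuple]


def is_complete(expr: Expr, genome_kos: Set[str]) -> bool:
    """Evaluate a Boolean module expression against a genome's K numbers."""
    if isinstance(expr, str):
        return expr in genome_kos
    op, *operands = expr
    results = (is_complete(operand, genome_kos) for operand in operands)
    if op == "and":
        return all(results)
    if op == "or":
        return any(results)
    raise ValueError(f"unknown operator: {op!r}")


# Placeholder module: step 1 AND (either alternative for step 2) AND step 3.
module = ("and", "K00001", ("or", "K00002", "K00003"), "K00004")

genome_a = {"K00001", "K00003", "K00004"}  # carries one step-2 alternative
genome_b = {"K00001", "K00004"}            # lacks both step-2 alternatives

print(is_complete(module, genome_a))  # True
print(is_complete(module, genome_b))  # False
```

The "or" nodes model alternative enzymes for the same step, which is why genome_a counts as complete even though it lacks K00002.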

18.
Development of joint application strategies for two microbial gene finders   Total citations: 2, self-citations: 0, citations by others: 2
MOTIVATION: As a starting point in annotation of bacterial genomes, gene finding programs are used for the prediction of functional elements in the DNA sequence. Due to the faster pace and increasing number of genome projects currently underway, it is becoming especially important to have high-performing methods for this task. RESULTS: This study describes the development of joint application strategies that combine the strengths of two microbial gene finders to improve the overall gene finding performance. Critica is very specific in the detection of similarity-supported genes as it uses a comparative sequence analysis-based approach. Glimmer employs a very sophisticated model of genomic sequence properties and is also sensitive in the detection of organism-specific genes. Based on a data set of 113 microbial genome sequences, we optimized a combined application approach using different parameters with relevance to the gene finding problem. This results in a significant improvement in specificity, while sensitivity remains similar to that of Glimmer. The improvement is especially pronounced for GC-rich genomes. The method is currently being applied for the annotation of several microbial genomes. AVAILABILITY: The methods described have been implemented within the gene prediction component of the GenDB genome annotation system.

19.
MIPS: a database for genomes and protein sequences   Total citations: 17, self-citations: 0, citations by others: 17
The Munich Information Center for Protein Sequences (MIPS-GSF), Martinsried, near Munich, Germany, continues its longstanding tradition of developing and maintaining high-quality curated genome databases. In addition, efforts have been intensified to cover the wealth of complete genome sequences in a systematic, comprehensive form. Bioinformatics, supporting national as well as European sequencing and functional analysis projects, has resulted in several up-to-date genome-oriented databases. This report describes growing databases reflecting the progress of sequencing the Arabidopsis thaliana (MATDB) and Neurospora crassa genomes (MNCDB), the yeast genome database (MYGD) extended by functional analysis data, the database of annotated human EST-clusters (HIB) and the database of the complete cDNA sequences from the DHGP (German Human Genome Project). It also contains information on the up-to-date database of complete genomes (PEDANT), the classification of protein sequences (ProtFam) and the collection of protein sequence data within the framework of the PIR-International Protein Sequence Database. These databases can be accessed through the MIPS WWW server (http://www.mips.biochem.mpg.de).

20.
SUMMARY: Phylogenetic Web Profiler (PWP) is a web-based service designed to perform phylogenetic profiling of proteins against genomes. The current version offers a selection of 63 completed genomes and available plasmids as annotated in the PEDANT genome database. Unlike currently available applications, this tool offers several choices of ortholog prediction parameters including E-value cutoff, percent length difference tolerance, and annotation similarity. Additional features include tight integration with the PEDANT database and tools to analyze properties of predicted proteins. PWP should prove very useful for the analysis of functional linkage between proteins.
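To make the parameters named in the abstract above concrete, the sketch below builds a presence/absence profile for one protein across several genomes, filtering hits by an E-value cutoff and a percent length difference tolerance. It is an illustration of the general technique only: the abstract does not say how PWP applies these parameters, so the hit records, field names, default thresholds, and the formula for length difference are all assumptions.

```python
def is_ortholog(hit, max_evalue=1e-5, max_length_diff_pct=30.0):
    """Accept a hit if it passes both the E-value and the length filters.

    Length difference is taken relative to the longer sequence, which is
    one possible definition chosen for this example.
    """
    longer = max(hit["query_len"], hit["subject_len"])
    diff_pct = 100.0 * abs(hit["query_len"] - hit["subject_len"]) / longer
    return hit["evalue"] <= max_evalue and diff_pct <= max_length_diff_pct


def phylogenetic_profile(hits_by_genome, genomes, **params):
    """Return a 0/1 vector: 1 where the genome has at least one accepted hit."""
    return [
        int(any(is_ortholog(hit, **params) for hit in hits_by_genome.get(genome, [])))
        for genome in genomes
    ]


# Hypothetical hits for a single 300-residue query protein.
genomes = ["genome_A", "genome_B", "genome_C"]
hits_by_genome = {
    "genome_A": [{"evalue": 1e-30, "query_len": 300, "subject_len": 310}],
    "genome_B": [{"evalue": 1e-2, "query_len": 300, "subject_len": 295}],
    "genome_C": [{"evalue": 1e-20, "query_len": 300, "subject_len": 900}],
}

profile = phylogenetic_profile(hits_by_genome, genomes)
relaxed = phylogenetic_profile(hits_by_genome, genomes, max_length_diff_pct=70.0)
print(profile)  # [1, 0, 0]
print(relaxed)  # [1, 0, 1]
```

Loosening the length tolerance admits the genome_C hit, which shows how the choice of parameters changes the profile and therefore any functional-linkage inference drawn from comparing profiles.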
