Similar literature
20 similar articles found
1.
2.
The use and development of post-genomic tools naturally depends on large-scale genome sequencing projects. The usefulness of post-genomic applications is dependent on the accuracy of genome annotations, for which the correct identification of intron-exon borders in complex genomes of eukaryotic organisms is often an error-prone task. Although automated algorithms for predicting intron-exon structures are available, supporting exon evidence is necessary to achieve comprehensive genome annotation. Besides cDNA and EST support, peptides identified via MS/MS can be used as extrinsic evidence in a proteogenomic approach. We describe an improved version of the Genomic Peptide Finder (GPF), which aligns de novo predicted amino acid sequences to the genomic DNA sequence of an organism while correcting for peptide sequencing errors and accounting for the possibility of splicing. We have coupled GPF and the gene finding program AUGUSTUS in a way that provides automatic structural annotations of the Chlamydomonas reinhardtii genome, using highly unbiased GPF evidence. A comparison of the AUGUSTUS gene set incorporating GPF evidence to the standard JGI FM4 (Filtered Models 4) gene set reveals 932 GPF peptides that are not contained in the Filtered Models 4 gene set. Furthermore, the GPF evidence improved the AUGUSTUS gene models by altering 65 gene models and adding three previously unidentified genes.
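To make the idea of aligning peptides to genomic DNA concrete, here is a minimal sketch of the simplest possible version: an exact-match search of a peptide in the six-frame translation of a DNA sequence. It is not GPF itself. It handles neither sequencing errors nor splicing, and the function names (`reverse_complement`, `translate`, `find_peptide`) and the toy sequence are assumptions made for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumed table).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna, frame):
    """Translate one reading frame (0, 1 or 2); unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(frame, len(dna) - 2, 3)
    )


def find_peptide(genome, peptide):
    """Exact search of a peptide in all six reading frames.

    Returns (strand, frame, amino_acid_index) tuples. No error tolerance
    and no splice handling: a hypothetical baseline, not the GPF method.
    """
    genome = genome.upper()
    hits = []
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for frame in range(3):
            protein = translate(seq, frame)
            start = protein.find(peptide)
            while start != -1:
                hits.append((strand, frame, start))
                start = protein.find(peptide, start + 1)
    return hits


# Toy example: the peptide MKW is encoded in frame 2 of the forward strand.
print(find_peptide("CCATGAAATGGTAA", "MKW"))
```

A peptide that spans an exon-exon junction would not be found by this contiguous search, which is the gap that splice-aware alignment is meant to close.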

3.
Given the ever-expanding number of model plant species for which complete genome sequences are available, and the abundance of bio-resources such as knockout mutants, wild accessions and advanced breeding populations, there is a rising burden for gene functional annotation. This protocol describes the annotation of plant gene function using a combination of co-expression gene analysis, metabolomics and informatics (Figure 1). The approach uses target genes of known function to identify non-annotated genes likely to be involved in a given metabolic process, with target compounds identified via metabolomics. Strategies are put forward for applying this information to populations generated by both forward and reverse genetics approaches, although none of these is effortless. As a corollary, the approach can also be used to characterise unknown peaks representing new or specific secondary metabolites in a limited set of tissues, plant species or stress treatments, which is currently an important challenge in understanding plant metabolism.

4.
SUMMARY: GenColors is a new web-based software/database system aimed at an improved and accelerated annotation of prokaryotic genomes, considering information on related genomes and making extensive use of genome comparison. It offers a seamless integration of data from ongoing sequencing projects and annotated genomic sequences obtained from GenBank. The genome comparison tools determine, for example, best-bidirectional hits, gene conservation, syntenies and gene core sets. Swiss-Prot/TrEMBL hits allow annotations in an effective manner. To further support the annotation, base-specific quality data can also be displayed if available. With GenColors, dedicated genome browsers containing a group of related genomes can be easily set up and maintained. It has been efficiently used for Borrelia garinii and is currently applied to various ongoing genome projects. AVAILABILITY: Detailed information on GenColors is available at http://gencolors.imb-jena.de. Online usage of GenColors-based genome browsers is the preferred application mode. The system is also available upon request for local installation.
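The best-bidirectional-hit criterion named in this abstract can be sketched in a few lines. This is a generic illustration, not the GenColors implementation: the input layout (nested dictionaries of similarity scores per query gene), the function names and the gene identifiers are all assumptions.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring target gene."""
    return {
        query: max(targets, key=targets.get)
        for query, targets in scores.items()
        if targets  # skip queries without any hit
    }


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return (gene_a, gene_b) pairs that are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted(
        (a, b) for a, b in best_ab.items() if best_ba.get(b) == a
    )


# Hypothetical similarity scores between genes of genome A and genome B.
a_vs_b = {
    "a1": {"b1": 250.0, "b2": 90.0},
    "a2": {"b2": 180.0},
    "a3": {"b1": 120.0},
}
b_vs_a = {
    "b1": {"a1": 245.0, "a3": 118.0},
    "b2": {"a2": 175.0, "a1": 88.0},
}

print(bidirectional_best_hits(a_vs_b, b_vs_a))
```

In this toy data, `a3` hits `b1` but is not the best hit of `b1`, so it is excluded. Requiring reciprocity is what makes the criterion a conservative proxy for orthology.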

5.
This review describes how intimately proteogenomics and systems biology are intertwined. Quantitative cell-wide monitoring of cellular processes and the analysis of this information is the basis for systems biology. Establishing the most comprehensive protein-parts list is an essential prerequisite prior to analysis of the cell-wide dynamics of proteins, their post-translational modifications, their complex network interactions and interpretation of these data as a whole. High-quality genome annotation is, thus, a crucial basis. Proteogenomics consists of high-throughput identification and characterization of proteins by extra-large shotgun MS/MS approaches and the integration of these data with genomic data. Discovering the remaining unannotated genes, defining translational start sites, and listing signal peptide processing events and post-translational modifications are tasks that can currently be carried out at a full-genomic scale as soon as the genomic sequence is available. Proteomics is increasingly being used at the primary stage of genome annotation and such an approach may become standard in the near future for genome projects. Advantageously, the same experimental proteomic datasets may be used to characterize the specific metabolic traits of the organism under study. Undoubtedly, comparative genomics will experience a renaissance taking into account this new dimension. Synthetic biology aimed at re-engineering living systems will also benefit from this significant progress.

6.
The Metabolic Models Reconstruction Using Genome-Scale Information (merlin) tool is a user-friendly Java application that aids the reconstruction of genome-scale metabolic models for any organism that has its genome sequenced. It performs the major steps of the reconstruction process, including the functional genomic annotation of the whole genome and subsequent construction of the portfolio of reactions. Moreover, merlin includes tools for the identification and annotation of genes encoding transport proteins, generating the transport reactions for those carriers. It also performs the compartmentalisation of the model, predicting the organelle localisation of the proteins encoded in the genome and thus the localisation of the metabolites involved in the reactions promoted by such enzymes. The gene-protein-reaction (GPR) associations are automatically generated and included in the model. Finally, merlin expedites the transition from genomic data to draft metabolic model reconstructions exported in the SBML standard format, allowing the user to have a preliminary view of the biochemical network, which can be manually curated within the environment provided by merlin.

7.
8.
Despite the current wealth of sequencing data, one‐third of all biochemically characterized metabolic enzymes lack a corresponding gene or protein sequence, and as such can be considered orphan enzymes. They represent a major gap between our molecular and biochemical knowledge, and consequently are not amenable to modern systemic analyses. As 555 of these orphan enzymes have metabolic pathway neighbours, we developed a global framework that utilizes the pathway and (meta)genomic neighbour information to assign candidate sequences to orphan enzymes. For 131 orphan enzymes (37% of those for which (meta)genomic neighbours are available), we associate sequences to them using scoring parameters with an estimated accuracy of 70%, implying functional annotation of 16 345 gene sequences in numerous (meta)genomes. As a case in point, two of these candidate sequences were experimentally validated to encode the predicted activity. In addition, we augmented the currently available genome‐scale metabolic models with these new sequence–function associations and were able to expand the models by on average 8%, with a considerable change in the flux connectivity patterns and improved essentiality prediction.

9.
The topology of central carbon metabolism of Aspergillus niger was identified and the metabolic network reconstructed, by integrating genomic, biochemical and physiological information available for this microorganism and other related fungi. The reconstructed network may serve as a valuable database for annotation of genes identified in future genome sequencing projects on aspergilli. Based on the metabolic reconstruction, a stoichiometric model was set up that includes 284 metabolites and 335 reactions, of which 268 represent biochemical conversions and 67 represent transport processes between the different intracellular compartments and between the cell and the extracellular medium. The stoichiometry of the metabolic reactions was used in combination with biosynthetic requirements for growth and pseudo-steady state mass balances over intracellular metabolites for the quantification of metabolic fluxes using metabolite balancing. This framework was employed to perform an in silico characterisation of the phenotypic behaviour of A. niger grown on different carbon sources. The effects on growth of single reaction deletions were assessed and essential biochemical reactions were identified for different carbon sources. Furthermore, application of the stoichiometric model for assessing the metabolic capabilities of A. niger to produce metabolites was evaluated by using succinate production as a case study.
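The metabolite balancing mentioned in this abstract can be written compactly. What follows is the standard pseudo-steady-state formulation, given as a sketch rather than the authors' exact model. The symbols S, v, and the split into measured (m) and calculated (c) fluxes are notation assumed here for illustration.

```latex
% S: stoichiometric matrix (metabolites x reactions), v: vector of reaction fluxes
\begin{align}
  S\,v &= 0
    && \text{pseudo-steady-state balance over intracellular metabolites} \\
  S_c\,v_c &= -\,S_m\,v_m
    && \text{fluxes split into calculated } (c) \text{ and measured } (m) \\
  n_{\text{free}} &= 335 - \operatorname{rank}(S) \;\ge\; 335 - 284 = 51
    && \text{since } \operatorname{rank}(S) \le 284
\end{align}
```

Under the assumption that each of the 284 metabolites contributes one balance equation, at least 51 of the 335 fluxes would have to be measured or fixed by additional constraints before the remaining ones can be calculated.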

10.
Membrane-associated proteins are critical for intra- and intercellular communication. Accordingly, approaches are needed for rapid and comprehensive identification of all membrane-targeted gene products in a given cell or tissue. Here we describe a modification of the yeast Ras recruitment system to this end and designate the modified approach the Ras membrane trap (RMT). A pilot RMT screen was carried out on the central nervous system of the mollusk Lymnaea stagnalis, a model organism from a phylum that still lacks a representative with a sequenced genome. A total of 112 gene products were identified in the screen, of which 79 lack assignable homologs in available databases. Currently available annotation tools predicted membrane association of only 45% of the 112 proteins, although experimental verification in mammalian cells confirmed membrane association for all clones tested. Thus, genome annotation using currently available tools is likely to underpredict representation of membrane-associated gene products. The 32 proteins with known homologies include many targeted to the endoplasmic reticulum or the nucleus; thus, RMT provides a tool that can cover intracellular membrane proteomes. Two sequences were found to represent gene families not found to date in invertebrate genomes, emphasizing the need for whole genome sequences from mollusks and indeed from representatives of all major invertebrate phyla.

11.
REGANOR     
With >1,000 prokaryotic genome sequencing projects ongoing or already finished, comprehensive comparative analysis of the gene content of these genomes has become viable. To allow for a meaningful comparative analysis, gene prediction of the various genomes should be as accurate as possible. It is clear that improving the state of genome annotation requires automated gene identification methods to cope with the influence of artifacts, such as genomic GC content. There is currently still room for improvement in the state of annotations. We present a web server and a database of high-quality gene predictions. The web server is a resource for gene identification in prokaryote genome sequences. It implements our previously described, accurate gene finding method REGANOR. We also provide novel gene predictions for 241 complete, or almost complete, prokaryotic genomes. Using several examples, we demonstrate how this resource can easily be utilised to identify promising candidates for genes currently missing from genome annotations. All data sets are available online. AVAILABILITY: The gene finding server is accessible via https://www.cebitec.uni-bielefeld.de/groups/brf/software/reganor/cgi-bin/reganor_upload.cgi. The server software is available with the GenDB genome annotation system (version 2.2.1 onwards) under the GNU general public license. The software can be downloaded from https://sourceforge.net/projects/gendb/. More information on installing GenDB and REGANOR and the system requirements can be found on the GenDB project page http://www.cebitec.uni-bielefeld.de/groups/brf/software/wiki/GenDBWiki/AdministratorDocumentation/GenDBInstallation

12.

Background

The wheat stripe rust fungus (Puccinia striiformis f. sp. tritici, PST) is responsible for significant yield losses in wheat production worldwide. In spite of its economic importance, the PST genomic sequence is not currently available. Fortunately Next Generation Sequencing (NGS) has radically improved sequencing speed and efficiency with a great reduction in costs compared to traditional sequencing technologies. We used Illumina sequencing to rapidly access the genomic sequence of the highly virulent PST race 130 (PST-130).

Methodology/Principal Findings

We obtained nearly 80 million high quality paired-end reads (>50x coverage) that were assembled into 29,178 contigs (64.8 Mb), which provide an estimated coverage of at least 88% of the PST genes and are available through GenBank. Extensive micro-synteny with the Puccinia graminis f. sp. tritici (PGTG) genome and high sequence similarity with annotated PGTG genes support the quality of the PST-130 contigs. We characterized the transposable elements present in the PST-130 contigs and using an ab initio gene prediction program we identified and tentatively annotated 22,815 putative coding sequences. We provide examples on the use of comparative approaches to improve gene annotation for both PST and PGTG and to identify candidate effectors. Finally, the assembled contigs provided an inventory of PST repetitive elements, which were annotated and deposited in Repbase.

Conclusions/Significance

The assembly of the PST-130 genome and the predicted proteins provide useful resources to rapidly identify and clone PST genes and their regulatory regions. Although the automatic gene prediction has limitations, we show that a comparative genomics approach using multiple rust species can greatly improve the quality of gene annotation in these species. The PST-130 sequence will also be useful for comparative studies within PST as more races are sequenced. This study illustrates the power of NGS for rapid and efficient access to genomic sequence in non-model organisms.

13.
MOTIVATION: There is an imperative need to integrate functional genomics data to obtain a more comprehensive systems-biology view of the results. We believe that this is best achieved through the visualization of data within the biological context of metabolic pathways. Accordingly, metabolic pathway reconstruction was used to predict the metabolic composition for Medicago truncatula and these pathways were engineered to enable the correlated visualization of integrated functional genomics data. RESULTS: Metabolic pathway reconstruction was used to generate a pathway database for M. truncatula (MedicCyc), which currently features more than 250 pathways with related genes, enzymes and metabolites. MedicCyc was assembled from more than 225,000 M. truncatula ESTs (MtGI Release 8.0) and available genomic sequences using the Pathway Tools software and the MetaCyc database. The predicted pathways in MedicCyc were verified through comparison with other plant databases such as AraCyc and RiceCyc. The comparison with other plant databases provided crucial information concerning enzymes still missing from the ongoing, but currently incomplete, M. truncatula genome sequencing project. MedicCyc was further manually curated to remove non-plant pathways, and Medicago-specific pathways including isoflavonoid, lignin and triterpene saponin biosynthesis were modified or added based upon available literature and in-house expertise. Additional metabolites identified in metabolic profiling experiments were also used for pathway predictions. Once the metabolic reconstruction was completed, MedicCyc was engineered to visualize M. truncatula functional genomics datasets within the biological context of metabolic pathways. AVAILABILITY: MedicCyc is freely accessible at http://www.noble.org/MedicCyc/

14.
Associating phenotypic traits and quantitative trait loci (QTL) to causative regions of the underlying genome is a key goal in agricultural research. InterStoreDB is a suite of integrated databases designed to assist in this process. The individual databases are species independent and generic in design, providing access to curated datasets relating to plant populations, phenotypic traits, genetic maps, marker loci and QTL, with links to functional gene annotation and genomic sequence data. Each component database provides access to associated metadata, including data provenance and parameters used in analyses, thus providing users with information to evaluate the relative worth of any associations identified. The databases include CropStoreDB, for management of population, genetic map, QTL and trait measurement data, SeqStoreDB for sequence-related data and AlignStoreDB, which stores sequence alignment information, and allows navigation between genetic and genomic datasets. Genetic maps are visualized and compared using the CMAP tool, and functional annotation from sequenced genomes is provided via an EnsEMBL-based genome browser. This framework facilitates navigation of the multiple biological domains involved in genetics and genomics research in a transparent manner within a single portal. We demonstrate the value of InterStoreDB as a tool for Brassica research. InterStoreDB is available from: http://www.interstoredb.org

15.
Association mapping currently relies on the identification of genetic markers. Several technologies have been adopted for genetic marker analysis, with single nucleotide polymorphisms (SNPs) being the most popular where a reasonable quantity of genome sequence data is available. We describe several tools we have developed for the discovery, annotation, and visualization of molecular markers for association mapping. These include autoSNPdb for SNP discovery from assembled sequence data; TAGdb for the identification of gene-specific paired read Illumina GAII data; CMap3D for the comparison of mapped genetic and physical markers; and BAC and Gene Annotator for the online annotation of genes and genomic sequences.

16.
17.
18.
Expressed sequence tags (ESTs) currently encompass more entries in the public databases than any other form of sequence data. Thus, EST data sets provide a vast resource for gene identification and expression profiling. We have mapped the complete set of 176,915 publicly available Arabidopsis EST sequences onto the Arabidopsis genome using GeneSeqer, a spliced alignment program incorporating sequence similarity and splice site scoring. About 96% of the available ESTs could be properly aligned with a genomic locus, with the remaining ESTs deriving from organelle genomes and non-Arabidopsis sources or displaying insufficient sequence quality for alignment. The mapping provides verified sets of EST clusters for evaluation of EST clustering programs. Analysis of the spliced alignments suggests corrections to current gene structure annotation and provides examples of alternative and non-canonical pre-mRNA splicing. All results of this study were parsed into a database and are accessible via a flexible Web interface at http://www.plantgdb.org/AtGDB/.

19.
Large volumes of genomic data have been generated for several plant species over the past decade, including structural sequence data and functional annotation at the genome level. Various technologies such as expressed sequence tags (ESTs), massively parallel signature sequencing (MPSS) and microarrays have been used to study gene expression and to provide functional data for many genes simultaneously. This review focuses on recent advances in the application of microarrays in plant genomic research and in gene expression databases available for plants. Large sets of Arabidopsis microarray data are publicly available. Recently developed array platforms are currently being used to generate genome-wide expression profiles for several crop species. Coupled to these platforms are public databases that provide access to these large-scale expression data, which can be used to aid the discovery of gene function.

20.
Allmer J, Naumann B, Markert C, Zhang M, Hippler M. Proteomics 2006, 6(23):6207-6220
A new high-throughput computational strategy was established that improves genomic data mining from MS experiments. The MS/MS data were analyzed by the SEQUEST search algorithm and a combination of de novo amino acid sequencing in conjunction with an error-tolerant database search tool, operating on a 256 processor computer cluster. The error-tolerant search tool, previously established as GenomicPeptideFinder (GPF), enables detection of intron-split and/or alternatively spliced peptides from MS/MS data when deduced from genomic DNA. Isolated thylakoid membranes from the eukaryotic green alga Chlamydomonas reinhardtii were separated by 1-D SDS gel electrophoresis, protein bands were excised from the gel, digested in-gel with trypsin and analyzed by coupling nano-flow LC with MS/MS. The concerted action of SEQUEST and GPF allowed identification of 2622 distinct peptides. In total 448 peptides were identified by GPF analysis alone, including 98 intron-split peptides, resulting in the identification of novel proteins, improved annotation of gene models, and evidence of alternative splicing.
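The tryptic digestion step in this workflow determines which peptides can be observed at all. As a rough illustration, here is a minimal in-silico digest using the commonly applied rule that trypsin cleaves after lysine (K) or arginine (R) unless the next residue is proline (P). The function name and the example sequence are hypothetical, and missed cleavages are ignored.

```python
def tryptic_digest(protein):
    """Split a protein sequence after K or R, except when followed by P.

    A simplified cleavage rule for illustration; missed cleavages and
    modifications are not modelled.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        # Trailing peptide without a C-terminal K or R
        peptides.append(protein[start:])
    return peptides


# Hypothetical sequence: the R at position 7 is followed by P, so no cut there.
print(tryptic_digest("MKWVTFRPLLKAA"))
```

If the coding sequence of one of these peptides is interrupted by an intron, the peptide does not appear in any contiguous translation of the genomic DNA, which is why intron-split peptides call for a splice-aware search.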
