Similar articles
20 similar articles found
1.
High-throughput genome sequencing continues to accelerate the rate at which complete genomes become available for biological research. Many of these new genome sequences have little or no annotation and hence rely on computational predictions of protein-coding genes. Evidence of translation from proteomic techniques could facilitate experimental validation of protein-coding genes, but techniques for whole-genome searching with MS/MS data have not yet been adequately developed. Here we describe GENQUEST, a novel method that uses peptide isoelectric focusing and accurate mass to greatly reduce the peptide search space, making fast, accurate, and sensitive searches of the whole human genome possible on common desktop computers. In an initial experiment, almost all exonic peptides identified in a protein database search were also identified when searching the genomic sequence. Many peptides identified exclusively in the genome searches were incorrectly identified or could not be experimentally validated, highlighting the importance of orthogonal validation. Experimentally validated peptides found only in the genomic searches can be used to reannotate protein-coding genes. GENQUEST is an experimental tool that the proteomics community at large can use to validate computational approaches to genome annotation.
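The abstract does not give GENQUEST's algorithm, so the following is only a minimal Python sketch of the general idea it names: using an accurate precursor mass and the pI of the isoelectric-focusing fraction as two orthogonal filters that shrink the candidate peptide list before any MS/MS scoring. The function names, the 10 ppm and 0.5 pH-unit defaults, and the EMBOSS-style pKa values are assumptions; the residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056

# Assumed pKa set (EMBOSS-style values); other scales give slightly different pI values.
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def peptide_mass(seq):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def net_charge(seq, ph):
    """Henderson-Hasselbalch net charge of a peptide at a given pH."""
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE["Nterm"]))
    charge -= 1.0 / (1.0 + 10 ** (PKA_NEGATIVE["Cterm"] - ph))
    for aa in seq:
        if aa in PKA_POSITIVE:
            charge += 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[aa]))
        elif aa in PKA_NEGATIVE:
            charge -= 1.0 / (1.0 + 10 ** (PKA_NEGATIVE[aa] - ph))
    return charge


def isoelectric_point(seq, tol=1e-4):
    """pI by bisection: the net charge decreases monotonically with pH."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def filter_candidates(peptides, observed_mass, observed_pi,
                      ppm_tol=10.0, pi_window=0.5):
    """Keep peptides consistent with both the precursor mass and the IEF fraction's pI."""
    kept = []
    for pep in peptides:
        ppm_error = abs(peptide_mass(pep) - observed_mass) / observed_mass * 1e6
        if ppm_error <= ppm_tol and abs(isoelectric_point(pep) - observed_pi) <= pi_window:
            kept.append(pep)
    return kept
```

For example, `filter_candidates(["GA", "GK", "DE"], observed_mass=146.0691, observed_pi=6.0, pi_window=1.0)` returns `["GA"]`, because the other two peptides fail the mass filter. In a genome-wide search the candidate list would typically come from in-silico digestion of translated genomic sequence, which this sketch omits.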

2.
The KEGG pathway maps are widely used as a reference data set for inferring high-level functions of the organism or the ecosystem from its genome or metagenome sequence data. The KEGG modules, which are tighter functional units often corresponding to subpathways in the KEGG pathway maps, are designed for better automation of genome interpretation. Each KEGG module is represented by a simple Boolean expression of KEGG Orthology (KO) identifiers (K numbers), enabling automatic evaluation of the completeness of genes in the genome. Here we focus on metabolic functions and introduce reaction modules for improving annotation and signature modules for inferring metabolic capacity. We also describe how genome annotation is performed in KEGG using the manually created KO database and the computationally generated SSDB database. The resulting KEGG GENES database with KO (K number) annotation is a reference sequence database against which newly determined genomes can be compared for automated annotation and interpretation.
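To make "automatic evaluation of the completeness" concrete, here is a hedged Python sketch that evaluates a module definition against the set of K numbers found in a genome. It assumes a simplified grammar (a space separates steps, a comma separates alternatives, `+` joins complex subunits, parentheses group) and ignores other operators in KEGG's real definition syntax, such as optional components; `module_completeness` and the example definition are illustrative, not taken from KEGG.

```python
import re

# Simplified module grammar (assumption, not KEGG's full syntax):
#   space = next step (AND), ',' = alternatives (OR), '+' = complex subunits (AND)
_TOKEN = re.compile(r"K\d{5}|[(),+ ]")


def tokenize(definition):
    text = " ".join(definition.split())  # collapse runs of whitespace
    tokens, pos = [], 0
    while pos < len(text):
        m = _TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unsupported syntax at {pos}: {text[pos:]!r}")
        tokens.append(m.group())
        pos = m.end()
    return tokens


class _Evaluator:
    """Recursive-descent evaluator over the token list."""

    def __init__(self, tokens, present):
        self.tokens, self.i, self.present = tokens, 0, present

    def _peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def _take(self):
        tok = self._peek()
        self.i += 1
        return tok

    def at_end(self):
        return self._peek() is None

    def steps(self):
        """Space-separated steps; returns one boolean per step."""
        results = [self._alternatives()]
        while self._peek() == " ":
            self._take()
            results.append(self._alternatives())
        return results

    def _alternatives(self):
        ok = self._complex()
        while self._peek() == ",":
            self._take()
            nxt = self._complex()  # always parse so tokens are consumed
            ok = ok or nxt
        return ok

    def _complex(self):
        ok = self._term()
        while self._peek() == "+":
            self._take()
            nxt = self._term()
            ok = ok and nxt
        return ok

    def _term(self):
        tok = self._take()
        if tok == "(":
            ok = all(self.steps())  # every step inside a group is required
            if self._take() != ")":
                raise ValueError("missing ')'")
            return ok
        if tok is not None and tok.startswith("K"):
            return tok in self.present
        raise ValueError(f"unexpected token {tok!r}")


def module_completeness(definition, present_kos):
    """Return (complete_steps, total_steps) for a module definition."""
    ev = _Evaluator(tokenize(definition), set(present_kos))
    results = ev.steps()
    if not ev.at_end():
        raise ValueError("trailing tokens in definition")
    return sum(results), len(results)
```

With the definition `"K00844 (K01810,K06859) K00850+K16370 K01623"` and a genome containing K00844, K06859 and K00850, the call returns `(2, 4)`: the alternative K06859 satisfies the second step, but the third step lacks the K16370 subunit and the fourth step is missing entirely.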

3.
Insights from human/mouse genome comparisons (cited 4 times: 0 self-citations, 4 by others)
Large-scale public genomic sequencing efforts have provided a wealth of vertebrate sequence data poised to provide insights into mammalian biology. These include deep genomic sequence coverage of human, mouse, rat, zebrafish, and two pufferfish (Fugu rubripes and Tetraodon nigroviridis) (Aparicio et al. 2002; Lander et al. 2001; Venter et al. 2001; Waterston et al. 2002). In addition, high priority has been placed on determining the genomic sequences of chimpanzee, dog, cow, frog, and chicken (Boguski 2002). Although only recently available, whole-genome sequence data have provided a unique opportunity to compare complete genome contents globally. Furthermore, the shared evolutionary ancestry of vertebrate species has allowed the development of comparative genomic approaches to identify ancient conserved sequences with functionality. Accordingly, this review focuses on the initial comparison of available mammalian genomes and describes various insights derived from such analysis.

4.
The extensive effort of the International Rice Genome Sequencing Project (IRGSP) has resulted in rapid accumulation of genome sequence, and more than 137 Mb had been made publicly available as of August 2001. This volume of data requires a high-throughput annotation scheme to extract biologically useful and timely information from the sequence on a regular basis. A new automated annotation system and database, the Rice Genome Automated Annotation System (RiceGAAS), has been developed to perform reliable, up-to-date analysis of the genome sequence and to store and retrieve the annotation results. The system has the following functional features: (i) collection of rice genome sequences from GenBank; (ii) execution of gene prediction and homology search programs; (iii) integration of results from various analyses and automatic interpretation of coding regions; (iv) re-execution of analysis, integration and automatic interpretation with the latest entries in reference databases; (v) integrated visualization of the stored data using a web-based graphical view. RiceGAAS also has a data submission mechanism that allows public users to perform fully automated annotation of their own sequences. The system can be accessed at http://RiceGAAS.dna.affrc.go.jp/.

5.
Jonesia denitrificans (Prevot 1961) Rocourt et al. 1987 is the type species of the genus Jonesia, and is of phylogenetic interest because of its isolated location in the actinobacterial suborder Micrococcineae. J. denitrificans is characterized by a typical coryneform morphology and is able to form irregular nonsporulating rods showing branched and club-like forms. Coccoid cells occur in older cultures. J. denitrificans is classified as a pathogenic organism for animals (vertebrates). The type strain whose genome is described here was originally isolated from cooked ox blood. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of a member of the genus Jonesia. The 2,749,646 bp long genome with its 2558 protein-coding and 71 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

6.
Applications of InterPro in protein annotation and genome analysis (cited 2 times: 0 self-citations, 2 by others)
The applications of InterPro span a range of biologically important areas that includes automatic annotation of protein sequences and genome analysis. In automatic annotation of protein sequences InterPro has been utilised to provide reliable characterisation of sequences, identifying them as candidates for functional annotation. Rules based on the InterPro characterisation are stored and operated through a database called RuleBase. RuleBase is used as the main tool in the sequence database group at the EBI to apply automatic annotation to unknown sequences. The annotated sequences are stored and distributed in the TrEMBL protein sequence database. InterPro also provides a means to carry out statistical and comparative analyses of whole genomes. In the Proteome Analysis Database, InterPro analyses have been combined with other analyses based on CluSTr, the Gene Ontology (GO) and structural information on the proteins.
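RuleBase's actual rule format is not described here; as an assumption-laden sketch, a rule can be modelled as a set of InterPro signatures that must all be present plus a set that must be absent, with the annotation applied when both conditions hold. The accessions `IPR_A`, `IPR_B`, `IPR_C` and the annotation strings are placeholders, not real InterPro entries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical annotation rule: assign `annotation` when every required
    signature is present and no excluded signature is."""
    required: frozenset
    excluded: frozenset
    annotation: str


def annotate(protein_signatures, rules):
    """Return the annotations of all rules whose conditions the protein satisfies."""
    hits = set(protein_signatures)
    return [r.annotation for r in rules
            if r.required <= hits and not (r.excluded & hits)]


RULES = [
    Rule(frozenset({"IPR_A", "IPR_B"}), frozenset(), "Kinase X"),
    Rule(frozenset({"IPR_A"}), frozenset({"IPR_C"}), "Family A member"),
]
```

Here `annotate({"IPR_A", "IPR_B"}, RULES)` yields both annotations, while a protein carrying `IPR_A` and the excluded `IPR_C` receives none; the exclusion set is one simple way to keep a broad family rule from firing on a distinct subfamily.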

7.
Discovering and detecting transposable elements in genome sequences (cited 2 times: 0 self-citations, 2 by others)
The contribution of transposable elements (TEs) to genome structure and evolution as well as their impact on genome sequencing, assembly, annotation and alignment has generated increasing interest in developing new methods for their computational analysis. Here we review the diversity of innovative approaches to identify and annotate TEs in the post-genomic era, covering both the discovery of new TE families and the detection of individual TE copies in genome sequences. These approaches span a broad spectrum in computational biology including de novo, homology-based, structure-based and comparative genomic methods. We conclude that the integration and visualization of multiple approaches and the development of new conceptual representations for TE annotation will further advance the computational analysis of this dynamic component of the genome.
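As one illustration of the de novo category, and not any specific published tool, repeat discovery can start by finding short words that occur far more often than expected. The Python sketch below counts strand-independent (canonical) k-mers and reports the frequent ones; the function names and thresholds are hypothetical, and a real pipeline would extend such seeds into full consensus sequences for each TE family.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Lexicographically smaller of a k-mer and its reverse complement,
    so copies on either strand are counted together."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def frequent_kmers(seq, k, min_count):
    """Count canonical k-mers and report those seen at least min_count
    times, most frequent first."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):  # skip windows with N or other ambiguity codes
            counts[canonical(kmer)] += 1
    return [(kmer, n) for kmer, n in counts.most_common() if n >= min_count]
```

On a toy sequence holding three copies of the 12-mer `ACGTAGCTTGCA` separated by unrelated spacers, `frequent_kmers(seq, 12, 3)` reports that 12-mer with a count of 3. Real genomes need much larger k and thresholds calibrated against the expected background frequency.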

8.
Anaerococcus prevotii (Foubert and Douglas 1948) Ezaki et al. 2001 is the type species of the genus, and is of phylogenetic interest because of its arguable assignment to the provisionally arranged family 'Peptostreptococcaceae'. A. prevotii is an obligately anaerobic coccus, usually arranged in clumps or tetrads. The strain whose genome is described here was originally isolated from human plasma; other strains of the species have also been isolated from clinical specimens. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of a member of the genus. After Finegoldia magna, A. prevotii is only the second species from the family 'Peptostreptococcaceae' for which a complete genome sequence has been described. The 1,998,633 bp long genome (chromosome and one plasmid) with its 1852 protein-coding and 61 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

9.
Saccharomonospora viridis (Schuurmans et al. 1956) Nonomura and Ohara 1971 is the type species of the genus Saccharomonospora, which belongs to the family Pseudonocardiaceae. S. viridis is of interest because it is a Gram-negative organism classified among the usually Gram-positive actinomycetes. Members of the species are frequently found in hot compost and hay, and its spores can cause farmer's lung disease, bagassosis, and humidifier fever. Strains of the species S. viridis have been found to metabolize the xenobiotic pentachlorophenol (PCP). The strain described in this study was isolated from a peat bog in Ireland. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first complete genome sequence of the family Pseudonocardiaceae, and the 4,308,349 bp long single-replicon genome with its 3906 protein-coding and 64 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

10.
Nakamurella multipartita (Yoshimi et al. 1996) Tao et al. 2004 is the type species of the monospecific genus Nakamurella in the actinobacterial suborder Frankineae. The nonmotile, coccus-shaped strain was isolated from activated sludge acclimated with sugar-containing synthetic wastewater, and is capable of accumulating large amounts of polysaccharides in its cells. Here we describe the features of the organism, together with the complete genome sequence and annotation. This is the first complete genome sequence of a member of the family Nakamurellaceae. The 6,060,298 bp long single replicon genome with its 5415 protein-coding and 56 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

11.
12.
Thermanaerovibrio acidaminovorans (Guangsheng et al. 1997) Baena et al. 1999 is the type species of the genus Thermanaerovibrio and is of phylogenetic interest because of the very isolated location of the novel phylum Synergistetes. T. acidaminovorans Su883(T) is a Gram-negative, motile, non-spore-forming bacterium isolated from an anaerobic reactor of a sugar refinery in The Netherlands. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence from a member of the phylum Synergistetes. The 1,848,474 bp long single-replicon genome with its 1765 protein-coding and 60 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

13.
Eggerthella lenta (Eggerth 1935) Wade et al. 1999, emended Würdemann et al. 2009 is the type species of the genus Eggerthella, which belongs to the actinobacterial family Coriobacteriaceae. E. lenta is a Gram-positive, non-motile, non-sporulating pathogenic bacterium that can cause severe bacteremia. The strain described in this study was isolated from a rectal tumor in 1935. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first complete genome sequence of the genus Eggerthella, and the 3,632,260 bp long single-replicon genome with its 3123 protein-coding and 58 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

14.
With increasing quantities of Brassica genomic data being entered into the public domain, and in preparation for the complete Brassica genome sequencing effort, there is a growing requirement for the structuring and detailed bioinformatic analysis of Brassica genomic information within a user-friendly database. At the Plant Biotechnology Centre, Melbourne, Australia, we have developed a series of tools and computational pipelines to assist in the processing and structuring of genomic data and to aid its application to agricultural biotechnology research. These tools include a sequence database (ASTRA); a sequence processing pipeline incorporating annotation against GenBank, SwissProt and Arabidopsis Gene Ontology (GO) data; and tools for molecular marker discovery and comparative genome analysis. All sequences are mined for simple sequence repeat (SSR) molecular markers using 'SSR primer' and mapped onto the complete Arabidopsis thaliana genome by sequence comparison. The database may be queried using a text-based search of sequence annotation or GO terms, BLAST comparison against resident sequences, or by the position of candidate orthologues within the Arabidopsis genome. Tools have also been developed and applied to the discovery of single nucleotide polymorphism (SNP) molecular markers and the in silico mapping of Brassica BAC end sequences onto the Arabidopsis genome. Planned extensions to this resource include the integration of gene expression data and the development of an EnsEMBL-based genome viewer.

15.
Arcobacter nitrofigilis (McClung et al. 1983) Vandamme et al. 1991 is the type species of the genus Arcobacter in the family Campylobacteraceae within the Epsilonproteobacteria. The species was first described in 1983 as Campylobacter nitrofigilis [1] after its detection as a free-living, nitrogen-fixing Campylobacter species associated with Spartina alterniflora Loisel roots [2]. It is of phylogenetic interest because of its lifestyle as a symbiotic organism in a marine environment, in contrast to many other Arcobacter species, which are associated with warm-blooded animals and tend to be pathogenic. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first complete genome sequence of a type strain of the genus Arcobacter. The 3,192,235 bp genome with its 3,154 protein-coding and 70 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

16.
By incorporating annotation information into the analysis of next-generation sequencing DNA methylation data, we provide an improvement in performance over current testing procedures. Methylation analysis using genome information (MAGI) is applicable for both unreplicated and replicated data, and provides an effective analysis for studies with low sequencing depth. When compared with current tests, the annotation-informed tests provide an increase in statistical power and offer a significance-based interpretation of differential methylation.
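The abstract does not specify MAGI's statistical tests, so the sketch below swaps in a plain, well-known technique to illustrate the annotation-informed idea: read counts from CpG sites in the same annotated region (for example, a CpG island or promoter) are pooled before a two-sided Fisher's exact test compares methylated and unmethylated reads between two samples. Pooling is what lets low-depth sites contribute evidence jointly. The input layout and function names are hypothetical.

```python
from collections import defaultdict
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    observed = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Sum all tables no more probable than the observed one (small float tolerance).
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= observed * (1 + 1e-7))
    return min(1.0, total)


def region_tests(sites):
    """sites: iterable of (region_id, meth_1, unmeth_1, meth_2, unmeth_2), one per CpG.
    Pools read counts within each annotated region, then tests each region."""
    pooled = defaultdict(lambda: [0, 0, 0, 0])
    for region, *counts in sites:
        for j, value in enumerate(counts):
            pooled[region][j] += value
    return {region: fisher_exact_two_sided(*t) for region, t in pooled.items()}
```

For instance, two low-depth sites in region `r1` with counts `(4, 0, 0, 5)` and `(6, 0, 0, 5)` pool to the table [[10, 0], [0, 10]], which gives p = 2 / C(20, 10), roughly 1.1 × 10^-5. A real analysis would also correct for multiple testing across regions.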

17.
Halomicrobium mukohataei (Ihara et al. 1997) Oren et al. 2002 is the type species of the genus Halomicrobium. It is of phylogenetic interest because of its isolated location within the large euryarchaeal family Halobacteriaceae. H. mukohataei is an extreme halophile that grows essentially aerobically, but can also grow anaerobically, with a change in morphology, using nitrate as electron acceptor. The strain whose genome is described in this report is a free-living, motile, Gram-negative euryarchaeon, originally isolated from Salinas Grandes in Jujuy, Andes highlands, Argentina. Its genome contains three genes for the 16S rRNA that differ from each other by up to 9%. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence from the poorly populated genus Halomicrobium, and the 3,332,349 bp long genome (chromosome and one plasmid) with its 3416 protein-coding and 56 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

18.
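Genome annotation is the process of identifying the functional elements in a genome sequence. It directly assigns biological meaning to the sequence and thereby makes it easier for researchers to explore and analyze genome function. Genome annotation helps us understand a genome at three levels. The first is annotation at the nucleotide level, which mainly determines the physical locations of elements such as genes, RNAs and repeat sequences in the DNA sequence, including specific positional information such as transcription start sites, translation start sites and exon boundaries. Annotation can also reveal variants in ...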
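As a small illustration of nucleotide-level annotation of translation start positions, here is a hedged Python sketch that locates forward-strand open reading frames from an ATG start codon to the first in-frame stop codon. It is not a gene predictor: the reverse strand, introns and alternative start codons are ignored, and the `min_codons` cutoff is an arbitrary assumption.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=30):
    """Forward-strand ORFs from the first ATG to the next in-frame stop codon.
    Returns sorted (start, end) pairs: 0-based, half-open, stop codon included."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # candidate translation start
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ATG in this frame
    return sorted(orfs)
```

For example, `find_orfs("CCATGAAAAAAAAATAAGG", min_codons=2)` returns `[(2, 17)]`: an ATG at position 2, three AAA codons, and a TAA stop ending at position 17.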

19.
The availability of the genome sequences of human and mouse, human sequence variation data and other large genetic data sets will lead to a revolution in our understanding of the human machine and the treatment of its diseases. The success of the international genome sequencing consortia shows what can be achieved by well-coordinated, large-scale public domain projects and the benefits of data access for all. It is already clear that the availability of this sequence is having a huge impact on research worldwide. Complete genome sequences provide a framework to pull all biological data together, so that each piece has the potential to say something about biology as a whole. Biology is too complex for any organisation to have a monopoly of ideas or data, so research institutes around the world can contribute to the collection and analysis of, and access to, these data. However, although all these data can be made accessible to everyone through the internet, the more organisations provide data or analyses separately, the harder it becomes for anyone to collect and integrate the results. To address these data-integration problems, open standards for biological data exchange, such as the 'Distributed Annotation System' (DAS), are being developed, and bioinformatics (Dowell et al., 2001) as a whole is now being strongly driven by the open source software (OSS) model for collaborative software development (Hubbard and Birney, 1999). The leading provider of human genome annotation, the Ensembl project (http://www.ensembl.org), is entirely an OSS project and has been widely adopted by academic and commercial organisations alike (Hubbard et al., 2002). Accurate automatic annotation of features such as genes in vertebrate genomes currently relies on supporting evidence in the form of homologies to mRNAs, ESTs or proteins.
However, it appears that sufficient high quality experimentally curated annotation now exists to be used as a substrate for machine learning algorithms to create effective models of biological signal sequences (Down and Hubbard, 2002). Is there hope for ab initio prediction methods after all?
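The "models of biological signal sequences" mentioned above can be illustrated with the simplest such model, a log-odds position weight matrix. This is a swapped-in technique, far simpler than the machine learning methods the text alludes to, and not the method of Down and Hubbard (2002). The Python sketch assumes equal-length, aligned training sites over A/C/G/T, a uniform background and a pseudocount of 1; the example donor-like sites are made up.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log2-odds position weight matrix from equal-length aligned sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    pwm = []
    for pos in range(length):
        column = [s[pos].upper() for s in sites]
        total = len(column) + 4 * pseudocount
        pwm.append({b: math.log2((column.count(b) + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def score(pwm, window):
    """Sum of per-position log-odds; assumes the window contains only A/C/G/T."""
    return sum(col[base] for col, base in zip(pwm, window.upper()))


def scan(pwm, seq, threshold):
    """Return (position, score) for every window scoring at least threshold."""
    w = len(pwm)
    hits = []
    for i in range(len(seq) - w + 1):
        s = score(pwm, seq[i:i + w])
        if s >= threshold:
            hits.append((i, s))
    return hits


SITES = ["GTAAGT", "GTGAGT", "GTAAGA", "GTAAGT"]  # made-up training sites
```

Trained on `SITES`, `scan` with a threshold equal to the consensus score reports only the window at position 4 of `CCCCGTAAGTCCCC`. The pseudocount keeps unseen bases from receiving a log of zero.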

20.
Gap junctions mediate direct intercellular communication: two hemichannels in adjacent cells dock, thereby forming conduits between the cytoplasmic compartments of the two cells. Connexin genes code for the subunit proteins of gap junction channels and are members of large gene families in mammals. So far, 17 connexin (Cx) genes have been described and characterized in the murine genome. For most of them, orthologues in the human genome have been found (see White and Paul 1999; Manthey et al. 1999; Teubner et al. 2001; Söhl et al. 2001). We have recently searched for connexin genes in the murine and human gene libraries available at EMBL/Heidelberg, NCBI and the Celera company, which increased the number of identified connexins to 19 in mouse and 20 in humans. For one mouse connexin gene and two human connexin genes, we did not find orthologues in the other genome. Here we present a short overview of the distinct connexin genes that we found in the mouse and human genomes, which may comprise all members of this gene family unless further connexin genes are discovered in the remaining unsequenced parts (about 1-5%) of the genomes.
