Similar articles
1.
2.
The 2694 ORFs originally annotated as potential genes in the genome of Aeropyrum pernix can be grouped into three clusters (A, B, C) according to their nucleotide composition at the three codon positions. Coding potential was found to account for these three clusters in a 9-dimensional space derived from the nucleotide composition of the ORFs: ORFs in cluster A are coding, whereas those in clusters B and C are non-coding. Based on this clustering, a "codingness" index called the AZ score is defined and used to recognize protein-coding genes in the A. pernix genome: ORFs with AZ > 0 are classified as coding, and those with AZ < 0 as non-coding. Of the 632 ORFs assigned putative functions in the original annotation, 620 fall in cluster A and have positive AZ scores. In addition, all 29 ORFs encoding putative or conserved proteins that were newly added in the RefSeq annotation also have positive AZ scores. Accordingly, the number of re-recognized protein-coding genes in the A. pernix genome is 1610, substantially fewer than the 2694 in the original annotation and also far fewer than the 1841 in the RefSeq annotation curated by NCBI staff. Annotation information for the re-recognized genes and their AZ scores is available at: http://tubic.tju.edu.cn/Aper/.
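The "9-dimensional space derived from the nucleotide composition of ORFs" can be illustrated with a minimal sketch. It assumes, since the abstract does not say, that the nine coordinates are Z-curve-style components (x, y, z) of the base frequencies at each of the three codon positions; with four base frequencies summing to 1 at each position, three independent numbers per position give nine in total. The function name `z_curve_9` is hypothetical, and the AZ score itself is not reproduced here.

```python
from collections import Counter


def z_curve_9(orf: str) -> list[float]:
    """Return 9 Z-curve-style features for an ORF: (x, y, z) at codon positions 1, 2, 3.

    Assumed per-position definitions, using base frequencies a, c, g, t:
      x = (a + g) - (c + t)   purine vs. pyrimidine
      y = (a + c) - (g + t)   amino vs. keto
      z = (a + t) - (g + c)   weak vs. strong hydrogen bonding
    """
    seq = orf.upper()
    n_codons = len(seq) // 3
    if n_codons == 0:
        raise ValueError("ORF must contain at least one complete codon")
    features: list[float] = []
    for pos in range(3):
        # Bases at this codon position, counted over complete codons only
        counts = Counter(seq[i] for i in range(pos, n_codons * 3, 3))
        a, c, g, t = (counts[base] / n_codons for base in "ACGT")
        features += [(a + g) - (c + t), (a + c) - (g + t), (a + t) - (g + c)]
    return features
```

For example, `z_curve_9("ATGAAATAA")` returns one 9-element vector per ORF; a clustering or a sign rule such as the abstract's AZ > 0 would then operate on vectors of this kind.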

3.
Abstract

In this paper, we re-annotated the genome of Pyrobaculum aerophilum str. IM2, focusing on hypothetical ORFs. The annotation process comprised three parts. First, and most importantly, 23 new genes that were missed in the original annotation were found by combining similarity searches with ab initio gene-finding approaches. Of these new genes, five are significantly similar to genes of known function, and the remaining 18 are significantly similar to hypothetical ORFs in other genomes. Second, the coding potential of the 1645 hypothetical ORFs was re-predicted using 33 Z curve variables combined with Fisher's linear discriminant. With an accuracy of 99.68%, our method classifies 25 of the originally annotated hypothetical ORFs as non-coding. Third, 80 hypothetical ORFs were assigned potential functions through BLAST similarity searches. This re-annotation will benefit further research on this hyperthermophilic crenarchaeon, and the procedure could serve as a reference for re-annotating other archaeal genomes. Details of the revised annotation are freely available at http://cobi.uestc.edu.cn/resource/paero/
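The second step pairs Z curve variables with Fisher's linear discriminant. The following is a minimal two-class sketch of that technique, not the authors' implementation: it assumes the feature vectors (for instance the 33 Z curve variables) are already computed, places the decision threshold at the midpoint of the two projected class means, and uses hypothetical function names. It is written in pure Python so it runs without NumPy.

```python
def mean_vector(vectors: list[list[float]]) -> list[float]:
    n = len(vectors)
    return [sum(v[j] for v in vectors) / n for j in range(len(vectors[0]))]


def scatter_matrix(vectors, m):
    """Sum of outer products (v - m)(v - m)^T over one class."""
    d = len(m)
    s = [[0.0] * d for _ in range(d)]
    for v in vectors:
        diff = [v[j] - m[j] for j in range(d)]
        for i in range(d):
            for j in range(d):
                s[i][j] += diff[i] * diff[j]
    return s


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("within-class scatter matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (aug[i][n] - sum(aug[i][k] * x[k] for k in range(i + 1, n))) / aug[i][i]
    return x


def fisher_lda(coding, noncoding):
    """Train a two-class Fisher discriminant; return a classifier v -> True if coding."""
    m1, m0 = mean_vector(coding), mean_vector(noncoding)
    s1, s0 = scatter_matrix(coding, m1), scatter_matrix(noncoding, m0)
    d = len(m1)
    s_w = [[s1[i][j] + s0[i][j] for j in range(d)] for i in range(d)]
    # Fisher direction: w = S_w^{-1} (m1 - m0)
    w = solve(s_w, [m1[j] - m0[j] for j in range(d)])

    def project(v):
        return sum(wj * vj for wj, vj in zip(w, v))

    # Assumed threshold: midpoint of the two projected class means
    threshold = (project(m1) + project(m0)) / 2

    def is_coding(v):
        return project(v) > threshold

    return is_coding
```

Because `w` is S_w^{-1}(m1 - m0) and S_w is positive definite for non-degenerate data, the coding-class mean always projects above the non-coding mean, so "greater than threshold" consistently means "coding".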

4.
The annotation of the well-studied organism Saccharomyces cerevisiae has improved over the past decade, but debate continues over the number of biologically significant open reading frames (ORFs) in the yeast genome. We revisited the total count of protein-coding genes in the S. cerevisiae S288c genome using a theoretical approach that combines a Support Vector Machine (SVM) with six widely used measures of sequence statistical features. The accuracy of our method exceeds 99.5% in 10-fold cross-validation. Using the annotation data in the Saccharomyces Genome Database (SGD), we assessed the coding capacity of all 1744 ORFs that lack experimental evidence and, after removing 488 spurious ORFs, estimate that the yeast genome contains 6091 protein-coding chromosomal ORFs. This work is significant in at least two respects. First, cross-validation and retrospective examination showed the fidelity of our method in recognizing ORFs that are likely to encode proteins. Second, we provide a web service at http://cobi.uestc.edu.cn/services/yeast/ that predicts protein-coding ORFs in the genus Saccharomyces with high accuracy.
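The reported accuracy comes from 10-fold cross-validation. The sketch below shows what that protocol computes, under stated assumptions: the split is random and unstratified, and the classifier is pluggable through a hypothetical `train(xs, ys)` interface that returns a `predict(x)` function. It does not implement the SVM or the six sequence features used in the study.

```python
import random
from typing import Any, Callable, Sequence

# Hypothetical interface: train(xs, ys) returns a predict(x) -> label function.
Trainer = Callable[[list, list], Callable[[Any], Any]]


def k_fold_accuracy(samples: Sequence, labels: Sequence, train: Trainer,
                    k: int = 10, seed: int = 0) -> float:
    """Mean held-out accuracy over k folds (random, unstratified split)."""
    if len(samples) != len(labels):
        raise ValueError("samples and labels differ in length")
    if len(samples) < k:
        raise ValueError("need at least k samples")
    order = list(range(len(samples)))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]  # k disjoint, near-equal folds
    accuracies = []
    for fold in folds:
        held_out = set(fold)
        train_idx = [i for i in order if i not in held_out]
        # Train only on the other k - 1 folds, then score the held-out fold
        predict = train([samples[i] for i in train_idx], [labels[i] for i in train_idx])
        correct = sum(predict(samples[i]) == labels[i] for i in fold)
        accuracies.append(correct / len(fold))
    return sum(accuracies) / k
```

Each sample is scored exactly once, by a model that never saw it during training, which is what makes cross-validated accuracy a fairer estimate than accuracy on the training set.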

5.
6.
We report the properties of a draft genome sequence of the bacterium Anaerococcus vaginalis strain PH9, a member of the genus Anaerococcus. This strain, whose genome is described here, was isolated from the fecal flora of a 26-year-old woman with morbid obesity. A. vaginalis is an obligately anaerobic coccus. Here we describe the features of this organism, together with the genome sequence and annotation. The 2,048,125-bp genome (one chromosome, no plasmid) contains 2,095 protein-coding and 38 RNA genes, including three rRNA genes.
Keywords: Anaerococcus vaginalis, genome

7.
Alistipes senegalensis strain JC50T is the type strain of A. senegalensis sp. nov., a new species within the genus Alistipes. This strain, whose genome is described here, was isolated from the fecal flora of an asymptomatic patient. A. senegalensis is an anaerobic, Gram-negative, rod-shaped bacterium. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 4,017,609-bp genome (one chromosome, no plasmid) contains 3,113 protein-coding and 50 RNA genes, including 5 rRNA genes.

8.
A common practice in computational genomic analysis is to use a set of ‘background’ sequences as negative controls for evaluating the false-positive rates of prediction tools, such as gene identification programs and algorithms for detection of cis-regulatory elements. Such ‘background’ sequences are generally taken from regions of the genome presumed to be intergenic, or generated synthetically by ‘shuffling’ real sequences. This last method can lead to underestimation of false-positive rates. We developed a new method for generating artificial sequences that are modeled after real intergenic sequences in terms of composition, complexity and interspersed repeat content. These artificial sequences can serve as an inexhaustible source of high-quality negative controls. We used artificial sequences to evaluate the false-positive rates of a set of programs for detecting interspersed repeats, ab initio prediction of coding genes, transcribed regions and non-coding genes. We found that RepeatMasker is more accurate than PClouds, Augustus has the lowest false-positive rate of the coding gene prediction programs tested, and Infernal has a low false-positive rate for non-coding gene detection. A web service, source code and the models for human and many other species are freely available at http://repeatmasker.org/garlic/.
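The contrast between shuffling and model-based artificial sequences can be sketched with an order-k Markov chain trained on a real intergenic sequence. A plain shuffle preserves only single-base composition, whereas the chain below also reproduces the k-mer context statistics of its training sequence. This is a simplified illustration, not the GARLIC method: GARLIC additionally models sequence complexity and interspersed repeat content, which are omitted here, and the function names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def train_markov(seq: str, k: int) -> dict[str, Counter]:
    """Count transitions from each k-mer context to the next base."""
    if k < 1:
        raise ValueError("k must be at least 1")
    model: dict[str, Counter] = defaultdict(Counter)
    for i in range(len(seq) - k):
        model[seq[i:i + k]][seq[i + k]] += 1
    return dict(model)


def generate(model: dict[str, Counter], k: int, length: int, seed: int = 0) -> str:
    """Sample an artificial sequence of the given length from an order-k Markov model."""
    if not model:
        raise ValueError("model is empty; training sequence shorter than k + 1")
    rng = random.Random(seed)
    contexts = list(model)
    out = rng.choice(contexts)  # start from an observed context
    while len(out) < length:
        successors = model.get(out[-k:])
        if not successors:
            # Context never followed by a base in training: jump to a random observed context
            out += rng.choice(contexts)
            continue
        bases, weights = zip(*successors.items())
        out += rng.choices(bases, weights=weights)[0]
    return out[:length]
```

With a realistic training set (megabases of intergenic sequence) and a moderate k, the generated sequences match local composition far better than a shuffle does, which is the property the abstract argues matters for estimating false-positive rates.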

9.
Actinoplanes missouriensis Couch 1963 is a well-characterized member of the genus Actinoplanes, which is of morphological interest because its members typically produce sporangia containing motile spores. The sporangiospores are motile by means of flagella and exhibit chemotactic properties. It is of further interest that members of Actinoplanes are prolific sources of novel antibiotics, enzymes, and other bioactive compounds. Here, we describe the features of A. missouriensis 431T, together with the complete genome sequence and annotation. The 8,773,466 bp genome contains 8,125 protein-coding and 79 RNA genes.

10.
Brevibacterium senegalense strain JC43T is the type strain of Brevibacterium senegalense sp. nov., a new species within the genus Brevibacterium. This strain, whose genome is described here, was isolated from the fecal flora of a healthy Senegalese patient. B. senegalense is an aerobic, rod-shaped, Gram-positive bacterium. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 3,425,960-bp genome (one chromosome, no plasmid) contains 3,064 protein-coding and 49 RNA genes.

11.
Aeromicrobium massiliense strain JC14T sp. nov. is the type strain of Aeromicrobium massiliense sp. nov., a new species within the genus Aeromicrobium. This strain, whose genome is described here, was isolated from the fecal microbiota of an asymptomatic patient. Aeromicrobium massiliense is an aerobic, rod-shaped, Gram-positive bacterium. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 3,322,119-bp genome contains 3,296 protein-coding and 51 RNA genes.

12.
Corynebacterium pseudotuberculosis strain P54B96, a nonmotile, non-sporulating, mesophilic member of the Actinobacteria, was isolated from liver, lung and mediastinal lymph node lesions in an antelope from South Africa. This strain is of interest because it was found together with non-tuberculous mycobacteria (NTMs), which could also play a role in lesion formation. In this work, we describe a set of features of C. pseudotuberculosis P54B96, together with the details of the complete genome sequence and annotation. The genome is a single circular chromosome of 2.34 Mbp with 2,084 protein-coding genes, 12 rRNA genes, 49 tRNA genes and 62 pseudogenes, and a G+C content of 52.19%. Analysis of the genome sequence provides a means to better understand the molecular and genetic basis of virulence of this bacterium, enabling a detailed investigation of its pathogenesis.
Keywords: biovar ovis, Gram-positive pathogen, caseous lymphadenitis/cheesy gland disease, liver lesion, Antelope, genome sequencing, Ion Torrent

13.
Gluconobacter thailandicus strain NBRC 3257, isolated from downy cherry (Prunus tomentosa), is a strictly aerobic, rod-shaped, Gram-negative bacterium. Here, we report the features of this organism, together with the draft genome sequence and annotation. The draft genome consists of 107 contigs totaling 3,446,046 bp, with a G+C content of 56.17%, and contains 3,360 protein-coding genes and 54 RNA genes.

14.
Pyrobaculum oguniense TE7 is an aerobic hyperthermophilic crenarchaeon isolated from a hot spring in Japan. Here we describe its main chromosome of 2,436,033 bp, which has three large-scale inversions, and an extra-chromosomal element of 16,887 bp. We have annotated 2,800 protein-coding genes and 145 RNA genes in this genome, including nine H/ACA-like small RNA, 83 predicted C/D box small RNA, and 47 transfer RNA genes. Comparative analyses with the closest known relative, the anaerobe Pyrobaculum arsenaticum from Italy, reveal unexpectedly high synteny and nucleotide identity between these two geographically distant species. Deep sequencing of a mixture of genomic DNA from multiple cells has illuminated some of the genome dynamics potentially shared with other species in this genus.

15.
16.
Desulfotomaculum ruminis Campbell and Postgate 1965 is a member of the large genus Desulfotomaculum, which contains 30 species and belongs to the family Peptococcaceae. This species is of interest because it is one of the few sulfate-reducing bacteria that have been isolated from the rumen. Here we describe the features of D. ruminis together with the complete genome sequence and annotation. The 3,969,014-bp chromosome, with a total of 3,901 protein-coding and 85 RNA genes, is the second completed genome sequence of a type strain of the genus Desulfotomaculum to be published, and was sequenced as part of the DOE Joint Genome Institute Community Sequencing Program 2009.
Keywords: anaerobic, motile, sporulating, mesophilic, sulfate-reducer, hydrogen sulfide, incomplete oxidizer, mixotrophic, CSP 2009, Peptococcaceae, Clostridiales

17.

Background

Spirodela polyrhiza is a species of the order Alismatales, which represents the basal lineage of monocots and has more ancestral features than the Poales. The complete sequence of its mitochondrial (mt) genome could provide clues to the evolution of plant mt genomes.

Methods

The Spirodela polyrhiza mt genome was sequenced on the SOLiD platform from total genomic DNA, without physical separation of chloroplast and nuclear DNA. Using a genome-copy-number-sensitive assembly algorithm, the mt genome was successfully assembled. Gaps were closed, and accuracy was verified, using PCR products sequenced by the dideoxy method.

Conclusions

At 228,493 bp, this is the most compact monocot mitochondrial genome. A total of 57 genes encode 35 known proteins, 3 ribosomal RNAs, and 19 tRNAs that recognize 15 amino acids. About 600 RNA editing sites are predicted, and there are three lineage-specific losses of protein-coding genes. The mitochondrial genes, pseudogenes, and other hypothetical genes (ORFs) cover 71,783 bp (31.0%) of the genome. Imported plastid DNA accounts for an additional 9,295 bp (4.1%) of the mitochondrial DNA. The absence of transposable element sequences suggests that very few nuclear sequences have migrated into Spirodela mtDNA. Phylogenetic analysis of conserved protein-coding genes suggests that Spirodela shares a common ancestor with other monocots, but there is no obvious synteny between the Spirodela and rice mtDNAs. After excluding genes, introns, ORFs, and plastid-derived DNA, nearly four-fifths of the Spirodela mitochondrial genome is of unknown origin and function. Although its chloroplast DNA content and range of RNA editing are similar to those of other monocots, it lacks nuclear insertions and active gene loss, and its non-coding regions comprise large stretches of sequence of unknown origin. Moreover, the lack of synteny with known mitochondrial genomic sequences sheds new light on the early evolution of monocot mitochondrial genomes.

18.
19.
Senegalemassilia anaerobia strain JC110T sp. nov. is the type strain of Senegalemassilia anaerobia gen. nov., sp. nov., the type species of a new genus within the family Coriobacteriaceae, Senegalemassilia gen. nov. This strain, whose genome is described here, was isolated from the fecal flora of a healthy Senegalese patient. S. anaerobia is a Gram-positive anaerobic coccobacillus. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 2,383,131-bp genome contains 1,932 protein-coding and 58 RNA genes.

20.
The complete mitochondrial genome of the Zhikong scallop Chlamys farreri is 21,695 bp long and contains 12 protein-coding genes (the atp8 gene is absent, as in most bivalves), 2 ribosomal RNA genes, and 22 transfer RNA genes. The heavy strand has an overall A+T content of 58.7%. The GC and AT skews of the C. farreri mt genome are 0.337 and −0.184, respectively, indicating a bias against C and A. The mitochondrial gene order of C. farreri differs drastically from that of the scallops Argopecten irradians, Mimachlamys nobilis and Placopecten magellanicus, which belong to the same family, Pectinidae. The C. farreri mitogenome contains 6623 bp of intergenic non-coding sequence, including a large continuous stretch (4763 bp) between tRNA-Val and tRNA-Asn. Two repeat families are found in this large continuous sequence, which seems to be a common feature of scallops. Phylogenetic analysis based on the concatenated amino acid sequences of the 12 protein-coding genes supports the monophyly of Pectinidae and the paraphyly of Pteriomorphia with respect to Heteroconchia.
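The skew values follow the standard definitions AT skew = (A − T)/(A + T) and GC skew = (G − C)/(G + C), computed on one strand. A positive GC skew means G exceeds C, and a negative AT skew means T exceeds A, which is what "a bias against C and A" expresses. The minimal sketch below assumes a single-strand input sequence; the function name is hypothetical.

```python
def composition_skews(seq: str) -> tuple[float, float]:
    """Return (AT skew, GC skew) = ((A - T)/(A + T), (G - C)/(G + C)) for one strand."""
    s = seq.upper()
    a, t, g, c = (s.count(base) for base in "ATGC")
    if a + t == 0 or g + c == 0:
        raise ValueError("skew undefined: sequence lacks A/T or G/C")
    return (a - t) / (a + t), (g - c) / (g + c)
```

For instance, `composition_skews("AATTTGGGC")` gives an AT skew of (2 − 3)/5 = −0.2 and a GC skew of (3 − 1)/4 = 0.5, the same sign pattern the abstract reports for C. farreri.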
