Similar literature
20 similar documents found (search time: 31 ms)
1.
The complete nucleotide sequence of the genome of a symbiotic bacterium Bradyrhizobium japonicum USDA110 was determined. The genome of B. japonicum was a single circular chromosome 9,105,828 bp in length with an average GC content of 64.1%. No plasmid was detected. The chromosome comprises 8317 potential protein-coding genes, one set of rRNA genes and 50 tRNA genes. Fifty-two percent of the potential protein genes showed sequence similarity to genes of known function and 30% to hypothetical genes. The remaining 18% had no apparent similarity to reported genes. Thirty-four percent of the B. japonicum genes showed significant sequence similarity to those of both Mesorhizobium loti and Sinorhizobium meliloti, while 23% were unique to this species. A presumptive symbiosis island 681 kb in length, which includes a 410-kb symbiotic region previously reported by Göttfert et al., was identified. Six hundred fifty-five putative protein-coding genes were assigned in this region, and the functions of 301 genes, including those related to symbiotic nitrogen fixation and DNA transmission, were deduced. A total of 167 genes for transposases/104 copies of insertion sequences were identified in the genome. It was remarkable that 100 out of 167 transposase genes are located in the presumptive symbiotic island. DNA segments of 4 to 97 kb inserted into tRNA genes were found at 14 locations in the genome, which generates partial duplication of the target tRNA genes. These observations suggest plasticity of the B. japonicum genome, which is probably due to complex genome rearrangements such as horizontal transfer and insertion of various DNA elements, and to homologous recombination.

2.
The complete sequence of the genome of a hyper-thermophilic archaebacterium, Pyrococcus horikoshii OT3, has been determined by assembling the sequences of the physical map-based contigs of fosmid clones and of long polymerase chain reaction (PCR) products which were used for gap-filling. The entire length of the genome was 1,738,505 bp. The authenticity of the entire genome sequence was supported by restriction analysis of long PCR products, which were directly amplified from the genomic DNA. As the potential protein-coding regions, a total of 2061 open reading frames (ORFs) were assigned, and by similarity search against public databases, 406 (19.7%) were related to genes with putative function and 453 (22.0%) to the sequences registered but with unknown function. The remaining 1202 ORFs (58.3%) did not show any significant similarity to the sequences in the databases. Sequence comparison among the assigned ORFs in the genome provided evidence that a considerable number of ORFs were generated by sequence duplication. By similarity search, 11 ORFs were assumed to contain the intein elements. The RNA genes identified were a single 16S-23S rRNA operon, two 5S rRNA genes and 46 tRNA genes including two with the intron structure. All the assigned ORFs and RNA coding regions occupied 91.25% of the whole genome. The data presented in this paper are available on the internet at http://www.nite.go.jp.

3.
The complete sequence of the genome of an aerobic hyper-thermophilic crenarchaeon, Aeropyrum pernix K1, which optimally grows at 95°C, has been determined by the whole genome shotgun method with some modifications. The entire length of the genome was 1,669,695 bp. The authenticity of the entire sequence was supported by restriction analysis of long PCR products, which were directly amplified from the genomic DNA. As the potential protein-coding regions, a total of 2,694 open reading frames (ORFs) were assigned. By similarity search against public databases, 633 (23.5%) of the ORFs were related to genes with putative function and 523 (19.4%) to the sequences registered but with unknown function. All the genes in the TCA cycle except for that of alpha-ketoglutarate dehydrogenase were included, and instead of the alpha-ketoglutarate dehydrogenase gene, the genes coding for the two subunits of 2-oxoacid:ferredoxin oxidoreductase were identified. The remaining 1,538 ORFs (57.1%) did not show any significant similarity to the sequences in the databases. Sequence comparison among the assigned ORFs suggested that a considerable number of ORFs were generated by sequence duplication. The RNA genes identified were a single 16S–23S rRNA operon, two 5S rRNA genes and 47 tRNA genes including 14 genes with intron structures. All the assigned ORFs and RNA coding regions occupied 89.12% of the whole genome. The data presented in this paper are available on the internet homepage (http://www.mild.nite.go.jp).

4.
The 718,122 base pair (bp) sequence of the Escherichia coli K-12 genome corresponding to the region from 12.7 to 28.0 minutes on the genetic map is described. This region contains at least 682 potential open reading frames, of which 278 (41%) have been previously identified, 147 (22%) were homologous to other known genes, 138 (20%) are identical or similar to the hypothetical genes registered in databases, and the remaining 119 (17%) did not show a significant similarity to any other gene. In this region, we assigned a cluster of cit genes encoding multienzyme citrate lyase, two clusters of fimbrial genes and a set of lysogenic phage genes encoding integrase, excisionase and repressor in the e14 genetic element. In addition, a new valine tRNA gene, designated valZ, and a family of long directly repeated sequences, LDR-A, -B and -C, were found.

5.
Lancefield group C Streptococcus dysgalactiae causes infections in farmed fish. Here, the genome of S. dysgalactiae strain kdys0611, isolated from farmed amberjack (Seriola dumerili) was sequenced. The complete genome sequence of kdys0611 consists of a single chromosome and five plasmids. The chromosome is 2,142,780 bp long and has a GC content of 40%. It possesses 2061 coding sequences and 67 tRNA and 6 rRNA operons. One clustered regularly interspaced short palindromic repeat, 125 insertion sequences, and four predicted prophage elements were identified. Phylogenetic analysis based on 126 core genes suggested that the kdys0611 strain is more closely related to S. dysgalactiae subsp. dysgalactiae than to S. dysgalactiae subsp. equisimilis. The genome of kdys0611 harbors 87 genes with sequence similarity to putative virulence‐associated genes identified in other bacteria, of which 57 exhibit amino acid identity (>52%) to genes of the S. dysgalactiae subsp. equisimilis GGS124 human clinical isolate. Four putative virulence genes, emm5 (FGCSD_0256), spg_2 (FGCSD_1961), skc (FGCSD_1012), and cna (FGCSD_0159), in kdys0611 did not show significant homology with any deposited S. dysgalactiae genes. The chromosomal sequence of kdys0611 has been deposited in GenBank under Accession No. AP018726. This is the first report of the complete genome sequence of S. dysgalactiae isolated from fish.

6.
Complete DNA sequence of yeast chromosome II. (Total citations: 20; self-citations: 2; citations by others: 18)
In the framework of the EU genome-sequencing programmes, the complete DNA sequence of the yeast Saccharomyces cerevisiae chromosome II (807,188 bp) has been determined. At present, this is the largest eukaryotic chromosome entirely sequenced. A total of 410 open reading frames (ORFs) were identified, covering 72% of the sequence. Similarity searches revealed that 124 ORFs (30%) correspond to genes of known function, 51 ORFs (12.5%) appear to be homologues of genes whose functions are known, 52 others (12.5%) have homologues the functions of which are not well defined and another 33 of the novel putative genes (8%) exhibit a degree of similarity which is insufficient to confidently assign function. Of the genes on chromosome II, 37-45% are thus of unpredicted function. Among the novel putative genes, we found several that are related to genes that perform differentiated functions in multicellular organisms or are involved in malignancy. In addition to a compact arrangement of potential protein coding sequences, the analysis of this chromosome confirmed general chromosome patterns but also revealed particular novel features of chromosomal organization. Alternating regional variations in average base composition correlate with variations in local gene density along chromosome II, as observed in chromosomes XI and III. We propose that functional ARS elements are preferably located in the AT-rich regions that have a spacing of approximately 110 kb. Similarly, the 13 tRNA genes and the three Ty elements of chromosome II are found in AT-rich regions. In chromosome II, the distribution of coding sequences between the two strands is biased, with a ratio of 1.3:1. An interesting aspect regarding the evolution of the eukaryotic genome is the finding that chromosome II has a high degree of internal genetic redundancy, amounting to 16% of the coding capacity.

7.
The nucleotide sequence of the entire genome of a filamentous cyanobacterium, Anabaena sp. strain PCC 7120, was determined. The genome of Anabaena consisted of a single chromosome (6,413,771 bp) and six plasmids, designated pCC7120alpha (408,101 bp), pCC7120beta (186,614 bp), pCC7120gamma (101,965 bp), pCC7120delta (55,414 bp), pCC7120epsilon (40,340 bp), and pCC7120zeta (5,584 bp). The chromosome bears 5368 potential protein-encoding genes, four sets of rRNA genes, 48 tRNA genes representing 42 tRNA species, and 4 genes for small structural RNAs. The predicted products of 45% of the potential protein-encoding genes showed sequence similarity to known and predicted proteins of known function, and 27% to translated products of hypothetical genes. The remaining 28% lacked significant similarity to genes for known and predicted proteins in the public DNA databases. More than 60 genes involved in various processes of heterocyst formation and nitrogen fixation were assigned to the chromosome based on their similarity to the reported genes. One hundred and ninety-five genes coding for components of two-component signal transduction systems, nearly 2.5 times as many as those in Synechocystis sp. PCC 6803, were identified on the chromosome. Only 37% of the Anabaena genes showed significant sequence similarity to those of Synechocystis, indicating a high degree of divergence of the gene information between the two cyanobacterial strains.

8.
9.
The complete nucleotide sequence of the genome of a symbiotic bacterium Mesorhizobium loti strain MAFF303099 was determined. The genome of M. loti consisted of a single chromosome (7,036,071 bp) and two plasmids, designated as pMLa (351,911 bp) and pMLb (208,315 bp). The chromosome comprises 6752 potential protein-coding genes, two sets of rRNA genes and 50 tRNA genes representing 47 tRNA species. Fifty-four percent of the potential protein genes showed sequence similarity to genes of known function, 21% to hypothetical genes, and the remaining 25% had no apparent similarity to reported genes. A 611-kb DNA segment, a highly probable candidate of a symbiotic island, was identified, and 30 genes for nitrogen fixation and 24 genes for nodulation were assigned in this region. Codon usage analysis suggested that the symbiotic island as well as the plasmids originated and were transmitted from other genetic systems. The genomes of two plasmids, pMLa and pMLb, contained 320 and 209 potential protein-coding genes, respectively, for a variety of biological functions. These include genes for the ABC-transporter system, phosphate assimilation, two-component system, DNA replication and conjugation, but only one gene for nodulation was identified.

10.
Leishmania parasites (order Kinetoplastida, family Trypanosomatidae) cause a spectrum of human diseases ranging from asymptomatic to lethal. The ~33.6 Mb genome is distributed among 36 chromosome pairs that range in size from ~0.3 to 2.8 Mb. The complete nucleotide sequence of Leishmania major Friedlin chromosome 1 revealed 79 protein-coding genes organized into two divergent polycistronic gene clusters with the mRNAs transcribed towards the telomeres. We report here the complete nucleotide sequence of chromosome 3 (384,518 bp) and an analysis revealing 95 putative protein-coding ORFs. The ORFs are primarily organized into two large convergent polycistronic gene clusters (i.e. transcribed from the telomeres). In addition, a single gene at the left end is transcribed divergently towards the telomere, and a tRNA gene separates the two convergent gene clusters. Numerous genes have been identified, including those for metabolic enzymes, kinases, transporters, ribosomal proteins, spliceosome components, helicases, an RNA-binding protein and a DNA primase subunit.

11.
The entire genome of the unicellular cyanobacterium Synechococcus elongatus PCC 6301 (formerly Anacystis nidulans Berkeley strain 6301) was sequenced. The genome consisted of a circular chromosome 2,696,255 bp long. A total of 2,525 potential protein-coding genes, two sets of rRNA genes, 45 tRNA genes representing 42 tRNA species, and several genes for small stable RNAs were assigned to the chromosome by similarity searches and computer predictions. The translated products of 56% of the potential protein-coding genes showed sequence similarities to experimentally identified and predicted proteins of known function, and the products of 35% of the genes showed sequence similarities to the translated products of hypothetical genes. The remaining 9% of genes lacked significant similarities to genes for predicted proteins in the public DNA databases. Some 139 genes coding for photosynthesis-related components were identified. Thirty-seven genes for two-component signal transduction systems were also identified. This is the smallest number of such genes identified in cyanobacteria, except for marine cyanobacteria, suggesting that only simple signal transduction systems are found in this strain. The gene arrangement and nucleotide sequence of Synechococcus elongatus PCC 6301 were nearly identical to those of a closely related strain Synechococcus elongatus PCC 7942, except for the presence of a 188.6 kb inversion. The sequences as well as the gene information shown in this paper are available in the Web database, CYORF (http://www.cyano.genome.jp/).

12.
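The complete mitochondrial genome sequence of the oriental fruit fly and its analysis (Total citations: 1; self-citations: 0; citations by others: 1)
The complete mitochondrial genome sequence of the oriental fruit fly Bactrocera dorsalis is important for studying the molecular phylogeny of fruit flies. In this study, the complete mtDNA sequence of B. dorsalis was determined and analyzed using DNA sequencing and cloning techniques. The results show that the mitochondrial genome of B. dorsalis is 15,915 bp long (GenBank accession number: DQ845759). The base composition of the genome is 39.3% A, 16.2% C, 10.2% G and 34.3% T, and the genome consists of 13 protein-coding genes, 22 tRNA genes, 2 rRNA genes and one non-coding control region (the A+T-rich region). Seven protein-coding genes and 13 tRNA genes are encoded on the J strand, and the remaining six protein-coding genes and nine tRNA genes are encoded on the N strand. The protein-coding genes on the J strand have similar A and T contents, whereas in the protein-coding genes on the N strand the A content is markedly higher than the T content. Taking the mtDNA COI gene as an example, the relationships between B. dorsalis and 14 other fruit fly species were compared; the results show that it shares high homology with other closely related species in the same subgenus (Bactrocera).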

13.
14.
Microcystis aeruginosa is one of the most common bloom-forming cyanobacteria in freshwater ecosystems worldwide. This species produces numerous secondary metabolites, including microcystins, which are harmful to human health. We sequenced the genomes of ten strains of M. aeruginosa in order to explore the genomic basis of their ability to occupy varied environments and proliferate. Our findings show that M. aeruginosa genomes are characterized by having a large open pangenome, and that each genome contains similar proportions of core and flexible genes. By comparing the GC content of each gene to the mean value of the whole genome, we estimated that in each genome, around 11% of the genes seem to result from recent horizontal gene transfer (HGT) events. Moreover, several large gene clusters resulting from HGT (up to 19 kb) have been found, illustrating the ability of this species to integrate such large DNA molecules. It also appeared that all M. aeruginosa strains display large genomic plasticity, which is characterized by a high proportion of repeat sequences and by low synteny values between the strains. Finally, we identified 13 secondary metabolite gene clusters, including three new putative clusters. When comparing the genomes of Microcystis and Prochlorococcus, one of the dominant picocyanobacteria living in marine ecosystems, our findings show that they are characterized by having almost opposite evolutionary strategies, both of which have led to ecological success in their respective environments.

15.

Background

In addition to gene identification and annotation, repetitive sequence analysis has become an integral part of genome sequencing projects. Identification of repeats is important not only because it improves gene prediction, but also because of the role that repetitive sequences play in determining the structure and evolution of genes and genomes. Several methods using different repeat-finding strategies are available for whole-genome repeat sequence analysis. Four independent approaches were used to identify and characterize the repetitive fraction of the Mycosphaerella graminicola (synonym Zymoseptoria tritici) genome. This ascomycete fungus is a wheat pathogen and its finished genome comprises 21 chromosomes, eight of which can be lost with no obvious effects on fitness and are therefore dispensable.

Results

Using a combination of four repeat-finding methods, at least 17% of the M. graminicola genome was estimated to be repetitive. Class I transposable elements, that amplify via an RNA intermediate, account for about 70% of the total repetitive content in the M. graminicola genome. The dispensable chromosomes had a higher percentage of repetitive elements as compared to the core chromosomes. Distribution of repeats across the chromosomes also varied, with at least six chromosomes showing a non-random distribution of repetitive elements. Repeat families showed transition mutations and a CpA → TpA dinucleotide bias, indicating the presence of a repeat-induced point mutation (RIP)-like mechanism in M. graminicola. One gene family and two repeat families specific to subtelomeres also were identified in the M. graminicola genome. A total of 78 putative clusters of nested elements was found in the M. graminicola genome. Several genes with putative roles in pathogenicity were found associated with these nested repeat clusters. This analysis of the transposable element content in the finished M. graminicola genome resulted in a thorough and highly curated database of repetitive sequences.

Conclusions

This comprehensive analysis will serve as a scaffold to address additional biological questions regarding the origin and fate of transposable elements in fungi. Future analyses of the distribution of repetitive sequences in M. graminicola also will be able to provide insights into the association of repeats with genes and their potential role in gene and genome evolution.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-1132) contains supplementary material, which is available to authorized users.

16.
17.
The contiguous 874,423 base pair sequence corresponding to the 50.0–68.8 min region on the genetic map of the Escherichia coli K-12 (W3110) was constructed by the determination of DNA sequences in the 50.0–57.9 min region (360 kb) and two large (100 kb in all) and five short gaps in the 57.9–68.8 min region whose sequences had been registered in the DNA databases. We analyzed its sequence features and found that this region contained at least 894 potential open reading frames (ORFs), of which 346 (38.7%) were previously reported, 158 (17.7%) were homologous to other known genes, 232 (26.0%) were identical or similar to hypothetical genes registered in databases, and the remaining 158 (17.7%) showed no significant similarity to any other genes. A homology search of the ORFs also identified several new gene clusters. Those include two clusters of fimbrial genes, a gene cluster of three genes encoding homologues of the human long chain fatty acid degradation enzyme complex in the mitochondrial membrane, a cluster of at least nine genes involved in the utilization of ethanolamine, a cluster of the secondary set of 11 hyc genes participating in the formate hydrogenlyase reaction and a cluster of five genes coding for the homologues of degradation enzymes for aromatic hydrocarbons in Pseudomonas putida. We also noted a variety of novel genes, including two ORFs, which were homologous to the putative genes encoding xanthine dehydrogenase in the fly and a protein responsible for axonal guidance and outgrowth of the rat, mouse and nematode. An isoleucine tRNA gene, designated ileY, was also newly identified at 60.0 min.

18.
The complete nucleotide sequence of the cucumber (C. sativus L. var. Borszczagowski) chloroplast genome has been determined. The genome is composed of 155,293 bp containing a pair of inverted repeats of 25,191 bp, which are separated by two single-copy regions, a small 18,222-bp one and a large 86,688-bp one. The chloroplast genome of cucumber contains 130 known genes, including 89 protein-coding genes, 8 ribosomal RNA genes (4 rRNA species), and 37 tRNA genes (30 tRNA species), with 18 of them located in the inverted repeat region. Of these genes, 16 contain one intron, and two genes and one ycf contain 2 introns. Twenty-one small inversions that form stem-loop structures, ranging from 18 to 49 bp, have been identified. Eight of them show similarity to those of other species, while eight seem to be cucumber specific. Detailed comparisons of ycf2 and ycf15, and the overall structure to other chloroplast genomes were performed.

19.
The mitochondrial genome (mtDNA) of the entomopathogenic fungus Metarhizium anisopliae var. anisopliae, with a total size of 24,673 bp, was one of the smallest known mtDNAs of Pezizomycotina. It contained the 14 typical genes coding for proteins related to oxidative phosphorylation, the two rRNA genes, a single intron that harbored an intronic ORF coding for a putative ribosomal protein (rps) within the large rRNA gene (rnl), and a set of 24 tRNA genes which recognized codons for all amino acids, except proline and valine. Gene order comparison with all known mtDNAs of Sordariomycetes illustrated a highly conserved genome organization for all the protein- and rRNA-coding genes, as well as three clusters of tRNA genes. By considering all mitochondrial essential protein-coding genes as one unit, a phylogenetic study of these small genomes strongly supported the common evolutionary course of Sordariomycetes (100% bootstrap support) and highlighted the advantages of analyzing small genomes (mtDNA) over single genes. In addition, comparative analysis of three intergenic regions demonstrated sequence variability that can be exploited for intra- and inter-specific identification of Metarhizium.

20.