Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
The complete nucleotide sequence of Saccharomyces cerevisiae chromosome X (745 442 bp) reveals a total of 379 open reading frames (ORFs), the coding region covering approximately 75% of the entire sequence. One hundred and eighteen ORFs (31%) correspond to genes previously identified in S. cerevisiae. All other ORFs represent novel putative yeast genes, whose function will have to be determined experimentally. However, 57 of the latter subset (another 15% of the total) encode proteins that show significant analogy to proteins of known function from yeast or other organisms. The remaining ORFs, exhibiting no significant similarity to any known sequence, amount to 54% of the total. General features of chromosome X are also reported, with emphasis on the nucleotide frequency distribution in the environment of the ATG and stop codons, the possible coding capacity of at least some of the small ORFs (<100 codons) and the significance of 46 non-canonical or unpaired nucleotides in the stems of some of the 24 tRNA genes recognized on this chromosome.

2.
Complete DNA sequence of yeast chromosome II.   Cited by: 20 (self-citations: 2; citations by others: 18)
In the framework of the EU genome-sequencing programmes, the complete DNA sequence of the yeast Saccharomyces cerevisiae chromosome II (807 188 bp) has been determined. At present, this is the largest eukaryotic chromosome entirely sequenced. A total of 410 open reading frames (ORFs) were identified, covering 72% of the sequence. Similarity searches revealed that 124 ORFs (30%) correspond to genes of known function, 51 ORFs (12.5%) appear to be homologues of genes whose functions are known, 52 others (12.5%) have homologues the functions of which are not well defined and another 33 of the novel putative genes (8%) exhibit a degree of similarity which is insufficient to confidently assign function. Of the genes on chromosome II, 37-45% are thus of unpredicted function. Among the novel putative genes, we found several that are related to genes that perform differentiated functions in multicellular organisms or are involved in malignancy. In addition to a compact arrangement of potential protein coding sequences, the analysis of this chromosome confirmed general chromosome patterns but also revealed particular novel features of chromosomal organization. Alternating regional variations in average base composition correlate with variations in local gene density along chromosome II, as observed in chromosomes XI and III. We propose that functional ARS elements are preferably located in the AT-rich regions that have a spacing of approximately 110 kb. Similarly, the 13 tRNA genes and the three Ty elements of chromosome II are found in AT-rich regions. In chromosome II, the distribution of coding sequences between the two strands is biased, with a ratio of 1.3:1. An interesting aspect regarding the evolution of the eukaryotic genome is the finding that chromosome II has a high degree of internal genetic redundancy, amounting to 16% of the coding capacity.

3.
The complete sequence of the genome of an aerobic hyper-thermophilic crenarchaeon, Aeropyrum pernix K1, which optimally grows at 95°C, has been determined by the whole genome shotgun method with some modifications. The entire length of the genome was 1,669,695 bp. The authenticity of the entire sequence was supported by restriction analysis of long PCR products, which were directly amplified from the genomic DNA. As the potential protein-coding regions, a total of 2,694 open reading frames (ORFs) were assigned. By similarity search against public databases, 633 (23.5%) of the ORFs were related to genes with putative function and 523 (19.4%) to the sequences registered but with unknown function. All the genes in the TCA cycle except for that of alpha-ketoglutarate dehydrogenase were included, and instead of the alpha-ketoglutarate dehydrogenase gene, the genes coding for the two subunits of 2-oxoacid:ferredoxin oxidoreductase were identified. The remaining 1,538 ORFs (57.1%) did not show any significant similarity to the sequences in the databases. Sequence comparison among the assigned ORFs suggested that a considerable number of ORFs were generated by sequence duplication. The RNA genes identified were a single 16S–23S rRNA operon, two 5S rRNA genes and 47 tRNA genes including 14 genes with intron structures. All the assigned ORFs and RNA coding regions occupied 89.12% of the whole genome. The data presented in this paper are available on the internet homepage (http://www.mild.nite.go.jp).

4.
The complete sequence of the genome of a hyper-thermophilic archaebacterium, Pyrococcus horikoshii OT3, has been determined by assembling the sequences of the physical map-based contigs of fosmid clones and of long polymerase chain reaction (PCR) products which were used for gap-filling. The entire length of the genome was 1,738,505 bp. The authenticity of the entire genome sequence was supported by restriction analysis of long PCR products, which were directly amplified from the genomic DNA. As the potential protein-coding regions, a total of 2061 open reading frames (ORFs) were assigned, and by similarity search against public databases, 406 (19.7%) were related to genes with putative function and 453 (22.0%) to the sequences registered but with unknown function. The remaining 1202 ORFs (58.3%) did not show any significant similarity to the sequences in the databases. Sequence comparison among the assigned ORFs in the genome provided evidence that a considerable number of ORFs were generated by sequence duplication. By similarity search, 11 ORFs were assumed to contain the intein elements. The RNA genes identified were a single 16S-23S rRNA operon, two 5S rRNA genes and 46 tRNA genes including two with the intron structure. All the assigned ORFs and RNA coding regions occupied 91.25% of the whole genome. The data presented in this paper are available on the internet at http://www.nite.go.jp.

5.
The nucleotide sequence of 45,389 bp in the 184°–180° region of the Bacillus subtilis chromosome, containing the cge cluster, which is controlled by the sporulation regulatory protein GerE, was determined. Fifty-four putative ORFs with putative ribosome-binding sites were recognized. Seven of them correspond to previously characterized genes: cgeB, cgeA, cgeC, cgeD, cgeE, ctpA, and odhA. The deduced products of 25 ORFs were found to display significant similarities to proteins in the data banks. We have identified genes involved in detoxification, cell walls, and in the metabolism of biotins, purines, fatty acids, carbohydrates and amino acids. The remaining 22 ORFs showed no similarity to known proteins. Both an attachment site of the SPβ prophage and 2 new putative DNA replication terminators were identified in this region.

6.
We have determined a 180 kb contiguous sequence in the replication origin region of the Bacillus subtilis chromosome. Open reading frames (ORFs) in this region were unambiguously identified from the determined sequence, using criteria characteristic for the B. subtilis gene structure, i.e., starting with an ATG, GTG or TTG codon preceded by sequences complementary to the 3' end of the 16S rRNA. Four rRNA gene sets, 7 individual tRNA genes and 1 scRNA gene were identified, occupying 20 kb in total. In the remaining 160 kb region, 158 ORFs were identified, suggesting that 1 ORF is coded on average by 1 kb of DNA of the B. subtilis genome. Among the 158 ORFs, the functions of 48 ORFs were assigned and those of 11 ORFs are suggested through significant similarities to known proteins present in data banks. However, the functions of more than half of the ORFs (63%) remain to be determined.
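The ORF criteria described in this abstract (an ATG, GTG or TTG start codon preceded by a sequence complementary to the 3' end of the 16S rRNA) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' procedure: the core motif `GGAGG`, the 20-nt upstream window, the minimum length, and the function name `find_orfs` are all assumptions introduced here, and only the forward strand is scanned.

```python
# Illustrative forward-strand ORF scan. All constants below are assumptions
# chosen for the example; the abstract does not state them.
START_CODONS = {"ATG", "GTG", "TTG"}   # start codons named in the abstract
STOP_CODONS = {"TAA", "TAG", "TGA"}    # standard stop codons
SD_CORE = "GGAGG"                      # assumed ribosome-binding core motif
SD_WINDOW = 20                         # assumed upstream search window (nt)


def find_orfs(seq, min_codons=30):
    """Return (start, end) pairs, 0-based and end-exclusive, stop codon included."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] in START_CODONS:
                upstream = seq[max(0, i - SD_WINDOW):i]
                if SD_CORE in upstream:
                    # Extend codon by codon until an in-frame stop codon.
                    j = i + 3
                    while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                        j += 3
                    has_stop = j + 3 <= len(seq)
                    if has_stop and (j - i) // 3 >= min_codons:
                        orfs.append((i, j + 3))
                        i = j + 3      # continue after this ORF
                        continue
            i += 3
    return orfs


if __name__ == "__main__":
    toy = "AAAGGAGGAAAAAA" + "TTG" + "GCT" * 5 + "TAA"
    print(find_orfs(toy, min_codons=3))
```

A real annotation pipeline would also scan the reverse complement and score the ribosome-binding site rather than require an exact motif match.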

7.
Within the framework of an international Bacillus subtilis genome sequencing project, we have determined a 36-kb sequence covering the region between the gntZ and trnY genes. In addition to five genes sequenced and characterized previously, 27 putative protein coding sequences (open reading frame; ORF) were identified. A homology search for the newly identified ORFs revealed that six of them had similarities to known proteins. It is notable that new ORFs belonging to response-regulator aspartate phosphatase (Rap) and its regulator (Phr) families, and response regulator and sensory kinase families of two-component signal transduction systems have been identified. Furthermore, we found that some 180-bp non-coding sequence, that might be a remnant of an ancient IS element, is preserved in at least five loci of the B. subtilis genome.

8.
Zhang CT, Wang J. Nucleic Acids Research, 2000, 28(14): 2804-2814
The Z curve is a three-dimensional space curve constituting the unique representation of a given DNA sequence in the sense that each can be uniquely reconstructed from the other. Based on the Z curve, a new protein coding gene-finding algorithm specific for the yeast genome at better than 95% accuracy has been proposed. Six cross-validation tests were performed to confirm the above accuracy. Using the new algorithm, the number of protein coding genes in the yeast genome is re-estimated. The estimate is based on the assumption that the unknown genes have similar statistical properties to the known genes. It is found that the number of protein coding genes in the 16 yeast chromosomes is ≤5645, significantly smaller than the 5800–6000 which is widely accepted, and much larger than the 4800 estimated by another group recently. The mitochondrial genes were not included in the above estimate. A codingness index called the YZ score (YZ ∈ [0,1]) is proposed to recognize protein coding genes in the yeast genome. Among the ORFs annotated in the MIPS (Munich Information Centre for Protein Sequences) database, those recognized as non-coding by the present algorithm are listed in this paper in detail. The criterion for a coding or non-coding ORF is simply decided by YZ > 0.5 or YZ < 0.5, respectively. The YZ scores for all the ORFs annotated in the MIPS database have been calculated and are available on request by sending email to the corresponding author.
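The one-to-one correspondence between a DNA sequence and its Z curve can be illustrated with a short sketch. The abstract does not give the component formulas; the code below assumes the commonly cited Z-curve definition, x_n = (A_n + G_n) - (C_n + T_n), y_n = (A_n + C_n) - (G_n + T_n), z_n = (A_n + T_n) - (C_n + G_n), where A_n, C_n, G_n and T_n are cumulative base counts up to position n. The function names are hypothetical, and the YZ score itself is not reproduced here.

```python
# Sketch of the Z-curve transform under the assumed standard definition.
# Each base moves the curve by one unit step along every axis.
STEP = {
    "A": (1, 1, 1),     # purine, amino, weak H-bond
    "C": (-1, 1, -1),   # pyrimidine, amino, strong H-bond
    "G": (1, -1, -1),   # purine, keto, strong H-bond
    "T": (-1, -1, 1),   # pyrimidine, keto, weak H-bond
}
BASE_OF_STEP = {step: base for base, step in STEP.items()}


def z_curve(seq):
    """Return the list of (x, y, z) points, one per base of the sequence."""
    x = y = z = 0
    points = []
    for base in seq.upper():
        dx, dy, dz = STEP[base]  # raises KeyError on non-ACGT symbols
        x, y, z = x + dx, y + dy, z + dz
        points.append((x, y, z))
    return points


def sequence_from_z_curve(points):
    """Recover the DNA sequence from successive differences of the curve."""
    bases = []
    prev = (0, 0, 0)
    for point in points:
        step = tuple(c - p for c, p in zip(point, prev))
        bases.append(BASE_OF_STEP[step])
        prev = point
    return "".join(bases)


if __name__ == "__main__":
    curve = z_curve("ACGT")
    print(curve)
    print(sequence_from_z_curve(curve))
```

Because the four bases map to four distinct unit steps, the differences between consecutive points identify each base, which is what makes the reconstruction unique.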

9.
The contiguous 874,423 base pair sequence corresponding to the 50.0–68.8 min region on the genetic map of the Escherichia coli K-12 (W3110) was constructed by the determination of DNA sequences in the 50.0–57.9 min region (360 kb) and two large (100 kb in all) and five short gaps in the 57.9–68.8 min region whose sequences had been registered in the DNA databases. We analyzed its sequence features and found that this region contained at least 894 potential open reading frames (ORFs), of which 346 (38.7%) were previously reported, 158 (17.7%) were homologous to other known genes, 232 (26.0%) were identical or similar to hypothetical genes registered in databases, and the remaining 158 (17.7%) showed no significant similarity to any other genes. A homology search of the ORFs also identified several new gene clusters. Those include two clusters of fimbrial genes, a gene cluster of three genes encoding homologues of the human long chain fatty acid degradation enzyme complex in the mitochondrial membrane, a cluster of at least nine genes involved in the utilization of ethanolamine, a cluster of the secondary set of 11 hyc genes participating in the formate hydrogenlyase reaction and a cluster of five genes coding for the homologues of degradation enzymes for aromatic hydrocarbons in Pseudomonas putida. We also noted a variety of novel genes, including two ORFs, which were homologous to the putative genes encoding xanthine dehydrogenase in the fly and a protein responsible for axonal guidance and outgrowth of the rat, mouse and nematode. An isoleucine tRNA gene, designated ileY, was also newly identified at 60.0 min.

10.
The unannotated regions of the Escherichia coli genome DNA sequence from the EcoSeq6 database, totaling 1,278 'intergenic' sequences of the combined length of 359,279 base pairs, were analyzed using computer-assisted methods with the aim of identifying putative unknown genes. The proposed strategy for finding new genes includes two key elements: i) prediction of expressed open reading frames (ORFs) using the GeneMark method based on Markov chain models for coding and non-coding regions of Escherichia coli DNA, and ii) search for protein sequence similarities using programs based on the BLAST algorithm and programs for motif identification. A total of 354 putative expressed ORFs were predicted by GeneMark. Using the BLASTX and TBLASTN programs, it was shown that 208 ORFs located in the unannotated regions of the E. coli chromosome are significantly similar to other protein sequences. Identification of 182 ORFs as probable genes was supported by GeneMark and BLAST, comprising 51.4% of the GeneMark 'hits' and 87.5% of the BLAST 'hits'. 73 putative new genes, comprising 20.6% of the GeneMark predictions, belong to ancient conserved protein families that include both eubacterial and eukaryotic members. This value is close to the overall proportion of highly conserved sequences among eubacterial proteins, indicating that the majority of the putative expressed ORFs that are predicted by GeneMark, but have no significant BLAST hits, nevertheless are likely to be real genes. The majority of the putative genes identified by BLAST search have been described since the release of the EcoSeq6 database, but about 70 genes have not been detected so far. Among these new identifications are genes encoding proteins with a variety of predicted functions including dehydrogenases, kinases, several other metabolic enzymes, ATPases, rRNA methyltransferases, membrane proteins, and different types of regulatory proteins.
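The overlap percentages quoted in this abstract follow directly from the three reported counts, as the small worked check below shows. The variable and function names are illustrative only; the counts are taken from the abstract, and the "only" figures are simple differences derived from them rather than numbers the abstract reports.

```python
# Worked check of the overlap statistics, using counts from the abstract.
genemark_orfs = 354   # ORFs predicted by GeneMark
blast_orfs = 208      # ORFs with significant BLASTX/TBLASTN similarity
supported_by_both = 182
ancient_conserved = 73


def percent(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


overlap_of_genemark = percent(supported_by_both, genemark_orfs)
overlap_of_blast = percent(supported_by_both, blast_orfs)
conserved_of_genemark = percent(ancient_conserved, genemark_orfs)

# Derived here, not stated in the abstract: ORFs supported by one method only.
genemark_only = genemark_orfs - supported_by_both
blast_only = blast_orfs - supported_by_both

if __name__ == "__main__":
    print(overlap_of_genemark, overlap_of_blast, conserved_of_genemark)
    print(genemark_only, blast_only)
```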

11.
Within the framework of an international project for the sequencing of the entire Bacillus subtilis genome, a 36-kb chromosome segment, which covers the region between the gnt and iol operons, has been cloned and sequenced. This region (36447 bp) contains 33 complete open reading frames (ORFs; genes) including the four gnt genes and one partial gene. A homology search for the products of the 33 complete ORFs revealed significant homology to known proteins in 16 of them such as tetracycline resistance protein (Clostridium perfringens), asparagine synthetase (Arabidopsis thaliana), aldehyde dehydrogenase (Pseudomonas oleovorans), 2,5-dichloro-2,5-cyclohexadiene-1,4-diol dehydrogenase (P. paucimobilis), heat shock protein HtpG (Escherichia coli), galactose-proton symporter (E. coli), auxin-induced protein (common tobacco), glucitol operon repressor (E. coli) and methylmalonate-semialdehyde dehydrogenase (P. aeruginosa). Unlike the regions we sequenced so far, this region contained two short sequence multiplications: one was a tandem sequence duplication (409 and 410 bp), and the other a triplication consisting of two highly conserved 118-bp tandem sequences preceded by a less conserved similar sequence (129 bp). The reasons for the presence of these sequence multiplications in the gnt to iol region were deduced.

12.
We have determined a 35-kb sequence of the groESL-gutR-cotA (45°–52°) region of the Bacillus subtilis genome. In addition to the groESL, gutRB and cotA genes reported previously, we have newly identified 24 ORFs including gutA and fruC genes, encoding glucitol permease and fructokinase, respectively. The inherent restriction/modification system genes, hsdMR and hsdMM, were mapped between groESL and gutRB, and we have identified two open reading frames (ORFs) encoding 5-methylcytosine forming DNA methyl transferase and an operon probably encoding a restriction enzyme complex. The unusual genome structure of few ORFs and lower GC content around the restriction/modification genes strongly suggests that the region originated from a bacteriophage integrated during evolution.

13.
14.
As part of the Bacillus subtilis genome sequencing project, we determined the complete nucleotide sequence of an 8000-bp fragment downstream of the sspC gene (184°) of the B. subtilis 168 chromosome. The sequence analysis shows that the sspC gene is located inside of the SPβ region, which differs from the current genetic map of B. subtilis 168. This region contains 12 putative ORFs (yojQ through yojZ and sspC). A homology search for the deduced products of the ORFs shows significant similarities to enzymes involved in deoxyribonucleotide metabolism: ribonucleotide reductase (Nrd) E, NrdF, thioredoxin and dUTPase. Interestingly, this DNA fragment includes two split genes, yojP containing conserved motifs of an intein and yojQ and yojS with an 808-bp intervening sequence for a putative intron structure. In addition, the yojR gene includes a putative new DNA replication terminator.

15.
Sixteen P1 and TAC clones assigned to Arabidopsis thaliana chromosome 5 were sequenced, and their sequence features were analyzed using various computer programs. The total length of the sequences determined was 1,013,767 bp. Together with the nucleotide sequences of 109 clones previously reported, the regions of chromosome 5 sequenced so far now total 9,072,622 bp, which presumably covers approximately one-third of the chromosome. A similarity search against the reported gene sequences predicted the presence of a total of 225 protein-coding genes and/or gene segments in the newly sequenced regions, indicating an average gene density of one gene per 4.5 kb. Introns were identified in 72.4% of the potential protein genes for which the entire gene structure was predicted, and the average number per gene and the average length of the introns were 3.3 and 163 bp, respectively. These sequence features are essentially identical to those in the previously reported sequences. The sequence data and gene information are available on the World Wide Web database KAOS (Kazusa Arabidopsis data Opening Site) at http://www.kazusa.or.jp/arabi/.

16.
17.
Chorea-acanthocytosis (CHAC) (OMIM 200150) is a rare neurological syndrome characterized by neurodegeneration in combination with morphologically abnormal red cells (acanthocytosis). A partial yeast artificial chromosome contig of the CHAC critical region on chromosome 9q21 has been constructed, and 21 expressed sequence tags have been mapped. We have subsequently cloned Galpha14, a member of the G-protein alpha-subunit multigene family, and have identified Galphaq in the contig. The genomic structure of both genes has been established after construction of a bacterial artificial chromosome contig that showed Galphaq and Galpha14 to be in a head-to-tail arrangement (Cen-Galphaq-Galpha14-qter). Northern analysis found Galphaq to be ubiquitously expressed and Galpha14 to display a more restricted pattern of expression. Mutation analysis of the coding regions and splice sites for Galphaq and Galpha14 in 10 affected individuals from different families identified no changes likely to cause disease; however, two distinct single nucleotide polymorphisms in the coding region of Galpha14 have been identified. This study has excluded two plausible candidate genes from involvement in CHAC and has provided a solid platform for a positional cloning initiative.

18.
Previous analyses of Saccharomyces cerevisiae chromosome I have suggested that the majority (greater than 75%) of single-copy essential genes on this chromosome are difficult or impossible to identify using temperature-sensitive (Ts-) lethal mutations. To investigate whether this situation reflects intrinsic difficulties in generating temperature-sensitive proteins or constraints on mutagenesis in yeast, we subjected three cloned essential genes from chromosome I to mutagenesis in an Escherichia coli mutator strain and screened for Ts- lethal mutations in yeast using the "plasmid-shuffle" technique. We failed to obtain Ts- lethal mutations in two of the genes (FUN12 and FUN20), while the third gene yielded such mutations, but only at a low frequency. DNA sequence analysis of these mutant alleles and of the corresponding wild-type region revealed that each mutation was a single substitution not in the previously identified gene FUN19, but in the adjacent, newly identified essential gene FUN53. FUN19 itself proved to be non-essential. These results suggest that many essential proteins encoded by genes on chromosome I cannot be rendered thermolabile by single mutations. However, the results obtained with FUN53 suggest that there may also be significant constraints on mutagenesis in yeast. The 5046 base-pair interval sequenced contains the complete FUN19, FUN53 and FUN20 coding regions, as well as a portion of the adjacent non-essential FUN21 coding region. In all, 68 to 75% of this interval is open reading frame. None of the four predicted products shows significant homologies to known proteins in the available databases.

19.
Acetyl coenzyme A (CoA) synthetase (ADP forming) (ACD) represents a novel enzyme of acetate formation and energy conservation (acetyl-CoA + ADP + P(i) ⇌ acetate + ATP + CoA) in Archaea and eukaryotic protists. The only characterized ACD in archaea, two isoenzymes from the hyperthermophile Pyrococcus furiosus, constitute 145-kDa heterotetramers (alpha(2), beta(2)). The coding genes for the alpha and beta subunits are located at different sites in the P. furiosus chromosome. Based on significant sequence similarity of the P. furiosus genes, five open reading frames (ORFs) encoding putative ACD were identified in the genome of the hyperthermophilic sulfate-reducing archaeon Archaeoglobus fulgidus and one ORF was identified in the hyperthermophilic methanogen Methanococcus jannaschii. The ORFs constitute fusions of the homologous P. furiosus genes encoding the alpha and beta subunits. Two ORFs, AF1211 and AF1938, of A. fulgidus and ORF MJ0590 of M. jannaschii were cloned and functionally overexpressed in Escherichia coli. The purified recombinant proteins were characterized as distinctive isoenzymes of ACD with different substrate specificities. In contrast to the Pyrococcus ACD, the ACDs of Archaeoglobus and Methanococcus constitute homodimers of about 140 kDa composed of two identical 70-kDa subunits, which represent fusions of the homologous P. furiosus alpha and beta subunits in an alphabeta (AF1211 and MJ0590) or betaalpha (AF1938) orientation. The data indicate that A. fulgidus and M. jannaschii contain a novel type of ADP-forming acetyl-CoA synthetase in Archaea, in which the subunit polypeptides and their coding genes are fused.

20.
The nucleotide sequence of 42 775 bp of the vir-region from the Agrobacterium tumefaciens octopine Ti plasmid pTi15955 is reported here. Although the nucleotide sequences of several parts of this region from this or closely related plasmids have been published previously, the present work establishes for the first time the complete arrangement of all the essential virulence genes and their intergenic regions of an octopine Ti plasmid. The disruption of some of the intergenic areas by insertion (IS) elements is typical for the octopine Ti plasmids. Several new ORFs were identified, including ORFs immediately downstream of virD4 and virE2, which probably represent new genes involved in virulence.
