Similar articles
 Found 20 similar articles; search took 15 ms
1.
Synonymous codon usage patterns of bacteriophage and host genomes were compared. Two indexes, the G + C base composition of a gene (fgc) and the fraction of translationally optimal codons of the gene (fop), were used in the comparison. Synonymous codon usage data for all the coding sequences on a genome are represented as a cloud of points in the plane of fop vs. fgc. The Escherichia coli coding sequences appear to exhibit two phases, a "rising" and a "flat" phase. Genes that are essential for survival and are thought to be native are located in the flat phase, while foreign-type genes from prophages and transposons are found in the rising phase with a slope of nearly unity in the fgc vs. fop plot. Synonymous codon distribution patterns of genes from temperate phages P4, P2, N15 and lambda are similar to the pattern of E. coli rising phase genes. In contrast, genes from the virulent phages T7 and T4, for which a phage-encoded DNA polymerase is identified, fall on a line with a slope of nearly zero in the fop vs. fgc plane. These results may suggest that the G + C contents of T7, T4 and E. coli flat phase genes are subject to directional mutation pressure and are determined by the DNA polymerase used in replication. There is significant variation in the fop values of the phage genes, suggesting an adjustment to gene expression level. Similar analyses of codon distribution patterns were carried out for Haemophilus influenzae, Bacillus subtilis, Mycobacterium tuberculosis and their phages with complete genomic sequences available.
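The two indexes used in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the set of "optimal" codons below is a hypothetical placeholder (a real analysis would derive it for each organism), and excluding ATG, TGG and the stop codons from the fop denominator is an assumption based on the usual definition of the index.

```python
# Minimal sketch of the two per-gene indexes: fgc (G + C fraction) and
# fop (fraction of translationally optimal codons).

# HYPOTHETICAL optimal-codon set, for illustration only.
OPTIMAL_CODONS = {"CTG", "GGT", "AAA", "GAA"}

# ASSUMPTION: codons without synonymous alternatives (Met, Trp) and stop
# codons are left out of the fop denominator.
EXCLUDED_CODONS = {"ATG", "TGG", "TAA", "TAG", "TGA"}


def split_codons(cds):
    """Split a coding sequence into in-frame codons, dropping a partial tail."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def fgc(cds):
    """Fraction of G and C bases in the coding sequence."""
    cds = cds.upper()
    if not cds:
        return 0.0
    return sum(1 for base in cds if base in "GC") / len(cds)


def fop(cds, optimal=OPTIMAL_CODONS):
    """Fraction of counted codons that belong to the optimal set."""
    counted = [c for c in split_codons(cds) if c not in EXCLUDED_CODONS]
    if not counted:
        return 0.0
    return sum(1 for c in counted if c in optimal) / len(counted)


if __name__ == "__main__":
    gene = "ATGCTGAAAGGTTTATAA"  # toy sequence, not a real gene
    print("fgc =", round(fgc(gene), 3), "fop =", round(fop(gene), 3))
```

Each gene then contributes one (fgc, fop) point, and plotting all points of a genome gives the cloud described in the abstract.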

2.
Asymmetric substitution patterns in the two DNA strands of bacteria   Cited by: 35 (self-citations: 10; citations by others: 25)

3.
Oligonucleotide and codon frequencies have been determined in published sequences of E. coli DNA totaling 103,100 bp with 18,459 reading frame trinucleotides, corresponding to 2.5% of the total genome. Dinucleotide frequencies are in excellent agreement with those determined by nearest neighbor chemical analysis, indicating the computer count of a limited sampling to be a good representation of the overall frequencies in total genomic DNA. The distinctive nonrandom codon pattern is found to be uniformly distributed and contributes to a distinctive nonrandom oligonucleotide pattern, enabling correlations between frequency levels to be extended beyond reading frame sequences. Correlation analysis indicates a surprisingly high degree of correlation everywhere in the genome. Coefficients of correlation between oligonucleotide frequencies overall and those in specific segments vary as follows: primary strands of individual coding sequences > 0.9 > lambda DNA > noncoding, non-RNA > phi X174 DNA > complementary strands > RNA genes ≅ 0.6 > transposon-insertion elements > T7 DNA >> eukaryotic sequences ≅ 0. It is concluded that this high degree of oligonucleotide and codon correspondence in E. coli reflects the widespread distribution of remnants of an early and slowly changing codon pattern that has been continually dispersed by duplication-divergence processes, leading to the present genome.
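The kind of comparison described in the abstract above, correlating oligonucleotide frequencies in one segment with those in another, might be sketched as follows for dinucleotides. The abstract does not state which correlation coefficient was used; Pearson's coefficient is an assumption here, and the function names are illustrative.

```python
# Minimal sketch: dinucleotide frequency vectors and their Pearson correlation.
from itertools import product
from math import sqrt

# Fixed ordering of the 16 dinucleotides so that vectors are comparable.
DINUCLEOTIDES = ["".join(pair) for pair in product("ACGT", repeat=2)]


def dinucleotide_freqs(seq):
    """Relative frequencies of overlapping dinucleotides, in DINUCLEOTIDES order."""
    seq = seq.upper()
    counts = dict.fromkeys(DINUCLEOTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skips pairs containing ambiguous bases such as N
            counts[pair] += 1
    total = sum(counts.values())
    return [counts[d] / total if total else 0.0 for d in DINUCLEOTIDES]


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    whole = "ATGGCGCTGAAAGCGCTGGAAACCGGTTAA"  # toy stand-in for the pooled sample
    segment = "GCGCTGAAAGCGCTG"               # toy stand-in for one segment
    print(pearson(dinucleotide_freqs(whole), dinucleotide_freqs(segment)))
```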

4.
In the present study, we developed a method for detecting sequences whose similarity to a target sequence is statistically significant, and we examined the distribution of these sequences in the E. coli K-12 genome. The target sequences examined are as follows: (i) short repeats: crossover hot-spot instigator (Chi) sequence, replication termination (Ter) sequence, and DnaA binding sequence (DnaA box); (ii) potential stem-loop structure repeats: palindromic unit (PU), boxC sequences, and intergenic repeat unit (IRU); (iii) potential RNA coding repeats: rRNAs, PAIR, TRIP, and QUAD; and (iv) potential protein coding repeats: insertion elements (ISs) and Long Direct Repeats (LDRs). We also examined the distribution of these sequences on leading and lagging strands. We obtained another four statistically significant LDR sequences with more than 187 bp matched to LDR-A near the LDR loci, suggesting that these regions might be used as high recombination hot spots for LDR. Adaptation of individual LDRs to the E. coli genome is also discussed on the basis of codon usage.

5.
Does the 'non-coding' strand code?   Cited by: 3 (self-citations: 2; citations by others: 1)
The hypothesis that DNA strands complementary to the coding strand contain in phase coding sequences has been investigated. Statistical analysis of the 50 genes of bacteriophage T7 shows no significant correlation between patterns of codon usage on the coding and non-coding strands. In Bacillus and yeast genes the correlation observed is not different from that expected with random synonymous codon usage, while a high correlation seen in 52 E. coli genes can be explained in terms of an excess of RNY codons. A deficiency of UUA, CUA and UCA codons (complementary to termination) seems to be restricted to the E. coli genes, and may be due to low abundance of the relevant cognate tRNA species. Thus the analysis shows that the non-coding strand has the properties expected of a sequence complementary to a coding strand, with no indications that it encodes, or may have encoded, proteins.
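The remark in the abstract above that UUA, CUA and UCA are "complementary to termination" can be checked with a short Python sketch: read in phase on the complementary strand, each of these codons becomes a stop codon. The sketch uses the DNA alphabet (T in place of U), and the helper names are illustrative, not taken from the paper.

```python
# Minimal sketch: codon usage on the coding strand versus the in-phase
# complementary strand, plus the codons that would place a stop codon
# on the complementary strand.
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def codon_counts(seq):
    """Count in-frame codons, ignoring a partial trailing codon."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))


def antisense_stop_makers(cds):
    """Codons of the coding strand whose reverse complement is a stop codon."""
    return {codon: n for codon, n in codon_counts(cds).items()
            if reverse_complement(codon) in STOP_CODONS}


if __name__ == "__main__":
    cds = "ATGTTACTAGCGTCA"  # toy in-frame sequence, length a multiple of 3
    print(codon_counts(cds))                      # coding-strand usage
    print(codon_counts(reverse_complement(cds)))  # in-phase antisense usage
    print(antisense_stop_makers(cds))             # TTA, CTA and TCA here
```

For a sequence whose length is a multiple of three, the reverse complement read from its first base is in phase with the coding strand, so the two codon tables can be compared directly.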

6.
Simple sequence repeats (SSRs) are omnipresent in prokaryotes and eukaryotes and are found throughout the genome, in both protein-coding and noncoding regions. In the present study, the whole-genome sequences of seven chromosomes (Shigella flexneri 2a str. 301 and 2457T, Shigella sonnei, Escherichia coli K12, Mycobacterium tuberculosis, Mycobacterium leprae and Staphylococcus saprophyticus) were downloaded from the GenBank database to identify the abundance, distribution and composition of SSRs, and to determine the difference between the tandem repeats in the real genomes and in randomized genomes (generated with a sequence-shuffling tool) of the organisms included in this study. The data obtained show that: (i) tandem repeats are widely distributed throughout the genomes; (ii) SSRs are differentially distributed among coding and noncoding regions in the investigated Shigella genomes; (iii) the total frequency of SSRs is higher in noncoding regions than in coding regions; (iv) in all investigated chromosomes, the ratio of trinucleotide SSRs is much higher in the real genomes than in the randomized genomes, whereas that of dinucleotide SSRs is lower; (v) the ratio of total and mononucleotide SSRs is higher in the real genome than in the randomized genome in E. coli K12, S. flexneri str. 301 and S. saprophyticus, lower in S. flexneri str. 2457T, S. sonnei and M. tuberculosis, and approximately the same in M. leprae; (vi) the frequency of codon repetitions varies considerably depending on the type of encoded amino acid.
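The real-versus-randomized comparison described in the abstract above can be sketched in Python with a regular expression and a shuffled copy of the sequence. The abstract does not name the shuffling tool or give the repeat thresholds, so the shuffle below (a simple permutation that preserves base composition) and the `min_repeats` values are assumptions made for illustration.

```python
# Minimal sketch: find simple sequence repeats (SSRs) and compare their
# count in a sequence with the count in a shuffled copy.
import random
import re


def find_ssrs(seq, unit_len, min_repeats):
    """Return (start, unit, repeat_count) for perfect tandem repeats.

    Units that are themselves repeats of a shorter motif (e.g. "AA" when
    looking for dinucleotides) are skipped.
    """
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        # A unit is primitive if it first reoccurs in unit+unit at offset len(unit).
        if (unit + unit).find(unit, 1) != len(unit):
            continue
        hits.append((match.start(), unit, len(match.group(0)) // unit_len))
    return hits


def shuffled(seq, seed=0):
    """Return a permutation of seq with the same base composition."""
    bases = list(seq)
    random.Random(seed).shuffle(bases)
    return "".join(bases)


if __name__ == "__main__":
    genome = "GGATATATATCC" + "ATGATGATG" + "CCCGTACGT"  # toy sequence
    for unit_len in (2, 3):
        real = len(find_ssrs(genome, unit_len, 3))
        rand = len(find_ssrs(shuffled(genome), unit_len, 3))
        print(unit_len, "real:", real, "shuffled:", rand)
```

In practice the shuffled count would be averaged over many permutations before being compared with the real genome.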

7.
The contribution of slippage-like processes to genome evolution   Cited by: 19 (self-citations: 0; citations by others: 19)
Simple sequences present in long (>30 kb) sequences representative of the single-copy genome of five species (Homo sapiens, Caenorhabditis elegans, Saccharomyces cerevisiae, E. coli, and Mycobacterium leprae) have been analyzed. A close relationship was observed between genome size and the overall level of sequence repetition. This suggested that the incorporation of simple sequences had accompanied increases of genome size during evolution. Densities of simple sequence motifs were higher in noncoding regions than in coding regions in eukaryotes but not in eubacteria. All five genomes showed very biased frequency distributions of simple sequence motifs, particularly in eukaryotes, where AAA and TTT predominated. Interspecific comparisons showed that noncoding sequences in eukaryotes had highly significantly similar frequency distributions of simple sequence motifs, but this was not true of coding sequences. ANOVA of the frequency distributions of simple sequence motifs indicated strong contributions from motif base composition and repeat unit length, but much of the variation remained unexplained by these parameters. The sequence composition of simple sequences therefore appears to reflect both underlying sequence biases in slippage-like processes and the action of selection. Frequency distributions of simple sequence motifs in coding sequences correlated weakly or not at all with those in noncoding sequences. Selection on coding sequences to eliminate undesirable sequences may therefore have been strong, particularly in the human lineage.

8.
9.
10.
11.
Bacillus subtilis dnaE encodes a protein essential for DNA replication and is tightly linked to rpoD, the gene for the major sigma factor of RNA polymerase. We have now determined the 1809-base pair sequence of the dnaE coding region, which precedes rpoD and is transcribed in the same counterclockwise direction on the chromosome. From the DNA sequence, we found that the dnaE protein comprised 603 amino acids with a calculated molecular mass of 68,428 daltons. This protein had significant and extensive regions of homology with Escherichia coli DNA primase, the polymerase that synthesizes short RNA primers during discontinuous DNA replication. Features of the coding and flanking regions that may modulate dnaE expression include a relatively weak ribosomal binding site (ΔG' = -13.8 kcal), the use of uncommon codons in the reading frame, and no obvious promoter sequence for either dnaE or rpoD. Together, these results suggest that dnaE codes for B. subtilis DNA primase and, in light of the similarities to the organization of the E. coli sigma operon, that expression of dnaE may be coregulated with rpoD in B. subtilis.

12.
A segment of Bacillus subtilis chromosomal DNA homologous to the Escherichia coli spc ribosomal protein operon was isolated using cloned E. coli rplE (L5) DNA as a hybridization probe. DNA sequence analysis of the B. subtilis cloned DNA indicated a high degree of conservation of spc operon ribosomal protein genes between B. subtilis and E. coli. This fragment contains DNA homologous to the promoter-proximal region of the spc operon, including coding sequences for ribosomal proteins L14, L24, L5, S14, and part of S8; the organization of B. subtilis genes in this region is identical to that found in E. coli. A region homologous to the E. coli L16, L29 and S17 genes, the last genes of the S10 operon, was located upstream from the gene for L14, the first gene in the spc operon. Although the ribosomal protein coding sequences showed 40-60% amino acid identity with E. coli sequences, we failed to find sequences which would form a structure resembling the E. coli target site for the S8 translational repressor, located near the beginning of the L5 coding region in E. coli, in this region or elsewhere in the B. subtilis spc DNA.

13.
In this paper, a self-training method is proposed to recognize translation start sites in bacterial genomes without prior knowledge of rRNA in the genomes concerned. Many features with biological meanings are incorporated, including mononucleotide distribution patterns near the start codon, the start codon itself, the coding potential and the distance from the leftmost start codon to the start codon. The proposed method correctly predicts 92% of the translation start sites of 195 experimentally confirmed Escherichia coli CDSs, 96% of 58 reliable Bacillus subtilis CDSs and 82% of 140 reliable Synechocystis CDSs. Moreover, the self-training method presented might also be used to relocate the translation start sites of putative CDSs of genomes, which are predicted by gene-finding programs. After post-processing by the method presented, the improvement in gene start prediction of some gene-finding programs is remarkable, e.g., the accuracy of gene start prediction of Glimmer 2.02 increases from 63 to 91% for 832 reliable E. coli CDSs. An open source computer program to implement the method, GS-Finder, is freely available for academic purposes from http://tubic.tju.edu.cn/GS-Finder/.

14.
C. Parsot. The EMBO Journal, 1986, 5(11): 3013-3019
The Bacillus subtilis genes encoding threonine synthase (thrC) and homoserine kinase (thrB) have been cloned via complementation of Escherichia coli thr mutants. Determination of their nucleotide sequences indicates that the thrC stop codon overlaps the thrB start codon; this genetic organization suggests that the two genes belong to the same operon, as in E. coli. However, the gene order is thrC-thrB in B. subtilis whereas it is thrB-thrC in the thr operon of E. coli. This inversion of the thrC and thrB genes between E. coli and B. subtilis is indicative of a possible independent construction of the thr operon in these two organisms. In other respects, comparison of the predicted amino acid sequences of the B. subtilis and E. coli threonine synthases with that of Saccharomyces cerevisiae threonine dehydratase and that of E. coli D-serine dehydratase revealed extensive homologies between these pyridoxal phosphate-dependent enzymes. This sequence homology, which correlates with similarities in the catalytic mechanisms of these enzymes, indicates that these proteins, catalyzing different reactions in different metabolic pathways, may have evolved from a common ancestor.

15.
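The neutral phytase gene nphy was cloned from the Bacillus subtilis genome by PCR. Analysis of the complete DNA sequence showed that the structural gene is 1152 nucleotides long (encoding 383 amino acids), with a sequence encoding a 26-amino-acid signal peptide at its 5′ end. nphy lacking the signal peptide coding sequence was cloned into pTYB40, an IPTG-inducible Escherichia coli expression vector, and was expressed at a high level in E. coli, with the product accounting for more than 40% of the soluble E. coli protein. The expression product was biologically active, confirming that the cloned neutral phytase gene has normal biological function.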

16.
17.
Directional mutation pressure associated with replication processes is the main cause of the asymmetry between the leading and lagging DNA strands in bacterial genomes. On the other hand, the asymmetry between sense and antisense strands of protein coding sequences is a result of both mutation and selection pressures. Thus, there are two different ways of superposing the sense strand: on the leading or on the lagging strand. Besides many other implications of these two possible situations, one seems to be very important: because of the asymmetric replication-associated mutation pressure, the mutation rate of genes depends on their location. Using Monte Carlo methods, we have simulated, under experimentally determined directional mutation pressure, the divergence rate and the elimination rate of genes depending on their location with respect to the leading/lagging DNA strands in the asymmetric prokaryotic genome. We have found that the best survival strategy for the majority of genes is to sometimes switch between DNA strands. Paradoxically, this strategy results in higher substitution rates but remains in agreement with observations in bacterial genomes that such inversions are very frequent and that the divergence rate between homologs lying on different DNA strands is very high.

18.
Bacterial start site prediction.   Cited by: 5 (self-citations: 1; citations by others: 4)
With the growing number of completely sequenced bacterial genes, accurate gene prediction in bacterial genomes remains an important problem. Although the existing tools predict genes in bacterial genomes with high overall accuracy, their ability to pinpoint the translation start site remains unsatisfactory. In this paper, we present a novel approach to bacterial start site prediction that takes into account multiple features of a potential start site, viz., ribosome binding site (RBS) binding energy, distance of the RBS from the start codon, distance from the beginning of the maximal ORF to the start codon, the start codon itself and the coding/non-coding potential around the start site. Mixed integer programming was used to optimize the discriminatory system. The accuracy of this approach is up to 90%, compared to 70% using the most common tools in fully automated mode (that is, without expert human post-processing of results). The approach is evaluated using Bacillus subtilis, Escherichia coli and Pyrococcus furiosus. These three genomes cover a broad spectrum of bacterial genomes, since B. subtilis is a Gram-positive bacterium, E. coli is a Gram-negative bacterium and P. furiosus is an archaebacterium. A significant problem is generating a set of 'true' start sites for algorithm training, in the absence of experimental work. We found that sequence conservation between P. furiosus and the related Pyrococcus horikoshii clearly delimited the gene start in many cases, providing a sufficient training set.

19.
20.
We present here the use of a new statistical segmentation method on the Bacillus subtilis chromosome sequence. Maximum likelihood parameter estimation of a hidden Markov model, based on the expectation-maximization algorithm, enables one to segment the DNA sequence according to its local composition. This approach is not based on sliding windows; it enables different compositional classes to be separated without prior knowledge of their content, size and localization. We compared these compositional classes, obtained from the sequence, with the annotated DNA physical map, sequence homologies and repeat regions. The first heterogeneity revealed discriminates between the two coding strands and the non-coding regions. Other main heterogeneities arise; some are related to horizontal gene transfer, some to the T-enriched composition of hydrophobic protein coding strands, and others to the codon usage fitness of highly expressed genes. Concerning potential and established gene transfers, we found 9 of the 10 known prophages, plus 14 new regions of atypical composition. Some of them are surrounded by repeats, and most of their genes have unknown function or possess homology to genes involved in secondary catabolism or in metal and antibiotic resistance. Surprisingly, we notice that all of these detected regions are A + T-richer than the host genome, raising the question of their remote sources.
