Similar literature
1.
Oligonucleotide and codon frequencies have been determined in published sequences of E. coli DNA totaling 103,100 bp with 18,459 reading frame trinucleotides, corresponding to 2.5% of the total genome. Dinucleotide frequencies are in excellent agreement with those determined by nearest neighbor chemical analysis, indicating the computer count of a limited sampling to be a good representation of the overall frequencies in total genomic DNA. The distinctive nonrandom codon pattern is found to be uniformly distributed and contributes to a distinctive nonrandom oligonucleotide pattern, enabling correlations between frequency levels to be extended beyond reading frame sequences. Correlation analysis indicates a surprisingly high degree of correlation everywhere in the genome. Coefficients of correlation between oligonucleotide frequencies overall and those in specific segments vary as follows: primary strands of individual coding sequences > 0.9 > lambda DNA > noncoding, non-RNA > φX174 DNA > complementary strands > RNA genes ≈ 0.6 > transposon-insertion elements > T7 DNA ? eukaryotic sequences ≈ 0. It is concluded that this high degree of oligonucleotide and codon correspondence in E. coli reflects the widespread distribution of remnants of an early and slowly changing codon pattern that has been continually dispersed by duplication-divergence processes, leading to the present genome.
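The kind of comparison described in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' procedure: the function names kmer_freqs and pearson are hypothetical, and the choice of a plain Pearson coefficient over all 4^k possible words is an assumption.

```python
from collections import Counter
from itertools import product
from math import sqrt

def kmer_freqs(seq, k, step=1):
    """Relative frequency of every ACGT word of length k.

    step=1 counts all overlapping words; step=3 with k=3 counts
    reading-frame trinucleotides (codons) starting at position 0.
    """
    seq = seq.upper()
    words = (seq[i:i + k] for i in range(0, len(seq) - k + 1, step))
    counts = Counter(w for w in words if set(w) <= set("ACGT"))
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no countable words")
    return {"".join(p): counts["".join(p)] / total
            for p in product("ACGT", repeat=k)}

def pearson(a, b):
    """Pearson correlation between two frequency tables with the same keys."""
    keys = sorted(a)
    xs = [a[key] for key in keys]
    ys = [b[key] for key in keys]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for a constant table")
    return cov / (sx * sy)
```

Calling pearson(kmer_freqs(whole, 4), kmer_freqs(segment, 4)) would give one coefficient of the kind the abstract ranks, under the assumptions stated above.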

2.
I have examined potential determinants of the asymmetric distribution of nucleotide sequences in the genome of Escherichia coli as cataloged in GenBank release 44. I have used the frequency of occurrence of all possible tetranucleotides in a given sequence catalog or derivative as a comparative measure of asymmetry. The GenBank-cataloged strand and its complement show statistically similar (not complementary) distributions. The distribution is statistically similar in comparisons between the protein coding subset and the total genome, the coding subset and selected non-coding genes, the coding subset and the remainder of the DNA, and the coding subset and stable RNA sequences. I have compared the distribution in the genome of E. coli with the distributions found in the cataloged genomes of Salmonella typhimurium, Bacillus subtilis, and of coliphages lambda and T7. The distribution summed in both strands of the cataloged DNA differs statistically only in comparisons with lytic bacteriophage T7 because only the two strands of T7 show statistically dissimilar distributions. Despite similarities in tetranucleotide distribution, the pattern of codon complementarity in B. subtilis is different than that documented for E. coli. Thus, sequence asymmetry does not seem related to specific DNA function or to documented similarities or differences in codon bias. The sequence asymmetry of the E. coli genome may thus reflect a hitherto unsuspected pattern impressed on both strands of DNA which is or can be packaged into bacterial genomes.
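Counting tetranucleotides on a cataloged strand, on its complement, and summed over both strands might look like the sketch below. The function names are hypothetical and the statistical test used in the abstract is not reproduced; only the counting step is shown.

```python
from collections import Counter

# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the complementary strand read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def tetranucleotide_counts(seq):
    """Count all overlapping 4-base words on the given strand."""
    seq = seq.upper()
    return Counter(seq[i:i + 4] for i in range(len(seq) - 3))

def both_strand_counts(seq):
    """Sum the counts of the given strand and its reverse complement."""
    return (tetranucleotide_counts(seq)
            + tetranucleotide_counts(reverse_complement(seq)))
```

Comparing tetranucleotide_counts(seq) with tetranucleotide_counts(reverse_complement(seq)) corresponds to the strand-versus-complement comparison; both_strand_counts gives the summed distribution.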

3.
Uno R, Nakayama Y, Tomita M. Gene 2006, 380(1):30-37
Chi sequences (5'-GCTGGTGG-3') are cis-acting 8 bp sequence elements that enhance homologous recombination promoted by the RecBCD pathway in Escherichia coli. The genome of E. coli K-12 MG1655 contains 1009 Chi sequences and this frequency far exceeds the expected value for occurrence of an 8 bp sequence in a genome of this size. It is generally thought that the over-representation of Chi sequences indicates that they have been selected for during evolution because of their function in recombination. The genes from three E. coli strains (K-12, O157 and CFT) were classified into three categories (island, match to other E. coli, and backbone). Island genes differ from backbone genes in base composition and codon usage, and are therefore considered relatively new and not yet adapted to the base composition patterns and codon usage typical of the recipient genome. The over-representation of Chi sequences was examined by comparing Chi frequencies and codon frequencies between island and backbone genes. The difference in the CTGGTG di-codon frequency between the backbone and island genes was correlated with the frequency of Chi sequences which were translated in the Leu-Val (-G/CTG/GTG/G-) reading frame in the K-12 strain. These results suggest that the main reading frame of Chi sequences increased as a result of the di-codon CTG-GTG increasing under a genome-wide pressure for adapting to the codon usage and base composition of the E. coli K-12 strain, and that the RecBCD recombinase might adjust its recognition sequence to a frequently occurring oligomer such as G-CTG-GTG-G.
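The "expected value" for an 8 bp motif can be illustrated with the simplest possible null model, in which bases are drawn independently. This is an assumption made for illustration; the abstract does not say which model the authors used, and the function names are hypothetical.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def motif_probability(motif, base_freqs):
    """Probability of the motif at one position if bases are independent."""
    p = 1.0
    for base in motif.upper():
        p *= base_freqs[base]
    return p

def expected_occurrences(motif, genome_length, base_freqs, both_strands=True):
    """Expected motif count in a linear sequence under the independence model."""
    positions = genome_length - len(motif) + 1
    if positions <= 0:
        return 0.0
    expected = positions * motif_probability(motif, base_freqs)
    if both_strands:
        # The reverse complement is scored separately because base
        # frequencies need not be symmetric between strands.
        rc = motif.upper().translate(COMPLEMENT)[::-1]
        expected += positions * motif_probability(rc, base_freqs)
    return expected
```

With uniform base frequencies of 0.25, an 8-mer is expected once per 65,536 positions per strand, so a genome of a few million base pairs yields an expectation on the order of 10^2, well below the 1009 Chi sequences reported.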

4.
The frequencies of oligonucleotides of length 3-6 were studied in 211 sequences of human DNA (659 kilobases), 22 sequences of DNA of human viruses (120 kb), in 181 sequences of E. coli (442 kb), and in 42 sequences of phages of E. coli (137 kb). The sequences were obtained from GenBank® 48. The observed frequencies (O) were compared to the expected frequencies (E) obtained in two ways: 1) according to nucleotide composition for each series, and 2) according to first order Markov chains for triplets, second order for quadruplets, and third order for quintuplets and sextuplets. The ratio O/E was obtained for each oligonucleotide. Then, the correlation between the ratio O/E in a pair of series was calculated. Strong correlations were observed for sequences of man and human viruses, and for E. coli and its phages. Other correlations were small. For higher order Markov chains, there is indication of some correlation also between viruses and phages. It was concluded that through analysis of parallel oligonucleotide series it may be possible to infer some of the complex evolutionary relationships existing between cells and their infectors beyond the level of codon usage.
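The first of the two expectations above (nucleotide composition) leads to an O/E ratio that could be computed roughly as follows. This is a sketch under the assumption that E is the number of windows times the product of single-base frequencies; the function name is hypothetical and the Markov-chain variant is not shown.

```python
from collections import Counter

def observed_over_expected(seq, k):
    """O/E ratio for each observed k-mer under a base-composition model."""
    seq = seq.upper()
    n = len(seq)
    windows = n - k + 1
    if windows <= 0:
        raise ValueError("sequence shorter than k")
    # Single-base frequencies estimated from the sequence itself.
    base_freq = {b: seq.count(b) / n for b in "ACGT"}
    observed = Counter(seq[i:i + k] for i in range(windows))
    ratios = {}
    for word, count in observed.items():
        if not set(word) <= set("ACGT"):
            continue  # skip words containing ambiguity codes
        expected = float(windows)
        for base in word:
            expected *= base_freq[base]
        ratios[word] = count / expected
    return ratios
```

The O/E tables of two sequence series can then be correlated word by word to obtain the pairwise correlations the abstract reports.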

5.
Structure and function of the yeast URA3 gene: expression in Escherichia coli
Rose M, Grisafi P, Botstein D. Gene 1984, 29(1-2):113-124

6.
DNA sequences, potentially coding for histidine-rich proteins, were isolated from a P. falciparum genomic library using an oligonucleotide probe consisting of histidine codon repeats. Sequencing revealed that the different DNA fragments contain long repetitive regions very homologous to the probe. One clone was fully sequenced and contains two open reading frames that overlap in the repetitive region but are located on opposite strands. Analysis suggests that both are coding. One frame could code for a small histidine-rich protein, the other for a protein containing many aspartic acid residues. Southern blotting revealed that these sequences are conserved in all three P. falciparum strains studied.

7.
The pattern of segregation of DNA in Escherichia coli K-12 was analyzed by labeling replicating DNA with 5-bromodeoxyuridine followed by differential staining of nucleoids. Three types of visible arrangement were found in four-nucleoid groups derived from a native nucleoid after two replication rounds. Type A, segregation of both old strands toward cell poles, appeared with the highest frequency (0.6 to 0.8). Type B, segregation of one old strand toward the cell pole and the other toward the cell center, was twice as frequent as type C, segregation of both old strands toward the cell center. These results confirm previous data showing that DNA segregation in E. coli is nonrandom while presenting a certain degree of randomness. The proportions of the three indicated types of arrangement suggest a new probabilistic model to explain the observed segregation pattern. It is proposed that DNA strands segregate either nonrandomly, with a probability of between 0 and 1, or randomly. In nonrandom segregation, both old strands are always directed toward cell poles. Experimental data reported here or by other authors fit better with the predictions of this model than with those of other previously proposed deterministic or probabilistic models.

8.
Several statistical methods were tested for accuracy in predicting observed frequencies of di- through hexanucleotides in 74,444 bp of E. coli DNA. A Markov chain was most accurate overall, whereas other methods, including a random model based on mononucleotide frequencies, were very inaccurate. When ranked highest to lowest abundance, the observed frequencies of oligonucleotides up to six bases in length in E. coli DNA were highly asymmetric. All ordered abundance plots had a wide linear range containing the majority of the oligomers which deviated sharply at the high and low ends of the curves. In general, values predicted by a Markov chain closely followed the overall shape of the ordered abundance curves. A simple equation was derived by which the frequency of any oligonucleotide longer than four bases in the E. coli genome (or any genome) can be relatively accurately estimated from the nested set of component tri- and tetranucleotides by serial application of a 3rd order Markov chain. The equation yielded a mean ratio of 1.03 ± 0.94 for the observed-to-expected frequencies of the 4,096 hexanucleotides. Hence, the method is a relatively accurate but not perfect predictor of the length in nucleotides between hexanucleotide sites. Higher accuracy can be achieved using a 4th order Markov chain and larger data sets. The high asymmetry in oligonucleotide abundance means that in the E. coli genome of 4.2 × 10^6 bp many relatively short sequences of 7-9 bp are very rare or absent.
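The abstract does not reproduce the equation itself. A common way to write a serial 3rd order Markov estimate, assumed here for illustration, multiplies the frequencies of the overlapping tetranucleotides and divides by the frequencies of the trinucleotides they share; for a pentamer abcde that is f(abcd) × f(bcde) / f(bcd). The function names below are hypothetical.

```python
from collections import Counter

def kmer_freqs(seq, k):
    """Relative frequencies of overlapping k-mers observed in seq."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()}

def markov3_estimate(word, tri, tetra):
    """Estimate the frequency of a word of 5+ bases from tri/tetranucleotides.

    tri and tetra map trinucleotides and tetranucleotides to frequencies.
    Words whose components were never observed are estimated as 0.
    """
    if len(word) < 5:
        raise ValueError("word must be longer than four bases")
    estimate = tetra.get(word[:4], 0.0)
    for i in range(1, len(word) - 3):
        shared = tri.get(word[i:i + 3], 0.0)  # overlap of adjacent tetramers
        if shared == 0.0:
            return 0.0
        estimate *= tetra.get(word[i:i + 4], 0.0) / shared
    return estimate
```

Given a genome string g, markov3_estimate("GATCGA", kmer_freqs(g, 3), kmer_freqs(g, 4)) would give the expected hexanucleotide frequency, which can be set against the observed frequency to form the observed-to-expected ratio.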

9.
As shown in the accompanying paper (5), the oligonucleotide composition of the E. coli genome is highly asymmetric for sequences up to 6 bp in length when ranked from highest to lowest abundance. We show here that this largely reflects codon usage because heavily used codons were found in the highly abundant oligomers whereas rarely used codons, with some exceptions, occurred in sequences in low abundance. Furthermore, linear regression analysis revealed a strong correlation between the frequencies of each trinucleotide and its usage as a codon. Dinucleotides are also not randomly distributed across each codon position and the dinucleotide composition of genes that are transcribed but not translated (rRNA and tRNA genes) was highly related to that seen in genes encoding polypeptides. However, 45 tetra-, 8 penta-, and 6 hexanucleotides were significantly over- or underabundant by Markov chain analysis and could not be accounted for by codon usage. Of these underrepresented sequences, many were palindromes, including the Dam methylation site.

10.
Cloning and sequence analysis of the superoxide dismutase gene of Deinococcus radiodurans
Using as a probe a 453 bp gene fragment of superoxide dismutase (SOD), first amplified from Deinococcus radiodurans genomic DNA by PCR with degenerate oligonucleotide primers corresponding to the conserved regions of known SODs, a putative SOD gene was identified in the D. radiodurans whole-genome database. Its 636 bp open reading frame and its 5′ and 3′ flanking sequences were determined. Conventional E. coli ribosomal and RNA polymerase binding sites were found upstream of the SOD-encoding region, and an inverted repeat sequence was found downstream of the termination codon. The deduced 211-amino-acid sequence of the structural gene showed high similarity to other manganese- and iron-containing SODs in normally conserved regions.

11.
Nucleotide sequences of the cysB region of Salmonella typhimurium and Escherichia coli have been determined and compared. A total of 1759 nucleotides were sequenced in S. typhimurium and 1840 in E. coli. Both contain a 972-nucleotide open reading frame identified as the coding region for the cysB regulatory protein on the basis of sequence homology and by comparison of the deduced amino acid sequences with known physicochemical properties of this protein. The DNA sequence identity for the cysB coding region in the two species is 80.5%. The deduced amino acid sequences are 95% identical. The predicted cysB polypeptide molecular weights are 36,013 for S. typhimurium and 36,150 for E. coli. For both proteins a helix-turn-helix region similar to that found in other DNA-binding proteins is predicted from the deduced amino acid sequence. Sequences upstream to cysB contain open reading frames which represent the carboxyl-terminal end of the topA gene product, DNA topoisomerase I. A pattern of highly conserved nucleotide sequences in the 151 nucleotides immediately preceding the cysB initiator codon in both species suggests that this region may contain multiple signals for the regulation of cysB expression.

12.
13.
Does the 'non-coding' strand code?
The hypothesis that DNA strands complementary to the coding strand contain in phase coding sequences has been investigated. Statistical analysis of the 50 genes of bacteriophage T7 shows no significant correlation between patterns of codon usage on the coding and non-coding strands. In Bacillus and yeast genes the correlation observed is not different from that expected with random synonymous codon usage, while a high correlation seen in 52 E. coli genes can be explained in terms of an excess of RNY codons. A deficiency of UUA, CUA and UCA codons (complementary to termination) seems to be restricted to the E. coli genes, and may be due to low abundance of the relevant cognate tRNA species. Thus the analysis shows that the non-coding strand has the properties expected of a sequence complementary to a coding strand, with no indications that it encodes, or may have encoded, proteins.

14.
Mustafa A, Yuen L. DNA Sequence 1991, 2(1):39-45
A degenerate oligonucleotide probe corresponding to a highly conserved amino acid sequence in several DNA polymerases was used to locate the DNA polymerase gene in the Choristoneura biennis entomopoxvirus. Southern blot analysis of the entomopoxvirus genome using the degenerate oligonucleotide probe showed specific interaction between the probe and an eight kilobasepair EcoRI fragment from the entomopoxvirus genome. Sequencing this EcoRI fragment revealed an open reading frame 2892 nucleotides in length, capable of encoding a protein about 115 kilodaltons. Homology search of this open reading frame against other proteins indicated a high degree of homology in four distinct regions with DNA polymerases from other organisms. The highest degree of homology (24.9% at the amino acid level) was found between the vaccinia DNA polymerase gene and the entomopoxvirus open reading frame.

15.
16.
Codon usage tables have been produced for E. coli, yeast, human, and mouse. The nonrandom employment of codons allows assignment of probability values to trinucleotides in any DNA sequence. These values represent the probability that a given trinucleotide is used as a codon in the organism from which the table is derived. For the graphical delineation of coding areas in DNA sequences, a probability is assigned to each trinucleotide equal to its frequency in the codon table. Averaging and smoothing procedures then greatly enhance the detectability of areas of high average codon probability and better represent the mean codon probability. These manipulations increase graphical clarity without altering the overall magnitude of probabilities. Averaging introduces an error of less than 0.5% between "raw" and smoothed data. This graphical delineation of coding sequences does not depend on the presence of punctuation, ribosomal binding sites, etc.; moreover, the delineation of introns and exons is also possible.
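The assignment and averaging steps described above could be sketched as follows. The function names are hypothetical, the simple unweighted moving average is an assumption (the abstract does not specify the smoothing procedure), and any codon table passed in is the caller's own.

```python
def codon_probability_profile(seq, codon_freq, frame=0):
    """Assign each in-frame trinucleotide its frequency in a codon table.

    codon_freq maps codons to their frequency in the organism's codon
    usage table; trinucleotides missing from the table score 0.0.
    """
    seq = seq.upper()
    return [codon_freq.get(seq[i:i + 3], 0.0)
            for i in range(frame, len(seq) - 2, 3)]

def moving_average(values, window):
    """Unweighted moving average over `window` consecutive codons."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]
```

Plotting moving_average(codon_probability_profile(seq, table, frame), window) for frame 0, 1 and 2 gives three curves; stretches where one frame stays high are candidate coding areas.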

17.
Selection Intensity for Codon Bias
Hartl D. L., Moriyama E. N., Sawyer S. A. Genetics 1994, 138(1):227-234
The patterns of nonrandom usage of synonymous codons (codon bias) in enteric bacteria were analyzed. Poisson random field (PRF) theory was used to derive the expected distribution of frequencies of nucleotides differing from the ancestral state at aligned sites in a set of DNA sequences. This distribution was applied to synonymous nucleotide polymorphisms and amino acid polymorphisms in the gnd and putP genes of Escherichia coli. For the gnd gene, the average intensity of selection against disfavored synonymous codons was estimated as approximately 7.3 × 10^-9; this value is significantly smaller than the estimated selection intensity against selectively disfavored amino acids in observed polymorphisms (2.0 × 10^-8), but it is approximately of the same order of magnitude. The selection coefficients for optimal synonymous codons estimated from PRF theory were consistent with independent estimates based on codon usage for threonine and glycine. Across 118 genes in E. coli and Salmonella typhimurium, the distribution of estimated selection coefficients, expressed as multiples of the effective population size, has a mean and standard deviation of 0.5 ± 0.4. No significant differences were found in the degree of codon bias between conserved positions and replacement positions, suggesting that translational misincorporation is not an important selective constraint among synonymous polymorphic codons in enteric bacteria. However, across the first 100 codons of the genes, conserved amino acids with identical codons have significantly greater codon bias than that of either synonymous or nonidentical codons, suggesting that there are unique selective constraints, perhaps including mRNA secondary structures, in this part of the coding region.

18.
The Escherichia coli single-stranded DNA binding (SSB) protein is a non-sequence-specific DNA binding protein that functions as an accessory factor for the RecA protein-promoted three-strand exchange reaction. An open reading frame encoding a protein similar in size and sequence to the E. coli SSB protein has been identified in the Streptococcus pneumoniae genome. The open reading frame has been cloned, an overexpression system has been developed, and the protein has been purified to greater than 99% homogeneity. The purified protein binds to ssDNA in a manner similar to that of the E. coli SSB protein. The protein also stimulates the S. pneumoniae RecA protein and E. coli RecA protein-promoted strand exchange reactions to an extent similar to that observed with the E. coli SSB protein. These results indicate that the protein is the S. pneumoniae analog of the E. coli SSB protein. The availability of highly purified S. pneumoniae SSB protein will facilitate the study of the molecular mechanisms of RecA protein-mediated transformational recombination in S. pneumoniae.

19.
Codon usage data for 56 Bacillus subtilis genes show that synonymous codon usage in B. subtilis is less biased than in Escherichia coli, or in Saccharomyces cerevisiae. Nevertheless, certain genes with a high codon bias can be identified by correspondence analysis, and also by various indices of codon bias. These genes are very highly expressed, and a general trend (a decrease) in codon bias across genes seems to correspond to decreasing expression level. This, then, may be a general phenomenon in unicellular organisms. The unusually small effect of translational selection on the pattern of codon usage in lowly expressed genes in B. subtilis yields similar dinucleotide frequencies among different codon positions, and on complementary strands. These patterns could arise through selection on DNA structure, but more probably are largely determined by mutation. This prevalence of mutational bias could lead to difficulties in assessing whether open reading frames encode proteins.

20.
Nelson K., Wang F. S., Boyd E. F., Selander R. K. Genetics 1997, 147(4):1509-1520
The sequence of aceK, which codes for the regulatory catalytic enzyme isocitrate dehydrogenase kinase/phosphatase (IDH K/P), and sequences of the 5' flanking region and part or all of the 3' flanking region were determined for 32 strains of Salmonella enterica and Escherichia coli. In E. coli, the aceK gene was 1734 bp long in 13 strains, but in three strains it was 12 bp shorter and the stop codon was TAA rather than TGA. Strains with the shorter aceK lacked an open reading frame (f728) downstream between aceK and iclR that was present, in variable length, in the other strains. Among the 72 ECOR strains, the truncated aceK gene was present in all isolates of the B2 group and half of those of the D group. Other variant conditions included the presence of IS1 elements in two strains and large deletions in two strains. The aceK-aceA intergenic region varied in length from 48 to 280 bp in E. coli, depending largely on the number of repetitive extragenic palindromic (REP) sequences present. Among the ECOR strains, the number of REP elements showed a high degree of phylogenetic association, and sequencing of the region in the ECOR strains permitted partial reconstruction of its evolutionary history. In S. enterica, the normal length of aceK was 1752 bp, but three other length variants, ranging from 1746 to 1785 bp, were represented in five of the 16 strains examined. The flanking intergenic regions showed relatively minor variation in length and sequence. The occurrence of several nonrandom patterns of distribution of polymorphic synonymous nucleotide sites indicated that intragenic recombination of horizontally exchanged DNA has contributed to the generation of allelic diversity at the aceK locus in both species.
