Similar Documents
20 similar documents found.
1.
This study reports the analysis of codon usage in 35 complete Homo sapiens genes. Both codon frequency and inter-codon interference exhibit patterns of evolutionary interest. There is a significant positive correlation between the frequency with which a given codon is used and the frequency with which its complement is used. Since the frequency of appearance of the complementary codon on the coding strand is equal to the frequency of appearance of the original codon on the non-coding strand, in the same phase, the non-coding strand is found to resemble the coding strand in triplet composition. The same effect has been observed in Escherichia coli. This preference for the use of certain complementary triplets as codons suggests that the evolution of the use of the genetic code depended to some extent upon the double-stranded nature of the coding material. In addition, the effect of discrimination against the use of two dinucleotides, CpG and UpA, is observed in codon usage and also in adjacent codon interference. Codons beginning with G, or A, are unlikely to be preceded by codons ending in C, or U, respectively. Consideration of codon assignment in the genetic code together with the observed CpG infrequency suggests that the evolution of the code may have been influenced by conditions in which the use of CpG dinucleotides was unfavorable. The infrequent use of UpA dinucleotides can be explained as the result of frameshift mutation during gene evolution.

2.
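Comparison of base usage frequencies on the two DNA strands of 76 bacterial species and its significance   Cited: 1 (self-citations: 0, citations by others: 1)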
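Using bioinformatics methods, the sequenced genomes of 76 bacterial species were compared, and base usage frequencies were analyzed in the coding regions and at each codon position. The results show that: (1) base usage frequencies in the coding regions of the leading and lagging strands do not differ significantly and are significantly positively correlated; (2) base usage frequencies at the first, second, and third codon positions are essentially the same on the leading and lagging strands and are significantly positively correlated. These results indicate that selection pressure and spontaneous mutation have equal effects on the overall base distribution of the two DNA strands.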

3.
We find a region in the non-coding part of the bacteriophage lambda genome that codes for the conserved fold that repressors and other proteins use for specific DNA binding. The region lies within a long open reading frame of more than one kilobase and is read in the same frame as gene A on the opposite strand. The putative translation product of this open reading frame has a highly ordered secondary structure dominated by alpha helices, which is typical of repressors. In addition, codon usage in this frame suggests a protein-coding region. However, a TGA stop codon lies between the putative gene start point and the region coding for the DNA-binding fold. It thus appears that bacteriophage lambda once had an additional DNA-binding protein, perhaps a repressor, that was inactivated by a mutation.

4.
Long open reading frames (ORFs) in antisense DNA strands have been reported in the literature to be rare. However, an extensive analysis of the GenBank database revealed that a substantial number of genes from several species contain an in-phase ORF in the antisense strand that entirely overlaps the coding sequence of the sense strand, or even extends beyond it. The findings described in this paper show that this is a frequent, non-random phenomenon that depends primarily on codon usage and, to a lesser extent, on gene size and GC content. Examination of the sequence database for several prokaryotic and eukaryotic organisms demonstrates that coding sequences with in-phase, 100%-overlapping antisense ORFs are present in every genome studied so far.
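For a coding sequence whose length is a multiple of three, reading its reverse complement from the first base keeps the antisense codon boundaries aligned with the sense codons. Under that definition, an in-phase antisense ORF overlapping the whole gene exists exactly when that frame contains no stop codon. A minimal sketch (no antisense start codon is required, which is an assumption; the helper names are hypothetical):

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def antisense_in_phase_stops(cds: str) -> list[int]:
    """0-based positions (on the antisense strand) of stop codons in the
    antisense frame whose codon boundaries match the sense codons."""
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    anti = reverse_complement(cds)
    return [i for i in range(0, len(anti), 3) if anti[i:i + 3] in STOPS]


def has_full_antisense_orf(cds: str) -> bool:
    """True if the in-phase antisense frame is open across the entire gene."""
    return not antisense_in_phase_stops(cds)
```

The sense stop codons TAA, TAG and TGA become TTA, CTA and TCA in this frame, none of which is a stop, so the gene's own terminator never interrupts the antisense ORF.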

5.
I have examined potential determinants of the asymmetric distribution of nucleotide sequences in the genome of Escherichia coli as cataloged in GenBank release 44. I have used the frequency of occurrence of all possible tetranucleotides in a given sequence catalog or derivative as a comparative measure of asymmetry. The GenBank-cataloged strand and its complement show statistically similar (not complementary) distributions. The distribution is statistically similar in comparisons between the protein-coding subset and the total genome, the coding subset and selected non-coding genes, the coding subset and the remainder of the DNA, and the coding subset and stable RNA sequences. I have compared the distribution in the genome of E. coli with the distributions found in the cataloged genomes of Salmonella typhimurium, Bacillus subtilis, and coliphages lambda and T7. The distribution summed over both strands of the cataloged DNA differs statistically only in comparisons with the lytic bacteriophage T7, because only the two strands of T7 show statistically dissimilar distributions. Despite similarities in tetranucleotide distribution, the pattern of codon complementarity in B. subtilis is different from that documented for E. coli. Thus, sequence asymmetry does not seem to be related to specific DNA function or to documented similarities or differences in codon bias. The sequence asymmetry of the E. coli genome may therefore reflect a hitherto unsuspected pattern impressed on both strands of DNA that is, or can be, packaged into bacterial genomes.
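The strand-versus-complement tetranucleotide comparison can be sketched as below. The abstract does not say which test was used, so the Pearson chi-square on a 2 x k table is an assumed stand-in, and the function names are hypothetical. Because overlapping 4-mer windows are not independent, the statistic is best read as a descriptive measure rather than an exact test.

```python
from collections import Counter
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def tetranucleotide_counts(seq: str) -> list[int]:
    """Counts of all 256 tetranucleotides in overlapping windows, in a fixed order."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    return [counts[t] for t in TETRAMERS]


def chi_square_2xk(a: list[int], b: list[int]) -> tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for a 2 x k table.

    Columns that are empty in both rows are skipped.
    """
    n_a, n_b = sum(a), sum(b)
    if n_a == 0 or n_b == 0:
        raise ValueError("both count vectors must be non-empty")
    total = n_a + n_b
    chi2, used = 0.0, 0
    for x, y in zip(a, b):
        column = x + y
        if column == 0:
            continue
        used += 1
        for observed, row_total in ((x, n_a), (y, n_b)):
            expected = row_total * column / total
            chi2 += (observed - expected) ** 2 / expected
    return chi2, used - 1


def strand_vs_complement(seq: str) -> tuple[float, int]:
    """Compare tetranucleotide usage on a strand with usage on its complementary strand."""
    return chi_square_2xk(tetranucleotide_counts(seq),
                          tetranucleotide_counts(reverse_complement(seq)))
```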

6.
Does the 'non-coding' strand code?   Cited: 3 (self-citations: 2, citations by others: 1)
The hypothesis that DNA strands complementary to the coding strand contain in phase coding sequences has been investigated. Statistical analysis of the 50 genes of bacteriophage T7 shows no significant correlation between patterns of codon usage on the coding and non-coding strands. In Bacillus and yeast genes the correlation observed is not different from that expected with random synonymous codon usage, while a high correlation seen in 52 E. coli genes can be explained in terms of an excess of RNY codons. A deficiency of UUA, CUA and UCA codons (complementary to termination) seems to be restricted to the E. coli genes, and may be due to low abundance of the relevant cognate tRNA species. Thus the analysis shows that the non-coding strand has the properties expected of a sequence complementary to a coding strand, with no indications that it encodes, or may have encoded, proteins.
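The two codon classes named above can be counted as in this sketch (hypothetical helper names). The reverse complement of an RNY codon is itself RNY, which is why an excess of RNY codons makes the two strands look alike when read in phase. The codons "complementary to termination" are the reverse complements of TAA, TAG and TGA.

```python
PURINES = set("AG")
PYRIMIDINES = set("CT")
# Reverse complements of the stop codons TAA, TAG, TGA (UUA, CUA, UCA in mRNA).
STOP_COMPLEMENTS = {"TTA", "CTA", "TCA"}


def codons(cds: str) -> list[str]:
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]


def is_rny(codon: str) -> bool:
    """RNY: purine at position 1, any base at position 2, pyrimidine at position 3."""
    return codon[0] in PURINES and codon[2] in PYRIMIDINES


def fraction(cds: str, predicate) -> float:
    cs = codons(cds)
    return sum(1 for c in cs if predicate(c)) / len(cs) if cs else 0.0


def rny_fraction(cds: str) -> float:
    return fraction(cds, is_rny)


def stop_complement_fraction(cds: str) -> float:
    return fraction(cds, lambda c: c in STOP_COMPLEMENTS)
```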

7.
The human genomic H-ras proto-oncogene was inserted into an Epstein-Barr virus (EBV) vector (p220.2) that replicates synchronously with the cell cycle. Unique restriction enzyme sites, 30 bp apart, were created on either side of codon 12 to enable the construction of gapped heteroduplex (GHD) DNA. Depending on the experimental protocol, the gap could be located either on the coding (non-transcribed) strand or on the non-coding (transcribed) strand. GHD DNA was created using a 1.8 kb segment of H-ras DNA containing exon 1, into which a synthetic 30-nucleotide oligomer containing a strand- and site-specific mismatched nucleotide was annealed. The 1.8 kb H-ras segment carrying a G:T, A:C or T:C mismatch at the middle nucleotide of codon 12 was religated with high efficiency into the EBV vector and transfected into NIH 3T3 cells using a mild liposome-mediated protocol. The resulting hygromycin-resistant NIH 3T3 colonies were PCR-amplified and sequenced. In this study, the rates at which codon 12 middle-nucleotide mismatches were corrected to wild-type G:C during replication in NIH 3T3 cells were 96.4% for G:T, 87.5% for A:C and 67% for T:C mismatches.

8.
A multiply damaged site (MDS) is defined as ≥2 lesions within a distance of 10-15 base pairs (bp). MDS generated by ionizing radiation contain oxidative base damage, and in vitro studies have indicated that if the base damage is <3 bp apart, repair of one lesion is inhibited until repair of the lesion in the opposite strand is completed. Inhibition of repair could result in an increase in the mutation frequency of the base damage. We have designed an assay to determine whether a closely opposed lesion causes an increase in adenine insertion opposite an 8-oxodG in bacteria. We have positioned the MDS (an 8-oxodG in the transcribed strand and a second 8-oxodG immediately 5' to this lesion in the non-transcribed strand) within the firefly luciferase coding region. During two rounds of replication, insertion of adenine opposite the 8-oxodG in the transcribed (T) or non-transcribed (NT) strand results in a translation termination codon at position 444 or 445, respectively. The truncated luciferase protein is inactive. We have generated double-stranded oligonucleotides that contain no damage, a single 8-oxodG in either strand, or the MDS. Each double-stranded molecule was ligated into the reporter vector, and the ligation products were transformed into wild-type or MutY-deficient bacteria. The plasmid DNA was isolated and sequenced from colonies that did not express luciferase activity. In wild-type bacteria, we detected a translation stop at a frequency of 0.15% (codon 444) and 0.09% (codon 445) with a single 8-oxodG in the T or NT strand, respectively. This was enhanced approximately 3-fold when single lesions were replicated in MutY-deficient bacteria. Positioning an 8-oxodG in the T strand within the MDS enhanced the mutation frequency approximately 2-fold in wild-type bacteria and 8-fold in MutY-deficient bacteria, while the mutation frequency of the 8-oxodG in the NT strand increased 6-fold in MutY-deficient bacteria.
This enhancement of mutation frequency supports the in vitro MDS studies, which demonstrated the inability of base excision repair to completely repair closely opposed lesions.

9.
10.
The structure of the rye chloroplast DNA segment containing the psbC gene, which codes for the 43-kDa chlorophyll a-binding subunit of photosystem II, has been determined. The trnS (UGA) gene encoding tRNASer is located 140 bp downstream of the psbC stop codon, on the opposite DNA strand. As in other plants, the 5'-terminal part of the psbC gene overlaps by 50 bp the 3'-terminal region of the psbD gene, which codes for the D2 protein of photosystem II. The amino acid sequence of the psbC gene product shares common features with that of the psbB gene product (CPa-1 protein). The structural similarity of these two proteins seems to reflect their similar functions.

11.
It has been shown that codons coding for strongly hydrophilic amino acids are complemented by codons that code for strongly hydrophobic ones, leading to the hypothesis that peptides encoded in this way should interact. Although the principle has been validated in a number of experimental models, its general applicability has been questioned. I have discussed this principle, showing that the correlation between coding- and noncoding-strand amino acids was maintained, and indeed slightly improved, when weighted averages based on codon usage tables were used to determine noncoding-strand amino acid hydropathies. The coding capacity of the noncoding strand and its content of open reading frames were also discussed. Another point of contention that I clarified further is the chemical plausibility of the interactions between hydrophobic and hydrophilic amino acids implicit in this concept. The extension of complementary domains was also addressed. Finally, I have discussed what I call the evolutionary drift of primary structure, and I showed as an example that although the nucleotide sequences coding for the substance K receptor bear little resemblance to the inverse complement of the sequence coding for the SK peptide, a peptide spanning residues 130–139 is hydropathically very similar to that predicted from such an inverse complement.

12.
It is shown that synonymous codon usage is less biased in favor of those codons preferred by highly expressed genes at the end of Escherichia coli genes than in the middle. This appears to be due to the close proximity of many E. coli genes. It is shown that a substantial number of genes overlap either the Shine-Dalgarno sequence or the coding sequence of the next gene on the chromosome and that the codons that overlap have lower synonymous codon bias than those that do not. It is also shown that there is an increase in the frequency of A-ending codons, and a decrease in the frequency of G-ending codons, at the end of E. coli genes that lie close to another gene. It is suggested that these trends in composition could be associated with selection against the formation of mRNA secondary structure near the start of the next gene on the chromosome. Stop codon use is also affected by the close proximity of genes; many genes are forced to use TGA and TAG stop codons because they terminate within either the Shine-Dalgarno sequence or the coding sequence of the next gene on the chromosome. The implications of these results for the evolution of synonymous codon use are discussed.
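The end-versus-middle comparison of A- and G-ending codons can be sketched as below. The 10-codon window and the definition of the gene "body" (all sense codons except the first and last k) are hypothetical choices, since the abstract does not give its windows.

```python
STOPS = {"TAA", "TAG", "TGA"}


def sense_codons(cds: str) -> list[str]:
    """In-frame codons of a coding sequence, with a terminal stop codon removed."""
    cds = cds.upper()
    cs = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    return cs[:-1] if cs and cs[-1] in STOPS else cs


def third_base_fraction(codons: list[str], base: str) -> float:
    return sum(c[2] == base for c in codons) / len(codons) if codons else 0.0


def end_vs_body(cds: str, k: int = 10) -> dict[str, float]:
    """Third-position A and G usage in the last k sense codons versus the gene body
    (all sense codons except the first k and the last k)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    cs = sense_codons(cds)
    end, body = cs[-k:], cs[k:-k]
    return {
        "A_end": third_base_fraction(end, "A"),
        "A_body": third_base_fraction(body, "A"),
        "G_end": third_base_fraction(end, "G"),
        "G_body": third_base_fraction(body, "G"),
    }
```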

13.
Human spermidine synthase: cloning and primary structure   Cited: 1 (self-citations: 0, citations by others: 1)
Using a synthetic deoxyoligonucleotide mixture constructed for a tryptic peptide of the bovine enzyme as a probe, cDNA coding for the full-length subunit of spermidine synthase was isolated from a human decidual cDNA library constructed on phage lambda gt11. After subcloning into the Eco RI site of pBR322 and propagation, both strands of the insert were sequenced using a shotgun strategy. Starting from the first start codon, which was immediately preceded by a GC-rich region including four overlapping CCGCC consensus sequences, an open reading frame for a 302-amino-acid polypeptide was resolved. This peptide had an Mr of 33,827, started with methionine, and ended with serine. The identity of the isolated cDNA was confirmed by comparison of the deduced amino acid sequence with resolved sequences of the tryptic peptides of bovine spermidine synthase. The coding strand of the cDNA revealed no special regulatory or ribosome-binding signals within 82 nucleotides preceding the start codon and no polyadenylation signal within 247 nucleotides following the stop codon. The coding region, containing a 13-nucleotide repeat close to the 5' end, was longer than, and very different from, that of the bacterial counterpart. This region seems to be of retroviral origin and shows marked homology with sequences found in a variety of human, mammalian, avian, and viral genes and mRNAs. By computer analysis, the first 200 nucleotides of the 5' end of the coding strand appear able to form a very stable secondary structure with a free energy change of -157.6 kcal/mole.(ABSTRACT TRUNCATED AT 250 WORDS)

14.
The first 14 exons of the APC gene have been screened by denaturing gradient gel electrophoresis in 160 unrelated patients with familial adenomatous polyposis coli (APC) syndrome. Four polymorphic variants corresponding to silent mutations not associated with the disease phenotype were observed. Mutations predicted to alter the coding properties of the APC gene were observed in 26 patients. All of these mutations are expected to lead to aberrant splicing, to synthesis of a truncated APC protein through the creation of a stop codon, or to a change in the translational reading frame. Single-base-pair substitutions were observed on 21 occasions. The most frequent mutation (eight cases) was a C-to-T change that occurred exclusively on the nontranscribed strand within a CG dinucleotide.
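A C-to-T change at the C of a CpG on one strand appears as a G-to-A change at the G of the same CpG when the variant is reported on the other strand, so a check for CpG transitions should accept both forms. A hypothetical helper (0-based positions on the given strand):

```python
def is_cpg_transition(seq: str, pos: int, ref: str, alt: str) -> bool:
    """True for C>T at the C of a CpG, or G>A at the G of a CpG
    (the same event seen from the complementary strand)."""
    seq, ref, alt = seq.upper(), ref.upper(), alt.upper()
    if seq[pos] != ref:
        raise ValueError("reference base does not match the sequence")
    if ref == "C" and alt == "T":
        return pos + 1 < len(seq) and seq[pos + 1] == "G"
    if ref == "G" and alt == "A":
        return pos > 0 and seq[pos - 1] == "C"
    return False
```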

15.
The question of whether the noncoding DNA strand had, or still has, the capability to encode functional polypeptides has been addressed in several articles. The theoretical background of the views advocating this idea arose from two groups of findings. One group was based on various observations implying that the genetic code was adapted for double-strand coding. The other group of theories arose from the observation of gene-length overlapping open reading frames (O-ORFs) on the antisense DNA strand in a number of genes. These theories, which I term selectionist, put forward a novel concept of gene evolution: that new genes can be created by utilizing the antisense DNA strand. In contrast, neutralist theory holds that O-ORFs are mere by-products of evolutionary processes that create particular codon usage and base distribution patterns in coding sequences. Received: 16 June 2000 / Accepted: 31 August 2000

16.
J L Weber. Gene, 1987, 52(1): 103-109
The genome of the human malaria parasite Plasmodium falciparum has an A + T content of about 82%, higher than that of any other organism whose DNA has been characterized. Computer analysis of 36 kb of available nucleotide sequence from this species showed that the coding regions, with an A + T content of 69.0%, are flanked by more A + T-rich regions (86.0% A + T). Within the coding sequences, the A/T ratio was 1.68 in the mRNA-sense strand, and A + T content increased from the first to the second to the third codon position. Codons with T or, especially, A in the third position were strongly preferred. Compared with genes from other species, codon usage was very similar among individual parasite genes. Dinucleotide frequencies in the parasite DNA were close to those expected for a random sequence of the known base composition, except that the CpG frequency in the coding sequences was low.
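The composition statistics above (A + T content by codon position, the A/T ratio, and CpG frequency relative to base composition) can be sketched as follows. CG x N / (C x G) is one common form of the CpG observed/expected ratio; treating it as equivalent to the abstract's "expected for a random sequence with the known base composition" is an assumption, and the function names are hypothetical.

```python
def at_content_by_position(cds: str) -> list[float]:
    """A+T fraction at codon positions 1, 2 and 3 (a trailing partial codon is ignored)."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    result = []
    for pos in range(3):
        bases = cds[pos:usable:3]
        result.append(sum(b in "AT" for b in bases) / len(bases) if bases else 0.0)
    return result


def a_to_t_ratio(seq: str) -> float:
    seq = seq.upper()
    t = seq.count("T")
    return seq.count("A") / t if t else float("inf")


def cpg_observed_expected(seq: str) -> float:
    """CpG count divided by the count expected from C and G composition: CG * N / (C * G)."""
    seq = seq.upper()
    n, c, g = len(seq), seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return float("nan")
    observed = sum(1 for i in range(n - 1) if seq[i:i + 2] == "CG")
    return observed * n / (c * g)
```

A ratio well below 1 in coding sequences would correspond to the CpG deficiency the abstract reports.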

17.
Biased usage of synonymous codons has long been explained in terms of cellular tRNA abundance. Taking advantage of publicly available gene expression data for Saccharomyces cerevisiae, a systematic analysis of codon and amino acid usage was performed in two classes of coding region: those corresponding to regular (helix and strand) and to irregular (coil) protein secondary structures. Our analyses suggest that, apart from tRNA abundance, mRNA folding stability is another major evolutionary force shaping the differences in codon and amino acid usage between highly and lowly expressed genes in the S. cerevisiae genome, and, surprisingly, that this effect depends on which protein secondary structure the coding region encodes. This represents a new paradigm for understanding codon usage in S. cerevisiae. Differential amino acid usage between highly and lowly expressed genes in the regions coding for irregular protein secondary structure in S. cerevisiae is explained by the stability of the folded mRNA structure. Irrespective of secondary structure type, highly expressed genes consistently tend to encode cheaper amino acids, reducing the overall biosynthetic cost of producing the corresponding protein. This study supports the hypothesis that tRNA abundance is a consequence, not a cause, of the biased amino acid usage between highly and lowly expressed genes.

18.
We have written a computer program, BIGPROBE, which facilitates the design of long nucleic acid probes from the partial or complete amino acid sequence of a protein. BIGPROBE relies upon information on codon usage, intercodon dinucleotide frequency, and potential probe self-complementarity. We have examined the accuracy with which the program predicts coding sequences using sample human and rat genes and probe lengths of 30-60 nucleotides. Rat probe sequences selected by BIGPROBE using either codon usage or dinucleotide frequency data alone averaged 86-92% homology with the known exons of the corresponding gene sequences. Predictive accuracy with rat gene probes could be improved to 89-94%, depending upon probe length, by applying codon usage and dinucleotide frequency data in combination. Similar accuracy was achieved for human genes.
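The codon-usage part of probe design can be illustrated by back-translating a peptide with each residue's most frequent codon and scoring the result against the known gene by percent identity. This is a sketch of that single criterion only: BIGPROBE's intercodon dinucleotide scoring and self-complementarity check are not modeled, and the usage-table format and function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)}


def preferred_codons(usage: dict[str, float]) -> dict[str, str]:
    """Map each amino acid to its most frequently used codon in a usage table."""
    best: dict[str, str] = {}
    for codon, freq in usage.items():
        aa = CODE[codon]
        if aa not in best or freq > usage[best[aa]]:
            best[aa] = codon
    return best


def back_translate(protein: str, usage: dict[str, float]) -> str:
    """Probe sequence built from the preferred codon for each residue."""
    best = preferred_codons(usage)
    missing = sorted(set(protein) - set(best))
    if missing:
        raise ValueError(f"no codon usage data for residues: {missing}")
    return "".join(best[aa] for aa in protein)


def percent_identity(probe: str, gene: str) -> float:
    """Position-by-position identity of two equal-length, ungapped sequences."""
    if len(probe) != len(gene) or not probe:
        raise ValueError("sequences must be non-empty and of equal length")
    return 100.0 * sum(a == b for a, b in zip(probe, gene)) / len(probe)
```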

19.
Our previous work applied neural network techniques to the problem of discriminating open reading frame (ORF) sequences taken from introns versus exons. The method counted the codon frequencies in an ORF of a specified length, and then used this codon frequency representation of DNA fragments to train a neural net (essentially a Perceptron with a sigmoidal, or "soft step function", output) to perform this discrimination. After training, the network was applied to a disjoint "predict" set of data to assess accuracy. The resulting accuracy in our previous work was 98.4%, exceeding accuracies reported in the literature at that time for other algorithms. Here, we report even higher accuracies stemming from calculations of mutual information (a correlation measure) between spatially separated codons in exons and in introns. Significant mutual information exists between adjacent codons in exons, but not in introns. This suggests that the frequencies of adjacent codon pairs (dicodons) are important for intron/exon discrimination. We report that accuracies obtained using a neural net trained on dicodon frequencies are significantly higher at smaller fragment lengths than even our original results using codon frequencies, which were already higher than those of simple statistical methods that also used codon frequencies. We also report accuracies obtained by including codon and dicodon statistics in all six reading frames, i.e. the three frames on the original strand and the three on the complementary strand. Including six-frame statistics increases the accuracy still further. We also compare these neural net results with a Bayesian statistical prediction method that assumes independent codon frequencies at each position. The Bayesian scheme performs worse than any of the neural-net schemes; however, many methods reported in the literature use this method, either explicitly or implicitly.
Specifically, Bayesian prediction schemes based on codon frequencies achieve 90.9% accuracy on 90-codon ORFs, while our best neural net scheme reaches 99.4% accuracy on 60-codon ORFs. "Accuracy" is defined as the average of the exon and intron sensitivities. Achieving sufficiently high accuracy on short fragments could provide a computational means of finding coding regions in unannotated DNA sequences, such as those arising from the megabase sequencing efforts of the Human Genome Project. We caution that the high accuracies reported here do not represent a complete solution to the problem of identifying exons in "raw" base sequences. Accuracies are considerably lower for short exons, although still higher than those reported in the literature for other methods. Short exons are not uncommon. (ABSTRACT TRUNCATED AT 400 WORDS)
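The adjacent-codon mutual information mentioned above can be estimated from codon-pair counts as in this sketch. It uses the simple plug-in estimate from raw counts; the authors' estimator is not described here, so this is an assumption, and the function name is hypothetical. Plug-in estimates are biased upward for short sequences, so values from short fragments should be interpreted with care.

```python
from collections import Counter
from math import log2


def adjacent_codon_mi(seq: str, gap: int = 1) -> float:
    """Plug-in mutual information (bits) between in-frame codons `gap` codons apart."""
    if gap < 1:
        raise ValueError("gap must be at least 1")
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    pairs = list(zip(codons, codons[gap:]))
    if not pairs:
        return 0.0
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    # p(a,b) / (p(a) p(b)) simplifies to count(a,b) * n / (count(a) * count(b))
    return sum((c / n) * log2(c * n / (left[a] * right[b])) for (a, b), c in joint.items())
```

With `gap=1` this measures adjacent codons (the dicodon signal); larger gaps probe more widely separated codons.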

20.
We have previously constructed and selected six recombinant plasmids containing cDNA sequences specific for different ribosomal proteins of Xenopus laevis (Bozzoni et al., 1981). The DNA cloned in these plasmids has been isolated and sequenced. Amino acid sequences of the corresponding portions of the proteins have been deduced from the DNA sequences; as expected for ribosomal proteins, they are rich in arginine and lysine. One of the cDNA sequences also has an open reading frame on the strand complementary to the one coding for the ribosomal protein; this fragment has 20-nucleotide inverted repeats at its two ends. Codon usage in the six sequences appears to be non-random, with some differences among the ribosomal proteins analysed.
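Terminal inverted repeats, like the 20-nucleotide repeats noted above, can be detected by checking how far the start of a fragment matches the reverse complement of its end. A minimal sketch that allows perfect matches only and forbids overlapping arms; both restrictions, the default length of 20, and the helper names are assumptions:

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def longest_terminal_inverted_repeat(seq: str) -> int:
    """Length of the longest prefix that is the reverse complement of the suffix
    of the same length (perfect matches only; the two arms may not overlap)."""
    seq = seq.upper()
    n = 0
    while n < len(seq) // 2 and COMPLEMENT.get(seq[-1 - n]) == seq[n]:
        n += 1
    return n


def has_terminal_inverted_repeats(seq: str, length: int = 20) -> bool:
    return longest_terminal_inverted_repeat(seq) >= length
```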
