Similar articles
 20 similar articles found (search time: 31 ms)
1.
Coding capacity of complementary DNA strands.   Total citations: 7 (self-citations: 4, citations by others: 3)
A Fortran computer algorithm has been used to analyze the nucleotide sequence of several structural genes. The analysis performed on both coding and complementary DNA strands shows that whereas open reading frames shorter than 100 codons are randomly distributed on both DNA strands, open reading frames longer than 100 codons ("virtual genes") are significantly more frequent on the complementary DNA strand than on the coding one. These "virtual genes" were further investigated by looking at intron sequences, splicing points, and signal sequences, and by analyzing gene mutations. On the basis of this analysis, coding and complementary DNA strands of several eukaryotic structural genes cannot be distinguished. In particular, we suggest that the complementary DNA strand of the human epsilon-globin gene might indeed code for a protein.
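The kind of strand comparison described in this abstract can be illustrated with a minimal Python sketch. It is not the Fortran program from the paper, whose details are not given here. It assumes an open reading frame is any run of codons free of TAA, TAG, and TGA, and the names `open_frame_lengths` and `count_long_frames` are hypothetical.

```python
# Illustrative sketch only: counts stop-free runs of codons on both strands.
# Assumption: an "open reading frame" is any run of codons without a stop codon.
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the complementary strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def open_frame_lengths(seq, frame):
    """Lengths (in codons) of the stop-free runs in one frame (0, 1, or 2)."""
    lengths, run = [], 0
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOPS:
            lengths.append(run)  # a stop codon closes the current run
            run = 0
        else:
            run += 1
    lengths.append(run)  # trailing run that reaches the end of the sequence
    return lengths


def count_long_frames(seq, min_codons=100):
    """Count stop-free runs of at least min_codons codons on each strand."""
    seq = seq.upper()
    strands = (("coding", seq), ("complementary", reverse_complement(seq)))
    return {
        name: sum(
            1
            for frame in range(3)
            for n in open_frame_lengths(strand, frame)
            if n >= min_codons
        )
        for name, strand in strands
    }
```

Calling `count_long_frames(sequence)` with the default threshold of 100 codons gives one count per strand, which is the quantity the abstract compares between the coding and complementary strands.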

2.
3.
The bacterial DNA sequences in the GenBank database were divided into coding and noncoding regions and examined for the base-trimer distribution in every triplet frame on the sense and antisense strands. The results revealed that for the noncoding region, both strands have very similar base-trimer distributions and have no frame specificity; that is, DNA is symmetric in the noncoding region. For the coding region, on the other hand, the symmetry is broken only in the triplet framework, and we found a special triplet-frame-specific symmetry which appears when the two complementary strands of the coding region are read from their 5′ ends. In addition, the following frame specificity was also observed in the distribution of stop codons on the antisense strand of the coding region. When the antisense sequences of the open reading frames (ORFs) in the database are read in the three reading frames, the same reading frame as the corresponding ORF contains a significantly larger number of long open frames without stop codons (i.e., nonstop frames [NSFs]) than expected, while the number of NSFs in the other two reading frames is similar to the expected number. That is, NSFs as well as ORFs are maintained in a frame-specific manner, and in this sense, DNA becomes symmetrical even in the coding region. These two kinds of frame-specific symmetries indicate that only an ORF and its complementary triplets are specifically recognized and maintained in DNA. We suppose that the antisense strands as well as the sense strands in the coding region may be transcribed, thereby producing various kinds of proteins corresponding to NSFs, though their amount may not be large. The presence of these proteins should have some benefits for living organisms, and therefore we propose that these proteins are upcoming enzymes having novel functions. Correspondence to: I. Urabe

4.
M.J. Bibb, P.R. Findlay, M.W. Johnson. Gene, 1984, 30(1-3): 157-166
Bacterial genes that code for proteins appear to possess a codon usage characteristic of their overall base composition. This results in different but predictable non-random distributions of nucleotides within codons, permitting the recognition of protein-coding sequences in a wide range of bacterial species. The nature of this distribution depends on the base composition of the coding sequence. The position-specific differences are especially conspicuous in genes of extreme G + C content, allowing the particularly reliable prediction of the reading frame and coding strand of experimentally determined DNA sequences. This finding has been exploited to identify the coding sequence of the viomycin phosphotransferase (vph) gene of Streptomyces vinaceus. An easily applied computer program (“Frame”) has been written to carry out and display such analyses.
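The position-specific base composition mentioned in this abstract can be sketched as follows. This is a simplified illustration, not the “Frame” program itself; the function name `gc_by_codon_position` is hypothetical, and it assumes only that G + C content is tallied separately for the first, second, and third positions of each codon in a chosen frame.

```python
# Illustrative sketch only: G + C fraction at each of the three codon positions.
# Assumption: the sequence is read in the given frame (0, 1, or 2) and any
# incomplete trailing codon is ignored.
def gc_by_codon_position(seq, frame=0):
    """Return [GC at position 1, GC at position 2, GC at position 3]."""
    seq = seq.upper()
    counts = [0, 0, 0]
    totals = [0, 0, 0]
    usable = len(seq) - (len(seq) - frame) % 3  # drop a partial last codon
    for i in range(frame, usable):
        pos = (i - frame) % 3
        totals[pos] += 1
        if seq[i] in "GC":
            counts[pos] += 1
    return [c / t if t else 0.0 for c, t in zip(counts, totals)]
```

Comparing the three values across the candidate frames of a sequence shows whether one frame has the strongly position-dependent composition that the abstract associates with coding regions in genes of extreme G + C content.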

5.
6.
7.
The distribution patterns of bases of DNA fragments in different regions of the P. aeruginosa genome are analyzed in this paper. It is shown that 5565 protein-coding genes, 17315 non-coding ORFs, and 1104 intergenic sequences fall into seven clusters based on their base frequencies. Almost all the protein-coding genes are contained in one of the seven clusters. The significant difference in base frequencies among the three codon positions in a high-GC genome, which gives rise to the division between the base distribution patterns of the six reading frames of protein-coding genes, is responsible for the clustering phenomenon. In light of the clustering phenomenon, the author supposes that the antisense strand ORFs, particularly those corresponding to Frame 2′ and Frame 3′, may not code for proteins in the P. aeruginosa genome.

8.
The distribution patterns of bases of DNA fragments in different regions of the P. aeruginosa genome are analyzed in this paper. It is shown that 5565 protein-coding genes, 17315 non-coding ORFs, and 1104 intergenic sequences fall into seven clusters based on their base frequencies. Almost all the protein-coding genes are contained in one of the seven clusters. The significant difference in base frequencies among the three codon positions in a high-GC genome, which gives rise to the division between the base distribution patterns of the six reading frames of protein-coding genes, is responsible for the clustering phenomenon. In light of the clustering phenomenon, the author supposes that the antisense strand ORFs, particularly those corresponding to Frame 2′ and Frame 3′, may not code for proteins in the P. aeruginosa genome.

9.
Since the base composition of translational stop codons (TAG, TAA, and TGA) is biased toward a low G+C content, a differential density for these termination signals is expected in random DNA sequences of different base compositions. The expected length of reading frames (DNA segments of sense codons flanked by in-phase stop codons) in random sequences is thus a function of GC content. The analysis of DNA sequences from several genome databases stratified according to GC content reveals that the longest coding sequences (exons in vertebrates and genes in prokaryotes) are GC-rich, while the shortest ones are GC-poor. Exon lengthening in GC-rich vertebrate regions does not result, however, in longer vertebrate proteins, perhaps because of the lower number of exons in the genes located in these regions. The effects on coding-sequence lengths constitute a new evolutionary meaning for compositional variations in DNA GC content. Correspondence to: J. L. Oliver
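The dependence of reading-frame length on GC content can be made concrete with a small Python sketch. It rests on simplifying assumptions that are not taken from the paper: bases are independent, G and C are equally frequent, and A and T are equally frequent. The function names are hypothetical.

```python
# Illustrative sketch only: stop-codon density in a random sequence as a
# function of GC content. Assumptions: independent bases, with
# P(G) = P(C) = gc / 2 and P(A) = P(T) = (1 - gc) / 2.
def stop_probability(gc):
    """Probability that a random codon is TAA, TAG, or TGA."""
    at = (1.0 - gc) / 2.0  # probability of A (or of T)
    g = gc / 2.0           # probability of G (or of C)
    # TAA contributes at**3; TAG and TGA each contribute at * at * g.
    return at ** 3 + 2.0 * at * at * g


def expected_open_frame_codons(gc):
    """Mean number of sense codons before the next in-phase stop codon."""
    p = stop_probability(gc)
    return (1.0 - p) / p  # mean of a geometric distribution
```

At 50% GC the stop probability under these assumptions is 3/64, so a random frame runs for about 61/3 (roughly 20) sense codons between stops; raising the GC content lowers the stop probability and lengthens the expected frame, in line with the trend the abstract describes.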

10.
Does the 'non-coding' strand code?   Total citations: 3 (self-citations: 2, citations by others: 1)
The hypothesis that DNA strands complementary to the coding strand contain in-phase coding sequences has been investigated. Statistical analysis of the 50 genes of bacteriophage T7 shows no significant correlation between patterns of codon usage on the coding and non-coding strands. In Bacillus and yeast genes the correlation observed is not different from that expected with random synonymous codon usage, while a high correlation seen in 52 E. coli genes can be explained in terms of an excess of RNY codons. A deficiency of UUA, CUA and UCA codons (complementary to termination codons) seems to be restricted to the E. coli genes, and may be due to low abundance of the relevant cognate tRNA species. Thus the analysis shows that the non-coding strand has the properties expected of a sequence complementary to a coding strand, with no indications that it encodes, or may have encoded, proteins.

11.
12.
13.
The nucleotide sequence of 1179 bp preceding the trp operon genes has been established. There are no open reading frames large enough to code for proteins containing more than 97 amino acid residues. In all cases the coding sequences do not contain initiation codons. The determined sequence is concluded to represent an intercistronic region.

14.
Thalassiosira weissflogii (Grun.) Fryxell et Hasle is one of the more commonly studied centric diatoms, and yet molecular studies of this organism are still in their infancy. The ability to identify open reading frames, and thus to distinguish between introns and exons and between coding and noncoding sequence, is essential to move from nuclear DNA sequences to predicted amino acid sequences. To facilitate the identification of open reading frames in T. weissflogii, two newly identified nuclear genes encoding β-tubulin and t-complex polypeptide (TCP)-γ, along with six previously published nuclear DNA sequences, were examined for general structural features. The coding region of the nuclear open reading frames had a G + C content of about 49% and could readily be distinguished from noncoding sequence due to a significant difference in G + C content. The introns were uniformly small, about 100 base pairs in size. Furthermore, the 5' and 3' splice sites of introns displayed the canonical GT/AG sequence, further facilitating recognition of noncoding regions. Six of the nuclear open reading frames displayed relatively little bias in the use of synonymous codons, as exemplified by the cDNAs encoding β-tubulin and TCP-γ. Two open reading frames displayed strong bias in the use of particular codons (although the codons used were different), as exemplified by the cDNA encoding fucoxanthin chlorophyll a/c binding protein. Knowledge of codon bias should facilitate, for example, design of degenerate PCR primers and potential heterologous reporter gene constructs.

15.
Overlapping genes are two protein-coding sequences sharing a significant part of the same DNA locus in different reading frames. Although an increasing number of examples have recently been found in bacteria, the underlying mechanisms of their evolution are unknown. In this work we explore how selective pressure in a protein-coding sequence influences its overlapping genes in alternative reading frames. We model evolution using a time-continuous Markov process and derive the corresponding model for the remaining frames to quantify selection pressure and genetic noise. Our findings lead to the presumption that, once information is embedded in the reverse reading frame −2 (relative to the mother gene in +1), purifying selection in the protein-coding reading frame automatically protects the sequences in both frames. We also found that this coincides with the fact that the genetic noise, measured using the conditional entropy, is minimal in frame −2 under selection in the coding frame.

16.
The short-chain oxidoreductase (SCOR) family of enzymes includes over 6000 members, extending from bacteria and archaea to humans. Nucleic acid sequence analysis reveals that significant numbers of these genes are remarkably free of stop codons in reading frames other than the coding frame, including those on the antisense strand. The genes from this subset also use almost entirely the GC-rich half of the 64 codons. Analysis of a million hypothetical genes having random nucleotide composition shows that the percentage of SCOR genes having multiple open reading frames exceeds random by a factor of as much as 1 × 10^6. Nevertheless, screening the content of the SWISS-PROT TrEMBL database reveals that 15% of all genes contain multiple open reading frames. The SCOR genes having multiple open reading frames and a GC-rich coding bias exhibit a similar GC bias in the nucleotide triplet composition of their DNA. This bias is not correlated with the GC content of the species in which the SCOR genes are found. One possible explanation for the conservation of multiple open reading frames and extreme bias in nucleic acid composition in the family of Rossmann folds is that the primordial member of this family was encoded early using only very stable GC-rich DNA and that evolution proceeded with extremely limited introduction of any codons having two or more adenine or thymine nucleotides. These and other data suggest that the SCOR family of enzymes may even have diverged from a common ancestor before most of the AT-rich half of the genetic code was fully defined.

17.
The nucleotide sequences of tobacco chloroplast genes for tRNASer (GCU) and tRNAGln (UUG) have been determined. These tRNA genes are encoded on the same DNA strand and separated by 1144 bp. Two open reading frames of 52 codons and 98 codons have been found in this spacer region. The tRNASer (GCU) and tRNAGln (UUG) deduced from the DNA sequences show 67% and 76% sequence homology with E. coli tRNASer (GCU) and tRNAGln (UUG), respectively.

18.
Nucleotide sequence of cauliflower mosaic virus DNA   Total citations: 1 (self-citations: 0, citations by others: 1)
The complete nucleotide sequence (8024 nucleotides) of the circular double-stranded DNA of cauliflower mosaic virus has been established. The DNA molecule is known to possess three discrete single-stranded discontinuities, often referred to as “gaps”, two in one strand and one in the other. The sequence data indicate that gap 1, the single discontinuity in the α strand, corresponds to the absence of no more than one or two nucleotides with respect to the complementary β strand. The two discontinuities in the β strand, however, are not authentic gaps since no nucleotides are missing, but are instead regions of sequence overlap: a short sequence (19 residues for gap 2, at least 2 residues for gap 3) at one terminus of each discontinuity, probably the 5′ terminus, is displaced from the double helix by an identical sequence at the other boundary of the discontinuity. Analysis of the distribution of nonsense codons in the DNA sequence is consistent with other evidence that only the α strand is transcribed. The coding region extends around the circular molecule from 4 map units of gap 1, the map origin, to map position 91, and consists of six long open reading frames. Our findings suggest, but do not prove, that the DNA sequence of the open reading frames is colinear with viral protein sequences. The cistron for the viral coat protein, which is probably synthesized in the form of a precursor, has been situated in coding region IV on the basis of its unusual amino acid composition.

19.
The goal of this study was to identify and map genes expressed during the elongation phase of embryogenesis in swine. Expressed sequence tags were analysed from a previously described porcine cDNA library prepared from elongating swine embryos. Average insert length of randomly selected clones was approximately 600 bp, with a range from < 100 to > 2500 bp. Single-pass, coding strand sequences from 1132 independent clones were compared with the GenBank non-redundant (nr) database via BLASTN analysis to identify potential porcine homologues of known genes. Among these sequences, 781 (69%) showed significant (score > 300) homology to non-mitochondrial sequences previously deposited in GenBank. Sequences matching interleukin 1 beta and thymosin beta 10 were most frequently observed (24 and 18 clones, respectively), in addition to matches with 310 other distinct genes. No significant match in the GenBank nr database was obtained for 303 sequences. Of these, 151 (50%) had open reading frames (ORFs) extending at least 50 codons from the first base of the clone insert. Genetic markers were developed and used to map a subset of 17 genes, selected on the basis of function or of the ability to design primers that successfully amplified porcine genomic DNA, to 10 different porcine chromosomes, providing a set of mapped markers corresponding to genes expressed during conceptus elongation.

20.
Pseudogenes are defined as nonfunctional DNA sequences with homology to functional protein-coding genes, and they typically contain disabling mutations within the presumptive coding region. In theory, pseudogenes can also be caused by mutations in upstream regulatory regions, appearing as open reading frames with attenuated expression. In this study, we identified 1,939 annotated protein-coding genes with little evidence of expression in Arabidopsis thaliana and characterized their molecular evolution. On average, this set of genes was shorter than expressed genes and evolved with a 2-fold higher rate of nonsynonymous substitutions. The divergence of upstream sequences, based on ortholog comparisons to A. lyrata, was also higher than for expressed genes, suggesting that these lowly expressed genes could be examples of pseudogenization by promoter disablement, often due to transposable element insertion. We complemented our empirical study by extending the models of Force et al. (Force A, Lynch M, Pickett FB, Amores A, Yan YL, Postlethwait J. 1999. Preservation of duplicate genes by complementary, degenerative mutations. Genetics 151:1531-1545.) to derive the probability of promoter disablements after gene duplication.
