Similar Articles
20 similar articles found (search time: 378 ms)
1.
A plausible architecture of an ancient genetic code is derived from an extended base-triplet vector space over the Galois field of the extended base alphabet {D, A, C, G, U}, where the symbol D represents one or more hypothetical bases with unspecific pairings. We hypothesized that the high degeneracy of a primeval genetic code with five bases, together with the gradual origin and improvement of a primeval DNA repair system, could have enabled the transition from ancient to modern genetic codes. Our results suggest that the Watson-Crick base pairings G ≡ C and A = U, together with the non-specific base pairing of the hypothetical ancestral base D, which are used to define the sum and product operations, are sufficient to determine the coding constraints of both the primeval and the modern genetic code, as well as the transition from the former to the latter. Geometrical and algebraic properties of this vector space reveal that the present codon assignment of the standard genetic code could have been induced from a primeval codon assignment. In addition, the Fourier spectrum of the extended DNA genome sequences derived from the multiple sequence alignment suggests that the so-called period-3 property of present-day coding DNA sequences could also have existed in ancient coding DNA sequences. Phylogenetic analyses performed with metrics defined in the N-dimensional vector space (B^3)^N of DNA sequences, and with the new evolutionary model presented here, also suggest that an ancient DNA coding sequence with five or more bases does not contradict the expected evolutionary history.
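The field arithmetic underlying such a base-triplet vector space can be sketched in Python. This is a minimal sketch that rests on one clearly flagged assumption: the bases are mapped to GF(5) = {0, 1, 2, 3, 4} in the order they are listed, D=0, A=1, C=2, G=3, U=4. The abstract does not give this mapping, and the paper's sum and product operations may be defined differently. Under this particular mapping, Watson-Crick partners (A/U, C/G) turn out to be additive inverses, so adding a codon to its base-by-base complement gives DDD.

```python
# Hypothetical mapping (assumption, not taken from the paper):
# the extended alphabet {D, A, C, G, U} is identified with GF(5) = {0, 1, 2, 3, 4}.
BASES = "DACGU"
TO_INT = {base: i for i, base in enumerate(BASES)}
P = 5  # GF(5) is the integers modulo the prime 5


def add_bases(x: str, y: str) -> str:
    """Field sum of two bases (addition modulo 5)."""
    return BASES[(TO_INT[x] + TO_INT[y]) % P]


def mul_bases(x: str, y: str) -> str:
    """Field product of two bases (multiplication modulo 5)."""
    return BASES[(TO_INT[x] * TO_INT[y]) % P]


def add_triplets(c1: str, c2: str) -> str:
    """Component-wise sum of two codons viewed as vectors in GF(5)^3."""
    return "".join(add_bases(a, b) for a, b in zip(c1, c2))


def scale_triplet(k: int, codon: str) -> str:
    """Scalar multiple k * codon in GF(5)^3."""
    return "".join(BASES[(k * TO_INT[b]) % P] for b in codon)


if __name__ == "__main__":
    print(add_bases("A", "U"), add_bases("C", "G"))  # D D
    print(add_triplets("ACG", "UGC"))                 # DDD
    print(mul_bases("C", "G"))                        # A  (2 * 3 = 6 = 1 mod 5)
```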

2.
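A preliminary study of coding sequences lacking 3-base periodicity   Total citations: 4, self-citations: 0, citations by others: 4
Fourier spectrum analysis of 120 short coding sequences (<1,200 bp) shows that 3-base periodicity is not universally present in short coding sequences. Statistical analysis suggests that whether a coding sequence shows 3-base periodicity is related to its base composition and distribution, to the usage and order of amino acids in the encoded protein, and to synonymous codon usage. In general, non-period-3 sequences have a higher A+U content than G+C content, whereas period-3 sequences show the opposite; bases are distributed more evenly across the three codon positions in non-period-3 sequences than in period-3 sequences; and codon and amino acid usage biases are weaker in non-period-3 sequences than in period-3 sequences. These phenomena should be taken into account when Fourier analysis is used to predict genes and exons in DNA sequences.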
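A minimal Python sketch of one widely used Fourier criterion for 3-base periodicity: represent the sequence as four binary indicator sequences (one per base), sum their squared DFT magnitudes at frequency N/3, and divide by the mean spectral power. The normalisation and any cutoff for calling a sequence "period-3" are assumptions on my part, not values from the study. The phase factor at frequency N/3 reduces to exp(-2πi·n/3), so N never has to be divisible by 3.

```python
import cmath


def period3_power(seq: str) -> float:
    """S(N/3): summed squared DFT magnitude of the A/C/G/T indicator
    sequences at frequency N/3 (phase factor exp(-2*pi*i*n/3)). U counts as T."""
    seq = seq.upper().replace("U", "T")
    total = 0.0
    for base in "ACGT":
        x = sum(cmath.exp(-2j * cmath.pi * n / 3)
                for n, b in enumerate(seq) if b == base)
        total += abs(x) ** 2
    return total


def period3_snr(seq: str) -> float:
    """S(N/3) divided by the mean spectral power. By Parseval's theorem the mean
    of S(k) over all N frequencies equals N for a sequence made only of
    A/C/G/T(U), so the ratio is S(N/3) / N."""
    return period3_power(seq) / len(seq)


if __name__ == "__main__":
    print(round(period3_snr("ATG" * 100), 3))  # strongly periodic
    print(round(period3_snr("A" * 300), 3))    # no period-3 signal
```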

3.
4.
A program is described that performs general DNA sequence analysis on the Hewlett-Packard Model 86/87 microcomputer with 128 KB of RAM. The following analytical procedures can be performed: (1) display of the sequence, in whole or in part, or of its complement; (2) search for specified sequences, e.g. restriction sites, giving fragment sizes in the case of the latter; (3) a comprehensive search for all known restriction enzyme sites; (4) graphical mapping of sites; (5) editing functions; (6) base frequency analysis; (7) search for repeated sequences; (8) search for open reading frames, or translation into the amino acid sequence with analysis of basic and acidic amino acids, hydrophobicity, and codon usage. Two sequences, or parts thereof, can be merged in various orientations to mimic recombination strategies, or compared for homologies. The program is written in HP BASIC and is designed principally as a tool for the laboratory investigator manipulating a defined set of vectors and recombinant DNA constructs.
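Procedure (2), searching for a site and reporting fragment sizes, can be sketched in Python for a linear molecule. The function names and the `cut_offset` parameter (the cut position within the top-strand recognition sequence, e.g. 1 for EcoRI, G^AATTC) are illustrative assumptions. The original program was written in HP BASIC, and its interface is not described here.

```python
def find_sites(seq: str, site: str) -> list:
    """0-based start positions of every (possibly overlapping) occurrence of `site`."""
    seq, site = seq.upper(), site.upper()
    hits, start = [], seq.find(site)
    while start != -1:
        hits.append(start)
        start = seq.find(site, start + 1)
    return hits


def fragment_sizes(seq: str, site: str, cut_offset: int) -> list:
    """Fragment lengths after cutting a linear molecule at every occurrence of
    `site`, the cut lying `cut_offset` bases into the recognition sequence."""
    cuts = [p + cut_offset for p in find_sites(seq, site)]
    bounds = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    dna = "AAGAATTCTTTTGAATTCAA"
    print(find_sites(dna, "GAATTC"))         # start positions of EcoRI sites
    print(fragment_sizes(dna, "GAATTC", 1))  # fragment lengths, summing to len(dna)
```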

5.
Summary: We construct a codon space in which a given DNA sequence can be plotted as a function of its base composition at each of the three codon positions. We demonstrate that the base composition is highly nonrandom, with sequences from more primitive organisms having the least random compositions. Using cluster analysis on the points plotted in codon space, we show a strong correlation between base composition and type of organism, with the most primitive organisms having the highest A or T content in the second and third codon positions. A smooth transition toward lower A+T and higher G+C content is observed in the second and third codon positions as the evolutionary complexity of the organism increases. Beyond this general trend, more detailed structure can be observed in the clustering, which should become clearer as the database grows.
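A sequence's coordinates in such a codon space can be computed from its position-specific base composition. The sketch assumes the axes are the fractions of A, C, G and T (U) at codon positions 1, 2 and 3, with A+T per position as a reduced 3-D view. The abstract does not give the exact coordinates, so this choice and the helper names are assumptions.

```python
from collections import Counter


def positional_composition(cds: str) -> list:
    """Fractions of A, C, G, T at codon positions 1, 2, 3 (reading frame 0;
    an incomplete trailing codon is ignored). U is treated as T."""
    cds = cds.upper().replace("U", "T")
    n_codons = len(cds) // 3
    if n_codons == 0:
        raise ValueError("need at least one complete codon")
    composition = []
    for pos in range(3):
        counts = Counter(cds[3 * i + pos] for i in range(n_codons))
        composition.append({b: counts[b] / n_codons for b in "ACGT"})
    return composition


def at_by_position(cds: str) -> tuple:
    """A+T fraction at codon positions 1, 2 and 3: one point in a 3-D codon space."""
    return tuple(c["A"] + c["T"] for c in positional_composition(cds))


if __name__ == "__main__":
    print(at_by_position("ATGAAAGCT"))
```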

6.
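A new measure of DNA sequence information   Total citations: 4, self-citations: 3, citations by others: 1
Based on information theory, a new method for measuring the information in DNA sequences is presented, yielding information measures at four levels: I_b, I_f(1), I_f(2) and I_f(3). These four measures can be used to quantify the information content of the base sequence, the codon sequence, the encoded protein sequence and the functional protein sequence of DNA, respectively. Results computed for two short protein-coding DNA sequences from the mitochondrial genome of M. edulis, and for simulated DNA sequences built from degenerate codon sets of differing degeneracy, show that these information measures can indeed be used to reveal …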
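The abstract does not define I_b or I_f(1) to I_f(3), so the sketch below does not attempt to reproduce them. As a hedged illustration of the general idea only, it computes a generic Shannon-based information (maximum entropy minus observed entropy) at the base level and at the codon level. The function names and the use of this particular formula are assumptions.

```python
import math
from collections import Counter


def shannon_entropy(symbols) -> float:
    """Shannon entropy (bits) of the empirical distribution of `symbols`."""
    counts = Counter(symbols)
    n = sum(counts.values())
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def base_level_information(seq: str) -> float:
    """log2(4) minus the entropy of the base composition, in bits per base."""
    return math.log2(4) - shannon_entropy(seq.upper())


def codon_level_information(cds: str) -> float:
    """log2(64) minus the entropy of the in-frame codon distribution."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    return math.log2(64) - shannon_entropy(codons)


if __name__ == "__main__":
    print(base_level_information("ACGT"))    # uniform composition
    print(base_level_information("AAAA"))    # single base
    print(codon_level_information("ATGATG"))
```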

7.
Galtier N, Bazin E, Bierne N. Genetics, 2006, 172(1): 221-228
The study of base composition evolution in Drosophila has been carried out mostly through the analysis of coding sequences. Third codon position GC content, however, is influenced both by neutral forces (e.g., mutation bias) and by natural selection for codon usage optimization. In this article, large data sets of noncoding DNA sequence polymorphism in D. melanogaster and D. simulans were gathered from public databases to try to disentangle these two factors, since noncoding sequences are not affected by selection for codon usage. Allele frequency analyses revealed an asymmetric pattern of AT vs. GC noncoding polymorphisms: AT → GC mutations are less numerous than GC → AT ones, and tend to segregate at a higher frequency, especially at GC-rich loci. This is indicative of nonstationary evolution of base composition and/or of GC-biased allele transmission. Fitting population genetics models to the allele frequency spectra confirmed this result and favored the hypothesis of biased transmission. These results, together with previous reports, suggest that GC-biased gene conversion has influenced base composition evolution in Drosophila and explain the correlation between intron and exon GC content.
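The first, descriptive step of such an allele-frequency analysis, which splits noncoding polymorphisms by mutational direction and compares their counts and frequencies, can be sketched as follows. The sketch assumes every polymorphism has already been polarised (ancestral vs. derived allele, e.g. using an outgroup) and reduced to an `(ancestral, derived, derived_frequency)` tuple. The input format and function names are hypothetical, and the model fitting described in the abstract is not shown.

```python
from statistics import mean

WEAK, STRONG = set("AT"), set("GC")


def classify(ancestral: str, derived: str) -> str:
    """Direction of a polymorphism with respect to weak (A/T) and strong (G/C) bases."""
    if ancestral in WEAK and derived in STRONG:
        return "AT->GC"
    if ancestral in STRONG and derived in WEAK:
        return "GC->AT"
    return "other"


def summarize(polymorphisms) -> dict:
    """polymorphisms: iterable of (ancestral, derived, derived_freq) tuples.
    Returns {direction: (number of sites, mean derived-allele frequency)}."""
    groups = {}
    for anc, der, freq in polymorphisms:
        groups.setdefault(classify(anc, der), []).append(freq)
    return {k: (len(v), mean(v)) for k, v in groups.items()}


if __name__ == "__main__":
    toy = [("A", "G", 0.6), ("T", "C", 0.4), ("G", "A", 0.1),
           ("C", "T", 0.2), ("C", "A", 0.3)]  # hypothetical data
    print(summarize(toy))
```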

8.
We have studied the statistical constraints on synonymous codon choice to evaluate various proposals regarding the origin of the bias in synonymous codon usage observed by Fiers et al. (1975), Air et al. (1976), Grantham et al. (1980) and others. We determined the statistical dependence of the degenerate third base on either of its nearest neighbors in mitochondrial, prokaryotic, and eukaryotic coding sequences, and noted an increasing dependence of the third base on its nearest neighbors in moving from mitochondria to prokaryotes to eukaryotes. A statistical model assuming random, equiprobable selection of synonymous codons was found grossly adequate for mitochondria, but totally inadequate for prokaryotes and eukaryotes. A model assuming selection of synonymous codons according to a genomic strategy, i.e. the genome hypothesis of Grantham et al. (1980), gave a good approximation of the mitochondrial sequences. A statistical model that exactly maintains codon frequency, but allows the positions of corresponding synonymous codons to vary, was only grossly adequate for prokaryotes and totally inadequate for eukaryotes. The results of these simulations are consistent with the measures on experimental sequences and suggest that a "frequency constraint" model such as that of Grantham et al. (1980) may adequately explain codon usage in mitochondria. In prokaryotes, however, there may be constraints on synonymous codon choice due to codon context in addition to this frequency constraint. Furthermore, any proposal to explain codon usage in eukaryotes must involve a constraint on the context of a codon in the sequence.
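The abstract does not say which dependence statistic was used. One common choice, shown here purely as an assumption, is the mutual information between the third base of a codon and its 3′ neighbour (the first base of the next codon). A value of 0 bits means no dependence. The function name is hypothetical.

```python
import math
from collections import Counter


def third_base_neighbor_mi(cds: str) -> float:
    """Mutual information (bits) between the third base of codon i and the
    first base of codon i+1, pooled over all adjacent in-frame codon pairs."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    pairs = [(a[2], b[0]) for a, b in zip(codons, codons[1:])]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    left = Counter(x for x, _ in pairs)
    right = Counter(y for _, y in pairs)
    # sum of p(x,y) * log2(p(x,y) / (p(x) p(y))), written with raw counts
    return sum(c / n * math.log2(c * n / (left[x] * right[y]))
               for (x, y), c in joint.items())


if __name__ == "__main__":
    print(third_base_neighbor_mi("AAGGACCAGGACCAG"))  # third base fixes next base
    print(third_base_neighbor_mi("AAA" * 10))         # no variation, no dependence
```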

9.
Blunt-end palindromic DNA linkers with a central restriction site have been designed for the multiple reading frame insertion (abbreviated MURFI) of a sense or nonsense codon into DNA. We used an amber MURFI linker, 5'-CTAG TCTAGA CTAG-3', to disrupt the lacZ gene, yielding truncated beta-galactosidase proteins. Conditional disruption of the tet^r gene in E. coli has also been demonstrated. Nonsense-codon MURFI linkers permit conditional fusion of multiple gene products, while sense-codon linkers can add structural elements (e.g. a beta-turn, cationic segment or hydrophobic segment) or a desired amino acid (e.g. methionine, cysteine) to a protein. Shotgun or, alternatively, site-directed insertion of the symmetric linkers is possible. The overall length of the linker may be adjusted to retain the original reading frame, matching nucleotide additions or subtractions at recipient DNA sites. If a linker restriction site occurs elsewhere in the target DNA, single linker copies may still be inserted using non-phosphorylated linkers.
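A direct check shows why one linker works in every reading frame. The linker equals its own reverse complement, and the amber codon TAG appears in all three frames of the top strand, which by the palindrome means all six frames. A small Python check, with illustrative helper names:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(seq: str) -> bool:
    """True if the duplex reads the same on both strands (5'->3')."""
    return seq.upper() == reverse_complement(seq)


def frames_with_codon(seq: str, codon: str = "TAG") -> list:
    """Top-strand reading frames (0, 1, 2) that contain `codon` in frame."""
    seq = seq.upper()
    return [f for f in range(3)
            if any(seq[i:i + 3] == codon for i in range(f, len(seq) - 2, 3))]


if __name__ == "__main__":
    linker = "CTAGTCTAGACTAG"  # amber MURFI linker from the abstract
    print(is_palindromic(linker), frames_with_codon(linker))
```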

10.
The 50 non-coding bases immediately internal to the telomeric repeats at the two 5′ ends of macronuclear DNA molecules of a group of hypotrichous ciliates are anomalous in composition, consisting of 61% purines and 39% pyrimidines, with A > T (ratio 44:32) and G > C (ratio 17:7). These imbalances violate parity rule 2, according to which A should equal T and G should equal C within a DNA strand, so that purines should equal pyrimidines. The purine richness and base-ratio imbalances contrast markedly with the rest of the non-coding parts of the molecules, which have the theoretically expected purine content of 50%, with A = T and G = C. The ORFs contain an average of 52% purines as a result of bias in codon usage. The 50 bases that flank the 5′ ends of macronuclear sequences in micronuclear DNA (12 cases) consist of ~50% purines. Thus, the 50 bases at the 5′ ends of macronuclear sequences in micronuclear DNA are islands of purine richness in which A > T and G > C. These islands may serve as signals for the excision of macronuclear molecules during macronuclear development. We have found no published reports of coding or non-coding native DNA with such anomalous base composition.
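The composition figures quoted here (purine fraction, A:T and G:C ratios, deviation from parity rule 2) can be computed for the 5′-terminal window of a sequence as sketched below. The 50-base default window follows the abstract. The function name and the use of AT/GC skew to measure deviation from parity rule 2 are assumptions.

```python
def five_prime_composition(seq: str, window: int = 50) -> dict:
    """Base-composition summary of the first `window` bases of `seq`."""
    region = seq.upper()[:window]
    n = len(region)
    if n == 0:
        raise ValueError("empty region")
    a, t, g, c = (region.count(b) for b in "ATGC")
    return {
        "purine_fraction": (a + g) / n,
        "A/T": a / t if t else float("inf"),
        "G/C": g / c if c else float("inf"),
        # Parity rule 2 expects A ~ T and G ~ C within one strand, i.e. skews ~ 0.
        "AT_skew": (a - t) / (a + t) if a + t else 0.0,
        "GC_skew": (g - c) / (g + c) if g + c else 0.0,
    }


if __name__ == "__main__":
    print(five_prime_composition("AAGGT"))
```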

11.
A DNA fragment including most of the tyrA gene from E. coli B/r strain WU (Tyr-, Leu-) was amplified in vitro by the polymerase chain reaction. The sequence was determined, first, for essentially all of the fragment, to locate an ochre nonsense defect, and second, repeatedly for a region of the fragment from several independent isolates carrying backmutations (spontaneous and UV-induced) at the ochre codon. There were 20 single-base differences between the tyrA gene region and the analogous wild-type E. coli K12 sequence: an ochre codon at amino acid position 161, 18 silent changes (1 at the first codon base and 17 at the third) and one replacement of valine by alanine. Different backmutations at the ochre codon encoded lysine, glutamine, glutamic acid, leucine, cysteine, phenylalanine, serine or tyrosine. The diversity of base substitutions at the ochre codon was similar (with two notable exceptions) after UV mutagenesis and after mutagenesis in which targeting by dimers was reduced or eliminated (after photoreversal of irradiated cells treated with nalidixic acid to induce SOS functions, or after UV mutagenesis of cells containing amplified DNA photolyase). The overall differences between the E. coli K12 and B/r gene sequences seemed consistent with the neutral theory of molecular evolution.
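The amino acids that a single base substitution at the ochre codon TAA can produce follow from the standard genetic code and are easy to enumerate. The sketch builds the standard code table (NCBI translation table 1, bases ordered TCAG) and lists every one-step neighbour of TAA. The helper names are illustrative. In the standard code, no Cys or Phe codon lies a single substitution away from TAA.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1) in TCAG x TCAG x TCAG order; '*' = stop.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def single_base_neighbors(codon: str) -> dict:
    """Map every codon one base substitution away from `codon` to its translation."""
    codon = codon.upper()
    out = {}
    for pos in range(3):
        for b in BASES:
            if b != codon[pos]:
                mutant = codon[:pos] + b + codon[pos + 1:]
                out[mutant] = CODE[mutant]
    return out


if __name__ == "__main__":
    for mutant, aa in sorted(single_base_neighbors("TAA").items()):
        print(mutant, aa)
```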

12.
Approximate methods for estimating the numbers of synonymous and nonsynonymous substitutions between two DNA sequences involve three steps: counting synonymous and nonsynonymous sites in the two sequences, counting synonymous and nonsynonymous differences between the two sequences, and correcting for multiple substitutions at the same site. We examine the complexities involved in these steps and propose a new approximate method that takes into account two major features of DNA sequence evolution: transition/transversion rate bias and base/codon frequency bias. We compare the new method with maximum likelihood, as well as with several other approximate methods, by examining infinitely long sequences, performing computer simulations, and analyzing a real data set. The results suggest that when transition/transversion rate biases and base/codon frequency biases are present, previously described approximate methods for estimating the nonsynonymous/synonymous rate ratio may involve serious biases, which can be either positive or negative. The new method is, in general, superior to earlier approximate methods and may be useful for analyzing large data sets, although maximum likelihood appears always to be the method of choice.
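Steps 1 and 3 can be illustrated with the older, simpler Nei-Gojobori (1986) site counting and the Jukes-Cantor correction. This is explicitly not the new method described in the abstract, which also models transition/transversion and codon-frequency bias. Here, changes to stop codons are simply excluded from each position's denominator. That is an assumption, since implementations handle this differently. The function names are illustrative, and step 2 (counting differences) is omitted.

```python
import math

BASES = "TCAG"
# Standard genetic code (NCBI table 1) in TCAG x TCAG x TCAG order; '*' = stop.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def codon_sites(codon: str) -> tuple:
    """(synonymous, nonsynonymous) sites of one sense codon, Nei-Gojobori style."""
    codon = codon.upper()
    if CODE[codon] == "*":
        raise ValueError("stop codons have no defined sites")
    syn = 0.0
    for pos in range(3):
        mutants = [codon[:pos] + b + codon[pos + 1:] for b in BASES if b != codon[pos]]
        sense = [m for m in mutants if CODE[m] != "*"]  # assumption: ignore stop changes
        if sense:
            syn += sum(CODE[m] == CODE[codon] for m in sense) / len(sense)
    return syn, 3.0 - syn


def sequence_sites(cds: str) -> tuple:
    """Summed (S, N) over all in-frame codons of a coding sequence."""
    per_codon = [codon_sites(cds[i:i + 3]) for i in range(0, len(cds) - 2, 3)]
    return sum(s for s, _ in per_codon), sum(n for _, n in per_codon)


def jukes_cantor(p: float) -> float:
    """Multiple-hit correction d = -3/4 * ln(1 - 4p/3), valid for p < 3/4."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    print(codon_sites("TTT"), sequence_sites("TTTGGG"), jukes_cantor(0.1))
```

Given the synonymous differences S_d counted in step 2, p_S = S_d / S, and `jukes_cantor(p_S)` gives the corrected d_S (and likewise for d_N).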

13.
Summary: In order to clone the Escherichia coli gene for the stringent starvation protein (SSP), we determined its N-terminal sequence as well as the sequences of two peptide fragments obtained by cyanogen bromide cleavage of the protein. We then chemically synthesized four sets of oligodeoxyribonucleotide mixtures that represented possible codon combinations for parts of these amino acid sequences. The synthetic oligonucleotides were labelled with 32P at their 5′-termini and used as hybridization probes to detect DNA fragments containing the complementary sequences. Genomic Southern hybridization of E. coli chromosomal DNA gave up to ten DNA fragments hybridizing with each probe, but only a few hybridized with two or more of the probes. The latter fragments were cloned in pBR322. By determining partial base sequences with a rapid method and examining the proteins encoded by the DNA fragments, we were able to show that we had isolated a clone containing the complete SSP structural gene. Abbreviations: SSP, stringent starvation protein; PTH, phenylthiohydantoin.
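The size of such an oligonucleotide mixture is the product of the number of codons for each residue. The sketch computes that degeneracy and enumerates the sense-strand mixture under the standard genetic code. The peptides in the test are arbitrary and are not the SSP sequence, and the function names are illustrative. A probe can be made either from this mixture or from its reverse complement.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1) in TCAG x TCAG x TCAG order; '*' = stop.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

# Reverse table: amino acid -> list of codons encoding it.
CODONS_FOR = {}
for codon, aa in CODE.items():
    CODONS_FOR.setdefault(aa, []).append(codon)


def probe_degeneracy(peptide: str) -> int:
    """Number of distinct DNA sequences that could encode `peptide`."""
    n = 1
    for aa in peptide.upper():
        n *= len(CODONS_FOR[aa])
    return n


def probe_mixture(peptide: str) -> list:
    """All sense-strand sequences encoding `peptide` (the oligonucleotide mixture)."""
    return ["".join(p) for p in product(*(CODONS_FOR[aa] for aa in peptide.upper()))]


if __name__ == "__main__":
    print(probe_degeneracy("MFK"), probe_mixture("MF"))
```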

14.
15.
16.
An ab initio model for gene prediction in prokaryotic genomes is proposed, based on physicochemical characteristics of codons calculated from molecular dynamics (MD) simulations. The model requires three calculated quantities for each codon: the double-helical trinucleotide base-pairing energy, the base-pair stacking energy, and an index of the codon's propensity for protein-nucleic acid interactions. The base-pairing and stacking energies for each codon are obtained from recently reported MD simulations on all unique tetranucleotide steps, and the third parameter is assigned on the basis of the conjugate rule previously proposed to account for the wobble hypothesis with respect to degeneracies in the genetic code. The values of this third, interaction-propensity parameter correlate well with ab initio MD-calculated solvation energies and flexibility of codon sequences, as well as with codon usage in genes and amino acid composition frequencies in ∼175,000 protein sequences in the Swissprot database. Assigning these three parameters to each codon enables the calculation of the magnitude and orientation of a cumulative three-dimensional vector for a DNA sequence of any length in each of the six genomic reading frames. Analysis of 372 genomes comprising ∼350,000 genes shows that the orientations of the gene and nongene vectors are well differentiated, making a clear distinction between genic and nongenic sequences feasible at a level equivalent to or better than that of currently available knowledge-based models trained on empirical data. This provides strong support for a unique and useful physicochemical characterization of DNA sequences, from codons to genomes.
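The cumulative-vector idea can be sketched as follows. Each codon contributes a 3-component vector, the contributions are summed codon by codon in each of the six reading frames, and the result is reported as a magnitude and a unit direction. The actual per-codon values (pairing energy, stacking energy, interaction propensity) come from the MD simulations and are not reproduced here. `params` is therefore a hypothetical placeholder table, and codons missing from it contribute zero.

```python
import math

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def frame_vector(seq: str, frame: int, params: dict) -> tuple:
    """Sum of per-codon 3-D parameter vectors along one reading frame."""
    x = y = z = 0.0
    for i in range(frame, len(seq) - 2, 3):
        dx, dy, dz = params.get(seq[i:i + 3], (0.0, 0.0, 0.0))
        x, y, z = x + dx, y + dy, z + dz
    return x, y, z


def six_frame_vectors(seq: str, params: dict) -> dict:
    """Cumulative vectors for frames +1..+3 (top strand) and -1..-3 (reverse complement)."""
    seq = seq.upper()
    rc = seq.translate(COMPLEMENT)[::-1]
    out = {}
    for f in range(3):
        out[f"+{f + 1}"] = frame_vector(seq, f, params)
        out[f"-{f + 1}"] = frame_vector(rc, f, params)
    return out


def magnitude_and_direction(v: tuple) -> tuple:
    """Length of v and its unit direction (zero vector if v has length 0)."""
    m = math.sqrt(sum(c * c for c in v))
    direction = tuple(c / m for c in v) if m else (0.0, 0.0, 0.0)
    return m, direction


if __name__ == "__main__":
    toy_params = {"ATG": (1.0, 0.0, 0.0), "AAA": (0.0, 1.0, 0.0)}  # placeholder values
    print(six_frame_vectors("ATGAAA", toy_params))
```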

17.
The similarity of two nucleotide sequences is often expressed in terms of evolutionary distance, a measure of the amount of change needed to transform one sequence into the other. Given two sequences with a small distance between them, can their similarity be explained by their base composition alone? The nucleotide order of these sequences contributes to their similarity if the distance is much smaller than their average permutation distance, which is obtained by calculating the distances for many random permutations of these sequences. To determine whether their similarity can be explained by their dinucleotide and codon usage, random sequences must be chosen from the set of permuted sequences that preserve dinucleotide and codon usage. The problem of choosing random dinucleotide- and codon-preserving permutations can be expressed in the language of graph theory as the problem of generating random Eulerian walks on a directed multigraph. An efficient algorithm for generating such walks is described. This algorithm can be used to choose random sequence permutations that preserve (1) dinucleotide usage, (2) dinucleotide and trinucleotide usage, or (3) dinucleotide and codon usage. For example, the similarity of two 60-nucleotide DNA segments from the human beta-1 interferon gene (nucleotides 196-255 and 499-558) is not simply the result of their nonrandom dinucleotide and codon usage.
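The dinucleotide-preserving case can be sketched in Python as a random Eulerian walk. The sketch uses the standard "last exit edge" construction. For every letter except the final one, it picks a random outgoing edge to be used last, and it accepts that choice only if these edges form a tree leading to the final letter. It then shuffles the remaining edges and walks from the first letter. This is a minimal sketch of the technique and not necessarily the paper's exact algorithm. It does not cover the trinucleotide- or codon-preserving variants.

```python
import random
from collections import Counter, defaultdict


def dinucleotide_counts(seq: str) -> Counter:
    """Counts of overlapping dinucleotides (useful to verify a shuffle)."""
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))


def _exits_form_tree(edges, exit_index, root) -> bool:
    """True if following each letter's chosen last-exit edge always reaches
    `root` without revisiting a letter (i.e. the exits form a tree)."""
    for start in exit_index:
        seen, u = set(), start
        while u != root:
            if u in seen:
                return False
            seen.add(u)
            u = edges[u][exit_index[u]]
    return True


def dinucleotide_shuffle(seq: str, rng=random) -> str:
    """Random permutation of `seq` with identical dinucleotide counts,
    generated as a random Eulerian walk on the letter multigraph."""
    if len(seq) <= 2:
        return seq
    edges = defaultdict(list)            # letter -> successor letters
    for a, b in zip(seq, seq[1:]):
        edges[a].append(b)
    root = seq[-1]                       # every such walk must end here
    others = [v for v in edges if v != root]

    # 1. Random last-exit edge per non-root letter; retry until they form a tree.
    while True:
        exit_index = {v: rng.randrange(len(edges[v])) for v in others}
        if _exits_form_tree(edges, exit_index, root):
            break

    # 2. Shuffle the other edges of each letter; the last-exit edge goes last.
    order = {}
    for v, successors in edges.items():
        successors = list(successors)
        if v != root:
            last = successors.pop(exit_index[v])
            rng.shuffle(successors)
            successors.append(last)
        else:
            rng.shuffle(successors)
        order[v] = successors

    # 3. Walk from the first letter, always taking the next unused edge.
    position = {v: 0 for v in order}
    u, out = seq[0], [seq[0]]
    for _ in range(len(seq) - 1):
        nxt = order[u][position[u]]
        position[u] += 1
        out.append(nxt)
        u = nxt
    return "".join(out)


if __name__ == "__main__":
    s = "ATGCGTACGTTAGCATGCAAT"
    t = dinucleotide_shuffle(s, random.Random(1))
    print(t, dinucleotide_counts(t) == dinucleotide_counts(s))
```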

18.
The molecular and genetic basis of a compound heterozygote for dys- and hypoprothrombinemia was analyzed. Abnormal nucleotide sequences of the human prothrombin gene were screened by PCR-single-strand conformation polymorphism (PCR-SSCP) with endonuclease digestion and by mutated primer-mediated PCR-RFLP. A single nucleotide substitution responsible for the dysprothrombinemia of prothrombin Tokushima was detected, as were three polymorphisms. The mutation for hypoprothrombinemia was detected by PCR-SSCP with endonuclease digestion in exon 6, near the MboII and NcoI RFLPs. Sequencing of PCR-amplified genomic DNA revealed a single-base insertion of thymine (T) at position 4177. The resulting frameshift altered the amino acid sequence from codon 114 onward and created a premature termination codon (TGA) at codon 174 in exon 7. Because exon 7 encodes the kringle 2 domain preceding the thrombin sequence, this frameshift results in a null prothrombin phenotype. Inheritance of the hypoprothrombinemia allele from the father to the proband was demonstrated by PCR-SSCP with endonuclease digestion and mutated primer-mediated PCR-RFLP.
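The way a one-base insertion both alters the downstream amino acids and creates a premature stop can be illustrated with a toy coding sequence (hypothetical, not the prothrombin gene) under the standard genetic code. Function names are illustrative.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1) in TCAG x TCAG x TCAG order; '*' = stop.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def translate_to_stop(cds: str) -> tuple:
    """Translate frame 0 up to the first stop codon.
    Returns (peptide, 1-based codon number of the stop, or None if there is none)."""
    cds = cds.upper()
    peptide = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODE[cds[i:i + 3]]
        if aa == "*":
            return "".join(peptide), i // 3 + 1
        peptide.append(aa)
    return "".join(peptide), None


def insert_base(cds: str, position: int, base: str) -> str:
    """Insert one base before 0-based `position` (a +1 frameshift)."""
    return cds[:position] + base + cds[position:]


if __name__ == "__main__":
    wild_type = "ATGGCTAAAGGTTGGTAA"                         # toy sequence
    print(translate_to_stop(wild_type))                     # normal stop at codon 6
    print(translate_to_stop(insert_base(wild_type, 3, "T")))  # premature stop
```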

19.
In bacteriophage T4, genes that encode interacting proteins tend strongly to be clustered on the chromosome. There is 1.6 kb of DNA between the DNA helicase (gene 41) and DNA primase (gene 61) genes of this virus. The DNA sequence of this region suggests that it contains five genes, designated open reading frames (ORFs) 61.1 to 61.5, predicted to encode proteins ranging in size from 5.94 to 22.88 kDa. Are these ORFs actually genes? As one test, we compared the DNA sequence of this region in bacteriophages T2, T4, and T6 and found that ORFs 61.1, 61.3, 61.4, and 61.5 are highly conserved among the three closely related viruses. In contrast, ORF 61.2 is conserved between phages T4 and T6 but is absent from phage T2, where it is replaced by another ORF, T2 ORF 61.2, which is not found in the T4 and T6 genomes. As a second, independent test for coding sequences, we calculated the codon base position preferences for all ORFs in this region that could encode proteins of at least 30 amino acids. Both the T4/T6 and T2 versions of ORF 61.2, as well as the other ORFs, have codon base position preferences indistinguishable from those of known T4 genes (coefficients of 0.81 to 0.94); the six other possible ORFs of at least 90 bp in this region are ruled out as genes by this test (coefficients less than zero). Thus, both evolutionary conservation and codon usage patterns lead us to conclude that ORFs 61.1 to 61.5 represent important protein-coding sequences for this family of bacteriophages. Because they are located between the genes that encode the two interacting proteins of the T4 primosome (DNA helicase plus DNA primase), one or more of them may function in DNA replication by modulating primosome function.
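The abstract does not define its codon base position preference coefficient. Below is a minimal sketch under one plausible assumption: describe each ORF by the 12 frequencies of A, C, G and T at the three codon positions, then correlate that profile with a reference profile from known genes using Pearson's coefficient. Values near 1 then indicate gene-like composition, and negative values indicate the opposite. Function names are illustrative.

```python
import math


def position_profile(cds: str) -> list:
    """12 frequencies: A, C, G, T at codon position 1, then 2, then 3 (frame 0)."""
    cds = cds.upper()
    n = len(cds) // 3
    if n == 0:
        raise ValueError("need at least one complete codon")
    return [sum(cds[3 * i + p] == b for i in range(n)) / n
            for p in range(3) for b in "ACGT"]


def pearson(x: list, y: list) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    reference = position_profile("ATGAAAGCTGGTAAATAA")  # stand-in for known genes
    candidate = position_profile("ATGAAGGCAGGCAAGTGA")
    print(round(pearson(reference, candidate), 3))
```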

20.
The use of sequences encoding identical peptides (SEIP) for the in silico analysis of coding DNA from different species has not been reported; studying such sequences could directly reveal properties of coding DNA that are independent of peptide sequence. For practical purposes, SEIP might also be manipulated, e.g. for heterologous protein expression. We extracted 1,551 SEIP from human and E. coli and 2,631 SEIP from human and D. melanogaster. We then analyzed codon usage and intercodon dinucleotide tendencies and found differences in both, with more conspicuous disparities between human and E. coli than between human and D. melanogaster. We also briefly manipulated SEIP to find out whether they could be used to create new coding sequences. We first attempted to replace human codons with E. coli codons via dicodon exchange but found that full replacement was not possible; this indicated robust species-specific dicodon tendencies. To test another form of codon replacement, we isolated SEIP from human and the jellyfish green fluorescent protein (GFP) and then reconstructed the GFP coding DNA with human tetrapeptide-coding sequences. The results provide proof of principle that SEIP can be used to reveal differences in the properties of coding DNA and to reconstruct, piece by piece, a protein-coding DNA with sequences from a different organism; the latter might be exploited in heterologous protein expression.
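Identifying SEIP and measuring intercodon dinucleotide tendencies come down to two small operations. The first is checking that two coding sequences translate to the same peptide. The second is counting the (third base of codon i, first base of codon i+1) pairs. The sketch below does both under the standard genetic code. The toy sequences are hypothetical, and the function names are illustrative.

```python
from collections import Counter

BASES = "TCAG"
# Standard genetic code (NCBI table 1) in TCAG x TCAG x TCAG order; '*' = stop.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def translate(cds: str) -> str:
    """Translate frame 0 codon by codon (stops shown as '*')."""
    cds = cds.upper()
    return "".join(CODE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def encode_identical_peptides(cds1: str, cds2: str) -> bool:
    """True if two equal-length coding sequences translate to the same peptide."""
    return len(cds1) == len(cds2) and translate(cds1) == translate(cds2)


def intercodon_dinucleotides(cds: str) -> Counter:
    """Counts of (3rd base of codon i + 1st base of codon i+1) dinucleotides."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    return Counter(a[2] + b[0] for a, b in zip(codons, codons[1:]))


if __name__ == "__main__":
    human_like, ecoli_like = "ATGCTGAAG", "ATGTTAAAA"  # toy SEIP pair encoding MLK
    print(encode_identical_peptides(human_like, ecoli_like))
    print(intercodon_dinucleotides(human_like), intercodon_dinucleotides(ecoli_like))
```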


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417