Similar literature
 20 similar articles found (search time: 31 ms)
1.
We have analyzed correlations of nucleotide distributions along more than 50 megabases of the longest sequenced parts of the human, mouse, Drosophila, Arabidopsis, yeast, E. coli and three kinds of viral genomes. The strongest correlations were observed between the distributions of C and G, in particular in the genome of Drosophila. This correlation was much weaker, though still strong, in the human and E. coli genomes, which exhibited the same level of correlation. The C/G correlation is unlikely to originate from isochores because isochores have not been reported in the genomes of Drosophila and E. coli. The genomic distribution curves of adenine and thymine were also positively correlated in all analyzed organisms except yeast, where they were anticorrelated. Still stronger anticorrelations were, however, observed between the genomic distributions of A and C and between those of G and T. These distributions were anticorrelated almost universally and very strongly. These anticorrelations are likely to originate from point mutations resulting from unrepaired GA mispairing as a replication intermediate. The C/A or G/T anticorrelation, or compensation, is a very strong and general new phenomenon that shapes genomic nucleotide sequences.
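The kind of correlation described in this abstract can be illustrated with a minimal sketch. It assumes that a nucleotide "distribution" is the count of a base in consecutive non-overlapping windows and that the correlation is an ordinary Pearson coefficient; the abstract does not state the authors' exact procedure, and the function names `window_counts` and `pearson` are hypothetical.

```python
from math import sqrt


def window_counts(seq, base, window):
    """Count occurrences of `base` in consecutive non-overlapping windows.

    Assumption: a trailing partial window is ignored.
    """
    seq = seq.upper()
    return [seq[i:i + window].count(base)
            for i in range(0, len(seq) - window + 1, window)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty lists of equal length")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(vx * vy)


# Toy sequence: C and G rise and fall together, A moves the opposite way.
toy = "CCGG" + "AATT" + "CGCG" + "ATAT"
c, g, a = (window_counts(toy, b, 4) for b in "CGA")
print(pearson(c, g), pearson(c, a))
```

On real data the window size would be a free parameter, and the result may depend on it.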

2.
We calculated the variation coefficients of the mononucleotide and short oligonucleotide distributions in over 1700 long genomic sequences originating from six organisms to demonstrate that the human and Escherichia coli genomic sequences were the least and the most uniform, respectively. The most non-random genomic distributions were exhibited by the four canonical nucleotides, followed by the strong and weak nucleotides, while the distributions of purine or pyrimidine nucleotides and especially the distributions of (A+C) and (G+T) were significantly more uniform even in the human genome. In the human and mouse genomes, the highest coefficients of variation were further observed with the oligonucleotides where CG was combined with the strong nucleotides while its combination with the weak nucleotides significantly decreased the variation which, however, was still very high. High variation was also exhibited by the remaining oligonucleotides composed exclusively of the strong nucleotides or those containing only weak nucleotides. On the other hand, the distributions of oligonucleotides containing similar and especially the same numbers of the strong and weak nucleotides, but no CG or TA dinucleotide, were the most uniform. The information following from the present analysis will be useful not only in the identification of important genomic regions but also in computer simulations of the genomic nucleotide sequences in order to trace and reproduce the pathways of genome evolution.
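A variation coefficient of an oligonucleotide distribution can be sketched as follows. This is only an illustration under stated assumptions: the motif is counted (overlapping occurrences included) in non-overlapping windows, and the coefficient is the population standard deviation of those counts divided by their mean. The abstract does not give the authors' windowing scheme, and `variation_coefficient` is a hypothetical name.

```python
from statistics import mean, pstdev


def variation_coefficient(seq, motif, window):
    """Coefficient of variation of per-window motif counts.

    Assumptions: non-overlapping windows, overlapping motif matches counted,
    trailing partial window ignored.
    """
    seq, motif = seq.upper(), motif.upper()
    counts = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        counts.append(sum(chunk.startswith(motif, i)
                          for i in range(len(chunk) - len(motif) + 1)))
    if not counts or mean(counts) == 0:
        raise ValueError("motif absent or sequence shorter than one window")
    return pstdev(counts) / mean(counts)


# CG clustered in the first window gives a high coefficient,
# CG spread evenly across windows gives zero.
print(variation_coefficient("CGCGAAAA", "CG", 4))
print(variation_coefficient("CGAACGAA", "CG", 4))
```

A value near zero indicates a uniform distribution along the sequence; larger values indicate clustering.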

3.
The gene for L-lactate dehydrogenase (LDH) (EC 1.1.1.27) of Thermus caldophilus GK24 was cloned in Escherichia coli using synthetic oligonucleotides as hybridization probes. The nucleotide sequence of the cloned DNA was determined. The primary structure of the LDH was deduced from the nucleotide sequence. The deduced amino acid sequence agreed with the NH2-terminal and COOH-terminal sequences previously reported and the determined amino acid sequences of the peptides obtained from trypsin-digested T. caldophilus LDH. The LDH comprised 310 amino acid residues and its molecular mass was determined to be 32,808. On alignment of the whole amino acid sequences, the T. caldophilus LDH showed about 40% identity with the Bacillus stearothermophilus, Lactobacillus casei and dogfish muscle LDHs. The T. caldophilus LDH gene was expressed with the E. coli lac promoter in E. coli, which resulted in the production of the thermophilic LDH. The gene for the T. caldophilus LDH showed more than 40% identity with those for the human and mouse muscle LDHs on alignment of the whole nucleotide sequences. The G + C content of the coding region for the T. caldophilus LDH was 74.1%, which was higher than that of the chromosomal DNA (67.2%). The G + C contents in the first, second and third positions of the codons used were 77.7%, 48.1% and 95.5%, respectively. The high G + C content in the third base caused extremely non-random codon usage in the LDH gene. About half (48.7%) of the codons in the LDH gene started with G, and hence there were relatively high contents of Val, Ala, Glu and Gly in the LDH. The contents of Pro, Arg, Ala and Gly, which have high G + C contents in their codons, were also high. Rare codons with U or A as the third base were sometimes used to avoid the TCGA sequence, the recognition site for the restriction endonuclease TaqI. Two TCGA sequences were found only in the sequence CTCGAG (XhoI site) in the sequenced region of the T. caldophilus DNA. There were three segments with similar sequences in the two 5' non-coding regions, probably the promoter and ribosome-binding regions, of the genes for the T. caldophilus LDH and the Thermus thermophilus 3-isopropylmalate dehydrogenase.
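The per-position G + C figures quoted in this abstract are the kind of statistic that a few lines of code can reproduce for any in-frame coding sequence. The sketch below is a generic illustration, not the authors' procedure; `gc_by_codon_position` is a hypothetical name and the toy sequence is invented.

```python
def gc_by_codon_position(cds):
    """Return the G+C percentage at codon positions 1, 2 and 3.

    Assumption: `cds` is an in-frame DNA coding sequence whose length is a
    non-zero multiple of three.
    """
    cds = cds.upper()
    if not cds or len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a non-zero multiple of 3")
    percentages = []
    for pos in range(3):
        bases = cds[pos::3]          # every third base, starting at this codon position
        gc = sum(b in "GC" for b in bases)
        percentages.append(100.0 * gc / len(bases))
    return tuple(percentages)


# Toy codons GCC, GAG, GTC: position 1 is all G, position 3 is all G or C.
print(gc_by_codon_position("GCCGAGGTC"))
```

A strongly elevated third-position value, as reported for the LDH gene, is the usual signature of G+C-driven synonymous codon bias.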

4.
Characterization of AluI repeats of zebrafish (Brachydanio rerio).   Cited by: 1 (self-citations: 0; citations by others: 1)
Two families of repetitive DNA sequences were isolated from the zebrafish genome and characterized. Eight different sequences were sequenced and classified by two standards, their (G + C) composition and their lengths. For convenience, the sequences were first divided into two types. Type I was (A + T)-rich, was repeated approximately 500,000 times, and constituted approximately 5% of the zebrafish genome. Type II was (G + C)-rich, was reiterated approximately 90,000 times, and comprised approximately 0.5% of the genome. Agarose gel electrophoresis of zebrafish DNA cleaved with AluI revealed three distinguishable bands of repetitive fragments: large (approximately 180 bp, designated RFAL), medium (approximately 140 bp, RFAM), and small (approximately 90 bp, RFAS). The RFAL fragments contained both type I and type II sequences. Limited digestion of genomic DNA indicated that RFAL and RFAM were tandemly arranged in the genome, whereas RFAS showed a mixed pattern of both tandem and interspersed repeated arrangements. Although inclusion of a repetitive sequence in a transgenic construct did not appreciably accelerate homologous integration of transgenes into the zebrafish genome, the AluI sequences could facilitate transgene mapping following chromosomal integration.

5.
6.
Synonymous codon usage patterns of bacteriophage and host genomes were compared. Two indexes, the G + C base composition of a gene (fgc) and the fraction of translationally optimal codons of the gene (fop), were used in the comparison. Synonymous codon usage data of all the coding sequences on a genome are represented as a cloud of points in the plane of fop vs. fgc. The Escherichia coli coding sequences appear to exhibit two phases, "rising" and "flat" phases. Genes that are essential for survival and are thought to be native are located in the flat phase, while foreign-type genes from prophages and transposons are found in the rising phase with a slope of nearly unity in the fgc vs. fop plot. Synonymous codon distribution patterns of genes from temperate phages P4, P2, N15 and lambda are similar to the pattern of E. coli rising phase genes. In contrast, genes from the virulent phage T7 or T4, for which a phage-encoded DNA polymerase is identified, fall on a line with a slope of nearly zero in the fop vs. fgc plane. These results may suggest that the G + C contents for T7, T4 and E. coli flat phase genes are subject to directional mutation pressure and are determined by the DNA polymerase used in replication. There is significant variation in the fop values of the phage genes, suggesting an adjustment to gene expression level. Similar analyses of codon distribution patterns were carried out for Haemophilus influenzae, Bacillus subtilis, Mycobacterium tuberculosis and their phages with complete genomic sequences available.
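The two indexes named in this abstract can be sketched for a single gene as below. Several things here are assumptions rather than the authors' definitions: fop is taken as the fraction of optimal codons among codons that are neither Met, Trp nor stop codons, and the set of optimal codons, which is organism-specific, must be supplied by the caller. The set used in the demonstration is a placeholder, not a published table.

```python
# Codons without synonymous alternatives (Met, Trp) and stop codons are
# commonly left out of fop; treating them this way is an assumption here.
EXCLUDED = frozenset({"ATG", "TGG", "TAA", "TAG", "TGA"})


def split_codons(cds):
    """Split an in-frame coding sequence into codons, dropping a partial tail."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def fgc(cds):
    """Fraction of G or C bases in the gene."""
    cds = cds.upper()
    if not cds:
        raise ValueError("empty sequence")
    return sum(b in "GC" for b in cds) / len(cds)


def fop(cds, optimal_codons):
    """Fraction of caller-supplied optimal codons among the scored codons."""
    scored = [c for c in split_codons(cds) if c not in EXCLUDED]
    if not scored:
        raise ValueError("no scorable codons")
    return sum(c in optimal_codons for c in scored) / len(scored)


# Placeholder optimal set for illustration only.
demo_optimal = {"CTG"}
gene = "ATGCTGAAATAA"
print(fgc(gene), fop(gene, demo_optimal))
```

Computing the pair (fgc, fop) for every coding sequence in a genome yields the cloud of points that the abstract describes.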

7.
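Using the maize chloroplast genome as the reference sequence, a three-sequence comparison method was applied to systematically analyze how nucleotide substitutions arose in the chloroplast genomes during the divergence of wheat and rice. The results showed that wheat exhibits an (A+T)/(G+C) substitution bias, whereas rice does not. This difference affected the G+C content of the two chloroplast genomes differently after the divergence of wheat and rice: substitutions lowered the G+C content of the wheat chloroplast genome and raised that of the rice chloroplast genome. In coding regions, in non-coding regions, and in gene regions of different function alike, the ratio of transitions to transversions in the wheat chloroplast genome was significantly lower than in rice. Nucleotide substitution during the evolution of the wheat and rice chloroplast genomes is species-specific.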

8.
An obligately anaerobic and extremely thermophilic bacterium, Dictyoglomus thermophilum, produces multiple extracellular amylases. In addition to one of the amylase genes, amyA, which we previously cloned and characterized, we have cloned two additional genes, amyB and amyC, coding for amylases of this thermophile, into Escherichia coli and determined their nucleotide sequences. The two amylase genes were expressed under the control of E. coli promoters. Almost all activity was detected in the intracellular fraction in the E. coli cells. The molecular mass and NH2-terminal amino acid sequence of the AmyB enzyme, which was purified from an E. coli transformant containing the amyB gene, confirmed that the reading frame of amyB consisted of 562 amino acids (Mr 67,000). The molecular mass of the AmyC enzyme, estimated by activity staining of a crude extract of E. coli containing amyC, confirmed that AmyC consisted of 498 amino acids (Mr 59,000). The optimal temperatures for AmyB and AmyC activities on soluble starch were 80 degrees C and 70 degrees C, respectively. Both AmyB and AmyC showed a pH optimum of 5.5. AmyB and AmyC showed a different pattern of starch hydrolysis when examined by thin-layer chromatography. Some homology in the amino acid sequences with the functional regions of Taka-amylase A was found in both AmyB and AmyC. The codon usage in the amyA, amyB and amyC genes was highly biased, which reflects the fact that the guanine-plus-cytosine (G + C) content of DNA of D. thermophilum is 29 mol%. The distribution of G and C at each position of the codons was non-random; the G + C content of the first position of codons is significantly high, whereas that of the third position is somewhat low. In addition, codons consisting only of A and T were preferentially used in this thermophile.

9.
We analysed complete or almost complete nucleotide sequences of the human, chimp, mouse, rat, chicken, dog, and other genomes to find that they contain extremely long (A+T) and (G+C) blocks that do not occur at all in the corresponding randomized sequences. The longest is an (A+T) block containing 1040 consecutive AT pairs that occurs in human chromosome 16. The longest human (G+C) block is 261 bp long. About half of the longest blocks occur in introns. The (A+T) blocks are discrete units whereas the (G+C) blocks are diffuse. They are embedded in the genome through connectors longer than 1 kilobase where the (G+C) content gradually decreases to the value of 50%. Remarkably, the (A+T) as well as (G+C) blocks are substantially shorter in the chimp genome. The chicken genome is characterized by very long (G+C) blocks that are even longer than in the human genome. Though much shorter, long (G+C) and especially (A+T) blocks occur in lower organisms as well, which means that AT and GC pair clustering is an ancient property that has evolved into large scales in higher eukaryote genomes and the human genome in particular. Very long (A+T) and (G+C) blocks confer specific biophysical properties on DNA that are likely to influence genome folding in cell nuclei and its functional properties.
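Finding the longest uninterrupted (A+T) or (G+C) block in a sequence is a simple run-length search. The following sketch assumes a block means a maximal run of consecutive bases drawn only from the given alphabet, which matches the "consecutive AT pairs" wording but may not capture every detail of the authors' definition; `longest_block` is a hypothetical name.

```python
import re


def longest_block(seq, alphabet):
    """Return (start, length) of the longest run made only of `alphabet` bases.

    `start` is a 0-based index; (0, 0) is returned when no such base occurs.
    Ties are resolved in favour of the first run found.
    """
    pattern = re.compile("[" + re.escape(alphabet.upper()) + "]+")
    best = (0, 0)
    for match in pattern.finditer(seq.upper()):
        length = match.end() - match.start()
        if length > best[1]:
            best = (match.start(), length)
    return best


toy = "GCATATATGGCCGAT"
print(longest_block(toy, "AT"))   # longest (A+T) block
print(longest_block(toy, "GC"))   # longest (G+C) block
```

Running the same function on a shuffled copy of the sequence gives a rough baseline for how long a block would be expected by chance.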

10.
T7 and E. coli share homology for replication-related gene products   Cited by: 2 (self-citations: 0; citations by others: 2)
H Toh. FEBS Letters, 1986, 194(2): 245-248
Recently, the complete nucleotide sequence of the bacteriophage T7 genome was determined and 50 genes were identified on the genome. We compared amino acid sequences of all the gene products of T7 and replication-related gene products of E. coli. As a result, we found that T7 and E. coli share homology for each pair of exonuclease, DNA primase and helix-destabilizing protein. For E. coli, these gene products are known to be involved in the process of discontinuous DNA replication. These observations suggest that T7 and E. coli have a common origin for a part of their replication systems.

11.
Jiang C, Zhao Z. Genomics, 2006, 88(5): 527-534
So far, there is no genome-wide estimation of the mutational spectrum in humans. In this study, we systematically examined the directionality of the point mutations and maintenance of GC content in the human genome using approximately 1.8 million high-quality human single nucleotide polymorphisms and their ancestral sequences in chimpanzees. The frequency of C-->T (G-->A) changes was the highest among all mutation types and the frequency of each type of transition was approximately fourfold that of each type of transversion. In intergenic regions, when the GC content increased, the frequency of changes from G or C increased. In exons, the frequency of G:C-->A:T was the highest among the genomic categories and was contributed mainly by the frequent mutations at the CpG sites. In contrast, mutations at the CpG sites, or CpG-->TpG/CpA mutations, occurred less frequently in the CpG islands relative to intergenic regions with similar GC content. Our results suggest that the GC content is overall not in equilibrium in the human genome, with a trend toward shifting the human genome to be AT rich and shifting the GC content of a region to approach the genome average. Our results, which differ from previous estimates based on limited loci or on the rodent lineage, provide the first representative and reliable mutational spectrum in the recent human genome and categorized genomic regions.
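The transition and transversion classes used in this abstract follow from the purine/pyrimidine chemistry of the bases: a change within a class (A<->G or C<->T) is a transition, and a change between classes is a transversion. The sketch below tallies a list of (ancestral, derived) base pairs; the input format and the names `classify` and `spectrum` are assumptions for illustration, and the toy data are invented.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def classify(ancestral, derived):
    """Label a single-base substitution as 'transition' or 'transversion'."""
    a, d = ancestral.upper(), derived.upper()
    if a not in BASES or d not in BASES or a == d:
        raise ValueError(f"not a valid substitution: {ancestral!r} -> {derived!r}")
    same_class = (a in PURINES) == (d in PURINES)
    return "transition" if same_class else "transversion"


def spectrum(substitutions):
    """Count transitions and transversions in (ancestral, derived) pairs."""
    return Counter(classify(a, d) for a, d in substitutions)


# Invented toy data: two transitions (C->T, G->A) and one transversion (A->T).
counts = spectrum([("C", "T"), ("G", "A"), ("A", "T")])
print(counts["transition"], counts["transversion"])
```

Because there are four possible transitions and eight possible transversions, per-type frequencies (as compared in the abstract) differ from the raw class totals computed here.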

12.
Heteroduplexes with single base pair mismatches of known sequence were prepared by annealing separated strands of bacteriophage lambda DNA and used to transfect Escherichia coli. A series of transition (G:T and A:C) and transversion (G:A and C:T) mismatches located throughout most of the bacteriophage lambda cI gene has been examined. The results suggest that the transition mismatches are generally better repaired than the transversion mismatches and that, at least for the transversion mismatches studied, repair efficiency increases with increasing G:C content in the neighboring nucleotide sequence. This specificity of the E. coli mismatch repair system can account, in part, for the similar frequencies of base substitution mutations throughout the E. coli genome.

13.
The enteric bacterium Escherichia coli synthesizes cobalamin (coenzyme B12) only when provided with the complex intermediate cobinamide. Three cobalamin biosynthetic genes have been cloned from Escherichia coli K-12, and their nucleotide sequences have been determined. The three genes form an operon (cob) under the control of several promoters and are induced by cobinamide, a precursor of cobalamin. The cob operon of E. coli comprises the cobU gene, encoding the bifunctional cobinamide kinase-guanylyltransferase; the cobS gene, encoding cobalamin synthetase; and the cobT gene, encoding dimethylbenzimidazole phosphoribosyltransferase. The physiological roles of these sequences were verified by the isolation of Tn10 insertion mutations in the cobS and cobT genes. All genes were named after their Salmonella typhimurium homologs and are located at the corresponding positions on the E. coli genetic map. Although the nucleotide sequences of the Salmonella cob genes and the E. coli cob genes are homologous, they are too divergent to have been derived from an operon present in their most recent common ancestor. On the basis of comparisons of G+C content, codon usage bias, dinucleotide frequencies, and patterns of synonymous and nonsynonymous substitutions, we conclude that the cob operon was introduced into the Salmonella genome from an exogenous source. The cob operon of E. coli may be related to cobalamin synthetic genes now found among non-Salmonella enteric bacteria.

14.
A novel method to calculate the G+C content of genomic DNA sequences.   Cited by: 2 (self-citations: 0; citations by others: 2)
The base composition of a DNA fragment or genome is usually measured by the proportion of A+T or G+C in the sequence. The G+C content along genomic sequences is usually calculated using an overlapping or non-overlapping sliding window method. The result and accuracy of such an approach depend on the size of the window and the moving distance adopted. In this paper, a novel windowless technique to calculate the G+C content of genomic sequences is proposed. By this method, the G+C content can be calculated at different "resolutions". In an extreme case, the G+C content may be computed at a specific point, rather than in a window of finite size. This is particularly useful for analyzing the fine variation of base composition along genomic sequences. As the first example, the variation of G+C content along each of the 16 yeast chromosomes is analyzed. The G+C-rich regions longer than 5 kb are detected and listed in detail. It is found that each chromosome consists of several alternating G+C-rich and G+C-poor regions, i.e., a mosaic structure. Another example is the analysis of the G+C content of each of the two chromosomes of the Vibrio cholerae genome. Based on the variations of the G+C content in each chromosome, it is shown that some fragments in the Vibrio cholerae genome may have been transferred from other species. In particular, the position and size of the large integron island on the smaller chromosome were precisely predicted. This method would be a useful tool for analyzing genomic sequences.
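The conventional sliding window calculation that this abstract contrasts with its windowless technique can be sketched as below. This shows only the baseline approach, not the proposed method; `gc_windows` is a hypothetical name. Setting `step` equal to `window` gives non-overlapping windows, and a smaller `step` gives overlapping ones.

```python
def gc_windows(seq, window, step):
    """Return [(start, gc_fraction), ...] for windows of fixed size.

    `start` is the 0-based index of each window; a trailing partial window
    is ignored.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        profile.append((start, sum(b in "GC" for b in chunk) / window))
    return profile


toy = "GGCCAATT"
print(gc_windows(toy, window=4, step=4))  # non-overlapping windows
print(gc_windows(toy, window=4, step=2))  # overlapping windows
```

The two calls return different profiles for the same sequence, which illustrates the dependence on window size and moving distance that the abstract points out.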

15.
The nucleotide sequence of formylmethionine tRNA from an extreme thermophile, Thermus thermophilus HB8, was determined by a combination of classical methods using unlabeled samples to determine the sequences of the oligonucleotides of RNase T1 and RNase A digests and a rapid sequencing gel technique using 5'-32P labeled samples to determine overlapping sequences. Formylmethionine tRNA from T. thermophilus is composed of two species, tRNAf1Met and tRNAf2Met. Their nucleotide sequences are almost identical, and are also almost identical with that of E. coli tRNAfMet, except for slight modifications and replacements. Both species have modifications at three points which do not exist in E. coli tRNAfMet: 2'-O-methylation at G19, N-1-methylation at A59 and 2-thiolation at T55. Moreover U51 in E. coli tRNAfMet is replaced by C51 in both species, so that a G-C pair is formed between this C51 and G65. tRNAf2Met has a reversed G-C pair at positions 52 and 64 compared with those in tRNAf1Met and E. coli tRNAfMet. Other regions are mostly the same as those in all prokaryotic initiator tRNAs so far reported. The thermostability of these thermophile initiator tRNAs is discussed in relation to their unique modifications.

16.
We demonstrated the genetic polymorphism of aldehyde oxidase (AO) in Donryu strain rats: the ultrarapid metabolizer (UM) with nucleotide mutation of (377G, 2604C) coding for amino acid substitution of (110Gly, 852Val), extensive metabolizer (EM) with (377G/A, 2604C/T) coding for (110Gly/Ser, 852Val/Ala), and poor metabolizer (PM) with (377A, 2604T) coding for (110Ser, 852Ala), respectively. The results suggested that 377G > A and/or 2604C > T should be responsible for the genetic polymorphism. In this study, we constructed an E. coli expression system of four types of AO cDNA including Mut-1 with (377G, 2604T) and Mut-2 with (377A, 2604C) as well as naturally existing nucleotide sequences of UM and PM in order to clarify which one is responsible for the polymorphism. Mut-1 and Mut-2 showed almost the same high and low activity as that of the UM and PM groups, respectively. Thus, the expression study of mutant AO cDNA directly revealed that the nucleotide substitution 377G > A, but not 2604C > T, plays a critical role in the genetic polymorphism of AO in Donryu strain rats. The reason why the amino acid substitution causes the polymorphism in AO activity is discussed.

17.
Abstract

We analysed complete or almost complete nucleotide sequences of the human, chimp, mouse, rat, chicken, dog, and other genomes to find that they contain extremely long (A+T) and (G+C) blocks that do not occur at all in the corresponding randomized sequences. The longest is an (A+T) block containing 1040 consecutive AT pairs that occurs in human chromosome 16. The longest human (G+C) block is 261 bp long. About half of the longest blocks occur in introns. The (A+T) blocks are discrete units whereas the (G+C) blocks are diffuse. They are embedded in the genome through connectors longer than 1 kilobase where the (G+C) content gradually decreases to the value of 50%. Remarkably, the (A+T) as well as (G+C) blocks are substantially shorter in the chimp genome. The chicken genome is characterized by very long (G+C) blocks that are even longer than in the human genome. Though much shorter, long (G+C) and especially (A+T) blocks occur in lower organisms as well, which means that AT and GC pair clustering is an ancient property that has evolved into large scales in higher eukaryote genomes and the human genome in particular. Very long (A+T) and (G+C) blocks confer specific biophysical properties on DNA that are likely to influence genome folding in cell nuclei and its functional properties.

18.
19.
We have determined the complete nucleotide sequence of Xenopus laevis 28S rDNA (4110 bp). In order to locate evolutionarily conserved regions within rDNA, we compared the Xenopus 28S sequence to homologous rDNA sequences from yeast, Physarum, and E. coli. Numerous regions of sequence homology are dispersed throughout the entire length of rDNA from all four organisms. These conserved regions have a higher A + T base composition than the remainder of the rDNA. The Xenopus 28S rDNA has nine major areas of sequence inserted when compared to E. coli 23S rDNA. The total base composition of these inserts in Xenopus is 83% G + C, and is generally responsible for the high (66%) G + C content of Xenopus 28S rDNA as a whole. Although the length of the inserted sequences varies, the inserts are found in the same relative positions in yeast 26S, Physarum 26S, and Xenopus 28S rDNAs. In one insert there are 25 bases completely conserved between the various eukaryotes, suggesting that this area is important for eukaryotic ribosomes. The other inserts differ in sequence between species and may or may not play a functional role.

20.
Two types of C3G cDNA were isolated from a mouse 3T3-L1 adipocyte cDNA library. A 114-bp sequence in the middle of the C3G cDNA is deleted in the short type cDNA. By RT-PCR analysis, it was found that these two types of C3G mRNA existed in all the mouse tissues. Sequence comparison revealed 88% nucleotide sequence identity between mouse and human C3G cDNA. Comparison of mouse C3G cDNA with the human genome database suggested that this 114-bp sequence comprised an entire exon, and this was confirmed by PCR analysis using mouse genomic DNA and cDNA templates. These results indicate that two C3G mRNAs and proteins result from alternative RNA splicing.
