Similar articles
 Found 20 similar articles (search time: 78 ms)
1.
Abstract

Distribution of double-helix thermal stability of Escherichia coli and eukaryotic DNAs was analyzed. The results confirmed the previous propositions based on the study of the stability distribution in phage DNAs: (1) stability fluctuation appears near the boundaries of protein coding regions (PCRs) and non-protein coding regions (NPCRs); (2) PCRs have less fluctuation than NPCRs. The present analysis also revealed that the local G+C content is lower at the beginning of PCRs of E. coli than the average G+C content of PCRs, and that deviations in the amino acid composition and the third-letter usage of PCRs are involved in the low G+C content; the biological meaning of this is discussed in relation to mRNA structure.

2.
The mean (G + C) composition (51.0%) and standard deviation (+/- 3.8%) of published DNA sequences accounting for 10% of the E. coli genome is in excellent agreement with the principal overall distribution determined by high resolution melting. While differences in base and neighbor characteristics are small and uniform throughout all regions of the genome, it is found that the (G + C) content of sequences varies in segmented fashion within boundaries corresponding to coding (53% G + C) and noncoding (46% G + C) regions; with variances in the latter being six-fold greater than in coding regions. The variance in different regions shows a strong negative dependence on (G + C) content of the region, reflecting the condition that A-T and G-C base pairs are preferred neighbors of A-T and C-G pairs, respectively; with the bias increasing with decreasing (G + C) content. Neighbor analysis indicates the most extreme positive biases occur in AA, TT, GC and CG throughout all regions, but particularly in noncoding regions. Extraordinary numbers of oligomeric strings of (A)n, etc., are the further consequence of this bias. These and other characteristics point to the existence of inherent biases in neighbor frequencies levied during replication or repair, and which reflect, in turn, neighbor influences during mutation. The bias in codon usage noted by Grantham and others is seen here as due, in part, to the adaptation of coding sequences to this microenvironment through selection among synonymous codons so as to preserve inherent neighbor biases.
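
As a rough illustration of the quantities this abstract refers to, the following Python sketch computes the (G + C) fraction of a sequence and a simple observed/expected ratio for each nearest-neighbor doublet. The function names and the independence-based expectation (product of single-base frequencies) are assumptions made for this example; the abstract does not state how the authors normalized their neighbor frequencies.

```python
from collections import Counter


def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    total = sum(seq.count(b) for b in "ACGT")
    if total == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / total


def neighbor_ratios(seq):
    """Observed/expected count for each doublet XY on one strand.

    Expected count assumes independent bases: f(X) * f(Y) * (n - 1).
    A ratio above 1 means the doublet is over-represented.
    """
    seq = seq.upper()
    n = len(seq)
    base_counts = Counter(seq)
    pair_counts = Counter(seq[i:i + 2] for i in range(n - 1))
    ratios = {}
    for pair, observed in pair_counts.items():
        x, y = pair
        if x not in "ACGT" or y not in "ACGT":
            continue  # skip doublets containing ambiguity codes
        expected = (base_counts[x] / n) * (base_counts[y] / n) * (n - 1)
        ratios[pair] = observed / expected
    return ratios
```

On a toy sequence such as `"AAAATTTT"`, `neighbor_ratios` reports AA and TT above 1 and AT below 1, which is the direction of bias the abstract describes for A-T-rich regions.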

3.
The gene encoding ribosomal protein S11 (Escherichia coli S15 homologue) from Halobacterium marismortui was cloned employing two synthetic oligonucleotide mixtures, 23 and 32 bases in length, as hybridization probes. The nucleotide sequence of the gene and the adjacent 5'- and 3'-flanking regions (1300 base pairs) were then determined by the dideoxy chain termination method. Comparison of the nucleotide sequence of the H. marismortui S11 gene with that of the E. coli S15 gene (rpsO) showed that the 3'-end of the S11 gene can be aligned with the entire E. coli S15 gene, sharing 44% identical nucleotides. It has been found that the S11 gene has a higher G + C content (G + C = 65%) than that of the E. coli S15 gene (G + C = 53%). This increase in G + C content specifically shows up as a preference for G + C in the 3rd position of the codon. Upstream of the S11 gene, an archaebacterial promoter sequence (GGACTTTCA) and a putative ribosomal binding site (GCGGT) have been found, 88 and 15 (or 24) base pairs from the initiation codon of the gene. In addition, an open reading frame could be identified immediately after the stop codon for the S11 gene. Northern blotting analysis using the S11 coding region as probe has shown that the S11 gene is located on a 2.4-kilobase mRNA, suggesting that it is cotranscribed with other downstream gene(s).

4.
Summary: To investigate the dependence of protein composition on DNA base composition, a set of data on individual proteins with known amino acid compositions from a spectrum of bacterial species has been compiled. It is found that similar relationships of amino acid frequency to G + C content exist for these proteins as for the bulk proteins studied by Sueoka (1961). The data are analysed by linear and cubic regression, and a measure of the proportions of A + T-rich and G + C-rich codons in the underlying messenger RNAs is put forward. The theoretical limits on the G + C content of coding DNA are discussed, and inferences are made about the various selective forces acting on DNAs of different G + C contents.

5.
Statistical analyses on the positional correlation of physical-stability and base-sequence distribution maps with genetic map are made for the whole DNA (48502 bases) of lambda-phage. The susceptibility to a double-helix unfolding perturbation and the fraction of the transient opening of a particular region of the double helix are adopted to define this physical stability. The principal features obtained are: A) The DNA double strand of protein coding regions is found to have homostabilizing propensity around a defined stability which is characteristic to each individual gene. B) The stability of the double helix in non-protein coding region fluctuates, on average over the whole region, more than that in protein coding region. C) Boundary regions of protein coding and non-protein coding regions are regions of high stability-fluctuation. Stability especially fluctuates at the protein-coding-region side of the boundary. Contrary to the quiet feature of the interior part of protein coding region rather noisy part exists at its edge. D) One frequently opening region coincides with the attaching site for the site specific recombination between phage and bacterial DNA. There are two possible ways to explain the noisy feature in the stability distribution in non-protein coding regions: 1) The region has been used as the locus of recombination as evolution took place. Thus DNAs which were homostabilized around a different value characteristic to each individual DNA, have been joined there many times, so that the noise has accumulated as a remnant of evolutional history; and/or 2) the base-composition homogenizing or double-helix homostabilizing mechanism does not work in unneeded region such as non-protein coding region or introns. Since corresponding characteristics have been found in our previous analyses on other viral and globin-gene DNAs, the rules mentioned above may be comprehensively extended to other DNAs.

6.
Abstract

Statistical analyses on the positional correlation of physical-stability and base-sequence distribution maps with genetic map are made for the whole DNA (48502 bases) of λ-phage. The susceptibility to a double-helix unfolding perturbation and the fraction of the transient opening of a particular region of the double helix are adopted to define this physical stability.

The principal features obtained are: A) The DNA double strand of protein coding regions is found to have homostabilizing propensity around a defined stability which is characteristic to each individual gene. B) The stability of the double helix in non-protein coding region fluctuates, on average over the whole region, more than that in protein coding region. C) Boundary regions of protein coding and non-protein coding regions are regions of high stability-fluctuation. Stability especially fluctuates at the protein-coding-region side of the boundary. Contrary to the quiet feature of the interior part of protein coding region rather noisy part exists at its edge. D) One frequently opening region coincides with the attaching site for the site specific recombination between phage and bacterial DNA.

There are two possible ways to explain the noisy feature in the stability distribution in non-protein coding regions: 1) The region has been used as the locus of recombination as evolution took place. Thus DNAs which were homostabilized around a different value characteristic to each individual DNA, have been joined there many times, so that the noise has accumulated as a remnant of evolutional history; and/or 2) the base-composition homogenizing or double-helix homostabilizing mechanism does not work in unneeded region such as non-protein coding region or introns.

Since corresponding characteristics have been found in our previous analyses on other viral and globin-gene DNAs, the rules mentioned above may be comprehensively extended to other DNAs.

7.
Dynamic flexibility in the Escherichia coli genome.   Cited by: 2 (self-citations: 0, citations by others: 2)
L Tsai  Z Sun 《FEBS letters》2001,507(2):225-230
Empirical rules based on tetranucleotide parameters were presented to predict the structural parameters twist (Omega), roll (rho), tilt (tau) and slide (D(y)). A statistical mechanical model was used to analyze the flexibility of the Escherichia coli genome. The replication terminus region displayed a low level of flexibility. A strong correlation can be seen between G+C content and flexibility. Average flexibilities in the coding regions were found to be significantly larger than those in non-coding regions. The flexible characteristics in the 5'-neighborhood of the coding regions and in three class sigma promoter sequences in the E. coli genome were also analyzed.

8.
9.
10.
11.
F Fuller  H Boedtker 《Biochemistry》1981,20(4):996-1006
Three pro-alpha 1 collagen cDNA clones, pCg1, pCg26, and pCg54, and two pro-alpha 2 collagen cDNA clones, pCg13 and pCg45, were subjected to extensive DNA sequence determination. The combined sequences specified the amino acid sequences for chicken pro-alpha 1 and pro-alpha 2 type I collagens starting at residue 814 in the collagen triple-helical region and continuing to the procollagen C-termini as determined by the first in-phase termination codon. Thus, the sequences of 272 pro-alpha 1 C-terminal, 260 pro-alpha 2 C-terminal, 201 pro-alpha 1 helical, and 201 pro-alpha 2 helical amino acids were established. In addition, the sequences of several hundred nucleotides corresponding to noncoding regions of both procollagen mRNAs were determined. In total, 1589 pro-alpha 1 base pairs and 1691 pro-alpha 2 base pairs were sequenced, corresponding to approximately one-third of the total length of each mRNA. Both procollagen mRNA sequences have a high G+C content. The pro-alpha 1 mRNA is 75% G+C in the helical coding region sequenced and 61% G+C in the C-terminal coding region, while the pro-alpha 2 mRNA is 60% and 48% G+C, respectively, in these regions. The dinucleotide sequence pCG occurs at a higher frequency in both sequences than is normally found in vertebrate DNAs and is approximately 5 times more frequent in the pro-alpha 1 sequence than in the pro-alpha 2 sequence. Nucleotide homology in the helical coding regions is very limited given that these sequences code for the repeating Gly-X-Y tripeptide in a region where X and Y residues are 50% conserved. These differences are clearly reflected in the preferred codon usages of the two mRNAs.

12.
CpG deficiency, dinucleotide distributions and nucleosome positioning   Cited by: 2 (self-citations: 0, citations by others: 2)
The dinucleotide CpG is deficient in (A + T)-rich regions of vertebrate DNA in both coding and non-coding sequences and there is a corresponding increase above expectation in the occurrence of TpG and CpA. By contrast in (G + C)-rich regions no deficiency of CpG is found. Such (G + C)-rich sequences, containing the expected number of CpG dinucleotides, alternate along the genome with (A + T)-rich sequences which have a lower than expected CpG content. The G + C content of vertebrate DNA can oscillate with a period of 150-200 bp and this may be a factor in positioning nucleosomes. The role of mutagenesis in loss of CpG and increase of A + T, particularly in non-coding regions, is discussed.
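
The comparison of CpG occurrence "above" or "below expectation" in regions of differing (G + C) content can be sketched as a windowed scan. The Python example below is only an illustration under stated assumptions: the helper names, the window and step sizes, and the common observed/expected formula (CpG count × length) / (C count × G count) are choices made here, not details taken from the abstract.

```python
def gc_fraction(window):
    """Fraction of G and C bases in a window (0.0 for an empty window)."""
    if not window:
        return 0.0
    return sum(base in "GC" for base in window) / len(window)


def cpg_obs_exp(window):
    """Observed/expected CpG ratio; None when C or G is absent."""
    n = len(window)
    c_count = window.count("C")
    g_count = window.count("G")
    if c_count == 0 or g_count == 0:
        return None  # expectation is zero, so the ratio is undefined
    return window.count("CG") * n / (c_count * g_count)


def scan_windows(seq, size, step):
    """Yield (start, G+C fraction, CpG obs/exp) for each full window."""
    seq = seq.upper()
    for start in range(0, len(seq) - size + 1, step):
        window = seq[start:start + size]
        yield start, gc_fraction(window), cpg_obs_exp(window)
```

Plotting the second and third values against `start` for a real sequence would show whether low-(G + C) windows also have a low CpG ratio, which is the pattern the abstract reports for vertebrate DNA.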

13.
Phycobiliproteins function as a major light-harvesting protein-pigment complex in the cyanobacteria and the eukaryotic algae. Phycoerythrin (PE) is one type of phycobiliprotein, found in all rhodophytes, some species of cyanobacteria and cryptophytes, and different ecotypes of Prochlorococcus populations. The peBA gene, encoding the beta and alpha subunits of PE from Ceramium boydenii, was cloned and sequenced in this research. A peBA-specific PCR primer was synthesized, based on the conserved sequences of the peBA gene. The beta subunit encoding gene (peB) contained an open reading frame of 534 bp, while that of the alpha subunit (peA) was 495 bp. The recombinant expression plasmid pET-peAB was constructed and expressed in Escherichia coli BL21. The molecular weights of the expressed products of peB and peA were about 23.3 and 18.2 kDa, respectively. Results of codon usage analysis show that G + C content is heterogeneous among different groups of PE and that spacers have dramatically lower G + C contents than coding regions. There is also a high variance in G + C content among sequences at the third position sites. It is also found in this paper that several sequence regions, which might reflect functional or structural requirements of the PE organization, and several residues known for their functional importance are conserved in almost all the sequences.

14.
Cloning and sequencing of Serratia protease gene.   Cited by: 46 (self-citations: 1, citations by others: 45)
The gene encoding an extracellular metalloproteinase from Serratia sp. E-15 has been cloned, and its complete nucleotide sequence determined. The amino acid sequence deduced from the nucleotide sequence reveals that the mature protein of the Serratia protease consists of 470 amino acids with a molecular weight of 50,632. The G+C content of the coding region for the mature protein is 58%; this high G+C content is due to a marked preference for G+C bases at the third position of the codons. The gene codes for a short pro-peptide preceding the mature protein. The Serratia protease gene was expressed in Escherichia coli and Serratia marcescens; the former produced the Serratia protease in the cells and the latter in the culture medium. Three zinc ligands and an active site of the Serratia protease were predicted by comparing the structure of the enzyme with those of thermolysin and Bacillus subtilis neutral protease.
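
Several abstracts in this list attribute a high overall G+C content to a preference for G or C at the third codon position. A minimal Python sketch of how that per-position G+C content could be tallied for an in-frame coding sequence follows; the function name is hypothetical, and the code assumes the sequence starts at the first base of a codon.

```python
def gc_by_codon_position(cds):
    """Return the G+C fraction at codon positions 1, 2 and 3.

    Assumes `cds` is in frame (index 0 is the first base of a codon).
    Trailing bases that do not complete a codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # drop any incomplete final codon
    fractions = []
    for position in range(3):
        bases = cds[position:usable:3]
        if not bases:
            fractions.append(0.0)
            continue
        gc = sum(base in "GC" for base in bases)
        fractions.append(gc / len(bases))
    return tuple(fractions)
```

For the toy sequence `"ATGGCCAAG"` (codons ATG, GCC, AAG), the third-position value is 1.0 while positions 1 and 2 are each one third, mirroring the kind of third-position skew described above.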

15.
MOTIVATION: MELTSIM is a Windows-based statistical mechanical program for simulating melting curves of DNAs of known sequence and genomic dimensions under different conditions of ionic strength with great accuracy. The program is useful for mapping variations of base compositions of sequences, conducting studies of denaturation, establishing appropriate conditions for hybridization and renaturation, determinations of sequence complexity, and sequence divergence. RESULTS: Good agreement is achieved between experimental and calculated melting curves of plasmid, bacterial, yeast and human DNAs. Denaturation maps that accompany the calculated curves indicate non-coding regions have a significantly lower (G+C) composition than coding regions in all species examined. Curves of partially sequenced human DNA suggest the current database may be heavily biased toward coding regions and may exclude large (A+T)-rich elements. AVAILABILITY: MELTSIM 1.0 is available at: //www.uml.edu/Dept/Chem/UMLBIC/Apps/MELTSIM/MELTSIM-1.0-Win/meltsim.zip. Melting curve plots in this paper were made with GNUPLOT 3.5, available at: http://www.cs.dartmouth.edu/gnuplot_info.html Contact: blake@maine.maine.edu

16.
Human protein C is a vitamin K-dependent plasma protein that serves as a feedback down-regulator of the coagulation cascade by specifically degrading the protein cofactors VIIIa and Va. The protein C precursor consists of the following domains: leader peptide, "gla" region, two epidermal growth factor segments, and the activation peptide/serine protease. Comparison of amino acid sequences reveals that protein C and factor IX are homologous. A comparison of the genes for protein C and factor IX shows that all seven of the introns within the protein coding regions are in identical positions and correspond to protein structure-function domain boundaries. However, the base compositions of the two genes (coding and noncoding regions) are remarkably different: approximately 60% guanine + cytosine (G + C) for protein C versus approximately 40% G + C for factor IX. One possible explanation for this phenomenon is that the factor IX gene (located on the X chromosome) has undergone extensive deoxycytosine methylation and subsequent spontaneous deamination mutagenesis, resulting in a net C to thymine (and G to adenine) transition. This would suggest that the protein C gene may represent a more primitive form of the gene duplication precursor.

17.
A Yasui  S A Langeveld 《Gene》1985,36(3):349-355
A cloned fragment of Saccharomyces cerevisiae chromosomal DNA carrying the photoreactivation gene (PHR) has been sequenced. The fragment contains a 1695-bp intronless open reading frame (ORF) coding for a polypeptide of 564 amino acids (aa). The phr gene of Escherichia coli was also sequenced, and the sequence is in agreement with the published data. The yeast PHR gene has a G + C content of 36.2%, whereas 53.7% was found for the E. coli gene. Despite the difference in G + C content there is a 35% homology between the deduced aa sequences. This homology suggests that both genes have originated from a common ancestral gene.

18.
In recent years, the amount of molecular sequencing data from Tetrahymena thermophila has dramatically increased. We analyzed G + C content, codon usage, initiator codon context and stop codon sites in the extremely A + T rich genome of this ciliate. Average G + C content was 38% for protein coding regions, 21% for 5' non-coding sequences, 19% for 3' non-coding sequences, 15% for introns, 19% for micronuclear limited sequences and 17% for macronuclear retained sequences flanking micronuclear specific regions. The 75 available T. thermophila protein coding sequences favored codons ending in T and, where possible, avoided those with G in the third position. Highly expressed genes were relatively G + C-rich and exhibited an extremely biased pattern of codon usage while developmentally regulated genes were more A + T-rich and showed less codon usage bias. Regions immediately preceding Tetrahymena translation initiator codons were generally A-rich. For the 60 stop codons examined, the frequency of G in the end + 1 site was much higher than expected whereas C never occupied this position.

19.
Summary: Unrelated organisms with DNA of extreme G + C content (25% or 70%) are found to share very specific patterns of nearest neighbour base doublet frequency in their DNAs. This is shown to be a result of restrictions on the extremity of amino acid composition in their proteins, combined with a maximisation of the use of one type of base pair in redundant codon positions. Inferences are made about the universal nature of the genetic code and the proportion of DNA used for specifying protein in different species. The composition of coding DNA strands in these organisms is also discussed.

20.
We report the analysis of three open reading frames of Salmonella typhimurium LT2 which we identified as rfaF, the structural gene for ADP-heptose:LPS heptosyltransferase II; rfaD, the structural gene for ADP-L-glycero-D-manno-heptose-6-epimerase; and part of kbl, the structural gene for 2-amino-3-ketobutyrate CoA ligase. A plasmid carrying rfaF complements an rfaF mutant of S. typhimurium; rfaD and kbl are homologous to and in the same location as the equivalent genes in Escherichia coli K-12. The RfaF (heptosyl transferase II) protein shares regions of amino acid homology with RfaC (heptosyltransferase I), RfaQ (postulated to be heptosyltransferase III), and KdtA (ketodeoxyoctonate transferase), suggesting that these regions function in heptose binding. E. coli contains a block of DNA of about 1,200 bp between kbl and rfaD which is missing from S. typhimurium. This DNA includes yibB, which is an open reading frame of unknown function, and two promoters upstream of rfaD (P3, a heat-shock promoter, and P2). Both S. typhimurium and E. coli rfaD genes share a normal consensus promoter (P1). We postulate that the yibB segment is an insertion into the line leading to E. coli from the common ancestor of the two genera, though it could be a deletion from the line leading to S. typhimurium. The G+C content of the rfaLKZYJI genes of both S. typhimurium LT2 and E. coli K-12 is about 35%, much lower than the average of enteric bacteria; if this low G+C content is due to lateral transfer from a source of low G+C content, it must have occurred prior to evolutionary divergence of the two genera.
