Similar articles
 20 similar articles found
1.
Summary. To investigate the dependence of protein composition on DNA base composition, a set of data on individual proteins with known amino acid compositions from a spectrum of bacterial species has been compiled. It is found that similar relationships of amino acid frequency to G + C content exist for these proteins as for the bulk proteins studied by Sueoka (1961). The data are analysed by linear and cubic regression, and a measure of the proportions of A + T-rich and G + C-rich codons in the underlying messenger RNAs is put forward. The theoretical limits on the G + C content of coding DNA are discussed, and inferences are made about the various selective forces acting on DNAs of different G + C contents.

2.
The evolution of DNA base composition is simplified to a six-parameter model when there are no strand biases for mutation and selection. We analyzed the dynamics of this model with special attention to the influence of a change in substitution rates. The G + C content of the DNA sequence tends to an equilibrium value that is controlled by four parameters of the model. When the substitution rates are not constant, the G + C equilibrium position is not constant. The DNA sequence base frequencies always tend to a state in which A = T and G = C within a strand, regardless of substitution rates. This is true even when the substitution rates are not constant over time. This provides a simple way of rejecting the model from inspection of present-day DNA base composition.
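
The dynamics described in this abstract can be illustrated with a small numerical sketch. The parameter names `a` to `f`, the Euler integration, and the example rates are assumptions made here for illustration; the abstract does not give the authors' notation or method. Each rate is shared by a substitution and its complement on the other strand, which is what "no strand bias" is taken to mean.

```python
def step(freqs, rates, dt=0.01):
    """One Euler step for base frequencies (A, T, G, C).

    rates = (a, b, c, d, e, f), hypothetical names for the six shared rates:
      a: A<->T, b: G<->C, c: A->G and T->C,
      d: G->A and C->T, e: A->C and T->G, f: C->A and G->T.
    """
    a, b, c, d, e, f = rates
    fA, fT, fG, fC = freqs
    dA = fT * a + fG * d + fC * f - fA * (a + c + e)
    dT = fA * a + fC * d + fG * f - fT * (a + c + e)
    dG = fA * c + fT * e + fC * b - fG * (b + d + f)
    dC = fT * c + fA * e + fG * b - fC * (b + d + f)
    return (fA + dt * dA, fT + dt * dT, fG + dt * dG, fC + dt * dC)


def simulate(freqs, rates, steps=20000, dt=0.01):
    """Iterate the model from an arbitrary starting composition."""
    for _ in range(steps):
        freqs = step(freqs, rates, dt)
    return freqs


def equilibrium_gc(rates):
    """Equilibrium G+C fraction; only c, d, e and f (four parameters) matter."""
    _, _, c, d, e, f = rates
    return (c + e) / (c + e + d + f)


example_rates = (0.1, 0.2, 0.3, 0.5, 0.1, 0.2)   # illustrative values only
final = simulate((0.7, 0.1, 0.1, 0.1), example_rates)
```

Starting from a strongly skewed composition, `final` ends with A equal to T and G equal to C, and its G+C fraction matches `equilibrium_gc(example_rates)`, consistent with the two claims in the abstract.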

3.
A novel method to calculate the G+C content of genomic DNA sequences.   Cited by: 2 (self-citations: 0, citations by others: 2)
The base composition of a DNA fragment or genome is usually measured by the proportion of A+T or G+C in the sequence. The G+C content along genomic sequences is usually calculated using an overlapping or non-overlapping sliding window method. The result and accuracy of such an approach depend on the size of the window and the moving distance adopted. In this paper, a novel windowless technique to calculate the G+C content of genomic sequences is proposed. By this method, the G+C content can be calculated at different "resolutions". In an extreme case, the G+C content may be computed at a specific point, rather than in a window of finite size. This is particularly useful for analyzing the fine variation of base composition along genomic sequences. As the first example, the variation of G+C content along each of the 16 yeast chromosomes is analyzed. The G+C-rich regions longer than 5 kb are detected and listed in detail. It is found that each chromosome consists of several alternating G+C-rich and G+C-poor regions, i.e., a mosaic structure. Another example is the analysis of the G+C content of each of the two chromosomes of the Vibrio cholerae genome. Based on the variations of the G+C content in each chromosome, it is shown that some fragments in the Vibrio cholerae genome may have been transferred from other species. In particular, the position and size of the large integron island on the smaller chromosome were precisely predicted. This method would be a useful tool for analyzing genomic sequences.
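
For reference, the conventional sliding-window calculation that this abstract contrasts itself with can be sketched as follows. This is the baseline approach, not the windowless method the paper proposes; the function names and the choice to ignore non-ACGT symbols are assumptions made here.

```python
def gc_fraction(seq):
    """G+C fraction of a sequence, ignoring symbols other than A, C, G, T."""
    seq = seq.upper()
    gc = sum(1 for base in seq if base in "GC")
    acgt = sum(1 for base in seq if base in "ACGT")
    return gc / acgt if acgt else 0.0


def sliding_gc(seq, window, step):
    """Return (start, G+C fraction) for each full window along the sequence.

    step == window gives non-overlapping windows; step < window gives
    overlapping ones.
    """
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


profile = sliding_gc("GGGGAAAA", window=4, step=2)
```

The dependence on `window` and `step` that the abstract mentions is visible here: the same sequence yields a different profile for each pair of values.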

4.
Summary. One hundred twelve human DNA sequences were analyzed with respect to dinucleotide frequency and amino acid composition. The variation in guanine and cytosine (G+C) content revealed: (1) at the 2-3 and 3-1 doublet positions, CG discrimination is attenuated at high G+C, but TA disfavor is enhanced, and (2) several amino acids are subject to G+C change. These findings have been reported in part for collections of sequences from various species. The present study confirms that in a single organism, the human, the G+C effects do exist. Aspects of the argument that connects G+C with protein thermal stability are also discussed.

5.
Base composition is not uniform across the genome of Drosophila melanogaster. Earlier analyses have suggested that there is variation in composition in D. melanogaster on both a large scale and a much smaller, within-gene, scale. Here we present analyses on 117 genes which have reliable intron/exon boundaries and no known alternative splicing. We detect significant heterogeneity in G+C content among intron segments from the same gene, as well as a significant positive correlation between the intron and the third codon position G+C content within genes. Both of these observations appear to be due, in part, to an overall decline in intron and third codon position G+C content along Drosophila genes with introns. However, there is also evidence of an increase in third codon position G+C content at the start of genes; this is particularly evident in genes without introns. This is consistent with selection acting against preferred codons at the start of genes. Received: 24 February 1997 / Accepted: 10 November 1997

6.
Sueoka N  Kawanishi Y 《Gene》2000,261(1):53-62
The human genome, as in other eukaryotes, has a wide heterogeneity in the DNA base composition. The evolutionary basis for this heterogeneity has been unknown. A previous study of the human genome (846 genes analyzed) has shown that, in the major range of the G+C content in the third codon position (0.25-0.75), biases from the Parity Rule 2 (PR2) among the synonymous codons of the four-codon amino acids are similar except in the highest G+C range (Sueoka, N., 1999. Translation-coupled violation of Parity Rule 2 in human genes is not the cause of heterogeneity of the DNA G+C content of third codon position. Gene 238, 53-58.). PR2 is an intra-strand rule where A=T and G=C are expected when there are no biases between the two complementary strands of DNA in mutation and selection rates (substitution rates). In this study, 14,026 human genes were analyzed. In addition, the third codon positions of two-codon amino acids were analyzed. New results show the following: (a) The G+C contents of the third codon position of human genes are scattered in the G+C range of 0.22-0.96 in the third codon position. (b) The PR2 biases are similar in the range of 0.25-0.75, whereas, in the high G+C range (0.75-0.96; 13% of the genes), the PR2-bias fingerprints are different from those of the major range. (c) Unlike the PR2 biases, the G+C contents of the third codon position for both four-codon and two-codon amino acids are all correlated almost perfectly with the G+C content of the third codon position over the total G+C ranges. These results support the notion that the directional mutation pressure, rather than the directional selection pressure, is mainly responsible for the heterogeneity of the G+C content of the third codon position.
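
A minimal sketch of how deviations from PR2 might be measured at third codon positions is given below. It is a simplification: the study restricts the analysis to particular amino acid classes, whereas this sketch pools every codon of one coding sequence. The function name and return layout are assumptions made here.

```python
def third_position_stats(cds):
    """Return (GC3, A3/(A3+T3), G3/(G3+C3)) for an in-frame coding sequence.

    Under PR2 (A = T and G = C within a strand) both ratios equal 0.5;
    a ratio is None when its denominator is zero.
    """
    third = cds.upper()[2::3]                     # every third base
    counts = {base: third.count(base) for base in "ACGT"}
    total = sum(counts.values())
    at = counts["A"] + counts["T"]
    gc = counts["G"] + counts["C"]
    gc3 = gc / total if total else None
    at_bias = counts["A"] / at if at else None
    gc_bias = counts["G"] / gc if gc else None
    return gc3, at_bias, gc_bias


# Four alanine codons (GCA, GCT, GCG, GCC): balanced third positions.
stats = third_position_stats("GCAGCTGCGGCC")
```

Plotting the two ratios against each other for many genes gives a picture of where the genes sit relative to the PR2 point (0.5, 0.5).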

7.
A new procedure for the determination of the percentage guanine plus cytosine (% G+C; mol/100 mol) values of microquantities of DNA is described. Its principle is a DNA-polymerase-I-directed nick translation of DNA in the presence of dGTP, dTTP, [3H]dCTP, and [alpha-32P]dATP. Kinetics experiments indicate that the plateau value is reached in about 20 min of incubation under our experimental conditions. Percentage G+C is obtained from the linear relation 1/(% G+C) = 0.01 K [32P]/[3H] + 0.01, where the ratio of trichloroacetic-acid-precipitable radioactivity is taken into account, the K value being determined for each experiment by using a few reference DNAs of known composition. This procedure has proven suitable for analysis of plasmidic, viral and cellular DNAs of different base composition (25-75% G+C), shape (linear and circular double-stranded DNA) and size (100-150 000 base pairs). Usual methods for % G+C analysis (buoyant density and melting temperature determinations) yield unreliable results in the presence of either modified or unusual bases: the double-labeling procedure is still valid under these conditions. The latter is, therefore, the method of choice for analysis of rare DNA species which are available in very small quantities (it requires amounts of DNA as low as 1 ng, i.e. several orders of magnitude lower than those used for chromatographic analysis of DNA hydrolysates). Since obtaining highly purified DNA is an essential prerequisite for the double-labeling procedure, a method for purification of bacterial DNA is detailed in the present work.
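
The linear relation quoted in this abstract rearranges to % G+C = 100 / (K · [32P]/[3H] + 1). The sketch below applies that rearrangement. The function names are hypothetical, and averaging K over the reference DNAs is an assumption made here; the abstract only says that K is determined from a few references.

```python
def percent_gc(ratio_32p_3h, k):
    """% G+C from the 32P/3H radioactivity ratio and calibration constant K.

    Rearranged from 1/(% G+C) = 0.01 * K * ratio + 0.01.
    """
    return 100.0 / (k * ratio_32p_3h + 1.0)


def calibrate_k(references):
    """Estimate K from (known % G+C, measured 32P/3H ratio) pairs.

    Each pair gives K = (100 / %GC - 1) / ratio; the mean is returned
    (assumed combination rule, not stated in the abstract).
    """
    estimates = [(100.0 / gc - 1.0) / ratio for gc, ratio in references]
    return sum(estimates) / len(estimates)


k = calibrate_k([(50.0, 2.0)])        # illustrative numbers, not measured data
unknown = percent_gc(4.0, k)          # a sample with ratio 4.0
```

With these illustrative inputs K is 0.5, and a sample with twice the reference ratio comes out at one third G+C.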

8.
9.
The number of distinct functional classes of single-stranded RNAs (ssRNAs) and the number of sequences representing them are substantial and continue to increase. Organizing this data in an evolutionary context is essential, yet traditional comparative sequence analyses require that homologous sites can be identified. This prevents comparative analysis between sequences of different functional classes that share no site-to-site sequence similarity. Analysis within a single evolutionary lineage also limits evolutionary inference because shared ancestry confounds properties of molecular structure and function that are historically contingent with those that are imposed for biophysical reasons. Here, we apply a method of comparative analysis to ssRNAs that is not restricted to homologous sequences, and therefore enables comparison between distantly related or unrelated sequences, minimizing the effects of shared ancestry. This method is based on statistical similarities in nucleotide base composition among different functional classes of ssRNAs. In order to denote base composition unambiguously, we have calculated the fraction G+A and G+U content, in addition to the more commonly used fraction G+C content. These three parameters define RNA composition space, which we have visualized using interactive graphics software. We have examined the distribution of nucleotide composition from 15 distinct functional classes of ssRNAs from organisms spanning the universal phylogenetic tree and artificial ribozymes evolved in vitro. Surprisingly, these distributions are biased consistently in G+A and G+U content, both within and between functional classes, regardless of the more variable G+C content. Additionally, an analysis of the base composition of secondary structural elements indicates that paired and unpaired nucleotides, known to have different evolutionary rates, also have significantly different compositional biases. 
These universal compositional biases observed among ssRNAs sharing little or no sequence similarity suggest, contrary to current understanding, that base composition biases constitute a convergent adaptation among a wide variety of molecular functions.
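
The three coordinates of the "RNA composition space" described in this abstract can be computed as in the sketch below. The function name is hypothetical, and mapping T to U so that DNA-style input is accepted is an assumption made here.

```python
def composition_point(rna):
    """Return (G+C, G+A, G+U) fractions for an RNA sequence.

    T is treated as U; symbols other than A, C, G, U are ignored.
    """
    rna = rna.upper().replace("T", "U")
    counts = {base: rna.count(base) for base in "ACGU"}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or U")
    g = counts["G"]
    return (
        (g + counts["C"]) / total,
        (g + counts["A"]) / total,
        (g + counts["U"]) / total,
    )


point = composition_point("GACU")
```

Because the four base fractions sum to 1, these three values determine the composition completely; for example, the G fraction equals (G+C + G+A + G+U - 1) / 2.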

10.
We studied the relationship between ten authentic Campylobacter fetus and two C. bubulus strains, and seventeen named vibrios of animal and human origin. All organisms fall within the genus Campylobacter, as defined. Their DNA base composition ranges from 29.8 to 35.9% (G+C). On the basis of similarity of differential biochemical tests and % (G+C), they can be divided into four closely similar groups.
  1. C. fetus var. intestinalis and var. venerealis with 34.7 to 35.9% (G+C).
  2. Vibrio fecalis with 32.0 to 32.8% (G+C).
  3. C. bubulus with 30.1 and 30.6% (G+C).
  4. The cluster V. jejuni, V. coli, Vibrio sp. from fowl, and related vibrios from man and sheep with 29.8 to 34.0% (G+C).
We propose that all the vibrios used here be included in the genus Campylobacter.

11.
The susceptibility to recombination of a plasmid inserted into a chromosome varies with its genomic position. This recombination position effect is known to correlate with the average G+C content of the flanking sequences. Here we propose that this effect could be mediated by changes in the susceptibility to superhelical duplex destabilization that would occur. We use standard nonparametric statistical tests, regression analysis and principal component analysis to identify statistically significant differences in the destabilization profiles calculated for the plasmid in different contexts, and correlate the results with their measured recombination rates. We show that the flanking sequences significantly affect the free energy of denaturation at specific sites interior to the plasmid. These changes correlate well with experimentally measured variations of the recombination rates within the plasmid. This correlation of recombination rate with superhelical destabilization properties of the inserted plasmid DNA is stronger than that with average G+C content of the flanking sequences. This model suggests a possible mechanism by which flanking sequence base composition, which is not itself a context-dependent attribute, can affect recombination rates at positions within the plasmid.

12.
The DNA base composition and other characteristics of 54 bacterial cultures isolated from refrigerated foods, human clinical specimens, and other sources, designated as P. putrefaciens by one or more investigators, were determined. On the basis of % G + C content and ability to grow in 6.0% NaCl, two species were clearly distinguished. The first consists of 37 isolates, all but one incapable of growth in 6.0% NaCl. The DNA composition of these isolates falls within the range of 47.8 to 50.8% G + C with a mean of 49.5 ± 0.7% G + C from Tm determinations. This group contains all fishery and dairy isolates studied, and several isolates from human clinical specimens. The second species consists of 16 isolates, 13 of human clinical origin, 2 from meat products, and 1 from soil. All members of this second species grow readily in 6.0% NaCl and the composition of their DNA falls within the range of 55.9–59.0% G + C with a mean of 57.6 ± 0.65% G + C from Tm determinations. This investigation was supported by Public Health Service grant FD 00153-05 from the Food and Drug Administration. The author wishes to acknowledge helpful correspondence received during the course of this study from G. Gilardi, R. Hugh, M. Mandel, A. von Graevenitz, and R. Weaver.

13.
The effect of DNA base composition on the kinetics of the association between DNA and proflavine has been investigated using the temperature jump relaxation method. It is found that, regardless of the G + C base composition, the results fit a two step mechanism, the second of which exhibits characteristics of intercalation of proflavine into DNA. However, the two equilibrium constants corresponding to these steps, KI and KII, depend on the nature of the DNAs. The constant KI is found to be an order of magnitude greater for M. lysodeikticus DNA (72% G + C) than for calf thymus DNA (48% G + C). Increasing G-C content thus appears to favor the intermediate non-intercalated complex of proflavine with DNA. Methylation of M. lysodeikticus DNA with dimethyl sulfate, preferentially yielding N7 methyl guanine as the modified base, again leads to an apparent two step mechanism, with the value of KI unchanged with respect to untreated DNA, while the affinity of proflavine for the intercalated complex measured by the value of KII increases for methylated DNA.

14.
Estimates of nuclear DNA base pair composition by determination of thermal denaturation temperatures (Tm) indicated guanine + cytosine (G + C) levels of 35–56% for 17 species of marine green algae. Tm values were found to be reproducible with coefficients of variation among samples and replicates of generally less than 1 percent. G + C % values in four species of Enteromorpha varied within a narrow range of 53–56%, whereas values for three species of Ulva showed substantially greater variation, ranging from 35–55%. Ulva fasciata collections from two geographically separate North Carolina sites had mean G + C composition of 44.8 and 35.6 respectively, suggesting that these populations may be genetically distinct. Enteromorpha linza, which has been treated as a species of Ulva, had a G + C composition of 53.2, typical of the Enteromorpha species investigated. Nuclear DNA base pair composition data for species of Cladophorales and Caulerpales are given as well. Center for Marine Science Research, UNC-W contribution No. 009.
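
The abstract does not say which relation was used to convert Tm into % G+C. One widely used empirical relation for DNA melted in standard saline citrate is that of Marmur and Doty, Tm = 69.3 + 0.41 × (% G+C); the sketch below assumes that relation and that buffer, so its output should not be read as reproducing the authors' values.

```python
# Marmur-Doty constants for standard saline citrate (assumed here; the
# abstract does not state the buffer or the conversion actually used).
INTERCEPT_C = 69.3
SLOPE_C_PER_PERCENT = 0.41


def gc_from_tm(tm_celsius):
    """Estimate % G+C from a thermal denaturation midpoint in degrees C."""
    return (tm_celsius - INTERCEPT_C) / SLOPE_C_PER_PERCENT


def tm_from_gc(percent_gc):
    """Inverse relation: expected Tm in degrees C for a given % G+C."""
    return INTERCEPT_C + SLOPE_C_PER_PERCENT * percent_gc


tm_low = tm_from_gc(35.0)    # lower end of the range reported above
tm_high = tm_from_gc(56.0)   # upper end of the range reported above
```

Under this assumed relation, the reported 35–56% G+C range corresponds to a Tm spread of roughly 8.6 degrees C, which shows why reproducible Tm measurements matter for separating species.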

15.
The genomic and structural relationships of phycobiliproteins (PBPs) in different cyanobacterial species are determined by nucleotide as well as amino acid composition. The genomic GC constituents influence the amino acid variability and codon usage of a particular subunit of PBPs. We have analyzed 11 cyanobacterial species to explore the variation of amino acids and the causal relationship between GC constituents and codon usage. The study of GC content at the first, second and third codon positions showed relatively more amino acid variability at the G3 + C3 position than at the first and second positions. The GC-rich level of the encoded amino acids, including G-rich, C-rich or both, correlates with codon variability and amino acid availability. The fluctuation in amino acids such as Arg, Ala, His, Asp, Gly, Leu and Glu in α and β subunits was observed at the G1C1 position; however, fluctuation in other amino acids such as Ser, Thr, Cys and Trp was observed at the G2C2 position. The coding selection pressure of amino acids such as Ala, Thr, Tyr, Asp, Gly, Ile, Leu, Asn, and Ser in α and β subunits of PBPs was more pronounced at the G3C3 position. In this study, we observed that each subunit of PBPs is codon specific for a particular amino acid. These results suggest that a genomic constraint linked with GC constituents selects the codon for particular amino acids; furthermore, the codon-level study may be a novel approach to explore many problems associated with genomics and proteomics of cyanobacteria.
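
The position-specific quantities this abstract refers to (G1C1, G2C2, G3C3) are usually computed as the G+C fraction at each of the three codon positions. A minimal sketch, assuming an in-frame coding sequence and a hypothetical function name:

```python
def gc_by_codon_position(cds):
    """Return (GC1, GC2, GC3): G+C fraction at codon positions 1, 2 and 3.

    Trailing bases that do not complete a codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3          # drop an incomplete last codon
    fractions = []
    for position in range(3):
        bases = cds[position:usable:3]
        gc = sum(1 for base in bases if base in "GC")
        fractions.append(gc / len(bases) if bases else 0.0)
    return tuple(fractions)


# Four alanine codons: positions 1 and 2 are fixed as G and C, position 3 varies.
gc1, gc2, gc3 = gc_by_codon_position("GCAGCTGCGGCC")
```

The alanine example shows why the third position carries most of the freedom: the first two positions are constrained by the amino acid, while the third can take any base.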

16.
Codon usage and base composition in sequences from the A + T-rich genome of Rickettsia prowazekii, a member of the alpha Proteobacteria, have been investigated. Synonymous codon usage patterns are roughly similar among genes, even though the data set includes genes expected to be expressed at very different levels, indicating that translational selection has been ineffective in this species. However, multivariate statistical analysis differentiates genes according to their G + C contents at the first two codon positions. To study this variation, we have compared the amino acid composition patterns of 21 R. prowazekii proteins with that of a homologous set of proteins from Escherichia coli. The analysis shows that individual genes have been affected by biased mutation rates to very different extents: genes encoding proteins highly conserved among other species being the least affected. Overall, protein coding and intergenic spacer regions have G + C content values of 32.5% and 21.4%, respectively. Extrapolation from these values suggests that R. prowazekii has around 800 genes and that 60–70% of the genome may be coding. Correspondence to: S.G.E. Andersson
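
The abstract does not spell out how the coding fraction was extrapolated. One plausible reading is a two-component mixing calculation: if the genome is a mixture of coding DNA at 32.5% G+C and spacer DNA at 21.4% G+C, the overall genome G+C fixes the proportion of each. The sketch below works through that reading; the overall genome value of 29.0% is an illustrative assumption and does not appear in the abstract.

```python
def coding_fraction(genome_gc, coding_gc, spacer_gc):
    """Fraction of the genome that is coding under a two-component mixture.

    Solves genome_gc = f * coding_gc + (1 - f) * spacer_gc for f.
    """
    if coding_gc == spacer_gc:
        raise ValueError("coding and spacer G+C must differ")
    return (genome_gc - spacer_gc) / (coding_gc - spacer_gc)


ASSUMED_GENOME_GC = 29.0     # illustrative assumption, not from the abstract
fraction = coding_fraction(ASSUMED_GENOME_GC, coding_gc=32.5, spacer_gc=21.4)
```

With the assumed genome value, the result falls inside the 60–70% range quoted in the abstract; a different genome G+C value would shift the estimate accordingly.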

17.
Summary. We have investigated the relationship between the G + C content of silent (synonymous) sites in codons and the amino acid composition of encoded proteins for approximately 1,600 human genes. There are positive correlations between silent site G + C and the proportions of codons for Arg, Pro, Ala, Trp, His, Gln, and Leu and negative ones for Tyr, Phe, Asn, Ile, Lys, Asp, Thr, and Glu. The median proteins coded by groups of genes that differ in silent-site G + C content also differ in amino acid composition, as do some proteins coded by homologous genes. The pattern of compositional change can be largely explained by directional mutation pressure, the genetic code, and differences in the frequencies of accepted amino acid substitutions; the shifts in protein composition are likely to be selectively neutral. Offprint requests to: D.W. Collins

18.
Correspondence analysis of 28 proteomes selected to span the entire realm of prokaryotes revealed universal biases in the proteins' amino acid distribution. Integral Inner Membrane Proteins always form an individual cluster, which can then be used to predict protein localisation in unknown proteomes, independently of the organism's biotope or kingdom. Orphan proteins are consistently rich in aromatic residues. Another bias is also ubiquitous: the amino acid composition is driven by the G + C content of the first codon position. An unexpected bias is driven, in many proteomes, by the AAN box of the genetic code, suggesting some functional biochemical relationship between asparagine and lysine. Less-significant biases are driven by the rare amino acids, cysteine and tryptophan. Some allow identification of species-specific functions or localisation such as surface or exported proteins. Errors in genome annotations are also revealed by correspondence analysis, making it useful for quality control and correction.

19.
A Eyre-Walker 《Genetics》1999,152(2):675-683
It has been suggested that mutation bias is the major determinant of base composition bias at synonymous, intron, and flanking DNA sites in mammals. Here I test this hypothesis using population genetic data from the major histocompatibility genes of several mammalian species. The results of two tests are inconsistent with the mutation hypothesis in coding, noncoding, CpG-island, and non-CpG-island DNA, but are consistent with selection or biased gene conversion. It is argued that biased gene conversion is unlikely to affect silent site base composition in mammals. The results therefore suggest that selection is acting upon silent site G + C content. This may have broad implications, since silent site base composition reflects large-scale variation in G + C content along mammalian chromosomes. The results therefore suggest that selection may be acting upon the base composition of isochores and large sections of junk DNA.

20.
A positive correlation was found between the G + C content at the third codon position in genes of vertebrates and the G + C content of the genome portion surrounding each gene. Exons of genes with a high G + C% at the third codon position are surrounded by G + C-rich introns and G + C-rich flanking sequences, and those with a low G + C% at that position by A + T-rich introns and flanking sequences. Analysis of G + C content distribution along DNA sequences using a DNA Sequence Data Bank supported the view that the vertebrate genome is a mosaic of regions with clear differences in their G + C content. The biological significance of the variation in G + C content throughout the vertebrate genome is discussed in connection with chromosomal banding.
