Similar articles
20 similar articles found (search time: 484 ms)
1.
Two years ago, we showed that positive correlations between optimal growth temperature (T(opt)) and genome GC are observed in 15 out of the 20 families of prokaryotes we analyzed, thus indicating that "T(opt) is one of the factors that influence genomic GC in prokaryotes". Our results were disputed, but these criticisms were demonstrated to be mistaken and based on misconceptions. In a recent report, Wang et al. [H.C. Wang, E. Susko, A.J. Roger, On the correlation between genomic G+C content and optimal growth temperature in prokaryotes: data quality and confounding factors, Biochem. Biophys. Res. Commun. 342 (2006) 681-684] criticize our results by stating that "all previous simple correlation analyses of GC versus temperature have ignored the fact that genomic GC content is influenced by multiple factors including both intrinsic mutational bias and extrinsic environmental factors". This statement, besides being erroneous, is surprising because it applies in fact not to ours but to the authors' article. Here, we rebut the points raised by Wang et al. and review some issues that have been a matter of debate, regarding the influence of environmental factors upon GC content in prokaryotes. Furthermore, we demonstrate that the relationship that exists between genome size and GC level is valid for aerobic, facultative, and microaerophilic species, but not for anaerobic prokaryotes.

2.
Regarding the existence of any specific correlation between optimal growth temperature and genomic GC levels, Musto et al. [FEBS Lett. 573 (2004) 73] recently analyzed 20 prokaryotic families and showed that in most of the families there is a positive correlation between these two parameters. On the basis of these results they claimed that optimal growth temperature is one of the factors that influence genomic GC composition in prokaryotes. In a subsequent article, Marashi and Ghalanbor [Biochem. Biophys. Res. Commun. 325 (2004) 381] demonstrated that the correlation values change substantially when very few points in some of the families are excluded from the data set of Musto et al. [FEBS Lett. 573 (2004) 73]. However, Marashi and Ghalanbor did not provide any reason for this. The points they excluded are actually the outliers in the data set, which strongly affect the correlation coefficients. In large data sets, by contrast, the presence of outliers had hardly any effect on the correlation values. Marashi and Ghalanbor excluded points only from those families that have small sample sizes and observed a substantial change in correlation coefficient values. Therefore, we argue that any conclusion drawn from a small sample containing outliers is always questionable. Although the approach of Musto et al. is a novel one, any generalization requires care about the quality of the data set.

3.
The correlation between genomic G+C content and optimal growth temperature in prokaryotes has gained renewed interest after Musto et al. [H. Musto, H. Naya, A. Zavala, H. Romero, F. Alvarez-Valin, G. Bernardi, Correlations between genomic GC levels and optimal growth temperatures in prokaryotes, FEBS Lett. 573 (2004) 73-77] reported that positive correlations exist in 15 families studied. We have reanalyzed their data and found that when genome size and data quality were adjusted for, there was no significant evidence of a relationship between optimal temperature and GC content for two of the families that had previously shown strongly significant correlations. Using updated temperature optima for Halobacteriaceae species, we found that the correlation is insignificant in this family. For the family Enterobacteriaceae, when genome size and optimal temperature are included in a multiple linear regression, only genome size is significant as a predictor of GC content. We showed that more profound statistical methods than simple two-factor correlation analysis should be used for analyzing complex intrinsic and extrinsic factors that affect genomic GC content. We further found that a positive correlation between temperature and genomic GC is only evident in free-living species of low optimal growth temperatures.

4.
When the amino acid usage of all completely sequenced prokaryotes is studied by multivariate analysis (MVA), it is known that the genomic molar content of guanine plus cytosine (GC) and optimal growth temperature (Topt) have a dominant effect. Furthermore, these two factors are associated with the first two axes of different MVA, and are thus nearly independent of each other. However, it was recently shown that for several Families of prokaryotes there are significant and positive correlations between GC and Topt. This trend is particularly clear within Bacillaceae, where there are species displaying a broad range of variations for these two factors. In this paper we report that (a) Topt and genomic GC are the main factors shaping amino acid usage but are not independent of each other, (b) the usage of cysteine is the second source of variability, and finally (c) the global hydrophobicity of the encoded proteins of each species is the third main factor.

5.
In prokaryotes, GC levels range from 25% to 75%, and Topt from approximately 0 °C to >100 °C. When all species are considered together, no correlation is found between the two variables. Correlations are found, however, when Families of prokaryotes are analysed. Indeed, when Families comprising at least 10 species were studied (a set of 20 Families), positive correlations were found for 15 of them. Furthermore, a comparative analysis by independent contrasts made within the Families in order to control for phylogenetic non-independence showed qualitatively equivalent results. We conclude that Topt is one of the factors that influence genomic GC in prokaryotes.

6.
Musto et al. [FEBS Lett. 573 (2004) 73] studied the correlations between GC levels and optimal growth temperatures in 20 prokaryotic families. They reported that positive correlations are generally observed, and many of these are significant. Here, we show that these correlations are not "robust," i.e., correlation coefficients and/or the significance of correlations can be considerably influenced by the exclusion of very few (even as few as one) species from each dataset. The sensitivity of the correlations is presumed to result from high levels of bias in the family datasets. We conclude that, based solely on these data, one cannot establish that the GC contents of prokaryotic genomes increase as a result of growth temperature increments.
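The robustness check described in this abstract can be illustrated with a short sketch: recompute the correlation coefficient with each species left out in turn and see how far it moves. This is only an illustration under stated assumptions. The function names are hypothetical, the data are invented to show the effect of a single outlier, and the abstract does not say which exclusion procedure or significance test the authors actually used.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def leave_one_out_r(xs, ys):
    """Return r recomputed with each single point excluded in turn."""
    return [
        pearson_r(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        for i in range(len(xs))
    ]


# Invented example family (not real data): four species without a clear
# trend plus one thermophilic, GC-rich outlier.
topt = [30.0, 32.0, 34.0, 36.0, 80.0]   # optimal growth temperature, deg C
gc = [50.0, 44.0, 52.0, 46.0, 70.0]     # genomic GC, percent

full_r = pearson_r(topt, gc)
loo_r = leave_one_out_r(topt, gc)
print(f"all species: r = {full_r:.3f}")
for i, r in enumerate(loo_r):
    print(f"without species {i}: r = {r:.3f}")
```

In this invented family the strong positive correlation rests entirely on the last species: excluding it flips the sign of r, which is the kind of sensitivity the abstract refers to.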

7.
The cause of the variation between genomes in their guanine (G) and cytosine (C) content is one of the central issues in evolutionary genomics. The thermal adaptation hypothesis conjectures that, as G:C pairs in DNA are more thermally stable than adenine:thymine pairs, high GC content may be a selective response to high temperature. A compilation of data on genomic GC content and optimal growth temperature for numerous prokaryotes failed to demonstrate the predicted correlation. By contrast, the GC content of structural RNAs is higher at high temperatures. The issue that we address here is whether more freely evolving sites in exons (i.e. codonic third positions) evolve in the same manner as genomic DNA as a whole, showing no correlated response, or like structural RNAs, showing a strong correlation. The latter pattern would provide strong support for the thermal adaptation hypothesis, as the variation in GC content between orthologous genes is typically most profoundly seen at codon third sites (GC3). Simple analysis of completely sequenced prokaryotic genomes shows that GC3, but not genomic GC, is higher on average in thermophilic species. This demonstrates, if nothing else, that the results from the two measures cannot be presumed to be the same. A proper analysis, however, requires phylogenetic control. Here, therefore, we report the results of a comparative analysis of GC composition and optimal growth temperature for over 100 prokaryotes. Comparative analysis fails to show, in either Archaea or Eubacteria, any hint of connection between optimal growth temperature and GC content in the genome as a whole, in protein-coding regions or, more crucially, at GC3. Conversely, comparable analysis confirms that GC content of structural RNA is strongly correlated with optimal temperature.
Against the expectations of the thermal adaptation hypothesis, within prokaryotes GC content in protein-coding genes, even at relatively freely evolving sites, cannot be considered an adaptation to the thermal environment.
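As a concrete reading of the GC3 measure used in this abstract, the sketch below computes the GC fraction at third codon positions of an in-frame coding sequence and compares it with the overall GC fraction. The function names and the toy sequence are assumptions made for illustration; they do not come from the paper, and ambiguous bases are simply counted as non-GC here.

```python
def gc_fraction(seq):
    """Fraction of G or C over all positions of a sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def gc3(cds):
    """Fraction of G or C at third codon positions of an in-frame CDS."""
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3      # ignore a trailing partial codon
    thirds = seq[2:usable:3]              # positions 3, 6, 9, ...
    if not thirds:
        raise ValueError("sequence shorter than one codon")
    return sum(base in "GC" for base in thirds) / len(thirds)


# Toy coding sequence (hypothetical): ATG GCC AAA TAA
cds = "ATGGCCAAATAA"
print(f"GC  = {gc_fraction(cds):.3f}")
print(f"GC3 = {gc3(cds):.3f}")
```

The two values differ even for this toy sequence, which mirrors the abstract's point that results for genomic GC and GC3 cannot be presumed to be the same.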

8.
One of the historic debates in molecular evolution concerns the strong variation in the genomic guanine–cytosine (GC) content of prokaryotes, which ranges from approximately 20–75%: Is this factor selectively neutral, or is it the result of natural selection? In a previous article published by our group, we showed that inside well-defined taxonomic groups of prokaryotes, strictly aerobic organisms tend to display higher genomic GC levels than strictly anaerobic species. In the present study, we examined the GC content of fragments of DNA obtained from microbial communities along a well-defined environmental gradient: a 4,000-m vertical profile in the North Pacific subtropical gyre. The patterns of GC distribution might be associated with oxygen concentrations in the seawater column. These results give further support to the link between a physiologic trait (aerobic respiration) and genomic GC content.

9.
The GC contents of 2670 prokaryotic genomes that belong to diverse phylogenetic lineages were analyzed in this paper. These genomes had GC contents that ranged from 13.5% to 74.9%. We analyzed the distance of base frequencies at the three codon positions, codon frequencies, and amino acid compositions across genomes with respect to the differences in the GC content of these prokaryotic species. We found that although the phylogenetic lineages were remote among some species, a similar genomic GC content forced them to adopt similar base usage patterns at the three codon positions, codon usage patterns, and amino acid usage patterns. Our work demonstrates that in prokaryotic genomes: a) base usage, codon usage, and amino acid usage change with GC content with a linear correlation; b) the distance of each usage has a linear correlation with the GC content difference; and c) GC content is more essential than phylogenetic lineage in determining base usage, codon usage, and amino acid usage. This work is exceptional in that we adopted intuitively graphic methods for all analyses, and we used these analyses to examine as many as 2670 prokaryotes. We hope that this work is helpful for understanding common features in the organization of microbial genomes.

10.
The "universal correlation" (D'Onofrio, G., Bernardi, G., 1992. A universal compositional correlation among codon positions. Gene 110, 81-88.) that holds between and or ( values are the average values of the coding sequences of each genome analyzed) at both the inter- and intra-genomic level, was re-analyzed on a vastly larger dataset. The results showed a slight, but significant, difference in the vs. correlations exhibited by prokaryotes and eukaryotes. This finding prompted an analysis of the correlation between and the amino acid frequencies in the encoded proteins, which has shown that positive correlations exist between values of coding sequences and the hydropathy of the corresponding proteins. These correlations are due to the fact that hydrophobic and amphipathic amino acids increase, whereas hydrophilic amino acids decrease with increasing values. Hydropathy values of prokaryotic proteins are systematically higher than those of eukaryotes, but the slopes of the regression lines are identical. The lower hydrophobicity of eukaryotic proteins is due to differences in the amino acid composition. In particular, the twofold higher cysteine (and disulfide bond) level of eukaryotic proteins compared to prokaryotic proteins most probably compensates for their lower hydrophobicity. This supports the viewpoint that hydrophobicity plays a structural and functional role as far as protein stability is concerned.

11.
Palidwor GA, Perkins TJ, Xia X. PLoS ONE 2010;5(10):e13431

Background

In spite of extensive research on the effect of mutation and selection on codon usage, a general model of codon usage bias due to mutational bias has been lacking. Because most amino acids allow synonymous GC content changing substitutions in the third codon position, the overall GC bias of a genome or genomic region is highly correlated with GC3, a measure of third position GC content. For individual amino acids as well, G/C ending codons usage generally increases with increasing GC bias and decreases with increasing AT bias. Arginine and leucine, amino acids that allow GC-changing synonymous substitutions in the first and third codon positions, have codons which may be expected to show different usage patterns.

Principal Findings

In analyzing codon usage bias in hundreds of prokaryotic and plant genomes and in human genes, we find that two G-ending codons, AGG (arginine) and TTG (leucine), unlike all other G/C-ending codons, show overall usage that decreases with increasing GC bias, contrary to the usual expectation that G/C-ending codon usage should increase with increasing genomic GC bias. Moreover, the usage of some codons appears nonlinear, even nonmonotone, as a function of GC bias. To explain these observations, we propose a continuous-time Markov chain model of GC-biased synonymous substitution. This model correctly predicts the qualitative usage patterns of all codons, including nonlinear codon usage in isoleucine, arginine and leucine. The model accounts for 72%, 64% and 52% of the observed variability of codon usage in prokaryotes, plants and human respectively. When codons are grouped based on common GC content, 87%, 80% and 68% of the variation in usage is explained for prokaryotes, plants and human respectively.

Conclusions

The model clarifies the sometimes-counterintuitive effects that GC mutational bias can have on codon usage, quantifies the influence of GC mutational bias and provides a natural null model relative to which other influences on codon bias may be measured.

12.
The guanine/cytosine (GC) content of prokaryotic genomes is species-specific, taking values from 16% to 77%. This diversity of selection for GC content remains contentious. We analyse the correlations between GC content and a range of phenotypic and genotypic data in thousands of prokaryotes. GC content integrates well with these traits into r/K selection theory when phenotypic plasticity is considered. High GC-content prokaryotes are r-strategists with cheaper descendants thanks to a lower average amino acid metabolic cost, colonize unstable environments thanks to flagella and a bacillus form and are generalists in terms of resource opportunism and their defence mechanisms. Low GC content prokaryotes are K-strategists specialized for stable environments that maintain homeostasis via a high-cost outer cell membrane and endospore formation as a response to nutrient deprivation, and attain a higher nutrient-to-biomass yield. The lower proteome cost of high GC content prokaryotes is driven by the association between GC-rich codons and cheaper amino acids in the genetic code, while the correlation between GC content and genome size may be partly due to functional diversity driven by r/K selection. In all, molecular diversity in the GC content of prokaryotes may be a consequence of ecological r/K selection.

13.
The purine-loading index (PLI) is the difference between the numbers of purines (A+G) and pyrimidines (T+C) per kilobase of single-stranded nucleic acid. By purine-loading their mRNAs organisms may minimize unnecessary RNA–RNA interactions and prevent inadvertent formation of "self" double-stranded RNA. Since RNA–RNA interactions have a strong entropy-driven component, this need to minimize should increase as temperature increases. Consistent with this, we report for 550 prokaryotic species that optimum growth temperature is related to the average PLI of open reading frames. With increasing temperature prokaryotes tend to acquire base A and lose base C, while keeping bases T and G relatively constant. Accordingly, while the PLI increases, the (G+C)% decreases. The previously observed positive correlation between (G+C)% and optimum growth temperature, which applies to RNA species whose structure is of major importance for their function (ribosomal and transfer RNAs), does not apply to mRNAs, and hence is unlikely to apply generally to genomic DNA.
Abbreviations: CUTG, codon usage tables from GenBank; S, Chargaff difference for the S bases ("GC skew"); W, Chargaff difference for the W bases ("AT skew"); ORF, open reading frame; PLI, purine-loading index.
Communicated by F. Robb
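The PLI definition given in this abstract translates directly into a few lines of code. The sketch below is a minimal illustration: the function name is hypothetical, U is treated as a pyrimidine so that RNA sequences also work, and normalising by the number of unambiguous bases (rather than the raw sequence length) is an assumption, since the abstract does not say how ambiguous symbols are handled.

```python
def purine_loading_index(seq):
    """Purines (A+G) minus pyrimidines (T+C, or U) per kilobase of one strand."""
    seq = seq.upper()
    purines = sum(seq.count(base) for base in "AG")
    pyrimidines = sum(seq.count(base) for base in "TCU")
    counted = purines + pyrimidines       # ambiguous symbols such as N are ignored
    if counted == 0:
        raise ValueError("sequence contains no A, C, G, T or U")
    return 1000.0 * (purines - pyrimidines) / counted


# Toy sequences (hypothetical, for illustration only)
print(purine_loading_index("AAGG"))   # purines only
print(purine_loading_index("ACGT"))   # balanced
print(purine_loading_index("AAGC"))   # three purines, one pyrimidine
```

A positive value indicates a purine-rich ("purine-loaded") strand and a negative value a pyrimidine-rich one.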

14.
Zavala A, Naya H, Romero H, Sabbia V, Piovani R, Musto H. Gene 2005;357(2):137-143
GC level is a key feature in prokaryotic genomes. Widely employed in evolutionary studies, new insights appear however limited because of the relatively low number of characterized genomes. Since public databases mainly comprise several hundreds of prokaryotes with a low number of sequences per genome, a reliable prediction method based on available sequences may be useful for studies that need a trustworthy estimation of whole genomic GC. As the analysis of completely sequenced genomes shows a great variability in distributional shapes, it is of interest to compare different estimators. Our analysis shows that the mean of GC values of a random sample of genes is a reasonable estimator, based on simplicity of the calculation and overall performance. However, usually sequences come from a process that cannot be considered as random sampling. When we analyzed two introduced sources of bias (gene length and protein functional categories) we were able to detect an additional bias in the estimation for some cases, although the precision was not affected. We conclude that the mean genic GC level of a sample of 10 genes is a reliable estimator of genomic GC content, showing comparable accuracy with many widely employed experimental methods.
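The estimator favoured in this abstract, the mean GC level of a random sample of genes, can be sketched as follows. The function names, the default sample size of 10 (taken from the abstract's conclusion), and the toy gene list are illustrative assumptions; the abstract does not describe the authors' sampling code, and real gene sets would come from a sequence database.

```python
import random


def gc_fraction(seq):
    """Fraction of G or C over all positions of a gene sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def estimate_genomic_gc(genes, sample_size=10, rng=None):
    """Mean genic GC of a random sample of genes, used as a genomic GC estimate."""
    if not genes:
        raise ValueError("no genes supplied")
    rng = rng or random.Random()
    # Sample without replacement; use every gene if fewer than sample_size exist.
    sample = rng.sample(genes, min(sample_size, len(genes)))
    return sum(gc_fraction(gene) for gene in sample) / len(sample)


# Toy gene set (hypothetical): every gene has exactly 50% GC,
# so any sample must give an estimate of 0.5.
toy_genes = ["ATGGCCGCATAA"] * 25
estimate = estimate_genomic_gc(toy_genes, rng=random.Random(0))
print(f"estimated genomic GC = {estimate:.2f}")
```

Passing a seeded `random.Random` makes a run reproducible. As the abstract notes, sequences available in practice are rarely a true random sample, so this simple mean can carry additional bias.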

15.
A quickly growing number of characteristics reflecting various aspects of gene function and evolution can be either measured experimentally or computed from DNA and protein sequences. The study of pairwise correlations between such quantitative genomic variables as well as collective analysis of their interrelations by multidimensional methods have delivered crucial insights into the processes of molecular evolution. Here, we present a principal component analysis (PCA) of 16 genomic variables from Saccharomyces cerevisiae, the largest data set analyzed so far. Because many missing values and potential outliers hinder the direct calculation of principal components, we introduce the application of Bayesian PCA. We confirm some of the previously established correlations, such as evolutionary rate versus protein expression, and reveal new correlations such as those between translational efficiency, phosphorylation density, and protein age. Although the first principal component primarily contrasts genomic change and protein expression, the second component separates variables related to gene existence and expressed protein functions. Enrichment analysis on genes affecting variable correlations unveils classes of influential genes. For example, although ribosomal and nuclear transport genes make important contributions to the correlation between protein isoelectric point and molecular weight, protein synthesis and amino acid metabolism genes help cause the lack of significant correlation between propensity for gene loss and protein age. We present the novel Quagmire database (Quantitative Genomics Resource), which allows exploring relationships between more genomic variables in three model organisms: Escherichia coli, S. cerevisiae, and Homo sapiens (http://webclu.bio.wzw.tum.de:18080/quagmire).

16.
Summary: We have investigated the compositional properties of coding sequences from cold-blooded vertebrates and we have compared them with those from warm-blooded vertebrates. Moreover, we have studied the compositional correlations of coding sequences with the genomes in which they are contained, as well as the compositional correlations among the codon positions of the genes analyzed.
The distribution of GC levels of the third codon positions of genes from cold-blooded vertebrates is distinctly different from that of warm-blooded vertebrates in that it does not reach the high values attained by the latter. Moreover, coding sequences from cold-blooded vertebrates are either equal, or, in most cases, lower in GC (not only in third, but also in first and second codon positions) than homologous coding sequences from warm-blooded vertebrates; higher values are exceptional. These results at the gene level are in agreement with the compositional differences between cold-blooded and warm-blooded vertebrates previously found at the whole genome (DNA) level (Bernardi and Bernardi 1990a,b).
Two linear correlations were found: one between the GC levels of coding sequences (or of their third codon positions) and the GC levels of the genomes of cold-blooded vertebrates containing them; and another between the GC levels of third and first + second codon positions of genes from cold-blooded vertebrates. The first correlation applies to the genomes (or genome compartments) of all vertebrates and the second to the genes of all living organisms. These correlations are tantamount to a genomic code.

17.
The personal genomics era has attracted a large amount of attention for anti-cancer therapy by patient-specific analysis. Patient-specific analysis enables discovery of individual genomic characteristics for each patient, and thus we can effectively predict individual genetic risk of disease and perform personalized anti-cancer therapy. Although the existing methods for patient-specific analysis have successfully uncovered crucial biomarkers, their performance takes a sudden turn for the worse in the presence of outliers, since the methods are not robust. In practice, clinical and genomic alteration datasets usually contain outliers from various sources (e.g., experiment error, coding error, etc.), and the outliers may significantly affect the result of patient-specific analysis. We propose a robust methodology for patient-specific analysis in line with the NetworkProfiler. In the proposed method, outliers in high-dimensional gene expression levels and drug response datasets are simultaneously controlled by robust Mahalanobis distance in robust principal component space. Thus, we can effectively predict anti-cancer drug sensitivity and identify sensitivity-specific biomarkers for individual patients. We observe through Monte Carlo simulations that the proposed robust method performs outstandingly in predicting the response variable in the presence of outliers. We also apply the proposed methodology to the Sanger dataset in order to uncover cancer biomarkers and predict anti-cancer drug sensitivity, and show the effectiveness of our method.

18.
D'Onofrio G, Bernardi G. Gene 1992;110(1):81-88
We have investigated the compositional distributions of third codon positions of genes from the 16 prokaryotes and seven eukaryotes for which the largest numbers of coding sequences are available in data banks. In prokaryotes, both narrow and broad distributions were found. In eukaryotes, distributions were very broad (except for Saccharomyces cerevisiae) and remarkably different for different genomes. In low-GC genomes, third codon positions were lower in GC than first + second codon positions and trailed towards high GC; the opposite situation was found for high-GC genomes. In all genomes, first codon positions were higher in GC than second codon positions. We then investigated the compositional correlations between third and first + second codon positions in prokaryotic genomes (the 16 mentioned above plus 87 additional ones) and in genome compartments of eukaryotes. A general, common relationship was found, which also holds within the same (heterogeneous) genomes. This universal correlation is due to the fact that the relative effects of compositional constraints on different codon positions are the same, on the average, whatever the genome under consideration.

19.
Romero H, Zavala A, Musto H. Gene 2000;242(1-2):307-311
It is widely accepted that the compositional pressure is the only factor shaping codon usage in unicellular species displaying extremely biased genomic compositions. This seems to be the case in the prokaryotes Mycoplasma capricolum, Rickettsia prowazekii and Borrelia burgdorferi (GC-poor), and in Micrococcus luteus (GC-rich). However, in the GC-poor unicellular eukaryotes Dictyostelium discoideum and Plasmodium falciparum, there is evidence that selection, acting at the level of translation, influences codon choices. This is a doubly intriguing finding, since (1) the genomic GC levels of the above-mentioned eukaryotes are lower than the GC% of any studied bacteria, and (2) bacteria usually have larger effective population sizes than eukaryotes, and hence natural selection is expected to overcome the randomizing effects of genetic drift more efficiently among prokaryotes than among eukaryotes. In order to gain new insight into this problem, we analysed the patterns of codon preferences of the nuclear genes of Entamoeba histolytica, a unicellular eukaryote characterised by an extremely AT-rich genome (GC = 25%). The overall codon usage is strongly biased towards A and T in the third codon positions, and among the presumed highly expressed sequences, there is an increased relative usage of a subset of codons, many of which are C-ending. Since an increase in C in third codon positions is 'against' the compositional bias, we conclude that codon usage in E. histolytica, as happens in D. discoideum and P. falciparum, is the result of an equilibrium between compositional pressure and selection. These findings raise the question of why strongly compositionally biased eukaryotic cells may be more sensitive to the (presumed) slight differences among synonymous codons than compositionally biased bacteria.

20.
Previous studies have reported a positive correlation between the GC content of the double-stranded regions of structural RNAs and the optimal growth temperature (OGT) in prokaryotes. These observations led to the hypothesis that natural selection favors an increase in GC content to ensure the correct folding and the structural stability of the molecule at high temperature. To date these studies have focused mainly on ribosomal and transfer RNAs. Therefore, we addressed the question of the relationship between GC content and OGT in a different and universally conserved structural RNA, the RNA component of the signal recognition particle (SRP). To this end we generated the secondary structures of SRP-RNAs for mesophilic, thermophilic, and hyperthermophilic bacterial and archaeal species. The analysis of the GC content in the stems and loops of the SRP-RNA of these organisms failed to detect a relationship between the GC contents in the stems of this structural RNA and the growth temperature of bacteria. By contrast, we found that in archaea the GC content in the stem regions of SRP-RNA is highest in hyperthermophiles, intermediate in thermophiles, and lowest in mesophiles. In these organisms, we demonstrated a clear positive correlation between the GC content of the stem regions of their SRP-RNAs and their OGT. This correlation was confirmed by a phylogenetic nonindependence analysis. Thus we conclude that in archaea the increase in GC content in the stem regions of SRP-RNA is an adaptive response to environmental temperature.
