Similar articles
 20 similar articles found (search time: 31 ms)
1.
DNA, RNA and proteins are major biological macromolecules that coevolve and adapt to environments as components of one highly interconnected system. We explore here sequence/structure determinants of mechanisms of adaptation of these molecules, links between them, and results of their mutual evolution. We complemented statistical analysis of genomic and proteomic sequences with folding simulations of RNA molecules, unraveling causal relations between compositional and sequence biases reflecting molecular adaptation on DNA, RNA and protein levels. We found many compositional peculiarities related to environmental adaptation and lifestyle. Specifically, thermal adaptation of protein-coding sequences in Archaea is characterized by a stronger codon bias than in Bacteria. Guanine and cytosine load in the third codon position is important for supporting the aerobic lifestyle, and it is highly pronounced in Bacteria. The third codon position also provides a tradeoff between arginine and lysine, which are favorable for thermal adaptation and aerobicity, respectively. Dinucleotide composition provides stability of nucleic acids via strong base-stacking in ApG dinucleotides. In relation to coevolution of nucleic acids and proteins, thermostability-related demands on the amino acid composition affect the nucleotide content in the second codon position in Archaea.
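The guanine and cytosine load in the third codon position mentioned above is often summarized as a single fraction (commonly called GC3). The following is a minimal sketch of that calculation, not the authors' pipeline; the function name `gc3` and the example sequence are assumptions made for illustration.

```python
def gc3(cds: str) -> float:
    """Fraction of G or C at the third position of each complete codon.

    Assumes `cds` is an in-frame coding sequence; a trailing partial
    codon, if any, is ignored. (Hypothetical helper for illustration.)
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3   # drop an incomplete final codon
    thirds = seq[2:usable:3]           # every third base, starting at index 2
    if not thirds:
        raise ValueError("sequence is shorter than one codon")
    return sum(base in "GC" for base in thirds) / len(thirds)


if __name__ == "__main__":
    # Codons: ATG GCC AAG CGC TAA -> third bases G, C, G, C, A
    print(gc3("ATGGCCAAGCGCTAA"))
```

Applied per gene and averaged over a genome, a value like this could be compared between aerobic and anaerobic organisms, which is the kind of contrast the abstract describes.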

2.
Mimivirus is a nucleocytoplasmic large DNA virus (NCLDV) with a genome size (1.2 Mb) and coding capacity (~1000 genes) comparable to that of some cellular organisms. Unlike other viruses, Mimivirus and its NCLDV relatives encode homologs of broadly conserved informational genes found in Bacteria, Archaea, and Eukaryotes, raising the possibility that they could be placed on the tree of life. A recent phylogenetic analysis of these genes showed the NCLDVs emerging as a monophyletic group branching between Eukaryotes and Archaea. These trees were interpreted as evidence for an independent "fourth domain" of life that may have contributed DNA processing genes to the ancestral eukaryote. However, the analysis of ancient evolutionary events is challenging, and tree reconstruction is susceptible to bias resulting from non-phylogenetic signals in the data. These include compositional heterogeneity and homoplasy, which can lead to the spurious grouping of compositionally similar or fast-evolving sequences. Here, we show that these informational gene alignments contain both significant compositional heterogeneity and homoplasy, which were not adequately modelled in the original analysis. When we use more realistic evolutionary models that better fit the data, the resulting trees are unable to reject a simple null hypothesis in which these informational genes, like many other NCLDV genes, were acquired by horizontal transfer from eukaryotic hosts. Our results suggest that a fourth domain is not required to explain the available sequence data.

3.

Background  

Despite a large agreement between ribosomal RNA and concatenated protein phylogenies, the phylogenetic tree of the bacterial domain remains uncertain in its deepest nodes. For instance, the position of the hyperthermophilic Aquificales is debated, as their commonly observed position close to Thermotogales may proceed from horizontal gene transfers, long branch attraction or compositional biases, and may not represent vertical descent. Indeed, another view, based on the analysis of rare genomic changes, places Aquificales close to epsilon-Proteobacteria.

4.
We analyzed the nucleotide contents of several completely sequenced genomes, and we show that nucleotide bias can have a dramatic effect on the amino acid composition of the encoded proteins. By surveying the genes in 21 completely sequenced eubacterial and archaeal genomes, along with the entire Saccharomyces cerevisiae genome and two Plasmodium falciparum chromosomes, we show that biased DNA encodes biased proteins on a genomewide scale. The predicted bias affects virtually all genes within the genome, and it could be clearly seen even when we limited the analysis to sets of homologous gene sequences. Parallel patterns of compositional bias were found within the archaea and the eubacteria. We also found a positive correlation between the degree of amino acid bias and the magnitude of protein sequence divergence. We conclude that mutational bias can have a major effect on the molecular evolution of proteins. These results could have important implications for the interpretation of protein-based molecular phylogenies and for the inference of functional protein adaptation from comparative sequence data.

5.
6.
It is now well established that compositional bias in DNA sequences can adversely affect phylogenetic analysis based on those sequences. Phylogenetic analyses based on protein sequences are generally considered to be more reliable than those derived from the corresponding DNA sequences because it is believed that the use of encoded protein sequences circumvents the problems caused by nucleotide compositional biases in the DNA sequences. There exists, however, a correlation between AT/GC bias at the nucleotide level and content of AT- and GC-rich codons and their corresponding amino acids. Consequently, protein sequences can also be affected secondarily by nucleotide compositional bias. Here, we report that DNA bias not only may affect phylogenetic analysis based on DNA sequences, but also drives a protein bias which may affect analyses based on protein sequences. We present a striking example where common phylogenetic tools fail to recover the correct tree from complete animal mitochondrial protein-coding sequences. The data set is very extensive, containing several thousand sites per sequence, and the incorrect phylogenetic trees are statistically very well supported. Additionally, neither the use of the LogDet/paralinear transform nor removal of positions in the protein alignment with AT- or GC-rich codons allowed recovery of the correct tree. Two taxa with a large compositional bias continually group together in these analyses, despite a lack of close biological relatedness. We conclude that even protein-based phylogenetic trees may be misleading, and we advise caution in phylogenetic reconstruction using protein sequences, especially those that are compositionally biased. Received: 19 February 1998 / Accepted: 28 August 1998

7.
Different statistical measures of bias of oligonucleotide sequences in DNA sequences were compared, both by theoretical analysis and according to their abilities to predict the relative abundances of oligonucleotides in the genome of Escherichia coli. The expected frequency of an oligonucleotide calculated from a maximal order Markov model was shown to be a degenerate case of the expected frequency calculated from biases of all subwords arising when noncontiguous subwords exhibit no bias. Since (at least in E. coli) noncontiguous sequences exhibit significant bias, the total compositional bias approach is expected to represent biases in genomic sequences more faithfully than Markov approaches. In fact, statistics based on Markov analysis, even at the highest order, were inferior in predicting actual frequencies of oligonucleotides to methods that factored out biases of internal subwords with gaps. Using total compositional bias as a measure of relative abundance, tetranucleotide and hexanucleotide palindromes were found to be distributed differently from nonpalindromic sequences, with their means shifted somewhat towards underrepresentation. A subpopulation of palindromic hexanucleotides, however, was highly underrepresented, and this group consisted almost entirely of targets for Type II restriction enzymes found within strains of E. coli. Sites recognized by Type I endonucleases from related strains were not markedly biased, and with pentanucleotides, palindromic and nonpalindromic sequences had nearly identical distributions. The loss of restriction sites may be explained by the free transfer of plasmids encoding restriction enzymes and episodic selection for the presence of the enzymes.
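For orientation, the maximal order Markov estimate that the abstract uses as its baseline can be sketched as follows: the expected count of a word of length k is the product of the counts of its two (k-1)-mers divided by the count of the shared internal (k-2)-mer. This is only the contiguous-subword baseline, not the total compositional bias measure the authors favor; the function names and toy sequence are assumptions for illustration.

```python
from collections import Counter


def count_words(seq: str, k: int) -> Counter:
    """Count all overlapping words of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def markov_expected(seq: str, word: str) -> float:
    """Expected count of `word` under a maximal order Markov model.

    E(w1..wk) = N(w1..w[k-1]) * N(w2..wk) / N(w2..w[k-1])
    (Hypothetical helper; counts overlapping occurrences on one strand.)
    """
    k = len(word)
    if k < 3:
        raise ValueError("word must have at least 3 letters")
    shorter = count_words(seq, k - 1)
    core = count_words(seq, k - 2)[word[1:-1]]
    if core == 0:
        return 0.0
    return shorter[word[:-1]] * shorter[word[1:]] / core


if __name__ == "__main__":
    genome = "ACGTACGTACGT"  # toy sequence, not real data
    observed = count_words(genome, 4)["ACGT"]
    expected = markov_expected(genome, "ACGT")
    print(observed, expected)
```

A ratio of observed to expected well below 1 would flag an underrepresented word under this baseline; the abstract argues that gapped subwords also need to be factored out before drawing that conclusion.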

8.
The availability of hundreds of complete bacterial genomes has created new challenges and simultaneously opportunities for bioinformatics. In the area of statistical analysis of genomic sequences, the studies of nucleotide compositional bias and gene bias between strands and replichores paved the way for the development of tools for prediction of bacterial replication origins. Only a few (about 20) origin regions for eubacteria and archaea have been proven experimentally. One reason for that may be that this is now considered essentially a bioinformatics problem, where predictions are sufficiently reliable not to run labor-intensive experiments, unless specifically needed. Here we describe the main existing approaches to the identification of replication origin (oriC) and termination (terC) loci in prokaryotic chromosomes and characterize a number of computational tools based on various skew types and other types of evidence. We also classify the eubacterial and archaeal chromosomes by predictability of their replication origins using skew plots. Finally, we discuss possible combined approaches to the identification of the oriC sites that may be used to improve the prediction tools, in particular, the analysis of DnaA binding sites using the comparative genomic methods.

9.
MOTIVATION: Separation of protein sequence regions according to their local information complexity and subsequent masking of low complexity regions has greatly enhanced the reliability of function prediction by sequence similarity. Comparisons with alternative methods that focus on compositional sequence bias rather than information complexity measures have shown that removal of compositional bias yields at least as sensitive and much more specific results. Besides the application of sequence masking algorithms to sequence similarity searches, the study of the masked regions themselves is of great interest. Traditionally, however, these have been neglected despite evidence of their functional relevance. RESULTS: Here we demonstrate that compositional bias seems to be a more effective measure for the detection of biologically meaningful signals. Typical results on proteins are compared to results for sequences that have been randomized in various ways, conserving composition and local correlations for individual proteins or the entire set. It is remarkable that low-complexity regions have the same form of distribution in proteins as in randomized sequences, and that the signal from randomized sequences with conserved local correlations and amino acid composition almost matches the signal from proteins. This is not the case for sequence bias, which hence seems to be a genuinely biological phenomenon in contrast to patches of low complexity.
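The "local information complexity" that the abstract contrasts with compositional bias is typically a windowed entropy. A minimal sketch of that idea, assuming Shannon entropy over a fixed-width sliding window, is given here; it is not the masking algorithm evaluated in the study, and the window width and threshold are arbitrary illustrative values.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy (bits) of the residue composition of a window."""
    n = len(window)
    counts = Counter(window)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def low_complexity_windows(protein: str, width: int = 12,
                           threshold: float = 2.2) -> list[int]:
    """Start positions of windows whose entropy is at or below threshold.

    `width` and `threshold` are hypothetical defaults, not published values.
    """
    return [
        start
        for start in range(len(protein) - width + 1)
        if shannon_entropy(protein[start:start + width]) <= threshold
    ]


if __name__ == "__main__":
    # Toy sequence: a glutamine run flanked by more varied residues
    toy = "MKTAYIAKQRQQQQQQQQQQQQLSDEVHWG"
    print(low_complexity_windows(toy))
```

A compositional bias measure would instead compare the window's residue frequencies against a background distribution, which is the distinction the abstract draws.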

10.
The unique DNA topology and DNA topoisomerases of hyperthermophilic archaea   Total citations: 6, self-citations: 0, citations by others: 6
Hyperthermophilic archaea exhibit a unique pattern of DNA topoisomerase activities. They have a peculiar enzyme, reverse gyrase, which introduces positive superturns into DNA at the expense of ATP. This enzyme has been found in all hyperthermophiles tested so far (including Bacteria) but never in mesophiles. Reverse gyrases are formed by the association of a helicase-like domain and a 5'-type I DNA topoisomerase. These two domains might be located on the same polypeptide. However, in the methanogenic archaeon Methanopyrus kandleri, the topoisomerase domain is divided between two subunits. Besides reverse gyrase, Archaea contain other type I DNA topoisomerases; in particular, M. kandleri harbors the only known procaryotic 3'-type I DNA topoisomerase (Topo V). Hyperthermophilic archaea also exhibit specific type II DNA topoisomerases (Topo II): whereas mesophilic Bacteria have a Topo II that produces negative supercoiling (DNA gyrase), the Topo II enzymes from Sulfolobus and Pyrococcus lack gyrase activity and are the smallest enzymes of this type known so far. This peculiar pattern of DNA topoisomerases in hyperthermophilic archaea is paralleled by a unique DNA topology: whereas DNA isolated from Bacteria and Eucarya is negatively supercoiled, plasmid DNA from hyperthermophilic archaea ranges from relaxed to positively supercoiled. The possible evolutionary implications of these findings are discussed in this review. We speculate that gyrase activity in mesophiles and reverse gyrase activity in hyperthermophiles might have originated in the course of procaryote evolution to balance the effect of temperature changes on DNA structure.

11.
Across evolution, the signal recognition particle pathway targets extra-cytoplasmic proteins to membranous translocation sites. Whereas the pathway has been extensively studied in Eukarya and Bacteria, little is known of this system in Archaea. In the following, membrane association of FtsY, the prokaryal signal recognition particle receptor, and SRP54, a central component of the signal recognition particle, was addressed in the halophilic archaeon Haloferax volcanii. Purified H. volcanii FtsY, the FtsY C-terminal GTP-binding domain (NG domain) or SRP54 were combined separately or in different combinations with H. volcanii inverted membrane vesicles and examined by gradient floatation to differentiate between soluble and membrane-bound protein. Such studies revealed that both FtsY and the FtsY NG domain bound to H. volcanii vesicles in a manner unaffected by proteolytic pretreatment of the membranes, implying that in Archaea, FtsY association is mediated through the membrane lipids. Indeed, membrane association of FtsY was also detected in intact H. volcanii cells. The contribution of the NG domain to FtsY binding in halophilic archaea may be considerable, given the low number of basic charges found at the start of the N-terminal acidic domain of haloarchaeal FtsY proteins (the region of the protein thought to mediate FtsY-membrane association in Bacteria). Moreover, FtsY, but not the NG domain, was shown to mediate membrane association of H. volcanii SRP54, a protein that did not otherwise interact with the membrane.

12.
The diversity of archaea and bacteria was investigated in ten hot springs (elevation >4600 m above sea level) in Central and Central-Eastern Tibet using 16S rRNA gene phylogenetic analysis. The temperature and pH of these hot springs were 26-81°C and close to neutral, respectively. A total of 959 (415 and 544 for bacteria and archaea, respectively) clone sequences were obtained. Phylogenetic analysis showed that bacteria were more diverse than archaea and that these clone sequences were classified into 82 bacterial and 41 archaeal operational taxonomic units (OTUs), respectively. The retrieved bacterial clones were mainly affiliated with four known groups (i.e., Firmicutes, Proteobacteria, Cyanobacteria, Chloroflexi), which were similar to those in other neutral-pH hot springs at low elevations. In contrast, most of the archaeal clones from the Tibetan hot springs were affiliated with Thaumarchaeota, a newly proposed archaeal phylum. The dominance of Thaumarchaeota in the archaeal community of the Tibetan hot springs appears to be unique, although the exact reasons are not yet known. Statistical analysis showed that diversity indices of both archaea and bacteria were not statistically correlated with temperature, which is consistent with previous studies.

13.
Architectural proteins play an important role in compacting and organizing the chromosomal DNA in all three kingdoms of life (Eukarya, Bacteria and Archaea). These proteins are generally not conserved at the amino acid sequence level, but the mechanisms by which they modulate the genome do seem to be functionally conserved across kingdoms. On a generic level, architectural proteins can be classified based on their structural effect as DNA benders, DNA bridgers or DNA wrappers. Although chromatin organization in archaea has not been studied extensively, quite a number of architectural proteins have been identified. In the present paper, we summarize the knowledge currently available on these proteins in Crenarchaea. By the type of architectural proteins available, the crenarchaeal nucleoid shows similarities with that of Bacteria. It relies on the action of a large set of small, abundant and generally basic proteins to compact and organize its genome and to modulate its activity.

14.
To survive at high temperature, thermophilic organisms must adapt their biomolecules. In both nucleic acids and proteins, this adaptation involves a vast array of compositional and structural modifications. The archaea stand out as the only group of organisms that have species capable of growing at temperatures ranging from 0 to 110°C. In this study, we have used the archaeal genome datasets to identify molecular trends related to thermal adaptation in the protein components (SRP19 and SRP54) of the signal recognition particle (SRP). Using comparative genomics and secondary structure homology modeling, we have detected significant differences in amino acid composition and distribution between the SRP proteins of thermophilic and mesophilic archaea. These include: a significant increase in the thermophile SRP proteins of the frequency of charged amino acids able to participate in electrostatic interactions, which contribute to stabilizing proteins; decreased content of both thermolabile and small/tiny amino acids, which usually contribute to protein flexibility; and a significant increase in aliphatic and aromatic amino acids providing good covering and masking to produce hydrophobic pockets involved in stabilizing protein structure. Moreover, a detailed analysis of the four structural and functional domains of SRP54 indicates a particularly robust correlation between the compositional properties of the M domain and the optimal growth temperature (OGT) of the archaea. The analysis of the bacterial SRP54 (Ffh) shows similar adaptations to the OGT. Thus, natural selection has adapted the SRP proteins to the OGT of the archaeal and bacterial species by modifying both their amino acid composition and distribution.

15.
Detection of protein homology via sequence similarity has important applications in biology, from protein structure and function prediction to reconstruction of phylogenies. Although current methods for aligning protein sequences are powerful, challenges remain, including problems with homologous overextension of alignments and with regions under convergent evolution. Here, we test the ability of the profile hidden Markov model method HMMER3 to correctly assign homologous sequences to >13 000 manually curated families from the Pfam database. We identify problem families using protein regions that match two or more Pfam families not currently annotated as related in Pfam. We find that HMMER3 E-value estimates seem to be less accurate for families that feature periodic patterns of compositional bias, such as the ones typically observed in coiled-coils. These results support the continued use of manually curated inclusion thresholds in the Pfam database, especially on the subset of families that have been identified as problematic in experiments such as these. They also highlight the need for developing new methods that can correct for this particular type of compositional bias.

16.
We provide a comprehensive analysis of the current enzymes with alpha-amylase activity (AAMYs) that belong to family 13 glycoside hydrolase (GH-13; 144 Archaea, Bacteria, and Eukaryota sequences from 87 different species). This study aims to further knowledge of the evolutionary molecular relationships among the sequences of their A and B domains with special emphasis on the correlation between what is observed in the structures and protein evolution. Multialignments for the A domain distinguish two clusters for sequences from Archaea organisms, eight for sequences from Bacteria organisms, and three for sequences from Eukaryota organisms. The clusters for Bacteria do not follow any strict taxonomic pathway; in fact, they are rather scattered. When we compared the A domains of sequences belonging to different kingdoms, we found that various pairs of clusters were significantly similar. Using either sequence similarity with crystallized structures or secondary-structure prediction methods, we identified in all AAMYs the eight putative beta-strands that constitute the beta-sheet in the TIM barrel of the A domain and studied the packing in its interior. We also discovered a "hidden homology" in the TIM barrel, an invariant Gly located upstream in the sequence before the conserved Asp in beta-strand 3. This Gly precedes an alpha-helix and is actively involved in capping its N-terminal end with a capping box. In all cases, a Schellman motif caps the C-terminal end of this helix.

17.
18.
The Universal Ancestor and the Ancestor of Bacteria Were Hyperthermophiles   Total citations: 4, self-citations: 0, citations by others: 4
The definition of the node of the last universal common ancestor (LUCA) is justified in a topology of the unrooted universal tree. This definition allows previous analyses based on paralogous proteins to be extended to orthologous ones. In particular, the use of a thermophily index (based on the propensity of amino acids to enter [hyper]thermophile proteins more frequently) and its correlation with the optimal growth temperature of the various organisms allow inferences to be made on the habitat in which the LUCA lived. The reconstruction of ancestral sequences by means of the maximum likelihood method and their attribution to the set of mesophilic or hyperthermophilic sequences have led to the following conclusions: the LUCA was a hyperthermophile organism, as were the ancestors of the Archaea and Bacteria domains, while the ancestor of the Eukarya domain was a mesophile. These conclusions are independent of the presence of hyperthermophile bacteria in the sample of sequences used in the analysis and are therefore independent of whether or not these are the first lines of divergence in the Bacteria domain, as observed in the topology of the universal tree of ribosomal RNA. These conclusions are thus more easily understood under the hypothesis that the origin of life took place at a high temperature.

19.
Since the definition of archaea as a separate domain of life along with bacteria and eukaryotes, they have become one of the most interesting objects of modern microbiology, molecular biology, and biochemistry. Sequencing and analysis of archaeal genomes were especially important for studies on archaea because of a limited availability of genetic tools for the majority of these microorganisms and problems associated with their cultivation. Fifteen years since the publication of the first genome of an archaeon, more than one hundred complete genome sequences of representatives of different phylogenetic groups have been determined. Analysis of these genomes has expanded our knowledge of biology of archaea, their diversity and evolution, and allowed identification and characterization of new deep phylogenetic lineages of archaea. The development of genome technologies has allowed sequencing the genomes of uncultivated archaea directly from enrichment cultures, metagenomic samples, and even from single cells. Insights have been gained into the evolution of key biochemical processes in archaea, such as cell division and DNA replication, the role of horizontal gene transfer in the evolution of archaea, and new relationships between archaea and eukaryotes have been revealed.

20.
Kono N, Arakawa K, Tomita M. PLoS One. 2012;7(4):e34526
In bacterial circular chromosomes and most plasmids, the replication is known to be terminated when either of the following occurs: the forks progressing in opposite directions meet at the distal end of the chromosome or the replication forks become trapped by Tus proteins bound to Ter sites. Most bacterial genomes have various polarities in their genomic structures. The most notable feature is polar genomic compositional asymmetry of the bases G and C in the leading and lagging strands, called GC skew. This asymmetry is caused by replication-associated mutation bias, and this "footprint" of the replication machinery suggests that, in contrast to the two known mechanisms, replication termination occurs near the chromosome dimer resolution site dif. To understand this difference between the known replication machinery and genomic compositional bias, we undertook a simulation study of genomic mutations, and we report here how different replication termination models contribute to the generation of replication-related genomic compositional asymmetry. Contrary to naive expectations, our results show that a single finite termination site at dif or at the GC skew shift point is not sufficient to reconstruct the genomic compositional bias as observed in published sequences. The results also show that the known replication mechanisms are sufficient to explain the position of the GC skew shift point.
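GC skew, as named in the abstract above, is usually computed per window as (G - C) / (G + C), and the shift point is read off where the sign of the skew changes, which shows up as an extreme of the cumulative curve. The sketch below assumes that standard windowed definition; it is not the mutation simulation the authors describe, and the function names, window size and toy sequence are illustrative assumptions.

```python
from itertools import accumulate


def gc_skew(seq: str, window: int = 1000) -> list[float]:
    """(G - C) / (G + C) for consecutive non-overlapping windows.

    Windows with no G or C are assigned 0.0; a trailing partial
    window is ignored. (Hypothetical helper for illustration.)
    """
    seq = seq.upper()
    values = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        values.append((g - c) / (g + c) if g + c else 0.0)
    return values


def cumulative_skew(values: list[float]) -> list[float]:
    """Running sum of windowed skew values."""
    return list(accumulate(values))


if __name__ == "__main__":
    toy = "GGGGCCCC"  # toy sequence: G-rich half, then C-rich half
    skew = gc_skew(toy, window=4)
    cum = cumulative_skew(skew)
    # Index of the cumulative maximum marks where the skew changes sign here
    print(skew, cum, cum.index(max(cum)))
```

On a real chromosome the cumulative minimum and maximum are commonly taken as candidate origin and terminus regions, though the abstract's point is that the observed shift position needs more than a single fixed termination site to explain.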
