In many bacterial genomes, the leading and lagging strands have different skews in base composition; for example, an excess of guanosine compared to cytosine on the leading strand. We find that Chlamydia genes that have switched their orientation relative to the direction of replication, for example by inversion, acquire the skew of their new "host" strand. In contrast to most evolutionary processes, which have unpredictable effects on the sequence of a gene, replication-related skews reflect a directional evolutionary force that causes predictable changes in the base composition of switched genes, resulting in increased DNA and amino acid sequence divergence. Received: 27 April 2000 / Accepted: 1 August 2000
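The strand skew described above can be illustrated with a minimal sketch. It assumes the commonly used definition of GC skew, (G - C) / (G + C), which the abstract itself does not state; the function names and the toy sequence are hypothetical.

```python
def gc_skew(seq: str) -> float:
    """Return (G - C) / (G + C) for one strand; 0.0 when the strand has no G or C."""
    seq = seq.upper()
    g = seq.count("G")
    c = seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    complement = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(complement)[::-1]


# Toy example (not real data): a G-rich stretch and the same stretch after inversion.
gene = "GGGC"
skew_before = gc_skew(gene)                       # G excess on this strand
skew_after = gc_skew(reverse_complement(gene))    # the sign flips on the other strand
```

Under this definition an inverted gene starts with the opposite sign of skew on its new strand, so acquiring the new strand's skew requires a net change in base composition.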

Computer analyses of various genome sequences revealed the existence of certain periodical patterns of adenine–adenine dinucleotides (ApA). For each genome sequence of 13 eubacteria, 3 archaebacteria, 10 eukaryotes, 60 mitochondria, and 9 chloroplasts, we counted frequencies of ApA dinucleotides at each downstream position within 50 bp from every ApA. We found that the complete genomes of all three archaebacteria have clear ApA periodicities of about 10 bp. On the other hand, all of the 13 eubacteria we analyzed were found to have an ApA periodicity of about 11 bp. Similar periodicities exist in the 10 eukaryotes, although higher organisms such as primates tend to have weaker periodic patterns. None of the mitochondria or chloroplasts we analyzed showed an evident periodic pattern. Received: 3 November 1998 / Accepted: 24 March 1999
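The counting step described above (frequencies of ApA at each downstream position within 50 bp of every ApA) might be sketched as follows. This is an illustration only, not the authors' program: the function name is hypothetical, and treating overlapping dinucleotides (as in "AAA") as separate occurrences is an assumption.

```python
from collections import Counter


def apa_distance_counts(seq: str, max_dist: int = 50) -> Counter:
    """Count, for every ApA, the ApA dinucleotides found d bases downstream (1 <= d <= max_dist)."""
    seq = seq.upper()
    # Start positions of every AA dinucleotide; overlapping occurrences are kept (assumption).
    starts = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "AA"]
    start_set = set(starts)
    counts = Counter()
    for i in starts:
        for d in range(1, max_dist + 1):
            if i + d in start_set:
                counts[d] += 1
    return counts


# Toy sequence with ApA placed every 10 bp.
toy = "AA" + "C" * 8 + "AA" + "C" * 8 + "AA"
toy_counts = apa_distance_counts(toy)
```

A periodicity of about 10 or 11 bp would show up as recurring peaks in these counts at multiples of that distance.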

Parsimony is commonly used to infer the direction of substitution and mutation. However, it is known that parsimony is biased when the base composition of the DNA sequence is skewed. Here I quantify this effect for several simple cases. The analysis demonstrates that parsimony can be misleading even when levels of sequence divergence are as low as 10%; parsimony incorrectly infers an excess of common-to-rare changes. Caution must therefore be exercised in the use of parsimony. Received: 13 November 1997 / Accepted: 18 June 1998

The extent to which base composition and codon usage vary among RNA viruses, and the possible causes of this bias, is undetermined in most cases. A maximum-likelihood statistical method was used to test whether base composition and codon usage bias covary with arthropod association in the genus Flavivirus, a major source of disease in humans and animals. Flaviviruses are transmitted by mosquitoes, by ticks, or directly between vertebrate hosts. Those viruses associated with ticks were found to have a significantly lower G+C content than non-vector-borne flaviviruses and this difference was present throughout the genome at all amino acids and codon positions. In contrast, mosquito-borne viruses had an intermediate G+C content which was not significantly different from those of the other two groups. In addition, biases in dinucleotide and codon usage that were independent of base composition were detected in all flaviviruses, but these did not covary with arthropod association. However, the overall effect of these biases was slight, suggesting only weak selection at synonymous sites. A preliminary analysis of base composition, codon usage, and vector specificity in other RNA virus families also revealed a possible association between base composition and vector specificity, although with biases different from those seen in the Flavivirus genus. Received: 29 August 2000 / Accepted: 19 December 2000
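The G+C content at each codon position, the quantity compared across virus groups above, can be computed with a short sketch. It covers only this descriptive statistic, not the maximum-likelihood test used in the study; the function name is hypothetical and the input is assumed to be an in-frame coding sequence.

```python
from typing import List


def gc_by_codon_position(cds: str) -> List[float]:
    """Return the G+C fraction at codon positions 1, 2 and 3 of an in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3   # ignore a trailing partial codon
    fractions = []
    for pos in range(3):
        bases = cds[pos:usable:3]
        gc = sum(base in "GC" for base in bases)
        fractions.append(gc / len(bases) if bases else 0.0)
    return fractions


# Toy coding sequence (not real data): codons ATG and GCC.
toy_gc = gc_by_codon_position("ATGGCC")
```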

Five complete bacterial genome sequences have been released to the scientific community. These include four (eu)Bacteria, Haemophilus influenzae, Mycoplasma genitalium, M. pneumoniae, and Synechocystis PCC 6803, as well as one Archaeon, Methanococcus jannaschii. Features of organization shared by these genomes are likely to have arisen very early in the history of the bacteria and thus can be expected to provide further insight into the nature of early ancestors. Results of a genome comparison of these five organisms confirm earlier observations that gene order is remarkably unpreserved. There are, nevertheless, at least 16 clusters of two or more genes whose order remains the same among the four (eu)Bacteria and these are presumed to reflect conserved elements of coordinated gene expression that require gene proximity. Eight of these gene orders are essentially conserved in the Archaea as well. Many of these clusters are known to be regulated by RNA-level mechanisms in Escherichia coli, which supports the earlier suggestion that this type of regulation of gene expression may have arisen very early. We conclude that although the last common ancestor may have had a DNA genome, it likely was preceded by progenotes with an RNA genome. Received: 10 March 1996 / Accepted: 20 May 1997

The idea that the pattern of point mutation in Drosophila has remained constant during the evolution of the genus has recently been challenged. A study of the nucleotide composition focused on the Drosophila saltans group has revealed unsuspected nucleotide composition differences among lineages. Compositional differences are associated with an accelerated rate of amino acid replacement in functionally less constrained regions. Here we reassess this issue from a different perspective. Adopting a maximum-likelihood estimation approach, we focus on the different predictions that mutation and selection make about the nonsynonymous-to-synonymous rate ratio. We investigate two gene regions, alcohol dehydrogenase (Adh) and xanthine dehydrogenase (Xdh), using a balanced data set that comprises representatives from the melanogaster, obscura, saltans, and willistoni groups. We also consider representatives of the Hawaiian picture-winged group. These Hawaiian species are known to have experienced repeated bottlenecks and are included as a reference for comparison. Our results confirm patterns previously detected. The branch ancestral to the fast-evolving willistoni/saltans lineage, where most of the change in GC content has occurred, exhibits an excess of synonymous substitutions. The shift in mutation bias has affected the extent of the rate variation among sites in Xdh. Received: 4 May 1999 / Accepted: 26 July 1999

In this work detailed statistics on ancestral gene duplication and gene conservation in completely sequenced cellular genomes are presented. Analysis of open reading frame (ORF) products having simultaneous matches in several distinct organisms showed a significant correlation between duplication and conservation. Systematic comparisons of predicted proteomes of 23 organisms (including 20 that have been completely sequenced) have allowed us to quantify the degree of ancestral duplication within each genome and the level of conservation between genomes, using threshold values calculated for individual organisms. Statistical analysis of various gene proportions revealed interesting trends in gene structure and evolution, such as that (a) more than one-quarter (25%–66%) of the predicted ORF products of the surveyed organisms are not unique, indicating a high level of ancestral duplications; (b) levels of exclusive conservation within Bacteria are higher than those within the eukaryal or archaeal domains; and (c) at least one-half (47–99%) of the total predicted ORF products in the surveyed genomes have one or several highly significant matches in another genome. Significant matches are based on simulations taking into account the mean size of ORF products and the composition of each target organism's proteome. The methodology we have developed ensures stability and comparability of our results as the number of completely sequenced genomes increases. Received: 4 May 1998 / Accepted: 28 September 1998

We have analyzed the nad3-rps12 locus for eight angiosperms in order to compare the utility of mitochondrial DNA and edited mRNA sequences in phylogenetic reconstruction. The two coding regions, containing from 25 to 35 editing sites in the various plants, have been concatenated in order to increase the significance of the analysis. Differing from the corresponding chloroplast sequences, unedited mitochondrial DNA sequences seem to evolve under a quasi-neutral substitution process in which the nucleotide substitution rates for the three codon positions are not differentiated. By using complete gene sequences (all codon positions) we found that genomic sequences provide a classical angiosperm phylogenetic tree with a clear-cut grouping of monocotyledons and dicotyledons with Magnoliidae at the basal branch of the tree. Conversely, owing to their low nucleotide substitution rates, edited mRNA sequences were found not to be suitable for studying phylogenetic relationships among angiosperms. Received: 24 January 1996 / Accepted: 5 June 1996

The majority of plant disease resistance genes are members of very large multigene families. They encode structurally related proteins containing nucleotide binding site domains (NBS) and C-terminal leucine rich repeats (LRR). The N-terminal region of some resistance genes contains a short sequence called TIR with homology to the animal innate immunity factors, Toll and interleukin receptor-like genes. Only a few plant resistance genes have been functionally analyzed and the origin and evolution of plant resistance genes remain obscure. We have reconstructed gene phylogeny by exhaustive analysis of available genome and amplified NBS domain sequences. Our study shows that NBS domains faithfully predict whole gene structure and can be divided into two major groups. Group I NBS domains contain group-specific motifs that are always linked with the TIR sequence in the N terminus. Significantly, Group I NBS domains and their associated TIR domains are widely distributed in dicot species but were not detected in cereal databases. Furthermore, Group I specific NBS sequences were readily amplified from dicot genomic DNA but could not be amplified from cereal genomic DNA. In contrast, Group II NBS domains are always associated with putative coiled-coil domains in their N terminus and appear to be present throughout the angiosperms. These results suggest that the two main groups of resistance genes underwent divergent evolution in cereal and dicot genomes and imply that their cognate signaling pathways have diverged as well. Received: 17 May 1999 / Accepted: 25 September 1999

We have developed a method that allows us to estimate the direction of translocation of orthologs which have changed, during phylogeny, their positions on the chromosome with respect to the leading or lagging role of the DNA strands. We have shown that the relative number of translocations which have switched positions of genes from the leading to the lagging DNA strand is lower than the number of translocations which have transferred genes from the lagging strand to the leading strand of prokaryotic genomes. This paradox could be explained by assuming that the stronger mutation pressure and selection after inversion preferentially eliminate genes transferred from the leading to the lagging DNA strand. Received: 12 December 2000 / Accepted: 20 April 2001

The complete nucleotide sequence of the mitochondrial genome was determined for a conger eel, Conger myriaster (Elopomorpha: Anguilliformes), using a PCR-based approach that employs a long PCR technique and many fish-versatile primers. Although the genome [18,705 base pairs (bp)] contained the same set of 37 mitochondrial genes [two ribosomal RNA (rRNA), 22 transfer RNA (tRNA), and 13 protein-coding genes] as found in other vertebrates, the gene order differed from that recorded for any other vertebrates. In typical vertebrates, the ND6, tRNAGlu, and tRNAPro genes are located between the ND5 gene and the control region, whereas the former three genes, in C. myriaster, have been translocated to a position between the control region and the tRNAPhe gene that are contiguously located at the 5′ end of the 12S rRNA gene in typical vertebrates. This gene order is similar to the recently reported gene order in four lineages of birds in that the latter lack the ND6, tRNAGlu, and tRNAPro genes between the ND5 gene and the control region; however, the relative position of the tRNAPro to the ND6–tRNAGlu genes in C. myriaster was different from that in the four birds, which presumably resulted from different patterns of tandem duplication of gene regions followed by gene deletions in two distantly related groups of organisms. Sequencing of the ND5–cyt b region in 11 other anguilliform species, representing 11 families, plus one outgroup species, revealed that the same gene order as C. myriaster was shared by another 4 families, belonging to the suborder Congroidei. Although the novel gene orders of four lineages of birds were indicated to have multiple independent origins, phylogenetic analyses using nucleotide sequences from the mitochondrial 12S rRNA and cyt b genes suggested that the novel gene orders of the five anguilliform families had originated in a single ancestral species. Received: 13 July 2000 / Accepted: 30 November 2000

A comprehensive analysis of duplication and gene conversion for 7394 Caenorhabditis elegans genes (about half the expected total for the genome) is presented. Of the genes examined, 40% are involved in duplicated gene pairs. Intrachromosomal or cis gene duplications occur approximately two times more often than expected. In general, the closer the members of duplicated gene pairs are, the more likely it is that gene orientation is conserved. Gene conversion events are detectable between only 2% of the duplicated pairs. Even given the excesses of cis duplications, there is an excess of gene conversion events between cis duplicated pairs on every chromosome except the X chromosome. The relative rates of cis and trans gene conversion and the negative correlation between conversion frequency and DNA sequence divergence for unconverted regions of converted pairs are consistent with previous experimental studies in yeast. Three recent, regional duplications, each spanning three genes, are described. All three have already undergone substantial deletions spanning hundreds of base pairs. The relative rates of duplication and deletion may contribute to the compactness of the C. elegans genome. Received: 30 July 1998 / Accepted: 12 October 1998

Highly expressed plastid genes display codon adaptation, which is defined as a bias toward a set of codons which are complementary to abundant tRNAs. This type of adaptation is similar to what is observed in highly expressed Escherichia coli genes and is probably the result of selection to increase translation efficiency. In the current work, the codon adaptation of plastid genes is studied with regard to three specific features that have been observed in E. coli and which may influence translation efficiency. These features are (1) a relatively low codon adaptation at the 5′ end of highly expressed genes, (2) an influence of neighboring codons on codon usage at a particular site (codon context), and (3) a correlation between the level of codon adaptation of a gene and its amino acid content. All three features are found in plastid genes. First, highly expressed plastid genes have a noticeable decrease in codon adaptation over the first 10–20 codons. Second, for the twofold degenerate NNY codon groups, highly expressed genes have an overall bias toward the NNC codon, but this is not observed when the 3′ neighboring base is a G. At these sites highly expressed genes are biased toward NNT instead of NNC. Third, plastid genes that have higher codon adaptations also tend to have an increased usage of amino acids with a high G + C content at the first two codon positions and GNN codons in particular. The correlation between codon adaptation and amino acid content exists separately for both cytosolic and membrane proteins and is not related to any obvious functional property. It is suggested that at certain sites selection discriminates between nonsynonymous codons based on translational, not functional, differences, with the result that the amino acid sequence of highly expressed proteins is partially influenced by selection for increased translation efficiency. Received: 21 July 1999 / Accepted: 5 November 1999
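The codon-context observation above (NNC versus NNT usage depending on the 3′ neighboring base) can be tabulated with a minimal sketch. The function name and the toy sequence are hypothetical, and the list of twofold degenerate NNY groups assumes the standard codon assignments (Phe, Tyr, His, Asn, Asp, Cys, and the AGY serine codons).

```python
from collections import defaultdict

# First two bases of the twofold degenerate NNY groups (assumption: standard codon assignments).
TWOFOLD_NNY_PREFIXES = {"TT", "TA", "CA", "AA", "GA", "TG", "AG"}


def nny_context_counts(cds: str) -> dict:
    """Tally NNC vs NNT codons of twofold NNY groups, keyed by the base that follows the codon."""
    cds = cds.upper()
    counts = defaultdict(lambda: {"C": 0, "T": 0})
    # Stop early enough that every codon examined has a 3' neighboring base.
    for i in range(0, len(cds) - 3, 3):
        prefix, third, neighbor = cds[i:i + 2], cds[i + 2], cds[i + 3]
        if prefix in TWOFOLD_NNY_PREFIXES and third in "CT":
            counts[neighbor][third] += 1
    return {base: dict(tally) for base, tally in counts.items()}


# Toy in-frame sequence (not real data): TTC GGG TTT AAA TAC GCC
toy_context = nny_context_counts("TTCGGGTTTAAATACGCC")
```

Comparing the C and T tallies for each neighboring base, separately for highly and weakly expressed genes, is one way to look for the pattern the abstract reports.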

Cryptomonads, small biflagellate algae, contain four different genomes. In addition to the nucleus, mitochondrion, and chloroplast is a fourth DNA-containing organelle, the nucleomorph. Nucleomorphs result from the successive reduction of the nucleus of an engulfed phototrophic eukaryotic endosymbiont by a secondary eukaryotic host cell. By sequencing the chloroplast genome and the nucleomorph chromosomes, we identified a groEL homologue in the genome of the chloroplast and a related cpn60 in one of the nucleomorph chromosomes. The nucleomorph-encoded Cpn60 and the chloroplast-encoded GroEL correspond in each case to one of the two divergent GroEL homologues in the cyanobacterium Synechocystis sp. PCC6803. The coexistence of divergent groEL/cpn60 genes in different genomes in one cell offers insights into gene transfer from evolving chloroplasts to cell nuclei and convergent gene evolution in chlorophyll a/b versus chlorophyll a/c/phycobilin eukaryotic lineages. Received: 24 April 1998 / Accepted: 12 June 1998

Gypsy LTR-retrotransposons have been identified in the genomes of many organisms, but only a small number of vertebrate examples have been reported to date. Here we show that members of this family are likely to be widespread in many vertebrate classes with the possible exceptions of mammals and birds. Phylogenetic analyses demonstrate that although there are several distinct lineages of vertebrate gypsy LTR-retrotransposons, the majority clusters into one monophyletic clade. Groups of fungal, plant, and insect elements were also observed, suggesting horizontal transfer between phyla may be infrequent. However, in contrast to this, there was little evidence to support sister relationships between elements derived from vertebrate and insect hosts. In fact, the majority of the vertebrate elements appeared to be most closely related to a group of gypsy LTR-retrotransposons present within fungi. This implies either that at least one horizontal transmission between these two phyla has occurred previously or that a gypsy LTR-retrotransposon lineage has been lost from insect taxa. Received: 22 December 1998 / Accepted: 6 April 1999

In many unicellular organisms, invertebrates, and plants, synonymous codon usage biases result from a coadaptation between codon usage and tRNA abundance to optimize the efficiency of protein synthesis. However, it remains unclear whether natural selection acts at the level of the speed or the accuracy of mRNA translation. Here we show that codon usage can improve the fidelity of protein synthesis in multicellular species. As predicted by the model of selection for translational accuracy, we find that the frequency of codons optimal for translation is significantly higher at codons encoding conserved amino acids than at codons encoding nonconserved amino acids in 548 genes compared between Caenorhabditis elegans and Homo sapiens. Although this model predicts that codon bias correlates positively with gene length, a negative correlation between codon bias and gene length has been observed in eukaryotes. This suggests that selection for fidelity of protein synthesis is not the main factor responsible for codon biases. The relationship between codon bias and gene length remains unexplained. Exploring the differences in gene expression process in eukaryotes and prokaryotes should provide new insights to understand this key question of codon usage. Received: 18 June 2000 / Accepted: 10 November 2000
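The comparison described above (frequency of optimal codons at conserved versus nonconserved amino acid sites) can be sketched as follows. This is only an illustration of the tally, not the authors' procedure or statistical test: the function name is hypothetical, the set of optimal codons in the toy example is invented for illustration, and the conserved/nonconserved flags are assumed to come from a prior alignment of the two species' proteins.

```python
def fop_by_conservation(codons, conserved, optimal_codons):
    """Return (frequency of optimal codons at conserved sites, same at nonconserved sites)."""
    hits = {True: 0, False: 0}
    totals = {True: 0, False: 0}
    for codon, is_conserved in zip(codons, conserved):
        key = bool(is_conserved)
        totals[key] += 1
        if codon.upper() in optimal_codons:
            hits[key] += 1

    def ratio(key):
        # Report 0.0 for a class with no sites rather than dividing by zero.
        return hits[key] / totals[key] if totals[key] else 0.0

    return ratio(True), ratio(False)


# Toy data: the optimal-codon set here is hypothetical, not taken from any species.
toy_codons = ["GCT", "GCC", "AAG", "AAA"]
toy_conserved = [True, True, False, False]
toy_optimal = {"GCT", "GCC", "AAG"}
fop_conserved, fop_nonconserved = fop_by_conservation(toy_codons, toy_conserved, toy_optimal)
```

Under the translational accuracy model, the first value is expected to exceed the second when the tally is pooled over many genes.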

Retrovirus-like sequences and their solitary (solo) long terminal repeats (LTRs) are common repetitive elements in eukaryotic genomes. We reported previously that the tandemly arrayed genes encoding U2 snRNA (the RNU2 locus) in humans and apes contain a solo LTR (U2-LTR) which was presumably generated by homologous recombination between the two LTRs of an ancestral provirus that is retained in the orthologous baboon RNU2 locus. We have now sequenced the orthologous U2-LTRs in human, chimpanzee, gorilla, orangutan, and baboon and examined numerous homologs of the U2-LTR that are dispersed throughout the human genome. Although these U2-LTR homologs have been collectively referred to as LTR13 in the literature, they do not display sequence similarity to any known retroviral LTRs; however, the structure of LTR13 closely resembles that of other retroviral LTRs with a putative promoter, polyadenylation signal, and a tandemly repeated 53-bp enhancer-like element. Genomic blotting indicates that LTR13 is primate-specific; based on sequence analysis, we estimate there are about 2,500 LTR13 elements in the human genome. Comparison of the primate U2-LTR sequences suggests that the homologous recombination event that gave rise to the solo U2-LTR occurred soon after insertion of the ancestral provirus into the ancestral U2 tandem array. Phylogenetic analysis of the LTR13 family confirms that it is diverse, but the orthologous U2-LTRs form a coherent group in which chimpanzee is closest to the humans; orangutan is a clear outgroup of human, chimpanzee, and gorilla; and baboon is a distant relative of human, chimpanzee, gorilla, and orangutan. We compare the LTR13 family with other known LTRs and consider whether these LTRs might play a role in concerted evolution of the primate RNU2 locus. Received: 29 September 1997 / Accepted: 16 January 1998

