When two sequences are aligned with a single set of alignment parameters, or when mutation parameters are estimated on the basis of a single "optimal" sequence alignment, the variability of both the alignment and the estimated parameters can be seriously underestimated. To obtain a more realistic impression of the actual uncertainty, we propose sampling sequence alignments and mutation parameters simultaneously from their joint posterior distribution given the two original sequences. We illustrate our method with human and orangutan sequences from the hypervariable region I and with gene–pseudogene pairs. Received: 16 November 2000 / Accepted: 15 May 2001

Comparison of complete genome sequences for different variants of hepatitis C virus (HCV) reveals several different constraints on sequence change. Synonymous changes are suppressed in coding regions at both 5′ and 3′ ends of the genome. No evidence was found for the existence of alternative reading frames or for a lower mutation frequency in these regions. Instead, suppression may be due to constraints imposed by RNA secondary structures identified within the core and NS5b genes. Nonsynonymous substitutions are less frequent than synonymous ones except in the hypervariable region of E2 and, to a lesser extent, in E1, NS2, and NS5b. Transitions are more frequent than transversions, particularly at the third position of codons where the bias is 16:1. In addition, nucleotide substitutions may not occur symmetrically since there is a bias toward G or C at the third position of codons, while T ↔ C transitions were twice as frequent as A ↔ G transitions. These different biases do not affect the phylogenetic analysis of HCV variants but need to be taken into account in interpreting sequence change in longitudinal studies. Received: 9 September 1996 / Accepted: 20 April 1997
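
The transition/transversion bias reported above can be illustrated with a minimal sketch that classifies the differences between two aligned sequences. The function names (`classify_substitution`, `ts_tv_counts`) and the toy sequences are assumptions made for illustration, not the authors' method; the sketch counts raw pairwise differences and makes no correction for multiple hits.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
NUCLEOTIDES = PURINES | PYRIMIDINES


def classify_substitution(a, b):
    """Return 'transition', 'transversion', or None if the bases are identical."""
    a, b = a.upper(), b.upper()
    if a == b:
        return None
    pair = {a, b}
    # Purine <-> purine or pyrimidine <-> pyrimidine changes are transitions.
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


def ts_tv_counts(seq1, seq2):
    """Count transitions and transversions between two aligned DNA sequences.

    Columns containing gaps or ambiguity codes are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    counts = {"transition": 0, "transversion": 0}
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in NUCLEOTIDES or b not in NUCLEOTIDES:
            continue
        kind = classify_substitution(a, b)
        if kind is not None:
            counts[kind] += 1
    return counts


# Hypothetical toy alignment: A->G and C->T are transitions, T->A is a transversion.
print(ts_tv_counts("ACGTACGT", "GCGTATGA"))
```

Restricting the loop to every third column of an in-frame coding alignment would give a per-codon-position count of the kind the abstract describes.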

We conducted comprehensive sequence analysis of 5′ flanking regions of primate Alu elements. Information contents were computed and frequencies of 1024 pentanucleotides were measured to approximate the location of a characteristic sequence and to specify its pattern(s), which may be involved in the integration of Alu elements into their host genomes. A large number of samples was used, the wide region of the 5′ end of Alu elements was analyzed, and comparisons were made among different subfamilies. Through our analyses, "TTTTAAAAA" or "(T)m(A)n" can be stated as a candidate for the characteristic sequence pattern, which resides around the region 5 to 20 base pairs upstream of the 5′ end of Alu elements. This characteristic sequence pattern was more prominent in the sequences of younger Alus, which is a strong indication that the sequence pattern has a role at the time of Alu integration. Received: 10 May 1999 / Accepted: 1 October 1999
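
The 1024 pentanucleotides mentioned above are the 4^5 possible DNA words of length five. As a rough sketch of how such frequencies might be tabulated, the following counts overlapping k-mers across a set of sequences. The function name `kmer_frequencies` and the example input are assumptions for illustration; the abstract does not describe the authors' actual procedure in this detail.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequences, k=5):
    """Return the relative frequency of every DNA k-mer over all sequences.

    Windows overlap, and windows containing anything other than A, C, G, T
    are ignored. Every one of the 4**k possible k-mers appears in the result,
    with frequency 0.0 if it was never observed.
    """
    alphabet = set("ACGT")
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= alphabet:
                counts[kmer] += 1
    total = sum(counts.values())
    all_kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in all_kmers}


# Hypothetical input: the candidate motif itself, which yields five windows.
freqs = kmer_frequencies(["TTTTAAAAA"])
print(len(freqs), freqs["TTTTA"])
```

Comparing such a table for upstream flanking windows against one for background sequence is one plausible way to highlight over-represented words.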

The idea that the pattern of point mutation in Drosophila has remained constant during the evolution of the genus has recently been challenged. A study of nucleotide composition focused on the Drosophila saltans group has revealed unsuspected nucleotide composition differences among lineages. Compositional differences are associated with an accelerated rate of amino acid replacement in functionally less constrained regions. Here we reassess this issue from a different perspective. Adopting a maximum-likelihood estimation approach, we focus on the different predictions that mutation and selection make about the nonsynonymous-to-synonymous rate ratio. We investigate two gene regions, alcohol dehydrogenase (Adh) and xanthine dehydrogenase (Xdh), using a balanced data set that comprises representatives from the melanogaster, obscura, saltans, and willistoni groups. We also consider representatives of the Hawaiian picture-winged group. These Hawaiian species are known to have experienced repeated bottlenecks and are included as a reference for comparison. Our results confirm patterns previously detected. The branch ancestral to the fast-evolving willistoni/saltans lineage, where most of the change in GC content has occurred, exhibits an excess of synonymous substitutions. The shift in mutation bias has affected the extent of the rate variation among sites in Xdh. Received: 4 May 1999 / Accepted: 26 July 1999

A model of nucleotide substitution that allows the transition/transversion rate bias to vary across sites was constructed. We examined the fit of this model using likelihood-ratio tests by analyzing 13 protein coding genes and 1 pseudogene. Likelihood-ratio testing indicated that a model that allows variation in the transition/transversion rate bias across sites provided a significant improvement in fit for most protein coding genes but not for the pseudogene. When the analysis was repeated with parameters estimated separately for first, second, and third codon positions, strong heterogeneity was uncovered for the first and second codon positions; the variation in the transition/transversion rate was generally weaker at the third codon position. The transition rate bias and branch lengths were underestimated when variation in the transition/transversion rate was not accommodated, suggesting that it may be important to accommodate variation in the pattern of nucleotide substitution for accurate estimation of evolutionary parameters. Received: 4 November 1997 / Accepted: 19 May 1998
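
A likelihood-ratio test of the kind described above compares the maximized log-likelihoods of a simpler (null) model and a richer (alternative) model that contains it. The sketch below shows only the generic arithmetic. The log-likelihood values are made up, the degrees of freedom are an assumption (the abstract does not state how many parameters the site-variable model adds), and the chi-square approximation can be conservative when the extra parameter sits on a boundary of its range.

```python
import math


def lrt_statistic(lnl_null, lnl_alt):
    """Return 2 * (lnL_alt - lnL_null), floored at 0 to absorb optimizer noise."""
    return max(0.0, 2.0 * (lnl_alt - lnl_null))


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable, closed forms only.

    Only df = 1 and df = 2 are supported, to keep the sketch free of
    third-party dependencies.
    """
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch supports df = 1 or df = 2 only")


# Hypothetical log-likelihoods for a uniform-bias and a site-variable-bias model.
stat = lrt_statistic(lnl_null=-1000.0, lnl_alt=-995.0)
print(stat, chi2_sf(stat, df=1))
```

For other degrees of freedom, a statistics library's chi-square survival function would replace `chi2_sf`.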

A phylogenetic tree for major lineages of iguanian lizards is estimated from 1,488 aligned base positions (858 informative) of newly reported mitochondrial DNA sequences representing coding regions for eight tRNAs, ND2, and portions of ND1 and COI. Two well-supported groups are defined, the Acrodonta and the Iguanidae (sensu lato). This phylogenetic hypothesis is used to investigate evolutionary shifts in mitochondrial gene order, origin for light-strand replication, and secondary structure of tRNACys. These three characters shift together on the branch leading to acrodont lizards. Plate tectonics and the fossil record indicate that these characters changed in the Jurassic. We propose that changes to the secondary structure of tRNACys may destroy function of the origin for light-strand replication which, in turn, may facilitate shifts in gene order. Received: 28 May 1996 / Accepted: 27 December 1996

The nucleotide sequence for an 11,715-bp segment of the mitochondrial genome of the octocoral Sarcophyton glaucum is presented, completing the analysis of the entire genome for this anthozoan member of the phylum Cnidaria. The genome contained the same 13 protein-coding and 2 ribosomal RNA genes as in other animals. However, it also included an unusual mismatch repair gene homologue reported previously and contained only a single tRNA gene. Intermediate in length compared to two other cnidarians (17,443 and 18,911 bp), this organellar genome contained the smallest amount of noncoding DNA (428, compared to 1283 and 781 nt, respectively), making it the most compact one found for the phylum to date. The mitochondrial genes of S. glaucum exhibited an identical arrangement to that found in another octocoral, Renilla kolikeri, with five protein-coding genes in the same order as has been found in insect and vertebrate mitochondrial genomes. Although gene order appears to be highly conserved among octocorals, few similarities were found in comparison with the hexacoral Metridium senile. Like other metazoan mitochondrial genomes, the A + T composition was elevated and a general bias against codons ending in G or C was observed. However, an exception to this was the infrequent use of TGA compared to TGG to code for tryptophan. This divergent codon bias is unusual but appears to be a conserved feature among two rather distantly related anthozoans. Received: 27 January 1998 / Accepted: 25 May 1998

To obtain evolutionary distance data that are as purely additive as possible, we have developed a novel method for evaluating evolutionary distances from the base-pair changes in stem regions of ribosomal RNAs (rRNAs). Applying this method to small-subunit (SSU) and large-subunit (LSU) rRNAs provides distance data with which both the unweighted pair group method of analysis and the neighbor-joining method give almost the same tree topology for most organisms, except for some Protoctista, thermophilic bacteria, parasitic organisms, and endosymbionts. Although the evolutionary distances calculated with LSU rRNAs are somewhat longer than those with SSU rRNAs, the difference, probably due to a slight difference in functional constraint, is substantially decreased when the distances are converted into the divergence times of organisms using the time scale estimated for each type of rRNA. The divergence times of main branches agree fairly well with the geological record of organisms, at least after the appearance of oxygen-releasing photosynthesis, although the divergence times of Eukaryota, Archaebacteria, and Eubacteria are somewhat overestimated in comparison with the geological record of Earth formation. This result is explained by considering that the mutation rate is determined by the accumulation of misrepairs of DNA damage caused by radiation and that the effect of radiation was stronger before oxygen molecules became abundant in the atmosphere of the Earth. Received: 23 October 1997 / Accepted: 12 August 1998

In this study we constructed a bootstrapped distance tree of 500 small subunit ribosomal RNA sequences from organisms belonging to the so-called crown of eukaryote evolution. Taking into account the substitution rate of the individual nucleotides of the rRNA sequence alignment, our results suggest that (1) animals, true fungi, and choanoflagellates share a common origin: the branch joining these taxa is highly supported by bootstrap analysis (bootstrap support [BS] > 90%), (2) stramenopiles and alveolates are sister groups (BS = 75%), (3) within the alveolates, dinoflagellates and apicomplexans share a common ancestor (BS > 95%), while in turn they both share a common origin with the ciliates (BS > 80%), and (4) within the stramenopiles, heterokont algae, hyphochytriomycetes, and oomycetes form a monophyletic grouping well supported by bootstrap analysis (BS > 85%), preceded by the well-supported successive divergence of labyrinthulomycetes and bicosoecids. On the other hand, many evolutionary relationships between crown taxa are still obscure on the basis of 18S rRNA. The branching order between the animal-fungal-choanoflagellates clade and the chlorobionts, the alveolates and stramenopiles, red algae, and several smaller groups of organisms remains largely unresolved. When among-site rate variation is not considered, the inferred tree topologies are inferior to those where the substitution rate spectrum for the 18S rRNA is taken into account. This is primarily indicated by the erroneous branching of fast-evolving sequences. Moreover, when different substitution rates among sites are not considered, the animals no longer appear as a monophyletic grouping in most distance trees. Received: 11 June 1997 / Accepted: 21 July 1997

We previously reported the sequence of a 9260-bp fragment of mitochondrial (mt) DNA of the cephalopod Loligo bleekeri [J. Sasuga et al. (1999) J. Mol. Evol. 48:692–702]. To clarify further the characteristics of Loligo mtDNA, we have sequenced an 8148-bp fragment to reveal the complete mt genome sequence. Loligo mtDNA is 17,211 bp long and possesses a standard set of metazoan mt genes. Its gene arrangement is not identical to any other metazoan mt gene arrangement reported so far. Three of the 19 noncoding regions longer than 10 bp are 515, 507, and 509 bp long, and their sequences are nearly identical, suggesting that multiplication of these noncoding regions occurred in an ancestral Loligo mt genome. Comparison of the gene arrangements of Loligo, Katharina tunicata, and Littorina saxatilis mt genomes revealed that 17 tRNA genes of the Loligo mt genome are adjacent to noncoding regions. A majority (15 tRNA genes) of their counterparts are found in two tRNA gene clusters of the Katharina mt genome. Therefore, these 17 tRNA genes of the Loligo mt genome may have spread over the genome, and this may have been coupled with the multiplication of the noncoding regions. Maximum likelihood analysis of mt protein genes supports the clade Mollusca + Annelida + Brachiopoda but fails to infer the relationships among Katharina, Loligo, and three gastropod species. Received: 9 May 2001 / Accepted: 3 October 2001

Photosynthetic eukaryotes can, according to features of their chloroplasts, be divided into two major groups: the red and the green lineage of plastid evolution. To extend the knowledge about the evolution of the red lineage we have sequenced and analyzed the chloroplast genome (cp-genome) of Cyanidium caldarium RK1, a unicellular red alga (AF022186). The analysis revealed that this genome shows several unusual structural features, such as a hypothetical hairpin structure in a gene-free region and absence of large repeat units. We provide evidence that this structural organization of the cp-genome of C. caldarium may be that of the most ancient cp-genome so far described. We also compared the cp-genome of C. caldarium to the other known cp-genomes of the red lineage. The cp-genome of C. caldarium cannot be readily aligned with that of Porphyra purpurea, a multicellular red alga, or Guillardia theta due to a displacement of a region of the cp-genome. The phylogenetic tree reveals that the secondary endosymbiosis, through which G. theta evolved, took place after the separation of the ancestors of C. caldarium and P. purpurea. We found several genes unique to the cp-genome of C. caldarium. Five of them seem to be involved in the building of bacterial cell envelopes and may be responsible for the thermotolerance of the chloroplast of this alga. Two additional genes may play a role in stabilizing the photosynthetic machinery against salt stress and detoxification of the chloroplast. Thus, these genes may be unique to the cp-genome of C. caldarium and may be required for the endurance of the extreme living conditions of this alga. Received: 3 June 2000 / Accepted: 18 July 2000

A family of four satellite DNAs has been characterized in the genome of the bivalve mollusc, Donax trunculus. All share HindIII sites, a similar monomer length of about 160 base pairs (bp), and the related oligonucleotide motifs GGTCA and GGGTTA, repeated six to 15 times within the repetitive units. The motif GGTCA is common to all members of the satellite family. It is present in three of them in both orientations, interspersed within nonrepetitive DNA sequences. The hexanucleotide GGGTTA appears to be the main building element of one of the satellites forming a prominent subrepeat structure in conjunction with the 5-bp motif. The former has been also found in perfect tandem repeats in a junction region adjacent to the proper satellite sequence. Southern analysis has revealed that (GGGTTA)n and/or related sequences are abundant and widely distributed in the D. trunculus genome. The distribution observed is consistent with the concurrence of the scattering of short sequence motifs throughout the genome and the spread of longer DNA segments, with concomitant formation of satellite monomer repeats. Both kinds of dispersion may have contributed to the observed complex arrangement of the HindIII satellite DNA family in Donax. Received: 28 May 1996 / Accepted: 30 July 1996

Substitutions occurring in noncoding sequences of the plant chloroplast genome violate the independence of sites that is assumed by substitution models in molecular evolution. The probability that a substitution at a site is a transversion, as opposed to a transition, increases significantly with increasing A + T content of the two adjacent nucleotides. In the present study, this dependency of substitutions on local context is examined further in a number of noncoding regions from the chloroplast genome of members of the grass family (Poaceae). Two features were examined: the influence of specific neighboring bases, as opposed to the general A + T content, on transversion proportion, and an influence on substitutions by nucleotides other than the two immediately adjacent to the site of substitution. In both cases, a significant effect was found. In the case of specific nucleotides, transversion proportion is significantly higher at sites with a pyrimidine immediately 5′ on either strand. Substitutions at sites of the type YNR, where N is the site of substitution, have the highest rate of transversion. This specific effect is secondary to the A + T content effect such that, in terms of proportion of substitutions that are transversions, the nucleotides are ranked T > A > C > G as to their effect when they are immediately 5′ to the site of substitution. In the case of nucleotides other than the immediate neighbors, a significant influence on substitution dynamics is observed in the case where the two neighboring bases are both A and/or T. Thus, substitutions are primarily, but not exclusively, influenced by the composition of the two nucleotides that are immediately adjacent. These results indicate that the pattern of molecular evolution of the plant chloroplast genome is extremely complex as a result of a variety of inter-site dependencies. Received: 18 October 1996 / Accepted: 12 April 1997
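
The dependence of transversion proportion on the A + T content of the two flanking nucleotides can be tabulated with a short sketch like the one below. It is a simplification under stated assumptions: it compares just two aligned sequences, takes the flanking bases from the first sequence, and only scores sites whose flanks are identical in both sequences. The function name `transversion_proportion_by_context` and the toy sequences are hypothetical; the study itself would rely on phylogenetically inferred substitutions.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
NUCLEOTIDES = PURINES | PYRIMIDINES


def is_transversion(a, b):
    """True if a and b differ and one is a purine while the other is a pyrimidine."""
    return (a in PURINES) != (b in PURINES)


def transversion_proportion_by_context(seq1, seq2):
    """Map flanking A+T count (0, 1, 2) to the proportion of transversions.

    Returns None for a context class in which no substitution was observed.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    seq1, seq2 = seq1.upper(), seq2.upper()
    tallies = {0: [0, 0], 1: [0, 0], 2: [0, 0]}  # [transversions, substitutions]
    for i in range(1, len(seq1) - 1):
        a, b = seq1[i], seq2[i]
        left, right = seq1[i - 1], seq1[i + 1]
        if a == b or {a, b, left, right} - NUCLEOTIDES:
            continue  # no substitution, or a gap/ambiguity code is involved
        if (left, right) != (seq2[i - 1], seq2[i + 1]):
            continue  # context itself changed, so it cannot be assigned cleanly
        at_count = (left in "AT") + (right in "AT")
        tallies[at_count][1] += 1
        if is_transversion(a, b):
            tallies[at_count][0] += 1
    return {k: (tv / n if n else None) for k, (tv, n) in tallies.items()}


# Hypothetical toy alignment: a C<->G change flanked by A and T,
# and an A<->G change flanked by G and G.
print(transversion_proportion_by_context("ACTGAGC", "AGTGGGC"))
```

Keying the tallies on the specific 5′ base instead of the A + T count would give the per-nucleotide ranking the abstract reports.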

The synonymous divergence between Escherichia coli and Salmonella typhimurium is explained in a model where there is a large variation between mutation rates at different nucleotide sites in the genome. The model is based on the experimental observation that spontaneous mutation rates can vary over several orders of magnitude at different sites in a gene. Such site-specific variation must be taken into account when studying synonymous divergence and will result in an apparent saturation below the level expected from an assumption of uniform rates. Recently, it has been suggested that codon preference in enterobacteria has a very large site-specific variation and that the synonymous divergence between different species, e.g., E. coli and Salmonella, is saturated. In the present communication it is shown that when site-specific variation in mutation rates is introduced, there is no need to invoke assumptions of saturation and a large variability in codon preference. The same rate variation will also bring average mutation rates as estimated from synonymous sequence divergence into numerical agreement with experimental values. Received: 10 July 1998 / Accepted: 20 August 1998
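
Why rate variation produces an apparent plateau below the uniform-rate expectation can be seen in a short worked sketch. It assumes, purely for illustration, a Jukes-Cantor-type process at each site; the abstract does not specify its substitution model, and the symbols below (per-site rate \mu, divergence time t, rate density f) are introduced here rather than taken from the paper.

```latex
% Uniform rate \mu per site per lineage, two lineages separated for time t:
p_{\text{uniform}}(t) = \tfrac{3}{4}\left(1 - e^{-8\mu t/3}\right)

% Site-specific rates drawn from a density f(\mu) with mean \bar{\mu}:
p_{\text{variable}}(t) = \tfrac{3}{4}\left(1 - \int_0^\infty e^{-8\mu t/3}\, f(\mu)\, d\mu\right)

% Because e^{-c\mu} is convex in \mu, Jensen's inequality gives
\int_0^\infty e^{-8\mu t/3}\, f(\mu)\, d\mu \;\ge\; e^{-8\bar{\mu} t/3}
\quad\Longrightarrow\quad
p_{\text{variable}}(t) \;\le\; p_{\text{uniform}}(t)\big|_{\mu=\bar{\mu}}
```

When rates span several orders of magnitude, the fast sites reach their ceiling early while the slow sites have barely changed, so the observed divergence levels off well below 3/4 for a long period without being truly saturated.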

By means of simulations and DNA sequence analyses, standardized identity excess (a measure of linkage disequilibrium) between segregating nucleotide sites was studied in an effort to quantify the patchwork pattern among alleles of the major histocompatibility complex loci. It was found that the pattern under selective neutrality and/or no intralocus recombination does not fit the observed pattern based on DNA sequences. However, the intensity and type of selection and the rate of recombination are difficult to estimate by comparing simulation results with the observed pattern. Received: 10 December 1999 / Accepted: 2 March 2000

An AluI satellite DNA family has been isolated in the genome of the root-knot nematode Meloidogyne chitwoodi. This repeated sequence was shown to be present at approximately 11,400 copies per haploid genome, and represents about 3.5% of the total genomic DNA. Nineteen monomers were cloned and sequenced. Their length ranged from 142 to 180 bp, and their A + T content was high (from 65.7 to 79.1%), with frequent runs of As and Ts. An unexpected heterogeneity in primary structure was observed between monomers, and multiple alignment analysis showed that the 19 repeats could be unambiguously clustered in six subfamilies. A consensus sequence has been deduced for each subfamily, within which the number of positions conserved is very high, ranging from 86.7% to 98.6%. Even though blocks of conserved regions could be observed, multiple alignment of the six consensus sequences did not enable the establishment of a general unambiguous consensus sequence. Screening of the six consensus sequences for evidence of internal repeated subunits revealed a 6-bp motif (AAATTT), present in both direct and inverted orientation. This motif was found up to nine times in the consensus sequences, also with the occurrence of degenerated subrepeats. Along with the meiotic parthenogenetic mode of reproduction of this nematode, such structural features may argue for the evolution of this satellite DNA family either (1) from a common ancestral sequence by amplification followed by mechanisms of sequence divergence, or (2) through independent mutations of the ancestral sequence in isolated amphimictic nematode populations and subsequent hybridization events. Overall, our results suggest the ancient origin of this satellite DNA family, and may reflect for M. chitwoodi a phylogenetic position close to the ancestral amphimictic forms of root-knot nematodes. Received: 23 April 1997 / Accepted: 9 July 1997

When human T cell receptor for antigen (TCR) alpha chain V-genes were compared pair-wise, the numbers of nucleotide differences showed a characteristic distribution; most were in the range of 100 to 200 differences out of a total of about 300 bases. The same distribution was observed for mouse TCR alpha chains. Even more interesting was that comparing human alpha chains and mouse alpha chains gave essentially the same nucleotide difference pattern. It is inferred from the large number of differences and from the nonspecificity of trans-species (human and mouse) nucleotide sequence differences of TCR V-genes that TCR alpha chains probably diverged early during evolution. The same feature was also observed for human and mouse TCR beta chains, although the alpha and beta chain V-genes were distinct. This evolutionary preservation could be of vital importance to the fidelity of the complicated trimolecular interactions among TCR alpha and beta chains, the processed peptide, and the major histocompatibility complex (MHC) class I or II molecules. Received: 22 January 1996 / Accepted: 9 September 1996

A Monte Carlo method was used to test the extent of sequence similarity among viroids, satellite RNAs, and hepatitis delta virus. This analysis revealed that there is insufficient sequence similarity among these pathogens to support the hypothesis that they have a common evolutionary origin. Furthermore, while definite patterns of sequence similarity were observed among some viroids, there was a clear lack of overall similarity, indicating that a monophyletic origin for even this group cannot be reliably supported from sequence data alone. Received: 30 April 1999 / Accepted: 24 August 1999
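
A generic Monte Carlo test of sequence similarity compares an observed similarity score with the scores obtained after one sequence is randomly shuffled, which preserves its length and base composition. The sketch below illustrates that general idea only. The similarity measure (number of distinct shared 4-mers), the number of shuffles, the function names, and the example sequence are all assumptions; the abstract does not state which statistic or randomization scheme the authors used.

```python
import random


def shared_kmers(seq_a, seq_b, k=4):
    """Number of distinct k-mers that occur in both sequences."""
    kmers_a = {seq_a[i:i + k] for i in range(len(seq_a) - k + 1)}
    kmers_b = {seq_b[i:i + k] for i in range(len(seq_b) - k + 1)}
    return len(kmers_a & kmers_b)


def monte_carlo_similarity_pvalue(seq_a, seq_b, n_shuffles=199, k=4, seed=0):
    """Estimate how often a shuffled seq_b is at least as similar to seq_a.

    Uses the standard (r + 1) / (n + 1) estimator, so the result is never 0.
    A fixed seed makes the run reproducible.
    """
    rng = random.Random(seed)
    observed = shared_kmers(seq_a, seq_b, k)
    letters = list(seq_b)
    at_least_as_similar = 0
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        if shared_kmers(seq_a, "".join(letters), k) >= observed:
            at_least_as_similar += 1
    return (at_least_as_similar + 1) / (n_shuffles + 1)


# Hypothetical example: a sequence compared with itself should look highly
# non-random relative to its own shuffled versions.
example = "ACGTTGCAAGCTTAGGCTAACCGTATGCATCGGATCCTAGTACGATTGACCAGTTCGAAT"
print(monte_carlo_similarity_pvalue(example, example))
```

A large p-value under such a test means the observed similarity is no greater than expected from composition alone, which is the kind of outcome the abstract reports for these pathogens.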

Size homoplasy was analyzed at microsatellite loci by sequencing electromorphs, that is, variants of the same size (base pairs). This study was conducted using five interrupted and/or compound loci in three invertebrate species, the honey bee Apis mellifera, the bumble bee Bombus terrestris, and the freshwater snail Bulinus truncatus. The 15 electromorphs sequenced turned out to hide 31 alleles (i.e., variants identical in sequence). Variation in the amount of size homoplasy was detected among electromorphs and loci. From one to seven alleles were detected per electromorph, and one locus did not show any size homoplasy in either bee species. The amount of size homoplasy was related to the sequencing effort, since the number of alleles was correlated with the number of copies of electromorphs sequenced, but also with the molecular structure of the core sequence at each locus. Size homoplasy within populations was detected only three times, meaning that size homoplasy was detected mostly among populations. We analyzed population structure, estimating F_ST and a genetic distance, based on either electromorphs or alleles. Whereas little difference was found in A. mellifera, uncovering size homoplasy led to a more marked population structure in B. terrestris and B. truncatus. We also showed in A. mellifera that the detection of size homoplasy may alter phylogenetic reconstructions. Received: 21 July 1997 / Accepted: 29 January 1998
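
The distinction between electromorphs (variants of the same length) and alleles (variants identical in sequence) can be made concrete with a minimal sketch: group sequenced copies by length, then count the distinct sequences within each group. The function name `alleles_per_electromorph` and the toy microsatellite sequences are hypothetical.

```python
from collections import defaultdict


def alleles_per_electromorph(sequences):
    """Map each fragment length to the number of distinct sequences of that length.

    A length class with more than one distinct sequence shows size homoplasy.
    """
    groups = defaultdict(set)
    for seq in sequences:
        groups[len(seq)].add(seq.upper())
    return {size: len(alleles) for size, alleles in sorted(groups.items())}


# Hypothetical copies: two different 8-bp alleles share one electromorph.
copies = ["CACACACA", "CACATACA", "CACACA", "CACACACA"]
counts = alleles_per_electromorph(copies)
print(counts)
print([size for size, n in counts.items() if n > 1])  # homoplasious size classes
```

Summing the counts over all electromorphs gives the total number of alleles, which is the comparison behind the 15 electromorphs and 31 alleles reported above.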

A long-standing hypothesis posits that morphological changes may be more likely to result from changes in regulation of gene expression than from changes in the protein coding sequences of genes. We have compared the expression pattern of the twisted gastrulation (tsg) gene among five Drosophila species: D. melanogaster, D. simulans, D. subobscura, D. mojavensis, and D. virilis. The tsg gene encodes a secreted protein that is required for the specification of dorsal midline fates in the Drosophila early embryo. TSG is unlike other secreted growth and differentiation factors in Drosophila in that its expression pattern can be experimentally varied and still result in normal development. Because of this, its regulatory region may be freer to diverge than that of other developmental genes whose misexpression may lead to lethal defects. Thus, the tsg gene may be a good indicator of the frequency and nature of evolutionary changes affecting patterns of gene expression. Over ∼60 million years (Myr), the tsg gene has retained a dorsal-on/ventral-off pattern and a middorsal region of expression; but there have been marked changes in the middorsal domain of expression as well as the appearance/loss of other domains of expression along the anterior/posterior axis. Changes between closely related species (∼2–5 Myr since divergence) that are not reflected among more distantly related species suggest frequent changes in gene expression over evolutionary time. These changes in gene expression may serve as the raw material for eventual evolutionary changes in morphology. Received: 24 March 1997 / Accepted: 20 June 1997

