Similar Literature
Found 20 similar documents; search took 31 ms
1.
Estimation of the Transition/Transversion Rate Bias and Species Sampling   Total citations: 7 (self-citations: 0, citations by others: 7)
The transition/transversion (ti/tv) rate ratios are estimated by pairwise sequence comparison and joint likelihood analysis using mitochondrial cytochrome b genes of 28 primate species, representing both the Strepsirrhini (lemurs and lories) and the Anthropoidea (monkeys, apes, and humans). Pairwise comparison reveals a strong negative correlation between estimates of the ti/tv ratio and the sequence distance, even when both are corrected for multiple substitutions. The maximum-likelihood estimate of the ti/tv ratio changes with the species included in the analysis. The ti/tv bias within the lemuriform taxa is found to be as strong as in the anthropoids, in contradiction to an earlier study which sampled only one lemuriform. Simulations show the surprising result that both the pairwise correction method and the joint likelihood analysis tend to overcorrect for multiple substitutions and overestimate the ti/tv ratio, especially at low sequence divergence. The bias, however, is not large enough to account for the observed patterns. Nucleotide frequency biases, variation of substitution rates among sites, and different evolutionary dynamics at the three codon positions can be ruled out as possible causes. The likelihood-ratio test suggests that the ti/tv rate ratios may be variable among evolutionary lineages. Without any biological evidence for such a variation, however, we are left with no plausible explanations for the observed patterns other than a possible saturation effect due to the unrealistic nature of the model assumed. Received: 1 October 1997 / Accepted: 29 September 1998
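
As an illustration of the pairwise comparison mentioned above, the following minimal Python sketch counts observed transitions and transversions between two aligned sequences. It is uncorrected for multiple substitutions and is not the estimator used in the study; the function name `count_ti_tv` and the toy sequences are assumptions made for this example.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def count_ti_tv(seq_a, seq_b):
    """Count observed transitions and transversions between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = 0
    transversions = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == y or x not in "ACGT" or y not in "ACGT":
            continue  # identical site, gap, or ambiguous base: skip
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1    # purine <-> purine or pyrimidine <-> pyrimidine
        else:
            transversions += 1  # purine <-> pyrimidine
    return transitions, transversions

# Toy sequences (hypothetical, for illustration only)
ti, tv = count_ti_tv("ACGTACGTAC", "GCGTATGTCC")
raw_ratio = ti / tv if tv else float("inf")
```

The raw ratio of counts tends to fall as sequences diverge, because repeated transitions at the same site hide one another; this is one reason such counts are usually corrected before interpretation.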

2.
The reliable reconstruction of tree topology from a set of homologous sequences is one of the main goals in the study of molecular evolution. If consistent estimators of distances from a multiple sequence alignment are known, the distance method is attractive because the tree reconstruction is consistent. To obtain a distance estimate d, the observed proportion of differences p (p-distance) is usually "corrected" for multiple and back substitutions by means of a functional relationship d=f(p). In this paper the conditions under which this correction of p-distances will not alter the selection of the tree topology are specified. When these conditions are not fulfilled the selection of the tree topology may depend on the correction function applied. A novel method which includes estimates of distances not only between sequence pairs, but between triplets, quadruplets, etc., is proposed to strengthen the proper selection of correction function and tree topology. A "super" tree that includes all tree topologies as special cases is introduced. Received: 17 February 1998 / Accepted: 20 July 1998
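
One familiar instance of a correction function d=f(p) is the Jukes-Cantor formula, d = -(3/4) ln(1 - 4p/3). The Python sketch below is given only as an example of such a function; the abstract does not say which correction functions the paper considers, and the function names are assumptions.

```python
import math

def p_distance(seq_a, seq_b):
    """Observed proportion of differing sites between two aligned sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    differences = sum(1 for x, y in zip(seq_a, seq_b) if x != y)
    return differences / len(seq_a)

def jukes_cantor(p):
    """Jukes-Cantor corrected distance d = f(p); defined only for p < 0.75."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

p = p_distance("ACGTACGTAC", "ACGTTCGTAA")  # 2 differences in 10 sites
d = jukes_cantor(p)
```

Because this f is strictly increasing, it preserves the rank order of pairwise distances, but it stretches large distances more than small ones, which is how a correction can influence the tree a distance method selects.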

3.
The Artemia hemoglobin is a dimer comprising two nine-domain covalent polymers in quaternary association. Each polymer is encoded by a gene representing nine successive globin domains which have different sequences and are presumed to have been copied originally from a single-domain gene. Two different polymers exist as the result of a complete duplication of the nine-domain gene, allowing the formation of either homodimers or the heterodimer. The total population size of 18 domains comprising nine corresponding pairs, coupled with the probability that they reflect several hundred million years of evolution in the same lineage, provides a unique model in which the process of gene multiplication can be analyzed. The outcome has important implications for the reliability of local molecular clocks. The two polymers differ from each other at 11.7% of amino acid sites; however, when corresponding individual domains are compared between polymers, amino acid substitution fluctuates 2.7-fold from lowest to highest. This variation is not obvious at the DNA level: Domain pair identity values fluctuate by 1.3-fold. Identity values are, however, uncorrected for multiple substitutions, and both silent and nonsilent changes are pooled. Therefore, to determine the variability in relative substitution rates at the DNA level, we have used the method of Li (1993, J Mol Evol 36:96–99) to determine estimates of nonsynonymous (K_A) and synonymous (K_S) substitutions per site for the nine pairs of domains. As expected, the overall level of silent substitutions (K_S of 56.9%) far exceeded nonsilent substitutions (K_A of 6.7%); however, for corresponding domain pairs, K_A fluctuates by 2.3-fold and K_S by 1.7-fold. The large discrepancies reflected in the expressed protein have accrued within a single lineage and the implication is that divergence dates of different genera based on amino acid sequences, even with well-studied proteins of reasonable size, can be wrong by a factor well in excess of 2. Received: 4 June 1997 / Accepted: 17 December 1997

4.
The two eosinophil ribonucleases, eosinophil-derived neurotoxin (EDN/RNase 2) and eosinophil cationic protein (ECP/RNase 3), are among the most rapidly evolving coding sequences known among primates. The eight mouse genes identified as orthologs of EDN and ECP form a highly divergent, species-limited cluster. We present here the rat ribonuclease cluster, a group of eight distinct ribonuclease A superfamily genes that are more closely related to one another than they are to their murine counterparts. The existence of independent gene clusters suggests that numerous duplications and diversification events have occurred at these loci recently, sometime after the divergence of these two rodent species (∼10–15 million years ago). Nonsynonymous substitutions per site (d_N) calculated for the 64 mouse/rat gene pairs indicate that these ribonucleases are incorporating nonsilent mutations at accelerated rates, and comparisons of nonsynonymous to synonymous substitution (d_N/d_S) suggest that diversity in the mouse ribonuclease cluster is promoted by positive (Darwinian) selection. Although the pressures promoting similar but clearly independent styles of rapid diversification among these primate and rodent genes remain uncertain, our recent findings regarding the function of human EDN suggest a role for these ribonucleases in antiviral host defense. Received: 8 April 1999 / Accepted: 22 June 1999

5.
Mitochondrial small-subunit (19S) rDNA sequences were obtained from 10 angiosperms to further characterize sequence divergence levels and structural variation in this molecule. These sequences were derived from seven holoparasitic (nonphotosynthetic) angiosperms as well as three photosynthetic plants. 19S rRNA is composed of a conservative core region (ca. 1450 nucleotides) as well as two variable regions (V1 and V7). In pairwise comparisons of photosynthetic angiosperms to Glycine, the core 19S rDNA sequences differed by less than 1.4%, thus supporting the observation that variation in mitochondrial rDNA is 3–4 times lower than seen in protein coding and rDNA genes of other subcellular organelles. Sequences representing four distinct lineages of nonasterid holoparasites showed significantly increased numbers of substitutions in their core 19S rDNA sequences (2.3–7.6%), thus paralleling previous findings that showed accelerated rates in nuclear (18S) and plastid (16S) rDNA from the same plants. Relative rate tests confirmed the accelerated nucleotide substitution rates in the holoparasites whereas rates in nonparasitic plants were not significantly increased. Among comparisons of both parasitic and nonparasitic plants, transversions outnumbered transitions, in many cases more than two to one. The core 19S rRNA is conserved in sequence and structure among all nonparasitic angiosperms whereas 19S rRNA from members of holoparasitic Balanophoraceae has unique extensions to the V5 and V6 variable domains. Substitution and insertion/deletion mutations characterized the V1 and V7 regions of the nonasterid holoparasites. The V7 sequence of one holoparasite (Scybalium) contained repeat motifs. The cause of substitution rate increases in the holoparasites does not appear to be a result of RNA editing, hence the underlying molecular mechanism remains to be fully documented. Received: 18 May 1997 / Accepted: 11 July 1997

6.
Partial sequences of two mitochondrial genes, the 12S ribosomal gene (739 bp) and the cytochrome b gene (672 bp), were analyzed in hopes of reconstructing the evolutionary relationships of 11 leporid species, representative of seven genera. However, partial cytochrome b sequences were of little phylogenetic value in this study. A suite of pairwise comparisons between taxa revealed that at the intergeneric level, the cytochrome b gene is saturated at synonymous coding positions due to multiple substitution events. Furthermore, variation at the nonsynonymous positions is limited, rendering the cytochrome b gene of little phylogenetic value for assessing the relationships between leporid genera. If the cytochrome b data are analyzed without accounting for these two classes of nucleotides (i.e., synonymous and nonsynonymous sites), one may incorrectly conclude that signal exists in the cytochrome b data. The mitochondrial 12S rRNA gene, on the other hand, has not experienced excessive saturation at either stem or loop positions. Phylogenies reconstructed from the 12S rDNA data support hypotheses based on fossil evidence that African rock rabbits (Pronolagus) are outside of the main leporid stock and that leporids experienced a rapid radiation. However, the molecular data suggest that this radiation event occurred in the mid-Miocene several millions of years earlier than the Pleistocene dates suggested by paleontological evidence. Received: 23 April 1998 / Accepted: 14 May 1998

7.
We present a method for estimating the most general reversible substitution matrix corresponding to a given collection of pairwise aligned DNA sequences. This matrix can then be used to calculate evolutionary distances between pairs of sequences in the collection. If only two sequences are considered, our method is equivalent to that of Lanave et al. (1984). The main novelty of our approach is in combining data from different sequence pairs. We describe a weighting method for pairs of taxa related by a known tree that results in uniform weights for all branches. Our method for estimating the rate matrix results in fast execution times, even on large data sets, and does not require knowledge of the phylogenetic relationships among sequences. In a test case on a primate pseudogene, the matrix we arrived at resembles one obtained using maximum likelihood, and the resulting distance measure is shown to have better linearity than is obtained in a less general model.

8.
A New World monkey, the common marmoset (Callithrix jacchus), will be used as a preclinical animal model to study the feasibility of cell and gene therapy targeting immunological and hematological disorders. To elucidate the immunogenetic background of the common marmoset for further studies, we examined polymorphisms of MHC-DRB genes in this species. Twenty-one Caja-DRB exon 2 alleles, including seven new ones, were detected by means of subcloning and the polymerase chain reaction-single strand conformation polymorphism (PCR-SSCP) methods followed by nucleotide sequencing. Based on the alignment of these allele sequences, we designed two pairs of specific primers and established a PCR-SSCP method for DNA-based histocompatibility typing of the common marmoset. According to the family segregation data and phylogenetic analyses, we presumed that Caja-DRB alleles could be classified into five different loci. Southern blotting analysis also supported the existence of multiple DRB loci. The patterns of nucleotide substitutions suggest that positive selection operates in the antigen-recognition sites of Caja-DRB genes. Received: 18 February 2000 / Accepted: 17 May 2000

9.
The heat shock protein 70 kDa sequences (HSP70) are of great importance as molecular chaperones in protein folding and transport. They are abundant under conditions of cellular stress. They are highly conserved in all domains of life: Archaea, eubacteria, eukaryotes, and organelles (mitochondria, chloroplasts). A multiple alignment of a large collection of these sequences was obtained employing our symmetric-iterative ITERALIGN program (Brocchieri and Karlin 1998). Assessments of conservation are interpreted in evolutionary terms and with respect to functional implications. Many archaeal sequences (methanogens and halophiles) tend to align best with the Gram-positive sequences. These two groups also miss a signature segment [about 25 amino acids (aa) long] present in all other HSP70 species (Gupta and Golding 1993). We observed a second signature sequence of about 4 aa absent from all eukaryotic homologues, significantly aligned in all prokaryotic sequences. Consensus sequences were developed for eight groups [Archaea, Gram-positive, proteobacterial Gram-negative, singular bacteria, mitochondria, plastids, eukaryotic endoplasmic reticulum (ER) isoforms, eukaryotic cytoplasmic isoforms]. All group consensus comparisons tend to summarize the alignments better than do the individual sequence comparisons. The global individual consensus "matches" 87% with the consensus of consensuses sequence. A functional analysis of the global consensus identifies a (new) highly significant mixed charge cluster proximal to the carboxyl terminus of the sequence highlighting the hypercharge run EEDKKRRER (one-letter aa code used). The individual Archaea and Gram-positive sequences contain a corresponding significant mixed charge cluster in the location of the charge cluster of the consensus sequence. In contrast, the four Gram-negative proteobacterial sequences of the alignment do not have a charge cluster (even at the 5% significance level). All eukaryotic HSP70 sequences have the analogous charge cluster. Strikingly, several of the eukaryotic isoforms show multiple mixed charged clusters. These clusters were interpreted with supporting data related to HSP70 activity in facilitating chaperone, transport, and secretion function. We observed that the consensus contains only a single tryptophan residue and a single conserved cysteine. This is interpreted with respect to the target rule for disaggregating misfolded proteins. The mitochondrial HSP70 connections to bacterial HSP70 are analyzed, suggesting a polyphyletic split of Trypanosoma and Leishmania protist mitochondrial (Mt) homologues separated from Mt-animal/fungal/plant homologues. Moreover, the HSP70 sequences from the amitochondrial Entamoeba histolytica and Trichomonas vaginalis species were analyzed. The E. histolytica HSP70 is most similar to the higher eukaryotic cytoplasmic sequences, with significantly weaker alignments to ER sequences and much diminished matching to all eubacterial, mitochondrial, and chloroplast sequences. This appears to be at variance with the hypothesis that E. histolytica rather recently lost its mitochondrial organelle. T. vaginalis contains two HSP70 sequences, one Mt-like and the second similar to eukaryotic cytoplasmic sequences suggesting two diverse origins. Received: 29 January 1998 / Accepted: 14 May 1998

10.
We conducted comprehensive sequence analysis of 5′ flanking regions of primate Alu elements. Information contents were computed and frequencies of 1024 pentanucleotides were measured to approximate the location of a characteristic sequence and to specify its pattern(s), which may be involved in the integration of Alu elements into their host genomes. A large number of samples was used, the wide region of the 5′ end of Alu elements was analyzed, and comparisons were made among different subfamilies. Through our analyses, "TTTTAAAAA" or "(T)m(A)n" can be stated as a candidate for the characteristic sequence pattern, which resides around the region 5 to 20 base pairs upstream of the 5′ end of Alu elements. This characteristic sequence pattern was more prominent in the sequences of younger Alus, which is a strong indication that the sequence pattern has a role at the time of Alu integration. Received: 10 May 1999 / Accepted: 1 October 1999
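
The two measurements named above can be sketched in Python as follows: relative frequencies of all 4^5 = 1024 pentanucleotides, and a per-column information content computed as 2 minus the Shannon entropy in bits. The paper's exact definitions (for example, any small-sample correction or background model) are not given in the abstract, so both functions and their names are assumptions for illustration.

```python
import math
from collections import Counter
from itertools import product

def kmer_frequencies(sequences, k=5):
    """Relative frequency of every DNA k-mer (4**k entries) over all sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip windows with gaps or ambiguity codes
                counts[kmer] += 1
    total = sum(counts.values())
    all_kmers = ("".join(letters) for letters in product("ACGT", repeat=k))
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in all_kmers}

def information_content(column):
    """Information content (bits) of one alignment column: 2 - Shannon entropy."""
    counts = Counter(base for base in column.upper() if base in "ACGT")
    n = sum(counts.values())
    if n == 0:
        return 0.0
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return 2.0 - entropy

# Toy input: the candidate pattern from the abstract, used as a single sequence
freqs = kmer_frequencies(["TTTTAAAAA"])
```

A column where every sequence carries the same base scores 2 bits and a column with the four bases in equal proportion scores 0, so peaks in this score along the flanking region point to candidate positions of a conserved pattern.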

11.
Microsatellite length variation was investigated at a highly variable microsatellite locus in four species of Apodemus. Information obtained from microsatellite allele sequences was contrasted with allele sizes, which included 18 electromorphs. Additional analysis of a 400-bp unique sequence in the flanking region identified 26 different haplotype sequences or "true" alleles in the sample. Three molecular mechanisms, namely, (1) addition/deletion of repeats, (2) substitutions and indels in the flanking region, and (3) mutations interrupting the repeat, contributed to the generation of allelic variation. Size homoplasy can be inferred for alleles within populations, from different populations of the same species, and from different species. We propose that microsatellite flanking sequences may be informative markers for investigating mutation processes in microsatellite repeats as well as phylogenetic relationships among alleles, populations, and species. Received: 3 November 1999 / Accepted: 2 May 2000

12.
Phylogenetic relationships among reptiles were examined using previously published and newly determined hemoglobin sequences. Trees reconstructed from these sequences using maximum-parsimony, neighbor-joining, and maximum-likelihood algorithms were compared with a phylogenetic tree of Amniota, which was assembled on the basis of published morphological data. All analyses differentiated α chains into αA and αD types, which are present in all reptiles except crocodiles, where only αA chains are expressed. The occurrence of the αD chain in squamates (lizards and snakes only in this study) appears to be a general characteristic of these species. Lizards and snakes also express two types of β chains (βI and βII), while only one type of β chain is present in birds and crocodiles. Reconstructed hemoglobin trees for both α and β sequences did not yield the monophyletic Archosauria (i.e., crocodilians + birds) and Lepidosauria (i.e., Sphenodon + squamates) groups defined by the morphology tree. This discrepancy, as well as some other poorly resolved nodes, might be due to substantial heterogeneity in evolutionary rates among single hemoglobin lineages. Estimation of branch lengths based on uncorrected amino acid substitutions and on distances corrected for multiple substitutions (PAM distances) revealed that relative rates for squamate αA and αD chains and crocodilian β chains are at least twice as high as those of the rest of the chains considered. In contrast to these rate inequalities between reptilian orders, little variation was found within squamates, which allowed determination of absolute evolutionary rates for this subset of hemoglobins. Rate estimates for hemoglobins of lizards and snakes yielded 1.7 (αA) and 3.3 (β) million years/PAM when calibrated with published divergence time vs. PAM distance correlates for several speciation events within snakes and for the squamate ↔ sphenodontid split. This suggests that hemoglobin chains of squamate reptiles evolved ∼3.5 (αA) or ∼1.7 times (β) faster than their mammalian equivalents. These data also were used to obtain a first estimate of some intrasquamate divergence times. Received: 15 September 1997 / Accepted: 4 February 1998

13.
When divergence between viral species is large, the analysis and comparison of nucleotide or protein sequences are dependent on mutation biases and multiple substitutions per site leading, among other things, to the underestimation of branch lengths in phylogenetic trees. To avoid the problem of multiply substituted sites, a method not directly based on the nucleic or protein sequences has been applied to retroviruses. It consisted of asking questions about genome structure or organization, and gene function, the series of answers creating coded sequences analyzed by phylogenetic software. This method recovered the principal retroviral groups such as the lentiviruses and spumaviruses and highlighted questions and answers characteristic of each group of retroviruses. In general, there was reasonable concordance between the coded genome methodology and that based on conventional phylogeny of the integrase protein sequence, indicating that integrase was fixing mutations slowly enough to marginalize the problem of multiple substitutions at sites. To a first approximation, this suggests that the acquisition of novel genetic features generally parallels the fixation of amino acid substitutions. Received: 18 May 2001 / Accepted: 7 September 2001

14.
Partial sequences of the rpoC1 gene from two species of angiosperms and three species of gymnosperms (8330 base pairs) were determined and compared. The data obtained support the hypothesis that angiosperms and gymnosperms are monophyletic and none of the recent groups of the latter is sister to angiosperms. Received: 20 November 1998 / Accepted: 26 April 1999

15.
The Molecular Evolution of the Vertebrate Trypsinogens   Total citations: 1 (self-citations: 0, citations by others: 1)
We expand the already large number of known trypsinogen nucleotide and amino acid sequences by presenting additional trypsinogen sequences from the tunicate (Boltenia villosa), the lamprey (Petromyzon marinus), the pufferfish (Fugu rubripes), and the frog (Xenopus laevis). The current array of known trypsinogen sequences now spans the entire vertebrate phylogeny. Phylogenetic analysis is made difficult by the presence of multiple isozymes within species and rates of evolution that vary highly between both species and isozymes. We nevertheless present a Fitch-Margoliash phylogeny constructed from pairwise distances. We employ this phylogeny as a vehicle for speculation on the evolution of the trypsinogen gene family as well as the general modes of evolution of multigene families. Unique attributes of the lamprey and tunicate trypsinogens are noted. Received: 12 July 1997

16.
Aquatic larvae of the midge, Chironomus tentans, synthesize a 185-kDa silk protein (sp185) with the cysteine-containing motif Cys-X-Cys-X-Cys (where X is any residue) every 20–28 residues. We report here the cloning and full-length sequence of cDNAs encoding homologous silk proteins from Chironomus pallidivittatus (sp185) and Chironomus thummi (sp220). Deduced amino acid sequences reveal proteins of nearly identical mass composed of 72 blocks of 20–28 residues, 61% of which can be described by the motif X5–8-Cys-X5-(Trp/Phe/Tyr)-X4-Cys-X-Cys-X-Cys. Spatial arrangement of these residues is preserved more than surrounding sequences. cDNA clones enabled us to map the genes on polytene chromosomes and identify for the first time the homolog of the Camptochironomus Balbiani ring 3 locus in Chironomus thummi. The apparent molecular weight difference between these proteins (185 vs 220 kDa) is not attributable to primary structure and may be due to differential N-linked glycosylation. DNA distances and codon substitutions indicate that the C. tentans and C. pallidivittatus genes are more closely related to each other than either is to C. thummi; however, substitution rates for the 5′- and 3′-halves of these genes are different. Blockwise sequence comparisons suggest intragenic variation in that some regions evolved slower or faster than the mean and may have been subjected to different selective pressures. Received: 30 August 1996 / Accepted: 6 November 1996

17.
The complete mitochondrial genomes of two microbats, the horseshoe bat Rhinolophus pumilus, and the Japanese pipistrelle Pipistrellus abramus, and that of an insectivore, the long-clawed shrew Sorex unguiculatus, were sequenced and analyzed phylogenetically by a maximum likelihood method in an effort to enhance our understanding of mammalian evolution. Our analysis suggested that (1) a sister relationship exists between moles and shrews, which form an eulipotyphlan clade; (2) chiropterans have a sister-relationship with eulipotyphlans; and (3) the Eulipotyphla/Chiroptera clade is closely related to fereuungulates (Cetartiodactyla, Perissodactyla and Carnivora). Divergence times on the mammalian tree were estimated from consideration of a relaxed molecular clock, the amino acid sequences of 12 concatenated mitochondrial proteins and multiple reference criteria. Moles and shrews were estimated to have diverged approximately 48 MyrBP, and bats and eulipotyphlans to have diverged 68 MyrBP. Recent phylogenetic controversy over the polyphyly of microbats, the monophyly of rodents, and the position of hedgehogs is also examined. Received: 21 December 2000 / Accepted: 16 February 2001

18.
We studied the functional effects of single amino acid substitutions in the postulated M4 transmembrane domains of Torpedo californica nicotinic acetylcholine receptors (nAChRs) expressed in Xenopus oocytes at the single-channel level. At low ACh concentrations and cold temperatures, the replacement of wild-type α418Cys residues with the large, hydrophobic amino acids tryptophan or phenylalanine increased mean open times 26-fold and 3-fold, respectively. The mutation of a homologous cysteine in the β subunit (β447Trp) had similar but smaller effects on mean open time. Coexpression of α418Trp and β447Trp had the largest effect on channel open time, increasing mean open time 58-fold. No changes in conductance or ion selectivity were detected for any of the single subunit amino acid substitutions tested. However, the coexpression of the α418Trp and β447Trp mutated subunits also produced channels with at least two additional conductance levels. Block by acetylcholine was apparent in the current records from α418Trp mutants. Burst analysis of the α418Trp mutations showed an increase in the channel open probability, due to a decrease in the apparent channel closing rate and a probable increase in the effective opening rate. Our results show that modifications in the primary structure of the α- and β subunit M4 domain, which are postulated to be at the lipid-protein interface, can significantly alter channel gating, and that mutations in multiple subunits act additively to increase channel open time. Received: 27 September 1996 / Revised: 28 January 1997

19.
Estimation of the ratio of the rates of transitions to transversions (TI:TV ratio) for a collection of aligned nucleotide sequences is important because it provides insight into the process of molecular evolution and because such estimates may be used to further model the evolutionary process for the sequences under consideration. In this paper, we compare several methods for estimating the TI:TV ratio, including the pairwise method [TREE 11 (1996) 158], a modification of the pairwise method due to Ina [J. Mol. Evol. 46 (1998) 521], a method based on parsimony [TREE 11 (1996) 158], a method due to Purvis and Bromham [J. Mol. Evol. 44 (1997) 112] that uses phylogenetically independent pairs of sequences, the maximum likelihood method, and a Bayesian method [Bioinformatics 17 (2001) 754]. We examine the performance of each estimator under several conditions using both simulated and real data.
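
For orientation, a pairwise estimate corrected for multiple substitutions can be obtained from Kimura's two-parameter (K2P) model, in which the transitional and transversional components of the distance are s = -(1/2) ln(1 - 2P - Q) + (1/4) ln(1 - 2Q) and v = -(1/2) ln(1 - 2Q), with P and Q the observed proportions of transitional and transversional differences. The Python sketch below applies these formulas; whether any of the estimators compared in the paper uses exactly this correction is not stated in the abstract, and the function name and input values are assumptions.

```python
import math

def k2p_components(p_transitions, q_transversions):
    """Return (s, v): K2P-corrected transitional and transversional distances.

    p_transitions and q_transversions are the observed proportions of sites
    that differ by a transition and by a transversion, respectively.
    """
    a = 1.0 - 2.0 * p_transitions - q_transversions
    b = 1.0 - 2.0 * q_transversions
    if a <= 0.0 or b <= 0.0:
        raise ValueError("sequences too divergent for the K2P correction")
    v = -0.5 * math.log(b)                        # transversional substitutions per site
    s = -0.5 * math.log(a) + 0.25 * math.log(b)   # transitional substitutions per site
    return s, v

# Hypothetical observed proportions, for illustration only
s, v = k2p_components(0.10, 0.05)
corrected_ratio = s / v   # compare with the uncorrected ratio 0.10 / 0.05 = 2.0
```

Under this parameterization the ratio of the per-type transition rate to the per-type transversion rate equals 2s/v, since each base has one possible transition but two possible transversions; the two conventions are easy to confuse when comparing published TI:TV values.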

20.
Genomic trees have been constructed based on the presence and absence of families of protein-encoding genes observed in 27 complete genomes, including genomes of 15 free-living organisms. This method does not rely on the identification of suspected orthologs in each genome, nor the specific alignment used to compare gene sequences because the protein-encoding gene families are formed by grouping any protein with a pairwise similarity score greater than a preset value. Because of this all-inclusive grouping, this method is resilient to some effects of lateral gene transfer because transfers of genes are masked when the recipient genome already has a homolog (not necessarily an ortholog) of the incoming gene. Of 71 genes suspected to have been laterally transferred to the genome of Aeropyrum pernix, only approximately 7 to 15 represent genes where a lateral gene transfer appears to have generated homoplasy in our character dataset. The genomic tree of the 15 free-living taxa includes six different bacterial orders, six different archaeal orders, and two different eukaryotic kingdoms. The results are remarkably similar to results obtained by analysis of rRNA. Inclusion of the other 12 genomes resulted in a tree only broadly similar to that suggested by rRNA with at least some of the differences due to artifacts caused by the small genome size of many of these species. Very small genomes, such as those of the two Mycoplasma genomes included, fall to the base of the Bacterial domain, a result expected due to the substantial gene loss inherent to these lineages. Finally, artificial "partial genomes" were generated by randomly selecting ORFs from the complete genomes in order to test our ability to recover the tree generated by the whole genome sequences when only partial data are available. The results indicated that partial genomic data, when sampled randomly, could robustly recover the tree generated by the whole genome sequences. Received: 30 May 2001 / Accepted: 10 October 2001
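
The grouping step described above can be sketched in Python as single-linkage clustering: any two proteins whose pairwise similarity score exceeds a preset threshold are placed in the same family, and each genome is then summarized by the set of families it contains. Reading the abstract's grouping as single linkage is an assumption, as are the function names, the threshold, and the toy scores below.

```python
def build_families(proteins, scores, threshold):
    """Group proteins into families by single linkage on pairwise similarity.

    scores maps (protein_a, protein_b) to a similarity score; two proteins join
    the same family if their score exceeds the threshold, directly or via a chain.
    """
    parent = {p: p for p in proteins}

    def find(p):
        # Union-find lookup with path halving
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for (a, b), score in scores.items():
        if score > threshold:
            parent[find(a)] = find(b)
    return {p: find(p) for p in proteins}

def presence_profiles(genome_of, family_of):
    """Map each genome to the set of gene families present in it."""
    profiles = {}
    for protein, genome in genome_of.items():
        profiles.setdefault(genome, set()).add(family_of[protein])
    return profiles

# Toy data (hypothetical): four proteins from three genomes
genome_of = {"a1": "G1", "a2": "G1", "b1": "G2", "c1": "G3"}
scores = {("a1", "b1"): 90, ("a2", "c1"): 80, ("b1", "c1"): 10}
family_of = build_families(genome_of, scores, threshold=50)
profiles = presence_profiles(genome_of, family_of)
```

Because a genome is scored only on whether a family is present, an incoming gene that joins a family the recipient already has leaves the profile unchanged, which is the masking of lateral transfers that the abstract describes.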
