Found 20 similar articles (search time: 31 ms)
1.
The transition/transversion (ti/tv) rate ratios are estimated by pairwise sequence comparison and joint likelihood analysis
using mitochondrial cytochrome b genes of 28 primate species, representing both the Strepsirrhini (lemurs and lories) and the Anthropoidea (monkeys, apes,
and humans). Pairwise comparison reveals a strong negative correlation between estimates of the ti/tv ratio and the sequence
distance, even when both are corrected for multiple substitutions. The maximum-likelihood estimate of the ti/tv ratio changes
with the species included in the analysis. The ti/tv bias within the lemuriform taxa is found to be as strong as in the anthropoids,
in contradiction to an earlier study which sampled only one lemuriform. Simulations show the surprising result that both the
pairwise correction method and the joint likelihood analysis tend to overcorrect for multiple substitutions and overestimate
the ti/tv ratio, especially at low sequence divergence. The bias, however, is not large enough to account for the observed
patterns. Nucleotide frequency biases, variation of substitution rates among sites, and different evolutionary dynamics at
the three codon positions can be ruled out as possible causes. The likelihood-ratio test suggests that the ti/tv rate ratios
may be variable among evolutionary lineages. Without any biological evidence for such a variation, however, we are left with
no plausible explanations for the observed patterns other than a possible saturation effect due to the unrealistic nature
of the model assumed.
Received: 1 October 1997 / Accepted: 29 September 1998
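The pairwise comparison described in this abstract can be illustrated with a minimal sketch. It assumes Kimura's two-parameter model as the correction for multiple substitutions; the abstract does not name the correction it used, and the function names `count_differences` and `k2p_ti_tv` are hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_differences(seq1, seq2):
    """Return (transitions, transversions, compared_sites) for two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    ti = tv = n = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        n += 1
        if a == b:
            continue
        # A change within purines or within pyrimidines is a transition.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            ti += 1
        else:
            tv += 1
    return ti, tv, n


def k2p_ti_tv(seq1, seq2):
    """Return (ti/tv ratio, distance) under the assumed Kimura two-parameter model."""
    ti, tv, n = count_differences(seq1, seq2)
    if n == 0:
        raise ValueError("no comparable sites")
    p, q = ti / n, tv / n  # observed transition and transversion proportions
    a = 1 - 2 * p - q
    b = 1 - 2 * q
    if a <= 0 or b <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    transitions = -0.5 * math.log(a) + 0.25 * math.log(b)  # per site, corrected
    transversions = -0.5 * math.log(b)                     # per site, corrected
    ratio = transitions / transversions if transversions > 0 else math.inf
    return ratio, transitions + transversions


if __name__ == "__main__":
    ratio, distance = k2p_ti_tv("AAAAAAAAAACCCCCCCCCC", "GAAAAAAAAACCCCCCCCCA")
    print(round(ratio, 4), round(distance, 4))
```

In this toy pair the raw counts are one transition and one transversion, yet the corrected ratio comes out slightly above 1, because the correction weights the two classes of change differently.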
2.
Grishin NV 《Journal of molecular evolution》1999,48(3):264-273
The reliable reconstruction of tree topology from a set of homologous sequences is one of the main goals in the study of
molecular evolution. If consistent estimators of distances from a multiple sequence alignment are known, the distance method
is attractive because the tree reconstruction is consistent. To obtain a distance estimate d, the observed proportion of differences p (p-distance) is usually "corrected" for multiple and back substitutions by means of a functional relationship d = f(p). In this paper the conditions under which this correction of p-distances will not alter the selection of the tree topology are specified. When these conditions are not fulfilled the selection
of the tree topology may depend on the correction function applied. A novel method which includes estimates of distances not
only between sequence pairs, but between triplets, quadruplets, etc., is proposed to strengthen the proper selection of correction
function and tree topology. A "super" tree that includes all tree topologies as special cases is introduced.
Received: 17 February 1998 / Accepted: 20 July 1998
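As a concrete instance of a correction function d = f(p), the sketch below uses the Jukes-Cantor formula. This choice is an assumption made for illustration, since the abstract does not single out a particular f; the function names are hypothetical.

```python
import math


def p_distance(seq1, seq2):
    """Observed proportion of differing sites between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    if not seq1:
        raise ValueError("sequences must not be empty")
    diffs = sum(1 for a, b in zip(seq1, seq2) if a != b)
    return diffs / len(seq1)


def jukes_cantor(p):
    """Assumed example of f: d = -(3/4) * ln(1 - 4p/3), defined for 0 <= p < 0.75."""
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


if __name__ == "__main__":
    p = p_distance("ACGT", "ACGA")
    print(p, round(jukes_cantor(p), 4))
```

Because this f is strictly increasing, it preserves the rank order of pairwise distances, but it stretches large distances more than small ones. This is one way a correction can influence which topology a distance method selects.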
3.
Charles M. Matthews Cassandra J. Vandenberg Clive N.A. Trotman 《Journal of molecular evolution》1998,46(6):729-733
The Artemia hemoglobin is a dimer comprising two nine-domain covalent polymers in quaternary association. Each polymer is encoded by
a gene representing nine successive globin domains which have different sequences and are presumed to have been copied originally
from a single-domain gene. Two different polymers exist as the result of a complete duplication of the nine-domain gene, allowing
the formation of either homodimers or the heterodimer. The total population size of 18 domains comprising nine corresponding
pairs, coupled with the probability that they reflect several hundred million years of evolution in the same lineage, provides
a unique model in which the process of gene multiplication can be analyzed. The outcome has important implications for the
reliability of local molecular clocks.
The two polymers differ from each other at 11.7% of amino acid sites; however, when corresponding individual domains are compared
between polymers, amino acid substitution varies 2.7-fold from lowest to highest. This variation is not
obvious at the DNA level: Domain pair identity values fluctuate by 1.3-fold. Identity values are, however, uncorrected for
multiple substitutions, and both silent and nonsilent changes are pooled. Therefore, to determine the variability in relative
substitution rates at the DNA level, we have used the method of Li (1993, J Mol Evol 36:96–99) to determine estimates of nonsynonymous (K_A)
and synonymous (K_S) substitutions per site for the nine pairs of domains. As expected, the overall level of silent substitutions (K_S
of 56.9%) far exceeded nonsilent substitutions (K_A of 6.7%); however, for corresponding domain pairs, K_A fluctuates by 2.3-fold and K_S
by 1.7-fold. The large discrepancies reflected in the expressed protein have accrued within a single lineage and the implication
is that divergence dates of different genera based on amino acid sequences, even with well-studied proteins of reasonable
size, can be wrong by a factor well in excess of 2.
Received: 4 June 1997 / Accepted: 17 December 1997
4.
Singhania NA Dyer KD Zhang J Deming MS Bonville CA Domachowske JB Rosenberg HF 《Journal of molecular evolution》1999,49(6):721-728
The two eosinophil ribonucleases, eosinophil-derived neurotoxin (EDN/RNase 2) and eosinophil cationic protein (ECP/RNase
3), are among the most rapidly evolving coding sequences known among primates. The eight mouse genes identified as orthologs
of EDN and ECP form a highly divergent, species-limited cluster. We present here the rat ribonuclease cluster, a group of
eight distinct ribonuclease A superfamily genes that are more closely related to one another than they are to their murine
counterparts. The existence of independent gene clusters suggests that numerous duplications and diversification events have
occurred at these loci recently, sometime after the divergence of these two rodent species (∼10–15 million years ago). Nonsynonymous
substitutions per site (d_N) calculated for the 64 mouse/rat gene pairs indicate that these ribonucleases are incorporating nonsilent mutations at accelerated
rates, and comparisons of nonsynonymous to synonymous substitution (d_N/d_S) suggest that diversity in the mouse ribonuclease cluster is promoted by positive (Darwinian) selection. Although the pressures
promoting similar but clearly independent styles of rapid diversification among these primate and rodent genes remain uncertain,
our recent findings regarding the function of human EDN suggest a role for these ribonucleases in antiviral host defense.
Received: 8 April 1999 / Accepted: 22 June 1999
5.
Mitochondrial small-subunit (19S) rDNA sequences were obtained from 10 angiosperms to further characterize sequence divergence
levels and structural variation in this molecule. These sequences were derived from seven holoparasitic (nonphotosynthetic)
angiosperms as well as three photosynthetic plants. 19S rRNA is composed of a conservative core region (ca. 1450 nucleotides)
as well as two variable regions (V1 and V7). In pairwise comparisons of photosynthetic angiosperms to Glycine, the core 19S rDNA sequences differed by less than 1.4%, thus supporting the observation that variation in mitochondrial rDNA
is 3–4 times lower than seen in protein coding and rDNA genes of other subcellular organelles. Sequences representing four
distinct lineages of nonasterid holoparasites showed significantly increased numbers of substitutions in their core 19S rDNA
sequences (2.3–7.6%), thus paralleling previous findings that showed accelerated rates in nuclear (18S) and plastid (16S)
rDNA from the same plants. Relative rate tests confirmed the accelerated nucleotide substitution rates in the holoparasites
whereas rates in nonparasitic plants were not significantly increased. Among comparisons of both parasitic and nonparasitic
plants, transversions outnumbered transitions, in many cases more than two to one. The core 19S rRNA is conserved in sequence
and structure among all nonparasitic angiosperms whereas 19S rRNA from members of holoparasitic Balanophoraceae has unique
extensions to the V5 and V6 variable domains. Substitution and insertion/deletion mutations characterized the V1 and V7 regions
of the nonasterid holoparasites. The V7 sequence of one holoparasite (Scybalium) contained repeat motifs. The cause of substitution rate increases in the holoparasites does not appear to be a result of
RNA editing, hence the underlying molecular mechanism remains to be fully documented.
Received: 18 May 1997 / Accepted: 11 July 1997
6.
Partial sequences of two mitochondrial genes, the 12S ribosomal gene (739 bp) and the cytochrome b gene (672 bp), were analyzed in hopes of reconstructing the evolutionary relationships of 11 leporid species, representative
of seven genera. However, partial cytochrome b sequences were of little phylogenetic value in this study. A suite of pairwise comparisons between taxa revealed that at
the intergeneric level, the cytochrome b gene is saturated at synonymous coding positions due to multiple substitution events. Furthermore, variation at the nonsynonymous
positions is limited, rendering the cytochrome b gene of little phylogenetic value for assessing the relationships between leporid genera. If the cytochrome b data are analyzed without accounting for these two classes of nucleotides (i.e., synonymous and nonsynonymous sites), one
may incorrectly conclude that signal exists in the cytochrome b data. The mitochondrial 12S rRNA gene, on the other hand, has not experienced excessive saturation at either stem or loop
positions. Phylogenies reconstructed from the 12S rDNA data support hypotheses based on fossil evidence that African rock
rabbits (Pronolagus) are outside of the main leporid stock and that leporids experienced a rapid radiation. However, the molecular data suggest
that this radiation event occurred in the mid-Miocene several millions of years earlier than the Pleistocene dates suggested
by paleontological evidence.
Received: 23 April 1998 / Accepted: 14 May 1998
7.
We present a method for estimating the most general reversible substitution matrix corresponding to a given collection of
pairwise aligned DNA sequences. This matrix can then be used to calculate evolutionary distances between pairs of sequences
in the collection. If only two sequences are considered, our method is equivalent to that of Lanave et al. (1984). The main
novelty of our approach is in combining data from different sequence pairs. We describe a weighting method for pairs of taxa
related by a known tree that results in uniform weights for all branches. Our method for estimating the rate matrix results
in fast execution times, even on large data sets, and does not require knowledge of the phylogenetic relationships among sequences.
In a test case on a primate pseudogene, the matrix we arrived at resembles one obtained using maximum likelihood, and the
resulting distance measure is shown to have better linearity than is obtained in a less general model.
8.
Wu MS Tani K Sugiyama H Hibino H Izawa K Tanabe T Nakazaki Y Ishii H Ohashi J Hohjoh H Iseki T Tojo A Nakamura Y Tanioka Y Tokunaga K Asano S 《Journal of molecular evolution》2000,51(3):214-222
A New World monkey, the common marmoset (Callithrix jacchus), will be used as a preclinical animal model to study the feasibility of cell and gene therapy targeting immunological and
hematological disorders. To elucidate the immunogenetic background of the common marmoset for further studies, we examined
polymorphisms of MHC-DRB genes in this species. Twenty-one Caja-DRB exon 2 alleles, including seven new
ones, were detected by means of subcloning and the polymerase chain reaction-single strand conformation polymorphism (PCR-SSCP)
methods followed by nucleotide sequencing. Based on the alignment of these allele sequences, we designed two pairs of specific
primers and established a PCR-SSCP method for DNA-based histocompatibility typing of the common marmoset. According to the
family segregation data and phylogenetic analyses, we presumed that Caja-DRB alleles could be classified into five different
loci. Southern blotting analysis also supported the existence of multiple DRB loci. The patterns of nucleotide substitutions
suggest that positive selection operates in the antigen-recognition sites of Caja-DRB genes.
Received: 18 February 2000 / Accepted: 17 May 2000
9.
Heat Shock Protein 70 Family: Multiple Sequence Comparisons, Function, and Evolution (total citations: 14; self-citations: 0; citations by others: 14)
The heat shock protein 70 kDa sequences (HSP70) are of great importance as molecular chaperones in protein folding and transport.
They are abundant under conditions of cellular stress. They are highly conserved in all domains of life: Archaea, eubacteria,
eukaryotes, and organelles (mitochondria, chloroplasts). A multiple alignment of a large collection of these sequences was
obtained employing our symmetric-iterative ITERALIGN program (Brocchieri and Karlin 1998). Assessments of conservation are
interpreted in evolutionary terms and with respect to functional implications. Many archaeal sequences (methanogens and halophiles)
tend to align best with the Gram-positive sequences. These two groups also miss a signature segment [about 25 amino acids
(aa) long] present in all other HSP70 species (Gupta and Golding 1993). We observed a second signature sequence of about 4
aa absent from all eukaryotic homologues, significantly aligned in all prokaryotic sequences. Consensus sequences were developed
for eight groups [Archaea, Gram-positive, proteobacterial Gram-negative, singular bacteria, mitochondria, plastids, eukaryotic
endoplasmic reticulum (ER) isoforms, eukaryotic cytoplasmic isoforms]. All group consensus comparisons tend to summarize better
the alignments than do the individual sequence comparisons. The global individual consensus "matches" 87% with the consensus
of consensuses sequence. A functional analysis of the global consensus identifies a (new) highly significant mixed charge
cluster proximal to the carboxyl terminus of the sequence highlighting the hypercharge run EEDKKRRER (one-letter aa code used).
The individual Archaea and Gram-positive sequences contain a corresponding significant mixed charge cluster in the location
of the charge cluster of the consensus sequence. In contrast, the four Gram-negative proteobacterial sequences of the alignment
do not have a charge cluster (even at the 5% significance level). All eukaryotic HSP70 sequences have the analogous charge
cluster. Strikingly, several of the eukaryotic isoforms show multiple mixed charged clusters. These clusters were interpreted
with supporting data related to HSP70 activity in facilitating chaperone, transport, and secretion function. We observed that
the consensus contains only a single tryptophan residue and a single conserved cysteine. This is interpreted with respect
to the target rule for disaggregating misfolded proteins. The mitochondrial HSP70 connections to bacterial HSP70 are analyzed,
suggesting a polyphyletic split of Trypanosoma and Leishmania protist mitochondrial (Mt) homologues separated from Mt-animal/fungal/plant homologues. Moreover, the HSP70 sequences from
the amitochondrial Entamoeba histolytica and Trichomonas vaginalis species were analyzed. The E. histolytica HSP70 is most similar to the higher eukaryotic cytoplasmic sequences, with significantly weaker alignments to ER sequences
and much diminished matching to all eubacterial, mitochondrial, and chloroplast sequences. This appears to be at variance
with the hypothesis that E. histolytica rather recently lost its mitochondrial organelle. T. vaginalis contains two HSP70 sequences, one Mt-like and the second similar to eukaryotic cytoplasmic sequences suggesting two diverse
origins.
Received: 29 January 1998 / Accepted: 14 May 1998
10.
We conducted comprehensive sequence analysis of 5′ flanking regions of primate Alu elements. Information contents were computed and frequencies of 1024 pentanucleotides were measured to approximate the location
of a characteristic sequence and to specify its pattern(s), which may be involved in the integration of Alu elements into their host genomes. A large number of samples was used, the wide region of the 5′ end of Alu elements was analyzed, and comparisons were made among different subfamilies. Through our analyses, "TTTTAAAAA" or "(T)_m(A)_n"
can be stated as a candidate for the characteristic sequence pattern, which resides around the region 5 to 20 base pairs
upstream of the 5′ end of Alu elements. This characteristic sequence pattern was more prominent in the sequences of younger Alus, which is a strong indication that the sequence pattern has a role at the time of Alu integration.
Received: 10 May 1999 / Accepted: 1 October 1999
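The two measurements named in this abstract, pentanucleotide frequencies (4^5 = 1024 possible words) and per-position information content, can be sketched as follows. This is a sketch under assumptions and not the authors' procedure: it counts overlapping words, assumes a uniform background of 2 bits per position, and uses the hypothetical names `pentanucleotide_counts` and `information_content`.

```python
import math
from collections import Counter
from itertools import product


def pentanucleotide_counts(sequences, k=5):
    """Count overlapping k-mers; every one of the 4**k words is present, possibly at 0."""
    counts = Counter({"".join(word): 0 for word in product("ACGT", repeat=k)})
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if kmer in counts:  # ignores words containing gaps or ambiguity codes
                counts[kmer] += 1
    return counts


def information_content(column):
    """Information (bits) of one alignment column, assuming a uniform background."""
    bases = [b for b in column.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    total = len(bases)
    entropy = -sum((c / total) * math.log2(c / total) for c in Counter(bases).values())
    return 2.0 - entropy


if __name__ == "__main__":
    counts = pentanucleotide_counts(["TTTTAAAAA"])
    print(len(counts), counts["TTTTA"], information_content("AAAA"))
```

A fully conserved column scores 2 bits and a column with all four bases in equal proportion scores 0, so peaks in this measure upstream of the element would mark candidate positions for a characteristic pattern.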
11.
Microsatellite length variation was investigated at a highly variable microsatellite locus in four species of Apodemus. Information obtained from microsatellite allele sequences was contrasted with allele sizes, which included 18 electromorphs.
Additional analysis of a 400-bp unique sequence in the flanking region identified 26 different haplotype sequences or "true"
alleles in the sample. Three molecular mechanisms, namely, (1) addition/deletion of repeats, (2) substitutions and indels
in the flanking region, and (3) mutations interrupting the repeat, contributed to the generation of allelic variation. Size
homoplasy can be inferred for alleles within populations, from different populations of the same species, and from different
species. We propose that microsatellite flanking sequences may be informative markers for investigating mutation processes
in microsatellite repeats as well as phylogenetic relationships among alleles, populations, and species.
Received: 3 November 1999 / Accepted: 2 May 2000
12.
Thomas A. Gorr Barbara K. Mable Traute Kleinschmidt 《Journal of molecular evolution》1998,47(4):471-485
Phylogenetic relationships among reptiles were examined using previously published and newly determined hemoglobin sequences.
Trees reconstructed from these sequences using maximum-parsimony, neighbor-joining, and maximum-likelihood algorithms were
compared with a phylogenetic tree of Amniota, which was assembled on the basis of published morphological data. All analyses differentiated α chains into αA and αD types, which are present in all reptiles except crocodiles, where only αA chains are expressed. The occurrence of the αD chain in squamates (lizards and snakes only in this study) appears to be a general characteristic of these species. Lizards
and snakes also express two types of β chains (βI and βII), while only one type of β chain is present in birds and crocodiles.
Reconstructed hemoglobin trees for both α and β sequences did not yield the monophyletic Archosauria (i.e., crocodilians + birds) and Lepidosauria (i.e., Sphenodon+ squamates) groups defined by the morphology tree. This discrepancy, as well as some other poorly resolved nodes, might be
due to substantial heterogeneity in evolutionary rates among single hemoglobin lineages. Estimation of branch lengths based
on uncorrected amino acid substitutions and on distances corrected for multiple substitutions (PAM distances) revealed that
relative rates for squamate αA and αD chains and crocodilian β chains are at least twice as high as those of the rest of the chains considered. In contrast to
these rate inequalities between reptilian orders, little variation was found within squamates, which allowed determination
of absolute evolutionary rates for this subset of hemoglobins. Rate estimates for hemoglobins of lizards and snakes yielded
1.7 (αA) and 3.3 (β) million years/PAM when calibrated with published divergence time vs. PAM distance correlates for several speciation
events within snakes and for the squamate ↔ sphenodontid split. This suggests that hemoglobin chains of squamate reptiles
evolved ∼3.5 (αA) or ∼1.7 times (β) faster than their mammalian equivalents. These data also were used to obtain a first estimate of some
intrasquamate divergence times.
Received: 15 September 1997 / Accepted: 4 February 1998
13.
When divergence between viral species is large, the analysis and comparison of nucleotide or protein sequences are dependent
on mutation biases and multiple substitutions per site leading, among other things, to the underestimation of branch lengths
in phylogenetic trees. To avoid the problem of multiply substituted sites, a method not directly based on the nucleic or protein
sequences has been applied to retroviruses. It consisted of asking questions about genome structure or organization, and gene
function, the series of answers creating coded sequences analyzed by phylogenetic software. This method recovered the principal
retroviral groups such as the lentiviruses and spumaviruses and highlighted questions and answers characteristic of each group
of retroviruses. In general, there was reasonable concordance between the coded genome methodology and that based on conventional
phylogeny of the integrase protein sequence, indicating that integrase was fixing mutations slowly enough to marginalize the
problem of multiple substitutions at sites. To a first approximation, this suggests that the acquisition of novel genetic
features generally parallels the fixation of amino acid substitutions.
Received: 18 May 2001 / Accepted: 7 September 2001
14.
Tagir Kh. Samigullin William F. Martin Aleksey V. Troitsky Andrey S. Antonov 《Journal of molecular evolution》1999,49(3):310-315
Partial sequences of the rpoC1 gene from two species of angiosperms and three species of gymnosperms (8330 base pairs) were determined and compared. The
data obtained support the hypothesis that angiosperms and gymnosperms are monophyletic and none of the recent groups of the
latter is sister to angiosperms.
Received: 20 November 1998 / Accepted: 26 April 1999
15.
The Molecular Evolution of the Vertebrate Trypsinogens (total citations: 1; self-citations: 0; citations by others: 1)
We expand the already large number of known trypsinogen nucleotide and amino acid sequences by presenting additional trypsinogen
sequences from the tunicate (Boltenia villosa), the lamprey (Petromyzon marinus), the pufferfish (Fugu rubripes), and the frog (Xenopus laevis). The current array of known trypsinogen sequences now spans the entire vertebrate phylogeny. Phylogenetic analysis is made
difficult by the presence of multiple isozymes within species and rates of evolution that vary highly between both species
and isozymes. We nevertheless present a Fitch-Margoliash phylogeny constructed from pairwise distances. We employ this phylogeny
as a vehicle for speculation on the evolution of the trypsinogen gene family as well as the general modes of evolution of
multigene families. Unique attributes of the lamprey and tunicate trypsinogens are noted.
Received: 12 July 1997
16.
Steven T. Case Carol Cox Walter C. Bell Rosemary T. Hoffman Jon Martin Robert Hamilton 《Journal of molecular evolution》1997,44(4):452-462
Aquatic larvae of the midge, Chironomus tentans, synthesize a 185-kDa silk protein (sp185) with the cysteine-containing motif Cys-X-Cys-X-Cys (where X is any residue) every
20–28 residues. We report here the cloning and full-length sequence of cDNAs encoding homologous silk proteins from Chironomus pallidivittatus (sp185) and Chironomus thummi (sp220). Deduced amino acid sequences reveal proteins of nearly identical mass composed of 72 blocks of 20–28 residues, 61%
of which can be described by the motif X5–8-Cys-X5-(Trp/Phe/Tyr)-X4-Cys-X-Cys-X-Cys. Spatial arrangement of these residues is preserved more than surrounding sequences. cDNA clones enabled
us to map the genes on polytene chromosomes and identify for the first time the homolog of the Camptochironomus Balbiani ring 3 locus in Chironomus thummi. The apparent molecular weight difference between these proteins (185 vs 220 kDa) is not attributable to primary structure
and may be due to differential N-linked glycosylation. DNA distances and codon substitutions indicate that the C. tentans and C. pallidivittatus genes are more related to each other than either is to C. thummi; however, substitution rates for the 5′- and 3′-halves of these genes are different. Blockwise sequence comparisons suggest
intragenic variation in that some regions evolved slower or faster than the mean and may have been subjected to different
selective pressures.
Received: 30 August 1996 / Accepted: 6 November 1996
17.
Masato Nikaido Kuniko Kawai Yin Cao Masashi Harada Satoru Tomita Norihiro Okada Masami Hasegawa 《Journal of molecular evolution》2001,53(4-5):508-516
The complete mitochondrial genomes of two microbats, the horseshoe bat Rhinolophus pumilus, and the Japanese pipistrelle Pipistrellus abramus, and that of an insectivore, the long-clawed shrew Sorex unguiculatus, were sequenced and analyzed phylogenetically by a maximum likelihood method in an effort to enhance our understanding of
mammalian evolution. Our analysis suggested that (1) a sister relationship exists between moles and shrews, which form a
eulipotyphlan clade; (2) chiropterans have a sister-relationship with eulipotyphlans; and (3) the Eulipotyphla/Chiroptera
clade is closely related to fereuungulates (Cetartiodactyla, Perissodactyla and Carnivora). Divergence times on the mammalian
tree were estimated from consideration of a relaxed molecular clock, the amino acid sequences of 12 concatenated mitochondrial
proteins and multiple reference criteria. Moles and shrews were estimated to have diverged approximately 48 MyrBP, and bats
and eulipotyphlans to have diverged 68 MyrBP. Recent phylogenetic controversy over the polyphyly of microbats, the monophyly
of rodents, and the position of hedgehogs is also examined.
Received: 21 December 2000 / Accepted: 16 February 2001
18.
S.I. Ortiz-Miranda J.A. Lasalde P.A. Pappone M.G. McNamee 《The Journal of membrane biology》1997,158(1):17-30
We studied the functional effects of single amino acid substitutions in the postulated M4 transmembrane domains of Torpedo californica nicotinic acetylcholine receptors (nAChRs) expressed in Xenopus oocytes at the single-channel level. At low ACh concentrations and cold temperatures, the replacement of wild-type α418Cys
residues with the large, hydrophobic amino acids tryptophan or phenylalanine increased mean open times 26-fold and 3-fold,
respectively. The mutation of a homologous cysteine in the β subunit (β447Trp) had similar but smaller effects on mean open
time. Coexpression of α418Trp and β447Trp had the largest effect on channel open time, increasing mean open time 58-fold.
No changes in conductance or ion selectivity were detected for any of the single subunit amino acid substitutions tested.
However, the coexpression of the α418Trp and β447Trp mutated subunits also produced channels with at least two additional
conductance levels. Block by acetylcholine was apparent in the current records from α418Trp mutants. Burst analysis of the
α418Trp mutations showed an increase in the channel open probability, due to a decrease in the apparent channel closing rate
and a probable increase in the effective opening rate. Our results show that modifications in the primary structure of the
α- and β subunit M4 domain, which are postulated to be at the lipid-protein interface, can significantly alter channel gating,
and that mutations in multiple subunits act additively to increase channel open time.
Received: 27 September 1996 / Revised: 28 January 1997
19.
A comparison of methods for estimating the transition:transversion ratio from DNA sequences (total citations: 1; self-citations: 0; citations by others: 1)
Estimation of the ratio of the rates of transitions to transversions (TI:TV ratio) for a collection of aligned nucleotide sequences is important because it provides insight into the process of molecular evolution and because such estimates may be used to further model the evolutionary process for the sequences under consideration. In this paper, we compare several methods for estimating the TI:TV ratio, including the pairwise method [TREE 11 (1996) 158], a modification of the pairwise method due to Ina [J. Mol. Evol. 46 (1998) 521], a method based on parsimony [TREE 11 (1996) 158], a method due to Purvis and Bromham [J. Mol. Evol. 44 (1997) 112] that uses phylogenetically independent pairs of sequences, the maximum likelihood method, and a Bayesian method [Bioinformatics 17 (2001) 754]. We examine the performance of each estimator under several conditions using both simulated and real data.
20.
Genomic trees have been constructed based on the presence and absence of families of protein-encoding genes observed in 27
complete genomes, including genomes of 15 free-living organisms. This method does not rely on the identification of suspected
orthologs in each genome, nor the specific alignment used to compare gene sequences because the protein-encoding gene families
are formed by grouping any protein with a pairwise similarity score greater than a preset value. Because of this all inclusive
grouping, this method is resilient to some effects of lateral gene transfer because transfers of genes are masked when the
recipient genome already has a homolog (not necessarily an ortholog) of the incoming gene. Of 71 genes suspected to have been
laterally transferred to the genome of Aeropyrum pernix, only approximately 7 to 15 represent genes where a lateral gene transfer appears to have generated homoplasy in our character
dataset. The genomic tree of the 15 free-living taxa includes six different bacterial orders, six different archaeal orders,
and two different eukaryotic kingdoms. The results are remarkably similar to results obtained by analysis of rRNA. Inclusion
of the other 12 genomes resulted in a tree only broadly similar to that suggested by rRNA with at least some of the differences
due to artifacts caused by the small genome size of many of these species. Very small genomes, such as those of the two Mycoplasma genomes included, fall to the base of the Bacterial domain, a result expected due to the substantial gene loss inherent to
these lineages. Finally, artificial "partial genomes" were generated by randomly selecting ORFs from the complete genomes
in order to test our ability to recover the tree generated by the whole genome sequences when only partial data are available.
The results indicated that partial genomic data, when sampled randomly, could robustly recover the tree generated by the whole
genome sequences.
Received: 30 May 2001 / Accepted: 10 October 2001
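The grouping step described in this abstract, in which any two proteins whose pairwise similarity exceeds a preset value fall into one family, can be sketched as single-linkage clustering followed by a presence/absence comparison of genomes. The Jaccard distance used here is an assumption, since the abstract does not state how presence/absence profiles were turned into a tree. All function names and the toy scores are hypothetical.

```python
def build_families(proteins, similarities, threshold):
    """Single-linkage grouping: join two proteins whenever their score exceeds threshold."""
    parent = {p: p for p in proteins}

    def find(x):
        # Follow parent links to the family representative, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in similarities.items():
        if score > threshold:
            parent[find(a)] = find(b)
    return {p: find(p) for p in proteins}


def presence_profiles(protein_to_genome, family_of):
    """Map each genome to the set of gene families present in it."""
    profiles = {}
    for protein, genome in protein_to_genome.items():
        profiles.setdefault(genome, set()).add(family_of[protein])
    return profiles


def jaccard_distance(fams_a, fams_b):
    """Assumed distance: 1 - shared families / all families found in either genome."""
    union = fams_a | fams_b
    if not union:
        return 0.0
    return 1.0 - len(fams_a & fams_b) / len(union)


if __name__ == "__main__":
    protein_to_genome = {"a1": "g1", "a2": "g1", "b1": "g2", "c1": "g3"}
    scores = {("a1", "b1"): 90, ("a2", "c1"): 20, ("a1", "a2"): 10}  # toy values
    families = build_families(protein_to_genome, scores, threshold=50)
    profiles = presence_profiles(protein_to_genome, families)
    print(jaccard_distance(profiles["g1"], profiles["g2"]))
```

The resulting distance matrix could be passed to any distance-based tree method. Because a family counts as present regardless of how many members a genome carries, a transferred gene that lands in a genome already holding a homolog leaves the profile unchanged, which is the masking effect the abstract describes.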