Similar articles
Found 20 similar articles; search took 31 ms
1.
A heuristic approach to search for the maximum-likelihood (ML) phylogenetic tree based on a genetic algorithm (GA) has been developed. It outputs the best tree as well as multiple alternative trees that are not significantly worse than the best one on the basis of the likelihood criterion. These near-optimum trees are subjected to further statistical tests. This approach enables one to infer phylogenetic trees of over 20 taxa, taking account of the rate heterogeneity among sites, on practical time scales on a PC cluster. Computer simulations were conducted to compare the efficiency of the present approach with that of several likelihood-based methods and distance-based methods, using amino acid sequence data of a relatively large number (5–24) of taxa. The superiority of the ML method over distance-based methods increases as the conditions of the simulations become more realistic (an incorrect model is assumed or many taxa are involved). This approach was applied to the inference of the universal tree based on the concatenated amino acid sequences of vertically descendent genes that are shared among all genomes whose complete sequences have been reported. The inferred tree strongly supports the view that Archaea is paraphyletic and that Eukarya is specifically related to Crenarchaeota. Apart from the paraphyly of Archaea and some minor disagreements, the universal tree based on these genes is largely consistent with the universal tree based on SSU rRNA. Received: 4 January 2001 / Accepted: 16 May 2001

2.
Mitochondrial DNA (mtDNA) sequences are widely used for inferring the phylogenetic relationships among species. Clearly, the assumed model of nucleotide or amino acid substitution used should be as realistic as possible. Dependence among neighboring nucleotides in a codon complicates modeling of nucleotide substitutions in protein-encoding genes. It seems preferable to model amino acid substitution rather than nucleotide substitution. Therefore, we present a transition probability matrix of the general reversible Markov model of amino acid substitution for mtDNA-encoded proteins. The matrix is estimated by the maximum likelihood (ML) method from the complete sequence data of mtDNA from 20 vertebrate species. This matrix represents the substitution pattern of the mtDNA-encoded proteins and shows some differences from the matrix estimated from the nuclear-encoded proteins. The use of this matrix would be recommended in inferring trees from mtDNA-encoded protein sequences by the ML method. Received: 3 May 1995 / Accepted: 31 October 1995
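
The "general reversible Markov model" mentioned above can be illustrated with a minimal sketch. It assumes the common parameterization in which each off-diagonal rate is a symmetric exchangeability times the equilibrium frequency of the target state, q_ij = s_ij * pi_j, which guarantees detailed balance (pi_i * q_ij = pi_j * q_ji). The three-letter alphabet, the frequencies, the exchangeabilities, and the function names are hypothetical toy values chosen for illustration; they are not the matrix estimated in the abstract, which covers 20 amino acids.

```python
from itertools import combinations


def build_reversible_rate_matrix(freqs, exch):
    """Build a rate matrix Q with q_ij = s_ij * pi_j for i != j.

    freqs: dict mapping state -> equilibrium frequency (pi).
    exch:  dict mapping a sorted state pair (i, j) -> symmetric exchangeability s_ij.
    The diagonal is set so that every row sums to zero.
    """
    states = sorted(freqs)
    q = {i: {j: 0.0 for j in states} for i in states}
    for i, j in combinations(states, 2):
        s = exch[(i, j)]
        q[i][j] = s * freqs[j]
        q[j][i] = s * freqs[i]
    for i in states:
        q[i][i] = -sum(q[i][j] for j in states if j != i)
    return q


def is_reversible(q, freqs, tol=1e-12):
    """Check detailed balance: pi_i * q_ij == pi_j * q_ji for all pairs."""
    return all(
        abs(freqs[i] * q[i][j] - freqs[j] * q[j][i]) <= tol
        for i in q
        for j in q
    )


# Hypothetical toy alphabet and parameters (not taken from the abstract).
FREQS = {"A": 0.5, "B": 0.3, "C": 0.2}
EXCH = {("A", "B"): 1.0, ("A", "C"): 2.0, ("B", "C"): 0.5}
Q = build_reversible_rate_matrix(FREQS, EXCH)
```

Transition probabilities over a branch of length t would then follow from the matrix exponential of Q * t, which is omitted here to keep the sketch free of third-party dependencies.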

3.
The phylogenetic position of hagfishes in vertebrate evolution is currently controversial. The 18S and 28S rRNA trees support the monophyly of hagfishes and lampreys. In contrast, the mitochondrial DNAs suggest the close association of lampreys and gnathostomes. To clarify this controversial issue, we have cloned and sequenced four nuclear DNA–coded single-copy genes encoding the triose phosphate isomerase, calreticulin, and the largest subunit of RNA polymerase II and III. Based on these proteins, together with the Mn superoxide dismutase for which hagfish and lamprey sequences are available in databases, phylogenetic trees have been inferred by the maximum likelihood (ML) method of protein phylogeny. It was shown that all five proteins prefer the monophyletic tree of cyclostomes, and the total log-likelihood of the five proteins significantly supports the cyclostome monophyly at the level of ±1 SE. The ML trees of the aldolase family comprising three nonallelic isoforms and the complement component group comprising C3, C4, and C5, both of which diverged during vertebrate evolution by gene duplications, also suggest the cyclostome monophyly. Received: 28 April 1999 / Accepted: 30 June 1999

4.
The phylogenetic placement of the Aquifex and Thermotoga lineages has been inferred from (i) the concatenated ribosomal proteins S10, L3, L4, L23, L2, S19, L22, and S3 encoded in the S10 operon (833 aa positions); (ii) the joint sequences of the elongation factors Tu(1α) and G(2) coded by the str operon tuf and fus genes (733 aa positions); and (iii) the joint RNA polymerase β- and β′-type subunits encoded in the rpoBC operon (1130 aa positions). Phylogenies of r-protein and EF sequences support with moderate (r-proteins) to high statistical confidence (EFs) the placement of the two hyperthermophiles at the base of the bacterial clade in agreement with phylogenies of rRNA sequences. In the more robust EF-based phylogenies, the branching of Aquifex and Thermotoga below the successive bacterial lineages is given at bootstrap proportions of 82% (maximum likelihood; ML) and 85% (maximum parsimony; MP), in contrast to the trees inferred from the separate EF-Tu(1α) and EF-G(2) data sets, which lack both resolution and statistical robustness. In the EF analysis MP outperforms ML in discriminating (at the 0.05 level) trees having A. pyrophilus and T. maritima as the most basal lineages from competing alternatives that have (i) mesophiles, or the Thermus genus, as the deepest bacterial radiation and (ii) a monophyletic A. pyrophilus–T. maritima cluster situated at the base of the bacterial clade. RNAP-based phylogenies are equivocal with respect to the Aquifex and Thermotoga placements. The two hyperthermophiles fall basal to all other bacterial phyla when potential artifacts contributed by the compositionally biased and fast-evolving Mycoplasma genitalium and Mycoplasma pneumoniae sequences are eschewed. However, the branching order of the phyla is tenuously supported in ML trees inferred by the exhaustive search method and is unresolved in ML trees inferred by the quartet puzzling algorithm. A rooting of the RNA polymerase-subunit tree at the mycoplasma level seen in both the MP trees and the ML trees reconstructed with suboptimal amino acid substitution models is not supported by the EF-based phylogenies, which robustly affiliate mycoplasmas with low-G+C gram-positives, and most probably reflects a "long branch attraction" artifact. Received: 22 September 1999 / Accepted: 11 January 2000

5.
The photolyase–blue-light photoreceptor family is composed of cyclobutane pyrimidine dimer (CPD) photolyases, (6-4) photolyases, and blue-light photoreceptors. CPD photolyase and (6-4) photolyase are involved in photoreactivation for CPD and (6-4) photoproducts, respectively. CPD photolyase is classified into two subclasses, class I and II, based on amino acid sequence similarity. Blue-light photoreceptors are essential light detectors for the early development of plants. The amino acid sequence of the receptor is similar to those of the photolyases, although the receptor does not show the activity of photoreactivation. To investigate the functional divergence of the family, the amino acid sequences of the proteins were aligned. The alignment suggested that the recognition mechanisms of the cofactors and the substrate of class I CPD photolyases (class I photolyases) are different from those of class II CPD photolyases (class II photolyases). We reconstructed the phylogenetic trees based on the alignment by the NJ method and the ML method. The phylogenetic analysis suggested that the ancestral gene of the family had encoded CPD photolyase and that the gene duplication of the ancestral proteins had occurred at least eight times before the divergence between eubacteria and eukaryotes. Received: 23 October 1996 / Accepted: 1 April 1997

6.
Mammalian secretory ribonucleases (RNases 1) form a family of extensively studied homologous proteins that have previously been used for phylogenetic analyses at the protein sequence level. In this paper we report the determination of six ribonuclease gene sequences of Artiodactyla and two of Cetacea. These sequences have been used with ruminant homologues in phylogenetic analyses that supported a group including hippopotamus and toothed whales, a group of ruminant pancreatic and brain-type ribonucleases, and a group of tylopod sequences containing the Arabian camel pancreatic ribonuclease gene and Arabian and Bactrian camel and alpaca RNase 1 genes of unknown function. In all analyses the pig was the first diverging artiodactyl. This DNA-based tree is compatible with published trees derived from a number of other genes. The differences from the trees obtained with ribonuclease protein sequences can be explained by the influence of convergence of pancreatic RNases from hippopotamus, camel, and ruminants and by taking into account the information from third codon positions in the DNA-based analyses. The evolution of sequence features of ribonucleases such as the distribution of positively charged amino acids and of potential glycosylation sites is described with regard to increased double-stranded RNA cleavage that is observed in several cetacean and artiodactyl RNases which may have no role in ruminant or ruminant-like digestion. Received: 2 June 1998 / Accepted: 31 August 1998

7.
A Laminaria saccharina genomic library in the phage EMBL 4 was used to isolate and sequence a full-length gene encoding a fucoxanthin-chlorophyll a/c-binding protein. Contrary to diatom homologues, the coding sequence is interrupted by an intron of about 900 bp which is located in the middle of the transit peptide. The deduced amino acid sequence of the mature protein is very similar to those of related proteins from Macrocystis pyrifera (Laminariales) and, to a lesser extent, to those from diatoms and Chrysophyceae. Seven of the eight putative chlorophyll-binding amino acids determined in green plants are also present. Alignments of different sequences related to the light-harvesting proteins (LHC) demonstrate a structural similarity among the three transmembrane helices and suggest a unique ancestral helix preceded by two β-turns. The β-turns are conserved in front of the second helices of the chlorophyll a/c proteins more so than in chlorophyll a/b proteins. Phylogenetic trees generated from sequence data indicate that fucoxanthin-chlorophyll-binding proteins diverged prior to the separation of photosystem I and photosystem II LHC genes of green plants. Among the fucoxanthin-containing algae, LHC I or II families could not be distinguished at this time. Received: 14 February 1996 / Accepted: 4 April 1996

8.
Complete sequences of seven protein-coding genes from Penaeus notialis mitochondrial DNA were compared in base composition and codon usage with homologous genes from Artemia franciscana and four insects. The crustacean genes are significantly less A + T-rich than their counterparts in insects, and the pattern of codon usage (ratio of G + C-rich versus A + T-rich codons) is less biased. A phylogenetic analysis using amino acid sequences of the seven corresponding polypeptides supports a sister-taxon status for mollusks–annelids and arthropods. Furthermore, a distance matrix-based tree and two most-parsimonious trees both suggest that crustaceans are paraphyletic with respect to insects. This is also supported by the inclusion of Panulirus argus COII (complete) and COI and COIII (partial) sequence data. From analysis of single and combined genes to infer phylogenies, it is observed that the topologies obtained from single genes are not well supported in most cases and notably differ from that of the tree based on all seven genes. Received: 25 August 1998 / Accepted: 8 March 1999

9.
The codon-degeneracy model (CDM) predicts relative frequencies of substitution for any set of homologous protein-coding DNA sequences based on patterns of nucleotide degeneracy, codon composition, and the assumption of selective neutrality. However, at present, the CDM is reliant on outside estimates of transition bias. A new method by which the power of the CDM can be used to find a synonymous transition bias that is optimal for any given phylogenetic tree topology is presented. An example is illustrated that utilizes optimized transition biases to generate CDM GF-scores for every possible phylogenetic tree for pocket gophers of the genus Orthogeomys. The resulting distribution of CDM GF-scores is compared and contrasted with the results of maximum parsimony and maximum likelihood methods. Although convergence on a single tree topology by the CDM and another method indicates greater support for that particular tree, the value of CDM GF-score as the sole optimality criterion for phylogeny reconstruction remains to be determined. It is clear, however, that the a priori estimation of an optimum transition bias from codon composition has a direct application to differentiating between alternative trees. Received: 13 October 1999 / Accepted: 28 April 2000

10.
Branch length estimates play a central role in maximum-likelihood (ML) and minimum-evolution (ME) methods of phylogenetic inference. For various reasons, branch length estimates are not statistically independent under ML or ME. We studied the response of correlations among branch length estimates to the degree of among-branch length heterogeneity (BLH) in the model (true) tree. The frequency and magnitude of (especially negative) correlations among branch length estimates were both shown to increase as BLH increases under simulation and analytically. For ML, we used the correct model (Jukes–Cantor). For ME, we employed ordinary least-squares (OLS) branch lengths estimated under both simple p-distances and Jukes–Cantor distances, analyzed with and without an among-site rate heterogeneity parameter. The efficiency of ME and ML was also shown to decrease in response to increased BLH. We note that the shape of the true tree will in part determine BLH and represents a critical factor in the probability of recovering the correct topology. An important finding suggests that researchers cannot expect that different branches that were in fact the same length will have the same probability of being accurately reconstructed when BLH exists in the overall tree. We conclude that methods designed to minimize the interdependencies of branch length estimates (BLEs) may (1) reduce both the variance and the covariance associated with the estimates and (2) increase the efficiency of model-based optimality criteria. We speculate on possible ways to reduce the nonindependence of BLEs under OLS and ML. Received: 9 March 1999 / Accepted: 4 May 1999
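
The two distance measures named above can be sketched in a few lines. The p-distance is the proportion of differing sites, and the Jukes–Cantor correction converts it to an expected number of substitutions per site with the standard formula d = -(3/4) ln(1 - 4p/3). The function names and the choice to skip sites that are not A, C, G, or T are assumptions made for this illustration; the sketch does not reproduce the least-squares branch length estimation or the rate heterogeneity parameter used in the study.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of compared sites at which two aligned sequences differ.

    Sites where either sequence has a character other than A, C, G, or T
    (gaps, ambiguity codes) are skipped; this handling is an assumption.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor_distance(p):
    """Jukes-Cantor corrected distance for an observed p-distance."""
    if not 0 <= p < 0.75:
        raise ValueError("the correction is only defined for 0 <= p < 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)
```

For example, `jukes_cantor_distance(p_distance("ACGTACGT", "ACGTACGA"))` corrects an observed difference of one site in eight; the corrected value is always at least as large as the raw p-distance because it accounts for multiple substitutions at the same site.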

11.
We suggest a nucleotide substitution model that takes correlation between base-paired nucleotides into account. The model includes the estimation of the transition–transversion ratio and allows inference of the shape parameter of a discrete gamma distribution to include rate heterogeneity. A Cox-test statistic, applied to a diatom ribosomal RNA alignment, shows that the suggested correlation model explains evolution of the stem region better than usual independence models. Moreover, the Cox-test procedure is extended to shed some light upon the problem of assigning helical regions in a secondary structure based alignment. This approach provides an estimate of the percentage of stem positions that do not appear to be correlated. Received: 4 March 1999 / Accepted: 10 May 1999

12.
The Rooting of the Universal Tree of Life Is Not Reliable   (Total citations: 19; self-citations: 0; citations by others: 19)
Several composite universal trees connected by an ancestral gene duplication have been used to root the universal tree of life. In all cases, this root turned out to be in the eubacterial branch. However, the validity of results obtained from comparative sequence analysis has recently been questioned, in particular, in the case of ancient phylogenies. For example, it has been shown that several eukaryotic groups are misplaced in ribosomal RNA or elongation factor trees because of unequal rates of evolution and mutational saturation. Furthermore, the addition of new sequences to data sets has often turned apparently reasonable phylogenies into confused ones. We have thus revisited all composite protein trees that have been used to root the universal tree of life up to now (elongation factors, ATPases, tRNA synthetases, carbamoyl phosphate synthetases, signal recognition particle proteins) with updated data sets. In general, the two prokaryotic domains were not monophyletic with several aberrant groupings at different levels of the tree. Furthermore, the respective phylogenies contradicted each other, so that various ad hoc scenarios (paralogy or lateral gene transfer) must be proposed in order to obtain the traditional Archaebacteria–Eukaryota sisterhood. More importantly, all of the markers are heavily saturated with respect to amino acid substitutions. As phylogenies inferred from saturated data sets are extremely sensitive to differences in evolutionary rates, present phylogenies used to root the universal tree of life could be biased by the phenomenon of long branch attraction. Since the eubacterial branch was always the longest one, the eubacterial rooting could be explained by an attraction between this branch and the long branch of the outgroup. Finally, we suggest that a eukaryotic rooting could be a more fruitful working hypothesis, as it provides, for example, a simple explanation for the high genetic similarity of Archaebacteria and Eubacteria inferred from complete genome analysis.

13.
14.
We previously found that proteinaceous protease inhibitors homologous to Streptomyces subtilisin inhibitor (SSI) are widely produced by various Streptomyces species, and we designated them "SSI-like proteins" (Taguchi S, Kikuchi H, Suzuki M, Kojima S, Terabe M, Miura K, Nakase T, Momose H [1993] Appl Environ Microbiol 59:4338–4341). In this study, SSI-like proteins from five strains of the genus Streptoverticillium were purified and sequenced, and molecular phylogenetic trees were constructed on the basis of the determined amino acid sequences together with those determined previously for Streptomyces species. The phylogenetic trees showed that SSI-like proteins from Streptoverticillium species are phylogenetically included in Streptomyces SSI-like proteins but form a monophyletic group as a distinct lineage within the Streptomyces proteins. This provides an alternative phylogenetic framework to the previous one based on partial small ribosomal RNA sequences, and it may indicate that the phylogenetic affiliation of the genus Streptoverticillium should be revised. The phylogenetic trees also suggested that SSI-like proteins possessing arginine or methionine at the P1 site, the major reactive center site toward target proteases, arose multiple times on independent lineages from ancestral proteins possessing lysine at the P1 site. Most of the codon changes at the P1 site inferred to have occurred during the evolution of SSI-like proteins are consistent with those inferred from the extremely high G + C content of Streptomyces genomes. The inferred minimum number of amino acid replacements at the P1 site was nearly equal to the average number for all the variable sites. It thus appears that positive Darwinian selection, which has been postulated to account for accelerated rates of amino acid replacement at the major reaction center site of mammalian protease inhibitors, may not have dictated the evolution of the bacterial SSI-like proteins. Received: 23 August 1996 / Accepted: 20 November 1996

15.
To understand the process and mechanism of protein evolution, it is important to know what types of amino acid substitutions are more likely to be under selection and what types are mostly neutral. An amino acid substitution can be classified as either conservative or radical, depending on whether it involves a change in a certain physicochemical property of the amino acid. Assuming Kimura's two-parameter model of nucleotide substitution, I present a method for computing the numbers of conservative and radical nonsynonymous (amino acid altering) nucleotide substitutions per site and estimate these rates for 47 nuclear genes from mammals. The results are as follows. (1) The average radical/conservative rate ratio is 0.81 for charge changes, 0.85 for polarity changes, and 0.49 when both polarity and volume changes are considered. (2) The radical/conservative rate ratio is positively correlated with the nonsynonymous/synonymous rate ratio for charge changes or when both polarity and volume changes are considered. (3) Both the conservative/synonymous rate ratio and the radical/synonymous rate ratio are lower in the rodent lineage than in the primate or artiodactyl lineage, suggesting more intense purifying selection in the rodent lineage, for both conservative and radical nonsynonymous substitutions. (4) Neglecting transition/transversion bias would cause an underestimation of both radical and conservative rates and the ratio thereof. (5) Transversions induce more dramatic genetic alterations than transitions in that transversions produce more amino acid altering changes and, among these, more radical changes. Received: 6 April 1999 / Accepted: 16 August 1999
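
The conservative/radical distinction described above can be sketched for the charge property. The grouping used here (positive: R, H, K; negative: D, E; all other amino acids neutral) is a common convention and is an assumption of this sketch, since the abstract does not list the classes. The function names are hypothetical. The sketch covers only the classification step; it does not implement the per-site rate estimation under Kimura's two-parameter model.

```python
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")
POSITIVE = set("RHK")   # assumed positively charged class
NEGATIVE = set("DE")    # assumed negatively charged class


def charge_class(aa):
    """Return the assumed charge class of a one-letter amino acid code."""
    if aa not in AMINO_ACIDS:
        raise ValueError(f"unknown amino acid: {aa!r}")
    if aa in POSITIVE:
        return "positive"
    if aa in NEGATIVE:
        return "negative"
    return "neutral"


def classify_by_charge(aa_from, aa_to):
    """Label a replacement as conservative (same class) or radical (class change)."""
    if aa_from == aa_to:
        raise ValueError("identical residues are not a replacement")
    same = charge_class(aa_from) == charge_class(aa_to)
    return "conservative" if same else "radical"


def radical_fraction(replacements):
    """Fraction of (from, to) replacements that change the charge class."""
    labels = [classify_by_charge(a, b) for a, b in replacements]
    return labels.count("radical") / len(labels)
```

Under this grouping a lysine-to-arginine replacement is conservative, while lysine-to-glutamate is radical. The same pattern extends to polarity or volume by swapping in a different class table.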

16.
The complete mitochondrial genome was obtained from a microchiropteran bat, Artibeus jamaicensis. The presumptive amino acid sequence for the protein-coding genes was compared with predicted amino acid sequences from several representatives of other mammalian orders. Data were analyzed using maximum parsimony, maximum likelihood, and neighbor joining. All analyses placed bats as the sister group of carnivores, perissodactyls, artiodactyls, and cetaceans (e.g., 100% bootstrap value with both maximum parsimony and neighbor joining). The data strongly support a new hypothesis about the origin of bats, specifically a bat/ferungulate grouping. None of the analyses supported the superorder Archonta (bats, flying lemurs, primates, and tree shrews). Our hypothesis regarding the relationship of bats to other eutherian mammals is concordant with previous molecular studies and contrasts with hypotheses based solely on morphological criteria and an incomplete fossil record. The A. jamaicensis mitochondrial DNA control region has a complex pattern of tandem repeats that differs from previously reported chiropteran control regions. Received: 22 January 1998 / Accepted: 3 June 1998

17.
It is now well-established that compositional bias in DNA sequences can adversely affect phylogenetic analysis based on those sequences. Phylogenetic analyses based on protein sequences are generally considered to be more reliable than those derived from the corresponding DNA sequences because it is believed that the use of encoded protein sequences circumvents the problems caused by nucleotide compositional biases in the DNA sequences. There exists, however, a correlation between AT/GC bias at the nucleotide level and content of AT- and GC-rich codons and their corresponding amino acids. Consequently, protein sequences can also be affected secondarily by nucleotide compositional bias. Here, we report that DNA bias not only may affect phylogenetic analysis based on DNA sequences, but also drives a protein bias which may affect analyses based on protein sequences. We present a striking example where common phylogenetic tools fail to recover the correct tree from complete animal mitochondrial protein-coding sequences. The data set is very extensive, containing several thousand sites per sequence, and the incorrect phylogenetic trees are statistically very well supported. Additionally, neither the use of the LogDet/paralinear transform nor removal of positions in the protein alignment with AT- or GC-rich codons allowed recovery of the correct tree. Two taxa with a large compositional bias continually group together in these analyses, despite a lack of close biological relatedness. We conclude that even protein-based phylogenetic trees may be misleading, and we advise caution in phylogenetic reconstruction using protein sequences, especially those that are compositionally biased. Received: 19 February 1998 / Accepted: 28 August 1998

18.
Quantitative analyses were carried out on a large number of proteins that contain the highly conserved basic helix–loop–helix domain. Measures derived from information theory were used to examine the extent of conservation at amino acid sites within the bHLH domain as well as the extent of mutual information among sites within the domain. Using the Boltzmann entropy measure, we described the extent of amino acid conservation throughout the bHLH domain. We used position association (pa) statistics that reflect the joint probability of occurrence of events to estimate the "mutual information content" among distinct amino acid sites. Further, we used pa statistics to estimate the extent of association in amino acid composition at each site in the domain and between amino acid composition and variables reflecting clade and group membership, loop length, and the presence of a leucine zipper. The pa values were also used to describe groups of amino acid sites called "cliques" that were highly associated with each other. Finally, a predictive motif was constructed that accurately identifies bHLH domain-containing proteins that belong to Groups A and B. Received: 15 December 1997 / Accepted: 1 October 1998
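
The information-theoretic measures mentioned above can be illustrated with a minimal sketch. It uses the standard Shannon entropy of an alignment column and the standard mutual information between two columns, both in bits, as stand-ins: the exact "Boltzmann entropy" and position association (pa) statistics used in the study are not specified in the abstract and are not reproduced here. The function names and the toy alignment in the usage note are hypothetical.

```python
import math
from collections import Counter


def alignment_columns(alignment):
    """Turn a list of equal-length aligned sequences into a list of columns."""
    return ["".join(col) for col in zip(*alignment)]


def column_entropy(column):
    """Shannon entropy (bits) of the residue distribution in one column.

    Low entropy means the site is highly conserved.
    """
    n = len(column)
    counts = Counter(column)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(col_x, col_y):
    """Mutual information (bits) between two columns of the same alignment."""
    if len(col_x) != len(col_y):
        raise ValueError("columns must come from the same set of sequences")
    n = len(col_x)
    px = Counter(col_x)
    py = Counter(col_y)
    pxy = Counter(zip(col_x, col_y))
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )
```

With a toy alignment such as `["AL", "AL", "CV", "CV"]`, the two columns returned by `alignment_columns` each carry one bit of entropy and share one bit of mutual information, because the residue at one site fully determines the residue at the other.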

19.
The aminoacyl-tRNA synthetases are ubiquitous enzymes which catalyze a crucial step in the life of the cell, the specific attachment of amino acids to their cognate tRNA. The amino acid sequences of three archaeal seryl-tRNA synthetases (SerRS) from Haloarcula marismortui and Methanococcus jannaschii, both belonging to the group of Euryarchaeota, and from Sulfolobus solfataricus, of the group of Crenarchaeota, were aligned with other available eubacterial and eukaryal SerRS sequences. In an attempt to identify some features of adaptation to extreme environments of these organisms, amino acid composition and amino acid substitutions between mesophilic and thermophilic SerRS were analyzed. In addition, universal phylogenetic trees of SerRS including the three known archaeal sequences, rooted by the threonyl-tRNA synthetases, were inferred. Amino acid analyses of the SerRS revealed two ways of adaptation to thermophilic environments between the Eubacteria and the Archaea; most of the usually described amino acid substitutions were nonsignificant in the case of archaeal thermophilic SerRS and most amino acid composition biases seemed to be linked to the genome G+C content pressure. The phylogenetic analysis of the SerRS showed the Archaea to be paraphyletic, H. marismortui emerging with the Gram-positive Bacteria, M. jannaschii being near the root of the tree, and S. solfataricus branching with Eucarya. Received: 30 March 1998 / Accepted: 14 July 1998

20.
The mitochondrial cytochrome b (cyt-b) gene is widely used in systematic studies to resolve divergences at many taxonomic levels. The present study focuses mainly on the utility of cyt-b as a molecular marker for inferring phylogenetic relationships at various levels within the fish family Cichlidae. A total of 78 taxa were used in the present analysis, representing all the major groups in the family Cichlidae (72 taxa) and other families from the suborders Labroidei and Percoidei. Gene trees obtained from cyt-b are compared to a published total evidence tree derived from previous studies. Minimum evolution trees based on cyt-b data resulted in topologies congruent with all previous analyses. Parsimony analyses downweighting transitions relative to transversions (ts1:tv4) or excluding transitions at third codon positions resulted in more robust bootstrap support for recognized clades than unweighted parsimony. Relative rate tests detected significantly long branches for some taxa (LB taxa), which were composed mainly of dwarf Neotropical cichlids. An improvement of the phylogenetic signal, as shown by the four-cluster likelihood mapping analysis, and higher bootstrap values were obtained by excluding LB taxa. Despite some limitations of cyt-b as a phylogenetic marker, this gene either alone or in combination with other data sets yields a tree that is in agreement with the well-established phylogeny of cichlid fish. Received: 11 October 2000 / Accepted: 26 February 2001
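
The ts1:tv4 weighting mentioned above can be illustrated with a minimal sketch that scores the differences between two aligned sequences, charging 1 for a transition (purine to purine, or pyrimidine to pyrimidine) and 4 for a transversion. This is a simplification chosen for illustration: weighted parsimony applies such costs to changes reconstructed along the branches of a tree, not to a single pairwise comparison. The function names and the skipping of non-ACGT characters are assumptions.

```python
PURINES = set("AG")
PYRIMIDINES = set("CT")


def is_transition(a, b):
    """True if a -> b is a change within purines or within pyrimidines."""
    pair = {a, b}
    return a != b and (pair <= PURINES or pair <= PYRIMIDINES)


def weighted_differences(seq_a, seq_b, ts_weight=1, tv_weight=4):
    """Sum of weighted differences between two aligned nucleotide sequences.

    Sites with characters other than A, C, G, or T are skipped (an assumption).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    total = 0
    for a, b in zip(seq_a, seq_b):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        total += ts_weight if is_transition(a, b) else tv_weight
    return total
```

For the pair `"AACC"` and `"GTCT"`, the first and last sites differ by transitions and the second by a transversion, so the weighted score is 1 + 4 + 1 = 6, compared with an unweighted count of 3.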
