Similar literature
20 similar documents found (search time: 15 ms)
1.
Alignments of nucleotide or amino acid sequences may contain a variety of different signals, one of which is the historical signal that we often try to recover by phylogenetic analysis. Other signals, such as those arising due to compositional heterogeneities, among-lineage and among-site rate heterogeneities, invariant sites, and covariotides, may interfere adversely with the recovery of the historical signal. The effect of the interaction of these signals on phylogenetic inference is not well understood and may, in many cases, even be underappreciated. In this study, we investigate this matter and present results based on Monte Carlo simulations. We explored the success of four phylogenetic methods in recovering the true tree from data that had evolved under conditions where the equilibrium base frequencies and substitution rates were allowed to vary among lineages. Seven scenarios with increasingly complex conditions were investigated. All of the methods tested, with the exception of neighbor-joining using LogDet distances, were sensitive to compositional convergence in nonsister lineages. Maximum parsimony was also susceptible to attraction between long edges. In many cases, however, phylogenetic inference methods can still recover the true tree when misleading signals are present, in some instances even when the historical signal is no longer dominant. These results highlight the growing need for simple methods to detect violation of the phylogenetic assumptions.

2.
Most phylogenetic tree estimation methods assume that there is a single set of hierarchical relationships among sequences in a data set for all sites along an alignment. Mosaic sequences produced by past recombination events will violate this assumption and may lead to misleading results from a phylogenetic analysis due to the imposition of a single tree along the entire alignment. Therefore, the detection of past recombination is an important first step in an analysis. A Bayesian model for the changes in topology caused by recombination events is described here. This model relaxes the assumption of one topology for all sites in an alignment and uses the theory of Hidden Markov models to facilitate calculations, the hidden states being the underlying topologies at each site in the data set. Changes in topology along the multiple sequence alignment are estimated by means of the maximum a posteriori (MAP) estimate. The performance of the MAP estimate is assessed by application of the model to data sets of four sequences, both simulated and real.

3.
A graphical method for detecting recombination in phylogenetic data sets   Total citations: 9 (self-citations: 3, citations by others: 6)
Current phylogenetic tree reconstruction methods assume that there is a single underlying tree topology for all sites along the sequence. The presence of mosaic sequences due to recombination violates this assumption and will cause phylogenetic methods to give misleading results due to the imposition of a single tree topology on all sites. The detection of mosaic sequences caused by recombination is therefore an important first step in phylogenetic analysis. A graphical method for the detection of recombination, based on the least squares method of phylogenetic estimation, is presented here. This method locates putative recombination breakpoints by moving a window along the sequence. The performance of the method is assessed by simulation and by its application to a real data set.
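
The idea of locating putative breakpoints by moving a window along the sequence can be illustrated with a minimal sketch. It is not the least-squares method of the abstract: as a simplifying assumption it uses plain Hamming distance to two hypothetical parental sequences, and the function names, window size and step are all illustrative.

```python
def hamming(a, b):
    """Count positions where two equal-length strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def closest_parent_by_window(query, parent_a, parent_b, window, step):
    """For each window, report which parent the query is closer to.

    Returns a list of (start, label); label is 'A', 'B', or '=' for a tie.
    Hamming distance is an assumption made for illustration only.
    """
    if not (len(query) == len(parent_a) == len(parent_b)):
        raise ValueError("sequences must be aligned to the same length")
    calls = []
    for start in range(0, len(query) - window + 1, step):
        end = start + window
        da = hamming(query[start:end], parent_a[start:end])
        db = hamming(query[start:end], parent_b[start:end])
        label = "A" if da < db else "B" if db < da else "="
        calls.append((start, label))
    return calls


def putative_breakpoints(calls):
    """Return window starts where the closest parent switches (ties skipped)."""
    breaks = []
    previous = None
    for start, label in calls:
        if label == "=":
            continue
        if previous is not None and label != previous:
            breaks.append(start)
        previous = label
    return breaks
```

A switch in the closest parent between neighbouring windows only flags a region worth inspecting; a real analysis would replace the distance comparison with a phylogenetic criterion and assess significance.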

4.
Gene sequences can undergo accelerated nucleotide changes and rapid diversification. The rapid sequence changes can then potentially lead to phylogenetic incongruence. Recently, Bodilis et al. (2011) observed artificial phylogenetic incongruence using the Pseudomonas surface protein gene oprF, and hypothesized that it was the result of a long-branch attraction artifact ultimately caused by adaptive radiation. In this study, an alternative hypothesis, namely fine-scale recombination, was tested on the same dataset. The results reveal that regions in oprF are of different evolutionary origins, and the mosaic gene structure resulted in confounding phylogenetic signals. These findings demonstrate that unrecognized fine-scale recombination can confound the phylogenetic interpretation and emphasize the limitation of using whole genes as the unit of phylogenetic analysis.

5.
The ability to generate large molecular datasets for phylogenetic studies benefits biologists, but such data expansion introduces numerous analytical problems. A typical molecular phylogenetic study implicitly assumes that sequences evolve under stationary, reversible and homogeneous conditions, but this assumption is often violated in real datasets. When an analysis of large molecular datasets results in unexpected relationships, it often reflects violation of phylogenetic assumptions, rather than a correct phylogeny. Molecular evolutionary phenomena such as base compositional heterogeneity and among‐site rate variation are known to affect phylogenetic inference, resulting in incorrect phylogenetic relationships. The ability of methods to overcome such bias has not been measured on real and complex datasets. We investigated how base compositional heterogeneity and among‐site rate variation affect phylogenetic inference in the context of a mitochondrial genome phylogeny of the insect order Coleoptera. We show statistically that our dataset is affected by base compositional heterogeneity regardless of how the data are partitioned or recoded. Among‐site rate variation is shown by comparing topologies generated using models of evolution with and without a rate variation parameter in a Bayesian framework. When compared for their effectiveness in dealing with systematic bias, standard phylogenetic methods tend to perform poorly, and parsimony without any data transformation performs worst. Two methods designed specifically to overcome systematic bias, LogDet and a Bayesian method implementing variable composition vectors, can overcome some level of base compositional heterogeneity, but are still affected by among‐site rate variation. A large degree of variation in both noise and phylogenetic signal among all three codon positions is observed. We caution and argue that more data exploration is imperative, especially when many genes are included in an analysis.

6.
Nuclear DNA sequence data for diploid organisms are potentially a rich source of phylogenetic information for disentangling the evolutionary relationships of closely related organisms, but present special phylogenetic problems owing to difficulties arising from heterozygosity and recombination. We analyzed allelic relationships for two nuclear gene regions (phosphoenolpyruvate carboxykinase and elongation factor-1a), along with a mitochondrial gene region (NADH dehydrogenase subunit 5), for an assemblage of closely related species of carabid beetles (Carabus subgenus Ohomopterus). We used a network approach to examine whether the nuclear gene sequences provide substantial phylogenetic information on species relationships and evolutionary history. The mitochondrial gene genealogy strongly contradicted the morphological species boundary as a result of introgression of heterospecific mitochondria. Two nuclear gene regions showed high allelic diversity within species, and this diversity was partially attributable to recombination between various alleles and high variability in the intron region. Shared nuclear alleles among species were rare and were considered to represent shared ancestral polymorphism. Despite the presence of recombination, nuclear allelic networks recovered species monophyly more often and presented genetic differentiation patterns (low to high) among species more clearly. Overall, nuclear gene networks provide clear evidence for separate biological species and information on the phylogenetic relationships among closely related carabid beetles.

7.
Since recombination leads to the generation of mosaic genomes that violate the assumption of traditional phylogenetic methods that sequence evolution can be accurately described by a single tree, results and conclusions based on phylogenetic analysis of data sets including recombinant sequences can be severely misleading. Many methods are able to adequately detect recombination between diverse sequences, for example between different HIV-1 subtypes. More problematic is the identification of recombinants among closely related sequences such as a viral population within a host. We describe a simple algorithmic procedure that enables detection of intra-host recombinants based on split-decomposition networks and a robust statistical test for recombination. By applying this algorithm to several published HIV-1 data sets we conclude that intra-host recombination was significantly underestimated in previous studies and that up to one-third of the env sequences longitudinally sampled from a given subject can be of recombinant origin. The results show that our procedure can be a valuable exploratory tool for detection of recombinant sequences before phylogenetic analysis, and also suggest that HIV-1 recombination in vivo is far more frequent and significant than previously thought.

8.
Six types of recombination signal DNA sequences of the Multisite Gateway cloning system were investigated as to their specificity and efficiency in the LR and BP recombination reactions. In the LR reaction to generate an Expression clone by recombination between attL and attR signals which are contained in the Entry clone and the Destination vector, respectively, the cross-reactivity of various attL and attR pairs on six types of respective signal sequences was examined. In the BP reaction to create an Entry clone by transferring the target DNA segment in the Expression clone or the attB-flanked PCR product into a Donor vector, various combinations of attB and attP pairs were tested for their reactivities in recombination. The results obtained indicate a markedly higher specificity and efficiency of cross-reactivity with only the matched att signal pairs, such as attL3-attR3, attB5-attP5, and so on, compared to unmatched signal pairs, such as attL3-attR5, attB5-attP3, and so on, thus verifying a high-throughput production of the positive clones in the Gateway system in which multiple recombination signals exist together in one reaction system. Examples of rapid construction of a three or four DNA-fusion structure in the plasmid are shown.

9.
Pre-B and pre-T cell lines from mutant mice with severe combined immune deficiency (scid mice) were transfected with plasmids that contained recombination signal sequences of antigen receptor gene elements (V, D, and J). Recovered plasmids were tested for possible recombination of signal sequences and/or the adjacent (coding) sequences. Signal ends were joined, but recombination was abnormal in that half of the recombinants had lost nucleotides from one or both signals. Coding ends were not joined at all in either deletional or inversional V(D)J recombination reactions. However, coding ends were able to participate in alternative reactions. The failure of coding joint formation in scid pre-B and pre-T cells appears sufficient to explain the absence of immunoglobulin or T cell receptor production in scid mice.

10.
Genomic regions participating in recombination events may support distinct topologies, and phylogenetic analyses should incorporate this heterogeneity. Existing phylogenetic methods for recombination detection are challenged by the enormous number of possible topologies, even for a moderate number of taxa. If, however, the detection analysis is conducted independently between each putative recombinant sequence and a set of reference parentals, potential recombinations between the recombinants are neglected. In this context, a recombination hotspot can be inferred in phylogenetic analyses if we observe several consecutive breakpoints. We developed a distance measure between unrooted topologies that closely resembles the number of recombinations. By introducing a prior distribution on these recombination distances, a Bayesian hierarchical model was devised to detect phylogenetic inconsistencies occurring due to recombinations. This model relaxes the assumption of known parental sequences, still common in HIV analysis, allowing the entire dataset to be analyzed at once. On simulated datasets with up to 16 taxa, our method correctly detected recombination breakpoints and the number of recombination events for each breakpoint. The procedure is robust to rate and transition:transversion heterogeneities for simulations with and without recombination. This recombination distance is related to recombination hotspots. Applying this procedure to a genomic HIV-1 dataset, we found evidence for hotspots and de novo recombination.

11.
Baldo L, Lo N, Werren JH. Journal of Bacteriology, 2005, 187(15): 5406-5418
Lateral gene transfer and recombination play important roles in the evolution of many parasitic bacteria. Here we investigate intragenic recombination in Wolbachia bacteria, considered among the most abundant intracellular bacteria on earth. We conduct a detailed analysis of the patterns of variation and recombination within the Wolbachia surface protein, utilizing an extensive set of published and new sequences from five main supergroups of Wolbachia. Analysis of nucleotide and amino acid sequence variations confirms four hypervariable regions (HVRs), separated by regions under strong conservation. Comparison of shared polymorphisms reveals a complex mosaic structure of the gene, characterized by a clear intragenic recombining of segments among several distinct strains, whose major recombination effect is shuffling of a relatively conserved set of amino acid motifs within each of the four HVRs. Exchanges occurred both within and between the arthropod supergroups. Analyses based on phylogenetic methods and a specific recombination detection program (MAXCHI) significantly support this complex partitioning of the gene, indicating a chimeric origin of wsp. Although wsp has been widely used to define macro- and microtaxonomy among Wolbachia strains, these results clearly show that it is not suitable for this purpose. The role of wsp in bacterium-host interactions is currently unknown, but results presented here indicate that exchanges of HVR motifs are favored by natural selection. Identifying host proteins that interact with wsp variants should help reveal how these widespread bacterial parasites affect and evolve in response to the cellular environments of their invertebrate hosts.

12.
It is now well-established that compositional bias in DNA sequences can adversely affect phylogenetic analysis based on those sequences. Phylogenetic analyses based on protein sequences are generally considered to be more reliable than those derived from the corresponding DNA sequences because it is believed that the use of encoded protein sequences circumvents the problems caused by nucleotide compositional biases in the DNA sequences. There exists, however, a correlation between AT/GC bias at the nucleotide level and content of AT- and GC-rich codons and their corresponding amino acids. Consequently, protein sequences can also be affected secondarily by nucleotide compositional bias. Here, we report that DNA bias not only may affect phylogenetic analysis based on DNA sequences, but also drives a protein bias which may affect analyses based on protein sequences. We present a striking example where common phylogenetic tools fail to recover the correct tree from complete animal mitochondrial protein-coding sequences. The data set is very extensive, containing several thousand sites per sequence, and the incorrect phylogenetic trees are statistically very well supported. Additionally, neither the use of the LogDet/paralinear transform nor removal of positions in the protein alignment with AT- or GC-rich codons allowed recovery of the correct tree. Two taxa with a large compositional bias continually group together in these analyses, despite a lack of close biological relatedness. We conclude that even protein-based phylogenetic trees may be misleading, and we advise caution in phylogenetic reconstruction using protein sequences, especially those that are compositionally biased. Received: 19 February 1998 / Accepted: 28 August 1998

13.
We have conducted a preliminary phylogenetic survey of ammonia-oxidizing beta-proteobacteria, using 16S rRNA gene libraries prepared by selective PCR and DNA from acid and neutral soils and polluted and nonpolluted marine sediments. Enrichment cultures were established from samples and analyzed by PCR. Analysis of 111 partial sequences of c. 300 bases revealed that the environmental sequences formed seven clusters, four of which are novel, within the phylogenetic radiation defined by cultured autotrophic ammonia oxidizers. Longer sequences from 13 cluster representatives support their phylogenetic positions relative to cultured taxa. These data suggest that known taxa may not be representative of the ammonia-oxidizing beta-proteobacteria in our samples. Our data provide further evidence that molecular and culture-based enrichment methods can select for different community members. Most enrichments contained novel Nitrosomonas-like sequences whereas novel Nitrosospira-like sequences were more common from gene libraries of soils and marine sediments. This is the first evidence for the occurrence of Nitrosospira-like strains in marine samples. Clear differences between the sequences of soil and marine sediment libraries were detected. Comparison of 16S rRNA sequences from polluted and nonpolluted sediments provided no strong evidence that the community composition was determined by the degree of pollution. Soil clone sequences fell into four clusters, each containing sequences from acid and neutral soils in varying proportions. Our data suggest that some related strains may be present in both samples, but further work is needed to resolve whether there is selection due to pH for particular sequence types.

14.
To expand the representation for phylogenetic analysis, ten additional complete Entamoeba small-subunit rRNA gene sequences were obtained from humans, non-human primates, cattle and a tortoise. For some novel sequences no corresponding morphological data were available, and we suggest that these organisms should be referred to as ribosomal lineages (RL) rather than being assigned species names at present. To investigate genetic diversity and host specificity of selected Entamoeba species, a total of 91 new partial small subunit rRNA gene sequences were obtained, including 49 from Entamoeba coli, 18 from Entamoeba polecki, and 17 from Entamoeba hartmanni. We propose a new nomenclature for significant variants within established Entamoeba species. Based on current data we propose that the uninucleated-cyst-producing Entamoeba infecting humans is called Entamoeba polecki and divided into four subtypes (ST1-ST4) and that Entamoeba coli is divided into two subtypes (ST1-ST2). New hosts for several species were detected and, while host specificity and genetic diversity of several species remain to be clarified, it is clear that previous reliance on cultivated material has given us a misleading and incomplete picture of variation within the genus Entamoeba.

15.
We compared 31 complete and nearly complete globally derived HSV-1 genomic sequences using HSV-2 HG52 as an outgroup to investigate their phylogenetic relationships and look for evidence of recombination. The sequences were retrieved from NCBI and were then aligned using Clustal W. The generation of a maximum likelihood tree resulted in a six clade structure that corresponded with the timing and routes of past human migration. The East African derived viruses contained the greatest amount of genetic diversity and formed four of the six clades. The East Asian and European/North American derived viruses formed separate clades. HSV-1 strains E07, E22 and E03 were highly divergent and may each represent an individual clade. Possible recombination was analyzed by partitioning the alignment into 5 kb segments, performing individual phylogenetic analysis on each partition and generating a phylogenetic network from the results. However, most evidence for recombination spread at the base of the tree, suggesting that recombination did not significantly disrupt the clade structure. Examination of previous estimates of HSV-1 mutation rates, in conjunction with the phylogenetic data presented here, suggests that the substitution rate for HSV-1 is approximately 1.38×10⁻⁷ subs/site/year. In conclusion, this study expands the previously described HSV-1 three clade phylogenetic structures to a minimum of six and shows that the clade structure also mirrors global human migrations. Given that HSV-1 has co-evolved with its host, sequencing HSV-1 isolated from various populations could serve as a surrogate biomarker to study human population structure and migration patterns.
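
The partitioning step described above (cutting the alignment into fixed-length segments that are then analysed separately) might look roughly like the sketch below. The function name and the dict-of-strings alignment representation are assumptions made for illustration; the abstract does not say how the authors implemented this step.

```python
def partition_alignment(alignment, segment_len):
    """Split an alignment into consecutive column blocks.

    alignment   : dict mapping sequence name -> aligned sequence string
                  (hypothetical representation, chosen for simplicity)
    segment_len : number of columns per block, e.g. 5000 for 5 kb

    Returns a list of (start, end, sub_alignment) with end exclusive.
    The last block may be shorter than segment_len.
    """
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    total = lengths.pop()
    segments = []
    for start in range(0, total, segment_len):
        end = min(start + segment_len, total)
        sub = {name: seq[start:end] for name, seq in alignment.items()}
        segments.append((start, end, sub))
    return segments
```

Each returned block could then be handed to a tree-building program of choice; comparing the per-block trees is what exposes segments with conflicting histories.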

16.
To determine the extent of homologous recombination in human influenza A virus, we assembled a data set of 13,852 sequences representing all eight segments and both major circulating subtypes, H3N2 and H1N1. Using an exhaustive search and a nonparametric test for mosaic structure, we identified 315 sequences (approximately 2%) in five different RNA segments that, after a multiple-comparison correction, had statistically significant mosaic signals compatible with homologous recombination. Of these, only two contained recombinant regions of sufficient length (>100 nucleotides [nt]) that the occurrence of homologous recombination could be verified using phylogenetic methods, with the rest involving very short sequence regions (15 to 30 nt). Although this secondary analysis revealed patterns of phylogenetic incongruence compatible with the action of recombination, neither candidate recombinant was strongly supported. Given our inability to exclude the occurrence of mixed infection and template switching during amplification, laboratory artifacts provide an alternative and likely explanation for the occurrence of phylogenetic incongruence in these two cases. We therefore conclude that, if it occurs at all, homologous recombination plays only a very minor role in the evolution of human influenza A virus.

17.
The incidence of syphilis has risen worldwide in the last decade in spite of being an easily treated infection. The causative agent of this sexually transmitted disease is the bacterium Treponema pallidum subspecies pallidum (TPA), very closely related to subsp. pertenue (TPE) and endemicum (TEN), responsible for the human treponematoses yaws and bejel, respectively. Although much focus has been placed on the question of the spatial and temporal origins of TPA, the processes driving the evolution and epidemiological spread of TPA since its divergence from TPE and TEN are not well understood. Here, we investigate the effects of recombination and selection as forces of genetic diversity and differentiation acting during the evolution of T. pallidum subspecies. Using a custom-tailored procedure, named phylogenetic incongruence method, with 75 complete genome sequences, we found strong evidence for recombination among the T. pallidum subspecies, involving 12 genes and 21 events. In most cases, only one recombination event per gene was detected and all but one event corresponded to intersubspecies transfers, from TPE/TEN to TPA. We found a clear signal of natural selection acting on the recombinant genes, which is more intense in their recombinant regions. The phylogenetic location of the recombination events detected and the functional role of the genes with signals of positive selection suggest that these evolutionary processes had a key role in the evolution and recent expansion of the syphilis bacteria and significant implications for the selection of vaccine candidates and the design of a broadly protective syphilis vaccine.

18.
Takeuchi Y, Myers R, Danos O. PLoS ONE, 2008, 3(2): e1634
Homologous recombination is a dominant force in evolution and results in genetic mosaics. To detect evidence of recombination events and assess the biological significance of genetic mosaics, genome sequences for various viral populations of reasonably large size are now available in the GenBank. We studied a multi-functional viral gene, the adeno-associated virus (AAV) cap gene, which codes for three capsid proteins, VP1, VP2 and VP3. VP1-3 share a common C-terminal domain corresponding to VP3, which forms the viral core structure, while the VP1 unique N-terminal part contains an enzymatic domain with phospholipase A2 activity. Our recombinant detection program (RecI) revealed five novel recombination events, four of which have their cross-over points in the N-terminal, VP1 and VP2 unique region. Comparison of phylogenetic trees for different cap gene regions confirmed discordant phylogenies for the recombinant sequences. Furthermore, differences in the phylogenetic tree structures for the VP1 unique (VP1u) region and the rest of cap highlighted the mosaic nature of cap gene in the AAV population: two dominant forms of VP1u sequences were identified and these forms are linked to diverse sequences in the rest of cap gene. This observation together with the finding of frequent recombination in the VP1 and 2 unique regions suggests that this region is a recombination hot spot. Recombination events in this region preserve protein blocks of distinctive functions and contribute to convergence in VP1u and divergence of the rest of cap. Additionally the possible biological significance of two dominant VP1u forms is inferred.

19.
It is well known that molecular data "saturates" with increasing sequence divergence (thereby losing phylogenetic information) and that in addition the accumulation of misleading information due to chance similarities or to systematic bias may accompany saturation as well. Exploratory data analysis methods that can quantify the extent of signal loss or convergence for a given data set are scarce. Such methods are needed because genomics delivers very long sequence alignments spanning substantial phylogenetic depth, where site saturation may be compounded by systematic biases or other alternative signals. Here we introduce the Treeness Triangle (TT) graph, in which signals detectable by Hadamard (spectral) analysis are summed into 3 categories--those supporting 1) external and 2) internal branches in the optimal tree, in addition to 3) the residuals (potential internal branches not present in the optimal tree). These 3 values are plotted in a standard ternary coordinate system. The approach is illustrated with simulated and real data sets, the latter from complete chloroplast genomes, where potential problems of paralogy or lateral gene acquisition can be excluded. The TT uncovers the divergence-dependent loss of phylogenetic signal as subsets of chloroplast genomes are investigated that span increasingly deeper evolutionary timescales. The rate of signal loss (or signal retention) varies with the gene and/or the method of analysis.

20.
Hoppenrath M, Leander BS. PLoS ONE, 2010, 5(10): e13220

Background

Interrelationships among dinoflagellates in molecular phylogenies are largely unresolved, especially in the deepest branches. Ribosomal DNA (rDNA) sequences provide phylogenetic signals only at the tips of the dinoflagellate tree. Two reasons for the poor resolution of deep dinoflagellate relationships using rDNA sequences are (1) most sites are relatively conserved and (2) there are different evolutionary rates among sites in different lineages. Therefore, alternative molecular markers are required to address the deeper phylogenetic relationships among dinoflagellates. Preliminary evidence indicates that the heat shock protein 90 gene (Hsp90) will provide an informative marker, mainly because this gene is relatively long and appears to have relatively uniform rates of evolution in different lineages.

Methodology/Principal Findings

We more than doubled the previous dataset of Hsp90 sequences from dinoflagellates by generating additional sequences from 17 different species, representing seven different orders. In order to concatenate the Hsp90 data with rDNA sequences, we supplemented the Hsp90 sequences with three new SSU rDNA sequences and five new LSU rDNA sequences. The new Hsp90 sequences were generated, in part, from four additional heterotrophic dinoflagellates and the type species for six different genera. Molecular phylogenetic analyses resulted in a paraphyletic assemblage near the base of the dinoflagellate tree consisting of only athecate species. However, Noctiluca was never part of this assemblage and branched in a position that was nested within other lineages of dinokaryotes. The phylogenetic trees inferred from Hsp90 sequences were consistent with trees inferred from rDNA sequences in that the backbone of the dinoflagellate clade was largely unresolved.

Conclusions/Significance

The sequence conservation in both Hsp90 and rDNA sequences and the poor resolution of the deepest nodes suggests that dinoflagellates reflect an explosive radiation in morphological diversity in their recent evolutionary past. Nonetheless, the more comprehensive analysis of Hsp90 sequences enabled us to infer phylogenetic interrelationships of dinoflagellates more rigorously. For instance, the phylogenetic position of Noctiluca, which possesses several unusual features, was incongruent with previous phylogenetic studies. Therefore, the generation of additional dinoflagellate Hsp90 sequences is expected to refine the stem group of athecate species observed here and contribute to future multi-gene analyses of dinoflagellate interrelationships.
