1.
Phylogenetic incongruence among oncogenic genital alpha human papillomaviruses (total citations: 7; self-citations: 0; citations by others: 7)
The human papillomaviruses (HPVs) have long been thought to follow a monophyletic pattern of evolution with little if any evidence for recombination between genomes. On the basis of this model, both oncogenicity and tissue tropism appear to have evolved once. Still, no systematic statistical analyses have shown whether monophyly is the rule across all HPV open reading frames (ORFs). We conducted a taxonomic analysis of 59 mucosal/genital HPVs using whole-genome and sliding-window similarity measures; maximum-parsimony, neighbor-joining, and Bayesian phylogenetic analyses; and localized incongruence length difference (LILD) analyses. The algorithm for the LILD analyses localized incongruence by calculating the tree length differences between constrained and unconstrained nodes in a total-evidence tree across all HPV ORFs. The process allows statistical evaluation of every ORF/node pair in the total-evidence tree. The most significant incongruence was observed at the putative high-risk (i.e., cancer-associated) node, the common oncogenic ancestor for alpha HPV species 9 (e.g., HPV type 16 [HPV16]), 11, 7 (e.g., HPV18), 5, and 6. Although these groups share early-gene homology, including high degrees of similarity among E6 and E7, groups 9 and 11 diverge from groups 7, 5, and 6 with respect to L2 and L1. The HPV species groups primarily associated with cervical and anogenital cancers appear to follow two distinct evolutionary paths, one conferred by the early genes and another by the late genes. The incongruence in the genital HPV phylogeny could have occurred from an early recombination event, an ecological niche change, and/or asymmetric genome convergence driven by intense selection. These data indicate that the phylogeny of the oncogenic HPVs is complex and that their evolution may not be monophyletic across all genes.
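The sliding-window similarity measure mentioned in this abstract can be illustrated with a minimal sketch. It assumes two sequences that are already aligned to the same length; the function name, window size, step, and example sequences are hypothetical and are not taken from the study.

```python
def sliding_window_identity(seq_a, seq_b, window=4, step=2):
    """Return (start, identity) pairs for each window over two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    results = []
    for start in range(0, len(seq_a) - window + 1, step):
        a = seq_a[start:start + window]
        b = seq_b[start:start + window]
        # Count columns where both sequences carry the same non-gap character.
        matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
        results.append((start, matches / window))
    return results


# Hypothetical aligned fragments: identical first half, divergent second half.
profile = sliding_window_identity("ACGTACGT", "ACGTTTTT", window=4, step=4)
print(profile)
```

A drop in identity along the genome, as in the second window here, is the kind of signal that would prompt a closer look at whether different regions support different trees.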
2.
Flagel LE Rapp RA Grover CE Widrlechner MP Hawkins J Grafenberg JL Alvarez I Chung GY Wendel JF 《American journal of botany》2008,95(6):756-765
The study of recently formed species is important because it can help us to better understand organismal divergence and the speciation process. However, these species often present difficult challenges in the field of molecular phylogenetics because the processes that drive molecular divergence can lag behind phenotypic divergence. In the current study we show that species of the recently diverged North American endemic genus of purple coneflower, Echinacea, have low levels of molecular divergence. Data from three nuclear loci and two plastid loci provide neither resolved topologies nor congruent hypotheses about species-level relationships. This lack of phylogenetic resolution is likely due to the combined effects of incomplete lineage sorting, hybridization, and backcrossing following secondary contact. The poor resolution provided by molecular markers contrasts with previous studies that found well-resolved and taxonomically supported relationships from metabolic and morphological data. These results suggest that phenotypic canalization, resulting in identifiable morphological species, has occurred rapidly within Echinacea. Conversely, molecular signals have been distorted by gene flow and incomplete lineage sorting. Here we explore the impact of natural history on the genetic organization and phylogenetic relationships of Echinacea.
3.
Drosophila melanogaster and its close relatives are used extensively in comparative biology. Despite the importance of phylogenetic information for such studies, relationships between some melanogaster species group members are unclear due to conflicting phylogenetic signals at different loci. In this study, we use twelve nuclear loci (eleven coding and one non-coding) to assess the degree of phylogenetic incongruence in this model system. We focus on two nodes: (1) the node joining the Drosophila erecta-Drosophila orena, Drosophila melanogaster-Drosophila simulans, and Drosophila yakuba-Drosophila teissieri lineages, and (2) the node joining the lineages leading to the melanogaster, takahashii, and eugracilis subgroups. We find limited evidence for incongruence at the first node; our data, as well as those of several previous studies, strongly support monophyly of a clade consisting of D. erecta-D. orena and D. yakuba-D. teissieri. By contrast, using likelihood-based tests of congruence, we find robust evidence for topological incongruence at the second node. Different loci support different relationships among the melanogaster, takahashii, and eugracilis subgroups, and the observed incongruence is not easily attributable to homoplasy, non-equilibrium base composition, or positive selection on a subset of loci. We argue that lineage sorting in the common ancestor of these three subgroups is the most plausible explanation for our observations. Such lineage sorting may lead to biased estimation of tree topology and evolutionary rates, and may confound inferences of positive selection.
4.
5.
V Makarenkov 《Bioinformatics (Oxford, England)》2001,17(7):664-668
T-REX (tree and reticulogram reconstruction) is an application to reconstruct phylogenetic trees and reticulation networks from distance matrices. The application includes a number of tree fitting methods like NJ, UNJ or ADDTREE which have been very popular in phylogenetic analysis. At the same time, the software comprises several new methods of phylogenetic analysis such as: tree reconstruction using weights, tree inference from incomplete distance matrices or modeling a reticulation network for a collection of objects or species. T-REX also allows the user to visualize the resulting tree or network structures using Hierarchical, Radial or Axial types of tree drawing and manipulate them interactively. AVAILABILITY: T-REX is a freeware package available online at: http://www.fas.umontreal.ca/biol/casgrain/en/labo/t-rex
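As an illustration of tree fitting from a distance matrix, the following sketch shows only the pair-selection step of the standard neighbor-joining (NJ) criterion, Q(i, j) = (n - 2) d(i, j) - sum_k d(i, k) - sum_k d(j, k). It is not T-REX code; the function name and the five-taxon matrix are hypothetical.

```python
def nj_pair_to_join(dist):
    """Return ((i, j), q) for the pair minimizing the neighbor-joining Q criterion."""
    n = len(dist)
    if n < 3:
        raise ValueError("need at least three taxa")
    row_sums = [sum(row) for row in dist]
    best_pair, best_q = None, float("inf")
    for i in range(n):
        for j in range(i + 1, n):
            # Q penalizes pairs that are close to each other but far from the rest.
            q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
            if q < best_q:
                best_pair, best_q = (i, j), q
    return best_pair, best_q


# Hypothetical symmetric distance matrix for five taxa (indices 0..4).
DIST = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(nj_pair_to_join(DIST))
```

A full NJ run would repeat this step, replacing the joined pair with a new internal node and recomputing distances until the tree is resolved.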
6.
The protistan phylum Apicomplexa contains many important pathogens and is the subject of intense genome sequencing efforts. Based upon the genome sequences from seven apicomplexan species and a ciliate outgroup, we identified 268 single-copy genes suitable for phylogenetic inference. Both concatenation and consensus approaches inferred the same species tree topology. This topology is consistent with most prior conceptions of apicomplexan evolution based upon ultrastructural and developmental characters, that is, the piroplasm genera Theileria and Babesia form the sister group to the Plasmodium species, the coccidian genera Eimeria and Toxoplasma are monophyletic and are the sister group to the Plasmodium species and piroplasm genera, and Cryptosporidium forms the sister group to the above mentioned with the ciliate Tetrahymena as the outgroup. The level of incongruence among gene trees appears to be high at first glance; only 19% of the genes support the species tree, and a total of 48 different gene-tree topologies are observed. Detailed investigations suggest that the low signal-to-noise ratio in many genes may be the main source of incongruence. The probability of being consistent with the species tree increases as a function of the minimum bootstrap support observed at tree nodes for a given gene tree. Moreover, gene sequences that generate high bootstrap support are robust to the changes in alignment parameters or phylogenetic method used. However, caution should be taken in that some genes can infer a "wrong" tree with strong support because of paralogy, model violations, or other causes. The importance of examining multiple, unlinked genes that possess a strong phylogenetic signal cannot be overstated.
7.
Systematists and comparative biologists commonly want to make statements about relationships among taxa that have never been collectively included in any single phylogenetic analysis. Construction of phylogenetic 'supertrees' provides one solution. Supertrees are estimates of phylogeny assembled from sets of smaller estimates (source trees) sharing some but not necessarily all their taxa in common. If certain conditions are met, supertrees can retain all or most of the information from the source trees and also make novel statements about relationships of taxa that do not co-occur on any one source tree. Supertrees have commonly been constructed using subjective and informal approaches, but several explicit approaches have recently been proposed.
8.
It is now quite well accepted that the evolutionary past of certain species is better represented by phylogenetic networks
as opposed to trees. For example, polyploids are typically thought to have resulted through hybridization and duplication,
processes that are probably not best represented as bifurcating speciation events. Based on the knowledge of a multi-labelled
tree relating a collection of polyploids, we present a canonical construction of a phylogenetic network that exhibits the tree.
In addition, we prove that the resulting network is in some well-defined sense a minimal network having this property.
9.
ABSTRACT: BACKGROUND: Ongoing innovation in phylogenetics and evolutionary biology has been accompanied by a proliferation of software tools, data formats, analytical techniques and web servers. This brings with it the challenge of integrating phylogenetic and other related biological data found in a wide variety of formats, and underlines the need for reusable software that can read, manipulate and transform this information into the various forms required to build computational pipelines. RESULTS: We built a Python software library for working with phylogenetic data that is tightly integrated with Biopython, a broad-ranging toolkit for computational biology. Our library, Bio.Phylo, is highly interoperable with existing libraries, tools and standards, and is capable of parsing common file formats for phylogenetic trees, performing basic transformations and manipulations, attaching rich annotations, and visualizing trees. We unified the modules for working with the standard file formats Newick, NEXUS and phyloXML behind a consistent and simple API, providing a common set of functionality independent of the data source. CONCLUSIONS: Bio.Phylo meets a growing need in bioinformatics for working with heterogeneous types of phylogenetic data. By supporting interoperability with multiple file formats and leveraging existing Biopython features, this library simplifies the construction of phylogenetic workflows. We also provide examples of the benefits of building a community around a shared open-source project. Bio.Phylo is included with Biopython, available through the Biopython website, http://biopython.org.
10.
Raymond Wan Larisa Kiseleva Hajime Harada Hiroshi Mamitsuka Paul Horton 《Source code for biology and medicine》2009,4(1):1-18
Background
Visualization tools allow researchers to obtain a global view of the interrelationships between the probes or experiments of a gene expression (e.g. microarray) data set. Some existing methods include hierarchical clustering and k-means. In recent years, others have proposed applying minimum spanning trees (MST) for microarray clustering. Although MST-based clustering is formally equivalent to the dendrograms produced by hierarchical clustering under certain conditions, visually they can be quite different.
11.
Phylogenetic trees based on gene content (total citations: 2; self-citations: 0; citations by others: 2)
Comparing gene content between species can be a useful approach for reconstructing phylogenetic trees. In this paper, we derive a maximum-likelihood estimation of evolutionary distance between species under a simple model of gene genesis and gene loss. Using simulated data on a biological tree with 107 taxa (and on a number of randomly generated trees), we compare the accuracy of tree reconstruction using this ML distance measure to an earlier ad hoc distance. We then compare these distance-based approaches to a character-based tree reconstruction method (Dollo parsimony) which seems well suited to the analysis of gene content data. To simplify simulations, we give a formal proof of the well-known 'fact' that the Dollo parsimony score is independent of the choice of root. Our results show a consistent trend, with the character-based method and ML distance measure outperforming the earlier ad hoc distance method. AVAILABILITY: http://www.ab.informatik.uni-tuebingen.de/software/genecontent/welcome_en.html
12.
The woodcreepers are a highly specialized lineage within the New World suboscine radiation. Most systematic studies of higher level relationships of this group rely on morphological characters, and few studies utilizing molecular data exist. In this paper, we present a molecular phylogeny of the major lineages of woodcreepers (Aves: Dendrocolaptinae), based on nucleotide sequence data from a nuclear non-coding gene region (myoglobin intron II) and a protein-coding mitochondrial gene (cytochrome b). A good topological agreement between the individual gene trees suggests that the resulting phylogeny reflects the true evolutionary history of woodcreepers well. However, the DNA-based phylogeny conflicts with the results of a parsimony analysis of morphological characters. The topological differences mainly concern the basal branches of the trees. The morphological data place the genus Drymornis in a basal position (mainly supported by characters in the hindlimb), while our data suggest it to be derived among woodcreepers. Unlike most other woodcreepers, Drymornis is ground-adapted, as are the ovenbirds. The observed morphological similarities between Drymornis and the ovenbird outgroup may thus be explained by convergence or by reversal to an ancestral state. This observation raises the question of the use of characters associated with locomotion and feeding in phylogenetic reconstruction based on parsimony.
13.
Detecting protein-protein interactions and assigning proteins to functional complexes are key challenges of modern biology. The rise of genomics has led to evidence that correlated patterns of presence/absence and/or fusing of proteins in any organism suggest these proteins interact. Unfortunately, methods based on such data work best with divergent genomes, whereas major sequencing efforts in vertebrates, for example, are yielding alignments of the same set of proteins sampled from the same set of taxa (species). Using vertebrate mitochondrial genomes to illustrate a novel method, we associate proteins based on vectors of their evolutionary tree edge (branch or internode) lengths. This approach is based on the expectation that molecular coevolution is greatest between proteins that interact in some way. Mitochondrial DNA-encoded proteins are associated into groups largely consistent with the complexes they come from. This association is apparently not due to the tree structure or mutation processes, leaving coevolution as the best explanation. We show that it is important that the tree used to derive the edge-length vector is estimated accurately in terms of both topology and edge lengths. Although more complex substitution models reduce systematic error, they also inflate stochastic error. This makes the use of less complex substitution models preferable in some circumstances. We describe a method to estimate correlations of pairwise evolutionary distances, which adjusts for non-independent correlations due to shared evolutionary history. Associations of proteins based on their edge-length vectors are visualized and assessed using a variety of hierarchical clustering and multidimensional scaling methods. New formulae for estimating the fit of data to model, including the average percent standard deviation of distances on least squares trees, are presented.
Use of edge-length vectors is compared and contrasted with correlated distance methods, correlated rates methods, and site-specific evidence of coevolution.
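The idea of associating proteins by their edge-length vectors can be sketched with a plain Pearson correlation. This is a simplified illustration, not the authors' method: it assumes every protein's edge lengths were estimated on the same tree topology with edges in the same order, and it does not adjust for non-independence due to shared evolutionary history. The protein names and values are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length numeric vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("vectors must have the same length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(vx * vy)


def rank_pairs(edge_lengths):
    """Return (correlation, protein_a, protein_b) tuples, most correlated first."""
    pairs = [
        (pearson(edge_lengths[a], edge_lengths[b]), a, b)
        for a, b in combinations(sorted(edge_lengths), 2)
    ]
    return sorted(pairs, reverse=True)


# Hypothetical edge-length vectors (one value per tree edge, same edge order).
EDGE_LENGTHS = {
    "protA": [0.10, 0.20, 0.30, 0.40],
    "protB": [0.20, 0.40, 0.60, 0.80],
    "protC": [0.40, 0.10, 0.30, 0.20],
}
for r, a, b in rank_pairs(EDGE_LENGTHS):
    print(f"{a}-{b}: r = {r:.3f}")
```

The resulting correlation matrix is the kind of input that hierarchical clustering or multidimensional scaling would then operate on.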
14.
15.
Noise and incongruence: interpreting results of the incongruence length difference test (total citations: 5; self-citations: 0; citations by others: 5)
Incongruence between data sets is an important concept in molecular phylogenetics and is commonly measured by the incongruence length difference (ILD) test (J. S. Farris et al., Cladistics 10, 315-319). The ILD test has been used to infer specific evolutionary events and to determine whether to combine data sets for phylogenetic analysis. However, the interpretation in the literature of the test's results varies because authors have conflicting expectations of the effect that noise will have. Using simulations we demonstrate that noise can by itself generate highly significant results in the ILD test and demonstrate why this is the case. To clarify the interpretation of test results, we suggest an additional procedure in which the result is compared against a frequency distribution generated from completely shuffled data. As examples, we apply this approach to two previous studies that have reported incongruence.
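A toy version of the standard ILD statistic may help make the test concrete. The sketch below assumes exactly four taxa, so the most parsimonious tree can be found by scoring all three unrooted topologies with Fitch parsimony; the ILD value is the length of the best tree for the combined data minus the sum of the best lengths for each partition, and the null distribution comes from random repartitioning of the pooled characters. It does not implement the additional shuffled-data procedure proposed in the abstract. All names and the example characters are hypothetical.

```python
import random

# The three unrooted topologies for four taxa, as pairs of sister taxa (indices 0..3).
TOPOLOGIES = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]


def fitch_join(s, t):
    """Fitch step: intersect child state sets if possible, else take the union at cost 1."""
    common = s & t
    return (common, 0) if common else (s | t, 1)


def site_length(column, topology):
    """Parsimony length of one character (a 4-character string) on one topology."""
    (a, b), (c, d) = topology
    left, cost_l = fitch_join({column[a]}, {column[b]})
    right, cost_r = fitch_join({column[c]}, {column[d]})
    _, cost_root = fitch_join(left, right)
    return cost_l + cost_r + cost_root


def min_tree_length(columns):
    """Length of the most parsimonious of the three quartet topologies."""
    return min(sum(site_length(col, topo) for col in columns) for topo in TOPOLOGIES)


def ild(part_a, part_b):
    """ILD statistic: combined length minus the sum of the partition lengths."""
    return min_tree_length(part_a + part_b) - (min_tree_length(part_a) + min_tree_length(part_b))


def ild_test(part_a, part_b, replicates=200, seed=0):
    """Return (observed ILD, permutation p-value) under random repartitioning."""
    rng = random.Random(seed)
    observed = ild(part_a, part_b)
    pooled = part_a + part_b
    hits = 0
    for _ in range(replicates):
        shuffled = pooled[:]
        rng.shuffle(shuffled)
        if ild(shuffled[:len(part_a)], shuffled[len(part_a):]) >= observed:
            hits += 1
    return observed, (hits + 1) / (replicates + 1)


# Hypothetical partitions: A groups taxa (0,1)|(2,3); B groups taxa (0,2)|(1,3).
PART_A = ["AACC"] * 5
PART_B = ["ACAC"] * 5
print(ild_test(PART_A, PART_B))
```

Real data sets need a heuristic tree search instead of exhaustive enumeration, which is why the test is normally run inside dedicated phylogenetics software.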
16.
Jesper R. Gådin Ferdinand M. van’t Hooft Per Eriksson Lasse Folkersen 《BMC bioinformatics》2015,16(1)
Background
One aspect in which RNA sequencing is more valuable than microarray-based methods is the ability to examine the allelic imbalance of the expression of a gene. This process is often a complex task that entails quality control, alignment, and the counting of reads over heterozygous single-nucleotide polymorphisms. Allelic imbalance analysis is subject to technical biases, due to differences in the sequences of the measured alleles. Flexible bioinformatics tools are needed to ease the workflow while retaining as much RNA sequencing information as possible throughout the analysis to detect and address the possible biases.
Results
We present AllelicImbalance, a software program that is designed to detect, manage, and visualize allelic imbalances comprehensively. The purpose of this software is to allow users to pose genetic questions in any RNA sequencing experiment quickly, enhancing the general utility of RNA sequencing. The visualization features can reveal notable, non-trivial allelic imbalance behavior over specific regions, such as exons.
Conclusions
The software provides a complete framework to perform allelic imbalance analyses of aligned RNA sequencing data, from detection to visualization, within the robust and versatile management class, ASEset.
Electronic supplementary material
The online version of this article (doi:10.1186/s12859-015-0620-2) contains supplementary material, which is available to authorized users.
17.
Ricardo Campos-Soto Fernando Torres-Pérez Aldo Solari 《Genetics and molecular biology》2015,38(3):390-395
Mitochondrial DNA (mtDNA) is widely used to clarify phylogenetic relationships among and within species, and to determine population structure. Due to the linked nature of mtDNA genes it is expected that different genes will show similar results. Phylogenetic incongruence using mtDNA genes may result from processes such as heteroplasmy, nuclear integration of mitochondrial genes, polymerase errors, contamination, and recombination. In this study we used sequences from two mitochondrial genes (cytochrome b and cytochrome oxidase subunit I) from the wild vectors of Chagas disease, Triatoma eratyrusiformis and Mepraia species to test for topological congruence. The results showed some cases of phylogenetic incongruence due to misplacement of four haplotypes of four individuals. We discuss the possible causes of such incongruence and suggest that the explanation is an intra-individual variation likely due to heteroplasmy. This phenomenon provides independent evidence of common ancestry between these taxa.
18.
Phylogenetic test of the molecular clock and linearized trees (total citations: 23; self-citations: 7; citations by others: 23)
To estimate approximate divergence times of species or species groups with
molecular data, we have developed a method of constructing a linearized
tree under the assumption of a molecular clock. We present two tests of the
molecular clock for a given topology: two-cluster test and branch-length
test. The two-cluster test examines the hypothesis of the molecular clock
for the two lineages created by an interior node of the tree, whereas the
branch-length test examines the deviation of the branch length between the
tree root and a tip from the average length. Sequences evolving excessively
fast or slow at a high significance level may be eliminated. A linearized
tree will then be constructed for a given topology for the remaining
sequences under the assumption of rate constancy. We have used these
methods to analyze hominoid mitochondrial DNA and drosophilid Adh gene
sequences.
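The quantity examined by the branch-length test, the deviation of each root-to-tip length from the average, can be sketched as follows. This shows only the raw deviations; the published test also requires a variance estimate to attach a significance level, which is omitted here. The nested-list tree encoding and the example tree are assumptions made for illustration.

```python
def root_to_tip(tree, depth=0.0):
    """Return {tip: root-to-tip length}.

    A tree is either a tip name (str) or a list of (subtree, branch_length) pairs.
    """
    if isinstance(tree, str):
        return {tree: depth}
    lengths = {}
    for subtree, branch_length in tree:
        lengths.update(root_to_tip(subtree, depth + branch_length))
    return lengths


def deviations_from_mean(tree):
    """Return each tip's root-to-tip length minus the mean over all tips."""
    lengths = root_to_tip(tree)
    mean = sum(lengths.values()) / len(lengths)
    return {tip: d - mean for tip, d in lengths.items()}


# Hypothetical rooted tree: tip C sits on a much longer branch than A and B.
TREE = [("A", 0.5), ([("B", 0.25), ("C", 0.75)], 0.25)]
print(deviations_from_mean(TREE))
```

Under a strict molecular clock all deviations would be near zero; a tip such as C with a large positive deviation is a candidate for removal before a linearized tree is built.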
19.
DiffTool is a resource to build and visualize protein clusters computed from a sequence database. The package provides a clustering tool to construct protein families according to sequence similarities and a web interface to query the corresponding clusters. A subtractive genome analysis tool selects protein families specific for a genome or a group of genomes. For each protein cluster, DiffTool includes access to sequences, coloured multiple alignments and phylogenetic trees. AVAILABILITY: A cluster database built from yeast and complete prokaryotic genomes is queryable at http://bioweb.pasteur.fr/seqanal/difftool. All the Perl sources are freely available to non-profit organizations upon request.
20.
ABSTRACT: BACKGROUND: The increased use of multi-locus data sets for phylogenetic reconstruction has increased the need to determine whether a set of gene trees significantly deviates from the phylogenetic patterns of other genes. Such unusual gene trees may have been influenced by other evolutionary processes such as selection, gene duplication, or horizontal gene transfer. RESULTS: Motivated by this problem we propose a nonparametric goodness-of-fit test for two empirical distributions of gene trees, and we developed the software GeneOut to estimate a p-value for the test. Our approach maps trees into a multi-dimensional vector space and then applies support vector machines (SVMs) to measure the separation between two sets of pre-defined trees. We use a permutation test to assess the significance of the SVM separation. To demonstrate the performance of GeneOut, we applied it to the comparison of gene trees simulated within different species trees across a range of species tree depths. Applied directly to sets of simulated gene trees with large sample sizes, GeneOut was able to detect very small differences between two sets of gene trees generated under different species trees. Our statistical test can also include tree reconstruction into its test framework through a variety of phylogenetic optimality criteria. When applied to DNA sequence data simulated from different sets of gene trees, results in the form of receiver operating characteristic (ROC) curves indicated that GeneOut performed well in the detection of differences between sets of trees with different distributions in a multi-dimensional space. Furthermore, it controlled false positive and false negative rates very well, indicating a high degree of accuracy.
CONCLUSIONS: The non-parametric nature of our statistical test provides fast and efficient analyses, and makes it an applicable test for any scenario where evolutionary or other factors can lead to trees with different multi-dimensional distributions. The software GeneOut is freely available under the GNU public license.
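The permutation step of such a test can be sketched in a few lines. This is not GeneOut: instead of an SVM margin, the sketch uses the Euclidean distance between the two group centroids as the separation statistic, and it assumes the trees have already been mapped to numeric vectors. The function names and example vectors are hypothetical.

```python
import math
import random


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def separation(set_a, set_b):
    """Stand-in separation statistic: distance between group centroids (not an SVM margin)."""
    return math.dist(centroid(set_a), centroid(set_b))


def permutation_p_value(set_a, set_b, replicates=999, seed=0):
    """Estimate how often random relabeling separates the groups at least as well."""
    rng = random.Random(seed)
    observed = separation(set_a, set_b)
    pooled = list(set_a) + list(set_b)
    hits = 0
    for _ in range(replicates):
        rng.shuffle(pooled)
        if separation(pooled[:len(set_a)], pooled[len(set_a):]) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (replicates + 1)


# Hypothetical tree vectors: group B is group A shifted by 10 in every dimension.
GROUP_A = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.5, 0.5]]
GROUP_B = [[x + 10.0, y + 10.0] for x, y in GROUP_A]
print(permutation_p_value(GROUP_A, GROUP_B))
```

Swapping in a different separation statistic only requires replacing the `separation` function; the permutation logic stays the same.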