The automation and evaluation of nested clade phylogeographic analysis   (total citations: 15; self-citations: 1; citations by others: 14)
Nested clade phylogeographic analysis (NCPA) is a popular method for reconstructing the demographic history of spatially distributed populations from genetic data. Although some parts of the analysis are automated, there is no unique and widely followed algorithm for doing this in its entirety, beginning with the data, and ending with the inferences drawn from the data. This article describes a method that automates NCPA, thereby providing a framework for replicating analyses in an objective way. To do so, a number of decisions need to be made so that the automated implementation is representative of previous analyses. We review how the NCPA procedure has evolved since its inception and conclude that there is scope for some variability in the manual application of NCPA. We apply the automated software to three published datasets previously analyzed manually and replicate many details of the manual analyses, suggesting that the current algorithm is representative of how a typical user will perform NCPA. We simulate a large number of replicate datasets for geographically distributed, but entirely random-mating, populations. These are then analyzed using the automated NCPA algorithm. Results indicate that NCPA tends to give a high frequency of false positives. In our simulations we observe that 14% of the clades give a conclusive inference that a demographic event has occurred, and that 75% of the datasets have at least one clade that gives such an inference. This is mainly due to the generation of multiple statistics per clade, of which only one is required to be significant to apply the inference key. We survey the inferences that have been made in recent publications and show that the most commonly inferred processes (restricted gene flow with isolation by distance and contiguous range expansion) are those that are commonly inferred in our simulations. 
However, published datasets typically yield a richer set of inferences with NCPA than obtained in our random-mating simulations, and further testing of NCPA with models of structured populations is necessary to examine its accuracy.
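
The gap between the per-clade rate (14%) and the per-dataset rate (75%) reported in the abstract above can be illustrated with simple arithmetic. The sketch below assumes, purely for illustration, that clades within a dataset give conclusive inferences independently of one another; that independence assumption and the function names are hypothetical and do not come from the article.

```python
def prob_at_least_one(per_clade_rate: float, n_clades: int) -> float:
    """Probability that at least one of n_clades gives a conclusive inference,
    ASSUMING clades behave independently (an illustrative simplification)."""
    return 1.0 - (1.0 - per_clade_rate) ** n_clades


def clades_needed(per_clade_rate: float, target: float) -> int:
    """Smallest number of independent clades whose combined rate reaches target."""
    n = 0
    while prob_at_least_one(per_clade_rate, n) < target:
        n += 1
    return n


if __name__ == "__main__":
    # 0.14 is the per-clade rate quoted in the abstract; 0.75 is the dataset rate.
    for n in (1, 5, 9, 10):
        print(n, round(prob_at_least_one(0.14, n), 4))
    print("clades needed:", clades_needed(0.14, 0.75))
```

Under this simplified model, a modest number of clades per dataset is enough for most datasets to contain at least one false positive, which is consistent in spirit with the explanation the abstract gives.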

As the number of sequenced genomes from diverse walks of life rapidly increases, phylogenetic analysis is entering a new era: reconstruction of the evolutionary history of organisms on the basis of full-scale comparison of their genomes. In addition to brute force, genome-wide analysis of alignments, rare genomic changes (RGCs) that are thought to comprise derived shared characters of individual clades are increasingly used in genome-wide phylogenetic studies. We propose a new type of RGCs designated RGC_CAMs (after Conserved Amino acids-Multiple substitutions), which are inferred using a genome-scale analysis of protein and underlying nucleotide sequence alignments. The RGC_CAM approach utilizes amino acid residues conserved in major eukaryotic lineages, with the exception of a few species comprising a putative clade, and selects for phylogenetic inference only those amino acid replacements that require 2 or 3 nucleotide substitutions, in order to reduce homoplasy. The RGC_CAM analysis was combined with a procedure for rigorous statistical testing of competing phylogenetic hypotheses. The RGC_CAM method is shown to be robust to branch length differences and taxon sampling. When applied to animal phylogeny, the RGC_CAM approach strongly supports the coelomate clade that unites chordates with arthropods as opposed to the ecdysozoan (molting animals) clade. This conclusion runs against the view of animal evolution that is currently prevailing in the evo-devo community. The final solution to the coelomate-ecdysozoa controversy will require a much larger set of complete genome sequences representing diverse animal taxa. It is expected that RGC_CAM and other RGC-based methods will be crucial for these future, definitive phylogenetic studies.

Summary: We examine situations where interest lies in the conditional association between outcome and exposure variables, given potential confounding variables. Concern arises that some potential confounders may not be measured accurately, whereas others may not be measured at all. Some form of sensitivity analysis might be employed to assess how this limitation in available data impacts inference. A Bayesian approach to sensitivity analysis is straightforward in concept: a prior distribution is formed to encapsulate plausible relationships between unobserved and observed variables, and posterior inference about the conditional exposure–disease relationship then follows. In practice, though, it can be challenging to form such a prior distribution in both a realistic and simple manner. Moreover, it can be difficult to develop an attendant Markov chain Monte Carlo (MCMC) algorithm that will work effectively on a posterior distribution arising from a highly nonidentified model. In this article, a simple prior distribution for acknowledging both poorly measured and unmeasured confounding variables is developed. It requires that only a small number of hyperparameters be set by the user. Moreover, a particular computational approach for posterior inference is developed, because application of MCMC in a standard manner is seen to be ineffective in this problem.

Many models for inference of population genetic parameters are based on the assumption that the data set at hand consists of groups displaying within-group Hardy-Weinberg equilibrium at individual loci and linkage equilibrium between loci. This assumption is commonly violated by the presence of within-group spatial structure arising from nonrandom mating of individuals due to isolation by distance (IBD). This paper proposes a model and simulation method implemented in a computer program to flexibly simulate data displaying such patterns. The program permits displaying of smooth spatial variations of allele frequencies due to IBD and more abrupt variations due to presence of strong barriers to gene flow. It is useful in assessing performance of various statistical inference methods and in designing spatial sampling schemes. This is shown by a simulation study aimed at assessing the extent to which IBD patterns affect accuracy of cluster inferences performed in models assuming panmixia. The program is also used to study the effects of spatial sampling scheme (e.g. sampling individuals in clumps or uniformly across the spatial domain). The accuracy of such inferences is assessed in terms of number of inferred populations, assignment of individuals to populations and location of borders between populations. The effect of spatial sampling was weak while the effect of IBD may be substantial, leading to the inference of spurious populations, especially when IBD was strong with respect to the size of the sampling domain. The model and program are new and have been embedded in the R package Geneland, for user convenience and compliance with existing data formats.

The identification of related and unrelated individuals from molecular marker data is often difficult, particularly when no pedigree information is available and the data set is large. High levels of relatedness or inbreeding can influence genotype frequencies and thus genetic marker evaluation, as well as the accurate inference of hidden genetic structure. Identification of related and unrelated individuals is also important in breeding programmes, to inform decisions about breeding pairs and translocations. We present Friends and Family, a Windows executable program with a graphical user interface that identifies unrelated individuals from a pairwise relatedness matrix or table generated in programs such as coancestry and genalex. Friends and Family outputs a list of samples that are all unrelated to each other, based on a user‐defined relatedness cut‐off value. This unrelated data set can be used in downstream analyses, such as marker evaluation or inference of genetic structure. The results can be compared with those of the full data set to determine the effect related individuals have on the analyses. We demonstrate one of the applications of the program: how the removal of related individuals altered the Hardy–Weinberg equilibrium test outcome for microsatellite markers in an empirical data set. Friends and Family can be obtained from https://github.com/DeondeJager/Friends-and-Family.
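
The idea of extracting a mutually unrelated subset from a pairwise relatedness matrix with a user-defined cut-off can be sketched as follows. This is a minimal greedy illustration and not the algorithm used by Friends and Family, which the abstract does not describe; the function name, the toy matrix and the cut-off of 0.125 are all hypothetical.

```python
def select_unrelated(names, relatedness, cutoff):
    """Greedily keep each sample whose relatedness to every already-kept
    sample is below `cutoff`. The result depends on the input order and is
    not guaranteed to be the largest possible unrelated set."""
    kept = []  # indices of samples retained so far
    for i in range(len(names)):
        if all(relatedness[i][j] < cutoff for j in kept):
            kept.append(i)
    return [names[i] for i in kept]


# Toy symmetric relatedness matrix (hypothetical values).
names = ["A", "B", "C", "D"]
relatedness = [
    [1.00, 0.50, 0.02, 0.00],
    [0.50, 1.00, 0.01, 0.30],
    [0.02, 0.01, 1.00, 0.05],
    [0.00, 0.30, 0.05, 1.00],
]
unrelated = select_unrelated(names, relatedness, cutoff=0.125)
print(unrelated)
```

A greedy pass is simple but order-dependent; finding the largest unrelated subset is a harder combinatorial problem, so a real tool may well use a different strategy.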

Among passerine birds (order Passeriformes), tribe- to family-level clades with five or fewer species are more frequent than one would expect from a homogeneous speciation and extinction process. Previous analyses also suggested that small clades tend to be marginal geographically and/or ecologically. In this study, I use principal component (PC) scores based on eight log-transformed measurements of the wing, tail, leg, and beak to test the hypothesis that small clades (...

This study introduces the NMπ computer program designed for estimation of plant mating system and seed and pollen dispersal kernels. NMπ is a re‐implementation of the NM+ program and provides new features such as support for multicore processors, explicit treatment of dioecy, the possibility of incorporating uniparentally inherited cytoplasmic markers, the possibility of assessing assortative mating due to phenotypic similarity and inference about offspring genealogies. The probability model of parentage (the neighbourhood model) accounts for missing data and genotyping errors, which can be estimated along with regular parameters of the mating system. The program has virtually no restrictions with respect to the number of individuals, markers or phenotypic characters. A console version of NMπ can be run under a wide variety of operating systems, including Windows, Linux or Mac OS. For Windows users, a graphical user interface is provided to facilitate operating the software. The program, user manual and example data are available on http://www.ukw.edu.pl/pracownicy/plik/igor_chybicki/3694/.

Near-full-length 18S and 28S rRNA gene sequences were obtained for 33 nematode species. Datasets were constructed based on secondary structure and progressive multiple alignments, and clades were compared for phylogenies inferred by Bayesian and maximum likelihood methods. Clade comparisons were also made following removal of ambiguously aligned sites as determined using the program ProAlign. Different alignments of these data produced tree topologies that differed, sometimes markedly, when analyzed by the same inference method. With one exception, the same alignment produced an identical tree topology when analyzed by different methods. Removal of ambiguously aligned sites altered the tree topology and also reduced resolution. Nematode clades were sensitive to differences in multiple alignments, and more than doubling the amount of sequence data by addition of 28S rRNA did not fully mitigate this result. Although some individual clades showed substantially higher support when 28S data were combined with 18S data, the combined analysis yielded no statistically significant increases in the number of clades receiving higher support when compared to the 18S data alone. Secondary structure alignment increased accuracy in positional homology assignment and, when used in combination with paired-site substitution models, these structural hypotheses of characters and improved models of character state change yielded high levels of phylogenetic resolution. Phylogenetic results included strong support for inclusion of Daubaylia potomaca within Cephalobidae, whereas the position of Fescia grossa within Tylenchina varied depending on the alignment, and the relationships among Rhabditidae, Diplogastridae, and Bunonematidae were not resolved.

The vlei rat Otomys irroratus has a wide distribution in southern Africa with several datasets indicating the presence of two putative species (O. irroratus and O. auratus). In the present study we use mitochondrial cyt b data (~950 bp) from 98 specimens (including museum material) collected throughout the range of the species to determine the geographical limits of the two recognized species, and we link this to niche modelling to validate these species. Phylogenetic analysis of the DNA sequence data, using maximum parsimony, neighbour joining and Bayesian inference, retrieved two divergent statistically well‐supported clades. Clade A occurs in the Western and Eastern Cape while Clade B occurs in the Free State, KwaZulu‐Natal, Northern Cape and Mpumalanga provinces of South Africa and Zimbabwe. Mean sequence divergence between the two clades (A and B) was 7.0% and between sub‐clades comprising clade B it was 4.8%; the two clades diverged during the Pleistocene. Within Clade A the mean sequence divergence among specimens was 1.91%. Niche modelling revealed that the incipient species occupy distinct bioclimatic niches associated with seasonality of precipitation. Our data allow insightful analysis into the factors that could have led to cladogenesis within this rodent. More significantly, the new data enable us to pinpoint the Eastern Cape province as a contact zone for the divergent species. © 2011 The Linnean Society of London, Biological Journal of the Linnean Society, 2011, 104, 192–206.

We developed MrEnt, a Windows‐based, user‐friendly program that allows the production of complex, high‐resolution, publication‐quality phylogenetic trees in a few steps, directly from the analysis output. The program recognizes the standard Nexus tree format and the annotated tree files produced by BEAST and MrBayes. MrEnt combines in a single program a large suite of tree manipulation functions (e.g. handling of multiple trees, tree rotation, character mapping, node collapsing, compression of large clades, handling of time scale and error bars for chronograms) with drawing tools typical of standard graphic editors, including handling of graphic elements and images. The tree illustration can be printed or exported in several standard formats suitable for journal publication, PowerPoint presentation or Web publication.

The Bryaceae are a large cosmopolitan moss family including genera of significant morphological and taxonomic complexity. Phylogenetic relationships within the Bryaceae were reconstructed based on DNA sequence data from all three genomic compartments. In addition, maximum parsimony and Bayesian inference were employed to reconstruct ancestral character states of 38 morphological plus four habitat characters and eight insertion/deletion events. The recovered phylogenetic patterns are generally in accord with previous phylogenies based on chloroplast DNA sequence data and three major clades are identified. The first clade comprises Bryum bornholmense, B. rubens, B. caespiticium, and Plagiobryum. This corroborates the hypothesis suggested by previous studies that several Bryum species are more closely related to Plagiobryum than to the core Bryum species. The second clade includes Acidodontium, Anomobryum, and Haplodontium, while the third clade contains the core Bryum species plus Imbribryum. Within the latter clade, B. subapiculatum and B. tenuisetum form the sister clade to Imbribryum. Reconstructions of ancestral character states under maximum parsimony and Bayesian inference suggest fourteen morphological synapomorphies for the ingroup and synapomorphies are detected for most clades within the ingroup. Maximum parsimony and Bayesian reconstructions of ancestral character states are mostly congruent although Bayesian inference shows that the posterior probability of ancestral character states may decrease dramatically when node support is taken into account. Bayesian inference also indicates that reconstructions may be ambiguous at internal nodes for highly polymorphic characters.

Multigene and genomic data sets have become commonplace in the field of phylogenetics, but many existing tools are not designed for such data sets, which often makes the analysis time‐consuming and tedious. Here, we present PhyloSuite, a (cross‐platform, open‐source, stand‐alone Python graphical user interface) user‐friendly workflow desktop platform dedicated to streamlining molecular sequence data management and evolutionary phylogenetics studies. It uses a plugin‐based system that integrates several phylogenetic and bioinformatic tools, thereby streamlining the entire procedure, from data acquisition to phylogenetic tree annotation (in combination with iTOL). It has the following features: (a) point‐and‐click and drag‐and‐drop graphical user interface; (b) a workplace to manage and organize molecular sequence data and results of analyses; (c) GenBank entry extraction and comparative statistics; and (d) a phylogenetic workflow with batch processing capability, comprising sequence alignment (mafft and macse), alignment optimization (trimAl, HmmCleaner and Gblocks), data set concatenation, best partitioning scheme and best evolutionary model selection (PartitionFinder and modelfinder), and phylogenetic inference (MrBayes and iq‐tree). PhyloSuite is designed for both beginners and experienced researchers, allowing the former to quick‐start their way into phylogenetic analysis, and the latter to conduct, store and manage their work in a streamlined way, and spend more time investigating scientific questions instead of wasting it on transferring files from one software program to another.

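The phylogeny of clam shrimps (Conchostraca) has long been a contentious topic in crustacean research. In this study, we determined the 28S rDNA D1-D2 region and 16S rDNA E-G region sequences of two clam shrimp species from China (Eocyzicus mongolianus and Eocyzicus orientalis) and, together with sequences of 20 clam shrimp species from GenBank, ...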

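Using the cytochrome b (Cyt b) gene, this study analyzed genetic distances among 48 individuals of Opsariichthys bidens collected from the Yiluo River and reconstructed their phylogenetic relationships. The 48 individuals clustered into two clades, each with 100% support, and no haplotypes were shared between the clades. Samples from each clade covered all sampling sites. The mean genetic distance among individuals within clades was 0.2%, whereas the genetic distance between clades was 3.1%. Microsatellite analysis showed that 99.88% of the genetic variation occurred among individuals within populations and only 0.12% between populations; the two clade populations showed no significant genetic differentiation (Fst = 0.0012, P = 1). Niches of the two clades were constructed from δ13C and δ15N, and the results showed that the trophic niches of the two clades of O. bidens in the Yiluo River have not separated. Based on the genetic divergence in the mitochondrial Cyt b gene, the two clades might represent different species. However, they show no significant differentiation in population genetic structure, the discordance between the individual relatedness tree and the phylogenetic tree suggests that there is no reproductive isolation between the populations, and their trophic niches have not separated. These results are not consistent with a cryptic-species explanation; the mitochondrial DNA divergence between the two clades of O. bidens in the Yiluo River may derive from ancestral populations or from interspecific hybridization.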
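
The within-clade and between-clade mean genetic distances reported in the abstract above can be illustrated with a small sketch. It assumes the simplest measure, the uncorrected p-distance (proportion of differing sites between aligned sequences); the abstract does not state which distance model was used, and the sequences and function names below are hypothetical toy examples.

```python
from itertools import combinations, product


def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected p-distance: proportion of aligned sites that differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(seq_a, seq_b)) / len(seq_a)


def mean_within(seqs) -> float:
    """Mean p-distance over all pairs of sequences within one group."""
    pairs = list(combinations(seqs, 2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def mean_between(group_1, group_2) -> float:
    """Mean p-distance over all pairs drawn from two different groups."""
    pairs = list(product(group_1, group_2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Toy aligned sequences, 10 sites each (not real Cyt b data).
clade_1 = ["ACGTACGTAC", "ACGTACGTAT"]
clade_2 = ["TCGAACGTAC", "TCGAACGTAT"]

print("within clade 1:", mean_within(clade_1))
print("between clades:", mean_between(clade_1, clade_2))
```

Published analyses usually apply a substitution model that corrects for multiple hits, so values from a plain p-distance should be read only as an illustration of the within-versus-between comparison.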

Phylogenetic trees are useful tools to infer evolutionary relationships between genetic entities. Phylogenetics enables not only evolution-based gene clustering but also the assignment of gene duplication and deletion events to the nodes when coupled with statistical approaches such as bootstrapping. However, extensive gene duplication and deletion events bring along a challenge in interpreting phylogenetic trees and require manual inference. In particular, there has been no robust method of determining whether one of the paralog clades systematically shows higher divergence following the gene duplication event as a sign of functional divergence. Here, we provide Phylostat, a graphical user interface that enables clade divergence analysis, visually and statistically. Phylostat is a web-based tool built on phylo.io to allow comparative clade divergence analysis, which is available at https://phylostat.adebalilab.org under an MIT open-source licence.

Contemporary phylogenomic studies frequently incorporate two-step coalescent analyses wherein the first step is to infer individual-gene trees, generally using maximum-likelihood implemented in the popular programs PhyML or RAxML. Four concerns with this approach are that these programs only present a single fully resolved gene tree to the user despite potential for ambiguous support, insufficient phylogenetic signal to fully resolve each gene tree, inexact computer arithmetic affecting the reported likelihood of gene trees, and an exclusive focus on the most likely tree while ignoring trees that are only slightly suboptimal or within the error tolerance. Taken together, these four concerns are sufficient for RAxML and PhyML users to be suspicious of the resulting (perhaps over-resolved) gene-tree topologies and (perhaps unjustifiably high) bootstrap support for individual clades. In this study, we sought to determine how frequently these concerns apply in practice to contemporary phylogenomic studies that use RAxML for gene-tree inference. We did so by re-analyzing 100 genes from each of ten studies that, taken together, are representative of many empirical phylogenomic studies. Our seven findings are as follows. First, the few search replicates that are frequently applied in phylogenomic studies are generally insufficient to find the optimal gene-tree topology. Second, there is often more topological variation among slightly suboptimal gene trees relative to the best-reported tree than can be safely ignored. Third, the Shimodaira–Hasegawa-like approximate likelihood ratio test is highly effective at identifying dubiously supported clades and outperforms the alternative approaches of relying on bootstrap support or collapsing minimum-length branches. Fourth, the bootstrap can, but rarely does, indicate high support for clades that are not supported amongst slightly suboptimal trees. Fifth, increasing the accuracy by which RAxML optimizes model-parameter values generally has a nominal effect on selection of optimal trees. Sixth, tree searches using the GTRCAT model were generally less effective at finding optimal known trees than those using the GTRGAMMA model. Seventh, choice of gene-tree sampling strategy can affect inferred coalescent branch lengths, species-tree topology and branch support.

In this article we describe the construction of a general computer program for the iterative calculation of maximum likelihood estimators. The program is general in the sense that it allows the maximization of any given likelihood function. The user only has to write a subroutine LKLHD, in which the specific likelihood function and its first and second derivatives are calculated. This subroutine is an input parameter of the optimization program. This enables the user to employ one main program for the maximization of various likelihood functions. This advantage is demonstrated for the evaluation of qualitative dose response relationships (quantal assays: probit and logit analysis).
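
The design described above, one generic optimizer that receives a user-written likelihood routine returning the function value and its first and second derivatives, can be sketched with a Newton-Raphson iteration. The abstract does not say which iterative scheme the program uses, so Newton-Raphson is an assumption here; the sketch is restricted to two parameters for brevity, and the names `newton_maximize` and `make_logit_lklhd` as well as the dose-response counts are hypothetical.

```python
import math


def newton_maximize(lklhd, theta, tol=1e-10, max_iter=50):
    """Generic two-parameter Newton-Raphson maximizer. `lklhd(theta)` must
    return (log-likelihood, gradient, Hessian), mirroring the role of the
    user-supplied LKLHD subroutine."""
    a, b = theta
    for _ in range(max_iter):
        _, (g0, g1), ((h00, h01), (h10, h11)) = lklhd((a, b))
        det = h00 * h11 - h01 * h10
        # Solve H * step = g for the 2x2 case with Cramer's rule.
        s0 = (g0 * h11 - h01 * g1) / det
        s1 = (h00 * g1 - h10 * g0) / det
        a, b = a - s0, b - s1
        if max(abs(s0), abs(s1)) < tol:
            break
    return a, b


def make_logit_lklhd(doses, totals, responders):
    """Build the likelihood routine for a logit dose-response model:
    P(response | dose x) = 1 / (1 + exp(-(a + b * x)))."""
    def lklhd(theta):
        a, b = theta
        loglik = g0 = g1 = h00 = h01 = h11 = 0.0
        for x, n, r in zip(doses, totals, responders):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            loglik += r * math.log(p) + (n - r) * math.log(1.0 - p)
            resid = r - n * p          # observed minus expected responders
            g0 += resid
            g1 += resid * x
            w = n * p * (1.0 - p)      # binomial variance weight
            h00 -= w
            h01 -= w * x
            h11 -= w * x * x
        return loglik, (g0, g1), ((h00, h01), (h01, h11))
    return lklhd


# Hypothetical quantal assay: 20 subjects at each of four dose levels.
logit_lklhd = make_logit_lklhd([0, 1, 2, 3], [20, 20, 20, 20], [2, 6, 13, 18])
a_hat, b_hat = newton_maximize(logit_lklhd, (0.0, 0.0))
print("intercept:", a_hat, "slope:", b_hat)
```

Swapping in a probit model would only require a different likelihood routine with the same return signature; the optimizer itself would stay unchanged, which is the advantage the abstract emphasizes.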

Phylogenetic networks represent the evolution of organisms that have undergone reticulate events, such as recombination, hybrid speciation or lateral gene transfer. An important way to interpret a phylogenetic network is in terms of the trees it displays, which represent all the possible histories of the characters carried by the organisms in the network. Interestingly, however, different networks may display exactly the same set of trees, an observation that poses a problem for network reconstruction: from the perspective of many inference methods such networks are indistinguishable. This is true for all methods that evaluate a phylogenetic network solely on the basis of how well the displayed trees fit the available data, including all methods based on input data consisting of clades, triples, quartets, or trees with any number of taxa, and also sequence-based approaches such as popular formalisations of maximum parsimony and maximum likelihood for networks. This identifiability problem is partially solved by accounting for branch lengths, although this merely reduces the frequency of the problem. Here we propose that network inference methods should only attempt to reconstruct what they can uniquely identify. To this end, we introduce a novel definition of what constitutes a uniquely reconstructible network. For any given set of indistinguishable networks, we define a canonical network that, under mild assumptions, is unique and thus representative of the entire set. Given data that underwent reticulate evolution, only the canonical form of the underlying phylogenetic network can be uniquely reconstructed. While on the methodological side this will imply a drastic reduction of the solution space in network inference, for the study of reticulate evolution this is a fundamental limitation that will require an important change of perspective when interpreting phylogenetic networks.

Nemacheilidae, in the superfamily Cobitoidea, comprises many morphologically similar fish species that occur in Eurasian water bodies. This large group shows inconsistencies between traditional morphological taxonomy and molecular phylogenetic data. We used mitochondrial genomes, the recombination‐activating gene 1 (RAG1) and the mitochondrial cytochrome c oxidase I gene (COI) to study the phylogenetic relationships among Nemacheilidae species using Bayesian inference and maximum likelihood approaches. Phylogenetic analyses based on mitogenomes provided support for two clades (I and II). The mitogenome, RAG1, and COI results indicated that several species and genera were not consistent with the traditional morphological subdivisions. The two clades inferred from mitogenomes showed clear geographical patterns. The Tibetan Plateau, Hengduan Mountains, and the Iran Plateau may act as a barrier dividing the clades. The estimated timing of clade separation (36.05 million years ago) coincides with the first uplift of the Tibetan Plateau. We conclude that the geological history of the Tibetan Plateau played a role in the diversification and distribution of the Nemacheilidae taxa. These results provide a phylogenetic framework for future studies of this complex group.
