Similar literature
 Found 20 similar articles (search time: 15 ms)
1.
2.
The rate of adaptive evolution depends on the rate at which beneficial mutations are introduced into a population and the fitness effects of those mutations. The rate of beneficial mutations and their expected fitness effects are often difficult to quantify empirically. As these 2 parameters determine the pace of evolutionary change in a population, the dynamics of adaptive evolution may enable inference of their values. Copy number variants (CNVs) are a pervasive source of heritable variation that can facilitate rapid adaptive evolution. Previously, we developed a locus-specific fluorescent CNV reporter to quantify CNV dynamics in evolving populations maintained in nutrient-limiting conditions using chemostats. Here, we use CNV adaptation dynamics to estimate the rate at which beneficial CNVs are introduced through de novo mutation and their fitness effects using simulation-based likelihood-free inference approaches. We tested the suitability of 2 evolutionary models: a standard Wright–Fisher model and a chemostat model. We evaluated 2 likelihood-free inference algorithms: the well-established Approximate Bayesian Computation with Sequential Monte Carlo (ABC-SMC) algorithm, and the recently developed Neural Posterior Estimation (NPE) algorithm, which applies an artificial neural network to directly estimate the posterior distribution. By systematically evaluating the suitability of different inference methods and models, we show that NPE has several advantages over ABC-SMC and that a Wright–Fisher evolutionary model suffices in most cases. Using our validated inference framework, we estimate the CNV formation rate at the GAP1 locus in the yeast Saccharomyces cerevisiae to be 10^-4.7 to 10^-4 CNVs per cell division and a fitness coefficient of 0.04 to 0.1 per generation for GAP1 CNVs in glutamine-limited chemostats.
We experimentally validated our inference-based estimates using 2 distinct experimental methods, barcode lineage tracking and pairwise fitness assays, which provide independent confirmation of the accuracy of our approach. Our results are consistent with a beneficial CNV supply rate that is 10-fold greater than the estimated rates of beneficial single-nucleotide mutations, explaining the outsized importance of CNVs in rapid adaptive evolution. More generally, our study demonstrates the utility of novel neural network-based likelihood-free inference methods for inferring the rates and effects of evolutionary processes from empirical data, with possible applications ranging from tumor to viral evolution.

This study shows that simulation-based inference of evolutionary dynamics using neural networks can yield parameter values for fitness and mutation rate that are difficult to determine experimentally, including those of copy number variants (CNVs) during experimental adaptive evolution of yeast.
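
To illustrate how the two parameters named in entry 2 (a CNV formation rate and a fitness coefficient) jointly shape an adaptation trajectory, here is a minimal deterministic sketch. It is not the authors' Wright–Fisher or chemostat simulator: the function name `cnv_frequency_trajectory`, the one-way mutation step, and the omission of genetic drift are all assumptions made for illustration. The example values are picked from within the ranges quoted in the abstract.

```python
def cnv_frequency_trajectory(delta, s, generations, x0=0.0):
    """Expected CNV frequency per generation (hypothetical helper).

    delta: CNV formation rate per cell division (assumed one-way mutation)
    s:     fitness coefficient of CNV-carrying cells per generation
    """
    x = x0
    trajectory = [x]
    for _ in range(generations):
        # Mutation: a fraction delta of non-CNV cells acquires a CNV.
        x_mut = x + delta * (1.0 - x)
        # Selection: CNV cells have relative fitness 1 + s, others 1.
        x = x_mut * (1.0 + s) / (1.0 + s * x_mut)
        trajectory.append(x)
    return trajectory


if __name__ == "__main__":
    # Illustrative values inside the ranges reported in the abstract.
    traj = cnv_frequency_trajectory(delta=10 ** -4.5, s=0.07, generations=100)
    for generation in (0, 25, 50, 75, 100):
        print(generation, round(traj[generation], 4))
```

A simulation-based inference method would run many such forward simulations (with drift and sampling noise added) under different `delta` and `s` values and compare them with the observed trajectory; this sketch covers only the forward step.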

3.
When a high-quality genome assembly of a target species is unavailable, an option to avoid the costly de novo assembly process is a mapping-based assembly. However, mapping shotgun data to a distant relative may lead to biased or erroneous evolutionary inference. Here, we used short-read data from a mammal (beluga whale) and a bird species (rowi kiwi) to evaluate whether reference genome phylogenetic distance can impact downstream demographic (Pairwise Sequentially Markovian Coalescent) and genetic diversity (heterozygosity, runs of homozygosity) analyses. We mapped to assemblies of species of varying phylogenetic distance (from conspecific to genome-wide divergence of >7%), and de novo assemblies created using cross-species scaffolding. We show that while reference genome phylogenetic distance has an impact on demographic analyses, it is not pronounced until using a reference genome with >3% divergence from the target species. When mapping to cross-species scaffolded assemblies, we are unable to replicate the original beluga demographic results, but are able with the rowi kiwi, presumably reflecting the more fragmented nature of the beluga assemblies. We find that increased phylogenetic distance has a pronounced impact on genetic diversity estimates; heterozygosity estimates deviate incrementally with increasing phylogenetic distance. Moreover, runs of homozygosity are largely undetectable when mapping to any nonconspecific assembly. However, these biases can be reduced when mapping to a cross-species scaffolded assembly. Taken together, our results show that caution should be exercised when selecting reference genomes. Cross-species scaffolding may offer a way to avoid a costly, traditional de novo assembly, while still producing robust, evolutionary inference.

4.
The statistical rigor of species delimitation has increased dramatically over the past decade. Coalescent theory provides powerful models for population genetic inference, and is now increasingly important in phylogenetics and speciation research. By applying probabilistic models, coalescent-based species delimitation provides clear and objective testing of alternative hypotheses of evolutionary independence. As acquisition of multilocus data becomes increasingly automated, coalescent-based species delimitation will improve the discovery, resolution, consistency, and stability of the taxonomy of species. Along with other tools and data types, coalescent-based species delimitation will play an important role in an integrative taxonomy that emphasizes the identification of species limits and the processes that have promoted lineage diversification.

5.
Boolean networks and, more generally, probabilistic Boolean networks, as one class of gene regulatory networks, model biological processes with the network dynamics determined by the logic-rule regulatory functions in conjunction with probabilistic parameters involved in network transitions. While there has been significant research on applying different control policies to alter network dynamics as future gene therapeutic intervention, we have seen less work on understanding the sensitivity of network dynamics with respect to perturbations to networks, including regulatory rules and the involved parameters, which is particularly critical for the design of intervention strategies. This paper studies this less investigated issue of network sensitivity in the long run. As the underlying model of probabilistic Boolean networks is a finite Markov chain, we define the network sensitivity based on the steady-state distributions of probabilistic Boolean networks and call it long-run sensitivity. The steady-state distribution reflects the long-run behavior of the network and it can give insight into the dynamics or momentum existing in a system. The change of steady-state distribution caused by possible perturbations is the key measure for intervention. This newly defined long-run sensitivity can provide insight on both network inference and intervention. We show the results for probabilistic Boolean networks generated from random Boolean networks and the results from two real biological networks illustrate preliminary applications of sensitivity in intervention for practical problems.
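
The idea in entry 5 of measuring sensitivity through the change in a Markov chain's steady-state distribution can be sketched as follows. The abstract does not state the exact definition, so the L1 distance used here is an assumption, and the names `steady_state` and `long_run_change` as well as the 2-state transition matrices are hypothetical.

```python
def steady_state(P, max_iters=10_000, tol=1e-12):
    """Steady-state distribution of a row-stochastic matrix P by power iteration.

    Assumes the chain is ergodic, so the iteration converges.
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(max_iters):
        # One step of pi <- pi * P.
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


def long_run_change(P, P_perturbed):
    """L1 distance between the two steady-state distributions (assumed measure)."""
    a = steady_state(P)
    b = steady_state(P_perturbed)
    return sum(abs(x - y) for x, y in zip(a, b))


if __name__ == "__main__":
    # Hypothetical 2-state chain and a perturbation of its first row.
    P = [[0.9, 0.1], [0.5, 0.5]]
    P_perturbed = [[0.8, 0.2], [0.5, 0.5]]
    print(steady_state(P))
    print(long_run_change(P, P_perturbed))
```

For a real probabilistic Boolean network with n genes the chain has 2^n states, so a dense matrix like this is only practical for small networks.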

6.
A fundamental task in sequence analysis is to calculate the probability of a multiple alignment given a phylogenetic tree relating the sequences and an evolutionary model describing how sequences change over time. However, the most widely used phylogenetic models only account for residue substitution events. We describe a probabilistic model of a multiple sequence alignment that accounts for insertion and deletion events in addition to substitutions, given a phylogenetic tree, using a rate matrix augmented by the gap character. Starting from a continuous Markov process, we construct a non-reversible generative (birth-death) evolutionary model for insertions and deletions. The model assumes that insertion and deletion events occur one residue at a time. We apply this model to phylogenetic tree inference by extending the program dnaml in phylip. Using standard benchmarking methods on simulated data and a new "concordance test" benchmark on real ribosomal RNA alignments, we show that the extended program dnamlepsilon improves accuracy relative to the usual approach of ignoring gaps, while retaining the computational efficiency of the Felsenstein peeling algorithm.

7.
Recombination is an important evolutionary force in bacteria, but it remains challenging to reconstruct the imports that occurred in the ancestry of a genomic sample. Here we present ClonalFrameML, which uses maximum likelihood inference to simultaneously detect recombination in bacterial genomes and account for it in phylogenetic reconstruction. ClonalFrameML can analyse hundreds of genomes in a matter of hours, and we demonstrate its usefulness on simulated and real datasets. We find evidence for recombination hotspots associated with mobile elements in Clostridium difficile ST6 and a previously undescribed 310 kb chromosomal replacement in Staphylococcus aureus ST582. ClonalFrameML is freely available at http://clonalframeml.googlecode.com/.

8.
9.
A "Long Indel" model for evolutionary sequence alignment   Cited by: 7 (self-citations: 0, citations by others: 7)
We present a new probabilistic model of sequence evolution, allowing indels of arbitrary length, and give sequence alignment algorithms for our model. Previously implemented evolutionary models have allowed (at most) single-residue indels or have introduced artifacts such as the existence of indivisible "fragments." We compare our algorithm to these previous methods by applying it to the structural homology dataset HOMSTRAD, evaluating the accuracy of (1) alignments and (2) evolutionary time estimates. With our method, it is possible (for the first time) to integrate probabilistic sequence alignment, with reliability indicators and arbitrary gap penalties, in the same framework as phylogenetic reconstruction. Our alignment algorithm requires that we evaluate the likelihood of any specific path of mutation events in a continuous-time Markov model, with the event times integrated out. To this effect, we introduce a "trajectory likelihood" algorithm (Appendix A). We anticipate that this algorithm will be useful in more general contexts, such as Markov Chain Monte Carlo simulations.

10.
We conduct species delimitation of the widespread parachuting frog species Rhacophorus catamitus using samples from across the island of Sumatra, Indonesia. We use mitochondrial, genomic and morphological data, and find that R. catamitus is composed of three lineages corresponding to northern, central and southern lineages. Mitochondrial and genomic data show admixture or incomplete lineage sorting between the central and southern lineages, but deep divergence from the northern lineage. Coalescent species delimitation supports a three species model for this complex, and we recommend that the northern lineage be described as a new species. Our study highlights the power of coalescent species delimitation in an integrative framework for identifying unrecognised diversity in understudied tropical species complexes. We also emphasise the evolutionary importance of northern Sumatra, a region that harboured montane refugia during Pliocene–Pleistocene climate change, but has also been heavily affected by volcanic activity.

11.
Inference of protein functions is one of the most important aims of modern biology. To fully exploit the large volumes of genomic data typically produced in modern-day genomic experiments, automated computational methods for protein function prediction are urgently needed. Established methods use sequence or structure similarity to infer functions but those types of data do not suffice to determine the biological context in which proteins act. Current high-throughput biological experiments produce large amounts of data on the interactions between proteins. Such data can be used to infer interaction networks and to predict the biological process that the protein is involved in. Here, we develop a probabilistic approach for protein function prediction using network data, such as protein-protein interaction measurements. We take a Bayesian approach to an existing Markov Random Field method by performing simultaneous estimation of the model parameters and prediction of protein functions. We use an adaptive Markov Chain Monte Carlo algorithm that leads to more accurate parameter estimates and consequently to improved prediction performance compared to the standard Markov Random Fields method. We tested our method using a high quality S. cerevisiae validation network with 1622 proteins against 90 Gene Ontology terms of different levels of abstraction. Compared to three other protein function prediction methods, our approach shows very good prediction performance. Our method can be directly applied to protein-protein interaction or coexpression networks, but also can be extended to use multiple data sources. We apply our method to physical protein interaction data from S. cerevisiae and provide novel predictions, using 340 Gene Ontology terms, for 1170 unannotated proteins and we evaluate the predictions using the available literature.

12.
This paper describes a variational free-energy formulation of (partially observable) Markov decision problems in decision making under uncertainty. We show that optimal control can be cast as active inference. In active inference, both action and posterior beliefs about hidden states minimise a free energy bound on the negative log-likelihood of observed states, under a generative model. In this setting, reward or cost functions are absorbed into prior beliefs about state transitions and terminal states. Effectively, this converts optimal control into a pure inference problem, enabling the application of standard Bayesian filtering techniques. We then consider optimal trajectories that rest on posterior beliefs about hidden states in the future. Crucially, this entails modelling control as a hidden state that endows the generative model with a representation of agency. This leads to a distinction between models with and without inference on hidden control states; namely, agency-free and agency-based models, respectively.

13.
14.
We study the phylogeny of the placental mammals using molecular data from all mitochondrial tRNAs and rRNAs of 54 species. We use probabilistic substitution models specific to evolution in base paired regions of RNA. A number of these models have been implemented in a new phylogenetic inference software package for carrying out maximum likelihood and Bayesian phylogenetic inferences. We describe our Bayesian phylogenetic method which uses a Markov chain Monte Carlo algorithm to provide samples from the posterior distribution of tree topologies. Our results show support for four primary mammalian clades, in agreement with recent studies of much larger data sets mainly comprising nuclear DNA. We discuss some issues arising when using Bayesian techniques on RNA sequence data.

15.
Transmission lies at the interface of human immunodeficiency virus type 1 (HIV-1) evolution within and among hosts and separates distinct selective pressures that impose differences in both the mode of diversification and the tempo of evolution. In the absence of comprehensive direct comparative analyses of the evolutionary processes at different biological scales, our understanding of how fast within-host HIV-1 evolutionary rates translate to lower rates at the between-host level remains incomplete. Here, we address this by analyzing pol and env data from a large HIV-1 subtype C transmission chain for which both the timing and the direction are known for most transmission events. To this purpose, we develop a new transmission model in a Bayesian genealogical inference framework and demonstrate how to constrain the viral evolutionary history to be compatible with the transmission history while simultaneously inferring the within-host evolutionary and population dynamics. We show that accommodating a transmission bottleneck affords the best fit to our data, but the sparse within-host HIV-1 sampling prevents accurate quantification of the concomitant loss in genetic diversity. We draw inference under the transmission model to estimate HIV-1 evolutionary rates among epidemiologically related patients and demonstrate that they lie in between fast intra-host rates and lower rates among epidemiologically unrelated individuals infected with HIV subtype C. Using a new molecular clock approach, we quantify and find support for a lower evolutionary rate along branches that accommodate a transmission event or branches that represent the entire backbone of transmitted lineages in our transmission history.
Finally, we recover the rate differences at the different biological scales for both synonymous and non-synonymous substitution rates, which is only compatible with the 'store and retrieve' hypothesis positing that viruses stored early in latently infected cells preferentially transmit or establish new infections upon reactivation.

16.
Driven by the desire to understand genomic functions through the interactions among genes and gene products, research in gene regulatory networks has become an active area in genomic signal processing. Among the most studied mathematical models are Boolean networks and probabilistic Boolean networks, which are rule-based dynamic systems. This tutorial provides an introduction to the essential concepts of these two Boolean models, and presents the up-to-date analysis and simulation methods developed for them. In the Analysis section, we will show that Boolean models are Markov chains, based on which we present a Markovian steady-state analysis on attractors, and also reveal the relationship between probabilistic Boolean networks and dynamic Bayesian networks (another popular genetic network model), again via Markov analysis; we dedicate the last subsection to structural analysis, which opens a door to other topics such as network control. The Simulation section will start from the basic tasks of creating state transition diagrams and finding attractors, proceed to the simulation of network dynamics and obtaining the steady-state distributions, and finally come to an algorithm for generating artificial Boolean networks with prescribed attractors. The contents are arranged in a roughly logical order, such that the Markov chain analysis lays the basis for most of the Analysis section, and also prepares readers for the topics in the Simulation section.
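
The two basic simulation tasks that entry 16 mentions, building the state transitions and finding attractors, can be sketched for a small deterministic Boolean network. This is not code from the tutorial: the synchronous update scheme, the helper names `next_state` and `find_attractors`, and the 2-gene example rules are assumptions chosen for brevity.

```python
from itertools import product


def next_state(state, rules):
    """Synchronous update: every gene applies its rule to the current state."""
    return tuple(rule(state) for rule in rules)


def find_attractors(rules, n_genes):
    """Return the set of attractors, each as a frozenset of states.

    Every one of the 2**n_genes states is followed until a state repeats;
    the states from the first repeat onward form the attractor cycle.
    """
    attractors = set()
    for start in product((0, 1), repeat=n_genes):
        first_seen = {}
        state, step = start, 0
        while state not in first_seen:
            first_seen[state] = step
            state = next_state(state, rules)
            step += 1
        cycle_start = first_seen[state]
        cycle = frozenset(s for s, i in first_seen.items() if i >= cycle_start)
        attractors.add(cycle)
    return attractors


if __name__ == "__main__":
    # Hypothetical 2-gene network in which each gene copies the other.
    rules = [lambda s: s[1], lambda s: s[0]]
    for attractor in find_attractors(rules, 2):
        print(sorted(attractor))
```

Exhaustive enumeration visits all 2^n states, so it is only feasible for small networks; larger ones need sampling or symbolic methods.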

17.
A quantitative trait depends on multiple quantitative trait loci (QTL) and on the interaction between two or more QTL, named epistasis. Several methods to detect multiple QTL in various types of design have been proposed, but most of these are based on the assumption that each QTL works independently and epistasis has not been explored sufficiently. The objective of the study was to propose an integrated method to detect multiple QTL with epistases using Bayesian inference via a Markov chain Monte Carlo (MCMC) algorithm. Since the mixed inheritance model is assumed and the deterministic algorithm to calculate the probabilities of QTL genotypes is incorporated in the method, this can be applied to an outbred population such as livestock. Additionally, we treated a pair of QTL as one variable in the Reversible jump Markov chain Monte Carlo (RJMCMC) algorithm so that two QTL were able to be simultaneously added into or deleted from a model. As a result, both of the QTL can be detected, not only in cases where either of the two QTL has main effects and they have epistatic effects between each other, but also in cases where neither of the two QTL has main effects but they have epistatic effects. The method will help ascertain the complicated structure of quantitative traits.

18.
With the continued adoption of genome‐scale data in evolutionary biology comes the challenge of adequately harnessing the information to make accurate phylogenetic inferences. Coalescent‐based methods of species tree inference have become common, and concatenation has been shown in simulation to perform well, particularly when levels of incomplete lineage sorting are low. However, simulation conditions are often overly simplistic, leaving empiricists with uncertainty regarding analytical tools. We use a large ultraconserved element data set (>3,000 loci) from rattlesnakes of the Crotalus triseriatus group to delimit lineages and estimate species trees using concatenation and several coalescent‐based methods. Unpartitioned and partitioned maximum likelihood and Bayesian analysis of the concatenated matrix yield a topology identical to coalescent analysis of a subset of the data in bpp. ASTRAL analysis on a subset of the more variable loci also results in a tree consistent with concatenation and bpp, whereas the SVDquartets phylogeny differs at additional nodes. The size of the concatenated matrix has a strong effect on species tree inference using SVDquartets, warranting additional investigation on optimal data characteristics for this method. Species delimitation analyses suggest up to 16 unique lineages may be present within the C. triseriatus group, with divergences occurring during the Neogene and Quaternary. Network analyses suggest hybridization within the group is relatively rare. Altogether, our results reaffirm the Mexican highlands as a biodiversity hotspot and suggest that coalescent‐based species tree inference on data subsets can provide a strongly supported species tree consistent with concatenation of all loci with a large amount of missing data.

19.
Recombination has been invoked to explain the disparate evolutionary relationships observed for different genes or sequence segments of a single HIV-1 genome. We present a new method of assessing confidence in HIV-1 recombination as an alternative to the segment-by-segment nonparametric bootstrap commonly applied to confirm HIV-1 recombinant data. Our new method uses the bias-corrected accelerated percentile interval (BCa) bootstrap method as applied to the "problem of regions" (Efron and Tibshirani 1998). It is an extension of the BCa method used in the inference of evolutionary relationships (Efron et al. 1996). This method has two advantages over the traditional bootstrap procedure: (1) it gives a single overall confidence measure rather than segment-by-segment results, and (2) it is more accurate. We test our method on 61 sequences, including 16 with ambiguous recombinant status. Received: 9 August 2000 / Accepted: 6 July 2001

20.
Inferring gene regulatory networks from expression data is difficult, but it is common and often useful. Most network problems are under-determined (there are more parameters than data points), and therefore data or parameter set reduction is often necessary. Correlation between variables in the model also confounds network coefficient inference. In this paper, we present an algorithm that uses integrated, probabilistic clustering to ease the problems of under-determination and correlated variables within a fully Bayesian framework. Specifically, ours is a dynamic Bayesian network with integrated Gaussian mixture clustering, which we fit using variational Bayesian methods. We show, using public, simulated time-course data sets from the DREAM4 Challenge, that our algorithm outperforms non-clustering methods in many cases (7 out of 25) with fewer samples, rarely underperforming (1 out of 25), and often selects a non-clustering model if it better describes the data. Source code (GNU Octave) for BAyesian Clustering Over Networks (BACON) and sample data are available at: http://code.google.com/p/bacon-for-genetic-networks.
