首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
In this article, we propose a method for analyzing the spatial variations in the range expansion of the pine processionary moth (PPM), an invasive species in France. Based on binary measurements - the presence or absence of PPM nests - the proposed method allows us to infer the local effect of the environment on PPM population expansion. This effect is estimated at each position x using a parameter F(x) that corresponds to the local PPM fitness. The data type and the two stage PPM life cycle make estimating this parameter difficult. To overcome these difficulties we adopt a mechanistic-statistical approach that combines a statistical model for the observation process with a hierarchical,reaction-diffusion based mechanistic model for the expansion process. Bayesian inference of the parameter F(x) reveals that PPM fitness is spatially heterogeneous and highlights the existence of large regions associated with lower fitness. The factors underlying this lower fitness are yet to be determined.  相似文献   

2.
We propose a Bayesian method for testing molecular clock hypotheses for use with aligned sequence data from multiple taxa. Our method utilizes a nonreversible nucleotide substitution model to avoid the necessity of specifying either a known tree relating the taxa or an outgroup for rooting the tree. We employ reversible jump Markov chain Monte Carlo to sample from the posterior distribution of the phylogenetic model parameters and conduct hypothesis testing using Bayes factors, the ratio of the posterior to prior odds of competing models. Here, the Bayes factors reflect the relative support of the sequence data for equal rates of evolutionary change between taxa versus unequal rates, averaged over all possible phylogenetic parameters, including the tree and root position. As the molecular clock model is a restriction of the more general unequal rates model, we use the Savage-Dickey ratio to estimate the Bayes factors. The Savage-Dickey ratio provides a convenient approach to calculating Bayes factors in favor of sharp hypotheses. Critical to calculating the Savage-Dickey ratio is a determination of the prior induced on the modeling restrictions. We demonstrate our method on a well-studied mtDNA sequence data set consisting of nine primates. We find strong support against a global molecular clock, but do find support for a local clock among the anthropoids. We provide mathematical derivations of the induced priors on branch length restrictions assuming equally likely trees. These derivations also have more general applicability to the examination of prior assumptions in Bayesian phylogenetics.  相似文献   

3.
Neuron models, in particular conductance-based compartmental models, often have numerous parameters that cannot be directly determined experimentally and must be constrained by an optimization procedure. A common practice in evaluating the utility of such procedures is using a previously developed model to generate surrogate data (e.g., traces of spikes following step current pulses) and then challenging the algorithm to recover the original parameters (e.g., the value of maximal ion channel conductances) that were used to generate the data. In this fashion, the success or failure of the model fitting procedure to find the original parameters can be easily determined. Here we show that some model fitting procedures that provide an excellent fit in the case of such model-to-model comparisons provide ill-balanced results when applied to experimental data. The main reason is that surrogate and experimental data test different aspects of the algorithm’s function. When considering model-generated surrogate data, the algorithm is required to locate a perfect solution that is known to exist. In contrast, when considering experimental target data, there is no guarantee that a perfect solution is part of the search space. In this case, the optimization procedure must rank all imperfect approximations and ultimately select the best approximation. This aspect is not tested at all when considering surrogate data since at least one perfect solution is known to exist (the original parameters) making all approximations unnecessary. Furthermore, we demonstrate that distance functions based on extracting a set of features from the target data (such as time-to-first-spike, spike width, spike frequency, etc.)—rather than using the original data (e.g., the whole spike trace) as the target for fitting—are capable of finding imperfect solutions that are good approximations of the experimental data.  相似文献   

4.
A computational methodology for accurately predicting flow and oxygen-transport characteristics and performance of an intravenous membrane oxygenator (IMO) device is developed, tested, and validated. This methodology uses extensive numerical simulations of three-dimensional computational models to determine flow-mixing characteristics and oxygen-transfer performance, and analytical models to indirectly validate numerical predictions with experimental data, using both blood and water as working fluids. Direct numerical simulations for IMO stationary and pulsating balloons predict flow field and oxygen transport performance in response to changes in the device length, number of and balloon pulsation frequency. Multifiber models are used to investigate interfiber interference and length effects for a stationary balloon whereas a single fiber model is used to analyze the effect of balloon pulsations on velocity and oxygen concentration fields and to evaluate oxygen transfer rates. An analytical lumped model is developed and validated by comparing its numerical predictions with experimental data. Numerical results demonstrate that oxygen transfer rates for a stationary balloon regime decrease with increasing number of fibers, independent of the fluid type. The oxygen transfer rate ratio obtained with blood and water is approximately two. Balloon pulsations show an effective and enhanced flow mixing, with time-dependent recirculating flows around the fibers regions which induce higher oxygen transfer rates. The mass transfer rates increase approximately 100% and 80%, with water and blood, respectively, compared with stationary balloon operation. Calculations with combinations of frequency, number of fibers, fiber length and diameter, and inlet volumetric flow rates, agree well with the reported experimental results, and provide a solid comparative base for analysis, predictions, and comparisons with numerical and experimental data.  相似文献   

5.
Detection-nondetection data are often used to investigate species range dynamics using Bayesian occupancy models which rely on the use of Markov chain Monte Carlo (MCMC) methods to sample from the posterior distribution of the parameters of the model. In this article we develop two Variational Bayes (VB) approximations to the posterior distribution of the parameters of a single-season site occupancy model which uses logistic link functions to model the probability of species occurrence at sites and of species detection probabilities. This task is accomplished through the development of iterative algorithms that do not use MCMC methods. Simulations and small practical examples demonstrate the effectiveness of the proposed technique. We specifically show that (under certain circumstances) the variational distributions can provide accurate approximations to the true posterior distributions of the parameters of the model when the number of visits per site (K) are as low as three and that the accuracy of the approximations improves as K increases. We also show that the methodology can be used to obtain the posterior distribution of the predictive distribution of the proportion of sites occupied (PAO).  相似文献   

6.
Finite mixture models can provide the insights about behavioral patterns as a source of heterogeneity of the various dynamics of time course gene expression data by reducing the high dimensionality and making clear the major components of the underlying structure of the data in terms of the unobservable latent variables. The latent structure of the dynamic transition process of gene expression changes over time can be represented by Markov processes. This paper addresses key problems in the analysis of large gene expression data sets that describe systemic temporal response cascades and dynamic changes to therapeutic doses in multiple tissues, such as liver, skeletal muscle, and kidney from the same animals. Bayesian Finite Markov Mixture Model with a Dirichlet Prior is developed for the identifications of differentially expressed time related genes and dynamic clusters. Deviance information criterion is applied to determine the number of components for model comparisons and selections. The proposed Bayesian models are applied to multiple tissue polygenetic temporal gene expression data and compared to a Bayesian model‐based clustering method, named CAGED. Results show that our proposed Bayesian Finite Markov Mixture model can well capture the dynamic changes and patterns for irregular complex temporal data (© 2009 WILEY‐VCH Verlag GmbH & Co. KGaA, Weinheim)  相似文献   

7.
In epidemic models, the effective reproduction number is of central importance to assess the transmission dynamics of an infectious disease and to orient health intervention strategies. Publicly shared data during an outbreak often suffers from two sources of misreporting (underreporting and delay in reporting) that should not be overlooked when estimating epidemiological parameters. The main statistical challenge in models that intrinsically account for a misreporting process lies in the joint estimation of the time-varying reproduction number and the delay/underreporting parameters. Existing Bayesian approaches typically rely on Markov chain Monte Carlo algorithms that are extremely costly from a computational perspective. We propose a much faster alternative based on Laplacian-P-splines (LPS) that combines Bayesian penalized B-splines for flexible and smooth estimation of the instantaneous reproduction number and Laplace approximations to selected posterior distributions for fast computation. Assuming a known generation interval distribution, the incidence at a given calendar time is governed by the epidemic renewal equation and the delay structure is specified through a composite link framework. Laplace approximations to the conditional posterior of the spline vector are obtained from analytical versions of the gradient and Hessian of the log-likelihood, implying a drastic speed-up in the computation of posterior estimates. Furthermore, the proposed LPS approach can be used to obtain point estimates and approximate credible intervals for the delay and reporting probabilities. Simulation of epidemics with different combinations for the underreporting rate and delay structure (one-day, two-day, and weekend delays) show that the proposed LPS methodology delivers fast and accurate estimates outperforming existing methods that do not take into account underreporting and delay patterns. Finally, LPS is illustrated in two real case studies of epidemic outbreaks.  相似文献   

8.
In network meta-analysis (NMA), treatments can be complex interventions, for example, some treatments may be combinations of others or of common components. In standard NMA, all existing (single or combined) treatments are different nodes in the network. However, sometimes an alternative model is of interest that utilizes the information that some treatments are combinations of common components, called component network meta-analysis (CNMA) model. The additive CNMA model assumes that the effect of a treatment combined of two components A and B is the sum of the effects of A and B, which is easily extended to treatments composed of more than two components. This implies that in comparisons equal components cancel out. Interaction CNMA models also allow interactions between the components. Bayesian analyses have been suggested. We report an implementation of CNMA models in the frequentist R package netmeta . All parameters are estimated using weighted least squares regression. We illustrate the application of CNMA models using an NMA of treatments for depression in primary care. Moreover, we show that these models can even be applied to disconnected networks, if the composite treatments in the subnetworks contain common components.  相似文献   

9.
10.
The analysis of nonlinear function-valued characters is very important in genetic studies, especially for growth traits of agricultural and laboratory species. Inference in nonlinear mixed effects models is, however, quite complex and is usually based on likelihood approximations or Bayesian methods. The aim of this paper was to present an efficient stochastic EM procedure, namely the SAEM algorithm, which is much faster to converge than the classical Monte Carlo EM algorithm and Bayesian estimation procedures, does not require specification of prior distributions and is quite robust to the choice of starting values. The key idea is to recycle the simulated values from one iteration to the next in the EM algorithm, which considerably accelerates the convergence. A simulation study is presented which confirms the advantages of this estimation procedure in the case of a genetic analysis. The SAEM algorithm was applied to real data sets on growth measurements in beef cattle and in chickens. The proposed estimation procedure, as the classical Monte Carlo EM algorithm, provides significance tests on the parameters and likelihood based model comparison criteria to compare the nonlinear models with other longitudinal methods.  相似文献   

11.
Deterministic processes may uniquely affect codistributed species’ phylogeographic patterns such that discordant genetic variation among taxa is predicted. Yet, explicitly testing expectations of genomic discordance in a statistical framework remains challenging. Here, we construct spatially and temporally dynamic models to investigate the hypothesized effect of microhabitat preferences on the permeability of glaciated regions to gene flow in two closely related montane species. Utilizing environmental niche models from the Last Glacial Maximum and the present to inform demographic models of changes in habitat suitability over time, we evaluate the relative probabilities of two alternative models using approximate Bayesian computation (ABC) in which glaciated regions are either (i) permeable or (ii) a barrier to gene flow. Results based on the fit of the empirical data to data sets simulated using a spatially explicit coalescent under alternative models indicate that genomic data are consistent with predictions about the hypothesized role of microhabitat in generating discordant patterns of genetic variation among the taxa. Specifically, a model in which glaciated areas acted as a barrier was much more probable based on patterns of genomic variation in Carex nova, a wet‐adapted species. However, in the dry‐adapted Carex chalciolepis, the permeable model was more probable, although the difference in the support of the models was small. This work highlights how statistical inferences can be used to distinguish deterministic processes that are expected to result in discordant genomic patterns among species, including species‐specific responses to climate change.  相似文献   

12.
Phylogenetic studies incorporating multiple loci, and multiple genomes, are becoming increasingly common. Coincident with this trend in genetic sampling, model-based likelihood techniques including Bayesian phylogenetic methods continue to gain popularity. Few studies, however, have examined model fit and sensitivity to such potentially heterogeneous data partitions within combined data analyses using empirical data. Here we investigate the relative model fit and sensitivity of Bayesian phylogenetic methods when alternative site-specific partitions of among-site rate variation (with and without autocorrelated rates) are considered. Our primary goal in choosing a best-fit model was to employ the simplest model that was a good fit to the data while optimizing topology and/or Bayesian posterior probabilities. Thus, we were not interested in complex models that did not practically affect our interpretation of the topology under study. We applied these alternative models to a four-gene data set including one protein-coding nuclear gene (c-mos), one protein-coding mitochondrial gene (ND4), and two mitochondrial rRNA genes (12S and 16S) for the diverse yet poorly known lizard family Gymnophthalmidae. Our results suggest that the best-fit model partitioned among-site rate variation separately among the c-mos, ND4, and 12S + 16S gene regions. We found this model yielded identical topologies to those from analyses based on the GTR+I+G model, but significantly changed posterior probability estimates of clade support. This partitioned model also produced more precise (less variable) estimates of posterior probabilities across generations of long Bayesian runs, compared to runs employing a GTR+I+G model estimated for the combined data. We use this three-way gamma partitioning in Bayesian analyses to reconstruct a robust phylogenetic hypothesis for the relationships of genera within the lizard family Gymnophthalmidae. We then reevaluate the higher-level taxonomic arrangement of the Gymnophthalmidae. Based on our findings, we discuss the utility of nontraditional parameters for modeling among-site rate variation and the implications and future directions for complex model building and testing.  相似文献   

13.
As the field of phylogeography has continued to move in the model‐based direction, researchers continue struggling to construct useful models for inference. These models must be both simple enough to be tractable yet contain enough of the complexity of the natural world to make meaningful inference. Beyond constructing such models for inference, researchers explore model space and test competing models with the data on hand, with the goal of improving the understanding of the natural world and the processes underlying natural biological communities. Approximate Bayesian computation (ABC) has increased in recent popularity as a tool for evaluating alternative historical demographic models given population genetic samples. As a thorough demonstration, Pelletier & Carstens ( 2014 ) use ABC to test 143 phylogeographic submodels given geographically widespread genetic samples from the salamander species Plethodon idahoensis (Carstens et al. 2004 ) and, in so doing, demonstrate how the results of the ABC model choice procedure are dependent on the model set one chooses to evaluate.  相似文献   

14.
15.
Though stochastic models are widely used to describe single ion channel behaviour, statistical inference based on them has received little consideration. This paper describes techniques of statistical inference, in particular likelihood methods, suitable for Markov models incorporating limited time resolution by means of a discrete detection limit. To simplify the analysis, attention is restricted to two-state models, although the methods have more general applicability. Non-uniqueness of the mean open-time and mean closed-time estimators obtained by moment methods based on single exponential approximations to the apparent open-time and apparent closed-time distributions has been reported. The present study clarifies and extends this previous work by proving that, for such approximations, the likelihood equations as well as the moment equations (usually) have multiple solutions. Such non-uniqueness corresponds to non-identifiability of the statistical model for the apparent quantities. By contrast, higher-order approximations yield theoretically identifiable models. Likelihood-based estimation procedures are developed for both single exponential and bi-exponential approximations. The methods and results are illustrated by numerical examples based on literature and simulated data, with consideration given to empirical distributions and model control, likelihood plots, and point estimation and confidence regions.  相似文献   

16.
17.
Marker pair selection for mapping quantitative trait loci   总被引:10,自引:0,他引:10  
Piepho HP  Gauch HG 《Genetics》2001,157(1):433-444
Mapping of quantitative trait loci (QTL) for backcross and F(2) populations may be set up as a multiple linear regression problem, where marker types are the regressor variables. It has been shown previously that flanking markers absorb all information on isolated QTL. Therefore, selection of pairs of markers flanking QTL is useful as a direct approach to QTL detection. Alternatively, selected pairs of flanking markers can be used as cofactors in composite interval mapping (CIM). Overfitting is a serious problem, especially if the number of regressor variables is large. We suggest a procedure denoted as marker pair selection (MPS) that uses model selection criteria for multiple linear regression. Markers enter the model in pairs, which reduces the number of models to be considered, thus alleviating the problem of overfitting and increasing the chances of detecting QTL. MPS entails an exhaustive search per chromosome to maximize the chance of finding the best-fitting models. A simulation study is conducted to study the merits of different model selection criteria for MPS. On the basis of our results, we recommend the Schwarz Bayesian criterion (SBC) for use in practice.  相似文献   

18.
A general Bayesian model, Diploffect, is described for estimating the effects of founder haplotypes at quantitative trait loci (QTL) detected in multiparental genetic populations; such populations include the Collaborative Cross (CC), Heterogeneous Socks (HS), and many others for which local genetic variation is well described by an underlying, usually probabilistically inferred, haplotype mosaic. Our aim is to provide a framework for coherent estimation of haplotype and diplotype (haplotype pair) effects that takes into account the following: uncertainty in haplotype composition for each individual; uncertainty arising from small sample sizes and infrequently observed haplotype combinations; possible effects of dominance (for noninbred subjects); genetic background; and that provides a means to incorporate data that may be incomplete or has a hierarchical structure. Using the results of a probabilistic haplotype reconstruction as prior information, we obtain posterior distributions at the QTL for both haplotype effects and haplotype composition. Two alternative computational approaches are supplied: a Markov chain Monte Carlo sampler and a procedure based on importance sampling of integrated nested Laplace approximations. Using simulations of QTL in the incipient CC (pre-CC) and Northport HS populations, we compare the accuracy of Diploffect, approximations to it, and more commonly used approaches based on Haley–Knott regression, describing trade-offs between these methods. We also estimate effects for three QTL previously identified in those populations, obtaining posterior intervals that describe how the phenotype might be affected by diplotype substitutions at the modeled locus.  相似文献   

19.
Asymmetric regression is an alternative to conventional linear regression that allows us to model the relationship between predictor variables and the response variable while accommodating skewness. Advantages of asymmetric regression include incorporating realistic ecological patterns observed in data, robustness to model misspecification and less sensitivity to outliers. Bayesian asymmetric regression relies on asymmetric distributions such as the asymmetric Laplace (ALD) or asymmetric normal (AND) in place of the normal distribution used in classic linear regression models. Asymmetric regression concepts can be used for process and parameter components of hierarchical Bayesian models and have a wide range of applications in data analyses. In particular, asymmetric regression allows us to fit more realistic statistical models to skewed data and pairs well with Bayesian inference. We first describe asymmetric regression using the ALD and AND. Second, we show how the ALD and AND can be used for Bayesian quantile and expectile regression for continuous response data. Third, we consider an extension to generalize Bayesian asymmetric regression to survey data consisting of counts of objects. Fourth, we describe a regression model using the ALD, and show that it can be applied to add needed flexibility, resulting in better predictive models compared to Poisson or negative binomial regression. We demonstrate concepts by analyzing a data set consisting of counts of Henslow’s sparrows following prescribed fire and provide annotated computer code to facilitate implementation. Our results suggest Bayesian asymmetric regression is an essential component of a scientist’s statistical toolbox.  相似文献   

20.
Estimating polygenic effects using markers of the entire genome   总被引:26,自引:0,他引:26  
Xu S 《Genetics》2003,163(2):789-801
Molecular markers have been used to map quantitative trait loci. However, they are rarely used to evaluate effects of chromosome segments of the entire genome. The original interval-mapping approach and various modified versions of it may have limited use in evaluating the genetic effects of the entire genome because they require evaluation of multiple models and model selection. Here we present a Bayesian regression method to simultaneously estimate genetic effects associated with markers of the entire genome. With the Bayesian method, we were able to handle situations in which the number of effects is even larger than the number of observations. The key to the success is that we allow each marker effect to have its own variance parameter, which in turn has its own prior distribution so that the variance can be estimated from the data. Under this hierarchical model, we were able to handle a large number of markers and most of the markers may have negligible effects. As a result, it is possible to evaluate the distribution of the marker effects. Using data from the North American Barley Genome Mapping Project in double-haploid barley, we found that the distribution of gene effects follows closely an L-shaped Gamma distribution, which is in contrast to the bell-shaped Gamma distribution when the gene effects were estimated from interval mapping. In addition, we show that the Bayesian method serves as an alternative or even better QTL mapping method because it produces clearer signals for QTL. Similar results were found from simulated data sets of F(2) and backcross (BC) families.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号