Similar articles
 20 similar articles found (search time: 0 ms)
1.
We propose a genealogy-sampling algorithm, Sequential Markov Ancestral Recombination Tree (SMARTree), that provides an approach to estimation from SNP haplotype data of the patterns of coancestry across a genome segment among a set of homologous chromosomes. To enable analysis across longer segments of genome, the sequence of coalescent trees is modeled via the modified sequential Markov coalescent (Marjoram and Wall, BMC Genetics 7:16, 2006). To assess performance in estimating these local trees, our SMARTree implementation is tested on simulated data. Our base data set is of the SNPs in 10 DNA sequences over 50 kb. We examine the effects of longer sequences and of more sequences, and of a recombination and/or mutational hotspot. The model underlying SMARTree is an approximation to the full recombinant-coalescent distribution. However, in a small trial on simulated data, recovery of local trees was similar to that of LAMARC (Kuhner et al. Genetics 156:1393-1401, 2000a), a sampler which uses the full model.

2.
The accurate identification of the route of transmission taken by an infectious agent through a host population is critical to understanding its epidemiology and informing measures for its control. However, reconstruction of transmission routes during an epidemic is often an underdetermined problem: data about the location and timings of infections can be incomplete, inaccurate, and compatible with a large number of different transmission scenarios. For fast-evolving pathogens like RNA viruses, inference can be strengthened by using genetic data, nowadays easily and affordably generated. However, significant statistical challenges remain to be overcome in the full integration of these different data types if transmission trees are to be reliably estimated. We present here a framework leading to a Bayesian inference scheme that combines genetic and epidemiological data, able to reconstruct most likely transmission patterns and infection dates. After testing our approach with simulated data, we apply the method to two UK epidemics of Foot-and-Mouth Disease Virus (FMDV): the 2007 outbreak, and a subset of the large 2001 epidemic. In the first case, we are able to confirm the role of a specific premise as the link between the two phases of the epidemic, while transmissions more densely clustered in space and time remain harder to resolve. When we consider data collected from the 2001 epidemic during a time of national emergency, our inference scheme robustly infers transmission chains, and uncovers the presence of undetected premises, thus providing a useful tool for epidemiological studies in real time. The generation of genetic data is becoming routine in epidemiological investigations, but the development of analytical tools maximizing the value of these data remains a priority. Our method, while applied here in the context of FMDV, is general and with slight modification can be used in any situation where both spatiotemporal and genetic data are available.

3.
A text that has a systematic account of Bayesian analysis in computational biology has been needed for a long time. This book is a timely publication entirely devoted to cutting-edge Bayesian methods in genomics and proteomics research and many of its contributors are leading authorities in the field. It is thus an indispensable reference for researchers who are interested in applying Bayesian techniques in their own biological research. Moreover, the book calls for more methodological and theoretical research to

4.
5.
Bayesian Inference for a Random Tessellation Process   Cited by: 2 (self-citations: 0, citations by others: 2)
P. G. Blackwell. Biometrics, 2001, 57(2): 502-507
This article describes an inhomogeneous Poisson point process in the plane with an intensity function based on a Dirichlet tessellation process and a method for using observations on the point process to make fully Bayesian inferences about the underlying tessellation. The method is implemented using a Markov chain Monte Carlo approach. An application to modeling the territories of clans of badgers, Meles meles, is described.
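To make the forward model concrete, here is a minimal sketch of simulating an inhomogeneous Poisson point process whose intensity depends on a Dirichlet (Voronoi) tessellation. It assumes, purely for illustration, that the intensity is constant within each cell and that the window is the unit square; the abstract does not state the actual form of the intensity. The names `nearest_cell`, `poisson_count` and `simulate_points` are hypothetical, and the sampler uses standard thinning, not the article's inference method.

```python
import math
import random


def nearest_cell(point, centres):
    """Index of the Dirichlet (Voronoi) cell containing `point`, i.e. the closest centre."""
    return min(range(len(centres)), key=lambda i: math.dist(point, centres[i]))


def poisson_count(mean, rng):
    """Draw a Poisson(mean) count by counting unit-rate exponential arrivals before `mean`."""
    count, elapsed = 0, rng.expovariate(1.0)
    while elapsed < mean:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count


def simulate_points(centres, rates, seed=0):
    """Simulate points on the unit square by thinning a homogeneous process.

    Assumption (not from the article): the intensity equals rates[i]
    everywhere inside the cell of centres[i].
    """
    rng = random.Random(seed)
    max_rate = max(rates)
    if max_rate <= 0:
        return []
    points = []
    # The unit square has area 1, so the dominating process has mean max_rate.
    for _ in range(poisson_count(max_rate, rng)):
        candidate = (rng.random(), rng.random())
        cell = nearest_cell(candidate, centres)
        # Keep the candidate with probability intensity / max_rate.
        if rng.random() < rates[cell] / max_rate:
            points.append(candidate)
    return points


if __name__ == "__main__":
    centres = [(0.25, 0.5), (0.75, 0.5)]
    print(len(simulate_points(centres, [5.0, 50.0], seed=1)))
```

Bayesian inference about the tessellation would run in the opposite direction, treating the cell centres as unknowns and the observed points as data.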

6.
Summary. We propose a nonparametric Bayesian approach to estimate the natural direct and indirect effects through a mediator in the setting of a continuous mediator and a binary response. Several conditional independence assumptions are introduced (with corresponding sensitivity parameters) to make these effects identifiable from the observed data. We suggest strategies for eliciting sensitivity parameters and conduct simulations to assess violations to the assumptions. This approach is used to assess mediation in a recent weight management clinical trial.

7.
Haplotypes have gained increasing attention in the mapping of complex-disease genes, because of the abundance of single-nucleotide polymorphisms (SNPs) and the limited power of conventional single-locus analyses. It has been shown that haplotype-inference methods such as Clark's algorithm, the expectation-maximization algorithm, and a coalescence-based iterative-sampling algorithm are fairly effective and economical alternatives to molecular-haplotyping methods. To contend with some weaknesses of the existing algorithms, we propose a new Monte Carlo approach. In particular, we first partition the whole haplotype into smaller segments. Then, we use the Gibbs sampler both to construct the partial haplotypes of each segment and to assemble all the segments together. Our algorithm can accurately and rapidly infer haplotypes for a large number of linked SNPs. By using a wide variety of real and simulated data sets, we demonstrate the advantages of our Bayesian algorithm, and we show that it is robust to the violation of Hardy-Weinberg equilibrium, to the presence of missing data, and to occurrences of recombination hotspots.

8.
Bayesian inference is a powerful statistical paradigm that has gained popularity in many fields of science, but adoption has been somewhat slower in biophysics. Here, I provide an accessible tutorial on the use of Bayesian methods by focusing on example applications that will be familiar to biophysicists. I first discuss the goals of Bayesian inference and show simple examples of posterior inference using conjugate priors. I then describe Markov chain Monte Carlo sampling and, in particular, discuss Gibbs sampling and Metropolis random walk algorithms with reference to detailed examples. These Bayesian methods (with the aid of Markov chain Monte Carlo sampling) provide a generalizable way of rigorously addressing parameter inference and identifiability for arbitrarily complicated models.
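As an illustration of two ideas named in this abstract, the sketch below pairs a conjugate Beta-binomial update with a Metropolis random walk targeting the same posterior. The model, the function names (`beta_binomial_posterior`, `log_posterior`, `metropolis`) and the numbers are assumptions chosen for brevity; they are not taken from the tutorial itself.

```python
import math
import random


def beta_binomial_posterior(a, b, successes, failures):
    """Conjugate update: Beta(a, b) prior plus binomial data gives a Beta posterior."""
    return a + successes, b + failures


def log_posterior(theta, a, b, successes, failures):
    """Unnormalised log posterior density of theta for the same model."""
    if not 0.0 < theta < 1.0:
        return -math.inf
    return ((a + successes - 1) * math.log(theta)
            + (b + failures - 1) * math.log(1.0 - theta))


def metropolis(n_samples, step, a, b, successes, failures, seed=0):
    """Random-walk Metropolis sampler with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    theta = 0.5
    current = log_posterior(theta, a, b, successes, failures)
    samples = []
    for _ in range(n_samples):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, a, b, successes, failures)
        # Accept with probability min(1, posterior ratio); proposals outside
        # (0, 1) have log density -inf and are always rejected.
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        samples.append(theta)
    return samples


if __name__ == "__main__":
    a_post, b_post = beta_binomial_posterior(1, 1, 7, 3)
    draws = metropolis(20000, 0.2, 1, 1, 7, 3, seed=1)[2000:]
    print("exact posterior mean:", a_post / (a_post + b_post))
    print("MCMC posterior mean: ", sum(draws) / len(draws))
```

Because the conjugate posterior is available in closed form here, it gives a simple check on the sampler: the two printed means should agree closely.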

9.
In this paper we develop a Bayesian approach to parameter estimation in a stochastic spatio-temporal model of the spread of invasive species across a landscape. To date, statistical techniques, such as logistic and autologistic regression, have outstripped stochastic spatio-temporal models in their ability to handle large numbers of covariates. Here we seek to address this problem by making use of a range of covariates describing the bio-geographical features of the landscape. Relative to regression techniques, stochastic spatio-temporal models are more transparent in their representation of biological processes. They also explicitly model temporal change, and therefore do not require the assumption that the species' distribution (or other spatial pattern) has already reached equilibrium as is often the case with standard statistical approaches. In order to illustrate the use of such techniques we apply them to the analysis of data detailing the spread of an invasive plant, Heracleum mantegazzianum, across Britain in the 20th Century using geo-referenced covariate information describing local temperature, elevation and habitat type. The use of Markov chain Monte Carlo sampling within a Bayesian framework facilitates statistical assessments of differences in the suitability of different habitat classes for H. mantegazzianum, and enables predictions of future spread to account for parametric uncertainty and system variability. Our results show that ignoring such covariate information may lead to biased estimates of key processes and implausible predictions of future distributions.

10.
Genetic variation is now routinely screened at the DNA sequence level in many studies. If the DNA region being screened has not experienced excessive amounts of recombination, it is often possible to reconstruct the evolutionary history of the genetic variation in the form of a haplotype tree. This tree estimates the evolutionary pathway that interconnects all the different haplotypes (sequence variants) observed in the sample. This haplotype tree can be used to define a series of nested branches (clades) that reflects the relative temporal history of the haplotypes and groups of haplotypes. Geographical information can then be overlaid upon this temporal series to test for significant associations between geography and temporal position in the haplotype tree. This allows a reconstruction of how the genetic variation arose and spread in both space and time. Such reconstructions can yield many insights into the joint roles of recurrent events such as gene flow and of historical events such as fragmentation or range expansion. These points are illustrated with studies on the chub, Leuciscus cephalus. There is also a need to extend such nested phylogeographic analyses to a phylo/reticulate geographic analysis that incorporates both assortment and recombination between and within DNA regions. A preliminary phylo/reticulate geographic analysis is presented of the transferrin locus in the brown trout, Salmo trutta, species complex that reveals the importance of hybridization in the recent evolutionary history of this group. This example shows the inadequacy of a strictly phylogenetic approach and illustrates the need to incorporate reticulate evolution. The results of nested clade phylogeographic analysis and the new phylo/reticulate geographic analysis are then used for inferring species status of the marbled trout. The results indicate that an old hybridization event may have played a role in the origin of the marbled trout. 
Currently the marbled trout is primarily endangered by hybridization with introduced brown trout. These results show both the positive and negative impacts of hybridization upon biodiversity. Such phylo/reticulate geographic studies will challenge both our concepts of species and our conservation management strategies.

11.
He, Li; Wang, Yu-Bo; Bridges, William C.; He, Zhulin; Che, S. Megan. Statistics in Biosciences, 2023, 15(1): 114-140
Statistics in Biosciences - In observational studies, principal stratification is a well-established method in causal analysis to adjust the treatment effect estimation for post-treatment...

12.
Bayesian Inference in Semiparametric Mixed Models for Longitudinal Data   Cited by: 1 (self-citations: 0, citations by others: 1)
Summary. We consider Bayesian inference in semiparametric mixed models (SPMMs) for longitudinal data. SPMMs are a class of models that use a nonparametric function to model a time effect, a parametric function to model other covariate effects, and parametric or nonparametric random effects to account for the within-subject correlation. We model the nonparametric function using a Bayesian formulation of a cubic smoothing spline, and the random effect distribution using a normal distribution and alternatively a nonparametric Dirichlet process (DP) prior. When the random effect distribution is assumed to be normal, we propose a uniform shrinkage prior (USP) for the variance components and the smoothing parameter. When the random effect distribution is modeled nonparametrically, we use a DP prior with a normal base measure and propose a USP for the hyperparameters of the DP base measure. We argue that the commonly assumed DP prior implies a nonzero mean of the random effect distribution, even when a base measure with mean zero is specified. This implies weak identifiability for the fixed effects, and can therefore lead to biased estimators and poor inference for the regression coefficients and the spline estimator of the nonparametric function. We propose an adjustment using a postprocessing technique. We show that under mild conditions the posterior is proper under the proposed USP, a flat prior for the fixed effect parameters, and an improper prior for the residual variance. We illustrate the proposed approach using a longitudinal hormone dataset, and carry out extensive simulation studies to compare its finite sample performance with existing methods.

13.
Boolean networks are a simple but efficient model for describing gene regulatory systems. A number of algorithms have been proposed to infer Boolean networks. However, these methods do not take full consideration of the effects of noise and model uncertainty. In this paper, we propose a full Bayesian approach to infer Boolean genetic networks. Markov chain Monte Carlo algorithms are used to obtain the posterior samples of both the network structure and the related parameters. In addition to regular link addition and removal moves, which can guarantee the irreducibility of the Markov chain for traversing the whole network space, carefully constructed mixture proposals are used to improve the Markov chain Monte Carlo convergence. Both simulations and a real application on cell-cycle data show that our method is more powerful than existing methods for the inference of both the topology and logic relations of the Boolean network from observed data.
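The link addition and removal moves mentioned in this abstract can be sketched as a single toggle of one directed link, which lets a chain reach any network from any other. The code below is a hypothetical illustration only: the function names `toggle_link` and `propose_link_move`, the edge-set representation, and the decision to allow self-regulation are assumptions, and neither the paper's mixture proposals nor its acceptance step are shown.

```python
import random


def toggle_link(edges, link):
    """Return a new edge set with `link` added if absent, or removed if present."""
    proposal = set(edges)
    if link in proposal:
        proposal.remove(link)
    else:
        proposal.add(link)
    return proposal


def propose_link_move(edges, nodes, rng):
    """Pick one directed (regulator, target) pair uniformly and toggle it.

    The proposal is symmetric: moving back requires toggling the same pair,
    which is chosen with the same probability. Self-links are allowed here
    as a simplifying assumption.
    """
    link = (rng.choice(nodes), rng.choice(nodes))
    return toggle_link(edges, link), link


if __name__ == "__main__":
    rng = random.Random(0)
    genes = ["g1", "g2", "g3"]
    network = {("g1", "g2")}
    for _ in range(3):
        network, changed = propose_link_move(network, genes, rng)
        print(changed, sorted(network))
```

In a full sampler each proposed network would be accepted or rejected according to its posterior probability given the expression data.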

14.
Single-molecule-localization-based superresolution microscopy requires accurate sample drift correction to achieve good results. Common approaches for drift compensation include using fiducial markers and direct drift estimation by image correlation. The former increases the experimental complexity and the latter estimates drift at a reduced temporal resolution. Here, we present, to our knowledge, a new approach for drift correction based on the Bayesian statistical framework. The technique has the advantage of being able to calculate the drifts for every image frame of the data set directly from the single-molecule coordinates. We present the theoretical foundation of the algorithm and an implementation that achieves significantly higher accuracy than image-correlation-based estimations.

15.
Summary. We present a Bayesian approach to modeling dynamic smoking addiction behavior processes when cure is not directly observed due to censoring. Subject-specific probabilities model the stochastic transitions among three behavioral states: smoking, transient quitting, and permanent quitting (absorbent state). A multivariate normal distribution for random effects is used to account for the potential correlation among the subject-specific transition probabilities. Inference is conducted using a Bayesian framework via Markov chain Monte Carlo simulation. This framework provides various measures of subject-specific predictions, which are useful for policy-making, intervention development, and evaluation. Simulations are used to validate our Bayesian methodology and assess its frequentist properties. Our methods are motivated by, and applied to, the Alpha-Tocopherol, Beta-Carotene Lung Cancer Prevention study, a large (29,133 individuals) longitudinal cohort study of smokers from Finland.
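The three-state structure described in this abstract can be illustrated with a small simulation of one subject's trajectory. The transition probabilities below are invented for illustration, the allowed transitions are an assumption, and `STATES`, `EXAMPLE_TRANSITIONS` and `simulate_subject` are hypothetical names. The study's random-effects model and censoring mechanism are not represented.

```python
import random

STATES = ("smoking", "transient_quit", "permanent_quit")

# Hypothetical one-step transition probabilities for a single subject.
# The permanent-quit row makes that state absorbing.
EXAMPLE_TRANSITIONS = {
    "smoking": {"smoking": 0.80, "transient_quit": 0.15, "permanent_quit": 0.05},
    "transient_quit": {"smoking": 0.30, "transient_quit": 0.60, "permanent_quit": 0.10},
    "permanent_quit": {"smoking": 0.00, "transient_quit": 0.00, "permanent_quit": 1.00},
}


def simulate_subject(transitions, n_steps, seed=0, start="smoking"):
    """Simulate a state path of length n_steps + 1 from the transition table."""
    rng = random.Random(seed)
    state = start
    path = [state]
    for _ in range(n_steps):
        weights = [transitions[state][s] for s in STATES]
        state = rng.choices(STATES, weights=weights)[0]
        path.append(state)
    return path


if __name__ == "__main__":
    print(simulate_subject(EXAMPLE_TRANSITIONS, 12, seed=3))
```

Once a path enters the permanent-quit state it stays there, which is what makes that state absorbing. With censored follow-up, a long final run of quitting cannot be labelled transient or permanent with certainty.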

16.
Mol. Biol. Evol. 24:412-426. 2007 In Bayesian estimation of concordance among gene trees (Vol.24(2), 412–426)

17.
Suitability of trees as hosts for epiphytic lichens is studied in a forest stand of size 25 ha. Suitability is measured as occupation probabilities, which are modelled using a hierarchical Bayesian approach. These probabilities are useful for an ecologist. They give a smoothed spatial distribution map of suitability for each of the species and can be used in detecting high‐ and low‐probability areas. In addition, suitability is explained by tree‐level covariates. Spatial dependence, which is due to unobserved spatially structured covariates, is modelled through an unobserved Markov random field. A Markov chain Monte Carlo method has been applied in the Bayesian computation. The extensive spatial data consist of the occurrences of eight lichen species and one bryophyte on all of the 1253 potential host trees. In addition, coordinates of the trees and several tree characteristics have been recorded. The data have been analysed for the four most abundant species: Lobaria pulmonaria, Nephroma bellum, Nephroma parile and Peltigera praetextata. The tree‐level parameters, subject to estimation, consist of the occurrence probabilities for each tree and for each lichen species. Model validation is discussed in detail and, in addition to Bayesian validation tools, the autologistic model and a case‐control design based on logistic regression have been suggested for validation of covariate effects. As a result we present suitability maps for the four lichen species. We observed that, among the observed tree covariates, the diameter at breast height (DBH) correlates with lichen occurrence. Our modelling approach has close connections to disease mapping in spatial epidemiology.

18.
19.
Inference of gene pathways using mixture Bayesian networks   Cited by: 1 (self-citations: 0, citations by others: 1)

Background  

Inference of gene networks typically relies on measurements across a wide range of conditions or treatments. Although one network structure is predicted, the relationship between genes could vary across conditions. A comprehensive approach to infer general and condition-dependent gene networks was evaluated. This approach integrated Bayesian network and Gaussian mixture models to describe continuous microarray gene expression measurements, and three gene networks were predicted.

20.
The multispecies coalescent (MSC) model accommodates both species divergences and within-species coalescent and provides a natural framework for phylogenetic analysis of genomic data when the gene trees vary across the genome. The MSC model implemented in the program bpp assumes a molecular clock and the Jukes–Cantor model, and is suitable for analyzing genomic data from closely related species. Here we extend our implementation to more general substitution models and relaxed clocks to allow the rate to vary among species. The MSC-with-relaxed-clock model allows the estimation of species divergence times and ancestral population sizes using genomic sequences sampled from contemporary species when the strict clock assumption is violated, and provides a simulation framework for evaluating species tree estimation methods. We conducted simulations and analyzed two real datasets to evaluate the utility of the new models. We confirm that the clock-JC model is adequate for inference of shallow trees with closely related species, but it is important to account for clock violation for distant species. Our simulation suggests that there is valuable phylogenetic information in the gene-tree branch lengths even if the molecular clock assumption is seriously violated, and the relaxed-clock models implemented in bpp are able to extract such information. Our Markov chain Monte Carlo algorithms suffer from mixing problems when used for species tree estimation under the relaxed clock and we discuss possible improvements. We conclude that the new models are currently most effective for estimating population parameters such as species divergence times when the species tree is fixed.
