Similar Documents
20 similar documents found (search time: 31 ms).
1.
Thompson E, Basu S. Human Heredity, 2003, 56(1-3): 119-125
Our objective is the development of robust methods for assessing evidence for linkage of loci affecting a complex trait to a marker linkage group, using data on extended pedigrees. Using Markov chain Monte Carlo (MCMC) methods, it is possible to sample realizations from the distribution of gene identity by descent (IBD) patterns on a pedigree, conditional on observed data Y_M at multiple marker loci. Measures of gene IBD, W, which capture joint genome sharing in extended pedigrees, often have unknown and highly skewed distributions, particularly when conditioned on marker data. MCMC provides a direct estimate of the distribution of such measures. Let W be the IBD measure from data Y_M, and W* the IBD measure from pseudo-data Y*_M simulated with the same data availability and genetic marker model as the true data Y_M, but in the absence of linkage. Measures of the difference between the distributions of W and W* then provide evidence for linkage. This approach extracts more information from the data Y_M than either comparison to the pedigree prior distribution of W or the use of statistics that are expectations of W given the data Y_M. A small example is presented.
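The abstract leaves open which measure of the difference between the distributions of W and W* is used. As one hedged illustration (our assumption, not necessarily the authors' measure), the two-sample Kolmogorov-Smirnov distance between MCMC realizations of W and realizations of W* from null pseudo-data can be computed as follows:

```python
def ks_distance(w_samples, w_star_samples):
    """Two-sample Kolmogorov-Smirnov statistic sup_x |F_W(x) - F_W*(x)|.

    w_samples: realizations of W (e.g. from MCMC given the real data Y_M).
    w_star_samples: realizations of W* (from pseudo-data simulated without linkage).
    """
    a, b = sorted(w_samples), sorted(w_star_samples)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    # Walk through the pooled sorted values; the supremum is attained at a data point.
    while i < n and j < m:
        x = min(a[i], b[j])
        # Advance past all tied values so both empirical CDFs are evaluated at x.
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d
```

Values near 0 mean that the data-conditioned IBD measure looks like its no-linkage counterpart. Values near 1 mean that the two distributions barely overlap.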

2.
Linear mixed effects models have been widely used in the analysis of data where responses are clustered around some random effects, so that it is not reasonable to assume independence between observations in the same cluster. In most biological applications, the distributions of the random effects and of the residuals are assumed to be Gaussian, which makes inferences vulnerable to the presence of outliers. Here, linear mixed effects models with normal/independent residual distributions are described for robust inference. Specific distributions examined include univariate and multivariate versions of the Student-t, the slash, and the contaminated normal. A Bayesian framework is adopted, and Markov chain Monte Carlo is used to carry out the posterior analysis. The procedures are illustrated using birth-weight data on rats in a toxicological experiment. Results from the Gaussian and robust models are contrasted, and it is shown how the implementation can be used for outlier detection. The thick-tailed distributions provide an appealing robust alternative to the Gaussian distribution in linear mixed models, and they are easily implemented using data augmentation and MCMC techniques.
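The data-augmentation idea behind normal/independent models can be sketched on the simplest case: a single location parameter with Student-t errors. This is not the paper's mixed model. As assumptions of this sketch, there are no random effects, the prior on mu is flat, and sigma2 and nu are held fixed. Each observation gets a latent weight lam_i. Conditional on the weights, the model is Gaussian, so both full conditionals are standard distributions:

```python
import random
import statistics


def student_t_location_gibbs(y, sigma2=1.0, nu=4.0, n_iter=4000, burn=1000, seed=1):
    """Data-augmentation Gibbs sampler for mu when y_i ~ t_nu(mu, sigma2).

    The t model is written as a normal/independent scale mixture:
        y_i | lam_i ~ N(mu, sigma2 / lam_i),  lam_i ~ Gamma(nu/2, rate=nu/2).
    Assumptions: flat prior on mu; sigma2 and nu fixed.
    Returns the posterior mean of mu and the posterior mean weight of each point.
    """
    rng = random.Random(seed)
    mu = statistics.median(y)
    mu_draws = []
    lam_sums = [0.0] * len(y)
    for it in range(n_iter):
        # lam_i | mu, y ~ Gamma(shape=(nu+1)/2, rate=(nu + r_i^2/sigma2)/2);
        # random.gammavariate takes a scale parameter, i.e. 1/rate.
        lam = [rng.gammavariate((nu + 1) / 2, 2.0 / (nu + (yi - mu) ** 2 / sigma2))
               for yi in y]
        # mu | lam, y ~ N(lam-weighted mean of y, sigma2 / sum(lam)).
        w = sum(lam)
        mu = rng.gauss(sum(l * yi for l, yi in zip(lam, y)) / w, (sigma2 / w) ** 0.5)
        if it >= burn:
            mu_draws.append(mu)
            for i, l in enumerate(lam):
                lam_sums[i] += l
    kept = n_iter - burn
    return statistics.mean(mu_draws), [s / kept for s in lam_sums]
```

This is where outlier detection comes from. A point whose posterior mean weight lam_i is far below the others is effectively down-weighted by the model and is a candidate outlier.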

3.
The need for tests dealing with different features of small-area health data has diminished as computers have become faster and MCMC methods more accessible. However, there are many situations where exploratory testing could be useful and where MCMC methods are not readily usable or available. In this paper, a number of simple tests are derived for the logistic model for case events. This model assumes that a control disease is available and that each event carries a binary label indicating case or control status. The tests are derived from likelihood considerations, and Monte Carlo critical regions are examined. A simulation evaluation of the tests is presented in terms of Monte Carlo power. A data example is considered.
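A Monte Carlo critical region can be illustrated as follows. Under a random-labelling null hypothesis, the case/control labels are shuffled across event locations while the number of cases is held fixed, and the test statistic is recomputed for each shuffle. The statistic below is hypothetical and not one of the paper's tests: it sums a per-event weight (for example, proximity to a putative source) over the cases.

```python
import random


def weighted_case_score(weights, labels):
    """Hypothetical statistic: total weight carried by events labelled as cases (1)."""
    return sum(w for w, l in zip(weights, labels) if l == 1)


def mc_critical_value(weights, labels, stat, alpha=0.05, n_sim=999, seed=0):
    """Monte Carlo critical value and p-value under random relabelling of cases/controls."""
    rng = random.Random(seed)
    observed = stat(weights, labels)
    perm = list(labels)
    sims = []
    for _ in range(n_sim):
        rng.shuffle(perm)  # keeps the number of cases fixed
        sims.append(stat(weights, perm))
    sims.sort()
    # Critical value: the (1 - alpha)(n_sim + 1)-th smallest simulated statistic.
    k = round((1 - alpha) * (n_sim + 1))
    crit = sims[k - 1]
    p_value = (1 + sum(s >= observed for s in sims)) / (n_sim + 1)
    return observed, crit, p_value
```

The observed value is rejected at level alpha if it exceeds `crit`. Equivalently, the test rejects when the Monte Carlo p-value is at most alpha.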

4.
Nearly all current Bayesian phylogenetic applications rely on Markov chain Monte Carlo (MCMC) methods to approximate the posterior distribution for trees and other parameters of the model. These approximations are only reliable if Markov chains adequately converge and sample from the joint posterior distribution. Although several studies of phylogenetic MCMC convergence exist, these have focused on simulated data sets or select empirical examples. Therefore, much that is considered common knowledge about MCMC in empirical systems derives from a relatively small family of analyses under ideal conditions. To address this, we present an overview of commonly applied phylogenetic MCMC diagnostics and an assessment of patterns of these diagnostics across more than 18,000 empirical analyses. Many analyses appeared to perform well, and failures in convergence were most likely to be detected using the average standard deviation of split frequencies, a diagnostic that compares topologies among independent chains. Different diagnostics yielded different information about failed convergence, demonstrating that multiple diagnostics must be employed to reliably detect problems. The number of taxa and average branch lengths in analyses have clear impacts on MCMC performance, with more taxa and shorter branches leading to more difficult convergence. We show that the use of models that include both Γ-distributed among-site rate variation and a proportion of invariable sites is not broadly problematic for MCMC convergence but is also unnecessary. Changes to heating and the use of model-averaged substitution models can both improve convergence in some cases, but neither is a panacea.
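The average standard deviation of split frequencies (ASDSF) compares independent chains. For each split (bipartition), it takes that split's sampled frequency in every chain, computes the standard deviation across chains, and averages over splits. Programs differ in details, and this sketch makes two assumptions: it uses the sample standard deviation, and it ignores splits whose frequency is below 0.1 in every chain.

```python
import statistics


def asdsf(split_counts_per_chain, n_samples_per_chain, min_freq=0.1):
    """Average standard deviation of split frequencies across >= 2 independent chains.

    split_counts_per_chain: one dict per chain mapping a split (any hashable, e.g. a
        frozenset of taxa on one side) to the number of sampled trees containing it.
    n_samples_per_chain: number of sampled trees in each chain.
    """
    freqs = [{s: c / n for s, c in counts.items()}
             for counts, n in zip(split_counts_per_chain, n_samples_per_chain)]
    all_splits = set().union(*freqs)
    sds = []
    for s in all_splits:
        f = [chain.get(s, 0.0) for chain in freqs]  # absent split -> frequency 0
        if max(f) < min_freq:
            continue  # rare in every chain: excluded (assumed threshold)
        sds.append(statistics.stdev(f))
    return statistics.mean(sds) if sds else 0.0
```

Values close to 0 indicate that the chains agree on the frequencies of the well-supported splits.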

5.
MOTIVATION: Bayesian estimation of phylogeny is based on the posterior probability distribution of trees. Currently, the only numerical method that can effectively approximate posterior probabilities of trees is Markov chain Monte Carlo (MCMC). Standard implementations of MCMC can be prone to entrapment in local optima. Metropolis-coupled MCMC [(MC)³], a variant of MCMC, allows multiple peaks in the landscape of trees to be more readily explored, but at the cost of increased execution time. RESULTS: This paper presents a parallel algorithm for (MC)³. The proposed parallel algorithm retains the ability to explore multiple peaks in the posterior distribution of trees while maintaining a fast execution time. The algorithm has been implemented using two popular parallel programming models: message passing and shared memory. Performance results indicate nearly linear speedup in both programming models for small and large data sets.
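The core of (MC)³ can be sketched serially on a toy one-dimensional bimodal target instead of tree space. Chain k samples the heated density pi(x)^beta_k. After each round of within-chain updates, a swap of states between two adjacent chains is proposed. The temperatures, step size, and target are assumptions, and this sketch does not reproduce the paper's parallel message-passing or shared-memory implementation.

```python
import math
import random


def log_target(x):
    """Unnormalized log density of an equal mixture of N(-5, 1) and N(5, 1)."""
    a, b = -0.5 * (x + 5) ** 2, -0.5 * (x - 5) ** 2
    m = max(a, b)  # log-sum-exp for numerical stability
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def mc3(n_iter=20000, betas=(1.0, 0.5, 0.25, 0.1), step=1.0, seed=0):
    """Metropolis-coupled MCMC; returns the cold chain's samples and the number of swaps."""
    rng = random.Random(seed)
    xs = [0.0] * len(betas)
    cold, swaps = [], 0
    for _ in range(n_iter):
        # Random-walk Metropolis update within each chain, targeting pi(x)^beta.
        for k, beta in enumerate(betas):
            prop = xs[k] + rng.gauss(0.0, step)
            if math.log(1.0 - rng.random()) < beta * (log_target(prop) - log_target(xs[k])):
                xs[k] = prop
        # Propose swapping the states of a random pair of adjacent chains.
        k = rng.randrange(len(betas) - 1)
        log_ratio = (betas[k] - betas[k + 1]) * (log_target(xs[k + 1]) - log_target(xs[k]))
        if math.log(1.0 - rng.random()) < log_ratio:
            xs[k], xs[k + 1] = xs[k + 1], xs[k]
            swaps += 1
        cold.append(xs[0])  # only the beta = 1 chain targets the posterior
    return cold, swaps
```

The heated chains cross the valley between the modes easily, and accepted swaps pass those states down to the cold chain. This is the source of the extra mixing, and of the extra cost that the paper parallelizes.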

6.
Ayres KL, Balding DJ. Genetics, 2001, 157(1): 413-423
We describe a Bayesian approach to analyzing multilocus genotype or haplotype data to assess departures from gametic (linkage) equilibrium. Our approach employs a Markov chain Monte Carlo (MCMC) algorithm to approximate the posterior probability distributions of disequilibrium parameters. The distributions are computed exactly in some simple settings. Among other advantages, posterior distributions can be presented visually, which allows the uncertainties in parameter estimates to be readily assessed. In addition, background knowledge can be incorporated, where available, to improve the precision of inferences. The method is illustrated by application to previously published datasets; implications for multilocus forensic match probabilities and for simple association-based gene mapping are also discussed.
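A posterior for a disequilibrium parameter can be illustrated in the simplest case: two biallelic loci with phase-known haplotype counts and a symmetric Dirichlet prior on the four haplotype frequencies. These are assumptions of this sketch; the paper also handles genotype data via MCMC. In this setting the posterior of the frequencies is again Dirichlet, so D = p_AB - p_A p_B can be sampled directly without MCMC:

```python
import random


def posterior_D(hap_counts, prior=1.0, n_draws=5000, seed=0):
    """Posterior draws of D = p_AB - p_A * p_B.

    hap_counts: observed haplotype counts in the order (AB, Ab, aB, ab).
    prior: symmetric Dirichlet pseudo-count (an assumption of this sketch).
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        # A Dirichlet draw via normalized independent Gamma variates.
        g = [rng.gammavariate(c + prior, 1.0) for c in hap_counts]
        total = sum(g)
        p_AB, p_Ab, p_aB, p_ab = (x / total for x in g)
        p_A = p_AB + p_Ab
        p_B = p_AB + p_aB
        draws.append(p_AB - p_A * p_B)
    return draws
```

A histogram of the draws gives the visual summary of uncertainty mentioned in the abstract. An informative prior can be encoded by replacing the symmetric pseudo-count with per-haplotype values.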

7.
We present a Bayesian approach to the problem of inferring the history of inversions separating homologous chromosomes from two different species. The method is based on Markov chain Monte Carlo (MCMC) and takes full advantage of all the information from marker order. We apply the method both to simulated data and to two real data sets. For the simulated data, we show that the MCMC method provides accurate estimates of the true posterior distributions. In the analysis of the real data, we show that the most likely number of inversions is in some cases considerably larger than the most parsimonious number of inversions. Indeed, in the Drosophila repleta-D. melanogaster comparison, the lower boundary of a 95% highest posterior density credible interval for the number of inversions is considerably larger than the most parsimonious number of inversions.

8.
Schoen DJ, Clegg MT. Genetics, 1986, 112(4): 927-945
Estimation of mating system parameters in plant populations typically employs family-structured samples of progeny genotypes. These estimation models postulate a mixture of self-fertilization and random outcrossing. One assumption of such models concerns the distribution of pollen genotypes among eggs within single maternal families. Previous applications of the mixed mating model to mating system estimation have assumed that pollen genotypes are sampled randomly from the total population in forming outcrossed progeny within families. In contrast, the one-pollen parent model assumes that outcrossed progeny within a family share a single-pollen parent genotype. Monte Carlo simulations of family-structured sampling were carried out to examine the consequences of violations of the different assumptions of the two models regarding the distribution of pollen genotypes among eggs. When these assumptions are violated, estimates of mating system parameters may be significantly different from their true values and may exhibit distributions which depart from normality. Monte Carlo methods were also used to examine the utility of the bootstrap resampling algorithm for estimating the variances of mating system parameters. The bootstrap method gives variance estimates that approximate empirically determined values. When applied to data from two plant populations which differ in pollen genotype distributions within families, the two estimation procedures exhibit the same behavior as that seen with the simulated data.
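Because progeny within a maternal family are not independent, a bootstrap for family-structured samples resamples whole families with replacement, not individual progeny. In the sketch below, `pooled_outcross_proportion` is a hypothetical stand-in estimator, assuming each progeny has already been scored 1 (outcrossed) or 0. It is not the mixed-mating or one-pollen-parent estimator, which would be passed in the same way.

```python
import random
import statistics


def pooled_outcross_proportion(families):
    """Hypothetical estimator: fraction of all progeny scored as outcrossed (1)."""
    return sum(map(sum, families)) / sum(map(len, families))


def bootstrap_variance(families, estimator, n_boot=1000, seed=0):
    """Bootstrap variance of an estimator, resampling whole families with replacement."""
    rng = random.Random(seed)
    reps = []
    for _ in range(n_boot):
        resample = [rng.choice(families) for _ in families]
        reps.append(estimator(resample))
    return statistics.variance(reps)
```

Resampling at the family level keeps any within-family correlation, such as progeny sharing a single pollen parent, inside each bootstrap replicate.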

9.
Accurate and rapid methods for the detection of quantitative trait loci (QTLs) and evaluation of consequent allelic effects are required to implement marker-assisted selection in outbred populations. In this study, we present a simple deterministic method for estimating identity-by-descent (IBD) coefficients in full- and half-sib families that can be used for the detection of QTLs via a variance-component approach. In a simulated dataset, IBD coefficients among sibs estimated by the simple deterministic and Markov chain Monte Carlo (MCMC) methods with three or four alleles at each marker locus exhibited a correlation greater than 0.99. This high correlation was also found in QTL analyses of data from an outbred pig population. Variance-component analyses were run with IBD coefficients estimated by both the simple deterministic and MCMC methods. Both procedures detected a QTL at the same position and gave similar test statistics and heritabilities. The MCMC method, however, required much longer computation than the simple method. The conversion of estimated QTL genotypic effects into allelic effects for use in marker-assisted selection is also demonstrated.

10.
A Bayesian approach is presented for mapping a quantitative trait locus (QTL) using the 'Fernando and Grossman' multivariate Normal approximation to QTL inheritance. For this model, a Bayesian implementation that includes QTL position is problematic because standard Markov chain Monte Carlo (MCMC) algorithms do not mix, i.e., the QTL position gets stuck in one marker interval. This is because of the dependence of the covariance structure for the QTL effects on the adjacent markers and may be typical of the 'Fernando and Grossman' model. A relatively new MCMC technique, simulated tempering, allows mixing and so makes possible inferences about QTL position based on marginal posterior probabilities. The model was implemented for estimating variance ratios and QTL position using a continuous grid of allowed positions and was applied to simulated data of a standard granddaughter design. The results showed a smooth mixing of QTL position after implementation of the simulated tempering sampler. In this implementation, map distance between QTL and its flanking markers was artificially stretched to reduce the dependence of markers and covariance. The method generalizes easily to more complicated applications and can ultimately contribute to QTL mapping in complex, heterogeneous, human, animal or plant populations.

11.
Albers CA, Heskes T, Kappen HJ. Genetics, 2007, 177(2): 1101-1116
We present CVMHAPLO, a probabilistic method for haplotyping in general pedigrees with many markers. CVMHAPLO reconstructs the haplotypes by assigning, in every iteration, a fixed number of the ordered genotypes with the highest marginal probability, conditioned on the marker data and the ordered genotypes assigned in previous iterations. CVMHAPLO makes use of the cluster variation method (CVM) to efficiently estimate the marginal probabilities. We focused on single-nucleotide polymorphism (SNP) markers in the evaluation of our approach. In simulated data sets where exact computation was feasible, we found that the accuracy of CVMHAPLO was high and similar to that of maximum-likelihood methods. In simulated data sets where exact computation of the maximum-likelihood haplotype configuration was not feasible, the accuracy of CVMHAPLO was similar to that of state-of-the-art Markov chain Monte Carlo (MCMC) maximum-likelihood approximations when all ordered genotypes were assigned, and higher when only a subset of the ordered genotypes was assigned. CVMHAPLO was faster than the MCMC approach and provided more detailed information about the uncertainty in the inferred haplotypes. We conclude that CVMHAPLO is a practical tool for the inference of haplotypes in large complex pedigrees.

12.
Markov chain Monte Carlo (MCMC) techniques for multipoint mapping of quantitative trait loci have been developed for nuclear-family and extended-pedigree data. These methods are based on repeated sampling of genotype vectors by peeling and gene dropping, and on random sampling of each model parameter from its full conditional distribution given phenotypes, markers, and the other model parameters. We further refine such approaches by improving the efficiency of the marker haplotype-updating algorithm and by adopting a new proposal for adding loci. Incorporating these refinements, we have performed an extensive simulation study on nuclear-family data, varying the number of trait loci, family size, displacement, and other segregation parameters. Our simulation studies show that our MCMC algorithm identifies the locations of the true trait loci and estimates their segregation parameters well, provided that the total number of sibship pairs in the pedigree data is reasonably large, the heritability of each individual trait locus is not too low, and the loci are not too close together. In our simulation study using nuclear-family data, our MCMC algorithm was significantly more efficient than LOKI (Heath 1997).
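Gene dropping simulates genotypes down a pedigree. Founders receive alleles drawn from population allele frequencies, and each non-founder receives one randomly chosen allele from each parent, following Mendelian segregation. This is a minimal single-locus sketch with no linkage or phenotype conditioning. It assumes the pedigree is listed with parents before their children and that each individual has either both parents or no parents recorded.

```python
import random


def gene_drop(pedigree, allele_freqs, seed=0):
    """Simulate single-locus genotypes down a pedigree.

    pedigree: list of (individual, father, mother) tuples, parents listed before
        children; founders have father = mother = None.
    allele_freqs: dict mapping allele -> population frequency (used for founders).
    Returns a dict individual -> (paternal allele, maternal allele).
    """
    rng = random.Random(seed)
    alleles = list(allele_freqs)
    weights = [allele_freqs[a] for a in alleles]
    genotypes = {}
    for ind, father, mother in pedigree:
        if father is None:
            # Founder: two independent draws from the population frequencies.
            genotypes[ind] = tuple(rng.choices(alleles, weights=weights, k=2))
        else:
            # Mendelian segregation: one allele transmitted at random from each parent.
            genotypes[ind] = (rng.choice(genotypes[father]), rng.choice(genotypes[mother]))
    return genotypes
```

Repeating the drop many times gives unconditional genotype samples. MCMC samplers such as the one described above need genotypes conditional on the observed data, which is why they combine gene dropping with peeling.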

13.
Increasingly, large data sets pose a challenge for computationally intensive phylogenetic methods such as Bayesian Markov chain Monte Carlo (MCMC). Here, we investigate the performance of common MCMC proposal distributions in terms of median and variance of run time to convergence on 11 data sets. We introduce two new Metropolized Gibbs Samplers for moving through "tree space." MCMC simulation using these new proposals shows faster average run time and dramatically improved predictability in performance, with a 20-fold reduction in the variance of the time to estimate the posterior distribution to a given accuracy. We also introduce conditional clade probabilities and demonstrate that they provide a superior means of approximating tree topology posterior probabilities from samples recorded during MCMC.
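A conditional clade probability (CCP) estimate of a tree's posterior probability can be sketched as follows. For every internal node, estimate the probability of its split given that its clade is present. Each estimate is the number of sampled trees containing that clade with that split, divided by the number of sampled trees containing the clade. The tree's probability is the product of these estimates over its internal nodes. The sketch assumes rooted binary trees written as nested tuples of leaf names. How unrooted trees are handled in the paper is not covered here.

```python
from collections import Counter


def _walk(tree, out):
    """Return the leaf set of `tree`, appending (clade, split) for each internal node."""
    if isinstance(tree, str):
        return frozenset([tree])
    left = _walk(tree[0], out)
    right = _walk(tree[1], out)
    clade = left | right
    out.append((clade, frozenset([left, right])))
    return clade


def ccp_table(sampled_trees):
    """Count clades and (clade, split) pairs across the MCMC tree sample."""
    clade_counts, split_counts = Counter(), Counter()
    for t in sampled_trees:
        nodes = []
        _walk(t, nodes)
        for clade, split in nodes:
            clade_counts[clade] += 1
            split_counts[(clade, split)] += 1
    return clade_counts, split_counts


def tree_probability(tree, clade_counts, split_counts):
    """Product over internal nodes of Pr(split | clade) estimated from the sample."""
    nodes = []
    _walk(tree, nodes)
    p = 1.0
    for clade, split in nodes:
        if clade_counts[clade] == 0:
            return 0.0  # clade never sampled
        p *= split_counts[(clade, split)] / clade_counts[clade]
    return p
```

Unlike simple topology frequencies, the product can assign nonzero probability to a tree that never appeared in the sample, as long as each of its clade/split pairs was observed somewhere.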

14.
Network reliability is an important index for measuring the reliability of large networks, and this work addresses methods for evaluating the reliability of complex networks. The Monte Carlo (MC) method is studied, and the general principle of MC simulation and an MC-based approach to reliability evaluation are introduced. Sampling is central to Monte Carlo simulation, so random variables are studied and several discrete distributions are introduced. A novel reliability evaluation method based on the Monte Carlo method is proposed. To evaluate network reliability efficiently, the proposed method generates time pointers for arc-failure events, constructs an event table for the complex network, updates the network states, and draws samples from a geometric distribution. The precision and unbiasedness of the reliability estimate are discussed. Finally, a series of numerical experiments compares the efficiency of crude Monte Carlo (CMC) and other traditional methods under the same experimental conditions.
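For comparison, the crude Monte Carlo (CMC) baseline for two-terminal reliability is easy to state. In each replication, every arc works independently with probability p_up, and the replication succeeds if the source can still reach the target. The success fraction is an unbiased estimate of reliability, with binomial standard error. This sketch is CMC only. The paper's event-table method with geometric sampling is not reproduced, and the independent, equal arc reliabilities are an assumption.

```python
import random


def crude_mc_reliability(edges, s, t, p_up, n_samples=20000, seed=0):
    """Crude Monte Carlo estimate of Pr(s connected to t).

    edges: list of undirected arcs (u, v); parallel arcs are allowed.
    p_up: probability that each arc works, independently of the others.
    Returns (estimate, standard error).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        # Sample the state of every arc and keep only the working ones.
        adj = {}
        for u, v in edges:
            if rng.random() < p_up:
                adj.setdefault(u, []).append(v)
                adj.setdefault(v, []).append(u)
        # Depth-first search from s over the working arcs.
        seen, stack = {s}, [s]
        while stack:
            node = stack.pop()
            for nxt in adj.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        if t in seen:
            hits += 1
    est = hits / n_samples
    std_err = (est * (1 - est) / n_samples) ** 0.5
    return est, std_err
```

For two arcs in series with p_up = 0.9, the exact reliability is 0.81, which gives a quick sanity check. For highly reliable networks, CMC needs very many samples to observe any failures, and this inefficiency motivates specialized sampling schemes such as the one proposed.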

15.
Bayesian lasso for semiparametric structural equation models
Guo R, Zhu H, Chow SM, Ibrahim JG. Biometrics, 2012, 68(2): 567-577
There has been great interest in developing nonlinear structural equation models and associated statistical inference procedures, including estimation and model selection methods. In this paper a general semiparametric structural equation model (SSEM) is developed in which the structural equation is composed of nonparametric functions of exogenous latent variables and fixed covariates on a set of latent endogenous variables. A basis representation is used to approximate these nonparametric functions in the structural equation, and the Bayesian Lasso method coupled with a Markov chain Monte Carlo (MCMC) algorithm is used for simultaneous estimation and model selection. The proposed method is illustrated using a simulation study and data from the Affective Dynamics and Individual Differences (ADID) study. Results demonstrate that our method can accurately estimate the unknown parameters and correctly identify the true underlying model.

16.
Improved efficiency of Markov chain Monte Carlo facilitates all aspects of statistical analysis with Bayesian hierarchical models. Identifying strategies to improve MCMC performance is becoming increasingly crucial as the complexity of models, and the run times to fit them, increase. We evaluate different strategies for improving MCMC efficiency using the open-source software NIMBLE (R package nimble), with common ecological models of species occurrence and abundance as examples. We ask how MCMC efficiency depends on model formulation, model size, data, and sampling strategy. For multiseason and/or multispecies occupancy models and for N-mixture models, we compare the efficiency of sampling discrete latent states vs. integrating over them, including more vs. fewer hierarchical model components, and univariate vs. block-sampling methods. We include the common MCMC tool JAGS in comparisons. For simple models, there is little practical difference between computational approaches. As model complexity increases, there are strong interactions between model formulation and sampling strategy on MCMC efficiency. There is no one-size-fits-all best strategy, but rather problem-specific best strategies related to model structure and type. In all but the simplest cases, NIMBLE's default or customized performance achieves much higher efficiency than JAGS. In the two most complex examples, NIMBLE was 10–12 times more efficient than JAGS. We find NIMBLE is a valuable tool for many ecologists utilizing Bayesian inference, particularly for complex models where JAGS is prohibitively slow. Our results highlight the need for more guidelines and customizable approaches to fit hierarchical models to ensure practitioners can make the most of occupancy and other hierarchical models.
By implementing model-generic MCMC procedures in open-source software, including the NIMBLE extensions for integrating over latent states (implemented in the R package nimbleEcology), we have made progress toward this aim.
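"Integrating over latent states" can be made concrete with the standard single-season occupancy model. A site is occupied (z = 1) with probability psi. Given occupancy, each of k visits detects the species with probability p. An all-zero history can also come from an unoccupied site. Summing over z removes the discrete latent state from the sampler. The Python below only illustrates this arithmetic. It is not the nimbleEcology API, and the single-season model is an assumed simplification of the multiseason and multispecies models studied.

```python
import math


def site_log_lik(history, psi, p):
    """Log-likelihood of one site's detection history (list of 0/1),
    with the latent occupancy state z in {0, 1} summed out."""
    detected = sum(history)
    k = len(history)
    # z = 1: occupied, with independent Bernoulli(p) detections on each visit.
    lik_occupied = psi * p ** detected * (1 - p) ** (k - detected)
    # z = 0: unoccupied, which is only possible if the species was never detected.
    lik_empty = (1 - psi) if detected == 0 else 0.0
    return math.log(lik_occupied + lik_empty)
```

With the latent states summed out, the sampler only has to update psi and p. With latent states kept, it must also update one z per site. The abstract reports that which choice is more efficient depends on the model.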

17.
Tandem repeats occur frequently in biological sequences. They are important for studying genome evolution and human disease. A number of methods have been designed to detect a single tandem repeat in a sliding window. In this article, we focus on the case that an unknown number of tandem repeat segments of the same pattern are dispersively distributed in a sequence. We construct a probabilistic generative model for the tandem repeats, where the sequence pattern is represented by a motif matrix. A Bayesian approach is adopted to compute this model. Markov chain Monte Carlo (MCMC) algorithms are used to explore the posterior distribution as an effort to infer both the motif matrix of tandem repeats and the location of repeat segments. Reversible jump Markov chain Monte Carlo (RJMCMC) algorithms are used to address the transdimensional model selection problem raised by the variable number of repeat segments. Experiments on both synthetic data and real data show that this new approach is powerful in detecting dispersed short tandem repeats. As far as we know, it is the first work to adopt RJMCMC algorithms in the detection of tandem repeats.
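The motif matrix can be illustrated by how it scores a candidate segment. The score is the log-likelihood ratio of the segment under the position-specific base probabilities versus an i.i.d. uniform background. This sketch covers only that scoring step. The paper's sampler also learns the matrix and uses RJMCMC to add and remove segments. `make_motif` is a hypothetical helper, and the 0.85 consensus probability and the threshold are assumptions.

```python
import math


def make_motif(consensus, match_prob=0.85, alphabet="ACGT"):
    """Hypothetical helper: a motif matrix putting match_prob on the consensus base."""
    other = (1 - match_prob) / (len(alphabet) - 1)
    return [{b: (match_prob if b == c else other) for b in alphabet} for c in consensus]


def motif_log_odds(segment, motif, background=None):
    """Log-likelihood ratio of a segment under the motif matrix vs. an i.i.d. background."""
    if background is None:
        background = {b: 0.25 for b in "ACGT"}
    if len(segment) != len(motif):
        raise ValueError("segment length must equal the motif width")
    return sum(math.log(col[base] / background[base]) for base, col in zip(segment, motif))


def scan(sequence, motif, threshold):
    """Start positions of windows whose log-odds score exceeds the threshold."""
    w = len(motif)
    return [i for i in range(len(sequence) - w + 1)
            if motif_log_odds(sequence[i:i + w], motif) > threshold]
```

For example, scanning "CAGCAGTTCAG" with a CAG motif and threshold 2.0 recovers the two adjacent copies and the dispersed third copy.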

18.
Bayesian LASSO for quantitative trait loci mapping
Yi N, Xu S. Genetics, 2008, 179(2): 1045-1055
Quantitative trait loci (QTL) mapping aims to identify molecular markers or genomic loci that influence the variation of complex traits. The problem is complicated by the fact that QTL data usually contain a large number of markers across the entire genome, most of which have little or no effect on the phenotype. In this article, we propose several Bayesian hierarchical models for mapping multiple QTL that simultaneously fit and estimate all possible genetic effects associated with all markers. The proposed models use prior distributions for the genetic effects that are scale mixtures of normal distributions with mean zero and variances distributed to give each effect a high probability of being near zero. We consider two types of priors for the variances, exponential and scaled inverse-χ² distributions, which result in a Bayesian version of the popular least absolute shrinkage and selection operator (LASSO) model and the well-known Student's t model, respectively. Unlike most applications, where fixed values are preset for hyperparameters in the priors, we treat all hyperparameters as unknowns and estimate them along with other parameters. Markov chain Monte Carlo (MCMC) algorithms are developed to simulate the parameters from the posteriors. The methods are illustrated using well-known barley data.
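The exponential-variance prior leads to a simple Gibbs sampler. The version sketched here follows the form popularized by Park and Casella: beta_j | sigma2, tau_j^2 ~ N(0, sigma2 tau_j^2), with tau_j^2 ~ Exp(lam^2/2). This sketch departs from the paper in stated ways. The penalty lam is fixed rather than estimated, there is no intercept (centered data), and the prior on sigma2 is taken as proportional to 1/sigma2. X stands in for a generic marker design matrix.

```python
import numpy as np


def bayesian_lasso(X, y, lam=1.0, n_iter=3000, burn=1000, seed=0):
    """Gibbs sampler for the Bayesian LASSO (fixed lam, no intercept).

    Returns an array of post-burn-in beta draws with shape (n_iter - burn, p).
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    XtX, Xty = X.T @ X, X.T @ y
    beta, sigma2, inv_tau2 = np.zeros(p), 1.0, np.ones(p)
    draws = []
    for it in range(n_iter):
        # beta | rest ~ N(A^{-1} X'y, sigma2 A^{-1}), with A = X'X + diag(1 / tau^2).
        A = XtX + np.diag(inv_tau2)
        L = np.linalg.cholesky(A)  # A = L L', so L'^{-1} z has covariance A^{-1}
        mean = np.linalg.solve(A, Xty)
        beta = mean + np.sqrt(sigma2) * np.linalg.solve(L.T, rng.standard_normal(p))
        # sigma2 | rest ~ Inv-Gamma((n + p)/2, (RSS + beta' D^{-1} beta)/2).
        resid = y - X @ beta
        scale = 0.5 * (resid @ resid + beta @ (inv_tau2 * beta))
        sigma2 = scale / rng.gamma((n + p) / 2)
        # 1/tau_j^2 | rest ~ InverseGaussian(mean=sqrt(lam^2 sigma2 / beta_j^2), shape=lam^2).
        mu = np.sqrt(lam**2 * sigma2 / np.maximum(beta**2, 1e-12))
        inv_tau2 = rng.wald(mu, lam**2)
        if it >= burn:
            draws.append(beta.copy())
    return np.array(draws)
```

Integrating tau_j^2 out of this hierarchy gives each effect a double-exponential (Laplace) prior. This is why most effects are shrunk strongly toward zero while a few large effects survive.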

19.
MOTIVATION: In this study, we address the problem of estimating the parameters of regulatory networks and provide the first application of Markov chain Monte Carlo (MCMC) methods to experimental data. As a case study, we consider a stochastic model of the Hes1 system expressed in terms of stochastic differential equations (SDEs) to which rigorous likelihood methods of inference can be applied. When fitting continuous-time stochastic models to discretely observed time series, the lengths of the sampling intervals are important, and much of our study addresses the problem when the data are sparse. RESULTS: We estimate the parameters of an autoregulatory network, providing results both for simulated and real experimental data from the Hes1 system. We develop an estimation algorithm using MCMC techniques which are flexible enough to allow for the imputation of latent data on a finer time scale and the presence of prior information about parameters, which may be informed from other experiments, as well as additional measurement error.
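Sampling intervals matter because likelihood-based SDE inference usually relies on a discretized transition density. Under the Euler-Maruyama scheme, X(t + dt) given X(t) is approximately normal, with mean X(t) + f(X(t)) dt and variance g^2 dt. The approximation is good only when dt is small. With sparse data, imputing latent points on a finer grid is what keeps dt small. The sketch uses a hypothetical mean-reverting drift f(x) = theta (mu - x) with constant noise sigma, not the Hes1 model.

```python
import math
import random


def simulate_em(theta, mu, sigma, x0, dt, n, seed=0):
    """Euler-Maruyama path of dX = theta (mu - X) dt + sigma dW (hypothetical model)."""
    rng = random.Random(seed)
    xs = [x0]
    for _ in range(n):
        x = xs[-1]
        xs.append(x + theta * (mu - x) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0))
    return xs


def em_log_lik(xs, dt, theta, mu, sigma):
    """Euler-Maruyama (Gaussian) approximate log-likelihood of an equally spaced path."""
    var = sigma ** 2 * dt
    ll = 0.0
    for x, x_next in zip(xs, xs[1:]):
        mean = x + theta * (mu - x) * dt
        ll += -0.5 * math.log(2 * math.pi * var) - (x_next - mean) ** 2 / (2 * var)
    return ll
```

In an MCMC scheme with imputation, the unobserved intermediate points become extra unknowns. They are updated alongside theta, mu, and sigma, and this likelihood is evaluated on the fine grid.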

20.
Wilson IJ, Balding DJ. Genetics, 1998, 150(1): 499-510
Ease and accuracy of typing, together with high levels of polymorphism and widespread distribution in the genome, make microsatellite (or short tandem repeat) loci an attractive potential source of information about both population histories and evolutionary processes. However, microsatellite data are difficult to interpret, in particular because of the frequency of back-mutations. Stochastic models for the underlying genetic processes can be specified, but in the past they have been too complicated for direct analysis. Recent developments in stochastic simulation methodology now allow direct inference about both historical events, such as genealogical coalescence times, and evolutionary parameters, such as mutation rates. A feature of the Markov chain Monte Carlo (MCMC) algorithm that we propose here is that the likelihood computations are simplified by treating the (unknown) ancestral allelic states as auxiliary parameters. We illustrate the algorithm by analyzing microsatellite samples simulated under the model. Our results suggest that a single microsatellite usually does not provide enough information for useful inferences, but that several completely linked microsatellites can be informative about some aspects of genealogical history and evolutionary processes. We also reanalyze data from a previously published human Y chromosome microsatellite study, finding evidence for an effective population size for human Y chromosomes in the low thousands and a recent time since their most recent common ancestor: the 95% interval runs from approximately 15,000 to 130,000 years, with most likely values around 30,000 years.
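Genealogical coalescence times can be illustrated with the standard Kingman coalescent prior, which is the kind of genealogy model the MCMC explores. While k lineages remain, the waiting time to the next coalescence is exponential with rate k(k-1)/2 in coalescent units. For a haploid locus such as the Y chromosome under the Wright-Fisher model, one unit corresponds to N_e generations. The expected time to the most recent common ancestor of n lineages is 2(1 - 1/n). The sketch shows only this prior; mutation and the paper's sampler are not included.

```python
import random


def simulate_tmrca(n, rng):
    """Time to the most recent common ancestor of n lineages (coalescent units)."""
    t = 0.0
    for k in range(n, 1, -1):
        # With k lineages, each of the k(k-1)/2 pairs coalesces at rate 1.
        t += rng.expovariate(k * (k - 1) / 2)
    return t


def expected_tmrca(n):
    """Closed form E[T_MRCA] = 2 (1 - 1/n)."""
    return 2 * (1 - 1 / n)
```

The last two lineages take on average half of the total TMRCA. This is one reason even large samples carry limited information about the age of the most recent common ancestor, consistent with the wide interval reported above.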
