Similar Documents
 20 similar documents found (search time: 15 ms)
1.
Models of amino acid substitution present challenges beyond those often faced in the analysis of DNA sequences. Amino acid alignments are often small, whereas the number of parameters to be estimated is potentially large compared with the number of free parameters in nucleotide substitution models. Most approaches to the analysis of amino acid alignments have focused on fixed amino acid models, in which all of the potentially free parameters are fixed to values estimated from a large number of sequences. Often, these fixed models are specific to a gene or taxonomic group (e.g. the Mtmam model, whose parameters are specific to mammalian mitochondrial gene sequences). Although fixed amino acid models succeed in reducing the number of free parameters to be estimated (indeed, they reduce it from approximately 200 to 0), it is possible that none of the currently available fixed models is appropriate for a specific alignment. Here, we present four approaches to the analysis of amino acid sequences. First, we explore the use of a general time reversible model of amino acid substitution with a Dirichlet prior probability distribution on the 190 exchangeability parameters. Second, we explore the behaviour of prior probability distributions that are 'centred' on the rates specified by the fixed amino acid model. Third, we consider a mixture of fixed amino acid models. Finally, we treat constraints on the exchangeability parameters as partitions, similar to how nucleotide substitution models are specified, and place a Dirichlet process prior model on all possible partitioning schemes.
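The general time reversible (GTR) parameterisation in the first approach can be made concrete with a short sketch. Assuming a flat Dirichlet(1, ..., 1) prior on both the 190 exchangeabilities and the 20 stationary frequencies (the concentration values are an illustrative choice, not taken from the paper), the code draws one set of parameters and assembles the rate matrix with off-diagonal entries q_ij = r_ij * pi_j. The function names are hypothetical.

```python
import random

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"  # 20 amino acids, one-letter codes


def sample_dirichlet(alphas, rng):
    """Draw one vector from a Dirichlet distribution via normalised gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def gtr_rate_matrix(exch, freqs):
    """Build a time-reversible rate matrix Q from exchangeabilities and frequencies.

    exch lists the upper-triangle exchangeabilities r_ij (i < j), row by row.
    Off-diagonal rates are q_ij = r_ij * pi_j; each diagonal entry makes its row sum to zero.
    """
    n = len(freqs)
    assert len(exch) == n * (n - 1) // 2
    r = [[0.0] * n for _ in range(n)]
    k = 0
    for i in range(n):
        for j in range(i + 1, n):
            r[i][j] = r[j][i] = exch[k]
            k += 1
    q = [[r[i][j] * freqs[j] if i != j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        q[i][i] = -sum(q[i])  # diagonal is still 0 here, so this sums the off-diagonals
    return q


rng = random.Random(1)
n_states = len(AMINO_ACIDS)
n_exch = n_states * (n_states - 1) // 2  # 190 exchangeability parameters
exch = sample_dirichlet([1.0] * n_exch, rng)     # flat Dirichlet prior (assumed)
freqs = sample_dirichlet([1.0] * n_states, rng)  # flat Dirichlet prior (assumed)
Q = gtr_rate_matrix(exch, freqs)
```

Because q_ij = r_ij * pi_j with symmetric r, the matrix satisfies detailed balance (pi_i q_ij = pi_j q_ji), which is what makes the model time reversible.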

2.
In recent years, codon substitution models based on the mutation–selection principle have been extended to detect signatures of adaptive evolution in protein-coding genes. However, the approaches used to date have focused either on detecting global signals of adaptive regimes (across the entire gene) or on contexts where experimentally derived, site-specific amino acid fitness profiles are available. Here, we present a Bayesian site-heterogeneous mutation–selection framework for site-specific detection of adaptive substitution regimes, given a protein-coding DNA alignment. We offer implementations, briefly present simulation results, and apply the approach to a few real data sets. Our analyses suggest that the new approach shows greater sensitivity than traditional methods. However, more study is required to assess the impact of potential model violations on the method and to gain a better empirical sense of its behavior on a broader range of real data sets. We propose an outline of such a research program.
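Mutation–selection codon models typically scale a mutation rate by a fixation factor that depends on the fitness difference between the encoded amino acids. A minimal sketch of that building block, assuming the widely used form S / (1 - exp(-S)) for a scaled selection coefficient S (the function names and the "population-scaled log-fitness" inputs are illustrative, not the paper's implementation):

```python
import math


def fixation_factor(s_scaled):
    """Relative substitution rate S / (1 - exp(-S)) for a scaled selection coefficient S.

    S > 0 (beneficial) gives a factor above 1, S < 0 (deleterious) below 1,
    and the neutral limit S -> 0 equals exactly 1.
    """
    if abs(s_scaled) < 1e-8:
        return 1.0 + s_scaled / 2.0  # first-order Taylor expansion near neutrality
    return s_scaled / (-math.expm1(-s_scaled))  # -expm1(-S) == 1 - exp(-S), computed stably


def codon_rate(mutation_rate, fitness_from, fitness_to):
    """Substitution rate between two codons: mutation rate times the fixation factor.

    Fitnesses are assumed to be population-scaled log-fitness values of the encoded
    amino acids at this site; the scaled selection coefficient is their difference.
    """
    return mutation_rate * fixation_factor(fitness_to - fitness_from)
```

Site-specific fitness profiles enter through fitness_from and fitness_to, so allowing these to vary across sites is what makes such a model site-heterogeneous.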

3.
4.
In this paper we develop a Bayesian approach to parameter estimation in a stochastic spatio-temporal model of the spread of invasive species across a landscape. To date, statistical techniques, such as logistic and autologistic regression, have outstripped stochastic spatio-temporal models in their ability to handle large numbers of covariates. Here we seek to address this problem by making use of a range of covariates describing the bio-geographical features of the landscape. Relative to regression techniques, stochastic spatio-temporal models are more transparent in their representation of biological processes. They also explicitly model temporal change, and therefore do not require the assumption that the species' distribution (or other spatial pattern) has already reached equilibrium as is often the case with standard statistical approaches. In order to illustrate the use of such techniques we apply them to the analysis of data detailing the spread of an invasive plant, Heracleum mantegazzianum, across Britain in the 20th Century using geo-referenced covariate information describing local temperature, elevation and habitat type. The use of Markov chain Monte Carlo sampling within a Bayesian framework facilitates statistical assessments of differences in the suitability of different habitat classes for H. mantegazzianum, and enables predictions of future spread to account for parametric uncertainty and system variability. Our results show that ignoring such covariate information may lead to biased estimates of key processes and implausible predictions of future distributions.

5.
Adaptive sampling for Bayesian variable selection   Times cited: 1, self-citations: 0, citations by others: 1
Nott  David J.; Kohn  Robert 《Biometrika》2005,92(4):747-763

6.
Fang M  Liu J  Sun D  Zhang Y  Zhang Q  Zhang Y  Zhang S 《Heredity》2011,107(3):265-276
In this article, we propose a model selection method, the Bayesian composite model space approach, to map quantitative trait loci (QTL) in a half-sib population for continuous and binary traits. Our method uses an identity-by-descent-based variance component model. To demonstrate its performance, the method was applied to map QTL underlying production traits on BTA6 in a Chinese half-sib dairy cattle population. A total of four QTL were detected, whereas only one QTL was identified using the traditional least squares (LS) method. We also conducted two simulation experiments to validate the efficiency of our method. The results suggest that the proposed method, based on a multiple-QTL model, is efficient in mapping multiple QTL in an outbred half-sib population and is more powerful than the LS method based on a single-QTL model.

7.
A Bayesian approach to analysing data from family-based association studies is developed. This permits direct assessment of the range of possible values of model parameters, such as the recombination frequency and allelic associations, in the light of the data. In addition, sophisticated comparisons of different models may be handled easily, even when such models are not nested. The methodology is developed so as to allow separate inferences about linkage and association by including θ, the recombination fraction between the marker and the disease susceptibility locus under study, explicitly in the model. The method is illustrated by application to a previously published data set. The data analysis raises some interesting issues, notably with regard to the weight of evidence necessary to convince us of linkage between a candidate locus and disease.

8.
In this article, we develop a latent class model with class probabilities that depend on subject-specific covariates. One of our major goals is to identify important predictors of latent classes. We consider methodology that allows estimation of latent classes while allowing for variable selection uncertainty. We propose a Bayesian variable selection approach and implement a stochastic search Gibbs sampler for posterior computation to obtain model-averaged estimates of quantities of interest such as marginal inclusion probabilities of predictors. Our methods are illustrated through simulation studies and application to data on weight gain during pregnancy, where it is of interest to identify important predictors of latent weight gain classes.
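Two pieces of this setup can be illustrated in a few lines. The sketch below assumes that covariate-dependent class probabilities take a multinomial-logit (softmax) form with the first class as reference, and that the stochastic search sampler records a 0/1 inclusion indicator per predictor in each draw; both are common choices rather than details confirmed by the abstract, and the function names are hypothetical.

```python
import math


def class_probabilities(x, coefs):
    """Multinomial-logit (softmax) probabilities of latent class membership.

    coefs[k] holds the coefficients for class k (intercept first); the first
    class is the reference, so its coefficients should be all zero.
    """
    scores = [c[0] + sum(b * xi for b, xi in zip(c[1:], x)) for c in coefs]
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def marginal_inclusion_probabilities(indicator_draws):
    """Fraction of MCMC draws in which each predictor's inclusion indicator equals 1."""
    n_draws = len(indicator_draws)
    n_pred = len(indicator_draws[0])
    return [sum(d[j] for d in indicator_draws) / n_draws for j in range(n_pred)]
```

Averaging indicators over draws is what makes these estimates model-averaged: each draw may correspond to a different subset of included predictors.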

9.
Ando  Tomohiro 《Biometrika》2007,94(2):443-458
The problem of evaluating the goodness of the predictive distributions of hierarchical Bayesian and empirical Bayes models is investigated. A Bayesian predictive information criterion is proposed as an estimator of the posterior mean of the expected loglikelihood of the predictive distribution when the specified family of probability distributions does not contain the true distribution. The proposed criterion is developed by correcting the asymptotic bias of the posterior mean of the loglikelihood as an estimator of its expected loglikelihood. In the evaluation of hierarchical Bayesian models with random effects, regardless of our parametric focus, the proposed criterion considers the bias correction of the posterior mean of the marginal loglikelihood because it requires a consistent parameter estimator. The use of the bootstrap in model evaluation is also discussed.

10.
In protein-coding DNA sequences, historical patterns of selection can be inferred from amino acid substitution patterns. A high ratio of nonsynonymous to synonymous substitution rates (ω = dN/dS) is a clear indicator of positive, or directional, selection, and several recently developed methods attempt to distinguish such sites from those under neutral or purifying selection. One method uses an empirical Bayesian framework that accounts for varying selective pressures across sites while conditioning on the parameters of the model of DNA evolution and on the phylogenetic history. We describe a method that identifies sites under diversifying selection using a fully Bayesian framework. As in earlier work, the method presented here allows the ratio of nonsynonymous to synonymous rates to vary among sites. The key difference of a fully Bayesian approach lies in its ability to account for uncertainty in parameters, including the tree topology, the branch lengths, and the codon model of DNA substitution. We demonstrate the utility of the fully Bayesian approach by applying our method to a data set of the vertebrate -globin gene. Compared with a previous analysis of this data set, the hierarchical model placed most of the same sites in the positive selection class, but with a few striking exceptions.
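The contrast between the empirical and fully Bayesian treatments can be sketched as follows. For one site, Bayes' theorem gives the posterior probability of each ω class from the class weights and the per-class site likelihoods. A fully Bayesian estimate then averages these conditional probabilities over MCMC draws of the tree, branch lengths and model parameters rather than plugging in point estimates. The per-class likelihoods are assumed to be supplied (in practice they would come from Felsenstein's pruning algorithm, which is not shown), and the function names are hypothetical.

```python
def site_class_posteriors(site_likelihoods, class_weights):
    """Posterior probability that one site belongs to each omega (dN/dS) class.

    site_likelihoods[k] is the likelihood of the site's data under class k;
    class_weights[k] is the prior proportion of sites in class k.
    """
    joint = [w * lik for w, lik in zip(class_weights, site_likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]


def averaged_posteriors(draws):
    """Average per-draw class posteriors over MCMC samples.

    Each draw is a (site_likelihoods, class_weights) pair computed under one
    sampled tree, set of branch lengths and codon-model parameters.
    """
    per_draw = [site_class_posteriors(lik, w) for lik, w in draws]
    n = len(per_draw)
    return [sum(p[k] for p in per_draw) / n for k in range(len(per_draw[0]))]
```

An empirical Bayes analysis corresponds to calling site_class_posteriors once with fixed parameter estimates; averaging over draws integrates that uncertainty out.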

11.
12.
Swartz MD  Kimmel M  Mueller P  Amos CI 《Biometrics》2006,62(2):495-503
Mapping the genes for a complex disease, such as diabetes or rheumatoid arthritis (RA), involves finding multiple genetic loci that may contribute to the onset of the disease. Pairwise testing of the loci leads to the problem of multiple testing. Looking at haplotypes, or linear sets of loci, avoids multiple tests but results in a contingency table with sparse counts, especially when using marker loci with multiple alleles. We propose a hierarchical Bayesian model for case-parent triad data that uses a conditional logistic regression likelihood to model the probability of transmission to a diseased child. We define hierarchical prior distributions on the allele main effects to model the genetic dependencies present in the human leukocyte antigen (HLA) region of chromosome 6. First, we add a hierarchical level for model selection that accounts for both locus and allele selection. This allows us to cast the problem of identifying genetic loci relevant to the disease into a problem of Bayesian variable selection. Second, we attempt to include linkage disequilibrium as a covariance structure in the prior for model coefficients. We evaluate the performance of the procedure with some simulated examples and then apply our procedure to identifying genetic markers in the HLA region that influence risk for RA. Our software is available on the website http://www.epigenetic.org/Linkage/ssgs-public/.
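The conditional logistic regression likelihood for case-parent triads compares the affected child's genotype with the other genotypes the parents could have transmitted. The sketch below assumes the standard construction with four Mendelian-possible genotypes (the case plus three pseudo-controls) and an additive allele-count coding of the allele main effects. It does not include the paper's hierarchical priors or selection indicators, and its function names are hypothetical.

```python
import math
from itertools import product


def possible_child_genotypes(mother, father):
    """The four equally likely child genotypes under Mendelian transmission."""
    return [tuple(sorted((m, f))) for m, f in product(mother, father)]


def allele_counts(genotype, alleles):
    """Covariate vector: number of copies of each allele in the genotype."""
    return [genotype.count(a) for a in alleles]


def triad_log_likelihood(case, mother, father, beta, alleles):
    """Conditional-logistic log-likelihood contribution of one case-parent triad.

    The case genotype is compared against all four Mendelian-possible genotypes;
    beta holds one main effect per allele (same order as alleles).
    """
    def score(genotype):
        return sum(b * c for b, c in zip(beta, allele_counts(genotype, alleles)))

    candidates = possible_child_genotypes(mother, father)
    log_denom = math.log(sum(math.exp(score(g)) for g in candidates))
    return score(tuple(sorted(case))) - log_denom


alleles = ["A", "B", "C"]
ll = triad_log_likelihood(("B", "A"), ("A", "B"), ("A", "C"), [0.0, 1.0, 0.0], alleles)
```

With all effects set to zero every candidate is equally likely, so each triad contributes log(1/4); positive effects for an allele raise the likelihood of triads whose case carries it.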

13.
A Bayesian framework for the analysis of cospeciation   Times cited: 8, self-citations: 0, citations by others: 8
Abstract. Information on the history of cospeciation and host switching for a group of host and parasite species is contained in the DNA sequences sampled from each. Here, we develop a Bayesian framework for the analysis of cospeciation. We suggest a simple model of host switching by a parasite on a host phylogeny, in which host switching events are assumed to occur at a constant rate over the entire evolutionary history of associated hosts and parasites. The posterior probability density of the parameters of the host switching model is evaluated numerically using Markov chain Monte Carlo. In particular, the method generates the probability density of the number of host switches and of the host switching rate. Moreover, the method provides information on the probability that a host switching event is associated with a particular pair of branches. A Bayesian approach has several advantages over other methods for the analysis of cospeciation. In particular, it does not assume that the host or parasite phylogenies are known without error; many alternative phylogenies are sampled in proportion to their probability of being correct.
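A constant-rate host-switching model behaves like a Poisson process running along the host tree. Under that reading (an interpretation of the abstract, not the authors' code), the number of switches is Poisson with mean rate × total branch length, and each switch lands on a branch with probability proportional to its length. The sketch simulates this and gives the Poisson log-prior on the switch count that an MCMC sampler would evaluate; the function names and the flat list of branch lengths are illustrative assumptions.

```python
import math
import random


def poisson(mean, rng):
    """Poisson draw by Knuth's multiplication method (adequate for small means)."""
    limit = math.exp(-mean)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def simulate_host_switches(branch_lengths, rate, rng):
    """Simulate host switches as a constant-rate Poisson process on a host phylogeny.

    Returns the number of switches placed on each branch.
    """
    n_switches = poisson(rate * sum(branch_lengths), rng)
    counts = [0] * len(branch_lengths)
    for _ in range(n_switches):
        # each switch falls on a branch with probability proportional to its length
        i = rng.choices(range(len(branch_lengths)), weights=branch_lengths)[0]
        counts[i] += 1
    return counts


def switch_count_log_prior(n, rate, total_length):
    """log P(n switches | rate, total tree length) under the Poisson model."""
    mean = rate * total_length
    return n * math.log(mean) - mean - math.lgamma(n + 1)
```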

14.
Ying Wang  Bruce Rannala 《Genetics》2014,198(4):1621-1628
Recombination generates variation and facilitates evolution. Recombination (or lack thereof) also contributes to human genetic disease. Methods for mapping genes influencing complex genetic diseases via association rely on linkage disequilibrium (LD) in human populations, which is influenced by rates of recombination across the genome. Comparative population genomic analyses of recombination using related primate species can identify factors influencing rates of recombination in humans. Such studies can indicate how variable recombination hotspots may be, both among individuals (or populations) and over evolutionary timescales. Previous studies have suggested that the locations of recombination hotspots are not conserved between humans and chimpanzees. We made use of the data sets from recent resequencing projects and applied a Bayesian method for identifying hotspots and estimating recombination rates. We also reanalyzed SNP data sets for regions with known hotspots in humans using samples from the human and chimpanzee. Bayes factors (BFs) for shared recombination hotspots between human and chimpanzee were obtained across regions. Based on the analysis of the aligned regions of human chromosome 21, locations where the two species show evidence of shared recombination hotspots (with high BFs) were identified. Interestingly, previous comparative studies of human and chimpanzee that focused on the known human recombination hotspots within the β-globin and HLA regions did not find overlapping hotspots. Our results show high BFs for shared hotspots at locations within both regions, and the estimated locations of shared hotspots overlap with the locations of human recombination hotspots obtained from sperm-typing studies.
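A Bayes factor for a hypothesis such as "a shared hotspot exists at this location" can be computed as the posterior odds divided by the prior odds. The sketch assumes that the posterior probability is estimated as the fraction of MCMC draws in which the hotspot indicator is set; the prior probability must come from the model's prior, and the function names are hypothetical.

```python
def posterior_probability(indicator_draws):
    """Monte Carlo estimate of a posterior probability from 0/1 MCMC indicator draws."""
    return sum(indicator_draws) / len(indicator_draws)


def bayes_factor(posterior_prob, prior_prob):
    """Bayes factor for a hypothesis: posterior odds divided by prior odds.

    Undefined when either probability is exactly 0 or 1 (for example, when every
    MCMC draw agrees), so such inputs are rejected.
    """
    if not (0.0 < posterior_prob < 1.0 and 0.0 < prior_prob < 1.0):
        raise ValueError("probabilities must lie strictly between 0 and 1")
    posterior_odds = posterior_prob / (1.0 - posterior_prob)
    prior_odds = prior_prob / (1.0 - prior_prob)
    return posterior_odds / prior_odds
```

Dividing by the prior odds matters when the prior already favours or disfavours hotspots: a posterior probability of 0.5 yields a BF of 9 if the prior probability was only 0.1.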

15.
Model-based estimation of the human health risks resulting from exposure to environmental contaminants can be an important tool for structuring public health policy. Due to uncertainties in the modeling process, the outcomes of these assessments are usually probabilistic representations of a range of possible risks. In some cases, health surveillance data are available for the assessment population over all or a subset of the risk projection period and this additional information can be used to augment the model-based estimates. We use a Bayesian approach to update model-based estimates of health risks based on available health outcome data. Updated uncertainty distributions for risk estimates are derived using Monte Carlo sampling, which allows flexibility to model realistic situations including measurement error in the observable outcomes. We illustrate the approach by using imperfect public health surveillance data on lung cancer deaths to update model-based lung cancer mortality risk estimates in a population exposed to ionizing radiation from a uranium processing facility.
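One simple Monte Carlo way to update model-based risk draws with surveillance counts is sampling-importance-resampling (SIR). The sketch assumes that each prior draw is an excess number of deaths predicted by the risk model and that the observed deaths are Poisson with mean baseline + excess. It ignores the measurement error that the abstract mentions. The SIR scheme, the Poisson likelihood and the function names are illustrative assumptions, not the authors' method.

```python
import math
import random


def poisson_log_pmf(k, mean):
    """Log probability of k events under a Poisson distribution with the given mean."""
    return k * math.log(mean) - mean - math.lgamma(k + 1)


def update_risk_draws(prior_draws, observed_deaths, baseline_deaths, rng, n_out=None):
    """Sampling-importance-resampling update of model-based excess-death draws.

    Each prior draw is weighted by the Poisson likelihood of the observed deaths
    (mean = baseline + excess), then draws are resampled in proportion to weight.
    """
    log_w = [poisson_log_pmf(observed_deaths, baseline_deaths + d) for d in prior_draws]
    m = max(log_w)  # rescale before exponentiating to avoid underflow
    weights = [math.exp(lw - m) for lw in log_w]
    n_out = n_out or len(prior_draws)
    return rng.choices(prior_draws, weights=weights, k=n_out)


prior = [float(x) for x in range(0, 101)]  # stand-in for model-based Monte Carlo draws
posterior = update_risk_draws(prior, 60, 50.0, random.Random(0), n_out=5000)
```

In this toy example the surveillance count (60 observed against a baseline of 50) pulls the draws from a prior centred near 50 excess deaths down to roughly 10.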

16.
Zhao JX  Foulkes AS  George EI 《Biometrics》2005,61(2):591-599
Characterizing the process by which molecular and cellular level changes occur over time will have broad implications for clinical decision making and help further our knowledge of disease etiology across many complex diseases. However, this presents an analytic challenge due to the large number of potentially relevant biomarkers and the complex, uncharacterized relationships among them. We propose an exploratory Bayesian model selection procedure that searches for model simplicity through independence testing of multiple discrete biomarkers measured over time. Bayes factor calculations are used to identify and compare models that are best supported by the data. For large model spaces, i.e., a large number of multi-leveled biomarkers, we propose a Markov chain Monte Carlo (MCMC) stochastic search algorithm for finding promising models. We apply our procedure to explore the extent to which HIV-1 genetic changes occur independently over time.
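The kind of Bayes factor calculation used for independence testing can be illustrated on the simplest case, two discrete biomarkers cross-tabulated in a two-way table. The sketch assumes symmetric Dirichlet priors: one on the joint cell probabilities (dependent model) and one each on the row and column margins (independent model). With conjugate priors, each marginal likelihood has a closed Dirichlet-multinomial form. The prior choice, the restriction to two biomarkers and the function names are assumptions for illustration.

```python
import math


def log_dirichlet_multinomial(counts, alpha=1.0):
    """Log marginal likelihood of an observed sequence of categorical outcomes
    under a symmetric Dirichlet(alpha) prior on the category probabilities."""
    n = sum(counts)
    k = len(counts)
    return (math.lgamma(k * alpha) - math.lgamma(k * alpha + n)
            + sum(math.lgamma(alpha + c) - math.lgamma(alpha) for c in counts))


def log_bayes_factor_dependence(table, alpha=1.0):
    """log BF of 'dependent' (saturated joint) versus 'independent' for a two-way table.

    Positive values favour dependence between the row and column variables.
    """
    cells = [c for row in table for c in row]
    rows = [sum(row) for row in table]
    cols = [sum(col) for col in zip(*table)]
    log_dep = log_dirichlet_multinomial(cells, alpha)
    log_ind = log_dirichlet_multinomial(rows, alpha) + log_dirichlet_multinomial(cols, alpha)
    return log_dep - log_ind
```

Both marginal likelihoods are for the same ordered sequence of observations, so the multinomial coefficient cancels and need not be computed.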

17.
King R  Brooks SP 《Biometrics》2008,64(3):816-824
Summary. We consider the estimation of the size of a closed population, often of interest for wild animal populations, using a capture–recapture study. The estimate of the total population size can be very sensitive to the choice of model used to fit to the data. We consider a Bayesian approach, in which we consider all eight plausible models initially described by Otis et al. (1978, Wildlife Monographs 62, 1–135) within a single framework, including models containing an individual heterogeneity component. We show how we are able to obtain a model-averaged estimate of the total population, incorporating both parameter and model uncertainty. To illustrate the methodology we initially perform a simulation study and analyze two datasets where the population size is known, before considering a real example relating to a population of dolphins off northeast Scotland.
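Model averaging combines the per-model posteriors of the population size N, weighting each by its posterior model probability. The sketch below assumes that posterior draws of N and posterior model probabilities are already available for each model. In the paper these come from a single framework covering all eight models, so the post hoc mixing here is only an illustration. The model labels follow Otis et al. (M0, Mh, ...), and the function names are hypothetical.

```python
import random


def model_averaged_mean(means_by_model, model_probs):
    """Model-averaged posterior mean: sum over models of P(model | data) * E[N | model, data]."""
    return sum(model_probs[m] * means_by_model[m] for m in means_by_model)


def model_averaged_draws(draws_by_model, model_probs, n_out, rng):
    """Sample the model-averaged posterior of N.

    Pick a model in proportion to its posterior probability, then pick one of
    that model's posterior draws of N.
    """
    models = list(draws_by_model)
    picks = rng.choices(models, weights=[model_probs[m] for m in models], k=n_out)
    return [rng.choice(draws_by_model[m]) for m in picks]


probs = {"M0": 0.25, "Mh": 0.75}
avg = model_averaged_mean({"M0": 100.0, "Mh": 140.0}, probs)
mixed = model_averaged_draws({"M0": [100], "Mh": [140]}, probs, 4000, random.Random(0))
```

The mixed draws carry both parameter and model uncertainty, so their spread is typically wider than that of any single model's posterior.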

18.
Summary. We estimate the parameters of a stochastic process model for a macroparasite population within a host using approximate Bayesian computation (ABC). The immunity of the host is an unobserved model variable and only mature macroparasites at sacrifice of the host are counted. With very limited data, process rates are inferred reasonably precisely. Modeling involves a three variable Markov process for which the observed data likelihood is computationally intractable. ABC methods are particularly useful when the likelihood is analytically or computationally intractable. The ABC algorithm we present is based on sequential Monte Carlo, is adaptive in nature, and overcomes some drawbacks of previous approaches to ABC. The algorithm is validated on a test example involving simulated data from an autologistic model before being used to infer parameters of the Markov process model for experimental data. The fitted model explains the observed extra‐binomial variation in terms of a zero‐one immunity variable, which has a short‐lived presence in the host.
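The core idea of ABC (keep parameter values whose simulated data resemble the observed data) is easiest to see in plain rejection ABC. The sketch below is that simpler scheme, not the adaptive sequential Monte Carlo algorithm described in the abstract. The toy example infers a binomial success probability, where exact matching on the count makes rejection ABC sample the true Beta posterior. All names are hypothetical.

```python
import random


def abc_rejection(observed_summary, prior_sampler, simulator, distance,
                  tolerance, n_accept, rng):
    """Basic rejection ABC: keep prior draws whose simulated summary lies within tolerance."""
    accepted = []
    while len(accepted) < n_accept:
        theta = prior_sampler(rng)      # draw a candidate parameter from the prior
        sim = simulator(theta, rng)     # simulate data (summary) under that parameter
        if distance(sim, observed_summary) <= tolerance:
            accepted.append(theta)
    return accepted


# Toy example: 7 successes in 20 trials, uniform prior on the success probability.
draws = abc_rejection(
    observed_summary=7,
    prior_sampler=lambda r: r.random(),
    simulator=lambda p, r: sum(r.random() < p for _ in range(20)),
    distance=lambda a, b: abs(a - b),
    tolerance=0,
    n_accept=2000,
    rng=random.Random(1),
)
```

With tolerance 0 the accepted draws follow the Beta(8, 14) posterior (mean 8/22 ≈ 0.36). Sequential Monte Carlo variants shrink the tolerance over several rounds to avoid the low acceptance rates that exact matching causes for richer data.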

19.
Bayesian curve-fitting with free-knot splines   Times cited: 6, self-citations: 0, citations by others: 6

20.
Wu XL  Gianola D  Weigel K 《Genetica》2009,135(3):367-377
Methodology for joint mapping of quantitative trait loci (QTL) affecting continuous and binary characters in experimental crosses is presented. The procedure consists of a Bayesian Gaussian-threshold model implemented via Markov chain Monte Carlo, which bypasses bottlenecks due to high-dimensional integrals required in maximum likelihood approaches. The method handles multiple binary traits and multiple QTL. Modeling of ordered categorical traits is discussed as well. Features of the method are illustrated using simulated datasets representing a backcross design, and the data are analyzed using mixed-trait and single-trait models. The mixed-trait analysis provides greater detection power of a QTL than a single-trait analysis when the QTL affects two or more traits. The number of QTL inferred in the mixed-trait analysis does not pertain to a specific trait, but the roles of each QTL on specific traits can be assessed from estimates of its effects. The impacts of varying incidence level and sample size on the mixed-trait QTL mapping analysis are investigated as well.
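In a Gaussian-threshold model, each binary record is linked to an unobserved normal liability: the trait is 1 when the liability exceeds a threshold (here 0) and 0 otherwise. MCMC avoids the high-dimensional integrals by sampling the liabilities as extra unknowns. The sketch shows that data-augmentation step under assumed unit residual variance and a zero threshold, drawing from a truncated normal by inverting the CDF. It assumes moderate means (roughly |mean| < 7), where the normal CDF is not numerically 0 or 1. The function name is hypothetical.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def sample_liability(mean, observed_binary, rng):
    """Gibbs step for a threshold model: draw the latent liability given the binary record.

    Liability ~ N(mean, 1), truncated to (0, inf) if the record is 1 and (-inf, 0]
    if it is 0, sampled by inverting the normal CDF over the allowed interval.
    """
    p0 = STD_NORMAL.cdf(-mean)  # P(liability <= 0)
    if observed_binary == 1:
        u = p0 + rng.random() * (1.0 - p0)
    else:
        u = rng.random() * p0
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep inv_cdf strictly inside (0, 1)
    return mean + STD_NORMAL.inv_cdf(u)


rng = random.Random(0)
draws_affected = [sample_liability(0.3, 1, rng) for _ in range(2000)]
draws_unaffected = [sample_liability(0.3, 0, rng) for _ in range(2000)]
```

Given the sampled liabilities, the binary trait behaves like a continuous one, so QTL effects for binary and continuous traits can be updated jointly with standard Gaussian conditionals.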


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Filing No. 09084417