20 similar articles found.
1.
A flexible cure rate model for spatially correlated survival data based on generalized extreme value distribution and Gaussian process priors
Our present work proposes a new survival model in a Bayesian context to analyze right‐censored survival data for populations with a surviving fraction, assuming that the log failure time follows a generalized extreme value distribution. Many applications require a more flexible modeling of covariate information than a simple linear or parametric form for all covariate effects. It is also necessary to include the spatial variation in the model, since it is sometimes unexplained by the covariates considered in the analysis. Therefore, the nonlinear covariate effects and the spatial effects are incorporated into the systematic component of our model. Gaussian processes (GPs) provide a natural framework for modeling potentially nonlinear relationships and have recently become extremely powerful in nonlinear regression. Our proposed model adopts a semiparametric Bayesian approach by imposing a GP prior on the nonlinear structure of a continuous covariate. With the consideration of data availability and computational complexity, the conditionally autoregressive distribution is placed on the region‐specific frailties to handle spatial correlation. The flexibility and gains of our proposed model are illustrated through analyses of simulated data examples as well as a dataset involving a colon cancer clinical trial from the state of Iowa.
2.
Causal models including genetic factors are important for understanding the presentation mechanisms of complex diseases. Familial aggregation and segregation analyses based on polygenic threshold models have been the primary approach to fitting genetic models to the family data of complex diseases. In the current study, an advanced approach to obtaining appropriate causal models for complex diseases based on the sufficient component cause (SCC) model involving combinations of traditional genetics principles was proposed. The probabilities for the entire population, i.e., normal–normal, normal–disease, and disease–disease, were considered for each model for the appropriate handling of common complex diseases. The causal model in the current study included the genetic effects from single genes involving epistasis, complementary gene interactions, gene–environment interactions, and environmental effects. Bayesian inference using a Markov chain Monte Carlo (MCMC) algorithm was used to assess the proportions of each component for a given population lifetime incidence. This approach is flexible, allowing both common and rare variants within a gene and across multiple genes. An application to schizophrenia data confirmed the complexity of the causal factors. An analysis of diabetes data demonstrated that environmental factors and gene–environment interactions are the main causal factors for type II diabetes. The proposed method is effective and useful for identifying causal models, which can accelerate the development of efficient strategies for identifying causal factors of complex diseases.
3.
Estimation of extreme quantal-response statistics, such as the concentration required to kill 99.9% of test subjects (LC99.9), remains a challenge in the presence of multiple covariates and complex study designs. Accurate and precise estimates of the LC99.9 for mixtures of toxicants are critical to ongoing control of a parasitic invasive species, the sea lamprey, in the Laurentian Great Lakes of North America. The toxicity of those chemicals is affected by local and temporal variations in water chemistry, which must be incorporated into the modeling. We develop multilevel empirical Bayes models for data from multiple laboratory studies. Our approach yields more accurate and precise estimation of the LC99.9 compared to alternative models considered. This study demonstrates that properly incorporating hierarchical structure in laboratory data yields better estimates of LC99.9 stream treatment values that are critical to larvae control in the field. In addition, out-of-sample prediction of the results of in situ tests reveals the presence of a latent seasonal effect not manifest in the laboratory studies, suggesting avenues for future study and illustrating the importance of dual consideration of both experimental and observational data.
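
To make the LC99.9 quantity concrete, the following is a minimal sketch of how an extreme lethal concentration can be read off a single fitted dose-response curve. It assumes a plain logistic curve on log10 concentration, which the abstract does not specify; the function names `mortality` and `lc_quantile` and the coefficient values are hypothetical, and the multilevel empirical Bayes structure described above is not reproduced here.

```python
import math


def mortality(conc, intercept, slope):
    """Predicted mortality probability at a concentration.

    Assumes logit(p) = intercept + slope * log10(conc); this link is an
    illustrative assumption, not taken from the study.
    """
    eta = intercept + slope * math.log10(conc)
    return 1.0 / (1.0 + math.exp(-eta))


def lc_quantile(intercept, slope, p=0.999):
    """Concentration at which predicted mortality equals p (LC99.9 by default)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if slope <= 0.0:
        raise ValueError("slope must be positive for an increasing dose-response")
    logit_p = math.log(p / (1.0 - p))
    # Invert logit(p) = intercept + slope * log10(conc) for conc.
    return 10.0 ** ((logit_p - intercept) / slope)


if __name__ == "__main__":
    # Hypothetical coefficients, chosen only for illustration.
    a, b = -2.0, 4.0
    lc999 = lc_quantile(a, b)
    print(f"LC99.9 = {lc999:.1f} (arbitrary concentration units)")
    print(f"check: mortality at LC99.9 = {mortality(lc999, a, b):.4f}")
```

Because logit(0.999) is far out in the tail of the curve, small changes in the slope move the LC99.9 substantially, which is one reason extreme quantiles are hard to estimate precisely.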
4.
Bayesian shrinkage analysis is arguably the state-of-the-art technique for large-scale multiple quantitative trait locus (QTL) mapping. However, when the shrinkage model does not involve indicator variables for marker inclusion, QTL detection remains heavily dependent on significance thresholds derived from phenotype permutation under the null hypothesis of no phenotype-to-genotype association. This approach is computationally intensive and, more importantly, the hypothetical data generation at the heart of the permutation-based method violates the Bayesian philosophy. Here we propose a fully Bayesian decision rule for QTL detection under the recently introduced extended Bayesian LASSO for QTL mapping. Our new decision rule is free of any hypothetical data generation and relies on the well-established Bayes factors for evaluating the evidence for QTL presence at any locus. Simulation results demonstrate the remarkable performance of our decision rule. An application to real-world data is considered as well.
5.
Various simple mathematical models have been used to investigate dengue transmission. Some of these models explicitly model the mosquito population, while others model the mosquitoes implicitly in the transmission term. We study the impact of modeling assumptions on the dynamics of dengue in Thailand by fitting dengue hemorrhagic fever (DHF) data to simple vector–host and SIR models using Bayesian Markov chain Monte Carlo estimation. The parameter estimates obtained for both models were consistent with previous studies. Most importantly, model selection found that the SIR model was substantially better than the vector–host model for the DHF data from Thailand. Therefore, explicitly incorporating the mosquito population may not be necessary in modeling dengue transmission for some populations.
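
As an illustration of the kind of SIR model mentioned above, in which mosquitoes appear only implicitly through the transmission rate, here is a minimal sketch of the textbook SIR equations integrated with forward Euler steps. The function name `simulate_sir` and all parameter values are hypothetical, and the exact model form fitted in the study (for example, whether it includes births and deaths or seasonality) is not stated in the abstract.

```python
def simulate_sir(beta, gamma, s0, i0, r0, days, dt=0.1):
    """Integrate the basic SIR model with forward Euler steps.

    dS/dt = -beta * S * I / N
    dI/dt =  beta * S * I / N - gamma * I
    dR/dt =  gamma * I

    Returns a list of (time, S, I, R) tuples. The closed population and
    constant rates are simplifying assumptions for illustration.
    """
    n = s0 + i0 + r0
    s, i, r = float(s0), float(i0), float(r0)
    trajectory = [(0.0, s, i, r)]
    steps = int(round(days / dt))
    for step in range(1, steps + 1):
        new_infections = beta * s * i / n * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        trajectory.append((step * dt, s, i, r))
    return trajectory


if __name__ == "__main__":
    # Hypothetical rates (per day) and population sizes, not estimates
    # from the Thailand DHF data.
    traj = simulate_sir(beta=0.5, gamma=0.2, s0=990, i0=10, r0=0, days=50)
    peak_time, _, peak_infected, _ = max(traj, key=lambda row: row[2])
    print(f"peak infected = {peak_infected:.0f} at day {peak_time:.1f}")
```

A vector-host model would add explicit compartments for susceptible and infected mosquitoes. The abstract's finding is that, for these data, that extra structure was not supported by model selection.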
6.
Transmission events are the fundamental building blocks of the dynamics of any infectious disease. Much about the epidemiology of a disease can be learned when these individual transmission events are known or can be estimated. Such estimations are difficult and generally feasible only when detailed epidemiological data are available. The genealogy estimated from genetic sequences of sampled pathogens is another rich source of information on transmission history. Optimal inference of transmission events calls for the combination of genetic data and epidemiological data into one joint analysis. A key difficulty is that the transmission tree, which describes the transmission events between infected hosts, differs from the phylogenetic tree, which describes the ancestral relationships between pathogens sampled from these hosts. The trees differ both in timing of the internal nodes and in topology. These differences become more pronounced when a higher fraction of infected hosts is sampled. We show how the phylogenetic tree of sampled pathogens is related to the transmission tree of an outbreak of an infectious disease, by the within-host dynamics of pathogens. We provide a statistical framework to infer key epidemiological and mutational parameters by simultaneously estimating the phylogenetic tree and the transmission tree. We test the approach using simulations and illustrate its use on an outbreak of foot-and-mouth disease. The approach unifies existing methods in the emerging field of phylodynamics with transmission tree reconstruction methods that are used in infectious disease epidemiology.
7.
Jun Huang Yuttapong Thawornwattana Tom Flouri James Mallet Ziheng Yang 《Molecular biology and evolution》2022,39(12)
Genomic sequence data provide a rich source of information about the history of species divergence and interspecific hybridization or introgression. Despite recent advances in genomics and statistical methods, it remains challenging to infer gene flow, and as a result, one may have to estimate introgression rates and times under misspecified models. Here we use mathematical analysis and computer simulation to examine estimation bias and issues of interpretation when the model of gene flow is misspecified in analysis of genomic datasets, for example, if introgression is assigned to the wrong lineages. In the case of two species, we establish a correspondence between the migration rate in the continuous migration model and the introgression probability in the introgression model. When gene flow occurs continuously through time but in the analysis is assumed to occur at a fixed time point, common evolutionary parameters such as species divergence times are surprisingly well estimated. However, the time of introgression tends to be estimated towards the recent end of the period of continuous gene flow. When introgression events are assigned incorrectly to the parental or daughter lineages, introgression times tend to collapse onto species divergence times, with introgression probabilities underestimated. Overall, our analyses suggest that the simple introgression model is useful for extracting information concerning between-specific gene flow and divergence even when the model may be misspecified. However, for reliable inference of gene flow it is important to include multiple samples per species, in particular, from hybridizing species.
8.
For time series of count data, correlated measurements, clustering as well as excessive zeros occur simultaneously in biomedical applications. Ignoring such effects might contribute to misleading treatment outcomes. A generalized mixture Poisson geometric process (GMPGP) model and a zero‐altered mixture Poisson geometric process (ZMPGP) model are developed from the geometric process model, which was originally developed for modelling positive continuous data and was extended to handle count data. These models are motivated by evaluating the trend development of new tumour counts for bladder cancer patients as well as by identifying useful covariates which affect the count level. The models are implemented using a Bayesian method with Markov chain Monte Carlo (MCMC) algorithms and are assessed using the deviance information criterion (DIC).
9.
Ewing G Rodrigo A;SMBE Tri-National Young Investigators 《Molecular biology and evolution》2006,23(5):988-996
We expand a coalescent-based method that uses serially sampled genetic data from a subdivided population to incorporate changes to the number of demes and patterns of colonization. Often, when estimating population parameters or other parameters of interest from genetic data, the demographic structure and parameters are not constant over evolutionary time. In this paper, we develop a Bayesian Markov chain Monte Carlo method that allows for step changes in mutation, migration, and population sizes, as well as changing numbers of demes, where the times of these changes are also estimated. We show that in parameter ranges of interest, reliable estimates can often be obtained, including the historical times of parameter changes. However, posterior densities of migration rates can be quite diffuse and estimators somewhat biased, as reported by other authors.
10.
Bayesian nonparametric inference for panel count data with an informative observation process
In this paper, the panel count data analysis for recurrent events is considered. Such analysis is useful for studying tumor or infection recurrences in both clinical trial and observational studies. A bivariate Gaussian Cox process model is proposed to jointly model the observation process and the recurrent event process. Bayesian nonparametric inference is proposed for simultaneously estimating regression parameters, bivariate frailty effects, and baseline intensity functions. Inference is done through Markov chain Monte Carlo, with fully developed computational techniques. Predictive inference is also discussed under the Bayesian setting. The proposed method is shown to be efficient via simulation studies. A clinical trial dataset on skin cancer patients is analyzed to illustrate the proposed approach.
11.
12.
Bayesian model–based clustering provides a powerful and flexible tool that can be incorporated into regression models to better understand the grouping of observations. Using data from the Seychelles Child Development Study, we explore the effect of prenatal methylmercury exposure on 20 neurodevelopmental outcomes measured in 9-year-old children. Rather than cluster individual subjects, we cluster the outcomes within a multiple outcomes model. By using information in the data to nest the outcomes into groups called domains, the model more accurately reflects the shared characteristics of neurodevelopmental domains and improves estimation of the overall and outcome-specific exposure effects by shrinking effects within and between domains selected by the data. The Bayesian paradigm allows for sampling from the posterior distribution of the grouping parameters; thus, inference can be made about group membership and their defining characteristics. We avoid the often difficult and highly subjective requirement of a priori identification of the total number of groups by incorporating a Dirichlet process prior to form a fully Bayesian multiple outcomes model.
13.
In this paper, we analyze infant mortality in Nigeria based on the data set from the 1999 Nigeria Demographic and Health Survey (NDHS). We investigate spatial patterns at a highly disaggregated level of Nigerian states and consider non-linear effects of mother's age at birth. Time to the occurrence of a child's death can intuitively be considered to be categorical in nature and the determinants of a child's death may differ in different age groups. Thus, it may be desirable to investigate separately the death of a child in the first month and in the remaining 11 months of the first year of life. To avoid selection bias, the data set used for this case study is based on information on children who were born in the 12 months preceding the survey. Inference is Bayesian and is based on Markov chain Monte Carlo (MCMC) techniques. We find that spatial variation and the determinants of death indeed differ considerably for the two age groups considered.
14.
15.
16.
Divergence population genetics of chimpanzees (cited 18 times: 0 self-citations, 18 by others)
The divergence of two subspecies of common chimpanzees (Pan troglodytes troglodytes and P. t. verus) and the bonobo (P. paniscus) was studied using a recently developed method for analyzing population divergence. Under the isolation with migration model, the posterior probability distributions of divergence time, migration rates, and effective population sizes were estimated for large multilocus DNA sequence data sets drawn from the literature. The bonobo and the common chimpanzee are estimated to have diverged approximately 0.86 to 0.89 MYA, and the divergence of the two common chimpanzee subspecies is estimated to have occurred 0.42 MYA. P. t. troglodytes appears to have had a larger effective population size (22,400 to 27,900) compared with P. paniscus, P. t. verus, and the ancestral populations of these species. No evidence of gene flow was found in the comparisons involving P. paniscus; however, a clear signal of unidirectional gene flow was found from P. t. verus to P. t. troglodytes (2Nm = 0.51).
17.
BETH GARDNER J. ANDREW ROYLE MICHAEL T. WEGAN RAYMOND E. RAINBOLT PAUL D. CURTIS 《The Journal of wildlife management》2010,74(2):318-325
DNA-based mark-recapture has become a methodological cornerstone of research focused on bear species. The objective of such studies is often to estimate population size; however, doing so is frequently complicated by movement of individual bears. Movement affects the probability of detection and the assumption of closure of the population required in most models. To mitigate the bias caused by movement of individuals, population size and density estimates are often adjusted using ad hoc methods, including buffering the minimum polygon of the trapping array. We used a hierarchical, spatial capture-recapture model that contains explicit components for the spatial-point process that governs the distribution of individuals and their exposure to (via movement), and detection by, traps. We modeled detection probability as a function of each individual's distance to the trap and an indicator variable for previous capture to account for possible behavioral responses. We applied our model to a 2006 hair-snare study of a black bear (Ursus americanus) population in northern New York, USA. Based on the microsatellite marker analysis of collected hair samples, 47 individuals were identified. We estimated mean density at 0.20 bears/km². A positive estimate of the indicator variable suggests that bears are attracted to baited sites; therefore, including a trap-dependence covariate is important when using bait to attract individuals. Bayesian analysis of the model was implemented in WinBUGS, and we provide the model specification. The model can be applied to any spatially organized trapping array (hair snares, camera traps, mist nets, etc.) to estimate density and can also account for heterogeneity and covariate information at the trap or individual level.
18.
Bayesian analyses of spatial data often use a conditionally autoregressive (CAR) prior, which can be written as the kernel of an improper density that depends on a precision parameter tau that is typically unknown. To include tau in the Bayesian analysis, the kernel must be multiplied by tau^k for some k. This article rigorously derives k = (n - I)/2 for the L2 norm CAR prior (also called a Gaussian Markov random field model) and k = n - I for the L1 norm CAR prior, where n is the number of regions and I the number of "islands" (disconnected groups of regions) in the spatial map. Since I = 1 for a spatial structure defining a connected graph, this supports Knorr-Held's (2002, in Highly Structured Stochastic Systems, 260-264) suggestion that k = (n - 1)/2 in the L2 norm case, instead of the more common k = n/2. We illustrate the practical significance of our results using a periodontal example.
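
The exponent rule stated above can be sketched in a few lines: count the islands I as connected components of the region adjacency graph, then apply k = (n - I)/2 for the L2 norm prior or k = n - I for the L1 norm prior. The function names `count_islands` and `car_tau_exponent`, the adjacency-dictionary input format, and the five-region example map are assumptions made for illustration, not part of the article.

```python
from collections import deque


def count_islands(adjacency):
    """Count connected components ("islands") of an undirected region graph.

    `adjacency` maps each region to an iterable of its neighbours. Every
    neighbour is assumed to appear as a key, and the relation is assumed
    to be symmetric.
    """
    seen = set()
    islands = 0
    for start in adjacency:
        if start in seen:
            continue
        islands += 1
        seen.add(start)
        queue = deque([start])
        while queue:  # breadth-first search over one island
            region = queue.popleft()
            for neighbour in adjacency[region]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
    return islands


def car_tau_exponent(adjacency, norm="L2"):
    """Exponent k of tau for the CAR prior kernel, per the rule quoted above."""
    n = len(adjacency)
    i = count_islands(adjacency)
    if norm == "L2":
        return (n - i) / 2
    if norm == "L1":
        return n - i
    raise ValueError("norm must be 'L2' or 'L1'")


if __name__ == "__main__":
    # Hypothetical map: regions A-B-C form one island, D-E another.
    region_map = {
        "A": ["B"], "B": ["A", "C"], "C": ["B"],
        "D": ["E"], "E": ["D"],
    }
    print(count_islands(region_map))           # n = 5 regions, I = 2 islands
    print(car_tau_exponent(region_map, "L2"))  # (5 - 2) / 2
    print(car_tau_exponent(region_map, "L1"))  # 5 - 2
```

For a connected map the function returns (n - 1)/2 in the L2 case, matching the Knorr-Held suggestion cited in the abstract.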
19.
A Bayesian approach to analysing data from family-based association studies is developed. This permits direct assessment of the range of possible values of model parameters, such as the recombination frequency and allelic associations, in the light of the data. In addition, sophisticated comparisons of different models may be handled easily, even when such models are not nested. The methodology is developed in such a way as to allow separate inferences to be made about linkage and association by including theta, the recombination fraction between the marker and disease susceptibility locus under study, explicitly in the model. The method is illustrated by application to a previously published data set. The data analysis raises some interesting issues, notably with regard to the weight of evidence necessary to convince us of linkage between a candidate locus and disease.
20.
Bayesian coalescent inference of past population dynamics from molecular sequences (cited 31 times: 0 self-citations, 31 by others)
We introduce the Bayesian skyline plot, a new method for estimating past population dynamics through time from a sample of molecular sequences without dependence on a prespecified parametric model of demographic history. We describe a Markov chain Monte Carlo sampling procedure that efficiently samples a variant of the generalized skyline plot, given sequence data, and combines these plots to generate a posterior distribution of effective population size through time. We apply the Bayesian skyline plot to simulated data sets and show that it correctly reconstructs demographic history under canonical scenarios. Finally, we compare the Bayesian skyline plot model to previous coalescent approaches by analyzing two real data sets (hepatitis C virus in Egypt and mitochondrial DNA of Beringian bison) that have been previously investigated using alternative coalescent methods. In the bison analysis, we detect a severe but previously unrecognized bottleneck, estimated to have occurred 10,000 radiocarbon years ago, which coincides with both the earliest undisputed record of large numbers of humans in Alaska and the megafaunal extinctions in North America at the beginning of the Holocene.