Similar articles (20 results)
1.
Maximum likelihood and Bayesian approaches are presented for analyzing hierarchical statistical models of natural selection operating on DNA polymorphism within a panmictic population. For analyzing Bayesian models, we present Markov chain Monte Carlo (MCMC) methods for sampling from the joint posterior distribution of parameters. For frequentist analysis, an Expectation-Maximization (EM) algorithm is presented for finding the maximum likelihood estimate of the genome-wide mean and variance in selection intensity among classes of mutations. The framework presented here provides an ideal setting for modeling mutations dispersed through the genome and, in particular, for the analysis of how natural selection operates on different classes of single nucleotide polymorphisms (SNPs).
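As a hedged illustration of the MCMC step mentioned above: a generic random-walk Metropolis sampler for a single parameter, with an arbitrary toy log-posterior standing in for the hierarchical selection model (whose form the abstract does not specify). This is a Python sketch, not the authors' implementation.

    import numpy as np

    def metropolis(log_post, x0, n_iter=5000, step=0.5, seed=0):
        """Random-walk Metropolis sampler for a scalar parameter."""
        rng = np.random.default_rng(seed)
        x, lp = x0, log_post(x0)
        draws = np.empty(n_iter)
        for i in range(n_iter):
            prop = x + step * rng.normal()
            lp_prop = log_post(prop)
            if np.log(rng.uniform()) < lp_prop - lp:   # accept or reject
                x, lp = prop, lp_prop
            draws[i] = x
        return draws

    # Toy stand-in target: a "mean selection intensity" with posterior N(-0.3, 0.2^2);
    # the model in the paper is hierarchical and involves many more parameters.
    toy_log_post = lambda s: -0.5 * ((s + 0.3) / 0.2) ** 2
    samples = metropolis(toy_log_post, x0=0.0)
    print(samples[1000:].mean(), samples[1000:].std())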

2.
Stockmarr A. Biometrics 1999, 55(3): 671-677
A crime has been committed, and a DNA profile of the perpetrator is obtained from the crime scene. A suspect with a matching profile is found. The problem of evaluating this DNA evidence in a forensic context, when the suspect is found through a database search, is analysed through a likelihood approach. The recommendations of the National Research Council of the U.S. are derived in this setting as the proper way of evaluating the evidence when finiteness of the population of possible perpetrators is not taken into account. When a finite population of possible perpetrators may be assumed, it is possible to take account of the sampling process that resulted in the actual database, so one can deal with the problem where a large proportion of the possible perpetrators belongs to the database in question. It is shown that the last approach does not in general result in a greater weight being assigned to the evidence, though it does when a sufficiently large proportion of the possible perpetrators is in the database. The value of the likelihood ratio corresponding to the probable cause setting constitutes an upper bound for this weight, and the upper bound is only attained when all but one of the possible perpetrators are in the database.
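A hedged numerical illustration of the quantities at stake (invented figures, not the paper's derivation): with random-match probability p, the probable-cause likelihood ratio is 1/p, while one common reading of the NRC recommendation divides this weight by the number n of profiles searched (the "np rule").

    # Illustrative only; p and n are made up.
    p = 1e-6              # assumed random-match probability of the profile
    n_database = 10_000   # assumed number of profiles in the searched database

    lr_probable_cause = 1 / p                  # suspect found through other evidence
    lr_database_search = 1 / (n_database * p)  # NRC-style "np" correction

    print(f"probable-cause LR : {lr_probable_cause:.3g}")
    print(f"database-search LR: {lr_database_search:.3g}")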

3.
Meester R, Sjerps M. Biometrics 2003, 59(3): 727-732
Does the evidential strength of a DNA match depend on whether the suspect was identified through database search or through other evidence (“probable cause”)? In Balding and Donnelly (1995, Journal of the Royal Statistical Society, Series A 158, 21–53) and elsewhere, it has been argued that the evidential strength is slightly larger in a database search case than in a probable cause case, while Stockmarr (1999, Biometrics 55, 671–677) reached the opposite conclusion. Both these approaches use likelihood ratios. By making an excursion to a similar problem, the two-stain problem, we argue in this article that there are certain fundamental difficulties with the use of a likelihood ratio, which can be avoided by concentrating on the posterior odds. This approach helps resolve the above-mentioned conflict.
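A minimal worked example of the posterior-odds route advocated above (all numbers hypothetical): posterior odds = likelihood ratio × prior odds, so the prior and the strength of the match enter explicitly rather than being folded into a contested likelihood ratio.

    # Toy illustration; the prior and LR are made-up values.
    prior_odds = 1 / 999_999       # e.g. suspect is a priori one of a million candidates
    likelihood_ratio = 1e6         # assumed strength of the DNA match

    posterior_odds = likelihood_ratio * prior_odds
    posterior_prob = posterior_odds / (1 + posterior_odds)
    print(f"posterior probability that the suspect is the source: {posterior_prob:.3f}")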

4.
Numerous statistical methods have been developed for analyzing high-dimensional data. These methods often focus on variable selection approaches but are limited for the purpose of testing with high-dimensional data, and they often require explicit likelihood functions. In this article, we propose a “hybrid omnibus test” for testing with high-dimensional data under much weaker requirements. Our hybrid omnibus test is developed under a semiparametric framework where a likelihood function is no longer necessary. Our test is a version of a frequentist-Bayesian hybrid score-type test for a generalized partially linear single-index model, whose link function depends on a set of variables through a generalized partially linear single index. We propose an efficient score based on estimating equations, define local tests, and then construct our hybrid omnibus test using local tests. We compare our approach with an empirical-likelihood ratio test and Bayesian inference based on Bayes factors, using simulation studies. Our simulation results suggest that our approach outperforms the others in terms of type I error, power, and computational cost in both the low- and high-dimensional cases. The advantage of our approach is demonstrated by applying it to genetic pathway data for type II diabetes mellitus.

5.
With the aim of bridging the gap between DNA mixture analysis and DNA database search, a novel approach is proposed to evaluate the forensic evidence of DNA mixtures when the suspect is identified by the search of a database of DNA profiles. General formulae are developed for the calculation of the likelihood ratio for a two-person mixture under general situations including multiple matches and imperfect evidence. The influence of the prior probabilities on the weight of evidence under the scenario of multiple matches is demonstrated by a numerical example based on Hong Kong data. Our approach is shown to be capable of presenting the forensic evidence of DNA mixtures in a comprehensive way when the suspect is identified through database search.

6.
Houseman EA, Marsit C, Karagas M, Ryan LM. Biometrics 2007, 63(4): 1269-1277
Increasingly used in health-related applications, latent variable models provide an appealing framework for handling high-dimensional exposure and response data. Item response theory (IRT) models, which have gained widespread popularity, were originally developed for use in the context of educational testing, where extremely large sample sizes permitted the estimation of a moderate-to-large number of parameters. In the context of public health applications, smaller sample sizes preclude large parameter spaces. Therefore, we propose a penalized likelihood approach to reduce mean square error and improve numerical stability. We present a continuous family of models, indexed by a tuning parameter, that range between the Rasch model and the IRT model. The tuning parameter is selected by cross-validation or by approximations such as the Akaike Information Criterion. While our approach can be placed easily in a Bayesian context, we find that our frequentist approach is more computationally efficient. We demonstrate our methodology on a study of methylation silencing of gene expression in bladder tumors. We obtain similar results using both frequentist and Bayesian approaches, although the frequentist approach is less computationally demanding. In particular, we find high correlation of methylation silencing among 16 loci in bladder tumors, and that methylation is associated with smoking and with patient survival.

7.
We consider a new frequentist gene expression index for Affymetrix oligonucleotide DNA arrays, using a probe intensity model similar to the one suggested by Hein and others (2005), called the Bayesian gene expression index (BGX). According to this model, the perfect match and mismatch values are assumed to be correlated as a result of sharing a common gene expression signal. Rather than a Bayesian approach, we develop a maximum likelihood algorithm for estimating the underlying common signal. In this way, estimation is explicit and much faster than the BGX implementation. The observed Fisher information matrix, rather than a posterior credibility interval, gives an idea of the accuracy of the estimators. We evaluate our method using benchmark spike-in data sets from Affymetrix and GeneLogic by analyzing the relationship between the estimated signal and concentration, i.e., the true signal, and compare our results with other commonly used methods.

8.
During the 20th century ecologists largely relied on the frequentist system of inference for the analysis of their data. However, in the past few decades ecologists have become increasingly interested in the use of Bayesian methods of data analysis. In this article I provide guidance to ecologists who would like to decide whether Bayesian methods can be used to improve their conclusions and predictions. I begin by providing a concise summary of Bayesian methods of analysis, including a comparison of differences between Bayesian and frequentist approaches to inference when using hierarchical models. Next I provide a list of problems where Bayesian methods of analysis may arguably be preferred over frequentist methods. These problems are usually encountered in analyses based on hierarchical models of data. I describe the essentials required for applying modern methods of Bayesian computation, and I use real-world examples to illustrate these methods. I conclude by summarizing what I perceive to be the main strengths and weaknesses of using Bayesian methods to solve ecological inference problems.

9.
MOTIVATION: Array CGH technologies enable the simultaneous measurement of DNA copy number for thousands of sites on a genome. We developed the circular binary segmentation (CBS) algorithm to divide the genome into regions of equal copy number. The algorithm tests for change-points using a maximal t-statistic with a permutation reference distribution to obtain the corresponding P-value. The number of computations required for the maximal test statistic is O(N²), where N is the number of markers. This makes the full permutation approach computationally prohibitive for the newer arrays that contain tens of thousands of markers and highlights the need for a faster algorithm. RESULTS: We present a hybrid approach to obtain the P-value of the test statistic in linear time. We also introduce a rule for stopping early when there is strong evidence for the presence of a change. We show through simulations that the hybrid approach provides a substantial gain in speed with only a negligible loss in accuracy and that the stopping rule further increases speed. We also present the analyses of array CGH data from breast cancer cell lines to show the impact of the new approaches on the analysis of real data. AVAILABILITY: An R version of the CBS algorithm has been implemented in the "DNAcopy" package of the Bioconductor project. The proposed hybrid method for the P-value is available in version 1.2.1 or higher and the stopping rule for declaring a change early is available in version 1.5.1 or higher.
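The maximal t-statistic with a permutation reference distribution can be sketched directly; the toy Python below shows the brute-force single-change-point version of that idea (a schematic only, not the CBS/DNAcopy implementation, and without the hybrid or early-stopping speedups described in the abstract).

    import numpy as np

    def max_t_stat(x):
        """Maximal Welch t-statistic over all binary splits of the sequence."""
        best = 0.0
        for i in range(2, len(x) - 1):                  # at least 2 points per segment
            a, b = x[:i], x[i:]
            se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
            if se > 0:
                best = max(best, abs(a.mean() - b.mean()) / se)
        return best

    def permutation_pvalue(x, n_perm=200, seed=0):
        """Permutation reference distribution for the maximal t-statistic."""
        rng = np.random.default_rng(seed)
        observed = max_t_stat(x)
        exceed = sum(max_t_stat(rng.permutation(x)) >= observed for _ in range(n_perm))
        return (exceed + 1) / (n_perm + 1)

    rng = np.random.default_rng(1)
    x = np.concatenate([np.zeros(50), np.full(50, 0.8)]) + rng.normal(0, 1, 100)
    print(permutation_pvalue(x))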

10.
Wavelet thresholding with Bayesian false discovery rate control
The false discovery rate (FDR) procedure has become a popular method for handling multiplicity in high-dimensional data. The definition of FDR has a natural Bayesian interpretation; it is the expected proportion of null hypotheses mistakenly rejected given a measure of evidence for their truth. In this article, we propose controlling the positive FDR using a Bayesian approach where the rejection rule is based on the posterior probabilities of the null hypotheses. Correspondence between Bayesian and frequentist measures of evidence in hypothesis testing has been studied in several contexts. Here we extend the comparison to multiple testing with control of the FDR and illustrate the procedure with an application to wavelet thresholding. The problem consists of recovering a signal from noisy measurements. This involves extracting wavelet coefficients that result from true signal and can be formulated as a multiple hypothesis testing problem. We use simulated examples to compare the performance of our approach to the Benjamini and Hochberg (1995, Journal of the Royal Statistical Society, Series B 57, 289-300) procedure. We also illustrate the method with nuclear magnetic resonance spectral data from the human brain.
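As a point of reference for the Bayesian rejection rule described above, the Benjamini-Hochberg step-up procedure against which it is compared can be written in a few lines; the sketch below (illustrative, not the authors' code) applies it to toy "wavelet coefficients" tested against zero.

    import numpy as np
    from scipy.stats import norm

    def benjamini_hochberg(pvalues, alpha=0.05):
        """Boolean mask of rejected hypotheses under BH FDR control at level alpha."""
        p = np.asarray(pvalues)
        m = len(p)
        order = np.argsort(p)
        below = p[order] <= alpha * np.arange(1, m + 1) / m
        reject = np.zeros(m, dtype=bool)
        if below.any():
            k = np.max(np.nonzero(below))          # largest i with p_(i) <= alpha*i/m
            reject[order[:k + 1]] = True
        return reject

    rng = np.random.default_rng(0)
    coeffs = np.concatenate([rng.normal(0, 1, 90), rng.normal(4, 1, 10)])  # mostly noise
    pvals = 2 * norm.sf(np.abs(coeffs))            # two-sided test of "coefficient = 0"
    print(benjamini_hochberg(pvals).sum(), "coefficients retained")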

11.
In epidemiologic studies, measurement error in the exposure variable can have a detrimental effect on the power of hypothesis testing for detecting the impact of exposure in the development of a disease. To adjust for misclassification in the hypothesis testing procedure involving a misclassified binary exposure variable, we consider a retrospective case–control scenario under the assumption of nondifferential misclassification. We develop a test under the Bayesian approach from a posterior distribution generated by an MCMC algorithm and a normal prior under realistic assumptions. We compare this test with an equivalent likelihood ratio test developed under the frequentist approach, using various simulated settings and in the presence or absence of validation data. In our simulations, we consider varying degrees of sensitivity, specificity, sample sizes, exposure prevalence, and proportions of validated and unvalidated data. In these scenarios, our simulation study shows that the adjusted model (with validation data) is always better than the unadjusted model (without validation data). However, we show that an exception is possible in a fixed-budget scenario where collecting the validation data is substantially more costly. We also show that both Bayesian and frequentist hypothesis testing procedures reach the same conclusions for the scenarios under consideration. The Bayesian approach is, however, computationally more stable in rare exposure contexts. A real case–control study is used to show the application of the hypothesis testing procedures under consideration.
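To make the nondifferential-misclassification setting concrete, here is a small hedged sketch of the standard correction of an observed exposure prevalence for known sensitivity and specificity (a Rogan-Gladen-type adjustment); it is only an ingredient of the kind of analysis described above, not the authors' Bayesian or likelihood-ratio test.

    def corrected_prevalence(observed_prev, sensitivity, specificity):
        """Back out the true exposure prevalence from a misclassified binary measure,
        assuming nondifferential misclassification with known Se and Sp."""
        return (observed_prev + specificity - 1) / (sensitivity + specificity - 1)

    # Hypothetical numbers: 30% observed exposure, instrument with Se = 0.85, Sp = 0.95.
    print(round(corrected_prevalence(0.30, 0.85, 0.95), 4))   # -> 0.3125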

12.
In the management of most chronic conditions characterized by the lack of universally effective treatments, adaptive treatment strategies (ATSs) have grown in popularity as they offer a more individualized approach. As a result, sequential multiple assignment randomized trials (SMARTs) have gained attention as the most suitable clinical trial design to formalize the study of these strategies. While the number of SMARTs has increased in recent years, sample size and design considerations have generally been carried out in frequentist settings. However, standard frequentist formulae require assumptions on interim response rates and variance components. Misspecifying these can lead to incorrect sample size calculations and correspondingly inadequate levels of power. The Bayesian framework offers a straightforward path to alleviate some of these concerns. In this paper, we provide calculations in a Bayesian setting to allow more realistic and robust estimates that account for uncertainty in inputs through the ‘two priors’ approach. Additionally, compared to the standard frequentist formulae, this methodology allows us to rely on fewer assumptions, integrate pre-trial knowledge, and switch the focus from the standardized effect size to the MDD. The proposed methodology is evaluated in a thorough simulation study and is implemented to estimate the sample size for a full-scale SMART of an internet-based adaptive stress management intervention for cardiovascular disease patients, using data from its pilot study conducted in two Canadian provinces.
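A hedged sketch of the general 'two priors' idea (a design prior generates plausible trial data, and an analysis prior is used to judge each simulated trial); the simple two-arm response-rate setting and every number below are assumptions for illustration, not the SMART-specific calculations developed in the paper.

    import numpy as np
    from scipy.stats import beta

    def assurance(n_per_arm, n_sims=500, seed=0):
        """Probability the trial 'succeeds', averaged over a design prior."""
        rng = np.random.default_rng(seed)
        successes = 0
        for _ in range(n_sims):
            # Design prior: uncertainty about the true response rates.
            p_ctrl = rng.beta(20, 30)          # roughly centred at 0.40
            p_trt = rng.beta(30, 30)           # roughly centred at 0.50
            y_ctrl = rng.binomial(n_per_arm, p_ctrl)
            y_trt = rng.binomial(n_per_arm, p_trt)
            # Analysis prior: flat Beta(1, 1); success = high posterior prob. of benefit.
            d_ctrl = beta(1 + y_ctrl, 1 + n_per_arm - y_ctrl).rvs(1000, random_state=rng)
            d_trt = beta(1 + y_trt, 1 + n_per_arm - y_trt).rvs(1000, random_state=rng)
            successes += np.mean(d_trt > d_ctrl) > 0.95
        return successes / n_sims

    print(assurance(n_per_arm=150))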

13.
Two-part joint models for a longitudinal semicontinuous biomarker and a terminal event have recently been introduced based on frequentist estimation. The biomarker distribution is decomposed into the probability of a positive value and the expected value among positive values. Shared random effects can represent the association structure between the biomarker and the terminal event. The computational burden increases compared to standard joint models with a single regression model for the biomarker. In this context, the frequentist estimation implemented in the R package frailtypack can be challenging for complex models (i.e., a large number of parameters and a high dimension of the random effects). As an alternative, we propose a Bayesian estimation of two-part joint models based on the Integrated Nested Laplace Approximation (INLA) algorithm to alleviate the computational burden and fit more complex models. Our simulation studies confirm that INLA provides accurate approximations of posterior estimates and reduces the computation time and variability of estimates compared to frailtypack in the situations considered. We contrast the Bayesian and frequentist approaches in the analysis of two randomized cancer clinical trials (GERCOR and PRIME studies), where INLA has a reduced variability for the association between the biomarker and the risk of event. Moreover, the Bayesian approach was able to characterize subgroups of patients associated with different responses to treatment in the PRIME study. Our study suggests that the Bayesian approach using the INLA algorithm makes it possible to fit complex joint models that might be of interest in a wide range of clinical applications.

14.
The plug-in (insertion) algorithm commonly used to quantify DNA (deoxyribonucleic acid) evidence has the shortcoming that its result does not depend on sample size, so that DNA evidence is over-weighted when samples are small. Taking the effect of sample size into account, this paper introduces a Bayesian model and gives a formula for the likelihood ratio under that model. The two methods are compared on a real case, and the numerical results show that the algorithm based on the Bayesian model is more accurate and reasonable than the plug-in algorithm.

15.
The problem of making inferences about the ratio of two normal means has been addressed, both from the frequentist and Bayesian perspectives, by several authors. Most of this work is concerned with the homoscedastic case. In contrast, the situation where the variances are not equal has received little attention. Cox (1985) deals, within the frequentist framework, with a model where the variances are related to the means. His results are mainly based on Fieller's theorem, whose drawbacks are well known. In this paper we present a Bayesian analysis of this model and discuss some related problems. An agronomical example is used throughout to illustrate the methods.
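A minimal Monte Carlo sketch of a Bayesian interval for the ratio of two normal means when the variances differ (simulated data, vague priors, normal approximation to each posterior); it only illustrates the kind of posterior summary discussed above, not the specific mean-variance relationship of Cox's model.

    import numpy as np

    rng = np.random.default_rng(42)
    x = rng.normal(10.0, 2.0, size=25)    # hypothetical yields under treatment A
    y = rng.normal(8.0, 4.0, size=25)     # hypothetical yields under treatment B

    # Approximate posterior draws of each mean: Normal(sample mean, s^2 / n).
    mx = rng.normal(x.mean(), x.std(ddof=1) / np.sqrt(x.size), size=20_000)
    my = rng.normal(y.mean(), y.std(ddof=1) / np.sqrt(y.size), size=20_000)
    ratio = mx / my

    lo, hi = np.percentile(ratio, [2.5, 97.5])
    print(f"posterior median ratio {np.median(ratio):.2f}, 95% interval ({lo:.2f}, {hi:.2f})")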

16.
The dynamics of species diversification rates are a key component of macroevolutionary patterns. Although not absolutely necessary, the use of divergence times inferred from sequence data has led to the development of more powerful methods for inferring diversification rates. However, it is unclear what impact uncertainty in age estimates has on diversification rate inferences. Here, we quantify these effects using both Bayesian and frequentist methodology. Through simulation, we demonstrate that adding sequence data results in more precise estimates of internal node ages, but a reasonable approximation of these node ages is often sufficient to approach the theoretical minimum variance in speciation rate estimates. We also find that even crude estimates of divergence times increase the power of tests of diversification rate differences between sister clades. Finally, because Bayesian and frequentist methods provided similar assessments of error, novel Bayesian approaches may provide a useful framework for tests of diversification rates in more complex contexts than are addressed here.

17.
Selecting the best-fit model of nucleotide substitution
Despite the relevant role of models of nucleotide substitution in phylogenetics, choosing among different models remains a problem. Several statistical methods for selecting the model that best fits the data at hand have been proposed, but their absolute and relative performance has not yet been characterized. In this study, we compare under various conditions the performance of different hierarchical and dynamic likelihood ratio tests, and of the Akaike and Bayesian information criteria, for selecting best-fit models of nucleotide substitution. We specifically examine the role of the topology used to estimate the likelihood of the different models and the importance of the order in which hypotheses are tested. We do this by simulating DNA sequences under a known model of nucleotide substitution and recording how often this true model is recovered by the different methods. Our results suggest that model selection is reasonably accurate and indicate that some likelihood ratio test methods perform overall better than the Akaike or Bayesian information criteria. The tree used to estimate the likelihood scores does not influence model selection unless it is a randomly chosen tree. The order in which hypotheses are tested, and the complexity of the initial model in the sequence of tests, influence model selection in some cases. Model fitting in phylogenetics has been suggested for many years, yet many authors still arbitrarily choose their models, often using the default models implemented in standard computer programs for phylogenetic estimation. We show here that a best-fit model can be readily identified. Consequently, given the relevance of models, model fitting should be routine in any phylogenetic analysis that uses models of evolution.
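Once the maximized log-likelihoods are in hand, the likelihood ratio tests and information criteria compared above reduce to a few lines of arithmetic; the sketch below uses invented values for two nested substitution models (e.g. a JC69-like model against an HKY85-like model with four extra free parameters).

    import math
    from scipy.stats import chi2

    # Hypothetical maximized log-likelihoods and free-parameter counts.
    lnL_simple, k_simple = -10234.7, 1     # simpler nested model
    lnL_rich, k_rich = -10221.3, 5         # richer model

    lrt = 2 * (lnL_rich - lnL_simple)      # likelihood ratio test statistic
    df = k_rich - k_simple
    p_value = chi2.sf(lrt, df)

    n_sites = 1200                         # alignment length, needed for BIC
    aic = {"simple": -2 * lnL_simple + 2 * k_simple, "rich": -2 * lnL_rich + 2 * k_rich}
    bic = {"simple": -2 * lnL_simple + k_simple * math.log(n_sites),
           "rich": -2 * lnL_rich + k_rich * math.log(n_sites)}

    print(f"LRT = {lrt:.1f}, df = {df}, p = {p_value:.3g}")
    print("AIC:", aic)
    print("BIC:", bic)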

18.
A Bayesian framework for comparative quantitative genetics
Bayesian approaches have been extensively used in animal breeding sciences, but similar approaches in the context of evolutionary quantitative genetics have been rare. We compared the performance of Bayesian and frequentist approaches in estimation of quantitative genetic parameters (viz. matrices of additive and dominance variances) in datasets typical of evolutionary studies and traits differing in their genetic architecture. Our results illustrate that it is difficult to disentangle the relative roles of different genetic components from small datasets, and that ignoring, for example, dominance is likely to lead to biased estimates of additive variance. We suggest that a natural summary statistic for G-matrix comparisons can be obtained by examining how different the underlying multinormal probability distributions are, and illustrate our approach with data on the common frog (Rana temporaria). Furthermore, we derive a simple Monte Carlo method for computation of fraternity coefficients needed for the estimation of dominance variance, and use the pedigree of a natural Siberian jay (Perisoreus infaustus) population to illustrate that the commonly used approximate values can be substantially biased.

19.
There is sometimes clear evidence of a strong secular trend in the treatment effect of studies included in a meta-analysis. In such cases, estimating the present-day treatment effect by meta-regression is both reasonable and straightforward. We, however, consider the more common situation where a secular trend is suspected, but is not strongly statistically significant. Typically, this lack of significance is due to the small number of studies included in the analysis, so that a meta-regression could give wild point estimates. We introduce an empirical Bayes meta-analysis methodology, which shrinks the secular trend toward zero. This has the effect that treatment effects are adjusted for trend, but where the evidence from data is weak, wild results are not obtained. We explore several frequentist approaches, and a fully Bayesian method is also implemented. A measure of trend analogous to I² is described, and exact significance tests for trend are given. Our preferred method is one based on penalized or h-likelihood, which is computationally simple, and allows invariance of predictions to the (arbitrary) choice of time origin. We suggest that a trendless standard random effects meta-analysis should routinely be supplemented with an h-likelihood analysis as a sensitivity analysis.
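A hedged sketch of the shrinkage idea: a weighted meta-regression of effect size on (centred) study year whose slope carries a ridge penalty, so a weakly supported secular trend is damped toward zero rather than extrapolated. The data and penalty values are invented, and this is not the authors' h-likelihood estimator.

    import numpy as np

    def shrunken_trend(effects, variances, years, penalty):
        """Weighted least squares of effect on centred year with a ridge penalty on
        the slope; penalty -> 0 gives ordinary meta-regression, penalty -> infinity
        gives a trendless common-effect fit."""
        w = 1.0 / np.asarray(variances, dtype=float)
        t = np.asarray(years, dtype=float) - np.average(years, weights=w)
        X = np.column_stack([np.ones_like(t), t])
        P = np.diag([0.0, penalty])                     # penalize the slope only
        coef = np.linalg.solve(X.T @ (w[:, None] * X) + P, X.T @ (w * np.asarray(effects)))
        return coef                                     # (intercept, slope per year)

    effects = np.array([0.40, 0.35, 0.22, 0.18, 0.10])  # hypothetical log odds ratios
    variances = np.array([0.02, 0.03, 0.02, 0.04, 0.03])
    years = np.array([1995, 2000, 2005, 2010, 2015])

    for lam in (0.0, 5.0, 50.0):
        a, b = shrunken_trend(effects, variances, years, lam)
        print(f"penalty={lam:>5}: intercept={a:.3f}, slope per year={b:.4f}")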

20.
Kneib T  Fahrmeir L 《Biometrics》2006,62(1):109-118
Motivated by a space-time study on forest health with damage state of trees as the response, we propose a general class of structured additive regression models for categorical responses, allowing for a flexible semiparametric predictor. Nonlinear effects of continuous covariates, time trends, and interactions between continuous covariates are modeled by penalized splines. Spatial effects can be estimated based on Markov random fields, Gaussian random fields, or two-dimensional penalized splines. We present our approach from a Bayesian perspective, with inference based on a categorical linear mixed model representation. The resulting empirical Bayes method is closely related to penalized likelihood estimation in a frequentist setting. Variance components, corresponding to inverse smoothing parameters, are estimated using (approximate) restricted maximum likelihood. In simulation studies we investigate the performance of different choices for the spatial effect, compare the empirical Bayes approach to competing methodology, and study the bias of mixed model estimates. As an application we analyze data from the forest health survey.
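A hedged sketch of one common way to set up the penalized-spline smoothers mentioned above: a truncated-cubic basis whose knot coefficients carry a ridge penalty while the polynomial part is left unpenalized (a generic construction, not the authors' P-spline/mixed-model implementation or its spatial components).

    import numpy as np

    def penalized_spline_fit(x, y, n_knots=15, penalty=1.0):
        """Penalized spline via a truncated-cubic basis with a ridge penalty on
        the knot coefficients."""
        knots = np.quantile(x, np.linspace(0.05, 0.95, n_knots))
        X = np.column_stack([np.ones_like(x), x, x**2, x**3] +
                            [np.clip(x - k, 0, None) ** 3 for k in knots])
        P = np.diag([0.0] * 4 + [penalty] * n_knots)    # penalize only the knot terms
        coef = np.linalg.solve(X.T @ X + P, X.T @ y)
        return X @ coef

    rng = np.random.default_rng(0)
    x = np.linspace(0.0, 10.0, 200)
    y = np.sin(x) + rng.normal(0, 0.3, size=x.size)
    smooth = penalized_spline_fit(x, y, penalty=5.0)
    print(np.round(smooth[:5], 2))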
