Similar Articles
20 similar articles found.
1.
Anderson EC. Genetics 2005, 170(2): 955-967
This article presents an efficient importance-sampling method for computing the likelihood of the effective size of a population under the coalescent model of Berthier et al. Previous computational approaches, using Markov chain Monte Carlo, required many minutes to several hours to analyze small data sets. The approach presented here is orders of magnitude faster and can provide an approximation to the likelihood curve, even for large data sets, in a matter of seconds. Additionally, confidence intervals on the estimated likelihood curve provide a useful estimate of the Monte Carlo error. Simulations show the importance sampling to be stable across a wide range of scenarios and show that the Ne estimator itself performs well. Further simulations show that the 95% confidence intervals around the Ne estimate are accurate. User-friendly software implementing the algorithm for Mac, Windows, and Unix/Linux is available for download. Applications of this computational framework to other problems are discussed.
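Anderson's coalescent proposal distribution is not reproduced in the abstract, but the generic pattern (estimate a likelihood as the mean of importance weights, with a Monte Carlo error bar built from their standard error) can be sketched. The gamma "model" and exponential proposal below are illustrative stand-ins, not the paper's densities.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Toy stand-in for a coalescent likelihood: L(theta) = E_g[ p_theta(G) / g(G) ],
# where latent genealogies G are drawn from a proposal g.  Here a 1-D latent
# quantity keeps the sketch self-contained.
def is_likelihood(theta, n_draws=10_000):
    draws = rng.exponential(scale=1.0, size=n_draws)          # proposal g
    log_w = stats.gamma.logpdf(draws, a=2.0, scale=theta) \
            - stats.expon.logpdf(draws, scale=1.0)            # "model" over proposal
    w = np.exp(log_w)
    est = w.mean()
    se = w.std(ddof=1) / np.sqrt(n_draws)                     # Monte Carlo error
    return est, (est - 1.96 * se, est + 1.96 * se)

for theta in (0.5, 1.0, 2.0):
    est, ci = is_likelihood(theta)
    print(f"theta={theta}: L~{est:.4f}, 95% MC CI ({ci[0]:.4f}, {ci[1]:.4f})")
```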

2.
Statistical models support medical research by facilitating individualized outcome prognostication conditional on independent variables or by estimating effects of risk factors adjusted for covariates. The theory of statistical models is well established if the set of independent variables to consider is fixed and small, so we can assume that effect estimates are unbiased and the usual methods for confidence interval estimation are valid. In routine work, however, it is not known a priori which covariates should be included in a model, and we are often confronted with 10–30 candidate variables, often too many to include in a single statistical model. We provide an overview of various available variable selection methods that are based on significance or information criteria, penalized likelihood, the change-in-estimate criterion, background knowledge, or combinations thereof. These methods were usually developed in the context of a linear regression model and then transferred to more generalized linear models or models for censored survival data. Variable selection, in particular if used in explanatory modeling where effect estimates are of central interest, can compromise the stability of a final model, the unbiasedness of regression coefficients, and the validity of p-values or confidence intervals. Therefore, we give pragmatic recommendations for the practicing statistician on applying variable selection methods in general (low-dimensional) modeling problems and on performing stability investigations and inference. We also propose some quantities, based on resampling the entire variable selection process, that should be routinely reported by software packages offering automated variable selection algorithms.
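As a rough illustration of the stability investigations recommended above, the sketch below bootstraps an entire backward-elimination run and reports bootstrap inclusion frequencies. The simulated data, the p-value-based elimination rule, and the threshold 0.157 (roughly the significance level corresponding to AIC-based selection for a single parameter) are illustrative assumptions, not the paper's recommendations in detail.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.normal(size=(n, p))
beta = np.array([1.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
y = X @ beta + rng.normal(size=n)

def backward_select(X, y, alpha=0.157):
    """Backward elimination on p-values; alpha ~ 0.157 mimics AIC selection."""
    cols = list(range(X.shape[1]))
    while cols:
        fit = sm.OLS(y, sm.add_constant(X[:, cols])).fit()
        pvals = fit.pvalues[1:]            # skip the intercept
        worst = int(np.argmax(pvals))
        if pvals[worst] <= alpha:
            return cols
        cols.pop(worst)
    return cols

# Resample the entire selection process and report inclusion frequencies,
# one of the stability quantities the authors suggest reporting routinely.
B = 200
freq = np.zeros(p)
for _ in range(B):
    idx = rng.integers(0, n, size=n)
    freq[backward_select(X[idx], y[idx])] += 1
print("bootstrap inclusion frequencies:", np.round(freq / B, 2))
```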

3.
Comparison of the performance and accuracy of different inference methods, such as maximum likelihood (ML) and Bayesian inference, is difficult because the inference methods are implemented in different programs, often written by different authors. Both methods were implemented in the program MIGRATE, which estimates population genetic parameters, such as population sizes and migration rates, using coalescence theory. Both inference methods use the same Markov chain Monte Carlo algorithm and differ from each other in only two aspects: the parameter proposal distribution and the maximization of the likelihood function. Using simulated datasets, the Bayesian method generally fares better than the ML approach in accuracy and coverage, although for some values the two approaches are equal in performance. MOTIVATION: The Markov chain Monte Carlo-based ML framework can fail on sparse data and can deliver non-conservative support intervals. A Bayesian framework with appropriate prior distributions is able to remedy some of these problems. RESULTS: The program MIGRATE was extended to allow not only for ML estimation of population genetic parameters but also for the use of a Bayesian framework. Comparisons between the Bayesian approach and the ML approach are facilitated because both modes estimate the same parameters under the same population model and assumptions.

4.
Ecologists often contrast diversity (species richness and abundances) using tests for comparing means or indices. However, many popular software applications do not support standard inferential statistics for estimates of species richness and/or density. In this study we simulated the behavior of asymmetric log-normal confidence intervals and determined an interval level that mimics statistical tests at α = 0.05 when confidence intervals from two distributions do not overlap. Our results show that 84% confidence intervals robustly mimic 0.05-level tests for asymmetric confidence intervals, as has been demonstrated for symmetric ones in the past. Finally, we provide detailed user guides for calculating 84% confidence intervals in two of the most robust and widely used free software packages for wildlife diversity measurement (EstimateS and Distance).
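A minimal sketch of the idea, assuming one common asymmetric log-normal construction (an interval symmetric on the log scale) and the non-overlap rule; the richness estimates and standard errors below are made up.

```python
import numpy as np
from scipy import stats

def asym_lognormal_ci(estimate, se, level=0.84):
    """Asymmetric log-normal CI for a strictly positive estimate (e.g. species
    richness): symmetric on the log scale, hence asymmetric on the raw scale."""
    z = stats.norm.ppf(0.5 + level / 2)
    sigma = np.sqrt(np.log(1 + (se / estimate) ** 2))
    c = np.exp(z * sigma)
    return estimate / c, estimate * c

ci_a = asym_lognormal_ci(120.0, 15.0)
ci_b = asym_lognormal_ci(80.0, 12.0)
# Non-overlap of the two 84% intervals mimics a two-sample test at alpha ~ 0.05.
overlap = ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]
print(ci_a, ci_b, "overlap:", overlap)
```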

5.
Every statistical model is based on explicitly or implicitly formulated assumptions. In this study we address new techniques for the calculation of variances and confidence intervals, analyse some statistical methods applied to modelling twinning rates, and investigate whether the improvements give more reliable results. For an observed relative frequency, the commonly used variance formula holds exactly under the assumptions that the repetitions are independent and that the probability of success is constant. The probability of a twin maternity depends not only on genetic predisposition, but also on several demographic factors, particularly ethnicity, maternal age and parity. Therefore, the assumption of constancy is questionable. The effect of grouping on the analysis of regression models for twinning rates is also considered. Our results indicate that grouping influences the efficiency of the estimates but not the estimates themselves. Recently, confidence intervals for proportions of low-incidence events have been a target of revived interest, and we present new alternatives. These confidence intervals are slightly wider and their midpoints do not coincide with the maximum-likelihood estimate of the twinning rate, but their actual coverage is closer to the nominal one than the coverage of the traditional confidence interval. In general, our findings indicate that the traditional methods are mostly satisfactorily robust and give reliable results. However, we propose that the new formulae for the confidence intervals should be used. Our results are applied to twin-maternity data from Finland and Denmark.
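The abstract does not name the alternative intervals; the Wilson score interval is one well-known construction with exactly the properties described (slightly wider, midpoint not at the ML estimate, coverage closer to nominal). A sketch comparing it with the traditional Wald interval, using made-up twinning counts:

```python
from math import sqrt
from scipy import stats

def wald_ci(x, n, level=0.95):
    """Traditional interval: p_hat +/- z * sqrt(p_hat(1-p_hat)/n)."""
    z = stats.norm.ppf(0.5 + level / 2)
    p = x / n
    half = z * sqrt(p * (1 - p) / n)
    return p - half, p + half

def wilson_ci(x, n, level=0.95):
    """Wilson score interval: wider for rare events, midpoint shifted away
    from the ML estimate, actual coverage closer to the nominal level."""
    z = stats.norm.ppf(0.5 + level / 2)
    p = x / n
    centre = (p + z**2 / (2 * n)) / (1 + z**2 / n)
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / (1 + z**2 / n)
    return centre - half, centre + half

# Illustrative twinning rate: 150 twin maternities out of 10 000.
print(wald_ci(150, 10_000), wilson_ci(150, 10_000))
```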

6.
This paper develops mathematical and computational methods for fitting, by the method of maximum likelihood (ML), the two-parameter, right-truncated Weibull distribution (RTWD) to life-test or survival data. Some important statistical properties of the RTWD are derived and ML estimating equations for the scale and shape parameters of the RTWD are developed. The ML equations are used to express the scale parameter as an analytic function of the shape parameter and to establish a computationally useful lower bound on the ML estimate of the shape parameter. This bound is a function only of the sample observations and the (known) truncation point T. The ML equations are reducible to a single nonlinear, transcendental equation in the shape parameter, and a computationally efficient algorithm is described for solving this equation. The practical use of the methods is illustrated in two numerical examples.
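Rather than the paper's reduction to a single transcendental equation, a direct numerical maximization of the truncated-Weibull log-likelihood illustrates the same fit; the simulated data, truncation point and starting values are assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy import stats

rng = np.random.default_rng(7)
T = 2.0
# Simulate a right-truncated Weibull(shape=1.5, scale=1.0) sample by rejection.
raw = stats.weibull_min.rvs(1.5, scale=1.0, size=5000, random_state=rng)
x = raw[raw <= T][:500]

def negloglik(params):
    k, lam = np.exp(params)                   # optimise on the log scale: k, lam > 0
    # Truncated density: f(x)/F(T) on (0, T].
    logf = stats.weibull_min.logpdf(x, k, scale=lam)
    lognorm = np.log(stats.weibull_min.cdf(T, k, scale=lam))
    return -(logf - lognorm).sum()

res = minimize(negloglik, x0=np.log([1.0, 1.0]), method="Nelder-Mead")
k_hat, lam_hat = np.exp(res.x)
print(f"shape ~ {k_hat:.3f}, scale ~ {lam_hat:.3f}")
```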

7.
A plethora of statistical models have recently been developed to estimate components of population genetic history. Very few of these methods, however, have been adequately evaluated for their performance in accurately estimating population genetic parameters of interest. In this paper, we continue a research program of evaluation of population genetic methods through computer simulation. Specifically, we examine the software MIGRATE-N 1.6.8 and test its accuracy in estimating genetic diversity (Theta), migration rates, and confidence intervals. We simulated nucleotide sequence data under a neutral coalescent model with lengths of 500 bp and 1000 bp, and with three different per-site Theta values (0.00025, 0.0025, 0.025) crossed with four different migration rates (0.0000025, 0.025, 0.25, 2.5), constructing 1000 evolutionary trees per combination per sequence length. We found that while MIGRATE-N 1.6.8 performs reasonably well in estimating genetic diversity (Theta), it does poorly at estimating migration rates and the confidence intervals associated with them. We recommend researchers use this software with caution under conditions similar to those used in this evaluation.

8.
Duval S, Tweedie R. Biometrics 2000, 56(2): 455-463
We study recently developed nonparametric methods for estimating the number of missing studies that might exist in a meta-analysis and the effect that these studies might have had on its outcome. These are simple rank-based data augmentation techniques, which formalize the use of funnel plots. We show that they provide effective and relatively powerful tests for evaluating the existence of such publication bias. After adjusting for missing studies, we find that the point estimate of the overall effect size is approximately correct and coverage of the effect size confidence intervals is substantially improved, in many cases recovering the nominal confidence levels entirely. We illustrate the trim and fill method on existing meta-analyses of studies in clinical trials and psychometrics.
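A compact sketch of the iterative L0 variant of trim and fill under fixed-effect pooling, assuming the suppressed studies lie on the left of the funnel (so the most positive effects are trimmed); this is a simplification for illustration, not a reimplementation of the published algorithm.

```python
import numpy as np

def pooled(effects, variances):
    """Fixed-effect (inverse-variance) pooled estimate."""
    w = 1 / variances
    return (w * effects).sum() / w.sum()

def trim_and_fill(effects, variances, n_iter=20):
    effects = np.asarray(effects, float)
    variances = np.asarray(variances, float)
    n = len(effects)
    k0 = 0
    for _ in range(n_iter):
        keep = np.argsort(effects)[: n - k0] if k0 else np.arange(n)  # trim largest
        mu = pooled(effects[keep], variances[keep])
        ranks = np.abs(effects - mu).argsort().argsort() + 1   # ranks of |deviation|
        t_n = ranks[effects > mu].sum()
        k0_new = max(0, int(round((4 * t_n - n * (n + 1)) / (2 * n - 1))))  # L0
        if k0_new == k0:
            break
        k0 = k0_new
    # Fill: mirror the k0 most extreme studies about the trimmed pooled estimate.
    order = np.argsort(effects)
    mirrored = 2 * mu - effects[order[-k0:]] if k0 else np.array([])
    filled_e = np.concatenate([effects, mirrored])
    filled_v = np.concatenate([variances, variances[order[-k0:]]]) if k0 else variances
    return k0, pooled(filled_e, filled_v)

e = np.array([0.1, 0.2, 0.3, 0.5, 0.6, 0.8, 1.0])   # made-up study effects
v = np.full(7, 0.04)
print(trim_and_fill(e, v))   # (estimated missing studies, adjusted estimate)
```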

9.
Evaluating the goodness of fit of logistic regression models is crucial to ensure the accuracy of the estimated probabilities. Unfortunately, such evaluation is problematic in large samples. Because the power of traditional goodness of fit tests increases with the sample size, practically irrelevant discrepancies between estimated and true probabilities are increasingly likely to cause the rejection of the hypothesis of perfect fit in larger and larger samples. This phenomenon has been widely documented for popular goodness of fit tests, such as the Hosmer-Lemeshow test. To address this limitation, we propose a modification of the Hosmer-Lemeshow approach. By standardizing the noncentrality parameter that characterizes the alternative distribution of the Hosmer-Lemeshow statistic, we introduce a parameter that measures the goodness of fit of a model but does not depend on the sample size. We provide the methodology to estimate this parameter and construct confidence intervals for it. Finally, we propose a formal statistical test to rigorously assess whether the fit of a model, albeit not perfect, is acceptable for practical purposes. The proposed method is compared in a simulation study with a competing modification of the Hosmer-Lemeshow test, based on repeated subsampling. We provide a step-by-step illustration of our method using a model for postneonatal mortality developed in a large cohort of more than 300 000 observations.
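For orientation, the classical Hosmer-Lemeshow statistic on which the proposal builds can be computed as below; the paper's standardized noncentrality parameter and its confidence interval are not reproduced here. The simulated data are illustrative.

```python
import numpy as np
from scipy import stats

def hosmer_lemeshow(y, p_hat, groups=10):
    """Classical Hosmer-Lemeshow statistic: partition subjects into deciles of
    predicted risk and compare observed vs expected events per group."""
    order = np.argsort(p_hat)
    chi2 = 0.0
    for idx in np.array_split(order, groups):
        obs, exp = y[idx].sum(), p_hat[idx].sum()
        n_g = len(idx)
        chi2 += (obs - exp) ** 2 / (exp * (1 - exp / n_g))
    p_value = stats.chi2.sf(chi2, df=groups - 2)   # development-data convention
    return chi2, p_value

rng = np.random.default_rng(3)
x = rng.normal(size=2000)
p_true = 1 / (1 + np.exp(-(-2 + x)))
y = rng.binomial(1, p_true)
print(hosmer_lemeshow(y, p_true))   # well-calibrated by construction
```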

10.
Quantitative predictions in computational life sciences are often based on regression models. The advent of machine learning has led to highly accurate regression models that have gained widespread acceptance. While there are statistical methods available to estimate the global performance of regression models on a test or training dataset, it is often not clear how well this performance transfers to other datasets or how reliable an individual prediction is, a fact that often reduces a user's trust in a computational method. In analogy to the concept of an experimental error, we sketch how estimators for individual prediction errors can be used to provide confidence intervals for individual predictions. Two novel statistical methods, named CONFINE and CONFIVE, can estimate the reliability of an individual prediction based on the local properties of nearby training data. The methods can be applied equally to linear and non-linear regression methods with very little computational overhead. We compare our confidence estimators with other existing confidence and applicability-domain estimators on two biologically relevant problems (MHC–peptide binding prediction and quantitative structure–activity relationship (QSAR) modeling). Our results suggest that the proposed confidence estimators perform comparably to or better than previously proposed estimation methods. Given a sufficient amount of training data, the estimators exhibit error estimates of high quality. In addition, we observed that the quality of estimated confidence intervals is predictable. We discuss how confidence estimation is influenced by noise, the number of features, and the dataset size. Estimating the confidence in individual predictions in terms of error intervals represents an important step from plain, non-informative predictions towards transparent and interpretable predictions that will help to improve the acceptance of computational methods in the biological community.
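The exact definitions of CONFINE and CONFIVE are not given in the abstract; the sketch below captures the underlying idea with an assumed k-nearest-neighbour rule: use the residuals of nearby training points to set an error bar for one individual prediction.

```python
import numpy as np

def knn_prediction_interval(x_query, X_train, residuals, k=15, level=0.95):
    """Local confidence estimate: the spread of absolute residuals of the k
    nearest training points serves as an error bar for a single prediction."""
    d = np.linalg.norm(X_train - x_query, axis=1)
    local = np.abs(residuals[np.argsort(d)[:k]])
    return np.quantile(local, level)              # half-width of the interval

rng = np.random.default_rng(5)
X = rng.uniform(-3, 3, size=(500, 1))
# Heteroscedastic noise makes locality matter: errors grow with |x|.
y = np.sin(X[:, 0]) + rng.normal(scale=0.1 + 0.1 * np.abs(X[:, 0]), size=500)
pred = np.sin(X[:, 0])                            # stand-in regression model
half = knn_prediction_interval(np.array([2.5]), X, y - pred)
print(f"prediction +/- {half:.3f} at x = 2.5")
```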

11.
Bürger R, Gimelfarb A. Genetics 1999, 152(2): 807-820
Stabilizing selection for an intermediate optimum is generally considered to deplete genetic variation in quantitative traits. However, conflicting results from various types of models have been obtained. While classical analyses assuming a large number of independent additive loci with individually small effects indicated that no genetic variation is preserved under stabilizing selection, several analyses of two-locus models showed the contrary. We perform a complete analysis of a generalization of Wright's two-locus quadratic-optimum model and investigate numerically the ability of quadratic stabilizing selection to maintain genetic variation in additive quantitative traits controlled by up to five loci. A statistical approach is employed by choosing randomly 4000 parameter sets (allelic effects, recombination rates, and strength of selection) for a given number of loci. For each parameter set we iterate the recursion equations that describe the dynamics of gamete frequencies, starting from 20 randomly chosen initial conditions, until an equilibrium is reached, record the quantities of interest, and calculate their corresponding mean values. As the number of loci increases from two to five, the fraction of the genome expected to be polymorphic declines surprisingly rapidly, and the loci that remain polymorphic increasingly are those with small effects on the trait. As a result, the genetic variance expected to be maintained under stabilizing selection decreases very rapidly with increased number of loci. The equilibrium structure expected under stabilizing selection on an additive trait differs markedly from that expected under selection with no constraints on genotypic fitness values. The expected genetic variance, the expected polymorphic fraction of the genome, as well as other quantities of interest, are only weakly dependent on the selection intensity and the level of recombination.
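A sketch of the kind of recursion iterated in such studies, using the textbook deterministic two-locus gamete-frequency dynamics with additive effects and quadratic fitness; the parameter values and random starting point are illustrative, not the paper's sampling scheme.

```python
import numpy as np

rng = np.random.default_rng(11)

def iterate_two_locus(effects, r, s, opt=0.0, n_iter=20_000, tol=1e-12):
    """Deterministic gamete-frequency recursion for a two-locus quadratic-
    optimum model.  Gamete order: AB, Ab, aB, ab; D is the linkage
    disequilibrium and w[0, 3] the double-heterozygote fitness."""
    g = np.array([effects[0] + effects[1], effects[0], effects[1], 0.0])
    w = np.maximum(1 - s * (g[:, None] + g[None, :] - opt) ** 2, 0.0)
    x = rng.dirichlet(np.ones(4))                 # random initial frequencies
    sign = np.array([-1.0, 1.0, 1.0, -1.0])
    for _ in range(n_iter):
        wbar = x @ w @ x                          # mean fitness
        marg = w @ x                              # marginal gamete fitnesses
        D = x[0] * x[3] - x[1] * x[2]
        x_new = (x * marg + sign * r * w[0, 3] * D) / wbar
        if np.abs(x_new - x).max() < tol:
            return x_new
        x = x_new
    return x

eq = iterate_two_locus(effects=(0.5, 0.1), r=0.1, s=0.5)
print("equilibrium gamete frequencies:", np.round(eq, 4))
```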

12.
We propose a method to construct simultaneous confidence intervals for a parameter vector by inverting a series of randomization tests (RTs). The randomization tests are facilitated by an efficient multivariate Robbins–Monro procedure that takes the correlation information of all components into account. The estimation method does not require any distributional assumption about the population other than the existence of the second moments. The resulting simultaneous confidence intervals are not necessarily symmetric about the point estimate of the parameter vector but possess the property of equal tails in all dimensions. In particular, we present the construction for the mean vector of one population and for the difference between the mean vectors of two populations. Extensive simulations are conducted for numerical comparison with four other methods. We illustrate the application of the proposed method to testing bioequivalence with multiple endpoints on real data.
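The multivariate Robbins-Monro machinery is not reproduced here; a one-dimensional sketch conveys the inversion idea: a sign-flip randomization test of H0: mean = mu0, inverted over a crude grid of candidate values. The grid search is exactly the step that the paper's stochastic approximation replaces with something efficient.

```python
import numpy as np

rng = np.random.default_rng(2)

def randomization_pvalue(x, mu0, n_perm=2000):
    """Sign-flip randomization test of H0: mean = mu0 (assumes symmetric errors)."""
    d = x - mu0
    obs = abs(d.mean())
    flips = rng.choice([-1.0, 1.0], size=(n_perm, d.size))
    perm = np.abs((flips * d).mean(axis=1))
    return (perm >= obs).mean()

def randomization_ci(x, grid, alpha=0.05):
    """The CI is the set of mu0 values the randomization test does not reject."""
    keep = [m for m in grid if randomization_pvalue(x, m) > alpha]
    return min(keep), max(keep)

x = rng.normal(loc=1.0, scale=2.0, size=40)
grid = np.linspace(x.mean() - 2, x.mean() + 2, 81)
print("95% randomization CI for the mean:", randomization_ci(x, grid))
```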

13.
Both the absolute risk and the relative risk (RR) have a crucial role to play in epidemiology. RR is often approximated by the odds ratio (OR) under the rare-disease assumption in the conventional case-control study; however, such a study design does not provide an estimate of absolute risk. The case-base study is an alternative approach which readily produces RR estimation without resorting to the rare-disease assumption. However, previous researchers only considered one single dichotomous exposure and did not elaborate how absolute risks can be estimated in a case-base study. In this paper, the authors propose a logistic model for the case-base study. The model is flexible enough to admit multiple exposures on any measurement scale: binary, categorical or continuous. It can be easily fitted using common statistical packages. With one additional step of simple calculations on the model parameters, one readily obtains relative and absolute risk estimates as well as their confidence intervals. Monte Carlo simulations show that the proposed method can produce unbiased estimates and adequate-coverage confidence intervals for ORs, RRs and absolute risks. With these desirable properties and its methods of analysis fully developed in this paper, the case-base study may become a mainstay in epidemiology.

14.
Chan IS, Zhang Z. Biometrics 1999, 55(4): 1202-1209
Confidence intervals are often provided to estimate a treatment difference. When the sample size is small, as is typical in early phases of clinical trials, confidence intervals based on large-sample approximations may not be reliable. In this report, we propose test-based methods of constructing exact confidence intervals for the difference in two binomial proportions. These exact confidence intervals are obtained from the unconditional distribution of two binomial responses, and they guarantee the level of coverage. We compare the performance of these confidence intervals to ones based on the observed difference alone. We show that a large improvement can be achieved by using the standardized Z test with a constrained maximum likelihood estimate of the variance.
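A sketch of the test-based construction: invert the standardized Z statistic with a constrained ML variance over a grid of differences. This asymptotic version (in the spirit of score intervals) stands in for the paper's exact unconditional tail computation; the counts are made up.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy import stats

def score_ci_diff(x1, n1, x2, n2, level=0.95, n_grid=801):
    """CI for p1 - p2 from inverting a standardized Z test whose variance uses
    ML estimates constrained by H0: p1 - p2 = delta (asymptotic sketch)."""
    z_crit = stats.norm.ppf(0.5 + level / 2)
    kept = []
    for delta in np.linspace(-0.99, 0.99, n_grid):
        def nll(p2):
            # Binomial log-likelihood under the constraint p1 = p2 + delta.
            p1 = p2 + delta
            if not (1e-9 < p1 < 1 - 1e-9):
                return np.inf
            return -(x1 * np.log(p1) + (n1 - x1) * np.log1p(-p1)
                     + x2 * np.log(p2) + (n2 - x2) * np.log1p(-p2))
        p2t = minimize_scalar(nll, bounds=(1e-8, 1 - 1e-8), method="bounded").x
        p1t = np.clip(p2t + delta, 1e-9, 1 - 1e-9)
        se = np.sqrt(p1t * (1 - p1t) / n1 + p2t * (1 - p2t) / n2)
        if abs((x1 / n1 - x2 / n2 - delta) / se) <= z_crit:
            kept.append(delta)
    return kept[0], kept[-1]

print(score_ci_diff(7, 25, 2, 25))   # made-up counts
```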

15.
Demographic studies focusing on age-specific mortality rates are becoming increasingly common throughout the fields of life-history evolution, ecology and biogerontology. Well-defined statistical techniques for quantifying patterns of mortality within a cohort and identifying differences in age-specific mortality among cohorts are needed. Here I discuss using maximum likelihood (ML) statistical methods to estimate the parameters of mathematical models, which are used to describe the change in mortality with age. ML provides a convenient and powerful framework for choosing an adequate mortality model, estimating model parameters and testing hypotheses about differences in parameters among experimental or ecological treatments. Simulations suggest that experiments designed to estimate age-specific mortality should involve at least 100-500 individuals per cohort per treatment. Significant bias in the estimation of model parameters is introduced when the mortality model is misspecified and samples are too small to detect the true mortality pattern. Furthermore, the lack of simple and efficient procedures for comparing different mortality models has forced the use of the Gompertz model, which specifies an exponentially increasing mortality with age, and which may not apply to the majority of experimental systems.
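As an illustration, ML fitting of the Gompertz model (hazard mu(x) = a*exp(b*x)), the model the author notes is often forced by a lack of alternatives. The simulation step uses inverse-CDF sampling, and the cohort size follows the 100-500 guideline; the parameter values are assumptions.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)

def gompertz_negloglik(params, ages):
    a, b = np.exp(params)                 # log-parametrisation keeps a, b > 0
    # hazard mu(x) = a*exp(b*x); survivorship S(x) = exp((a/b)*(1 - exp(b*x)));
    # log density of an age at death: log(a) + b*x + (a/b)*(1 - exp(b*x)).
    return -np.sum(np.log(a) + b * ages + (a / b) * (1 - np.exp(b * ages)))

def simulate_gompertz(a, b, n):
    """Inverse-CDF sampling of ages at death from the Gompertz distribution."""
    u = rng.uniform(size=n)
    return np.log(1 - (b / a) * np.log(1 - u)) / b

ages = simulate_gompertz(a=0.01, b=0.1, n=500)   # ~500 individuals per cohort
res = minimize(gompertz_negloglik, x0=np.log([0.02, 0.05]), args=(ages,),
               method="Nelder-Mead")
print("ML estimates (a, b):", np.round(np.exp(res.x), 4))
```

Fitting two cohorts separately and jointly in this way also yields the likelihood-ratio tests for parameter differences among treatments that the abstract describes.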

16.
The current approach to using machine learning (ML) algorithms in healthcare is to either require clinician oversight for every use case or use their predictions without any human oversight. We explore a middle ground that lets ML algorithms abstain from making a prediction, simultaneously improving their reliability and reducing the burden placed on human experts. To this end, we present a general penalized loss minimization framework for training selective prediction-set (SPS) models, which choose to either output a prediction set or abstain. The resulting models abstain when the outcome is difficult to predict accurately, such as on subjects who are too different from the training data, and achieve higher accuracy on those for which they do give predictions. We then introduce a model-agnostic statistical inference procedure for the coverage rate of an SPS model that ensembles individual models trained using K-fold cross-validation. We find that SPS ensembles attain prediction-set coverage rates closer to the nominal level and have narrower confidence intervals for the marginal coverage rate. We apply our method to train neural networks that abstain more often on out-of-sample images in the MNIST digit prediction task and achieve higher predictive accuracy for ICU patients compared to existing approaches.
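A toy sketch of the abstention idea (not the paper's penalized training): build a prediction set greedily from the largest class probabilities and abstain when the set needed to reach the target coverage is too large; the probabilities below are simulated.

```python
import numpy as np

def prediction_set(probs, level=0.9, max_size=3):
    """Greedy prediction set: add labels by decreasing probability until the
    cumulative probability reaches `level`; abstain (return None) when the
    set would have to exceed `max_size`, i.e. the case is too uncertain."""
    order = np.argsort(probs)[::-1]
    cum = np.cumsum(probs[order])
    size = int(np.searchsorted(cum, level) + 1)
    return None if size > max_size else set(order[:size].tolist())

rng = np.random.default_rng(8)
for _ in range(3):
    p = rng.dirichlet(np.ones(10))        # fake posterior over 10 classes
    print(prediction_set(p))              # a set of labels, or None (abstain)
```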

17.
This paper introduces a simple stochastic model for waterfowl movement. After outlining the properties of the model, we focus on parameter estimation. We compare three standard least squares estimation procedures with maximum likelihood (ML) estimates using Monte Carlo simulations. For our model, little is gained by incorporating information about the covariance structure of the process into least squares estimation. In fact, misspecifying the covariance produces worse estimates than ignoring heteroscedasticity and autocorrelation. We also develop a modified least squares procedure that performs as well as ML. We then apply the five estimators to field data and show that differences in the statistical properties of the estimators can greatly affect our interpretation of the data. We conclude by highlighting the effects of density on per capita movement rates.

18.
Two-stage, drop-the-losers designs for adaptive treatment selection have been considered by many authors. The distributions of conditional sufficient statistics and the Rao-Blackwell technique were used to obtain an unbiased estimate and to construct an exact confidence interval for the parameter of interest. In this paper, we characterize the selection process from a binomial drop-the-losers design using a truncated binomial distribution. We propose a new estimator and show that it is asymptotically consistent as the sample size grows large in either the first or the second stage. Supported by simulation analyses, we recommend the new estimator over the naive estimator and the Rao-Blackwell-type estimator based on its robustness in the finite-sample setting. We frame the concept as a simple and easily implemented procedure for phase 2 oncology trial design that can be confirmatory in nature, and we use an example to illustrate its application.

19.
20.
Flandre P. PLoS ONE 2011, 6(9): e22871

Background

In recent years the “noninferiority” trial has emerged as the new standard design for HIV drug development among antiretroviral patients, often with a primary endpoint based on the difference in success rates between the two treatment groups. Different statistical methods have been introduced to provide confidence intervals for that difference. The main objective is to investigate whether the choice of the statistical method changes the conclusion of the trials.

Methods

We reviewed 11 trials published in 2010 that used a difference in proportions as the primary endpoint. In these trials, 5 different statistical methods were used to estimate such confidence intervals. The five methods are described and applied to data from the 11 trials. Noninferiority of the new treatment is not demonstrated if the confidence interval of the treatment difference includes the prespecified noninferiority margin.

Results

Results indicated that confidence intervals can be quite different depending on the method used. In many situations, however, the conclusions of the trials are not altered, because point estimates of the treatment difference were too far from the prespecified noninferiority margins. Nevertheless, in a few trials the use of different statistical methods led to different conclusions. In particular, the use of “exact” methods can be very confusing.

Conclusion

Statistical methods used to estimate confidence intervals in noninferiority trials have a strong impact on the conclusion of such trials.
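Two interval methods typically compared in such trials can be sketched: the simple Wald interval and Newcombe's hybrid-score interval built from Wilson limits. The abstract does not name the five methods the paper actually compares, and the counts and margin below are made up.

```python
from math import sqrt
from scipy import stats

def wilson(x, n, z):
    """Wilson score limits for a single proportion."""
    p = x / n
    centre = (p + z * z / (2 * n)) / (1 + z * z / n)
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / (1 + z * z / n)
    return centre - half, centre + half

def diff_cis(x1, n1, x2, n2, level=0.95):
    """Wald and Newcombe hybrid-score intervals for p1 - p2."""
    z = stats.norm.ppf(0.5 + level / 2)
    p1, p2 = x1 / n1, x2 / n2
    d = p1 - p2
    half = z * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    wald = (d - half, d + half)
    l1, u1 = wilson(x1, n1, z)
    l2, u2 = wilson(x2, n2, z)
    newcombe = (d - sqrt((p1 - l1) ** 2 + (u2 - p2) ** 2),
                d + sqrt((u1 - p1) ** 2 + (p2 - l2) ** 2))
    return wald, newcombe

wald, newcombe = diff_cis(350, 400, 340, 400)   # hypothetical success counts
margin = -0.12                                  # hypothetical noninferiority margin
for name, (lo, hi) in (("Wald", wald), ("Newcombe", newcombe)):
    print(f"{name}: ({lo:.4f}, {hi:.4f})  noninferior: {lo > margin}")
```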
