Similar Documents (20 results)
1.
Raw estimates of disease rates over a geographical region are frequently quite variable, even though one may reasonably expect adjacent communities to have similar true rates. Smoother estimates are obtained by incorporating a penalty into a multinomial likelihood estimation procedure. For each pair of locations, this penalty increases with the difference between the rates and decreases with the distance between the two sites. The resulting estimates have smaller mean squared error than the raw estimates. Expansions are developed which demonstrate the contributions of the smoothing constant, spatial configuration, risk population and raw estimates to the amount of smoothing. Simulations and an example involving gastric cancer data illustrate the proposed method.
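A minimal numerical sketch of this kind of penalty, using a binomial stand-in for the paper's multinomial likelihood and an exponential distance weight; the data, the weight function, and the smoothing constant are all hypothetical:

```python
# Sketch of distance-weighted penalized likelihood smoothing of regional
# disease rates. A binomial likelihood stands in for the paper's multinomial
# one; the exponential distance weight and all data are hypothetical.
import numpy as np
from scipy.optimize import minimize

cases = np.array([3, 1, 8, 2, 5])                 # observed cases per region
pop = np.array([1000, 800, 1200, 900, 1100])      # population at risk
coords = np.array([[0, 0], [1, 0], [0, 1], [2, 1], [1, 2]], float)
lam = 1e5                                         # smoothing constant (hypothetical scale)

d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
w = np.exp(-d)                                    # weight decreasing with distance

def neg_pen_loglik(logit_r):
    r = 1.0 / (1.0 + np.exp(-logit_r))            # rates kept inside (0, 1)
    loglik = np.sum(cases * np.log(r) + (pop - cases) * np.log(1 - r))
    # penalty grows with the rate difference, shrinks with distance
    penalty = lam * np.sum(np.triu(w, 1) * (r[:, None] - r[None, :]) ** 2)
    return -(loglik - penalty)

raw = cases / pop
fit = minimize(neg_pen_loglik, np.log(raw / (1 - raw)), method="BFGS")
print("raw     :", np.round(raw, 4))
print("smoothed:", np.round(1 / (1 + np.exp(-fit.x)), 4))
```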

2.
Inferring speciation times under an episodic molecular clock
We extend our recently developed Markov chain Monte Carlo algorithm for Bayesian estimation of species divergence times to allow variable evolutionary rates among lineages. The method can use heterogeneous data from multiple gene loci and accommodate multiple fossil calibrations. Uncertainties in fossil calibrations are described using flexible statistical distributions. The prior for divergence times for nodes lacking fossil calibrations is specified by use of a birth-death process with species sampling. The prior for lineage-specific substitution rates is specified using either a model with autocorrelated rates among adjacent lineages (based on a geometric Brownian motion model of rate drift) or a model with independent rates among lineages specified by a log-normal probability distribution. We develop an infinite-sites theory, which predicts that when the amount of sequence data approaches infinity, the width of the posterior credibility interval and the posterior mean of divergence times form a perfect linear relationship, with the slope indicating uncertainties in time estimates that cannot be reduced by sequence data alone. Simulations are used to study the influence of among-lineage rate variation and the number of loci sampled on the uncertainty of divergence time estimates. The analysis suggests that posterior time estimates typically involve considerable uncertainties even with an infinite amount of sequence data, and that the reliability and precision of fossil calibrations are critically important to divergence time estimation. We apply our new algorithms to two empirical data sets and compare the results with those obtained in previous Bayesian and likelihood analyses. The results demonstrate the utility of our new algorithms.

3.
The risk difference is an intelligible measure for comparing disease incidence in two exposure or treatment groups. Despite its convenience in interpretation, it is less prevalent in epidemiological and clinical areas where regression models are required in order to adjust for confounding. One major barrier to its popularity is that standard linear binomial or Poisson regression models can produce estimated probabilities outside the range (0,1), resulting in possible convergence issues. For estimating adjusted risk differences, we propose a general framework covering various constraint approaches based on binomial and Poisson regression models. The proposed methods span the areas of ordinary least squares, maximum likelihood estimation, and Bayesian inference. Compared to existing approaches, our methods prevent estimates and confidence intervals of predicted probabilities from falling outside the valid range. Through extensive simulation studies, we demonstrate that the proposed methods eliminate estimates and confidence limits of predicted probabilities outside (0,1), while offering performance comparable to alternative methods in terms of bias, variability, and coverage rates in point and interval estimation of the risk difference. An application study is performed using data from the Prospective Registry Evaluating Myocardial Infarction: Event and Recovery (PREMIER) study.
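As one illustration of the constraint idea, here is a sketch of maximum likelihood for an identity-link ("linear") binomial model with explicit bounds that keep every fitted probability inside (0,1). The data, the eps bound, and the solver are assumptions; the paper's framework is considerably broader:

```python
# Sketch: constrained MLE for an identity-link binomial model, so the
# coefficient on the treatment indicator is an adjusted risk difference.
# Data, the eps bound, and the solver choice are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize, LinearConstraint

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)                     # confounder
z = rng.integers(0, 2, size=n)             # treatment indicator
y = rng.binomial(1, np.clip(0.2 + 0.10 * z + 0.05 * x, 0.01, 0.99))

X = np.column_stack([np.ones(n), z, x])    # identity link: p_i = X_i @ beta
eps = 1e-6

def neg_loglik(beta):
    p = np.clip(X @ beta, 1e-9, 1 - 1e-9)  # guard against off-constraint evaluations
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# keep every fitted probability in [eps, 1 - eps]
con = LinearConstraint(X, eps, 1 - eps)
fit = minimize(neg_loglik, x0=[0.2, 0.0, 0.0], constraints=[con], method="trust-constr")
print("adjusted risk difference (coefficient on z):", round(fit.x[1], 3))
```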

4.
Heinze G, Schemper M. Biometrics 2001, 57(1):114-119
The phenomenon of monotone likelihood is observed in the fitting process of a Cox model if the likelihood converges to a finite value while at least one parameter estimate diverges to +/- infinity. Monotone likelihood primarily occurs in small samples with substantial censoring of survival times and several highly predictive covariates. Previous options to deal with monotone likelihood have been unsatisfactory. The solution we suggest is an adaptation of a procedure by Firth (1993, Biometrika 80, 27-38) originally developed to reduce the bias of maximum likelihood estimates. This procedure produces finite parameter estimates by means of penalized maximum likelihood estimation. Corresponding Wald-type tests and confidence intervals are available, but it is shown that penalized likelihood ratio tests and profile penalized likelihood confidence intervals are often preferable. An empirical study of the suggested procedures confirms satisfactory performance of both estimation and inference. The advantage of the procedure over previous options of analysis is finally exemplified in the analysis of a breast cancer study.
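The penalty itself is easiest to see in the logistic case, where Firth's (1993) modification adds half the log determinant of the Fisher information to the log likelihood; a sketch on deliberately separated simulated data (as opposed to the paper's Cox-model adaptation):

```python
# Sketch of Firth-type penalized ML, shown for logistic regression rather
# than the Cox model of the paper. The simulated data are perfectly
# separated, so ordinary ML would drive the slope estimate to infinity.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
x = rng.normal(size=30)
y = (x > 0).astype(float)                  # perfect separation
X = np.column_stack([np.ones(30), x])

def neg_pen_loglik(beta):
    eta = X @ beta
    p = 1.0 / (1.0 + np.exp(-eta))
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))
    info = (X * (p * (1 - p))[:, None]).T @ X      # Fisher information X'WX
    _, logdet = np.linalg.slogdet(info)
    return -(loglik + 0.5 * logdet)                # Firth penalty keeps beta finite

fit = minimize(neg_pen_loglik, np.zeros(2), method="BFGS")
print("finite penalized estimates:", np.round(fit.x, 3))
```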

5.
Single nucleotide polymorphism (SNP) data can be used for parameter estimation via maximum likelihood methods as long as the way in which the SNPs were determined is known, so that an appropriate likelihood formula can be constructed. We present such likelihoods for several sampling methods. As a test of these approaches, we consider use of SNPs to estimate the parameter Theta = 4N_e mu (the scaled product of effective population size and per-site mutation rate), which is related to the branch lengths of the reconstructed genealogy. With infinite amounts of data, ML models using SNP data are expected to produce consistent estimates of Theta. With finite amounts of data the estimates are accurate when Theta is high, but tend to be biased upward when Theta is low. If recombination is present and not allowed for in the analysis, the results are additionally biased upward, but this effect can be removed by incorporating recombination into the analysis. SNPs defined as sites that are polymorphic in the actual sample under consideration (sample SNPs) are somewhat more accurate for estimation of Theta than SNPs defined by their polymorphism in a panel chosen from the same population (panel SNPs). Misrepresenting panel SNPs as sample SNPs leads to large errors in the maximum likelihood estimate of Theta. Researchers collecting SNPs should collect and preserve information about the method of ascertainment so that the data can be accurately analyzed.
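A minimal sketch of why the ascertainment scheme matters, under the common small-Theta approximation that a site is polymorphic in k sampled chromosomes with probability roughly Theta times Watterson's harmonic factor a_k. All counts and sizes below are hypothetical, and this approximation is far simpler than the full likelihoods the paper presents:

```python
# Sketch of the sample-SNP vs panel-SNP distinction. Under the small-Theta
# approximation P(site segregates in k chromosomes) ~ Theta * a_k, treating
# SNPs ascertained in a small panel as if they were sample SNPs mis-scales
# the ML estimate. Counts below are hypothetical.
import numpy as np

def a(k):
    """Watterson's harmonic factor a_k = sum_{i=1}^{k-1} 1/i."""
    return np.sum(1.0 / np.arange(1, k))

L, S = 50_000, 120        # sites surveyed; sites found polymorphic
n, m = 40, 4              # sample size; size of the discovery panel

# binomial ML under the approximation gives Theta_hat = S / (L * a_k)
theta_sample = S / (L * a(n))   # correct if these are sample SNPs
theta_panel = S / (L * a(m))    # correct if they were ascertained in the panel
print(f"as sample SNPs: {theta_sample:.2e}   as panel SNPs: {theta_panel:.2e}")
```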

6.
Estimating kinetic constants from single channel data.
The process underlying the opening and closing of ionic channels in biological or artificial lipid membranes can be modeled kinetically as a time-homogeneous Markov chain. The elements of the chain are kinetic states that can be either open or closed. A maximum likelihood procedure is described for estimating the transition rates between these states from single channel data. The method has been implemented for linear kinetic schemes of fewer than six states, and is suitable for nonstationary data in which one or more independent channels are functioning simultaneously. It also provides standard errors for all estimates of rate constants and permits testing of smoothly parameterized subhypotheses of a general model. We have illustrated our approach by analysis of single channel data simulated on a computer and have described a procedure for analysis of experimental data.
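For the simplest scheme the paper generalizes, a single open and a single closed state, the dwell times are exponential and the rate MLEs have closed form; a sketch on simulated dwell times (all values hypothetical):

```python
# Sketch for a two-state open/closed channel: dwell times in each state are
# exponential, so the MLE of each exit rate is (number of sojourns) / (total
# time in the state). Multi-state schemes need the full matrix likelihood.
import numpy as np

rng = np.random.default_rng(2)
k_open, k_close = 50.0, 200.0               # true rates (1/s), hypothetical
closed_dwells = rng.exponential(1.0 / k_open, size=500)   # time spent closed
open_dwells = rng.exponential(1.0 / k_close, size=500)    # time spent open

k_open_hat = closed_dwells.size / closed_dwells.sum()
k_close_hat = open_dwells.size / open_dwells.sum()
# standard errors from the observed Fisher information: se = k_hat / sqrt(N)
print(f"k_open  = {k_open_hat:6.1f} +/- {k_open_hat / np.sqrt(500):4.1f}")
print(f"k_close = {k_close_hat:6.1f} +/- {k_close_hat / np.sqrt(500):4.1f}")
```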

7.
We present moments and likelihood methods that estimate a DNA substitution rate from a group of closely related sister species pairs separated at an assumed time, and we test these methods with simulations. The methods also estimate ancestral population size and can test whether there is a significant difference among the ancestral population sizes of the sister species pairs. Estimates presented in the literature often ignore the ancestral coalescent prior to speciation and therefore should be biased upward. The simulations show that both methods yield accurate estimates given sample sizes of five or more species pairs and that better likelihood estimates are obtained if there is no significant difference among ancestral population sizes. The model presented here indicates that the larger than expected variation found in multitaxa datasets can be explained by variation in the ancestral coalescence and the Poisson mutation process. In this context, observed variation can often be accounted for by variation in ancestral population sizes rather than invoking variation in other parameters, such as divergence time or mutation rate. The methods are applied to data from two groups of species pairs (sea urchins and Alpheus snapping shrimp) that are thought to have separated by the rise of Panama three million years ago.
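A small simulation sketch of the central point: each pair's total separation time is the assumed split time plus an ancestral coalescent time, so ignoring the ancestral term inflates the rate estimate. All parameter values are hypothetical and the moments correction shown assumes the mean ancestral coalescent time is known:

```python
# Sketch: ancestral coalescence plus Poisson mutation inflates naive rate
# estimates. All parameter values below are hypothetical.
import numpy as np

rng = np.random.default_rng(3)
n_pairs, L = 8, 20_000                     # sister-species pairs, sites compared
mu = 1e-9                                  # substitutions/site/year (assumed)
t_split = 3.0e6                            # assumed split time (years)
anc_mean = 1.0e6                           # mean ancestral coalescent time (years)

t_total = t_split + rng.exponential(anc_mean, n_pairs)
diffs = rng.poisson(2 * mu * t_total * L)  # Poisson mutation process

print("true mu     :", mu)
print("naive mu    :", (diffs / (2 * t_split * L)).mean())            # biased up
print("moments fix :", diffs.mean() / (2 * (t_split + anc_mean) * L))
```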

8.
Addy CL, Longini IM, Haber M. Biometrics 1991, 47(3):961-974
A stochastic infectious disease model was developed by Ball (1986, Advances in Applied Probability 18, 289-310) in which the distribution of the length of the infectious period is allowed to have any distribution that can be described by its Laplace transform. We extend this model such that the infection can be transmitted within the population or from an unspecified source outside the population. Also, discrete heterogeneity in the population can be modeled to incorporate variable susceptibility, variable infectivity, and/or mixing behaviors. The model is fitted to serologic data from two influenza epidemics in Tecumseh, Michigan, using maximum likelihood estimation procedures. The estimates show a clustering pattern by age groups.

9.
We derive a new method to estimate the age-specific incidence of an infection with a differential mortality, using individual-level infection status data from successive surveys. The method consists of a) an SI-type model to express the incidence rate in terms of the prevalence and its derivatives as well as the difference in mortality rate, and b) a maximum likelihood approach to estimate the prevalence and its derivatives. Estimates can in principle be obtained for any chosen age and time, and no particular assumptions are made about the epidemiological or demographic context. This is in contrast with earlier methods for estimating incidence from prevalence data, which work with aggregated data and with the aggregated effect of demographic and epidemiological rates over the time interval between prevalence surveys. Numerical simulation of HIV epidemics, under the presumption of known excess mortality due to infection, shows improved control of bias and variance compared to previous methods. Our analysis motivates a) efforts to obtain accurate estimates of excess mortality rates as a function of age and time among HIV-infected individuals, and b) the use of individual-level rather than aggregated data in order to estimate HIV incidence rates at times between two prevalence surveys.
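A numerical sketch of one standard cohort-line version of such an SI-type relation, in which incidence is written in terms of the prevalence p, its derivative, and the excess mortality D of infected individuals as lambda = (dp/da + D*p*(1-p)) / (1-p). The prevalence curve and excess mortality below are hypothetical, and this is a simplified stand-in for the paper's estimator:

```python
# Sketch of recovering incidence from prevalence and excess mortality along
# a cohort line: lambda(a) = (dp/da + D * p * (1 - p)) / (1 - p).
# The prevalence curve and excess mortality D are hypothetical.
import numpy as np

ages = np.linspace(15, 50, 351)
prev = 0.25 / (1.0 + np.exp(-(ages - 30) / 4.0))   # hypothetical prevalence by age
excess_mort = 0.05                                  # assumed excess mortality (1/yr)

dp_da = np.gradient(prev, ages)                     # numerical derivative of prevalence
incidence = (dp_da + excess_mort * prev * (1 - prev)) / (1 - prev)

for a in (20, 30, 40):
    i = np.argmin(np.abs(ages - a))
    print(f"age {a}: prevalence {prev[i]:.3f}, incidence {incidence[i] * 100:.2f}/100py")
```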

10.
We evaluate the performance of maximum likelihood (ML) analysis of allele frequency data in a linear array of populations. The parameters are a mutation rate and either the dispersal rate in a stepping stone model or a dispersal rate and a scale parameter in a geometric dispersal model. An approximate procedure known as maximum product of approximate conditional (PAC) likelihood is found to perform as well as ML. Mis-specification biases may occur because the importance sampling algorithm is formally defined in terms of mutation and migration rates scaled by the total size of the population, and this size may differ widely between the statistical model and reality. As could be expected, ML generally performs well when the statistical model is correctly specified. Otherwise, when mutation probability is high and dispersal is most limited, mutation rate estimates are much closer to the mutation probability scaled by the number of demes in the statistical model than to that scaled by the number of demes in reality. This mis-specification bias actually has practical benefits. However, opposite results are found in opposite conditions. Migration rate estimates show roughly similar trends, but they may not always be easily interpreted as low-bias estimates of dispersal rate under any scaling. Estimation of the dispersal scale parameter is also affected by mis-specification of the number of demes, and the different biases compensate each other in such a way that good estimation of the so-called neighborhood size (or more precisely the product of population density and mean-squared parent-offspring dispersal distance) is achieved. Results congruent with these findings are found in an application to a damselfly data set.

11.

Objective

Develop a simple method for optimal estimation of HIV incidence using the BED capture enzyme immunoassay.

Design

Use existing BED data to estimate mean recency duration, false recency rates and HIV incidence with reference to a fixed time period, T.

Methods

Compare BED and cohort estimates of incidence referring to identical time frames. Generalize this approach to suggest a method for estimating HIV incidence from any cross-sectional survey.

Results

Follow-up and BED analyses of the same, initially HIV-negative, cases followed over the same set time period T produce estimates of the same HIV incidence, permitting the estimation of the BED mean recency period for cases who have been HIV-positive for less than T. Similarly, follow-up of HIV-positive cases over T provides estimates of the false-recent rate appropriate for T. Knowledge of these two parameters for a given population allows the estimation of HIV incidence during T by applying the BED method to samples from cross-sectional surveys. An algorithm is derived for providing these estimates, adjusted for the false-recent rate. The resulting estimator is identical to one derived independently using a more formal mathematical analysis. Adjustments improve the accuracy of HIV incidence estimates. Negative incidence estimates result from the use of inappropriate estimates of the false-recent rate and/or from sampling error, not from any error in the adjustment procedure.
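A minimal sketch of an adjusted cross-sectional estimator of the kind described here, with the mean recency duration, false-recent rate, and incidence all referred to the same fixed window T. The specific algebraic form below and every input value are illustrative assumptions, not the paper's derivation:

```python
# Sketch of a false-recent-rate-adjusted incidence estimator with all recency
# parameters referred to a fixed window T. The form and values are assumed.
def bed_incidence(n_neg, n_pos, n_recent, omega, eps, T):
    """Annualized incidence estimate from one cross-sectional survey.

    n_neg    -- HIV-negative respondents
    n_pos    -- HIV-positive respondents
    n_recent -- positives classified 'recent' by the BED assay
    omega    -- mean recency duration (years) among those infected < T
    eps      -- false-recent rate: P(classified recent | infected > T)
    T        -- fixed reference period (years)
    """
    adjusted_recent = n_recent - eps * n_pos        # remove expected false recents
    return adjusted_recent / (n_neg * (omega - eps * T))

# Hypothetical survey: 8000 negatives, 2000 positives, 150 assay-recent.
print(f"{100 * bed_incidence(8000, 2000, 150, 0.85, 0.05, 1.0):.2f} per 100 py")
```

Note how a too-large eps can push the adjusted numerator negative, which matches the abstract's remark that negative estimates stem from inappropriate false-recent rates or sampling error rather than from the adjustment itself.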

Conclusions

Referring all estimates of mean recency periods, false-recent rates and incidence estimates to a fixed period T simplifies estimation procedures and allows the development of a consistent method for producing adjusted estimates of HIV incidence of improved accuracy. Unadjusted BED estimates of incidence, based on life-time recency periods, would be both extremely difficult to produce and of doubtful value.

12.
Ambitious programs have recently been advocated or launched to create genomewide databases for meta-analysis of association between DNA markers and phenotypes of medical and/or social concern. A necessary but not sufficient condition for success in association mapping is that the data give accurate estimates of both genomic location and its standard error, which are provided for multifactorial phenotypes by composite likelihood. That class includes the Malecot model, which we here apply with an illustrative example. This preliminary analysis leads to five inferences: permutation of cases and controls provides a test of association free of autocorrelation; two hypotheses give similar estimates, but one is consistently more accurate; estimation of the false-discovery rate is extended to causal genes in a small proportion of regions; the minimal data for successful meta-analysis are inferred; and power is robust for all genomic factors except minor-allele frequency. An extension to meta-analysis is proposed. Other approaches to genome scanning and meta-analysis should, if possible, be similarly extended so that their operating characteristics can be compared.
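A sketch of fitting a Malecot-type decay of allelic association with distance, rho(d) = (1 - L) * M * exp(-eps * d) + L, by least squares on synthetic data. The paper embeds this model in a composite likelihood with permutation tests; everything below, including the parameter values, is an illustrative assumption:

```python
# Sketch: least-squares fit of a Malecot-type association-decay curve to
# synthetic marker data. All values are hypothetical.
import numpy as np
from scipy.optimize import curve_fit

def malecot(d, M, eps, L):
    return (1 - L) * M * np.exp(-eps * d) + L

rng = np.random.default_rng(4)
dist_kb = np.linspace(1, 500, 60)                   # marker distances (kb)
obs = malecot(dist_kb, 0.8, 0.01, 0.05) + rng.normal(0, 0.03, 60)

(M, eps, L), _ = curve_fit(malecot, dist_kb, obs, p0=[0.5, 0.005, 0.0])
print(f"M = {M:.2f}, eps = {eps:.4f}, L = {L:.3f}")
```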

13.
Longitudinal data usually consist of a number of short time series. A group of subjects or groups of subjects are followed over time and observations are often taken at unequally spaced time points, and may be at different times for different subjects. When the errors and random effects are Gaussian, the likelihood of these unbalanced linear mixed models can be directly calculated, and nonlinear optimization used to obtain maximum likelihood estimates of the fixed regression coefficients and parameters in the variance components. For binary longitudinal data, a two state, non-homogeneous continuous time Markov process approach is used to model serial correlation within subjects. Formulating the model as a continuous time Markov process allows the observations to be equally or unequally spaced. Fixed and time varying covariates can be included in the model, and the continuous time model allows the estimation of the odds ratio for an exposure variable based on the steady state distribution. Exact likelihoods can be calculated. The initial probability distribution on the first observation on each subject is estimated using logistic regression that can involve covariates, and this estimation is embedded in the overall estimation. These models are applied to an intervention study designed to reduce children's sun exposure.
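For the time-homogeneous two-state special case, the transition matrix P(dt) = exp(Q dt) has a closed form, which makes the exact likelihood of one subject's unequally spaced binary series easy to write down. A sketch with hypothetical rates and data, omitting covariates and the embedded logistic initial-state model:

```python
# Sketch: exact two-state continuous-time Markov likelihood for one subject's
# binary series at unequally spaced visits. Rates and data are hypothetical.
import numpy as np

def trans_matrix(q01, q10, dt):
    """Closed-form P(dt) = exp(Q * dt) for states {0, 1}."""
    s = q01 + q10
    e = np.exp(-s * dt)
    return np.array([[(q10 + q01 * e) / s, (q01 - q01 * e) / s],
                     [(q10 - q10 * e) / s, (q01 + q10 * e) / s]])

def log_lik(q01, q10, times, states, p_first=0.5):
    ll = np.log(p_first if states[0] == 1 else 1 - p_first)
    for (t0, s0), (t1, s1) in zip(zip(times, states), zip(times[1:], states[1:])):
        ll += np.log(trans_matrix(q01, q10, t1 - t0)[s0, s1])
    return ll

times = [0.0, 0.5, 1.7, 2.0, 3.9]            # unequally spaced visits (years)
states = [0, 1, 1, 0, 1]                     # binary outcome at each visit
print(log_lik(q01=0.8, q10=0.6, times=times, states=states))
```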

14.
A method to estimate genetic variance components in populations partially pedigreed by DNA fingerprinting is presented. The focus is on aquaculture, where breeding procedures may produce thousands of individuals. In aquaculture populations the individuals available for measurement will often be selected, i.e. will come from the upper tail of a size-at-age distribution, or the lower tail of an age-at-maturity distribution etc. Selection typically occurs by size grading during grow-out and/or choice of superior fish as broodstock. The method presented in this paper enables us to estimate genetic variance components when only a small proportion of individuals, those with extreme phenotypes, have been identified by DNA fingerprinting. We replace the usual normal density by appropriate robust least favourable densities to ensure the robustness of our estimates. Standard analysis of variance or maximum likelihood estimation cannot be used when only the extreme progeny have been pedigreed because of the biased nature of the estimates. In our model-based procedure a full robust likelihood function is defined, in which the missing information about non-extreme progeny has been taken into account. This robust likelihood function is transformed into a computable function which is maximized to get the estimates. The estimates of sire and dam additive variance components are significantly and uniformly more accurate than those obtained by any of the standard methods when tested on simulated population data and have desirable robustness properties.
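A sketch of the underlying selection problem: when only graded (upper-tail) fish are measured, naive moments are badly biased, while a likelihood that accounts for the truncation is not. Plain truncated-normal ML with a known cutoff is shown here, not the paper's robust least-favourable-density likelihood, and all data are simulated:

```python
# Sketch: naive moments vs truncated-normal ML when only the graded upper
# tail of a phenotype distribution is observed. Data are simulated.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(5)
pheno = rng.normal(100.0, 15.0, 20_000)    # size-at-age, full population
cut = np.quantile(pheno, 0.80)             # grading keeps the top 20%
kept = pheno[pheno >= cut]                 # only these get fingerprinted

def neg_loglik(par):
    mu, log_sd = par
    sd = np.exp(log_sd)
    # density renormalized over the observable region [cut, inf)
    return -np.sum(norm.logpdf(kept, mu, sd) - norm.logsf(cut, mu, sd))

fit = minimize(neg_loglik, [kept.mean(), np.log(kept.std())], method="Nelder-Mead")
print("naive mean/sd :", round(kept.mean(), 1), round(kept.std(), 1))
print("truncated MLE :", round(fit.x[0], 1), round(float(np.exp(fit.x[1])), 1))
```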

15.
Breslow (1984) described an efficient score test for trend in incidence density rate ratios for cohort studies under a conditional Poisson or binomial model employing maximum likelihood estimation of the rate parameters. In this communication, an alternative derivation of this statistic that is based on an unconditional approach is provided, along with an examination of associated goodness-of-fit tests and methods of confidence interval estimation. The procedures are illustrated by a cohort study of ischemic heart disease mortality following industrial exposure to carbon disulfide.
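A sketch of an unconditional score test for log-linear trend in Poisson rates across exposure levels, in the spirit of the statistic discussed; the exposure scores, deaths, and person-years are hypothetical, and the paper's derivation also covers conditional and binomial forms:

```python
# Sketch: score test for log-linear trend in Poisson rates, lambda_j =
# lambda * exp(beta * x_j), testing beta = 0. Data are hypothetical.
import numpy as np
from scipy.stats import chi2

x = np.array([0.0, 1.0, 2.0, 3.0])           # exposure-level scores
deaths = np.array([5, 9, 14, 12])            # observed events per level
pyears = np.array([3200.0, 2900.0, 2600.0, 1100.0])

rate0 = deaths.sum() / pyears.sum()          # common rate under H0: no trend
expected = rate0 * pyears
U = np.sum(x * (deaths - expected))          # efficient score for the trend term
V = rate0 * (np.sum(x**2 * pyears) - np.sum(x * pyears) ** 2 / pyears.sum())
stat = U**2 / V                              # 1-df chi-square statistic
print(f"score statistic {stat:.2f}, p = {chi2.sf(stat, df=1):.4f}")
```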

16.
Logistic regression is an important statistical procedure used in many disciplines. The standard software packages for data analysis are generally equipped with this procedure, where the maximum likelihood estimates of the regression coefficients are obtained iteratively. It is well known that the estimates from the analyses of small- or medium-sized samples are biased. Also, in finding such estimates, often a separation is encountered in which the likelihood converges but at least one of the parameter estimates diverges to infinity. Standard approaches to finding such estimates do not take care of these problems. Moreover, missingness in the covariates adds an extra layer of complexity to the whole process. In this article, we address these three practical issues (bias, separation, and missing covariates) by means of simple adjustments. We have applied the proposed technique to real and simulated data. The proposed method always finds a solution and the estimates are less biased. A SAS macro that implements the proposed method can be obtained from the authors.

17.
Multi-trait (co)variance estimation is an important topic in plant and animal breeding. In this study we compare estimates obtained with restricted maximum likelihood (REML) and Bayesian Gibbs sampling of simulated data and of three traits (diameter, height and branch angle) from a 26-year-old partial diallel progeny test of Scots pine (Pinus sylvestris L.). Based on the results from the simulated data we can conclude that the REML estimates are accurate but the mode of the posterior distributions from Gibbs sampling can be overestimated depending on the level of the heritability. The mean and median of the posteriors were considerably higher than the expected values of the heritabilities. The confidence intervals calculated with the delta method were biased downward. The highest probability density (HPD) interval provides a better interval estimate, but could be slightly biased at the lower level. Similar differences between REML and Gibbs sampling estimates were found for the Scots pine data. We conclude that further simulation studies are needed in order to evaluate the effect of different priors on (co)variance components in the genetic individual model.

18.
Mitochondrial D-loop hypervariable region I (HVI) sequences are widely used in human molecular evolutionary studies, and therefore accurate assessment of rate heterogeneity among sites is essential. We used the maximum-likelihood method to estimate the gamma shape parameter alpha for variable substitution rates among sites for HVI from humans and chimpanzees to provide estimates for future studies. The complete data of 839 humans and 224 chimpanzees, as well as many subsets of these data, were analyzed to examine the effect of sequence sampling. The effects of the genealogical tree and the nucleotide substitution model were also examined. The transition/transversion rate ratio (kappa) is estimated to be about 25, although much larger and biased estimates were also obtained from small data sets at low divergences. Estimates of alpha were 0.28-0.39 for human data sets of different sizes and 0.20-0.39 for data sets including different chimpanzee subspecies. The combined data set of both species gave estimates of 0.42-0.45. While all those estimates suggest highly variable substitution rates among sites, smaller samples tend to give smaller estimates of alpha. Possible causes for this pattern were examined, such as biases in the estimation procedure and shifts in the rate distribution along certain lineages. Computer simulations suggest that the estimation procedure is quite reliable for large trees but can be biased for small samples at low divergences. Thus, an alpha of 0.4 appears suitable for both humans and chimpanzees. Estimates of alpha can be affected by the nucleotide sites included in the data, the overall tree length (the amount of sequence divergence), the number of rate classes used for the estimation, and to a lesser extent, the included sequences. The genealogical tree, the substitution model, and demographic processes such as population expansion do not have much effect.
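A sketch of what a given alpha implies for among-site rate variation, using the common discrete-gamma approximation (equal-probability categories, each represented by its mean rate). The choice of four categories is conventional rather than taken from the paper:

```python
# Sketch: discrete-gamma relative rates implied by a shape parameter alpha,
# with k equal-probability categories represented by their mean rates.
import numpy as np
from scipy.stats import gamma

def discrete_gamma_rates(alpha, k=4):
    """Mean relative rate in each of k equal-probability gamma categories."""
    # category boundaries on the rate scale, for Gamma(alpha, scale=1/alpha)
    edges = gamma.ppf(np.linspace(0, 1, k + 1), a=alpha, scale=1.0 / alpha)
    # the mean within a slice of Gamma(a, b) uses the Gamma(a+1, b) CDF identity
    cdf_hi = gamma.cdf(edges[1:], a=alpha + 1, scale=1.0 / alpha)
    cdf_lo = gamma.cdf(edges[:-1], a=alpha + 1, scale=1.0 / alpha)
    return k * (cdf_hi - cdf_lo)             # rates average to 1 across sites

print(np.round(discrete_gamma_rates(0.4), 3))   # alpha ~ 0.4: strong heterogeneity
print(np.round(discrete_gamma_rates(2.0), 3))   # larger alpha: more uniform rates
```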

19.
Researchers are often interested in predicting outcomes, detecting distinct subgroups of their data, or estimating causal treatment effects. Pathological data distributions that exhibit skewness and zero-inflation complicate these tasks, requiring highly flexible, data-adaptive modeling. In this paper, we present a multipurpose Bayesian nonparametric model for continuous, zero-inflated outcomes that simultaneously predicts structural zeros, captures skewness, and clusters patients with similar joint data distributions. The flexibility of our approach yields predictions that capture the joint data distribution better than commonly used zero-inflated methods. Moreover, we demonstrate that our model can be coherently incorporated into a standardization procedure for computing causal effect estimates that are robust to such data pathologies. Uncertainty at all levels of this model flows through to the causal effect estimates of interest, allowing easy point estimation, interval estimation, and posterior predictive checks verifying positivity, a required causal identification assumption. Our simulation results show point estimates to have low bias and interval estimates to have close to nominal coverage under complicated data settings. Under simpler settings, these results hold while incurring lower efficiency loss than comparator methods. We use our proposed method to analyze zero-inflated inpatient medical costs among endometrial cancer patients receiving either chemotherapy or radiation therapy in the SEER-Medicare database.
