Similar Documents
20 similar documents found.
1.
Benchmark dose calculation from epidemiological data
A threshold for dose-dependent toxicity is crucial for standard-setting but may not be possible to specify from empirical studies. Crump (1984) instead proposed calculating the lower statistical confidence bound of the benchmark dose, which he defined as the dose that causes a small excess risk. This concept has several advantages and has been adopted by regulatory agencies for establishing safe exposure limits for toxic substances such as mercury. We have examined the validity of this method as applied to an epidemiological study of continuous response data associated with mercury exposure. For models that are linear in the parameters, we derived an approximate expression for the lower confidence bound of the benchmark dose. We find that the benchmark calculations are highly dependent on the choice of the dose-effect function and the definition of the benchmark dose. We therefore recommend that several sets of biologically relevant default settings be used to illustrate the effect on the benchmark results and to stimulate research that will guide an a priori choice of proper default settings.
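A minimal sketch of the benchmark-dose idea for a continuous endpoint, assuming a dose-effect model that is linear in the parameters with normally distributed errors: risk at dose d is the probability of falling below a fixed cutoff, and the benchmark dose is the dose at which the excess risk over background equals a chosen benchmark response (BMR). All parameter values below are hypothetical, and the lower bound shown is a crude Wald-type approximation, not the paper's derivation.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

# Hypothetical linear dose-effect model: E[Y | d] = b0 + b1 * d, Y ~ N(mean, sigma^2).
# An "adverse" response is Y below a cutoff c; risk(d) = P(Y < c | d).
b0, b1, sigma = 100.0, -2.0, 10.0      # illustrative estimates (decreasing response)
c = 80.0                               # cutoff defining an adverse response
bmr = 0.05                             # benchmark response: 5% excess risk

def risk(d, slope=b1):
    return norm.cdf((c - (b0 + slope * d)) / sigma)

def excess(d, slope=b1):
    return risk(d, slope) - risk(0.0, slope)

# Benchmark dose: smallest dose with excess risk equal to the BMR.
bmd = brentq(lambda d: excess(d) - bmr, 0.0, 50.0)

# Crude lower confidence bound: recompute the BMD with the slope shifted by
# 1.645 standard errors (a one-sided 95% Wald-type bound on a steeper slope).
se_b1 = 0.4                            # hypothetical standard error of the slope
bmdl = brentq(lambda d: excess(d, slope=b1 - 1.645 * se_b1) - bmr, 0.0, 50.0)

print(f"BMD  = {bmd:.2f}")
print(f"BMDL = {bmdl:.2f} (Wald-type approximation)")
```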

2.
In risk assessment, it is often desired to make inferences on the low dose levels at which a specific benchmark risk is attained. Applications of simultaneous hyperbolic confidence bands for low-dose risk estimation with quantal data under different dose-response models (multistage, Abbott-adjusted Weibull, and Abbott-adjusted log-logistic models) have appeared in the literature. The use of simultaneous three-segment bands under the multistage model has also been proposed recently. In this article, we present explicit formulas for constructing asymptotic one-sided simultaneous hyperbolic and three-segment bands for the simple log-logistic regression model. We use the simultaneous construction to estimate upper hyperbolic and three-segment confidence bands on extra risk and to obtain lower limits on the benchmark dose by inverting the upper bands on risk under the Abbott-adjusted log-logistic model. Monte Carlo simulations evaluate the characteristics of the simultaneous limits. An example is given to illustrate the use of the proposed methods and to compare the two types of simultaneous limits at very low dose levels.
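A hedged sketch of the general idea of inverting an upper confidence band on risk to obtain a lower benchmark-dose limit, using a simple logistic dose-response model and a classic two-sided Scheffé-type hyperbolic band. The paper's one-sided bands, three-segment bands, and Abbott adjustment are not reproduced here, and all data are simulated.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2
from scipy.optimize import brentq

rng = np.random.default_rng(1)
dose = rng.uniform(0, 4, 300)
y = rng.binomial(1, 1 / (1 + np.exp(-(-3.0 + 1.2 * dose))))  # simulated quantal responses

fit = sm.Logit(y, sm.add_constant(dose)).fit(disp=0)
beta = np.asarray(fit.params)
cov = np.asarray(fit.cov_params())
crit = np.sqrt(chi2.ppf(0.95, df=2))  # Scheffe-type critical value for two parameters

def upper_risk(d):
    """Upper simultaneous band on P(response | dose = d)."""
    x = np.array([1.0, d])
    return 1 / (1 + np.exp(-(x @ beta + crit * np.sqrt(x @ cov @ x))))

def upper_extra_risk(d):
    p0 = upper_risk(0.0)  # crude plug-in for the background risk
    return (upper_risk(d) - p0) / (1 - p0)

bmr = 0.01  # benchmark response: 1% extra risk
bmdl = brentq(lambda d: upper_extra_risk(d) - bmr, 1e-6, 4.0)
print(f"Lower benchmark-dose limit (sketch): {bmdl:.3f}")
```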

3.
To address the problem of identifying coding regions in DNA sequences, this study proposes a model combining feature vectors with logistic regression. The DNA sequence is first converted into a numerical feature vector, and k-character (k-mer) relative-frequency features are extracted from it; a binary logistic regression algorithm is then used to accurately distinguish coding from non-coding regions. Five-fold cross-validation on the two benchmark data sets HMR195 and BG570 gives average AUC (Area Under Curve) values of 0.9813 and 0.9874, respectively, clearly outperforming traditional methods such as Bayesian discriminant analysis and VOSSDFT. In addition, the proposed feature vector has a very low dimension, which improves computational efficiency. The combined model can therefore identify protein-coding regions efficiently and accurately.
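A minimal sketch of the kind of pipeline the abstract describes, assuming k-mer relative frequencies as features and a plain logistic regression classifier evaluated by five-fold cross-validated AUC. The sequences and labels below are randomly generated placeholders, not the HMR195 or BG570 data.

```python
from itertools import product
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

K = 3
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]
INDEX = {kmer: i for i, kmer in enumerate(KMERS)}

def kmer_frequencies(seq: str) -> np.ndarray:
    """Relative frequency of each k-mer in the sequence."""
    counts = np.zeros(len(KMERS))
    for i in range(len(seq) - K + 1):
        j = INDEX.get(seq[i:i + K])
        if j is not None:
            counts[j] += 1
    total = counts.sum()
    return counts / total if total > 0 else counts

# Placeholder data: random sequences with arbitrary coding/non-coding labels.
rng = np.random.default_rng(0)
seqs = ["".join(rng.choice(list("ACGT"), size=300)) for _ in range(200)]
labels = rng.integers(0, 2, size=200)

X = np.vstack([kmer_frequencies(s) for s in seqs])
clf = LogisticRegression(max_iter=1000)
auc = cross_val_score(clf, X, labels, cv=5, scoring="roc_auc")
print("5-fold AUC:", auc.mean().round(3))
```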

4.
Two-phase designs can reduce the cost of epidemiological studies by limiting the ascertainment of expensive covariates and/or exposures to an efficiently selected subset (phase-II) of a larger (phase-I) study. Efficient analysis of the resulting data set, combining disparate information from phase-I and phase-II, can however be complex. Most of the existing methods, including the semiparametric maximum-likelihood estimator, require the information in phase-I to be summarized into a fixed number of strata. In this paper, we describe a novel method for the analysis of two-phase studies where information from phase-I is summarized by parameters associated with a reduced logistic regression model of the disease outcome on available covariates. We then set up estimating equations for parameters associated with the desired extended logistic regression model, based on the reduced model parameters from phase-I and the complete data available at phase-II, after accounting for the nonrandom sampling design. We use the generalized method of moments to solve the overidentified estimating equations and develop the resulting asymptotic theory for the proposed estimator. Simulation studies show that the use of reduced parametric models, as opposed to summarizing data into strata, can lead to more efficient utilization of phase-I data. An application of the proposed method is illustrated using data from the U.S. National Wilms Tumor Study.

5.
Genetic information, such as single nucleotide polymorphism (SNP) data, has been widely recognized as useful in the prediction of disease risk. However, how to model genetic data, which are often categorical, for disease class prediction is complex and challenging. In this paper, we propose a novel class of nonlinear threshold index logistic models to deal with the complex, nonlinear effects of categorical/discrete SNP covariates for schizophrenia class prediction. A maximum likelihood methodology is suggested to estimate the unknown parameters in the models. Simulation studies demonstrate that the proposed methodology works well for moderate-size samples. The suggested approach is then applied to the analysis of schizophrenia classification using a real set of SNP data from the Western Australian Family Study of Schizophrenia (WAFSS). Our empirical findings provide evidence that the proposed nonlinear models clearly outperform the widely used linear and tree-based logistic regression models in class prediction of schizophrenia risk with SNP data, in terms of both Type I/II error rates and ROC curves.

6.
Yu ZF, Catalano PJ. Biometrics 2005, 61(3):757-766.
The neurotoxic effects of chemical agents are often investigated in controlled studies on rodents, with multiple binary and continuous endpoints routinely collected. One goal is to conduct quantitative risk assessment to determine safe dose levels. Such studies face two major challenges for continuous outcomes. First, characterizing risk and defining a benchmark dose are difficult. In quantal settings, risk is clearly definable as the presence or absence of an adverse binary event; finding a similar probability scale for continuous outcomes is less straightforward. Often, an adverse event is defined for continuous outcomes as any value below a specified cutoff level in a distribution assumed normal or log normal. Second, while continuous outcomes are traditionally analyzed separately in such studies, recent literature advocates also using multiple outcomes to assess risk. We propose a method for modeling and quantitative risk assessment of bivariate continuous outcomes that addresses both difficulties by extending existing percentile regression methods. The model is likelihood based; it allows separate dose-response models for each outcome while accounting for the bivariate correlation and providing an overall characterization of risk. The approach to estimation of a benchmark dose is analogous to that for quantal data, without the need to specify arbitrary cutoff values. We illustrate our methods with data from a neurotoxicity study of triethyl tin exposure in rats.

7.
Random-effects models for serial observations with binary response
Stiratelli R, Laird N, Ware JH. Biometrics 1984, 40(4):961-971.
This paper presents a general mixed model for the analysis of serial dichotomous responses provided by a panel of study participants. Each subject's serial responses are assumed to arise from a logistic model, but with regression coefficients that vary between subjects. The logistic regression parameters are assumed to be normally distributed in the population. Inference is based upon maximum likelihood estimation of fixed effects and variance components, and empirical Bayes estimation of random effects. Exact solutions are analytically and computationally infeasible, but an approximation based on the mode of the posterior distribution of the random parameters is proposed, and is implemented by means of the EM algorithm. This approximate method is compared with a simpler two-step method proposed by Korn and Whittemore (1979, Biometrics 35, 795-804), using data from a panel study of asthmatics originally described in that paper. One advantage of the estimation strategy described here is the ability to use all of the data, including that from subjects with insufficient data to permit fitting of a separate logistic regression model, as required by the Korn and Whittemore method. However, the new method is computationally intensive.
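For orientation, a minimal sketch of the simpler two-step strategy the paper compares against: fit a separate logistic regression per subject, then combine the subject-specific slopes with inverse-variance weights. This is not the paper's EM-based posterior-mode method, and the panel data below are simulated; subjects whose separate fit is impossible are simply dropped, which is exactly the limitation the abstract notes.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n_subjects, n_times = 40, 12
betas, ses = [], []

for _ in range(n_subjects):
    x = rng.normal(size=n_times)                           # time-varying covariate
    b0, b1 = rng.normal(-0.5, 1.0), rng.normal(1.0, 0.5)   # subject-specific coefficients
    y = rng.binomial(1, 1 / (1 + np.exp(-(b0 + b1 * x))))
    if y.min() == y.max():
        continue  # all-0 or all-1 responses: no separate logistic fit is possible
    try:
        fit = sm.Logit(y, sm.add_constant(x)).fit(disp=0)
    except Exception:
        continue  # skip subjects where the separate fit fails (e.g. perfect separation)
    betas.append(fit.params[1])
    ses.append(fit.bse[1])

w = 1 / np.square(ses)
pooled = np.sum(w * np.array(betas)) / np.sum(w)
print(f"Pooled slope (two-step): {pooled:.2f} from {len(betas)} of {n_subjects} subjects")
```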

8.
OBJECTIVES: To assess the ability of a biomechanical impact model to predict the likelihood of distal radius fracture in children, using data gathered for a previous epidemiological case-control study of falls from playground equipment. METHODOLOGY: Factor of Risk (FR) values were generated for each of the selected subjects from the case-control study using a biomechanical model. Logistic regression curves were fitted to examine the relationship between the FR values and the probability of radius fracture. RESULTS: Forty-five cases and thirty-one controls were selected. The logistic regression analyses showed a significant association between the probability of fracture and FR. CONCLUSIONS: The biomechanical model distinguished between children who fractured their distal radius and those who did not. The model can be used to test how risk factors, such as fall height and ground surface type, affect the physical stresses transmitted through the arm and their relation to the fracture tolerance of the distal radius.

9.
Motivated by a clinical prediction problem, we performed a simulation study to compare different approaches for building risk prediction models. Robust prediction models for hospital survival in patients with acute heart failure were to be derived from three highly correlated blood parameters measured up to four times, with predictive ability having explicit priority over interpretability. Methods that relied only on the original predictors were compared with methods using an expanded predictor space including transformations and interactions. Predictors were simulated as transformations and combinations of multivariate normal variables, fitted to the partly skewed and bimodally distributed original data in such a way that the simulated data mimicked the original covariate structure. Different penalized versions of logistic regression as well as random forests and generalized additive models were investigated, using classical logistic regression as a benchmark. Their performance was assessed based on measures of predictive accuracy, model discrimination, and model calibration. Three scenarios using different subsets of the original data, with different numbers of observations and events per variable, were investigated. In the investigated setting, where a risk prediction model should be based on a small set of highly correlated and interconnected predictors, Elastic Net and also Ridge logistic regression showed good performance compared to their competitors, while other methods did not lead to substantial improvements or even performed worse than standard logistic regression. Our work demonstrates how simulation studies that mimic relevant features of a specific data set can support the choice of a good modeling strategy.
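A minimal sketch of the kind of comparison described, assuming a binary outcome with a few highly correlated predictors: standard, ridge, and elastic-net logistic regression are contrasted by cross-validated AUC on simulated data. scikit-learn is used here; the original study's data, tuning, and full set of competitors are not reproduced.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
n = 300
z = rng.normal(size=n)
# Three correlated "blood parameters" driven by a common latent signal.
X = np.column_stack([z + 0.3 * rng.normal(size=n) for _ in range(3)])
y = rng.binomial(1, 1 / (1 + np.exp(-0.8 * z)))

models = {
    "standard":    LogisticRegression(C=1e6, max_iter=1000),   # effectively unpenalized
    "ridge":       LogisticRegression(penalty="l2", C=1.0, max_iter=1000),
    "elastic net": LogisticRegression(penalty="elasticnet", solver="saga",
                                      l1_ratio=0.5, C=1.0, max_iter=5000),
}
for name, model in models.items():
    pipe = make_pipeline(StandardScaler(), model)
    auc = cross_val_score(pipe, X, y, cv=5, scoring="roc_auc")
    print(f"{name:12s} AUC = {auc.mean():.3f}")
```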

10.
Different studies have demonstrated the importance of comorbidities for better understanding the origin and evolution of medical complications. This study focuses on improving the interpretability of predictive models through simple logical features representing comorbidities. We use group-lasso-based feature interaction discovery followed by a post-processing step in which simple logical terms are added. In the final step, we reduce the feature set by applying lasso logistic regression to obtain a compact set of non-zero coefficients that represent a more comprehensible predictive model. The effectiveness of the proposed approach was demonstrated on a pediatric hospital discharge dataset used to build a readmission risk estimation model. The evaluation of the proposed method demonstrates a reduction of the initial feature set in the regression model by 72%, with a slight improvement in the area under the ROC curve from 0.763 (95% CI: 0.755–0.771) to 0.769 (95% CI: 0.761–0.777). Additionally, our results show improved comprehensibility of the final predictive model when simple comorbidity-based terms are used in the logistic regression.
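A hedged sketch of the general pattern of interaction discovery followed by sparse logistic regression, approximated here with pairwise interaction terms and an L1 penalty in scikit-learn rather than the paper's group lasso and logic-term post-processing; the binary "comorbidity" indicators are simulated.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(4)
n, p = 500, 10
X = rng.binomial(1, 0.3, size=(n, p)).astype(float)      # binary comorbidity indicators
logit = -2 + 1.5 * X[:, 0] + 1.0 * (X[:, 1] * X[:, 2])    # one main effect + one interaction
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

# Expand to all pairwise "AND" terms (products of binary indicators).
expand = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
X2 = expand.fit_transform(X)

# L1-penalized logistic regression keeps only a compact set of terms.
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.3).fit(X2, y)
kept = np.flatnonzero(lasso.coef_[0])
names = expand.get_feature_names_out([f"c{i}" for i in range(p)])
print("non-zero terms:", [names[j] for j in kept])
```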

11.
An estimate of the risk, adjusted for confounders, can be obtained from a fitted logistic regression model, but it substantially over-estimates the risk when the outcome is not rare. The log binomial model, with binomial errors and log link, is increasingly being used for this purpose. However, this model's performance, goodness-of-fit tests, and case-wise diagnostics have not been studied. Extensive simulations are used to compare the performance of the log binomial model, a logistic regression based method proposed by Schouten et al. (1993), and a Poisson regression approach proposed by Zou (2004) and Carter, Lipsitz, and Tilley (2005). Log binomial regression resulted in "failure" rates (non-convergence, out-of-bounds predicted probabilities) as high as 59%. Estimates by the method of Schouten et al. (1993) produced fitted log binomial probabilities greater than unity in up to 19% of samples to which a log binomial model had been successfully fit, and in up to 78% of samples when the log binomial model fit failed. Similar percentages were observed for the Poisson regression approach. Coefficient and standard error estimates from the three models were similar. Rejection rates for goodness-of-fit tests for the log binomial fit were around 5%. The power of the goodness-of-fit tests was modest when an incorrect logistic regression model was fit. Examples demonstrate the use of the methods. Uncritical use of the log binomial regression model is not recommended.
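A minimal sketch contrasting two of the approaches discussed: a log binomial GLM (binomial family, log link) and a modified Poisson regression with robust (sandwich) standard errors, both estimating a risk ratio directly from simulated data with a common outcome. The statsmodels link-class name assumes a recent release, and the convergence behavior depends on the data.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 2000
x = rng.binomial(1, 0.5, n)                 # binary exposure
p = 0.3 * np.where(x == 1, 1.8, 1.0)        # true risk ratio = 1.8, common outcome
y = rng.binomial(1, p)
X = sm.add_constant(x)

# Log binomial model: may fail to converge when fitted probabilities approach 1.
log_binom = sm.GLM(y, X, family=sm.families.Binomial(link=sm.families.links.Log())).fit()

# Modified Poisson regression with robust standard errors.
robust_poisson = sm.GLM(y, X, family=sm.families.Poisson()).fit(cov_type="HC0")

for name, res in [("log binomial", log_binom), ("robust Poisson", robust_poisson)]:
    rr = np.exp(res.params[1])
    print(f"{name:15s} estimated risk ratio = {rr:.2f}")
```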

12.
Small Ruminant Research 2008, 79(1-3):197-201.
A cross-sectional study was performed to investigate some epidemiological aspects of foot and mouth disease (FMD) and paratuberculosis in small ruminant flocks located in two governorates in southern Jordan. A total of 320 sheep and 300 goats from 38 sheep flocks and 24 goat flocks, respectively, were randomly sampled and assayed for the presence of antibodies against FMD virus and Mycobacterium paratuberculosis using commercially available kits. A structured, pre-tested questionnaire was administered to collect information on flock health and management. A multivariable logistic regression model was constructed to investigate risk factors associated with seropositivity to the two studied diseases. The individual-level prevalence of FMD and paratuberculosis in sheep was 10.4% and 22.1%, respectively, and the sheep flock-level seroprevalence was 44.7% and 50%, respectively. In goats, the individual-level prevalence of FMD and paratuberculosis was 6.3% and 18.1%, respectively, and the goat flock-level seroprevalence was 33.3% and 45.8%, respectively. The logistic regression model revealed mixed farming as a common risk factor for both FMD and paratuberculosis. Grazing in communal areas and the addition of new animals were identified as risk factors for paratuberculosis.

13.
Multiple logistic regression analysis is used to estimate the relative risk in case-control studies. The estimators so obtained are valid when the disease is rare. In this paper, an estimator of the relative risk in a case-control study is proposed that uses logistic regression results when the incidence of the disease is not small. The bias of the usual logistic-regression-based estimator relative to the new estimator is worked out. An expression for the mean square error of the proposed estimator is derived both when the incidence of the disease is known exactly and when it is estimated from an independent survey. The conventional estimator of the relative risk shows substantial bias when the incidence of the disease is high; in such situations, the proposed estimator can be used to advantage.
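A small numerical illustration of why the adjusted odds ratio from logistic regression overstates the relative risk when the outcome is common, using the well-known conversion RR = OR / (1 - P0 + P0·OR) (Zhang and Yu, 1998) rather than the estimator proposed in this paper; the numbers are hypothetical.

```python
# Hypothetical numbers: baseline (unexposed) risk P0 and an adjusted odds ratio.
p0 = 0.40          # incidence among the unexposed -- clearly not a rare disease
odds_ratio = 3.0   # adjusted OR from a logistic regression model

# Zhang-Yu conversion of an odds ratio to an approximate relative risk.
relative_risk = odds_ratio / (1 - p0 + p0 * odds_ratio)

print(f"OR = {odds_ratio:.2f}, approximate RR = {relative_risk:.2f}")
# With P0 = 0.40 an OR of 3.0 corresponds to a RR of only about 1.67,
# so reading the OR as a RR would overstate the effect substantially.
```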

14.
Semiparametric analysis of zero-inflated count data
Lam KF, Xue H, Cheung YB. Biometrics 2006, 62(4):996-1003.
Medical and public health research often involves the analysis of count data that exhibit a substantially large proportion of zeros, such as the number of heart attacks or the number of days of missed primary activities in a given period. A zero-inflated Poisson regression model, which hypothesizes a two-point heterogeneity in the population characterized by a binary random effect, is generally used to model such data. Subjects are broadly categorized into a low-risk group, leading to structural zero counts, and a high-risk (or normal) group, whose counts can be modeled by a Poisson regression model. The main aim is to identify the explanatory variables that have significant effects on (i) the probability that the subject is from the low-risk group, by means of a logistic regression formulation; and (ii) the magnitude of the counts, given that the subject is from the high-risk group, by means of a Poisson regression in which the effects of the covariates are assumed to be linearly related to the natural logarithm of the mean of the counts. In this article we consider a semiparametric zero-inflated Poisson regression model that postulates a possibly nonlinear relationship between the natural logarithm of the mean of the counts and a particular covariate. A sieve maximum likelihood estimation method is proposed. Asymptotic properties of the proposed sieve maximum likelihood estimators are discussed. Under some mild conditions, the estimators are shown to be asymptotically efficient and normally distributed. Simulation studies were carried out to investigate the performance of the proposed method. For illustration, the method is applied to a data set from a public health survey conducted in Indonesia where the variable of interest is the number of days of missed primary activities due to illness in a 4-week period.
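For context, a minimal sketch of the fully parametric zero-inflated Poisson model (a logistic zero-inflation part plus a log-linear Poisson part) that the semiparametric approach generalizes, fitted with statsmodels on simulated data. The sieve estimation of a nonlinear covariate effect is not reproduced here.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.discrete.count_model import ZeroInflatedPoisson

rng = np.random.default_rng(6)
n = 1000
x = rng.normal(size=n)

p_zero = 1 / (1 + np.exp(-(-0.5 + 1.0 * x)))     # probability of a structural zero
mu = np.exp(0.5 + 0.6 * x)                       # Poisson mean for the "high-risk" group
structural_zero = rng.binomial(1, p_zero)
y = np.where(structural_zero == 1, 0, rng.poisson(mu))

X = sm.add_constant(x)
zip_fit = ZeroInflatedPoisson(y, X, exog_infl=X, inflation="logit").fit(maxiter=200, disp=0)
print(zip_fit.summary())
```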

15.
For modelling dose-response relationships in case-control studies, the multiplicative logistic regression model, which assumes the relative risk to be an exponential function of the dose, is widely known. If the relative risk is assumed to be a linear function of the dose, several authors (see, e.g., Berry (1980)) have proposed an additive (linear) model. This model fits the data better if such a linear relation holds. Confidence limits for the relative risk derived from the information matrix, however, appear to be rather inaccurate. Therefore, use of the 'standard' logistic model in two different ways was studied: extension with a quadratic term, or a logarithmic transformation of the dose. By applying the methods both to an empirical data set and in a simulation experiment, it is shown that appropriately transforming the dose (often logarithmically) and then applying the 'standard' logistic model is a useful approach if a linear dose-response relationship holds.
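A minimal sketch of the comparison described: fitting the standard logistic model with the raw dose, with an added quadratic term, and with a log-transformed dose, then comparing the fits by AIC on simulated data with an approximately linear dose-risk relationship. All values are simulated; this is not the paper's data or analysis.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 1500
dose = rng.uniform(0.1, 10.0, n)

# Simulate an (approximately) linear risk relationship on the dose scale.
risk = 0.05 * (1 + 0.8 * dose)
y = rng.binomial(1, np.clip(risk, 0, 1))

designs = {
    "dose":          sm.add_constant(dose),
    "dose + dose^2": sm.add_constant(np.column_stack([dose, dose ** 2])),
    "log(dose)":     sm.add_constant(np.log(dose)),
}
for name, X in designs.items():
    fit = sm.Logit(y, X).fit(disp=0)
    print(f"{name:14s} AIC = {fit.aic:.1f}")
```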

16.
High-dimensional biomarker data are often collected in epidemiological studies when the association between biomarkers and human disease is of interest. We develop a latent class modeling approach for the joint analysis of high-dimensional semicontinuous biomarker data and a binary disease outcome. To model the relationship between complex biomarker expression patterns and disease risk, we use latent risk classes to link the two modeling components. We characterize complex biomarker-specific differences through biomarker-specific random effects, so that different biomarkers can have different baseline (low-risk) values as well as different between-class differences. The proposed approach also accommodates data features that are common in environmental toxicology and other biomarker exposure data, including a large number of biomarkers, numerous zero values, and a complex mean-variance relationship in the biomarker levels. A Monte Carlo EM (MCEM) algorithm is proposed for parameter estimation. Both the MCEM algorithm and the model selection procedures are shown to work well in simulations and applications. In applying the proposed approach to an epidemiological study that examined the relationship between environmental polychlorinated biphenyl (PCB) exposure and the risk of endometriosis, we identified a highly significant overall effect of PCB concentrations on the risk of endometriosis.

17.
OBJECTIVE: p values are inaccurate for model-free linkage analysis using the conditional logistic model if we assume that the LOD score is asymptotically distributed as a simple mixture of chi-square distributions. When analyzing affected relative pairs alone, permuting the allele sharing of relative pairs does not lead to a useful permutation distribution. As an alternative, we have developed regression prediction models that provide more accurate p values. METHODS: Let E(alpha) be the empirical p value, that is, the proportion of statistical tests whose LOD score under the null hypothesis exceeds a threshold determined by alpha, the nominal single-test significance value. We used simulated data to obtain values of E(alpha) and compared them with alpha. We also developed a regression model, based on sample size, number of covariates in the model, alpha, and marker density, to derive predicted p values for both single-point and multipoint analyses. To evaluate our predictions we used another set of simulated data, comparing the E(alpha) for these data with those obtained using the prediction model, referred to as predicted p values (P(alpha)). RESULTS: Under almost all circumstances, the values of P(alpha) were closer to E(alpha) than were the values of alpha. CONCLUSION: The regression models suggested by our analysis provide more accurate alternative p values for model-free linkage analysis when using the conditional logistic model.

18.
We introduce a new method of correcting for measurement error in covariates in regression models, called moment reconstruction. The central idea is similar to regression calibration in that the values of the covariates measured with error are replaced by "adjusted" values. In regression calibration the adjusted value is the expectation of the true value conditional on the measured value. In moment reconstruction the adjusted value is the variance-preserving empirical Bayes estimate of the true value conditional on the outcome variable. The adjusted values thereby have the same first two moments and the same covariance with the outcome variable as the unobserved "true" covariate values. We show that moment reconstruction is equivalent to regression calibration in the case of linear regression, but leads to different results for logistic regression. For case-control studies with logistic regression and covariates that are normally distributed within cases and controls, we show that the resulting estimates of the regression coefficients are consistent. In simulations we demonstrate that for logistic regression, moment reconstruction carries less bias than regression calibration, and for case-control studies is superior in mean-square error to the standard regression calibration approach. Finally, we give an example of the use of moment reconstruction in linear discriminant analysis and in a nonstandard problem where we wish to adjust a classification tree for measurement error in the explanatory variables.
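A hedged sketch of univariate moment reconstruction for a classical, nondifferential measurement-error model with known error variance: within each level of the binary outcome, the mismeasured covariate is shrunk around its outcome-specific mean so that the adjusted values reproduce the conditional mean and variance of the true covariate. This is a simplified reading of the idea, not the authors' general formulation; all data are simulated.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 5000
y = rng.binomial(1, 0.5, n)                     # case-control outcome
x_true = rng.normal(loc=np.where(y == 1, 1.0, 0.0), scale=1.0)
sigma_u = 0.8                                   # known measurement-error SD
w = x_true + rng.normal(scale=sigma_u, size=n)  # observed, error-prone covariate

x_mr = np.empty(n)
for level in (0, 1):
    idx = y == level
    m, v_w = w[idx].mean(), w[idx].var()
    v_x = max(v_w - sigma_u ** 2, 1e-8)         # implied variance of the true covariate
    # Variance-preserving adjustment: same conditional mean, variance shrunk to v_x.
    x_mr[idx] = m + np.sqrt(v_x / v_w) * (w[idx] - m)

print("Var(W | Y=1):   ", round(w[y == 1].var(), 2))
print("Var(X_MR | Y=1):", round(x_mr[y == 1].var(), 2))
print("Var(X | Y=1):   ", round(x_true[y == 1].var(), 2))
```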

19.
Eumelanins are brown-black pigments present in the hair and in the epidermis that are recognized as protective factors against cell damage caused by ultraviolet radiation. The quantity of eumelanin present in hair has recently been put forward as a means of identifying subjects with a higher risk of skin tumours. For epidemiological studies, chromatographic methods of determining pyrrole-2,3,5-tricarboxylic acid (PTCA; the principal marker of eumelanin) are time-consuming, laborious, and unsuitable for screening large populations. We suggest near infrared (NIR) spectroscopy as an alternative method of analysing eumelanin in hair samples. PTCA was determined in 93 samples of hair by oxidation with hydrogen peroxide in a basic environment followed by chromatographic separation. The same 93 samples were then subjected to NIR spectrophotometric analysis. The spectra were obtained in reflectance mode on hair samples that had not undergone any preliminary treatment, but had simply been pressed and placed on the measuring window of the spectrophotometer. The PTCA values obtained by HPLC were then correlated with the near infrared spectra of the respective samples using the principal component regression (PCR) algorithm. The resulting correlation has a coefficient of determination (R2) of 0.89 and a standard error of prediction (SEP) of 13.8 for a mean value of 108.6 ng PTCA/mg hair. Some considerations about the accuracy of the obtained correlation and the main sources of error are discussed, and some validation results are shown.
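A minimal sketch of principal component regression as used here: spectra are reduced to a few principal components, and a linear model then predicts the reference (HPLC) values from the component scores. The spectra and reference values below are simulated stand-ins, not the hair data, and the number of components is arbitrary.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(9)
n_samples, n_wavelengths = 93, 400

# Simulated reflectance spectra driven by a latent "pigment" concentration.
concentration = rng.uniform(20, 250, n_samples)              # e.g. ng PTCA / mg hair
basis = np.sin(np.linspace(0, 3 * np.pi, n_wavelengths))
spectra = (concentration[:, None] * basis[None, :] / 100
           + rng.normal(scale=0.05, size=(n_samples, n_wavelengths)))

pcr = make_pipeline(PCA(n_components=5), LinearRegression())
pred = cross_val_predict(pcr, spectra, concentration, cv=5)

sep = np.sqrt(np.mean((pred - concentration) ** 2))          # standard error of prediction
r2 = np.corrcoef(pred, concentration)[0, 1] ** 2
print(f"R^2 = {r2:.2f}, SEP = {sep:.1f}")
```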

20.
Genetic case-control association studies often include data on clinical covariates, such as body mass index (BMI), smoking status, or age, that may modify the underlying genetic risk of case or control samples. For example, in type 2 diabetes, odds ratios for established variants estimated from low-BMI cases are larger than those estimated from high-BMI cases. An unanswered question is how to use this information to maximize statistical power in case-control studies that ascertain individuals on the basis of phenotype (case-control ascertainment) or phenotype and clinical covariates (case-control-covariate ascertainment). While current approaches improve power in studies with random ascertainment, they often lose power under case-control ascertainment and fail to capture available power increases under case-control-covariate ascertainment. We show that an informed conditioning approach, based on the liability threshold model with parameters informed by external epidemiological information, fully accounts for disease prevalence and non-random ascertainment of phenotype as well as covariates, and provides a substantial increase in power while maintaining a properly controlled false-positive rate. Our method outperforms standard case-control association tests with or without covariates, tests of gene × covariate interaction, and previously proposed tests for dealing with covariates in ascertained data, with especially large improvements in the case of case-control-covariate ascertainment. We investigate empirical case-control studies of type 2 diabetes, prostate cancer, lung cancer, breast cancer, rheumatoid arthritis, age-related macular degeneration, and end-stage kidney disease, over a total of 89,726 samples. In these datasets, informed conditioning outperforms logistic regression for 115 of the 157 known associated variants investigated (P-value = 1×10^-9). The improvement varied across diseases, with a 16% median increase in χ² test statistics and a commensurate increase in power. This suggests that applying our method to existing and future association studies of these diseases may identify novel disease loci.
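A hedged sketch of the liability threshold idea that informed conditioning builds on: given a disease prevalence from external epidemiological data, cases and controls are assigned their expected liabilities under a standard normal liability model, and genotype is then tested for association with these scores by linear regression. This bare-bones illustration omits covariates and ascertainment corrections and is not the authors' full method; the genotype data are simulated.

```python
import numpy as np
from scipy.stats import norm
import statsmodels.api as sm

prevalence = 0.08                       # assumed disease prevalence from external data
t = norm.ppf(1 - prevalence)            # liability threshold

# Expected liability given case/control status (truncated standard normal means).
e_liab_case = norm.pdf(t) / prevalence
e_liab_control = -norm.pdf(t) / (1 - prevalence)

rng = np.random.default_rng(10)
n_cases = n_controls = 1000
y = np.r_[np.ones(n_cases), np.zeros(n_controls)]
geno = rng.binomial(2, np.where(y == 1, 0.45, 0.40))    # mildly associated SNP (0/1/2)

score = np.where(y == 1, e_liab_case, e_liab_control)   # expected-liability score
fit = sm.OLS(score, sm.add_constant(geno)).fit()
print(f"threshold t = {t:.2f}, E[L|case] = {e_liab_case:.2f}, "
      f"E[L|control] = {e_liab_control:.2f}")
print(f"association p-value (sketch): {fit.pvalues[1]:.3g}")
```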
