Similar Literature (20 results)
1.
Generalized linear models (GLMs) with the canonical logit link are the primary modeling technique used to relate a binary outcome to predictor variables. However, noncanonical links can offer more flexibility, producing convenient analytical quantities (e.g., probit GLMs in toxicology) and desired measures of effect (e.g., relative risk from log GLMs). Many summary goodness-of-fit (GOF) statistics exist for the logistic GLM. The properties of the logistic GLM make the development of GOF statistics relatively straightforward, but development can be more difficult under noncanonical links. Although GOF tests for logistic GLMs with continuous covariates (GLMCCs) have been applied to GLMCCs with log links, we know of no GOF tests in the literature specifically developed for GLMCCs that can be applied regardless of the link function chosen. We generalize the Tsiatis GOF statistic, originally developed for logistic GLMCCs, so that it can be applied under any link function. Further, we show that the algebraically related Hosmer–Lemeshow and Pigeon–Heyse (J2) statistics can be applied directly. In a simulation study, the generalized Tsiatis, Hosmer–Lemeshow, and J2 statistics were used to evaluate the fit of probit, log–log, complementary log–log, and log models, all calculated with a common grouping method. The generalized Tsiatis statistic consistently maintained Type I error rates, while those of the Hosmer–Lemeshow and J2 statistics were often lower than expected if terms with little influence were included. Generally, the statistics had similar power to detect an incorrect model. An exception occurred when a log GLMCC was incorrectly fit to data generated from a logistic GLMCC; in this case, the generalized Tsiatis statistic had more power than the Hosmer–Lemeshow or J2 statistics.
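The following is a minimal sketch (not the authors' implementation) of the general idea: fit a binomial GLM with a noncanonical probit link and compute a Hosmer–Lemeshow-type grouped goodness-of-fit statistic on deciles of the fitted probabilities. The simulated data, grouping rule, and reference degrees of freedom are illustrative assumptions.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm, chi2

rng = np.random.default_rng(1)
n = 2000
x = rng.normal(size=n)
y = rng.binomial(1, norm.cdf(-0.5 + 0.8 * x))        # data generated from a probit model

# Fit a binomial GLM with a probit (noncanonical) link
X = sm.add_constant(x)
fit = sm.GLM(y, X, family=sm.families.Binomial(link=sm.families.links.Probit())).fit()
p_hat = fit.fittedvalues

# Hosmer-Lemeshow-type statistic: group observations by deciles of fitted probability
g = 10
cuts = np.quantile(p_hat, np.linspace(0, 1, g + 1)[1:-1])
grp = np.digitize(p_hat, cuts)
obs = np.bincount(grp, weights=y, minlength=g)        # observed events per group
exp = np.bincount(grp, weights=p_hat, minlength=g)    # expected events per group
size = np.bincount(grp, minlength=g)
hl = np.sum((obs - exp) ** 2 / (exp * (1 - exp / size)))
print(f"HL-type statistic = {hl:.2f}, p = {chi2.sf(hl, g - 2):.3f} (g - 2 df, conventional choice)")
```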

2.
Biomarkers are subject to censoring whenever some measurements are not quantifiable given a laboratory detection limit. Methods for handling censoring have received less attention in genetic epidemiology, and censored data are still often replaced with a fixed value. We compared different strategies for handling a left-censored continuous biomarker in a family-based study, where the biomarker is tested for association with a genetic variant, adjusting for a covariate, X. Allowing different correlations between X and the biomarker, we compared simple substitution of censored observations with the detection limit followed by a linear mixed effects model (LMM), a Bayesian model with noninformative priors, a Tobit model with robust standard errors, and multiple imputation (MI) with and without the genetic variant in the imputation model, followed by an LMM. Our comparison was based on real and simulated data in which 20% and 40% censoring were artificially induced. The complete data were also analyzed with an LMM. In the MICROS study, the Bayesian model gave results closer to those obtained with the complete data. In the simulations, simple substitution was always the most biased method; the Tobit approach gave the least biased estimates at all censoring levels and correlation values; and the Bayesian model and both MI approaches gave slightly biased estimates but smaller root mean square errors. On the basis of these results, the Bayesian approach is highly recommended for candidate gene studies; however, the computationally simpler Tobit model and the MI without the genetic variant are both good options for genome-wide studies.
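As a rough illustration of the Tobit strategy for a left-censored outcome, the sketch below maximizes a censored-normal likelihood with scipy; it ignores the family structure (random effects) and robust standard errors of the study, and the detection limit, covariate, and coefficients are invented.

```python
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(2)
n = 500
x = rng.normal(size=n)
y_latent = 1.0 + 0.5 * x + rng.normal(scale=1.0, size=n)
dl = 0.8                                  # detection limit
censored = y_latent < dl
y = np.where(censored, dl, y_latent)      # values below the limit are only known to be < dl

def neg_loglik(theta):
    b0, b1, log_sigma = theta
    sigma = np.exp(log_sigma)
    mu = b0 + b1 * x
    ll_obs = stats.norm.logpdf(y[~censored], mu[~censored], sigma)   # exactly observed values
    ll_cen = stats.norm.logcdf((dl - mu[censored]) / sigma)          # P(latent < detection limit)
    return -(ll_obs.sum() + ll_cen.sum())

res = optimize.minimize(neg_loglik, x0=[0.0, 0.0, 0.0], method="BFGS")
b0, b1, log_sigma = res.x
print(f"Tobit estimates: intercept={b0:.3f}, slope={b1:.3f}, sigma={np.exp(log_sigma):.3f}")
```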

3.
When establishing a treatment in clinical trials, it is important to evaluate both effectiveness and toxicity. In phase II clinical trials, multinomial data are collected in m-stage designs, especially in two-stage (m = 2) designs. Exact tests on two proportions, one for the response rate and one for the nontoxicity rate, should be employed because of the limited sample sizes. However, existing tests use certain parameter configurations at the boundary of the null hypothesis space to determine rejection regions without showing that the maximum Type I error rate is achieved at that boundary. In this paper, we show that the power function for each test in a large family of tests is nondecreasing in both the response rate and the nontoxicity rate; identify the parameter configurations at which the maximum Type I error rate and the minimum power are achieved and derive level-α tests; provide optimal two-stage designs with the least expected total sample size together with the optimization algorithm; and extend the results to the case of m > 2. R code is given in the Supporting Information.
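A much-simplified, single-stage illustration of why the null boundary matters: assuming (purely for illustration) that response and nontoxicity are independent and that the trial is declared positive when at least r responses and at least s nontoxic outcomes are observed, the exact rejection probability can be scanned along the null boundary; by monotonicity, the maximum Type I error sits at the extreme boundary configurations. The rule (n, r, s) and null rates below are hypothetical.

```python
import numpy as np
from scipy.stats import binom

# Hypothetical single-stage rule: with n patients, declare the treatment promising
# if at least r responses AND at least s nontoxic outcomes are observed.
n, r, s = 25, 10, 18
p_r0, p_nt0 = 0.20, 0.60          # null values for response and nontoxicity rates

def reject_prob(p_r, p_nt):
    # Assumes independence of response and nontoxicity (a simplification of the
    # multinomial setting discussed in the abstract).
    return binom.sf(r - 1, n, p_r) * binom.sf(s - 1, n, p_nt)

# Scan the boundary of the null space H0: p_r <= p_r0 or p_nt <= p_nt0.
grid = np.linspace(0, 1, 101)
seg1 = [reject_prob(p_r0, p) for p in grid]   # response rate fixed at its null value
seg2 = [reject_prob(p, p_nt0) for p in grid]  # nontoxicity rate fixed at its null value
print("max Type I error on boundary:", max(max(seg1), max(seg2)))
print("attained at (p_r0, 1):", reject_prob(p_r0, 1.0),
      " and (1, p_nt0):", reject_prob(1.0, p_nt0))
```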

4.
In scientific research, many hypotheses relate to the comparison of two independent groups. Usually, it is of interest to use a design (i.e., the allocation of sample sizes m and n for a fixed total m + n) that maximizes the power of the applied statistical test. It is known that the two-sample t-tests for homogeneous and heterogeneous variances may lose substantial power when variances are unequal but equally large samples are used. We demonstrate that this is not the case for the nonparametric Wilcoxon–Mann–Whitney test, whose application in biometrical research fields is motivated by two examples from cancer research. We prove the optimality of the balanced design (m = n) in the case of symmetric and identically shaped distributions using normal approximations, and we show that this design generally offers power only negligibly lower than the optimal design for a wide range of distributions.
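A small simulation sketch of the allocation claim: under normal distributions with strongly unequal variances (invented parameters), the power of the Wilcoxon–Mann–Whitney test can be compared across allocations of a fixed total sample size; the balanced split is expected to be close to the best.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(3)
N, shift, reps, alpha = 60, 0.8, 2000, 0.05
sd1, sd2 = 1.0, 3.0                       # strongly unequal variances

def power(m):                             # m subjects in group 1, N - m in group 2
    hits = 0
    for _ in range(reps):
        x = rng.normal(0.0, sd1, size=m)
        y = rng.normal(shift, sd2, size=N - m)
        if mannwhitneyu(x, y, alternative="two-sided").pvalue < alpha:
            hits += 1
    return hits / reps

for m in (15, 20, 30, 40, 45):            # 30 is the balanced design
    print(f"m = {m:2d}, n = {N - m:2d}: simulated power = {power(m):.3f}")
```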

5.
Few articles have been written on analyzing three-way interactions between drugs. It may seem straightforward to extend a statistical method from two drugs to three drugs. However, the response surface of the interaction index (II) can be more complex in a three-drug study than in a two-drug study, with local synergy and/or local antagonism interspersed in different regions of drug combinations. In addition, it is not possible to obtain a four-dimensional (4D) response surface plot for a three-drug study. We propose an analysis procedure to construct the dose combination regions of interest (say, the synergistic areas with II < 1). First, we use the model robust regression method (MRR), a semiparametric method, to fit the entire response surface of the II, which allows a complex response surface with local synergy/antagonism to be fitted. Second, we run a modified genetic algorithm (MGA), a stochastic optimization method, many times with different random seeds, so as to collect as many feasible points as possible that satisfy the constraint on the estimated values of the II. Last, all these feasible points are used to construct the approximate dose regions of interest in 3D. A case study with three anticancer drugs in an in vitro experiment is employed to illustrate how to find the dose regions of interest.
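The last two steps can be caricatured as follows: `fitted_ii` is a hypothetical stand-in for the MRR-fitted interaction-index surface, and a crude multi-start random search replaces the modified genetic algorithm; the feasible dose triples collected approximate the synergy region (II < 1). Everything here is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)

def fitted_ii(d):
    # Hypothetical stand-in for the MRR-fitted interaction-index surface over
    # three standardized doses d = (d1, d2, d3) in [0, 1]^3.
    d1, d2, d3 = d
    return 1.3 - 0.8 * d1 * d2 - 0.5 * d2 * d3 + 0.6 * d1 * d3

# Multi-start random search (a crude substitute for the modified genetic algorithm):
# collect dose triples whose estimated interaction index indicates synergy (II < 1).
feasible = []
for _ in range(200):                       # 200 random restarts
    d = rng.uniform(0, 1, size=3)
    for _ in range(500):                   # local random perturbations
        cand = np.clip(d + rng.normal(scale=0.05, size=3), 0, 1)
        if fitted_ii(cand) < fitted_ii(d):
            d = cand
    if fitted_ii(d) < 1.0:
        feasible.append(d)

feasible = np.array(feasible)
print(f"collected {len(feasible)} feasible dose triples approximating the synergy region")
```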

6.
Matched case-control paired data are commonly used to study the association between a disease and an exposure of interest. This work provides a consistent test for this association with respect to the conditional odds ratio, a measure of association that is also valid in prospective studies. We formulate the test from the maximum likelihood (ML) estimate of the conditional odds ratio using data collected under inverse binomial sampling, in which individuals are selected sequentially to form matched pairs until, for the first time, one obtains either a prefixed number of index pairs with the case unexposed but the control exposed, or a prefixed number with the case exposed but the control unexposed. We discuss the situation of possible early stopping. We compare numerically the performance of our procedure with a competitor proposed by Lui (1996) in terms of Type I error rate, power, average sample number (ASN), and the corresponding standard error. Our numerical study shows a gain in sample size without loss in power compared to the competitor. Finally, we use data from a case-control study on the use of X-rays and the risk of childhood acute myeloid leukemia for illustration.
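A toy simulation of the inverse binomial sampling scheme: pairs are drawn sequentially and sampling stops as soon as either type of discordant pair reaches a prefixed count. The conventional discordant-pair ratio is used here as a simple point estimate of the conditional odds ratio; the paper derives the ML estimate and its properties under this sequential scheme, which this sketch does not reproduce. All parameter values are invented.

```python
import numpy as np

rng = np.random.default_rng(5)
psi_true = 2.0                              # true conditional odds ratio
p_discordant = 0.3                          # chance a sampled pair is discordant
r_stop = 15                                 # prefixed number of pairs of either discordant type

# Among discordant pairs, P(case exposed, control unexposed) = psi / (1 + psi).
p10 = psi_true / (1 + psi_true)

n10 = n01 = n_pairs = 0
while n10 < r_stop and n01 < r_stop:        # stop at the first boundary hit
    n_pairs += 1
    if rng.random() < p_discordant:         # concordant pairs carry no information on psi
        if rng.random() < p10:
            n10 += 1
        else:
            n01 += 1

psi_hat = n10 / n01 if n01 > 0 else np.inf  # discordant-pair ratio as a point estimate
print(f"sampled {n_pairs} pairs, discordant counts ({n10}, {n01}), psi_hat = {psi_hat:.2f}")
```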

7.
We develop nonparametric maximum likelihood estimation for the parameters of an irreversible Markov chain on three states from observations with interval-censored times of the 0 → 1, 0 → 2, and 1 → 2 transitions. The distinguishing aspect of the data is that, in addition to all transition times being interval censored, the times of two events (the 0 → 1 and 1 → 2 transitions) can be censored into the same interval. This development was motivated by a common data structure in oral health research, here illustrated by data from a prospective cohort study on the longevity of dental veneers. Using the self-consistency algorithm, we obtain the maximum likelihood estimators of the cumulative incidences of the times to events 1 and 2 and of the intensity of the 1 → 2 transition. This work generalizes previous results on estimation in an "illness-death" model from interval-censored observations.

8.
We study bias arising from nonlinear transformations of random variables in random or mixed effects models and its effect on inference in group-level studies or in meta-analysis. The findings are illustrated on the example of overdispersed binomial distributions, where we demonstrate considerable biases arising from the standard log-odds and arcsine transformations of the estimated probability, both for single-group studies and when combining results from several groups or studies in meta-analysis. Our simulations confirm that these biases are linear in the intracluster correlation coefficient ρ for small values of ρ. These biases do not depend on the sample sizes or on the number of studies K in a meta-analysis, and they result in abysmal coverage of the combined effect for large K. We also propose a bias correction for the arcsine transformation. Our simulations demonstrate that this bias correction works well for small values of the intraclass correlation. The methods are applied to two examples of meta-analyses of prevalence.
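A short simulation, under an assumed beta-binomial model, showing how the arcsine and log-odds transformations of a group-level estimated probability become biased as the intracluster correlation ρ grows; the group size, true probability, and continuity correction are arbitrary choices, not those of the paper.

```python
import numpy as np

rng = np.random.default_rng(6)
p, n_per, reps = 0.2, 50, 20000

def mean_transform_bias(rho):
    a, b = p * (1 - rho) / rho, (1 - p) * (1 - rho) / rho     # beta-binomial parameters
    probs = rng.beta(a, b, size=reps)
    x = rng.binomial(n_per, probs)
    p_hat = (x + 0.5) / (n_per + 1)                           # small continuity correction
    arcsine_bias = np.mean(np.arcsin(np.sqrt(p_hat))) - np.arcsin(np.sqrt(p))
    logit_bias = np.mean(np.log(p_hat / (1 - p_hat))) - np.log(p / (1 - p))
    return arcsine_bias, logit_bias

for rho in (0.01, 0.05, 0.1, 0.2):
    a_b, l_b = mean_transform_bias(rho)
    print(f"rho = {rho:.2f}: arcsine bias = {a_b:+.4f}, log-odds bias = {l_b:+.4f}")
```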

9.
In this paper, we introduce a new estimator of a percentile residual life function with censored data under a monotonicity constraint. Specifically, it is assumed that the percentile residual life is a decreasing function. This assumption is useful when estimating the percentile residual life of units that degenerate with age. We establish a law of the iterated logarithm for the proposed estimator and its asymptotic equivalence to the unrestricted estimator. The asymptotic normal distribution of the estimator and its strong approximation to a Gaussian process are also established. We investigate the finite sample performance of the monotone estimator in an extensive simulation study. Finally, data from a clinical trial in primary biliary cirrhosis of the liver are analyzed with the proposed methods. One of the conclusions of our work is that the restricted estimator may be much more efficient than the unrestricted one.

10.
Regression modelling is a powerful statistical tool often used in biomedical and clinical research. It can be formulated as an inverse problem that measures the discrepancy between the target outcome and the data produced by the representation of the modelled predictors. This approach can simultaneously perform variable selection and coefficient estimation. We focus particularly on the linear regression problem y = Xβ + ε, where β is the parameter of interest and its components are the regression coefficients. The inverse problem finds an estimate of the parameter β, which is mapped by the linear operator X to the observed outcome data y. The problem can be expressed as finding a solution in the affine subspace of parameter vectors consistent with the observed data. However, in the presence of collinearity, high-dimensional data, and a high condition number of the related covariance matrix, the solution may not be unique, so prior information must be introduced to reduce the solution set and regularize the inverse problem. Informed by Huber's robust statistics framework, we propose an optimal regularizer for the regression problem. We compare results of the proposed method and other penalized regression methods (ridge, lasso, adaptive lasso, and elastic net) under demanding conditions such as a high condition number of the covariance matrix and high error amplitude, on both simulated data and real data from the South London Stroke Register. The proposed approach can be extended to mixed regression models. Our inverse problem framework coupled with robust statistics methodology offers new insights into statistical regression and learning, and it could open a new line of research for model fitting and learning.
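A generic sketch of a Huber-type coefficient penalty in a penalized least-squares fit, solved with scipy; this is only one way to borrow Huber's robust loss as a regularizer and is not the paper's optimal regularizer. The near-collinear design, noise level, and penalty weight are invented.

```python
import numpy as np
from scipy import optimize

rng = np.random.default_rng(11)
n, p = 100, 20
X = rng.normal(size=(n, p))
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=n)          # near-collinear columns: ill-conditioned problem
beta_true = np.r_[2.0, 0.0, -1.5, np.zeros(p - 3)]
y = X @ beta_true + rng.normal(scale=2.0, size=n)       # high error amplitude

def huber(u, delta=0.5):
    # Huber function applied to the coefficients themselves, used here as a penalty.
    a = np.abs(u)
    return np.where(a <= delta, 0.5 * u**2, delta * (a - 0.5 * delta))

def objective(b, lam=1.0):
    resid = y - X @ b
    return 0.5 * np.mean(resid**2) + lam * np.sum(huber(b))

res = optimize.minimize(objective, x0=np.zeros(p), method="L-BFGS-B")
print("largest estimated coefficients:", np.round(np.sort(np.abs(res.x))[-3:], 2))
```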

11.
Many approaches for variable selection with multiply imputed (MI) data in the development of a prognostic model have been proposed. However, no method prevails as uniformly best. We conducted a simulation study with a binary outcome and a logistic regression model to compare two classes of variable selection methods in the presence of MI data: (I) model selection on bootstrap data, using backward elimination based on AIC or the lasso, fitting the final model based on the variables most frequently selected over all MI and bootstrap data sets; and (II) model selection on the original MI data, using the lasso. In class II, the final model is obtained by (i) averaging estimates of variables that were selected in any MI data set or (ii) in at least 50% of the MI data sets; (iii) performing the lasso on the stacked MI data; or (iv) as in (iii) but using individual weights determined by the fraction of missingness. In all lasso models, we used both the optimal penalty and the 1-se rule. We considered recalibrating models to correct for overshrinkage due to the suboptimal penalty by refitting the linear predictor or all individual variables. We applied the methods to a real data set of 951 adult patients with tuberculous meningitis to predict mortality within nine months. Overall, applying lasso selection with the 1-se penalty shows the best performance in both approaches I and II. Stacking the MI data is an attractive approach because it does not require choosing a selection threshold when combining results from separate MI data sets.
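A bare-bones sketch of option (iv), lasso on stacked multiply imputed data with missingness-based weights, using scikit-learn. The "imputations" are naive random draws rather than a proper MI procedure, the weighting rule (each subject's total weight scaled by one minus its fraction of missing covariates) follows the spirit of the abstract but may differ in detail, and the penalty strength is fixed rather than chosen by the optimal or 1-se rule.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)
n, p, m_imp = 300, 10, 5
X_full = rng.normal(size=(n, p))
beta = np.r_[1.0, -1.0, 0.5, np.zeros(p - 3)]
y = rng.binomial(1, 1 / (1 + np.exp(-X_full @ beta)))

# Introduce missingness in the first covariate and create m naive "imputations"
# (random draws from the observed values stand in for a proper MI procedure).
miss = rng.random(n) < 0.3
imputations = []
for _ in range(m_imp):
    Xi = X_full.copy()
    Xi[miss, 0] = rng.choice(X_full[~miss, 0], size=miss.sum())
    imputations.append(Xi)

# Stack the imputed data sets and fit a single lasso-penalized logistic model,
# down-weighting rows so each subject contributes roughly one observation in total.
X_stacked = np.vstack(imputations)
y_stacked = np.tile(y, m_imp)
f_missing = np.tile(miss.astype(float) / p, m_imp)    # fraction of missing covariates per subject
weights = (1 - f_missing) / m_imp
fit = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
fit.fit(X_stacked, y_stacked, sample_weight=weights)
print("selected variables:", np.flatnonzero(fit.coef_[0] != 0))
```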

12.
In this work we propose the use of functional data analysis (FDA) to deal with a very large data set of atmospheric aerosol size distributions resolved in both space and time. Data come from a mobile measurement platform in the town of Perugia (Central Italy). An optical particle counter (OPC) is integrated on a cabin of the Minimetrò, an urban transportation system that moves along a monorail on a line transect of the town. The OPC takes a sample of air every six seconds, counts the number of urban aerosol particles with a diameter between 0.28 μm and 10 μm, and classifies these particles into 21 size bins according to their diameter. Here, we adopt a 2D functional data representation for each of the 21 spatiotemporal series; space is unidimensional, since it is measured as the distance along the monorail from the base station of the Minimetrò. FDA allows for a reduction of the dimensionality of each data set and accounts for the high space-time resolution of the data. Functional cluster analysis is then performed to search for similarities among the 21 size channels in terms of their spatiotemporal pattern. Results provide a good classification of the 21 size bins into a relatively small number of groups (between three and four) according to the season of the year. Groups including coarser particles have more similar patterns, while those including finer particles behave more differently according to the period of the year. These features are consistent with the physics of atmospheric aerosols, and the highlighted patterns provide very useful ground for prospective model-based studies.

13.
In recent years accelerometers have become widely used to objectively assess physical activity. Usually, intensity ranges are assigned to the measured accelerometer counts by simple cut points, disregarding the underlying activity pattern. Under the assumption that physical activity can be seen as a distinct sequence of distinguishable activities, the use of hidden Markov models (HMMs) has been proposed to improve the modeling of accelerometer data. As a further improvement, we propose using expectile regression with a Whittaker smoother and an L0 penalty to better capture the intensity levels underlying the observed counts. Different expectile asymmetries beyond the mean allow monotonous and more variable activities to be distinguished, as expectiles effectively model the complete distribution of the counts. This new approach is investigated in a simulation study in which we simulated 1,000 days of accelerometer data with 1 and 5 s epochs, based on collected labeled data, to resemble real-life data as closely as possible. The expectile regression is compared to HMMs and the commonly used cut point method with regard to the misclassification rate, the number of identified bouts and identified levels, and the proportion of estimates lying within a given range of the true activity level. In summary, expectile regression with an L0-penalized Whittaker smoother outperforms HMMs and the cut point method and is hence a promising approach to modeling accelerometer data.
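The core of the proposal can be imitated with an asymmetric-least-squares (expectile) Whittaker smoother. The sketch below uses a quadratic penalty on second differences for simplicity, whereas the abstract's method uses an L0 penalty to obtain piecewise-constant activity levels; the toy "counts", asymmetries, and smoothing parameter are invented.

```python
import numpy as np

def expectile_whittaker(y, tau=0.5, lam=1e4, d=2, n_iter=30):
    """Asymmetric-least-squares (expectile) Whittaker smoother.

    Simplification: a quadratic penalty on d-th order differences is used here;
    the approach in the abstract uses an L0 penalty instead.
    """
    n = len(y)
    D = np.diff(np.eye(n), n=d, axis=0)          # d-th order difference operator
    P = lam * D.T @ D
    w = np.full(n, 0.5)
    z = y.astype(float).copy()
    for _ in range(n_iter):
        z = np.linalg.solve(np.diag(w) + P, w * y)
        w = np.where(y > z, tau, 1 - tau)        # asymmetric weights define the expectile
    return z

# Toy "accelerometer counts": two activity levels plus noise.
rng = np.random.default_rng(8)
y = np.r_[rng.poisson(5, 300), rng.poisson(60, 300)].astype(float)
level_90 = expectile_whittaker(y, tau=0.9)       # upper expectile tracks intense activity
level_10 = expectile_whittaker(y, tau=0.1)       # lower expectile tracks background
print(level_90[:5].round(1), level_10[-5:].round(1))
```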

14.
In the risk analysis of sequential events, the successive gap times are often correlated, for example as a result of individual heterogeneity. Correlation is usually accounted for by using a shared gamma-frailty model, where the variance φ of the random individual effect quantifies the correlation between gap times. This method is known to yield satisfactory estimates of covariate effects, but it underestimates φ, which could result in a lack of power of the test of independence. We propose a new test of independence between two sequential gap times, where the first is the time elapsed from the origin. The test is based on an approximation of the hazard of the second event given the first gap time in a frailty model, with a frailty distribution belonging to the power variance function family. Simulation results show an increased power of the new test compared with the test derived from the gamma-frailty model. In the realistic case where hazards are event specific, and using event-specific approaches, the proposed estimate of the frailty variance is less biased than the gamma-frailty-based estimate for a wide range of values of φ (under the set of parameters considered), and similar for higher values. As an illustration, the methods are applied to a previously analysed asthma prevention trial, with results showing a significant positive association between the successive times to asthmatic events. We also analyse data from a cohort of HIV-seropositive patients in order to assess the effect of risk factors on the occurrence of two successive markers of progression of HIV disease. The results demonstrate the ability of the proposed model to account for negative correlations between gap times.

15.
Shape analysis is of great importance in many fields of medical imaging and computational biology. In this paper, we consider the shape space as the set of smooth planar immersed (parameterized) curves and, using the property of being isometric to a classical manifold immersed in a Euclidean space, we introduce a new extrinsic sample mean and a new extrinsic variance for a finite set of shapes, which are not necessarily star-shaped. This is a fundamental tool in medical image analysis, for instance to assess uncertainties that arise in locating anatomical structures such as the prostate and the bladder. We apply it to a data set consisting of parallel planar axial CT sections of the human prostate, in order to study the variability between boundaries that have been manually delineated by several observers.

16.
We propose a new residual to be used in linear and nonlinear beta regressions. Unlike previously proposed residuals, the derivation of the new residual takes into account not only information relative to the estimation of the mean submodel but also information obtained from the precision submodel. This is an advantage of the residual we introduce. Additionally, the new residual is computationally less intensive than the weighted residual: the computation of the latter involves an n × n matrix, where n is the sample size, which can be a problem when the sample size is very large. In contrast, our residual does not suffer from that limitation and can easily be computed even in large samples. Finally, our residual proved to be as able as the weighted residual to identify atypical observations. We also propose new thresholds for residual plots and a scheme for choosing starting values for maximum likelihood point estimation in the class of nonlinear beta regression models. We report Monte Carlo simulation results on the behavior of different residuals. We also present and discuss two empirical applications: one uses the proportion of grasshoppers killed in an assay on the grasshopper Melanoplus sanguinipes with the insecticide carbofuran and the synergist piperonyl butoxide, which enhances the toxicity of the insecticide, and the other uses simulated data. The results favor the new methodology we introduce.

17.
The decision curve plots the net benefit of a risk model for making decisions over a range of risk thresholds, corresponding to different ratios of misclassification costs. We discuss three methods to estimate the decision curve, together with corresponding methods of inference and methods to compare two risk models at a given risk threshold. One method uses risks (R) and a binary event indicator (Y) on the entire validation cohort. This method makes no assumptions on how well calibrated the risk model is or on the incidence of disease in the population, and it is comparatively robust to model miscalibration. If one assumes that the model is well calibrated, one can compute a much more precise estimate of the net benefit based on the risks R alone; however, if the risk model is miscalibrated, serious bias can result. Case-control data can also be used to estimate the net benefit if the incidence (or prevalence) of the event is known. This strategy has efficiency comparable to using the full data, and its efficiency is only modestly less than that for the full data if the incidence is estimated from the mean of Y. We estimate variances using influence functions and propose a bootstrap procedure to obtain simultaneous confidence bands around the decision curve for a range of thresholds. The influence function approach to estimating variances can also be applied to cohorts derived from complex survey samples instead of simple random samples.
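A minimal sketch of the first estimation method, which uses risks R and the event indicator Y on the full validation cohort: the net benefit at threshold t counts true positives minus false positives weighted by the threshold odds t/(1 - t). The simulated risks and threshold grid are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 5000
risk = rng.beta(2, 8, size=n)                  # model-based risks R
y = rng.binomial(1, risk)                      # event indicator Y (model well calibrated here)

def net_benefit(thresholds, risk, y):
    n = len(y)
    nb = []
    for t in thresholds:
        pos = risk >= t                        # treat if predicted risk exceeds the threshold
        tp = np.sum(pos & (y == 1)) / n
        fp = np.sum(pos & (y == 0)) / n
        nb.append(tp - fp * t / (1 - t))       # weight false positives by the threshold odds
    return np.array(nb)

thresholds = np.linspace(0.05, 0.5, 10)
nb_model = net_benefit(thresholds, risk, y)
nb_treat_all = net_benefit(thresholds, np.ones(n), y)   # "treat everyone" reference strategy
for t, a, b in zip(thresholds, nb_model, nb_treat_all):
    print(f"t = {t:.2f}: NB(model) = {a:+.4f}, NB(treat all) = {b:+.4f}")
```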

18.
When the objective is to administer the better of two treatments to an individual, it is necessary to know his or her individual treatment effect (ITE) and the correlation between the potential responses (PRs) under treatments 1 and 0. Data generated in a parallel-group RCT do not allow the ITE to be determined, because only two samples from the marginal distributions of these PRs are observed, and not the corresponding joint distribution; this is the "fundamental problem of causal inference." Here, we present a counterfactual approach for estimating the joint distribution of two normally distributed responses to two treatments. This joint distribution of the PRs can be estimated by assuming a bivariate normal distribution for the PRs and by using a normally distributed baseline biomarker functionally related to their sum. Such a functional relationship is plausible, since a biomarker and the sum of the PRs encode the same information in an RCT, namely the variation between subjects. The estimation of the joint trivariate distribution is subject to some constraints. These constraints can be framed in the context of linear regressions with regard to the proportions of variance in the responses explained and with regard to the residual variation. This provides new insights into the presence of treatment-biomarker interactions. We applied our approach to example data on exercise and heart rate and extended the approach to survival data.

19.
Count data sets are traditionally analyzed using the ordinary Poisson distribution. However, such a model has limited applicability, as it can be too restrictive for specific data structures. In those cases, the need arises for alternative models that accommodate, for example, (a) zero modification (inflation or deflation of the frequency of zeros), (b) overdispersion, and (c) individual heterogeneity arising from clustering or repeated (correlated) measurements made on the same subject. Cases (a)-(b) and (b)-(c) are often treated together in the statistical literature, with several practical applications, but models supporting all three at once are less common. Hence, this paper's primary goal is to jointly address these issues by deriving a mixed-effects regression model based on the hurdle version of the Poisson–Lindley distribution. In this framework, zero modification is incorporated by assuming that a binary probability model determines which outcomes are zero-valued, while a zero-truncated process is responsible for generating positive observations. Approximate posterior inferences for the model parameters were obtained from a fully Bayesian approach based on the Adaptive Metropolis algorithm. Intensive Monte Carlo simulation studies were performed to assess the empirical properties of the Bayesian estimators. The proposed model was applied to a real data set, and its competitiveness with respect to some well-established mixed-effects models for count data was evaluated. A sensitivity analysis to detect observations that may impact parameter estimates was performed based on standard divergence measures. The Bayesian p-value and randomized quantile residuals were used for model diagnostics.
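To make the hurdle construction concrete, the sketch below writes down the hurdle Poisson–Lindley log-likelihood for a fixed-effects, intercept-only model and maximizes it with scipy; the random effects and the fully Bayesian Adaptive Metropolis estimation of the paper are omitted, and the toy data are invented. The pmf used is the standard one-parameter Poisson–Lindley form.

```python
import numpy as np
from scipy import optimize

def pl_pmf(x, theta):
    # One-parameter Poisson-Lindley pmf: P(X = x) = theta^2 (x + theta + 2) / (theta + 1)^(x + 3)
    return theta**2 * (x + theta + 2) / (theta + 1) ** (x + 3)

def hurdle_pl_negloglik(params, y):
    logit_pi0, log_theta = params
    pi0 = 1 / (1 + np.exp(-logit_pi0))        # probability of a zero (binary hurdle part)
    theta = np.exp(log_theta)
    zeros = y == 0
    p0 = pl_pmf(0, theta)
    ll_zero = np.log(pi0) * zeros.sum()
    # Positive counts come from the zero-truncated Poisson-Lindley distribution.
    ll_pos = np.sum(np.log(1 - pi0) + np.log(pl_pmf(y[~zeros], theta)) - np.log(1 - p0))
    return -(ll_zero + ll_pos)

rng = np.random.default_rng(10)
# Toy counts with excess zeros mixed with strictly positive counts.
y = np.where(rng.random(1000) < 0.4, 0, rng.poisson(3, size=1000) + 1)

res = optimize.minimize(hurdle_pl_negloglik, x0=[0.0, 0.0], args=(y,), method="Nelder-Mead")
pi0_hat = 1 / (1 + np.exp(-res.x[0]))
print(f"estimated P(zero) = {pi0_hat:.3f}, theta = {np.exp(res.x[1]):.3f}")
```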

20.
Starting from the Johnson SB distribution pioneered by Johnson (1949), we propose a broad class of distributions with bounded support based on the symmetric family of distributions. The new class provides a rich source of alternative distributions for analyzing univariate bounded data. A comprehensive account of the mathematical properties of the new family is provided. We briefly discuss estimation of the model parameters of the new class of distributions based on two estimation methods. Additionally, a new regression model is introduced by considering the distribution proposed in this article, which is useful for situations where the response is restricted to the standard unit interval and the regression structure involves regressors and unknown parameters. The regression model allows both location and dispersion effects to be modeled. We define two residuals for the proposed regression model to assess departures from model assumptions and to detect outlying observations, and we discuss some influence methods, such as local influence and generalized leverage. Finally, an application to real data is presented to show the usefulness of the new regression model.

