Similar Articles
Found 20 similar articles (search time: 15 ms)
1.
Huang L, Chen MH, Ibrahim JG. Biometrics, 2005, 61(3): 767-780
We propose Bayesian methods for estimating parameters in generalized linear models (GLMs) with nonignorably missing covariate data. We show that when improper uniform priors are used for the regression coefficients, φ, of the multinomial selection model for the missing data mechanism, the resulting joint posterior will always be improper if (i) all missing covariates are discrete and an intercept is included in the selection model for the missing data mechanism, or (ii) at least one of the covariates is continuous and unbounded. This impropriety will result regardless of whether proper or improper priors are specified for the regression parameters, β, of the GLM or the parameters, α, of the covariate distribution. To overcome this problem, we propose a novel class of proper priors for the regression coefficients, φ, in the selection model for the missing data mechanism. These priors are robust and computationally attractive in the sense that inferences about β are not sensitive to the choice of the hyperparameters of the prior for φ, and they facilitate a Gibbs sampling scheme that leads to accelerated convergence. In addition, we extend the model assessment criterion of Chen, Dey, and Ibrahim (2004a, Biometrika 91, 45-63), called the weighted L measure, to GLMs and missing data problems, as well as extend the deviance information criterion (DIC) of Spiegelhalter et al. (2002, Journal of the Royal Statistical Society B 64, 583-639) for assessing whether the missing data mechanism is ignorable or nonignorable. A novel Markov chain Monte Carlo sampling algorithm is also developed for carrying out posterior computation. Several simulations are given to investigate the performance of the proposed Bayesian criteria as well as the sensitivity of the prior specification. Real datasets from a melanoma cancer clinical trial and a liver cancer study are presented to further illustrate the proposed methods.
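For orientation, the basic DIC computation that the extension above builds on can be sketched in a few lines. This is a minimal, generic sketch assuming hypothetical placeholders `loglik` (a log-likelihood function) and posterior draws from an MCMC run; it is not the paper's extended criterion for missing-data mechanisms.

```python
import numpy as np

def dic(samples, loglik):
    """DIC = posterior mean deviance + effective number of parameters (p_D)."""
    dev = np.array([-2.0 * loglik(th) for th in samples])
    dev_bar = dev.mean()                          # posterior mean deviance
    dev_at_mean = -2.0 * loglik(samples.mean(axis=0))
    return dev_bar + (dev_bar - dev_at_mean)      # = dev_bar + p_D

# Toy check with a normal-mean model: theta = (mu,), sigma fixed at 1.
y = np.array([0.3, -0.1, 0.8, 0.4])
loglik = lambda th: -0.5 * np.sum((y - th[0]) ** 2 + np.log(2 * np.pi))
draws = np.random.default_rng(0).normal(y.mean(), 0.5, size=(2000, 1))
print(dic(draws, loglik))
```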

2.
The nested case–control (NCC) design is a popular sampling method in large epidemiological studies because it is a cost-effective way to investigate the temporal relationship of diseases with environmental exposures or biological precursors. Thomas' maximum partial likelihood estimator is commonly used to estimate the regression parameters in Cox's model for NCC data. In this article, we consider a situation in which failure/censoring information and some crude covariates are available for the entire cohort in addition to the NCC data, and we propose an improved estimator that is asymptotically more efficient than Thomas' estimator. We adopt a projection approach that, heretofore, has only been employed in situations of random validation sampling, and show that it can be well adapted to NCC designs, where the sampling scheme is a dynamic process and control selection is not independent. Under certain conditions, consistency and asymptotic normality of the proposed estimator are established, and a consistent variance estimator is also developed. Furthermore, a simplified approximate estimator is proposed when the disease is rare. Extensive simulations are conducted to evaluate the finite-sample performance of the proposed estimators and to compare their efficiency with Thomas' estimator and other competing estimators. Moreover, sensitivity analyses are conducted to demonstrate the behavior of the proposed estimator when model assumptions are violated, and we find that the biases are reasonably small in realistic situations. We further demonstrate the proposed method with data from studies on Wilms' tumor.
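For context, Thomas' estimator maximizes a conditional-logistic-style partial likelihood over each case's sampled risk set. Below is a minimal sketch on simulated stand-in data (the covariates carry no real effect, so the estimate is near zero); the improved projection estimator itself is not implemented here.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Hypothetical NCC data: for each case, the covariates of the case and of
# m controls sampled from its risk set (case stored first in each set).
n_cases, m, p = 100, 3, 2
Z = rng.normal(size=(n_cases, 1 + m, p))

def neg_thomas_loglik(beta, Z):
    """Negative of Thomas' partial likelihood for nested case-control data."""
    eta = Z @ beta                               # (n_cases, 1 + m) linear predictors
    return -np.sum(eta[:, 0] - np.log(np.exp(eta).sum(axis=1)))

beta_hat = minimize(neg_thomas_loglik, np.zeros(p), args=(Z,)).x
print(beta_hat)   # near zero: the simulated covariates carry no effect
```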

3.
Two-stage designs are a well-known cost-effective way of conducting biomedical studies when the exposure variable is expensive or difficult to measure. Recent research developments further allow one or both stages of the two-stage design to be outcome dependent on a continuous outcome variable. This outcome-dependent sampling feature enables further efficiency gains in parameter estimation and overall cost reduction of the study (e.g. Wang, X. and Zhou, H., 2010. Design and inference for cancer biomarker study with an outcome and auxiliary-dependent subsampling. Biometrics 66, 502-511; Zhou, H., Song, R., Wu, Y. and Qin, J., 2011. Statistical inference for a two-stage outcome-dependent sampling design with a continuous outcome. Biometrics 67, 194-202). In this paper, we develop a semiparametric mixed-effects regression model for data from a two-stage design where the second-stage data are sampled under an outcome-auxiliary-dependent sampling (OADS) scheme. Our method allows the cluster- or center-effects of the study subjects to be accounted for. We propose an estimated likelihood function to estimate the regression parameters. Simulation studies indicate that greater study efficiency gains can be achieved under the proposed two-stage OADS design with center-effects when compared with other alternative sampling schemes. We illustrate the proposed method by analyzing a dataset from the Collaborative Perinatal Project.

4.
We propose a method for estimating the clustering parameters in a Neyman-Scott Poisson process using Gaussian process regression. It is assumed that the underlying process has been observed within a number of quadrats, and from this sparse information the distribution is modelled as a Gaussian process. The clustering parameters are then estimated numerically by fitting to the covariance structure of the model. It is shown that the proposed method is resilient to any sampling regime. The method is applied to simulated two-dimensional clustered populations and the results are compared to a related method from the literature.
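As context for the quadrat-based fitting, the sketch below simulates a Neyman-Scott (Thomas) process and derives quadrat counts, the kind of sparse observations such a method starts from. The parameter values are invented, and the Gaussian-process covariance-fitting step itself is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_thomas(kappa, mu, sigma, size=1.0, rng=rng):
    """Neyman-Scott (Thomas) process: Poisson parents, Gaussian offspring."""
    parents = rng.uniform(0, size, (rng.poisson(kappa * size**2), 2))
    pts = [p + rng.normal(0, sigma, (rng.poisson(mu), 2)) for p in parents]
    return np.concatenate(pts) % size          # wrap offspring into the window

pts = simulate_thomas(kappa=25, mu=8, sigma=0.02)

# Quadrat counts: the sparse observations the GP model would be fitted to.
counts, _, _ = np.histogram2d(pts[:, 0], pts[:, 1], bins=10,
                              range=[[0, 1], [0, 1]])
print(counts.mean(), counts.var())             # variance >> mean signals clustering
```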

5.
Iwao’s mean crowding-mean density relation can be treated either as a linear function describing the biological characteristics of a species at a population level, or as a regression model fitted to empirical data (Iwao’s patchiness regression). In this latter form its parameters are commonly used to construct sampling plans for insect pests, which are characteristically patchily distributed or overdispersed. It is shown in this paper that modifying both the linear function and the statistical model to force the intercept, or lower functional limit, through the origin results in a more intuitive biological interpretation of the parameters and better sampling economy. Firstly, forcing the function through the origin ensures that zero crowding occurs when zero individuals occupy a patch. Secondly, it ensures that negative values of the intercept, which do not yield an intuitive biological interpretation, cannot arise. It is shown analytically that sequential sampling plans based on regression through the origin should be more efficient than plans based on conventional regression. For two overdispersed data sets, through-origin-based plans required a significantly smaller sample size during validation than plans based on conventional regression, but the improvement in sampling efficiency was not large enough to be of practical benefit. No difference in sample size was observed when through-origin and conventional regression-based plans were validated using underdispersed data. A field researcher wishing to adopt a through-origin form of Iwao’s regression for the biological reasons outlined above can therefore be confident that their sampling strategies will not be adversely affected by doing so.
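A minimal sketch of the two regression forms, using Lloyd's mean crowding m* = m + s²/m − 1 computed from simulated overdispersed quadrat counts (all values hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical overdispersed counts: 30 sampling occasions x 50 quadrats.
counts = rng.negative_binomial(2, 0.3, size=(30, 50))

m = counts.mean(axis=1)                        # mean density per occasion
v = counts.var(axis=1, ddof=1)
m_star = m + v / m - 1.0                       # Lloyd's mean crowding

# Conventional Iwao patchiness regression: m* = alpha + beta * m
X = np.column_stack([np.ones_like(m), m])
alpha, beta = np.linalg.lstsq(X, m_star, rcond=None)[0]

# Through-origin form: m* = beta0 * m (intercept forced to zero)
beta0 = (m @ m_star) / (m @ m)
print(alpha, beta, beta0)
```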

6.
In ecology, as in other research fields, efficient sampling for population estimation often drives sample designs toward unequal probability sampling, as in stratified sampling. Design-based statistical analysis tools are appropriate for seamless integration of the sample design into the statistical analysis. However, it is also common and necessary, after a sampling design has been implemented, to use datasets to address questions that, in many cases, were not considered during the sampling design phase. Questions may arise requiring the use of model-based statistical tools such as multiple regression, quantile regression, or regression tree analysis. Such model-based tools may require, to ensure unbiased estimation, data from simple random samples, which can be problematic when analyzing data from unequal probability designs. Despite numerous method-specific tools available to properly account for sampling design, too often in the analysis of ecological data the sample design is ignored and the consequences are not properly considered. We demonstrate here that violation of this assumption can lead to biased parameter estimates in ecological research. In addition to the set of tools available for researchers to properly account for sampling design in model-based analysis, we introduce inverse probability bootstrapping (IPB). Inverse probability bootstrapping is an easily implemented method for obtaining equal-probability re-samples from a probability sample, from which unbiased model-based estimates can be made. We demonstrate the potential for bias in model-based analyses that ignore sample inclusion probabilities, and the effectiveness of IPB sampling in eliminating this bias, using both simulated and actual ecological data. For illustration, we considered three model-based analysis tools: linear regression, quantile regression, and boosted regression tree analysis. In all models, using both simulated and actual ecological data, we found inferences to be biased, sometimes severely, when sample inclusion probabilities were ignored, while IPB sampling effectively produced unbiased parameter estimates.
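A minimal sketch of the IPB mechanism on simulated data, assuming the inclusion probability pi is known for each sampled unit (all names and values hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical population, then an outcome-dependent (unequal probability) sample.
N = 5000
x = rng.normal(size=N)
y = 2.0 + 1.5 * x + rng.normal(size=N)
pi = np.where(y > np.median(y), 0.6, 0.1)      # inclusion probabilities
take = rng.random(N) < pi
xs, ys, ps = x[take], y[take], pi[take]

def ols(x, y):
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0]

def ipb(x, y, pi, n_boot=500, rng=rng):
    """Inverse probability bootstrap: resample with weights 1/pi, fit, average."""
    w = 1.0 / pi
    w /= w.sum()
    fits = []
    for _ in range(n_boot):
        idx = rng.choice(len(y), size=len(y), replace=True, p=w)
        fits.append(ols(x[idx], y[idx]))
    return np.mean(fits, axis=0)

print(ols(xs, ys))       # naive fit: biased by the outcome-dependent design
print(ipb(xs, ys, ps))   # IPB fit: close to the true (2.0, 1.5)
```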

7.
Ibrahim JG, Chen MH, Lipsitz SR. Biometrics, 1999, 55(2): 591-596
We propose a method for estimating parameters for general parametric regression models with an arbitrary number of missing covariates. We allow any pattern of missing data and assume that the missing data mechanism is ignorable throughout. When the missing covariates are categorical, a useful technique for obtaining parameter estimates is the EM algorithm by the method of weights proposed in Ibrahim (1990, Journal of the American Statistical Association 85, 765-769). We extend this method to continuous or mixed categorical and continuous covariates, and for arbitrary parametric regression models, by adapting a Monte Carlo version of the EM algorithm as discussed by Wei and Tanner (1990, Journal of the American Statistical Association 85, 699-704). In addition, we discuss the Gibbs sampler for sampling from the conditional distribution of the missing covariates given the observed data and show that the appropriate complete conditionals are log-concave. The log-concavity property of the conditional distributions will facilitate a straightforward implementation of the Gibbs sampler via the adaptive rejection algorithm of Gilks and Wild (1992, Applied Statistics 41, 337-348). We assume the model for the response given the covariates is an arbitrary parametric regression model, such as a generalized linear model, a parametric survival model, or a nonlinear model. We model the marginal distribution of the covariates as a product of one-dimensional conditional distributions. This allows us a great deal of flexibility in modeling the distribution of the covariates and reduces the number of nuisance parameters that are introduced in the E-step. We present examples involving both simulated and real data.
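A minimal Monte Carlo EM sketch in the spirit of the method of weights, for the simplest case of a linear model with one normally distributed covariate missing at random. Here the conditional distribution of the missing covariate is normal in closed form, so no Gibbs/adaptive-rejection step is needed; all names and values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m_draws = 300, 20
x = rng.normal(1.0, 1.0, n)
y = 0.5 + 1.2 * x + rng.normal(0, 0.5, n)
miss = rng.random(n) < 0.4                     # MCAR missingness in x
x_obs = np.where(miss, np.nan, x)

b0, b1, sigma2, mu, tau2 = 0.0, 0.0, 1.0, 0.0, 1.0
for _ in range(50):                            # Monte Carlo EM iterations
    # E-step: draw missing x from its conditional given y (normal-normal model).
    v = 1.0 / (b1**2 / sigma2 + 1.0 / tau2)
    mean = v * (b1 * (y[miss] - b0) / sigma2 + mu / tau2)
    draws = rng.normal(mean, np.sqrt(v), (m_draws, miss.sum()))

    # Augmented data: observed rows once, each draw carrying weight 1/m_draws.
    x_aug = np.concatenate([x_obs[~miss], draws.ravel()])
    y_aug = np.concatenate([y[~miss], np.tile(y[miss], m_draws)])
    w = np.concatenate([np.ones((~miss).sum()),
                        np.full(m_draws * miss.sum(), 1.0 / m_draws)])

    # M-step: weighted least squares and weighted moment updates.
    X = np.column_stack([np.ones_like(x_aug), x_aug])
    b0, b1 = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * y_aug))
    r = y_aug - b0 - b1 * x_aug
    sigma2 = (w * r**2).sum() / w.sum()
    mu = (w * x_aug).sum() / w.sum()
    tau2 = (w * (x_aug - mu)**2).sum() / w.sum()

print(b0, b1)                                  # roughly recovers (0.5, 1.2)
```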

8.
Proportional hazards regression for cancer studies
Ghosh D. Biometrics, 2008, 64(1): 141-148
There has been some recent work in the statistical literature on modeling the relationship between the size of cancers and the probability of detecting metastasis, i.e., aggressive disease. Methods for assessing covariate effects in these studies are limited. In this article, we formulate the problem as assessing covariate effects on a right-censored variable subject to two types of sampling bias. The first is the length-biased sampling that is inherent in screening studies; the second is the two-phase design in which only a fraction of tumors are measured. We construct estimation procedures for the proportional hazards model that account for these two sampling issues. In addition, a Nelson–Aalen-type estimator is proposed as a summary statistic. Asymptotic results for the regression methodology are provided. The methods are illustrated by application to data from an observational cancer study as well as to simulated data.
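To see why length-biased sampling matters, the sketch below draws a size-biased sample and corrects the naive mean with inverse-size weights (the harmonic-mean estimator). This illustrates the bias mechanism only, not the paper's proportional hazards procedure; all values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
size = rng.lognormal(0.0, 0.5, 100_000)        # hypothetical tumour sizes
p = size / size.sum()
sample = rng.choice(size, 5_000, p=p)          # detection prob. proportional to size

naive = sample.mean()                          # inflated by length bias
corrected = len(sample) / np.sum(1.0 / sample) # harmonic mean undoes the bias
print(size.mean(), naive, corrected)
```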

9.
Case–cohort sampling is a commonly used and efficient method for studying large cohorts. Most existing methods of analysis for case–cohort data have concerned the analysis of univariate failure time data. However, clustered failure time data are commonly encountered in public health studies; for example, patients treated at the same center are unlikely to be independent. In this article, we consider methods based on estimating equations for case–cohort designs with clustered failure time data. We assume a marginal hazards model, with a common baseline hazard and a common regression coefficient across clusters. The proposed estimators of the regression parameter and cumulative baseline hazard are shown to be consistent and asymptotically normal, and consistent estimators of the asymptotic covariance matrices are derived. The regression parameter estimator is easily computed using any standard Cox regression software that allows for offset terms. The proposed estimators are investigated in simulation studies and demonstrated empirically to have increased efficiency relative to some existing methods. The proposed methods are applied to a study of mortality among Canadian dialysis patients.
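A minimal sketch of the offset idea for a plain (non-clustered) case-cohort sample, using statsmodels' PHReg: giving subcohort non-cases an offset of log(1/α) inflates their risk-set contribution by the inverse sampling fraction. This is a generic case-cohort device under assumed values, not the paper's clustered estimator.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Hypothetical cohort with one covariate and exponential event times.
n = 2000
x = rng.normal(size=n)
t = rng.exponential(np.exp(-0.7 * x))
c = rng.exponential(1.0, n)
y, status = np.minimum(t, c), (t <= c).astype(float)

# Case-cohort selection: all cases plus a random subcohort of fraction alpha.
alpha = 0.15
keep = (status == 1) | (rng.random(n) < alpha)

# Offset log(1/alpha) up-weights subcohort non-cases within the risk sets.
offset = np.where(status[keep] == 1, 0.0, np.log(1.0 / alpha))
mod = sm.PHReg(y[keep], x[keep][:, None], status=status[keep], offset=offset)
print(mod.fit().params)   # roughly recovers the true log-hazard ratio of -0.7
```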

10.
Purpose: Four-dimensional computed tomography (4D-CT) plays a useful role in many clinical situations. However, due to hardware limitations, dense sampling along the superior-inferior direction is often not practical. In this paper, we develop a novel multiple Gaussian process regression model to enhance the superior-inferior resolution of lung 4D-CT based on transversal structures.
Methods: The proposed strategy is based on the observation that high-resolution transversal images can recover missing pixels in the superior-inferior direction. Based on this observation, and motivated by the random forest algorithm, we employ a multiple Gaussian process regression model learned from transversal images to improve superior-inferior resolution. Specifically, we first randomly sample 3 × 3 patches from the original transversal images. The central pixel of each patch and the eight-neighbour pixels of its corresponding degraded version form the label and input of the training data, respectively. The multiple Gaussian process regression model is then built on multiple training subsets obtained by random sampling. Finally, the central pixel of each patch is estimated by the proposed model, with the eight-neighbour pixels of each 3 × 3 patch from the interpolated superior-inferior direction images as inputs.
Results: The performance of our method is extensively evaluated using simulated and publicly available datasets. Our experiments show the remarkable performance of the proposed method.
Conclusions: In this paper, we propose a new approach to improve 4D-CT resolution which does not require any external data or hardware support and can produce clear coronal/sagittal images for easy viewing.
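A minimal sketch of the patch-based training idea with scikit-learn's GaussianProcessRegressor: several GPs trained on random patch subsets, with predictions averaged. A random array stands in for a transversal slice, and the degradation/interpolation steps are skipped, so this shows the data flow only.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def sample_patches(img, n, rng):
    """Random 3x3 patches: 8-neighbour ring as input, centre pixel as label."""
    X, y = [], []
    for _ in range(n):
        i = rng.integers(1, img.shape[0] - 1)
        j = rng.integers(1, img.shape[1] - 1)
        patch = img[i - 1:i + 2, j - 1:j + 2].ravel()
        X.append(np.delete(patch, 4))          # the 8 neighbours
        y.append(patch[4])                     # the centre pixel
    return np.array(X), np.array(y)

img = np.random.default_rng(0).random((64, 64))   # stand-in transversal slice

# "Multiple GP" idea: train several GPs on random patch subsets, average them.
models = []
for seed in range(5):
    X, y = sample_patches(img, 200, np.random.default_rng(seed))
    models.append(GaussianProcessRegressor(RBF(1.0) + WhiteKernel(1e-3)).fit(X, y))

X_new, _ = sample_patches(img, 10, np.random.default_rng(99))
pred = np.mean([gp.predict(X_new) for gp in models], axis=0)
print(pred.round(3))
```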

11.
Two-phase designs can reduce the cost of epidemiological studies by limiting the ascertainment of expensive covariates and/or exposures to an efficiently selected subset (phase II) of a larger (phase I) study. Efficient analysis of the resulting data set, combining disparate information from phase I and phase II, can however be complex. Most existing methods, including the semiparametric maximum-likelihood estimator, require the information in phase I to be summarized into a fixed number of strata. In this paper, we describe a novel method for the analysis of two-phase studies where the information from phase I is summarized by parameters associated with a reduced logistic regression model of the disease outcome on the available covariates. We then set up estimating equations for the parameters associated with the desired extended logistic regression model, based on the reduced model parameters from phase I and the complete data available at phase II, after accounting for the nonrandom sampling design. We use the generalized method of moments to solve the overidentified estimating equations and develop the resulting asymptotic theory for the proposed estimator. Simulation studies show that the use of reduced parametric models, as opposed to summarizing data into strata, can lead to more efficient utilization of phase-I data. An application of the proposed method is illustrated using data from the U.S. National Wilms Tumor Study.
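A minimal generalized-method-of-moments sketch: three moment conditions for two parameters, solved by minimizing a quadratic form, as one would for the combined phase-I/phase-II estimating equations. The moments here are generic stand-ins, not the paper's equations.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = rng.normal(1.0, 2.0, 1000)

def g(theta, x):
    """Three moment conditions for two parameters (mu, sigma2): overidentified."""
    mu, s2 = theta
    return np.array([np.mean(x - mu),
                     np.mean((x - mu)**2 - s2),
                     np.mean(x**2 - (mu**2 + s2))])

def gmm_objective(theta, x, W):
    gv = g(theta, x)
    return gv @ W @ gv                          # quadratic form in the moments

W = np.eye(3)                                   # identity weighting matrix
theta_hat = minimize(gmm_objective, np.array([0.0, 1.0]), args=(x, W)).x
print(theta_hat)                                # approx (1.0, 4.0)
```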

12.
In this article we study the relationship between virologic and immunologic responses in AIDS clinical trials. Since plasma HIV RNA copies (viral load) and CD4+ cell counts are crucial virologic and immunologic markers for HIV infection, it is important to study their relationship during HIV/AIDS treatment. We propose a mixed-effects varying-coefficient model based on an exploratory analysis of data from a clinical trial. Because both viral load and CD4+ cell counts are subject to measurement error, we also consider the problem of measurement error in covariates in our model. The regression spline method is proposed for inference on the parameters of the proposed model. The regression spline method transforms the unknown nonparametric components into parametric functions; it is relatively simple to implement using readily available software, and parameter inference can be developed from standard parametric models. We apply the proposed models and methods to an AIDS clinical study. From this study, we find an interesting relationship between viral load and CD4+ cell counts during antiviral treatment. Biological interpretations and clinical implications are discussed.
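A minimal sketch of the regression-spline device for a varying-coefficient model: each unknown coefficient function is expanded in a spline basis, turning the fit into ordinary least squares. A truncated power basis and simulated data are used here; the mixed-effects and measurement-error components are omitted.

```python
import numpy as np

def tp_basis(t, knots, degree=2):
    """Truncated power spline basis: [1, t, t^2, (t - k)^2_+, ...]."""
    cols = [t**d for d in range(degree + 1)]
    cols += [np.clip(t - k, 0, None)**degree for k in knots]
    return np.column_stack(cols)

rng = np.random.default_rng(0)
t = rng.uniform(0, 1, 300)                 # measurement times
x = rng.normal(size=300)                   # e.g. a standardised marker
y = np.sin(2 * np.pi * t) + (1 + t) * x + rng.normal(scale=0.3, size=300)

knots = [0.25, 0.5, 0.75]
B = tp_basis(t, knots)
# Varying-coefficient design: basis columns for beta0(t) and for beta1(t) * x.
X = np.hstack([B, B * x[:, None]])
gamma = np.linalg.lstsq(X, y, rcond=None)[0]

# Evaluate the fitted time-varying coefficient on x at a few time points.
beta1_hat = tp_basis(np.linspace(0, 1, 5), knots) @ gamma[B.shape[1]:]
print(beta1_hat)                           # tracks the true 1 + t
```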

13.
Henderson's mixed model equations system is generally required in a Gibbs sampling application. In two previous studies, we proposed two indirect solving approaches that give dominance values in an animal model context without processing this entire system. The first does not require D⁻¹, and the second is based on processing the additive animal model residuals. In the present work, we show that these two methods can be handled iteratively. Since the Bayesian approach is now a widely used tool in the estimation of genetic parameters, the main part of this work is devoted to a Gibbs sampling application that can be accelerated by means of the aforementioned indirect solving methods. Three replicates of a population data set are simulated to compare the applications and estimates. The results show that the estimates obtained by implementing a Gibbs sampler with each of the two suggested solving methods require less computational time and are comparable to those given by considering the full system, particularly when the priors carry more weight.
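For readers unfamiliar with the machinery, the sketch below is a Gibbs sampler for a deliberately simplified one-way random effects model (vague priors, no relationship matrix and no Henderson equations), just to show the conditional-update structure such applications accelerate. All values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
q, reps = 50, 5                                # 50 groups, 5 records each
u_true = rng.normal(0, np.sqrt(0.3), q)
grp = np.repeat(np.arange(q), reps)
y = 1.0 + u_true[grp] + rng.normal(0, np.sqrt(0.7), q * reps)

mu, u, s2u, s2e = 0.0, np.zeros(q), 1.0, 1.0
keep = []
for it in range(2000):
    # mu | rest: normal full conditional
    mu = rng.normal((y - u[grp]).mean(), np.sqrt(s2e / len(y)))
    # u_i | rest: conjugate normal updates
    v = 1.0 / (reps / s2e + 1.0 / s2u)
    for i in range(q):
        u[i] = rng.normal(v * (y[grp == i] - mu).sum() / s2e, np.sqrt(v))
    # variance components | rest: scaled inverse chi-square draws (vague priors)
    s2u = (u @ u) / rng.chisquare(q)
    e = y - mu - u[grp]
    s2e = (e @ e) / rng.chisquare(len(y))
    if it >= 500:                              # discard burn-in
        keep.append((s2u, s2e))
print(np.mean(keep, axis=0))                   # posterior means near (0.3, 0.7)
```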

14.
In this study we present two novel normalization schemes for cDNA microarrays. They are based on iterative local regression and optimization of model parameters by generalized cross-validation. Permutation tests assessing the efficiency of normalization demonstrated that the proposed schemes have an improved ability to remove systematic errors and to reduce variability in microarray data. The analysis also reveals that without parameter optimization local regression is frequently insufficient to remove systematic errors in microarray data.
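A minimal sketch of iterative local-regression normalization on simulated MA data: the intensity-dependent bias in the log-ratios M is estimated by lowess on the average log-intensity A and subtracted, repeated a few times. The GCV-based parameter optimization described above is omitted; the smoothing fraction here is an arbitrary choice.

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(0)
# Hypothetical MA data with an intensity-dependent dye bias.
A = rng.uniform(6, 14, 2000)                        # average log-intensity
M = 0.4 * np.sin(A / 2) + rng.normal(0, 0.3, 2000)  # biased log-ratio

def local_reg_normalize(M, A, n_iter=3, frac=0.3):
    """Iteratively subtract a local regression fit of M on A."""
    for _ in range(n_iter):
        M = M - lowess(M, A, frac=frac, return_sorted=False)
    return M

M_norm = local_reg_normalize(M, A)
print(M.std(), M_norm.std())                        # reduced systematic variation
```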

15.
With the increasing use of survival models in animal breeding to address the genetic aspects of longevity, and also of disease traits, in livestock, methods to infer genetic correlations and to perform multivariate evaluations of survival traits together with other types of traits have become increasingly important. In this study we derived and implemented a bivariate quantitative genetic model for a linear Gaussian trait and a survival trait that are genetically and environmentally correlated. For the survival trait, we considered the Weibull log-normal animal frailty model. A Bayesian approach using Gibbs sampling was adopted, with model parameters inferred from their marginal posterior distributions. The required fully conditional posterior distributions were derived, and implementation issues are discussed. The two Weibull baseline parameters were updated jointly using a Metropolis-Hastings step. The remaining model parameters, whose fully conditional distributions are non-normalized, were updated univariately using adaptive rejection sampling. Simulation results showed that the estimated marginal posterior distributions covered the true parameter values used in the simulation well and placed high density on them. In conclusion, the proposed method allows inferring additive genetic and environmental correlations, and performing multivariate genetic evaluation of a linear Gaussian trait and a survival trait.
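A minimal sketch of a joint Metropolis-Hastings update for the two Weibull baseline parameters, on complete (uncensored) data with a random-walk proposal on the log scale. The frailty, censoring, and genetic components are omitted, and the step size is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)
t = 2.0 * rng.weibull(1.5, 300)                # data: shape 1.5, scale 2.0

def loglik(shape, scale, t):
    """Weibull log-likelihood for uncensored observations."""
    return np.sum(np.log(shape / scale) + (shape - 1) * np.log(t / scale)
                  - (t / scale) ** shape)

theta = np.log([1.0, 1.0])                     # (log shape, log scale)
samples = []
for _ in range(5000):
    prop = theta + rng.normal(0, 0.05, 2)      # joint random-walk proposal
    if np.log(rng.random()) < loglik(*np.exp(prop), t) - loglik(*np.exp(theta), t):
        theta = prop
    samples.append(np.exp(theta))
print(np.mean(samples[1000:], axis=0))         # near (1.5, 2.0)
```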

16.
Optimal sampling in retrospective logistic regression via two-stage method
Case-control sampling is popular in epidemiological research because it saves cost and time. In a logistic regression model, with limited a priori knowledge of the covariance matrix of the point estimator of the regression coefficients, no fixed-sample-size analysis exists. In this study, we propose a two-stage sequential analysis in which the optimal sample fraction and the sample size required to achieve a predetermined volume of a joint confidence set are estimated in an interim analysis. The additionally required observations are collected in the second stage according to the estimated optimal sample fraction. At the end of the experiment, the data from the two stages are combined and analyzed for statistical inference. Simulation studies are conducted to justify the proposed two-stage procedure, and an example is presented for illustration. The proposed two-stage procedure is found to perform adequately, in the sense that the resultant joint confidence set has a well-controlled volume and achieves the required coverage probability. Furthermore, the optimal sample fractions are close to one across all the selected scenarios; hence, the procedure can be simplified by always using a balanced design.
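A minimal sketch of the interim sample-size step, under the simplifying assumption that the coefficient covariance shrinks like 1/n: fit the stage-1 logistic model, compute the volume of the joint Wald confidence ellipsoid, and scale n to hit a predetermined volume. The optimal-sample-fraction estimation is omitted, and all values are hypothetical.

```python
import numpy as np
from scipy.special import gamma as gamma_fn
from scipy.stats import chi2
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Stage 1: an interim sample (logistic model with one covariate).
n1 = 200
x = rng.normal(size=n1)
ybin = rng.binomial(1, 1.0 / (1.0 + np.exp(0.5 - x)))

fit = sm.Logit(ybin, sm.add_constant(x)).fit(disp=0)
cov = np.asarray(fit.cov_params())
k = cov.shape[0]

def conf_set_volume(cov, level=0.95):
    """Volume of the joint (1 - level) Wald confidence ellipsoid."""
    r2 = chi2.ppf(level, df=k)
    return np.pi ** (k / 2) / gamma_fn(k / 2 + 1) * np.sqrt(np.linalg.det(r2 * cov))

v1 = conf_set_volume(cov)
v_target = 0.05                               # predetermined volume
# cov shrinks like 1/n, so the volume shrinks like n**(-k/2); solve for total n.
n_total = int(np.ceil(n1 * (v1 / v_target) ** (2 / k)))
print(v1, n_total)                            # collect n_total - n1 more at stage 2
```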

17.
Lin JS, Wei LJ. Biometrics, 1992, 48(3): 679-681
In this note we consider the problem of drawing inference about the regression parameters in a linear model with survival data. A simple procedure based on the Buckley-James (1979, Biometrika 66, 429-436) estimating equation is proposed and illustrated with an example.
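A minimal sketch of the Buckley-James iteration the estimating equation builds on: censored responses are replaced by their conditional expectations under a Kaplan-Meier estimate of the residual distribution, and least squares is refitted until convergence. Simulated data and a simple convention for the largest censored residual are used.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
t = 1.0 + 0.8 * x + rng.normal(size=n)         # true (log) survival times
c = rng.normal(2.0, 1.5, n)                    # censoring times
y = np.minimum(t, c)
delta = t <= c                                 # event indicator

def km_weights(e, delta):
    """Kaplan-Meier jump masses of the residual distribution."""
    order = np.argsort(e, kind="stable")
    d = delta[order].astype(float)
    at_risk = len(e) - np.arange(len(e))
    surv = np.cumprod(1.0 - d / at_risk)
    prev = np.concatenate([[1.0], surv[:-1]])
    w = np.zeros(len(e))
    w[order] = prev - surv                     # mass only at uncensored points
    return w

X = np.column_stack([np.ones(n), x])
beta = np.zeros(2)
for _ in range(20):                            # Buckley-James iteration
    e = y - X @ beta
    w = km_weights(e, delta)
    S = np.array([w[e > ei].sum() for ei in e])            # P(residual > e_i)
    num = np.array([(w[e > ei] * e[e > ei]).sum() for ei in e])
    tail = np.divide(num, S, out=np.zeros_like(S), where=S > 0)
    # Censored responses become X*beta + E[residual | residual > e_i];
    # censored points beyond the last event are left unchanged.
    y_star = np.where(delta, y, np.where(S > 0, X @ beta + tail, y))
    beta_new = np.linalg.lstsq(X, y_star, rcond=None)[0]
    if np.allclose(beta_new, beta, atol=1e-6):
        break
    beta = beta_new
print(beta)                                    # roughly (1.0, 0.8)
```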

18.
This article introduces a new approach to constructing a risk model for the prediction of traumatic brain injury (TBI) resulting from a car crash. The probability of TBI is assessed through the fusion of an experiment-based logistic regression risk model and a finite element (FE) simulation-based risk model. The proposed approach uses a multilevel framework that includes FE simulations of vehicle crashes with a dummy and FE simulations of the human brain. The loading conditions derived from the crash simulations are transferred to the brain model, allowing the calculation of injury metrics such as the Cumulative Strain Damage Measure (CSDM). The framework is used to propagate uncertainties and obtain probabilities of TBI based on the CSDM injury metric. The risk model from FE simulations is constructed from a support vector machine classifier, adaptive sampling, and Monte Carlo simulations. An approach to computing the total probability of TBI, which combines the FE-based risk assessment with the risk prediction from the experiment-based logistic regression model, is proposed. In contrast to previously published work, the proposed methodology includes the uncertainty of explicit parameters such as impact conditions (e.g., velocity, impact angle) and the material properties of the brain model. This risk model can provide, for instance, the probability of TBI for a given assumed crash impact velocity.
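A minimal sketch of the simulation-based half of such a pipeline: an SVM classifier trained on hypothetical FE outcomes, Monte Carlo propagation of uncertain impact conditions, and a naive average as a stand-in fusion rule. All data, thresholds, coefficients, and fusion weights are invented for illustration and are not the paper's.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical FE-simulation results: impact conditions -> injurious CSDM or not.
velocity = rng.uniform(20, 80, 300)            # km/h (invented range)
angle = rng.uniform(0, 45, 300)                # degrees (invented range)
X = np.column_stack([velocity, angle])
injury = 0.01 * velocity + 0.02 * angle + rng.normal(0, 0.2, 300) > 0.9

clf = SVC(probability=True).fit(X, injury)     # surrogate for the FE risk model

# Monte Carlo propagation of uncertain impact conditions for one crash scenario.
mc = np.column_stack([rng.normal(60, 5, 10_000),    # uncertain velocity
                      rng.normal(15, 3, 10_000)])   # uncertain impact angle
p_fe = clf.predict_proba(mc)[:, 1].mean()

# Stand-in fusion with an experiment-based logistic model (weights invented).
p_logit = 1.0 / (1.0 + np.exp(-(-6.0 + 0.09 * 60)))
print(0.5 * p_fe + 0.5 * p_logit)
```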

19.
There has been a great deal of recent interest in modeling right-censored clustered survival time data with a possible fraction of cured subjects, who are nonsusceptible to the event of interest, using marginal mixture cure models. In this paper, we consider a semiparametric marginal mixture cure model for such data and propose to extend an existing generalized estimating equation approach with a new unbiased estimating equation for the regression parameters in the latency part of the model. The large-sample properties of the regression effect estimators in both the incidence and the latency parts are established. The finite-sample properties of the estimators are studied in simulation studies. The proposed method is illustrated with bone marrow transplantation data and tonsil cancer data.

20.
Bottom trawl survey data are commonly used to assess the spatial distribution of commercial species. However, this sampling technique does not always detect a species even when it is present, which can create significant limitations when fitting species distribution models. In this study, we test the relevance of a mixed methodological approach that combines presence-only and presence-absence distribution models. We illustrate this approach using bottom trawl survey data to model the spatial distributions of 27 commercially targeted marine species. We use an environmentally and geographically weighted method to simulate pseudo-absence data. The species distributions are modelled using regression kriging, a technique that explicitly incorporates spatial dependence into predictions. Model outputs are then used to identify areas that meet the conservation targets for the deployment of artificial anti-trawling reefs. To achieve this, we propose a fuzzy logic framework that accounts for the uncertainty associated with the different model predictions. For each species, the predictive accuracy of the model is classified as ‘high’, with better results observed when a large number of occurrences are used to develop the model. The map resulting from the fuzzy overlay shows that three main areas have a high level of agreement with the conservation criteria. These results align with expert opinion, confirming the relevance of the proposed methodology.
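A minimal sketch of a fuzzy overlay across per-species suitability maps, using the standard fuzzy AND/OR/gamma operators on a common grid. The maps here are random stand-ins for the regression-kriging predictions, and the gamma value and quantile cut-off are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-ins for 27 per-species suitability maps rescaled to [0, 1] memberships.
maps = rng.random((27, 100, 100))

def fuzzy_overlay(maps, kind="gamma", gamma=0.9):
    """Fuzzy AND (min), OR (max), or gamma (mix of algebraic product and sum)."""
    if kind == "and":
        return maps.min(axis=0)
    if kind == "or":
        return maps.max(axis=0)
    f_prod = maps.prod(axis=0)                  # fuzzy algebraic product
    f_sum = 1.0 - (1.0 - maps).prod(axis=0)     # fuzzy algebraic sum
    return f_sum ** gamma * f_prod ** (1.0 - gamma)

agreement = fuzzy_overlay(maps)
# Candidate anti-trawling reef sites: cells with high multi-species agreement.
print((agreement > np.quantile(agreement, 0.95)).sum())
```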
