Similar Articles
20 similar articles found (search time: 31 ms)
1.
The joint analysis of spatial and temporal processes poses computational challenges due to the data's high dimensionality. Furthermore, such data are commonly non‐Gaussian. In this paper, we introduce a copula‐based spatiotemporal model for analyzing spatiotemporal data and propose a semiparametric estimator. The proposed algorithm is computationally simple, since it models the marginal distribution and the spatiotemporal dependence separately. Instead of assuming a parametric distribution, the proposed method models the marginal distributions nonparametrically and thus offers more flexibility. The method also provides a convenient way to construct both point and interval predictions at new times and locations, based on the estimated conditional quantiles. Through a simulation study and an analysis of wind speeds observed along the border between Oregon and Washington, we show that our method produces more accurate point and interval predictions for skewed data than those based on normality assumptions.
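The separate treatment of margins and dependence can be illustrated with a minimal one-site sketch: an empirical (nonparametric) margin, a Gaussian copula for the lag-1 temporal dependence, and conditional quantiles mapped back to the original scale. The gamma data and all names here are illustrative, not the paper's model or data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical skewed "wind speed" series at one site (illustrative only).
x = rng.gamma(shape=2.0, scale=3.0, size=500)

# Step 1: nonparametric margin -- empirical CDF mapped to uniforms.
ranks = stats.rankdata(x) / (len(x) + 1.0)      # scaled to avoid exact 0/1
z = stats.norm.ppf(ranks)                       # Gaussian copula scores

# Step 2: model dependence on the z-scale (here: lag-1 autocorrelation).
rho = np.corrcoef(z[:-1], z[1:])[0, 1]

# Step 3: conditional quantile prediction, inverted back to the data scale.
def predict_quantile(z_prev, q):
    """q-th conditional quantile of x_t given the previous z-score."""
    z_q = rho * z_prev + np.sqrt(1 - rho**2) * stats.norm.ppf(q)
    u_q = stats.norm.cdf(z_q)
    return np.quantile(x, u_q)                  # invert the empirical margin

# A 90% interval prediction for the next observation:
lo, hi = predict_quantile(z[-1], 0.05), predict_quantile(z[-1], 0.95)
```

Because the interval endpoints come from the empirical quantiles of the skewed margin, they need not be symmetric around the point prediction, which is the flexibility the abstract contrasts with normality-based intervals.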

2.
Gene co‐expressions have been widely used in the analysis of microarray gene expression data. However, the co‐expression patterns between two genes can be mediated by cellular states, as reflected by expression of other genes, single nucleotide polymorphisms, and activity of protein kinases. In this article, we introduce a bivariate conditional normal model for identifying the variables that can mediate the co‐expression patterns between two genes. Based on this model, we introduce a likelihood ratio (LR) test and a penalized likelihood procedure for identifying the mediators that affect gene co‐expression patterns. We propose an efficient computational algorithm based on iterative reweighted least squares and cyclic coordinate descent, and show that when the tuning parameter in the penalized likelihood is appropriately selected, the procedure has the oracle property in selecting the variables. We present simulation results comparing with existing methods, showing that the LR‐based approach performs as well as or better than the existing liquid association method and that the penalized likelihood procedure can be quite effective in selecting the mediators. We apply the proposed method to yeast gene expression data in order to identify the kinases or single nucleotide polymorphisms that mediate the co‐expression patterns between genes.
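The paper's bivariate conditional normal model handles continuous mediators and penalized selection; as a reduced sketch of the LR idea only, the following tests whether a binary variable changes the correlation between two "genes" (a correlation-only comparison after within-group standardization, df = 1; all data are simulated and the function names are mine, not the paper's).

```python
import numpy as np
from scipy import stats

def corr_loglik(u, v, r):
    """Log-likelihood of standardized bivariate normal data with correlation r."""
    n = len(u)
    return (-n * np.log(2 * np.pi) - 0.5 * n * np.log(1 - r**2)
            - np.sum(u**2 - 2 * r * u * v + v**2) / (2 * (1 - r**2)))

def lr_test_mediated_corr(x, y, m):
    """LR test: does the binary mediator m change the x-y correlation?"""
    z = lambda a: (a - a.mean()) / a.std()
    ll_alt, groups = 0.0, []
    for g in (0, 1):
        u, v = z(x[m == g]), z(y[m == g])
        ll_alt += corr_loglik(u, v, np.corrcoef(u, v)[0, 1])  # per-group MLE
        groups.append((u, v))
    u0 = np.concatenate([p[0] for p in groups])
    v0 = np.concatenate([p[1] for p in groups])
    ll_null = corr_loglik(u0, v0, np.corrcoef(u0, v0)[0, 1])  # common correlation
    lr = 2 * (ll_alt - ll_null)
    return lr, stats.chi2.sf(lr, df=1)

rng = np.random.default_rng(1)
n, m = 400, np.repeat([0, 1], 200)
x = rng.normal(size=n)
# Correlation ~0.8 when m = 1, none when m = 0.
y = np.where(m == 1, 0.8 * x + 0.6 * rng.normal(size=n), rng.normal(size=n))
lr, p = lr_test_mediated_corr(x, y, m)
```

With correlation present only in one group, the LR statistic is large and the small p-value flags the binary variable as a mediator of the co-expression pattern.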

3.
Chen HY, Xie H, Qian Y. Biometrics 2011, 67(3):799-809
Multiple imputation is a practically useful approach to handling incompletely observed data in statistical analysis. Parameter estimation and inference based on imputed full data have been made easy by Rubin's rule for result combination. However, creating proper imputations that accommodate flexible models for statistical analysis in practice can be very challenging. We propose an imputation framework that uses conditional semiparametric odds ratio models to impute the missing values. The proposed imputation framework is more flexible and robust than the imputation approach based on the normal model. In contrast to the approach based on fully conditionally specified models, it is a compatible framework: the conditional models correspond to a well‐defined joint distribution. The proposed algorithms for multiple imputation through Markov chain Monte Carlo sampling can be carried out straightforwardly. Simulation studies demonstrate that the proposed approach performs better than existing, commonly used imputation approaches. The proposed approach is applied to imputing missing values in bone fracture data.

4.
Jing Qin, Yu Shen. Biometrics 2010, 66(2):382-392
Length‐biased time‐to‐event data are commonly encountered in applications ranging from epidemiological cohort studies and cancer prevention trials to studies in labor economics. A longstanding statistical problem is how to assess the association of risk factors with survival in the target population given the observed length‐biased data. In this article, we demonstrate how to estimate these effects under the semiparametric Cox proportional hazards model. The structure of the Cox model is changed under length‐biased sampling in general. Although the existing partial likelihood approach for left‐truncated data can be used to estimate covariate effects, it may not be efficient for analyzing length‐biased data. We propose two estimating equation approaches for estimating the covariate coefficients under the Cox model. We use modern stochastic process and martingale theory to develop the asymptotic properties of the estimators. We evaluate the empirical performance and efficiency of the two methods through extensive simulation studies. We use data from a dementia study to illustrate the proposed methodology, and demonstrate the computational algorithms for point estimates, which can be directly linked to existing functions in S‐PLUS or R.

5.
Zhang J, Peng Y, Zhao O. Biometrics 2011, 67(4):1352-1360
The accelerated hazard model was proposed more than a decade ago. However, its application is still very limited, partly due to the complexity of the existing semiparametric estimation method. We propose a new semiparametric estimation method based on a kernel-smoothed approximation to the limit of a profile likelihood function of the model. The method leads to smooth estimating equations and is easy to use. The estimates from the method are proved to be consistent and asymptotically normal. Our numerical study shows that the new method is more efficient than the existing method. The proposed method is employed to reanalyze data from a brain tumor treatment study.

6.
Variable Selection for Semiparametric Mixed Models in Longitudinal Studies
We propose a double-penalized likelihood approach for simultaneous model selection and estimation in semiparametric mixed models for longitudinal data. Two types of penalties are jointly imposed on the ordinary log-likelihood: a roughness penalty on the nonparametric baseline function and a nonconcave shrinkage penalty on the linear coefficients to achieve model sparsity. Compared to existing estimating equation based approaches, our procedure provides valid inference for data that are missing at random, and is more efficient if the specified model is correct. Another advantage of the new procedure is its easy computation for both regression components and variance parameters. We show that the double-penalized problem can be conveniently reformulated into a linear mixed model framework, so that existing software can be directly used to implement our method. For the purpose of model inference, we derive both frequentist and Bayesian variance estimates for the estimated parametric and nonparametric components. Simulation is used to evaluate and compare the performance of our method to existing ones. We then apply the new method to a real data set from a lactation study.

7.
Zhou H, Chen J, Cai J. Biometrics 2002, 58(2):352-360
We study a semiparametric estimation method for the random effects logistic regression when there is auxiliary covariate information about the main exposure variable. We extend the semiparametric estimator of Pepe and Fleming (1991, Journal of the American Statistical Association 86, 108-113) to the random effects model using the best linear unbiased prediction approach of Henderson (1975, Biometrics 31, 423-448). The method can be used to handle the missing covariate or mismeasured covariate data problems in a variety of real applications. Simulation study results show that the proposed method outperforms the existing methods. We analyzed a data set from the Collaborative Perinatal Project using the proposed method and found that the use of DDT increases the risk of preterm births among U.S. children.

8.
Many different methods for evaluating diagnostic test results in the absence of a gold standard have been proposed. In this paper, we discuss how one common method, maximum likelihood estimation of a latent class model via the Expectation-Maximization (EM) algorithm, can be applied to longitudinal data where test sensitivity changes over time. We also propose two simplified and nonparametric methods which use data-based indicator variables for disease status, and compare their accuracy to the maximum likelihood estimation (MLE) results. We find that with high-specificity tests, the performance of the simpler approximations may match that of the MLE.
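The static-sensitivity version of the latent class MLE can be sketched as a short EM loop over three conditionally independent binary tests and two latent classes. The paper's longitudinal extension with time-varying sensitivity is not shown here, and all parameter values are illustrative.

```python
import numpy as np

def latent_class_em(T, n_iter=200):
    """EM for a 2-class latent class model with conditionally
    independent binary tests. T is an (n, k) array of 0/1 results."""
    n, k = T.shape
    pi = 0.3                                  # P(diseased), initial guess
    se = np.full(k, 0.8)                      # sensitivities, initial guess
    sp = np.full(k, 0.9)                      # specificities, initial guess
    for _ in range(n_iter):
        # E-step: posterior P(diseased | observed test pattern).
        p1 = pi * np.prod(se**T * (1 - se)**(1 - T), axis=1)
        p0 = (1 - pi) * np.prod((1 - sp)**T * sp**(1 - T), axis=1)
        w = p1 / (p1 + p0)
        # M-step: posterior-weighted updates of prevalence, Se, Sp.
        pi = w.mean()
        se = (w[:, None] * T).sum(0) / w.sum()
        sp = ((1 - w)[:, None] * (1 - T)).sum(0) / (1 - w).sum()
    return pi, se, sp

rng = np.random.default_rng(2)
n, true_pi = 2000, 0.25
d = rng.random(n) < true_pi                   # latent disease status
true_se = np.array([0.90, 0.80, 0.85])
true_sp = np.array([0.95, 0.90, 0.97])
T = np.where(d[:, None], rng.random((n, 3)) < true_se,
             rng.random((n, 3)) >= true_sp).astype(int)
pi_hat, se_hat, sp_hat = latent_class_em(T)
```

With initial values on the correct side of the label-switching symmetry, the EM recovers prevalence, sensitivities, and specificities without ever observing the true disease status.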

9.
Sequential designs for phase I clinical trials which incorporate maximum likelihood estimates (MLE) as data accrue are inherently problematic because of limited data for estimation early on. We address this problem for small phase I clinical trials with ordinal responses. In particular, we explore the problem of the nonexistence of the MLE of the logistic parameters under a proportional odds model with one predictor. We incorporate the probability of an undetermined MLE as a restriction, as well as ethical considerations, into a proposed sequential optimal approach, which consists of a start‐up design, a follow‐on design, and a sequential dose‐finding design. Comparisons with nonparametric sequential designs are also performed based on simulation studies with parameters drawn from a real data set.

10.
Yip PS, Lin HZ, Xi L. Biometrics 2005, 61(4):1085-1092
A semiparametric estimation procedure is proposed to model capture-recapture data with the aim of estimating the population size for a closed population. Individuals' covariates are possibly time dependent, missing at noncaptured times, and may be measured with error. A set of estimating equations (EEs) based on the covariate process and capture-recapture data is constructed to estimate the relevant parameters and the population size. These EEs can be solved by an algorithm similar to an EM algorithm. Simulation results show that the proposed procedures work better than the naive estimate. In some cases they are even better than "ideal" estimates, for which the true values of covariates are available for all captured subjects over the entire experimental period. We apply the method to a capture-recapture experiment on the bird species Prinia flaviventris in Hong Kong.

11.
Auxiliary covariate data are often collected in biomedical studies when the primary exposure variable is only assessed on a subset of the study subjects. In this study, we investigate a semiparametric estimated-likelihood method for generalized linear mixed models (GLMM) in the presence of a continuous auxiliary variable. We use a kernel smoother to handle the continuous auxiliary data. The method can be used to deal with missing or mismeasured covariate data problems in a variety of applications when an auxiliary variable is available and cluster sizes are not too small. Simulation study results show that the proposed method performs better than approaches that ignore the random effects in the GLMM or that use only the data in the validation set. We illustrate the proposed method with a real data set from a recent environmental epidemiology study relating the maternal serum 1,1‐dichloro‐2,2‐bis(p‐chlorophenyl) ethylene level to preterm births.
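The kernel-smoothing ingredient can be sketched with a Nadaraya-Watson estimate of the exposure given the continuous auxiliary variable, fitted on the validation subset and evaluated where the exposure is unobserved. This is only the smoothing building block, not the full estimated-likelihood GLMM procedure; the data and bandwidth are illustrative.

```python
import numpy as np

def nw_smoother(w_train, x_train, w_new, h):
    """Nadaraya-Watson kernel estimate of E[X | W = w_new]."""
    d = (w_new[:, None] - w_train[None, :]) / h
    k = np.exp(-0.5 * d**2)                   # Gaussian kernel weights
    return (k * x_train).sum(1) / k.sum(1)    # weighted average of X

rng = np.random.default_rng(3)
w = rng.uniform(-2, 2, 300)                   # auxiliary variable (validation set)
x = np.sin(w) + 0.1 * rng.normal(size=300)    # exposure, observed here only
w_new = np.array([0.0, 1.0])                  # subjects with missing exposure
x_hat = nw_smoother(w, x, w_new, h=0.3)       # smoothed stand-in values
```

The bandwidth `h` plays the usual bias-variance role: too small and the estimate chases noise, too large and curvature in E[X | W] is flattened out.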

12.
This work addresses rapid resin selection for integrated chromatographic separations when conducted as part of a high‐throughput screening exercise during the early stages of purification process development. An optimization‐based decision support framework is proposed to process the data generated from microscale experiments and identify the best resins to maximize key performance metrics, such as yield and purity, for a biopharmaceutical manufacturing process. A multiobjective mixed integer nonlinear programming model is developed and solved using the ε‐constraint method. Dinkelbach's algorithm is used to solve the resulting mixed integer linear fractional programming model. The proposed framework is successfully applied to an industrial case study of a process to purify recombinant Fc fusion protein from low‐molecular‐weight and high‐molecular‐weight product‐related impurities, involving two chromatographic steps with eight and three candidate resins for each step, respectively. The computational results show the advantage of the proposed framework in terms of computational efficiency and flexibility. © 2017 The Authors. Biotechnology Progress published by Wiley Periodicals, Inc. on behalf of American Institute of Chemical Engineers. Biotechnol. Prog., 33:1116–1126, 2017
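Dinkelbach's algorithm reduces a fractional objective max f(x)/g(x), with g > 0, to a sequence of parametric subproblems max f(x) - λ·g(x). The toy sketch below runs the iteration over an explicit finite candidate set; the paper applies the same idea inside a mixed integer linear fractional program, and the data here are made up.

```python
def dinkelbach(f, g, candidates, tol=1e-9):
    """Dinkelbach's algorithm for max f(x)/g(x) over a finite set, g > 0.
    Each iteration solves the parametric problem max f(x) - lam * g(x)."""
    lam = 0.0
    while True:
        x_star = max(candidates, key=lambda x: f(x) - lam * g(x))
        val = f(x_star) - lam * g(x_star)
        if val < tol:                 # F(lam) = 0 certifies optimality
            return x_star, lam        # lam equals the optimal ratio
        lam = f(x_star) / g(x_star)   # update lam to the incumbent ratio

# Toy instance: (performance, cost) pairs; maximize performance / cost.
options = [(8.0, 4.0), (9.0, 3.0), (5.0, 1.0), (7.0, 2.0)]
f = lambda x: x[0]
g = lambda x: x[1]
best, ratio = dinkelbach(f, g, options)
```

The parametric subproblem is linear in f and g, which is exactly why the fractional MILP in the paper becomes a sequence of ordinary MILPs.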

13.
Individualized treatment regimes (ITRs) aim to recommend treatments based on patient‐specific characteristics in order to maximize the expected clinical outcome. Outcome weighted learning approaches have been proposed for this optimization problem, with the primary focus on the binary treatment case, and many require assumptions about the outcome value or the randomization mechanism. In this paper, we propose a general framework for multicategory ITRs using generic surrogate risk. The proposed method accommodates situations in which the outcome takes negative values and/or the propensity score is unknown. Theoretical results on Fisher consistency, excess risk, and risk consistency are established. In practice, we recommend using a differentiable convex loss for computational optimization. We demonstrate the superiority of the proposed method under multinomial deviance risk over some existing methods by simulation and by application to data from a clinical trial.

14.
Clustered data frequently arise in biomedical studies, where observations, or subunits, measured within a cluster are associated. The cluster size is said to be informative if the outcome variable is associated with the number of subunits in a cluster. In most existing work, the informative cluster size issue is handled by marginal approaches based on within-cluster resampling or cluster-weighted generalized estimating equations. Although these approaches yield consistent estimation of the marginal models, they do not allow estimation of within-cluster associations and are generally inefficient. In this paper, we propose a semiparametric joint model for clustered interval-censored event time data with informative cluster size. We use a random effect to account for the association among event times of the same cluster as well as the association between event times and the cluster size. For estimation, we propose a sieve maximum likelihood approach and devise a computationally efficient expectation-maximization algorithm for implementation. The estimators are shown to be strongly consistent, with the Euclidean components being asymptotically normal and achieving semiparametric efficiency. Extensive simulation studies are conducted to evaluate the finite-sample performance, efficiency, and robustness of the proposed method. We also illustrate our method via application to a motivating periodontal disease dataset.

15.
Mixed case interval‐censored data arise when the event of interest is known only to occur within an interval induced by a sequence of random examination times. Such data are commonly encountered in disease research with longitudinal follow‐up. Furthermore, medical treatment has progressed over the last decade, with an increasing proportion of patients being cured of many types of disease. Thus, interest has grown in cure models for survival data, which hypothesize that a certain proportion of subjects in the population are not expected to experience the events of interest. In this article, we consider a two‐component mixture cure model for regression analysis of mixed case interval‐censored data. The first component is a logistic regression model that describes the cure rate, and the second component is a semiparametric transformation model that describes the distribution of event time for uncured subjects. We propose semiparametric maximum likelihood estimation for the considered model. We develop an EM type algorithm for obtaining the semiparametric maximum likelihood estimators (SPMLE) of the regression parameters and establish their consistency, efficiency, and asymptotic normality. Extensive simulation studies indicate that the SPMLE performs satisfactorily in a wide variety of settings. The proposed method is illustrated by an analysis of hypobaric decompression sickness data from the National Aeronautics and Space Administration.

16.
An accelerated failure time (AFT) model assuming a log-linear relationship between failure time and a set of covariates can be either parametric or semiparametric, depending on the distributional assumption for the error term. Both classes of AFT models have been popular in the analysis of censored failure time data. The semiparametric AFT model is more flexible and robust to departures from the distributional assumption than its parametric counterpart. However, the semiparametric AFT model is subject to producing biased results for estimating any quantities involving an intercept. Estimating an intercept requires a separate procedure; moreover, consistent estimation of the intercept requires stringent conditions. Thus, essential quantities such as mean failure times might not be reliably estimated using semiparametric AFT models, whereas this can be done naturally in the framework of parametric AFT models. Meanwhile, parametric AFT models can be severely impaired by misspecification. To overcome this, we propose a new type of AFT model using a nonparametric Gaussian-scale mixture distribution. We also provide feasible algorithms to estimate the parameters and the mixing distribution. The finite sample properties of the proposed estimators are investigated via an extensive simulation study. The proposed estimators are illustrated using a real dataset.

17.
Recurrent events data are commonly encountered in medical studies. In many applications, only the number of events during the follow‐up period, rather than the recurrent event times, is available. Two important challenges arise in such studies: (a) a substantial portion of subjects may not experience the event, and (b) we may not observe the event count for the entire study period due to informative dropout. To address the first challenge, we assume that the underlying population consists of two subpopulations: one nonsusceptible and one susceptible to the event of interest. In the susceptible subpopulation, the event count is assumed to follow a Poisson distribution given the follow‐up time and the subject‐specific characteristics. We then introduce a frailty to account for informative dropout. The proposed semiparametric frailty models consist of three submodels: (a) a logistic regression model for the probability that a subject belongs to the nonsusceptible subpopulation; (b) a nonhomogeneous Poisson process model with an unspecified baseline rate function; and (c) a Cox model for the informative dropout time. We develop likelihood‐based estimation and inference procedures. The maximum likelihood estimators are shown to be consistent. Additionally, the proposed estimators of the finite‐dimensional parameters are asymptotically normal, and their covariance matrix attains the semiparametric efficiency bound. Simulation studies demonstrate that the proposed methodologies perform well in practical situations. We apply the proposed methods to a clinical trial on patients with myelodysplastic syndromes.

18.
Grigoletto M, Akritas MG. Biometrics 1999, 55(4):1177-1187
We propose a method for fitting semiparametric models such as the proportional hazards (PH), additive risks (AR), and proportional odds (PO) models. Each of these semiparametric models implies that some transformation of the conditional cumulative hazard function (at each t) depends linearly on the covariates. The proposed method is based on nonparametric estimation of the conditional cumulative hazard function, forming a weighted average over a range of t-values, and subsequent use of least squares to estimate the parameters suggested by each model. An approximation to the optimal weight function is given. This allows semiparametric models to be fitted even in incomplete data cases where the partial likelihood fails (e.g., left censoring, right truncation). However, the main advantage of this method rests in the fact that neither the interpretation of the parameters nor the validity of the analysis depends on the appropriateness of the PH or any of the other semiparametric models. In fact, we propose an integrated method for data analysis where the role of the various semiparametric models is to suggest the best fitting transformation. A single continuous covariate and several categorical covariates (factors) are allowed. Simulation studies indicate that the test statistics and confidence intervals have good small-sample performance. A real data set is analyzed.
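For the PH transformation with one binary covariate, the idea can be sketched by estimating the conditional cumulative hazard nonparametrically (Nelson-Aalen) in each covariate group and averaging the log-ratio over a grid of t-values. This sketch uses equal weights rather than the paper's approximately optimal weight function, and the simulated data are illustrative.

```python
import numpy as np

def nelson_aalen(times, events, grid):
    """Nelson-Aalen estimate of the cumulative hazard, evaluated on a grid."""
    order = np.argsort(times)
    t, d = times[order], events[order]
    at_risk = len(t) - np.arange(len(t))          # risk set size at each time
    cumhaz = np.cumsum(d / at_risk)               # sum of hazard increments
    return np.array([np.max(cumhaz[t <= g], initial=0.0) for g in grid])

rng = np.random.default_rng(4)
n = 1000
z = rng.integers(0, 2, n)                         # binary covariate
t_event = rng.exponential(1.0 / np.exp(0.7 * z))  # true log hazard ratio 0.7
c = rng.exponential(2.0, n)                       # independent censoring
times, events = np.minimum(t_event, c), (t_event <= c).astype(float)

# Under PH, log H(t|z) = log H0(t) + beta*z, so beta is the average
# vertical gap between the two log cumulative-hazard curves.
grid = np.quantile(times, np.linspace(0.2, 0.7, 6))
H0 = nelson_aalen(times[z == 0], events[z == 0], grid)
H1 = nelson_aalen(times[z == 1], events[z == 1], grid)
beta_hat = np.mean(np.log(H1) - np.log(H0))
```

Because the estimate comes from the cumulative hazard curves directly, no partial likelihood is involved, which is what allows the same recipe to work under left censoring or right truncation where the partial likelihood is unavailable.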

19.
Clustered interval‐censored data commonly arise in many biomedical studies where the failure time of interest is subject to interval‐censoring and subjects are correlated because they belong to the same cluster. A new semiparametric frailty probit regression model is proposed to study covariate effects on the failure time while accounting for the intracluster dependence. Under the proposed normal frailty probit model, the marginal distribution of the failure time is a semiparametric probit model, the regression parameters can be interpreted both as conditional covariate effects given the frailty and as marginal covariate effects up to a multiplicative constant, and the intracluster association can be summarized by two nonparametric measures in simple and explicit form. A fully Bayesian estimation approach is developed based on the use of monotone splines for the unknown nondecreasing function and data augmentation with normal latent variables. The proposed Gibbs sampler is straightforward to implement, since all unknowns have standard forms in their full conditional distributions. The proposed method performs very well in estimating the regression parameters as well as the intracluster association, and it is robust to frailty distribution misspecification, as shown in our simulation studies. Two real‐life data sets are analyzed for illustration.

20.
This paper focuses on how to measure visual similarity effectively and efficiently for local feature based representations. Among existing methods, metrics based on Bag of Visual Words (BoV) techniques are efficient and conceptually simple, at the expense of effectiveness. By contrast, kernel based metrics are more effective, but at the cost of greater computational complexity and increased storage requirements. We show that a unified visual matching framework can be developed to encompass both BoV and kernel based metrics, in which a local kernel plays an important role between feature pairs or between features and their reconstructions. Generally, local kernels are defined using Euclidean distance or its derivatives, based either explicitly or implicitly on an assumption of Gaussian noise. However, local features such as SIFT and HOG often follow a heavy-tailed distribution, which tends to undermine the motivation behind Euclidean metrics. Motivated by recent advances in feature coding techniques, a novel efficient local coding based matching kernel (LCMK) method is proposed. This exploits the manifold structure in Hilbert space derived from local kernels. The proposed method combines the advantages of both BoV and kernel based metrics and achieves linear computational complexity. This enables efficient and scalable visual matching on large-scale image sets. To evaluate the effectiveness of the proposed LCMK method, we conduct extensive experiments with widely used benchmark datasets, including the 15-Scenes, Caltech101/256, and PASCAL VOC 2007 and 2011 datasets. Experimental results confirm the effectiveness of the relatively efficient LCMK method.
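The BoV baseline that LCMK improves on can be sketched in a few lines: a tiny k-means codebook, quantized feature histograms, and a histogram intersection kernel for matching. The 2-D "descriptors" below stand in for SIFT/HOG features, and the whole setup is illustrative of the baseline only, not of the LCMK method itself.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Tiny k-means to build a visual codebook from pooled descriptors."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - C[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                C[j] = X[labels == j].mean(0)   # move center to its cluster mean
    return C

def bov_histogram(feats, C):
    """Quantize local features against codebook C; return an L1-normalized histogram."""
    labels = np.argmin(((feats[:, None] - C[None]) ** 2).sum(-1), axis=1)
    h = np.bincount(labels, minlength=len(C)).astype(float)
    return h / h.sum()

def hist_intersection(h1, h2):
    """Histogram intersection kernel: sum of bin-wise minima, in [0, 1]."""
    return np.minimum(h1, h2).sum()

rng = np.random.default_rng(5)
# "Images" A and B share a descriptor distribution; C does not.
A = rng.normal(0, 1, (200, 2))
B = rng.normal(0, 1, (200, 2))
Cimg = rng.normal(3, 1, (200, 2))
codebook = kmeans(np.vstack([A, B, Cimg]), k=8)
hA, hB, hC = (bov_histogram(x, codebook) for x in (A, B, Cimg))
sim_AB, sim_AC = hist_intersection(hA, hB), hist_intersection(hA, hC)
```

Matching costs only a histogram comparison per image pair, which is the efficiency side of the BoV-versus-kernel trade-off the abstract describes.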


Copyright©北京勤云科技发展有限公司  京ICP备09084417号