Similar Documents
20 similar documents retrieved (search time: 46 ms)
1.
This paper develops a model for repeated binary regression when a covariate is measured with error. The model allows for estimating the effect of the true value of the covariate on a repeated binary response. The choice of a probit link for the effect of the error-free covariate, coupled with normal measurement error for that covariate, results in a probit model after integrating over the measurement error distribution. We propose a two-stage estimation procedure where, in the first stage, a linear mixed model is used to fit the repeated covariate. In the second stage, a model for the correlated binary responses conditional on the linear mixed model estimates is fit to the repeated binary data using generalized estimating equations. The approach is demonstrated using nutrient safety data from the Diet Intervention of School Age Children (DISC) study.
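A minimal two-stage sketch in Python with statsmodels, assuming a long-format data frame with hypothetical columns id, time, w (error-prone covariate), and y (binary outcome); it illustrates the structure of the procedure, not the DISC analysis itself.

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Assumed long-format data: one row per visit with columns id, time, w, y.
df = pd.read_csv("repeated_binary.csv")  # hypothetical file

# Stage 1: linear mixed model for the repeated, error-prone covariate.
stage1 = smf.mixedlm("w ~ time", data=df, groups=df["id"],
                     re_formula="~time").fit()
# Subject-specific fitted values stand in for the unobserved true covariate.
df["x_hat"] = stage1.fittedvalues

# Stage 2: probit GEE for the repeated binary response given the stage-1 estimates.
stage2 = smf.gee("y ~ x_hat + time", groups="id", data=df,
                 family=sm.families.Binomial(sm.families.links.Probit()),
                 cov_struct=sm.cov_struct.Exchangeable()).fit()
print(stage2.summary())
```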

2.
Binary regression models for spatial data are commonly used in disciplines such as epidemiology and ecology. Many spatially referenced binary data sets suffer from location error, which occurs when the recorded location of an observation differs from its true location. When location error occurs, values of the covariates associated with the true spatial locations of the observations cannot be obtained. We show how a change of support (COS) can be applied to regression models for binary data to provide coefficient estimates when the true values of the covariates are unavailable, but the unknown locations of the observations are contained within nonoverlapping, arbitrarily shaped polygons. The COS accommodates spatial and nonspatial covariates and preserves the convenient interpretation of methods such as logistic and probit regression. Using a simulation experiment, we compare binary regression models with a COS to naive approaches that ignore location error. We illustrate the flexibility of the COS by modeling individual-level disease risk in a population using a binary data set where the locations of the observations are unknown but contained within administrative units. Our simulation experiment and data illustration corroborate that conventional regression models for binary data that ignore location error are unreliable, but that the COS can be used to eliminate bias while preserving model choice.

3.
A Bayesian procedure for misclassified binary data was developed. An animal breeding simulation indicated that, when error of classification was ignored, the variance between clusters was inferred incorrectly. Data were reanalyzed assuming that the probability of misclassification was either known or unknown. In the first case, input parameter values were recovered in the analysis. When the probability was unknown, there was a slight bias; the true probability of misclassification and the true number of miscoded observations appeared within high credibility regions. An analysis of fertility in dairy cows is presented.

4.
Summary: We introduce a correction for covariate measurement error in nonparametric regression applied to longitudinal binary data arising from a study on human sleep. The data were collected to investigate the association between certain hormone levels and the probability of being asleep. The hormonal effect is modeled flexibly while we account for the error-prone measurement of its concentration in the blood and the longitudinal character of the data. We present a fully Bayesian treatment utilizing Markov chain Monte Carlo inference techniques, and also introduce block updating to improve sampling and computational performance in the binary case. Our model is partly inspired by the relevance vector machine with radial basis functions, where usually very few basis functions are automatically selected for fitting the data. In the proposed approach, we implement such data-driven complexity regulation by adopting the idea of Bayesian model averaging. Besides the general theory and the detailed sampling scheme, we also provide a simulation study for the Gaussian and the binary cases by comparing our method to the naive analysis ignoring measurement error. The results demonstrate a clear gain when using the proposed correction method, particularly for the Gaussian case with medium and large measurement error variances, even if the covariate model is misspecified.

5.
Albert PS, Follmann DA, Wang SA, Suh EB. Biometrics 2002, 58(3):631-642
Longitudinal clinical trials often collect long sequences of binary data. Our application is a recent clinical trial in opiate addicts that examined the effect of a new treatment on repeated binary urine tests to assess opiate use over an extended follow-up. The dataset had two sources of missingness: dropout and intermittent missing observations. The primary endpoint of the study was comparing the marginal probability of a positive urine test over follow-up across treatment arms. We present a latent autoregressive model for longitudinal binary data subject to informative missingness. In this model, a Gaussian autoregressive process is shared between the binary response and missing-data processes, thereby inducing informative missingness. Our approach extends the work of others who have developed models that link the various processes through a shared random effect but do not allow for autocorrelation. We discuss parameter estimation using Monte Carlo EM and demonstrate through simulations that incorporating within-subject autocorrelation through a latent autoregressive process can be very important when longitudinal binary data are subject to informative missingness. We illustrate our new methodology using the opiate clinical trial data.
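A small data-generating sketch (an illustration of the shared-process idea, not the authors' code): one Gaussian AR(1) process shifts both the probability of a positive urine test and the probability that a visit is missing, so the missingness is informative and a naive complete-case summary is biased.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n_subjects, n_visits, rho, sigma = 200, 12, 0.8, 1.0

# Latent Gaussian AR(1) process shared by the response and missingness models.
b = np.zeros((n_subjects, n_visits))
b[:, 0] = rng.normal(0, sigma, n_subjects)
for t in range(1, n_visits):
    b[:, t] = rho * b[:, t - 1] + rng.normal(0, sigma * np.sqrt(1 - rho**2), n_subjects)

treat = rng.integers(0, 2, n_subjects)[:, None]   # treatment arm indicator
p_pos = norm.cdf(-0.5 - 0.8 * treat + b)          # P(positive urine test | b)
p_mis = norm.cdf(-1.5 + 0.6 * b)                  # P(visit missing | b), shares the same b

y = rng.binomial(1, p_pos).astype(float)
y[rng.binomial(1, p_mis) == 1] = np.nan           # informative intermittent missingness

# Complete-case means are biased because missingness depends on the shared process b.
print("naive P(positive), treated:", np.nanmean(y[treat.ravel() == 1]))
print("naive P(positive), control:", np.nanmean(y[treat.ravel() == 0]))
print("true marginal P(positive), treated/control:",
      p_pos[treat.ravel() == 1].mean(), p_pos[treat.ravel() == 0].mean())
```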

6.
One of the main tasks when dealing with the impacts of infrastructures on wildlife is to identify hotspots of high mortality so that mitigation measures can be devised and implemented. A common strategy is to divide an infrastructure into several segments and determine when the number of collisions in a segment is above a given threshold, reflecting a desired significance level obtained by assuming a probability distribution for the number of collisions, often the Poisson distribution. The problem with this approach, when applied to each segment individually, is that the probability of identifying false hotspots (Type I error) is potentially high. The way to solve this problem is to recognize that it requires multiple testing corrections or a Bayesian approach. Here, we apply three different methods that implement the required corrections to the identification of hotspots: (i) the familywise error rate correction, (ii) the false discovery rate, and (iii) a Bayesian hierarchical procedure. We illustrate the application of these methods with data on two bird species collected on a road in Brazil. The proposed methods provide practitioners with procedures that are reliable and simple to use in real situations and, in addition, can reflect a practitioner's concerns towards identifying false positives or missing true hotspots. Although one may argue that an overly cautious approach (reducing the probability of Type I error) may be beneficial from a biological conservation perspective, it may lead to a waste of resources and, probably worse, it may raise doubts about the methodology adopted and the credibility of those suggesting it.
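For the first two corrections, a small illustrative sketch in Python with hypothetical per-segment collision counts (not the Brazilian road data): per-segment Poisson p-values are adjusted with Bonferroni (familywise error rate) and Benjamini-Hochberg (false discovery rate).

```python
import numpy as np
from scipy.stats import poisson
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)
collisions = rng.poisson(lam=2.0, size=100)      # hypothetical counts per road segment
collisions[[10, 47, 83]] += 8                    # three planted hotspots

lam0 = np.median(collisions)                     # baseline collision rate under "no hotspot"
pvals = poisson.sf(collisions - 1, mu=lam0)      # P(X >= observed count) for each segment

# Familywise error rate control (Bonferroni) and false discovery rate control (BH).
fwer_hot = multipletests(pvals, alpha=0.05, method="bonferroni")[0]
fdr_hot = multipletests(pvals, alpha=0.05, method="fdr_bh")[0]
naive_hot = pvals < 0.05                         # segment-by-segment test, no correction

print("segments flagged  naive:", naive_hot.sum(),
      " FWER:", fwer_hot.sum(), " FDR:", fdr_hot.sum())
```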

7.
A recently proposed optimal Bayesian classification paradigm addresses optimal error rate analysis for small-sample discrimination, including optimal classifiers, optimal error estimators, and error estimation analysis tools with respect to the probability of misclassification under binary classes. Here, we address multi-class problems and optimal expected risk with respect to a given risk function, which are common settings in bioinformatics. We present Bayesian risk estimators (BRE) under arbitrary classifiers, the mean-square error (MSE) of arbitrary risk estimators under arbitrary classifiers, and optimal Bayesian risk classifiers (OBRC). We provide analytic expressions for these tools under several discrete and Gaussian models and present a new methodology to approximate the BRE and MSE when analytic expressions are not available. Of particular note, we present analytic forms for the MSE under Gaussian models with homoscedastic covariances, which are new even in binary classification.

8.
We have developed a new general approach for handling misclassification in discrete covariates or responses in regression models. The simulation and extrapolation (SIMEX) method, which was originally designed for handling additive covariate measurement error, is applied to the case of misclassification. The statistical model for characterizing misclassification is given by the transition matrix Pi from the true to the observed variable. We exploit the relationship between the size of misclassification and the bias in estimating the parameters of interest. Assuming that Pi is known or can be estimated from validation data, we simulate data with higher misclassification and extrapolate back to the case of no misclassification. We show that our method is quite general and applicable to models with a misclassified response and/or misclassified discrete regressors. For a misclassified binary response we compare our method to the approach of Neuhaus, and for a misclassified binary regressor to the matrix method of Morrissey and Spiegelman. We apply our method to a study on caries with a misclassified longitudinal response.
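A rough simulate-and-extrapolate sketch in the spirit of SIMEX for a binary regressor with a known, symmetric misclassification probability; the additive parametrization of the misclassification level and the quadratic extrapolant are simplifying choices of this illustration, not the authors' exact estimator.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n, beta_true, pi_mis = 5000, 1.0, 0.15            # known misclassification probability

x = rng.binomial(1, 0.5, n)                        # true binary regressor
y = rng.binomial(1, 1 / (1 + np.exp(-(-0.5 + beta_true * x))))
x_obs = np.where(rng.random(n) < pi_mis, 1 - x, x)  # observed, misclassified regressor

def logit_slope(y, x):
    return sm.Logit(y, sm.add_constant(x)).fit(disp=0).params[1]

# Simulation step: add extra flips so the total misclassification level is (1 + lam) * pi_mis.
lams = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
est = []
for lam in lams:
    extra = (lam * pi_mis) / (1 - 2 * pi_mis)      # additional flip probability needed
    reps = [logit_slope(y, np.where(rng.random(n) < extra, 1 - x_obs, x_obs))
            for _ in range(50)]
    est.append(np.mean(reps))

# Extrapolation step: fit a quadratic in lam and extrapolate back to lam = -1 (no error).
coef = np.polyfit(lams, est, 2)
print("naive estimate:      ", est[0])
print("SIMEX-type estimate: ", np.polyval(coef, -1.0))
print("true beta:           ", beta_true)
```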

9.
Efficient measurement error correction with spatially misaligned data
Association studies in environmental statistics often involve exposure and outcome data that are misaligned in space. A common strategy is to employ a spatial model such as universal kriging to predict exposures at locations with outcome data and then estimate a regression parameter of interest using the predicted exposures. This results in measurement error because the predicted exposures do not correspond exactly to the true values. We characterize the measurement error by decomposing it into Berkson-like and classical-like components. One correction approach is the parametric bootstrap, which is effective but computationally intensive since it requires solving a nonlinear optimization problem for the exposure model parameters in each bootstrap sample. We propose a less computationally intensive alternative termed the "parameter bootstrap" that only requires solving one nonlinear optimization problem, and we also compare bootstrap methods to other recently proposed methods. We illustrate our methodology in simulations and with publicly available data from the Environmental Protection Agency.

10.
Cook RJ, Zeng L, Yi GY. Biometrics 2004, 60(3):820-828
In recent years there has been considerable research devoted to the development of methods for the analysis of incomplete data in longitudinal studies. Despite these advances, the methods used in practice have changed relatively little, particularly in the reporting of pharmaceutical trials. In this setting, perhaps the most widely adopted strategy for dealing with incomplete longitudinal data is imputation by the "last observation carried forward" (LOCF) approach, in which values for missing responses are imputed using observations from the most recently completed assessment. We examine the asymptotic and empirical bias, the empirical type I error rate, and the empirical coverage probability associated with estimators and tests of treatment effect based on the LOCF imputation strategy. We consider a setting involving longitudinal binary data with longitudinal analyses based on generalized estimating equations, and an analysis based simply on the response at the end of the scheduled follow-up. We find that for both of these approaches, imputation by LOCF can lead to substantial biases in estimators of treatment effects, the type I error rates of associated tests can be greatly inflated, and the coverage probability can be far from the nominal level. Alternative analyses based on all available data lead to estimators with comparatively small bias, and inverse probability weighted analyses yield consistent estimators subject to correct specification of the missing data process. We illustrate the differences between various methods of dealing with drop-outs using data from a study of smoking behavior.
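A small pandas sketch (hypothetical data and column names) of the LOCF strategy the abstract evaluates: each missing visit is filled with the subject's last observed response, in contrast to an analysis that keeps only the available data.

```python
import numpy as np
import pandas as pd

# Long-format trial data: one row per scheduled visit (hypothetical example).
df = pd.DataFrame({
    "id":    [1, 1, 1, 1, 2, 2, 2, 2],
    "visit": [1, 2, 3, 4, 1, 2, 3, 4],
    "arm":   [0, 0, 0, 0, 1, 1, 1, 1],
    "y":     [1, 1, np.nan, np.nan, 1, 0, np.nan, 0],   # binary response, NaN = missing
})

# Last observation carried forward within each subject.
df["y_locf"] = df.groupby("id")["y"].ffill()

# End-of-follow-up comparison: LOCF imputation versus available data only.
last = df[df["visit"] == df["visit"].max()]
print("LOCF endpoint by arm:\n", last.groupby("arm")["y_locf"].mean())
print("available-data endpoint by arm:\n", last.groupby("arm")["y"].mean())
```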

11.
The Fourier spectral analysis of binary time series (or rectangular signals) causes methodological problems because it is based on sinusoidal functions. We propose a new tool for the detection of periodicities in binary time series, focusing on sleep/wake cycles. This methodology is based on a weighted histogram of cycle durations. In this paper, we compare our methodology with Fourier spectral analysis on the basis of simulated and real binary data sets of various lengths. We also provide an approach to statistical validation of the periodicities determined with our methodology. Furthermore, we analyze the discriminating power of both methods in terms of standard deviation. Our results indicate that the Ciclograma is much more powerful than Fourier analysis when applied to this type of time series.
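A rough sketch constructed from the description above (not the published Ciclograma code): run lengths of a simulated sleep/wake series are turned into a duration-weighted histogram of cycle lengths and contrasted with an ordinary Fourier periodogram.

```python
import numpy as np
from scipy.signal import periodogram

rng = np.random.default_rng(3)

# Simulated sleep/wake series: alternating wake/sleep bouts with noisy durations (minutes).
durations = np.clip(rng.normal([90, 30] * 40, 10).astype(int), 5, None)
states = np.tile([1, 0], 40)
x = np.repeat(states, durations)

# Duration-weighted histogram of complete cycle lengths (wake bout + following sleep bout).
cycle_lengths = durations[0::2][:len(durations[1::2])] + durations[1::2]
hist, edges = np.histogram(cycle_lengths, bins=10, weights=cycle_lengths)
i = np.argmax(hist)
print("dominant cycle-duration bin (min):", edges[i], "-", edges[i + 1])

# Fourier periodogram of the same rectangular signal, for comparison.
freqs, power = periodogram(x - x.mean(), fs=1.0)
print("dominant Fourier period (min):", 1.0 / freqs[1:][np.argmax(power[1:])])
```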

13.
SUMMARY: We consider two-armed clinical trials in which the response and/or the covariates are observed on either a binary, ordinal, or continuous scale. A new general nonparametric (NP) approach for covariate adjustment is presented using the notion of a relative effect to describe treatment effects. The relative effect is defined by the probability of observing a higher response in the experimental than in the control arm. The notion is invariant under monotone transformations of the data and is therefore especially suitable for ordinal data. For a normally distributed or binary response, the relative effect is the transformed effect size or the difference in response probability, respectively. An unbiased and consistent NP estimator for the relative effect is presented. Further, we suggest an NP procedure for correcting the relative effect for covariate imbalance and random covariate imbalance, yielding a consistent estimator for the adjusted relative effect. Asymptotic theory has been developed to derive test statistics and confidence intervals. The test statistic is based on the joint behavior of the estimated relative effect for the response and the covariates. It is shown that the test statistic can be used to evaluate the treatment effect in the presence of (random) covariate imbalance. Approximations for small sample sizes are considered as well. The sampling behavior of the estimator of the adjusted relative effect is examined. We also compare the probability of a type I error and the power of our approach to standard covariate adjustment methods by means of a simulation study. Finally, our approach is illustrated on three studies involving ordinal responses and covariates.
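A minimal sketch of the unadjusted relative effect estimate, p = P(X < Y) + 0.5 P(X = Y), computed from midranks via the Mann-Whitney statistic; the covariate adjustment and the asymptotic test described above are not reproduced.

```python
import numpy as np
from scipy.stats import rankdata

def relative_effect(control, experimental):
    """Estimate p = P(X < Y) + 0.5 * P(X = Y) from midranks (ties allowed)."""
    x = np.asarray(control, dtype=float)
    y = np.asarray(experimental, dtype=float)
    ranks = rankdata(np.concatenate([x, y]))   # midranks of the pooled sample
    r_y = ranks[len(x):].mean()                # mean rank in the experimental arm
    return (r_y - (len(y) + 1) / 2) / len(x)

# Ordinal responses on a 1-5 scale (hypothetical data).
control      = [1, 2, 2, 3, 3, 4]
experimental = [2, 3, 3, 4, 4, 5]
print("estimated relative effect:", relative_effect(control, experimental))
```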

14.
A dynamic implementation of software-based soft error tolerance can protect more types of code and therefore cover more soft errors. This paper explores soft error tolerance with a dynamic software-based method: we propose a new dynamic approach in which the protected object is the running program. For the protected dynamic binary code, the approach enforces correct control flow and, to a significant extent, correct data flow. Every datum is duplicated and every operation is performed twice to ensure that values stored to memory are correct, and every branch instruction is checked against its condition and destination address so that it jumps to the right target. The approach is implemented with dynamic binary instrumentation; specifically, the tool is built on the Valgrind framework, a heavyweight dynamic binary instrumentation tool. Experimental results demonstrate that the approach achieves higher reliability for dynamic software than approaches based on static program protection. However, because it sacrifices more performance than static protection methods, it is only suitable for systems with strict reliability requirements.

15.
MOTIVATION: We present statistical methods for determining the number of per-gene replicate spots required in microarray experiments. The purpose of these methods is to obtain an estimate of the sampling variability present in microarray data, and to determine the number of replicate spots required to achieve a high probability of detecting a significant fold change in gene expression, while maintaining a low error rate. Our approach is based on data from control microarrays, and involves the use of standard statistical estimation techniques. RESULTS: After analyzing two experimental data sets containing control array data, we were able to determine the statistical power available for the detection of significant differential expression given differing levels of replication. The inclusion of replicate spots on microarrays not only allows more accurate estimation of the variability present in an experiment, but more importantly increases the probability of detecting genes undergoing significant fold changes in expression, while substantially decreasing the probability of observing fold changes due to chance rather than true differential expression.
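A sketch of the type of power calculation described above, with illustrative numbers only: given a spot-level variability estimate from control arrays, compute the power to detect a target log2 fold change for increasing numbers of replicate spots using a standard one-sample t-test power calculation.

```python
from statsmodels.stats.power import TTestPower

sd_log2 = 0.45          # spot-to-spot SD of log2 ratios, e.g. estimated from control arrays
fold_change = 1.0       # target effect: a 2-fold change equals 1.0 on the log2 scale
alpha = 0.01            # per-gene significance level

power_calc = TTestPower()
for n_reps in (2, 3, 4, 6, 8):
    power = power_calc.power(effect_size=fold_change / sd_log2,
                             nobs=n_reps, alpha=alpha, alternative="two-sided")
    print(f"{n_reps} replicate spots: power = {power:.2f}")
```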

16.
Using multiple historical trials with surrogate and true endpoints, we consider various models to predict the effect of treatment on a true endpoint in a target trial in which only a surrogate endpoint is observed. This predicted result is computed using (1) a prediction model (mixture, linear, or principal stratification) estimated from historical trials and the surrogate endpoint of the target trial and (2) a random extrapolation error estimated from successively leaving out each trial among the historical trials. The method applies to either binary outcomes or survival to a particular time that is computed from censored survival data. We compute a 95% confidence interval for the predicted result and validate its coverage using simulation. To summarize the additional uncertainty from using a predicted instead of true result for the estimated treatment effect, we compute its multiplier of standard error. Software is available for download.
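A bare-bones sketch of the linear-prediction variant with leave-one-trial-out extrapolation error, using simulated trial-level effects; the mixture and principal stratification models and the formal confidence interval construction are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(4)

# Trial-level treatment effects from historical trials: surrogate (s) and true (t) endpoints.
s_hist = rng.normal(0.3, 0.15, 12)
t_hist = 0.8 * s_hist + rng.normal(0, 0.05, 12)
s_target = 0.25                                   # surrogate effect observed in the target trial

def linear_predict(s_train, t_train, s_new):
    slope, intercept = np.polyfit(s_train, t_train, 1)
    return intercept + slope * s_new

# Leave-one-trial-out prediction errors estimate the extrapolation error.
loo_errors = [t_hist[i] - linear_predict(np.delete(s_hist, i), np.delete(t_hist, i), s_hist[i])
              for i in range(len(s_hist))]

pred = linear_predict(s_hist, t_hist, s_target)
se_extrap = np.std(loo_errors, ddof=1)
print(f"predicted true-endpoint effect: {pred:.3f} +/- {1.96 * se_extrap:.3f} (approx. 95% interval)")
```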

17.
BACKGROUND: Human diversity, namely single nucleotide polymorphisms (SNPs), is becoming a focus of biomedical research. Despite the binary nature of SNP determination, the majority of genotyping assay data need a critical evaluation for genotype calling. We applied statistical models to improve the automated analysis of 2-dimensional SNP data. METHODS: We derived several quantities in the framework of Gaussian mixture models that provide figures of merit to objectively measure the data quality. The accuracy of individual observations is scored as the probability of belonging to a certain genotype cluster, while the assay quality is measured by the overlap between the genotype clusters. RESULTS: The approach was extensively tested with a dataset of 438 nonredundant SNP assays comprising >150,000 datapoints. The performance of our automatic scoring method was compared with manual assignments. The agreement for the overall assay quality is remarkably good, and individual observations were scored differently by man and machine in 2.6% of cases, when applying stringent probability threshold values. CONCLUSION: Our definition of bounds for the accuracy for complete assays in terms of misclassification probabilities goes beyond other proposed analysis methods. We expect the scoring method to minimise human intervention and provide a more objective error estimate in genotype calling.
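An illustrative two-dimensional genotype-calling sketch with scikit-learn on simulated intensity data (not the authors' pipeline): a three-component Gaussian mixture is fit, the posterior membership probability of each observation serves as its call-quality score, and calls whose maximum posterior falls below a stringent threshold are flagged.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(5)

# Simulated 2D allele-intensity data for the three genotype clusters (AA, AB, BB).
centers = np.array([[1.0, 0.1], [0.6, 0.6], [0.1, 1.0]])
X = np.vstack([rng.normal(c, 0.07, size=(150, 2)) for c in centers])

gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=0).fit(X)
post = gmm.predict_proba(X)                # posterior probability of each genotype cluster
calls = post.argmax(axis=1)                # genotype call = most probable cluster
quality = post.max(axis=1)                 # per-sample call-quality score

threshold = 0.999                          # stringent probability threshold
print("samples below quality threshold:", (quality < threshold).sum(), "of", len(X))
```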

18.
Association Models for Clustered Data with Binary and Continuous Responses
Summary: We consider analysis of clustered data with mixed bivariate responses, i.e., where each member of the cluster has a binary and a continuous outcome. We propose a new bivariate random effects model that induces associations among the binary outcomes within a cluster, among the continuous outcomes within a cluster, between a binary outcome and a continuous outcome from different subjects within a cluster, as well as the direct association between the binary and continuous outcomes within the same subject. For ease of interpretation of the regression effects, the marginal model of the binary response probability integrated over the random effects preserves the logistic form and the marginal expectation of the continuous response preserves the linear form. We implement maximum likelihood estimation of our model parameters using standard software such as PROC NLMIXED of SAS. Our simulation study demonstrates the robustness of our method with respect to the misspecification of the regression model as well as the random effects model. We illustrate our methodology by analyzing a developmental toxicity study of ethylene glycol in mice.

19.
In this paper, we address the problems of fully automatic localization and segmentation of 3D vertebral bodies from CT/MR images. We propose a learning-based, unified random forest regression and classification framework to tackle these two problems. More specifically, in the first stage, the localization of 3D vertebral bodies is solved with random forest regression where we aggregate the votes from a set of randomly sampled image patches to get a probability map of the center of a target vertebral body in a given image. The resultant probability map is then further regularized by a Hidden Markov Model (HMM) to eliminate potential ambiguity caused by the neighboring vertebral bodies. The output from the first stage allows us to define a region of interest (ROI) for the segmentation step, where we use random forest classification to estimate the likelihood of a voxel in the ROI being foreground or background. The estimated likelihood is combined with the prior probability, which is learned from a set of training data, to get the posterior probability of the voxel. The segmentation of the target vertebral body is then done by binary thresholding of the estimated probability. We evaluated the present approach on two openly available datasets: 1) 3D T2-weighted spine MR images from 23 patients and 2) 3D spine CT images from 10 patients. Taking manual segmentation as the ground truth (each MR image contains at least 7 vertebral bodies from T11 to L5 and each CT image contains 5 vertebral bodies from L1 to L5), we evaluated the present approach with leave-one-out experiments. Specifically, for the T2-weighted MR images, we achieved for localization a mean error of 1.6 mm, and for segmentation a mean Dice metric of 88.7% and a mean surface distance of 1.5 mm, respectively. For the CT images we achieved for localization a mean error of 1.9 mm, and for segmentation a mean Dice metric of 91.0% and a mean surface distance of 0.9 mm, respectively.
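A much-simplified sketch of the second (segmentation) stage with scikit-learn on synthetic voxel features: a random forest supplies a foreground score that is heuristically re-weighted by a prior probability and thresholded to obtain the binary segmentation; the localization stage and the HMM regularization are omitted.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(6)

# Synthetic voxel features (e.g. intensity and a texture measure) with foreground labels.
n = 4000
labels = rng.binomial(1, 0.3, n)
features = np.column_stack([rng.normal(labels * 2.0, 1.0), rng.normal(labels, 1.0)])

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(features, labels)

# Forest score for "foreground", heuristically re-weighted by a prior probability.
p_rf = clf.predict_proba(features)[:, 1]
prior = np.full(n, 0.3)                              # prior P(foreground), assumed learned from training data
posterior = p_rf * prior / (p_rf * prior + (1 - p_rf) * (1 - prior) + 1e-12)

segmentation = posterior > 0.5                       # binary thresholding of the estimated probability
print("foreground voxels:", segmentation.sum(), "of", n)
```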

20.
In experiments with many statistical tests there is a need to balance type I and type II error rates while taking multiplicity into account. In the traditional approach, the nominal α-level, such as 0.05, is adjusted by the number of tests, L, i.e., to 0.05/L. Assuming that some proportion of tests represent “true signals”, that is, originate from a scenario where the null hypothesis is false, power depends on the number of true signals and the respective distribution of effect sizes. One way to define power is for it to be the probability of making at least one correct rejection at the assumed α-level. We advocate an alternative way of establishing how “well-powered” a study is. In our approach, useful for studies with multiple tests, the ranking probability is controlled, defined as the probability of making at least r correct rejections while rejecting the k hypotheses with the smallest P-values. The two approaches are statistically related: the probability that the smallest P-value is a true signal (i.e., r = k = 1) is, to an excellent approximation, equal to the power at the adjusted level α/L. Ranking probabilities are also related to the false discovery rate and to the Bayesian posterior probability of the null hypothesis. We study properties of our approach when the effect size distribution is replaced for convenience by a single “typical” value taken to be the mean of the underlying distribution. We conclude that its performance is often satisfactory under this simplification; however, substantial imprecision is to be expected when the number of tests, L, is very large and the proportion of true signals is small. Precision is largely restored when three effect size values with their respective abundances are used instead of a single typical value.
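A Monte Carlo sketch of the two notions of power under assumed settings (L tests, a handful of true signals with a common effect size): traditional power is the chance of at least one correct rejection at the level 0.05/L, while the ranking probability asks whether the k smallest P-values contain at least r true signals.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)

L, n_true, effect, alpha, n_sim = 1000, 10, 3.5, 0.05, 5000   # assumed settings
k, r = 5, 3                                                    # ranking-probability indices

bonf_hits, rank_hits = 0, 0
for _ in range(n_sim):
    z = rng.normal(0.0, 1.0, L)
    z[:n_true] += effect                        # the first n_true tests are true signals
    p = 2 * norm.sf(np.abs(z))                  # two-sided P-values

    # Traditional power: at least one true signal rejected at the Bonferroni level alpha / L.
    bonf_hits += (p[:n_true] < alpha / L).any()

    # Ranking probability: at least r of the k smallest P-values are true signals.
    top_k = np.argsort(p)[:k]
    rank_hits += (top_k < n_true).sum() >= r

print("traditional (Bonferroni) power:", bonf_hits / n_sim)
print(f"ranking probability P(at least {r} true among top {k}):", rank_hits / n_sim)
```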
