Similar literature
 Found 20 similar articles (search time: 46 ms)
1.
Misclassification in binary outcomes can severely bias effect estimates of regression models when the models are naively applied to error‐prone data. Here, we discuss response misclassification in studies on the special class of bilateral diseases. Such diseases can affect neither, one, or both entities of a paired organ, for example, the eyes or ears. If measurements are available on both organ entities, disease occurrence in a person is often defined as disease occurrence in at least one entity. In this setting, there are two reasons for response misclassification: (a) ignorance of missing disease assessment in one of the two entities and (b) error‐prone disease assessment in the single entities. We investigate the consequences of ignoring both types of response misclassification and present an approach to adjust for the bias from misclassification by optimizing an adequate likelihood function. The inherent modelling assumptions and problems in case of entity‐specific misclassification are discussed. This work was motivated by studies on age‐related macular degeneration (AMD), a disease that can occur separately in each eye of a person. We illustrate and discuss the proposed analysis approach based on real‐world data of a study on AMD and simulated data.
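Misclassification source (a) above can be made concrete with a small sketch. It is only an illustration of the "at least one entity" definition, not the likelihood approach of the abstract; the function names and the 1/0/None coding are assumptions introduced here.

```python
from typing import Optional


def person_level_status(left: Optional[int], right: Optional[int]) -> Optional[int]:
    """Combine two entity-level assessments into a person-level status.

    Coding (an assumption of this sketch): 1 = diseased, 0 = healthy,
    None = entity not assessed.
    Returns None when the person-level status cannot be determined.
    """
    observed = [s for s in (left, right) if s is not None]
    if any(s == 1 for s in observed):
        return 1          # at least one diseased entity settles the question
    if len(observed) == 2:
        return 0          # both entities assessed and healthy
    return None           # a missing entity could still be diseased


def naive_status(left: Optional[int], right: Optional[int]) -> Optional[int]:
    """Naive rule that ignores missingness and uses whatever was observed."""
    observed = [s for s in (left, right) if s is not None]
    if not observed:
        return None
    return int(any(s == 1 for s in observed))
```

With one healthy eye and one unassessed eye, `naive_status` reports the person as healthy, while `person_level_status` flags the status as undetermined; the naive rule can therefore produce false negatives at the person level.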

2.
Neuhaus JM 《Biometrics》2002,58(3):675-683
Misclassified clustered and longitudinal data arise in studies where the response indicates a condition identified through an imperfect diagnostic procedure. Examples include longitudinal studies that use an imperfect diagnostic test to assess whether or not an individual has been infected with a specific virus. This article presents methods to implement both population-averaged and cluster-specific analyses of such data when the misclassification rates are known. The methods exploit the fact that the class of generalized linear models enjoys a closure property in the case of misclassified responses. Data from longitudinal studies of infectious disease will illustrate the findings.
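The closure property mentioned above can be sketched for the simplest case, an independent-observation logistic model with known error rates. If misclassification depends only on the true response, the observed response again has a success probability that is a fixed transformation of the linear predictor. The function names, the logistic link, and the independence of observations are assumptions of this sketch; it does not reproduce the clustered-data methods of the article.

```python
import math


def expit(z: float) -> float:
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def observed_success_prob(eta: float, sensitivity: float, specificity: float) -> float:
    """P(observed response = 1) given linear predictor eta and known error rates.

    P(Y* = 1 | x) = (1 - specificity) + (sensitivity + specificity - 1) * P(Y = 1 | x)
    """
    p_true = expit(eta)
    return (1.0 - specificity) + (sensitivity + specificity - 1.0) * p_true


def neg_log_likelihood(beta, rows, observed, sensitivity, specificity):
    """Negative log-likelihood of the observed (error-prone) binary responses.

    beta: list of coefficients; rows: list of covariate lists (same length as beta);
    observed: list of 0/1 error-prone responses.
    """
    total = 0.0
    for x, y_star in zip(rows, observed):
        eta = sum(b * xi for b, xi in zip(beta, x))
        p = observed_success_prob(eta, sensitivity, specificity)
        total -= math.log(p) if y_star == 1 else math.log(1.0 - p)
    return total
```

Minimizing `neg_log_likelihood` over `beta` with any general-purpose optimizer would give misclassification-adjusted estimates under these assumptions; with sensitivity and specificity both equal to 1 it reduces to the ordinary logistic likelihood.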

3.
Paulino CD  Soares P  Neuhaus J 《Biometrics》2003,59(3):670-675
Motivated by a study of human papillomavirus infection in women, we present a Bayesian binomial regression analysis in which the response is subject to an unconstrained misclassification process. Our iterative approach provides inferences for the parameters that describe the relationships of the covariates with the response and for the misclassification probabilities. Furthermore, our approach applies to any meaningful generalized linear model, making model selection possible. Finally, it is straightforward to extend it to multinomial settings.

4.

Background

Misclassification has been shown to have a high prevalence in binary responses in both livestock and human populations. Leaving these errors uncorrected before analyses will have a negative impact on the overall goal of genome-wide association studies (GWAS) including reducing predictive power. A liability threshold model that contemplates misclassification was developed to assess the effects of mis-diagnostic errors on GWAS. Four simulated scenarios of case–control datasets were generated. Each dataset consisted of 2000 individuals and was analyzed with varying odds ratios of the influential SNPs and misclassification rates of 5% and 10%.

Results

Analyses of binary responses subject to misclassification resulted in underestimation of influential SNPs and failed to estimate the true magnitude and direction of the effects. Once the misclassification algorithm was applied there was a 12% to 29% increase in accuracy, and a substantial reduction in bias. The proposed method was able to capture the majority of the most significant SNPs that were not identified in the analysis of the misclassified data. In fact, in one of the simulation scenarios, 33% of the influential SNPs were not identified using the misclassified data, compared with the analysis using the data without misclassification. However, using the proposed method, only 13% were not identified. Furthermore, the proposed method was able to identify with high probability a large portion of the truly misclassified observations.

Conclusions

The proposed model provides a statistical tool to correct or at least attenuate the negative effects of misclassified binary responses in GWAS. Across different levels of misclassification probability as well as odds ratios of significant SNPs, the model proved to be robust. In fact, SNP effects, and misclassification probability were accurately estimated and the truly misclassified observations were identified with high probabilities compared to non-misclassified responses. This study was limited to situations where the misclassification probability was assumed to be the same in cases and controls which is not always the case based on real human disease data. Thus, it is of interest to evaluate the performance of the proposed model in that situation which is the current focus of our research.
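The simulation setting described above (a single misclassification rate shared by cases and controls) can be sketched as a label-flipping step. This is an illustrative sketch only: the function names and the seed are hypothetical, and it does not implement the liability threshold model of the study.

```python
import random


def flip_labels(labels, rate, seed=0):
    """Flip each binary label independently with probability `rate`.

    The same rate is applied to cases (1) and controls (0), matching the
    equal-rate assumption stated in the abstract.
    """
    rng = random.Random(seed)
    return [1 - y if rng.random() < rate else y for y in labels]


def expected_observed_prevalence(true_prevalence, rate):
    """Expected share of observed cases after symmetric label flipping."""
    return true_prevalence * (1.0 - rate) + (1.0 - true_prevalence) * rate


# Example: 2000 individuals, half cases, 10% misclassification.
true_labels = [1] * 1000 + [0] * 1000
noisy_labels = flip_labels(true_labels, rate=0.10)
n_flipped = sum(t != o for t, o in zip(true_labels, noisy_labels))
```

In a balanced case–control sample the observed prevalence stays at one half on average, so the contamination is not visible in the marginal case count even though roughly `rate` of the individual labels are wrong.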

5.
Marginal methods have been widely used for the analysis of longitudinal ordinal and categorical data. These models do not require full parametric assumptions on the joint distribution of repeated response measurements but only specify the marginal or even association structures. However, inference results obtained from these methods often incur serious bias when variables are subject to error. In this paper, we tackle the problem that misclassification exists in both response and categorical covariate variables. We develop a marginal method for misclassification adjustment, which utilizes second‐order estimating functions and a functional modeling approach, and can yield consistent estimates and valid inference for mean and association parameters. We propose a two‐stage estimation approach for cases in which validation data are available. Our simulation studies show good performance of the proposed method under a variety of settings. Although the proposed method is phrased for data with a longitudinal design, it also applies to correlated data arising from clustered and family studies, in which association parameters may be of scientific interest. The proposed method is applied to analyze a dataset from the Framingham Heart Study as an illustration.

6.
This note clarifies under what conditions a naive analysis using a misclassified predictor will induce bias for the regression coefficients of other perfectly measured predictors in the model. An apparent discrepancy between some previous results and a result for measurement error of a continuous variable in linear regression is resolved. We show that similar to the linear setting, misclassification (even when not related to the other predictors) induces bias in the coefficients of the perfectly measured predictors, unless the misclassified variable and the perfectly measured predictors are independent. Conditional and asymptotic biases are discussed in the case of linear regression, and explored numerically for an example relating birth weight to the weight and smoking status of the mother.

7.
Stepwise discriminant function analysis for sex assessment was applied to 130 North American Black femora. The measurements included femoral length and three midshaft dimensions likely to be preserved in archaeologically derived and forensic remains. The method correctly assigned sex for 76.4% of the sample (range 70.8–81.5%). This compares favorably with results achieved with other skeletal parts; it also compares favorably with results using the femur in sexing other racial groups. Among our other conclusions are: (1) a “general size factor” is one of major significance in correct classification and in misclassification of sex, and most misclassified individuals are anomalous for this factor; (2) the inconsistency in the relation between circumference and femoral length, which characterizes the remaining misclassified individuals, suggests that anomalous functional demands of body weight/musculature are at fault, and affect circumference more than length; and (3) discriminant function analysis of the same variables in Whites produced similar results, suggesting that sex overrides race in sex assessment; this was confirmed by cross-validating the predictive accuracy of Black discriminant function coefficients on White data, and vice versa.

8.
In epidemiologic studies, measurement error in the exposure variable can have a detrimental effect on the power of hypothesis testing for detecting the impact of exposure in the development of a disease. To adjust for misclassification in the hypothesis testing procedure involving a misclassified binary exposure variable, we consider a retrospective case–control scenario under the assumption of nondifferential misclassification. We develop a test under a Bayesian approach from a posterior distribution generated by an MCMC algorithm and a normal prior under realistic assumptions. We compared this test with an equivalent likelihood ratio test developed under the frequentist approach, using various simulated settings and in the presence or the absence of validation data. In our simulations, we considered varying degrees of sensitivity, specificity, sample sizes, exposure prevalence, and proportion of unvalidated and validated data. In these scenarios, our simulation study shows that the adjusted model (with-validation data model) is always better than the unadjusted model (without validation data model). However, we showed that an exception is possible in the fixed budget scenario where collection of the validation data requires a much higher cost. We also showed that both Bayesian and frequentist hypothesis testing procedures reach the same conclusions for the scenarios under consideration. The Bayesian approach is, however, computationally more stable in rare exposure contexts. A real case–control study was used to show the application of the hypothesis testing procedures under consideration.
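To make the nondifferential setting above concrete, the following sketch back-corrects a case–control 2×2 table when the sensitivity and specificity of the exposure measurement are treated as known. This is the simple algebraic correction, swapped in for illustration; it is not the Bayesian or likelihood ratio test of the abstract, and all function names are hypothetical.

```python
def correct_exposed_count(observed_exposed, total, sensitivity, specificity):
    """Recover the expected true number of exposed subjects in one group.

    Uses observed = sensitivity * true + (1 - specificity) * (total - true).
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    return (observed_exposed - (1.0 - specificity) * total) / denom


def odds_ratio(cases_exposed, cases_total, controls_exposed, controls_total):
    """Exposure odds ratio from group totals and exposed counts."""
    a, b = cases_exposed, cases_total - cases_exposed
    c, d = controls_exposed, controls_total - controls_exposed
    return (a * d) / (b * c)


def corrected_odds_ratio(cases_exposed, cases_total, controls_exposed,
                         controls_total, sensitivity, specificity):
    """Odds ratio after applying the same error rates to cases and controls."""
    a = correct_exposed_count(cases_exposed, cases_total, sensitivity, specificity)
    c = correct_exposed_count(controls_exposed, controls_total, sensitivity, specificity)
    return odds_ratio(a, cases_total, c, controls_total)
```

As a worked example with made-up counts: 100 of 200 cases and 50 of 200 controls truly exposed give an odds ratio of 3. With sensitivity 0.9 and specificity 0.8 the expected observed counts are 110 and 75, the naive odds ratio drops to about 2.04, and `corrected_odds_ratio(110, 200, 75, 200, 0.9, 0.8)` recovers 3. In small samples the corrected counts can fall outside the admissible range, which is one reason likelihood-based or Bayesian approaches are used instead.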

9.
In epidemiologic studies, subjects are often misclassified as to their level of exposure. Ignoring this misclassification error in the analysis introduces bias in the estimates of certain parameters and invalidates many hypothesis tests. For situations in which there is misclassification of exposure in a follow-up study with categorical data, we have developed a model that permits consideration of any number of exposure categories and any number of multiple-category covariates. When used with logistic and Poisson regression procedures, this model helps assess the potential for bias when misclassification is ignored. When reliable ancillary information is available, the model can be used to correct for misclassification bias in the estimates produced by these regression procedures.

10.
The effect of misclassification of phenotypes of a trait on the estimation of recombination value was investigated. The effect was larger for closer linkage. If a locus is dominant and linked with the misclassified trait locus in the repulsion phase, then the effect on the recombination value between the two loci is largest. A method for estimating the unbiased recombination value and the misclassification rate using maximum likelihood associated with an EM algorithm is also presented. This method was applied to a numerical example from rice genome data. It was concluded that the present method combined with the metric multi-dimensional scaling method is useful for the detection of misclassified markers and for the estimation of unbiased recombination values.

11.
Summary: Naive use of misclassified covariates leads to inconsistent estimators of covariate effects in regression models. A variety of methods have been proposed to address this problem including likelihood, pseudo‐likelihood, estimating equation methods, and Bayesian methods, with all of these methods typically requiring either internal or external validation samples or replication studies. We consider a problem arising from a series of orthopedic studies in which interest lies in examining the effect of a short‐term serological response and other covariates on the risk of developing a longer term thrombotic condition called deep vein thrombosis. The serological response is an indicator of whether the patient developed antibodies following exposure to an antithrombotic drug, but the seroconversion status of patients is only available at the time of a blood sample taken upon the discharge from hospital. The seroconversion time is therefore subject to a current status observation scheme, or Case I interval censoring, and subjects tested before seroconversion are misclassified as nonseroconverters. We develop a likelihood‐based approach for fitting regression models that accounts for misclassification of the seroconversion status due to early testing using parametric and nonparametric estimates of the seroconversion time distribution. The method is shown to reduce the bias resulting from naive analyses in simulation studies and an application to the data from the orthopedic studies provides further illustration.

12.
Summary: Missing data, measurement error, and misclassification are three important problems in many research fields, such as epidemiological studies. It is well known that missing data and measurement error in covariates may lead to biased estimation. Misclassification may be considered as a special type of measurement error, for categorical data. Nevertheless, we treat misclassification as a different problem from measurement error because statistical models for them are different. Indeed, in the literature, methods for these three problems were generally proposed separately given that statistical modeling for them is very different. The problem is more challenging in a longitudinal study with nonignorable missing data. In this article, we consider estimation in generalized linear models under these three incomplete data models. We propose a general approach based on expected estimating equations (EEEs) to solve these three incomplete data problems in a unified fashion. This EEE approach can be easily implemented and its asymptotic covariance can be obtained by sandwich estimation. Intensive simulation studies are performed under various incomplete data settings. The proposed method is applied to a longitudinal study of oral bone density in relation to body bone density.

13.
A Bayesian procedure for misclassified binary data was developed. An animal breeding simulation indicated that, when error of classification was ignored, the variance between clusters was inferred incorrectly. Data were reanalyzed assuming that the probability of misclassification was either known or unknown. In the first case, input parameter values were recovered in the analysis. When the probability was unknown, there was a slight bias; the true probability of misclassification and the true number of miscoded observations appeared within high credibility regions. An analysis of fertility in dairy cows is presented.

14.
M A Espeland  S L Hui 《Biometrics》1987,43(4):1001-1012
Misclassification is a common source of bias and reduced efficiency in the analysis of discrete data. Several methods have been proposed to adjust for misclassification using information on error rates (i) gathered by resampling the study population, (ii) gathered by sampling a separate population, or (iii) assumed a priori. We present unified methods for incorporating these types of information into analyses based on log-linear models and maximum likelihood estimation. General variance expressions are developed. Examples from epidemiologic studies are used to demonstrate the proposed methodology.

15.
Zucker DM  Spiegelman D 《Biometrics》2004,60(2):324-334
We consider the Cox proportional hazards model with discrete-valued covariates subject to misclassification. We present a simple estimator of the regression parameter vector for this model. The estimator is based on a weighted least squares analysis of weighted-averaged transformed Kaplan-Meier curves for the different possible configurations of the observed covariate vector. Optimal weighting of the transformed Kaplan-Meier curves is described. The method is designed for the case in which the misclassification rates are known or are estimated from an external validation study. A hybrid estimator for situations with an internal validation study is also described. When there is no misclassification, the regression coefficient vector is small in magnitude, and the censoring distribution does not depend on the covariates, our estimator has the same asymptotic covariance matrix as the Cox partial likelihood estimator. We present results of a finite-sample simulation study under Weibull survival in the setting of a single binary covariate with known misclassification rates. In this simulation study, our estimator performed as well as or, in a few cases, better than the full Weibull maximum likelihood estimator. We illustrate the method on data from a study of the relationship between trans-unsaturated dietary fat consumption and cardiovascular disease incidence.

16.
Though twinning rates have been rapidly increasing in Japan, the problem of zygosity misclassification at birth has received little attention. By analyzing four independent samples, the authors found that a consistent 25-30% of monozygotic twins were misclassified as dizygotic twins at birth. This percentage agrees very well with that of monozygotic twins having dizygous placenta. Generally the obstetricians informed twins' parents about their children's zygosity. The number of placentas, as reported by obstetricians, was very strongly associated with zygosity. In conclusion, even now many monozygotic twins in Japan may be misclassified as dizygotic at birth by obstetricians relying solely on the number of placentas.

17.
Background: Recent research suggests that the Bayesian paradigm may be useful for modeling biases in epidemiological studies, such as those due to misclassification and missing data. We used Bayesian methods to perform sensitivity analyses for assessing the robustness of study findings to the potential effect of these two important sources of bias. Methods: We used data from a study of the joint associations of radiotherapy and smoking with primary lung cancer among breast cancer survivors. We used Bayesian methods to provide an operational way to combine both validation data and expert opinion to account for misclassification of the two risk factors and missing data. For comparative purposes we considered a “full model” that allowed for both misclassification and missing data, along with alternative models that considered only misclassification or missing data, and the naïve model that ignored both sources of bias. Results: We identified noticeable differences between the four models with respect to the posterior distributions of the odds ratios that described the joint associations of radiotherapy and smoking with primary lung cancer. Despite those differences we found that the general conclusions regarding the pattern of associations were the same regardless of the model used. Overall our results indicate a nonsignificantly decreased lung cancer risk due to radiotherapy among nonsmokers, and a mildly increased risk among smokers. Conclusions: We described easy to implement Bayesian methods to perform sensitivity analyses for assessing the robustness of study findings to misclassification and missing data.

18.
Holcroft CA  Spiegelman D 《Biometrics》1999,55(4):1193-1201
We compared several validation study designs for estimating the odds ratio of disease with misclassified exposure. We assumed that the outcome and misclassified binary covariate are available and that the error-free binary covariate is measured in a subsample, the validation sample. We considered designs in which the total size of the validation sample is fixed and the probability of selection into the validation sample may depend on outcome and misclassified covariate values. Design comparisons were conducted for rare and common disease scenarios, where the optimal design is the one that minimizes the variance of the maximum likelihood estimator of the true log odds ratio relating the outcome to the exposure of interest. Misclassification rates were assumed to be independent of the outcome. We used a sensitivity analysis to assess the effect of misspecifying the misclassification rates. Under the scenarios considered, our results suggested that a balanced design, which allocates equal numbers of validation subjects into each of the four outcome/mismeasured covariate categories, is preferable for its simplicity and good performance. A user-friendly Fortran program is available from the second author, which calculates the optimal sampling fractions for all designs considered and the efficiencies of these designs relative to the optimal hybrid design for any scenario of interest.

19.

Background

Statistically reconstructing haplotypes from single nucleotide polymorphism (SNP) genotypes can lead to falsely classified haplotypes. This can be an issue when interpreting haplotype association results or when selecting subjects with certain haplotypes for subsequent functional studies. It was our aim to quantify haplotype reconstruction error and to provide tools for it.

Methods and Results

Using numerous simulation scenarios, we systematically investigated several error measures, including discrepancy, error rate, and R², and introduced the sensitivity and specificity to this context. We exemplified several measures in the KORA study, a large population-based study from Southern Germany. We find that the specificity is slightly reduced only for common haplotypes, while the sensitivity was decreased for some, but not all rare haplotypes. The overall error rate generally increased with increasing number of loci, increasing minor allele frequency of SNPs, decreasing correlation between the alleles and increasing ambiguity.

Conclusions

We conclude that, with the analytical approach presented here, haplotype-specific error measures can be computed to gain insight into the haplotype uncertainty. This method indicates whether a specific risk haplotype can be expected to be reconstructed with little or with high misclassification, and thus the magnitude of expected bias in association estimates. We also illustrate that sensitivity and specificity separate two dimensions of the haplotype reconstruction error, which completely describe the misclassification matrix and thus provide the prerequisite for methods accounting for misclassification.
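The haplotype-specific sensitivity and specificity discussed above follow the usual definitions for a 2×2 misclassification table (true versus reconstructed carrier status of one haplotype). The sketch below is a generic illustration of those definitions; the function name, argument names, and counts are assumptions and are not taken from the study.

```python
def haplotype_error_measures(true_pos, false_neg, false_pos, true_neg):
    """Sensitivity, specificity, and overall error rate for one haplotype.

    true_pos:  haplotype truly present and reconstructed as present
    false_neg: haplotype truly present but reconstructed as absent
    false_pos: haplotype truly absent but reconstructed as present
    true_neg:  haplotype truly absent and reconstructed as absent
    """
    total = true_pos + false_neg + false_pos + true_neg
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    error_rate = (false_neg + false_pos) / total
    return sensitivity, specificity, error_rate


# Hypothetical counts for a rare haplotype among 1000 chromosomes.
sens, spec, err = haplotype_error_measures(true_pos=90, false_neg=10,
                                           false_pos=5, true_neg=895)
```

For a rare haplotype the overall error rate can look small even when sensitivity is clearly below 1, which is why the two measures are reported separately.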

20.
《Cancer epidemiology》2014,38(5):619-622
Introduction

Studies have shown that women with a false-positive result from mammography screening have an excess risk for breast cancer compared with women who only have negative results. We aimed to assess the excess risk of cancer after a false-positive result excluding cases of misclassification, i.e. women who were actually false-negatives instead of false-positives.

Method

We used data from the Copenhagen Mammography Screening Programme, Denmark. The study population was the 295 women, out of 4743 recalled women from a total of 58,003 participants, with a false-positive test during the screening period 1991–2005 and who later developed breast cancer. Cancers that developed in the same location as the finding that initially caused the recall were studied in depth in order to establish whether there had been misclassification.

Results

Seventy-two cases were found to be misclassified. When the women with misclassified tests had been excluded, there was an excess risk of breast cancer of 27% (RR = 1.27, 95% confidence interval (CI), 1.11–1.46) among the women with a false-positive test compared to women with only negative tests. Women with a false-positive test determined at assessment had an excess risk of 27%, while false-positives determined at surgery had an excess risk of 30%.

Conclusions

The results indicate that the increased risk is not explained only by misclassification. The excess risk remains for false-positives determined at assessment as well as at surgery, which favours some biological susceptibility. Further research into the true excess risk of false positives is warranted.
