Similar Documents
20 similar documents found.
1.

Background  

Research has indicated that a number of different factors affect whether a diseased animal receives treatment. The aim of this paper was to evaluate whether herd or individual animal characteristics influence whether cattle receive veterinary treatment for disease, and thereby introduce misclassification into the disease recording system.

2.
The problem of discrimination and classification is central to much of epidemiology. Here we consider the estimation of a logistic regression/discrimination function from training samples when one of the training samples is subject to misclassification or mislabeling, e.g. diseased individuals are incorrectly classified/labeled as healthy controls. We show that this leads to a zero-inflated binomial model with a defective logistic regression or discrimination function, whose parameters can be estimated using standard statistical methods such as maximum likelihood. These parameters can be used to estimate the probability of true group membership among those classified, possibly erroneously, as controls. Two examples are analyzed and discussed. A simulation study explores the properties of the maximum likelihood parameter estimates and of the estimates of the number of mislabeled observations.

3.
Marginal methods have been widely used for the analysis of longitudinal ordinal and categorical data. These models do not require full parametric assumptions on the joint distribution of repeated response measurements; they specify only the marginal structure and, in some cases, the association structure. However, inferences obtained from these methods are often seriously biased when variables are subject to error. In this paper, we tackle the problem of misclassification in both the response and categorical covariates. We develop a marginal method for misclassification adjustment, which uses second-order estimating functions and a functional modeling approach and yields consistent estimates and valid inference for both mean and association parameters. We propose a two-stage estimation approach for cases in which validation data are available. Our simulation studies show good performance of the proposed method under a variety of settings. Although the proposed method is framed for longitudinal data, it also applies to correlated data arising from clustered and family studies, in which association parameters may be of scientific interest. The proposed method is illustrated with an analysis of data from the Framingham Heart Study.

4.
Zucker DM, Spiegelman D. Biometrics 2004; 60(2): 324-334
We consider the Cox proportional hazards model with discrete-valued covariates subject to misclassification. We present a simple estimator of the regression parameter vector for this model. The estimator is based on a weighted least squares analysis of weighted-averaged transformed Kaplan-Meier curves for the different possible configurations of the observed covariate vector. Optimal weighting of the transformed Kaplan-Meier curves is described. The method is designed for the case in which the misclassification rates are known or are estimated from an external validation study. A hybrid estimator for situations with an internal validation study is also described. When there is no misclassification, the regression coefficient vector is small in magnitude, and the censoring distribution does not depend on the covariates, our estimator has the same asymptotic covariance matrix as the Cox partial likelihood estimator. We present results of a finite-sample simulation study under Weibull survival in the setting of a single binary covariate with known misclassification rates. In this simulation study, our estimator performed as well as, or in a few cases better than, the full Weibull maximum likelihood estimator. We illustrate the method on data from a study of the relationship between trans-unsaturated dietary fat consumption and cardiovascular disease incidence.
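Under the proportional hazards model, the complementary log-log transform log(-log S(t)) of the survival curves for two covariate configurations differs by a constant equal to the difference in their linear predictors, which is why transformed Kaplan-Meier curves carry information about the regression coefficients. The Python sketch below only computes a Kaplan-Meier curve and its complementary log-log transform for one covariate configuration; the misclassification correction and the optimal weighting described above are not reproduced, and the choice of transform is an assumption based on standard proportional hazards theory.

```python
import math
from typing import List, Tuple

def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate S(t) at each distinct event time.

    times  -- follow-up time for each subject
    events -- 1 if the event was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Process all subjects tied at time t together.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve

def cloglog_curve(curve: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Transform S(t) to log(-log S(t)); points with S(t) of 0 or 1 are dropped."""
    return [(t, math.log(-math.log(s))) for t, s in curve if 0.0 < s < 1.0]

km = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
print(km)  # [(1, 0.75), (3, 0.375), (4, 0.0)]
```

Under proportional hazards, the cloglog curves of two configurations should run roughly parallel; their vertical gap is what a regression coefficient estimates.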

5.
Cohort studies and clinical trials may involve multiple events. When the occurrence of one of these events prevents the observation of another, the situation is called “competing risks”. A useful measure in such studies is the cumulative incidence of an event, which helps in evaluating interventions and assessing disease prognosis. When outcomes in such studies are subject to misclassification, the resulting cumulative incidence estimates may be biased. In this work, we study the mechanism of bias in cumulative incidence estimation due to outcome misclassification. We show that even moderate levels of misclassification can lead to seriously biased estimates, often in an unpredictable manner. We propose an easy-to-use, uniformly consistent estimator that corrects this bias. Extensive simulations suggest that this method yields unbiased estimates in practical settings. The proposed method is useful both when misclassification probabilities are known from historical data or can be estimated by other means, and for performing sensitivity analyses when the misclassification probabilities are not precisely known.

6.
The accurate detection and classification of diseased pine trees at different levels of severity is important for monitoring the growth of these trees and for preventing and controlling disease in pine forests. Our method combines a DDYOLOv5 detector with a ResNet50 classification network to detect diseased pine trees in remote-sensing UAV images and classify their disease severity. In this approach, images are preprocessed to increase the background diversity of the training samples, and efficient channel attention (ECA) and hybrid dilated convolution (HDC) modules are introduced into DDYOLOv5 to improve detection accuracy. The ECA modules enable the network to focus on the characteristics of diseased pine trees, addressing the low detection accuracy caused by similarities in color and texture between diseased pine trees and complex backgrounds. The HDC modules capture contextual information about targets at different scales; they enlarge the receptive field to accommodate targets of different sizes, addressing the difficulty of detection caused by large variations in the shapes and sizes of diseased pine trees. In addition, a low confidence threshold is adopted to reduce missed detections, and a ResNet50 classification network classifies the detection results into severity levels, reducing the number of false detections and improving classification accuracy. Our experimental results on 8 test images show that, compared with YOLOv5, the proposed method improves precision by 13.55%, recall by 5.06%, and F1-score by 9.71%. Moreover, the detection and classification results show that our approach outperforms classical deep learning object detection methods such as Faster R-CNN and RetinaNet.
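Precision, recall, and F1-score, the metrics reported for the pine-tree detector, are computed from counts of true positives (TP), false positives (FP), and false negatives (FN). The minimal Python sketch below uses those standard definitions; how detections are matched to ground-truth trees (for example, by an overlap threshold) is not specified in the abstract and is left out here, and the counts in the example are hypothetical.

```python
def detection_scores(tp: int, fp: int, fn: int) -> tuple:
    """Return (precision, recall, F1) from detection counts.

    Each ratio falls back to 0.0 when its denominator is zero.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts, for illustration only.
p, r, f1 = detection_scores(tp=80, fp=20, fn=10)
```

Lowering the confidence threshold typically increases TP and FP together, raising recall at some cost in precision; the second-stage ResNet50 classifier is intended to remove false detections and recover precision.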

7.
Outcome misclassification occurs frequently in binary-outcome studies and can bias the estimation of quantities such as incidence, prevalence, cause-specific hazards, and cumulative incidence functions. A number of remedies have been proposed to address potential outcome misclassification in such data. Most of these remedies rely on estimating misclassification probabilities, which are then used to adjust analyses for outcome misclassification. A number of authors advocate applying a gold-standard procedure to a sample internal to the study to learn about the extent of misclassification. With this type of internal validation, quantifying the misclassification also becomes a missing data problem because, by design, the true outcomes are ascertained only on a subset of the study sample. Although estimating misclassification probabilities appears conceptually simple, the methods proposed so far have several methodological and practical shortcomings. Most methods require the missing outcome data to be missing completely at random (MCAR), a stringent assumption that is unlikely to hold in practice. Some existing methods are also computationally intensive. To address these issues, we propose a computationally efficient, easy-to-implement pseudo-likelihood estimator of the misclassification probabilities under a missing at random (MAR) assumption for studies with an internal validation sample. We present the estimator in the context of studies with competing-risks outcomes, although it extends beyond this setting. We describe the consistency and asymptotic distributional properties of the resulting estimator and derive a closed-form estimator of its variance. The finite-sample performance of this estimator is evaluated via simulations.
Using data from a real-world study with competing-risks outcomes, we illustrate how the proposed method can be used to estimate misclassification probabilities. We also show how the estimated misclassification probabilities can be used in an external study to adjust for possible misclassification bias when modeling cumulative incidence functions.

8.
Patients diagnosed with a standard clinical method (subject to misclassification error) are often combined with patients diagnosed with a gold-standard method (with zero or very small misclassification error) in family-based studies of complex diseases. For example, non-autopsied patients (NAP) are often included along with autopsy-proven (AP) patients in family-based studies of complex diseases such as Alzheimer's disease (AD). Theoretical and simulation studies suggest that certain misclassification errors can severely reduce power in genetic linkage and association analyses and that phenotype (diagnostic) error can produce misleading results. Morton's test for heterogeneity can identify genomic regions where error may have led to a loss of power. We applied this test to pedigree data from the NIMH Alzheimer's Disease Genetics Initiative Database, separated into AP and NAP pedigrees. Morton's test identified one highly significant region of heterogeneity on chromosome 2. The heterogeneity arose from significant evidence of linkage in the AP pedigrees at position 109 cM (p = 6.68 × 10⁻⁵), with no such evidence in the NAP pedigrees. Furthermore, Morton's test showed no evidence of heterogeneity on chromosome 19 in early-onset pedigrees, where other published reports have shown highly significant evidence of linkage. These results suggest that supplementing linkage analysis with Morton's test can be useful for genetic data sets that combine AP and NAP samples, or other sample mixtures that include a 'gold standard' subgroup with a reduced error rate, to increase power to detect linkage in the presence of diagnostic misclassification.
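Morton's test compares the sum of the maximum LOD scores obtained separately in each subgroup with the maximum LOD score of the pooled sample: X² = 2 ln(10) [Σ_i Z_i(θ̂_i) - Z(θ̂)], referred to a chi-square distribution with k - 1 degrees of freedom for k subgroups. The Python sketch below assumes the LOD scores for each subgroup (for example, AP and NAP pedigrees) have already been computed on a common grid of recombination fractions or map positions; maximizing over a grid is a simplification, the LOD values in the example are hypothetical, and the p-value helper covers only the two-group (1 df) case.

```python
import math
from typing import Sequence, Tuple

def morton_heterogeneity(lod_by_group: Sequence[Sequence[float]]) -> Tuple[float, int]:
    """Morton's heterogeneity statistic and its degrees of freedom.

    lod_by_group -- one list of LOD scores per subgroup, all evaluated on
    the same grid; the pooled LOD at each grid point is the sum over groups.
    """
    n = len(lod_by_group[0])
    if any(len(g) != n for g in lod_by_group):
        raise ValueError("all subgroups must use the same grid")
    sum_of_maxima = sum(max(g) for g in lod_by_group)
    pooled_max = max(sum(g[i] for g in lod_by_group) for i in range(n))
    stat = 2.0 * math.log(10.0) * (sum_of_maxima - pooled_max)
    return stat, len(lod_by_group) - 1

def chi2_sf_1df(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))

# Hypothetical LOD curves: linkage in one subgroup, evidence against it in the other.
stat, df = morton_heterogeneity([[0.0, 3.0, 0.5], [0.0, -2.0, -1.0]])
p_value = chi2_sf_1df(stat)
```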

9.
Misclassification in binary outcomes can severely bias the effect estimates of regression models when the models are naively applied to error-prone data. Here, we discuss response misclassification in studies of the special class of bilateral diseases. Such diseases may affect neither, one, or both entities of a paired organ, for example, the eyes or ears. If measurements are available on both organ entities, disease occurrence in a person is often defined as disease occurrence in at least one entity. In this setting, response misclassification has two sources: (a) ignoring a missing disease assessment in one of the two entities and (b) error-prone disease assessment in the individual entities. We investigate the consequences of ignoring both types of response misclassification and present an approach that adjusts for the resulting bias by maximizing an appropriate likelihood function. The inherent modelling assumptions and the problems arising with entity-specific misclassification are discussed. This work was motivated by studies of age-related macular degeneration (AMD), a disease that can occur separately in each eye of a person. We illustrate and discuss the proposed analysis approach using real-world data from a study on AMD and simulated data.

10.
Bitter crab disease (BCD) of snow crabs Chionoecetes opilio is caused by a parasitic dinoflagellate, Hematodinium sp. In Newfoundland's commercial fishery, infected snow crabs are identified using visual, macroscopic signs of disease and separated out prior to processing. We estimated the sensitivity and specificity of gross, macroscopic diagnosis of Hematodinium sp. infection by comparing it with microscopic examination of prepared hemolymph smears. The sensitivity of a diagnostic test is the probability that the test yields a positive result given that the animal has the disease. The specificity is the probability of a negative result given that the animal is not diseased. In October 1998, we conducted a design-based survey using cluster sampling in 2 strata. Over 10 000 snow crabs from pot and trawl surveys were examined macroscopically for BCD. In addition, over 350 randomly selected crabs were examined microscopically. This double sampling yielded an estimated sensitivity of 52.7% and an estimated specificity of 100%. That is, a positive result from macroscopic examination is definitive if the observer is well trained, but macroscopic examination will fail to detect infections in crabs with borderline clinical signs of disease. The prevalence estimated from macroscopic observations (p_st = 2.24%) was corrected for misclassification by dividing p_st by the estimated sensitivity (0.527), giving a corrected estimate of 4.25%. Double sampling allows efficient estimation of prevalence: large numbers of crabs can be examined quickly for gross signs of infection, and the results corrected for misclassification using a limited number of observations made with a better but time-consuming test. In addition, the prevalence of macroscopically infected male crabs was lower in the trap survey (0.57%) than in the trawl survey (1.59%).
In the trawl survey, female crabs had a significantly higher prevalence of macroscopically diagnosed infections (6.34%) than males. The prevalence of BCD has increased alarmingly since the disease was first detected in Newfoundland in the early 1990s. Transmission and mortality studies are warranted to better understand the effect of the disease on its commercially important host.
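Dividing the apparent prevalence by the sensitivity is what the general Rogan-Gladen estimator, (p + Sp - 1) / (Se + Sp - 1), reduces to when the specificity Sp is 1. A minimal Python sketch of that general form, assuming sensitivity and specificity are known without error (the survey's design-based variance is not handled):

```python
def corrected_prevalence(apparent: float, sensitivity: float,
                         specificity: float = 1.0) -> float:
    """Rogan-Gladen correction of an apparent prevalence for test error."""
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    est = (apparent + specificity - 1.0) / denom
    # The raw estimator can leave [0, 1] by sampling error, so clip it.
    return min(max(est, 0.0), 1.0)

# Values from the bitter crab disease survey: p_st = 2.24%, Se = 0.527, Sp = 1.
print(round(100 * corrected_prevalence(0.0224, 0.527), 2))  # 4.25
```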

11.
Wagner M, Naik D, Pothen A. Proteomics 2003; 3(9): 1692-1698
We report our results in classifying protein matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectra obtained from serum samples into diseased and healthy groups. We discuss in detail five of the steps in preprocessing the mass spectral data for biomarker discovery, as well as our criterion for choosing a small set of peaks for classifying the samples. Cross-validation studies with four selected proteins yielded misclassification rates in the 10-15% range for all the classification methods. Three of these proteins or protein fragments are down-regulated and one is up-regulated in lung cancer, the disease under consideration in this data set. When cross-validation studies are performed, care must be taken to ensure that the test set does not influence the choice of the peaks used in classification. Misclassification rates are expected to be lower when both the training and test sets are used to select the peaks than when only the training set is used; this expectation was confirmed for various statistical discrimination methods when thirteen peaks were used in cross-validation studies. One classification method, a linear support vector machine, performed especially robustly when the number of peaks was varied from four to thirteen and the peaks were selected from the training set alone. Experiments in which the samples were randomly assigned to the two classes yielded significantly higher misclassification rates than those observed with the true class labels, indicating that our findings are indeed significant. For three of the four proteins used to classify lung cancer, we found closely matching masses in a database of protein expression in lung cancer. Data from additional samples, more experience with the performance of various preprocessing techniques, and confirmation of the biological roles of the proteins that aid classification will strengthen our conclusions in the future.
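The requirement that the test set not influence peak selection means that selection has to be repeated inside each cross-validation fold, using only that fold's training samples. The Python sketch below does this with deliberately simple stand-ins, namely ranking peaks by the absolute difference in class means and a nearest-centroid classifier; neither is the authors' selection criterion or any of their classifiers (which included a linear support vector machine).

```python
import random
from statistics import mean
from typing import List, Sequence

def rank_peaks(X, y, rows, k):
    """Top-k feature indices by |mean(class 1) - mean(class 0)| over `rows` only."""
    scores = []
    for j in range(len(X[0])):
        m1 = mean(X[i][j] for i in rows if y[i] == 1)
        m0 = mean(X[i][j] for i in rows if y[i] == 0)
        scores.append((abs(m1 - m0), j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]

def cv_error(X: List[Sequence[float]], y: List[int], k: int,
             n_folds: int = 5, seed: int = 0) -> float:
    """Cross-validated misclassification rate with in-fold peak selection.

    Assumes every training fold contains samples of both classes (0 and 1).
    """
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    errors = 0
    for f in range(n_folds):
        test = idx[f::n_folds]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        feats = rank_peaks(X, y, train, k)  # the test fold is never consulted
        centroids = {c: [mean(X[i][j] for i in train if y[i] == c) for j in feats]
                     for c in (0, 1)}
        for i in test:
            dist = {c: sum((X[i][j] - m) ** 2 for j, m in zip(feats, cen))
                    for c, cen in centroids.items()}
            errors += min(dist, key=dist.get) != y[i]
    return errors / len(y)
```

Selecting peaks once on the full data set and then cross-validating only the classifier would leak test information into the model and make the error rate look optimistically low.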

12.

Background

The Jaffe and enzymatic methods are the two most common methods for measuring serum creatinine. The Jaffe method is less expensive than the enzymatic method but is also more susceptible to interferences. Interferences can lead to misdiagnosis, but they may vary by patient population. The overall risk associated with the Jaffe method depends on the probability of misclassification and the consequences of misclassification. This study assessed the risk associated with the Jaffe method in an outpatient population. We analyzed the discordance rate in the estimated glomerular filtration rate (eGFR) based on serum creatinine measurements obtained by the Jaffe and enzymatic methods.

Methods

Method comparison and risk analysis. Five hundred twenty-nine paired eGFRs, obtained by the Jaffe and enzymatic methods, were compared at four clinical decision limits. We determined the probability of discordance and the consequence of misclassification at each decision limit to evaluate the overall risk.

Results

We obtained 529 paired observations. Of these, 29 (5.5%) were discordant with respect to one of the decision limits (15, 30, 45, or 60 mL/min/1.73 m²). The magnitude of the differences (Jaffe result minus enzymatic result) was significant relative to analytical variation in 21 of the 29 (72%) discordant results but was not significant relative to biological variation. The risk associated with misclassification was greatest at the 60 mL/min/1.73 m² decision limit because both the probability of misclassification and the potential for adverse outcomes were greatest there.
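Discordance at the decision limits can be counted by assigning each eGFR to a band defined by the limits 15, 30, 45, and 60 mL/min/1.73 m² and flagging pairs whose Jaffe and enzymatic results fall in different bands. A minimal Python sketch; treating a value exactly equal to a limit as belonging to the higher band is an assumption, since the abstract does not state the convention, and the example pairs are hypothetical.

```python
from bisect import bisect_right
from typing import Iterable, Sequence, Tuple

DECISION_LIMITS = (15, 30, 45, 60)  # eGFR, mL/min/1.73 m²

def egfr_band(egfr: float, limits: Sequence[float] = DECISION_LIMITS) -> int:
    """0 for eGFR < 15, 1 for [15, 30), 2 for [30, 45), 3 for [45, 60), 4 for >= 60."""
    return bisect_right(limits, egfr)

def discordance(pairs: Iterable[Tuple[float, float]],
                limits: Sequence[float] = DECISION_LIMITS) -> Tuple[int, int]:
    """Return (discordant pairs, total pairs) for (jaffe, enzymatic) eGFR pairs."""
    discordant = total = 0
    for jaffe, enzymatic in pairs:
        total += 1
        discordant += egfr_band(jaffe, limits) != egfr_band(enzymatic, limits)
    return discordant, total

# Hypothetical pairs: only the first straddles a limit (60).
print(discordance([(58.0, 62.0), (70.0, 75.0), (20.0, 28.0)]))  # (1, 3)
```

With the study's counts, 29 discordant pairs out of 529 gives 29 / 529 ≈ 0.055, the 5.5% reported above.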

Conclusion

The Jaffe method is subject to bias due to interfering substances (loss of analytical specificity). The risk of misclassification is greatest at the 60 mL/min/1.73 m² decision limit; however, the risk of misclassification due to this bias is much smaller than the risk due to biological variation. The Jaffe method may pose low risk in selected populations if eGFR results near the 60 mL/min/1.73 m² decision limit are interpreted with caution.

13.
Spatial Analysis Based on Variance of Moving Window Averages
A new method for analysing spatial patterns was designed based on the variance of moving window averages (VMWA), which can be calculated directly in geographical information systems or a spreadsheet program (e.g. MS Excel). Different types of artificial data were generated to test the method. Regardless of data type, the VMWA method correctly determined the mean cluster sizes. The method was also used to assess spatial patterns in historical plant disease survey data encompassing both airborne and soilborne diseases. The results obtained with the VMWA method generally differed from those obtained with Lloyd's index of patchiness and beta-binomial distribution methods, partially agreed with the results of spatial analysis by distance indices, and were highly consistent with the results of semivariogram and spatial autocorrelation analyses. The results demonstrate that the VMWA method can be applied to many types of data, including binomial counts of diseased or healthy plants, incidence, severity, and numbers of diseased plants or pathogen propagules, although directional and edge effects may limit its application.
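The core quantity is easy to compute. The Python sketch below assumes a one-dimensional transect of quadrat values and takes the variance of the means of all contiguous windows of a given size; the paper's handling of two-dimensional fields, window shapes, and how the mean cluster size is read off from the variance across window sizes are not described in the abstract and are not reproduced here.

```python
from statistics import pvariance
from typing import Sequence

def moving_window_averages(values: Sequence[float], w: int) -> list:
    """Means of all contiguous windows of length w along a transect."""
    if not 1 <= w <= len(values):
        raise ValueError("window size must be between 1 and len(values)")
    total = sum(values[:w])
    averages = [total / w]
    for i in range(w, len(values)):
        total += values[i] - values[i - w]  # slide the window one step
        averages.append(total / w)
    return averages

def vmwa(values: Sequence[float], w: int) -> float:
    """Population variance of the moving window averages for window size w."""
    return pvariance(moving_window_averages(values, w))

# Same numbers of diseased (1) and healthy (0) plants, different patterns.
clustered = [0] * 6 + [1] * 6
alternating = [0, 1] * 6
print(vmwa(clustered, 2) > vmwa(alternating, 2))  # True
```

Windows smaller than a cluster preserve the contrast between patches, so the variance stays high for clustered data and collapses quickly for a regular pattern.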

14.
Summary. Missing data, measurement error, and misclassification are three important problems in many research fields, such as epidemiological studies. It is well known that missing data and measurement error in covariates may lead to biased estimation. Misclassification may be considered a special type of measurement error for categorical data. Nevertheless, we treat misclassification as a problem distinct from measurement error because the statistical models for them differ; indeed, for this reason, methods for these three problems have generally been proposed separately in the literature. The problem is more challenging in a longitudinal study with nonignorable missing data. In this article, we consider estimation in generalized linear models under these three incomplete data models. We propose a general approach based on expected estimating equations (EEEs) to solve these three incomplete data problems in a unified fashion. This EEE approach can be easily implemented, and its asymptotic covariance can be obtained by sandwich estimation. Extensive simulation studies are performed under various incomplete data settings. The proposed method is applied to a longitudinal study of oral bone density in relation to body bone density.

15.
Yu C, Zelterman D. Biometrics 2002; 58(3): 481-491
In many epidemiologic studies, the first indication of an environmental or genetic contribution to a disease is the way in which diseased cases cluster within the same family units. The concept of clustering is contrasted with incidence. We assume that all individuals are exchangeable except for their disease status. This assumption is used to provide an exact test of the initial hypothesis of no familial link with the disease, conditional on the number of diseased cases and the distribution of the sizes of the various family units. New parametric generalizations of binomial sampling models are described to provide measures of the effect size of the disease clustering. We consider models, and an example, that take covariates into account. Ascertainment bias is described and the appropriate sampling distribution is demonstrated. Four numerical examples with real data illustrate these methods.
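The exchangeability assumption suggests a simple way to approximate such a conditional test: shuffle the disease labels across all individuals, which keeps both the number of cases and the family-size distribution fixed, and compare an observed clustering statistic with its shuffled distribution. The Python sketch below is a Monte Carlo approximation, not the exact test of the paper, and its statistic (the number of within-family pairs of cases) is an assumption chosen for illustration.

```python
import random
from typing import List, Sequence

def within_family_case_pairs(family_sizes: Sequence[int], cases: Sequence[int]) -> int:
    """Sum over families of C(k, 2), where k is the number of cases in the family.

    `cases` is a flat 0/1 list ordered family by family.
    """
    total, start = 0, 0
    for size in family_sizes:
        k = sum(cases[start:start + size])
        total += k * (k - 1) // 2
        start += size
    return total

def clustering_p_value(family_sizes: Sequence[int], cases: Sequence[int],
                       n_sim: int = 5000, seed: int = 1) -> float:
    """One-sided Monte Carlo p-value for familial clustering of cases."""
    if sum(family_sizes) != len(cases):
        raise ValueError("family sizes must add up to the number of individuals")
    observed = within_family_case_pairs(family_sizes, cases)
    rng = random.Random(seed)
    labels: List[int] = list(cases)
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(labels)  # exchangeable individuals, number of cases fixed
        hits += within_family_case_pairs(family_sizes, labels) >= observed
    return (hits + 1) / (n_sim + 1)
```

Adding 1 to both numerator and denominator keeps the Monte Carlo p-value strictly positive, which is the usual convention for permutation tests.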

16.
Most parametric methods for detecting foreign genes in bacterial genomes use a scoring function that measures the atypicality of a gene with respect to the bulk of the genome. Genes whose features are sufficiently atypical (lying beyond a threshold value) are deemed foreign. Yet these methods fail when the range of features of donor genomes overlaps with that of the recipient genome, leading to misclassification of foreign and native genes; existing parametric methods choose threshold parameters to balance these error rates. To circumvent this problem, we have developed a two-pronged approach to minimize the misclassification of genes. First, rather than merely classifying genes as atypical, a gene clustering method based on Jensen-Shannon entropic divergence identifies classes of foreign genes that are also similar to each other. Second, genome position is used to reassign genes among classes whose composition features overlap. This process minimizes the misclassification of weakly atypical genes, whether native or foreign. The performance of this approach was assessed using artificial chimeric genomes, and the approach was then applied to the well-characterized Escherichia coli K12 genome. Not only were foreign genes identified with a high degree of accuracy, but genes originating from the same donor organism were also effectively grouped.
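The Jensen-Shannon divergence between two composition profiles p and q is JSD(p, q) = ½ KL(p ‖ m) + ½ KL(q ‖ m) with m = (p + q) / 2; with base-2 logarithms it lies between 0 and 1. The Python sketch below computes it for k-mer frequency profiles of two sequences; using k-mer frequencies as the gene features is an assumption, as the abstract does not list the composition features used, and the clustering and position-based reassignment steps are not shown.

```python
import math
from itertools import product
from typing import List, Sequence

def kmer_profile(seq: str, k: int = 2) -> List[float]:
    """Relative frequencies of all 4**k DNA k-mers; k-mers with other letters are skipped."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:
            counts[word] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence has no valid k-mers")
    return [counts[w] / total for w in kmers]

def kl_bits(p: Sequence[float], q: Sequence[float]) -> float:
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jensen_shannon(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence in bits (0 for identical, 1 for disjoint profiles)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_bits(p, m) + 0.5 * kl_bits(q, m)
```

Unlike the plain Kullback-Leibler divergence, this measure is symmetric and stays finite when one profile contains k-mers absent from the other, which makes it convenient for clustering genes by composition.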

17.
Knowledge of the distribution of spores of Paenibacillus larvae, the causative agent of American foulbrood (AFB), among individual adult honey bees is crucial for determining the appropriate number of adult bees to include in apiary composite samples when screening for diseased colonies. To study spore distribution at the individual bee level, 500 honey bees were collected from different parts of eight clinically diseased colonies and individually analyzed for P. larvae. Bees were randomly collected from the brood chamber and from the super and placed individually in Eppendorf vials. The samples were frozen as soon as possible after collection. Concurrently with sampling, each colony was visually inspected for clinical symptoms of AFB, and the number of clinically diseased cells was visually estimated. All samples were cultured in the laboratory for P. larvae. The results demonstrate that the spores are not randomly distributed among the bees; some bees have much higher spore loads than others. It is also clear that as the proportion of contaminated bees increases, the number of spores per positive bee also increases. The data also demonstrated a relationship between the number of clinically diseased cells and the proportion of positive bees in individual colonies. This relationship was used to develop a mathematical formula for estimating the minimum number of bees in a sample needed to detect clinical disease. The formula takes into account the size of the apiary and the desired degree of certainty of detecting clinical symptoms. Calculations using the formula suggest that adult bee samples at the colony level will detect light AFB infections with high probability. However, the skewed spore distribution among adult bees makes composite sampling at the apiary level more problematic if the aim of sampling is to locate lightly infected individual colonies within apiaries.
The results suggest that false-negative culture results from composite samples of adult bees from individual colonies with clinical symptoms of AFB are highly improbable. However, if single colonies in large apiaries have light infections, dilution of the positive bees from diseased colonies by uncontaminated bees from healthy colonies may yield false-negative results at the apiary level.
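A standard starting point for sample-size calculations of this kind, which may differ from the authors' formula (theirs also accounts for apiary size and the observed relationship between diseased cells and positive bees), is the binomial detection formula: if a fraction p of bees is positive and bees are sampled independently, the smallest sample giving probability at least c of including one or more positive bees is n = ⌈ln(1 - c) / ln(1 - p)⌉. A minimal Python sketch:

```python
import math

def min_bees_for_detection(prop_positive: float, confidence: float = 0.95) -> int:
    """Smallest n with P(at least one positive bee among n) >= confidence.

    Assumes independent sampling from a large population (binomial model), so it
    ignores the clustering of spores among bees that the study reports.
    """
    if not (0 < prop_positive < 1 and 0 < confidence < 1):
        raise ValueError("prop_positive and confidence must lie in (0, 1)")
    return math.ceil(math.log(1 - confidence) / math.log(1 - prop_positive))

print(min_bees_for_detection(0.05))  # 59
```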

18.
The feline odontoclastic resorptive lesion (FORL) is a common oral problem in cats. The disease has increased steadily since the domestication of cats, and its etiology has not been fully determined, although several theories have been proposed. Feeding practices, vaccination, and neutering programs have all been suspected to be associated with FORL. The aim of the current study was to assess the feasibility of using metabonomics to detect the onset of the disease at an early stage. The diagnostic biomarkers could then be used as “efficacy markers” for nutritional interventions aimed at preventing or slowing the progression of FORL. 1H-NMR- and LC/MS-based metabonomic analysis of saliva samples from a group of 21 cats (11 healthy and 10 with FORL) showed clear differences in the metabolic composition of saliva between healthy and FORL-diseased cats. To identify biomarkers, the spectroscopic data were processed using partial least-squares discriminant analysis (PLS-DA) and validated by leave-one-subject-out cross-validation. The PLS-DA model predicted FORL-diseased cats with over 60% accuracy. The maximum Q² value among the random permutation sets was below 0.3. The diseased cats showed increased levels of many organic and amino acids, such as acetate, lactate, propionate, isovalerate, tryptamine, and phenylalanine, suggesting changes in the oral microflora in the disease state. This study is preliminary; a larger study with more samples to further validate the biomarker profile predictive of an early FORL pathophysiological status is in progress.

19.
We have developed a new general approach for handling misclassification in discrete covariates or responses in regression models. The simulation and extrapolation (SIMEX) method, which was originally designed for handling additive covariate measurement error, is applied to the case of misclassification. The statistical model characterizing misclassification is given by the transition matrix Π from the true to the observed variable. We exploit the relationship between the size of misclassification and the bias in estimating the parameters of interest. Assuming that Π is known or can be estimated from validation data, we simulate data with increased misclassification and extrapolate back to the case of no misclassification. We show that our method is quite general and applicable to models with a misclassified response and/or misclassified discrete regressors. In the case of a binary response with misclassification, we compare our method with the approach of Neuhaus, and in the case of a misclassified binary regressor, with the matrix method of Morrissey and Spiegelman. We apply our method to a study on caries with a misclassified longitudinal response.
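The extrapolation idea can be seen in the simplest possible setting: estimating the prevalence of a binary variable observed through a symmetric misclassification matrix Π with known error rate e. For this Π, the powers Π^λ stay symmetric with off-diagonal entry (1 - (1 - 2e)^λ) / 2, so extra misclassification of size λ can be simulated by flipping each observed value with that probability. The Python sketch below averages the naive estimate over simulated data sets for several λ, fits a quadratic in λ, and extrapolates to λ = -1 (no misclassification). The symmetric Π, the λ grid, the quadratic extrapolant, and the simulated example data are illustrative assumptions; the paper applies the idea to regression parameters, not just a prevalence.

```python
import random
from typing import List, Sequence

def fit_quadratic(xs: Sequence[float], ys: Sequence[float]) -> List[float]:
    """Least-squares coefficients [a, b, c] of y = a + b*x + c*x**2."""
    s = [sum(x ** p for x in xs) for p in range(5)]
    t = [sum(y * x ** p for x, y in zip(xs, ys)) for p in range(3)]
    # Augmented normal-equation matrix, solved by Gaussian elimination.
    m = [[s[r], s[r + 1], s[r + 2], t[r]] for r in range(3)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    coef = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        coef[r] = (m[r][3] - sum(m[r][c] * coef[c] for c in range(r + 1, 3))) / m[r][r]
    return coef

def mc_simex_prevalence(observed: Sequence[int], e: float,
                        lambdas: Sequence[float] = (0.5, 1.0, 1.5, 2.0),
                        reps: int = 50, seed: int = 0) -> float:
    """MC-SIMEX estimate of a prevalence from 0/1 data with symmetric error rate e."""
    if not 0 <= e < 0.5:
        raise ValueError("e must lie in [0, 0.5)")
    rng = random.Random(seed)
    n = len(observed)
    xs, ys = [0.0], [sum(observed) / n]  # lambda = 0 is the naive estimate
    for lam in lambdas:
        flip = (1.0 - (1.0 - 2.0 * e) ** lam) / 2.0  # off-diagonal of Pi**lam
        total = 0.0
        for _ in range(reps):
            total += sum(1 - w if rng.random() < flip else w for w in observed) / n
        xs.append(lam)
        ys.append(total / reps)
    a, b, c = fit_quadratic(xs, ys)
    return a - b + c  # quadratic evaluated at lambda = -1

# Simulated example: true prevalence about 0.2, 10% symmetric misclassification.
rng = random.Random(123)
truth = [1 if rng.random() < 0.2 else 0 for _ in range(20000)]
observed = [1 - x if rng.random() < 0.1 else x for x in truth]
naive = sum(observed) / len(observed)
corrected = mc_simex_prevalence(observed, e=0.1, reps=20)
```

The naive estimate drifts toward 0.5 as misclassification grows; the fitted trend in λ is what lets the extrapolation step recover an estimate close to the true prevalence.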

20.
Extensive studies showed that no disease developed when seeds of different forage grasses were inoculated with Xanthomonas campestris pv. graminis. The disease could, however, easily be induced by infecting the plants through the root system, leaves, or flowers. The inoculation site on the leaf proved to be of vital importance for disease development: wilting symptoms were quickly induced when the pathogen was inoculated near the leaf base. Plants in root contact with diseased plants showed disease symptoms; it is not known whether these symptoms were caused by the bacteria or by toxins released by the nearby diseased plants. Cross-inoculation trials on different grass varieties revealed that different pathovars exist among the xanthomonads pathogenic to forage grasses. Some have a broad host range, whereas others are largely restricted to a single plant genus. Field trials suggest that under Belgian climatic conditions, the losses caused by bacterial wilt are rather limited.


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP registration No. 09084417