Similar Literature
20 similar records retrieved (search time: 45 ms)
1.
The receiver operating characteristic (ROC) curve is used to evaluate a biomarker's ability to classify disease status. The Youden index (J), the maximum potential effectiveness of a biomarker, is a common summary measure of the ROC curve. In biomarker development, levels may be unquantifiable below a limit of detection (LOD) and missing from the overall dataset. Disregarding these observations may negatively bias the ROC curve and thus J. Several correction methods have been suggested for mean estimation and testing; however, little has been written about the ROC curve or its summary measures. We adapt non-parametric (empirical) and semi-parametric (ROC-GLM [generalized linear model]) methods and propose parametric methods (maximum likelihood, ML) to estimate J and the optimal cut-point (c*) for a biomarker affected by a LOD. We develop unbiased estimators of J and c* via ML for normally and gamma distributed biomarkers. Alpha-level confidence intervals are proposed using delta and bootstrap methods for the ML, semi-parametric, and non-parametric approaches, respectively. Simulation studies are conducted over a range of distributional scenarios and sample sizes to evaluate the estimators' bias, root-mean-square error, and coverage probability; the average bias was less than one percent for the ML and GLM methods across scenarios and decreased with increasing sample size. An example using polychlorinated biphenyl levels to classify women with and without endometriosis illustrates the potential benefits of these methods. We address the limitations and usefulness of each method in order to give researchers guidance in constructing appropriate estimates of biomarkers' true discriminating capabilities.
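For orientation, the following minimal sketch (not the authors' code) computes the empirical Youden index J and optimal cut-point c* by maximizing sensitivity + specificity − 1 over observed marker values, and shows how naively discarding values below an assumed LOD shifts the estimates; the distributions, sample sizes, and LOD value are invented for illustration.

```python
import numpy as np

def youden_index(controls, cases):
    """Empirical Youden index J and optimal cut-point c* for a biomarker
    where higher values indicate disease."""
    cutpoints = np.unique(np.concatenate([controls, cases]))
    best_j, best_c = -1.0, None
    for c in cutpoints:
        sens = np.mean(cases > c)        # true-positive rate at cut-point c
        spec = np.mean(controls <= c)    # true-negative rate at cut-point c
        j = sens + spec - 1.0
        if j > best_j:
            best_j, best_c = j, c
    return best_j, best_c

rng = np.random.default_rng(1)
controls = rng.normal(0.0, 1.0, 200)   # biomarker in non-diseased subjects
cases = rng.normal(1.2, 1.0, 200)      # biomarker in diseased subjects

# Naive analysis: discard values below a limit of detection (LOD);
# the abstract's point is that this biases J and c*, motivating corrections.
lod = -0.5
j_full, c_full = youden_index(controls, cases)
j_naive, c_naive = youden_index(controls[controls >= lod], cases[cases >= lod])
print(f"full data:  J={j_full:.3f}, c*={c_full:.3f}")
print(f"above LOD:  J={j_naive:.3f}, c*={c_naive:.3f}")
```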

2.
Although most statistical methods for diagnostic studies focus on disease processes with a binary disease status, many diseases can be naturally classified into three ordinal diagnostic categories: normal, early stage, and fully diseased. For such diseases, the volume under the ROC surface (VUS) is the most commonly used index of diagnostic accuracy. Because the early disease stage is most likely the optimal time window for therapeutic intervention, the sensitivity to the early diseased stage has been suggested as another diagnostic measure. For the purpose of comparing the abilities of two markers to detect early disease, it is of interest to estimate the confidence interval of the difference between their sensitivities to the early diseased stage. In this paper, we present both parametric and non-parametric methods for this purpose. An extensive simulation study is carried out over a variety of settings to evaluate and compare the performance of the proposed methods. A real example of Alzheimer's disease (AD) is analyzed using the proposed approaches.
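As a hedged illustration of the VUS mentioned above, the sketch below computes the empirical volume under the ROC surface as the proportion of correctly ordered (normal, early, fully diseased) triplets; the simulated marker distributions are invented, and the confidence-interval construction for sensitivity differences is not reproduced here.

```python
import numpy as np

def empirical_vus(x_normal, x_early, x_full):
    """Empirical volume under the ROC surface: the proportion of
    (normal, early, fully diseased) triplets that are correctly ordered."""
    # Broadcasting builds all n1*n2*n3 triplets; fine for moderate samples.
    a = x_normal[:, None, None]
    b = x_early[None, :, None]
    c = x_full[None, None, :]
    return np.mean((a < b) & (b < c))

rng = np.random.default_rng(0)
x_normal = rng.normal(0.0, 1.0, 80)
x_early  = rng.normal(1.0, 1.0, 80)
x_full   = rng.normal(2.0, 1.0, 80)
print(f"VUS = {empirical_vus(x_normal, x_early, x_full):.3f}")  # 1/6 under pure chance
```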

3.
Yi Li, Lu Tian, Lee-Jen Wei. Biometrics 2011, 67(2): 427–435
Summary: In a longitudinal study, suppose that the primary endpoint is the time to a specific event. This response variable, however, may be censored by an independent censoring variable or by the occurrence of one of several dependent competing events. For each study subject, a set of baseline covariates is collected. The question is how to construct a reliable prediction rule for a future subject's profile of all competing risks of interest at a specific time point for risk-benefit decision making. In this article, we propose a two-stage procedure to make inferences about such subject-specific profiles. In the first stage, we use a parametric model to obtain a univariate risk index score system. We then consistently estimate the average competing risks for subjects who have the same parametric index score via a nonparametric function estimation procedure. We illustrate this new proposal with data from a randomized clinical trial evaluating the efficacy of a treatment for prostate cancer. The primary endpoint for this study was the time to prostate cancer death, with two types of dependent competing events: cardiovascular death and death from other causes.

4.
In the development of structural equation models (SEMs), observed variables are usually assumed to be normally distributed. However, this assumption is likely to be violated in much practical research. Because the non-normality of observed variables in an SEM can arise from non-normal latent variables, non-normal residuals, or both, semiparametric modeling with an unknown distribution of the latent variables or an unknown distribution of the residuals is needed. In this article, we find that an SEM becomes nonidentifiable when both the latent variable distribution and the residual distribution are unknown. Hence, it is impossible to estimate both distributions reliably without parametric assumptions on one or the other. We also find that the residuals in the measurement equation are more sensitive to the normality assumption than the latent variables, and the negative impact on the estimation of parameters and distributions due to non-normality of the residuals is more serious. Therefore, when there is no prior knowledge about parametric distributions for either the latent variables or the residuals, we recommend making a parametric assumption on the latent variables and modeling the residuals nonparametrically. We propose a semiparametric Bayesian approach using a truncated Dirichlet process with a stick-breaking prior to tackle the non-normality of residuals in the measurement equation. Simulation studies and a real data analysis demonstrate our findings and reveal the empirical performance of the proposed methodology. A free WinBUGS code to perform the analysis is available in the Supporting Information.

5.
Directly standardized rates continue to be an integral tool for presenting rates for diseases that are highly dependent on age, such as cancer. Statistically, these rates are modeled as a weighted sum of Poisson random variables. This is a difficult statistical problem, because there are k observed Poisson variables and k unknown means. The gamma confidence interval has been shown through simulations to have at least nominal coverage in all simulated scenarios, but it can be overly conservative. Previous modifications to that method have closer to nominal coverage on average, but they do not achieve the nominal coverage bound in all situations. Further, those modifications are not central intervals, and the upper coverage error rate can be substantially more than half the nominal error. Here we apply a mid-p modification to the gamma confidence interval. Typical mid-p methods forsake guaranteed coverage to get coverage that is sometimes higher and sometimes lower than the nominal coverage rate, depending on the values of the parameters. The mid-p gamma interval does not have guaranteed coverage in all situations; however, in the (not rare) situations where the gamma method is overly conservative, the mid-p gamma interval often has at least nominal coverage. The mid-p gamma interval is especially appropriate when one wants a central interval, since simulations show that in many situations both the upper and lower coverage error rates are on average less than or equal to half the nominal error rate.
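The gamma interval discussed above is commonly computed with the Fay–Feuer construction; the sketch below implements that standard form under the assumption that the standardized rate is a weighted sum of independent Poisson counts. The mid-p modification adjusts the tail areas of this interval and is not reproduced here; the counts and weights are invented.

```python
import numpy as np
from scipy.stats import gamma

def gamma_ci(counts, weights, alpha=0.05):
    """Fay-Feuer style gamma confidence interval for a directly
    standardized rate sum_i w_i * x_i, with x_i ~ Poisson."""
    counts, weights = np.asarray(counts, float), np.asarray(weights, float)
    y = np.sum(weights * counts)       # standardized rate
    v = np.sum(weights**2 * counts)    # estimated variance
    wm = weights.max()                 # conservative weight for the upper tail
    lower = 0.0 if y == 0 else gamma.ppf(alpha / 2, a=y**2 / v, scale=v / y)
    upper = gamma.ppf(1 - alpha / 2,
                      a=(y + wm)**2 / (v + wm**2),
                      scale=(v + wm**2) / (y + wm))
    return y, lower, upper

# Illustrative age-specific counts, person-years, and standard-population weights
counts = [2, 5, 12, 30, 41]
person_years = np.array([10000, 12000, 9000, 8000, 5000])
std_weights = np.array([0.3, 0.25, 0.2, 0.15, 0.1])
w = std_weights / person_years          # weight applied to each observed count
rate, lo, hi = gamma_ci(counts, w)
print(f"standardized rate = {rate:.6f}, 95% CI = ({lo:.6f}, {hi:.6f})")
```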

6.
To classify patients as either resistant or non-resistant to HIV therapy based on longitudinal viral load profiles, we applied longitudinal quadratic discriminant analysis and examined various measures, mainly derived from the Brier score, to assess the biomarker performance in terms of discrimination and calibration. The analysis of the application data revealed an increase in performance when using longer profiles instead of single biomarker measurements. Simulations showed that the selection of mixed models for the estimation of the group-specific discriminant rule parameters should be based on BIC, rather than on the best performance measure. An incorrect model selection can lead to spuriously better or worse performance in terms of misclassification and classification certainty, especially with increasing length of the profiles and for more complex models with random slopes.
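As a minimal, hedged reminder of the core performance measure named above, this snippet computes the Brier score for a set of predicted resistance probabilities; the probabilities and outcomes are invented, and the longitudinal discriminant analysis itself is not shown.

```python
import numpy as np

def brier_score(p_resistant, y_resistant):
    """Mean squared difference between predicted probability and observed outcome;
    lower is better, and 0.25 corresponds to an uninformative 0.5 prediction."""
    p, y = np.asarray(p_resistant, float), np.asarray(y_resistant, float)
    return np.mean((p - y) ** 2)

# Posterior probabilities of resistance from some discriminant rule (invented)
p_hat = np.array([0.9, 0.2, 0.7, 0.1, 0.6])
y_obs = np.array([1, 0, 1, 0, 0])
print(f"Brier score = {brier_score(p_hat, y_obs):.3f}")
```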

7.
Existing methods for the joint modeling of longitudinal measurements and survival data can be highly influenced by outliers in the longitudinal outcome. We propose a joint model for the analysis of longitudinal measurements and competing risks failure time data that is robust in the presence of outlying longitudinal observations during follow-up. Our model consists of a linear mixed effects sub-model for the longitudinal outcome and a proportional cause-specific hazards frailty sub-model for the competing risks data, linked together by latent random effects. Instead of the usual normality assumption for measurement errors in the linear mixed effects sub-model, we adopt a t-distribution, which has heavier tails and is thus more robust to outliers. We derive an EM algorithm for the maximum likelihood estimates of the parameters and estimate their standard errors using a profile likelihood method. The proposed method is evaluated by simulation studies and is applied to a scleroderma lung study. (© 2009 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim)

8.
Ye W, Lin X, Taylor JM. Biometrics 2008, 64(4): 1238–1246
Summary: In this article, we investigate regression calibration methods to jointly model longitudinal and survival data using a semiparametric longitudinal model and a proportional hazards model. In the longitudinal model, a biomarker is assumed to follow a semiparametric mixed model in which covariate effects are modeled parametrically and subject-specific time profiles are modeled nonparametrically using a population smoothing spline and subject-specific random stochastic processes. The Cox model is assumed for the survival data, with both the current measure and the rate of change of the underlying longitudinal trajectory included as covariates, as motivated by a prostate cancer study application. We develop a two-stage semiparametric regression calibration (RC) method. Two variations of the RC method are considered: risk set regression calibration and a computationally simpler ordinary regression calibration. Simulation results show that the two-stage RC approach performs well in practice and effectively corrects the bias of the naive method. We apply the proposed methods to the analysis of a dataset for evaluating the effects of the longitudinal biomarker PSA on the recurrence of prostate cancer.

9.
Traditionally, biomarkers of aging are classified as either pro-longevity or antilongevity. Using longitudinal data sets from the large-scale inbred mouse strain study at the Jackson Laboratory Nathan Shock Center, we describe a protocol to identify two kinds of biomarkers: those with prognostic implication for lifespan and those with longitudinal evidence. Our protocol also identifies biomarkers for which, at first sight, there is conflicting evidence. Conflict resolution is possible by postulating a role switch. In these cases, high biomarker values are, for example, antilongevity in early life and pro-longevity in later life. Role-switching biomarkers correspond to features that must, for example, be minimized early, but maximized later, for optimal longevity. The clear-cut pro-longevity biomarkers we found reflect anti-inflammatory, anti-immunosenescent or anti-anaemic mechanisms, whereas clear-cut antilongevity biomarkers reflect inflammatory mechanisms. Many highly significant blood biomarkers relate to immune system features, indicating a shift from adaptive to innate processes, whereas most role-switching biomarkers relate to blood serum features and whole-body phenotypes. Our biomarker classification approach is applicable to any combination of longitudinal studies with life expectancy data, and it provides insights beyond a simplified scheme of biomarkers for long or short lifespan.

10.
Open population capture-recapture models are widely used to estimate population demographics and abundance over time. Bayesian methods exist to incorporate open population modeling with spatial capture-recapture (SCR), allowing for estimation of the effective area sampled and population density. Here, open population SCR is formulated as a hidden Markov model (HMM), allowing inference by maximum likelihood for both Cormack-Jolly-Seber and Jolly-Seber models, with and without activity center movement. The method is applied to a 12-year survey of male jaguars (Panthera onca) in the Cockscomb Basin Wildlife Sanctuary, Belize, to estimate survival probability and population abundance over time. For this application, inference is shown to be biased when assuming activity centers are fixed over time, while including a model for activity center movement provides negligible bias and nominal confidence interval coverage, as demonstrated by a simulation study. The HMM approach is compared with Bayesian data augmentation and closed population models for this application. The method is substantially more computationally efficient than the Bayesian approach and provides a lower root-mean-square error in predicting population density compared to closed population models.

11.
Summary: In a typical randomized clinical trial, a continuous variable of interest (e.g., bone density) is measured at baseline and at fixed post-baseline time points. The resulting longitudinal data, often incomplete due to dropouts and other reasons, are commonly analyzed using parametric likelihood-based methods that assume multivariate normality of the response vector. If the normality assumption is deemed untenable, then semiparametric methods such as (weighted) generalized estimating equations are considered. We propose an alternate approach in which the missing data problem is tackled using multiple imputation, and each imputed dataset is analyzed using robust regression (M-estimation; Huber, 1973, Annals of Statistics 1, 799–821) to protect against potential non-normality/outliers in the original or imputed dataset. The robust analysis results from each imputed dataset are combined for overall estimation and inference using either the simple Rubin (1987, Multiple Imputation for Nonresponse in Surveys, New York: Wiley) method or the more complex but potentially more accurate Robins and Wang (2000, Biometrika 87, 113–124) method. We use simulations to show that our proposed approach performs at least as well as the standard methods under normality, but is notably better under both elliptically symmetric and asymmetric non-normal distributions. A clinical trial example is used for illustration.
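A hedged sketch of the combination step described above: robust Huber M-estimation is run on each imputed dataset with statsmodels, and the per-imputation estimates are pooled with Rubin's (1987) rules. The imputation step itself is omitted (fabricated complete datasets stand in for it), only the simple Rubin combination is shown, and all names and data are illustrative assumptions.

```python
import numpy as np
import statsmodels.api as sm

def rubin_pool(estimates, variances):
    """Combine per-imputation estimates and variances with Rubin's (1987) rules."""
    estimates, variances = np.asarray(estimates), np.asarray(variances)
    m = len(estimates)
    qbar = estimates.mean(axis=0)          # pooled point estimate
    wbar = variances.mean(axis=0)          # within-imputation variance
    b = estimates.var(axis=0, ddof=1)      # between-imputation variance
    t = wbar + (1 + 1 / m) * b             # total variance
    return qbar, np.sqrt(t)

# Assume `imputed_datasets` is a list of (y, X) pairs produced by a
# multiple-imputation step (not shown); here a few are fabricated for illustration.
rng = np.random.default_rng(2)
imputed_datasets = []
for _ in range(5):
    x = rng.normal(size=100)
    y = 1.0 + 0.5 * x + rng.standard_t(df=3, size=100)  # heavy-tailed errors
    imputed_datasets.append((y, sm.add_constant(x)))

est, var = [], []
for y, X in imputed_datasets:
    fit = sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit()  # Huber M-estimation
    est.append(fit.params)
    var.append(fit.bse ** 2)

coef, se = rubin_pool(est, var)
print("pooled coefficients:", np.round(coef, 3), "pooled SEs:", np.round(se, 3))
```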

12.
The Miles method of age estimation relies on molar wear and is widely used in bioarcheological contexts. However, because the method requires physical seriation and a sample of subadults to estimate wear rates, it cannot be applied to many samples. Here, we modify the Miles method by scoring occlusal wear and estimating molar wear rates from adult wear gradients in 311 hunter-gatherers, and we provide formulae to estimate the error associated with each age estimate. A check of the modified method in a subsample (n = 22) shows that the interval estimates overlap in all but one case with age categories estimated from traditional methods; this suggests that the modifications have not hampered the ability of the Miles method to estimate age, even in heterogeneous samples. As expected, the error increases with age and in populations with smaller sample sizes. These modifications allow the Miles method to be applied to skeletal samples of adult crania that were previously amenable only to cranial suture age estimation and, importantly, provide a measure of uncertainty for each age estimate. Am J Phys Anthropol 149:181–192, 2012. © Wiley Periodicals, Inc.

13.
Treatment selection markers are generally sought when the benefit of an innovative treatment relative to a reference treatment is considered and this benefit is suspected to vary according to the characteristics of the patients. Classically, such quantitative markers are detected by testing a marker-by-treatment interaction in a parametric regression model. Most alternative methods rely on modeling the risk of event occurrence in each treatment arm or the benefit of the innovative treatment over the marker values, but with assumptions that may be difficult to verify. Herein, a simple non-parametric approach is proposed to detect and assess the general capacity of a quantitative marker for treatment selection when no overall difference in efficacy could be demonstrated between two treatments in a clinical trial. This graphical method relies on the area between the treatment-arm-specific receiver operating characteristic curves (ABC), which reflects the treatment selection capacity of the marker. A simulation study assessed the inference properties of the ABC estimator and compared them with those of other parametric and non-parametric indicators. The simulations showed that the estimate of the ABC had low bias and power comparable to the parametric indicators, and that its confidence interval had good coverage probability (better than the other non-parametric indicator in some cases). Thus, the ABC is a good alternative to parametric indicators. The ABC method was applied to data from the PETACC-8 trial, which investigated FOLFOX4 versus FOLFOX4 + cetuximab in stage III colon adenocarcinoma. It enabled the detection of a treatment selection marker: the DDR2 gene.
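A minimal sketch of the ABC idea, assuming the empirical ROC curve of the marker is computed separately in each randomized arm on a common false-positive-rate grid and the absolute difference is integrated; the data-generating mechanism, function names, and grid are invented, and the bootstrap confidence interval used in the paper is not reproduced. A larger ABC indicates stronger treatment-selection capacity of the marker.

```python
import numpy as np

def empirical_roc(marker, event, fpr_grid):
    """True-positive rate of the marker at thresholds matching each FPR."""
    thresholds = np.quantile(marker[event == 0], 1 - fpr_grid)  # control quantiles
    return np.array([np.mean(marker[event == 1] > t) for t in thresholds])

def area_between_curves(marker, event, arm, fpr_grid=None):
    """ABC: integrated absolute difference between the ROC curves of the
    marker computed separately in each treatment arm."""
    if fpr_grid is None:
        fpr_grid = np.linspace(0.0, 1.0, 201)
    roc0 = empirical_roc(marker[arm == 0], event[arm == 0], fpr_grid)
    roc1 = empirical_roc(marker[arm == 1], event[arm == 1], fpr_grid)
    return np.trapz(np.abs(roc1 - roc0), fpr_grid)

rng = np.random.default_rng(3)
n = 400
arm = rng.integers(0, 2, n)                       # randomized treatment
marker = rng.normal(size=n)                       # candidate selection marker
# The marker predicts the event only in arm 1, i.e., a marker-by-treatment interaction
logit = -0.5 + 1.5 * marker * (arm == 1)
event = rng.binomial(1, 1 / (1 + np.exp(-logit)))
print(f"ABC = {area_between_curves(marker, event, arm):.3f}")
```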

14.
Huang Y, Gilbert PB. Biometrics 2011, 67(4): 1442–1451
Recently a new definition of surrogate endpoint, the "principal surrogate," was proposed based on causal associations between treatment effects on the biomarker and on the clinical endpoint. Despite its appealing interpretation, limited research has been conducted to evaluate principal surrogates, and existing methods focus on risk models that consider a single biomarker. How to compare principal surrogate value of biomarkers or general risk models that consider multiple biomarkers remains an open research question. We propose to characterize a marker or risk model's principal surrogate value based on the distribution of risk difference between interventions. In addition, we propose a novel summary measure (the standardized total gain) that can be used to compare markers and to assess the incremental value of a new marker. We develop a semiparametric estimated-likelihood method to estimate the joint surrogate value of multiple biomarkers. This method accommodates two-phase sampling of biomarkers and is more widely applicable than existing nonparametric methods by incorporating continuous baseline covariates to predict the biomarker(s), and is more robust than existing parametric methods by leaving the error distribution of markers unspecified. The methodology is illustrated using a simulated example set and a real data set in the context of HIV vaccine trials.

15.
Metric data are usually assessed on a continuous scale with good precision, but sometimes agricultural researchers cannot obtain precise measurements of a variable. Values of such a variable cannot then be expressed as real numbers (e.g., 1.51 or 2.56), but often can be represented by intervals into which the values fall (e.g., from 1 to 2 or from 2 to 3). In this situation, statisticians talk about censoring and censored data, as opposed to missing data, where no information is available at all. Traditionally, in agriculture and biology, three methods have been used to analyse such data: (a) when intervals are narrow, some form of imputation (e.g., mid-point imputation) is used to replace the interval and traditional methods for continuous data are employed (such as analyses of variance [ANOVA] and regression); (b) for time-to-event data, the cumulative proportions of individuals that experienced the event of interest are analysed, instead of the individual observed times-to-event; (c) when intervals are wide and many individuals are collected, non-parametric methods of data analysis are favoured, where counts are considered instead of the individual observed value for each sample element. In this paper, we show that these methods may be suboptimal: The first one does not respect the process of data collection, the second leads to unreliable standard errors (SEs), while the third does not make full use of all the available information. As an alternative, methods of survival analysis for censored data can be useful, leading to reliable inferences and sound hypotheses testing. These methods are illustrated using three examples from plant and crop sciences.
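To make the contrast concrete, here is a hedged sketch comparing mid-point imputation with a likelihood that respects interval censoring, assuming a normal model for the measured variable; each observation contributes F(upper) − F(lower) to the likelihood. The data and interval width are invented.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

# Each observation is only known to lie in (lower, upper]
rng = np.random.default_rng(4)
true_values = rng.normal(10.0, 2.0, 150)
lower = np.floor(true_values)           # e.g., readings from a graduated scale
upper = lower + 1.0

# (a) Mid-point imputation: treat interval mid-points as exact data
mid = (lower + upper) / 2
print(f"mid-point imputation: mean={mid.mean():.3f}, sd={mid.std(ddof=1):.3f}")

# (b) Interval-censored ML: each observation contributes F(upper) - F(lower)
def neg_log_lik(theta):
    mu, log_sigma = theta
    sigma = np.exp(log_sigma)
    p = norm.cdf(upper, mu, sigma) - norm.cdf(lower, mu, sigma)
    return -np.sum(np.log(np.clip(p, 1e-300, None)))

fit = minimize(neg_log_lik, x0=[mid.mean(), np.log(mid.std())], method="Nelder-Mead")
mu_hat, sigma_hat = fit.x[0], np.exp(fit.x[1])
print(f"interval-censored ML: mean={mu_hat:.3f}, sd={sigma_hat:.3f}")
```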

16.
Several methods have been developed to estimate the parental contributions to the genetic pool of an admixed population. Some pairwise comparisons have been performed on real data but, to date, no systematic comparison of a large number of methods has been attempted. In this study, we performed a simulation-based comparison of six of the most cited methods in the literature of the last 20 years. Five of these methods use allele frequencies and differ in the statistical treatment of the data. The last one also considers the degree of molecular divergence by estimating coalescence times. Comparisons are based on the frequency with which the method can be applied, the bias and mean square error of the estimate, and the frequency with which the true value falls within the confidence interval. Finally, each method was applied to a real data set of variously introgressed honeybee populations. Under optimal conditions (highly differentiated parental populations, recent hybridization event), all methods perform equally well. When conditions are not optimal, the methods perform differently, but no method is always better or worse than all others. Some guidelines are given for the choice of method.
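For orientation only, the snippet below shows one of the simplest frequency-based estimators of a parental contribution: a least-squares fit of the admixed allele frequencies to a mixture of the two parental frequencies. It is not necessarily one of the six methods compared in the paper, and the allele frequencies are invented. A bootstrap over loci could supply a confidence interval of the kind evaluated in the study.

```python
import numpy as np

def ls_admixture(p_hybrid, p_parent1, p_parent2):
    """Least-squares estimate of the contribution m of parent population 1,
    assuming p_hybrid ~= m * p_parent1 + (1 - m) * p_parent2 at each locus."""
    d = np.asarray(p_parent1, float) - np.asarray(p_parent2, float)
    r = np.asarray(p_hybrid, float) - np.asarray(p_parent2, float)
    return float(np.sum(d * r) / np.sum(d * d))

# Illustrative allele frequencies at five loci
p1 = np.array([0.9, 0.2, 0.7, 0.6, 0.1])   # parental population 1
p2 = np.array([0.1, 0.8, 0.2, 0.3, 0.9])   # parental population 2
ph = 0.3 * p1 + 0.7 * p2                    # admixed population (true m = 0.3)
print(f"estimated contribution of parent 1: {ls_admixture(ph, p1, p2):.3f}")
```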

17.
We present a parametric family of regression models for interval-censored event-time (survival) data that accommodates both fixed (e.g., baseline) and time-dependent covariates. The model employs a three-parameter family of survival distributions that includes the Weibull, negative binomial, and log-logistic distributions as special cases, and can be applied to data with left-, right-, interval-, or non-censored event times. Standard methods, such as Newton-Raphson, can be employed to estimate the model, and the resulting estimates have an asymptotically normal distribution about the true values with a covariance matrix that is consistently estimated by the information function. The deviance function is described to assess model fit, and a robust sandwich estimate of the covariance may also be employed to provide asymptotically robust inferences when the model assumptions do not apply. Spline functions may also be employed to allow for non-linear covariates. The model is applied to data from a long-term study of type 1 diabetes to describe the effects of longitudinal measures of glycemia (HbA1c) over time (the time-dependent covariate) on the risk of progression of diabetic retinopathy (eye disease), an interval-censored event-time outcome.

18.
The present analysis revisits the impact of extremely low-frequency magnetic fields (ELF-MF) on melatonin (MLT) levels in human and rat subjects using both a parametric and a non-parametric approach. In this analysis, we use 62 studies from review articles. The parametric approach consists of a Bayesian logistic regression (LR) analysis and the non-parametric approach consists of a support vector analysis, both of which are robust against spurious/false results. Both approaches reveal a unique well-ordered pattern, and show that human and rat studies are consistent with each other once the MF strength is restricted to cover the same range (B ≤ 50 μT). In addition, the data reveal that chronic exposure (longer than ~22 days) to ELF-MF appears to decrease MLT levels only when the MF strength is below a threshold of ~30 μT, i.e., when the man-made ELF-MF intensity is below that of the static geomagnetic field. Studies reporting an association between ELF-MF and changes to MLT levels and those reporting the opposite (no association with ELF-MF) can thus be reconciled under a single framework. Bioelectromagnetics. 2019;40:539–552. © 2019 Bioelectromagnetics Society.

19.
Targeted genotyping-by-sequencing (GBS) is a recent approach for obtaining an effective characterization of hundreds to thousands of markers. The high throughput of next-generation sequencing technologies, moreover, allows sample multiplexing. The aims of this study were to (i) define a panel of single nucleotide polymorphisms (SNPs) in the cat, (ii) use GBS to profile 16 cats, and (iii) evaluate the performance with respect to inference using standard approaches at different coverage thresholds, thereby providing useful information for designing similar experiments. Probes for sequencing 230 variants were designed based on the Felis_catus_8.0 genome assembly. The regions comprised anonymous and non-anonymous SNPs. Sixteen cat samples were analysed, some of which had already been genotyped at a large group of loci and one of which had been whole-genome sequenced in the 99_Lives Cat Genome Sequencing Project. The accuracy of the method was assessed by comparing the GBS results with the genotypes already available. Overall, GBS achieved good performance, with 92–96% correct assignments, depending on the coverage threshold used to define the set of trustable genotypes. Analyses confirmed that (i) the reliability of the inference of each genotype depends on the coverage at that locus and (ii) the fraction of target loci whose genotype can be inferred correctly is a function of the total coverage. GBS proves to be a valid alternative to other methods. Data suggested a depth of less than 11× is required for greater than 95% accuracy. However, sequencing depth must be adapted to the total size of the targets to ensure proper genotype inference.
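A hedged sketch of the kind of accuracy-by-coverage tabulation described above: genotype calls are compared with reference genotypes after filtering loci by a minimum-depth threshold. The genotype codes, depths, and error model are invented and do not reproduce the study data.

```python
import numpy as np

def concordance_by_threshold(gbs_calls, reference_calls, depth, thresholds):
    """For each minimum-depth threshold, keep only loci whose coverage reaches
    the threshold and report the fraction of calls matching the reference."""
    out = {}
    for t in thresholds:
        keep = depth >= t
        out[t] = (np.mean(gbs_calls[keep] == reference_calls[keep]),  # accuracy
                  np.mean(keep))                                      # loci retained
    return out

# Invented calls: 0/1/2 genotype codes with per-locus sequencing depth
rng = np.random.default_rng(5)
ref = rng.integers(0, 3, 230)
depth = rng.poisson(15, 230)
err = rng.random(230) < np.where(depth < 5, 0.15, 0.03)  # more errors at low depth
gbs = np.where(err, (ref + 1) % 3, ref)

for t, (acc, frac) in concordance_by_threshold(gbs, ref, depth, [1, 5, 11]).items():
    print(f"depth >= {t:>2}: accuracy {acc:.2%}, loci retained {frac:.2%}")
```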

20.
In the capture-recapture problem with two independent samples, the traditional estimator, calculated as the product of the two sample sizes divided by the number of subjects appearing in both samples, is well known to be a biased estimator of the population size and to have no finite variance under direct or binomial sampling. To alleviate these theoretical limitations, inverse sampling, in which sampling of subjects in the second sample continues until a desired number of marked subjects from the first sample is obtained, has been proposed elsewhere. In this paper, we consider five interval estimators of the population size: the most commonly used interval estimator based on Wald's statistic, the interval estimator using the logarithmic transformation, the interval estimator derived from a quadratic equation developed here, the interval estimator using the χ²-approximation, and the interval estimator based on the exact negative binomial distribution. To evaluate and compare the finite-sample performance of these estimators, we employ Monte Carlo simulation to calculate the coverage probability and the standardized average length of the resulting confidence intervals in a variety of situations. To study the location of these interval estimators, we calculate the non-coverage probability in the two tails of the confidence intervals. Finally, we briefly discuss optimal sample size determination for a given precision to minimize the expected total cost. (© 2004 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim)
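For reference, the sketch below computes the traditional estimator N_hat = n1*n2/m together with simple Wald and log-transformed intervals based on the usual large-sample (delta-method) variance approximation for direct sampling; the paper's inverse-sampling versions and the quadratic-equation, χ²-approximation, and exact negative binomial intervals are not reproduced, and the counts are invented.

```python
import numpy as np
from scipy.stats import norm

def petersen_intervals(n1, n2, m, alpha=0.05):
    """Traditional estimator N_hat = n1*n2/m with Wald and log-transformed
    confidence intervals based on a standard large-sample variance."""
    n_hat = n1 * n2 / m
    var_hat = n1**2 * n2 * (n2 - m) / m**3           # delta-method variance
    se = np.sqrt(var_hat)
    z = norm.ppf(1 - alpha / 2)
    wald = (n_hat - z * se, n_hat + z * se)
    se_log = se / n_hat                               # approximate SE on the log scale
    log_ci = (n_hat * np.exp(-z * se_log), n_hat * np.exp(z * se_log))
    return n_hat, wald, log_ci

# Example: 200 marked in the first sample; a second sample of 150 contains 30 marked
n_hat, wald, log_ci = petersen_intervals(n1=200, n2=150, m=30)
print(f"N_hat = {n_hat:.0f}")
print(f"Wald 95% CI: ({wald[0]:.0f}, {wald[1]:.0f})")
print(f"log-transformed 95% CI: ({log_ci[0]:.0f}, {log_ci[1]:.0f})")
```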
