Similar articles
20 similar articles found.
1.
The receiver operating characteristic (ROC) curve is the most widely used measure for evaluating the discriminatory performance of a continuous marker. Often, covariate information is also available, and several regression methods have been proposed to incorporate covariate information in the ROC framework. Until now, these methods have been developed only for the case where the covariate is univariate or multivariate. We extend ROC regression methodology to the case where the covariate is functional rather than univariate or multivariate. To this end, semiparametric- and nonparametric-induced ROC regression estimators are proposed. A simulation study is performed to assess the performance of the proposed estimators. The methods are applied to and motivated by a metabolic syndrome study in Galicia (NW Spain).

2.
A key issue in class‐imbalanced learning is which assessment metric to use. So far, the precision‐recall curve (PRC) has rarely been used in practice compared with its alternative, the receiver operating characteristic (ROC) curve. This study investigates the performance of the PRC as an evaluation criterion for class‐imbalanced data and focuses on comparing the PRC with the ROC. The advantages of the PRC over the ROC in assessing class‐imbalanced data are also investigated and tested on our proposed algorithm by tuning all model parameters in simulation studies and real data examples. The results show that the PRC is competitive with the ROC as a performance measure for tuning model parameters on class‐imbalanced data. The PRC can be considered an effective alternative for preprocessing skewed data (such as variable selection) and building a classifier in class‐imbalanced learning.
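
To make the two criteria in the abstract above concrete, here is a minimal sketch that computes the area under the ROC curve and a step-wise area under the precision-recall curve (average precision) for one set of scores. The function names and the toy data are hypothetical, the scores are assumed to have no ties when average precision is computed, and nothing here reproduces the authors' algorithm.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (assumes untied scores)."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    n_pos = sum(labels)
    true_pos = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            true_pos += 1
            total += true_pos / rank  # precision at this recall step
    return total / n_pos


# Hypothetical imbalanced sample: 2 positives among 10 subjects.
labels = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]

auc = roc_auc(labels, scores)           # 0.875
ap = average_precision(labels, scores)  # 0.75
```

On this toy sample the two false positives ranked above the second case cost more in average precision than in AUC, which is the kind of sensitivity to the minority class that motivates using the PRC on skewed data.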

3.
P. Saha, P. J. Heagerty. Biometrics 2010, 66(4): 999-1011
Summary: Competing risks arise naturally in time‐to‐event studies. In this article, we propose time‐dependent accuracy measures for a marker when we have censored survival times and competing risks. Time‐dependent versions of sensitivity or true positive (TP) fraction naturally correspond to consideration of either cumulative (or prevalent) cases that accrue over a fixed time period, or alternatively to incident cases that are observed among event‐free subjects at any select time. Time‐dependent (dynamic) specificity (1–false positive (FP)) can be based on the marker distribution among event‐free subjects. We extend these definitions to incorporate cause of failure for competing risks outcomes. The proposed estimation for cause‐specific cumulative TP/dynamic FP is based on the nearest neighbor estimation of the bivariate distribution function of the marker and the event time. On the other hand, incident TP/dynamic FP can be estimated using a possibly nonproportional hazards Cox model for the cause‐specific hazards and riskset reweighting of the marker distribution. The proposed methods extend the time‐dependent predictive accuracy measures of Heagerty, Lumley, and Pepe (2000, Biometrics 56, 337–344) and Heagerty and Zheng (2005, Biometrics 61, 92–105).

4.
The use of ROC curves in evaluating a continuous or ordinal biomarker for the discrimination of two populations is commonplace. However, in many settings, marker measurements above or below a certain value cannot be obtained. In this paper, we study the construction of a smooth ROC curve (or surface in the case of three populations) when there is a lower or upper limit of detection. We propose the use of spline models that incorporate monotonicity constraints for the cumulative hazard function of the marker distribution. The proposed technique is computationally stable, and simulation results show satisfactory performance. Other observed covariates can also be accommodated by this spline‐based approach.

5.
Protein–protein interactions play a key role in many biological systems. High‐throughput methods can directly detect the set of interacting proteins in yeast, but the results are often incomplete and exhibit high false‐positive and false‐negative rates. Recently, many different research groups independently suggested using supervised learning methods to integrate direct and indirect biological data sources for the protein interaction prediction task. However, the data sources, approaches, and implementations varied. Furthermore, the protein interaction prediction task itself can be subdivided into prediction of (1) physical interaction, (2) co‐complex relationship, and (3) pathway co‐membership. To investigate systematically the utility of different data sources and the way the data is encoded as features for predicting each of these types of protein interactions, we assembled a large set of biological features and varied their encoding for use in each of the three prediction tasks. Six different classifiers were used to assess the accuracy in predicting interactions: Random Forest (RF), RF similarity‐based k‐Nearest‐Neighbor, Naïve Bayes, Decision Tree, Logistic Regression, and Support Vector Machine. For all classifiers, the three prediction tasks had different success rates, and co‐complex prediction appears to be an easier task than the other two. Independently of prediction task, however, the RF classifier consistently ranked as one of the top two classifiers for all combinations of feature sets. Therefore, we used this classifier to study the importance of different biological datasets. First, we used the splitting function of the RF tree structure, the Gini index, to estimate feature importance. Second, we determined classification accuracy when only the top‐ranking features were used as an input in the classifier. We find that the importance of different features depends on the specific prediction task and the way they are encoded. Strikingly, gene expression is consistently the most important feature for all three prediction tasks, while the protein interactions identified using the yeast‐2‐hybrid system were not among the top‐ranking features under any condition. Proteins 2006. © 2006 Wiley‐Liss, Inc.
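
The Gini-based importance mentioned in the abstract above rests on how much a split reduces node impurity. The sketch below shows that single-split quantity only. The function names and labels are hypothetical, and the abstract does not say how the authors aggregated the decreases over trees, so no aggregation is attempted here.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini index of a node: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity of the parent minus the size-weighted impurity of its two children."""
    n = len(parent)
    weighted = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(parent) - weighted


# Hypothetical node with 3 interacting (1) and 3 non-interacting (0) protein pairs.
parent = [1, 1, 1, 0, 0, 0]
perfect = gini_decrease(parent, [1, 1, 1], [0, 0, 0])  # split separates the classes
weak = gini_decrease(parent, [1, 1, 0], [1, 0, 0])     # split barely helps
```

A feature whose splits tend to produce large decreases, such as the `perfect` case, would rank higher under this criterion than one producing decreases like `weak`.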

6.
Implicit assumptions for most mark‐recapture studies are that individuals do not lose their markers and all observed markers are correctly recorded. If these assumptions are violated, e.g., due to loss or extreme wear of markers, estimates of population size and vital rates will be biased. Double‐marking experiments have been widely used to estimate rates of marker loss and adjust for associated bias, and we extended this approach to estimate rates of recording errors. We double‐marked 309 Piping Plovers (Charadrius melodus) with unique combinations of color bands and alphanumeric flags and used multi‐state mark recapture models to estimate the frequency with which plovers were misidentified. Observers were twice as likely to read and report an invalid color‐band combination (2.4% of the time) as an invalid alphanumeric code (1.0%). Observers failed to read matching band combinations or alphanumeric flag codes 4.5% of the time. Unlike previous band resighting studies, use of two resightable markers allowed us to identify when resighting errors resulted in reports of combinations or codes that were valid, but still incorrect; our results suggest this may be a largely unappreciated problem in mark‐resight studies. Field‐readable alphanumeric flags offer a promising auxiliary marker for identifying and potentially adjusting for false‐positive resighting errors that may otherwise bias demographic estimates.

7.
Diagnostic or screening tests are widely used in medical fields to classify patients according to their disease status. Several statistical models for meta‐analysis of diagnostic test accuracy studies have been developed to synthesize test sensitivity and specificity of a diagnostic test of interest. Because of the correlation between test sensitivity and specificity, modeling the two measures using a bivariate model is recommended. In this paper, we extend the current standard bivariate linear mixed model (LMM) by proposing two variance‐stabilizing transformations: the arcsine square root and the Freeman–Tukey double arcsine transformation. We compared the performance of the proposed methods with the standard method through simulations using several performance measures. The simulation results showed that our proposed methods performed better than the standard LMM in terms of bias, root mean square error, and coverage probability in most of the scenarios, even when data were generated assuming the standard LMM. We also illustrated the methods using two real data sets.
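
The two transformations named in the abstract above can be sketched as follows, using their usual textbook forms and approximate variances. The function names and counts are hypothetical. Some software reports half of the Freeman–Tukey sum, with a correspondingly smaller variance, and the abstract does not say which parameterization the authors used.

```python
import math


def arcsine_sqrt(events, n):
    """Arcsine square root transform of events/n and its approximate variance 1/(4n)."""
    return math.asin(math.sqrt(events / n)), 1.0 / (4.0 * n)


def freeman_tukey(events, n):
    """Freeman-Tukey double arcsine transform and its approximate variance 1/(n + 0.5)."""
    value = math.asin(math.sqrt(events / (n + 1))) \
        + math.asin(math.sqrt((events + 1) / (n + 1)))
    return value, 1.0 / (n + 0.5)


# Hypothetical study: 45 of 50 diseased subjects test positive (sensitivity 0.9).
as_value, as_var = arcsine_sqrt(45, 50)
ft_value, ft_var = freeman_tukey(45, 50)

# Both variances depend only on n, not on the proportion itself, which is the
# sense in which the transformations are variance-stabilizing. Both remain
# finite when every subject tests positive, where a logit would not.
edge_value, edge_var = freeman_tukey(50, 50)
```

Because the variance no longer depends on the estimated sensitivity or specificity, studies with proportions near 0 or 1 need no continuity correction before entering a bivariate model.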

8.
We performed a bivariate analysis on cholesterol and triglyceride levels on data from the Framingham Heart Study using a new score statistic developed for the detection of potential pleiotropic, or cluster, genes. Univariate score statistics were also computed for each trait. At a significance level of 0.001, linkage signals were found at markers GATA48B01 on chromosome 1, GATA21C12 on chromosome 8, and ATA55A11 on chromosome 16 using the bivariate analysis. At the same significance level, linkage signals were found at markers 036yb8 on chromosome 3 and GATA3F02 on chromosome 12 using the univariate analysis. A strong linkage signal was also found at marker GATA112F07 by both the bivariate analysis and the univariate analysis, a marker for which evidence for linkage had been reported previously in a related study.

9.
Combining biomarkers to detect disease with application to prostate cancer (total citations: 1; self-citations: 0; citations by others: 1)
In early detection of disease, combinations of biomarkers promise improved discrimination over diagnostic tests based on single markers. An example of this is in prostate cancer screening, where additional markers have been sought to improve the specificity of the conventional Prostate-Specific Antigen (PSA) test. A marker of particular interest is the percent free PSA. Studies evaluating the benefits of percent free PSA reflect the need for a methodological approach that is statistically valid and useful in the clinical setting. This article presents methods that address this need. We focus on and-or combinations of biomarker results that we call logic rules and present novel definitions for the ROC curve and the area under the curve (AUC) that are applicable to this class of combination tests. Our estimates of the ROC and AUC are amenable to statistical inference including comparisons of tests and regression analysis. The methods are applied to data on free and total PSA levels among prostate cancer cases and matched controls enrolled in the Physicians' Health Study.
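
As a rough illustration of the and-or logic rules described above, the sketch below evaluates the true and false positive fractions of an "and" rule and an "or" rule built from two thresholded markers. The marker values, thresholds, and function names are hypothetical, both markers are assumed to be positive when high, and the article's ROC and AUC definitions for such rules are not reproduced.

```python
def logic_rule(x, y, cut_x, cut_y, op):
    """Combine two thresholded markers with 'and' or 'or' into one binary test."""
    if op == "and":
        return [a > cut_x and b > cut_y for a, b in zip(x, y)]
    if op == "or":
        return [a > cut_x or b > cut_y for a, b in zip(x, y)]
    raise ValueError("op must be 'and' or 'or'")


def rates(positive, diseased):
    """Return (true positive fraction, false positive fraction) of a binary test."""
    n_diseased = sum(diseased)
    n_healthy = len(diseased) - n_diseased
    true_pos = sum(p and d for p, d in zip(positive, diseased))
    false_pos = sum(p and not d for p, d in zip(positive, diseased))
    return true_pos / n_diseased, false_pos / n_healthy


# Hypothetical marker values for three cases followed by three controls.
x = [5, 6, 2, 7, 1, 3]
y = [4, 1, 5, 6, 2, 1]
diseased = [True, True, True, False, False, False]

tpf_and, fpf_and = rates(logic_rule(x, y, 4, 3, "and"), diseased)
tpf_or, fpf_or = rates(logic_rule(x, y, 4, 3, "or"), diseased)
```

At fixed thresholds, the "and" rule can only lower both fractions relative to either single marker and the "or" rule can only raise them, so sweeping the pair of thresholds traces out a region of operating points rather than a single curve.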

10.
Dabney AR, Storey JD. PLoS ONE 2007, 2(10): e1002
Nearest-centroid classifiers have recently been successfully employed in high-dimensional applications, such as in genomics. A necessary step when building a classifier for high-dimensional data is feature selection. Feature selection is frequently carried out by computing univariate scores for each feature individually, without consideration for how a subset of features performs as a whole. We introduce a new feature selection approach for high-dimensional nearest-centroid classifiers that instead is based on the theoretically optimal choice of a given number of features, which we determine directly here. This allows us to develop a new greedy algorithm to estimate this optimal nearest-centroid classifier with a given number of features. In addition, whereas the centroids are usually formed from maximum likelihood estimates, we investigate the applicability of high-dimensional shrinkage estimates of centroids. We apply the proposed method to clinical classification based on gene-expression microarrays, demonstrating that the proposed method can outperform existing nearest-centroid classifiers.
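
For readers unfamiliar with the base classifier discussed above, here is a minimal nearest-centroid sketch with an optional feature subset. It uses plain class means and unscaled squared Euclidean distance, which are simplifying assumptions. It implements neither the authors' optimal feature selection nor their shrinkage estimates, and all names and data are hypothetical.

```python
def fit_centroids(X, y):
    """Return the per-class mean vector of the training rows."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict(centroids, row, features=None):
    """Assign row to the class with the closest centroid.

    Distance is squared Euclidean, computed over the given feature indices
    (all features when `features` is None).
    """
    idx = range(len(row)) if features is None else features

    def dist(label):
        return sum((row[j] - centroids[label][j]) ** 2 for j in idx)

    return min(centroids, key=dist)


# Hypothetical two-gene expression data for classes "a" and "b".
X = [[1, 10], [2, 12], [8, 11], [9, 9]]
y = ["a", "a", "b", "b"]
centroids = fit_centroids(X, y)

all_features = predict(centroids, [3, 10])                 # uses both genes
second_only = predict(centroids, [3, 10], features=[1])    # uses gene 1 only
```

The same sample is assigned to different classes depending on the feature subset, which shows why selecting features jointly, rather than scoring each one in isolation, can change the resulting classifier.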

11.
A common problem in neuropathological studies is to assess the spatial patterning of cells on tissue sections and to compare spatial patterning between disorder groups. For a single cell type, the cell positions constitute a univariate point process and interest focuses on the degree of spatial aggregation. For two different cell types, the cell positions constitute a bivariate point process and the degree of spatial interaction between the cell types is of interest. We discuss the problem of analysing univariate and bivariate spatial point patterns in the one‐way design where cell patterns have been obtained for groups of subjects. A bootstrapping procedure to perform a nonparametric one‐way analysis of variance of the spatial aggregation of a univariate point process has been suggested by Diggle, Lange and Beneš (1991). We extend their replication‐based approach to allow the comparison of the spatial interaction of two cell types between groups, to include planned comparisons (contrasts) and to assess whole groups against complete spatial randomness and spatial independence. We also accommodate several replicate tissue sections per subject. An advantage of our approach is that it can be applied when processes are not stationary, a common problem in brain tissue sections since neurons are arranged in cortical layers. We illustrate our methods by applying them to a neuropathological study to investigate abnormalities in the functional relationship between neurons and astrocytes in HIV associated dementia. (© 2004 WILEY‐VCH Verlag GmbH & Co. KGaA, Weinheim)

12.
13.
The augmentation of categorical outcomes with underlying Gaussian variables in bivariate generalized mixed effects models has facilitated the joint modeling of continuous and binary response variables. These models typically assume that random effects and residual effects (co)variances are homogeneous across all clusters and subjects, respectively. Motivated by conflicting evidence about the association between performance outcomes in dairy production systems, we consider the situation where these (co)variance parameters may themselves be functions of systematic and/or random effects. We present a hierarchical Bayesian extension of bivariate generalized linear models whereby functions of the (co)variance matrices are specified as linear combinations of fixed and random effects following a square‐root‐free Cholesky reparameterization that ensures necessary positive semidefinite constraints. We test the proposed model by simulation and apply it to the analysis of a dairy cattle data set in which the random herd‐level and residual cow‐level effects (co)variances between a continuous production trait and binary reproduction trait are modeled as functions of fixed management effects and random cluster effects.
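
The square‐root‐free Cholesky idea in the abstract above can be sketched for a 2x2 covariance matrix written as Sigma = L D L^T, with L unit lower triangular and D diagonal. The log link on the diagonal of D is an assumption made here for illustration, as are the function and parameter names. The abstract states only that functions of the (co)variance matrices are modeled as linear combinations of effects.

```python
import math


def covariance_from_unconstrained(log_d1, log_d2, phi):
    """Build Sigma = L D L^T with L = [[1, 0], [phi, 1]] and D = diag(d1, d2).

    d1 and d2 are exponentiated, so any real (log_d1, log_d2, phi) gives a
    valid covariance matrix.
    """
    d1, d2 = math.exp(log_d1), math.exp(log_d2)
    return [[d1, phi * d1],
            [phi * d1, phi * phi * d1 + d2]]


def unconstrained_from_covariance(sigma):
    """Inverse map for a positive definite 2x2 matrix: return (log_d1, log_d2, phi)."""
    d1 = sigma[0][0]
    phi = sigma[1][0] / d1
    d2 = sigma[1][1] - phi * phi * d1
    return math.log(d1), math.log(d2), phi


# Hypothetical values: in a model, each of the three inputs could be a linear
# predictor such as x'beta + z'u without any constraint on its range.
sigma = covariance_from_unconstrained(0.0, math.log(3.0), -0.5)
determinant = sigma[0][0] * sigma[1][1] - sigma[0][1] * sigma[1][0]
recovered = unconstrained_from_covariance(sigma)
```

Here `phi` acts as the regression coefficient of the second trait on the first and `d2` as the conditional variance, so each piece can vary with covariates while the determinant, which equals d1 times d2, stays positive.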

14.
Functional markers and their quantitative features (e.g., maximum value, time to maximum, area under the curve [AUC], etc.) are increasingly being used in clinical studies to diagnose diseases. It is thus of interest to assess the diagnostic utility of functional markers by assessing alignment between their quantitative features and an ordinal gold standard test that reflects the severity of disease. The concept of broad sense agreement (BSA) has recently been introduced for studying the relationship between continuous and ordinal measurements, and provides a promising tool to address such a question. Our strategy is to adopt a general class of summary functionals (SFs), each of which flexibly captures a different quantitative feature of a functional marker, and study its alignment according to an ordinal outcome via BSA. We further illustrate the proposed framework using three special classes of SFs (AUC‐type, magnitude‐specific, and time‐specific) that are widely used in clinical settings. The proposed BSA estimator is proven to be consistent and asymptotically normal given a consistent estimator for the SF. We further provide an inferential framework for comparing a pair of candidate SFs in terms of their importance on the ordinal outcome. Our simulation results demonstrate satisfactory finite‐sample performance of the proposed framework. We demonstrate the application of our methods using a renal study.

15.
Hot spot residues of proteins are fundamental interface residues that help proteins perform their functions. Detecting hot spots by experimental methods is costly and time‐consuming. Sequential and structural information has been widely used in the computational prediction of hot spots. However, structural information is not always available. In this article, we investigated the problem of identifying hot spots using only physicochemical characteristics extracted from amino acid sequences. We first extracted 132 relatively independent physicochemical features from a set of the 544 properties in AAindex1, an amino acid index database. Each feature was utilized to train a classification model with a novel encoding schema for hot spot prediction by the IBk algorithm, an extension of the K‐nearest neighbor algorithm. The combinations of the individual classifiers were explored and the classifiers that appeared frequently in the top performing combinations were selected. The hot spot predictor was built as an ensemble of these classifiers that works in a voting manner. Experimental results demonstrated that our method effectively exploited the feature space and allowed flexible weights of features for different queries. On the commonly used hot spot benchmark sets, our method significantly outperformed other machine learning algorithms and state‐of‐the‐art hot spot predictors. The program is available at http://sfb.kaust.edu.sa/pages/software.aspx . Proteins 2013; 81:1351–1362 © 2013 Wiley Periodicals, Inc.

16.
Evaluating the classification accuracy of a candidate biomarker signaling the onset of disease or disease status is essential for medical decision making. A good biomarker would accurately identify the patients who are likely to progress or die at a particular time in the future or who are in urgent need of active treatments. To assess the performance of a candidate biomarker, the receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC) are commonly used. In many cases, the standard simple random sampling (SRS) design used for biomarker validation studies is costly and inefficient. In order to improve the efficiency and reduce the cost of biomarker validation, marker‐dependent sampling (MDS) may be used. In an MDS design, the selection of patients to assess true survival time is dependent on the result of a biomarker assay. In this article, we introduce a nonparametric estimator for time‐dependent AUC under an MDS design. The consistency and the asymptotic normality of the proposed estimator are established. Simulation shows the unbiasedness of the proposed estimator and a significant efficiency gain of the MDS design over the SRS design.

17.

Background  

As in many different areas of science and technology, most important problems in bioinformatics rely on the proper development and assessment of binary classifiers. A generalized assessment of the performance of binary classifiers is typically carried out through the analysis of their receiver operating characteristic (ROC) curves. The area under the ROC curve (AUC) constitutes a popular indicator of the performance of a binary classifier. However, the assessment of the statistical significance of the difference between any two classifiers based on this measure is not a straightforward task, since not many freely available tools exist. Most existing software is either not free, difficult to use or not easy to automate when a comparative assessment of the performance of many binary classifiers is intended. This constitutes the typical scenario for the optimization of parameters when developing new classifiers and also for their performance validation through the comparison to previous art.

18.
Populations often contain discrete classes or morphs (e.g., sexual dimorphisms, wing dimorphisms, trophic dimorphisms) characterized by distinct patterns of trait expression. In quantitative genetic analyses, the different morphs can be considered as different environments within which traits are expressed. Genetic variances and covariances can then be estimated independently for each morph or in a combined analysis. In the latter case, morphs can be considered as separate environments in a bivariate analysis or entered as fixed effects in a univariate analysis. Although a common approach, we demonstrate that the latter produces downwardly biased estimates of additive genetic variance and heritability unless the quantitative genetic architecture of the traits concerned is perfectly correlated between the morphs. This result is derived for four widely used quantitative genetic variance partitioning methods. Given that theory predicts the evolution of genotype‐by‐environment (morph) interactions as a consequence of selection favoring different trait combinations in each morph, we argue that perfect correlations between the genetic architectures of the different morphs are unlikely. A sampling of the recent literature indicates that the majority of researchers studying traits expressed in different morphs recognize this and do estimate morph‐specific quantitative genetic architecture. However, ca. 16% of the studies in our sample utilized only univariate, fixed‐effects models. We caution against this approach and recommend that it be used only if supported by evidence that the genetic architectures of the different morphs do not differ.

19.
Y. Huang, M. S. Pepe. Biometrics 2009, 65(4): 1133-1144
Summary: The predictiveness curve shows the population distribution of risk endowed by a marker or risk prediction model. It provides a means for assessing the model's capacity for stratifying the population according to risk. Methods for making inference about the predictiveness curve have been developed using cross‐sectional or cohort data. Here we consider inference based on case–control studies, which are far more common in practice. We investigate the relationship between the ROC curve and the predictiveness curve. Insights about their relationship provide alternative ROC interpretations for the predictiveness curve and for a previously proposed summary index of it. Next, the relationship motivates ROC‐based methods for estimating the predictiveness curve. An important advantage of these methods over previously proposed methods is that they are rank invariant. In addition, they provide a way of combining information across populations that have similar ROC curves but varying prevalence of the outcome. We apply the methods to prostate‐specific antigen (PSA), a marker for predicting risk of prostate cancer.

20.
1. In insects, instar determination is generally based on the frequency distribution of sclerotised body part measurements. Commonly used univariate methods, such as histograms and univariate kernel smoothing, are not sufficient to reflect the distribution of the measurements, because development of sclerotised body parts is multidimensional. 2. This study used an adaptive bivariate kernel smoothing method, based on 10 pairs of separating variables, to differentiate instars of Austrosimulium tillyardianum (Diptera: Simuliidae) larvae in two‐dimensional space. A variable bandwidth matrix was used and separation lines between instars were defined. Using the Crosby growth ratio, Brooks' rule and the new standard recently proposed, larvae were separated into nine instars. It was found that, using the bivariate kernel smoothing method, the clustering accuracy and determination of separation lines as instar class limits were higher than those associated with the univariate kernel smoothing method. With the exception of the paired separating variables head capsule length and antennal segment 3 length (AS3L), the mean probability of correct classification was > 85%. The pair of separating variables that yielded the greatest classification accuracy comprised mandible length (ML) and AS3L, with a mean probability of 0.8984. The clustering accuracy was higher for early‐ and late‐instar larvae, but lower for instars 6 and 7. The adaptive bivariate kernel smoothing method was better than univariate methods for instar determination, especially in the detection of divisions between instars and identification of a larval instar.
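
The two growth checks named in the abstract above can be sketched as follows, in the form in which they are usually stated: Brooks' rule expects a roughly constant ratio between the mean sizes of successive instars, and the Crosby growth ratio is the relative change between successive Brooks ratios. The abstract gives no formulas, so these definitions, the function names, and the measurements are assumptions.

```python
def brooks_ratios(means):
    """Ratio of each instar's mean measurement to that of the previous instar."""
    return [later / earlier for earlier, later in zip(means, means[1:])]


def crosby_ratios(means):
    """Percent change between successive Brooks ratios."""
    ratios = brooks_ratios(means)
    return [100.0 * (r2 - r1) / r1 for r1, r2 in zip(ratios, ratios[1:])]


# Hypothetical mean head capsule lengths (mm) growing by a constant factor 1.3.
regular = [1.0, 1.3, 1.69, 2.197]
# The same series with the third instar missed during grouping.
skipped = [1.0, 1.3, 2.197]

regular_check = crosby_ratios(regular)  # values near 0
skipped_check = crosby_ratios(skipped)  # one large value
```

A large Crosby value, with a threshold often quoted as roughly 10%, is commonly read as a sign that an instar was missed or that two instars were merged, which is how such ratios serve as a check on a proposed instar grouping.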
