Similar literature
Found 20 similar articles (search time: 15 ms)
1.
The classification accuracy of a continuous marker is typically evaluated with the receiver operating characteristic (ROC) curve. In this paper, we study an alternative conceptual framework, the "percentile value." In this framework, the controls only provide a reference distribution to standardize the marker. The analysis proceeds by analyzing the standardized marker in cases. The approach is shown to be equivalent to ROC analysis. Advantages are that it provides a framework familiar to a broad spectrum of biostatisticians and it opens up avenues for new statistical techniques in biomarker evaluation. We develop several new procedures based on this framework for comparing biomarkers and biomarker performance in different populations. We develop methods that adjust such comparisons for covariates. The methods are illustrated on data from 2 cancer biomarker studies.
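As a rough illustration of the standardization idea in this abstract, the sketch below computes an empirical percentile value for each case relative to the control sample, then recovers a ROC point and the area under the curve from those values alone. The function names, the toy data, and the convention of counting tied controls as one half are assumptions made for the example, not details taken from the paper.

```python
from bisect import bisect_left, bisect_right


def percentile_values(controls, cases):
    """Standardize each case value against the control distribution.

    Returns, for every case, the fraction of controls below it
    (controls tied with the case count as one half).
    """
    ref = sorted(controls)
    n = len(ref)
    values = []
    for y in cases:
        below = bisect_left(ref, y)            # controls strictly below y
        tied = bisect_right(ref, y) - below    # controls equal to y
        values.append((below + 0.5 * tied) / n)
    return values


def empirical_roc(pv, fpr):
    """ROC(fpr): share of cases whose percentile value exceeds 1 - fpr."""
    return sum(1 for v in pv if v > 1.0 - fpr) / len(pv)


def auc_from_pairs(controls, cases):
    """Mann-Whitney AUC computed directly from all case-control pairs."""
    total = 0.0
    for y in cases:
        for x in controls:
            total += 1.0 if y > x else 0.5 if y == x else 0.0
    return total / (len(cases) * len(controls))


# Hypothetical marker measurements
controls = [1.2, 0.7, 2.1, 1.5, 0.9, 1.8]
cases = [2.4, 1.6, 1.5, 3.0, 0.8]

pv = percentile_values(controls, cases)
auc = sum(pv) / len(pv)   # the mean percentile value equals the empirical AUC
```

Under these conventions the mean of `pv` agrees with `auc_from_pairs(controls, cases)`, which is one concrete way to see why analyzing the standardized marker in cases carries the same information as the ROC curve.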

2.
Y. Huang, M. S. Pepe. Biometrics, 2009, 65(4): 1133-1144
Summary: The predictiveness curve shows the population distribution of risk endowed by a marker or risk prediction model. It provides a means for assessing the model's capacity for stratifying the population according to risk. Methods for making inference about the predictiveness curve have been developed using cross-sectional or cohort data. Here we consider inference based on case–control studies, which are far more common in practice. We investigate the relationship between the ROC curve and the predictiveness curve. Insights about their relationship provide alternative ROC interpretations for the predictiveness curve and for a previously proposed summary index of it. Next, the relationship motivates ROC-based methods for estimating the predictiveness curve. An important advantage of these methods over previously proposed methods is that they are rank invariant. In addition, they provide a way of combining information across populations that have similar ROC curves but varying prevalence of the outcome. We apply the methods to prostate-specific antigen (PSA), a marker for predicting risk of prostate cancer.

3.
Summary: This article considers receiver operating characteristic (ROC) analysis for bivariate marker measurements. The research interest is to extend tools and rules from the univariate marker setting to the bivariate marker setting for evaluating the predictive accuracy of markers using a tree-based classification rule. Using an and–or classifier, an ROC function together with a weighted ROC function (WROC) and their conjugate counterparts are proposed for examining the performance of bivariate markers. The proposed functions evaluate the performance of and–or classifiers among all possible combinations of marker values, and are ideal measures for understanding the predictability of biomarkers in the target population. Specific features of ROC and WROC functions and other related statistics are discussed in comparison with the familiar properties for a univariate marker. Nonparametric methods are developed for estimating ROC-related functions, the (partial) area under the curve, and the concordance probability. With emphasis on the average performance of markers, the proposed procedures and inferential results are useful for evaluating marker predictability based on single or bivariate marker (or test) measurements with different choices of markers, and for evaluating different and–or combinations in classifiers. The inferential results developed in this article also extend to multivariate markers with a sequence of arbitrarily combined and–or classifiers.
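To make the and–or classifier concrete, here is a minimal sketch that evaluates the true-positive and false-positive rates of an "and" rule and an "or" rule at one pair of thresholds. The function name, the thresholds, and the toy measurements are hypothetical; the ROC and WROC functions proposed in the article are built over all threshold combinations, which this sketch does not attempt.

```python
def and_or_rates(cases, controls, c1, c2, rule="and"):
    """True- and false-positive rates of an and/or rule on two markers.

    cases, controls : lists of (marker1, marker2) pairs
    c1, c2          : thresholds; a marker is "positive" when it exceeds its threshold
    rule            : "and" requires both markers positive, "or" requires at least one
    """
    def positive(pair):
        first, second = pair[0] > c1, pair[1] > c2
        return (first and second) if rule == "and" else (first or second)

    tpr = sum(positive(p) for p in cases) / len(cases)
    fpr = sum(positive(p) for p in controls) / len(controls)
    return tpr, fpr


# Hypothetical bivariate marker measurements
bi_cases = [(3, 4), (1, 5), (4, 1), (5, 5)]
bi_controls = [(1, 1), (2, 3), (3, 1), (0, 2)]

and_point = and_or_rates(bi_cases, bi_controls, 2, 2, rule="and")
or_point = and_or_rates(bi_cases, bi_controls, 2, 2, rule="or")
```

At the same thresholds the "or" rule can never have lower sensitivity or a lower false-positive rate than the "and" rule, which is why the two rules trace out different operating points.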

4.
Nakas CT, Alonzo TA. Biometrics, 2007, 63(2): 603-609
Receiver operating characteristic (ROC) curves and the area under these curves are commonly used to assess the ability of a continuous diagnostic marker (e.g., DNA methylation markers) to correctly classify subjects as having a particular disease or not (e.g., cancer). These approaches, however, are not applicable to settings where the gold standard yields more than two disease states or classes. ROC surfaces and the volume under the surfaces have been proposed for settings with more than two disease classes. These approaches, however, do not allow one to assess the ability of a marker to differentiate two disease classes from a third disease class without requiring a monotone order for the three disease classes under study. That is, existing approaches do not accommodate an umbrella ordering of disease classes. This article proposes the construction of an ROC graph that is applicable for an umbrella ordering. Furthermore, this article proposes that a summary measure for this umbrella ROC graph can be used to summarize the classification accuracy, and corresponding variance estimates can be obtained using U-statistics theory or bootstrap methods. The proposed methods are illustrated using data from a study assessing the ability of a DNA methylation marker to correctly classify lung specimens into three histologic classes: squamous cell carcinoma, large cell carcinoma, and nontumor lung.

5.
The receiver operating characteristic (ROC) curve is the most widely used measure for evaluating the discriminatory performance of a continuous marker. Often, covariate information is also available, and several regression methods have been proposed to incorporate covariate information in the ROC framework. Until now, these methods have only been developed for the case where the covariate is univariate or multivariate. We extend ROC regression methodology to the case where the covariate is functional rather than univariate or multivariate. To this end, semiparametric- and nonparametric-induced ROC regression estimators are proposed. A simulation study is performed to assess the performance of the proposed estimators. The methods are applied to and motivated by a metabolic syndrome study in Galicia (NW Spain).

6.
Time-dependent ROC curves for censored survival data and a diagnostic marker (cited 13 times in total: 0 self-citations, 13 citations by others)
Heagerty PJ, Lumley T, Pepe MS. Biometrics, 2000, 56(2): 337-344
ROC curves are a popular method for displaying sensitivity and specificity of a continuous diagnostic marker, X, for a binary disease variable, D. However, many disease outcomes are time dependent, D(t), and ROC curves that vary as a function of time may be more appropriate. A common example of a time-dependent variable is vital status, where D(t) = 1 if a patient has died prior to time t and zero otherwise. We propose summarizing the discrimination potential of a marker X, measured at baseline (t = 0), by calculating ROC curves for cumulative disease or death incidence by time t, which we denote as ROC(t). A typical complexity with survival data is that observations may be censored. Two ROC curve estimators are proposed that can accommodate censored data. A simple estimator is based on using the Kaplan-Meier estimator for each possible subset X > c. However, this estimator does not guarantee the necessary condition that sensitivity and specificity are monotone in X. An alternative estimator that does guarantee monotonicity is based on a nearest neighbor estimator for the bivariate distribution function of (X, T), where T represents survival time (Akritas, M. J., 1994, Annals of Statistics 22, 1299-1327). We present an example where ROC(t) is used to compare a standard and a modified flow cytometry measurement for predicting survival after detection of breast cancer and an example where the ROC(t) curve displays the impact of modifying eligibility criteria for sample size and power in HIV prevention trials.
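The simple estimator mentioned in this abstract can be sketched by combining Kaplan-Meier estimates within the subsets X > c and X ≤ c through Bayes' theorem. The code below is one possible reading under stated assumptions: the function names and the toy data are hypothetical, the specificity is computed from the complementary subset X ≤ c, and no guard is included for the degenerate cases where the overall survival estimate at t is exactly 0 or 1.

```python
def km_survival(times, events, t):
    """Kaplan-Meier estimate of S(t) = P(T > t).

    times  : observed follow-up times
    events : 1 if the event was observed, 0 if the observation was censored
    """
    surv = 1.0
    event_times = sorted({ti for ti, e in zip(times, events) if e == 1 and ti <= t})
    for u in event_times:
        at_risk = sum(1 for ti in times if ti >= u)
        died = sum(1 for ti, e in zip(times, events) if ti == u and e == 1)
        surv *= 1.0 - died / at_risk
    return surv


def roc_point(marker, times, events, c, t):
    """(sensitivity, specificity) at threshold c for cumulative events by time t."""
    n = len(marker)
    high = [i for i in range(n) if marker[i] > c]
    low = [i for i in range(n) if marker[i] <= c]
    s_all = km_survival(times, events, t)
    s_high = km_survival([times[i] for i in high], [events[i] for i in high], t)
    s_low = km_survival([times[i] for i in low], [events[i] for i in low], t)
    # Bayes' theorem: P(X > c | T <= t) and P(X <= c | T > t)
    sens = (1.0 - s_high) * (len(high) / n) / (1.0 - s_all)
    spec = s_low * (len(low) / n) / s_all
    return sens, spec


# Hypothetical data: baseline marker, follow-up time, event indicator (0 = censored)
marker = [5, 3, 8, 1, 7, 2]
times = [2, 6, 1, 9, 3, 8]
events = [1, 1, 0, 1, 1, 1]

sens, spec = roc_point(marker, times, events, c=6, t=4)
```

Because each subset gets its own Kaplan-Meier fit, the estimates at neighboring thresholds are not tied together, which is consistent with the abstract's remark that this estimator need not be monotone in the threshold.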

7.
8.
Recent admixture between genetically differentiated populations can result in high levels of association between alleles at loci that are ≤10 cM apart. The transmission/disequilibrium test (TDT) proposed by Spielman et al. (1993) can be a powerful test of linkage between disease and marker loci in the presence of association and therefore could be a useful test of linkage in admixed populations. The degree of association between alleles at two loci depends on the differences in allele frequencies, at the two loci, in the founding populations; therefore, the choice of marker is important. For a multiallelic marker, one strategy that may improve the power of the TDT is to group marker alleles within a locus, on the basis of information about the founding populations and the admixed population, thereby collapsing the marker into one with fewer alleles. We have examined the consequences of collapsing a microsatellite into a two-allele marker, when two founding populations are assumed for the admixed population, and have found that if there is random mating in the admixed population, then typically there is a collapsing for which the power of the TDT is greater than that for the original microsatellite marker. A method is presented for finding the optimal collapsing that has minimal dependence on the disease and that uses estimates either of marker allele frequencies in the two founding populations or of marker allele frequencies in the current, admixed population and in one of the founding populations. Furthermore, this optimal collapsing is not always the collapsing with the largest difference in allele frequencies in the founding populations. To demonstrate this strategy, we considered a recent data set, published previously, that provides frequency estimates for 30 microsatellites in 13 populations.
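A small sketch may help show what collapsing does to the test statistic. It applies the two-allele TDT chi-square, (b − c)² / (b + c), after merging a chosen set of microsatellite alleles into one collapsed allele. The function name, the allele labels, and the transmission counts are invented for illustration, and the sketch does not implement the paper's method for choosing the optimal collapsing.

```python
def collapsed_tdt(transmissions, group):
    """TDT chi-square (1 df) after collapsing alleles into `group` vs. the rest.

    transmissions : (transmitted, untransmitted) allele pairs, one per parent
    group         : set of original alleles merged into the first collapsed allele
    """
    b = c = 0
    for transmitted, untransmitted in transmissions:
        t_in, u_in = transmitted in group, untransmitted in group
        if t_in and not u_in:
            b += 1          # collapsed allele 1 was transmitted
        elif u_in and not t_in:
            c += 1          # collapsed allele 1 was not transmitted
        # parents that are homozygous after collapsing are uninformative
    return (b - c) ** 2 / (b + c) if b + c else 0.0


# Hypothetical transmissions from parents to affected offspring (alleles 1-4)
transmissions = [(1, 3), (2, 3), (1, 4), (2, 4), (3, 1), (1, 2), (2, 1), (4, 3)]

stat_12 = collapsed_tdt(transmissions, {1, 2})   # alleles 1 and 2 merged
stat_1 = collapsed_tdt(transmissions, {1})       # allele 1 against all others
```

Different groupings change which parents count as heterozygous, so the same transmissions can yield quite different statistics, which is the reason the choice of collapsing matters for power.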

9.
Receiver operating characteristic (ROC) regression methodology is used to identify factors that affect the accuracy of medical diagnostic tests. In this paper, we consider a ROC model for which the ROC curve is a parametric function of covariates but distributions of the diagnostic test results are not specified. Covariates can be either common to all subjects or specific to those with disease. We propose a new estimation procedure based on binary indicators defined by the test result for a diseased subject exceeding various specified quantiles of the distribution of test results from non-diseased subjects with the same covariate values. This procedure is conceptually and computationally simplified relative to existing procedures. Simulation study results indicate that the approach has fairly high statistical efficiency. The new ROC regression methodology is used to evaluate childhood measurements of body mass index as a predictive marker of adult obesity.

10.
A highly polymorphic microsatellite (CA)n-marker (CAct685) previously isolated from a human chromosome 19 cosmid library was localized near GPI in 19q13.1. For the fine localization of this marker, hybridization with chromosome 19-specific cosmid libraries assembled in contigs was used. Polymorphism analysis of the marker in 12 populations of Russia and neighboring countries showed 14 alleles containing from 16 to 30 repeat units. Populations belonging to the Indo-European, Uralic and Altaic linguistic families demonstrated a great similarity in allele frequency profiles. Differences between these populations were lower for CAct685 than for classical markers. The allele distribution of CAct685 in a Chukchi population belonging to the Chukchi-Kamchatkan linguistic family differs from those in all other populations, which may be typical for Mongoloid populations or reflect the ethnic history of the Chukchi as a small population. Thus, use of the CAct685 marker seems to be effective for the analysis of distant peoples.

11.
The case-control design is frequently used to study the discriminatory accuracy of a screening or diagnostic biomarker. Yet, the appropriate ratio in which to sample cases and controls has never been determined. It is common for researchers to sample equal numbers of cases and controls, a strategy that can be optimal for studies of association. However, considerations are quite different when the biomarker is to be used for classification. In this paper, we provide an expression for the optimal case-control ratio, when the accuracy of the biomarker is quantified by the receiver operating characteristic (ROC) curve. We show how it can be integrated with choosing the overall sample size to yield an efficient study design with specified power and type-I error. We also derive the optimal case-control ratios for estimating the area under the ROC curve and the area under part of the ROC curve. Our methods are applied to a study of a new marker for adenocarcinoma in patients with Barrett's esophagus.

12.
We performed an unbiased analysis of steroid-related compounds to identify novel Alzheimer's disease (AD) plasma biomarkers using liquid chromatography-atmospheric pressure chemical ionization-mass spectroscopy. The analysis revealed that desmosterol was decreased in AD plasma versus controls. To precisely quantify variations in desmosterol, we established an analytical method to measure desmosterol and cholesterol. Using this LC-based method, we discovered that desmosterol and the desmosterol/cholesterol ratio are significantly decreased in AD. Finally, the validation of this assay using 109 clinical samples confirmed the decrease of desmosterol in AD as well as a change in the desmosterol/cholesterol ratio in AD. Interestingly, we could also observe a difference between mild cognitive impairment and control. In addition, the decrease of desmosterol was somewhat more significant in females. Receiver operating characteristic (ROC) analysis between controls and AD using plasma desmosterol shows a score of 0.80, indicating good discriminatory power for this marker in the two reference populations and confirming the potential usefulness of measuring plasma desmosterol levels for diagnosing AD. Further analysis showed a significant correlation of plasma desmosterol with Mini-Mental State Examination scores. Although larger sample populations will be needed to confirm the sensitivity of this diagnostic marker, our studies demonstrate a sensitive and accurate method of detecting plasma desmosterol concentration and suggest that plasma desmosterol could become a powerful new specific biomarker for early and easy AD diagnosis.

13.
Several chemical constituents are important to the fragrance of cooked rice. However, the chemical compound 2-acetyl-1-pyrroline (AP) is regarded as the most important component of fragrance in the basmati- and jasmine-style fragrant rices. AP is found in all parts of the plant except the roots. It is believed that a single recessive gene is responsible for the production of fragrance in most rice plants. The detection of fragrance can be carried out via sensory or chemical methods, although each has its disadvantages. To overcome these difficulties, we have identified an (AT)40 repeat microsatellite or simple sequence repeat (SSR) marker for fragrant and non-fragrant alleles of the fgr gene. Identification of this marker was facilitated through use of both the publicly available and restricted access sequence information of the Monsanto rice sequence databases. Fifty F2 individuals from a mapping population were genotyped for the polymorphic marker. This marker has a high polymorphism information content (PIC = 0.9). Other SSR markers linked to fragrance could be identified in the same way for use in other populations. This study demonstrates that analysis of the rice genome sequence is an effective option for identification of markers for use in rice improvement.

14.
The complications introduced by the autotetraploid, outcrossing nature of alfalfa (Medicago sativa L.) as related to detecting associations of marker loci and traits of interest are discussed, and a new method of detecting marker-trait associations is suggested. This method utilizes plant populations that are likely to have been produced through the plant breeding process: populations selected for one trait, and the base, unselected population. Marker allele frequency shifts between the populations are indicative of genomic regions involved in trait expression, and may indicate alleles that have reached the triplex or homozygous state and do not segregate in S1 or F1 populations. However, because many, perhaps hundreds, of sequential frequency comparisons are needed to detect fragments in significantly different proportions in the two populations, the type I error rate is very high. A resampling-based analysis method is proposed to address the concern of the type I error rate, and identify marker alleles associated with the trait of interest. The utility of marker-trait associations thus defined for identifying individual plants from heterogeneous populations was investigated through model-building and conditional probability studies. Factors investigated that influenced the utility of the marker associations were (in the base population) the frequencies of the trait and marker, and the frequencies of the markers in plants exhibiting the trait and in the plants not exhibiting the trait. The frequency of occurrence of a marker in undesirable plants profoundly influenced the efficiency with which the marker could be used to select desirable plants; however, under some circumstances, markers or combinations of markers can be highly efficient for selecting rare, desirable plants from a heterogeneous base population.
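The abstract does not spell out its resampling scheme, so the following is only a generic sketch of one common way to control the type I error rate across many marker comparisons: a single-step permutation procedure based on the maximum frequency shift. The function name, the 0/1 marker coding, the number of permutations, and the toy populations are all assumptions and should not be read as the authors' method.

```python
import random


def max_stat_permutation(selected, base, n_perm=200, seed=0):
    """Family-wise adjusted p-values for marker frequency shifts.

    selected, base : lists of plants; each plant is a list of 0/1 marker scores
    Returns one adjusted p-value per marker (single-step max-statistic).
    """
    def shifts(group_a, group_b):
        n_markers = len(group_a[0])
        return [abs(sum(p[j] for p in group_a) / len(group_a)
                    - sum(p[j] for p in group_b) / len(group_b))
                for j in range(n_markers)]

    observed = shifts(selected, base)
    pooled = selected + base          # new list, so the inputs are not reordered
    rng = random.Random(seed)
    exceed = [0] * len(observed)
    for _ in range(n_perm):
        rng.shuffle(pooled)           # reassign plants to the two populations
        perm_max = max(shifts(pooled[:len(selected)], pooled[len(selected):]))
        for j, obs in enumerate(observed):
            if perm_max >= obs:
                exceed[j] += 1
    return [(1 + e) / (1 + n_perm) for e in exceed]


# Hypothetical populations: marker 0 is fixed in the selected plants and absent
# from the base plants; marker 1 has the same frequency in both.
selected = [[1, i % 2] for i in range(20)]
base = [[0, i % 2] for i in range(20)]

p_adj = max_stat_permutation(selected, base)
```

Comparing every marker against the permutation distribution of the largest shift means that a marker is flagged only if its shift is unusual even among the best of all markers tested, which is what keeps the family-wise error rate in check.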

15.
A procedure which involves the use of RAPD markers, obtained from bulked genomic DNA samples, to estimate genetic relatedness among heterogeneous populations is demonstrated in this study. Bulked samples of genomic DNA from several alfalfa plants per population were used as templates in polymerase chain reactions with different random primers to produce RAPD patterns. The results show that the RAPD patterns can be used to determine genetic distances among heterogeneous populations and cultivars which correspond to their known relatedness. The results also indicate that, by using ten primers with bulked DNA samples from ten individuals, 18–72 populations or cultivars can be distinguished from each other on the basis of at least one unique RAPD marker. We anticipate that DNA bulking and methods for comparing RAPD patterns will be very useful for identifying cultivars, for studying phylogenetic relationships among heterogeneous populations and for selecting parents to maximize heterosis in crosses.

16.
Many linkage studies are performed in inbred populations, either small isolated populations or large populations with a long tradition of marriages between relatives. In such populations, there exist very complex genealogies with unknown loops. Therefore, the true inbreeding coefficient of an individual is often unknown. Good estimators of the inbreeding coefficient (f) are important, since it has been shown that underestimation of f may lead to false linkage conclusions. When an individual is genotyped for markers spanning the whole genome, it should be possible to use this genomic information to estimate that individual's f. To do so, we propose a maximum-likelihood method that takes marker dependencies into account through a hidden Markov model. This methodology also allows us to infer the full probability distribution of the identity-by-descent (IBD) status of the two alleles of an individual at each marker along the genome (posterior IBD probabilities) and provides a variance for the estimates. We simulate a full genome scan mimicking the true autosomal genome for (1) a first-cousin pedigree and (2) a quadruple-second-cousin pedigree. In both cases, we find that our method accurately estimates f for different marker maps. We also find that the proportion of genome IBD in an individual with a given genealogy is very variable. The approach is illustrated with data from a study of demyelinating autosomal recessive Charcot-Marie-Tooth disease.

17.
Traditional quantitative trait loci (QTL) mapping approaches are typically based on early or advanced generation analysis of bi-parental populations. A limitation associated with this methodology is the fact that mapping populations rarely give rise to new cultivars. Additionally, markers linked to the QTL of interest are often not immediately available for use in breeding and they may not be useful within diverse genetic backgrounds. Use of breeding populations for simultaneous QTL mapping, marker validation, marker assisted selection (MAS), and cultivar release has recently caught the attention of plant breeders to circumvent the weaknesses of conventional QTL mapping. The first objective of this study was to test the feasibility of using family-pedigree based QTL mapping techniques generally used with humans and animals within plant breeding populations (PBPs). The second objective was to evaluate two methods (linkage and association) to detect marker-QTL associations. The techniques described in this study were applied to map the well characterized QTL, Fhb1 for Fusarium head blight resistance in wheat (Triticum aestivum L.). The experimental populations consisted of 82 families and 793 individuals. The QTL was mapped using both linkage (variance component and pedigree-wide regression) and association (using quantitative transmission disequilibrium test, QTDT) approaches developed for extended family-pedigrees. Each approach successfully identified the known QTL location with a high probability value. Markers linked to the QTL explained 40–50% of the phenotypic variation. These results show the usefulness of a human genetics approach to detect QTL in PBPs and subsequent use in MAS.

18.
If marker alleles that identify a gene for introgression are not completely unique to the different base populations, the trait allele can be lost quickly during the process of backcrossing. This study considers ways to deal with incompletely informative markers in order to retain the desired allele. Selection was based on the probability of the presence of the desired (introgressed) trait allele, which was calculated for each marker genotype, using a single marker or a diallelic or triallelic marker bracket. The percentage of individuals retaining the introgressed allele was calculated over five generations of backcrossing, for selected fractions between 0 and 1, for marker alleles that could occur in both base populations. The best results were obtained with a rather large selected fraction, when all individuals, heterozygous and homozygous for the most desirable allele at the marker loci, were selected. Additional selection against marker homozygotes (which might have the highest probability of carrying the desired-trait allele, but produce uninformative gametes) altered the optimum selected fraction, making the selected fraction more consistently inversely related to a better retention of the desired-trait allele. A marker bracket was found to give a better retention of the desired-trait allele than a single marker and triallelic markers were better than diallelic markers, giving a retention of almost 50%. The earlier that preselection of parents (on informativeness) took place the better the overall result; preselection should occur preferably in the base populations. Preselection could make marker alleles unique to alternative base populations and markers would effectively become fully informative. Selection in the base populations might not be possible or not desirable, for example, because of the available number of individuals. This is unlikely to be a problem when parents are paired up to exclude any common marker alleles.

19.
Allozyme and PCR-based molecular markers have been widely used to investigate genetic diversity and population genetic structure in autotetraploid species. However, an empirical but inaccurate approach was often used to infer marker genotype from the pattern and intensity of gel bands. Obviously, this introduces serious errors in prediction of the marker genotypes and severely biases the data analysis. This article developed a theoretical model to characterize genetic segregation of alleles at genetic marker loci in autotetraploid populations and a novel likelihood-based method to estimate the model parameters. The model properly accounts for segregation complexities due to multiple alleles and double reduction at autotetrasomic loci in natural populations, and the method takes appropriate account of incomplete marker phenotype information with respect to genotype due to multiple-dosage allele segregation at marker loci in tetraploids. The theoretical analyses were validated by making use of a computer simulation study and their utility is demonstrated by analyzing microsatellite marker data collected from two populations of sycamore maple (Acer pseudoplatanus L.), an economically important autotetraploid tree species. Numerical analyses based on simulation data indicate that the model parameters can be adequately estimated and double reduction is detected with good power using reasonable sample size.

20.
We examine the issue of population stratification in association-mapping studies. In case-control studies of association, population subdivision or recent admixture of populations can lead to spurious associations between a phenotype and unlinked candidate loci. Using a model of sampling from a structured population, we show that if population stratification exists, it can be detected by use of unlinked marker loci. We show that the case-control-study design, using unrelated control individuals, is a valid approach for association mapping, provided that marker loci unlinked to the candidate locus are included in the study, to test for stratification. We suggest guidelines as to the number of unlinked marker loci to use.
