Similar articles
 20 similar articles found; search took 31 ms
1.
High-dimensional biomarker data are often collected in epidemiological studies when assessing the association between biomarkers and human disease is of interest. We develop a latent class modeling approach for joint analysis of high-dimensional semicontinuous biomarker data and a binary disease outcome. To model the relationship between complex biomarker expression patterns and disease risk, we use latent risk classes to link the 2 modeling components. We characterize complex biomarker-specific differences through biomarker-specific random effects, so that different biomarkers can have different baseline (low-risk) values as well as different between-class differences. The proposed approach also accommodates data features that are common in environmental toxicology and other biomarker exposure data, including a large number of biomarkers, numerous zero values, and a complex mean-variance relationship in the biomarker levels. A Monte Carlo EM (MCEM) algorithm is proposed for parameter estimation. Both the MCEM algorithm and model selection procedures are shown to work well in simulations and applications. In applying the proposed approach to an epidemiological study that examined the relationship between environmental polychlorinated biphenyl (PCB) exposure and the risk of endometriosis, we identified a highly significant overall effect of PCB concentrations on the risk of endometriosis.

2.
This review discusses data analysis strategies for the discovery of biomarkers in clinical proteomics. Proteomics studies produce large amounts of data, characterized by few samples of which many variables are measured. A wealth of classification methods exists for extracting information from the data. Feature selection plays an important role in reducing the dimensionality of the data prior to classification and in discovering biomarker leads. The question which classification strategy works best is yet unanswered. Validation is a crucial step for biomarker leads towards clinical use. Here we only discuss statistical validation, recognizing that biological and clinical validation is of utmost importance. First, there is the need for validated model selection to develop a generalized classifier that predicts new samples correctly. A cross-validation loop that is wrapped around the model development procedure assesses the performance using unseen data. The significance of the model should be tested; we use permutations of the data for comparison with uninformative data. This procedure also tests the correctness of the performance validation. Preferably, a new set of samples is measured to test the classifier and rule out results specific for a machine, analyst, laboratory or the first set of samples. This is not yet standard practice. We present a modular framework that combines feature selection, classification, biomarker discovery and statistical validation; these data analysis aspects are all discussed in this review. The feature selection, classification and biomarker discovery modules can be incorporated or omitted to the preference of the researcher. The validation modules, however, should not be optional. In each module, the researcher can select from a wide range of methods, since there is not one unique way that leads to the correct model and proper validation. We discuss many possibilities for feature selection, classification and biomarker discovery. 
For validation we advise a combination of cross-validation and permutation testing, a validation strategy supported in the literature.
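The combination of cross-validation and permutation testing described in this abstract can be sketched as follows. This is a minimal illustration, not the review's framework: the nearest-centroid classifier, the single feature, leave-one-out splitting, and the function names (`nearest_centroid_predict`, `loo_accuracy`, `permutation_p_value`) are all assumptions made for the example.

```python
import random


def nearest_centroid_predict(train_x, train_y, x):
    """Predict the label whose class mean is closest to x (one feature)."""
    means = {}
    for label in set(train_y):
        vals = [v for v, y in zip(train_x, train_y) if y == label]
        means[label] = sum(vals) / len(vals)
    return min(means, key=lambda label: abs(x - means[label]))


def loo_accuracy(xs, ys):
    """Leave-one-out cross-validation: refit on n-1 samples, test on the held-out one."""
    hits = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_centroid_predict(train_x, train_y, xs[i]) == ys[i]:
            hits += 1
    return hits / len(xs)


def permutation_p_value(xs, ys, n_perm=200, seed=0):
    """Compare the cross-validated accuracy with accuracies on label-shuffled data."""
    rng = random.Random(seed)
    observed = loo_accuracy(xs, ys)
    at_least_as_good = 0
    for _ in range(n_perm):
        shuffled = ys[:]
        rng.shuffle(shuffled)  # breaks any real link between feature and class
        if loo_accuracy(xs, shuffled) >= observed:
            at_least_as_good += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return observed, (at_least_as_good + 1) / (n_perm + 1)
```

The whole cross-validation loop is rerun for every permutation, so the permuted accuracies are produced by exactly the same procedure as the observed one; each class needs at least two samples so that a held-out sample never empties its class.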

3.
Summary: In medical research, receiver operating characteristic (ROC) curves can be used to evaluate the performance of biomarkers for diagnosing diseases or predicting the risk of developing a disease in the future. The area under the ROC curve (ROC AUC), as a summary measure of ROC curves, is widely utilized, especially when comparing multiple ROC curves. In observational studies, the estimation of the AUC is often complicated by the presence of missing biomarker values, which means that the existing estimators of the AUC are potentially biased. In this article, we develop robust statistical methods for estimating the ROC AUC; the proposed methods use information from auxiliary variables that are potentially predictive of the missingness of the biomarkers or of the missing biomarker values. We are particularly interested in auxiliary variables that are predictive of the missing biomarker values. In the case of missing at random (MAR), that is, missingness of biomarker values only depends on the observed data, our estimators have the attractive feature of being consistent if one correctly specifies, conditional on auxiliary variables and disease status, either the model for the probabilities of being missing or the model for the biomarker values. In the case of missing not at random (MNAR), that is, missingness may depend on the unobserved biomarker values, we propose a sensitivity analysis to assess the impact of MNAR on the estimation of the ROC AUC. The asymptotic properties of the proposed estimators are studied and their finite‐sample behaviors are evaluated in simulation studies. The methods are further illustrated using data from a study of maternal depression during pregnancy.

4.
MOTIVATION: An important application of microarrays is to discover genomic biomarkers, among tens of thousands of genes assayed, for disease classification. Thus there is a need for developing statistical methods that can efficiently use such high-throughput genomic data, select biomarkers with discriminant power and construct classification rules. The ROC (receiver operator characteristic) technique has been widely used in disease classification with low-dimensional biomarkers because (1) it does not assume a parametric form of the class probability as required for example in the logistic regression method; (2) it accommodates case-control designs and (3) it allows treating false positives and false negatives differently. However, due to computational difficulties, the ROC-based classification has not been used with microarray data. Moreover, the standard ROC technique does not incorporate built-in biomarker selection. RESULTS: We propose a novel method for biomarker selection and classification using the ROC technique for microarray data. The proposed method uses a sigmoid approximation to the area under the ROC curve as the objective function for classification and the threshold gradient descent regularization method for estimation and biomarker selection. Tuning parameter selection based on the V-fold cross validation and predictive performance evaluation are also investigated. The proposed approach is demonstrated with a simulation study, the Colon data and the Estrogen data. The proposed approach yields parsimonious models with excellent classification performance.
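The idea of a sigmoid approximation to the AUC mentioned in this abstract can be illustrated with a small sketch. It assumes a single precomputed score per subject rather than the paper's linear combination of gene expressions, and the bandwidth parameter `sigma` and the function names are hypothetical choices for the example; the threshold gradient descent step is not shown.

```python
import math


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def empirical_auc(cases, controls):
    """Fraction of (case, control) pairs ranked correctly; ties count one half."""
    total = 0.0
    for c in cases:
        for d in controls:
            total += 1.0 if c > d else (0.5 if c == d else 0.0)
    return total / (len(cases) * len(controls))


def sigmoid_auc(cases, controls, sigma=0.1):
    """Smooth surrogate: the indicator 1[c > d] becomes sigmoid((c - d) / sigma)."""
    total = 0.0
    for c in cases:
        for d in controls:
            total += _sigmoid((c - d) / sigma)
    return total / (len(cases) * len(controls))
```

The empirical AUC is a step function of the scores and therefore awkward to optimize by gradient methods, whereas the surrogate is differentiable; as `sigma` shrinks, the surrogate approaches the empirical AUC whenever no case and control scores are tied.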

5.
Predictive and prognostic biomarkers play an important role in personalized medicine to determine strategies for drug evaluation and treatment selection. In the context of continuous biomarkers, identification of an optimal cutoff for patient selection can be challenging due to limited information on biomarker predictive value, the biomarker’s distribution in the intended use population, and the complexity of the biomarker relationship to clinical outcomes. As a result, prespecified candidate cutoffs may be rationalized based on biological and practical considerations. In this context, adaptive enrichment designs have been proposed with interim decision rules to select a biomarker-defined subpopulation to optimize study performance. With a group sequential design as a reference, the performance of several proposed adaptive designs is evaluated and compared under various scenarios (e.g., sample size, study power, enrichment effects) where type I error rates are well controlled through closed testing procedures and where subpopulation selections are based upon the predictive probability of trial success. It is found that when the treatment is more effective in a subpopulation, these adaptive designs can improve study power substantially. Furthermore, we identified one adaptive design to have generally higher study power than the other designs under various scenarios.

6.
The use of tissue- and cell-based methods in developing drugs for retinal diseases is inefficient. Consequently, many aspects of ocular drug therapy for retinal diseases are poorly understood. Biomarkers as prognostic indicators of change are needed to optimize the use of drugs. VEGF is considered an important target of drug therapy and VEGF levels in tissue are indicative of solid tumor growth. However, since many aspects of VEGF as a biomarker of ocular disease have not been validated, it has been difficult to ascertain without invasive procedures whether VEGF in the eye is a biomarker of response to drug therapy. Using published papers, registered clinical trials, and proteomic databases we assessed the earlier evidence for VEGF as an exploratory biomarker of proliferative and vasculopathic disease of the retina and asked whether the molecule has been rigorously validated in clinical trials. The emerging use of aqueous humor sampling has made it possible to explore biomarkers in oculo, and determine whether they are predictive of drug efficacy. We present data supporting the use of aqueous humor to validate drug-signaling pathways and biomarkers in the eye. In addition, we recommend convening a collaborative congress to help standardize the identification, validation, and use of biomarkers in retinal disease.

7.

Background

One of the major goals in gene and protein expression profiling of cancer is to identify biomarkers and build classification models for prediction of disease prognosis or treatment response. Many traditional statistical methods, based on microarray gene expression data alone and individual genes' discriminatory power, often fail to identify biologically meaningful biomarkers thus resulting in poor prediction performance across data sets. Nonetheless, the variables in multivariable classifiers should synergistically interact to produce more effective classifiers than individual biomarkers.

Results

We developed an integrated approach, namely network-constrained support vector machine (netSVM), for cancer biomarker identification with an improved prediction performance. The netSVM approach is specifically designed for network biomarker identification by integrating gene expression data and protein-protein interaction data. We first evaluated the effectiveness of netSVM using simulation studies, demonstrating its improved performance over state-of-the-art network-based methods and gene-based methods for network biomarker identification. We then applied the netSVM approach to two breast cancer data sets to identify prognostic signatures for prediction of breast cancer metastasis. The experimental results show that: (1) network biomarkers identified by netSVM are highly enriched in biological pathways associated with cancer progression; (2) prediction performance is much improved when tested across different data sets. Specifically, many genes related to apoptosis, cell cycle, and cell proliferation, which are hallmark signatures of breast cancer metastasis, were identified by the netSVM approach. More importantly, several novel hub genes, biologically important with many interactions in PPI network but often showing little change in expression as compared with their downstream genes, were also identified as network biomarkers; the genes were enriched in signaling pathways such as TGF-beta signaling pathway, MAPK signaling pathway, and JAK-STAT signaling pathway. These signaling pathways may provide new insight to the underlying mechanism of breast cancer metastasis.

Conclusions

We have developed a network-based approach for cancer biomarker identification, netSVM, resulting in an improved prediction performance with network biomarkers. We have applied the netSVM approach to breast cancer gene expression data to predict metastasis in patients. Network biomarkers identified by netSVM reveal potential signaling pathways associated with breast cancer metastasis, and help improve the prediction performance across independent data sets.

8.
An MS-based metabolomics strategy including variable selection and PLSDA analysis has been assessed as a tool to discriminate between non-steatotic and steatotic human liver profiles. Different chemometric approaches for uninformative variable elimination were performed by using two of the most common software packages employed in the field of metabolomics (i.e., MATLAB and SIMCA-P). The first considered approach was performed with MATLAB, where the PLS regression vector coefficient values were used to classify variables as informative or not. The second approach was run under SIMCA-P, where variable selection was performed according to both the PLS regression vector coefficients and VIP scores. PLSDA model performance features, such as model validation, variable selection criteria, and potential biomarker output, were assessed for comparison purposes. One interesting finding is that variable selection improved the classification predictiveness of all the models by facilitating metabolite identification and providing enhanced insight into the metabolic information acquired by the UPLC-MS method. The results prove that the proposed strategy is a potentially straightforward approach to improve model performance. Among others, GSH, lysophospholipids and bile acids were found to be the most important altered metabolites in the metabolomic profiles studied. However, further research and more in-depth biochemical interpretations are needed to unambiguously propose them as disease biomarkers.

9.
Few strong and consistent associations have arisen from observational studies of dietary consumption in relation to chronic disease risk. Measurement error in self-reported dietary assessment may be obscuring many such associations. Attempts to correct for measurement error have mostly used a second self-reported assessment in a subset of a study cohort to calibrate the self-reported assessment used throughout the cohort, under the dubious assumption of uncorrelated measurement errors between the two assessments. The use, instead, of objective biomarkers of nutrient consumption to produce calibrated consumption estimates provides a promising approach to enhance study reliability. As summarized here, we have recently applied this nutrient biomarker approach to examine energy, protein, and percent of energy from protein, in relation to disease incidence in Women’s Health Initiative cohorts, and find strong associations that are not evident without biomarker calibration. A major bottleneck for the broader use of a biomarker-calibration approach is the rather few nutrients for which a suitable biomarker has been developed. Some methodologic approaches to the development of additional pertinent biomarkers, including the possible use of a respiratory quotient from indirect calorimetry for macronutrient biomarker development, and the potential of human feeding studies for the evaluation of a range of urine- and blood-based potential biomarkers, will briefly be described.

10.
Safety biomarkers are important drug development tools, both preclinically and clinically. It is a straightforward process to correlate the performance of nonclinical safety biomarkers with histopathology, and ideally, the biomarker is useful in all species commonly used in safety assessment. In clinical validation studies, where histopathology is not feasible, safety biomarkers are compared to the response of standard biomarkers and/or to clinical adjudication. Worldwide, regulatory agencies have put in place processes to qualify biomarkers to provide confidence in the manner of use and interpretation of biomarker data in drug development studies. This paper describes currently qualified safety biomarkers which can be utilized to monitor for nephrotoxicity and cardiotoxicity and ongoing projects to qualify safety biomarkers for liver, skeletal muscle, and vascular injury. In many cases, the development and use of these critical drug development tools is dependent upon partnerships and the precompetitive sharing of data to support qualification efforts.

11.
The use of biochemical or physiological measurements as indicators of ecotoxicity is under constant development and has the advantage of delineating effects before the appearance of disease. However, these biomarkers are often part of a battery of tests, and it is difficult to integrate them together to gain an overall view of an organism's health. The aim of this study was to develop an index that could integrate the data derived from a battery of biomarkers for application to both spatial and temporal studies. Mya arenaria clams were collected at different sites along the Saguenay Fjord (Quebec, Canada). Six biomarkers were measured: metallothioneins, DNA strand breakage, lipid peroxidation, vitellin-like proteins, phagocytosis, and non-specific esterase activity in haemocytes. A biomarker index was obtained by summing the biomarker values expressed in terms of classes. Classes were determined by a distribution-free approach derived from the theory of rough sets. The results of the spatial study show that the index values discriminated well between contaminated and uncontaminated sites. The highly polluted sites had the highest index values (18 compared with a reference value of 14). In the temporal study, the index was also able to highlight possible contamination-induced alterations, even though the interpretation of temporal variation is complicated by natural variations occurring throughout the year. A control chart approach is proposed for determining contaminated sites in both spatial and temporal surveys.

13.
In many settings, including oncology, increasing the dose of treatment results in both increased efficacy and toxicity. With the increasing availability of validated biomarkers and prediction models, there is the potential for individualized dosing based on patient-specific factors. We consider the setting where there is an existing dataset of patients treated with heterogeneous doses and including binary efficacy and toxicity outcomes and patient factors such as clinical features and biomarkers. The goal is to analyze the data to estimate an optimal dose for each (future) patient based on their clinical features and biomarkers. We propose an optimal individualized dose finding rule by maximizing utility functions for individual patients while limiting the rate of toxicity. The utility is defined as a weighted combination of efficacy and toxicity probabilities. This approach maximizes overall efficacy at a prespecified constraint on overall toxicity. We model the binary efficacy and toxicity outcomes using logistic regression with dose, biomarkers and dose–biomarker interactions. To incorporate the large number of potential parameters, we use the LASSO method. We additionally constrain the dose effect to be non-negative for both efficacy and toxicity for all patients. Simulation studies show that the utility approach combined with any of the modeling methods can improve efficacy without increasing toxicity relative to fixed dosing. The proposed methods are illustrated using a dataset of patients with lung cancer treated with radiation therapy.
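A rough sketch of the utility idea in this abstract: efficacy and toxicity probabilities come from logistic models with dose, biomarker and dose-biomarker interaction terms, and the chosen dose maximizes a weighted combination of the two. Everything concrete here is an assumption for illustration: the single biomarker, the coefficient tuples, the discrete dose grid, the utility form P(efficacy) - weight * P(toxicity), and the function names. The paper's LASSO fitting and its population-level toxicity constraint are not reproduced.

```python
import math


def logistic(z):
    """Inverse logit link."""
    return 1.0 / (1.0 + math.exp(-z))


def outcome_probability(coef, dose, biomarker):
    """Logistic model with intercept, dose, biomarker and dose-biomarker interaction."""
    b0, b_dose, b_marker, b_inter = coef  # hypothetical fitted coefficients
    return logistic(b0 + b_dose * dose + b_marker * biomarker + b_inter * dose * biomarker)


def best_dose(biomarker, doses, eff_coef, tox_coef, weight):
    """Return the grid dose maximizing P(efficacy) - weight * P(toxicity)."""
    def utility(dose):
        p_eff = outcome_probability(eff_coef, dose, biomarker)
        p_tox = outcome_probability(tox_coef, dose, biomarker)
        return p_eff - weight * p_tox
    return max(doses, key=utility)
```

With `weight` set to zero the rule simply picks the most efficacious dose, and a larger `weight` pushes the recommendation toward lower doses; tuning that weight until the overall toxicity rate meets a prespecified limit is one way to read the abstract's constraint.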

14.
The paper outlines a 2-tier approach for wide-scale biomonitoring programmes. To obtain a high level of standardization, we suggest the use of caged organisms (mussels or fish). An "early warning", highly sensitive, low-cost biomarker is employed in tier 1 (i.e. lysosomal membrane stability (LMS) and survival rate, a marker for highly polluted sites). Tier 2 is used only for animals sampled at sites in which LMS changes are evident and there is no mortality, with a complete battery of biomarkers assessing the levels of pollutant-induced stress syndrome. Possible approaches for integrating biomarker data in a synthetic index are discussed, along with our proposal to use a recently developed Expert System. The latter system allows a correct selection of biomarkers at different levels of biological organisation (molecular/cellular/tissue/organism), taking into account trends in pollutant-induced biomarker changes (increasing, decreasing, bell-shaped). A selection of biomarkers of stress, genotoxicity and exposure usually employed in biomonitoring programmes is presented, together with a brief overview of new biomolecular approaches.

15.
Proteomic biomarker discovery has led to the identification of numerous potential candidates for disease diagnosis, prognosis, and prediction of response to therapy. However, very few of these identified candidate biomarkers reach clinical validation and go on to be routinely used in clinical practice. One particular issue with biomarker discovery is the identification of significantly changing proteins in the initial discovery experiment that do not validate when subsequently tested on separate patient sample cohorts. Here, we seek to highlight some of the statistical challenges surrounding the analysis of LC‐MS proteomic data for biomarker candidate discovery. We show that common statistical algorithms run on data with low sample sizes can overfit and yield misleading misclassification rates and AUC values. A common solution to this problem is to prefilter variables (via, e.g., ANOVA and/or use of correction methods such as Bonferroni or false discovery rate) to give a smaller dataset and reduce the size of the apparent statistical challenge. However, we show that this exacerbates the problem, yielding even higher performance metrics while reducing the predictive accuracy of the biomarker panel. To illustrate some of these limitations, we have run simulation analyses with known biomarkers. For our chosen algorithm (random forests), we show that the above problems are substantially reduced if a sufficient number of samples are analyzed and the data are not prefiltered. Our view is that LC‐MS proteomic biomarker discovery data should be analyzed without prefiltering and that increasing the sample size in biomarker discovery experiments should be a very high priority.

16.
17.
The identification of biomarkers is one of the leading research areas in proteomics. When biomarkers have to be searched for in spot volume datasets produced by 2D gel-electrophoresis, problems may arise related to the large number of spots present in each map and the small number of samples available in each class (control/pathological). In such cases multivariate methods are usually exploited together with variable selection procedures to provide a set of possible biomarkers; they are, however, usually aimed at selecting the smallest set of variables (spots) that gives the best prediction performance. This approach seems unsuitable for the identification of potential biomarkers, since in this case all the possible candidate biomarkers have to be identified to provide a general picture of the "pathological state": exhaustivity has to be preferred in order to provide a complete understanding of the mechanisms underlying the pathology. We propose here a ranking and classification method, "Ranking-PCA", based on Principal Component Analysis and variable selection in forward search: the method selects one variable at a time as the one providing the best separation of the two classes investigated in the space given by the relevant PCs. The method was applied to an artificial dataset and a real case-study: Ranking-PCA exhaustively identified the potential biomarkers and provided reliable and robust results.

18.
Recent technological advances continue to provide noninvasive and more accurate biomarkers for evaluating disease status. One standard tool for assessing the accuracy of diagnostic tests is the receiver operating characteristic (ROC) curve. Few statistical methods exist to accommodate multiple continuous‐scale biomarkers in the framework of ROC analysis. In this paper, we propose a method to integrate continuous‐scale biomarkers to optimize classification accuracy. Specifically, we develop semiparametric transformation models for multiple biomarkers. We assume that unknown and marker‐specific transformations of biomarkers follow a multivariate normal distribution. Our models accommodate biomarkers subject to limits of detection and account for the dependence among biomarkers by including a subject‐specific random effect. We also propose a diagnostic measure using an optimal linear combination of the transformed biomarkers. Our diagnostic rule does not depend on any monotone transformation of biomarkers and is not sensitive to extreme biomarker values. Nonparametric maximum likelihood estimation (NPMLE) is used for inference. We show that the parameter estimators are asymptotically normal and efficient. We illustrate our semiparametric approach using data from the Endometriosis, Natural History, Diagnosis, and Outcomes (ENDO) study.

19.
For many diseases, there is an unmet need for new or better biomarkers for improved disease risk assessment and monitoring, as available markers lack sufficient specificity. Lipids are drawing major interest as potential candidates for filling these gaps. This has recently been demonstrated by the identification of selective ceramides for prediction of cardiovascular mortality, enabling improved risk assessment of cardiovascular disease compared with conventional clinical markers. In this review, we discuss current lipid biomarker findings and the possible connection between cardiovascular disease, chronic obstructive pulmonary disease, and aging. Moreover, we discuss how to overcome the current roadblocks facing lipid biomarker research. We stress the need for improved quantification, standardization of methodologies, and establishment of initial reference values to allow for an efficient transfer path of research findings into the clinical landscape, and, ultimately, to put newly identified biomarkers into practical use.

20.
HIV incidence estimates are used to monitor HIV-1 infection in the United States. Use of laboratory biomarkers that distinguish recent from longstanding infection to quantify HIV incidence relies on having accurate knowledge of the average time that individuals spend in a transient state of recent infection between seroconversion and reaching a specified biomarker cutoff value. This paper describes five estimation procedures from two general statistical approaches: a survival time approach, and an approach that fits binomial models of the probability of being classified as recently infected as a function of time since seroconversion. We compare these procedures for estimating the mean duration of recent infection (MDRI) for two biomarkers used by the U.S. National HIV Surveillance System for determination of HIV incidence, the Aware BED EIA HIV-1 incidence test (BED) and the avidity-based, modified Bio-Rad HIV-1/HIV-2 plus O ELISA (BRAI) assay. Collectively, 953 specimens from 220 HIV-1 subtype B seroconverters, taken from 5 cohorts, were tested with a biomarker assay. Estimates of MDRI using the non-parametric survival approach were 198.4 days (SD 13.0) for BED and 239.6 days (SD 13.9) for BRAI using cutoff values of 0.8 normalized optical density and 30%, respectively. The probability of remaining in the recent state as a function of time since seroconversion, based upon this revised statistical approach, can be applied in the calculation of annual incidence in the United States.


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  京ICP备09084417号