Similar literature
20 similar articles found
1.
2.
Mass spectrometry (MS) technology in clinical proteomics is very promising for the discovery of new biomarkers for disease management. To overcome the obstacle of data noise in MS analysis, we propose a new approach to knowledge-integrated biomarker discovery using data from Major Adverse Cardiac Events (MACE) patients. We first built a cardiovascular-related network based on protein information from protein annotations in UniProt, protein-protein interaction (PPI) data, and signal transduction databases. In contrast to previous machine learning methods for MS data processing, we then used statistical methods to discover biomarkers in the cardiovascular-related network. By trading off known protein information against the noise in mass spectrometry data, we could identify high-confidence biomarkers. Most importantly, aided by the protein-protein interaction network (that is, the cardiovascular-related network), we propose a new type of biomarker, the network biomarker, composed of a set of proteins and the interactions among them. The candidate network biomarkers classify the two groups of patients more accurately than current single-protein biomarkers, which do not consider biological molecular interactions.

3.
Recent technical advances in the field of quantitative proteomics have stimulated a large number of biomarker discovery studies of various diseases, providing avenues for new treatments and diagnostics. However, inherent challenges have limited the successful translation of candidate biomarkers into clinical use, thus highlighting the need for a robust analytical methodology to transition from biomarker discovery to clinical implementation. We have developed an end-to-end computational proteomic pipeline for biomarker studies. At the discovery stage, the pipeline emphasizes different aspects of experimental design, appropriate statistical methodologies, and quality assessment of results. At the validation stage, the pipeline focuses on the migration of the results to a platform appropriate for external validation, and the development of a classifier score based on corroborated protein biomarkers. At the last stage towards clinical implementation, the main aims are to develop and validate an assay suitable for clinical deployment, and to calibrate the biomarker classifier using the developed assay. The proposed pipeline was applied to a biomarker study in cardiac transplantation aimed at developing a minimally invasive clinical test to monitor acute rejection. Starting with an untargeted screening of the human plasma proteome, five candidate biomarker proteins were identified. Rejection-regulated proteins reflect cellular and humoral immune responses, acute phase inflammatory pathways, and lipid metabolism biological processes. A multiplex multiple reaction monitoring mass-spectrometry (MRM-MS) assay was developed for the five candidate biomarkers and validated by enzyme-linked immunosorbent assay (ELISA) and immunonephelometric assay (INA). A classifier score based on corroborated proteins demonstrated that the developed MRM-MS assay provides an appropriate methodology for an external validation, which is still in progress.
Plasma proteomic biomarkers of acute cardiac rejection may offer a relevant post-transplant monitoring tool to effectively guide clinical care. The proposed computational pipeline is highly applicable to a wide range of biomarker proteomic studies.

4.
The evolution of “informatics” technologies has the potential to generate massive databases, but the extent to which personalized medicine may be effectuated depends on the extent to which these rich databases may be utilized to advance understanding of the disease molecular profiles and ultimately integrated for treatment selection, necessitating robust methodology for dimension reduction. Yet, statistical methods proposed to address challenges arising with the high‐dimensionality of omics‐type data predominately rely on linear models and emphasize associations deriving from prognostic biomarkers. Existing methods are often limited for discovering predictive biomarkers that interact with treatment and fail to elucidate the predictive power of their resultant selection rules. In this article, we present a Bayesian predictive method for personalized treatment selection that is devised to integrate both the treatment predictive and disease prognostic characteristics of a particular patient's disease. The method appropriately characterizes the structural constraints inherent to prognostic and predictive biomarkers, and hence properly utilizes these complementary sources of information for treatment selection. The methodology is illustrated through a case study of lower grade glioma. Theoretical considerations are explored to demonstrate the manner in which treatment selection is impacted by prognostic features. Additionally, simulations based on an actual leukemia study are provided to ascertain the method's performance with respect to selection rules derived from competing methods.

5.

Background

Biomarker discovery holds the promise for advancing personalized medicine as the biomarkers can help match patients to optimal treatment to improve patient outcomes. However, serious concerns have been raised because very few molecular biomarkers or signatures discovered from high dimensional array data can be successfully validated and applied to clinical use. We propose good practice guidelines as well as a novel tool for biomarker discovery and use breast cancer prognosis as a case study to illustrate the proposed approach.

Results

We applied the proposed approach to a publicly available breast cancer prognosis dataset and identified small numbers of predictive markers for patient subpopulations stratified by clinical variables. Results from an independent cross-platform validation set show that our model compares favorably to other gene signature and clinical variable based prognostic tools. About half of the discovered candidate markers can individually achieve very good performance, which further demonstrates the high quality of feature selection. These candidate markers perform extremely well for young patients with estrogen receptor-positive, lymph node-negative early stage breast cancers, suggesting that a distinct subset of these patients identified by these markers is actually at high risk of recurrence and may benefit from more aggressive treatment than current practice.

Conclusion

The results show that by following good practice guidelines, we can identify highly predictive genes in high dimensional breast cancer array data. These predictive genes have been successfully validated using an independent cross-platform dataset.

6.
Wei Zou  Zhao-Bang Zeng 《Genetica》2009,137(2):125-134
To find the correlations between genome-wide gene expression variations and sequence polymorphisms in inbred cross populations, we developed a statistical method to identify expression quantitative trait loci (eQTL) in a genome. The method is based on multiple interval mapping (MIM), a model selection procedure, and uses the false discovery rate (FDR) to measure the statistical significance of the large number of eQTL. We compared our method with a similar procedure proposed by Storey et al. and found that our method can be more powerful. We identified the features of the two methods that resulted in different statistical powers for eQTL detection, and confirmed them by simulation. We organized our computational procedure in an R package which can estimate FDR for positive findings from similar model selection procedures. The R package, MIM-eQTL, can be found at .
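The abstract above uses the FDR to judge a large number of eQTL calls at once. As a generic illustration only, the following sketch implements the Benjamini-Hochberg step-up procedure in Python. This is a stand-in technique, not the MIM-based FDR estimator of the MIM-eQTL package, and the function name `benjamini_hochberg` is hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where a hypothesis is rejected.

    Generic Benjamini-Hochberg step-up rule (an assumption for
    illustration, not the estimator described in the abstract).
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= k * alpha / m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    print(benjamini_hochberg([0.01, 0.02, 0.03, 0.5]))
```

Because the rule is step-up, a p-value that fails its own rank threshold can still be rejected when a larger p-value passes at a higher rank.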

7.

Background  

Robust biomarkers are needed to improve microbial identification and diagnostics. Proteomics methods based on mass spectrometry can be used for the discovery of novel biomarkers through their high sensitivity and specificity. However, there has been a lack of a coherent pipeline connecting biomarker discovery with established approaches for evaluation and validation. We propose such a pipeline that uses in silico methods for refined biomarker discovery and confirmation.

8.
Large-scale hypothesis testing has become a ubiquitous problem in high-dimensional statistical inference, with broad applications in various scientific disciplines. One relevant application is constituted by imaging mass spectrometry (IMS) association studies, where a large number of tests are performed simultaneously in order to identify molecular masses that are associated with a particular phenotype, for example, a cancer subtype. Mass spectra obtained from matrix-assisted laser desorption/ionization (MALDI) experiments are dependent, when considered as statistical quantities. False discovery proportion (FDP) estimation and control under arbitrary dependency structure among test statistics is an active topic in modern multiple testing research. In this context, we are concerned with the evaluation of associations between the binary outcome variable (describing the phenotype) and multiple predictors derived from MALDI measurements. We propose an inference procedure in which the correlation matrix of the test statistics is utilized. The approach is based on multiple marginal models. Specifically, we fit a marginal logistic regression model for each predictor individually. Asymptotic joint normality of the stacked vector of the marginal regression coefficients is established under standard regularity assumptions, and their (limiting) correlation matrix is estimated. The proposed method extracts common factors from the resulting empirical correlation matrix. Finally, we estimate the realized FDP of a thresholding procedure for the marginal p-values. We demonstrate a practical application of the proposed workflow to MALDI IMS data in an oncological context.

9.
Proteomic biomarker discovery has led to the identification of numerous potential candidates for disease diagnosis, prognosis, and prediction of response to therapy. However, very few of these identified candidate biomarkers reach clinical validation and go on to be routinely used in clinical practice. One particular issue with biomarker discovery is the identification of significantly changing proteins in the initial discovery experiment that do not validate when subsequently tested on separate patient sample cohorts. Here, we seek to highlight some of the statistical challenges surrounding the analysis of LC‐MS proteomic data for biomarker candidate discovery. We show that common statistical algorithms run on data with low sample sizes can overfit and yield misleading misclassification rates and AUC values. A common solution to this problem is to prefilter variables (via, e.g., ANOVA and/or use of correction methods such as Bonferroni or false discovery rate) to give a smaller dataset and reduce the size of the apparent statistical challenge. However, we show that this exacerbates the problem, yielding even higher performance metrics while reducing the predictive accuracy of the biomarker panel. To illustrate some of these limitations, we have run simulation analyses with known biomarkers. For our chosen algorithm (random forests), we show that the above problems are substantially reduced if a sufficient number of samples are analyzed and the data are not prefiltered. Our view is that LC‐MS proteomic biomarker discovery data should be analyzed without prefiltering and that increasing the sample size in biomarker discovery experiments should be a very high priority.

10.
Over the past decade, there has been growing enthusiasm for using electronic medical records (EMRs) for biomedical research. Quantile regression estimates distributional associations, providing unique insights into the intricacies and heterogeneity of the EMR data. However, the widespread nonignorable missing observations in EMRs often obscure the true associations and challenge their potential for robust biomedical discoveries. We propose a novel method to estimate the covariate effects in the presence of nonignorable missing responses under quantile regression. This method imposes no parametric specification on the response distribution; instead, it uses the implicit distributions induced by the corresponding quantile regression models. We show that the proposed estimator is consistent and asymptotically normal. We also provide an efficient algorithm to obtain the proposed estimate and a randomly weighted bootstrap approach for statistical inference. Numerical studies, including an empirical analysis of real-world EMR data, are used to assess the proposed method's finite-sample performance compared with existing methods.
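For readers unfamiliar with quantile regression, the sketch below shows the standard check (pinball) loss that quantile regression minimizes, and confirms on a toy sample that the minimizing constant is a sample quantile. It covers only the complete-data building block: the handling of nonignorable missing responses described in the abstract is not attempted here, and the names `check_loss` and `best_constant` are hypothetical.

```python
def check_loss(residual, tau):
    """Check (pinball) loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    indicator = 1.0 if residual < 0 else 0.0
    return residual * (tau - indicator)


def best_constant(values, tau):
    """Return the observed value q minimizing sum of rho_tau(y - q).

    Searching over observed values is enough because a minimizer of
    the check loss can always be found among the data points.
    """
    def total_loss(q):
        return sum(check_loss(y - q, tau) for y in values)
    return min(values, key=total_loss)


if __name__ == "__main__":
    sample = [1, 2, 3, 4, 100]
    print(best_constant(sample, 0.5))  # the sample median
```

With `tau = 0.5` the loss is half the absolute error, so the minimizer is the median and the outlier 100 has little influence on it.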

11.
In many settings, including oncology, increasing the dose of treatment results in both increased efficacy and toxicity. With the increasing availability of validated biomarkers and prediction models, there is the potential for individualized dosing based on patient specific factors. We consider the setting where there is an existing dataset of patients treated with heterogeneous doses and including binary efficacy and toxicity outcomes and patient factors such as clinical features and biomarkers. The goal is to analyze the data to estimate an optimal dose for each (future) patient based on their clinical features and biomarkers. We propose an optimal individualized dose finding rule by maximizing utility functions for individual patients while limiting the rate of toxicity. The utility is defined as a weighted combination of efficacy and toxicity probabilities. This approach maximizes overall efficacy at a prespecified constraint on overall toxicity. We model the binary efficacy and toxicity outcomes using logistic regression with dose, biomarkers and dose–biomarker interactions. To incorporate the large number of potential parameters, we use the LASSO method. We additionally constrain the dose effect to be non-negative for both efficacy and toxicity for all patients. Simulation studies show that the utility approach combined with any of the modeling methods can improve efficacy without increasing toxicity relative to fixed dosing. The proposed methods are illustrated using a dataset of patients with lung cancer treated with radiation therapy.
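The utility rule described above can be sketched as a grid search over candidate doses. The sketch below is a simplification under stated assumptions: the logistic coefficients in `example_efficacy` and `example_toxicity` are invented for illustration, the toxicity limit is applied per patient (the abstract constrains the overall toxicity rate), and no LASSO fitting is performed. All function names are hypothetical.

```python
import math


def logistic(x):
    """Map a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def example_efficacy(dose, biomarker):
    # Hypothetical linear predictor with a non-negative dose effect.
    return -3.0 + 0.05 * dose + 1.0 * biomarker


def example_toxicity(dose, biomarker):
    # Hypothetical linear predictor; the biomarker is unused here.
    return -5.0 + 0.05 * dose


def best_dose(doses, efficacy_model, toxicity_model, biomarker,
              weight=1.0, max_toxicity=0.3):
    """Return the dose maximizing P(efficacy) - weight * P(toxicity)
    among doses whose toxicity probability is at most max_toxicity.
    Returns None when no candidate dose satisfies the limit."""
    best, best_utility = None, float("-inf")
    for dose in doses:
        p_eff = logistic(efficacy_model(dose, biomarker))
        p_tox = logistic(toxicity_model(dose, biomarker))
        if p_tox > max_toxicity:
            continue  # dose violates the toxicity limit
        utility = p_eff - weight * p_tox
        if utility > best_utility:
            best, best_utility = dose, utility
    return best


if __name__ == "__main__":
    print(best_dose([50, 60, 70, 80], example_efficacy,
                    example_toxicity, biomarker=0.0))
```

Tightening `max_toxicity` moves the selected dose downward, which is the trade-off the utility weight and the toxicity constraint are meant to control.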

12.
Summary: Identification of novel biomarkers for risk assessment is important for both effective disease prevention and optimal treatment recommendation. Discovery relies on the precious yet limited resource of stored biological samples from large prospective cohort studies. The case‐cohort sampling design provides a cost‐effective tool in the context of biomarker evaluation, especially when the clinical condition of interest is rare. Existing statistical methods focus on making efficient inference on relative hazard parameters from the Cox regression model. Drawing on recent theoretical developments on the weighted likelihood for semiparametric models under two‐phase studies (Breslow and Wellner, 2007), we propose statistical methods to evaluate the accuracy and predictiveness of a risk prediction biomarker, with a censored time‐to‐event outcome under stratified case‐cohort sampling. We consider nonparametric methods and a semiparametric method. We derive large sample properties of the proposed estimators and evaluate their finite sample performance using numerical studies. We illustrate the new procedures using data from the Framingham Offspring Study to evaluate the accuracy of a recently developed risk score incorporating biomarker information for predicting cardiovascular disease.

13.
Many efforts have been made to discover novel biomarkers for early disease detection in oncology. However, the lack of efficient computational strategies impedes the discovery of disease-specific biomarkers for better understanding and management of treatment outcomes. In this study, we propose a novel graph-based scoring function to rank and identify the most robust biomarkers from limited proteomics data. The proposed method measures the proximity between candidate proteins identified by mass spectrometry (MS) analysis utilizing prior reported knowledge in the literature. Recent advances in mass spectrometry provide new opportunities to identify unique biomarkers from peripheral blood samples in complex treatment modalities such as radiation therapy (radiotherapy), which enables early disease detection, disease progression monitoring, and targeted intervention. Specifically, the dose-limiting role of radiation-induced lung injury known as radiation pneumonitis (RP) in lung cancer patients receiving radiotherapy motivates the search for robust predictive biomarkers. In this case study, plasma from 26 locally advanced non-small cell lung cancer (NSCLC) patients treated with radiotherapy in a longitudinal 3 × 3 matched-control cohort was fractionated using in-line, sequential multiaffinity chromatography. The complex peptide mixtures from endoprotease digestions were analyzed using comparative, high-resolution liquid chromatography (LC)-MS to identify and quantify differential peptide signals. Through analysis of survey mass spectra and annotations of peptides from the tandem spectra, we found candidate proteins that appear to be associated with RP. On the basis of the proposed methodology, α-2-macroglobulin (α2M) was unambiguously ranked as the top candidate protein.
As independent validation of this candidate protein, enzyme-linked immunosorbent assay (ELISA) experiments were performed on an independent cohort of 20 patients' samples, resulting in early significant discrimination between RP and non-RP patients (p = 0.002). These results suggest that the proposed methodology based on longitudinal proteomics analysis and a novel bioinformatics ranking algorithm is a potentially promising approach for the challenging problem of identifying relevant biomarkers in sample-limited clinical applications.
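The abstract does not give the form of its graph-based scoring function, so the following is only a hypothetical illustration of the general idea: candidates that sit close to the other candidates in a prior-knowledge graph receive a higher score. The inverse shortest-path score, the toy graph with placeholder node names, and all function names are assumptions, not the authors' method.

```python
from collections import deque


def shortest_path_lengths(graph, source):
    """Breadth-first search distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def proximity_scores(graph, candidates):
    """Score each candidate by the sum of 1/distance to every other
    reachable candidate; unreachable candidates contribute nothing."""
    scores = {}
    for protein in candidates:
        dist = shortest_path_lengths(graph, protein)
        scores[protein] = sum(
            1.0 / dist[other]
            for other in candidates
            if other != protein and other in dist
        )
    return scores


def rank_candidates(graph, candidates):
    """Order candidates by decreasing score, breaking ties by name."""
    scores = proximity_scores(graph, candidates)
    return sorted(candidates, key=lambda p: (-scores[p], p))


if __name__ == "__main__":
    # Toy undirected graph with placeholder nodes (not real proteins).
    toy_graph = {"A": ["B", "C"], "B": ["A"], "C": ["A", "D"],
                 "D": ["C"], "E": []}
    print(rank_candidates(toy_graph, ["A", "B", "D", "E"]))
```

In this toy graph the isolated node `E` scores zero and ranks last, which mirrors the intuition that a candidate with no supporting prior knowledge should not be prioritized.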

14.
A class of nonparametric statistical methods, including a nonparametric empirical Bayes (EB) method, the Significance Analysis of Microarrays (SAM) and the mixture model method (MMM), has been proposed to detect differential gene expression for replicated microarray experiments. They all depend on constructing a test statistic, for example, a t-statistic, and then using permutation to draw inferences. However, due to special features of microarray data, using standard permutation scores may not estimate the null distribution of the test statistic well, leading to possibly overly conservative inferences. We propose a new method of constructing weighted permutation scores to overcome the problem: posterior probabilities of having no differential expression from the EB method are used as weights for genes to better estimate the null distribution of the test statistic. We also propose a weighted method to estimate the false discovery rate (FDR) using the posterior probabilities. Using simulated data and real data for time-course microarray experiments, we show the improved performance of the proposed methods when implemented in MMM, EB and SAM.
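One simple way posterior probabilities can feed an FDR estimate is to average the posterior probability of "no differential expression" over the genes declared significant. The sketch below shows that generic posterior-average estimate; it is an assumption for illustration and may differ from the weighted estimator proposed in the abstract. The function name and toy numbers are hypothetical.

```python
def posterior_fdr(statistics, null_probabilities, threshold):
    """Estimate the FDR of the rule |statistic| >= threshold as the mean
    posterior null probability among the selected genes.

    Returns 0.0 when no gene is selected.
    """
    selected = [
        p0
        for stat, p0 in zip(statistics, null_probabilities)
        if abs(stat) >= threshold
    ]
    if not selected:
        return 0.0
    return sum(selected) / len(selected)


if __name__ == "__main__":
    stats = [3.0, -2.5, 0.4, 1.0]   # toy test statistics
    p_null = [0.1, 0.3, 0.9, 0.8]   # toy posterior null probabilities
    print(posterior_fdr(stats, p_null, threshold=2.0))
```

Lowering the threshold admits genes with higher null probabilities, so the estimated FDR rises as the selection becomes more liberal.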

15.
Personalized medicine aims to identify those patients who have good or poor prognosis for overall disease outcomes or therapeutic efficacy for a specific treatment. A well-established approach is to identify a set of biomarkers using statistical methods with a classification algorithm to identify patient subgroups for treatment selection. However, there are potential false positives and false negatives in classification, resulting in incorrect patient treatment assignment. In this paper, we propose a hybrid mixture model taking uncertainty in class labels into consideration, where the class labels are modeled by a Bernoulli random variable. An EM algorithm was developed to estimate the model parameters, and a parametric bootstrap method was used to test the significance of the predictive variables that were associated with subgroup memberships. Simulation experiments showed that, on average, the proposed method had higher accuracy in identifying the subpopulations than the Naïve Bayes classifier and logistic regression. A breast cancer dataset was analyzed to illustrate the proposed hybrid mixture model.

16.
Tao T  Zhai CX  Lu X  Fang H 《Applied bioinformatics》2004,3(2-3):115-124
Automatic discovery of new protein motifs (i.e. amino acid patterns) is one of the major challenges in bioinformatics. Several algorithms have been proposed that can extract statistically significant motif patterns from any set of protein sequences. With these methods, one can generate a large set of candidate motifs that may be biologically meaningful. This article examines methods to predict the functions of these candidate motifs. We use several statistical methods: a popularity method, a mutual information method and probabilistic translation models. These methods capture, from different perspectives, the correlations between the matched motifs of a protein and its assigned Gene Ontology terms that characterise the function of the protein. We evaluate these different methods using the known motifs in the InterPro database. Each method is used to rank candidate terms for each motif. We then use the expected mean reciprocal rank to evaluate the performance. The results show that, in general, all these methods perform well, suggesting that they can all be useful for predicting the function of an unknown motif. Among the methods tested, a probabilistic translation model with a popularity prior performs the best.
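The evaluation described above ranks candidate terms per motif and scores the ranking by reciprocal rank. The sketch below computes the plain mean reciprocal rank (MRR); the "expected" variant used in the article is not specified in the abstract, so this is the standard definition offered as an assumption. Motif and term identifiers are placeholders, and the function name is hypothetical.

```python
def mean_reciprocal_rank(rankings, correct_terms):
    """Average, over motifs, of 1/rank of the first correct term.

    rankings: dict mapping a motif to its ranked list of candidate terms.
    correct_terms: dict mapping a motif to the set of correct terms.
    A motif with no correct term in its ranking contributes 0.
    """
    if not rankings:
        return 0.0
    total = 0.0
    for motif, ranked_terms in rankings.items():
        truth = correct_terms.get(motif, set())
        for rank, term in enumerate(ranked_terms, start=1):
            if term in truth:
                total += 1.0 / rank
                break  # only the first correct term counts
    return total / len(rankings)


if __name__ == "__main__":
    rankings = {"m1": ["t2", "t1"], "m2": ["t3"], "m3": ["t4", "t5"]}
    truth = {"m1": {"t1"}, "m2": {"t3"}, "m3": {"t9"}}
    print(mean_reciprocal_rank(rankings, truth))
```

An MRR of 1 means every motif has a correct term ranked first, so the measure rewards methods that place a correct annotation near the top.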

17.
《Genomics》2020,112(5):3284-3293
Asthma, chronic obstructive pulmonary disease (COPD), and idiopathic pulmonary fibrosis (IPF) are three serious lung inflammatory diseases. The understanding of the pathogenesis mechanism and the identification of potential prognostic biomarkers of these diseases can provide the patients with more efficient treatments. In this study, an efficient hybrid feature selection method was introduced in order to extract informative genes. We implemented an ontology-based ranking approach on differentially expressed genes following a wrapper method. The examination of the different gene ontologies and their combinations motivated us to propose a biological functional-based method to improve the performance of further wrapper methods. The results identified: TOM1L1, SRSF1, and GIT2 in asthma; CHCHD4, PAIP2, CRLF3, UBQLN4, TRAK1, PRELID1, VAMP4, CCM2, and APBB1IP in COPD; and TUFT1, GAB2, B4GALNT1, TNFRSF17, PRDM8, and SETDB2 in IPF as the potential biomarkers. The proposed method can be used to identify hub genes in other high-throughput datasets.

18.
In this work, we propose a novel method for individualized treatment selection when the treatment response is multivariate. Our method covers any number of treatments and can be applied to a broad set of models. The proposed method uses a Mahalanobis-type distance measure to establish an ordering of treatments based on treatment performance measures. Our investigation deals with means of responses conditional on lower dimensional composite scores based on covariates, where these scores are built using single index models to approximate mean responses against patient covariates. Smoothed estimates of such conditional means are combined to construct an estimate of the aforementioned distance measure, which is then used to estimate the optimal treatment. An empirical study demonstrates the performance of the proposed method in finite samples. We also present a data analysis using data from an HIV clinical trial to show the applicability of the proposed procedure to real data.

19.
Recent technological advances continue to provide noninvasive and more accurate biomarkers for evaluating disease status. One standard tool for assessing the accuracy of diagnostic tests is the receiver operating characteristic (ROC) curve. Few statistical methods exist to accommodate multiple continuous‐scale biomarkers in the framework of ROC analysis. In this paper, we propose a method to integrate continuous‐scale biomarkers to optimize classification accuracy. Specifically, we develop semiparametric transformation models for multiple biomarkers. We assume that unknown and marker‐specific transformations of biomarkers follow a multivariate normal distribution. Our models accommodate biomarkers subject to limits of detection and account for the dependence among biomarkers by including a subject‐specific random effect. We also propose a diagnostic measure using an optimal linear combination of the transformed biomarkers. Our diagnostic rule does not depend on any monotone transformation of biomarkers and is not sensitive to extreme biomarker values. Nonparametric maximum likelihood estimation (NPMLE) is used for inference. We show that the parameter estimators are asymptotically normal and efficient. We illustrate our semiparametric approach using data from the Endometriosis, Natural History, Diagnosis, and Outcomes (ENDO) study.
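To make the ROC framing concrete, the sketch below combines several biomarkers into one score with fixed weights and computes the empirical area under the ROC curve (AUC) as the fraction of case-control pairs ranked correctly. It is a minimal illustration under stated assumptions: the weights are supplied by hand rather than estimated, and no semiparametric transformation, limit-of-detection handling, or random effect is modeled. Function names and toy values are hypothetical.

```python
def combine(markers, weights):
    """Linear combination of one subject's biomarker values."""
    return sum(w * x for w, x in zip(weights, markers))


def empirical_auc(case_scores, control_scores):
    """Mann-Whitney estimate of the AUC: the proportion of
    (case, control) pairs in which the case scores higher,
    counting ties as one half."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


if __name__ == "__main__":
    weights = [1.0, 0.5]                 # hypothetical weights
    cases = [[2.0, 1.0], [3.0, 0.0]]     # toy biomarker values
    controls = [[1.0, 1.0], [0.0, 2.0]]
    case_scores = [combine(m, weights) for m in cases]
    control_scores = [combine(m, weights) for m in controls]
    print(empirical_auc(case_scores, control_scores))
```

Because the empirical AUC depends only on the ordering of the scores, applying any strictly increasing transformation to the combined score leaves it unchanged.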

20.
L. Xue  L. Wang  A. Qu 《Biometrics》2010,66(2):393-404
Summary: We propose a new estimation method for multivariate failure time data using the quadratic inference function (QIF) approach. The proposed method efficiently incorporates within‐cluster correlations. Therefore, it is more efficient than methods that ignore within‐cluster correlation. Furthermore, the proposed method is easy to implement. Unlike the weighted estimating equations in Cai and Prentice (1995, Biometrika 82, 151–164), it is not necessary to explicitly estimate the correlation parameters. This simplification is particularly useful in analyzing data with large cluster sizes, where it is difficult to estimate intracluster correlation. Under certain regularity conditions, we show the consistency and asymptotic normality of the proposed QIF estimators. A chi‐squared test is also developed for hypothesis testing. We conduct extensive Monte Carlo simulation studies to assess the finite sample performance of the proposed methods. We also illustrate the proposed methods by analyzing primary biliary cirrhosis (PBC) data.


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司), 京ICP备09084417号