Found 20 similar articles (search time: 0 ms)
1.
Background
There is a continuing need to develop molecular diagnostic tools which complement histopathologic examination to increase the accuracy of cancer diagnosis. DNA microarrays provide a means for measuring gene expression signatures which can then be used as components of genomic-based diagnostic tests to determine the presence of cancer.
2.
Background
A potential benefit of profiling of tissue samples using microarrays is the generation of molecular fingerprints that will define subtypes of disease. Hierarchical clustering has been the primary analytical tool used to define disease subtypes from microarray experiments in cancer settings. Assessing cluster reliability poses a major complication in analyzing output from clustering procedures. While most work has focused on estimating the number of clusters in a dataset, the question of stability of individual-level clusters has not been addressed.
3.
4.
5.
MOTIVATION: Although several recently proposed analysis packages for microarray data can cope with heavy-tailed noise, many applications rely on Gaussian assumptions. Gaussian noise models foster computational efficiency. This comes, however, at the expense of increased sensitivity to outlying observations. Assessing potential insufficiencies of Gaussian noise in microarray data analysis is thus important and of general interest. RESULTS: To this end, we propose assessing different noise models on a large number of microarray experiments. The goodness of fit of noise models is quantified by a hierarchical Bayesian analysis of variance model, which predicts normalized expression values as a mixture of a Gaussian density and t-distributions with adjustable degrees of freedom. Inference of differentially expressed genes is taken into consideration at a second mixing level. To attain far-reaching validity, our investigations cover a wide range of analysis platforms and experimental settings. As the most striking result, we find that, irrespective of the chosen preprocessing and normalization method, a heavy-tailed noise model is a better fit than a simple Gaussian in all experiments. Further investigations revealed that an appropriate choice of noise model has a considerable influence on biological interpretations drawn at the level of inferred genes and gene ontology terms. We conclude from our investigation that neglecting the overdispersed noise in microarray data can mislead scientific discovery and suggest that the convenience of Gaussian-based modelling should be replaced by non-parametric approaches or other methods that account for heavy-tailed noise.
6.
7.
Lin Jiang Hui Jiang Sheng Dai Ying Chen Youqiang Song Clara Sze-Man Tang Shirley Yin-Yu Pang Shu-Leong Ho Binbin Wang Maria-Mercedes Garcia-Barcelo Paul Kwong-Hang Tam Stacey S Cherny Mulin Jun Li Pak Chung Sham Miaoxin Li 《Nucleic acids research》2022,50(6):e34
Identifying rare variants that contribute to complex diseases is challenging because of the low statistical power in current tests comparing cases with controls. Here, we propose a novel and powerful rare variants association test based on the deviation of the observed mutation burden of a gene in cases from a baseline predicted by a weighted recursive truncated negative-binomial regression (RUNNER) on genomic features available from public data. Simulation studies show that RUNNER is substantially more powerful than state-of-the-art rare variant association tests and has reasonable type 1 error rates even for stratified populations or in small samples. Applied to real case-control data, RUNNER recapitulates known genes of Hirschsprung disease and Alzheimer's disease missed by current methods and detects promising new candidate genes for both disorders. In a case-only study, RUNNER successfully detected a known causal gene of amyotrophic lateral sclerosis. The present study provides a powerful and robust method to identify susceptibility genes with rare risk variants for complex diseases.
8.
There is tremendous scientific interest in the analysis of gene expression data in clinical settings, such as oncology. In this paper, we describe the importance of adjusting for confounders and other prognostic factors in order to select for differentially expressed genes for follow-up validation studies. We develop two approaches to the analysis of microarray data in non-randomized clinical settings. The first is an extension of the current significance analysis of microarray procedures, where other covariates are taken into account. The second is a novel covariate-adjusted regression modelling based on the receiver operating characteristic (ROC) curve for the analysis of gene expression data. The ideas are illustrated using data from a prostate cancer molecular profiling study.
9.
In previous work, we proposed a method for detecting differential gene expression based on the change-point of the expression profile. This non-parametric change-point method gave promising results in both a simulation study and experiments on public datasets. However, its performance is still limited by low sensitivity to the right bound, and the statistical significance of the statistics has not been fully explored. To overcome the insensitivity to the right bound, we modified the original method by adding a weight function to the D(n) statistic. A simulation study showed that the weighted change-point statistics method is significantly better than the original NPCPS in terms of ROC, false positive rate (FPR), and change-point estimate. The mean absolute error of the change-point estimated by the weighted change-point method was 0.03, a reduction of more than 50% compared with the original 0.06, and the mean FPR was reduced by more than 55%. The experiment on microarray Dataset I yielded 3974 differentially expressed genes out of a total of 5293 genes; the experiment on microarray Dataset II yielded 9983 differentially expressed genes out of a total of 12576 genes. In summary, the method proposed here is an effective modification of the previous method, especially when only a small subset of cancer samples has DGE.
10.
11.
12.
MOTIVATION: Methods for analyzing cancer microarray data often face two distinct challenges: the models they infer need to perform well when classifying new tissue samples while at the same time providing an insight into the patterns and gene interactions hidden in the data. State-of-the-art supervised data mining methods often cover well only one of these aspects, motivating the development of methods where predictive models with a solid classification performance would be easily communicated to the domain expert. RESULTS: Data visualization may provide an excellent approach to knowledge discovery and analysis of class-labeled data. We have previously developed an approach called VizRank that can score and rank point-based visualizations according to the degree of separation of data instances of different classes. We here extend VizRank with techniques to uncover outliers, score features (genes) and perform classification, and demonstrate that the proposed approach is well suited for cancer microarray analysis. Using VizRank and radviz visualization on a set of previously published cancer microarray data sets, we were able to find simple, interpretable data projections that include only a small subset of genes yet do clearly differentiate among different cancer types. We also report that our approach to classification through visualization achieves performance that is comparable to state-of-the-art supervised data mining techniques. AVAILABILITY: VizRank and radviz are implemented as part of the Orange data mining suite (http://www.ailab.si/orange). SUPPLEMENTARY INFORMATION: Supplementary data are available from http://www.ailab.si/supp/bi-cancer.
13.
Bay BH Jin R Huang J Tan PH 《Experimental biology and medicine (Maywood, N.J.)》2006,231(9):1516-1521
Breast cancer is the most common cancer in women, with a general upward trend in incidence. Basic and clinical breast cancer research has continued at a rapid pace, in the endeavor to understand the biology of the disease so as to improve management of patients. Besides traditional pathological indicators, expression of molecular markers in breast cancer has also been comprehensively investigated. This paper will focus on the prognostic utility of metallothioneins (MTs), a family of low molecular weight metal binding proteins encoded by at least 10 functional MT genes that are associated with cell proliferation in breast cancer. Evidence that MT is a potential prognostic biomarker for breast cancer is supported by many reports in the literature. Expression of the MT protein has been detected by immunohistochemistry in a significant portion of invasive ductal breast cancers. MT expression has also been well studied in association with traditional clinico-pathological parameters of breast cancers. Generally, higher MT expression in breast cancers is predictive of worse patient outcomes. The relationship of MT isoforms to histological grade, estrogen receptor (ER) status, and prognosis will also be discussed.
14.
Background
Normalization is a basic step in microarray data analysis. A proper normalization procedure ensures that the intensity ratios provide meaningful measures of relative expression values.
15.
We propose a novel method for phenotype identification involving a stringent noise analysis and filtering procedure followed by combining the results of several machine learning tools to produce a robust predictor. We illustrate our method on SELDI-TOF MS prostate cancer data (http://home.ccr.cancer.gov/ncifdaproteomics/ppatterns.asp). Our method identified 11 proteomic biomarkers and gave significantly improved predictions over previous analyses with these data. We were able to distinguish cancer from non-cancer cases with a sensitivity of 90.31% and a specificity of 98.81%. The proposed method can be generalized to multi-phenotype prediction and other types of data (e.g., microarray data).
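For readers unfamiliar with the two figures quoted in the abstract above, the following minimal sketch shows how sensitivity and specificity are defined in terms of confusion-matrix counts. The function name and the counts are hypothetical and chosen for illustration only; they are not taken from the study.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of true cancer cases classified as cancer
    specificity = tn / (tn + fp)  # share of non-cancer cases classified as non-cancer
    return sensitivity, specificity


# Hypothetical counts, for illustration only (not data from the study).
sens, spec = sensitivity_specificity(tp=90, fn=10, tn=95, fp=5)
print(f"sensitivity={sens:.2%}, specificity={spec:.2%}")
```

A classifier can trade one quantity against the other by moving its decision threshold, which is why the two are usually reported together.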
16.
Xi Tian Wen-Hao Xu Aihetaimujiang Anwaier Hong-Kai Wang Fang-Ning Wan Da-Long Cao Wen-Jie Luo Guo-Hai Shi Yuan-Yuan Qu Hai-Liang Zhang Ding-Wei Ye 《Journal of cellular and molecular medicine》2021,25(8):3898-3911
This study aims to construct a robust prognostic model for adult adrenocortical carcinoma (ACC) by large-scale multiomics analysis and real-world data. The RPPA data, gene expression profiles and clinical information of adult ACC patients were obtained from The Cancer Proteome Atlas (TCPA), Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA). An integrated prognosis-related proteins (IPRPs) model was constructed. Immunohistochemistry was used to validate the prognostic value of the IPRPs model in the Fudan University Shanghai Cancer Center (FUSCC) cohort. 76 ACC cases from TCGA and 22 ACC cases from GSE10927 in NCBI’s GEO database with full data for clinical information and gene expression were utilized to validate the effectiveness of the IPRPs model. Higher FASN (P = .039), FIBRONECTIN (P < .001), TFRC (P < .001), TSC1 (P < .001) expression indicated significantly worse overall survival for adult ACC patients. Risk assessment suggested a significantly strong predictive capacity of the IPRPs model for poor overall survival (P < .05). The IPRPs model showed slightly stronger ability for predicting prognosis than Ki-67 protein in the FUSCC cohort (P = .003, HR = 3.947; P = .005, HR = 3.787). In external validation of the IPRPs model using gene expression data, the IPRPs model showed strong ability for predicting prognosis in the TCGA cohort (P = .005, HR = 3.061) and it exhibited best ability for predicting prognosis in the GSE10927 cohort (P = .0898, HR = 2.318). This research constructed an IPRPs model for predicting adult ACC patients’ prognosis using proteomic data, gene expression data and real-world data, and this prognostic model showed stronger predictive value than other biomarkers (Ki-67, Beta-catenin, etc) in multi-cohorts.
17.
Collateral missing value imputation: a new robust missing value estimation algorithm for microarray data (total citations: 6, self-citations: 0, citations by others: 6)
MOTIVATION: Microarray data are used in a range of application areas in biology, although they often contain considerable numbers of missing values. These missing values can significantly affect subsequent statistical analysis and machine learning algorithms, so there is a strong motivation to estimate these values as accurately as possible before using these algorithms. While many imputation algorithms have been proposed, more robust techniques need to be developed so that further analysis of biological data can be accurately undertaken. In this paper, an innovative missing value imputation algorithm called collateral missing value estimation (CMVE) is presented which uses multiple covariance-based imputation matrices for the final prediction of missing values. The matrices are computed and optimized using least-squares regression and linear programming methods. RESULTS: The new CMVE algorithm has been compared with existing estimation techniques including Bayesian principal component analysis imputation (BPCA), least square impute (LSImpute) and K-nearest neighbour (KNN). All these methods were rigorously tested to estimate missing values in three separate non-time series (ovarian cancer based) and one time series (yeast sporulation) dataset. Each method was quantitatively analyzed using the normalized root mean square (NRMS) error measure, covering a wide range of randomly introduced missing value probabilities from 0.01 to 0.2. Experiments were also undertaken on the yeast dataset, which comprised 1.7% actual missing values, to test the hypothesis that CMVE performed better not only for randomly occurring but also for a real distribution of missing values. The results confirmed that CMVE consistently demonstrated superior and robust estimation capability of missing values compared with other methods for both series types of data, for the same order of computational complexity. A concise theoretical framework has also been formulated to validate the improved performance of the CMVE algorithm. AVAILABILITY: The CMVE software is available upon request from the authors.
18.
MOTIVATION: Microarray technology emerges as a powerful tool in life science. One major application of microarray technology is to identify differentially expressed genes under various conditions. Currently, the statistical methods to analyze microarray data are generally unsatisfactory, mainly due to the lack of understanding of the distribution and error structure of microarray data. RESULTS: We develop a generalized likelihood ratio (GLR) test based on the two-component model proposed by Rocke and Durbin to identify differentially expressed genes from microarray data. Simulation studies show that the GLR test is more powerful than commonly used methods, like the fold-change method and the two-sample t-test. When applied to microarray data, the GLR test identifies more differentially expressed genes than the t-test, has a lower false discovery rate and shows more consistency over independently repeated experiments. AVAILABILITY: The approach is implemented in software called GLR, which is freely available for downloading at http://www.cc.utah.edu/~jw27c60
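The abstract above compares the GLR test against two common baselines, the fold-change method and the two-sample t-test. The sketch below illustrates those two baselines only; it does not implement the GLR test or the Rocke and Durbin two-component model. The function names and intensity values are hypothetical, and the t statistic shown is the unequal-variance (Welch) form, which is an assumption since the abstract does not say which variant was used.

```python
from math import sqrt
from statistics import mean, variance


def fold_change(treated, control):
    """Ratio of mean expression (treated over control) on the raw intensity scale."""
    return mean(treated) / mean(control)


def welch_t(treated, control):
    """Two-sample t statistic that does not assume equal variances (Welch form)."""
    standard_error = sqrt(
        variance(treated) / len(treated) + variance(control) / len(control)
    )
    return (mean(treated) - mean(control)) / standard_error


# Hypothetical intensities for a single gene, for illustration only.
control = [100.0, 110.0, 90.0, 105.0]
treated = [200.0, 400.0, 150.0, 250.0]

fc = fold_change(treated, control)
t = welch_t(treated, control)
print(f"fold change = {fc:.2f}, Welch t = {t:.2f}")
```

The fold change looks only at the ratio of means, whereas the t statistic scales the difference in means by the within-group variability, so a gene with a large fold change but noisy replicates can still have a modest t value.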
19.
We developed an ELISA in high-density microarray format to detect hepatocyte growth factor (HGF) in human serum. The microassay can detect HGF at sub-pg/mL concentrations in sample volumes of 100 microL or less. The microassay is also quantitative and was used to detect elevated HGF levels in sera from recurrent breast cancer patients. The microarray format provides the potential for high-throughput quantitation of multiple biomarkers in parallel, as demonstrated with a multiplex analysis of five biomarker proteins.
20.
Predicting the prognosis of breast cancer by integrating clinical and microarray data with Bayesian networks (total citations: 1, self-citations: 0, citations by others: 1)
Gevaert O De Smet F Timmerman D Moreau Y De Moor B 《Bioinformatics (Oxford, England)》2006,22(14):e184-e190
MOTIVATION: Clinical data, such as patient history, laboratory analysis, ultrasound parameters--which are the basis of day-to-day clinical decision support--are often underused to guide the clinical management of cancer in the presence of microarray data. We propose a strategy based on Bayesian networks to treat clinical and microarray data on an equal footing. The main advantage of this probabilistic model is that it allows these data sources to be integrated in several ways and that it allows the model structure and parameters to be investigated and understood. Furthermore, using the concept of a Markov Blanket we can identify all the variables that shield off the class variable from the influence of the remaining network. Therefore, Bayesian networks automatically perform feature selection by identifying the (in)dependency relationships with the class variable. RESULTS: We evaluated three methods for integrating clinical and microarray data: decision integration, partial integration and full integration and used them to classify publicly available data on breast cancer patients into a poor and a good prognosis group. The partial integration method is most promising and has an independent test set area under the ROC curve of 0.845. After choosing an operating point the classification performance is better than frequently used indices.
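The area under the ROC curve reported in the abstract above can be read as the probability that a randomly chosen poor-prognosis patient receives a higher predicted score than a randomly chosen good-prognosis patient. The following minimal sketch computes it by direct pairwise comparison. The function name and the scores are hypothetical and for illustration only; this is not the authors' Bayesian network or their data.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical predicted probabilities of poor prognosis, for illustration only.
poor_prognosis = [0.9, 0.8, 0.6, 0.4]
good_prognosis = [0.7, 0.3, 0.2, 0.1]

auc = roc_auc(poor_prognosis, good_prognosis)
print(f"AUC = {auc:.3f}")
```

Because the AUC summarizes ranking quality over all thresholds, a separate operating point still has to be chosen before sensitivity and specificity can be reported, as the abstract notes.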