Similar Articles
20 similar articles found.
1.
Query-driven module discovery in microarray data   Times cited: 1 (self-citations: 0; by others: 1)
MOTIVATION: Existing (bi)clustering methods for microarray data analysis often do not answer the specific questions of interest to a biologist. Such specific questions could be derived from other information sources, including expert prior knowledge. More specifically, given a set of seed genes which are believed to have a common function, we would like to recruit genes whose expression profiles resemble those of the seed genes in a significant subset of experimental conditions. RESULTS: We introduce QDB, a novel Bayesian query-driven biclustering framework in which the prior distributions allow introducing knowledge from a set of seed genes (query) to guide the pattern search. In two well-known yeast compendia, we grow highly functionally enriched biclusters from small sets of seed genes using a resolution sweep approach. In addition, relevant conditions are identified and modularity of the biclusters is demonstrated, including the discovery of overlapping modules. Finally, our method deals with missing values naturally, performs well on artificial data from a recent biclustering benchmark study and has a number of conceptual advantages when compared to existing approaches for focused module search.
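QDB itself is a Bayesian biclustering model and is not reproduced here. As a much simpler illustration of the query-driven idea (seed genes guiding the search), the sketch below ranks non-seed genes by Pearson correlation with the mean seed profile, skipping missing values pairwise. The correlation scoring, the 0.8 threshold and all function names are assumptions for illustration; unlike QDB, this sketch does not select a subset of conditions.

```python
from math import sqrt
from typing import Dict, List, Optional, Tuple

# Illustrative sketch only: a correlation-based stand-in, not the QDB model.
Profile = List[Optional[float]]  # one value per condition; None = missing


def pearson(x: Profile, y: Profile) -> float:
    """Pearson correlation using only conditions observed in both profiles."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if len(pairs) < 3:  # too few shared conditions to say anything
        return 0.0
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def seed_centroid(data: Dict[str, Profile], seeds: List[str]) -> Profile:
    """Mean seed-gene profile per condition, skipping missing values."""
    n_cond = len(data[seeds[0]])
    centroid: Profile = []
    for j in range(n_cond):
        vals = [data[g][j] for g in seeds if data[g][j] is not None]
        centroid.append(sum(vals) / len(vals) if vals else None)
    return centroid


def recruit(data: Dict[str, Profile], seeds: List[str],
            threshold: float = 0.8) -> List[Tuple[str, float]]:
    """Non-seed genes whose correlation with the seed centroid is >= threshold."""
    centroid = seed_centroid(data, seeds)
    scored = [(g, pearson(p, centroid)) for g, p in data.items() if g not in seeds]
    hits = [(g, r) for g, r in scored if r >= threshold]
    return sorted(hits, key=lambda t: t[1], reverse=True)
```

Lowering `threshold` recruits larger, looser gene sets; the seed list must be non-empty.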

2.
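Protein chips are a new high-throughput proteomics technology. Because they offer high throughput, miniaturization and rapid parallel analysis, they hold broad promise for the discovery of serum tumor markers. This article reviews the basic principles and types of protein chips and their application to the discovery of serum tumor markers, compares protein chip technology with conventional tumor marker discovery techniques, and discusses prospects for further applications of protein chips in tumor marker discovery.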

3.
MOTIVATION: There is a pressing need for improved proteomic screening methods that allow earlier diagnosis of disease, systematic monitoring of physiological responses and the uncovering of fundamental mechanisms of drug action. The combined platform of LC-MS (liquid chromatography-mass spectrometry) has shown promise in moving toward a solution in these areas. In this paper we present a technique for discovering differences in protein signal between two classes of samples of LC-MS serum proteomic data without the use of tandem mass spectrometry, gels or labeling. The method works on data from a lower-precision MS instrument, the type routinely used by and available to the community at large today. We test our technique on a controlled (spike-in) but realistic (serum biomarker discovery) experiment, which is therefore verifiable. We also develop a new method to help assess the difficulty of a given spike-in problem. Lastly, we show that class prediction, sometimes mistaken for a solution to biomarker discovery, is actually a much simpler problem. RESULTS: Using precision-recall curves with experimentally extracted ground truth, we show that (1) our technique performs well using seven replicates from each class, (2) performance degrades as the number of replicates decreases, and (3) the signal that we are teasing out is not trivially available (i.e. the differences are not so large that the task is easy). Finally, we easily obtain perfect classification results on data for which extracting the differences does not produce perfect results. This emphasizes the different nature of the two problems and their relative difficulty. AVAILABILITY: Our data are publicly available as a benchmark for further studies of this nature at http://www.cs.toronto.edu/~jenn/LCMS
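The results above are reported as precision-recall curves against spike-in ground truth. A minimal sketch of how such a curve can be computed from a ranked list of candidate features; the feature identifiers and the ranking are hypothetical placeholders for whatever the detection method outputs.

```python
from typing import List, Sequence, Set, Tuple

# Illustrative sketch; the inputs are hypothetical, not the paper's data.


def precision_recall_curve(ranked: Sequence[str],
                           truth: Set[str]) -> List[Tuple[float, float]]:
    """(recall, precision) after keeping the top-k candidates, k = 1..len(ranked).

    `ranked` lists candidate features from most to least significant;
    `truth` holds the features known to differ (e.g. the spiked-in signals).
    """
    if not truth:
        raise ValueError("ground truth must be non-empty")
    points = []
    true_positives = 0
    for k, feature in enumerate(ranked, start=1):
        if feature in truth:
            true_positives += 1
        points.append((true_positives / len(truth), true_positives / k))
    return points
```

Each cut-off k gives one point; a method that ranks all true differences first keeps precision at 1.0 until recall reaches 1.0.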

4.
5.
High-quality biomarkers for disease progression, drug efficacy and toxicity liability are essential for improving the efficiency of drug discovery and development. The identification of drug-activity biomarkers is often limited by access to, and the quantity of, target tissue. Peripheral blood has increasingly become an attractive alternative to organ tissue samples as a source for biomarker discovery, especially during early clinical studies. However, given the heterogeneous blood cell population, possible artifacts from ex vivo activation, and technical difficulties associated with overall assay performance, it is challenging to profile peripheral blood cells directly for biomarker discovery. In the present study, Applied BioSystems' blood collection system was evaluated for its ability to isolate RNA suitable for use on the Affymetrix microarray platform. Blood was collected in a TEMPUS tube and RNA was extracted using an ABI-6100 semi-automated workstation. Using human and rat whole blood samples, it was demonstrated that the RNA isolated using this approach was stable, of high quality and suitable for Affymetrix microarray applications. The microarray data were statistically analysed and compared with those from other blood protocols. Minimal haemoglobin interference with RNA labelling efficiency and chip hybridization was found using the TEMPUS tube and extraction method. The RNA quality and stability, together with the protocol's ease of handling, make the TEMPUS tube protocol an attractive approach for expression profiling of whole blood to support target and biomarker discovery.

7.
MOTIVATION: In clinical practice, pathological phenotypes are often labelled on ordinal scales rather than as binary outcomes, e.g. the Gleason grading system for tumour cell differentiation. However, in the microarray analysis literature, such ordinal labels have rarely been treated in a principled way. This paper describes a gene selection algorithm based on Gaussian processes to discover consistent gene expression patterns associated with ordinal clinical phenotypes. The technique of automatic relevance determination (ARD) is applied to represent the significance level of the genes in a Bayesian inference framework. RESULTS: The usefulness of the proposed algorithm for ordinal labels is demonstrated by the gene expression signature associated with the Gleason score in prostate cancer data. Our results demonstrate how multi-gene markers that may initially be developed with a diagnostic or prognostic application in mind are also useful as an investigative tool to reveal associations between specific molecular and cellular events and features of tumour physiology. Our algorithm can also be applied to microarray data with binary labels, with results comparable to other methods in the literature.
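In automatic relevance determination, each gene gets its own length-scale in the Gaussian process covariance, and genes with short length-scales are the ones the model treats as relevant. A minimal sketch of an ARD squared-exponential kernel follows; the abstract does not name the kernel, so this common choice is an assumption. The length-scales are supplied by hand here, whereas the paper infers them in a Bayesian framework with an ordinal likelihood, which is not shown.

```python
from math import exp
from typing import List, Sequence

# Illustrative sketch; kernel choice and names are assumptions.


def ard_se_kernel(x: Sequence[float], y: Sequence[float],
                  lengthscales: Sequence[float], signal_var: float = 1.0) -> float:
    """Squared-exponential kernel with one length-scale per gene (ARD).

    k(x, y) = signal_var * exp(-0.5 * sum_d ((x_d - y_d) / l_d) ** 2)
    A very large l_d makes gene d almost irrelevant to the covariance.
    """
    s = sum(((a - b) / l) ** 2 for a, b, l in zip(x, y, lengthscales))
    return signal_var * exp(-0.5 * s)


def gram_matrix(samples: List[Sequence[float]], lengthscales: Sequence[float],
                signal_var: float = 1.0) -> List[List[float]]:
    """Covariance between every pair of samples (rows = samples, columns = genes)."""
    return [[ard_se_kernel(a, b, lengthscales, signal_var) for b in samples]
            for a in samples]


def rank_genes(genes: Sequence[str], lengthscales: Sequence[float]) -> List[str]:
    """Most relevant first: a shorter length-scale means higher relevance."""
    return [g for g, _ in sorted(zip(genes, lengthscales), key=lambda t: t[1])]
```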

8.

Background  

Chromosomal copy number changes (aneuploidies) play a key role in cancer progression and molecular evolution. These copy number changes can be studied using microarray-based comparative genomic hybridization (array CGH) or gene expression microarrays. However, accurate identification of amplified or deleted regions requires a combination of visual and computational analysis of these microarray data.
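As a deliberately naive illustration of the computational side (not a method from this article), the sketch below smooths per-probe log2 ratios along a chromosome with a running median and labels runs of probes beyond fixed thresholds as gains or losses. The ±0.3 thresholds, the window size and the function names are arbitrary assumptions.

```python
from statistics import median
from typing import List, Tuple

# Illustrative sketch; thresholds and window are hypothetical choices.


def running_median(values: List[float], window: int = 3) -> List[float]:
    """Median over a centred window, truncated at the ends of the chromosome."""
    half = window // 2
    return [median(values[max(0, i - half): i + half + 1]) for i in range(len(values))]


def call_regions(log2_ratios: List[float], gain: float = 0.3, loss: float = -0.3,
                 window: int = 3) -> List[Tuple[int, int, str]]:
    """Runs of consecutive probes called 'gain' or 'loss' as (start, end, label), end inclusive."""
    smoothed = running_median(log2_ratios, window)
    labels = ["gain" if v >= gain else "loss" if v <= loss else "neutral"
              for v in smoothed]
    regions = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:  # a run ends at i - 1
            if labels[start] != "neutral":
                regions.append((start, i - 1, labels[start]))
            start = i
    return regions
```

The running median suppresses isolated outlier probes, which is one reason such a first pass still benefits from visual review.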

9.
Microarrays are an effective tool for monitoring genome-wide gene expression levels. In current microarray analyses, the majority of genes on arrays are frequently excluded from further analysis because the changes in their expression levels (ratios) are not considered significant. This strategy risks missing entire sets of genes related to a quantitative trait of interest, which is generally controlled by several loci that make various contributions. Here, we describe a high-throughput gene discovery method based on correspondence analysis with a new index for expression ratios [arctan (1/ratio)] and three artificial marker genes. This method allows us to quickly analyze the whole microarray dataset and discover up-/down-regulated genes related to a trait of interest. We used an example dataset to show the theoretical advantage of this method. We then used the method to identify 88 cancer-related genes from published microarray data from patients with breast cancer. The method also allows us to predict the phenotype of a given sample from its gene expression profile. The method is easy to perform, and the results can be viewed in 3D viewing software that we have developed.
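The arctan(1/ratio) index maps any positive expression ratio onto (0, π/2), with ratio 1 at π/4, so the resulting genes × samples table is non-negative and suitable for correspondence analysis. A minimal sketch under these assumptions: ratios are strictly positive, the three artificial marker genes are not modelled, and only the first axis is computed, by power iteration rather than a full SVD. Function names are hypothetical.

```python
from math import atan, pi, sqrt
from typing import List

# Illustrative sketch; not the authors' software.


def expression_index(ratio: float) -> float:
    """arctan(1/ratio): maps a positive ratio onto (0, pi/2); ratio 1 -> pi/4."""
    return atan(1.0 / ratio)


def ca_first_axis(table: List[List[float]], iters: int = 200) -> List[float]:
    """Principal coordinates of the rows on the first correspondence-analysis axis.

    `table` must be non-negative (genes x samples). The standardized residual
    matrix S = D_r^-1/2 (P - r c^T) D_c^-1/2 is formed and its leading right
    singular vector v is found by power iteration on S^T S.
    """
    total = sum(map(sum, table))
    P = [[v / total for v in row] for row in table]
    r = [sum(row) for row in P]        # row masses
    c = [sum(col) for col in zip(*P)]  # column masses
    n_rows, n_cols = len(r), len(c)
    S = [[(P[i][j] - r[i] * c[j]) / sqrt(r[i] * c[j]) for j in range(n_cols)]
         for i in range(n_rows)]
    v = [float(j + 1) for j in range(n_cols)]  # arbitrary non-degenerate start
    for _ in range(iters):
        Sv = [sum(S[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        w = [sum(S[i][j] * Sv[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = sqrt(sum(x * x for x in w))
        if norm == 0.0:  # rows and columns independent: no structure to show
            return [0.0] * n_rows
        v = [x / norm for x in w]
    Sv = [sum(S[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
    return [Sv[i] / sqrt(r[i]) for i in range(n_rows)]  # F = D_r^-1/2 S v
```

The sign of the axis is arbitrary; what matters is that genes with similar ratio patterns land on the same side.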

10.
The challenges encountered by proteomic researchers seeking diagnostic, prognostic and mechanistic markers were the subject of the 1-day meeting "Proteomics: Advances in Biomarker Discovery", hosted by EuroSciCon. The speakers had a broad range of clinical and basic science interests and presented data from a number of proteomic platforms used to search for discriminant biomarkers of disease in easily accessible body fluids, including serum and urine. Several potential pitfalls for proteomic researchers were mentioned, and the potential of collaborative networks between research institutions to increase the size and power of clinical studies was discussed. Overall, the meeting highlighted the exciting opportunities that proteomic techniques offer for discovering not only diagnostic but also prognostic and mechanistic markers of a number of clinically important diseases.

11.
Gene expression profiling has gradually become a routine procedure for disease diagnosis and classification. In the past decade, many computational methods have been proposed, bringing substantial improvements at various levels, including feature selection and algorithms for classification and clustering. In this study, we present iPcc, a novel method that approaches the problem from the feature extraction perspective, to help move gene expression profiling technologies further from bench to bedside. We define a ‘correlation feature space’ for samples based on their gene expression profiles by iterative application of Pearson’s correlation coefficient. Numerical experiments on both simulated and real gene expression data sets demonstrate that iPcc can markedly highlight the latent patterns underlying noisy gene expression data and thus greatly improve the robustness and accuracy of the algorithms currently available for disease diagnosis and classification based on gene expression profiles.
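The abstract defines the correlation feature space only as the iterative use of Pearson's correlation coefficient. One plausible reading, sketched below as an assumption rather than the published algorithm: represent each sample by its vector of correlations with all samples, then repeat that mapping a few times. The number of iterations and the function names are illustrative choices.

```python
from math import sqrt
from typing import List

# Illustrative sketch of one reading of iPcc; not the authors' implementation.
Matrix = List[List[float]]


def pearson(x: List[float], y: List[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant profile: correlation undefined, treated as 0
    return sxy / sqrt(sxx * syy)


def correlation_features(rows: Matrix) -> Matrix:
    """Describe each sample by its Pearson correlations with every sample."""
    return [[pearson(a, b) for b in rows] for a in rows]


def ipcc(samples: Matrix, iterations: int = 3) -> Matrix:
    """Apply the correlation mapping repeatedly (rows = samples, columns = genes)."""
    features = samples
    for _ in range(iterations):
        features = correlation_features(features)
    return features
```

With two groups of samples, repeated mapping pushes within-group entries toward +1 and between-group entries toward -1, which is one way latent structure can become easier to see.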

12.
Microarray experiments generate data sets with information on the expression levels of thousands of genes in a set of biological samples. Unfortunately, such experiments often produce multiple missing expression values, normally due to various experimental problems. As many algorithms for gene expression analysis require a complete data matrix as input, the missing values have to be estimated in order to analyze the available data. Alternatively, genes and arrays can be removed until no missing values remain. However, for genes or arrays with only a small number of missing values, it is desirable to impute those values. For the subsequent analysis to be as informative as possible, it is essential that the estimates for the missing gene expression values are accurate. A small number of badly estimated missing values might be enough for clustering methods, such as hierarchical clustering or K-means clustering, to produce misleading results. Thus, accurate methods for missing value estimation are needed. We present novel methods for estimating missing values in microarray data sets that are based on the least squares principle and that utilize correlations between both genes and arrays. For this set of methods, we use the common reference name LSimpute. We compare the estimation accuracy of our methods with that of the widely used KNNimpute on three complete data matrices from public data sets by randomly knocking out data (labeling them as missing). From these tests, we conclude that our LSimpute methods consistently produce more accurate estimates than KNNimpute. Additionally, we examine a more classic approach to missing value estimation based on expectation maximization (EM). We refer to our EM implementations as EMimpute, and compare the estimation errors of the EMimpute methods with those of our novel methods. The results indicate that, on average, the estimates from our best performing LSimpute method are at least as accurate as those from the best EMimpute algorithm.
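LSimpute itself is not reproduced here. For reference, a simplified sketch of the KNNimpute baseline it is compared against: each missing value is replaced by a distance-weighted average of the same array's values in the k genes with the most similar profiles. Using the RMS difference over jointly observed arrays, 1/distance weights and the default k = 10 are simplifying assumptions, and the function names are hypothetical.

```python
from math import sqrt
from typing import List, Optional

# Illustrative, simplified KNN imputation; not the authors' LSimpute.
Matrix = List[List[Optional[float]]]  # genes x arrays; None = missing


def rms_distance(a: List[Optional[float]], b: List[Optional[float]], skip: int) -> float:
    """RMS difference over arrays observed in both genes, ignoring column `skip`."""
    diffs = [(x - y) ** 2 for j, (x, y) in enumerate(zip(a, b))
             if j != skip and x is not None and y is not None]
    return sqrt(sum(diffs) / len(diffs)) if diffs else float("inf")


def knn_impute(data: Matrix, k: int = 10) -> Matrix:
    """Fill each missing entry from the k most similar genes observed at that array."""
    filled = [row[:] for row in data]
    for g, row in enumerate(data):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = [(rms_distance(row, other, j), other[j])
                          for h, other in enumerate(data)
                          if h != g and other[j] is not None]
            neighbours = sorted(c for c in candidates if c[0] != float("inf"))[:k]
            if not neighbours:
                continue  # nothing comparable: leave the value missing
            exact = [v for d, v in neighbours if d == 0.0]
            if exact:  # identical neighbour(s): a 1/d weight would be infinite
                filled[g][j] = sum(exact) / len(exact)
            else:
                weights = [1.0 / d for d, _ in neighbours]
                total = sum(w * v for w, (_, v) in zip(weights, neighbours))
                filled[g][j] = total / sum(weights)
    return filled
```

Any imputer of this shape can be scored the way the abstract describes: mask entries of a complete matrix, impute them, and compare the estimates with the known values.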

13.
Minimum redundancy feature selection from microarray gene expression data   Times cited: 7 (self-citations: 0; by others: 7)
Selecting a small subset of the thousands of genes in microarray data is important for accurate classification of phenotypes. Widely used methods typically rank genes by their differential expression among phenotypes and pick the top-ranked genes. We observe that feature sets obtained this way contain some redundancy, and we study methods to minimize it. We propose a minimum redundancy - maximum relevance (MRMR) feature selection framework. Genes selected via MRMR provide more balanced coverage of the feature space and capture broader characteristics of phenotypes. They lead to significantly improved class predictions in extensive experiments on 6 gene expression data sets: NCI, Lymphoma, Lung, Child Leukemia, Leukemia, and Colon. Improvements are observed consistently across 4 classification methods: Naive Bayes, linear discriminant analysis, logistic regression, and support vector machines. SUPPLEMENTARY: The top 60 MRMR genes for each data set are listed at http://crd.lbl.gov/~cding/MRMR/. More information on MRMR methods can be found at http://www.hpeng.net/.
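A minimal greedy sketch of the difference form of the MRMR criterion: at each step, pick the gene that maximizes its mutual information with the class minus its mean mutual information with the genes already selected. It assumes expression values have already been discretized into a few levels; the function names and toy data are illustrative, and this is not the authors' released code.

```python
from collections import Counter
from math import log2
from typing import Dict, Hashable, List, Sequence

# Illustrative sketch of greedy MRMR (difference form) on discrete data.


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """I(X; Y) in bits for two equally long sequences of discrete values."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * log2((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in pxy.items())


def mrmr(features: Dict[str, Sequence[Hashable]], labels: Sequence[Hashable],
         k: int) -> List[str]:
    """Greedy selection maximizing relevance minus mean redundancy."""
    if k <= 0:
        return []
    relevance = {f: mutual_information(v, labels) for f, v in features.items()}
    selected = [max(relevance, key=relevance.get)]  # most relevant gene first
    while len(selected) < min(k, len(features)):
        remaining = [f for f in features if f not in selected]

        def score(f: str) -> float:
            redundancy = sum(mutual_information(features[f], features[s])
                             for s in selected) / len(selected)
            return relevance[f] - redundancy

        selected.append(max(remaining, key=score))
    return selected
```

In the toy check below, gB is an exact copy of gA, so MRMR prefers the less redundant gC as the second gene even though all three are equally relevant on their own.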

14.
We describe methods and software tools for analyzing Affymetrix microarray data, emphasizing often-neglected issues. In our experience with neuroscience studies, experimental design and quality assessment are vital. We also describe in detail the pre-processing methods we have found useful for Affymetrix data. Finally, we summarize the statistical literature and describe some pitfalls in post-processing analysis.

15.
16.
17.
18.
Tsai CA, Hsueh HM, Chen JJ. Biometrics. 2003;59(4):1071-1081.
Testing for significance with gene expression data from DNA microarray experiments involves simultaneous comparisons of hundreds or thousands of genes. If R denotes the number of rejections (genes declared significant) and V the number of false rejections, then V/R (for R > 0) is the proportion of falsely rejected hypotheses. This paper proposes a model for the distribution of the number of rejections R and for the conditional distribution of V given R (and hence of V/R). Under the independence assumption, the distribution of R is a convolution of two binomials, and the conditional distribution of V given R is noncentral hypergeometric. Under an equicorrelated model, the distributions are more complex; they are also derived. Five false discovery rate error measures are considered: FDR = E(V/R) (with V/R taken as 0 when R = 0), pFDR = E(V/R | R > 0) (positive FDR), cFDR = E(V/R | R = r) (conditional FDR), mFDR = E(V)/E(R) (marginal FDR), and eFDR = E(V)/r (empirical FDR). The pFDR, cFDR, and mFDR are shown to be equivalent under the Bayesian framework, in which the number of true null hypotheses is modeled as a random variable. We present a parametric and a bootstrap procedure to estimate the FDRs. Monte Carlo simulations were conducted to evaluate the performance of these two methods. The bootstrap procedure appears to perform reasonably well, even when the alternative hypotheses are correlated (rho = .25). An example from a toxicogenomic microarray experiment is presented for illustration.
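Under the independence model described above, V ~ Binomial(m0, α) for the m0 true nulls and R = V + S with S ~ Binomial(m1, power) for the m1 false nulls. A small Monte Carlo sketch under those assumptions estimates FDR, pFDR and mFDR; m0, m1, α and power are illustrative inputs, not values from the paper, and this is not the authors' parametric or bootstrap estimator. For mFDR the exact value m0·α / (m0·α + m1·power) is available as a check.

```python
import random
from typing import Dict

# Illustrative simulation; parameter values are hypothetical.


def simulate_fdr(m0: int, m1: int, alpha: float, power: float,
                 reps: int = 20000, seed: int = 1) -> Dict[str, float]:
    """Monte Carlo estimates of FDR, pFDR and mFDR with independent tests."""
    rng = random.Random(seed)
    sum_ratio = 0.0  # sum of V/R over replicates with R > 0
    n_positive = 0   # replicates with at least one rejection
    sum_v = sum_r = 0
    for _ in range(reps):
        v = sum(rng.random() < alpha for _ in range(m0))  # false rejections
        s = sum(rng.random() < power for _ in range(m1))  # true rejections
        r = v + s
        sum_v += v
        sum_r += r
        if r > 0:
            sum_ratio += v / r
            n_positive += 1
    return {
        "FDR": sum_ratio / reps,  # V/R counted as 0 when R = 0
        "pFDR": sum_ratio / n_positive if n_positive else float("nan"),
        "mFDR": sum_v / sum_r if sum_r else float("nan"),
    }
```

Because FDR = pFDR · P(R > 0), the pFDR estimate can never fall below the FDR estimate; with m0 = 90, m1 = 10, α = 0.05 and power = 0.8, the exact mFDR is 4.5 / 12.5 = 0.36.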

19.
A high-throughput software pipeline for analyzing high-performance mass spectral data sets has been developed to facilitate rapid and accurate biomarker determination. The software exploits the mass precision and resolution of high-performance instrumentation, bypasses peak-finding steps, and instead uses discrete m/z data points to identify putative biomarkers. The technique is insensitive to peak shape and works on overlapping and non-Gaussian peaks, which can confound peak-finding algorithms. Methods are presented to assess data set quality and the suitability of groups of m/z values that map to peaks as potential biomarkers. The algorithm is demonstrated with serum mass spectra from patients with and without ovarian cancer. Biomarker candidates are identified and ranked by their ability to discriminate between cancer and noncancer conditions. Their discriminating power is tested by classifying unknowns using a simple distance calculation, and a sensitivity of 95.6% and a specificity of 97.1% are obtained. In contrast, the sensitivity of the ovarian cancer blood marker CA125 is approximately 50% for stage I/II and approximately 80% for stage III/IV cancers. While the generalizability of these markers is currently unknown, we have demonstrated the ability of our analytical package to extract biomarker candidates from high-performance mass spectral data.
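The abstract says unknowns were classified with "a simple distance calculation" but does not give the rule. Assuming a nearest-centroid rule with Euclidean distance over the selected m/z values (our assumption, as are all names and toy numbers below), the sketch classifies spectra and reports sensitivity, TP/(TP+FN), and specificity, TN/(TN+FP). It uses `math.dist`, which requires Python 3.8 or later.

```python
from math import dist
from typing import Dict, List, Sequence, Tuple

# Illustrative nearest-centroid sketch; not the authors' software.


def centroid(rows: List[Sequence[float]]) -> List[float]:
    """Mean intensity at each selected m/z value across training spectra."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def classify(sample: Sequence[float], centroids: Dict[str, List[float]]) -> str:
    """Label of the closest class centroid (Euclidean distance)."""
    return min(centroids, key=lambda label: dist(sample, centroids[label]))


def sensitivity_specificity(truth: Sequence[str], predicted: Sequence[str],
                            positive: str = "cancer") -> Tuple[float, float]:
    pairs = list(zip(truth, predicted))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec
```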

20.
Expression levels in oligonucleotide microarray experiments depend on a potentially large number of factors, for example treatment conditions, different probes and different arrays. To dissect the effects of these factors on expression levels, fixed-effects ANOVA methods have previously been proposed. Because we are not necessarily interested in estimating the specific effects of individual probes and arrays, we propose to treat these as random effects. We then need to estimate only their means and variances rather than the effect of each of their levels; that is, we can work with a much smaller number of parameters and, consequently, estimate expression levels with higher precision. We therefore developed a mixed-effects ANOVA model with some random and some fixed effects. It automatically accounts for local normalization between different arrays and for background correction. The method was applied to each of the 6,584 genes investigated in a microarray experiment on two mouse cell lines, PA6/S and PA6/8, where PA6/S enhances proliferation of pre-B cells in vitro but PA6/8 does not. To detect differentially expressed genes while addressing the multiple-testing problem, we controlled the false discovery rate (FDR), which identified 207 genes with significantly different expression levels.
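The abstract does not name the FDR-controlling procedure. Assuming the Benjamini-Hochberg step-up procedure, the most widely used choice, a minimal sketch: sort the m p-values, find the largest rank k with p(k) ≤ k·q/m, and reject the k smallest. The function name and the level q = 0.05 are illustrative.

```python
from typing import List, Sequence

# Illustrative Benjamini-Hochberg sketch; the paper's exact procedure is not stated.


def benjamini_hochberg(pvalues: Sequence[float], q: float = 0.05) -> List[int]:
    """Indices of hypotheses rejected at FDR level q (step-up procedure)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ascending p-values
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            k_max = rank  # step-up: keep the largest rank that passes
    return sorted(order[:k_max])
```

Because the procedure steps up, a p-value that fails its own threshold can still be rejected when a larger p-value passes at a higher rank, as in the second check below.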
