1.
2.
Bontempi G 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2007,4(2):293-300
Because of high dimensionality, machine learning algorithms typically rely on feature selection techniques in order to perform effective classification in microarray gene expression data sets. However, the large number of features compared to the number of samples makes the task of feature selection computationally hard and prone to errors. This paper interprets feature selection as a task of stochastic optimization, where the goal is to select among an exponential number of alternative gene subsets the one expected to return the highest generalization in classification. Blocking is an experimental design strategy which produces similar experimental conditions to compare alternative stochastic configurations in order to be confident that observed differences in accuracy are due to actual differences rather than to fluctuations and noise effects. We propose an original blocking strategy for improving feature selection which aggregates in a paired way the validation outcomes of several learning algorithms to assess a gene subset and compare it to others. This is a novelty with respect to conventional wrappers, which commonly adopt a sole learning algorithm to evaluate the relevance of a given set of variables. The rationale of the approach is that, by increasing the amount of experimental conditions under which we validate a feature subset, we can lessen the problems related to the scarcity of samples and consequently come up with a better selection. The paper shows that the blocking strategy significantly improves the performance of a conventional forward selection for a set of 16 publicly available cancer expression data sets. The experiments involve six different classifiers and show that improvements take place independent of the classification algorithm used after the selection step. 
Two further validations based on available biological annotation support the claim that blocking strategies in feature selection may improve the accuracy and the quality of the solution. The first validation is based on retrieving PubMed abstracts associated with the selected genes and matching them to regular expressions describing the biological phenomenon underlying the expression data sets. The biological validation that follows is based on the use of the Bioconductor package GOstats in order to perform Gene Ontology statistical analysis.
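The paired (blocked) comparison described in this abstract can be illustrated with a minimal sketch. It assumes that two candidate gene subsets are scored under the same experimental conditions (for example, the same learner and validation fold in each position) and that a paired t statistic summarizes the difference. The function name and the error values are hypothetical and are not taken from the paper.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(errors_a, errors_b):
    """Paired t statistic for two subsets scored under identical blocks.

    errors_a[i] and errors_b[i] must come from the same block
    (same learner, same validation split), so that block-level noise
    cancels in the difference.
    """
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


# Hypothetical validation errors for two gene subsets over four blocks.
subset_a_errors = [0.20, 0.25, 0.30, 0.22]
subset_b_errors = [0.18, 0.22, 0.29, 0.19]
t_value = paired_t_statistic(subset_a_errors, subset_b_errors)
```

A positive statistic here means subset B had the lower error in the paired comparison; pairing removes the variation shared by both subsets within a block.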
3.
4.
MOTIVATION: Large scale gene expression data are often analysed by clustering genes based on gene expression data alone, though a priori knowledge in the form of biological networks is available. The use of this additional information promises to improve exploratory analysis considerably. RESULTS: We propose constructing a distance function which combines information from expression data and biological networks. Based on this function, we compute a joint clustering of genes and vertices of the network. This general approach is elaborated for metabolic networks. We define a graph distance function on such networks and combine it with a correlation-based distance function for gene expression measurements. A hierarchical clustering and an associated statistical measure are computed to arrive at a reasonable number of clusters. Our method is validated using expression data of the yeast diauxic shift. The resulting clusters are easily interpretable in terms of the biochemical network and the gene expression data and suggest that our method is able to automatically identify processes that are relevant under the measured conditions.
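As a rough illustration of combining a correlation-based distance with a graph distance, the sketch below makes several assumptions that the abstract does not state: the two distances are each scaled to [0, 1] and mixed with a weight `lam`, the graph distance is the shortest-path edge count capped at `max_path`, and genes map one-to-one to network vertices. All names and data are hypothetical.

```python
from collections import deque
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def graph_distance(adj, src, dst):
    """Shortest-path length in edges via BFS; inf if dst is unreachable."""
    seen = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return seen[node]
        for nb in adj.get(node, ()):
            if nb not in seen:
                seen[nb] = seen[node] + 1
                queue.append(nb)
    return float("inf")


def combined_distance(expr, adj, g1, g2, lam=0.5, max_path=4):
    """Assumed mixing rule: weighted sum of two distances scaled to [0, 1]."""
    d_expr = (1.0 - pearson(expr[g1], expr[g2])) / 2.0
    d_net = min(graph_distance(adj, g1, g2), max_path) / max_path
    return lam * d_expr + (1.0 - lam) * d_net


# Hypothetical expression profiles and a three-vertex path network A - B - C.
expr = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [4, 3, 2, 1]}
adj = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
d_ab = combined_distance(expr, adj, "A", "B")
d_ac = combined_distance(expr, adj, "A", "C")
```

The resulting pairwise distances could then be passed to any hierarchical clustering routine; the choice of `lam` controls how strongly the network topology influences the clusters.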
5.
《Expert review of proteomics》2013,10(6):915-924
This article reviews the current state of systems biology approaches, including the experimental tools used to generate ‘omic’ data and computational frameworks to interpret this data. Through illustrative examples, systems biology approaches to understand gene expression and gene expression regulation are discussed. Some of the challenges facing this field and the future opportunities in the systems biology era are highlighted.
6.
This article reviews the current state of systems biology approaches, including the experimental tools used to generate 'omic' data and computational frameworks to interpret this data. Through illustrative examples, systems biology approaches to understand gene expression and gene expression regulation are discussed. Some of the challenges facing this field and the future opportunities in the systems biology era are highlighted.
7.
In DNA microarray studies, gene-set analysis (GSA) has become the focus of gene expression data analysis. GSA utilizes the gene expression profiles of functionally related gene sets in Gene Ontology (GO) categories or a priori defined biological classes to assess the significance of gene sets associated with clinical outcomes or phenotypes. Many statistical approaches have been proposed to determine whether such functionally related gene sets express differentially (enrichment and/or deletion) in variations of phenotypes. However, little attention has been given to the discriminatory power of gene sets and classification of patients.
8.
9.
J Kohroki M Tsuchiya S Fujita T Nakanishi N Itoh K Tanaka 《Biochemical and biophysical research communications》1999,262(2):365-367
We propose a novel alternative approach, an advanced method for recently developed strategies, for identifying differentially expressed genes. Firstly, double-stranded cDNAs were digested using Sau3AI and the 3'-end restriction fragments of the cDNA were ligated to a double-stranded adapter. Next, the restriction fragments were directly amplified using several combinations of adapter-specific primers and FITC-labeled oligo dT primers. The selected cDNA fragments were displayed on a polyacrylamide gel. Neither nested PCR nor purification of 3'-end fragments is necessary. We examined the validity of this approach by evaluating gene expression changes during granulocytic differentiation of HL-60 cells. This method can theoretically detect almost all gene expression changes more rapidly and through simpler manipulations than by any other approach.
10.
Background
Previous differential coexpression analyses focused on identification of differentially coexpressed gene pairs, revealing many insightful biological hypotheses. However, this method could not detect coexpression relationships between pairs of gene sets. Considering the success of many set-wise analysis methods for microarray data, a coexpression analysis based on gene sets may elucidate underlying biological processes provoked by the conditional changes. Here, we propose a differentially coexpressed gene sets (dCoxS) algorithm that identifies the differentially coexpressed gene set pairs between conditions.
11.
Microarray has become a popular biotechnology in biological and medical research. However, systematic and stochastic variabilities in microarray data are expected and unavoidable, resulting in the problem that the raw measurements have inherent “noise” within microarray experiments. Currently, logarithmic ratios are usually analyzed by various clustering methods directly, which may introduce biased interpretation in identifying groups of genes or samples. In this paper, a statistical method based on mixed model approaches was proposed for microarray data cluster analysis. The underlying rationale of this method is to partition the observed total gene expression level into various variations caused by different factors using an ANOVA model, and to predict the differential effects of GV (gene by variety) interaction using the adjusted unbiased prediction (AUP) method. The predicted GV interaction effects can then be used as the inputs of cluster analysis. We illustrated the application of our method with a gene expression dataset and elucidated the utility of our approach using an external validation.
12.
Cline MS Smoot M Cerami E Kuchinsky A Landys N Workman C Christmas R Avila-Campilo I Creech M Gross B Hanspers K Isserlin R Kelley R Killcoyne S Lotia S Maere S Morris J Ono K Pavlovic V Pico AR Vailaya A Wang PL Adler A Conklin BR Hood L Kuiper M Sander C Schmulevich I Schwikowski B Warner GJ Ideker T Bader GD 《Nature protocols》2007,2(10):2366-2382
Cytoscape is a free software package for visualizing, modeling and analyzing molecular and genetic interaction networks. This protocol explains how to use Cytoscape to analyze the results of mRNA expression profiling, and other functional genomics and proteomics experiments, in the context of an interaction network obtained for genes of interest. Five major steps are described: (i) obtaining a gene or protein network, (ii) displaying the network using layout algorithms, (iii) integrating with gene expression and other functional attributes, (iv) identifying putative complexes and functional modules and (v) identifying enriched Gene Ontology annotations in the network. These steps provide a broad sample of the types of analyses performed by Cytoscape.
13.
Dabao Zhang Martin T Wells Christine D Smart William E Fry 《Journal of computational biology》2005,12(4):391-406
Commonly accepted intensity-dependent normalization in spotted microarray studies takes account of measurement errors in the differential expression ratio but ignores measurement errors in the total intensity, although the definitions imply the same measurement error components are involved in both statistics. Furthermore, identification of differentially expressed genes is usually considered separately following normalization, which is statistically problematic. By incorporating the measurement errors in both total intensities and differential expression ratios, we propose a measurement-error model for intensity-dependent normalization and identification of differentially expressed genes. This model is also flexible enough to incorporate intra-array and inter-array effects. A Bayesian framework is proposed for the analysis of the proposed measurement-error model to avoid the potential risk of using the common two-step procedure. We also propose a Bayesian identification of differentially expressed genes to control the false discovery rate instead of the ad hoc thresholding of the posterior odds ratio. The simulation study and an application to real microarray data demonstrate promising results.
14.
We investigate a model of optimal regulation, intended to describe large-scale differential gene expression. Relations between the optimal expression patterns and the function of genes are deduced from an optimality principle: the regulators have to maximise a fitness function which they influence directly via a cost term, and indirectly via their control on important cell variables, such as metabolic fluxes. According to the model, the optimal linear response to small perturbations reflects the regulators' functions, namely their linear influences on the cell variables. The optimal behaviour can be realised by a linear feedback mechanism. Known or assumed properties of response coefficients lead to predictions about regulation patterns. A symmetry relation predicted for deletion experiments is verified with gene expression data. Where the optimality assumption is valid, our results justify the use of expression data for functional annotation and for pathway reconstruction and suggest the use of linear factor models for the analysis of gene expression data.
15.
An activity-based isotope-coded affinity tagging (AB-ICAT) strategy for proteome-wide quantitation of active retaining endoglycosidases has been developed. Two pairs of biotinylated, cleavable, AB-ICAT reagents (light H(8) and heavy D(8)) have been synthesized, one incorporating a recognition element for cellulases and the other incorporating a recognition element for xylanases. The accuracy of the AB-ICAT methodology in quantifying relative glycosidase expression/activity levels in any two samples of interest has been verified using several pairs of model enzyme mixtures where one or more enzyme amounts and/or activities were varied. The methodology has been applied to the biomass-degrading secretomes of the soil bacterium, Cellulomonas fimi, under induction by different polyglycan growth substrates to obtain a quantitative profile of the relative expression/activity levels of individual active retaining endoglycanases per C. fimi cell. Such biological profiles are valuable in understanding the strategies employed by biomass-degrading organisms in exploiting environments containing different biomass polysaccharides. This is the first report on the application of an activity-based ICAT method to a biological system.
16.
Franck Rapaport Raya Khanin Yupu Liang Mono Pirun Azra Krek Paul Zumbo Christopher E Mason Nicholas D Socci Doron Betel 《Genome biology》2013,14(9):R95
A large number of computational methods have been developed for analyzing differential gene expression in RNA-seq data. We describe a comprehensive evaluation of common methods using the SEQC benchmark dataset and ENCODE data. We consider a number of key features, including normalization, accuracy of differential expression detection and differential expression analysis when one condition has no detectable expression. We find significant differences among the methods, but note that array-based methods adapted to RNA-seq data perform comparably to methods designed for RNA-seq. Our results demonstrate that increasing the number of replicate samples significantly improves detection power over increased sequencing depth.
17.
18.
19.
20.
It is well-established that non-random patterns in coding DNA sequence (CDS) features can be partially explained by translational selection. Recent extensions of microarray and proteomic expression data have stimulated many genome-wide investigations of the relationships between gene expression and various CDS features. However, only modest correlations have been found. Here we introduced the one-way ANOVA, a more powerful extension of previous grouping methods, to re-examine these relationships at the whole genome scale for Saccharomyces cerevisiae, where genome-wide protein abundance has been recently quantified. Our results clarify that coding sequence features are inappropriate for use as genome-wide estimators for protein expression levels. This analysis also demonstrates that one-way ANOVA is a powerful and simple method to explore the influence of gene expression on CDS features.
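For readers unfamiliar with the grouping approach mentioned in this abstract, the following minimal sketch computes the standard one-way ANOVA F statistic for a CDS feature measured in genes binned by expression level. The binning, the feature, and the values are hypothetical; the abstract does not specify how the groups were formed.

```python
def one_way_anova_f(groups):
    """F statistic: between-group mean square over within-group mean square."""
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical CDS feature values for genes in low, medium and high
# expression bins (illustrative numbers only).
feature_by_bin = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_value = one_way_anova_f(feature_by_bin)
```

A large F value indicates that the feature's mean differs between expression bins by more than the within-bin scatter would suggest; a p-value would come from the F distribution with (k - 1, n - k) degrees of freedom.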