Similar literature
 Found 20 similar documents; search took 16 ms
1.
2.
Because of high dimensionality, machine learning algorithms typically rely on feature selection techniques in order to perform effective classification in microarray gene expression data sets. However, the large number of features compared to the number of samples makes the task of feature selection computationally hard and prone to errors. This paper interprets feature selection as a task of stochastic optimization, where the goal is to select among an exponential number of alternative gene subsets the one expected to return the highest generalization in classification. Blocking is an experimental design strategy which produces similar experimental conditions to compare alternative stochastic configurations in order to be confident that observed differences in accuracy are due to actual differences rather than to fluctuations and noise effects. We propose an original blocking strategy for improving feature selection which aggregates in a paired way the validation outcomes of several learning algorithms to assess a gene subset and compare it to others. This is a novelty with respect to conventional wrappers, which commonly adopt a sole learning algorithm to evaluate the relevance of a given set of variables. The rationale of the approach is that, by increasing the number of experimental conditions under which we validate a feature subset, we can lessen the problems related to the scarcity of samples and consequently come up with a better selection. The paper shows that the blocking strategy significantly improves the performance of a conventional forward selection on a set of 16 publicly available cancer expression data sets. The experiments involve six different classifiers and show that improvements take place independently of the classification algorithm used after the selection step.
Two further validations based on available biological annotation support the claim that blocking strategies in feature selection may improve the accuracy and the quality of the solution. The first validation is based on retrieving PubMed abstracts associated with the selected genes and matching them to regular expressions describing the biological phenomenon underlying the expression data sets. The second, biological validation uses the Bioconductor package GOstats to perform Gene Ontology statistical analysis.
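
The paired comparison described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `paired_block_comparison`, the use of (learner, fold) pairs as blocks, and the paired t statistic as the summary are all assumptions made for illustration. The idea shown is only that two gene subsets are scored under identical conditions and compared block by block.

```python
from math import copysign, sqrt
from statistics import mean, stdev


def paired_block_comparison(scores_a, scores_b):
    """Compare two feature subsets on paired (blocked) validation outcomes.

    scores_a and scores_b map a block key, here a hypothetical
    (learner_name, fold_index) pair, to the validation accuracy obtained
    with subset A or subset B under exactly that condition.
    Returns the mean paired difference and a paired t statistic.
    """
    blocks = sorted(set(scores_a) & set(scores_b))
    if len(blocks) < 2:
        raise ValueError("need at least two shared blocks")
    # Differences are taken within each block, so learner and fold effects cancel.
    diffs = [scores_a[b] - scores_b[b] for b in blocks]
    mean_diff = mean(diffs)
    spread = stdev(diffs)
    if spread == 0:
        # Every block shows the same difference: the statistic is degenerate.
        t_stat = 0.0 if mean_diff == 0 else copysign(float("inf"), mean_diff)
    else:
        t_stat = mean_diff / (spread / sqrt(len(blocks)))
    return mean_diff, t_stat


# Illustrative accuracies only; these numbers are made up for the example.
subset_a = {("knn", 0): 0.80, ("knn", 1): 0.70, ("svm", 0): 0.90, ("svm", 1): 0.75}
subset_b = {("knn", 0): 0.75, ("knn", 1): 0.68, ("svm", 0): 0.84, ("svm", 1): 0.74}
mean_diff, t_stat = paired_block_comparison(subset_a, subset_b)
```

A forward selection wrapper could call such a comparison each time it decides whether a candidate subset beats the current best one.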

3.
4.
This article reviews the current state of systems biology approaches, including the experimental tools used to generate 'omic' data and computational frameworks to interpret this data. Through illustrative examples, systems biology approaches to understand gene expression and gene expression regulation are discussed. Some of the challenges facing this field and the future opportunities in the systems biology era are highlighted.

5.
MOTIVATION: Large-scale gene expression data are often analysed by clustering genes based on gene expression data alone, though a priori knowledge in the form of biological networks is available. The use of this additional information promises to improve exploratory analysis considerably. RESULTS: We propose constructing a distance function which combines information from expression data and biological networks. Based on this function, we compute a joint clustering of genes and vertices of the network. This general approach is elaborated for metabolic networks. We define a graph distance function on such networks and combine it with a correlation-based distance function for gene expression measurements. A hierarchical clustering and an associated statistical measure are computed to arrive at a reasonable number of clusters. Our method is validated using expression data of the yeast diauxic shift. The resulting clusters are easily interpretable in terms of the biochemical network and the gene expression data and suggest that our method is able to automatically identify processes that are relevant under the measured conditions.
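
A combined distance of the kind this abstract describes might look like the sketch below. The abstract does not state how the two distances are combined, so the weighted sum, the `weight` and `max_path` parameters, and all function names are assumptions for illustration: a correlation-based expression distance and a shortest-path graph distance are each scaled to [0, 1] and then mixed.

```python
from collections import deque
from math import sqrt


def correlation_distance(x, y):
    """1 - Pearson correlation between two expression profiles (range 0..2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("constant profile: correlation is undefined")
    return 1.0 - cov / sqrt(vx * vy)


def graph_distance(adjacency, source, target):
    """Shortest path length by breadth-first search; inf when unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return float("inf")


def combined_distance(expr, adjacency, g1, g2, weight=0.5, max_path=4):
    """Assumed combination: weighted sum of the two distances, each in [0, 1]."""
    d_expr = correlation_distance(expr[g1], expr[g2]) / 2.0
    # Path lengths are capped so that unreachable pairs get the maximum distance.
    d_graph = min(graph_distance(adjacency, g1, g2), max_path) / max_path
    return weight * d_expr + (1.0 - weight) * d_graph


# Toy data, invented for the example.
expr = {"g1": [1, 2, 3], "g2": [2, 4, 6], "g3": [3, 2, 1]}
adjacency = {"g1": ["g2"], "g2": ["g1", "g3"], "g3": ["g2"]}
near = combined_distance(expr, adjacency, "g1", "g2")
far = combined_distance(expr, adjacency, "g1", "g3")
```

The resulting pairwise distances could then be passed to any hierarchical clustering routine.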

6.
This article reviews the current state of systems biology approaches, including the experimental tools used to generate ‘omic’ data and computational frameworks to interpret this data. Through illustrative examples, systems biology approaches to understand gene expression and gene expression regulation are discussed. Some of the challenges facing this field and the future opportunities in the systems biology era are highlighted.

7.
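Objective: To analyze the clinical characteristics of hypertensive atrial fibrillation (HAF) and lone atrial fibrillation (LAF) and their effect on prognosis. Methods: In 106 patients with hypertensive atrial fibrillation and 102 patients with lone atrial fibrillation, clinical characteristics including sex, age, family history, complications, occurrence of persistent atrial fibrillation, and echocardiographic findings were analyzed. Results: Compared with the HAF group, the LAF group had a younger age of onset, fewer patients with left atrial enlargement, and fewer patients with persistent atrial fibrillation. In both groups, left atrial enlargement was positively correlated with persistent atrial fibrillation and with complications. The severity of hypertension was positively correlated with persistent atrial fibrillation and with complications. Conclusion: Left atrial enlargement is the main mechanism underlying the occurrence of atrial fibrillation, and whether the left atrium is enlarged is an important indicator for judging prognosis.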

8.
9.

Background  

Traditional methods of analysing gene expression data often include a statistical test to find differentially expressed genes, or use of a clustering algorithm to find groups of genes that behave similarly across a dataset. However, these methods may miss groups of genes which form differential co-expression patterns under different subsets of experimental conditions. Here we describe coXpress, an R package that allows researchers to identify groups of genes that are differentially co-expressed.
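
The notion of differential co-expression can be made concrete with a small sketch. This is not coXpress's R interface or its algorithm; the function names and the use of mean pairwise Pearson correlation as the group summary are assumptions. It only shows how a gene group can be tightly correlated under one condition and not under another.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("constant profile: correlation is undefined")
    return cov / sqrt(vx * vy)


def mean_pairwise_correlation(expr, genes):
    """Average correlation over all gene pairs in the group."""
    pairs = list(combinations(genes, 2))
    if not pairs:
        raise ValueError("need at least two genes")
    return sum(pearson(expr[a], expr[b]) for a, b in pairs) / len(pairs)


def coexpression_shift(expr_cond1, expr_cond2, genes):
    """Difference in group co-expression between two conditions."""
    return (mean_pairwise_correlation(expr_cond1, genes)
            - mean_pairwise_correlation(expr_cond2, genes))


# Toy profiles, invented for the example: the pair moves together in
# condition 1 and in opposite directions in condition 2.
cond1 = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8]}
cond2 = {"g1": [1, 2, 3, 4], "g2": [4, 3, 2, 1]}
shift = coexpression_shift(cond1, cond2, ["g1", "g2"])
```

Neither gene need be differentially expressed on its own for the shift to be large, which is the kind of pattern a per-gene test would miss.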

10.
An objective of many functional genomics studies is to estimate treatment-induced changes in gene expression. cDNA arrays interrogate each tissue sample for the levels of mRNA for hundreds to tens of thousands of genes, and the use of this technology leads to a multitude of treatment contrasts. By-gene hypothesis tests evaluate the evidence supporting no effect, but selecting a significance level requires dealing with the multitude of comparisons. The p-values from these tests order the genes such that a p-value cutoff divides the genes into two sets. Ideally one set would contain the affected genes and the other would contain the unaffected genes. However, the set of genes selected as affected will have false positives, i.e., genes that are not affected by treatment. Likewise, the other set of genes, selected as unaffected, will contain false negatives, i.e., genes that are affected. A plot of the observed p-values (1 - p) versus their expectation under a uniform [0, 1] distribution allows one to estimate the number of true null hypotheses. With this estimate, the false positive rates and false negative rates associated with any p-value cutoff can be estimated. When computed for a range of cutoffs, these rates summarize the ability of the study to resolve effects. In our work, we are more interested in selecting most of the affected genes than in protecting against a few false positives. An optimum cutoff, i.e., the best set given the data, depends upon the relative cost of falsely classifying a gene as affected versus the cost of falsely classifying a gene as unaffected. We select the cutoff by a decision-theoretic method analogous to methods developed for receiver operating characteristic curves. In addition, we estimate the false discovery rate and the false nondiscovery rate associated with any cutoff value.
Two functional genomics studies that were designed to assess a treatment effect are used to illustrate how the methods allowed the investigators to determine a cutoff to suit their research goals.
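
The quantities named in this abstract can be sketched as follows. The abstract estimates the number of true nulls from a plot of the p-values; the sketch below swaps in a simpler threshold-based estimate (counting p-values above a tuning value `lam`), so it is a stand-in and not the authors' procedure. The function names, `lam`, and the linear cost used to pick a cutoff are assumptions for illustration.

```python
def estimate_true_nulls(pvalues, lam=0.5):
    """Stand-in estimate of the number of true null hypotheses.

    Under the null, p-values are uniform on [0, 1], so roughly
    m0 * (1 - lam) of them should exceed lam.
    """
    above = sum(1 for p in pvalues if p > lam)
    return min(float(len(pvalues)), above / (1.0 - lam))


def error_rates(pvalues, cutoff, lam=0.5):
    """Estimated false positives, false negatives, FDR and FNDR at a cutoff."""
    m = len(pvalues)
    m0 = estimate_true_nulls(pvalues, lam)
    selected = sum(1 for p in pvalues if p <= cutoff)
    unselected = m - selected
    expected_fp = min(m0 * cutoff, float(selected))
    # Affected genes not captured among the selected ones are false negatives.
    expected_fn = max((m - m0) - (selected - expected_fp), 0.0)
    return {
        "expected_fp": expected_fp,
        "expected_fn": expected_fn,
        "fdr": expected_fp / selected if selected else 0.0,
        "fndr": expected_fn / unselected if unselected else 0.0,
    }


def best_cutoff(pvalues, cutoffs, cost_fp=1.0, cost_fn=1.0, lam=0.5):
    """Pick the cutoff minimizing an assumed linear misclassification cost."""
    def cost(c):
        rates = error_rates(pvalues, c, lam)
        return cost_fp * rates["expected_fp"] + cost_fn * rates["expected_fn"]
    return min(cutoffs, key=cost)


# Toy p-values, invented for the example.
pvals = [0.001, 0.002, 0.003, 0.004, 0.2, 0.4, 0.6, 0.8, 0.9, 0.95]
rates = error_rates(pvals, cutoff=0.05)
chosen = best_cutoff(pvals, [0.001, 0.05, 0.5])
```

Raising `cost_fn` relative to `cost_fp` pushes the chosen cutoff upward, which matches the stated preference for capturing most affected genes over avoiding a few false positives.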

11.
In DNA microarray studies, gene-set analysis (GSA) has become the focus of gene expression data analysis. GSA utilizes the gene expression profiles of functionally related gene sets in Gene Ontology (GO) categories or a priori defined biological classes to assess the significance of gene sets associated with clinical outcomes or phenotypes. Many statistical approaches have been proposed to determine whether such functionally related gene sets are differentially expressed (enrichment and/or deletion) across phenotypes. However, little attention has been given to the discriminatory power of gene sets and classification of patients.

12.
We propose a novel approach, a refinement of recently developed strategies, for identifying differentially expressed genes. First, double-stranded cDNAs were digested using Sau3AI and the 3'-end restriction fragments of the cDNA were ligated to a double-stranded adapter. Next, the restriction fragments were directly amplified using several combinations of adapter-specific primers and FITC-labeled oligo dT primers. The selected cDNA fragments were displayed on a polyacrylamide gel. Neither nested PCR nor purification of 3'-end fragments is necessary. We examined the validity of this approach by evaluating gene expression changes during granulocytic differentiation of HL-60 cells. This method can theoretically detect almost all gene expression changes more rapidly and through simpler manipulations than any other approach.

13.
Atrial fibrillation (AF) is one of the most frequent cardiac arrhythmias, and atrial remodeling is related to the progression of AF. Although several therapeutic approaches have been presented in recent years, the continuously increasing mortality rate suggests that more advanced strategies for treatment are urgently needed. Exosomes regulate pathological processes through intercellular communication mediated by microribonucleic acid (miRNA) in various cardiovascular diseases (CVDs). Exosomal miRNAs associated with signaling pathways have added more complexity to an already complex direct cell-to-cell interaction. Exosome delivery of miRNAs is involved in cardiac regeneration and cardiac protection. Recent studies have found that exosomes play a critical role in the diagnosis and treatment of cardiac fibrosis. By improving exosome stability and modifying surface epitopes, specific pharmaceutical agents can be supplied to improve tropism and targeting to cells and tissues in vivo. Exosomes harboring miRNAs may have clinical utility in cell-free therapeutic approaches and may serve as prognostic and diagnostic biomarkers for AF. Currently, limitations challenge pharmaceutic design, therapeutic utility and in vivo targeted delivery to patients. The aim of this article is to review the developmental features of AF associated with exosomal miRNAs and relate them to underlying mechanisms.

14.

Background  

Previous differential coexpression analyses focused on identification of differentially coexpressed gene pairs, revealing many insightful biological hypotheses. However, this method could not detect coexpression relationships between pairs of gene sets. Considering the success of many set-wise analysis methods for microarray data, a coexpression analysis based on gene sets may elucidate underlying biological processes provoked by the conditional changes. Here, we propose a differentially coexpressed gene sets (dCoxS) algorithm that identifies the differentially coexpressed gene set pairs between conditions.

15.
Microarrays have become a popular technology in biological and medical research. However, systematic and stochastic variability in microarray data is expected and unavoidable, so the raw measurements carry inherent “noise” within microarray experiments. Currently, logarithmic ratios are usually analyzed directly by various clustering methods, which may introduce biased interpretation in identifying groups of genes or samples. In this paper, a statistical method based on mixed model approaches is proposed for microarray data cluster analysis. The underlying rationale of this method is to partition the observed total gene expression level into variations caused by different factors using an ANOVA model, and to predict the differential effects of GV (gene by variety) interaction using the adjusted unbiased prediction (AUP) method. The predicted GV interaction effects can then be used as the inputs of cluster analysis. We illustrated the application of our method with a gene expression dataset and elucidated the utility of our approach using an external validation.

16.
Integration of biological networks and gene expression data using Cytoscape   Total citations: 1, self-citations: 0, citations by others: 1
Cytoscape is a free software package for visualizing, modeling and analyzing molecular and genetic interaction networks. This protocol explains how to use Cytoscape to analyze the results of mRNA expression profiling, and other functional genomics and proteomics experiments, in the context of an interaction network obtained for genes of interest. Five major steps are described: (i) obtaining a gene or protein network, (ii) displaying the network using layout algorithms, (iii) integrating with gene expression and other functional attributes, (iv) identifying putative complexes and functional modules and (v) identifying enriched Gene Ontology annotations in the network. These steps provide a broad sample of the types of analyses performed by Cytoscape.

17.
We investigate a model of optimal regulation, intended to describe large-scale differential gene expression. Relations between the optimal expression patterns and the function of genes are deduced from an optimality principle: the regulators have to maximise a fitness function which they influence directly via a cost term, and indirectly via their control on important cell variables, such as metabolic fluxes. According to the model, the optimal linear response to small perturbations reflects the regulators' functions, namely their linear influences on the cell variables. The optimal behaviour can be realised by a linear feedback mechanism. Known or assumed properties of response coefficients lead to predictions about regulation patterns. A symmetry relation predicted for deletion experiments is verified with gene expression data. Where the optimality assumption is valid, our results justify the use of expression data for functional annotation and for pathway reconstruction and suggest the use of linear factor models for the analysis of gene expression data.

18.
Commonly accepted intensity-dependent normalization in spotted microarray studies takes account of measurement errors in the differential expression ratio but ignores measurement errors in the total intensity, although the definitions imply the same measurement error components are involved in both statistics. Furthermore, identification of differentially expressed genes is usually considered separately following normalization, which is statistically problematic. By incorporating the measurement errors in both total intensities and differential expression ratios, we propose a measurement-error model for intensity-dependent normalization and identification of differentially expressed genes. This model is also flexible enough to incorporate intra-array and inter-array effects. A Bayesian framework is proposed for the analysis of the proposed measurement-error model to avoid the potential risk of using the common two-step procedure. We also propose a Bayesian identification of differentially expressed genes to control the false discovery rate instead of the ad hoc thresholding of the posterior odds ratio. The simulation study and an application to real microarray data demonstrate promising results.

19.
20.
An activity-based isotope-coded affinity tagging (AB-ICAT) strategy for proteome-wide quantitation of active retaining endoglycosidases has been developed. Two pairs of biotinylated, cleavable, AB-ICAT reagents (light H(8) and heavy D(8)) have been synthesized, one incorporating a recognition element for cellulases and the other incorporating a recognition element for xylanases. The accuracy of the AB-ICAT methodology in quantifying relative glycosidase expression/activity levels in any two samples of interest has been verified using several pairs of model enzyme mixtures where one or more enzyme amounts and/or activities were varied. The methodology has been applied to the biomass-degrading secretomes of the soil bacterium, Cellulomonas fimi, under induction by different polyglycan growth substrates to obtain a quantitative profile of the relative expression/activity levels of individual active retaining endoglycanases per C. fimi cell. Such biological profiles are valuable in understanding the strategies employed by biomass-degrading organisms in exploiting environments containing different biomass polysaccharides. This is the first report on the application of an activity-based ICAT method to a biological system.

