Similar literature
 20 similar documents found
1.
One of the challenges in the analysis of gene expression data is placing the results in the context of other data available about genes and their relationships to each other. Here, we approach this problem in the study of gene expression changes associated with age in two areas of the human prefrontal cortex, comparing two computational methods. The first method, "overrepresentation analysis" (ORA), is based on statistically evaluating the fraction of genes in a particular gene ontology class found among the set of genes showing age-related changes in expression. The second method, "functional class scoring" (FCS), examines the statistical distribution of individual gene scores among all genes in the gene ontology class and does not involve an initial gene selection step. We find that FCS yields more consistent results than ORA, and that the results of ORA depend strongly on the gene selection threshold. Our findings highlight the utility of functional class scoring for the analysis of complex expression data sets and emphasize the advantage of considering all available genomic information rather than sets of genes that pass a predetermined "threshold of significance."
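The ORA step described in this abstract can be illustrated with a small sketch. The abstract does not say which test the authors used; the upper-tail hypergeometric probability below is a common choice for overrepresentation and is an assumption here, as are the function name `ora_p_value` and its parameters.

```python
from math import comb


def ora_p_value(n_genes, n_in_class, n_selected, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_genes    : total number of genes measured (hypothetical input)
    n_in_class : genes annotated to the gene ontology class
    n_selected : genes passing the selection threshold
    n_overlap  : selected genes that are also in the class
    """
    max_overlap = min(n_in_class, n_selected)
    total = comb(n_genes, n_selected)
    # Sum the probabilities of seeing n_overlap or more class members
    # among the selected genes if selection were random.
    tail = sum(
        comb(n_in_class, k) * comb(n_genes - n_in_class, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total
```

Because `n_selected` and `n_overlap` both change with the selection threshold, the resulting p-value can shift as the threshold moves, which is the dependence the abstract reports for ORA.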

2.
Serial analysis of gene expression (SAGE) technology produces large sets of interesting genes that are difficult to analyze directly. Bioinformatics tools are needed to interpret the functional information in these gene sets. We present an interactive web-based tool, called Gene Class, which allows functional annotation of SAGE data using the Gene Ontology (GO) database. This tool performs searches in the GO database for each SAGE tag, making associations in the selected GO category for a level selected in the hierarchy. This system provides user-friendly data navigation and visualization for mapping SAGE data onto the gene ontology structure. This tool also provides graphical visualization of the percentage of SAGE tags in each GO category, along with confidence intervals and hypothesis testing.

3.
Comparing the gene-expression profiles of sick and healthy individuals can help in understanding disease. Such differential expression analysis is a well-established way to find gene sets whose expression is altered in the disease. Recent approaches to gene-expression analysis go a step further and seek differential co-expression patterns, wherein the level of co-expression of a set of genes differs markedly between disease and control samples. Such patterns can arise from a disease-related change in the regulatory mechanism governing that set of genes, and pinpoint dysfunctional regulatory networks. Here we present DICER, a new method for detecting differentially co-expressed gene sets using a novel probabilistic score for differential correlation. DICER goes beyond standard differential co-expression and detects pairs of modules showing differential co-expression. The expression profiles of genes within each module of the pair are correlated across all samples. The correlation between the two modules, however, differs markedly between the disease and normal samples. We show that DICER outperforms the state of the art in terms of significance and interpretability of the detected gene sets. Moreover, the gene sets discovered by DICER manifest regulation by disease-specific microRNA families. In a case study on Alzheimer's disease, DICER dissected biological processes and protein complexes into functional subunits that are differentially co-expressed, thereby revealing inner structures in disease regulatory networks.

4.
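When the differences in gene expression between two groups of samples are small, or when the sample size is small, applying a conventional false discovery rate (FDR) control level (such as 5% or 10%) may fail to identify enough differentially expressed genes for subsequent functional enrichment analysis. Functional enrichment analysis, however, is to some extent robust to false discoveries among the differentially expressed genes. Identifying differentially expressed genes at a less stringent FDR control level (that is, allowing a higher FDR) may therefore still reveal disease-related functions reliably. This paper analyzes five gene expression profile data sets from studies of breast cancer metastasis. Using the three data sets with relatively strong differential expression signals, we show that the results of functional enrichment analysis remain highly robust even when the FDR among the differentially expressed genes reaches 25%. Then, in the other two data sets, whose differential expression signals are weak, we select differentially expressed genes at an FDR control level of 25% for functional enrichment analysis and compare the results with the enrichment results from the first three data sets. The results show that selecting differentially expressed genes at a less stringent FDR control level can still reliably identify functions related to breast cancer metastasis. The analysis also suggests that, during breast cancer metastasis, some broadly defined biological processes (such as cell division, the cell cycle, and DNA replication) are perturbed as a whole, reflecting that breast cancer metastasis is a systemic disease involving widespread changes in gene expression.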
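To make the effect of the FDR control level concrete, here is a minimal sketch of gene selection at a chosen level. The abstract does not name the FDR procedure used; the Benjamini-Hochberg step-up rule below is an assumption, and `bh_reject` and the example p-values are hypothetical.

```python
def bh_reject(p_values, fdr_level):
    """Benjamini-Hochberg step-up rule (assumed procedure, not from the abstract).

    Returns a list of booleans, True where the gene is called
    differentially expressed at the given FDR control level.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose sorted p-value satisfies p_(k) <= k/m * level.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr_level:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values for four genes.
p = [0.01, 0.04, 0.03, 0.20]
strict = bh_reject(p, 0.05)   # conventional 5% level
relaxed = bh_reject(p, 0.25)  # the 25% level discussed in the abstract
```

With these made-up p-values, the 5% level selects a single gene while the 25% level selects all four, which shows how a relaxed level can supply enough genes for a downstream enrichment analysis.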

5.
6.

Background

A tremendous amount of effort has been devoted to identifying genes for the diagnosis and prognosis of diseases using microarray gene expression data. It has been demonstrated that gene expression data have a cluster structure, where the clusters consist of co-regulated genes that tend to have coordinated functions. However, most available statistical methods for gene selection do not take this cluster structure into consideration.

Results

We propose a supervised group Lasso approach that takes into account the cluster structure in gene expression data for gene selection and predictive model building. For gene expression data without biological cluster information, we first divide genes into clusters using the K-means approach and determine the optimal number of clusters using the Gap method. The supervised group Lasso consists of two steps. In the first step, we identify important genes within each cluster using the Lasso method. In the second step, we select important clusters using the group Lasso. Tuning parameters are determined using V-fold cross validation at both steps to allow for further flexibility. Prediction performance is evaluated using leave-one-out cross validation. We apply the proposed method to disease classification and survival analysis with microarray data.

Conclusion

We analyze four microarray data sets using the proposed approach: two cancer data sets with binary cancer occurrence as outcomes and two lymphoma data sets with survival outcomes. The results show that the proposed approach is capable of identifying a small number of influential gene clusters and important genes within those clusters, and has better prediction performance than existing methods.

7.
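Selecting feature genes for clustering with the gene functional classification system Gene Ontology   Total citations: 3 (self-citations: 0, citations by others: 3)
Using two gene expression profile data sets, we selected variably expressed genes according to the variance of each gene's expression values and used them to cluster samples. We found that using the top 10% of genes with the largest variance as feature genes is generally sufficient to cluster disease samples well. For different diseases, the feature genes carrying clustering information show different distribution characteristics. On this basis, we used the gene functional classification system Gene Ontology (GO) to further screen feature genes for clustering. By testing whether the variably expressed genes in each Gene Ontology functional category aggregate non-randomly, we identified disease-related functional categories and then performed cluster analysis using the variably expressed genes in those categories. The experimental results show that further screening variably expressed genes with the gene functional classification system to obtain clustering feature genes maintains or improves clustering accuracy and gives the clustering results a clear biological meaning. In addition, some genes that may be related to lymphoma and leukemia were found.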
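The variance-based selection of feature genes can be sketched as follows. This is a minimal illustration under assumptions: the abstract does not specify the variance estimator or how ties are handled, and `top_variance_genes` and the toy data are hypothetical.

```python
from statistics import variance


def top_variance_genes(expression, fraction=0.10):
    """Return the genes with the largest expression variance.

    expression : dict mapping gene name -> list of expression values
                 across samples (at least two values per gene)
    fraction   : share of genes to keep; 0.10 mirrors the top 10%
                 mentioned in the abstract
    """
    variances = {gene: variance(values) for gene, values in expression.items()}
    n_keep = max(1, round(len(variances) * fraction))
    ranked = sorted(variances, key=variances.get, reverse=True)
    return ranked[:n_keep]


# Toy data: gene g_i has values [0, i], so its variance grows with i.
toy = {f"g{i}": [0.0, float(i)] for i in range(10)}
features = top_variance_genes(toy)
```

The selected genes would then serve as clustering features; the subsequent Gene Ontology screening step in the abstract is not covered by this sketch.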

8.
9.
Identifying differentially expressed genes across various conditions or genotypes is the most typical approach to studying the regulation of gene expression. An estimate of gene-specific variance is often needed for the assessment of statistical significance in most differential expression (DE) detection methods, including linear models (e.g., for transformed and normalized microarray data) and generalized linear models (e.g., for count data in RNAseq). Because sample sizes are commonly limited, the variance estimate is often unstable in small experiments. Shrinkage estimates using empirical Bayes methods have proven useful in improving the variance estimate, hence improving the detection of DE. The most widely used empirical Bayes methods borrow information across genes within the same experiment. In these methods, genes are considered exchangeable, or exchangeable conditional on expression level. We propose that, with the increasing accumulation of expression data, borrowing information from historical data on the same gene can provide a better estimate of gene-specific variance and thus further improve DE detection. Specifically, we show that the variation of gene expression is truly gene-specific and reproducible between different experiments. We present a new method to establish an informative gene-specific prior on the variance of expression using existing public data, and illustrate how to shrink the variance estimate and detect DE. We demonstrate improvement in DE detection under our strategy compared to leading DE detection methods.
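The idea of shrinking a gene's variance estimate toward a prior can be sketched with the standard degrees-of-freedom weighted average used in empirical Bayes moderation. This is not necessarily the authors' estimator; the function `shrink_variance`, its parameter names, and the notion of supplying a per-gene prior from historical data as plain numbers are assumptions made for illustration.

```python
def shrink_variance(sample_var, sample_df, prior_var, prior_df):
    """Posterior variance as a weighted average of prior and sample variance.

    sample_var : variance estimated from the current (small) experiment
    sample_df  : residual degrees of freedom of that estimate
    prior_var  : prior variance for this gene (hypothetically derived
                 from historical data on the same gene)
    prior_df   : prior degrees of freedom, i.e. how strongly to trust the prior
    """
    return (prior_df * prior_var + sample_df * sample_var) / (prior_df + sample_df)


# A noisy estimate from 2 degrees of freedom is pulled toward the prior.
moderated = shrink_variance(sample_var=4.0, sample_df=2, prior_var=1.0, prior_df=4)
```

Setting `prior_df` to zero returns the unshrunken sample variance, and larger values of `prior_df` pull the estimate closer to the prior.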

10.
Genomics and proteomics approaches generate distinct gene expression and protein profiles, listing individual genes embedded in broad functional terms as gene ontologies. However, interpretation of gene profiles in a regulatory and functional context remains a major issue. Elucidation of regulatory mechanisms at the gene expression level via analysis of promoter regions is a prominent procedure to decipher such gene regulatory networks. We propose a novel genetic algorithm (GA) to extract joint promoter modules in a set of coexpressed genes as resulting from differential gene expression experiments. Algorithm design has focused on the following constraints: (I) identification of the major promoter modules, which are (II) characterized by a maximum number of joint motifs and (III) are found in a maximum number of coexpressed genes. The capability of the GA in detecting multiple modules was evaluated on various test data sets, analyzing the impact of the number of motifs per promoter module, the number of genes associated with a module, as well as the total number of distinct promoter modules encoded in a sequence set. In addition to the test data sets, the GA was evaluated on two biological examples, namely a muscle-specific data set and the upstream sequences of the beta-actin gene (ACTB) derived from different species, complemented by a comparison to alternative promoter module identification routines.

11.
12.
13.
Chronic wasting disease (CWD) is an invariably fatal neurologic disease that naturally infects mule deer, white-tailed deer and elk. The understanding of CWD neurodegeneration at a molecular level is very limited. In this study, microarray analysis was performed to determine changes in the gene expression profiles in six different tissues including brain, midbrain, thalamus, spleen, RPLN and tonsil of CWD-infected elk in comparison to non-infected healthy elk, using 24,000 bovine specific oligo probes. In total, 329 genes were found to be differentially expressed (> 2.0-fold) between CWD negative and positive brain tissues, with 132 genes upregulated and 197 genes downregulated. There were 249 DE genes in the spleen (168 up- and 81 downregulated), 30 DE genes in the retropharyngeal lymph node (RPLN) (18 up- and 12 downregulated), and 55 DE genes in the tonsil (21 up- and 34 downregulated). Using Gene Ontology (GO), the DE genes were assigned to functional groups associated with cellular process, biological regulation, metabolic process, and regulation of biological process. For all brain tissues, the highest ranking networks for DE genes identified by Ingenuity Pathway Analysis (IPA) were associated with neurological disease, cell morphology, cellular assembly and organization. Quantitative real-time PCR (qRT-PCR) validated the expression of DE genes primarily involved in different regulatory pathways, including neuronal signaling and synapse function, calcium signaling, apoptosis and cell death and immune cell trafficking and inflammatory response. This is the first study to evaluate altered gene expression in multiple organs including brain from orally infected elk and the results will improve our understanding of CWD neurodegeneration at the molecular level.

14.
Prion, 2013, 7(3): 282-301

15.
Partially paired data sets often occur in microarray experiments (Kim et al., 2005; Liu, Liang and Jang, 2006). Discussions of testing with partially paired data are found in the literature (Lin and Stivers, 1974; Ekbohm, 1976; Bhoj, 1978). Bhoj (1978) initially proposed a test statistic that uses a convex combination of paired and unpaired t statistics. Kim et al. (2005) later proposed the t3 statistic, which is a linear combination of paired and unpaired t statistics, and then used it to detect differentially expressed (DE) genes in colorectal cancer (CRC) cDNA microarray data. In this paper, we extend Kim et al.'s t3 statistic to the Hotelling's T2 type statistic Tp for detecting DE gene sets of size p. We employ Efron's empirical null principle to incorporate inter-gene correlation in the estimation of the false discovery rate. Then, the proposed Tp statistic is applied to Kim et al.'s CRC data to detect the DE gene sets of sizes p=2 and p=3. Our results show that for small p, particularly for p=2 and marginally for p=3, the proposed Tp statistic complements the univariate procedure by detecting additional DE genes that were undetected in the univariate test procedure. We also conduct a simulation study to demonstrate that Efron's empirical null principle is robust to departures from the normality assumption.

16.
To discover the genes responsible for the apoptosis evoked by glucocorticoids in leukemic lymphoid cells, we have begun gene array analysis on microchips. Three clones of CEM cells were compared: C7–14, C1–15 and C1–6. C7–14 and C1–15 are subclones from the original clones C7 (sensitive to apoptosis by glucocorticoids) and C1 (resistant). C1–6 is a spontaneous revertant to sensitivity from the C1 clone. Previously we presented data on the sets of genes whose expression is altered in these cell clones after 20 h exposure to dexamethasone (Dex). The two sensitive clones, which respond by undergoing apoptosis starting about 24 h after Dex is added, both showed >2.5-fold induction of 39 genes and 2-fold reduction of expressed levels from 21 genes. C1–15, the resistant clone, showed alterations in a separate set of genes.

In this paper, we present further analysis of the data on genes regulated in these cell clones after 20 h Dex and compare them with the genes regulated after 12 h Dex. Some, but not all, of the genes found altered at 20 h are altered at 12 h, consistent with our hypothesis that sequential gene regulation eventually provokes full apoptosis. We also compare the levels of basal gene expression in the three clones. At the basal level no single gene stands out, but small sets of genes differ >2-fold in basal expression between the two sensitive clones and the resistant clone. A number of the genes basally higher in the resistant clone are potentially anti-apoptotic. This is consistent with our hypothesis that the resistant cells have undergone a general shift in gene expression.


17.
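Gene expression profiles can be used to study the relationship between gene functional modules and disease heterogeneity. Based on two leukemia gene expression profile data sets, Gene Ontology functional modules enriched in highly variable genes were taken as feature functional modules, and disease samples were clustered into two classes. By comparison with the original multi-class labels, clustering evaluation indices were used to assess the quality of the two-class clustering results, and the relationship between the feature functional modules and disease heterogeneity was explored. The experimental results show that the feature functional modules obtained from the two different leukemia gene expression profile data sets are similar and have strong power to discriminate leukemia subtypes.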

18.
To account for the functional non-equivalence among a set of genes within a biological pathway when performing gene set analysis, we introduce GOGANPA, a network-based gene set analysis method, which up-weights genes with functions relevant to the gene set of interest. Each gene is weighted according to its degree within a genome-scale functional network constructed using the functional annotations available from the gene ontology database. By benchmarking GOGANPA using a well-studied P53 data set and three breast cancer data sets, we demonstrate the power and reproducibility of our proposed method over traditional unweighted approaches and a competing network-based approach that involves a complex integrated network. GOGANPA’s sole reliance on gene ontology further allows GOGANPA to be widely applicable to the analysis of any gene-ontology-annotated genome.

19.
Techniques for analyzing genome-wide expression profiles, such as the microarray technique and next-generation sequencers, have been developed. While these techniques can provide a lot of information about gene expression, selection of genes of interest is complicated by the sheer volume of gene expression data. Thus, many researchers use statistical methods or fold change as screening tools for finding gene sets whose expression is altered between groups, which may result in the loss of important information. In the present study, we aimed to establish a combined method for selecting genes of interest with a small magnitude of alteration in gene expression by coupling with proteome analysis. We used hypercholesterolemic rats to examine the effects of a crude herbal drug on gene expression and proteome profiles. We could not select genes of interest by using standard methods. However, by coupling with proteome analysis, we found several effects of the crude herbal drug on gene expression. Our results suggest that this method would be useful in selecting gene sets whose expression does not show a large magnitude of alteration.

20.
Accumulated biological knowledge is often encoded as gene sets, collections of genes associated with similar biological functions or pathways. The use of gene sets in the analyses of high-throughput gene expression data has been intensively studied and applied in clinical research. However, the main interest remains in finding modules of biological knowledge, or corresponding gene sets, significantly associated with disease conditions. Risk prediction from censored survival times using gene sets has not been well studied. In this work, we propose a hybrid method that uses both single gene and gene set information together to predict patient survival risks from gene expression profiles. In the proposed method, gene sets provide context-level information that is poorly reflected by single genes. Complementarily, single genes help to supplement incomplete information of gene sets due to our imperfect biomedical knowledge. In tests over multiple data sets of cancer and trauma injury, the proposed method showed robust and improved performance compared with conventional approaches that use only single genes or only gene sets. Additionally, we examined the prediction result in the trauma injury data, and showed that the modules of biological knowledge used in the prediction by the proposed method were highly interpretable biologically. A wide range of survival prediction problems in clinical genomics is expected to benefit from the use of biological knowledge.
