Similar literature
 20 similar articles found (search time: 31 ms)
1.
In DNA microarray analysis, there is often interest in isolating a few genes that best discriminate between tissue types. This is especially important in cancer, where different clinicopathologic groups are known to vary in their outcomes and response to therapy. The identification of a small subset of gene expression patterns distinctive for tumor subtypes can help design treatment strategies and improve diagnosis. Toward this goal, we propose a methodology for the analysis of high-density oligonucleotide arrays. The gene expression measures are modeled as censored data to account for the quantification limits of the technology, and two gene selection criteria based on contrasts from an analysis of covariance (ANCOVA) model are presented. The model is formulated in a hierarchical Bayesian framework, which, in addition to making the fit of the model straightforward and computationally efficient, allows us to borrow strength across genes. The elicitation of hierarchical priors, as well as issues related to parameter identifiability and posterior propriety, are discussed in detail. We examine the performance of our proposed method on simulated data, then present a detailed case study of an endometrial cancer dataset.

2.
MOTIVATION: Most biological traits may be correlated with the underlying gene expression patterns that are partially determined by DNA sequence variation. The correlations between gene expressions and quantitative traits are essential for understanding the functions of genes and dissecting gene regulatory networks. RESULTS: In the present study, we adopted a novel statistical method, called the stochastic expectation and maximization (SEM) algorithm, to analyze the associations between gene expression levels and quantitative trait values and identify genetic loci controlling the gene expression variations. In the first step, gene expression levels measured from microarray experiments were assigned to two different clusters based on the strengths of their association with the phenotypes of a quantitative trait under investigation. In the second step, genes associated with the trait were mapped to genetic loci of the genome. Because gene expressions are quantitative, the genetic loci controlling the expression traits are called expression quantitative trait loci. We applied the same SEM algorithm to a real dataset collected from a barley genetic experiment with both quantitative traits and gene expression traits. For the first time, we identified genes associated with eight agronomic traits of barley. These genes were then mapped to seven chromosomes of the barley genome. The SEM algorithm and the result of the barley data analysis are useful to scientists in the areas of bioinformatics and plant breeding. Availability and implementation: The R program for the SEM algorithm can be downloaded from our website: http://www.statgen.ucr.edu.

3.
We propose a statistical method for uncovering gene pathways that characterize cancer heterogeneity. To incorporate knowledge of the pathways into the model, we define a set of activities of pathways from microarray gene expression data based on the Sparse Probabilistic Principal Component Analysis (SPPCA). A pathway activity logistic regression model is then formulated for cancer phenotype. To select pathway activities related to binary cancer phenotypes, we use the elastic net for the parameter estimation and derive a model selection criterion for selecting tuning parameters included in the model estimation. Our proposed method can also reverse-engineer gene networks based on the identified multiple pathways, which enables us to discover novel gene-gene associations related to the cancer phenotypes. We illustrate the whole process of the proposed method through the analysis of breast cancer gene expression data.

4.
5.
We describe a method for detecting marker genes in large heterogeneous collections of gene expression data. Markers are identified and characterized by the existence of demarcations in their expression values across the whole dataset, which suggest the presence of groupings of samples. We apply this method to DNA microarray data generated from 83 mouse stem cell related samples and describe 426 selected markers associated with differentiation to establish principles of stem cell evolution.

6.
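Li Li, Li Xia, Chen Yihan, Guo Zheng, Jiang Wei, Zhang Ruijie, Rao Shaoqi. 《遗传》 (Hereditas (Beijing)), 2006, 28(9): 1129-1134
Microarray (gene chip) technology provides a powerful tool for studying disease heterogeneity. Current methods based on traditional cluster analysis generally use the large number of genes on the chip as features for discovering disease subtypes, and therefore do not account for the fact that the many irrelevant genes among those features can mask meaningful partitions of the disease samples. To avoid this drawback, we propose a heterogeneity analysis method based on coupled two-way clustering (Heterogeneous Analysis Based on Coupled Two-Way Clustering, HCTWC), which searches for meaningful gene clusters in order to discover the intrinsic partition of the samples. The method was applied to a diffuse large B-cell lymphoma (DLBCL) microarray dataset. Clustering the DLBCL samples with the identified gene clusters as features revealed two DLBCL subtypes with survival of 55% and 25%, respectively (P<0.05). The HCTWC method is therefore effective for addressing disease heterogeneity.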

7.
Microarray technologies, which can measure tens of thousands of gene expression values simultaneously in a single experiment, have become a common research method for biomedical researchers. Computational tools to analyze microarray data for biological discovery are needed. In this paper, we investigate the feasibility of using formal concept analysis (FCA) as a tool for microarray data analysis. The method of FCA builds a (concept) lattice from the experimental data together with additional biological information. For microarray data, each vertex of the lattice corresponds to a subset of genes that are grouped together according to their expression values and some biological information related to gene function. The lattice structure of these gene sets might reflect biological relationships in the dataset. Similarities and differences between experiments can then be investigated by comparing their corresponding lattices according to various graph measures. We apply our method to microarray data derived from influenza-infected mouse lung tissue and healthy controls. Our preliminary results show the promise of our method as a tool for microarray data analysis.

8.
9.
Microarrays have been useful in understanding various biological processes by allowing the simultaneous study of the expression of thousands of genes. However, the analysis of microarray data is a challenging task. One of the key problems in microarray analysis is the classification of unknown expression profiles. Specifically, the often large number of non-informative genes on the microarray adversely affects the performance and efficiency of classification algorithms. Furthermore, the skewed ratio of samples to variables poses a risk of overfitting. Thus, in this context, feature selection methods become crucial to select relevant genes and, hence, improve classification accuracy. In this study, we investigated feature selection methods based on gene expression profiles and protein interactions. We found that in our setup, the addition of protein interaction information did not contribute to any significant improvement of the classification results. Furthermore, we developed a novel feature selection method that relies exclusively on observed gene expression changes in microarray experiments, which we call “relative Signal-to-Noise ratio” (rSNR). More precisely, the rSNR ranks genes based on their specificity to an experimental condition, by comparing intrinsic variation, i.e. variation in gene expression within an experimental condition, with extrinsic variation, i.e. variation in gene expression across experimental conditions. Genes with low variation within an experimental condition of interest and high variation across experimental conditions are ranked higher, and help in improving classification accuracy. We compared different feature selection methods on two time-series microarray datasets and one static microarray dataset. We found that the rSNR performed generally better than the other methods.
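The abstract does not give the rSNR formula, so the following is only a minimal sketch of the ranking idea it describes: reward genes with high variation across conditions and low variation within the condition of interest. The function names, the ratio used as the score, and the toy data are all assumptions made for illustration and should not be read as the authors' definition.

```python
from statistics import mean, pstdev

def rsnr_like_score(expr_by_condition, target):
    """Hypothetical score, NOT the published rSNR formula.

    Divides the spread of per-condition means (extrinsic variation)
    by the spread of replicates inside the target condition
    (intrinsic variation).
    """
    within = pstdev(expr_by_condition[target])
    across = pstdev([mean(values) for values in expr_by_condition.values()])
    return across / (within + 1e-9)  # small constant avoids division by zero

def rank_genes(data, target):
    """Return gene names sorted from highest to lowest score."""
    scores = {gene: rsnr_like_score(conds, target) for gene, conds in data.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy data (invented): geneA is stable within t1 and differs across conditions,
# geneB is noisy within conditions and has identical condition means.
data = {
    "geneA": {"t0": [1.0, 1.1, 0.9], "t1": [5.0, 5.2, 4.8], "t2": [1.0, 0.9, 1.1]},
    "geneB": {"t0": [1.0, 3.0, 5.0], "t1": [2.0, 4.0, 3.0], "t2": [3.0, 1.0, 5.0]},
}
ranking = rank_genes(data, "t1")
print(ranking)
```

In this toy case geneA ranks first because its replicates agree within `t1` while its mean differs strongly from the other conditions.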

10.
Regulatory motif finding by logic regression   Total citations: 1 (self-citations: 0, citations by others: 1)

11.
12.
MOTIVATION: Finding differentially expressed genes is a fundamental objective of a microarray experiment. Numerous methods have been proposed to perform this task. Existing methods are based on point estimates of gene expression level obtained from each microarray experiment. This approach discards potentially useful information about measurement error that can be obtained from an appropriate probe-level analysis. Probabilistic probe-level models can be used to measure gene expression and also provide a level of uncertainty in this measurement. This probe-level measurement error provides useful information which can help in the identification of differentially expressed genes. RESULTS: We propose a Bayesian method to include probe-level measurement error into the detection of differentially expressed genes from replicated experiments. A variational approximation is used for efficient parameter estimation. We compare this approximation with MAP and MCMC parameter estimation in terms of computational efficiency and accuracy. The method is used to calculate the probability of positive log-ratio (PPLR) of expression levels between conditions. Using the measurements from a recently developed Affymetrix probe-level model, multi-mgMOS, we test PPLR on a spike-in dataset and a mouse time-course dataset. Results show that the inclusion of probe-level measurement error improves accuracy in detecting differential gene expression. AVAILABILITY: The MAP approximation and variational inference described in this paper have been implemented in an R package pplr. The MCMC method is implemented in Matlab. Both software packages are available from http://umber.sbs.man.ac.uk/resources/puma.
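To make the PPLR quantity concrete, here is a simplified sketch that assumes the log-expression level under each condition is summarized by an independent Gaussian (a mean and a variance). That assumption, and the function names, are introduced here for illustration only. The paper's variational, MAP, and MCMC estimators for replicated experiments are not reproduced, and this is not the API of the `pplr` R package.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def pplr_gaussian(mean_a, var_a, mean_b, var_b):
    """P(log-expression in A > log-expression in B).

    Assumes independent Gaussian summaries for the two conditions,
    so the difference is Gaussian with variance var_a + var_b.
    """
    return normal_cdf((mean_a - mean_b) / sqrt(var_a + var_b))

# Same point estimates, different measurement error (values invented):
confident = pplr_gaussian(7.0, 0.1, 5.0, 0.1)
uncertain = pplr_gaussian(7.0, 4.0, 5.0, 4.0)
print(confident, uncertain)
```

With identical point estimates, the larger measurement error pulls the probability toward 0.5, which is the kind of information a point-estimate-only method would discard.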

13.
14.
Understanding the root molecular and genetic causes driving complex traits is a fundamental challenge in genomics and genetics. Numerous studies have used variation in gene expression to understand complex traits, but the underlying genomic variation that contributes to these expression changes is not well understood. In this study, we developed a framework to integrate gene expression and genotype data to identify biological differences between samples from opposing complex trait classes that are driven by expression changes and genotypic variation. This framework utilizes pathway analysis and multi-task learning to build a predictive model and discover pathways relevant to the complex trait of interest. We simulated expression and genotype data to test the predictive ability of our framework and to measure how well it uncovered pathways with genes both differentially expressed and genetically associated with a complex trait. We found that the predictive performance of the multi-task model was comparable to other similar methods. Also, methods like multi-task learning that considered enrichment analysis scores from both data sets found pathways with both genetic and expression differences related to the phenotype. We used our framework to analyze differences between estrogen receptor (ER) positive and negative breast cancer samples. An analysis of the top 15 gene sets from the multi-task model showed they were all related to estrogen, steroids, cell signaling, or the cell cycle. Although our study suggests that multi-task learning does not enhance predictive accuracy, the models generated by our framework do provide valuable biological pathway knowledge for complex traits.

15.
16.
The main aims of this study were to determine the effects of GH gene abuse/misuse in normal animals and to discover genes that could be used as candidate biomarkers for the detection of GH gene therapy abuse/misuse in humans. We determined the global gene expression profile of peripheral whole blood from normal adult male rats after long-term GH gene therapy using CapitalBio 27 K Rat Genome Oligo Arrays. Sixty-one genes were found to be differentially expressed in GH gene-treated rats 24 weeks after receiving GH gene therapy, at a two-fold higher or lower level compared to the empty vector group (p < 0.05). These genes were mainly associated with angiogenesis, oncogenesis, apoptosis, immune networks, signaling pathways, general metabolism, type I diabetes mellitus, carbon fixation, cell adhesion molecules, and cytokine-cytokine receptor interaction. The results imply that exogenous GH gene expression in normal subjects is likely to induce cellular changes in metabolism, signaling pathways, and immunity. A real-time qRT-PCR analysis of a selection of the genes confirmed the microarray data. Eight differentially expressed genes were selected as candidate biomarkers from among these 61 genes. These 8 showed five-fold higher or lower expression levels after the GH gene transduction (p < 0.05). They were then validated in real-time PCR experiments using 15 single-treated blood samples and 10 control blood samples. In summary, we detected the gene expression profiles of rat peripheral whole blood after long-term GH gene therapy and screened eight genes as candidate biomarkers based on the microarray data. This will contribute to an increased mechanistic understanding of the effects of chronic GH gene therapy abuse/misuse in normal subjects.
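The two selection thresholds mentioned above (two-fold for differential expression, five-fold for candidate biomarkers, both at p < 0.05) amount to a simple filter. The sketch below illustrates that filter only. The function name, the gene labels, and all expression values and p-values are invented, and how the study actually computed its ratios and p-values is not stated in the abstract.

```python
def select_by_fold_change(results, min_fold=2.0, alpha=0.05):
    """Keep genes that are at least `min_fold` higher or lower than control
    and whose p-value is below `alpha`.

    `results` maps a gene label to (treated_level, control_level, p_value).
    """
    selected = []
    for gene, (treated, control, p_value) in results.items():
        ratio = treated / control
        if p_value < alpha and (ratio >= min_fold or ratio <= 1.0 / min_fold):
            selected.append(gene)
    return selected

# Invented example values, not data from the study.
results = {
    "g1": (8.0, 2.0, 0.01),   # 4-fold up, significant
    "g2": (3.0, 2.5, 0.01),   # 1.2-fold, significant but too small a change
    "g3": (1.0, 6.0, 0.03),   # 6-fold down, significant
    "g4": (10.0, 2.0, 0.20),  # 5-fold up, not significant
}
two_fold = select_by_fold_change(results, min_fold=2.0)
five_fold = select_by_fold_change(results, min_fold=5.0)
print(two_fold, five_fold)
```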

17.
MOTIVATION: To understand cancer etiology, it is important to explore molecular changes in cellular processes from normal state to cancerous state. Because genes interact with each other during cellular processes, carcinogenesis-related genes may form differential co-expression patterns with other genes in different cell states. In this study, we develop a statistical method for identifying differential gene-gene co-expression patterns in different cell states. RESULTS: For efficient pattern recognition, we extend the traditional F-statistic and obtain an Expected Conditional F-statistic (ECF-statistic), which incorporates statistical information of location and correlation. We also propose a statistical method for data transformation. Our approach is applied to a microarray gene expression dataset from a prostate cancer study. For a gene of interest, our method can select other genes that have differential gene-gene co-expression patterns with this gene in different cell states. The 10 most frequently selected genes include hepsin, GSTP1 and AMACR, which have recently been proposed to be associated with prostate carcinogenesis. However, genes GSTP1 and AMACR cannot be identified by studying differential gene expression alone. By using tumor suppressor genes TP53, PTEN and RB1, we identify seven genes that also include hepsin, GSTP1 and AMACR. We show that genes associated with cancer may have differential gene-gene expression patterns with many other genes in different cell states. By discovering such patterns, we may be able to identify carcinogenesis-related genes.

18.
Global gene expression analysis using microarrays and, more recently, RNA-seq, has allowed investigators to understand biological processes at a system level. However, the identification of differentially expressed genes in experiments with small sample size, high dimensionality, and high variance remains challenging, limiting the usability of the tens of thousands of publicly available, and possibly many more unpublished, gene expression datasets. We propose a novel variable selection algorithm for ultra-low-n microarray studies using generalized linear model-based variable selection with a penalized binomial regression algorithm called penalized Euclidean distance (PED). Our method uses PED to build a classifier on the experimental data to rank genes by importance. In place of cross-validation, which is required by most similar methods but not reliable for experiments with small sample size, we use a simulation-based approach to additively build a list of differentially expressed genes from the rank-ordered list. Our simulation-based approach maintains a low false discovery rate while maximizing the number of differentially expressed genes identified, a feature critical for downstream pathway analysis. We apply our method to microarray data from an experiment perturbing the Notch signaling pathway in Xenopus laevis embryos. This dataset was chosen because it showed very little differential expression according to limma, a powerful and widely used method for microarray analysis. Our method was able to detect a significant number of differentially expressed genes in this dataset and suggest future directions for investigation. Our method is easily adaptable for analysis of data from RNA-seq and other global expression experiments with low sample size and high dimensionality.

19.
Using a measure of how differentially expressed a gene is in two biochemically/phenotypically different conditions, we can rank all genes in a microarray dataset. We have shown that the falling-off of this measure (normalized maximum likelihood in a classification model such as logistic regression) as a function of the rank is typically a power-law function. This power-law function in other similar ranked plots is known as Zipf's law, observed in many natural and social phenomena. The presence of this power-law function prevents an intrinsic cutoff point between the "important" genes and "irrelevant" genes. We have shown that similar power-law functions are also present in permuted datasets, and provide an explanation from the well-known chi-squared (χ²) distribution of likelihood ratios. We discuss the implication of this Zipf's law on gene selection in a microarray data analysis, as well as other characterizations of the ranked likelihood plots such as the rate of fall-off of the likelihood.
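One common way to check for a Zipf-like fall-off is to fit a straight line to the ranked values on a log-log scale. The sketch below does this with ordinary least squares on synthetic values. It is an illustration of the general technique under that assumption, not the fitting procedure used in the paper, and the function name and data are hypothetical.

```python
from math import exp, log

def fit_power_law(ranked_values):
    """Fit value ≈ c * rank**(-alpha) by least squares in log-log space.

    `ranked_values` must be positive and ordered from rank 1 downward.
    Returns (alpha, c).
    """
    xs = [log(rank) for rank in range(1, len(ranked_values) + 1)]
    ys = [log(value) for value in ranked_values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return -slope, exp(intercept)

# Synthetic ranked scores that follow an exact power law (invented data).
scores = [100.0 * rank ** -0.8 for rank in range(1, 201)]
alpha, c = fit_power_law(scores)
print(alpha, c)
```

Because a pure power law has no characteristic scale, the fitted curve offers no natural rank at which to separate "important" from "irrelevant" genes, which is the point the abstract makes.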

20.

Background

A combined quantitative trait loci (QTL) and microarray-based approach is commonly used to find differentially expressed genes which are then identified based on the known function of a gene in the biological process governing the trait of interest. However, a low cutoff value in individual gene analyses may result in many genes with moderate but meaningful changes in expression being missed.

Results

We modified a gene set analysis to identify intersection sets with significantly affected expression for which the changes in the individual gene sets are less significant. The gene expression profiles in liver tissues of four strains of mice from publicly available microarray sources were analyzed to detect trait-associated pathways using information on the QTL regions of blood concentrations of high-density lipoprotein (HDL) cholesterol and insulin-like growth factor 1 (IGF-1). Several metabolic pathways related to HDL levels, including lipid metabolism, ABC transporters and cytochrome P450 pathways, were detected for HDL QTL regions. Most of the pathways identified for the IGF-1 phenotype were signal transduction pathways associated with biological processes for IGF-1's regulation.

Conclusion

We have developed a method of identifying pathways associated with a quantitative trait using information on QTL. Our approach provides insights into genotype-phenotype relations at the level of biological pathways which may help to elucidate the genetic architecture underlying variation in phenotypic traits.
