Similar documents
1.
There is tremendous scientific interest in the analysis of gene expression data in clinical settings such as oncology. In this paper, we describe the importance of adjusting for confounders and other prognostic factors when selecting differentially expressed genes for follow-up validation studies. We develop two approaches to the analysis of microarray data in non-randomized clinical settings. The first extends the current significance analysis of microarrays (SAM) procedure so that other covariates are taken into account. The second is a novel covariate-adjusted regression model based on the receiver operating characteristic (ROC) curve for the analysis of gene expression data. The ideas are illustrated using data from a prostate cancer molecular profiling study.
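
The abstract does not give the form of its covariate-adjusted ROC regression, so the sketch below shows only a simple, well-known relative of the idea: the empirical AUC of one gene, and a stratified AUC that compares cases with controls only inside levels of a confounder. The data and stratum labels are hypothetical; this is not the authors' model.

```python
from collections import defaultdict


def empirical_auc(cases, controls):
    """Empirical AUC: share of case-control pairs in which the case value is larger (ties count 1/2)."""
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    score = 0.0
    for x in cases:
        for y in controls:
            if x > y:
                score += 1.0
            elif x == y:
                score += 0.5
    return score / (len(cases) * len(controls))


def stratified_auc(values, is_case, stratum):
    """Pool within-stratum AUCs, weighting each stratum by its number of case-control pairs."""
    groups = defaultdict(lambda: ([], []))  # stratum -> (case values, control values)
    for v, c, s in zip(values, is_case, stratum):
        groups[s][0 if c else 1].append(v)
    weighted, total_pairs = 0.0, 0
    for cases, controls in groups.values():
        pairs = len(cases) * len(controls)
        if pairs == 0:
            continue  # a stratum with only cases or only controls carries no comparisons
        weighted += pairs * empirical_auc(cases, controls)
        total_pairs += pairs
    if total_pairs == 0:
        raise ValueError("no stratum contains both cases and controls")
    return weighted / total_pairs


# Hypothetical gene whose expression tracks a confounder (stratum) rather than case status
values = [5.0, 5.0, 5.0, 1.0, 1.0, 1.0]
is_case = [True, True, False, True, False, False]
stratum = ["A", "A", "A", "B", "B", "B"]
pooled = empirical_auc([5.0, 5.0, 1.0], [5.0, 1.0, 1.0])  # ignores the stratum
adjusted = stratified_auc(values, is_case, stratum)  # no within-stratum effect
```

Here the pooled AUC is about 0.67 only because cases are concentrated in the high-expression stratum; within strata the gene does not discriminate, so the adjusted AUC is 0.5.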

2.
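The initiation and progression of cancer involve the aberrant expression of a large number of genes. The normalization methods currently used in gene expression profiling usually assume that only a small proportion of genes are differentially expressed in disease and that up- and down-regulated genes occur in roughly equal proportions. This widely adopted premise of normalization has not been adequately validated. By analyzing two pancreatic cancer expression profiling data sets, we found that median gene expression was significantly higher in pancreatic cancer samples than in normal samples, suggesting that the conventional normalization assumption does not hold for pancreatic cancer expression data. Using normalized data leads to many genes being wrongly identified as down-regulated and many up-regulated genes being missed. Analysis of the raw data revealed that gene expression in pancreatic cancer is characterized by widespread up-regulation, providing new clues for in-depth study of the mechanisms of pancreatic cancer initiation and progression.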
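
The failure mode described above can be reproduced with median normalization, one common form of the "most genes unchanged" assumption. The sketch below uses hypothetical log2 values for five genes; it illustrates the argument and is not the authors' analysis.

```python
from statistics import median


def median_normalize(sample):
    """Center one sample's log-scale values on its median (median normalization)."""
    m = median(sample)
    return [x - m for x in sample]


# Hypothetical log2 expression of five genes in one normal and one tumor sample
normal = [5.0, 6.0, 7.0, 8.0, 9.0]
tumor = [5.0, 6.0, 9.0, 10.0, 11.0]  # genes 3-5 truly up by 2; genes 1-2 unchanged

raw_diff = [t - n for t, n in zip(tumor, normal)]
norm_diff = [t - n for t, n in zip(median_normalize(tumor), median_normalize(normal))]
# raw_diff  -> [0.0, 0.0, 2.0, 2.0, 2.0]: the true pattern (broad up-regulation)
# norm_diff -> [-2.0, -2.0, 0.0, 0.0, 0.0]: unchanged genes now look down-regulated
```

Because most genes in the tumor sample shift upward, its median shifts too, and subtracting it moves the unchanged genes down while hiding the real up-regulation.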

3.
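Objective: To use existing research results and data to build an evaluation model for breast cancer susceptibility genes with a multi-objective evaluation method, to analyze and rank other genes closely related to known breast cancer genes, and to present the results as a network. Methods: By analyzing the existing literature and using relevant gene databases and data from previous publications, we derived a multi-objective evaluation system for breast cancer susceptibility genes, constructed an evaluation model based on the weighted-sum method, and used Cytoscape to compute the evaluation results and represent them as a network. Results: The evaluation results obtained with the multi-objective model were consistent with existing findings. The breast cancer susceptibility gene TopBP1 ranked second, and the known candidate breast cancer susceptibility gene HMMR ranked sixth. Conclusion: The proposed multi-objective evaluation model can accurately assess the relationship between candidate genes and breast cancer susceptibility. Used together with related software, the proposed evaluation method is expected to become an effective approach for research on cancer susceptibility genes.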
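
The abstract names the weighted-sum method but not its criteria, weights, or scaling. The sketch below assumes min-max scaling of each criterion and uses hypothetical gene names, criteria, and weights, purely to show how a weighted-sum ranking is formed.

```python
def weighted_sum_rank(scores, weights):
    """Rank genes by a weighted sum of min-max normalized criteria (higher raw value = better)."""
    genes = list(scores)
    n_crit = len(weights)
    lo = [min(scores[g][j] for g in genes) for j in range(n_crit)]
    hi = [max(scores[g][j] for g in genes) for j in range(n_crit)]

    def norm(x, j):
        span = hi[j] - lo[j]
        return 0.0 if span == 0 else (x - lo[j]) / span  # rescale criterion j to [0, 1]

    total = {g: sum(w * norm(scores[g][j], j) for j, w in enumerate(weights)) for g in genes}
    return sorted(genes, key=lambda g: total[g], reverse=True)


# Hypothetical candidates with two criteria: interactions with known genes, literature co-mentions
scores = {"G1": [10, 2], "G2": [4, 8], "G3": [0, 0]}
print(weighted_sum_rank(scores, [0.6, 0.4]))  # ['G1', 'G2', 'G3']
print(weighted_sum_rank(scores, [0.2, 0.8]))  # ['G2', 'G1', 'G3']
```

The two calls show the main design sensitivity of a weighted-sum model: the ranking can flip when the weights change, so the weights themselves need justification.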

4.
Gene expression studies have been widely used in an effort to identify signatures that can predict the clinical progression of cancer. In this study we focused instead on identifying gene expression differences between breast tumors and adjacent normal tissue, and between tumor subtypes classified by clinical marker status. We collected a set of 20 breast cancer tissues, each matched with adjacent pathologically normal tissue from the same patient. The cancer samples, representing each breast cancer subtype defined by estrogen receptor (ER+/-) and Her2 (+/-) status and divided into four subgroups (ER+/Her2+, ER+/Her2-, ER-/Her2+, and ER-/Her2-), were hybridized on Affymetrix HG-U133 Plus 2.0 microarrays. By comparing the cancer samples with their matched normal controls using data analysis methods from Bioconductor, we identified 3537 differentially expressed genes overall. Among the genes common to all four subgroups, we found 151 regulated genes, some of which encode known targets for breast cancer treatment. Genes unique to each subgroup instead suggested gene regulation dependent on ER/Her2 marker status. In conclusion, the results indicate that microarray studies using robust analysis of matched tumor and normal samples from the same patients can identify genes differentially expressed in breast cancer tumor subtypes even when only small numbers of samples are considered, and can further elucidate molecular features of breast cancer.
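
Matching each tumor with normal tissue from the same patient allows a paired comparison per gene. The abstract says only that Bioconductor methods were used, so the sketch below shows a plain paired t-statistic for one gene, with hypothetical values, rather than the authors' pipeline (which likely used moderated statistics).

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(tumor, normal):
    """Paired t-statistic for one gene: tumor[i] and normal[i] come from the same patient."""
    diffs = [t - n for t, n in zip(tumor, normal)]
    if len(diffs) < 2:
        raise ValueError("need at least two matched pairs")
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    return mean(diffs) / (sd / sqrt(len(diffs)))


# Hypothetical log2 expression of one gene in three matched tumor/normal pairs
t_stat = paired_t([8.0, 9.0, 10.0], [6.0, 6.0, 8.0])  # differences 2, 3, 2
```

Working on within-patient differences removes between-patient variation, which is why matched designs can detect differences even with small numbers of samples.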

5.
Microarrays have received significant attention in recent years as scientists have, first, identified factors that can reduce confidence in gene expression data obtained on these platforms and, second, sought to establish laboratory practices and a set of standards by which data are reported with integrity. Microsphere-based assays represent a new generation of diagnostics in this field, capable of providing substantial quantitative and qualitative information from gene expression profiling. However, for gene expression profiling this type of platform is still in the demonstration phase, and issues arising from comparative studies have not yet been identified in the literature. It is desirable to identify the parameters that are important in controlling the information derived from microsphere-based hybridizations used to quantify gene expression. As these are refined, a standard set of parameters will be established that must be reported when data are submitted for publication. Here we initiate this process by identifying a number of parameters, variable between studies, that we have found to be important in microsphere-based assays designed to quantify low-abundance genes.

6.
In this paper we discuss some of the statistical issues that should be considered when conducting experiments involving microarray gene expression data. We discuss statistical issues related to preprocessing the data as well as the analysis of the data. Analysis of the data is discussed in three contexts: class comparison, class prediction and class discovery. We also review the methods used in two studies that are using microarray gene expression to assess the effect of exposure to radiofrequency (RF) fields on gene expression. Our intent is to provide a guide for radiation researchers when conducting studies involving microarray gene expression data.

8.
Gene expression profiling offers a great opportunity for studying multifactorial diseases and for understanding the key role of genes in the mechanisms that drive a normal cell to a cancerous state. Single-gene analysis is insufficient to describe the complex perturbations responsible for cancer onset, progression and invasion. A deeper understanding of the mechanisms of tumorigenesis can be reached by focusing on the deregulation of gene sets or pathways rather than of individual genes. We apply two known, statistically well-founded methods for finding pathways and biological processes deregulated in pathological conditions by analyzing gene expression profiles. In particular, we measure the amount of deregulation and assess the statistical significance of predefined pathways belonging to a curated collection (Molecular Signatures Database) in a colon cancer data set. We find that pathways strongly involved in different tumors are closely connected with colon cancer. Moreover, our experimental results show that studying complex diseases through pathway analysis can highlight genes weakly connected to the phenotype, which may be difficult to detect with classical univariate statistics. Our study shows the importance of using gene sets rather than single genes for understanding the main biological processes and pathways involved in colorectal cancer, and our analysis shows that many of the genes involved in these pathways are strongly associated with colorectal tumorigenesis. From this perspective, the focus shifts from finding differentially expressed genes to identifying biological processes, cellular functions and pathways perturbed in the phenotypic conditions, by analyzing the genes co-expressed in a given pathway as a whole, taking into account possible interactions among them and, more importantly, the correlation of their expression with the phenotypic conditions.
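
The abstract does not name its two gene-set methods, so the sketch below shows only one simple competitive test: score a set by the mean of its per-gene statistics and compare it with random gene sets of the same size. The gene names and statistics are hypothetical. Note that sampling genes ignores correlation among genes, which sample-label permutation methods such as GSEA are designed to respect.

```python
import random
from statistics import mean


def gene_set_pvalue(gene_stats, gene_set, n_perm=999, seed=0):
    """Competitive gene-set test by gene sampling.

    Set score = mean per-gene statistic over members; the null distribution comes from
    random gene sets of the same size drawn from all genes measured.
    """
    members = [g for g in gene_set if g in gene_stats]
    if not members:
        raise ValueError("no gene-set members found among the measured genes")
    observed = mean(gene_stats[g] for g in members)
    pool = list(gene_stats.values())
    rng = random.Random(seed)
    hits = sum(mean(rng.sample(pool, len(members))) >= observed for _ in range(n_perm))
    return (1 + hits) / (1 + n_perm)  # add-one correction keeps p > 0


# Hypothetical per-gene statistics (e.g. t-statistics) for 20 genes
stats = {f"g{i}": float(i) for i in range(20)}
p_top = gene_set_pvalue(stats, {"g15", "g16", "g17", "g18", "g19"})  # small
p_low = gene_set_pvalue(stats, {"g0", "g1", "g2", "g3", "g4"})  # 1.0
```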

9.
The design and analysis of experiments using gene expression microarrays is a topic of considerable current research, and work is beginning to appear on the analysis of proteomics and metabolomics data by mass spectrometry and NMR spectroscopy. The literature in this area is evolving rapidly, and commercial software for analysis of array or proteomics data is rarely up to date and is essentially nonexistent for metabolomics data. In this paper, I review some of the issues that should concern any biologist planning to use such high-throughput biological assay data in an experimental investigation. Technical details are kept to a minimum and may be found in the referenced literature, as well as in the many excellent papers that space limitations prevent me from describing. There are usually a number of viable options for design and analysis of such experiments, but unfortunately there are even more non-viable ones that have been used even in the published literature. This is an area in which up-to-date knowledge of the literature is indispensable for efficient and effective design and analysis of these experiments. In general, we concentrate on relatively simple analyses, often focusing on identifying differentially expressed genes and the comparable issues in mass spectrometry and NMR spectroscopy (consistent differences in peak heights or areas, for example). Complex multivariate and pattern recognition methods also need much attention, but the issues we describe in this paper must be dealt with first. The literature on analysis of proteomics and metabolomics data is as yet sparse, so the main focus of this paper will be on methods devised for analysis of gene expression data that generalize to proteomics and metabolomics, with some specific comments near the end on analysis of metabolomics data by mass spectrometry and NMR spectroscopy.

10.
Lack of adequate statistical methods for the analysis of microarray data remains the most critical deterrent to uncovering the true potential of these promising techniques in basic and translational biological studies. The popular practice of drawing important biological conclusions from just one replicate (slide) should be discouraged. In this paper, we discuss some modern trends in statistical analysis of microarray data with a special focus on statistical classification (pattern recognition) and variable selection. In addressing these issues we consider the utility of some distances between random vectors and their nonparametric estimates obtained from gene expression data. Performance of the proposed distances is tested by computer simulations and analysis of gene expression data on two different types of human leukemia. In experimental settings, the error rate is estimated by cross-validation, while a control sample is generated in computer simulation experiments aimed at testing the proposed gene selection procedures and associated classification rules.
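
The abstract does not specify which distances between random vectors it uses. As one example of such a distance with a simple nonparametric estimate, the sketch below computes the energy distance between two samples of expression vectors; the data are hypothetical, and this should not be read as the authors' exact statistic.

```python
from itertools import product
from math import dist


def energy_distance(xs, ys):
    """V-statistic estimate of the energy distance between two samples of vectors:
    2 E|X - Y| - E|X - X'| - E|Y - Y'|, with Euclidean norms."""
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")

    def mean_dist(a, b):
        return sum(dist(p, q) for p, q in product(a, b)) / (len(a) * len(b))

    return 2 * mean_dist(xs, ys) - mean_dist(xs, xs) - mean_dist(ys, ys)


# Hypothetical two-gene expression vectors for two classes of samples
class_a = [(0.0, 0.0), (1.0, 0.0)]
print(energy_distance(class_a, class_a))  # 0.0
print(energy_distance(class_a, [(0.0, 3.0), (1.0, 3.0)]))  # positive: the classes differ
```

A distance of this kind compares whole multivariate distributions rather than single genes, which is what makes it usable for selecting gene subsets that separate classes jointly.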

11.
Sun W. Biometrics, 2012, 68(1): 1-11
RNA-seq may replace gene expression microarrays in the near future. Using RNA-seq, the expression of a gene can be estimated using the total number of sequence reads mapped to that gene, known as the total read count (TReC). Traditional expression quantitative trait locus (eQTL) mapping methods, such as linear regression, can be applied to TReC measurements after they are properly normalized. In this article, we show that eQTL mapping, by directly modeling TReC using discrete distributions, has higher statistical power than the two-step approach: data normalization followed by linear regression. In addition, RNA-seq provides information on allele-specific expression (ASE) that is not available from microarrays. By combining the information from TReC and ASE, we can computationally distinguish cis- and trans-eQTL and further improve the power of cis-eQTL mapping. Both simulation and real data studies confirm the improved power of our new methods. We also discuss the design issues of RNA-seq experiments. Specifically, we show that by combining TReC and ASE measurements, it is possible to minimize cost and retain the statistical power of cis-eQTL mapping by reducing sample size while increasing the number of sequence reads per sample. In addition to RNA-seq data, our method can also be employed to study the genetic basis of other types of sequencing data, such as chromatin immunoprecipitation followed by DNA sequencing data. In this article, we focus on eQTL mapping of a single gene using the association-based method. However, our method establishes a statistical framework for future developments of eQTL mapping methods using RNA-seq data (e.g., linkage-based eQTL mapping), and the joint study of multiple genetic markers and/or multiple genes.
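
ASE carries cis information because, in a heterozygous individual, a cis effect makes one allele contribute more reads than the other. The sketch below is only the simplest form of that idea, an exact two-sided binomial test of allelic imbalance at one heterozygous site with hypothetical read counts; it is not the joint TReC and ASE likelihood developed in the article.

```python
from math import comb


def ase_binomial_p(ref_reads, total_reads, p=0.5):
    """Exact two-sided binomial test of allelic imbalance at one heterozygous site.

    Under no cis effect, each read comes from the reference allele with probability p.
    The p-value sums the probabilities of all outcomes no more likely than the observed one.
    """
    if not 0 <= ref_reads <= total_reads:
        raise ValueError("need 0 <= ref_reads <= total_reads")
    probs = [comb(total_reads, i) * p**i * (1 - p) ** (total_reads - i) for i in range(total_reads + 1)]
    cutoff = probs[ref_reads] * (1 + 1e-7)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in probs if q <= cutoff))


print(ase_binomial_p(5, 10))  # 1.0: perfectly balanced
print(ase_binomial_p(9, 10))  # 22/1024 = 0.021484375
```

The test's power grows with the number of allele-informative reads, which is consistent with the abstract's point that more reads per sample can partly substitute for a larger sample size in cis-eQTL mapping.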

12.
In DNA microarray analysis, there is often interest in isolating a few genes that best discriminate between tissue types. This is especially important in cancer, where different clinicopathologic groups are known to vary in their outcomes and response to therapy. The identification of a small subset of gene expression patterns distinctive for tumor subtypes can help design treatment strategies and improve diagnosis. Toward this goal, we propose a methodology for the analysis of high-density oligonucleotide arrays. The gene expression measures are modeled as censored data to account for the quantification limits of the technology, and two gene selection criteria based on contrasts from an analysis of covariance (ANCOVA) model are presented. The model is formulated in a hierarchical Bayesian framework, which, in addition to making the fit of the model straightforward and computationally efficient, allows us to borrow strength across genes. The elicitation of hierarchical priors, as well as issues related to parameter identifiability and posterior propriety, are discussed in detail. We examine the performance of our proposed method on simulated data, then present a detailed case study of an endometrial cancer dataset.

13.
In this work, the application of a multivariate curve resolution procedure based on alternating least squares optimization (MCR-ALS) to the analysis of data from DNA microarrays is proposed. For this purpose, simulated and publicly available experimental data sets have been analyzed. Application of MCR-ALS, a method that operates without any training set, enabled the resolution of the information relevant to the classification of different cancer lines using a small number of components, each defined by a sample profile and a pure gene expression profile. From the resolved sample profiles, a classification of samples according to their origin is proposed. From the resolved pure gene expression profiles, a set of over- or underexpressed genes that could be related to the development of cancer was selected. Advantages of the MCR-ALS procedure over other previously proposed procedures, such as principal component analysis, are discussed.
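
The bilinear model behind MCR-ALS writes the data matrix D (samples x genes) as C @ S.T, with C holding sample profiles and S holding pure gene expression profiles, and alternates least-squares updates of the two. The sketch below assumes NumPy, imposes non-negativity by clipping (a simplification of the constrained solvers in MCR-ALS software), and takes the initial pure profiles as an input; the toy data are hypothetical.

```python
import numpy as np


def mcr_als(D, S0, n_iter=200, tol=1e-12):
    """MCR-ALS sketch: D (samples x genes) ~= C @ S.T with C, S >= 0.

    Non-negativity is imposed by clipping after each unconstrained least-squares step,
    a simplification of the constrained solvers used in MCR-ALS software.
    """
    D = np.asarray(D, dtype=float)
    S = np.clip(np.asarray(S0, dtype=float), 0.0, None)  # initial pure gene profiles (genes x k)
    prev = np.inf
    for _ in range(max(n_iter, 1)):
        C = np.clip(np.linalg.lstsq(S, D.T, rcond=None)[0].T, 0.0, None)  # sample profiles
        S = np.clip(np.linalg.lstsq(C, D, rcond=None)[0].T, 0.0, None)  # pure gene profiles
        resid = np.linalg.norm(D - C @ S.T) / np.linalg.norm(D)  # relative lack of fit
        if abs(prev - resid) < tol:
            break
        prev = resid
    return C, S, resid


# Hypothetical data: 4 samples mixing 2 pure profiles over 5 genes
C_true = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [2.0, 1.0]])
S_true = np.array([[3.0, 0.0], [2.0, 1.0], [0.0, 4.0], [1.0, 1.0], [0.0, 2.0]])
D = C_true @ S_true.T
C, S, resid = mcr_als(D, S0=D[:2].T)  # assumption: samples 0 and 1 are pure
```

No class labels enter the fit, which matches the abstract's point that MCR-ALS needs no training set; the rows of C can then be used to group samples and the columns of S to pick genes.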

14.
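In this study, we performed differential expression analysis of gene expression data from non-small cell lung carcinoma (NSCLC) and integrated the results with protein-protein interaction network (PPIN) data. We then used the Heinz search algorithm to identify NSCLC-related functional gene modules and performed functional (GO term) and pathway (KEGG) enrichment analysis of the genes in the module, with the aim of exploring the molecular mechanisms of lung cancer pathogenesis. The protein interaction network analysis yielded a functional module containing 96 genes and 117 interactions, as well as 8 candidate gene markers that play key roles in the initiation and progression of NSCLC. Enrichment analysis showed that these genes were mainly enriched in biological processes such as gene transcription catalysis and chromatin regulation, and that they play important roles in pathways including basal transcription factors, adherens junctions, the cell cycle, the Wnt signaling pathway, and HTLV-I infection. This study predicts genes and biological pathways associated with NSCLC, which may be used for the early diagnosis and early treatment of lung cancer to reduce lung cancer mortality.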

16.
COP1 (constitutive photomorphogenic 1, also known as RFWD2) is a p53-targeting E3 ubiquitin ligase containing RING-finger, coiled-coil, and WD40-repeat domains. Recent studies have shown that COP1 is overexpressed in several cancer types and that increased COP1 expression promotes cell proliferation, cell transformation, and tumor progression. In the present study, we investigated the expression and prognostic value of COP1 in primary gastric cancer. To investigate the role of COP1 in the pathogenesis of primary gastric cancer, we used real-time quantitative PCR and western blotting to examine COP1 expression in paired cancerous and adjacent noncancerous gastric tissues. COP1 mRNA (P=0.030) and protein (P=0.008) expression was higher in most tumor tissues than in the matched adjacent non-tumor tissues. Correlation analysis of protein expression revealed a negative correlation between COP1 and p53 in gastric cancer samples (P=0.005, r=-0.572). Immunohistochemical staining of gastric cancer tissues from the same patient showed high COP1 expression and low p53 expression. To further investigate the clinicopathological and prognostic roles of COP1 expression, we performed immunohistochemical analysis of 401 paraffin-embedded gastric cancer tissue blocks. High COP1 expression was significantly correlated with T stage (P=0.030), M stage (P=0.048) and TNM stage (P=0.022). Consistent with these results, high COP1 expression was significantly correlated with poor survival in gastric cancer patients (P<0.001), and Cox regression analyses showed that COP1 expression was an independent predictor of overall survival (P<0.001). Our data suggest that COP1 could play an important role in gastric cancer and might serve as a valuable prognostic marker and potential target for gene therapy in the treatment of gastric cancer.
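
Survival comparisons between high- and low-expression groups are usually based on Kaplan-Meier curves, although the abstract does not name the estimator it used. The sketch below is a plain Kaplan-Meier estimator with hypothetical follow-up times; events are 1 for death and 0 for censoring, and at tied times events are counted before censored cases.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time; events[i] is 1 for death, 0 if censored."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        while i < len(data) and data[i][0] == t:  # group all subjects sharing this time
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk  # conditional probability of surviving past t
            curve.append((t, surv))
        at_risk -= deaths + censored  # both leave the risk set after time t
    return curve


# Hypothetical follow-up (months) for one expression group
print(kaplan_meier([1, 2, 2, 3], [1, 0, 1, 1]))  # [(1, 0.75), (2, 0.5), (3, 0.0)]
```

A curve like this would be computed separately for the high- and low-COP1 groups; the Cox regression mentioned in the abstract then adjusts the comparison for other clinical variables.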

19.
Qin LX, Self SG. Biometrics, 2006, 62(2): 526-533
Identification of differentially expressed genes and clustering of genes are two important and complementary objectives addressed with gene expression data. For the differential expression question, many "per-gene" analytic methods have been proposed. These methods can generally be characterized as using a regression function to independently model the observations for each gene; various adjustments for multiplicity are then used to interpret the statistical significance of these per-gene regression models over the collection of genes analyzed. Motivated by this common structure of per-gene models, we proposed a new model-based clustering method--the clustering of regression models method, which groups genes that share a similar relationship to the covariate(s). This method provides a unified approach for a family of clustering procedures and can be applied for data collected with various experimental designs. In addition, when combined with per-gene methods for assessing differential expression that employ the same regression modeling structure, an integrated framework for the analysis of microarray data is obtained. The proposed methodology was applied to two microarray data sets, one from a breast cancer study and the other from a yeast cell cycle study.
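
The clustering of regression models method is model-based and fits the regression and the clustering jointly. As a much simpler two-step illustration of "grouping genes that share a similar relationship to the covariate", the sketch below fits a per-gene least-squares slope and then splits the slopes into two groups with one-dimensional k-means. Gene names and data are hypothetical, and this is not the authors' algorithm.

```python
from statistics import mean


def ols_slope(x, y):
    """Least-squares slope of y on a single covariate x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("covariate has no variation")
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx


def cluster_by_slope(x, expr, n_iter=100):
    """Group genes into two clusters by 1-D k-means on their per-gene regression slopes."""
    slopes = {g: ols_slope(x, y) for g, y in expr.items()}
    centers = [min(slopes.values()), max(slopes.values())]  # deterministic initialization
    labels = {}
    for _ in range(n_iter):
        labels = {g: min((0, 1), key=lambda c: abs(s - centers[c])) for g, s in slopes.items()}
        new_centers = []
        for c in (0, 1):
            members = [s for g, s in slopes.items() if labels[g] == c]
            new_centers.append(mean(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


# Hypothetical expression of four genes measured at four covariate values (e.g. time points)
x = [0, 1, 2, 3]
expr = {"up1": [0.0, 1.5, 3.0, 4.5], "up2": [0.0, 2.0, 4.0, 6.0],
        "flat1": [5.0, 5.0, 5.0, 5.0], "flat2": [1.0, 1.1, 0.9, 1.0]}
labels = cluster_by_slope(x, expr)  # flat genes in one cluster, rising genes in the other
```

The two-step version discards the uncertainty in each estimated slope, which is one reason a joint model-based approach is preferable.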

20.
Recently, several classifiers that combine primary tumor data, such as gene expression data, with secondary data sources, such as protein-protein interaction networks, have been proposed for predicting outcome in breast cancer. In these approaches, new composite features are typically constructed by aggregating the expression levels of several genes, and the secondary data sources are used to guide this aggregation. Although many studies claim that these approaches improve classification performance over single-gene classifiers, the gain in performance is difficult to assess, mainly because different breast cancer data sets and validation procedures are used to assess performance. Here we address these issues by using a large collection of six breast cancer data sets as a benchmark set and by performing an unbiased evaluation of the classification accuracies of the different approaches. Contrary to previous claims, we find that composite feature classifiers do not outperform simple single-gene classifiers. We investigate the effect of (1) the number of selected features, (2) the specific gene set from which features are selected, (3) the size of the training set and (4) the heterogeneity of the data set on the performance of composite feature and single-gene classifiers. Strikingly, we find that randomizing the secondary data sources, which destroys all biological information in them, does not degrade the performance of composite feature classifiers. Finally, we show that when a proper correction for gene set size is performed, the stability of single-gene sets is similar to that of composite feature sets. Based on these results, there is currently no reason to prefer prognostic classifiers based on composite features over single-gene classifiers for predicting outcome in breast cancer.
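
A composite feature, in the simplest form described above, aggregates the expression of several genes into one value per sample. The sketch below assumes aggregation by the per-sample mean (published methods differ, for example in sign handling or scaling) and uses hypothetical genes and values.

```python
from statistics import mean


def composite_feature(expr, genes):
    """Per-sample average expression over a gene set (one composite feature).

    expr: dict gene -> list of per-sample values, all in the same sample order.
    """
    rows = [expr[g] for g in genes if g in expr]
    if not rows:
        raise ValueError("none of the genes are present in expr")
    return [mean(sample_values) for sample_values in zip(*rows)]


# Hypothetical expression of three genes in two samples
expr = {"a": [1.0, 2.0], "b": [3.0, 4.0], "c": [10.0, 0.0]}
print(composite_feature(expr, ["a", "b"]))  # [2.0, 3.0]
```

The randomization control reported above corresponds to building the same features from gene sets drawn at random instead of from a network, and checking whether classification accuracy changes.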
