Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
Because of the great variety of preprocessing tools for two-channel expression microarray data, it is difficult to choose the most appropriate one for a given experimental setup. In our study, two independent two-channel in-house microarray experiments and a publicly available dataset were used to investigate how the choice of preprocessing methods (background correction, normalization, and duplicate-spot correlation calculation) influences the discovery of differentially expressed genes. We show that both the list of differentially expressed genes and the expression values of selected genes depend significantly on the preprocessing approach applied. The choice of normalization method had the greatest impact on the results. We propose a simple but efficient approach to increase the reliability of the results: two theoretically distinct normalization methods are applied to the same dataset, and the intersection of the resulting lists of differentially expressed genes is used to obtain a more accurate estimate of the genes that were in fact differentially expressed.
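The intersection step described in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `consensus_de_genes`, the gene identifiers, and the two method labels (`de_loess`, `de_quantile`) are hypothetical, since the abstract does not name the normalization methods it compares.

```python
def consensus_de_genes(de_lists):
    """Return the genes reported as differentially expressed by every pipeline."""
    gene_sets = [set(genes) for genes in de_lists]
    if not gene_sets:
        return set()
    return set.intersection(*gene_sets)


# Hypothetical gene lists obtained after two different normalization methods.
de_loess = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
de_quantile = ["GENE_B", "GENE_D", "GENE_E"]

consensus = consensus_de_genes([de_loess, de_quantile])
print(sorted(consensus))
```

Only genes found under both normalizations survive, which trades some sensitivity for a shorter, more conservative list.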

2.
MOTIVATION: The field of microarray data analysis is shifting emphasis from methods for identifying differentially expressed genes to methods for identifying differentially expressed gene categories. The latter approaches use a priori information about genes to group them into categories and enhance the interpretation of experiments aimed at identifying expression differences across treatments. While almost all of the existing approaches for identifying differentially expressed gene categories are practically useful, they suffer from a variety of drawbacks. Perhaps most notably, many popular tools are based exclusively on gene-specific statistics that cannot detect many types of multivariate expression change. RESULTS: We have developed a nonparametric multivariate method for identifying gene categories whose multivariate expression distribution differs across two or more conditions. We illustrate our approach and compare its performance to several existing procedures via the analysis of a real data set and a unique data-based simulation study designed to capture the challenges and complexities of practical data analysis. We show that our method has good power for differentiating between differentially expressed and non-differentially expressed gene categories, and we use a resampling-based strategy for controlling the false discovery rate when testing multiple categories. AVAILABILITY: R code (www.r-project.org) for implementing our approach is available from the first author on request.

3.
4.
Cross-species research in drug development is novel and challenging. A bivariate mixture model that uses information across two species is proposed to address the fundamental problem of identifying differentially expressed genes in microarray experiments, with the aim of improving the understanding of translation between preclinical and clinical studies in drug development. The proposed approach models the joint distribution of treatment effects estimated from independent linear models. The mixture model posits up to nine components, four of which correspond to groups in which genes are differentially expressed in both species. A comprehensive simulation to evaluate model performance and one application to a real-world data set, a mouse and human type II diabetes experiment, suggest that the proposed model, though highly structured, can handle various configurations of differential gene expression and is practically useful for identifying differentially expressed genes, especially when the differential expression produced by the treatment interventions is weak. In the mouse and human application, the proposed mixture model was able to eliminate unimportant genes and identify a list of genes that were differentially expressed in both species and could be potential gene targets for drug development.
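One reading that reproduces the counts in this abstract (nine components, four of them differentially expressed in both species) is that each gene is down-regulated, unchanged, or up-regulated in each species. The abstract does not spell this out, so the three-state labelling below is an assumption made purely for illustration.

```python
from itertools import product

# Assumed per-species states; the abstract does not name them.
STATES = ("down", "null", "up")

# Each component pairs a state in species 1 with a state in species 2.
components = list(product(STATES, repeat=2))

# Components in which the gene is differentially expressed in both species.
de_in_both = [c for c in components if "null" not in c]

print(len(components))
print(len(de_in_both))
```

Under this labelling the four shared components are (down, down), (down, up), (up, down), and (up, up).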

5.
6.
7.
Permanent atrial fibrillation (pmAF) has largely remained incurable because the available information is insufficient to explain the precise mechanisms underlying it. Microarray analysis offers a broader and unbiased approach to identifying and predicting new biological features of pmAF. To account for the unbalanced sample sizes found in most case-control microarray data, we designed an asymmetric principal component analysis algorithm and applied it to re-analyze differential gene expression data from pmAF patients and control samples in order to predict new biological features. Using the proposed method, we identified 51 differentially expressed genes, of which 42 are new findings relative to two related analyses of the same data and to existing studies. Enrichment analysis supported the reliability of the identified differentially expressed genes. Moreover, we predicted three new pmAF-related signaling pathways from the identified differentially expressed genes using the KO-Based Annotation System. Our analysis and existing studies support the view that the predicted signaling pathways may promote pmAF progression. These results warrant further experimental study. This work provides new insights into the molecular features of pmAF and has potentially important implications for understanding its molecular mechanisms.

8.
9.
10.
Multivariate exploratory tools for microarray data analysis   Cited by: 2 (self-citations: 0; citations by others: 2)
The ultimate success of microarray technology in basic and applied biological sciences depends critically on the development of statistical methods for gene expression data analysis. The most widely used tests for differential expression of genes are essentially univariate. Such tests disregard the multidimensional structure of microarray data. Multivariate methods are needed to utilize the information hidden in gene interactions and hence to provide more powerful and biologically meaningful methods for finding subsets of differentially expressed genes. The objective of this paper is to develop methods of multidimensional search for biologically significant genes, considering expression signals as mutually dependent random variables. To attain these ends, we consider the utility of a pertinent distance between random vectors and its empirical counterpart constructed from gene expression data. The distance furnishes exploratory procedures aimed at finding a target subset of differentially expressed genes. To determine the size of the target subset, we resort to successive elimination of smaller subsets resulting from each step of a random search algorithm based on maximization of the proposed distance. Different stopping rules associated with this procedure are evaluated. The usefulness of the proposed approach is illustrated with an application to the analysis of two sets of gene expression data.

11.
Fuchs B, Zhang K, Bolander ME, Sarkar G. Gene 2000, 258(1-2): 155-163
The need for rapid identification of differentially expressed genes will persist even after the complete human genomic sequence becomes available. The most popular method for identifying differentially expressed genes acquires expressed sequence tags (ESTs) from the extreme 3' non-coding end of mRNAs. Such ESTs have limitations for downstream applications. We have developed a method, termed preferential amplification of coding sequences (PACS), and applied it to identify differentially expressed coding sequence tags (dCSTs) between osteoblasts and osteosarcoma cells. PACS was achieved by PCR with one set of primers that anchor at sequences complementary to AUG sequences in mRNAs and another set of primers that anchor at a PCR-amplifiable distance from AUG sequences. An initial screen identified 103 candidate dCSTs after screening approximately 15% of the expressed genes between the two cell types. Of these sequences, 27 represent CSTs of known genes and two are from 3'-ESTs of known mRNAs. Thus, PACS identified CSTs approximately 13.5 times more often than it identified 3' ESTs, attesting to the objective of the method. Since many of the dCSTs represent known genes, their identity and potential relevance to osteosarcoma could be immediately hypothesized. Differential expression of many of the dCSTs was further demonstrated by northern blotting or RT-PCR. Since PACS does not depend on the existence of a poly(A) tail on an mRNA, it should be applicable to identifying dCSTs in both prokaryotic and eukaryotic organisms. Additionally, PACS should aid in the identification of cell-specific or tissue-specific genes and in bidirectional acquisition of cDNA sequence, enabling rapid retrieval of the full-length cDNA sequence of novel genes.

12.
13.
14.
Journal of Asia 2014, 17(1): 37-43
In this study, we analyzed the gene and miRNA expression differences between the courted virgin queen (CVQ) and non-courted virgin queen (NCVQ) of Apis mellifera using a high-throughput sequencing method. Digital Gene Expression (DGE) sequencing identified 452 differentially expressed genes, of which 90 were up-regulated and 362 were down-regulated in CVQ compared with NCVQ. Small RNA sequencing identified 27 miRNAs with significant expression differences between the two samples. Moreover, 9 of the differentially expressed genes are targets of 11 of the differentially expressed miRNAs. In addition, 47 novel miRNA candidates were predicted in the two samples. Our results provide valuable information for understanding the molecular mechanism of the transition to functional queens.

15.
16.
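Standardized Affymetrix U133A gene chips were used to detect differences in gene expression profiles between gastric cancer tissue (T) and normal gastric mucosa from the resection margin (C), and bioinformatics methods were used to analyze the chromosomal location and function of the differentially expressed genes. A total of 270 genes differed by more than 8-fold between gastric cancer and normal gastric mucosa: 157 were up-regulated (signal log ratio [SLR] ≥ 3) and 113 were down-regulated (SLR ≤ -3). Apart from 4 genes of unknown location, the differentially expressed genes were scattered across all chromosomes; chromosome 1 carried the most (26 genes, 9.8%), followed by chromosomes 11 and 19 (24 genes each, 9.1% each). Of the differentially expressed genes, 173 (65%) were on the short arm (q). By functional class, enzyme and enzyme-regulator genes were the most numerous (67 genes, 24.8%), followed by signal transduction genes (43, 15.9%), nucleic acid binding genes (17, 6.3%), transporter genes (15, 5.5%), and protein binding genes (12, 4.4%); a further 50 genes (18.5%) had unknown function. The five classes above together accounted for 56.9% of all the genes. Differentially expressed genes in gastric cancer are thus scattered across all chromosomes but are most numerous on chromosomes 1, 11, and 19. Abnormalities in genes of these five classes (enzymes and enzyme regulators, signal transduction, nucleic acid binding, transporters, protein binding) make them important genes for future gastric cancer research.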

17.
Using Affymetrix U133A oligonucleotide microarrays, we screened for genes differentially expressed between gastric cancer (T) and normal gastric mucosa (C) and characterized their chromosomal locations by bioinformatics. A total of 270 genes differed in expression level by more than eightfold: 157 were up-regulated (Signal Log Ratio [SLR] ≥ 3) and 113 were down-regulated (SLR ≤ -3). Except for four genes with unknown localization, the genes were scattered across every chromosome. However, chromosome 1 contained the most differentially expressed genes (26 genes, or 9.8%), followed by chromosomes 11 and 19 (24 genes each, or 9.1%). These genes were also more likely to be on the short arm of the chromosome (q), which carried 173 (65%). Classified by function, the largest group (67 genes, 24.8%) comprised enzymes and their regulators, followed by signal transduction genes (43 genes, 15.9%). The third, fourth, and fifth groups were nucleic acid binding genes (17, 6.3%), transporter genes (15, 5.5%), and protein binding genes (12, 4.4%). Together these five groups made up 56.9% of all the differentially expressed genes. A further 50 genes (18.5%) were of unknown function. We conclude that differentially expressed genes in gastric cancer are scattered across the genome but are most numerous on chromosomes 1, 11, and 19, and that abnormalities in genes of the five functional groups make them important candidates for further study of gastric cancer.
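The abstract pairs an eightfold expression difference with |SLR| ≥ 3, which is consistent with the signal log ratio being a base-2 logarithm. The sketch below assumes exactly that; the function names `slr_to_fold_change` and `classify_by_slr` are hypothetical helpers, not part of any Affymetrix software.

```python
def slr_to_fold_change(slr):
    """Convert a signal log ratio to a fold change, assuming a base-2 logarithm."""
    return 2.0 ** abs(slr)


def classify_by_slr(slr, threshold=3.0):
    """Label a gene using the SLR cutoffs quoted in the abstract (>= 3 or <= -3)."""
    if slr >= threshold:
        return "up"
    if slr <= -threshold:
        return "down"
    return "unchanged"


for slr in (3.0, -3.5, 1.2):
    print(slr, slr_to_fold_change(slr), classify_by_slr(slr))
```

With this convention an SLR of exactly 3 corresponds to an 8-fold change, matching the threshold stated in the abstract.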

18.
MOTIVATION: Currently most methods for identifying differentially expressed genes fall into the category of so-called single-gene analysis, performing hypothesis testing on a gene-by-gene basis. In a single-gene analysis, the variability of each gene must be estimated to determine whether the gene is differentially expressed. Poor accuracy of variability estimation makes it difficult to identify genes with small fold-changes unless a very large number of replicate experiments are performed. RESULTS: We propose a method that avoids the difficult task of estimating variability for each gene, while reliably identifying a group of differentially expressed genes with low false discovery rates, even when the fold-changes are very small. In this article, a new characterization of differentially expressed genes is established based on a theorem about the distribution of ranks of genes sorted by (log) ratios within each array. This rank-based characterization of differentially expressed genes is an example of all-gene analysis rather than single-gene analysis. We apply the method to a cDNA microarray dataset, and many genes with low fold-changes (as low as 1.3-fold) are reliably identified without carrying out hypothesis testing on a gene-by-gene basis. The false discovery rate is estimated in two different ways that reflect the variability from all the genes without the complications related to multiple hypothesis testing. We also provide some comparisons between our approach and methods based on single-gene analysis. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
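The abstract does not state its theorem, so the following is only an illustration of the general idea of working with within-array ranks rather than per-gene variance estimates; it is not the authors' procedure. The selection rule (a gene must fall in the top `top_k` ranks on every array) and all names are assumptions. The intuition behind it: if a gene's rank were uniformly random and independent across m arrays of n genes, it would land in the top k on all of them with probability (k/n)^m.

```python
def within_array_ranks(log_ratios):
    """Rank genes inside each array; rank 1 is the smallest log ratio.

    log_ratios is a list of arrays, each a list of per-gene log ratios
    given in the same gene order.
    """
    all_ranks = []
    for array in log_ratios:
        order = sorted(range(len(array)), key=lambda gene: array[gene])
        ranks = [0] * len(array)
        for position, gene in enumerate(order, start=1):
            ranks[gene] = position
        all_ranks.append(ranks)
    return all_ranks


def consistently_top_ranked(log_ratios, top_k):
    """Return indices of genes ranked among the top_k largest ratios on every array."""
    all_ranks = within_array_ranks(log_ratios)
    n_genes = len(log_ratios[0])
    cutoff = n_genes - top_k  # a rank above this value is within the top_k
    return [g for g in range(n_genes) if all(r[g] > cutoff for r in all_ranks)]


# Hypothetical log ratios for 5 genes measured on 3 replicate arrays.
arrays = [
    [0.1, 0.5, -0.2, 0.4, 0.0],
    [0.0, 0.6, -0.1, 0.3, 0.1],
    [0.2, 0.4, -0.3, 0.5, -0.1],
]
selected = consistently_top_ranked(arrays, top_k=2)
print(selected)
```

Genes 1 and 3 have modest log ratios but occupy the top two ranks on all three arrays, which is the kind of consistency a rank-based view can exploit.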

19.
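Using IPI and GenBank, we collected 5,333 apoptosis-related gene sequences and, by integrating and filtering them against defined criteria, obtained 1,384 apoptosis-related genes. Analyses of subcellular localization, significance of tissue-specific differential expression, prediction of natural sense/antisense RNA pairs, mapping of gene clusters onto pathways, and protein-protein interactions revealed that some genes are significantly differentially expressed in only one tissue, some form natural sense/antisense RNA pairs, and some gene clusters lie on several pathways at once. We also built an oligonucleotide microarray of apoptosis-related genes and used it to screen a pair of HeLa samples taken before and after transfection with a NAIF1 expression plasmid, identifying 24 differentially expressed genes. Apoptosis induced by NAIF1 overexpression was accompanied by significant expression changes in genes including PAX2, PDCD8, PDCD10, DFFA, and CASP7. In addition, U58668, an mRNA with no gene or protein annotation that is up-regulated when camptothecin induces apoptosis in the U937 cell line, was also up-regulated in HeLa cells undergoing apoptosis induced by NAIF1 overexpression (the data are available at http://gpcrome.cbi.pku.edu.cn:2005/chip).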

20.
ABSTRACT: BACKGROUND: A common task in analyzing microarray data is to determine which genes are differentially expressed across two (or more) kinds of tissue samples or samples subjected to different experimental conditions. Several statistical methods have been proposed to accomplish this goal, generally based on measures of distance between classes. It is well known that biological samples are heterogeneous because of factors such as molecular subtypes or genetic background that are often unknown to the experimenter. For instance, in experiments that involve molecular classification of tumors it is important to identify significant subtypes of cancer. Bimodal or multimodal distributions often reflect the presence of mixtures of subsamples. Consequently, genes that are differentially expressed in sample subgroups can be missed if the usual statistical approaches are used. In this paper we propose a new graphical tool that identifies not only genes with up and down regulation, but also genes with differential expression in different subclasses, which are usually missed by current statistical methods. This tool is based on two measures of distance between samples, namely the overlapping coefficient (OVL) between two densities and the area under the receiver operating characteristic (ROC) curve. The methodology proposed here was implemented in the open-source R software. RESULTS: This method was applied to a publicly available dataset, as well as to a simulated dataset. We compared our results with those obtained using some of the standard methods for detecting differentially expressed genes, namely Welch t-statistic, fold change (FC), rank products (RP), average difference (AD), weighted average difference (WAD), moderated t-statistic (modT), intensity-based moderated t-statistic (ibmT), significance analysis of microarrays (samT) and area under the ROC curve (AUC). On both datasets all differentially expressed genes with bimodal or multimodal distributions were not selected by all standard selection procedures. We also compared our results with (i) area between ROC curve and rising area (ABCR) and (ii) the test for not proper ROC curves (TNRC). We found our methodology more comprehensive, because it detects both bimodal and multimodal distributions and allows different variances in the two samples. Another advantage of our method is that the behavior of different kinds of differentially expressed genes can be analyzed graphically. CONCLUSION: Our results indicate that the arrow plot represents a new flexible and useful tool for the analysis of gene expression profiles from microarrays.
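The two quantities this abstract builds on, the AUC and the overlapping coefficient, can be approximated for a single gene as follows. This is a rough sketch, not the authors' R implementation: the AUC is computed by pairwise comparison, and the OVL uses shared histogram bins as a stand-in for whatever density estimate the paper employs. The function names, bin count, and expression values are all assumptions.

```python
def auc(controls, cases):
    """Empirical AUC: chance that a case value exceeds a control value (ties count half)."""
    wins = 0.0
    for x in controls:
        for y in cases:
            if y > x:
                wins += 1.0
            elif y == x:
                wins += 0.5
    return wins / (len(controls) * len(cases))


def ovl(controls, cases, bins=10):
    """Histogram approximation of the overlapping coefficient between two samples."""
    lo = min(min(controls), min(cases))
    hi = max(max(controls), max(cases))
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 when every value is identical

    def proportions(values):
        counts = [0] * bins
        for v in values:
            index = min(int((v - lo) / width), bins - 1)
            counts[index] += 1
        return [c / len(values) for c in counts]

    p, q = proportions(controls), proportions(cases)
    return sum(min(a, b) for a, b in zip(p, q))


# Hypothetical gene: half of the cases are down-regulated, half are up-regulated.
controls = [5.0, 5.1, 4.9, 5.2, 4.8, 5.0]
cases = [3.0, 3.1, 2.9, 7.0, 7.1, 6.9]

print(auc(controls, cases))
print(ovl(controls, cases))
```

For this bimodal gene the AUC sits at 0.5, so an AUC-only ranking would treat it as uninformative, while the low overlap shows that the two groups barely share any expression range. Examining both measures together is what lets such genes stand out.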
