1.

Statistical methods on detecting differentially expressed genes for RNA-seq data

Z Chen J Liu HK Ng S Nadarajah HL Kaufman JY Yang Y Deng 《BMC systems biology》2011,5(Z3):S1

2.

Identification of cancer subtypes from single-cell RNA-seq data using a consensus clustering method

Yanglan Gan Ning Li Guobing Zou Yongchang Xin Jihong Guan 《BMC medical genomics》2018,11(6):117

Background

Human cancers are complex ecosystems composed of cells with distinct molecular signatures. Such intratumoral heterogeneity poses a major challenge to cancer diagnosis and treatment. Recent advances in single-cell techniques such as scRNA-seq have brought unprecedented insights into cellular heterogeneity. A resulting computational challenge is to cluster high-dimensional, noisy datasets that contain substantially fewer cells than genes.

Methods

In this paper, we introduce conCluster, a consensus clustering framework for cancer subtype identification from single-cell RNA-seq data. Using an ensemble strategy, conCluster fuses multiple basic partitions into consensus clusters.

Results

Applied to real cancer scRNA-seq datasets, conCluster can more accurately detect cancer subtypes than the widely used scRNA-seq clustering methods. Further, we conducted co-expression network analysis for the identified melanoma subtypes.

Conclusions

Our analysis demonstrates that these subtypes exhibit distinct gene co-expression networks and significant gene sets with different functional enrichment.
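The idea of fusing several basic partitions into consensus clusters can be illustrated with a co-association matrix. The sketch below is only an illustration of that general ensemble idea, not the conCluster algorithm itself: the function names, the 0.5 agreement threshold, and the use of connected components to form the final clusters are all assumptions made for this example.

```python
from itertools import combinations


def co_association_counts(partitions):
    """Count, for every pair of cells, how many partitions co-cluster them."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return counts


def consensus_clusters(partitions, threshold=0.5):
    """Merge cells that share a cluster in more than `threshold` of the partitions.

    Hypothetical fusion rule: cells are linked when their agreement fraction
    exceeds the threshold, and linked cells form one consensus cluster.
    """
    n = len(partitions[0])
    counts = co_association_counts(partitions)
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if counts[i][j] / len(partitions) > threshold:
            parent[find(i)] = find(j)

    # Relabel the union-find roots as consecutive integers 0, 1, 2, ...
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]


# Three base partitions of five cells; they disagree about cell 2.
base_partitions = [
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0],
]
print(consensus_clusters(base_partitions))
```

In this toy input, cell 2 is grouped with cells 3 and 4 in two of the three partitions, so the majority rule assigns it to their cluster.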

3.

A comparison of statistical methods for detecting differentially expressed genes from RNA-seq data

Kvam VM Liu P Si Y 《American journal of botany》2012,99(2):248-256

RNA-seq technologies are quickly revolutionizing genomic studies, and statistical methods for RNA-seq data are under continuous development. Timely review and comparison of the most recently proposed statistical methods will provide a useful guide for choosing among them for data analysis. Particular interest surrounds the ability to detect differential expression (DE) in genes. Here we compare four recently proposed statistical methods, edgeR, DESeq, baySeq, and a method with a two-stage Poisson model (TSPM), through a variety of simulations that were based on different distribution models or real data. We compared the ability of these methods to detect DE genes in terms of the significance ranking of genes and false discovery rate control. All methods compared are implemented in freely available software. We also discuss the availability and functions of the currently available versions of these software packages.

4.

Ranking analysis of microarray data: a powerful method for identifying differentially expressed genes

Tan YD Fornage M Fu YX 《Genomics》2006,88(6):846-854

Microarray technology provides a powerful tool for profiling the expression of thousands of genes simultaneously, which makes it possible to explore the molecular and metabolic etiology of the development of a complex disease under study. However, classical statistical methods and technologies are not readily applicable to microarray data. Therefore, it is necessary to develop powerful methods for large-scale statistical analyses. In this paper, we describe a novel method called Ranking Analysis of Microarray Data (RAM). RAM is a large-scale two-sample t-test method based on comparisons between a set of ranked T statistics and a set of ranked Z values (a set of ranked estimated null scores). The Z values are yielded by a "randomly splitting" approach instead of a "permutation" approach, and a two-simulation strategy is used to estimate the proportion of genes identified by chance, i.e., the false discovery rate (FDR). The results obtained from simulated and observed microarray data show that RAM is more efficient than Significance Analysis of Microarrays in identifying differentially expressed genes and estimating the FDR under undesirable conditions such as a large fudge factor, a small sample size, or a mixture distribution of noise.

5.

Venn Mapping: clustering of heterologous microarray data based on the number of co-occurring differentially expressed genes

Smid M Dorssers LC Jenster G 《Bioinformatics (Oxford, England)》2003,19(16):2065-2071

MOTIVATION: To evaluate microarray data, clustering is widely used to group biological samples or genes. However, problems arise when comparing heterologous databases. As the clustering algorithm searches for similarities between experiments, it will most likely first separate the data sets, masking relationships that exist between samples from different databases. RESULTS: We developed a program, Venn Mapper, to calculate the statistical significance of the number of co-occurring differentially expressed genes in any two experiments. For proof of principle, we analysed a heterologous data set of 170 microarrays including breast and prostate cancer microarray analyses. Significant overlap was found in an unsupervised analysis between metastasized prostate cancer and metastasized breast cancer and BRCA mutated breast cancer. A comparison between single microarray data and the averaged breast and prostate data sets was also evaluated. This analysis suggests that genes more highly expressed in stromal cells are also implicated in metastatic prostate cancer and BRCA mutated breast cancer. The Venn Mapper program identifies overlaps between samples from heterologous data sets and directly extracts the genes responsible for the overlap. From this information novel biological hypotheses may be addressed. AVAILABILITY: Venn Mapper is freely available on http://www.erasmusmc.nl/gatcplatform. SUPPLEMENTARY INFORMATION: http://www.erasmusmc.nl/gatcplatform/vennmapper.html.

6.

Adaptive thresholds to detect differentially expressed genes in microarray data

Fukuoka Y Inaoka H Noshiro M 《Bioinformation》2011,7(1):33-37

To detect changes in gene expression data from microarrays, a fixed threshold for fold difference is widely used. However, a threshold value that is appropriate for highly expressed genes is not guaranteed to be suitable for lowly expressed genes. In this study, aiming to detect truly differentially expressed genes across a wide expression range, we propose an adaptive threshold method (AT). The adaptive thresholds, which have different values for different expression levels, are calculated based on two measurements under the same condition. The sensitivity, specificity and false discovery rate (FDR) of AT were investigated by simulations. The sensitivity and specificity under various noise conditions were greater than 89.7% and 99.32%, respectively. The FDR was smaller than 0.27. These results demonstrate the reliability of the method.
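The notion of an expression-dependent threshold derived from two measurements under the same condition can be sketched as follows. This is a hypothetical illustration, not the AT procedure from the paper: the equal-size binning by mean log2 expression, the 0.95 quantile, and all function names are assumptions chosen for the example.

```python
import math


def adaptive_thresholds(rep_a, rep_b, n_bins=2, quantile=0.95):
    """Estimate one log2 fold-difference threshold per expression bin.

    rep_a and rep_b are two measurements of the same genes under the same
    condition, so any difference between them is treated as noise.
    """
    points = sorted(
        ((math.log2(a) + math.log2(b)) / 2.0, abs(math.log2(a) - math.log2(b)))
        for a, b in zip(rep_a, rep_b)
    )
    size = math.ceil(len(points) / n_bins)
    edges, thresholds = [], []
    for start in range(0, len(points), size):
        chunk = points[start:start + size]
        diffs = sorted(d for _, d in chunk)
        idx = min(len(diffs) - 1, int(quantile * len(diffs)))
        edges.append(chunk[-1][0])      # upper expression level of this bin
        thresholds.append(diffs[idx])   # noise quantile within this bin
    return edges, thresholds


def threshold_for(level, edges, thresholds):
    """Look up the threshold of the first bin whose upper edge covers `level`."""
    for edge, thr in zip(edges, thresholds):
        if level <= edge:
            return thr
    return thresholds[-1]


def is_differential(x, y, edges, thresholds):
    """Flag a gene whose log2 fold difference exceeds its bin's threshold."""
    level = (math.log2(x) + math.log2(y)) / 2.0
    return abs(math.log2(x) - math.log2(y)) > threshold_for(level, edges, thresholds)


# Toy replicates: noisy at low expression, stable at high expression.
edges, thresholds = adaptive_thresholds([4, 8, 1024, 2048], [8, 4, 1024, 2048])
print(is_differential(4, 8, edges, thresholds))        # low expression, 2-fold
print(is_differential(1024, 2048, edges, thresholds))  # high expression, 2-fold
```

In the toy data the same two-fold change is treated as noise at low expression but flagged at high expression, which is the behaviour a single fixed threshold cannot provide.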

7.

Identifying differentially expressed genes in meta-analysis via Bayesian model-based clustering

Jung YY Oh MS Shin DW Kang SH Oh HS 《Biometrical journal. Biometrische Zeitschrift》2006,48(3):435-450

A Bayesian model-based clustering approach is proposed for identifying differentially expressed genes in meta-analysis. A Bayesian hierarchical model is used as a scientific tool for combining information from different studies, and a mixture prior is used to separate differentially expressed genes from non-differentially expressed genes. Posterior estimation of the parameters and missing observations is performed using a simple Markov chain Monte Carlo method. From the estimated mixture model, useful measures of the significance of a test, such as the Bayesian false discovery rate (FDR), the local FDR (Efron et al., 2001), and the integration-driven discovery rate (IDR; Choi et al., 2003), can be easily computed. The model-based approach is also compared with commonly used permutation methods, and it is shown that the model-based approach is superior to the permutation methods when there are excessive under-expressed genes compared to over-expressed genes or vice versa. The proposed method is applied to four publicly available prostate cancer gene expression data sets and simulated data sets.

8.

Microarray data analysis: a practical approach for selecting differentially expressed genes

David M Mutch Alvin Berger Robert Mansourian Andreas Rytz Matthew-Alan Roberts 《Genome biology》2001,2(12):preprint00-29

Background

The biomedical community is rapidly developing new methods of data analysis for microarray experiments, with the goal of establishing new standards to objectively process the massive datasets produced from functional genomic experiments. Each microarray experiment measures thousands of genes simultaneously, producing an unprecedented amount of biological information across increasingly numerous experiments; however, in general, only a very small percentage of the genes present on any given array are identified as differentially regulated. The challenge then is to process this information objectively and efficiently in order to obtain knowledge of the biological system under study and to compare information gained across multiple experiments. In this context, systematic and objective mathematical approaches, which are simple to apply across a large number of experimental designs, become fundamental to correctly handling the mass of data and to understanding the true complexity of the biological systems under study.

9.

An efficient method to identify differentially expressed genes in microarray experiments

Qin H Feng T Harding SA Tsai CJ Zhang S 《Bioinformatics (Oxford, England)》2008,24(14):1583-1589

10.

A weighted average difference method for detecting differentially expressed genes from microarray data

Koji Kadota Yuji Nakai Kentaro Shimizu 《Algorithms for molecular biology : AMB》2008,3(1):1-12

11.

Nonparametric methods for identifying differentially expressed genes in microarray data (cited by: 11 in total; self-citations: 0; citations by others: 11)

Troyanskaya OG Garber ME Brown PO Botstein D Altman RB 《Bioinformatics (Oxford, England)》2002,18(11):1454-1461

MOTIVATION: Gene expression experiments provide a fast and systematic way to identify disease markers relevant to clinical care. In this study, we address the problem of robust identification of differentially expressed genes from microarray data. Differentially expressed genes, or discriminator genes, are genes with significantly different expression in two user-defined groups of microarray experiments. We compare three model-free approaches: (1) a nonparametric t-test, (2) the Wilcoxon (or Mann-Whitney) rank sum test, and (3) a heuristic method based on high Pearson correlation to a perfectly differentiating gene ('ideal discriminator method'). We systematically assess the performance of each method based on simulated and biological data under varying noise levels and p-value cutoffs. RESULTS: All methods exhibit very low false positive rates and identify a large fraction of the differentially expressed genes in simulated data sets with noise levels similar to those of actual data. Overall, the rank sum test appears most conservative, which may be advantageous when the computationally identified genes need to be tested biologically. However, if a more inclusive list of markers is desired, a higher p-value cutoff or the nonparametric t-test may be appropriate. When applied to lung tumor and lymphoma data sets, the methods identify biologically relevant differentially expressed genes that allow clear separation of the groups in question. Thus the methods described and evaluated here provide a convenient and robust way to identify differentially expressed genes for further biological and clinical analysis.
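Two of the model-free scores named above can be written down compactly. The sketch below computes a rank sum statistic (with tied values given their average rank) and the Pearson correlation of a gene's expression to an idealized 0/1 group indicator. It is a minimal illustration under the assumption that the 'ideal discriminator' can be modelled as that 0/1 vector; the function names are made up for this example, and the authors' p-value computation is not reproduced.

```python
import math


def rank_sum(expr, groups, target=1):
    """Sum of 1-based ranks of the samples in `target`; ties get the average rank."""
    order = sorted(range(len(expr)), key=lambda i: expr[i])
    ranks = [0.0] * len(expr)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of samples tied with position i.
        while j + 1 < len(order) and expr[order[j + 1]] == expr[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return sum(r for r, g in zip(ranks, groups) if g == target)


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ideal_discriminator_score(expr, groups):
    """Correlation of a gene to a hypothetical perfectly differentiating gene."""
    return pearson(expr, [float(g) for g in groups])


expression = [1, 2, 3, 10, 11, 12]
group_labels = [0, 0, 0, 1, 1, 1]
print(rank_sum(expression, group_labels))
print(ideal_discriminator_score(expression, group_labels))
```

A gene whose values are fully separated between the two groups attains the largest possible rank sum for the higher group and a correlation close to 1.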

12.

A mixture model approach to detecting differentially expressed genes with microarray data (cited by: 4 in total; self-citations: 0; citations by others: 4)

Pan W Lin J Le CT 《Functional & integrative genomics》2003,3(3):117-124

An exciting biological advancement over the past few years is the use of microarray technologies to measure simultaneously the expression levels of thousands of genes. The bottleneck now is how to extract useful information from the resulting large amounts of data. An important and common task in analyzing microarray data is to identify genes with altered expression under two experimental conditions. We propose a nonparametric statistical approach, called the mixture model method (MMM), to handle the problem when there are a small number of replicates under each experimental condition. Specifically, we propose estimating the distributions of a t-type test statistic and its null statistic using finite normal mixture models. A comparison of these two distributions by means of a likelihood ratio test, or simply using the tail distribution of the null statistic, can identify genes with significantly changed expression. Several methods are proposed to effectively control the false positives. The methodology is applied to a data set containing expression levels of 1,176 genes of rats with and without pneumococcal middle ear infection.

13.

AffyMiner: mining differentially expressed genes and biological knowledge in GeneChip microarray data

Lu G Nguyen TV Xia Y Fromm M 《BMC bioinformatics》2006,7(Z4):S26

14.

Positional clustering of differentially expressed genes on human chromosomes 20, 21 and 22

Mégy K Audic S Claverie JM 《Genome biology》2003,4(2):P1

Background

Clusters of co-expressed genes are known in prokaryotes (operons) and were recently described in several eukaryotic organisms, including humans. According to some studies, these clusters consist of housekeeping genes, whereas other studies suggest that these clustered genes exhibit similar tissue specificity. Here we further explore the relationship between co-expression and chromosomal co-localization in the human genome by analyzing the expression status of the genes along the best-annotated chromosomes 20, 21 and 22.

15.

Radiation hybrid mapping of 70 rat genes from a data set of differentially expressed genes

Caroline A. Wallace Saira Ali Anne M. Glazier Penny J. Norsworthy Danilo C. Carlos James Scott Tom C. Freeman Lawrence W. Stanton Anne E. Kwitek Timothy J. Aitman 《Mammalian genome》2002,13(4):194-197

The spontaneously hypertensive rat (SHR) is a model of human essential hypertension. Increased blood pressure in SHR is accompanied by other risk factors for cardiovascular disease, including insulin resistance and dyslipidemia. DNA microarray studies identified over 200 differentially expressed genes and ESTs between SHR and normotensive control rats. These clones represent candidate genes that may underlie previously detected QTLs in SHR. This study made use of the publication of two whole-genome maps to identify positional QTL candidates. Radiation hybrid (RH) mapping was used to determine the chromosomal locations of 70 rat genes and ESTs from this dataset. Most of the locations are novel, but in five cases we identified a definitive map location for genes previously mapped by somatic cell hybrids and/or linkage analysis. Genes for which the mouse genome map location was already determined mapped to syntenic segments in the rat genome map, except for two rat genes whose map locations confirmed previous findings. Where synteny comparisons could be made only with the human, 74% of the genes mapped in this study lay in a conserved syntenic segment. Chromosomal localisation of these mouse and human orthologs to syntenic segments produces a high level of confidence in the data presented in this study. The data provide new map locations for rat genes and will aid efforts to advance the rat genome map. The data may also be used to prioritize candidate QTL genes in SHR and other rat strains on the basis of their map location.

16.

CIT: identification of differentially expressed clusters of genes from microarray data (cited by: 3 in total; self-citations: 0; citations by others: 3)

Rhodes DR Miller JC Haab BB Furge KA 《Bioinformatics (Oxford, England)》2002,18(1):205-206

Cluster Identification Tool (CIT) is a microarray analysis program that identifies differentially expressed genes. Following division of experimental samples based on a parameter of interest, CIT uses a statistical discrimination metric and permutation analysis to identify clusters of genes or individual genes that best differentiate between the experimental groups. CIT integrates with the freely available CLUSTER and TREEVIEW programs to form a more complete microarray analysis package.
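Permutation analysis of a discrimination metric can be illustrated with a small exact test. The sketch below is not CIT's implementation: the absolute difference in group means stands in for the unspecified discrimination metric, all possible group relabelings are enumerated rather than sampled, and the function names are hypothetical.

```python
from itertools import combinations


def mean_difference(a, b):
    """Assumed discrimination metric: absolute difference of group means."""
    return abs(sum(a) / len(a) - sum(b) / len(b))


def exact_permutation_p(values, group_a_idx):
    """Fraction of relabelings whose metric is at least the observed one.

    Enumerates every way to choose len(group_a_idx) samples as group A,
    which is feasible only for small sample sizes.
    """
    n = len(values)
    k = len(group_a_idx)

    def metric(idx):
        chosen = set(idx)
        group_a = [values[i] for i in range(n) if i in chosen]
        group_b = [values[i] for i in range(n) if i not in chosen]
        return mean_difference(group_a, group_b)

    observed = metric(group_a_idx)
    splits = list(combinations(range(n), k))
    # Small tolerance so that floating-point ties count as "at least as extreme".
    extreme = sum(1 for idx in splits if metric(idx) >= observed - 1e-12)
    return extreme / len(splits)


# Six samples; the first three form group A.
print(exact_permutation_p([1, 2, 3, 10, 11, 12], [0, 1, 2]))
```

With three samples per group there are only 20 relabelings, so the smallest attainable two-sided p-value is 2/20; larger designs would typically sample random relabelings instead of enumerating them.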

17.

SDImpute: A statistical block imputation method based on cell-level and gene-level information for dropouts in single-cell RNA-seq data

Jing Qi Yang Zhou Zicen Zhao Shuilin Jin 《PLoS computational biology》2021,17(6)

Single-cell RNA sequencing (scRNA-seq) technologies measure gene expression at single-cell resolution and provide a tool for exploring cell heterogeneity and cell types. Because of the low number of mRNA copies extracted per cell, scRNA-seq data exhibit a large number of dropouts, which hinders downstream analysis of the data. We propose a statistical method, SDImpute (Single-cell RNA-seq Dropout Imputation), to implement block imputation for dropout events in scRNA-seq data. SDImpute automatically identifies the dropout events based on the gene expression levels and the variations of gene expression across similar cells and similar genes, and it implements block imputation for dropouts by utilizing gene expression unaffected by dropouts from similar cells. In the experiments, the results on simulated and real datasets suggest that SDImpute is an effective tool to recover the data and preserve the heterogeneity of gene expression across cells. Compared with state-of-the-art imputation methods, SDImpute improves the accuracy of downstream analysis including clustering, visualization, and differential expression analysis.

18.

Comparative transcriptomic analysis to identify differentially expressed genes in fat tissue of adult Berkshire and Jeju Native Pig using RNA-seq (cited by: 1 in total; self-citations: 0; citations by others: 1)

Simrinder Singh Sodhi Won Cheoul Park Mrinmoy Ghosh Jin Nam Kim Neelesh Sharma Kwang Yun Shin In Cheol Cho Youn Chul Ryu Sung Jong Oh Sung Hoon Kim Ki-Duk Song Sang Pyo Hong Seo Ae Cho Hee Bal Kim Dong Kee Jeong 《Molecular biology reports》2014,41(9):6305-6315

19.

A generalized likelihood ratio test to identify differentially expressed genes from microarray data

Wang S Ethier S 《Bioinformatics (Oxford, England)》2004,20(1):100-104

MOTIVATION: Microarray technology has emerged as a powerful tool in life science. One major application of microarray technology is to identify differentially expressed genes under various conditions. Currently, the statistical methods used to analyze microarray data are generally unsatisfactory, mainly due to a lack of understanding of the distribution and error structure of microarray data. RESULTS: We develop a generalized likelihood ratio (GLR) test based on the two-component model proposed by Rocke and Durbin to identify differentially expressed genes from microarray data. Simulation studies show that the GLR test is more powerful than commonly used methods, such as the fold-change method and the two-sample t-test. When applied to microarray data, the GLR test identifies more differentially expressed genes than the t-test, has a lower false discovery rate and shows more consistency over independently repeated experiments. AVAILABILITY: The approach is implemented in software called GLR, which is freely available for download at http://www.cc.utah.edu/~jw27c60

20.

Principal components analysis based methodology to identify differentially expressed genes in time-course microarray data

Sudhakar Jonnalagadda Rajagopalan Srinivasan 《BMC bioinformatics》2008,9(1):267

Background

Time-course microarray experiments are increasingly used to characterize dynamic biological processes. In these experiments, the goal is to identify genes that are differentially expressed in time-course data measured under different biological conditions. These differentially expressed genes can reveal the changes in a biological process caused by the change in condition, which is essential for understanding differences in dynamics.