Similar literature
20 similar documents found
1.

Background

Massively parallel cDNA sequencing (RNA-seq) experiments are gradually superseding microarrays in quantitative gene expression profiling. However, many biologists are uncertain about the choice of differentially expressed gene (DEG) analysis methods and the validity of cost-saving sample pooling strategies for their RNA-seq experiments. Hence, we performed experimental validation of DEGs identified by Cuffdiff2, edgeR, DESeq2 and Two-stage Poisson Model (TSPM) in an RNA-seq experiment involving mice amygdalae micro-punches, using high-throughput qPCR on independent biological replicate samples. Moreover, we sequenced RNA-pools and compared their results with sequencing corresponding individual RNA samples.

Results

The false-positive rate of Cuffdiff2 and the false-negative rates of DESeq2 and TSPM were high. Among the four DEG analysis methods investigated, the sensitivity and specificity of edgeR were relatively high. We documented pooling bias and found that the DEGs identified in pooled samples had low positive predictive values.

Conclusions

Our results highlighted the need for combined use of more sensitive DEG analysis methods and high-throughput validation of identified DEGs in future RNA-seq experiments. They indicated limited utility of sample pooling strategies for RNA-seq in similar setups and supported increasing the number of biological replicate samples.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1767-y) contains supplementary material, which is available to authorized users.
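
The validation measures named in entry 1 (sensitivity, specificity and positive predictive value) can be illustrated with a minimal sketch. It assumes each gene called by a DEG method has been cross-tabulated against a validation result such as qPCR. The function name and the counts are hypothetical and are not taken from the study.

```python
def validation_metrics(tp, fp, tn, fn):
    """Return sensitivity, specificity and positive predictive value.

    tp, fp, tn, fn are counts of genes called DEG or non-DEG by a method,
    cross-tabulated against an independent validation result.
    """
    def ratio(num, den):
        # A ratio with an empty denominator is undefined; report None.
        return num / den if den else None

    return {
        "sensitivity": ratio(tp, tp + fn),  # validated DEGs that were called
        "specificity": ratio(tn, tn + fp),  # validated non-DEGs left uncalled
        "ppv": ratio(tp, tp + fp),          # called DEGs that were validated
    }


# Hypothetical counts for illustration only; not data from the study.
example = validation_metrics(tp=40, fp=10, tn=45, fn=5)
```

A low positive predictive value, as reported for pooled samples, corresponds to a large `fp` count relative to `tp`.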

2.
Gene selection methods aim at determining biologically relevant subsets of genes in DNA microarray experiments. However, their assessment and validation represent a major difficulty, since the subset of biologically relevant genes is usually unknown. To solve this problem, a novel procedure for generating biologically plausible synthetic gene expression data is proposed. It is based on a mathematical model that represents gene expression signatures and expression profiles through Boolean threshold functions. The results show that the proposed procedure can be successfully adopted to analyze the quality of statistical and machine learning-based gene selection algorithms.

3.
The aim of this study was to determine the effects of ionizing radiation on gene expression by using, for the first time, a qPCR platform specifically established for the detection of 94 DNA repair genes, and also to test the robustness of these results by using three analytical methods (global pattern recognition (GPR), ΔΔCq/Normfinder and ΔΔCq/Genorm). The study focused on these genes because DNA repair is known to be the primary determinant of the radiation response. Six strains of normal human fibroblasts were exposed to 2 Gy, and changes in gene expression were analyzed 24 h thereafter. A significant change in gene expression was found for only a few genes, but the genes detected were mostly different for the three analytical methods used. For GPR, a significant change was found for four genes, in contrast to the eight or nine genes when applying ΔΔCq/Genorm or ΔΔCq/Normfinder, respectively. Across all three methods, a significant change in expression was seen only for GADD45A and PCNA. These data demonstrate that (1) the genes identified as showing altered expression upon irradiation strongly depend on the analytical method applied, and that (2) overall GADD45A and PCNA appear to play a central role in this response, while no significant change is induced for any of the other DNA repair genes tested.
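
As a rough illustration of the ΔΔCq calculation mentioned in entry 3, the sketch below computes a relative fold change for one target gene. It assumes perfect doubling per PCR cycle and a single, already chosen reference gene. Selecting reference genes with Normfinder or Genorm is not shown, and the function name and Cq values are hypothetical.

```python
def delta_delta_cq(cq_target_treated, cq_ref_treated,
                   cq_target_control, cq_ref_control):
    """Return (ddCq, fold_change) for one target gene.

    Assumes an amplification efficiency of exactly 2 (one doubling per cycle).
    """
    # Normalize the target gene to the reference gene within each condition.
    d_treated = cq_target_treated - cq_ref_treated
    d_control = cq_target_control - cq_ref_control
    # Compare the treated condition against the control condition.
    ddcq = d_treated - d_control
    return ddcq, 2.0 ** (-ddcq)


# Hypothetical Cq values for illustration only; not data from the study.
ddcq, fold = delta_delta_cq(24.0, 18.0, 25.0, 18.0)
```

With these made-up values the target reaches threshold one cycle earlier after treatment, which corresponds to a two-fold increase in expression under the stated efficiency assumption.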

4.
Abbreviated purine nucleoside phosphorylase (PNP) genes were engineered to determine the effect of introns on human PNP gene expression. PNP minigenes containing the first intron (complete or shortened from 2.9 kb down to 855 bp), the first two introns or all five PNP introns resulted in substantial human PNP isozyme expression after transient transfection of murine NIH 3T3 cells. Low level human PNP activity was observed after transfection with a PNP minigene containing the last three introns. An intronless PNP minigene construct containing the PNP cDNA fused to genomic flanking sequences resulted in undetectable human PNP activity. Heterogeneous, stable NIH 3T3 transfectants of intron-containing PNP minigenes (verified by Southern analysis) expressed high levels of PNP activity and contained appropriately processed 1.7 kb message visualized by northern analysis. Stable transfectants of the intronless PNP minigene (40-45 copies per haploid genome) contained no detectable human PNP isozyme or mRNA. Insertion of the 855 bp shortened intron 1 sequence in either orientation upstream or downstream of a chimeric PNP promoter-bacterial chloramphenicol acetyltransferase (CAT) gene resulted in a several-fold increase in CAT expression in comparison with the parental PNP-CAT construct. We conclude that human PNP gene expression at the mRNA and protein level is dependent on the presence of intronic sequences and that the level of PNP expression varies directly with the number of introns included. The disproportionately large effect of intron 1 can be explained by the presence of an enhancer-like element retained in the shortened 855 bp intron 1 sequence.

5.
Machaon CVE: cluster validation for gene expression data   Cited by: 2 (self-citations: 0, citations by others: 2)
SUMMARY: This paper presents a cluster validation tool for gene expression data. Machaon CVE (Clustering and Validation Environment) system aims to partition samples or genes into groups characterized by similar expression patterns, and to evaluate the quality of the clusters obtained. AVAILABILITY: The program is freely available for non-profit use on request at http://www.cs.tcd.ie/Nadia.Bolshakova/Machaon.html SUPPLEMENTARY INFORMATION: http://www.cs.tcd.ie/Nadia.Bolshakova/Machaon.html

6.
Clustering methods for microarray gene expression data   Cited by: 1 (self-citations: 0, citations by others: 1)
Within the field of genomics, microarray technologies have become a powerful technique for simultaneously monitoring the expression patterns of thousands of genes under different sets of conditions. A main task now is to propose analytical methods to identify groups of genes that manifest similar expression patterns and are activated by similar conditions. The corresponding analysis problem is to cluster multi-condition gene expression data. The purpose of this paper is to present a general view of clustering techniques used in microarray gene expression data analysis.

7.
8.
9.
10.
We outline a high throughput process for the production of bacterial expression clones using automated liquid handlers. The protocol consists of a series of interlinked methods representing liquid manipulations or incubations on various stations of the automation system. The methods employ the ligation-independent cloning approach that enables the simultaneous production of plasmids for different expression systems. The current cloning protocol spans 3 days with a linear throughput of 400 targets per production run. This automated approach enables the production of large numbers of bacterial expression clones and ultimately purified proteins. Although they were developed for structural genomics, these molecular protocols can also be applied in high throughput strategies such as those used for site-specific mutagenesis or protein interaction studies.

11.
This article describes three multivariate projection methods and compares them for their ability to identify clusters of biological samples and genes using real-life data on gene expression levels of leukemia patients. It is shown that principal component analysis (PCA) has the disadvantage that the resulting principal factors are not very informative, while correspondence factor analysis (CFA) has difficulties interpreting distances between objects. Spectral map analysis (SMA) is introduced as an alternative approach to the analysis of microarray data. Weighted SMA outperforms PCA, and is at least as powerful as CFA, in finding clusters in the samples, as well as identifying genes related to these clusters. SMA addresses the problem of data analysis in microarray experiments in a more appropriate manner than CFA, and allows more flexible weighting to the genes and samples. Proper weighting is important, since it enables less reliable data to be down-weighted and more reliable information to be emphasized.

12.
Jasmonate and salicylate as global signals for defense gene expression   Cited by: 20 (self-citations: 0, citations by others: 20)
Remarkably, only a few low molecular mass signals, including jasmonic acid, ethylene and salicylic acid, upregulate the expression of scores of defense-related genes. Using these regulators, the plant fine-tunes its defense gene expression against aggressors which, in some cases, may be able to disrupt or amplify plant defense signal pathways to their own ends.

13.
14.
Although many numerical clustering algorithms have been applied to gene expression data analysis, the essential step is still biological interpretation by manual inspection. The correlation between genetic co-regulation and affiliation to a common biological process is what biologists expect. Here, we introduce some clustering algorithms that are based on graph structure constituted by biological knowledge. After applying a widely used dataset, we compared the result clusters of two of these algorithms in terms of the homogeneity of clusters and coherence of annotation and matching ratio. The results show that the clusters of knowledge-guided analysis are the kernel parts of the clusters of Gene Ontology (GO)-Cluster software, which contains the genes that are most expression correlative and most consistent with biological functions. Moreover, knowledge-guided analysis seems much more applicable than GO-Cluster in a larger dataset.

15.
16.
17.
MOTIVATION: With the advent of microarray chip technology, large data sets are emerging containing the simultaneous expression levels of thousands of genes at various time points during a biological process. Biologists are attempting to group genes based on the temporal pattern of their expression levels. While the use of hierarchical clustering (UPGMA) with correlation 'distance' has been the most common in the microarray studies, there are many more choices of clustering algorithms in pattern recognition and statistics literature. At the moment there do not seem to be any clear-cut guidelines regarding the choice of a clustering algorithm to be used for grouping genes based on their expression profiles. RESULTS: In this paper, we consider six clustering algorithms (of various flavors!) and evaluate their performances on a well-known publicly available microarray data set on sporulation of budding yeast and on two simulated data sets. Among other things, we formulate three reasonable validation strategies that can be used with any clustering algorithm when temporal observations or replications are present. We evaluate each of these six clustering methods with these validation measures. While the 'best' method is dependent on the exact validation strategy and the number of clusters to be used, overall Diana appears to be a solid performer. Interestingly, the performance of correlation-based hierarchical clustering and model-based clustering (another method that has been advocated by a number of researchers) appear to be on opposite extremes, depending on what validation measure one employs. Next it is shown that the group means produced by Diana are the closest and those produced by UPGMA are the farthest from a model profile based on a set of hand-picked genes. Availability: S+ codes for the partial least squares based clustering are available from the authors upon request. All other clustering methods considered have S+ implementation in the library MASS. 
S+ codes for calculating the validation measures are available from the authors upon request. The sporulation data set is publicly available at http://cmgm.stanford.edu/pbrown/sporulation
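
Entry 17 refers to hierarchical clustering (UPGMA) with correlation 'distance'. The sketch below shows that idea in plain Python. It is an illustrative substitute, not the authors' S+ code: distance is taken as one minus the Pearson correlation, and clusters are merged by average linkage until a requested number remain. The function names and toy profiles are hypothetical.

```python
from math import sqrt


def correlation_distance(x, y):
    """Return 1 - Pearson correlation of two equal-length profiles.

    Constant profiles have zero variance and are not handled here.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)


def upgma(profiles, k):
    """Merge clusters by average linkage until k clusters remain.

    Returns a sorted list of clusters, each a sorted list of row indices.
    """
    n = len(profiles)
    clusters = [[i] for i in range(n)]
    # Precompute all pairwise distances between individual profiles.
    dist = {(i, j): correlation_distance(profiles[i], profiles[j])
            for i in range(n) for j in range(i + 1, n)}

    def average_linkage(a, b):
        # Mean pairwise distance between members of the two clusters.
        total = sum(dist[(min(i, j), max(i, j))] for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        # Find the closest pair of clusters and merge it.
        pairs = [(average_linkage(clusters[p], clusters[q]), p, q)
                 for p in range(len(clusters))
                 for q in range(p + 1, len(clusters))]
        _, p, q = min(pairs)
        merged = sorted(clusters[p] + clusters[q])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (p, q)]
        clusters.append(merged)
    return sorted(clusters)


# Toy temporal profiles for illustration only: two rising, two falling.
profiles = [[1, 2, 3, 4], [2, 4, 6, 8.5], [4, 3, 2, 1], [8, 6, 4, 1.5]]
groups = upgma(profiles, 2)
```

Because the distance depends only on correlation, profiles with the same shape but different magnitudes end up in the same group, which is why this choice suits grouping genes by temporal pattern.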

18.
Genomics, 2020, 112(3): 2459-2466
The chloroplast genome (CPG) is a powerful tool for phylogenetic studies. Many CPGs have been determined using NGS. However, the large nuclear genome and the difficulty of separating CPG DNA in conifers limit their application in related research. In this study, three methods (PCR + Sanger, PCR + HiSeq, cpDNA+HiSeq) for obtaining the CPGs of Pinus massoniana were compared for sequence accuracy, time and cost. PCR + Sanger obtained the most accurate CPGs with advantages in cost ($3.08/kb) and time (2–3 days); PCR + HiSeq generated some DNA fragments with low depth, and the SNP false-positive rate (0.44) and sequencing error rate (0.0265) of this method were higher than those of cpDNA+HiSeq. Moreover, the cost (~$6.17/kb) and time (4–5 weeks) would increase significantly when HiSeq sequencing was outsourced to a sequencing service company. Thus, for the study of intraspecific and interspecific variation in CPGs, CPG sequences can be obtained by combining methods to offset the shortcomings of each, such as sequence accuracy, cost and time.

19.
20.