Similar articles
 20 similar articles found (search time: 15 ms)
1.
2.
3.
4.
With the fast development of high-throughput sequencing technologies, a new generation of genome-wide gene expression measurements is under way. This is based on mRNA sequencing (RNA-seq), which complements the already mature technology of microarrays, and is expected to overcome some of the latter’s disadvantages. These RNA-seq data pose new challenges, however, as strengths and weaknesses have yet to be fully identified. Ideally, Next (or Second) Generation Sequencing measures can be integrated for more comprehensive gene expression investigation to facilitate analysis of whole regulatory networks. At present, however, the nature of these data is not very well understood. In this paper we study three alternative gene expression time series datasets for Drosophila melanogaster embryo development, in order to compare three measurement techniques: RNA-seq, single-channel and dual-channel microarrays. The aim is to study the state of the art for the three technologies, with a view to assessing overlapping features, data compatibility and integration potential, in the context of time series measurements. This involves using established tools for each of the three different technologies, and technical and biological replicates (for RNA-seq and microarrays, respectively), due to the limited availability of biological RNA-seq replicates for time series data. The approach consists of a sensitivity analysis for differential expression and clustering. In general, the RNA-seq dataset displayed the highest sensitivity to differential expression. The single-channel data performed similarly for the differentially expressed genes common to the gene sets considered. Cluster analysis was used to identify different features of the gene space for the three datasets, with higher similarities found for the RNA-seq and single-channel microarray datasets.

5.
6.
7.
Rapidly growing public gene expression databases contain a wealth of data for building an unprecedentedly detailed picture of human biology and disease. These data come from many diverse measurement platforms, which makes integrating them difficult. Although RNA-sequencing (RNA-seq) is attracting the most attention, at present the rate of new microarray studies submitted to public databases far exceeds the rate of new RNA-seq studies. There is clearly a need for methods that make it easier to combine data from different technologies. In this paper, we propose a new method for processing RNA-seq data that yields gene expression estimates that are much more similar to corresponding estimates from microarray data, hence greatly improving cross-platform comparability. The method, which we call PREBS, is based on estimating the expression from RNA-seq reads overlapping the microarray probe regions, and processing these estimates with standard microarray summarisation algorithms. Using paired microarray and RNA-seq samples from the TCGA LAML data set, we show that PREBS expression estimates derived from RNA-seq are more similar to microarray-based expression estimates than those from other RNA-seq processing methods. In an experiment to retrieve paired microarray samples from a database using an RNA-seq query sample, gene signatures defined based on PREBS expression estimates were found to be much more accurate than those from other methods. PREBS also allows new ways of using RNA-seq data, such as expression estimation for microarray probe sets. An implementation of the proposed method is available in the Bioconductor package “prebs.”
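The core idea of counting reads that overlap probe regions can be sketched as follows. This is only an illustration under stated assumptions, not the PREBS implementation: the function name, the half-open (start, end) coordinate convention, and the example coordinates are all hypothetical, and the subsequent microarray summarisation step is not shown.

```python
def count_reads_per_probe(probes, reads):
    """Count, for each probe region, the reads that overlap it.

    Both probes and reads are (start, end) pairs on the same chromosome,
    using half-open coordinates [start, end). This convention is an
    assumption made for the sketch.
    """
    counts = []
    for probe_start, probe_end in probes:
        overlapping = 0
        for read_start, read_end in reads:
            # Two half-open intervals overlap when each starts before the other ends.
            if read_start < probe_end and probe_start < read_end:
                overlapping += 1
        counts.append(overlapping)
    return counts


# Hypothetical coordinates: one 25-base probe and four aligned reads.
example_probes = [(100, 125)]
example_reads = [(90, 110), (120, 150), (125, 160), (0, 100)]
example_counts = count_reads_per_probe(example_probes, example_reads)
```

Here only the first two reads share at least one base with the probe; the other two merely touch its boundaries. A real pipeline would read alignments from a BAM file and use an interval index rather than this quadratic loop.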

8.
Experimental variance is a major challenge when dealing with high-throughput sequencing data. This variance has several sources: sampling replication, technical replication, variability within biological conditions, and variability between biological conditions. The high per-sample cost of RNA-Seq often precludes the large number of experiments needed to partition observed variance into these categories as per standard ANOVA models. We show that the partitioning of within-condition to between-condition variation cannot reasonably be ignored, whether in single-organism RNA-Seq or in Meta-RNA-Seq experiments, and further find that commonly-used RNA-Seq analysis tools, as described in the literature, do not enforce the constraint that the sum of relative expression levels must be one, and thus report expression levels that are systematically distorted. These two factors lead to misleading inferences if not properly accommodated. As it is usually only the biological between-condition and within-condition differences that are of interest, we developed ALDEx, an ANOVA-like differential expression procedure, to identify genes with greater between- to within-condition differences. We show that the presence of differential expression and the magnitude of these comparative differences can be reasonably estimated with even very small sample sizes.
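The sum-to-one constraint mentioned in this abstract can be illustrated with a small sketch. It is not the ALDEx procedure; the function name and the counts are hypothetical and serve only to show why relative expression levels are not independent of one another.

```python
def to_proportions(counts):
    """Convert raw read counts for one sample into relative expression levels.

    The returned values sum to one, so a rise in one gene's share
    necessarily lowers the shares of the others.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return [count / total for count in counts]


# Hypothetical counts for four genes in two samples. Only the first gene's
# count differs between the samples.
before = to_proportions([100, 100, 100, 100])
after = to_proportions([700, 100, 100, 100])
```

Although the raw counts of the last three genes are identical in both samples, their relative levels drop from 0.25 to 0.1 because the first gene takes a larger share of the total.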

9.
10.
One of the most useful features of molecular phylogenetic analyses is the potential for estimating dates of divergence of evolutionary lineages from the DNA of extant species. But lineage-specific variation in rate of molecular evolution complicates molecular dating, because a calibration rate estimated from one lineage may not be an accurate representation of the rate in other lineages. Many molecular dating studies use a “clock test” to identify and exclude sequences that vary in rate between lineages. However, these clock tests should not be relied upon without a critical examination of their effectiveness at removing rate-variable sequences from any given data set, particularly with regard to the sequence length and number of variable sites. As an illustration of this problem we present a power test of a frequently employed triplet relative rates test. We conclude that (1) relative rates tests are unlikely to detect moderate levels of lineage-specific rate variation (where one lineage has a rate of molecular evolution 1.5 to 4.0 times the other) for most commonly used sequences in molecular dating analyses, and (2) this lack of power is likely to result in substantial error in the estimation of dates of divergence. As an example, we show that the well-studied rate difference between murid rodents and great apes will not be detected for many of the sequences used to date the divergence between these two lineages and that this failure to detect rate variation is likely to result in consistent overestimation of the date of the rodent–primate split. Received: 9 June 1999 / Accepted: 22 October 1999

11.
12.
13.

Thanks to advances in high-throughput sequencing technologies, the importance of the microbiome to human health and disease has been increasingly recognized. Analyzing microbiome data from sequencing experiments is challenging due to their unique features such as compositional data, excessive zero observations, overdispersion, and complex relations among microbial taxa. Clustered microbiome data have become prevalent in recent years from designs such as longitudinal studies, family studies, and matched case–control studies. The within-cluster dependence compounds the challenge of microbiome data analysis. Methods that properly accommodate intra-cluster correlation and features of the microbiome data are needed. We develop robust and powerful differential composition tests for clustered microbiome data. The methods do not rely on any distributional assumptions on the microbial compositions, which provides flexibility to model various correlation structures among taxa and among samples within a cluster. By leveraging the adjusted sandwich covariance estimate, the methods properly accommodate sample dependence within a cluster. The two-part version of the test can further improve power in the presence of excessive zero observations. Different types of confounding variables can be easily adjusted for in the methods. We perform extensive simulation studies under commonly adopted clustered data designs to evaluate the methods. We demonstrate that the methods properly control the type I error under all designs and are more powerful than existing methods in many scenarios. The usefulness of the proposed methods is further demonstrated with two real datasets from longitudinal microbiome studies on pregnant women and inflammatory bowel disease patients. The methods have been incorporated into the R package “miLineage” publicly available at https://tangzheng1.github.io/tanglab/software.html.


14.
15.
The Qira black sheep and the Hetian sheep are two local breeds in Northwest China, characterized by high and low fecundity, respectively. Elucidating mRNA expression profiles in the ovaries of sheep breeds representing fecundity extremes will be helpful for the identification and utilization of major prolificacy genes in sheep. In the present study, we used RNA-seq to compare ovarian mRNA expression profiles between Qira black sheep and Hetian sheep. From the Qira black sheep and the Hetian sheep libraries, we obtained a total of 11,747,582 and 11,879,968 sequencing reads, respectively. After alignment to the reference sequences, the two libraries included 16,763 and 16,814 genes, respectively. A total of 1,252 genes were significantly differentially expressed in Hetian sheep compared with Qira black sheep. Eight differentially expressed genes were randomly selected for validation by real-time RT-PCR. This study provides basic data for future research on sheep reproduction.

16.
This paper considers the exact distribution of the X² index of dispersion and −2 log (likelihood ratio) tests for the hypothesis of homogeneity of c independent samples from a common binomial population. The exact significance levels and power of these tests under ‘logit’ alternatives are compared numerically for the cases c = 3, 4, 5 and various sample sizes, n_i = 5, 10 for i = 1, …, c.
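The two statistics named in this abstract can be computed as in the sketch below. It uses the standard textbook forms of the binomial index of dispersion and the likelihood-ratio statistic; the abstract does not give its formulas, so the exact definitions and the function name are assumptions, and the exact null distributions studied in the paper are not reproduced here.

```python
import math


def homogeneity_statistics(successes, sizes):
    """Return (X2, G2) for c independent binomial samples.

    X2 is the index of dispersion and G2 is -2 log(likelihood ratio),
    both computed against the pooled proportion p_hat.
    """
    p_hat = sum(successes) / sum(sizes)
    if p_hat in (0.0, 1.0):
        # All samples are identical (all failures or all successes).
        return 0.0, 0.0

    x2 = sum(
        (x - n * p_hat) ** 2 / (n * p_hat * (1.0 - p_hat))
        for x, n in zip(successes, sizes)
    )

    def term(observed, expected):
        # A cell with zero observations contributes nothing to G2.
        return observed * math.log(observed / expected) if observed > 0 else 0.0

    g2 = 2.0 * sum(
        term(x, n * p_hat) + term(n - x, n * (1.0 - p_hat))
        for x, n in zip(successes, sizes)
    )
    return x2, g2


# Hypothetical data: c = 3 samples of size 5 with 1, 2 and 3 successes.
x2_example, g2_example = homogeneity_statistics([1, 2, 3], [5, 5, 5])
```

For the example data the pooled proportion is 0.4, so X² = (1 + 0 + 1) / 1.2 = 5/3. With samples this small the usual chi-square approximation may be poor, which is the motivation for studying the exact distributions.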

17.
It is crucial for researchers to optimize RNA-seq experimental designs for differential expression detection. Currently, the field lacks general methods to estimate power and sample size for RNA-Seq in complex experimental designs, under the assumption of the negative binomial distribution. We simulate RNA-Seq count data based on parameters estimated from six widely different public data sets (including cell line comparison, tissue comparison, and cancer data sets) and calculate the statistical power in paired and unpaired sample experiments. We comprehensively compare five differential expression analysis packages (DESeq, edgeR, DESeq2, sSeq, and EBSeq) and evaluate their performance by power, receiver operating characteristic (ROC) curves, and other metrics including areas under the curve (AUC), Matthews correlation coefficient (MCC), and F-measures. DESeq2 and edgeR tend to give the best performance in general. Increasing sample size or sequencing depth increases power; however, increasing sample size is more potent than increasing sequencing depth, especially when the sequencing depth reaches 20 million reads. Long intergenic noncoding RNAs (lincRNAs) yield lower power relative to the protein coding mRNAs, given their lower expression level in the same RNA-Seq experiment. On the other hand, paired-sample RNA-Seq significantly enhances the statistical power, confirming the importance of considering the multifactor experimental design. Finally, a local optimal power is achievable for a given budget constraint, and the dominant contributing factor is sample size rather than the sequencing depth. In conclusion, we provide a power analysis tool (http://www2.hawaii.edu/~lgarmire/RNASeqPowerCalculator.htm) that captures the dispersion in the data and can serve as a practical reference under the budget constraint of RNA-Seq experiments.

18.
19.
Summary. Time course microarray data consist of mRNA expression from a common set of genes collected at different time points. Such data are thought to reflect underlying biological processes developing over time. In this article, we propose a model that allows us to examine differential expression and gene network relationships using time course microarray data. We model each gene-expression profile as a random functional transformation of the scale, amplitude, and phase of a common curve. Inferences about the gene-specific amplitude parameters allow us to examine differential gene expression. Inferences about measures of functional similarity based on estimated time-transformation functions allow us to examine gene networks while accounting for features of the gene-expression profiles. We discuss applications to simulated data as well as to microarray data on prostate cancer progression.

20.
The least squares calculation of the best values of the parameters of the Moffitt equation and of the Drude equation is examined. It is proved that the least squares evaluation of all three parameters of the Moffitt equation becomes indeterminate as the b₀ term approaches zero. Estimates of low helical content based on the Moffitt relationship are therefore also indeterminate and of dubious value. Both the size of b₀ and the range of wavelengths chosen affect the standard deviations of the parameters. The magnitude of the effects is illustrated by selected examples. The computer program OPTROT is available for evaluating the extent to which data may be correlated by the equations.
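A reduced version of this fitting problem can be sketched as follows. The abstract does not print the equation, so the sketch assumes the commonly cited Moffitt form [m′] = a₀λ₀²/(λ² − λ₀²) + b₀λ₀⁴/(λ² − λ₀²)². It also holds λ₀ fixed, which makes the fit linear in a₀ and b₀; the three-parameter fit discussed in the abstract is not attempted. This is not the OPTROT program, and the function names and data are hypothetical.

```python
def moffitt_basis(wavelength, lambda0):
    """Return the two basis terms of the assumed Moffitt equation."""
    denom = wavelength ** 2 - lambda0 ** 2
    return lambda0 ** 2 / denom, lambda0 ** 4 / denom ** 2


def fit_moffitt(wavelengths, rotations, lambda0):
    """Least-squares estimates of (a0, b0) with lambda0 held fixed.

    Solves the 2x2 normal equations directly.
    """
    s11 = s12 = s22 = t1 = t2 = 0.0
    for lam, m in zip(wavelengths, rotations):
        f1, f2 = moffitt_basis(lam, lambda0)
        s11 += f1 * f1
        s12 += f1 * f2
        s22 += f2 * f2
        t1 += f1 * m
        t2 += f2 * m
    det = s11 * s22 - s12 * s12
    a0 = (t1 * s22 - t2 * s12) / det
    b0 = (s11 * t2 - s12 * t1) / det
    return a0, b0


# Hypothetical noise-free data generated from assumed parameter values.
LAMBDA0 = 212.0  # nm, an assumed value
TRUE_A0, TRUE_B0 = -300.0, -200.0
wavelengths = [300.0, 350.0, 400.0, 450.0, 500.0, 550.0, 600.0]
rotations = [
    TRUE_A0 * f1 + TRUE_B0 * f2
    for f1, f2 in (moffitt_basis(lam, LAMBDA0) for lam in wavelengths)
]
fitted_a0, fitted_b0 = fit_moffitt(wavelengths, rotations, LAMBDA0)
```

With λ₀ fixed the normal equations are well posed for this wavelength range and the assumed parameters are recovered. Treating λ₀ as a third free parameter turns the problem into a nonlinear fit, which is the case the abstract reports as indeterminate when b₀ approaches zero.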
