Similar literature
 20 similar articles found (search time: 15 ms)
1.
2.
With the fast development of high-throughput sequencing technologies, a new generation of genome-wide gene expression measurements is under way. This is based on mRNA sequencing (RNA-seq), which complements the already mature technology of microarrays, and is expected to overcome some of the latter’s disadvantages. These RNA-seq data pose new challenges, however, as strengths and weaknesses have yet to be fully identified. Ideally, Next (or Second) Generation Sequencing measures can be integrated for more comprehensive gene expression investigation to facilitate analysis of whole regulatory networks. At present, however, the nature of these data is not very well understood. In this paper we study three alternative gene expression time series datasets for Drosophila melanogaster embryo development, in order to compare three measurement techniques: RNA-seq, single-channel and dual-channel microarrays. The aim is to study the state of the art for the three technologies, with a view to assessing overlapping features, data compatibility and integration potential, in the context of time series measurements. This involves using established tools for each of the three different technologies, and technical and biological replicates (for RNA-seq and microarrays, respectively), due to the limited availability of biological RNA-seq replicates for time series data. The approach consists of a sensitivity analysis for differential expression and clustering. In general, the RNA-seq dataset displayed the highest sensitivity to differential expression. The single-channel data performed similarly for the differentially expressed genes common to the gene sets considered. Cluster analysis was used to identify different features of the gene space for the three datasets, with higher similarities found for the RNA-seq and single-channel microarray dataset.

3.
Falin LJ, Tyler BM. PLoS ONE, 2011, 6(7): e22071
The widespread use of high-throughput experimental assays designed to measure the entire complement of a cell's genes or gene products has led to vast stores of data that are extremely plentiful in terms of the number of items they can measure in a single sample, yet often sparse in the number of samples per experiment due to their high cost. This often leads to datasets where the number of treatment levels or time points sampled is limited, or where there are very small numbers of technical and/or biological replicates. Here we introduce a novel algorithm to quantify the uncertainty in the unmeasured intervals between biological measurements taken across a set of quantitative treatments. The algorithm provides a probabilistic distribution of possible gene expression values within unmeasured intervals, based on a plausible biological constraint. We show how quantification of this uncertainty can be used to guide researchers in further data collection by identifying which samples would likely add the most information to the system under study. Although the context for developing the algorithm was gene expression measurements taken over a time series, the approach can be readily applied to any set of quantitative systems biology measurements taken following quantitative (i.e. non-categorical) treatments. In principle, the method could also be applied to combinations of treatments, in which case it could greatly simplify the task of exploring the large combinatorial space of future possible measurements.

4.
Testing for differentially expressed genes with microarray data   Cited by: 1 (self-citations: 1, citations by others: 0)
This paper compares the type I error and power of the one- and two-sample t-tests, and the one- and two-sample permutation tests for detecting differences in gene expression between two microarray samples with replicates using Monte Carlo simulations. When data are generated from a normal distribution, type I errors and powers of the one-sample parametric t-test and one-sample permutation test are very close, as are the two-sample t-test and two-sample permutation test, provided that the number of replicates is adequate. When data are generated from a t-distribution, the permutation tests outperform the corresponding parametric tests if the number of replicates is at least five. For data from a two-color dye swap experiment, the one-sample test appears to perform better than the two-sample test since expression measurements for control and treatment samples from the same spot are correlated. For data from independent samples, such as the one-channel array or two-channel array experiment using reference design, the two-sample t-tests appear more powerful than the one-sample t-tests.
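
As a rough illustration of the two-sample permutation test compared in this abstract, the sketch below enumerates every reassignment of the pooled values to the two groups and reports the fraction whose absolute mean difference is at least as large as the observed one. The function name `permutation_pvalue` and the use of the mean difference as the test statistic are assumptions made for illustration; the paper's exact statistic and simulation setup are not given in the abstract.

```python
from itertools import combinations
from statistics import mean


def permutation_pvalue(control, treatment):
    """Exact two-sided, two-sample permutation test on the difference in means.

    Hypothetical helper for illustration; not the paper's implementation.
    """
    pooled = list(control) + list(treatment)
    n_control = len(control)
    n_treatment = len(pooled) - n_control
    observed = abs(mean(treatment) - mean(control))
    total = sum(pooled)

    hits = 0
    splits = 0
    # Every way of choosing which pooled values are relabelled "control".
    for idx in combinations(range(len(pooled)), n_control):
        sum_a = sum(pooled[i] for i in idx)
        mean_a = sum_a / n_control
        mean_b = (total - sum_a) / n_treatment
        # Small tolerance so ties with the observed value are counted.
        if abs(mean_b - mean_a) >= observed - 1e-12:
            hits += 1
        splits += 1
    return hits / splits


p_small = permutation_pvalue([1, 2, 3], [10, 11, 12])  # 2 of 20 splits -> 0.1
```

With three replicates per group there are only 20 possible splits, so the smallest attainable two-sided p-value is 2/20 = 0.1; with five per group there are 252 splits. This granularity may be one reason the permutation tests need an adequate number of replicates before they become useful.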

5.
The effect of replication on gene expression microarray experiments   Cited by: 5 (self-citations: 0, citations by others: 5)
MOTIVATION: We examine the effect of replication on the detection of apparently differentially expressed genes in gene expression microarray experiments. Our analysis is based on a random sampling approach using real data sets from 16 published studies. We consider both the ability to find genes that meet particular statistical criteria as well as the stability of the results in the face of changing levels of replication. RESULTS: While dependent on the data source, our findings suggest that stable results are typically not obtained until at least five biological replicates have been used. Conversely, for most studies, 10-15 replicates yield results that are quite stable, and there is less improvement in stability as the number of replicates is further increased. Our methods will be of use in evaluating existing data sets and in helping to design new studies.
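
The random sampling idea can be sketched as follows: repeatedly draw k replicates per group, call genes on the subsample, and measure how well the calls agree with those from the full data. Everything named here is an assumption for illustration: `call_genes` uses a plain mean-difference threshold as a stand-in for the "particular statistical criteria" the abstract mentions, and the Jaccard overlap is one possible stability measure, not necessarily the one the authors used.

```python
import random
from statistics import mean


def call_genes(group_a, group_b, threshold):
    """Indices of genes whose absolute mean difference exceeds the threshold.

    group_a, group_b: one row per gene, one column per replicate.
    """
    return {
        g for g in range(len(group_a))
        if abs(mean(group_a[g]) - mean(group_b[g])) > threshold
    }


def stability(group_a, group_b, k, threshold, draws=200, seed=0):
    """Mean Jaccard overlap between k-replicate subsample calls and full-data calls."""
    rng = random.Random(seed)
    full = call_genes(group_a, group_b, threshold)
    n_a, n_b = len(group_a[0]), len(group_b[0])
    scores = []
    for _ in range(draws):
        # Draw k replicate columns per group without replacement.
        cols_a = rng.sample(range(n_a), k)
        cols_b = rng.sample(range(n_b), k)
        sub_a = [[row[c] for c in cols_a] for row in group_a]
        sub_b = [[row[c] for c in cols_b] for row in group_b]
        called = call_genes(sub_a, sub_b, threshold)
        union = full | called
        scores.append(len(full & called) / len(union) if union else 1.0)
    return mean(scores)
```

Evaluating `stability` over a range of k values gives a curve that would be expected to flatten once additional replicates stop changing the called gene set.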

6.
The importance of variance modelling is now widely known for the analysis of microarray data. In particular the power and accuracy of statistical tests for differential gene expression are highly dependent on variance modelling. The aim of this paper is to use a structural model on the variances, which includes a condition effect and a random gene effect, and to propose a simple estimation procedure for these parameters by working on the empirical variances. The proposed variance model was compared with various methods on both real and simulated data. It proved to be more powerful than the gene-by-gene analysis and more robust to the number of false positives than the homogeneous variance model. It performed well compared with recently proposed approaches such as SAM and VarMixt even for a small number of replicates, and performed similarly to Limma. The main advantage of the structural model is that, thanks to the use of a linear mixed model on the logarithm of the variances, various factors of variation can easily be incorporated in the model, which is not the case for previously proposed empirical Bayes methods. It is also very fast to compute and is adapted to the comparison of more than two conditions.

7.
We have developed an episomal replicating expression vector in which the SV40 gene coding for the large T-antigen was replaced by chromosomal scaffold/matrix attached regions. Southern analysis as well as vector rescue experiments in CHO cells and in Escherichia coli demonstrate that the vector replicates episomally in CHO cells. It occurs in a very low copy number in the cells and is stably maintained over more than 100 generations without selection pressure.

8.
Optimal experimental design is important for the efficient use of modern high-throughput technologies such as microarrays and proteomics. Multiple factors, including the reliability of the measurement system, which itself must be estimated from prior experimental work, could influence design decisions. In this study, we describe how the optimal number of replicate measures (technical replicates) for each biological sample (biological replicate) can be determined. Different allocations of biological and technical replicates were evaluated by minimizing the variance of the ratio of technical variance (measurement error) to the total variance (sum of sampling error and measurement error). We demonstrate that if the number of biological replicates and the number of technical replicates per biological sample are variable, while the total number of available measures is fixed, then the optimal allocation of replicates for measurement evaluation experiments requires two technical replicates for each biological replicate. Therefore, it is recommended to use two technical replicates for each biological replicate if the goal is to evaluate the reproducibility of measurements.
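
The quantity being evaluated, the ratio of technical variance to total variance, can be estimated from a balanced design with a simple method-of-moments calculation. The sketch below is an assumption-laden illustration: `technical_variance_fraction` is a hypothetical name, the estimator is the textbook one-way random-effects decomposition, and it does not reproduce the paper's derivation of the optimal allocation.

```python
from statistics import mean, variance


def technical_variance_fraction(samples):
    """Estimate technical variance / (technical + biological variance).

    samples: one list of t technical replicate values per biological
    sample, with the same t >= 2 for every sample (balanced design).
    """
    t = len(samples[0])
    # Spread among technical replicates of the same biological sample.
    technical = mean(variance(s) for s in samples)
    # Variance of sample means contains biological variance plus technical / t.
    between_means = variance([mean(s) for s in samples])
    biological = max(between_means - technical / t, 0.0)
    # Note: undefined (division by zero) if all values are identical.
    return technical / (technical + biological)


fraction = technical_variance_fraction([[1, 3], [5, 7]])  # 2 / (2 + 7)
```

In this toy input each pair of technical replicates has variance 2, the sample means 2 and 6 have variance 8, and the biological component is therefore 8 - 2/2 = 7, giving a technical fraction of 2/9.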

9.
10.
Liu X, Liang KY. Biometrics, 1992, 48(2): 645-654
Ignoring measurement error may cause bias in the estimation of regression parameters. When the true covariates are unobservable, multiple imprecise measurements can be used in the analysis to correct for the associated bias. We suggest a simple estimating procedure that gives consistent estimates of regression parameters by using the repeated measurements with error. The relative Pitman efficiency of our estimator based on models with and without measurement error has been found to be a simple function of the number of replicates and the ratio of intra- to inter-variance of the true covariate. The procedure thus provides a guide for deciding the number of repeated measurements in the design stage. An example from a survey study is presented.
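
To make the bias concrete, the sketch below simulates a covariate measured twice with error and applies a standard method-of-moments (reliability ratio) correction to the naive regression slope. This is a generic textbook correction swapped in for illustration, not the estimating procedure proposed in the paper; the function name, the simulated variances, and the true slope of 2.0 are all assumptions.

```python
import random
from statistics import mean, variance


def corrected_slope(replicates, y):
    """Return (naive_slope, corrected_slope) for a simple linear regression.

    replicates: one list of m >= 2 error-prone measurements per subject.
    y: one response value per subject.
    """
    m = len(replicates[0])
    w_bar = [mean(r) for r in replicates]
    # Within-subject variance estimates the measurement-error variance.
    sigma_u2 = mean(variance(r) for r in replicates)
    var_w_bar = variance(w_bar)
    # Reliability of the replicate mean: var(true covariate) / var(w_bar).
    reliability = (var_w_bar - sigma_u2 / m) / var_w_bar
    mw, my = mean(w_bar), mean(y)
    cov_wy = sum((w - mw) * (v - my) for w, v in zip(w_bar, y)) / (len(y) - 1)
    naive = cov_wy / var_w_bar
    return naive, naive / reliability


# Simulated example (all parameter values are illustrative assumptions).
rng = random.Random(1)
true_slope = 2.0
x = [rng.gauss(0, 1) for _ in range(5000)]
reps = [[xi + rng.gauss(0, 1) for _ in range(2)] for xi in x]
y = [true_slope * xi + rng.gauss(0, 0.5) for xi in x]
naive, corrected = corrected_slope(reps, y)
```

With equal true-covariate and error variances and two replicates, the naive slope is attenuated towards about two thirds of the true value in expectation, while the corrected slope should land near 2.0. Increasing the number of replicates shrinks the `sigma_u2 / m` term, which is the sense in which replicate count enters the design decision.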

11.
Applying appropriate error models and conservative estimates to microarray data helps to reduce the number of false predictions and allows one to focus on biologically relevant observations. Several key conclusions have been drawn from the statistical analysis of global gene expression data: it is worth keeping core information for each experiment, including raw and processed data; biological and technical replicates are needed; careful experimental design makes the analysis simpler and more powerful; the choice of the similarity measure is nontrivial and depends on the goal of an experiment; array information must be complemented with other data; and gene expression studies are 'hypothesis generators'.

12.
Protein phosphorylation plays a central role in many signal transduction pathways that mediate biological processes. Novel quantitative mass spectrometry-based methods have recently revealed phosphorylation dynamics in animals, yeast, and plants. These methods are important for our understanding of how differential phosphorylation participates in translating distinct signals into proper physiological responses, and have shifted research towards screening for potential cancer therapies and in-depth analysis of phosphoproteomes. In this review, we aim to describe current progress in quantitative phosphoproteomics. This emerging field has changed numerous static pathways into dynamic signaling networks, and revealed protein kinase networks that underlie adaptation to environmental stimuli. Mass spectrometry enables high-throughput and high-quality analysis of differential phosphorylation at a site-specific level. Although determination of differential phosphorylation between treatments is analogous to detecting differential gene expression, the large body of statistical techniques that has been developed for analysis of differential gene expression is not generally applied for detecting differential phosphorylation. We suggest possible improvements for analysis of quantitative phosphorylation by increasing the number of biological replicates and adapting statistical tests used for gene expression profiling and widely implemented in freely available software tools.

13.

Oregon‐R, +3, and crossbred strains of Drosophila melanogaster were tested for their response to selection for abdominal bristle number. Various subsidiary tests, consisting of heritability estimations, testing for lethal second and third chromosomes, and chromosome assays were conducted on the selection replicates, which had undergone 14 generations of selection. Evidence showed that a plateau which occurred very early in the +3 high selection replicates was due to fixation of a few additive genes with large effects, thus accounting for the low phenotypic and additive genetic variance, the slight regression in abdominal bristle number on relaxation of selection, the absence of directional dominance, and the low frequency of recessive lethals.

High frequencies of second and third chromosome lethals were found in the Oregon‐R high and low replicates and in the +3 low replicates. That these lethals were not selected for heterozygote superiority for extreme bristle effect was indicated by the slight regression of these replicates on relaxation of selection, and by the absence of high, fluctuating phenotypic variances.

From chromosome assays it appears that the two parental strains had different arrays of genes affecting high bristle number, with these genes located mostly in chromosome II in the Oregon‐R high line but in chromosome III in the +3 high line. In the Crossbred high line, high bristle factors were located in both the second and third chromosomes. The low bristle factors were located mainly in the second chromosome in all three low selection lines.

It appears that the original cross had combined different genes favouring high bristle number, thus allowing greater response in the Crossbred high selection line. The same did not occur for low selection; the response from the Crossbred low line was similar to that of the parental low lines, suggesting that the gene arrays affecting low bristle number in the two original populations were comparable.

14.
Formalin‐fixed paraffin‐embedded (FFPE) tissue is a rich source of clinically relevant material that can yield important translational biomarker discovery using proteomic analysis. Protocols for analyzing FFPE tissue by LC‐MS/MS exist, but standardization of procedures and critical analysis of data quality are limited. This study compared and characterized data obtained from FFPE tissue using two methods: a urea in‐solution digestion method (UISD) versus a commercially available Qproteome FFPE Tissue Kit method (Qkit). Each method was performed independently three times on serial sections of homogeneous FFPE tissue to minimize pre‐analytical variations and analyzed with three technical replicates by LC‐MS/MS. Data were evaluated for reproducibility and physiochemical distribution, which highlighted differences in the ability of each method to identify proteins of different molecular weights and isoelectric points. Each method replicate resulted in a significant number of new protein identifications, and both methods identified significantly more proteins using three technical replicates as compared to only two. UISD was cheaper, required less time, and introduced significant protein modifications as compared to the Qkit method, which provided more precise and higher protein yields. These data highlight significant variability among method replicates and type of method used, despite minimizing pre‐analytical variability. Utilization of only one method or too few replicates (both method and technical) may limit the subset of proteomic information obtained.

15.
16.
Quantitative proteomic comparisons require a sufficient number of samples to reach an acceptable level of significance. However, 2D gel electrophoresis commonly results in incomplete data sets due to spots with missing values, thereby reducing the number of parallel measurements for individual proteins. Here we investigated how many missing values per spot can be tolerated. The number of spots in common between all gels was found to decrease with the number of parallel gels in a non-linear fashion. Increasing numbers of missing values were associated with a moderate increase in the quantitative variation of spot volumes. Based on the missing value pattern in 20 gels we performed an analysis of the multiple testing power for the hypothetical scenario of a comparative 2DE study with six or twelve parallel gels. The calculation considered the statistical power of the individual spot as well as the number of spots included in the analysis. The power increased with inclusion of spots with a higher number of missing values and showed an optimum at a specific minimum number of spot replicates. The results suggest that proteins with missing values can be included in a univariate analysis as long as a sufficient number of parallel gels are made.

17.
MOTIVATION: Due to advances in experimental technologies, such as microarray, mass spectrometry and nuclear magnetic resonance, it is feasible to obtain large-scale data sets, in which measurements for a large number of features can be simultaneously collected. However, the sample sizes of these data sets are usually small due to their relatively high costs, which leads to the issue of concordance among different data sets collected for the same study: features should have consistent behavior in different data sets. There is a lack of rigorous statistical methods for evaluating this concordance or discordance. METHODS: Based on a three-component normal-mixture model, we propose two likelihood ratio tests for evaluating the concordance and discordance between two large-scale data sets with two sample groups. The parameter estimation is achieved through the expectation-maximization (E-M) algorithm. A normal-distribution-quantile-based method is used for data transformation. RESULTS: To evaluate the proposed tests, we conducted simulation studies, which suggested satisfactory performance. As applications, the proposed tests were applied to three SELDI-MS data sets with replicates. One data set has replicates from different platforms and the other two have replicates from the same platform. We found that data generated by SELDI-MS showed satisfactory concordance between replicates from the same platform but unsatisfactory concordance between replicates from different platforms. AVAILABILITY: The R codes are freely available at http://home.gwu.edu/~ylai/research/Concordance.

18.
Currently, linear mixed model analyses of expression microarray experiments are performed either in a gene-specific or global mode. The joint analysis provides more flexibility in terms of how parameters are fitted and estimated and tends to be more powerful than the gene-specific analysis. Here we show how to implement the gene-specific linear mixed model analysis as an exact algorithm for the joint linear mixed model analysis. The gene-specific algorithm is exact, when the mixed model equations can be partitioned into unrelated components: One for all global fixed and random effects and the others for the gene-specific fixed and random effects for each gene separately. This unrelatedness holds under three conditions: (1) any gene must have the same number of replicates or probes on all arrays, but these numbers can differ among genes; (2) the residual variance of the (transformed) expression data must be homogeneous or constant across genes (other variance components need not be homogeneous) and (3) the number of genes in the experiment is large. When these conditions are violated, the gene-specific algorithm is expected to be nearly exact.

19.
A variance statistic was used to partition the total variance into that attributable to each step of a TMEN assay procedure. Estimation of the TMEN of wheat was used as an example. The variance statistic can also be used to optimize the design of a TMEN experiment with respect to cost of the experiment and desired accuracy of the result. Experimental design optimization is accomplished by providing a functional relationship between the accuracy of the estimate and the number of replicates of feed, the number of birds used in the experiment, and the cost of each step. The variance statistic is also a useful tool for identifying and removing outliers and highly variable measurements. This feature was demonstrated with the chosen example data. Gross energy of the feed will explain approximately 50% of the variance of the TMEN estimate depending on how many replicates are evaluated. Nitrogen content of the feed sample will explain approximately 40% of the total variance. It is recommended to replicate this measurement as many times as possible. Ten replicates were recommended for the example data. The energy content of excreta from fed birds represented the next largest source of variance, at approximately 4% of the total variance. If within-bird variance is large, better homogenization of the sample and more replicates are recommended. If among-bird variance is significantly different, more birds should be used. Nitrogen content of excreta from fed birds represented less than 2.5% of the total variance. Energy and nitrogen content of excreta from unfed birds combined represented less than 2% of the total variance, suggesting that the number of unfed birds and the amount of excreta sub-samples may be reduced without adversely affecting the accuracy of the TMEN estimate. Variance due to the amount of excreta collected from the fed birds, and variance due to the amount of feed consumed by the birds, are expected to be small. This result suggested that force-feeding may not be necessary for accurate TMEN estimates.

20.

Background  

One of the challenges with modeling the temporal progression of biological signals is dealing with the effect of noise and the limited number of replicates at each time point. Given the rising interest in utilizing predictive mathematical models to describe the biological response of an organism, or analyses such as clustering and gene ontology enrichment, it is important to determine whether the dynamic progression of the data has been accurately captured despite the limited number of replicates, such that one can have confidence that the results of the analysis are capturing important salient dynamic features.
