Similar articles
20 similar articles found.
1.
MOTIVATION: The occurrence of false positives and false negatives in a microarray analysis could be easily estimated if the distribution of p-values were approximated and then expressed as a mixture of null and alternative densities. Essentially any distribution of p-values can be expressed as such a mixture by extracting a uniform density from it. AVAILABILITY: An S-PLUS function library is available from http://www.stjuderesearch.org/statistics.
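A minimal sketch of the mixture idea, assuming a Storey-style estimator in which the flat part of the p-value histogram above a cut-off `lam` is attributed to the uniform null density. This is an illustrative stand-in, not the method in the S-PLUS library; `estimate_pi0` and `expected_errors` are hypothetical names.

```python
def estimate_pi0(pvalues, lam=0.5):
    """Estimate the null proportion pi0: p-values above `lam` are assumed to be
    (almost) all nulls, which are Uniform(0, 1) and so fill (lam, 1] evenly."""
    m = len(pvalues)
    n_above = sum(1 for p in pvalues if p > lam)
    return min(1.0, n_above / (m * (1.0 - lam)))


def expected_errors(pvalues, t, lam=0.5):
    """Expected false positives and false negatives when genes with p <= t are called."""
    m = len(pvalues)
    pi0 = estimate_pi0(pvalues, lam)
    n_called = sum(1 for p in pvalues if p <= t)
    exp_fp = pi0 * m * t                          # a uniform null falls below t with probability t
    exp_tp = max(0.0, n_called - exp_fp)          # remaining calls attributed to true alternatives
    exp_fn = max(0.0, (1.0 - pi0) * m - exp_tp)   # alternatives that were not called
    return exp_fp, exp_fn


# 80 evenly spread "null" p-values plus 20 very small "alternative" ones
pvals = [(i + 0.5) / 80 for i in range(80)] + [0.001] * 20
print(estimate_pi0(pvals))   # 0.8
print(expected_errors(pvals, 0.01))
```

Raising `lam` reduces the contamination of the upper tail by alternatives but leaves fewer p-values to estimate the uniform level from, so the choice is a bias/variance trade-off.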

2.
MOTIVATION: Statistical tests for the detection of differentially expressed genes lead to a large collection of p-values, one for each gene comparison. Without any further adjustment, these p-values may lead to a large number of false positives, simply because the number of genes tested is huge, which can waste laboratory resources. To account for multiple hypotheses, these p-values are typically adjusted using a single-step or step-down method in order to control the overall error rate (the so-called familywise error rate). In many applications, this may be an overly conservative strategy that flags too few genes. RESULTS: In this paper we introduce a novel empirical Bayes screening (EBS) technique to inspect a large number of p-values in an effort to detect additional positive cases. In effect, each case borrows strength from an overall picture of the alternative hypotheses computed from all the p-values, while the entire procedure is calibrated by a step-down method so that the familywise error rate at the complete null hypothesis is still controlled. We show that EBS has substantially higher sensitivity than the standard step-down approach for multiple comparisons, at the cost of a modest increase in the false discovery rate (FDR). The EBS procedure also compares favorably with existing FDR control procedures for multiple testing. It is particularly useful in situations where it is important to identify all potentially positive cases, which can then be subjected to further confirmatory testing to eliminate the false positives.
We illustrate this screening procedure using a data set on human colorectal cancer, where the EBS method detected additional genes related to colon cancer that were missed by other methods. This novel empirical Bayes procedure has several advantages over our previously proposed empirical Bayes adjustments: (i) it offers automatic screening of the p-values that the user may obtain from a univariate (i.e., gene-by-gene) analysis package, making it extremely easy for a non-statistician to use; (ii) since it applies to the p-values, the tests do not have to be t-tests: they could be F-tests, which might arise in certain ANOVA formulations with expression data, or even nonparametric tests; (iii) the empirical Bayes adjustment uses nonparametric function estimation techniques to estimate the marginal density of the transformed p-values rather than a parametric model for the prior distribution, and is therefore robust against model misspecification. AVAILABILITY: R code for EBS is available from the authors upon request. SUPPLEMENTARY INFORMATION: http://www.stat.uga.edu/~datta/EBS/supp.htm
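For reference, the two conventional familywise adjustments that EBS is contrasted with can be sketched as follows: single-step Bonferroni, and Holm's procedure as a representative step-down method. This is not EBS itself; its nonparametric density estimation is not reproduced here, and the function names are illustrative.

```python
def bonferroni(pvalues):
    """Single-step adjustment: multiply every p-value by the number of tests."""
    m = len(pvalues)
    return [min(1.0, m * p) for p in pvalues]


def holm(pvalues):
    """Holm step-down adjustment, returned in the original gene order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):            # rank 0 is the smallest p-value
        value = min(1.0, (m - rank) * pvalues[i])
        running_max = max(running_max, value)   # adjusted p-values must not decrease
        adjusted[i] = running_max
    return adjusted


p = [0.01, 0.04, 0.03, 0.005]
print(bonferroni(p))
print(holm(p))
```

Holm's adjusted p-values are never larger than Bonferroni's, which is why step-down methods flag at least as many genes while still controlling the familywise error rate.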

3.
MOTIVATION: Microarray techniques provide a valuable way of characterizing the molecular nature of disease. Unfortunately, expense and limited specimen availability often lead to studies with small sample sizes. This makes accurate estimation of variability difficult, since gene-by-gene variance estimates have few degrees of freedom, and the assumption that all genes share equal variance is unlikely to be true. RESULTS: We propose a model in which the within-gene variances are drawn from an inverse gamma distribution whose parameters are estimated across all genes. This yields a test statistic that is a minor variation of those used in standard linear models. We demonstrate that the model assumptions are valid on experimental data, and that the model has more power than standard tests to pick up large changes in expression, without increasing the rate of false positives. AVAILABILITY: This method is incorporated into BRB-ArrayTools version 3.0 (http://linus.nci.nih.gov/BRB-ArrayTools.html). SUPPLEMENTARY MATERIAL: ftp://linus.nci.nih.gov/pub/techreport/RVM_supplement.pdf
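A minimal sketch of the variance-shrinkage idea, assuming the prior hyperparameters are already known: under an inverse gamma prior, the gene's pooled variance is combined with a prior variance `s0_sq` carrying `d0` degrees of freedom, and the resulting t statistic gains those degrees of freedom. In the paper the prior parameters are estimated across all genes; `d0`, `s0_sq` and `moderated_t` are hypothetical, and the parameterization may differ from BRB-ArrayTools.

```python
import math
from statistics import mean, variance


def moderated_t(x, y, d0, s0_sq):
    """Two-sample t statistic whose variance is shrunk toward a prior value.

    d0:    prior degrees of freedom (d0 = 0 gives the ordinary pooled t test)
    s0_sq: prior variance shared by all genes
    Returns (t statistic, degrees of freedom).
    """
    n1, n2 = len(x), len(y)
    d = n1 + n2 - 2
    pooled = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / d
    # weighted average of the prior variance and the gene's own variance
    s_tilde_sq = (d0 * s0_sq + d * pooled) / (d0 + d)
    t = (mean(x) - mean(y)) / math.sqrt(s_tilde_sq * (1 / n1 + 1 / n2))
    return t, d0 + d


print(moderated_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0], d0=4, s0_sq=4.0))
```

With few replicates the prior carries proportionally more weight, which stabilizes the denominator for genes whose sample variance happens to be very small or very large.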

4.
5.
Over the past few years, the popularisation of high-throughput methodologies such as DNA microarrays has significantly increased the possibility of obtaining experimental data. Nevertheless, the interpretation of the results, which involves translating these data into useful biological knowledge, remains a challenge. The methods and strategies used for this interpretation are continuously evolving, and new proposals are constantly arising. Initially, a two-step approach was used: genes of interest were first selected based on thresholds that consider only experimental values, and then, in a second, independent step, the enrichment of these genes in biologically relevant terms was analysed. For several reasons, these methods perform relatively poorly, and a new generation of procedures that draw inspiration from systems biology criteria is currently under development. Such procedures aim to directly test the behaviour of blocks of functionally related genes instead of focusing on single genes.
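The second step of the classical two-step approach is often a one-sided hypergeometric (Fisher) test. A minimal sketch, assuming N annotated genes, K of them carrying a term, and n selected genes of which k carry it; the function name is illustrative. The newer block-based procedures described above avoid the initial threshold and are not shown.

```python
from math import comb


def enrichment_pvalue(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(N, K, n): the chance of seeing at least
    k term-annotated genes among n selected genes."""
    # math.comb returns 0 when the lower index exceeds the upper, so impossible terms vanish
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return numerator / comb(N, n)


# all 4 selected genes carry a term that only 5 of 20 genes carry
print(enrichment_pvalue(N=20, K=5, n=4, k=4))
```

The result depends entirely on where the selection threshold was drawn, which is one of the weaknesses that motivates testing gene blocks directly.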

6.
7.
Since the mutation rate is a key biological parameter, its proper estimation has received great attention for decades. However, many authors opt to report the average mutant frequency, a less meaningful quantity, instead of the mutation rate. This is because the standard methods for estimating the mutation rate, derived from Luria and Delbrück's fluctuation analysis, ideally require high-replication experiments, a requirement often unattainable due to constraints of time, budget or sample availability. But the main problem with mutant frequency, apart from being less informative, is its poor reproducibility, a defect that is especially marked when the chosen average is the arithmetic mean. Several authors have tried to avoid this by employing other averages (such as the median or the geometric mean) or by discarding outliers, though as far as we know nobody has evaluated which method performs best under low-replication settings. Here we use computer simulations to compare the performance of different methods used in low-replication experiments (≤4 cultures). Besides the customary averages of mutant frequency, we also tested two well-known fluctuation methods. Contrary to common practice, our results indicate that fluctuation methods should be applied in such circumstances, as they perform as well as or better than any average of mutant frequency. In particular, experimentalists will benefit from using MSS maximum likelihood in low-replication experiments because it (i) provides more reproducible results, (ii) allows for direct estimation of the mutation rate and (iii) allows for the application of conventional statistics.
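A minimal sketch of MSS maximum likelihood, assuming the basic Lea–Coulson (Luria–Delbrück) model with no corrections for plating efficiency or mutant fitness: the Ma–Sandri–Sarkar recursion gives the probability of r mutants in a culture given the expected number of mutations m, and m is chosen to maximize the likelihood of the observed counts. The function names, search bounds and counts are assumptions; the mutation rate would then follow approximately as m divided by the final number of cells per culture.

```python
import math


def ld_pmf(m, max_r):
    """Luria-Delbrück probabilities p_0..p_max_r via the Ma-Sandri-Sarkar recursion:
    p_0 = exp(-m),  p_r = (m / r) * sum_{i=0}^{r-1} p_i / (r - i + 1)."""
    p = [math.exp(-m)]
    for r in range(1, max_r + 1):
        p.append(m / r * sum(p[i] / (r - i + 1) for i in range(r)))
    return p


def mss_mle(counts, lo=1e-3, hi=50.0, tol=1e-8):
    """Maximum-likelihood estimate of m (expected mutations per culture),
    found by golden-section search over log(m) within [lo, hi]."""
    max_r = max(counts)

    def neg_loglik(log_m):
        p = ld_pmf(math.exp(log_m), max_r)
        return -sum(math.log(p[r]) for r in counts)

    a, b = math.log(lo), math.log(hi)
    g = (math.sqrt(5) - 1) / 2
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if neg_loglik(c) < neg_loglik(d):
            b, d = d, c               # minimum lies in [a, d]
            c = b - g * (b - a)
        else:
            a, c = c, d               # minimum lies in [c, b]
            d = a + g * (b - a)
    return math.exp((a + b) / 2)


counts = [0, 1, 2, 5]                 # hypothetical mutant counts from four cultures
m_hat = mss_mle(counts)
print(m_hat)
```

Because the likelihood uses every culture's count directly, no averaging of mutant frequencies is needed, even with as few as four cultures.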

8.

Background  

A typical microarray experiment has many sources of variation which can be attributed to biological and technical causes. Identifying sources of variation and assessing their magnitude, among other factors, are important for optimal experimental design. The objectives of this study were: (1) to estimate relative magnitudes of different sources of variation and (2) to evaluate agreement between biological and technical replicates.
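As an illustration of estimating relative magnitudes, a minimal method-of-moments sketch for the simplest nested design, assuming a balanced layout in which each biological replicate is hybridized r times (technical replicates). The study's actual design and model may include more factors; the function name is hypothetical.

```python
from statistics import fmean


def variance_components(groups):
    """One-way random-effects (method-of-moments) estimates for one gene.

    groups: list of biological replicates, each a list of r technical replicates
            (balanced design assumed).
    Returns (biological variance, technical variance).
    """
    k, r = len(groups), len(groups[0])
    means = [fmean(g) for g in groups]
    grand = fmean(means)
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    ms_within = ss_within / (k * (r - 1))                     # technical mean square
    ms_between = r * sum((m - grand) ** 2 for m in means) / (k - 1)
    var_tech = ms_within
    var_bio = max(0.0, (ms_between - ms_within) / r)          # negative estimates truncated at 0
    return var_bio, var_tech


print(variance_components([[1.0, 1.2], [3.0, 3.2], [5.0, 5.2]]))
```

Truncating a negative biological estimate at zero is a common convention, not the only one.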

9.
10.
11.
12.
MOTIVATION: If a single organism does not yield enough RNA from the tissue under investigation, it is common practice to pool RNA from several organisms. An important question is whether pooling introduces biases that can lead to inaccurate results. In this article, we describe two biases related to pooling, from both a theoretical and a practical point of view. RESULTS: We model and quantify the respective parts of the pooling bias due to the log transform and due to the biological averaging of the samples. We also evaluate the impact of the bias on the statistical differential analysis of Affymetrix data.
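The log-transform part of the pooling bias is an instance of Jensen's inequality: the log of an average is at least the average of the logs. A minimal sketch, assuming an idealized pool whose signal equals the arithmetic mean of the individual expression levels with no technical noise; the function name is illustrative.

```python
import math
from statistics import fmean


def log_pooling_bias(levels):
    """log2 of the pooled (averaged) expression minus the mean of the
    individual log2 expressions; never negative."""
    pooled_log = math.log2(fmean(levels))
    mean_of_logs = fmean(math.log2(v) for v in levels)
    return pooled_log - mean_of_logs


print(log_pooling_bias([1.0, 4.0]))   # log2(2.5) - 1, about 0.32
print(log_pooling_bias([3.0, 3.0, 3.0]))
```

The bias vanishes only when all individuals have the same level and grows with biological variability, so it can differ between the groups being compared.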

13.
14.
Estimating relative fitness in viral competition experiments   Total citations: 1, self-citations: 0, citations by others: 1
The relative fitness of viral variants has previously been defined as the slope of the logarithmic ratio of the genotype or phenotype frequencies in time plots of pairwise competition experiments. Developing mathematical models for such experiments by employing the conventional coefficient of selection s, we demonstrate that this logarithmic ratio gives the fitness difference rather than the relative fitness. This fitness difference remains proportional to the actual replication rate realized in the particular experimental setup and hence cannot be extrapolated to other situations. Conversely, the conventional relative fitness (1 + s) should be more generic. We develop an approach to compute the generic relative fitness in conventional competition experiments. This involves estimating the total viral replication during the experiment and requires an estimate of the average lifetime of productively infected cells. The novel approach is illustrated by estimating the relative fitness, i.e., the relative replication rate, of a set of zidovudine-resistant human immunodeficiency virus type 1 variants. A tool for calculating the relative fitness from observed changes in viral load and genotype (or phenotype) frequencies is publicly available at http://www-binf.bio.uu.nl/~rdb/fitness.html.
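To make the distinction concrete, a minimal sketch under a deliberately simplified discrete-generation model (an assumption, not the authors' model, which also involves the lifetime of productively infected cells). The least-squares slope of ln(mutant/wild type) against time is a fitness difference per unit time. Converting the total change into a per-cycle relative fitness 1 + s requires the number of replication cycles g, since the ratio is multiplied by (1 + s) each cycle. Function names and data are hypothetical.

```python
import math


def log_ratio_slope(times, mutant_freq):
    """Least-squares slope of ln(f / (1 - f)) against time, where f is the
    mutant frequency: the conventional estimate, which is a fitness difference."""
    y = [math.log(f / (1.0 - f)) for f in mutant_freq]
    t_bar, y_bar = sum(times) / len(times), sum(y) / len(y)
    sxy = sum((t - t_bar) * (yi - y_bar) for t, yi in zip(times, y))
    sxx = sum((t - t_bar) ** 2 for t in times)
    return sxy / sxx


def relative_fitness(delta_log_ratio, generations):
    """1 + s when the mutant/wild-type ratio grows by a factor (1 + s) per
    replication cycle: delta_log_ratio = generations * ln(1 + s)."""
    return math.exp(delta_log_ratio / generations)


times = [0, 1, 2, 3]
freqs = [r / (1 + r) for r in (math.exp(0.5 * t) for t in times)]
print(log_ratio_slope(times, freqs))                          # close to 0.5 per time unit
print(relative_fitness(10 * math.log(1.1), generations=10))  # close to 1.1
```

The same slope corresponds to different values of 1 + s depending on how many replication cycles occur per unit time, which is why the slope alone does not transfer between experimental setups.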

15.
DNA microarray technology provides useful tools for profiling global gene expression patterns in different cell/tissue samples. One major challenge is the large number of genes relative to the number of samples. Using all genes can reduce the performance of a classification rule because of the noise from nondiscriminatory genes. Selecting an optimal subset of the original gene set therefore becomes an important preliminary step in sample classification. In this study, we propose a family-wise error (FWE) rate approach to the selection of discriminatory genes for two-sample or multiple-sample classification. The FWE approach controls the probability of one or more false positives at a prespecified level. A public colon cancer data set is used to evaluate the performance of the proposed approach for two classification methods: k nearest neighbors (k-NN) and support vector machine (SVM). The gene sets selected by the proposed procedure appear to perform better than or comparably to several results reported in the literature using univariate analysis without multivariate search. In addition, we apply the FWE approach to a toxicogenomic data set with nine treatments (a control and eight metals: As, Cd, Ni, Cr, Sb, Pb, Cu and AsV), for a total of 55 samples, for multisample classification. Two gene sets are considered: the gene set ω_F formed by the ANOVA F-test, and the gene set ω_T formed by the union of one-versus-all t-tests. The prediction accuracies are evaluated using internal and external cross-validation. Using SVM classification, the overall accuracies for assigning each of the 55 samples to one of the nine treatments are above 80% for internal cross-validation. ω_F has slightly higher accuracy rates than ω_T. The overall prediction accuracies are above 70% for external cross-validation; the two gene sets ω_T and ω_F performed equally well.
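A minimal sketch of the pipeline, assuming a Bonferroni cut-off as one simple way to keep the probability of one or more false positives at or below α (the study's FWE procedure may differ), followed by a k-NN classifier restricted to the selected genes. The names and toy data are hypothetical.

```python
import math
from collections import Counter


def fwe_select(pvalues, alpha=0.05):
    """Indices of genes passing the Bonferroni threshold alpha / m,
    which bounds the family-wise error rate by alpha."""
    m = len(pvalues)
    return [j for j, p in enumerate(pvalues) if p <= alpha / m]


def knn_predict(train_x, train_y, x, genes, k=3):
    """Majority label among the k training samples nearest to x
    (Euclidean distance computed on the selected genes only)."""
    dists = sorted(
        (math.sqrt(sum((row[j] - x[j]) ** 2 for j in genes)), label)
        for row, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


genes = fwe_select([0.001, 0.5, 0.02, 0.0001])   # keeps genes 0 and 3
train_x = [[0.0, 9.0, 0.0, 0.0], [0.2, -9.0, 0.0, 0.1],
           [5.0, 0.0, 0.0, 5.0], [5.0, 3.0, 0.0, 5.2]]
train_y = ["normal", "normal", "tumor", "tumor"]
print(knn_predict(train_x, train_y, [4.8, -9.0, 0.0, 5.0], genes, k=1))
```

On all four genes, the noisy gene 1 would make a "normal" sample the nearest neighbor of this query; after selection the query is classified as "tumor", illustrating how nondiscriminatory genes can mislead distance-based rules.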

16.
17.
Practical FDR-based sample size calculations in microarray experiments   Total citations: 3, self-citations: 2, citations by others: 3
Motivation: Owing to the experimental cost and difficulty in obtaining biological materials, it is essential to consider appropriate sample sizes in microarray studies. With the growing use of the False Discovery Rate (FDR) in microarray analysis, an FDR-based sample size calculation is essential. Method: We describe an approach to explicitly connect the sample size to the FDR and the number of differentially expressed genes to be detected. The method fits parametric models for degree of differential expression using the Expectation–Maximization algorithm. Results: The applicability of the method is illustrated with simulations and studies of a lung microarray dataset. We propose to use a small training set or published data from relevant biological settings to calculate the sample size of an experiment. Availability: Code to implement the method in the statistical package R is available from the authors. Contact: jhu{at}mdanderson.org
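A simplified calculation in the same spirit, assuming a single common standardized effect size instead of the paper's EM-fitted model for the degree of differential expression, equal group sizes, and a two-sided normal approximation. The per-gene significance level is chosen so that m0·α expected false positives against m1·power true positives give the target FDR; the per-group sample size then follows from the usual two-sample formula. The function and its arguments are hypothetical.

```python
import math
from statistics import NormalDist


def fdr_sample_size(m, m1, fdr, power, effect_size):
    """Per-gene alpha and per-group sample size for a target FDR.

    m:           total number of genes tested
    m1:          number of truly differentially expressed genes to detect
    fdr:         target false discovery rate
    power:       desired per-gene power
    effect_size: common standardized mean difference (delta / sigma)
    """
    m0 = m - m1
    # FDR ~= m0*alpha / (m0*alpha + m1*power), solved for alpha
    alpha = fdr * m1 * power / (m0 * (1.0 - fdr))
    z = NormalDist().inv_cdf
    # two-sided, two-sample normal approximation with equal group sizes
    n = 2.0 * (z(1.0 - alpha / 2.0) + z(power)) ** 2 / effect_size ** 2
    return alpha, math.ceil(n)


alpha, n = fdr_sample_size(m=10000, m1=100, fdr=0.05, power=0.8, effect_size=1.0)
print(alpha, n)
```

Because alpha shrinks as the number of null genes grows, the required sample size rises only slowly (through z(1 - alpha/2)) as more genes are tested.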

18.
The efficiency of pooling mRNA in microarray experiments   Total citations: 11, self-citations: 0, citations by others: 11
In a microarray experiment, messenger RNA samples are often pooled across subjects out of necessity, or in an effort to reduce the effect of biological variation. A basic problem in such experiments is to estimate the nominal expression levels of a large number of genes. Pooling samples will affect expression estimation, but the exact effects are not yet known, as the approach has not been systematically studied in this context. We consider how mRNA pooling affects expression estimates by assessing the finite-sample performance of different estimators for designs with and without pooling. Conditions under which it is advantageous to pool mRNA are defined, and general properties of estimates from both pooled and non-pooled designs are derived under these conditions. A formula is given for the total number of subjects and arrays required in a pooled experiment to obtain gene expression estimates and confidence intervals comparable to those obtained in the no-pooling case. The formula demonstrates that by pooling a possibly larger number of subjects, one can decrease the number of arrays required in an experiment without a loss of precision. The assumptions that facilitate derivation of this formula are examined using data from a quantitative real-time PCR experiment. The calculations are not specific to one particular method of quantifying gene expression, as they assume only that a single, normalized estimate of expression is obtained for each gene. As such, the results should be generally applicable to a number of technologies, provided sufficient pre-processing and normalization methods are available and applied.
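A minimal sketch of this kind of calculation, assuming a simple variance model in which pooling averages the biological variance σ_b² over all n_s subjects while the technical variance σ_t² is averaged only over the n_a arrays. The pooled design then matches a no-pooling design with n_0 subjects and arrays when σ_b²/n_s + σ_t²/n_a = (σ_b² + σ_t²)/n_0. This may differ from the paper's exact formula; `subjects_needed` is hypothetical.

```python
import math


def subjects_needed(n0, n_arrays, var_bio, var_tech):
    """Subjects to pool onto n_arrays arrays so that the variance of a gene's
    expression estimate matches a no-pooling design with n0 subjects/arrays.

    Assumed model: Var = var_bio / n_subjects + var_tech / n_arrays.
    """
    target = (var_bio + var_tech) / n0            # variance without pooling
    remaining = target - var_tech / n_arrays      # budget left for biological variance
    if remaining <= 0:
        raise ValueError("too few arrays: technical variance alone exceeds the target")
    return math.ceil(var_bio / remaining)


for arrays in (10, 8, 6):
    print(arrays, subjects_needed(n0=10, n_arrays=arrays, var_bio=1.0, var_tech=1.0))
```

With equal biological and technical variance, dropping from 10 arrays to 8 requires 14 pooled subjects instead of 10 for the same precision, in line with the trade-off described above; below a certain number of arrays no number of subjects suffices.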

19.

Background  

Enzyme-linked immunosorbent assay (ELISA) is a standard immunoassay to estimate a protein's concentration in a sample. Deploying ELISA in a microarray format permits simultaneous estimation of the concentrations of numerous proteins in a small sample. These estimates, however, are uncertain due to processing error and biological variability. Evaluating estimation error is critical to interpreting biological significance and improving the ELISA microarray process. Estimation error evaluation must be automated to realize a reliable high-throughput ELISA microarray system.
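Concentrations in ELISA are commonly read off a fitted standard curve. A frequent choice, assumed here because the abstract does not name one, is the four-parameter logistic (4PL) curve. Its inverse maps a measured signal back to a concentration, and a first-order (delta-method) step shows how signal noise propagates into the concentration estimate. The parameter convention (a: response at zero, b: slope, c: inflection point, d: maximum response) and the function names are illustrative.

```python
def four_pl(x, a, b, c, d):
    """4PL standard curve: signal as a function of concentration x."""
    return d + (a - d) / (1 + (x / c) ** b)


def inverse_four_pl(y, a, b, c, d):
    """Concentration corresponding to signal y (y must lie strictly between a and d)."""
    return c * ((a - d) / (y - d) - 1) ** (1 / b)


def concentration_sd(y, signal_sd, a, b, c, d, h=1e-6):
    """First-order SD of the estimated concentration, using a central
    finite-difference derivative of the inverse curve."""
    slope = (inverse_four_pl(y + h, a, b, c, d) - inverse_four_pl(y - h, a, b, c, d)) / (2 * h)
    return abs(slope) * signal_sd


a, b, c, d = 0.05, 1.5, 10.0, 2.0       # hypothetical fitted curve parameters
y = four_pl(5.0, a, b, c, d)
print(inverse_four_pl(y, a, b, c, d))
print(concentration_sd(y, 0.01, a, b, c, d))
```

The derivative of the inverse grows near the flat ends of the curve, so identical signal noise translates into much larger concentration uncertainty there; this ignores the uncertainty in the fitted curve parameters themselves.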

20.
In microarray studies it is common that the number of replications (i.e. the sample size) is small and that the distribution of expression values departs from normality. In this situation, permutation and bootstrap tests may be appropriate for the identification of differentially expressed genes. However, unlike bootstrap tests, permutation tests are not suitable for very small sample sizes, such as three per group. A variety of bootstrap tests exists. For example, it is possible to adjust the data to have a common mean before the bootstrap samples are drawn. For small significance levels, which arise when a large number of genes is investigated, the original bootstrap test, as well as a bootstrap test suggested for the Behrens-Fisher problem, has no power in cases of very small sample sizes. In contrast, the modified test based on adjusted data is powerful. Using a Monte Carlo simulation study, we demonstrate that the difference in power can be very large. In addition, the different tests are illustrated using microarray data.
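A minimal sketch of a bootstrap test on adjusted data, assuming a common shifted-data scheme: each group is shifted to the pooled mean so that the null hypothesis holds, the groups are resampled separately, and a Welch-type t statistic is compared with the observed one. Treating zero-variance resamples as infinitely extreme (unless the mean difference is also zero) is an assumption of this sketch, and the tests compared in the study may differ in detail.

```python
import math
import random
from statistics import fmean, variance


def welch_t(x, y):
    """Welch-type two-sample t statistic; zero-variance samples give +/-inf
    (or 0 when the means coincide) instead of dividing by zero."""
    diff = fmean(x) - fmean(y)
    se2 = variance(x) / len(x) + variance(y) / len(y)
    if se2 == 0.0:
        return 0.0 if diff == 0.0 else math.copysign(math.inf, diff)
    return diff / math.sqrt(se2)


def bootstrap_pvalue(x, y, n_boot=10000, seed=0):
    """Two-sided bootstrap test of equal means using data shifted to a common mean."""
    rng = random.Random(seed)
    t_obs = abs(welch_t(x, y))
    mx, my, grand = fmean(x), fmean(y), fmean(x + y)
    # impose the null hypothesis: both groups get the pooled mean, spreads unchanged
    x_adj = [v - mx + grand for v in x]
    y_adj = [v - my + grand for v in y]
    hits = 0
    for _ in range(n_boot):
        xb = rng.choices(x_adj, k=len(x))   # resample within each group separately
        yb = rng.choices(y_adj, k=len(y))
        if abs(welch_t(xb, yb)) >= t_obs:
            hits += 1
    return (hits + 1) / (n_boot + 1)


print(bootstrap_pvalue([1.0, 1.2, 0.9], [5.0, 5.3, 4.8]))
```

With three values per group there are only 10 distinct resamples per group (ignoring order), so the bootstrap distribution is coarse and the handling of degenerate resamples directly affects the smallest attainable p-values.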
