首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.

Background  

A large number of genes usually show differential expressions in a microarray experiment with two types of tissues, and the p-values of a proper statistical test are often used to quantify the significance of these differences. The genes with small p-values are then picked as the genes responsible for the differences in the tissue RNA expressions. One key question is what should be the threshold to consider the p-values small. There is always a trade off between this threshold and the rate of false claims. Recent statistical literature shows that the false discovery rate (FDR) criterion is a powerful and reasonable criterion to pick those genes with differential expression. Moreover, the power of detection can be increased by knowing the number of non-differential expression genes. While this number is unknown in practice, there are methods to estimate it from data. The purpose of this paper is to present a new method of estimating this number and use it for the FDR procedure construction.  相似文献   

2.

Background  

Thousands of genes in a genomewide data set are tested against some null hypothesis, for detecting differentially expressed genes in microarray experiments. The expected proportion of false positive genes in a set of genes, called the False Discovery Rate (FDR), has been proposed to measure the statistical significance of this set. Various procedures exist for controlling the FDR. However the threshold (generally 5%) is arbitrary and a specific measure associated with each gene would be worthwhile.  相似文献   

3.
For multiple testing based on discrete p-values, we propose a false discovery rate (FDR) procedure “BH+” with proven conservativeness. BH+ is at least as powerful as the BH (i.e., Benjamini-Hochberg) procedure when they are applied to superuniform p-values. Further, when applied to mid-p-values, BH+ can be more powerful than it is applied to conventional p-values. An easily verifiable necessary and sufficient condition for this is provided. BH+ is perhaps the first conservative FDR procedure applicable to mid-p-values and to p-values with general distributions. It is applied to multiple testing based on discrete p-values in a methylation study, an HIV study and a clinical safety study, where it makes considerably more discoveries than the BH procedure. In addition, we propose an adaptive version of the BH+ procedure, prove its conservativeness under certain conditions, and provide evidence on its excellent performance via simulation studies.  相似文献   

4.
Multiple testing (MT) with false discovery rate (FDR) control has been widely conducted in the “discrete paradigm” where p-values have discrete and heterogeneous null distributions. However, in this scenario existing FDR procedures often lose some power and may yield unreliable inference, and for this scenario there does not seem to be an FDR procedure that partitions hypotheses into groups, employs data-adaptive weights and is nonasymptotically conservative. We propose a weighted p-value-based FDR procedure, “weighted FDR (wFDR) procedure” for short, for MT in the discrete paradigm that efficiently adapts to both heterogeneity and discreteness of p-value distributions. We theoretically justify the nonasymptotic conservativeness of the wFDR procedure under independence, and show via simulation studies that, for MT based on p-values of binomial test or Fisher's exact test, it is more powerful than six other procedures. The wFDR procedure is applied to two examples based on discrete data, a drug safety study, and a differential methylation study, where it makes more discoveries than two existing methods.  相似文献   

5.

Background

When conducting multiple hypothesis tests, it is important to control the number of false positives, or the False Discovery Rate (FDR). However, there is a tradeoff between controlling FDR and maximizing power. Several methods have been proposed, such as the q-value method, to estimate the proportion of true null hypothesis among the tested hypotheses, and use this estimation in the control of FDR. These methods usually depend on the assumption that the test statistics are independent (or only weakly correlated). However, many types of data, for example microarray data, often contain large scale correlation structures. Our objective was to develop methods to control the FDR while maintaining a greater level of power in highly correlated datasets by improving the estimation of the proportion of null hypotheses.

Results

We showed that when strong correlation exists among the data, which is common in microarray datasets, the estimation of the proportion of null hypotheses could be highly variable resulting in a high level of variation in the FDR. Therefore, we developed a re-sampling strategy to reduce the variation by breaking the correlations between gene expression values, then using a conservative strategy of selecting the upper quartile of the re-sampling estimations to obtain a strong control of FDR.

Conclusion

With simulation studies and perturbations on actual microarray datasets, our method, compared to competing methods such as q-value, generated slightly biased estimates on the proportion of null hypotheses but with lower mean square errors. When selecting genes with controlling the same FDR level, our methods have on average a significantly lower false discovery rate in exchange for a minor reduction in the power.  相似文献   

6.

Background  

Before conducting a microarray experiment, one important issue that needs to be determined is the number of arrays required in order to have adequate power to identify differentially expressed genes. This paper discusses some crucial issues in the problem formulation, parameter specifications, and approaches that are commonly proposed for sample size estimation in microarray experiments. Common methods for sample size estimation are formulated as the minimum sample size necessary to achieve a specified sensitivity (proportion of detected truly differentially expressed genes) on average at a specified false discovery rate (FDR) level and specified expected proportion (π 1) of the true differentially expression genes in the array. Unfortunately, the probability of detecting the specified sensitivity in such a formulation can be low. We formulate the sample size problem as the number of arrays needed to achieve a specified sensitivity with 95% probability at the specified significance level. A permutation method using a small pilot dataset to estimate sample size is proposed. This method accounts for correlation and effect size heterogeneity among genes.  相似文献   

7.

Background  

Finding over- or under-represented motifs in biological sequences is now a common task in genomics. Thanks to p-value calculation for motif counts, exceptional motifs are identified and represent candidate functional motifs. The present work addresses the related question of comparing the exceptionality of one motif in two different sequences. Just comparing the motif count p-values in each sequence is indeed not sufficient to decide if this motif is significantly more exceptional in one sequence compared to the other one. A statistical test is required.  相似文献   

8.
We consider the problem of estimating the proportion of differentially expressed genes from the distribution of P-values arising from statistical tests in a microarray experiment. A two-component mixture model is studied in which one component is uniform on [0,1] and corresponds to equally expressed genes and the other component corresponds to differentially expressed genes. Two different parametric families are explicitly investigated for modeling the nonuniform component—the Beta family and a mixture of uniform densities whose supports form a partition of [0,1]. The Minimum Hellinger discrepancy is used to estimate the mixture proportions which are then used to estimate the proportion of differentially expressed genes in the microarray study. The performance of the proposed methods are investigated using simulation studies in which they are also compared with selected other published procedures. The methods are illustrated through a case study involving data from a published microarray experiment.  相似文献   

9.

Background  

In microarray studies researchers are often interested in the comparison of relevant quantities between two or more similar experiments, involving different treatments, tissues, or species. Typically each experiment reports measures of significance (e.g. p-values) or other measures that rank its features (e.g genes). Our objective is to find a list of features that are significant in all experiments, to be further investigated. In this paper we present an R package called sdef, that allows the user to quantify the evidence of communality between the experiments using previously proposed statistical methods based on the ranked lists of p-values. sdef implements two approaches that address this objective: the first is a permutation test of the maximal ratio of observed to expected common features under the hypothesis of independence between the experiments. The second approach, set in a Bayesian framework, is more flexible as it takes into account the uncertainty on the number of genes differentially expressed in each experiment.  相似文献   

10.
11.

Background  

The evaluation of statistical significance has become a critical process in identifying differentially expressed genes in microarray studies. Classical p-value adjustment methods for multiple comparisons such as family-wise error rate (FWER) have been found to be too conservative in analyzing large-screening microarray data, and the False Discovery Rate (FDR), the expected proportion of false positives among all positives, has been recently suggested as an alternative for controlling false positives. Several statistical approaches have been used to estimate and control FDR, but these may not provide reliable FDR estimation when applied to microarray data sets with a small number of replicates.  相似文献   

12.
13.
The ordinary-, penalized-, and bootstrap t-test, least squares and best linear unbiased prediction were compared for their false discovery rates (FDR), i.e. the fraction of falsely discovered genes, which was empirically estimated in a duplicate of the data set. The bootstrap-t-test yielded up to 80% lower FDRs than the alternative statistics, and its FDR was always as good as or better than any of the alternatives. Generally, the predicted FDR from the bootstrapped P-values agreed well with their empirical estimates, except when the number of mRNA samples is smaller than 16. In a cancer data set, the bootstrap-t-test discovered 200 differentially regulated genes at a FDR of 2.6%, and in a knock-out gene expression experiment 10 genes were discovered at a FDR of 3.2%. It is argued that, in the case of microarray data, control of the FDR takes sufficient account of the multiple testing, whilst being less stringent than Bonferoni-type multiple testing corrections. Extensions of the bootstrap simulations to more complicated test-statistics are discussed.  相似文献   

14.
15.

Background  

DIPLOSPOROUS (DIP) is the locus for diplospory in Taraxacum, associated to unreduced female gamete formation in apomicts. Apomicts reproduce clonally through seeds, including apomeiosis, parthenogenesis, and autonomous or pseudogamous endosperm formation. In Taraxacum, diplospory results in first division restitution (FDR) nuclei, and inherits as a dominant, monogenic trait, independent from the other apomixis elements. A preliminary genetic linkage map indicated that the DIP-locus lacks suppression of recombination, which is unique among all other map-based cloning efforts of apomeiosis to date. FDR as well as apomixis as a whole are of interest in plant breeding, allowing for polyploidization and fixation of hybrid vigor, respectively. No dominant FDR or apomixis genes have yet been isolated. Here, we zoom-in to the DIP-locus by largely extending our initial mapping population, and by analyzing (local) suppression of recombination and allele sequence divergence (ASD).  相似文献   

16.

Background  

Cryptosporidium is a protozoan parasite that causes diarrheal illness in a wide range of hosts including humans. Two species, C. parvum and C. hominis are of primary public health relevance. Genome sequences of these two species are available and show only 3-5% sequence divergence. We investigated this sequence variability, which could correspond either to sequence gaps in the published genome sequences or to the presence of species-specific genes. Comparative genomic tools were used to identify putative species-specific genes and a subset of these genes was tested by PCR in a collection of Cryptosporidium clinical isolates and reference strains.  相似文献   

17.

Background

q-value is a widely used statistical method for estimating false discovery rate (FDR), which is a conventional significance measure in the analysis of genome-wide expression data. q-value is a random variable and it may underestimate FDR in practice. An underestimated FDR can lead to unexpected false discoveries in the follow-up validation experiments. This issue has not been well addressed in literature, especially in the situation when the permutation procedure is necessary for p-value calculation.

Results

We proposed a statistical method for the conservative adjustment of q-value. In practice, it is usually necessary to calculate p-value by a permutation procedure. This was also considered in our adjustment method. We used simulation data as well as experimental microarray or sequencing data to illustrate the usefulness of our method.

Conclusions

The conservativeness of our approach has been mathematically confirmed in this study. We have demonstrated the importance of conservative adjustment of q-value, particularly in the situation that the proportion of differentially expressed genes is small or the overall differential expression signal is weak.
  相似文献   

18.

Background  

The Significance Analysis of Microarrays (SAM) is a popular method for detecting significantly expressed genes and controlling the false discovery rate (FDR). Recently, it has been reported in the literature that the FDR is not well controlled by SAM. Due to the vast application of SAM in microarray data analysis, it is of great importance to have an extensive evaluation of SAM and its associated R-package (sam2.20).  相似文献   

19.

Background  

Molecular signatures are sets of genes, proteins, genetic variants or other variables that can be used as markers for a particular phenotype. Reliable signature discovery methods could yield valuable insight into cell biology and mechanisms of human disease. However, it is currently not clear how to control error rates such as the false discovery rate (FDR) in signature discovery. Moreover, signatures for cancer gene expression have been shown to be unstable, that is, difficult to replicate in independent studies, casting doubts on their reliability.  相似文献   

20.

Background  

Molecular lock-and-key systems are common among reproductive proteins, yet their evolution remains a major puzzle in evolutionary biology. In the Brassicaceae, the genes encoding self-incompatibility have been identified, but technical challenges currently prevent detailed analyses of the molecular interaction between the male and female components. In the present study, we investigate sequence polymorphism in the female specificity determinant SRK of Arabidopsis halleri from throughout Europe. Using a comparative approach based on published SRK sequences in A. lyrata and Brassica, we track the signature of frequency-dependent selection acting on these genes at the codon level. Using simulations, we evaluate power and accuracy of our approach and estimate the proportion of codon sites involved in the molecular interaction.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号