Found 20 similar articles (search time: 15 ms)
1.
SUMMARY: twilight is a Bioconductor compatible package for analysing the statistical significance of differentially expressed genes. It is based on the concept of the local false discovery rate (FDR), a generalization of the frequently used global FDR. twilight implements the heuristic search algorithm for estimating the local FDR introduced in our earlier work. In addition to the raw significance measures, it produces diagnostic plots, which provide insight into the extent of differential expression across genes. AVAILABILITY: http://www.bioconductor.org CONTACT: stefanie.scheid@molgen.mpg.de SUPPLEMENTARY INFORMATION: Please visit our software webpage on http://compdiag.molgen.mpg.de/software.
2.
Scheid S Spang R 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2004,1(3):98-108
Screening for differential gene expression in microarray studies leads to difficult large-scale multiple testing problems. The local false discovery rate is a statistical concept for quantifying uncertainty in multiple testing. We introduce a novel estimator for the local false discovery rate that is based on an algorithm which splits all genes into two groups, representing induced and noninduced genes, respectively. Starting from the full set of genes, we successively exclude genes until the gene-wise p-values of the remaining genes look like a typical sample from a uniform distribution. In comparison to other methods, our algorithm performs comparably in detecting the shape of the local false discovery rate and has a smaller bias with respect to estimating the overall percentage of noninduced genes. Our algorithm is implemented in the Bioconductor compatible R package TWILIGHT version 1.0.1, which is available from http://compdiag.molgen.mpg.de/software or from the Bioconductor project at http://www.bioconductor.org.
3.
A mixture model for estimating the local false discovery rate in DNA microarray analysis (cited 3 times in total; self-citations: 0; cited by others: 3)
MOTIVATION: Statistical methods based on controlling the false discovery rate (FDR) or positive false discovery rate (pFDR) are now well established in identifying differentially expressed genes in DNA microarray studies. Several authors have recently raised the important issue that FDR or pFDR may give misleading inference when specific genes are of interest because they average the genes under consideration with genes that show stronger evidence for differential expression. The paper proposes a flexible and robust mixture model for estimating the local FDR, which quantifies how plausible it is that each specific gene is differentially expressed. RESULTS: We develop a special mixture model tailored to multiple testing by requiring the P-value distribution for the differentially expressed genes to be stochastically smaller than the P-value distribution for the non-differentially expressed genes. A smoothing mechanism is built in. The proposed model gives robust estimation of local FDR for any reasonable underlying P-value distribution. It also provides a single framework for estimating the proportion of differentially expressed genes, pFDR, negative predictive values, sensitivity and specificity. A cervical cancer study shows that the local FDR gives more specific and relevant quantification of the evidence for differential expression that can be substantially different from pFDR. AVAILABILITY: An R function implementing the proposed model is available at http://www.geocities.com/jg_liao/software
4.
Background
The study of discrete characters is crucial for the understanding of evolutionary processes. Even though great advances have been made in the analysis of nucleotide sequences, computer programs for non-DNA discrete characters are often dedicated to specific analyses and lack flexibility. Discrete characters often have different transition rate matrices, variable rates among sites and sometimes contain unobservable states. To obtain the ability to accurately estimate a variety of discrete characters, programs with sophisticated methodologies and flexible settings are desired.
Results
DiscML performs maximum likelihood estimation for evolutionary rates of discrete characters on a provided phylogeny with the options that correct for unobservable data, rate variations, and unknown prior root probabilities from the empirical data. It gives users options to customize the instantaneous transition rate matrices, or to choose pre-determined matrices from models such as birth-and-death (BD), birth-death-and-innovation (BDI), equal rates (ER), symmetric (SYM), general time-reversible (GTR) and all rates different (ARD). Moreover, we show application examples of DiscML on gene family data and on intron presence/absence data.
Conclusion
DiscML was developed as a unified R program for estimating evolutionary rates of discrete characters with no restriction on the number of character states, and with flexibility to use different transition models. DiscML is ideal for the analyses of binary (1s/0s) patterns, multi-gene families, and multistate discrete morphological characteristics.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2105-15-320) contains supplementary material, which is available to authorized users.
5.
False discovery rate (FDR) analyses of protein and peptide identification results using decoy database searching conventionally report aggregate or global FDRs for a whole set of identifications, which are often not very informative about the error rates of individual members in the set. We describe a nonlinear curve fitting method for calculating the local FDR, which estimates the chance that an individual protein (or peptide) is incorrect, and present a simple tool that implements this analysis. The goal of this method is to offer a simple extension to the now commonplace decoy database searching, providing additional valuable information.
6.
A simple procedure for estimating the false discovery rate (cited 1 time in total; self-citations: 0; cited by others: 1)
MOTIVATION: The false discovery rate (FDR) is nowadays the most widely used criterion in microarray data analysis. In the framework of estimating procedures based on the marginal distribution of the P-values without any assumption on gene expression changes, estimators of the FDR are necessarily conservatively biased. Indeed, only an upper bound estimate can be obtained for the key quantity pi0, which is the probability for a gene to be unmodified. In this paper, we propose a novel family of estimators for pi0 that allows the calculation of FDR. RESULTS: The very simple method for estimating pi0 called LBE (Location Based Estimator) is presented together with results on its variability. Simulation results indicate that the proposed estimator performs well in finite samples and has the best mean square error in most of the cases as compared with the procedures QVALUE, BUM and SPLOSH. The different procedures are then applied to real datasets. AVAILABILITY: The R function LBE is available at http://ifr69.vjf.inserm.fr/lbe CONTACT: broet@vjf.inserm.fr.
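As a rough illustration of the location-based idea in this abstract, the sketch below estimates pi0 from the mean of transformed P-values. The transformation phi(p) = -ln(1 - p) is an assumption chosen here because its expectation under a uniform distribution equals 1; the exact transformation and tuning used by the published LBE function are not given in the abstract. The function name `location_based_pi0` is hypothetical.

```python
import math


def location_based_pi0(p_values):
    """Estimate pi0 as the mean of phi(p) = -ln(1 - p), capped at 1.

    Assumption: under the null, P-values are uniform, so E[phi(P)] = 1.
    P-values of modified genes tend to be small, which pulls the mean
    below 1 and makes the estimate an upper bound on pi0 on average.
    """
    if not p_values:
        raise ValueError("need at least one P-value")
    eps = 1e-12  # guard against log(0) when a P-value equals 1
    transformed = [-math.log(1.0 - min(p, 1.0 - eps)) for p in p_values]
    return min(1.0, sum(transformed) / len(transformed))


# Evenly spread P-values mimic a pure-null experiment.
uniform_like = [(i + 0.5) / 1000 for i in range(1000)]
print(round(location_based_pi0(uniform_like), 2))
```

An estimate obtained this way could then be multiplied into a conservative FDR estimate, which is the use the abstract describes for pi0.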
7.
MOTIVATION: The false discovery rate (fdr) is a key tool for statistical assessment of differential expression (DE) in microarray studies. Overall control of the fdr alone, however, is not sufficient to address the problem of genes with small variance, which generally suffer from a disproportionally high rate of false positives. It is desirable to have an fdr-controlling procedure that automatically accounts for gene variability. METHODS: We generalize the local fdr as a function of multiple statistics, combining a common test statistic for assessing DE with its standard error information. We use a non-parametric mixture model for DE and non-DE genes to describe the observed multi-dimensional statistics, and estimate the distribution for non-DE genes via the permutation method. We demonstrate this fdr2d approach for simulated and real microarray data. RESULTS: The fdr2d allows objective assessment of DE as a function of gene variability. We also show that the fdr2d performs better than commonly used modified test statistics. AVAILABILITY: An R-package OCplus containing functions for computing fdr2d() and other operating characteristics of microarray data is available at http://www.meb.ki.se/~yudpaw.
8.
Background
Proteomic protein identification results need to be compared across laboratories and platforms, and thus a reliable method is needed to estimate false discovery rates. The target-decoy strategy is platform-independent and thus a prime candidate for standardized reporting of data. In its current usage based on global population parameters, the method does not utilize individual peptide scores optimally.
9.
Bogojeska J Alexa A Altmann A Lengauer T Rahnenführer J 《Bioinformatics (Oxford, England)》2008,24(20):2391-2392
In genetics, many evolutionary pathways can be modeled by the ordered accumulation of permanent changes. Mixture models of mutagenetic trees have been used to describe disease progression in cancer and in HIV. In cancer, progression is modeled by the accumulation of chromosomal gains and losses in tumor cells; in HIV, the accumulation of drug resistance-associated mutations in the viral genome is known to be associated with disease progression. From such evolutionary models, genetic progression scores can be derived that assign measures for the disease state to single patients. Rtreemix is an R package for estimating mixture models of evolutionary pathways from observed cross-sectional data and for estimating associated genetic progression scores. The package also provides extended functionality for estimating confidence intervals for estimated model parameters and for evaluating the stability of the estimated evolutionary mixture models.
10.
Background
The use of current high-throughput genetic, genomic and post-genomic data leads to the simultaneous evaluation of a large number of statistical hypotheses and, at the same time, to the multiple-testing problem. As an alternative to the overly conservative Family-Wise Error-Rate (FWER), the False Discovery Rate (FDR) has emerged over the last ten years as more appropriate to handle this problem. However, one drawback of the FDR is that it is tied to a given rejection region for the considered statistics, attributing the same value to those that are close to the boundary and those that are not. As a result, the local FDR has recently been proposed to quantify the specific probability for a given null hypothesis to be true.
11.
- 1. Camera trapping plays an important role in wildlife surveys, and provides valuable information for estimation of population density. While mark-recapture techniques can estimate population density for species that can be individually recognized or marked, there are no robust methods to estimate density of species that cannot be individually identified.
- 2. We developed a new approach to estimate population density based on the simulation of individual movement within the camera grid. Simulated animals followed a correlated random walk with the movement parameters of segment length, angular deflection, movement distance and home-range size derived from empirical movement paths. Movement was simulated under a series of population densities. We used the Random Forest algorithm to determine the population density with the highest likelihood of matching the camera trap data. We developed an R package, cameratrapR, to conduct simulations and estimate population density.
- 3. Compared with line transect surveys and the random encounter model, cameratrapR provides more reliable estimates of wildlife density with narrower confidence intervals. Functions are provided to visualize movement paths, derive movement parameters, and plot camera trapping results.
- 4. The package allows researchers to estimate population sizes/densities of animals that cannot be individually identified when cameras are deployed in a grid pattern.
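The correlated random walk mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the cameratrapR implementation: it uses only a fixed segment length and a normally distributed angular deflection, and it ignores the movement-distance and home-range constraints the authors describe. The function name `simulate_crw` and its parameters are hypothetical.

```python
import math
import random


def simulate_crw(n_steps, step_length, turn_sd, seed=None, start=(0.0, 0.0)):
    """Simulate a 2-D correlated random walk and return the visited points.

    Each step keeps the previous heading plus a small random deflection
    drawn from a normal distribution with standard deviation turn_sd
    (radians). turn_sd = 0 gives a straight line; a large turn_sd
    approaches an uncorrelated random walk.
    """
    rng = random.Random(seed)
    x, y = start
    heading = rng.uniform(0.0, 2.0 * math.pi)  # random initial direction
    path = [(x, y)]
    for _ in range(n_steps):
        heading += rng.gauss(0.0, turn_sd)      # angular deflection
        x += step_length * math.cos(heading)    # move one segment
        y += step_length * math.sin(heading)
        path.append((x, y))
    return path


path = simulate_crw(n_steps=50, step_length=10.0, turn_sd=0.3, seed=1)
print(len(path))  # 51 points: the start plus one per step
```

In a density-estimation workflow of the kind the abstract outlines, many such paths would be simulated per candidate density and compared with the observed camera detections.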
12.
Gregory Hather Roger Higdon Andrew Bauman Priska D. von Haller Eugene Kolker 《Proteomics》2010,10(12):2369-2376
MS‐based proteomics characterizes protein contents of biological samples. The most common approach is to first match observed MS/MS peptide spectra against theoretical spectra from a protein sequence database and then to score these matches. The false discovery rate (FDR) can be estimated as a function of the score by searching together the protein sequence database and its randomized version and comparing the score distributions of the randomized versus nonrandomized matches. This work introduces a straightforward isotonic regression‐based method to estimate the cumulative FDRs and local FDRs (LFDRs) of peptide identification. Our isotonic method not only performed as well as other methods used for comparison, but also has the advantages of being: (i) monotonic in the score, (ii) computationally simple, and (iii) not dependent on assumptions about score distributions. We demonstrate the flexibility of our approach by using it to estimate FDRs and LFDRs for protein identification using summaries of the peptide spectra scores. We reconfirmed that several of these methods were superior to a two‐peptide rule. Finally, by estimating both the FDRs and LFDRs, we showed for both peptide and protein identification, moderate FDR values (5%) corresponded to large LFDR values (53 and 60%).
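The basic target-decoy comparison that this abstract builds on can be sketched as below. This is plain decoy counting under the assumption that higher scores are better and that decoy matches above a threshold approximate the number of false target matches. It is not the isotonic regression method of the paper, and correction factors used by some decoy protocols are omitted. The function name `decoy_fdr` is hypothetical.

```python
def decoy_fdr(target_scores, decoy_scores, threshold):
    """Estimate the cumulative FDR among target matches scoring >= threshold.

    Assumes the number of decoy (randomized) matches passing the
    threshold approximates the number of false target matches passing it.
    """
    n_target = sum(1 for s in target_scores if s >= threshold)
    n_decoy = sum(1 for s in decoy_scores if s >= threshold)
    if n_target == 0:
        return 0.0  # nothing accepted, so no false discoveries to report
    return min(1.0, n_decoy / n_target)


targets = [10.0, 9.0, 8.0, 7.0, 3.0]
decoys = [8.5, 4.0, 2.0]
print(decoy_fdr(targets, decoys, threshold=7.0))  # 1 decoy / 4 targets = 0.25
```

Evaluating the function over a grid of thresholds gives the FDR as a function of the score, which is the curve the abstract refers to.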
13.
Victor B Gabriël S Kanobana K Mostovenko E Polman K Dorny P Deelder AM Palmblad M 《Journal of proteome research》2012,11(3):1991-1995
Tandem mass spectrometry is commonly used to identify peptides, typically by comparing their product ion spectra with those predicted from a protein sequence database and scoring these matches. The most reported quality metric for a set of peptide identifications is the false discovery rate (FDR), the fraction of expected false identifications in the set. This metric has so far only been used for completely sequenced organisms or known protein mixtures. We have investigated whether FDR estimations are also applicable in the case of partially sequenced organisms, where many high-quality spectra fail to identify the correct peptides because the latter are not present in the searched sequence database. Using real data from human plasma and simulated partial sequence databases derived from two complete human sequence databases with different levels of redundancy, we could demonstrate that the mixture model approach in PeptideProphet is robust for partial databases, particularly if used in combination with decoy sequences. We therefore recommend using this method when estimating the FDR and reporting peptide identifications from incompletely sequenced organisms.
14.
Drug discovery and drug target identification are two intimately linked facets of intervention strategies aimed at effectively combating pathological conditions in humans. Simple model organisms provide attractive platforms for devising and streamlining efficient drug discovery and drug target identification methodologies. The nematode worm Caenorhabditis elegans has emerged as a particularly convenient and versatile tool that can be exploited to achieve these goals. Although C. elegans is a relatively modern addition to the arsenal of model organisms, its biology has already been investigated to an exceptional level. This, coupled with effortless handling and a notable low cost of cultivation and maintenance, allows seamless implementation of high-throughput drug screening approaches as well as in-depth genetic and biochemical studies of the molecular pathways targeted by specific drugs. In this review, we introduce C. elegans as a model organism with significant advantages toward the identification of molecular drug targets. In addition, we discuss the value of the worm in the development of drug screening and drug evaluation protocols. The unique features of C. elegans, which greatly facilitate drug studies, hold promise for both deciphering disease pathogenesis and formulating educated and effective therapeutic interventions.
15.
Matthew W. Pennell Carisa R. Stansbury Lisette P. Waits Craig R. Miller 《Molecular ecology resources》2013,13(1):154-157
Non‐invasive genetic sampling is an increasingly popular approach for investigating the demographics of natural populations. This has also become a useful tool for managers and conservation biologists, especially for those species for which traditional mark–recapture studies are not practical. However, the consequence of collecting DNA indirectly is that an individual may be sampled multiple times per sampling session. This requires alternative statistical approaches to those used in traditional mark–recapture studies. Here we present the R package capwire, an implementation of the population size estimators of Miller et al. (Molecular Ecology 2005; 14: 1991), which were designed to deal specifically with this type of sampling. The aim of this project is to enable users across platforms to easily manipulate their data and interact with existing R packages. We have also provided functions to simulate data under a variety of scenarios to allow for rigorous testing of the robustness of the method and to facilitate further development of this approach.
16.
Dudoit S Gilbert HN van der Laan MJ 《Biometrical journal. Biometrische Zeitschrift》2008,50(5):716-744
This article proposes resampling-based empirical Bayes multiple testing procedures for controlling a broad class of Type I error rates, defined as generalized tail probability (gTP) error rates, gTP (q,g) = Pr(g (V(n),S(n)) > q), and generalized expected value (gEV) error rates, gEV (g) = E [g (V(n),S(n))], for arbitrary functions g (V(n),S(n)) of the numbers of false positives V(n) and true positives S(n). Of particular interest are error rates based on the proportion g (V(n),S(n)) = V(n) /(V(n) + S(n)) of Type I errors among the rejected hypotheses, such as the false discovery rate (FDR), FDR = E [V(n) /(V(n) + S(n))]. The proposed procedures offer several advantages over existing methods. They provide Type I error control for general data generating distributions, with arbitrary dependence structures among variables. Gains in power are achieved by deriving rejection regions based on guessed sets of true null hypotheses and null test statistics randomly sampled from joint distributions that account for the dependence structure of the data. The Type I error and power properties of an FDR-controlling version of the resampling-based empirical Bayes approach are investigated and compared to those of widely-used FDR-controlling linear step-up procedures in a simulation study. The Type I error and power trade-off achieved by the empirical Bayes procedures under a variety of testing scenarios allows this approach to be competitive with or outperform the Storey and Tibshirani (2003) linear step-up procedure, as an alternative to the classical Benjamini and Hochberg (1995) procedure.
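For readers unfamiliar with the classical Benjamini and Hochberg linear step-up procedure used as the baseline in this abstract, a minimal sketch follows. It implements only the standard procedure (reject the k smallest P-values, where k is the largest rank with p_(k) <= k * alpha / m) and not the resampling-based empirical Bayes method the article proposes. The function name is hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the sorted indices rejected by the BH linear step-up procedure."""
    m = len(p_values)
    # Indices of the hypotheses ordered from smallest to largest P-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up: remember the LARGEST rank whose P-value meets its
        # threshold, even if some smaller ranks failed theirs.
        if p_values[idx] <= rank * alpha / m:
            k = rank
    return sorted(order[:k])


# The two smallest P-values miss their own thresholds (0.0125, 0.025),
# but the third meets its threshold (0.0375), so all three are rejected.
print(benjamini_hochberg([0.02, 0.03, 0.035, 0.9], alpha=0.05))  # [0, 1, 2]
```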
17.
A variety of methods have been described in the literature for assigning statistical significance to peptides identified via tandem mass spectrometry. Here, we explain how two types of scores, the q-value and the posterior error probability, are related and complementary to one another.
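One common way to express the relationship between the two scores is that the posterior error probability (PEP) is a per-identification error, while the FDR of an accepted set is the average PEP of its members. The sketch below derives q-values from PEPs on that basis. It is an illustration under that assumption, not a procedure taken from the abstract, and the function name is hypothetical.

```python
def q_values_from_peps(peps):
    """Derive q-values from posterior error probabilities (PEPs).

    Identifications are accepted in order of increasing PEP. The estimated
    FDR of the accepted set is the running mean of the PEPs in it; because
    the PEPs are sorted in ascending order, this running mean never
    decreases, so it can be used directly as the q-value.
    """
    order = sorted(range(len(peps)), key=lambda i: peps[i])
    q = [0.0] * len(peps)
    running_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        running_sum += peps[idx]
        q[idx] = running_sum / rank  # mean PEP of the best `rank` items
    return q


peps = [0.01, 0.50, 0.03]
print([round(v, 2) for v in q_values_from_peps(peps)])  # [0.01, 0.18, 0.02]
```

The example shows why the two scores are complementary: the second identification has a 50% chance of being wrong on its own, yet the set that includes it still has an estimated FDR of only 18%.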
18.
Testing for significance with gene expression data from DNA microarray experiments involves simultaneous comparisons of hundreds or thousands of genes. If R denotes the number of rejections (declared significant genes) and V denotes the number of false rejections, then V/R, if R > 0, is the proportion of false rejected hypotheses. This paper proposes a model for the distribution of the number of rejections and the conditional distribution of V given R, V | R. Under the independence assumption, the distribution of R is a convolution of two binomials and the distribution of V | R has a noncentral hypergeometric distribution. Under an equicorrelated model, the distributions are more complex and are also derived. Five false discovery rate probability error measures are considered: FDR = E(V/R), pFDR = E(V/R | R > 0) (positive FDR), cFDR = E(V/R | R = r) (conditional FDR), mFDR = E(V)/E(R) (marginal FDR), and eFDR = E(V)/r (empirical FDR). The pFDR, cFDR, and mFDR are shown to be equivalent under the Bayesian framework, in which the number of true null hypotheses is modeled as a random variable. We present a parametric and a bootstrap procedure to estimate the FDRs. Monte Carlo simulations were conducted to evaluate the performance of these two methods. The bootstrap procedure appears to perform reasonably well, even when the alternative hypotheses are correlated (rho = .25). An example from a toxicogenomic microarray experiment is presented for illustration.
19.
The tumor suppressor PTEN controls a variety of biological processes including cell proliferation, growth, migration, and death. As a master cellular regulator, PTEN itself is also subject to deliberate regulation to ensure its proper function. Defects in PTEN regulation have a profound impact on carcinogenesis. In this review, we briefly discuss recent advances concerning PTEN regulation and how such knowledge facilitates our understanding and further exploration of PTEN biology. The carboxyl-tail of PTEN, which appears to be associated with multiple types of posttranslational regulation, will be under detailed scrutiny. Further, a comparative analysis of PTEN and p53 suggests that while p53 needs to be activated to suppress tumorigenesis (a dormant gatekeeper), PTEN is probably a constitutive surveillant against cancer development, thus a default gatekeeper.
20.
Acetyl-coenzyme A carboxylases (ACCs) have crucial roles in fatty acid metabolism in humans and most other living organisms. They are attractive targets for drug discovery against a variety of human diseases, including diabetes, obesity, cancer, and microbial infections. In addition, ACCs from grasses are the targets of herbicides that have been in commercial use for more than 20 years. Significant progress in both basic research and drug discovery has been made over the past few years in the studies on these enzymes. At the basic research level, the crystal structures of the biotin carboxylase (BC) and the carboxyltransferase (CT) components of ACC have been determined, and the molecular basis for ACC inhibition by small molecules is beginning to be understood. At the drug discovery level, a large number of nanomolar inhibitors of mammalian ACCs have been reported and the extent of their therapeutic potential is being aggressively explored. This review summarizes these advances and also offers some prospects in terms of the future directions for the studies on these important enzymes.