Similar literature
20 similar articles found (search time: 0 ms)
1.
As much of the focus of genetics and molecular biology has shifted toward the systems level, it has become increasingly important to accurately extract biologically relevant signal from thousands of related measurements. The common property among these high-dimensional biological studies is that the measured features have a rich and largely unknown underlying structure. One example of much recent interest is identifying differentially expressed genes in comparative microarray experiments. We propose a new approach aimed at optimally performing many hypothesis tests in a high-dimensional study. This approach estimates the optimal discovery procedure (ODP), which has recently been introduced and theoretically shown to optimally perform multiple significance tests. Whereas existing procedures essentially use data from only one feature at a time, the ODP approach uses the relevant information from the entire data set when testing each feature. In particular, we propose a generally applicable estimate of the ODP for identifying differentially expressed genes in microarray experiments. This microarray method consistently shows favorable performance over five widely used existing methods. For example, in testing for differential expression between two breast cancer tumor types, the ODP provides increases from 72% to 185% in the number of genes called significant at a false discovery rate of 3%. Our proposed microarray method is freely available to academic users in the open-source, point-and-click EDGE software package.

2.
We propose a Bayesian hypothesis testing procedure for comparing the distributions of paired samples. The procedure is based on a flexible model for the joint distribution of both samples. The flexibility is given by a mixture of Dirichlet processes. Our proposal uses a spike-slab prior specification for the base measure of the Dirichlet process and a particular parametrization for the kernel of the mixture in order to facilitate comparisons and posterior inference. The joint model allows us to derive the marginal distributions and test whether they differ or not. The procedure exploits the correlation between samples, relaxes the parametric assumptions, and detects possible differences throughout the entire distributions. A Monte Carlo simulation study comparing the performance of this strategy to other traditional alternatives is provided. Finally, we apply the proposed approach to spirometry data collected in the United States to investigate changes in pulmonary function in children and adolescents in response to air polluting factors.

3.
4.

Background  

Large-scale statistical analyses have become hallmarks of post-genomic era biological research due to advances in high-throughput assays and the integration of large biological databases. One accompanying issue is the simultaneous estimation of p-values for a large number of hypothesis tests. In many applications, a parametric assumption about the null distribution, such as normality, may be unreasonable, and resampling-based p-values are the preferred procedure for establishing statistical significance. Using resampling-based procedures for multiple testing is computationally intensive and typically requires large numbers of resamples.
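
As an illustration of what a resampling-based p-value involves, the following is a minimal sketch of a two-sample permutation test for a difference in means. It is not the procedure of the article above, whose details are not given in the abstract; the function name, the choice of test statistic, and the default number of resamples are assumptions made for the example.

```python
import random


def permutation_p_value(x, y, n_resamples=10_000, seed=0):
    """Two-sided permutation p-value for a difference in group means.

    Hypothetical helper for illustration only; the statistic and the
    resampling scheme are assumptions, not taken from the abstract.
    """
    rng = random.Random(seed)
    n_x = len(x)
    pooled = list(x) + list(y)
    n_y = len(pooled) - n_x
    observed = abs(sum(x) / n_x - sum(y) / n_y)

    hits = 0
    for _ in range(n_resamples):
        # Shuffling the pooled sample mimics the null hypothesis that
        # group labels are exchangeable.
        rng.shuffle(pooled)
        stat = abs(sum(pooled[:n_x]) / n_x - sum(pooled[n_x:]) / n_y)
        if stat >= observed:
            hits += 1

    # The add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_resamples + 1)
```

With this estimator the smallest attainable p-value is 1 / (n_resamples + 1), which is one reason that multiple testing over thousands of features calls for large numbers of resamples per feature.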

5.
C Y Meng, A P Dempster. Biometrics 1987, 43(2): 301-311
Statistical analyses of simple tumor rates from an animal experiment with one control and one treated group typically consist of hypothesis testing of many 2 × 2 tables, one for each tumor type or site. The multiplicity of significance tests may cause excessive overall false-positive rates. This paper presents a Bayesian approach to the problem of multiple significance testing. We develop a normal logistic model that accommodates the incidences of all tumor types or sites observed in the current experiment simultaneously as well as their historical control incidences. Exchangeable normal priors are assumed for certain linear terms in the model. Posterior means, standard deviations, and Bayesian P-values are computed for an average treatment effect as well as for the effects on individual tumor types or sites. Model assumptions are checked using probability plots and the sensitivity of the parameter estimates to alternative priors is studied. The method is illustrated using tumor data from a chronic animal experiment.

6.
7.
Comparative phylogeographic studies often reveal disparate levels of sequence divergence between lineages spanning a common geographic barrier, leading to the conclusion that isolation was nonsynchronous. However, only rarely do researchers account for the expected variance associated with ancestral coalescence and among-taxon variation in demographic history. We introduce a flexible approximate Bayesian computational (ABC) framework that can test for simultaneous divergence (TSD) using a hierarchical model that incorporates idiosyncratic differences in demographic history across taxon pairs. The method is tested across a range of conditions and is shown to be accurate even with single-locus mitochondrial DNA (mtDNA) data. We apply this method to a landmark dataset of putative simultaneous vicariance, eight geminate echinoid taxon pairs thought to have been split by the Isthmus of Panama 3.1 million years ago. The ABC posterior estimates are not consistent with a history of simultaneous vicariance given these data. Subsequent ABC estimates under a constrained model that assumes two divergence times across the eight taxon pairs suggest simultaneous divergence 3.1 million years ago in seven of the taxon pairs and a more recent divergence in the remaining taxon pair. These ABC estimates on the simultaneous divergence of the seven taxon pairs correspond to a DNA substitution rate of approximately 1.59% per lineage per million years at the mtDNA cytochrome oxidase I gene. This ABC framework can easily be modified to analyze single taxon-pair datasets and/or be expanded to include multiple loci, migration, recombination, and other idiosyncratic demographic histories. The flexible aspect of ABC and its built-in evaluation of estimator bias and statistical power have the potential to greatly enhance statistical rigor in phylogeographic studies.

8.
An important experimental design problem in early-stage drug discovery is how to prioritize available compounds for testing when very little is known about the target protein. Informer-based ranking (IBR) methods address the prioritization problem when the compounds have provided bioactivity data on other potentially relevant targets. An IBR method selects an informer set of compounds, and then prioritizes the remaining compounds on the basis of new bioactivity experiments performed with the informer set on the target. We formalize the problem as a two-stage decision problem and introduce the Bayes Optimal Informer SEt (BOISE) method for its solution. BOISE leverages a flexible model of the initial bioactivity data, a relevant loss function, and effective computational schemes to resolve the two-step design problem. We evaluate BOISE and compare it to other IBR strategies in two retrospective studies, one on protein-kinase inhibition and the other on anticancer drug sensitivity. In both empirical settings BOISE exhibits better predictive performance than available methods. It also behaves well with missing data, where methods that use matrix completion show worse predictive performance.

9.
Ding M, Rosner GL, Müller P. Biometrics 2008, 64(3): 886-894
Summary. Most phase II screening designs available in the literature consider one treatment at a time. Each study is considered in isolation. We propose a more systematic decision-making approach to the phase II screening process. The sequential design allows for more efficiency and greater learning about treatments. The approach incorporates a Bayesian hierarchical model that allows combining information across several related studies in a formal way and improves estimation in small data sets by borrowing strength from other treatments. The design incorporates a utility function that includes sampling costs and possible future payoff. Computer simulations show that this method has a high probability of discarding treatments with low success rates and moving treatments with high success rates to phase III trials.

10.
A broad approach to the design of Phase I clinical trials for the efficient estimation of the maximum tolerated dose is presented. The method is rooted in formal optimal design theory and involves the construction of constrained Bayesian c- and D-optimal designs. The imposed constraint incorporates the optimal design points and their weights and ensures that the probability that an administered dose exceeds the maximum acceptable dose is low. Results relating to these constrained designs for log doses on the real line are described and the associated equivalence theorem is given. The ideas are extended to more practical situations, specifically to those involving discrete doses. In particular, a Bayesian sequential optimal design scheme comprising a pilot study on a small number of patients followed by the allocation of patients to doses one at a time is developed and its properties explored by simulation.

11.
In clinical studies involving multiple variables, simultaneous tests are often considered where both the outcomes and hypotheses are correlated. This article proposes a multivariate mixture prior on treatment effects that allows positive probability of zero effect for each hypothesis, correlations among effect sizes, correlations among binary outcomes of zero versus nonzero effect, and correlations among the observed test statistics (conditional on the effects). We develop a Bayesian multiple testing procedure for the multivariate two-sample situation with unknown covariance structure, and obtain the posterior probabilities of no difference between treatment regimens for specific variables. Prior selection methods and robustness issues are discussed in the context of a clinical example.

12.
In a computer simulation, a neural network first received a simultaneous procedure, where the interstimulus interval (ISI) was 0 time-steps (ts). Output activations were near zero under this procedure. The network then received a forward-delay procedure where the ISI was 8 ts. Output activations increased to the near-maximum level faster than those of a control network that first received an explicitly unpaired procedure. Comparable results were obtained with rats that first received trials where a retractable lever was presented for 3 s concurrently with access to water. Low levels of lever pressing were observed under this procedure. The rats then received trials where the lever was followed 15 s later by water. Lever pressing appeared faster than in a control group that received the 15-s ISI after an explicitly unpaired procedure. The model used in the simulation explains these results as connection-weight increments that promote little output activation in a simultaneous procedure, but facilitate acquisition in an optimal ISI.

13.
A simple procedure for estimating the false discovery rate   (Cited by: 1 total; 0 self-citations; 1 by others)
MOTIVATION: The false discovery rate (FDR) is now the most widely used criterion in microarray data analysis. In the framework of estimating procedures based on the marginal distribution of the P-values without any assumption on gene expression changes, estimators of the FDR are necessarily conservatively biased. Indeed, only an upper bound estimate can be obtained for the key quantity pi0, which is the probability for a gene to be unmodified. In this paper, we propose a novel family of estimators for pi0 that allows the calculation of FDR. RESULTS: The very simple method for estimating pi0 called LBE (Location Based Estimator) is presented together with results on its variability. Simulation results indicate that the proposed estimator performs well in finite samples and has the best mean square error in most of the cases as compared with the procedures QVALUE, BUM and SPLOSH. The different procedures are then applied to real datasets. AVAILABILITY: The R function LBE is available at http://ifr69.vjf.inserm.fr/lbe CONTACT: broet@vjf.inserm.fr.
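
The abstract above does not state how LBE is computed, so the following is only a sketch of the general idea of a location-based upper bound on pi0, under stated assumptions. If null P-values are uniform on (0, 1) and phi is a non-negative transform, then E[phi(P)] >= pi0 * E[phi(U)], so the ratio mean(phi(P)) / E[phi(U)] estimates an upper bound on pi0. The transform phi(p) = -ln(1 - p), for which E[phi(U)] = 1, and both function names are assumptions made for the example; this is not the published R function.

```python
import math


def pi0_location_estimate(p_values):
    """Upper-bound estimate of pi0 from the mean of phi(p) = -ln(1 - p).

    Assumed transform for illustration: for U ~ Uniform(0, 1),
    E[-ln(1 - U)] = 1, so the ratio mean(phi(P)) / E[phi(U)] reduces
    to the sample mean of phi(p).
    """
    eps = 1e-12  # guards against log(0) when a p-value equals 1
    mean_phi = sum(-math.log(max(1.0 - p, eps)) for p in p_values) / len(p_values)
    return min(1.0, mean_phi)


def estimated_fdr(p_values, threshold):
    """Plug-in FDR estimate when rejecting every p-value <= threshold."""
    m = len(p_values)
    rejected = sum(1 for p in p_values if p <= threshold)
    if rejected == 0:
        return 0.0
    # Expected number of false rejections is about pi0 * m * threshold.
    pi0 = pi0_location_estimate(p_values)
    return min(1.0, pi0 * m * threshold / rejected)
```

Because the pi0 estimate is an upper bound, the resulting FDR estimate errs on the conservative side, which matches the abstract's remark that such estimators are conservatively biased.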

14.
15.
Following the success of small-molecule high-throughput screening (HTS) in drug discovery, other large-scale screening techniques are currently revolutionizing the biological sciences. Powerful new statistical tools have been developed to analyze the vast amounts of data in DNA chip studies, but have not yet found their way into compound screening. In HTS, characterization of single-point hit lists is often done only in retrospect after the results of confirmation experiments are available. However, for prioritization, for optimal use of resources, for quality control, and for comparison of screens it would be extremely valuable to predict the rates of false positives and false negatives directly from the primary screening results. Making full use of the available information about compounds and controls contained in HTS results and replicated pilot runs, the Z score and from it the p value can be estimated for each measurement. Based on this consideration, we have applied the concept of p-value distribution analysis (PVDA), which was originally developed for gene expression studies, to HTS data. PVDA allowed prediction of all relevant error rates as well as the rate of true inactives, and excellent agreement with confirmation experiments was found.
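
The step from a measurement to a Z score and then to a p value can be sketched as follows. The abstract does not say how the reference mean and standard deviation are obtained or whether the test is one-sided, so this example assumes a set of reference measurements (for instance, inactive control wells) and a one-sided upper-tail test under a standard normal null. Both function names are hypothetical.

```python
import math
import statistics


def z_scores(measurements, reference):
    """Standardize measurements against a reference sample.

    Assumption for illustration: the reference sample (e.g. inactive
    controls) defines the null mean and standard deviation.
    """
    mu = statistics.mean(reference)
    sigma = statistics.stdev(reference)
    return [(x - mu) / sigma for x in measurements]


def upper_tail_p(z):
    """P(Z >= z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))
```

For example, `[upper_tail_p(z) for z in z_scores(signals, controls)]` yields one p value per measurement, where `signals` and `controls` are placeholder lists of readings; the distribution of those p values is the kind of input that a p-value distribution analysis works from.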

16.
Testing for simultaneous vicariance across comparative phylogeographic data sets is a notoriously difficult problem hindered by mutational variance, the coalescent variance, and variability across pairs of sister taxa in parameters that affect genetic divergence. We simulate vicariance to characterize the behaviour of several commonly used summary statistics across a range of divergence times, and to characterize this behaviour in comparative phylogeographic datasets having multiple taxon-pairs. We found Tajima's D to be relatively uncorrelated with other summary statistics across divergence times, and using simple hypothesis testing of simultaneous vicariance given variable population sizes, we counter-intuitively found that the variance across taxon pairs in Nei and Li's net nucleotide divergence (pi(net)), a common measure of population divergence, is often inferior to using the variance in Tajima's D across taxon pairs as a test statistic to distinguish ancient simultaneous vicariance from variable vicariance histories. The opposite and more intuitive pattern is found for testing more recent simultaneous vicariance, and overall we found that depending on the timing of vicariance, one of these two test statistics can achieve high statistical power for rejecting simultaneous vicariance, given a reasonable number of intron loci (> 5 loci, 400 bp) and a range of conditions. These results suggest that components of these two composite summary statistics should be used in future simulation-based methods which can simultaneously use a pool of summary statistics to test the comparative phylogeographic hypotheses we consider here.

17.
18.
This paper addresses the question of biomarker discovery in proteomics. Given clinical data regarding a list of proteins for a set of individuals, the tackled problem is to extract a short subset of proteins the concentrations of which are an indicator of the biological status (healthy or pathological). In this paper, it is formulated as a specific instance of variable selection. The originality is that the proteins are not investigated one after the other but the best partition between discriminant and non-discriminant proteins is directly sought. In this way, correlations between the proteins are intrinsically taken into account in the decision. The developed strategy is derived in a Bayesian setting, and the decision is optimal in the sense that it minimizes a global mean error. It is finally based on the posterior probabilities of the partitions. The main difficulty is to calculate these probabilities since they are based on the so-called evidence, which requires marginalization of all the unknown model parameters. Two models are presented that relate the status to the protein concentrations, depending on whether the latter are biomarkers or not. The first model accounts for biological variabilities by assuming that the concentrations are Gaussian distributed with a mean and a covariance matrix that depend on the status only for the biomarkers. The second one is an extension that also takes into account the technical variabilities that may significantly impact the observed concentrations. The main contributions of the paper are: (1) a new Bayesian formulation of the biomarker selection problem, (2) the closed-form expression of the posterior probabilities in the noiseless case, and (3) a suitable approximated solution in the noisy case. The methods are numerically assessed and compared to the state-of-the-art methods (t test, LASSO, Bhattacharyya distance, FOHSIC) on synthetic and real data from proteins quantified in human serum by mass spectrometry in selected reaction monitoring mode.

19.
The increasing availability of DNA sequence data enables exciting new opportunities for fungal ecology. However, it amplifies the challenge of how to objectively classify the diversity of fungal sequences into meaningful units, often in the absence of morphological characters. Here, we test the utility of modern multilocus Bayesian coalescent-based methods for delimiting cryptic fungal diversity in the orchid mycorrhiza morphospecies Serendipita vermifera. We obtained 147 fungal isolates from Caladenia, a speciose clade of Australian orchids known to associate with Serendipita fungi. DNA sequence data for 7 nuclear and mtDNA loci were used to erect competing species hypotheses by clustering isolates based on: (a) ITS sequence divergence, (b) Bayesian admixture analysis, and (c) mtDNA variation. We implemented two coalescent-based Bayesian methods to determine which species hypothesis best fitted our data. Both methods found strong support for eight species of Serendipita among our isolates, supporting species boundaries reflected in ITS divergence. Patterns of host plant association showed evidence for both generalist and specialist associations within the host genus Caladenia. Our findings demonstrate the utility of Bayesian species delimitation methods and suggest that wider application of these techniques will readily uncover new species in other cryptic fungal lineages.

20.