Found 20 similar documents (search time: 10 ms)
1.
In this work we present a web-based tool for estimating multiple alignment quality using Bayesian hypothesis testing. The proposed method is simple, easy to implement, and fast, with linear complexity. We evaluated the method against a series of different alignments (a set of random and biologically derived alignments) and compared the results with tools based on classical statistical methods (such as sFFT and csFFT). Taking the correlation coefficient as an objective criterion of the true quality, we found that Bayesian hypothesis testing performed better on average than the classical methods we tested. This approach may be used independently or as a component of any tool in computational biology that is based on the statistical estimation of alignment quality. AVAILABILITY: http://www.fmi.ch/groups/functional.genomics/tool.htm. SUPPLEMENTARY INFORMATION: Supplementary data are available from http://www.fmi.ch/groups/functional.genomics/tool-Supp.htm.
2.
In genetics, we often encounter a large number of highly correlated test statistics. The most famous conservative bound for multiple comparison is Bonferroni's bound, which is suitable when the test statistics are independent but not when the test statistics are highly correlated. This article proposes a new conservative bound that is easily calculated without multiple integration and is a good approximation when the test statistics are highly correlated. The performance of the proposed method is evaluated by simulation and real data analysis.
3.
In clinical studies involving multiple variables, simultaneous tests are often considered where both the outcomes and hypotheses are correlated. This article proposes a multivariate mixture prior on treatment effects that allows positive probability of zero effect for each hypothesis, correlations among effect sizes, correlations among binary outcomes of zero versus nonzero effect, and correlations among the observed test statistics (conditional on the effects). We develop a Bayesian multiple testing procedure for the multivariate two-sample situation with unknown covariance structure, and obtain the posterior probabilities of no difference between treatment regimens for specific variables. Prior selection methods and robustness issues are discussed in the context of a clinical example.
4.
An important goal of environmental health research is to assess the risk posed by mixtures of environmental exposures. Two popular classes of models for mixtures analyses are response-surface methods and exposure-index methods. Response-surface methods estimate high-dimensional surfaces and are thus highly flexible but difficult to interpret. In contrast, exposure-index methods decompose coefficients from a linear model into an overall mixture effect and individual index weights; these models yield easily interpretable effect estimates and efficient inferences when model assumptions hold, but, like most parsimonious models, incur bias when these assumptions do not hold. In this paper, we propose a Bayesian multiple index model framework that combines the strengths of each, allowing for non-linear and non-additive relationships between exposure indices and a health outcome, while reducing the dimensionality of the exposure vector and estimating index weights with variable selection. This framework contains response-surface and exposure-index models as special cases, thereby unifying the two analysis strategies. This unification increases the range of models possible for analysing environmental mixtures and health, allowing one to select an appropriate analysis from a spectrum of models varying in flexibility and interpretability. In an analysis of the association between telomere length and 18 organic pollutants in the National Health and Nutrition Examination Survey (NHANES), the proposed approach fits the data as well as more complex response-surface methods and yields more interpretable results.
5.
Glenn W. Suter II 《Human and Ecological Risk Assessment》1996,2(2):331-347
Statistical hypothesis testing is commonly used inappropriately to analyze data, determine causality, and make decisions about significance in ecological risk assessment. Hypothesis testing is conceptually inappropriate in that it is designed to test scientific hypotheses rather than to estimate risks. It is inappropriate for analysis of field studies because it requires replication and random assignment of treatments. It discourages good toxicity testing and field studies, it provides less protection to ecosystems or their components that are difficult to sample or replicate, and it provides less protection when more treatments or responses are used. It provides a poor basis for decision‐making because it does not generate a conclusion of no effect, it does not indicate the nature or magnitude of effects, it does not address effects at untested exposure levels, and it confounds effects and uncertainty. Attempts to make hypothesis testing less problematic cannot solve these problems. Rather, risk assessors should focus on analyzing the relationship between exposure and effects, on presenting a clear estimate of expected or observed effects and associated uncertainties, and on providing the information in a manner that is useful to decision‐makers and the public.
6.
Long-distance procurement of timber was necessary for the construction of Ancestral Pueblo Great Houses in Chaco Canyon, New Mexico. A number of higher-altitude tree sources were available within 30–70 km, though some isolated trees may have been acquired more locally. Highly regional tree ring variations enable matching some construction timbers to their source. Here, a method is developed that 1) establishes a rejection criterion for ruling out sources for a tree ring sequence, 2) quantifies the relative spatial representation of a given source sequence, and 3) applies Bayes' theorem to calculate posterior probabilities of source attribution. The application of this method in part supports past sourcing work, but indicates that the majority (59–64%) of timbers cannot be ascribed with even low confidence to the most common high-altitude sources. This analysis supports a model of diverse tree acquisition from a number of different sources, though with high uncertainty for a majority of timbers used in the present study.
7.
Often a response of interest cannot be measured directly and it is necessary to rely on multiple surrogates, which can be assumed to be conditionally independent given the latent response and observed covariates. Latent response models typically assume that residual densities are Gaussian. This article proposes a Bayesian median regression modeling approach, which avoids parametric assumptions about residual densities by relying on an approximation based on quantiles. To accommodate within-subject dependency, the quantile response categories of the surrogate outcomes are related to underlying normal variables, which depend on a latent normal response. This underlying Gaussian covariance structure simplifies interpretation and model fitting, without restricting the marginal densities of the surrogate outcomes. A Markov chain Monte Carlo algorithm is proposed for posterior computation, and the methods are applied to single-cell electrophoresis (comet assay) data from a genetic toxicology study.
8.
Bayesian statistics for parasitologists (total citations: 3; self-citations: 0; citations by others: 3)
Bayesian statistical methods are increasingly being used in the analysis of parasitological data. Here, the basis of differences between the Bayesian method and the classical or frequentist approach to statistical inference is explained. This is illustrated with practical implications of Bayesian analyses using prevalence estimation of strongyloidiasis and onchocerciasis as two relevant examples. The strongyloidiasis example addresses the problem of parasitological diagnosis in the absence of a gold standard, whereas the onchocerciasis case focuses on the identification of villages warranting priority mass ivermectin treatment. The advantages and challenges faced by users of the Bayesian approach are also discussed, and readers are pointed to further directions for a more in-depth exploration of the issues raised. We advocate collaboration between parasitologists and Bayesian statisticians as a fruitful and rewarding venture for advancing applied research in parasite epidemiology and the control of parasitic infections.
9.
Recent molecular studies have incorporated the parametric bootstrap method to test a priori hypotheses when the results of molecular-based phylogenies are in conflict with these hypotheses. The parametric bootstrap requires the specification of a particular substitution model, the parameters of which are used to generate simulated, replicate DNA sequence data sets. It has been suggested both that (a) the method appears robust to changes in the model of evolution and, alternatively, that (b) as realistic a model of DNA substitution as possible should be used to avoid false rejection of a null hypothesis. Here we empirically evaluate the effect of suboptimal substitution models when testing hypotheses of monophyly with the parametric bootstrap, using data sets of mtDNA cytochrome oxidase I and II (COI and COII) sequences for Macaronesian Calathus beetles, and mitochondrial 16S rDNA and nuclear ITS2 sequences for European Timarcha beetles. Whether a particular hypothesis of monophyly is rejected or accepted appears to be highly dependent on whether the nucleotide substitution model being used is optimal. It appears that a parameter-rich model is either equally or less likely to reject a hypothesis of monophyly where the optimal model is unknown. A comparison of the performance of the Kishino–Hasegawa (KH) test shows it is not as severely affected by the use of suboptimal models, and overall it appears to be a less conservative method with a higher rate of failure to reject null hypotheses.
10.
Many biomedical studies collect data on times of occurrence for a health event that can occur repeatedly, such as infection, hospitalization, recurrence of disease, or tumor onset. To analyze such data, it is necessary to account for within-subject dependency in the multiple event times. Motivated by data from studies of palpable tumors, this article proposes a dynamic frailty model and Bayesian semiparametric approach to inference. The widely used shared frailty proportional hazards model is generalized to allow subject-specific frailties to change dynamically with age while also accommodating nonproportional hazards. Parametric assumptions on the frailty distribution are avoided by using Dirichlet process priors for a shared frailty and for multiplicative innovations on this frailty. By centering the semiparametric model on a conditionally conjugate dynamic gamma model, we facilitate posterior computation and lack-of-fit assessments of the parametric model. Our proposed method is demonstrated using data from a cancer chemoprevention study.
11.
Gao X 《Bioinformatics (Oxford, England)》2006,22(12):1486-1494
MOTIVATION: The parametric F-test has been widely used in the analysis of factorial microarray experiments to assess treatment effects. However, the normality assumption is often untenable for microarray experiments with few replicates. Permutation-based methods are therefore called on to assess statistical significance. The distribution of the F-statistics across all the genes on the array can be regarded as a mixture distribution, with one proportion of the statistics generated from the null distribution of no differential gene expression and the remaining proportion generated from the alternative distribution of differentially expressed genes. As a result, the permutation distribution of the F-statistics may not approximate the true null distribution of the F-statistics well. The construction of a proper null statistic that better approximates the null distribution of the F-statistic is therefore of great importance to permutation-based multiple testing in microarray data analysis. RESULTS: In this paper, we extend the idea of constructing null statistics based on pairwise differences, so as to neglect the treatment effects, from the two-sample comparison problem to multifactorial balanced or unbalanced microarray experiments. A null statistic based on a subpartition method is proposed and its distribution is employed to approximate the null distribution of the F-statistic. The proposed null statistic is able to accommodate imbalance in the design and is also corrected for the undue correlation between its numerator and denominator. In the simulation studies and real biological data analysis, the number of true positives and the false discovery rate (FDR) of the proposed null statistic are compared with those of the permuted version of the F-statistic. The proposed method is shown to have better control of the FDR and higher power than the standard permutation method to detect differentially expressed genes, because of the better approximated tail probabilities.
12.
Background
Biologists often conduct multiple but different cDNA microarray studies that all target the same biological system or pathway. Within each study, replicate slides within repeated identical experiments are often produced. Pooling information across studies can help more accurately identify true target genes. Here, we introduce a method to integrate multiple independent studies efficiently.
13.
Ritabrata Dutta Karim Zouaoui Boudjeltia Christos Kotsalos Alexandre Rousseau Daniel Ribeiro de Sousa Jean-Marc Desmet Alain Van Meerhaeghe Antonietta Mira Bastien Chopard 《PLoS computational biology》2022,18(3)
Cardio/cerebrovascular diseases (CVD) have become one of the major health issues in our societies. Recent studies show, however, that the present pathology tests to detect CVD are ineffectual: they do not consider the different stages of platelet activation or the molecular dynamics involved in platelet interactions, and they are incapable of accounting for inter-individual variability. Here we propose a stochastic platelet deposition model and an inferential scheme to estimate the biologically meaningful model parameters using approximate Bayesian computation with a summary statistic that maximally discriminates between different types of patients. Parameters inferred from data collected on healthy volunteers and different patient types help us to identify specific biological parameters, and hence the biological reasoning behind the dysfunction, for each type of patient. This work opens up an unprecedented opportunity for personalized pathology tests for CVD detection and medical treatment.
14.
Bayesian inference for prevalence and diagnostic test accuracy based on dual-pooled screening (total citations: 1; self-citations: 0; citations by others: 1)
We propose a useful protocol for the problem of screening populations for low-prevalence characteristics such as HIV or drugs. Current HIV screening of blood that has been donated for transfusion involves the testing of individual blood units with an inexpensive enzyme-linked immunosorbent assay test and follow-up with a more accurate and more expensive western blot test for only those units that tested positive. Our cost-effective pooling strategy would enhance current methods by making it possible to accurately estimate the sensitivity and specificity of the initial screening test, and the proportion of defective units that have passed through the system. We also provide a method of estimating the distribution of prevalences for the characteristic throughout the population or subpopulations of interest.
15.
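Application of multivariate statistical analysis methods to the evaluation of drought resistance in marigold cultivars (total citations: 5; self-citations: 0; citations by others: 5)
Principal component analysis, the membership function method, and cluster analysis were used to comprehensively evaluate the drought resistance of 9 marigold cultivars on the basis of 16 physiological indices. The results showed that the 16 physiological indices changed to different degrees across cultivars. Proline (Pro), H2O2, and ascorbate peroxidase (APX) were the most sensitive to drought stress, and some of the drought-resistance coefficients were significantly correlated with one another. Four principal factors represented 88.6% of the drought-resistance information carried by the 16 physiological indices; the cultivars with the strongest drought resistance on the four principal factors were, respectively, ‘珍妮’ and ‘金门’, ‘鸿运’, ‘珍妮’, and ‘拳王’. The comprehensive drought-resistance evaluation values of the 9 cultivars ranked as follows: ‘珍妮’ > ‘金门’ > ‘鸿运’ > ‘拳王’ > ‘巨人’ > ‘大英雄’ > ‘小英雄’ > ‘迪阿哥’ > ‘发现’. Cluster analysis divided the 9 cultivars into 3 groups, with ‘金门’, ‘珍妮’, ‘鸿运’, and ‘拳王’ classed as drought-resistant cultivars.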
16.
R. G. Lalonde 《Evolutionary ecology》1988,2(4):316-320
Summary: McGinley and Charnov (1988) propose that seasonal seed weight decline results from optimization of independently varying resource components: in particular, carbon and nitrogen. Canada thistle (Cirsium arvense (L.) Scop.) does not express seasonal seed weight reduction when the number of seeds competing for the plant's resources is reduced by low pollination success. Seeds sampled from thistles subjected to high and low pollination regimes were analyzed here for relative investment of carbon and nitrogen. The ratio of these two elements remained constant over the season in both treatment groups. The seasonal decline in mean seed weight displayed by this plant under high pollination is therefore not explainable by McGinley and Charnov's multiple resource pool hypothesis.
17.
In many practical problems, hypothesis testing involves a nuisance parameter that appears only under the alternative hypothesis. Davies (1977, Biometrika 64, 247-254) proposed the maximum of the score statistics over the whole range of the nuisance parameter as a test statistic for this type of hypothesis testing. Freidlin, Podgor, and Gastwirth (1999, Biometrics 55, 883-886) studied two other, simpler maximum test statistics: the maximum of the score statistics at the two extreme points of the nuisance parameter, and the maximum of the score statistics at three points of the nuisance parameter including the two extreme points. In this article, we compare the powers of these three maximum-type statistics in the context of three genetic problems.
18.
Background
We consider the effects of dependence among variables of high-dimensional data in multiple hypothesis testing problems, in particular on False Discovery Rate (FDR) control procedures. Recent simulation studies consider only simple correlation structures among variables, which hardly reflect the features of real data. Our aim is to systematically study the effects of several network features, such as sparsity and correlation strength, by imposing dependence structures among variables using random correlation matrices.
19.
We analyze some aspects of scan statistics, which have been proposed to help detect weak signals in genetic linkage analysis. We derive approximate expressions for the power of a test based on moving averages of the identity-by-descent allele sharing proportions for pairs of relatives at several contiguous markers. We confirm these approximate formulae by simulation. The results show that when there is a single trait locus on a chromosome, the test based on the scan statistic is slightly less powerful than that based on the customary allele sharing statistic. On the other hand, if two genes having a moderate effect on a trait lie close to each other on the same chromosome, scan statistics improve power to detect linkage.
20.
Xiongzhi Chen 《Biometrical journal. Biometrische Zeitschrift》2020,62(4):1060-1079
For multiple testing based on discrete p-values, we propose a false discovery rate (FDR) procedure “BH+” with proven conservativeness. BH+ is at least as powerful as the BH (i.e., Benjamini-Hochberg) procedure when they are applied to superuniform p-values. Further, when applied to mid-p-values, BH+ can be more powerful than when it is applied to conventional p-values. An easily verifiable necessary and sufficient condition for this is provided. BH+ is perhaps the first conservative FDR procedure applicable to mid-p-values and to p-values with general distributions. It is applied to multiple testing based on discrete p-values in a methylation study, an HIV study and a clinical safety study, where it makes considerably more discoveries than the BH procedure. In addition, we propose an adaptive version of the BH+ procedure, prove its conservativeness under certain conditions, and provide evidence of its excellent performance via simulation studies.
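For readers unfamiliar with the baseline that entry 20 compares against, the following is a minimal sketch of the standard Benjamini-Hochberg (BH) step-up procedure. It is not the BH+ procedure, whose details the abstract does not give. The function name `benjamini_hochberg`, the default `alpha`, and the example p-values are illustrative assumptions, not taken from any of the listed papers.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Standard BH step-up procedure (illustrative sketch, not BH+).

    Returns a list of booleans, one per input p-value, marking which
    hypotheses are rejected at nominal FDR level `alpha`.
    """
    m = len(p_values)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k with p_(k) <= alpha * k / m.
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            k = rank

    # Reject the k hypotheses with the smallest p-values.
    rejected = [False] * m
    for idx in order[:k]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical p-values, chosen only to illustrate the call.
    example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
    print(benjamini_hochberg(example, alpha=0.05))
```

Because the procedure is step-up, a p-value that misses its own threshold can still be rejected when a larger p-value meets a later threshold; with p-values 0.03 and 0.04 at alpha 0.05, both hypotheses are rejected.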