Similar Articles
20 similar articles found.
1.
A recently proposed optimal Bayesian classification paradigm addresses optimal error rate analysis for small-sample discrimination, including optimal classifiers, optimal error estimators, and error estimation analysis tools with respect to the probability of misclassification under binary classes. Here, we address multi-class problems and optimal expected risk with respect to a given risk function, which are common settings in bioinformatics. We present Bayesian risk estimators (BRE) under arbitrary classifiers, the mean-square error (MSE) of arbitrary risk estimators under arbitrary classifiers, and optimal Bayesian risk classifiers (OBRC). We provide analytic expressions for these tools under several discrete and Gaussian models and present a new methodology to approximate the BRE and MSE when analytic expressions are not available. Of particular note, we present analytic forms for the MSE under Gaussian models with homoscedastic covariances, which are new even in binary classification.  相似文献   
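To make the idea of a Bayesian risk estimator concrete, the sketch below computes the posterior-expected risk of a fixed classifier under a simple discrete (bin) model with uniform Beta/Dirichlet priors. This is only a minimal illustration of the general construction (the classifier's risk averaged over the posterior of the feature-label distribution); the function name, priors, and toy data are assumptions, and it does not reproduce the authors' closed-form Gaussian expressions.

```python
import numpy as np

def bayesian_risk_estimate(counts, loss, psi, alpha=1.0):
    """Posterior-expected risk of a fixed classifier psi under a discrete
    bin model: class prior ~ Beta(1, 1), class-conditional bin probabilities
    ~ Dirichlet(alpha, ..., alpha).  counts[y, j] = training count of class y
    in bin j; loss[y, k] = cost of predicting k when truth is y;
    psi[j] = label assigned to bin j."""
    counts = np.asarray(counts, float)
    n_y = counts.sum(axis=1)                      # samples per class
    n = counts.sum()
    b = counts.shape[1]
    # posterior mean of the class prior (uniform Beta prior)
    prior_mean = (n_y + 1.0) / (n + 2.0)
    # posterior mean of each class-conditional bin probability
    bin_mean = (counts + alpha) / (n_y[:, None] + alpha * b)
    # expected risk = sum_y sum_j loss(y, psi(j)) * P(y) * p_y(j)
    return sum(loss[y, psi[j]] * prior_mean[y] * bin_mean[y, j]
               for y in range(counts.shape[0]) for j in range(b))

# toy example: 2 classes, 4 bins, 0-1 loss
counts = [[8, 5, 1, 0],
          [1, 2, 6, 7]]
loss = np.array([[0.0, 1.0],
                 [1.0, 0.0]])
psi = [0, 0, 1, 1]                                # classifier: bin -> label
print(bayesian_risk_estimate(counts, loss, psi))
```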

2.
Summary Absence of a perfect reference test is an acknowledged source of bias in diagnostic studies. In the case of tuberculous pleuritis, standard reference tests such as smear microscopy, culture and biopsy have poor sensitivity. Yet meta‐analyses of new tests for this disease have always assumed the reference standard is perfect, leading to biased estimates of the new test’s accuracy. We describe a method for joint meta‐analysis of sensitivity and specificity of the diagnostic test under evaluation, while considering the imperfect nature of the reference standard. We use a Bayesian hierarchical model that takes into account within‐ and between‐study variability. We show how to obtain pooled estimates of sensitivity and specificity, and how to plot a hierarchical summary receiver operating characteristic curve. We describe extensions of the model to situations where multiple reference tests are used, and where index and reference tests are conditionally dependent. The performance of the model is evaluated using simulations and illustrated using data from a meta‐analysis of nucleic acid amplification tests (NAATs) for tuberculous pleuritis. The estimate of NAAT specificity was higher and the sensitivity lower compared to a model that assumed that the reference test was perfect.  相似文献   
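The bias that motivates this model can be illustrated with a small sketch: assuming the index and reference tests are conditionally independent given true disease status, the "apparent" sensitivity and specificity obtained by treating an imperfect reference as truth can be computed directly. The function and the numbers below are hypothetical and are not taken from the NAAT meta-analysis.

```python
def apparent_accuracy(se, sp, se_ref, sp_ref, prev):
    """Apparent sensitivity/specificity of an index test when judged against
    an imperfect reference test, assuming conditional independence given
    the true disease status."""
    p_tp_rp = prev * se * se_ref + (1 - prev) * (1 - sp) * (1 - sp_ref)
    p_rp = prev * se_ref + (1 - prev) * (1 - sp_ref)
    p_tn_rn = prev * (1 - se) * (1 - se_ref) + (1 - prev) * sp * sp_ref
    p_rn = 1 - p_rp
    return p_tp_rp / p_rp, p_tn_rn / p_rn

# e.g. a NAAT-like test (Se=0.90, Sp=0.98) judged against a low-sensitivity
# reference (Se=0.60, Sp=0.99) at 30% prevalence -- hypothetical values
print(apparent_accuracy(0.90, 0.98, 0.60, 0.99, 0.30))
```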

3.
The aim of this contribution is to give an overview of approaches to testing for non-inferiority of one out of two binomial distributions as compared to the other in settings involving independent samples (the paired samples case is not considered here but the major conclusions and recommendations can be shown to hold for both sampling schemes). In principle, there is an infinite number of different ways of defining (one-sided) equivalence in any multiparameter setting. In the binomial two-sample problem, the following three choices of a measure of dissimilarity between the underlying distributions are of major importance for real applications: the odds ratio (OR), the relative risk (RR), and the difference (DELTA) between the two binomial parameters. It is shown that for all three possibilities of formulating the hypotheses of a non-inferiority problem concerning two binomial proportions, reasonable testing procedures providing exact control over the type-I error risk are available. As a particularly useful and versatile way of handling mathematically nonnatural parametrizations like RR and DELTA, the approach through Bayesian posterior probabilities of hypotheses with respect to some non-informative reference prior has much to recommend it. In order to ensure that the corresponding testing procedure be valid in the classical, i.e. frequentist, sense, it suffices to use straightforward computational techniques yielding suitably corrected nominal significance levels. In view of the availability of testing procedures with satisfactory properties for all parametrizations of main practical interest, the discussion of the pros and cons of these methods has to focus on the question of which of the underlying measures of dissimilarity should be preferred on grounds of logic and intuition. It is argued that the OR clearly merits preference with regard to this latter kind of criterion as well, since the non-inferiority hypotheses defined in terms of the other parametric functions are bounded by lines which cross the boundaries of the parameter space. From this fact, we conclude that the exact Fisher-type test for one-sided equivalence provides the most reasonable approach to the confirmatory analysis of non-inferiority trials involving two independent samples of binary data. The marked conservatism of the nonrandomized version of this test can largely be removed by using a suitably increased nominal significance level (depending, in addition to the target level, on the sample sizes and the equivalence margin), or by replacing it with a Bayesian test for non-inferiority with respect to the odds ratio.  相似文献
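As a rough illustration of the Bayesian route described above, the sketch below approximates the posterior probability of non-inferiority with respect to the odds ratio, using independent Jeffreys Beta(0.5, 0.5) reference priors on the two proportions and Monte Carlo sampling. The margin, counts, and function name are hypothetical; in practice the nominal level used with this posterior probability would be corrected, as the abstract describes, so that frequentist type-I error control is retained.

```python
import numpy as np

def post_prob_noninferior_or(x1, n1, x2, n2, or_margin, draws=200_000, seed=1):
    """Posterior probability that OR = [p1/(1-p1)] / [p2/(1-p2)] exceeds the
    non-inferiority margin, under independent Jeffreys Beta(0.5, 0.5) priors
    on the two binomial proportions (Monte Carlo approximation)."""
    rng = np.random.default_rng(seed)
    p1 = rng.beta(x1 + 0.5, n1 - x1 + 0.5, draws)   # posterior draws, group 1
    p2 = rng.beta(x2 + 0.5, n2 - x2 + 0.5, draws)   # posterior draws, group 2
    odds_ratio = (p1 / (1 - p1)) / (p2 / (1 - p2))
    return float(np.mean(odds_ratio > or_margin))

# hypothetical trial: 82/100 responders on the new treatment, 85/100 on the
# reference treatment, non-inferiority margin OR0 = 0.5
print(post_prob_noninferior_or(82, 100, 85, 100, or_margin=0.5))
```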

4.
The Pacific walrus (Odobenus rosmarus divergens) is a candidate to be listed as an endangered species under United States law, in part, because of climate change‐related concerns. While the population was known to be declining in the 1980s and 1990s, its recent status has not been determined. We developed Bayesian models of walrus population dynamics to assess the population by synthesizing information on population sizes, age structures, reproductive rates, and harvests for 1974–2015. Candidate models allowed for temporal variation in some or all vital rates, as well as density dependence or density independence in reproduction and calf survival. All selected models indicated that the population underwent a multidecade decline, which began moderating in the 1990s, and that annual reproductive rate and natural calf survival rates rose over time in a density‐dependent manner. However, selected models were equivocal regarding whether the natural juvenile survival rate was constant or decreasing over time. Depending on whether juvenile survival decreased after 1998, the population growth rate either increased during 1999–2015 or stabilized at a lesser level of decline than seen in the 1980s. The probability that the population was still declining in 2015 ranged from 45% to 87%.  相似文献   

5.
Configural Frequency Analysis (CFA) is increasingly being used by psychologists and other researchers to test for the presence of combinations of categorical variables which occur more frequently or less frequently than expected under a particular model of chance. Configurations which occur more frequently than chance are known as “Types”; configurations which are conspicuous by their absence or rarity are known as “Antitypes”. Most configural frequency test theory consists of binomial tests applied to the cells of a cross-tabulation table. The wide variety of statistical tests described in papers and books on CFA are approximations to the binomial test, due to the computational intensity associated with performing binomial tests directly (VON EYE, 1990b). This paper advocates direct computation of binomial probabilities instead of the usual approximations used in CFA. Mathematical relationships of the binomial distribution with the F and incomplete beta distributions are described which enable the researcher to efficiently compute binomial probabilities using functions available in common statistical software. The classical inference approach adopted by traditional CFA makes it difficult to draw conclusions regarding the likely prevalence rates of types or antitypes in the reference population. It is also not possible to exploit additional information about the sample which, while not known precisely, is known with a degree of confidence and can aid in the identification of types and antitypes. A Bayesian conjugate distributions approach based on the incomplete beta distribution is proposed. Bayesian extensions of this model to both classical CFA and a sequential CFA analysis advanced by KIESER and VICTOR (1991) are described.  相似文献
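The mathematical relationship alluded to above lets exact binomial tail probabilities be computed through the regularized incomplete beta function, P(X ≥ k) = I_p(k, n − k + 1). A minimal sketch with hypothetical CFA cell counts:

```python
from scipy.special import betainc
from scipy.stats import binom

def binomial_upper_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p) via the regularized
    incomplete beta function: P(X >= k) = I_p(k, n - k + 1)."""
    if k <= 0:
        return 1.0
    return betainc(k, n - k + 1, p)

# one CFA cell: observed count 19 in a table of N = 200, expected cell
# probability 0.05 under the chance model -> one-sided "type" test
k, n, p = 19, 200, 0.05
print(binomial_upper_tail(k, n, p))    # via the incomplete beta function
print(binom.sf(k - 1, n, p))           # same value via the binomial tail
```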

6.
Open population capture‐recapture models are widely used to estimate population demographics and abundance over time. Bayesian methods exist to incorporate open population modeling with spatial capture‐recapture (SCR), allowing for estimation of the effective area sampled and population density. Here, open population SCR is formulated as a hidden Markov model (HMM), allowing inference by maximum likelihood for both Cormack‐Jolly‐Seber and Jolly‐Seber models, with and without activity center movement. The method is applied to a 12‐year survey of male jaguars (Panthera onca) in the Cockscomb Basin Wildlife Sanctuary, Belize, to estimate survival probability and population abundance over time. For this application, inference is shown to be biased when assuming activity centers are fixed over time, while including a model for activity center movement provides negligible bias and nominal confidence interval coverage, as demonstrated by a simulation study. The HMM approach is compared with Bayesian data augmentation and closed population models for this application. The method is substantially more computationally efficient than the Bayesian approach and provides a lower root‐mean‐square error in predicting population density compared to closed population models.  相似文献   
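A minimal, non-spatial sketch of the HMM formulation is given below: a Cormack-Jolly-Seber model with two hidden states (alive, dead), constant survival phi and detection p, evaluated for a single capture history with the forward algorithm. The spatial and Jolly-Seber versions in the study add activity centres, recruitment, and per-occasion parameters; the history and parameter values here are hypothetical.

```python
import numpy as np

def cjs_forward(history, phi, p):
    """Likelihood of one capture history under a Cormack-Jolly-Seber model,
    written as a 2-state HMM (alive/dead) and evaluated with the forward
    algorithm.  history[t] = 1 if the animal was detected on occasion t.
    The likelihood is conditional on the first detection."""
    first = history.index(1)
    trans = np.array([[phi, 1 - phi],      # alive -> alive / dead
                      [0.0, 1.0]])         # dead stays dead
    emit = {1: np.array([p, 0.0]),         # P(detected | alive, dead)
            0: np.array([1 - p, 1.0])}     # P(not detected | alive, dead)
    alpha = np.array([1.0, 0.0])           # known alive at first detection
    for y in history[first + 1:]:
        alpha = (alpha @ trans) * emit[y]  # predict state, then weight by data
    return alpha.sum()

# hypothetical history over 6 occasions: caught, missed, caught, then never seen
print(cjs_forward([1, 0, 1, 0, 0, 0], phi=0.85, p=0.4))
```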

7.
Understanding causes of nest loss is critical for the management of endangered bird populations. Available methods for estimating nest loss probabilities to competing sources do not allow for random effects and covariation among sources, and there are few data simulation methods or goodness-of-fit (GOF) tests for such models. We developed a Bayesian multinomial extension of the widely used logistic exposure (LE) nest survival model which can incorporate multiple random effects and fixed-effect covariates for each nest loss category. We investigated the performance of this model and the accompanying GOF test by analysing simulated nest fate datasets with and without age-biased discovery probability, and by comparing the estimates with those of traditional fixed-effects estimators. We then exemplify the use of the multinomial LE model and GOF test by analysing Piping Plover Charadrius melodus nest fate data (n = 443) to explore the effects of wire cages (exclosures) constructed around nests, which are used to protect nests from predation but can lead to increased nest abandonment rates. Mean parameter estimates of the random-effects multinomial LE model were all within 1 sd of the true values used to simulate the datasets. Age-biased discovery probability did not result in biased parameter estimates. Traditional fixed-effects models provided estimates with a bias of up to 43% and standard deviations that were on average 71% smaller. The GOF test identified models that were a poor fit to the simulated data. For the Piping Plover dataset, the fixed-effects model was less well-supported than the random-effects model and underestimated the risk of exclosure use by 16%. The random-effects model estimated a range of 1–6% probability of abandonment for nests not protected by exclosures across sites and 5–41% probability of abandonment for nests with exclosures, suggesting that the magnitude of exclosure-related abandonment is site-specific. Our results demonstrate that unmodelled heterogeneity can result in biased estimates, potentially leading to incorrect management recommendations. The Bayesian multinomial LE model offers a flexible method of incorporating random effects into an analysis of nest failure and is robust to age-biased nest discovery probability. This model can be generalized to other staggered-entry, time-to-hazard situations.  相似文献
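For intuition about the multinomial logistic exposure structure, the sketch below converts per-day multinomial-logit loss probabilities (with survival as the reference category) into fate probabilities over an exposure interval of t days. The random effects, covariates, and Bayesian fitting machinery of the actual model are omitted, and the linear-predictor values are invented.

```python
import numpy as np

def interval_fate_probs(eta_day, t):
    """Probability of each nest fate over an exposure interval of t days,
    given per-day linear predictors for the loss categories (multinomial
    logit, daily survival as the reference category).
    Returns [loss to cause 1, ..., loss to cause C, survived interval]."""
    expo = np.exp(np.asarray(eta_day, float))
    q = expo / (1 + expo.sum())          # per-day probability of each loss cause
    s = 1 - q.sum()                      # per-day survival probability
    surv_t = s ** t                      # survive every day of the interval
    loss_t = q * (1 - surv_t) / (1 - s)  # geometric sum over the t days
    return np.append(loss_t, surv_t)

# e.g. two loss causes (predation, abandonment) over a 5-day interval
print(interval_fate_probs([-3.5, -4.5], t=5))
```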

8.
Bayesian theory is introduced to infer the genotypes (DNA origins) of DNA molecular markers from their phenotypes (electrophoretic bands). The results show that the genotype probabilities of markers with incomplete genetic information, determined under the assumption that marker loci segregate independently, usually differ considerably from the corresponding Bayesian probabilities calculated from the genotypes of neighbouring fully informative markers and the relevant recombination fractions. Therefore, before carrying out work such as quantitative trait locus mapping and marker-assisted selection, the Bayesian probabilities of the relevant genotypes should be computed for every locus with incomplete genetic information in each individual's genome. The paper presents the detailed procedure for computing the Bayesian probabilities of unknown genotypes and discusses several extended applications of these probabilities.  相似文献
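A hypothetical minimal example of the kind of calculation described: for a backcross, the Bayesian probability of an uninformative locus's genotype given the two flanking fully informative markers and the recombination fractions of the two intervals, assuming no crossover interference. The function name and numbers are illustrative.

```python
def middle_genotype_prob(left_is_A, right_is_A, r1, r2):
    """Bayesian probability that an uninformative backcross locus carries
    parental allele A, given the alleles observed at the two flanking
    informative loci and the recombination fractions r1 (left interval)
    and r2 (right interval); no crossover interference assumed."""
    def seg(flank_is_A, r, middle_is_A):
        # flank and middle share the same parental allele with prob 1 - r
        return (1 - r) if flank_is_A == middle_is_A else r
    w_A = seg(left_is_A, r1, True) * seg(right_is_A, r2, True)
    w_a = seg(left_is_A, r1, False) * seg(right_is_A, r2, False)
    return w_A / (w_A + w_a)

# non-recombinant flanking genotypes, 10% and 20% recombination fractions
print(middle_genotype_prob(True, True, 0.10, 0.20))   # ~0.97
# flanking loci recombinant with each other
print(middle_genotype_prob(True, False, 0.10, 0.20))  # ~0.69
```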

9.
While Bayesian analysis has become common in phylogenetics, the effects of topological prior probabilities on tree inference have not been investigated. In Bayesian analyses, the prior probability of topologies is almost always considered equal for all possible trees, and clade support is calculated from the majority rule consensus of the approximated posterior distribution of topologies. These uniform priors on tree topologies imply non-uniform prior probabilities of clades, which are dependent on the number of taxa in a clade as well as the number of taxa in the analysis. As such, uniform topological priors do not model ignorance with respect to clades. Here, we demonstrate that Bayesian clade support, bootstrap support, and jackknife support from 17 empirical studies are significantly and positively correlated with non-uniform clade priors resulting from uniform topological priors. Further, we demonstrate that this effect disappears for bootstrap and jackknife when data sets are free from character conflict, but remains pronounced for Bayesian clade supports, regardless of tree shape. Finally, we propose the use of a Bayes factor to account for the fact that uniform topological priors do not model ignorance with respect to clade probability.  相似文献   
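The non-uniform clade priors implied by a uniform prior over unrooted topologies can be computed by counting trees: a particular clade of k taxa out of n appears in (2k − 3)!! (2(n − k) − 3)!! of the (2n − 5)!! unrooted binary topologies. The sketch below (function names are mine) shows how the same clade's prior probability shrinks as taxa are added, which is the dependence the abstract describes.

```python
from math import prod

def double_factorial(m):
    """m!! for odd m, with the conventions (-1)!! = 1!! = 1."""
    return prod(range(m, 0, -2)) if m > 1 else 1

def clade_prior(n_taxa, clade_size):
    """Prior probability of one particular clade (split) of `clade_size` taxa
    when every unrooted binary topology on n_taxa is equally probable."""
    k, n = clade_size, n_taxa
    n_trees_total = double_factorial(2 * n - 5)
    n_trees_with_clade = (double_factorial(2 * k - 3)
                          * double_factorial(2 * (n - k) - 3))
    return n_trees_with_clade / n_trees_total

# the same 3-taxon clade is a priori far less probable in a 50-taxon analysis
for n in (10, 20, 50):
    print(n, clade_prior(n, 3))
```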

10.
Bayesian inference in ecology   (total citations: 14; self-citations: 1; citations by others: 13)
Bayesian inference is an important statistical tool that is increasingly being used by ecologists. In a Bayesian analysis, information available before a study is conducted is summarized in a quantitative model or hypothesis: the prior probability distribution. Bayes’ Theorem uses the prior probability distribution and the likelihood of the data to generate a posterior probability distribution. Posterior probability distributions are an epistemological alternative to P‐values and provide a direct measure of the degree of belief that can be placed on models, hypotheses, or parameter estimates. Moreover, Bayesian information‐theoretic methods provide robust measures of the probability of alternative models, and multiple models can be averaged into a single model that reflects uncertainty in model construction and selection. These methods are demonstrated through a simple worked example. Ecologists are using Bayesian inference in studies that range from predicting single‐species population dynamics to understanding ecosystem processes. Not all ecologists, however, appreciate the philosophical underpinnings of Bayesian inference. In particular, Bayesians and frequentists differ in their definition of probability and in their treatment of model parameters as random variables or estimates of true values. These assumptions must be addressed explicitly before deciding whether or not to use Bayesian methods to analyse ecological data.  相似文献   
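A minimal worked example in the same spirit as the one the paper describes (the numbers are invented): a Beta prior on a survival probability combined with binomial field data gives a Beta posterior, from which point estimates, credible intervals, and direct probability statements follow.

```python
from scipy.stats import beta

# prior belief about a survival probability, e.g. from earlier studies
a_prior, b_prior = 4, 6                    # prior mean 0.4

# new field data: 27 of 50 marked individuals survived (hypothetical)
x, n = 27, 50

# conjugate update: posterior is Beta(a + x, b + n - x)
a_post, b_post = a_prior + x, b_prior + n - x
posterior = beta(a_post, b_post)

print("posterior mean:", posterior.mean())
print("95% credible interval:", posterior.interval(0.95))
print("P(survival > 0.5):", posterior.sf(0.5))
```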

11.
Understanding mammalian evolution using Bayesian phylogenetic inference   (total citations: 1; self-citations: 0; citations by others: 1)
1. Phylogenetic trees are critical in addressing evolutionary hypotheses; however, the reconstruction of a phylogeny is no easy task. This process has recently been made less arduous by using a Bayesian statistical approach. This method offers the advantage that one can determine the probability of some hypothesis (i.e. a phylogeny), conditional on the observed data (i.e. nucleotide sequences). 2. By reconstructing phylogenies using Bayes’ theorem in combination with Markov chain Monte Carlo, phylogeneticists are able to test hypotheses more quickly compared with using standard methods such as neighbour-joining, maximum likelihood or parsimony. Critics of the Bayesian approach suggest that it is not a panacea, and argue that the prior probability is too subjective and the resulting posterior probability is too liberal compared with maximum likelihood. 3. These issues are currently debated in the arena of mammalian evolution. Recently, proponents and opponents of the Bayesian approach have constructed the mammalian phylogeny using different methods under different conditions and with a variety of parameters. These analyses showed the robustness (or lack of) of the Bayesian approach. In the end, consensus suggests that Bayesian methods are robust and give essentially the same answer as maximum likelihood methods but in less time. 4. Approaches based on fossils and molecules typically agree on ordinal-level relationships among mammals but not on higher-level relationships, as Bayesian analyses recognize the African radiation, Afrotheria, and the two Laurasian radiations, Laurasiatheria and Euarchontoglires, whereas fossils did not predict Afrotheria.  相似文献   

12.
Determination of the relative gene order on chromosomes is of critical importance in the construction of human gene maps. In this paper we develop a sequential algorithm for gene ordering. We start by comparing three sequential procedures to order three genes on the basis of Bayesian posterior probabilities, maximum-likelihood ratio, and minimal recombinant class. In the second part of the paper we extend the sequential procedure based on the posterior probabilities to the general case of g genes. We present a theorem that states that the predicted average probability of committing a decision error, associated with a Bayesian sequential procedure that accepts the hypothesis of a gene-order configuration with posterior probability equal to or greater than π*, is smaller than 1 - π*. This theorem holds irrespective of the number of genes, the genetic model, and the source of genetic information. The theorem is an extension of a classical result of Wald, concerning the sum of the actual and the nominal error probabilities in the sequential probability ratio test of two hypotheses. A stepwise strategy for ordering a large number of genes, with control over the decision-error probabilities, is discussed. An asymptotic approximation is provided which facilitates the calculation, with existing computer software for gene mapping, of the posterior probabilities of an order and of the error probabilities. We illustrate with some simulations that the stepwise ordering is an efficient procedure.  相似文献
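A toy sketch of the acceptance rule behind the theorem: with a uniform prior over candidate gene orders, posterior probabilities follow from the orders' likelihoods, and an order is accepted as soon as its posterior reaches the threshold π*, which bounds the predicted probability of a wrong acceptance by 1 - π*. The log-likelihood values below are hypothetical.

```python
import numpy as np

def sequential_order_decision(log_likelihoods, pi_star=0.95):
    """Posterior probabilities of candidate gene orders under a uniform prior.
    Accept an order once its posterior reaches pi_star; otherwise report the
    current best posterior and keep collecting families."""
    ll = np.asarray(log_likelihoods, float)
    post = np.exp(ll - ll.max())               # stable normalisation
    post /= post.sum()
    best = int(np.argmax(post))
    return (best, float(post[best])) if post[best] >= pi_star else (None, float(post.max()))

# hypothetical log-likelihoods of the three possible orders of genes A, B, C
print(sequential_order_decision([-120.4, -124.9, -131.2]))
```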

13.
We develop three Bayesian predictive probability functions based on data in the form of a double sample. One Bayesian predictive probability function is for predicting the true unobservable count of interest in a future sample for a Poisson model with data subject to misclassification and two Bayesian predictive probability functions for predicting the number of misclassified counts in a current observable fallible count for an event of interest. We formulate a Gibbs sampler to calculate prediction intervals for these three unobservable random variables and apply our new predictive models to calculate prediction intervals for a real‐data example. (© 2004 WILEY‐VCH Verlag GmbH & Co. KGaA, Weinheim)  相似文献   

14.
The application of mixed nucleotide/doublet substitution models has recently received attention in RNA‐based phylogenetics. Within a Bayesian approach, it was shown that mixed models outperformed analyses relying on simple nucleotide models. We analysed an mt RNA data set of dragonflies representing all major lineages of Anisoptera plus outgroups, using a mixed model in a Bayesian and parsimony (MP) approach. We used a published mt 16S rRNA secondary consensus structure model and inferred consensus models for the mt 12S rRNA and tRNA valine. Secondary structure information was used to set data partitions for paired and unpaired sites on which doublet or nucleotide models were applied, respectively. Several different doublet models are currently available of which we chose the most appropriate one by a Bayes factor test. The MP reconstructions relied on recoded data for paired sites in order to account for character covariance and an application of the ratchet strategy to find most parsimonious trees. Bayesian and parsimony reconstructions are partly differently resolved, indicating sensitivity of the reconstructions to model specification. Our analyses depict a tree in which the damselfly family Lestidae is sister group to a monophyletic clade Epiophlebia + Anisoptera, contradicting recent morphological and molecular work. In Bayesian analyses, we found a deep split between Libelluloidea and a clade ‘Aeshnoidea’ within Anisoptera largely congruent with Tillyard’s early ideas of anisopteran evolution, which had been based on evidently plesiomorphic character states. However, parsimony analysis did not support a clade ‘Aeshnoidea’, but instead, placed Gomphidae as sister taxon to Libelluloidea. Monophyly of Libelluloidea is only modestly supported, and many inter‐family relationships within Libelluloidea do not receive substantial support in Bayesian and parsimony analyses. We checked whether high Bayesian node support was inflated owing to either: (i) wrong secondary consensus structures; (ii) under‐sampling of the MCMC process, thereby missing other local maxima; or (iii) unrealistic prior assumptions on topologies or branch lengths. We found that different consensus structure models exert strong influence on the reconstruction, which demonstrates the importance of taxon‐specific realistic secondary structure models in RNA phylogenetics.  相似文献   

15.
Several statistical methods have been proposed for estimating the infection prevalence based on pooled samples, but these methods generally presume the application of perfect diagnostic tests, which in practice do not exist. To optimize prevalence estimation based on pooled samples, currently available and new statistical models were described and compared. Three groups of models were compared: (a) Frequentist models, (b) Markov chain Monte Carlo (MCMC) Bayesian models, and (c) Exact Bayesian Computation (EBC) models. Simulated data allowed the comparison of the models, including testing the performance under complex situations such as imperfect tests with a sensitivity varying according to the pool weight. In addition, all models were applied to data derived from the literature, to demonstrate the influence of the model on real-prevalence estimates. All models were implemented in the freely available R and OpenBUGS software and are presented in Appendix S1. Bayesian models can flexibly take into account the imperfect sensitivity and specificity of the diagnostic test (as well as the influence of pool-related or external variables) and are therefore the method of choice for calculating population prevalence based on pooled samples. However, when using such complex models, very precise information on test characteristics is needed, which may in general not be available.  相似文献
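The common core of these models is the probability that a pool tests positive under an imperfect assay: Se·(1 − (1 − p)^k) + (1 − Sp)·(1 − p)^k for pools of k individuals and prevalence p. The sketch below turns this into a simple grid-approximation Bayesian posterior for p with constant test characteristics and equal pool sizes; it is a deliberately simplified stand-in for the frequentist, MCMC, and EBC models compared in the paper, and all numbers are hypothetical.

```python
import numpy as np
from scipy.stats import binom

def prevalence_posterior(n_pools, n_pos, pool_size, se, sp, n_grid=2001):
    """Grid-approximation posterior for individual-level prevalence p from
    pooled test results, with an imperfect test of sensitivity `se` and
    specificity `sp` applied to pools of equal size, and a flat prior on p."""
    p = np.linspace(0.0, 1.0, n_grid)
    prob_pool_pos = se * (1 - (1 - p) ** pool_size) + (1 - sp) * (1 - p) ** pool_size
    like = binom.pmf(n_pos, n_pools, prob_pool_pos)
    post = like / like.sum()               # normalise over the grid
    return p, post

# hypothetical survey: 60 pools of 10 samples each, 21 pools positive,
# test sensitivity 0.95, specificity 0.99
p, post = prevalence_posterior(60, 21, 10, 0.95, 0.99)
print("posterior mean prevalence:", float(np.sum(p * post)))
```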

16.
Jian Zhang, Faming Liang. Biometrics, 2010, 66(4): 1078-1086
Summary Clustering is a widely used method in extracting useful information from gene expression data, where unknown correlation structures in genes are believed to persist even after normalization. Such correlation structures pose a great challenge on the conventional clustering methods, such as the Gaussian mixture (GM) model, k‐means (KM), and partitioning around medoids (PAM), which are not robust against general dependence within data. Here we use the exponential power mixture model to increase the robustness of clustering against general dependence and nonnormality of the data. An expectation–conditional maximization algorithm is developed to calculate the maximum likelihood estimators (MLEs) of the unknown parameters in these mixtures. The Bayesian information criterion is then employed to determine the numbers of components of the mixture. The MLEs are shown to be consistent under sparse dependence. Our numerical results indicate that the proposed procedure outperforms GM, KM, and PAM when there are strong correlations or non‐Gaussian components in the data.  相似文献   

17.
Two procedures for predicting the carcinogenicity of chemicals are described. One of these (CASE) is a self-learning artificial intelligence system that automatically recognizes activating and/or deactivating structural subunits of candidate chemicals and uses this information to determine the probability that the test chemical is or is not a carcinogen. If the chemical is predicted to be a carcinogen, CASE also projects its probable potency.

The second procedure (CPBS) uses Bayesian decision theory to predict the potential carcinogenicity of chemicals based upon the results of batteries of short-term assays. CPBS is useful even if the test results are mixed (i.e. both positive and negative responses are obtained in different genotoxic assays). CPBS can also be used to identify highly predictive as well as cost-effective batteries of assays.

For illustrative purposes the ability of CASE and CPBS to predict the carcinogenicity of a carcinogenic and a non-carcinogenic polycyclic aromatic hydrocarbon is shown. The potential for using the two methods in tandem to increase reliability and decrease cost is presented.  相似文献   
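The Bayesian decision-theoretic step behind a CPBS-type calculation can be sketched as a likelihood-ratio update across the battery, assuming the short-term assays are conditionally independent given the chemical's true class. The sensitivities, specificities, and prior used below are illustrative and are not the published CPBS operating characteristics.

```python
def posterior_prob_carcinogen(prior_prob, assay_results):
    """Posterior probability that a chemical is a carcinogen given a battery
    of short-term assay results, assuming the assays are conditionally
    independent given the true class.  assay_results is a list of
    (sensitivity, specificity, positive?) tuples -- hypothetical values."""
    odds = prior_prob / (1 - prior_prob)
    for se, sp, positive in assay_results:
        # positive result multiplies odds by Se/(1-Sp); negative by (1-Se)/Sp
        odds *= (se / (1 - sp)) if positive else ((1 - se) / sp)
    return odds / (1 + odds)

# mixed battery outcome: assay 1 positive, assay 2 negative, assay 3 positive
battery = [(0.70, 0.85, True), (0.60, 0.80, False), (0.65, 0.75, True)]
print(posterior_prob_carcinogen(0.50, battery))
```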


18.
The discovery of rare genetic variants through next generation sequencing is a very challenging issue in the field of human genetics. We propose a novel region‐based statistical approach based on a Bayes Factor (BF) to assess evidence of association between a set of rare variants (RVs) located on the same genomic region and a disease outcome in the context of case‐control design. Marginal likelihoods are computed under the null and alternative hypotheses assuming a binomial distribution for the RV count in the region and a beta or mixture of Dirac and beta prior distribution for the probability of RV. We derive the theoretical null distribution of the BF under our prior setting and show that a Bayesian control of the false Discovery Rate can be obtained for genome‐wide inference. Informative priors are introduced using prior evidence of association from a Kolmogorov‐Smirnov test statistic. We use our simulation program, sim1000G, to generate RV data similar to the 1000 genomes sequencing project. Our simulation studies showed that the new BF statistic outperforms standard methods (SKAT, SKAT‐O, Burden test) in case‐control studies with moderate sample sizes and is equivalent to them under large sample size scenarios. Our real data application to a lung cancer case‐control study found enrichment for RVs in known and novel cancer genes. It also suggests that using the BF with informative prior improves the overall gene discovery compared to the BF with noninformative prior.  相似文献   
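One simplified version of the Bayes factor described above compares a point null carrier probability p0 against a Beta prior under the alternative; the marginal likelihood under H1 is then beta-binomial and the BF has a closed form. The prior parameters and counts below are hypothetical, and this sketch ignores the case-control structure and the Dirac-beta mixture prior of the full method.

```python
import numpy as np
from scipy.special import betaln, gammaln

def log_beta_binom_marginal(x, n, a, b):
    """log marginal likelihood of x carriers among n chromosomes when the
    carrier probability has a Beta(a, b) prior (beta-binomial marginal)."""
    log_choose = gammaln(n + 1) - gammaln(x + 1) - gammaln(n - x + 1)
    return log_choose + betaln(x + a, n - x + b) - betaln(a, b)

def log_bf_rare_variants(x, n, p0, a, b):
    """log Bayes factor comparing H1 (carrier probability ~ Beta(a, b))
    against H0 (carrier probability fixed at p0)."""
    log_choose = gammaln(n + 1) - gammaln(x + 1) - gammaln(n - x + 1)
    log_m0 = log_choose + x * np.log(p0) + (n - x) * np.log(1 - p0)
    return log_beta_binom_marginal(x, n, a, b) - log_m0

# hypothetical region: 14 carriers among 1000 case chromosomes, background
# carrier rate 0.5%, weakly informative Beta(1, 99) prior under H1
print(log_bf_rare_variants(14, 1000, p0=0.005, a=1, b=99))
```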

19.
A phylogenetic analysis of the diving beetle tribe Hydaticini Sharp (Coleoptera: Dytiscidae: Dytiscinae) is presented based on data from adult morphology, two nuclear (histone III and wingless) and two mitochondrial (cytochrome c oxidase I and II) protein‐coding genes. We explore how to best partition a data set of multiple nuclear and mitochondrial protein‐coding genes by using Bayes factor and a penalized modification of Bayes Factor. Ten biologically relevant partitioning strategies were identified ranging from all DNA analysed under a single model to each codon position of each gene treated with a separate model. Model selection criteria AIC, AICc, BIC and four ways of traversing parameter space in a hierarchical likelihood ratio test were applied to each partition. All unique partitioning and model combinations were analysed with Bayesian methods. Results show that partitioning by codon position and genome source (nuclear vs. mitochondrial) is strongly favoured over partitioning by gene. We also find evidence that Bayes Factor can penalize overparameterization even when comparing nested models. Species groups showing a strong geographical pattern were generally highly supported, however, the sister group relationship of an isolated Madagascan and Australian species were shown to be artefactual with a long‐branch extraction test. The following conclusions were supported in both the selected method of partitioning the Bayesian analysis and combined parsimony analyses: (i) the tribe Hydaticini is monophyletic (ii) the genus Hydaticus Leach is paraphyletic with respect to Prodaticus Sharp (iii) the subgenus Hydaticus (Hydaticus) is monophyletic, and (iv) the subgenus H. (Guignotites) Brinck is paraphyletic with respect to Prodaticus and the subgenera H. (Pleurodytes) Régimbart and H. (Hydaticinus) Guignot. Based on these results, Hydaticus and Prodaticus are each recognized as valid genera and Guignotites, Hydaticinus and Pleurodytes are each placed as junior synonyms of Prodaticus (new synonymies).  相似文献   

20.
In clinical trials for the comparison of two treatments it seems reasonable to stop the study if either one treatment has worked out to be markedly superior in the main effect, or one to be severely inferior with respect to an adverse side effect. Two stage sampling plans are considered for simultaneously testing a main and side effect, assumed to follow a bivariate normal distribution with known variances, but unknown correlation. The test procedure keeps the global significance level under the null hypothesis of no differences in main and side effects. The critical values are chosen under the side condition, that the probability for ending at the first or second stage with a rejection of the elementary null hypothesis for the main effect is controlled, when a particular constellation of differences in mean holds; analogously the probability of ending with a rejection of the null hypotheses for the side effect, given certain treatment differences, is controlled too. Plans “optimal” with respect to sample size are given.  相似文献   
