Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
Comparison of the performance and accuracy of different inference methods, such as maximum likelihood (ML) and Bayesian inference, is difficult because the inference methods are implemented in different programs, often written by different authors. Both methods were implemented in the program MIGRATE, which estimates population genetic parameters, such as population sizes and migration rates, using coalescence theory. Both inference methods use the same Markov chain Monte Carlo algorithm and differ from each other in only two aspects: parameter proposal distribution and maximization of the likelihood function. On simulated datasets, the Bayesian method generally fares better than the ML approach in accuracy and coverage, although for some values the two approaches are equal in performance. MOTIVATION: The Markov chain Monte Carlo-based ML framework can fail on sparse data and can deliver non-conservative support intervals. A Bayesian framework with an appropriate prior distribution is able to remedy some of these problems. RESULTS: The program MIGRATE was extended to allow not only for ML estimation of population genetics parameters but also for the use of a Bayesian framework. Comparisons between the Bayesian approach and the ML approach are facilitated because both modes estimate the same parameters under the same population model and assumptions.

2.
Problems involving thousands of null hypotheses have been addressed by estimating the local false discovery rate (LFDR). A previous LFDR approach to reporting point and interval estimates of an effect-size parameter uses an estimate of the prior distribution of the parameter conditional on the alternative hypothesis. That estimated prior is often unreliable, and yet strongly influences the posterior intervals and point estimates, causing the posterior intervals to differ from fixed-parameter confidence intervals, even for arbitrarily small estimates of the LFDR. That influence of the estimated prior manifests the failure of the conditional posterior intervals, given the truth of the alternative hypothesis, to match the confidence intervals. Those problems are overcome by changing the posterior distribution conditional on the alternative hypothesis from a Bayesian posterior to a confidence posterior. Unlike the Bayesian posterior, the confidence posterior equates the posterior probability that the parameter lies in a fixed interval with the coverage rate of the coinciding confidence interval. The resulting confidence-Bayes hybrid posterior supplies interval and point estimates that shrink toward the null hypothesis value. The confidence intervals tend to be much shorter than their fixed-parameter counterparts, as illustrated with gene expression data. Simulations nonetheless confirm that the shrunken confidence intervals cover the parameter more frequently than stated. Generally applicable sufficient conditions for correct coverage are given. In addition to having those frequentist properties, the hybrid posterior can also be motivated from an objective Bayesian perspective by requiring coherence with some default prior conditional on the alternative hypothesis. That requirement generates a new class of approximate posteriors that supplement Bayes factors modified for improper priors and that dampen the influence of proper priors on the credibility intervals. 
While that class of posteriors intersects the class of confidence-Bayes posteriors, neither class is a subset of the other. In short, two first principles generate both classes of posteriors: a coherence principle and a relevance principle. The coherence principle requires that all effect size estimates comply with the same probability distribution. The relevance principle means effect size estimates given the truth of an alternative hypothesis cannot depend on whether that truth was known prior to observing the data or whether it was learned from the data.

3.
Parameter inference and model selection are very important for mathematical modeling in systems biology. Bayesian statistics can be used to conduct both parameter inference and model selection. In particular, the framework known as approximate Bayesian computation is often used for parameter inference and model selection in systems biology. However, Monte Carlo methods need to be used to compute Bayesian posterior distributions. In addition, the posterior distributions of parameters are sometimes almost uniform or very similar to their prior distributions. In such cases, it is difficult to choose one specific parameter value with high credibility as the representative value of the distribution. To overcome these problems, we introduced one of the population Monte Carlo algorithms, population annealing. Although population annealing is usually used in statistical mechanics, we showed that it can be used to compute Bayesian posterior distributions in the approximate Bayesian computation framework. To deal with the non-identifiability of representative parameter values, we proposed running the simulations with a parameter ensemble sampled from the posterior distribution, named the “posterior parameter ensemble”. We showed that population annealing is an efficient and convenient algorithm for generating the posterior parameter ensemble. We also showed that simulations with the posterior parameter ensemble can not only reproduce the data used for parameter inference, but also capture and predict data that were not used for parameter inference. Lastly, we introduced the marginal likelihood in the approximate Bayesian computation framework for Bayesian model selection. We showed that population annealing enables us to compute the marginal likelihood in the approximate Bayesian computation framework and to conduct model selection based on the Bayes factor.
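The idea of a posterior parameter ensemble in approximate Bayesian computation can be illustrated with a minimal sketch. Note that this uses plain rejection sampling, not the population annealing algorithm the abstract describes, and that the toy model (a normal mean with a uniform prior), the tolerance, and all function names are assumptions made only for illustration.

```python
import random


def abc_rejection(observed, simulate, prior_sample, distance,
                  tolerance, n_draws, seed=0):
    """Plain rejection ABC: keep prior draws whose simulated summary
    lies within `tolerance` of the observed summary."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)          # draw a candidate from the prior
        simulated = simulate(theta, rng)   # simulate a summary statistic
        if distance(simulated, observed) <= tolerance:
            accepted.append(theta)         # candidate joins the ensemble
    return accepted


# Hypothetical toy model: the summary is the mean of 20 Normal(theta, 1) draws.
def prior_sample(rng):
    return rng.uniform(-5.0, 5.0)


def simulate(theta, rng):
    return sum(rng.gauss(theta, 1.0) for _ in range(20)) / 20


def distance(a, b):
    return abs(a - b)


observed_mean = 1.3  # made-up observed summary statistic
ensemble = abc_rejection(observed_mean, simulate, prior_sample,
                         distance, tolerance=0.1, n_draws=20000)
posterior_mean = sum(ensemble) / len(ensemble)
print(len(ensemble), round(posterior_mean, 2))
```

Rather than reporting a single representative value, downstream simulations would be run once per element of `ensemble`, which is the sense in which the abstract speaks of simulating with the whole ensemble.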

4.
We develop a new Bayesian approach to interval estimation for both the risk difference and the risk ratio for a 2 × 2 table with a structural zero using Markov chain Monte Carlo (MCMC) methods. We also derive a normal approximation for the risk difference and a gamma approximation for the risk ratio. We then compare the coverage and interval width of our new intervals to the score-based intervals over various parameter and sample-size configurations. Finally, we consider a Bayesian method for sample-size determination.

5.
We present a novel approach to investigating the divergence history of biomes and their component species using single-locus data prior to investing in multilocus data. We use coalescent-based hierarchical approximate Bayesian computation (HABC) methods (MsBayes) to estimate the number and timing of discrete divergences across a putative barrier and to assign species to their appropriate period of co-divergence. We then apply a coalescent-based full Bayesian model of divergence (IMa) to suites of species shown to have simultaneously diverged. The full Bayesian model results in reduced credibility intervals around divergence times and allows other parameters associated with divergence to be summarized across species assemblages. We apply this approach to 10 bird species that are wholly or patchily discontinuous in semi-arid habitats between Australia's southwest (SW) and southeast (SE) mesic zones. There was substantial support for up to three discrete periods of divergence. HABC indicates that two species wholly restricted to more mesic habitats diverged earliest, between 594,382 and 3,417,699 years ago, three species from semi-arid habitats diverged between 0 and 1,508,049 years ago, and four diverged more recently, between 0 and 396,843 years ago. Eight species were assigned to three periods of co-divergence with confidence. For full Bayesian analyses, we accounted for uncertainty in the two remaining species by analyzing all possible suites of species. Estimates of divergence times from full Bayesian divergence models ranged between 429,105 and 2,006,355; 67,172 and 663,837; and 24,607 and 171,085 for the earliest, middle, and most recent periods of co-divergence, respectively. This single-locus approach uses the power of multitaxa coalescent analyses as an efficient means of generating a foundation for further, targeted research using multilocus and genomic tools applied to an understudied biome.

6.
Multilocus genealogical approaches are still uncommon in phylogeography and historical demography, fields which have been dominated by microsatellite markers and mitochondrial DNA, particularly for vertebrates. Using 30 newly developed anonymous nuclear loci, we estimated population divergence times and ancestral population sizes of three closely related species of Australian grass finches (Poephila) distributed across two barriers in northern Australia. We verified that substitution rates were generally constant both among lineages and among loci, and that intralocus recombination was uncommon in our dataset, thereby satisfying two assumptions of our multilocus analysis. The reconstructed gene trees exhibited all three possible tree topologies and displayed considerable variation in coalescent times, yet this information provided the raw data for maximum likelihood and Bayesian estimation of population divergence times and ancestral population sizes. Estimates of these parameters were in close agreement with each other regardless of statistical approach and our Bayesian estimates were robust to prior assumptions. Our results suggest that black-throated finches (Poephila cincta) diverged from long-tailed finches (P. acuticauda and P. hecki) across the Carpentarian Barrier in northeastern Australia around 0.6 million years ago (mya), and that P. acuticauda diverged from P. hecki across the Kimberley Plateau-Arnhem Land Barrier in northwestern Australia approximately 0.3 mya. Bayesian 95% credibility intervals around these estimates strongly support Pleistocene timing for both speciation events, despite the fact that many gene divergences across the Carpentarian region clearly predated the Pleistocene. Estimates of ancestral effective population sizes for the basal ancestor and long-tailed finch ancestor were large (about 521,000 and about 384,000, respectively). 
Although the errors around the population size parameter estimates are considerable, they are the first for birds taking into account multiple sources of variance.

7.
Predicting population dynamics for rare species is of paramount importance for evaluating the likelihood of extinction and planning conservation strategies. However, evaluating and predicting population viability can be hindered by a lack of data. Rare species frequently have small populations, so estimates of vital rates are often very uncertain. We evaluated the vital rates of seven small populations of a common epiphytic orchid from two watersheds with varying light environments using Bayesian methods of parameter estimation. From the Lefkovitch matrices we predicted the deterministic population growth rates, elasticities, stable stage distributions and the credible intervals of these statistics. Populations were surveyed monthly for 18 to 34 months. In some of the populations, few or no transitions were observed for some of the vital rates throughout the sampling period; however, we were able to predict the most likely vital rates using a Bayesian model that incorporated the transition rates from the other populations. Asymptotic population growth rate varied among the seven orchid populations. There was little difference in population growth rate among watersheds, even though a difference was expected because of physical differences resulting from differing canopy cover and watershed width. Elasticity analyses of Lepanthes rupestris suggest that growth rate is most sensitive to survival, followed by growth, shrinking and the reproductive rates. The Bayesian approach helped to estimate transition probabilities that were uncommon or variable in some populations. Moreover, it increased the precision of the parameter estimates as compared to traditional approaches.
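How a deterministic growth rate and stable stage distribution follow from a Lefkovitch (stage-structured) matrix can be sketched as below. The three-stage matrix is entirely made up for illustration and is not taken from the orchid study; power iteration is one simple way to obtain the dominant eigenvalue and is an assumption here, not necessarily the method the authors used.

```python
def dominant_eigen(matrix, iterations=1000, tol=1e-12):
    """Power iteration for a non-negative projection matrix.

    Returns (growth_rate, stable_stage_distribution), where the
    distribution is normalized to sum to 1.
    """
    n = len(matrix)
    vec = [1.0 / n] * n
    growth = 0.0
    for _ in range(iterations):
        # Project the population one time step: next = A * vec
        nxt = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        # Because vec sums to 1, the total after projection estimates lambda
        new_growth = sum(nxt)
        vec = [x / new_growth for x in nxt]
        converged = abs(new_growth - growth) < tol
        growth = new_growth
        if converged:
            break
    return growth, vec


# Hypothetical matrix: entry [i][j] is the contribution of stage j to stage i
# per time step (columns: seedling, juvenile, adult).
lefkovitch = [
    [0.0, 0.0, 1.5],   # recruitment of seedlings by adults
    [0.3, 0.6, 0.0],   # seedling -> juvenile, juvenile stasis
    [0.0, 0.3, 0.9],   # juvenile -> adult, adult survival
]

growth_rate, stable_stages = dominant_eigen(lefkovitch)
print(round(growth_rate, 3), [round(s, 3) for s in stable_stages])
```

A growth rate above 1 indicates a growing population under these assumed vital rates; in a Bayesian treatment, the same calculation would be repeated for each posterior draw of the matrix to obtain a credible interval.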

8.
Pollock DD, Larkin JC. Genetics, 2004, 168(1): 489-502
Large-scale screens for loss-of-function mutants have played a significant role in recent advances in developmental biology and other fields. In such mutant screens, it is desirable to estimate the degree of "saturation" of the screen (i.e., what fraction of the possible target genes has been identified). We applied Bayesian and maximum-likelihood methods for estimating the number of loci remaining undetected in large-scale screens and produced credibility intervals to assess the uncertainty of these estimates. Since different loci may mutate to alleles with detectable phenotypes at different rates, we also incorporated variation in the degree of mutability among genes, using either gamma-distributed mutation rates or multiple discrete mutation rate classes. We examined eight published data sets from large-scale mutant screens and found that credibility intervals are much broader than implied by previous assumptions about the degree of saturation of screens. The likelihood methods presented here are a significantly better fit to data from published experiments than estimates based on the Poisson distribution, which implicitly assumes a single mutation rate for all loci. The results are reasonably robust to different models of variation in the mutability of genes. We tested our methods against mutant allele data from a region of the Drosophila melanogaster genome for which there is an independent genomics-based estimate of the number of undetected loci and found that the number of such loci falls within the predicted credibility interval for our models. The methods we have developed may also be useful for estimating the degree of saturation in other types of genetic screens in addition to classical screens for simple loss-of-function mutants, including genetic modifier screens and screens for protein-protein interactions using the yeast two-hybrid method.

9.
A popular approach to detecting positive selection is to estimate the parameters of a probabilistic model of codon evolution and perform inference based on its maximum likelihood parameter values. This approach has been evaluated intensively in a number of simulation studies and found to be robust when the available data set is large. However, uncertainties in the estimated parameter values can lead to errors in the inference, especially when the data set is small or there is insufficient divergence between the sequences. We introduce a Bayesian model comparison approach to infer whether the sequence as a whole contains sites at which the rate of nonsynonymous substitution is greater than the rate of synonymous substitution. We incorporated this probabilistic model comparison into a Bayesian approach to site-specific inference of positive selection. Using simulated sequences, we compared this approach to the commonly used empirical Bayes approach and investigated the effect of tree length on the performance of both methods. We found that the Bayesian approach outperforms the empirical Bayes method when the amount of sequence divergence is small and is less prone to false-positive inference when the sequences are saturated, while the results are indistinguishable for intermediate levels of sequence divergence.

10.
Summary: With increasing frequency, epidemiologic studies are addressing hypotheses regarding gene‐environment interaction. In many well‐studied candidate genes and for standard dietary and behavioral epidemiologic exposures, there is often substantial prior information available that may be used to analyze current data as well as to design a new study. In this article, first, we propose a proper full Bayesian approach for analyzing studies of gene–environment interaction. The Bayesian approach provides a natural way to incorporate uncertainties around the assumption of gene–environment independence, often used in such an analysis. We then consider Bayesian sample size determination criteria for both estimation and hypothesis testing regarding the multiplicative gene–environment interaction parameter. We illustrate our proposed methods using data from a large ongoing case–control study of colorectal cancer investigating the interaction of N‐acetyl transferase type 2 (NAT2) with smoking and red meat consumption. We use the existing data to elicit a design prior and show how to use this information in allocating cases and controls in planning a future study that investigates the same interaction parameters. The Bayesian design and analysis strategies are compared with their corresponding frequentist counterparts.

11.
In phylogenetic analyses with combined multigene or multiprotein data sets, accounting for differing evolutionary dynamics at different loci is essential for accurate tree prediction. Existing maximum likelihood (ML) and Bayesian approaches are computationally intensive. We present an alternative approach that is orders of magnitude faster. The method, Distance Rates (DistR), estimates rates based upon distances derived from gene/protein sequence data. Simulation studies indicate that this technique is accurate compared with other methods and robust to missing sequence data. The DistR method was applied to a fungal mitochondrial data set, and the rate estimates compared well to those obtained using existing ML and Bayesian approaches. Inclusion of the protein rates estimated from the DistR method into the ML calculation of trees as a branch length multiplier resulted in a significantly improved fit as measured by the Akaike Information Criterion (AIC). Furthermore, bootstrap support for the ML topology was significantly greater when protein rates were used, and some evident errors in the concatenated ML tree topology (i.e., without protein rates) were corrected. [Bayesian credible intervals; DistR method; multigene phylogeny; PHYML; rate heterogeneity]

12.
In this study, we used an empirical example based on 100 mitochondrial genomes from higher teleost fishes to compare the accuracy of parsimony-based jackknife values with Bayesian support values. Phylogenetic analyses of 366 partitions, using differential taxon and character sampling from the entire data matrix of 100 taxa and 7,990 characters, were performed for both phylogenetic methods. The tree topology and branch-support values from each partition were compared with the tree inferred from all taxa and characters. Using this approach, we quantified the accuracy of the branch-support values assigned by the jackknife and Bayesian methods, with respect to each of 15 basal clades. In comparing the jackknife and Bayesian methods, we found that (1) both measures of support differ significantly from an ideal support index; (2) the jackknife underestimated support values; (3) the Bayesian method consistently overestimated support; (4) the magnitude by which Bayesian values overestimate support exceeds the magnitude by which the jackknife underestimates support; and (5) both methods performed poorly when taxon sampling was increased and character sampling was not increased. These results indicate that (1) the higher Bayesian support values are inappropriate (in magnitude), and (2) Bayesian support values should not be interpreted as probabilities that clades are correctly resolved. We advocate the continued use of the relatively conservative bootstrap and jackknife approaches to estimating branch support rather than the more extreme overestimates provided by the Markov Chain Monte Carlo-based Bayesian methods.

13.
Bayesian Markov chain Monte Carlo sampling has become increasingly popular in phylogenetics as a method both for estimating the maximum likelihood topology and for assessing nodal confidence. Despite the growing use of posterior probabilities, the relationship between the Bayesian measure of confidence and the most commonly used confidence measure in phylogenetics, the nonparametric bootstrap proportion, is poorly understood. We used computer simulation to investigate the behavior of three phylogenetic confidence methods: Bayesian posterior probabilities calculated via Markov chain Monte Carlo sampling (BMCMC-PP), maximum likelihood bootstrap proportion (ML-BP), and maximum parsimony bootstrap proportion (MP-BP). We simulated the evolution of DNA sequence on 17-taxon topologies under 18 evolutionary scenarios and examined the performance of these methods in assigning confidence to correct monophyletic and incorrect monophyletic groups, and we examined the effects of increasing character number on support value. BMCMC-PP and ML-BP were often strongly correlated with one another but could provide substantially different estimates of support on short internodes. In contrast, BMCMC-PP correlated poorly with MP-BP across most of the simulation conditions that we examined. For a given threshold value, more correct monophyletic groups were supported by BMCMC-PP than by either ML-BP or MP-BP. When threshold values were chosen that fixed the rate of accepting incorrect monophyletic relationships as true at 5%, all three methods recovered most of the correct relationships on the simulated topologies, although BMCMC-PP and ML-BP performed better than MP-BP. BMCMC-PP was usually a less biased predictor of phylogenetic accuracy than either bootstrapping method. BMCMC-PP provided high support values for correct topological bipartitions with fewer characters than were needed for the nonparametric bootstrap.

14.
One of the main tasks when dealing with the impacts of infrastructures on wildlife is to identify hotspots of high mortality so one can devise and implement mitigation measures. A common strategy to identify hotspots is to divide an infrastructure into several segments and determine when the number of collisions in a segment is above a given threshold, reflecting a desired significance level that is obtained assuming a probability distribution for the number of collisions, which is often the Poisson distribution. The problem with this approach, when applied to each segment individually, is that the probability of identifying false hotspots (Type I error) is potentially high. The way to solve this problem is to recognize that it requires multiple testing corrections or a Bayesian approach. Here, we apply three different methods that implement the required corrections to the identification of hotspots: (i) the familywise error rate correction, (ii) the false discovery rate, and (iii) a Bayesian hierarchical procedure. We illustrate the application of these methods with data on two bird species collected on a road in Brazil. The proposed methods provide practitioners with procedures that are reliable and simple to use in real situations and, in addition, can reflect a practitioner’s concerns about identifying false positives or missing true hotspots. Although one may argue that an overly cautious approach (reducing the probability of Type I error) may be beneficial from a biological conservation perspective, it may lead to a waste of resources and, probably worse, it may raise doubts about the methodology adopted and the credibility of those suggesting it.
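The multiple-testing problem described above can be made concrete with a short sketch. It assumes a Poisson null whose rate is the mean count across segments, uses Bonferroni as one example of a familywise error rate correction and Benjamini-Hochberg as one example of a false discovery rate procedure, and runs on invented counts; none of these choices is confirmed as the authors' exact procedure, and the Bayesian hierarchical method is not shown.

```python
import math


def poisson_upper_tail(k, rate):
    """P(X >= k) for X ~ Poisson(rate)."""
    if k <= 0:
        return 1.0
    term = math.exp(-rate)      # P(X = 0)
    cdf = term
    for i in range(1, k):       # accumulate P(X = 1) ... P(X = k - 1)
        term *= rate / i
        cdf += term
    return max(0.0, 1.0 - cdf)


def bonferroni(p_values, alpha=0.05):
    """Familywise error control: test each p-value at alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """False discovery rate control via the step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            cutoff_rank = rank   # largest rank passing its threshold
    flags = [False] * m
    for idx in order[:cutoff_rank]:
        flags[idx] = True
    return flags


# Invented collision counts for ten road segments (illustration only).
counts = [2, 1, 10, 3, 0, 7, 2, 1, 3, 1]
null_rate = sum(counts) / len(counts)   # assumed null: one common rate
p_values = [poisson_upper_tail(c, null_rate) for c in counts]

uncorrected = [i for i, p in enumerate(p_values) if p <= 0.05]
fwer_hotspots = [i for i, hit in enumerate(bonferroni(p_values)) if hit]
fdr_hotspots = [i for i, hit in enumerate(benjamini_hochberg(p_values)) if hit]
print(uncorrected, fwer_hotspots, fdr_hotspots)
```

With these made-up counts, testing each segment separately flags more segments than either corrected procedure, which is the inflation of false hotspots the abstract warns about.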

15.
The objective of this study was to obtain a quantitative assessment of the monophyly of morning glory taxa, specifically the genus Ipomoea and the tribe Argyreieae. Previous systematic studies of morning glories intimated the paraphyly of Ipomoea by suggesting that the genera within the tribe Argyreieae are derived from within Ipomoea; however, no quantitative estimates of statistical support were developed to address these questions. We applied a Bayesian analysis to provide quantitative estimates of monophyly in an investigation of morning glory relationships using DNA sequence data. We also explored various approaches for examining convergence of the Markov chain Monte Carlo (MCMC) simulation of the Bayesian analysis by running 18 separate analyses varying in length. We found convergence of the important components of the phylogenetic model (the tree with the maximum posterior probability, branch lengths, the parameter values from the DNA substitution model, and the posterior probabilities for clade support) for these data after one million generations of the MCMC simulations. In the process, we identified a run where the parameter values obtained were often outside the range of values obtained from the other runs, suggesting an aberrant result. In addition, we compared the Bayesian method of phylogenetic analysis to maximum likelihood and maximum parsimony. The results from the Bayesian analysis and the maximum likelihood analysis were similar for topology, branch lengths, and parameters of the DNA substitution model. Topologies also were similar in the comparison between the Bayesian analysis and maximum parsimony, although the posterior probabilities and the bootstrap proportions exhibited some striking differences. 
In a Bayesian analysis of three data sets (ITS sequences, waxy sequences, and ITS + waxy sequences) no support for the monophyly of the genus Ipomoea, or for the tribe Argyreieae, was observed, with the estimate of the probability of the monophyly of these taxa being less than 3.4 × 10⁻⁷.

16.
Numerous statistical methods have been developed for analyzing high‐dimensional data. These methods often focus on variable selection approaches but are limited for the purpose of testing with high‐dimensional data. They are often required to have explicit likelihood functions. In this article, we propose a “hybrid omnibus test” for testing with high‐dimensional data under much weaker requirements. Our hybrid omnibus test is developed under a semiparametric framework where a likelihood function is no longer necessary. Our test is a version of a frequentist‐Bayesian hybrid score‐type test for a generalized partially linear single‐index model, which has a link function being a function of a set of variables through a generalized partially linear single index. We propose an efficient score based on estimating equations, define local tests, and then construct our hybrid omnibus test using local tests. We compare our approach with an empirical‐likelihood ratio test and Bayesian inference based on Bayes factors, using simulation studies. Our simulation results suggest that our approach outperforms the others, in terms of type I error, power, and computational cost in both the low‐ and high‐dimensional cases. The advantage of our approach is demonstrated by applying it to genetic pathway data for type II diabetes mellitus.

17.
Clustering of multivariate data is a commonly used technique in ecology, and many approaches to clustering are available. The results from a clustering algorithm are uncertain, but few clustering approaches explicitly acknowledge this uncertainty. One exception is Bayesian mixture modelling, which treats all results probabilistically, and allows comparison of multiple plausible classifications of the same data set. We used this method, implemented in the AutoClass program, to classify catchments (watersheds) in the Murray Darling Basin (MDB), Australia, based on their physiographic characteristics (e.g. slope, rainfall, lithology). The most likely classification found nine classes of catchments. Members of each class were aggregated geographically within the MDB. Rainfall and slope were the two most important variables that defined classes. The second-most likely classification was very similar to the first, but had one fewer class. Increasing the nominal uncertainty of continuous data resulted in a most likely classification with five classes, which were again aggregated geographically. Membership probabilities suggested that a small number of cases could be members of either of two classes. Such cases were located on the edges of groups of catchments that belonged to one class, with a group belonging to the second-most likely class adjacent. A comparison of the Bayesian approach to a distance-based deterministic method showed that the Bayesian mixture model produced solutions that were more spatially cohesive and intuitively appealing. The probabilistic presentation of results from the Bayesian classification allows richer interpretation, including decisions on how to treat cases that are intermediate between two or more classes, and whether to consider more than one classification. The explicit consideration and presentation of uncertainty makes this approach useful for ecological investigations, where both data and expectations are often highly uncertain.  

18.
The linear receptive field describes a mapping from sensory stimuli to a one-dimensional variable governing a neuron's spike response. However, traditional receptive field estimators such as the spike-triggered average converge slowly and often require large amounts of data. Bayesian methods seek to overcome this problem by biasing estimates towards solutions that are more likely a priori, typically those with small, smooth, or sparse coefficients. Here we introduce a novel Bayesian receptive field estimator designed to incorporate locality, a powerful form of prior information about receptive field structure. The key to our approach is a hierarchical receptive field model that flexibly adapts to localized structure in both spacetime and spatiotemporal frequency, using an inference method known as empirical Bayes. We refer to our method as automatic locality determination (ALD), and show that it can accurately recover various types of smooth, sparse, and localized receptive fields. We apply ALD to neural data from retinal ganglion cells and V1 simple cells, and find it achieves error rates several times lower than standard estimators. Thus, estimates of comparable accuracy can be achieved with substantially less data. Finally, we introduce a computationally efficient Markov Chain Monte Carlo (MCMC) algorithm for fully Bayesian inference under the ALD prior, yielding accurate Bayesian confidence intervals for small or noisy datasets.

19.
The dynamics of species diversification rates are a key component of macroevolutionary patterns. Although not absolutely necessary, the use of divergence times inferred from sequence data has led to development of more powerful methods for inferring diversification rates. However, it is unclear what impact uncertainty in age estimates has on diversification rate inferences. Here, we quantify these effects using both Bayesian and frequentist methodology. Through simulation, we demonstrate that adding sequence data results in more precise estimates of internal node ages, but a reasonable approximation of these node ages is often sufficient to approach the theoretical minimum variance in speciation rate estimates. We also find that even crude estimates of divergence times increase the power of tests of diversification rate differences between sister clades. Finally, because Bayesian and frequentist methods provided similar assessments of error, novel Bayesian approaches may provide a useful framework for tests of diversification rates in more complex contexts than are addressed here.

20.
We describe a probabilistic approach to simultaneous image segmentation and intensity estimation for complementary DNA microarray experiments. The approach overcomes several limitations of existing methods. In particular, it (a) uses a flexible Markov random field approach to segmentation that allows for a wider range of spot shapes than existing methods, including relatively common 'doughnut-shaped' spots; (b) models the image directly as background plus hybridization intensity, and estimates the two quantities simultaneously, avoiding the common logical error that estimates of foreground may be less than those of the corresponding background if the two are estimated separately; and (c) uses a probabilistic modeling approach to simultaneously perform segmentation and intensity estimation, and to compute spot quality measures. We describe two approaches to parameter estimation: a fast algorithm, based on the expectation-maximization and the iterated conditional modes algorithms, and a fully Bayesian framework. These approaches produce comparable results, and both appear to offer some advantages over other methods. We use an HIV experiment to compare our approach to two commercial software products: Spot and Arrayvision.
