Similar articles
Found 20 similar articles (search time: 531 ms)
1.
In this paper we analyze the fraction of non-disjunction in Meiosis I assuming reference (non-informative) priors. We consider Jeffreys's approach to build a non-informative prior (Jeffreys's prior) for the fraction of non-disjunction in Meiosis I. We prove that Jeffreys's prior is a proper distribution. We perform Monte Carlo studies in order to compare Bayes estimates obtained assuming Jeffreys's and uniform priors. We consider the full Bayesian significance test (FBST) and the Bayes factor (BF) for testing precise hypotheses on the fraction of non-disjunction in Meiosis I. The ultimate goal of this paper is to compare these two test procedures through simulation studies using both prior specifications. An application to Down Syndrome data is also presented.

2.
Haas PJ, Liu Y, Stokes L. Biometrics, 2006, 62(1): 135-141
We consider the problem of estimating the number of distinct species S in a study area from the recorded presence or absence of species in each of a sample of quadrats. A generalized jackknife estimator of S is derived, along with an estimate of its variance. It is compared with the jackknife estimator for S proposed by Heltshe and Forrester and the empirical Bayes estimator of Mingoti and Meeden. We show that the empirical Bayes estimator has the form of a generalized jackknife estimator under a specific model for species distribution. We compare the new estimators of S to the empirical Bayes estimator via simulation. We characterize circumstances under which each is superior.
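For orientation, the classical first-order jackknife for quadrat data (the Heltshe and Forrester estimator that this abstract uses as a comparison) can be sketched as follows. This is not the generalized jackknife derived in the paper, and the function name and the 0/1 incidence-matrix layout are assumptions made for illustration.

```python
from typing import Sequence


def first_order_jackknife(incidence: Sequence[Sequence[int]]) -> float:
    """First-order jackknife estimate of species richness.

    incidence[q][s] is 1 if species s was recorded in quadrat q, else 0
    (an assumed layout, chosen for this sketch).
    """
    n_quadrats = len(incidence)
    if n_quadrats == 0:
        raise ValueError("at least one quadrat is required")
    n_species = len(incidence[0])
    # Number of quadrats in which each species was recorded.
    counts = [sum(row[s] for row in incidence) for s in range(n_species)]
    s_obs = sum(1 for c in counts if c > 0)      # observed species
    uniques = sum(1 for c in counts if c == 1)   # species seen in exactly one quadrat
    # S_jack1 = S_obs + k * (n - 1) / n, with k the number of "unique" species.
    return s_obs + uniques * (n_quadrats - 1) / n_quadrats


quadrats = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
]
estimate = first_order_jackknife(quadrats)
```

Here three species are observed and one of them occurs in a single quadrat, so the estimate exceeds the observed count by (n - 1)/n = 2/3.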

3.
A Bayesian method is presented for estimating mortality rates of specific diseases when the frequency of deaths over a specified time period is assumed to have a Poisson distribution with mean proportional to the population size. The estimators use information from related populations, each having its own rate which is assumed distributed according to a common prior distribution about which some information is available. The study was motivated by an epidemiological study on the geographic variation of cancer mortality in the state of Missouri. Data from this study are used to illustrate the method and to compare it to a somewhat simpler empirical Bayes method.
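The abstract does not state which prior family is used. A minimal sketch of the Poisson setting it describes, assuming a conjugate gamma prior on the rate (an assumption, as are the function and parameter names), shows how a small-area rate is pulled toward the prior mean:

```python
def posterior_rate(deaths: int, population: float,
                   prior_shape: float, prior_rate: float) -> tuple[float, float]:
    """Posterior mean and variance of a mortality rate.

    Model assumed for this sketch:
        deaths ~ Poisson(population * rate),  rate ~ Gamma(prior_shape, prior_rate)
    where the gamma is parameterised by shape and rate, so its mean is
    prior_shape / prior_rate.
    """
    post_shape = prior_shape + deaths
    post_rate = prior_rate + population
    mean = post_shape / post_rate
    variance = post_shape / post_rate ** 2
    return mean, variance


# 3 deaths in a population of 1000 (crude rate 0.003), prior mean 2/1000 = 0.002.
mean, variance = posterior_rate(deaths=3, population=1000,
                                prior_shape=2.0, prior_rate=1000.0)
```

The posterior mean, 5/2000 = 0.0025, lies between the crude rate and the prior mean; the larger the population, the closer it moves to the crude rate.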

4.
The linear receptive field describes a mapping from sensory stimuli to a one-dimensional variable governing a neuron's spike response. However, traditional receptive field estimators such as the spike-triggered average converge slowly and often require large amounts of data. Bayesian methods seek to overcome this problem by biasing estimates towards solutions that are more likely a priori, typically those with small, smooth, or sparse coefficients. Here we introduce a novel Bayesian receptive field estimator designed to incorporate locality, a powerful form of prior information about receptive field structure. The key to our approach is a hierarchical receptive field model that flexibly adapts to localized structure in both spacetime and spatiotemporal frequency, using an inference method known as empirical Bayes. We refer to our method as automatic locality determination (ALD), and show that it can accurately recover various types of smooth, sparse, and localized receptive fields. We apply ALD to neural data from retinal ganglion cells and V1 simple cells, and find it achieves error rates several times lower than standard estimators. Thus, estimates of comparable accuracy can be achieved with substantially less data. Finally, we introduce a computationally efficient Markov Chain Monte Carlo (MCMC) algorithm for fully Bayesian inference under the ALD prior, yielding accurate Bayesian confidence intervals for small or noisy datasets.

5.
Millar RB. Biometrics, 2004, 60(2): 536-542
Priors are seldom unequivocal and an important component of Bayesian modeling is assessment of the sensitivity of the posterior to the specified prior distribution. This is especially true in fisheries science where the Bayesian approach has been promoted as a rigorous method for including existing information from previous surveys and from related stocks or species. These informative priors may be highly contested by various interest groups. Here, formulae for the first and second derivatives of Bayes estimators with respect to hyper-parameters of the joint prior density are given. The formula for the second derivative provides a correction to a previously published result. The formulae are shown to reduce to very convenient and easily implemented forms when the hyper-parameters are for exponential family marginal priors. For model parameters with such priors it is shown that the ratio of posterior variance to prior variance can be interpreted as the sensitivity of the posterior mean to the prior mean. This methodology is applied to a nonlinear state-space model for the biomass of South Atlantic albacore tuna and sensitivity of the maximum sustainable yield to the prior specification is examined.
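The variance-ratio interpretation stated in this abstract can be checked by hand in the simplest conjugate case. The worked example below assumes a normal model with known variance and a normal prior; the symbols are introduced here for illustration and are not taken from the paper.

```latex
% Assumed model: y_1, \dots, y_n \mid \theta \sim N(\theta, \sigma^2) with \sigma^2 known,
% prior \theta \sim N(\mu_0, \tau_0^2).
\begin{align*}
  \tau_n^2 &= \left( \frac{1}{\tau_0^2} + \frac{n}{\sigma^2} \right)^{-1}
  && \text{(posterior variance)} \\
  \mu_n &= \tau_n^2 \left( \frac{\mu_0}{\tau_0^2} + \frac{n \bar{y}}{\sigma^2} \right)
  && \text{(posterior mean)} \\
  \frac{\partial \mu_n}{\partial \mu_0} &= \frac{\tau_n^2}{\tau_0^2}
  && \text{(sensitivity of posterior mean to prior mean)}
\end{align*}
```

In this case the sensitivity lies between 0 and 1 and shrinks toward 0 as n grows, which matches the intuition that a well-informed posterior depends little on the prior mean.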

6.
This article explains estimation of gene frequencies from a Bayesian viewpoint using prior information. How to obtain Bayes estimators and the highest posterior density credible sets (Bayesian counterpart to classical confidence intervals) for gene frequencies is described. Tests of hypotheses are also discussed. A readily available mathematical application package is used to demonstrate the mathematical computations.

7.
A popular approach to detecting positive selection is to estimate the parameters of a probabilistic model of codon evolution and perform inference based on its maximum likelihood parameter values. This approach has been evaluated intensively in a number of simulation studies and found to be robust when the available data set is large. However, uncertainties in the estimated parameter values can lead to errors in the inference, especially when the data set is small or there is insufficient divergence between the sequences. We introduce a Bayesian model comparison approach to infer whether the sequence as a whole contains sites at which the rate of nonsynonymous substitution is greater than the rate of synonymous substitution. We incorporated this probabilistic model comparison into a Bayesian approach to site-specific inference of positive selection. Using simulated sequences, we compared this approach to the commonly used empirical Bayes approach and investigated the effect of tree length on the performance of both methods. We found that the Bayesian approach outperforms the empirical Bayes method when the amount of sequence divergence is small and is less prone to false-positive inference when the sequences are saturated, while the results are indistinguishable for intermediate levels of sequence divergence.

8.
Curation and interpretation of copy number variants identified by genome-wide testing is challenged by the large number of events harbored in each personal genome. Conventional determination of phenotypic relevance relies on patterns of higher frequency in affected individuals versus controls; however, an increasing amount of ascertained variation is rare or private to clans. Consequently, frequency data have less utility to resolve pathogenic from benign. One solution is disease-specific algorithms that leverage gene knowledge together with variant frequency to aid prioritization. We used large-scale resources including Gene Ontology, protein-protein interactions and other annotation systems together with a broad set of 83 genes with known associations to epilepsy to construct a pathogenicity score for the phenotype. We evaluated the score for all annotated human genes and applied Bayesian methods to combine the derived pathogenicity score with frequency information from our diagnostic laboratory. Analysis determined Bayes factors and posterior distributions for each gene. We applied our method to subjects with abnormal chromosomal microarray results and confirmed epilepsy diagnoses gathered by electronic medical record review. Genes deleted in our subjects with epilepsy had significantly higher pathogenicity scores and Bayes factors compared to subjects referred for non-neurologic indications. We also applied our scores to identify a recently validated epilepsy gene in a complex genomic region and to reveal candidate genes for epilepsy. We propose a potential use in clinical decision support for our results in the context of genome-wide screening. Our approach demonstrates the utility of integrative data in medical genomics.

9.
Estimating the number of species in a stochastic abundance model
Chao A, Bunge J. Biometrics, 2002, 58(3): 531-539
Consider a stochastic abundance model in which the species arrive in the sample according to independent Poisson processes, where the abundance parameters of the processes follow a gamma distribution. We propose a new estimator of the number of species for this model. The estimator takes the form of the number of duplicated species (i.e., species represented by two or more individuals) divided by an estimated duplication fraction. The duplication fraction is estimated from all frequencies including singleton information. The new estimator is closely related to the sample coverage estimator presented by Chao and Lee (1992, Journal of the American Statistical Association 87, 210-217). We illustrate the procedure using the Malayan butterfly data discussed by Fisher, Corbet, and Williams (1943, Journal of Animal Ecology 12, 42-58) and a 1989 Christmas Bird Count dataset collected in Florida, U.S.A. Simulation studies show that this estimator compares well with maximum likelihood estimators (i.e., empirical Bayes estimators from the Bayesian viewpoint) for which an iterative numerical procedure is needed and may be infeasible.

10.
Pan W, Lin X, Zeng D. Biometrics, 2006, 62(2): 402-412
We propose a new class of models, transition measurement error models, to study the effects of covariates and the past responses on the current response in longitudinal studies when one of the covariates is measured with error. We show that the response variable conditional on the error-prone covariate follows a complex transition mixed effects model. The naive model obtained by ignoring the measurement error correctly specifies the transition part of the model, but misspecifies the covariate effect structure and ignores the random effects. We next study the asymptotic bias in the naive estimator obtained by ignoring the measurement error for both continuous and discrete outcomes. We show that the naive estimator of the regression coefficient of the error-prone covariate is attenuated, while the naive estimators of the regression coefficients of the past responses are generally inflated. We then develop a structural modeling approach for parameter estimation using the maximum likelihood estimation method. In view of the multidimensional integration required by full maximum likelihood estimation, an EM algorithm is developed to calculate maximum likelihood estimators, in which Monte Carlo simulations are used to evaluate the conditional expectations in the E-step. We evaluate the performance of the proposed method through a simulation study and apply it to a longitudinal social support study for elderly women with heart disease. An additional simulation study shows that the Bayesian information criterion (BIC) performs well in choosing the correct transition orders of the models.

11.
Background: An important question in the analysis of biochemical data is that of identifying subsets of molecular variables that may jointly influence a biological response. Statistical variable selection methods have been widely used for this purpose. In many settings, it may be important to incorporate ancillary biological information concerning the variables of interest. Pathway and network maps are one example of a source of such information. However, although ancillary information is increasingly available, it is not always clear how it should be used nor how it should be weighted in relation to primary data. Results: We put forward an approach in which biological knowledge is incorporated using informative prior distributions over variable subsets, with prior information selected and weighted in an automated, objective manner using an empirical Bayes formulation. We employ continuous, linear models with interaction terms and exploit biochemically-motivated sparsity constraints to permit exact inference. We show an example of priors for pathway- and network-based information and illustrate our proposed method on both synthetic response data and by an application to cancer drug response data. Comparisons are also made to alternative Bayesian and frequentist penalised-likelihood methods for incorporating network-based information. Conclusions: The empirical Bayes method proposed here can aid prior elicitation for Bayesian variable selection studies and help to guard against mis-specification of priors. Empirical Bayes, together with the proposed pathway-based priors, results in an approach with a competitive variable selection performance. In addition, the overall procedure is fast, deterministic, and has very few user-set parameters, yet is capable of capturing interplay between molecular players. The approach presented is general and readily applicable in any setting with multiple sources of biological prior knowledge.

12.
Ball RD. Genetics, 2007, 177(4): 2399-2416
We calculate posterior probabilities for candidate genes as a function of genomic location. Posterior probabilities for quantitative trait loci (QTL) presence in a small interval are calculated using a Bayesian model-selection approach based on the Bayesian information criterion (BIC) and used to combine QTL colocation information with sequence-specific evidence, e.g., from differential expression and/or association studies. Our method takes into account uncertainty in estimation of number and locations of QTL and estimated map position. Posterior probabilities for QTL presence were calculated for simulated data with n = 100, 300, and 1200 QTL progeny and compared with interval mapping and composite-interval mapping. Candidate genes that mapped to QTL regions had substantially larger posterior probabilities. Among candidates with a given Bayes factor, those that map near a QTL are more promising for further investigation with association studies and functional testing or for use in marker-aided selection. The BIC is shown to correspond very closely to Bayes factors for linear models with a nearly noninformative Zellner prior for the simulated QTL data with n ≥ 100. It is shown how to modify the BIC to use a subjective prior for the QTL effects.
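The link between the BIC and Bayes factors mentioned in this abstract is usually expressed through the standard large-sample approximation BF ≈ exp((BIC_0 - BIC_1) / 2). The sketch below illustrates only that generic approximation and the conversion to a posterior probability; it is not the paper's QTL procedure, and all function names are hypothetical.

```python
import math


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: -2 log L + k log n."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def approx_bayes_factor(bic_null: float, bic_alt: float) -> float:
    """Approximate Bayes factor for the alternative over the null model."""
    return math.exp((bic_null - bic_alt) / 2.0)


def posterior_probability(bayes_factor: float, prior_probability: float) -> float:
    """Posterior probability of the alternative: posterior odds = BF * prior odds."""
    prior_odds = prior_probability / (1.0 - prior_probability)
    posterior_odds = bayes_factor * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical log-likelihoods for a model without and with one extra effect.
bic_null = bic(log_likelihood=-150.0, n_params=2, n_obs=100)
bic_alt = bic(log_likelihood=-145.0, n_params=3, n_obs=100)
bf = approx_bayes_factor(bic_null, bic_alt)
prob = posterior_probability(bf, prior_probability=0.1)
```

With these made-up numbers the extra parameter costs log(100) in BIC, so the approximate Bayes factor is exp(5 - log(100)/2) = exp(5)/10.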

13.
Plant disease is responsible for major losses in agriculture throughout the world. Diseases are often spread by insect organisms that transmit a bacterium, virus, or other pathogen. To assess disease epidemics, plant pathologists often use multiple-vector-transfers. In such contexts, groups of insect vectors are moved from an infected source to each of n test plants that will then be observed for developing symptoms of infection. The purpose of this paper is to present new estimators for p, the probability of pathogen transmission for an individual vector, motivated from an empirical Bayesian approach. We specifically investigate four such estimators, characterize their small-sample properties, and propose new credible intervals for p. These estimators remove the need to specify hyperparameters a priori and are shown to be easier to compute than the classical Bayes estimators proposed by Chaubey and Li (1995, Journal of Official Statistics 11, 1035-1046) and Chick (1996, Biometrics 52, 1055-1062). Furthermore, some of these estimators are shown to have better frequentist properties than the commonly used maximum likelihood estimator and to provide a smaller Bayes risk than the estimator proposed by Burrows (1987, Phytopathology 77, 363-365).

14.
Codon-based substitution models are routinely used to measure selective pressures acting on protein-coding genes. To this effect, the nonsynonymous to synonymous rate ratio (dN/dS = omega) is estimated. The proportion of amino-acid sites potentially under positive selection, as indicated by omega > 1, is inferred by fitting a probability distribution where some sites are permitted to have omega > 1. These sites are then inferred by means of an empirical Bayes or by a Bayes empirical Bayes approach that, respectively, ignores or accounts for sampling errors in maximum-likelihood estimates of the distribution used to infer the proportion of sites with omega > 1. Here, we extend a previous full-Bayes approach to include models with high power and low false-positive rates when inferring sites under positive selection. We propose some heuristics to alleviate the computational burden, and show that (i) full Bayes can be superior to empirical Bayes when analyzing a small data set or small simulated data, (ii) full Bayes has only a small advantage over Bayes empirical Bayes with our small test data, and (iii) Bayesian methods appear relatively insensitive to mild misspecifications of the random process generating adaptive evolution in our simulations, but in practice can prove extremely sensitive to model specification. We suggest that the codon model used to detect amino acids under selection should be carefully selected, for instance using Akaike information criterion (AIC).

15.
In this paper we introduce a misclassification model for the meiosis I non‐disjunction fraction in numerical chromosomal anomalies named trisomies. We obtain posteriors, and their moments, for the probability that a non‐disjunction occurs in the first division of meiosis and for the misclassification errors. We also extend previous works by providing the exact posterior, and its moments, for the probability that a non‐disjunction occurs in the first division of meiosis assuming the model proposed in the literature which does not consider that data are subject to misclassification. We perform Monte Carlo studies in order to compare Bayes estimates obtained by using both models. An application to Down Syndrome data is also presented. (© 2008 WILEY‐VCH Verlag GmbH & Co. KGaA, Weinheim)

16.
We propose a new statistical method for constructing a genetic network from microarray gene expression data by using a Bayesian network. An essential point of Bayesian network construction is the estimation of the conditional distribution of each random variable. We consider fitting nonparametric regression models with heterogeneous error variances to the microarray gene expression data to capture the nonlinear structures between genes. Selecting the optimal graph, which gives the best representation of the system among genes, is still a problem to be solved. We theoretically derive a new graph selection criterion from a Bayes approach in general situations. The proposed method includes previous methods based on Bayesian networks. We demonstrate the effectiveness of the proposed method through the analysis of Saccharomyces cerevisiae gene expression data newly obtained by disrupting 100 genes.

17.
We propose a Bayesian method for testing molecular clock hypotheses for use with aligned sequence data from multiple taxa. Our method utilizes a nonreversible nucleotide substitution model to avoid the necessity of specifying either a known tree relating the taxa or an outgroup for rooting the tree. We employ reversible jump Markov chain Monte Carlo to sample from the posterior distribution of the phylogenetic model parameters and conduct hypothesis testing using Bayes factors, the ratio of the posterior to prior odds of competing models. Here, the Bayes factors reflect the relative support of the sequence data for equal rates of evolutionary change between taxa versus unequal rates, averaged over all possible phylogenetic parameters, including the tree and root position. As the molecular clock model is a restriction of the more general unequal rates model, we use the Savage-Dickey ratio to estimate the Bayes factors. The Savage-Dickey ratio provides a convenient approach to calculating Bayes factors in favor of sharp hypotheses. Critical to calculating the Savage-Dickey ratio is a determination of the prior induced on the modeling restrictions. We demonstrate our method on a well-studied mtDNA sequence data set consisting of nine primates. We find strong support against a global molecular clock, but do find support for a local clock among the anthropoids. We provide mathematical derivations of the induced priors on branch length restrictions assuming equally likely trees. These derivations also have more general applicability to the examination of prior assumptions in Bayesian phylogenetics.
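The Savage-Dickey ratio used in this abstract states that, for a sharp hypothesis nested in a larger model, the Bayes factor in favor of the restriction equals the posterior density at the restricted value divided by the prior density there. A toy sketch, assuming a binomial likelihood with a Beta prior (a stand-in chosen for illustration, unrelated to the phylogenetic model of the paper; all names are hypothetical), is:

```python
import math


def log_beta_fn(a: float, b: float) -> float:
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_pdf(x: float, a: float, b: float) -> float:
    """Density of the Beta(a, b) distribution at 0 < x < 1."""
    log_density = (a - 1) * math.log(x) + (b - 1) * math.log(1 - x) - log_beta_fn(a, b)
    return math.exp(log_density)


def savage_dickey_bf01(successes: int, trials: int, theta0: float,
                       a: float = 1.0, b: float = 1.0) -> float:
    """Bayes factor for H0: theta = theta0 against H1: theta ~ Beta(a, b)."""
    posterior_at_theta0 = beta_pdf(theta0, a + successes, b + trials - successes)
    prior_at_theta0 = beta_pdf(theta0, a, b)
    return posterior_at_theta0 / prior_at_theta0


bf01 = savage_dickey_bf01(successes=5, trials=10, theta0=0.5)

# Cross-check against the ratio of marginal likelihoods: under a uniform prior
# the marginal likelihood of any count is 1 / (trials + 1).
direct = math.comb(10, 5) * 0.5 ** 10 * (10 + 1)
```

The two computations agree, which is the point of the identity: only the posterior and prior densities at the restricted value are needed, not a separate fit of the restricted model.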

18.
Yi N, Xu S. Genetics, 2000, 156(1): 411-422
Variance component analysis of quantitative trait loci (QTL) is an important strategy of genetic mapping for complex traits in humans. The method is robust because it can handle an arbitrary number of alleles with arbitrary modes of gene actions. The variance component method is usually implemented using the proportion of alleles with identity-by-descent (IBD) shared by relatives. As a result, information about marker linkage phases in the parents is not required. The method has been studied extensively under either the maximum-likelihood framework or the sib-pair regression paradigm. However, virtually all investigations are limited to normally distributed traits under a single QTL model. In this study, we develop a Bayes method to map multiple QTL. We also extend the Bayesian mapping procedure to identify QTL responsible for the variation of complex binary diseases in humans under a threshold model. The method can also treat the number of QTL as a parameter and infer its posterior distribution. We use the reversible jump Markov chain Monte Carlo method to infer the posterior distributions of parameters of interest. The Bayesian mapping procedure ends with an estimation of the joint posterior distribution of the number of QTL and the locations and variances of the identified QTL. Utilities of the method are demonstrated using a simulated population consisting of multiple full-sib families.

19.
This paper uses the analysis of a data set to examine a number of issues in Bayesian statistics and the application of MCMC methods. The data concern the selectivity of fishing nets, and logistic regression is used to relate the size of a fish to the probability it will be retained or escape from a trawl net. Hierarchical models relate information from different trawls and posterior distributions are determined using MCMC. Centring data is shown to radically reduce autocorrelation in chains, and Rao‐Blackwellisation and chain‐thinning are found to have little effect on parameter estimates. The results of four convergence diagnostics are compared and the sensitivity of the posterior distribution to the prior distribution is examined using a novel method. Nested models are fitted to the data and compared using intrinsic Bayes factors, pseudo‐Bayes factors and credible intervals.

20.
Understanding the functional relationship between the sample size and the performance of species richness estimators is necessary to optimize limited sampling resources against estimation error. Nonparametric estimators such as Chao and Jackknife demonstrate strong performances, but consensus is lacking as to which estimator performs better under constrained sampling. We explore a method to improve the estimators under such scenarios. The method we propose involves randomly splitting species‐abundance data from a single sample into two equally sized samples, and using an appropriate incidence‐based estimator to estimate richness. To test this method, we assume a lognormal species‐abundance distribution (SAD) with varying coefficients of variation (CV), generate samples using MCMC simulations, and use the expected mean‐squared error as the performance criterion of the estimators. We test this method for Chao, Jackknife, ICE, and ACE estimators. Between abundance‐based estimators with the single sample, and incidence‐based estimators with the split‐in‐two samples, Chao2 performed the best when CV < 0.65, and incidence‐based Jackknife performed the best when CV > 0.65, given that the ratio of sample size to observed species richness is greater than a critical value given by a power function of CV with respect to abundance of the sampled population. The proposed method increases the performance of the estimators substantially and is more effective when more rare species are in an assemblage. We also show that the splitting method performs qualitatively similarly when the SADs are log series, geometric series, and negative binomial. We demonstrate an application of the proposed method by estimating richness of zooplankton communities in samples of ballast water. The proposed splitting method is an alternative to sampling a large number of individuals to increase the accuracy of richness estimations; therefore, it is appropriate for a wide range of resource‐limited sampling scenarios in ecology.
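The splitting idea described in this abstract can be sketched roughly as follows: individuals from one sample are randomly divided into two equally sized halves, each half is treated as a sampling unit, and an incidence-based estimator is applied. The abstract does not say which form of Chao2 the authors used; the sketch assumes the classic form S_obs + Q1^2 / (2 Q2), with the usual bias-corrected fallback when Q2 = 0, and all function names are hypothetical.

```python
import random


def split_in_two(abundances: dict[str, int], seed: int = 0) -> tuple[list[str], list[str]]:
    """Randomly split the individuals of one sample into two equal halves.

    If the total number of individuals is odd, one individual is left out.
    """
    individuals = [sp for sp, count in abundances.items() for _ in range(count)]
    rng = random.Random(seed)
    rng.shuffle(individuals)
    half = len(individuals) // 2
    return individuals[:half], individuals[half:2 * half]


def incidence_counts(first: list[str], second: list[str]) -> tuple[int, int, int]:
    """Return (S_obs, Q1, Q2): observed species, species in one half, species in both."""
    in_first, in_second = set(first), set(second)
    s_obs = len(in_first | in_second)
    q2 = len(in_first & in_second)
    q1 = s_obs - q2
    return s_obs, q1, q2


def chao2(s_obs: int, q1: int, q2: int) -> float:
    """Classic Chao2; uses the bias-corrected term Q1 (Q1 - 1) / 2 when Q2 = 0."""
    if q2 == 0:
        return s_obs + q1 * (q1 - 1) / 2.0
    return s_obs + q1 ** 2 / (2.0 * q2)


sample = {"sp_a": 40, "sp_b": 12, "sp_c": 3, "sp_d": 1, "sp_e": 1}
first_half, second_half = split_in_two(sample, seed=42)
richness = chao2(*incidence_counts(first_half, second_half))
```

Because the split is random, the estimate varies with the seed; averaging over many random splits is one plausible way to stabilize it, though the abstract does not say whether the authors do so.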
