Similar Articles
20 similar articles found (search time: 31 ms)
1.
Bayesian inference in ecology   (total citations: 14; self-citations: 1; citations by others: 13)
Bayesian inference is an important statistical tool that is increasingly being used by ecologists. In a Bayesian analysis, information available before a study is conducted is summarized in a quantitative model or hypothesis: the prior probability distribution. Bayes’ Theorem uses the prior probability distribution and the likelihood of the data to generate a posterior probability distribution. Posterior probability distributions are an epistemological alternative to P‐values and provide a direct measure of the degree of belief that can be placed on models, hypotheses, or parameter estimates. Moreover, Bayesian information‐theoretic methods provide robust measures of the probability of alternative models, and multiple models can be averaged into a single model that reflects uncertainty in model construction and selection. These methods are demonstrated through a simple worked example. Ecologists are using Bayesian inference in studies that range from predicting single‐species population dynamics to understanding ecosystem processes. Not all ecologists, however, appreciate the philosophical underpinnings of Bayesian inference. In particular, Bayesians and frequentists differ in their definition of probability and in their treatment of model parameters as random variables or estimates of true values. These assumptions must be addressed explicitly before deciding whether or not to use Bayesian methods to analyse ecological data.
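For illustration, the prior-to-posterior step described above can be made concrete with a short grid-approximation sketch in Python. The example data (7 detections in 10 trials) and the Beta(2,2) prior are invented here and are not the worked example from the paper.

```python
# Minimal sketch of Bayes' Theorem on a parameter grid, assuming invented
# data: 7 detections in 10 trials, with a Beta(2, 2) prior on theta.
import numpy as np
from scipy import stats

theta = np.linspace(0.001, 0.999, 999)        # grid over the parameter
prior = stats.beta.pdf(theta, 2, 2)           # prior probability distribution
likelihood = stats.binom.pmf(7, 10, theta)    # likelihood of the data

posterior = prior * likelihood                # Bayes' Theorem (unnormalized)
posterior /= np.trapz(posterior, theta)       # normalize to integrate to 1

# The posterior gives a direct degree-of-belief statement about theta,
# the epistemological alternative to a P-value noted in the abstract.
mean = np.trapz(theta * posterior, theta)
cdf = np.cumsum(posterior) * (theta[1] - theta[0])
lo, hi = theta[np.searchsorted(cdf, 0.025)], theta[np.searchsorted(cdf, 0.975)]
print(f"posterior mean {mean:.3f}, 95% credible interval ({lo:.3f}, {hi:.3f})")
```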

2.
In order to understand fish biology and reproduction, it is important to know the fecundity patterns of individual fish, which are frequently established by recording the output of mixed-sex groups of fish in a laboratory setting. However, for understanding individual reproductive health and for modeling purposes, it is important to estimate individual fecundity from group fecundity. We created a multistage method that disaggregates group-level data into estimates for individual-level clutch size and spawning interval distributions. The first stage of the method develops estimates of the daily spawning probability of fish. Daily spawning probabilities are then used to calculate the log-likelihood of candidate distributions of clutch size. Selecting the best candidate distribution for clutch size allows for a Monte Carlo resampling of annotations of the original data that state how many fish spawned on which day. We verify this disaggregation technique by combining data from fathead minnow pairs and checking that it reproduces the original clutch sizes and spawning intervals. This method will allow scientists to estimate individual clutch size and spawning interval distributions from group spawning data without specialized or elaborate experimental designs.

3.
The usual analysis of quantal response data, which occur in fields as diverse as economics, medicine, psychology and toxicology, uses probit and logit models or their extensions, with generalized least squares or the principle of likelihood as the method of statistical inference. The symmetric alternative models lead to practically comparable results, and the choice of model or method is determined by considerations of familiarity and computational convenience. Recent attempts at improvement involve larger parametric families of tolerance distributions and employ the method of maximum likelihood in analysis. In this paper we consider models with tolerance distributions based upon the Tukey-lambda distributions, which are described in terms of their quantile functions. The likelihood methods for fitting the models and testing their adequacy are developed and illustrated using classical data due to BLISS (1935) and ASHFORD and SMITH (1964).
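As a small illustration of how a Tukey-lambda tolerance distribution is specified through its quantile function rather than a density, here is a hedged sketch; the likelihood fitting and adequacy tests of the paper are not reproduced, and the check against scipy.stats.tukeylambda is only a sanity comparison.

```python
# Sketch of the Tukey-lambda quantile function Q(p) = (p**lam - (1-p)**lam)/lam,
# which defines the tolerance distribution directly in terms of quantiles.
import numpy as np
from scipy import stats

def tukey_lambda_quantile(p, lam):
    """Quantile function; the lam -> 0 limit is the logistic distribution."""
    p = np.asarray(p, dtype=float)
    if abs(lam) < 1e-12:
        return np.log(p / (1.0 - p))              # logistic limiting case
    return (p**lam - (1.0 - p)**lam) / lam

p = np.array([0.1, 0.25, 0.5, 0.75, 0.9])
print(tukey_lambda_quantile(p, 0.14))             # lam ~ 0.14 is near-normal in shape
print(stats.tukeylambda.ppf(p, 0.14))             # agrees with scipy's implementation
```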

4.
The Kolmogorov-Smirnov test determines the consistency of empirical data with a particular probability distribution. Often, parameters in the distribution are unknown, and have to be estimated from the data. In this case, the Kolmogorov-Smirnov test depends on the form of the particular probability distribution under consideration, even when the estimated parameter-values are used within the distribution. In the present work, we address a less specific problem: to determine the consistency of data with a given functional form of a probability distribution (for example the normal distribution), without enquiring into values of unknown parameters in the distribution. For a wide class of distributions, we present a direct method for determining whether empirical data are consistent with a given functional form of the probability distribution. This utilizes a transformation of the data. If the data are from the class of distributions considered here, the transformation leads to an empirical distribution with no unknown parameters, and hence is susceptible to a standard Kolmogorov-Smirnov test. We give some general analytical results for some of the distributions from the class of distributions considered here. The significance level and power of the tests introduced in this work are estimated from simulations. Some biological applications of the method are given.
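The paper's transformation is specific to its class of distributions and is not reproduced here. As a hedged sketch of the underlying problem, the code below shows a Monte Carlo calibrated Kolmogorov-Smirnov test with estimated parameters (the Lilliefors approach); this works for location-scale families such as the normal because the null distribution of the statistic is then parameter-free.

```python
# Hedged sketch: the naive KS test with parameters estimated from the same
# data is anticonservative, so the null distribution of D is simulated.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def ks_stat_normal(x):
    mu, sd = x.mean(), x.std(ddof=1)              # parameters estimated from data
    return stats.kstest(x, "norm", args=(mu, sd)).statistic

x = rng.normal(3.0, 2.0, size=50)                 # invented sample to be tested
d_obs = ks_stat_normal(x)

# For a location-scale family the null distribution of D (with estimated
# parameters) does not depend on mu or sd, so standard normal draws suffice.
d_null = np.array([ks_stat_normal(rng.normal(size=x.size)) for _ in range(2000)])
print("Monte Carlo p-value:", np.mean(d_null >= d_obs))
```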

5.
A novel method for the qualification of reduced scale models (RSMs) was illustrated using data from both a 250-ml advanced microscale bioreactor (ambr) and a 5-L bioreactor RSM for a 2,000-L manufacturing scale process using a CHO cell line to produce a recombinant monoclonal antibody. The example study showed how the method was used to identify process performance attributes and product quality attributes that capture important aspects of the RSM qualification process. The method uses two novel statistical approaches: multivariate dimension reduction and data visualization techniques, via partial least squares discriminant analysis (PLS-DA), and Bayesian multivariate linear modeling for inferential analysis. Bayesian multivariate linear modeling allows for individual probability distributions of the differences of the mean of each attribute for each scale, as well as joint probability statements on the differences of the means for multiple attributes. Depending on the results of this inferential procedure, PLS-DA is used to identify the process performance outputs at the different scales which have the greatest negative impact on the multivariate Bayesian joint probabilities. Experience with that particular process can then be leveraged to adjust operating conditions to minimize these differences, and then equivalence can be reassessed using the multivariate linear model.
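As a rough, hedged illustration of the PLS-DA step only, the sketch below fits partial least squares with a binary scale label to simulated stand-in data; the attribute count, run counts, and between-scale shift are invented, and the Bayesian multivariate linear model is not reproduced.

```python
# Sketch of PLS-DA for scale discrimination: loadings on the first latent
# component flag which attributes drive the difference between scales.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(5)
X_small = rng.normal(0.0, 1.0, size=(12, 6))      # 12 small-scale runs, 6 attributes
X_large = rng.normal(0.3, 1.0, size=(12, 6))      # large-scale runs, shifted means
X = np.vstack([X_small, X_large])
y = np.r_[np.zeros(12), np.ones(12)]              # binary scale label

pls = PLSRegression(n_components=2).fit(X, y)
print(np.round(pls.x_loadings_[:, 0], 2))         # attribute loadings, component 1
```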

6.
7.
Short phylogenetic distances between taxa occur, for example, in studies on ribosomal RNA genes with slow substitution rates. For consistently short distances, it is proved that, in the completely singular limit of the covariance matrix, ordinary least squares (OLS) estimates are minimum variance, or best linear unbiased (BLU), estimates of phylogenetic tree branch lengths. Although OLS estimates are in this situation equal to generalized least squares (GLS) estimates, the GLS chi-square likelihood ratio test will be inapplicable, as it is associated with zero degrees of freedom. Consequently, an OLS normal distribution test or an analogous bootstrap approach will provide optimal branch length tests of significance for consistently short phylogenetic distances. As the asymptotic covariances between branch lengths will be equal to zero, it follows that the product rule can be used in tree evaluation to calculate an approximate simultaneous confidence probability that all interior branches are positive.
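The product rule mentioned in the last sentence is simple to apply once per-branch one-sided probabilities are available; a minimal sketch (with made-up branch-length z-values) follows.

```python
# With asymptotically zero covariances between branch-length estimates, an
# approximate simultaneous confidence that all interior branches are positive
# is the product of the per-branch probabilities. z-values are invented.
from scipy.stats import norm

z_values = [3.1, 2.4, 1.9]                   # hypothetical branch / SE ratios
p_each = [norm.cdf(z) for z in z_values]     # one-sided P(branch > 0) per branch

p_all = 1.0
for p in p_each:
    p_all *= p
print(f"approximate P(all interior branches > 0) = {p_all:.3f}")
```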

8.
Inference of the insulin secretion rate (ISR) from C-peptide measurements as a quantification of pancreatic β-cell function is clinically important in diseases related to reduced insulin sensitivity and insulin action. ISR derived from C-peptide concentration is an example of nonparametric Bayesian model selection, where a proposed ISR time-course is considered to be a "model". Inferring inaccessible continuous variables from discrete observable data is often problematic in biology and medicine, because it is a priori unclear how robust the inference is to the deletion of data points and, a closely related question, how much smoothness or continuity the data actually support. Predictions weighted by the posterior distribution can be cast as functional integrals, as used in statistical field theory. Functional integrals are generally difficult to evaluate, especially for nonanalytic constraints such as positivity of the estimated parameters. We propose a computationally tractable method that uses the exact solution of an associated likelihood function as a prior probability distribution for a Markov-chain Monte Carlo evaluation of the posterior for the full model. As a concrete application of our method, we calculate the ISR from actual clinical C-peptide measurements in human subjects with varying degrees of insulin sensitivity. Our method demonstrates the feasibility of functional integral Bayesian model selection as a practical method for such data-driven inference, allowing the data to determine the smoothing timescale and the width of the prior probability distribution on the space of models. In particular, our model comparison method determines the discrete time-step for interpolation of the unobservable continuous variable that is supported by the data. Attempts to go to finer discrete time-steps lead to less likely models.

9.
Using a four-taxon example under a simple model of evolution, we show that the methods of maximum likelihood and maximum posterior probability (which is a Bayesian method of inference) may not arrive at the same optimal tree topology. Some patterns that are separately uninformative under the maximum likelihood method are separately informative under the Bayesian method. We also show that this difference has an impact on the bootstrap frequencies and the posterior probabilities of topologies, which therefore are not necessarily approximately equal. Efron et al. (Proc. Natl. Acad. Sci. USA 93:13429-13434, 1996) stated that bootstrap frequencies can, under certain circumstances, be interpreted as posterior probabilities. This is true only if one includes a non-informative prior distribution of the possible data patterns, whereas prior distributions are most often instead specified in terms of topology and branch lengths. [Bayesian inference; maximum likelihood method; phylogeny; support.]

10.
SATTERTHWAITE'S (1941) approximation of the distribution of a linear combination of independent mean squares is a commonly used technique in the analysis of variance. Confidence intervals and test statistics based on this approximation require that the linear combination be positive. In this article, the probability that the combination will be negative is considered in situations in which the mean squares are associated with a general balanced mixed model. Expressions are given for exact and approximate values of this probability in terms of the expected values and degrees of freedom of the mean squares. An example is presented to illustrate the implementation of the proposed methodology.
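The exact expressions from the article are not reproduced here, but the quantity in question is easy to approximate by simulation: each mean square MS_i is distributed as E_i * chi2(df_i) / df_i, so the probability that the linear combination is negative can be estimated by Monte Carlo. All constants below are invented.

```python
# Hedged sketch: P(l < 0) for l = sum(c_i * MS_i) with independent mean
# squares MS_i ~ E_i * chisq(df_i) / df_i, plus Satterthwaite's approximate
# degrees of freedom evaluated at the expected mean squares.
import numpy as np

rng = np.random.default_rng(1)
c  = np.array([1.0, 1.0, -1.0])     # the negative coefficient lets l go negative
E  = np.array([6.0, 4.0, 3.0])      # expected mean squares (invented)
df = np.array([4, 8, 20])           # degrees of freedom (invented)

ms = E * rng.chisquare(df, size=(200_000, 3)) / df
l = ms @ c
print("Monte Carlo P(l < 0) ~", np.mean(l < 0))

nu = (c @ E) ** 2 / np.sum((c * E) ** 2 / df)   # Satterthwaite df at E
print("approximate degrees of freedom:", round(nu, 2))
```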

11.
Site occupancy models with heterogeneous detection probabilities   (total citations: 1; self-citations: 0; citations by others: 1)
Royle JA. Biometrics 2006, 62(1):97-102
Models for estimating the probability of occurrence of a species in the presence of imperfect detection are important in many ecological disciplines. In these "site occupancy" models, the possibility of heterogeneity in detection probabilities among sites must be considered because variation in abundance (and other factors) among sampled sites induces variation in detection probability (p). In this article, I develop occurrence probability models that allow for heterogeneous detection probabilities by considering several common classes of mixture distributions for p. For any mixing distribution, the likelihood has the general form of a zero-inflated binomial mixture for which inference based upon integrated likelihood is straightforward. A recent paper by Link demonstrates that in closed population models used for estimating population size, different classes of mixture distributions are indistinguishable from data, yet can produce very different inferences about population size. I demonstrate that this problem can also arise in models for estimating site occupancy in the presence of heterogeneous detection probabilities. The implications of this are discussed in the context of an application to avian survey data and the development of animal monitoring programs.
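One concrete instance of such a mixture is a zero-inflated binomial with a beta mixing distribution for p (i.e., a beta-binomial marginal). The hedged sketch below maximizes that integrated likelihood on invented detection histories; it is one of several candidate mixture classes, not the full analysis of the paper.

```python
# Zero-inflated beta-binomial occupancy likelihood: psi is occupancy
# probability, y_i the number of detections in J visits to site i.
import numpy as np
from scipy import stats
from scipy.optimize import minimize

y = np.array([0, 0, 3, 1, 0, 2, 0, 4, 0, 1])   # invented detection counts
J = 5                                          # visits per site

def negloglik(params):
    psi = 1.0 / (1.0 + np.exp(-params[0]))         # occupancy, logit scale
    a, b = np.exp(params[1]), np.exp(params[2])    # beta(a, b) mixture for p
    f = stats.betabinom.pmf(y, J, a, b)            # integrated binomial mixture
    lik = psi * f + (1.0 - psi) * (y == 0)         # zero inflation for absence
    return -np.sum(np.log(lik))

fit = minimize(negloglik, x0=[0.0, 0.0, 0.0], method="Nelder-Mead")
print("estimated occupancy psi:", 1.0 / (1.0 + np.exp(-fit.x[0])))
```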

12.
In this article, we compare Wald-type, logarithmic transformation, and Fieller-type statistics for the classical 2-sided equivalence testing of the rate ratio under matched-pair designs with a binary end point. These statistics can be implemented through sample-based, constrained least squares estimation and constrained maximum likelihood (CML) estimation methods. Sample size formulae based on the CML estimation method are developed. We consider formulae that control a prespecified power or confidence width. Our simulation studies show that statistics based on the CML estimation method generally outperform other statistics and methods with respect to actual type I error rate and average width of confidence intervals. Also, the corresponding sample size formulae are valid asymptotically in the sense that the exact power and actual coverage probability for the estimated sample size are generally close to their prespecified values. The methods are illustrated with a real example from a clinical laboratory study.

13.
Neural networks are considered by many to be very promising tools for classification and prediction. The flexibility of neural network models, however, often results in over-fitting. Shrinking the parameters using a penalized likelihood is a common way to overcome such over-fitting. In this paper we extend the approach proposed by FARAGGI and SIMON (1995a) to model censored survival data using the input-output relationship associated with a single-hidden-layer feed-forward neural network. Instead of estimating the neural network parameters by the method of maximum likelihood, we place normal prior distributions on the parameters and make inferences based on the derived posterior distributions of the parameters. This Bayesian formulation shrinks the parameters of the neural network model and reduces over-fitting relative to the maximum likelihood estimators. We illustrate the proposed method on a simulated and a real example.
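The key mechanical point, that independent normal priors on the weights turn maximum likelihood into a shrunken (weight-decay) estimator, can be sketched compactly. The toy regression network below is our own illustration under a Gaussian likelihood, not the authors' censored-survival model.

```python
# Log-posterior = log-likelihood - ||w||^2 / (2 * sigma2): a Normal(0, sigma2)
# prior on every weight is equivalent to an L2 (weight decay) penalty.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=100)

H = 4                                              # hidden units
sizes = [(3, H), (H,), (H, 1), (1,)]               # single hidden layer

def unpack(w):
    parts, i = [], 0
    for s in sizes:
        n = int(np.prod(s))
        parts.append(w[i:i + n].reshape(s))
        i += n
    return parts

def neg_log_posterior(w, sigma2=10.0):
    W1, b1, W2, b2 = unpack(w)
    yhat = np.tanh(X @ W1 + b1) @ W2 + b2          # feed-forward prediction
    nll = 0.5 * np.sum((y - yhat.ravel()) ** 2)    # Gaussian log-likelihood
    penalty = np.sum(w ** 2) / (2.0 * sigma2)      # normal prior == shrinkage
    return nll + penalty

w0 = 0.1 * rng.normal(size=sum(int(np.prod(s)) for s in sizes))
fit = minimize(neg_log_posterior, w0, method="L-BFGS-B")
print("penalized objective at optimum:", round(fit.fun, 3))
```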

14.
Goal, Scope and Background. Decision-makers demand information about the range of possible outcomes of their actions. Therefore, for developing Life Cycle Assessment (LCA) as a decision-making tool, Life Cycle Inventory (LCI) databases should provide uncertainty information. Approaches for incorporating uncertainty should be selected properly, contingent upon the characteristics of the LCI database. For example, in industry-based LCI databases where large amounts of up-to-date process data are collected, statistical methods might be useful for quantifying the uncertainties. However, in practice, there is still a lack of knowledge as to what statistical methods are most effective for obtaining the required parameters. Another concern from the industry's perspective is the confidentiality of the process data. The aim of this paper is to propose a procedure for incorporating uncertainty information with statistical methods in industry-based LCI databases, which at the same time preserves the confidentiality of individual data.
Methods. The proposed procedure for taking uncertainty in industry-based databases into account has two components: continuous probability distributions fitted to scattering unit process data, and rank order correlation coefficients between inventory flows. The type of probability distribution is selected using statistical methods such as goodness-of-fit statistics or experience-based approaches. Parameters of probability distributions are estimated using maximum likelihood estimation. Rank order correlation coefficients are calculated for inventory items in order to preserve data interdependencies. Such probability distributions and rank order correlation coefficients may be used in Monte Carlo simulations in order to quantify uncertainties in LCA results as probability distributions.
Results and Discussion. A case study is performed on the technology selection of polyethylene terephthalate (PET) chemical recycling systems. Three processes are evaluated based on CO2 reduction compared to the conventional incineration technology. To illustrate the application of the proposed procedure, assumptions were made about the uncertainty of LCI flows. The application of the probability distributions and the rank order correlation coefficients is shown, and a sensitivity analysis is performed. A potential use of the results of the hypothetical case study is discussed.
Conclusion and Outlook. The case study illustrates how the uncertainty information in LCI databases may be used in LCA. Since the actual scattering unit process data were not available for the case study, the uncertainty distribution of the LCA result is hypothetical. However, the merit of adopting the proposed procedure has been illustrated: more informed decision-making becomes possible, basing the decisions on the significance of the LCA results. With this illustration, the authors hope to encourage both database developers and data suppliers to incorporate uncertainty information in LCI databases.
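A hedged sketch of the two components (an MLE-fitted distribution and a rank order correlation, propagated by Monte Carlo) is given below; every number is invented, and a Gaussian copula stands in for whatever correlation-induction scheme a database implementation would use.

```python
# Fit a lognormal to scattered unit-process data by maximum likelihood,
# impose a Spearman rank correlation between two flows via a Gaussian
# copula, and propagate both through a toy Monte Carlo LCA model.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

data = rng.lognormal(mean=0.5, sigma=0.3, size=40)   # invented process data
shape, loc, scale = stats.lognorm.fit(data, floc=0)  # MLE fit

n = 100_000
rho = 2 * np.sin(np.pi * 0.7 / 6)    # Pearson rho matching Spearman 0.7
z = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=n)
u = stats.norm.cdf(z)                # copula: uniform margins, rank-correlated
flow_a = stats.lognorm.ppf(u[:, 0], shape, loc, scale)
flow_b = stats.lognorm.ppf(u[:, 1], shape, loc, scale)

co2 = 2 * flow_a + 5 * flow_b        # toy inventory model for the LCA result
print("median, 2.5%, 97.5%:", np.percentile(co2, [50, 2.5, 97.5]))
print("achieved Spearman:", stats.spearmanr(flow_a, flow_b)[0])
```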

15.
Krafty RT, Gimotty PA, Holtz D, Coukos G, Guo W. Biometrics 2008, 64(4):1023-1031
SUMMARY: In this article we develop a nonparametric estimation procedure for the varying coefficient model when the within-subject covariance is unknown. Extending the idea of iterative reweighted least squares to the functional setting, we iterate between estimating the coefficients conditional on the covariance and estimating the functional covariance conditional on the coefficients. Smoothing splines for correlated errors are used to estimate the functional coefficients with smoothing parameters selected via the generalized maximum likelihood. The covariance is nonparametrically estimated using a penalized estimator with smoothing parameters chosen via a Kullback-Leibler criterion. Empirical properties of the proposed method are demonstrated in simulations and the method is applied to the data collected from an ovarian tumor study in mice to analyze the effects of different chemotherapy treatments on the volumes of two classes of tumors.

16.
Several maximum likelihood and distance matrix methods for estimating phylogenetic trees from homologous DNA sequences were compared when substitution rates at sites were assumed to follow a gamma distribution. Computer simulations were performed to estimate the probabilities that various tree estimation methods recover the true tree topology. The case of four species was considered, and a few combinations of parameters were examined. Attention was paid to discriminating among different sources of error in tree reconstruction, i.e., the inconsistency of the tree estimation method, the sampling error in the estimated tree due to limited sequence length, and the sampling error in the estimated probability due to the limited number of simulations. Compared to the least squares method based on pairwise distance estimates, the joint likelihood analysis is found to be more robust when rate variation over sites is present but ignored, and an assumption is thus violated. With limited data, the likelihood method has a much higher probability of recovering the true tree and is therefore more efficient than the least squares method. The concept of statistical consistency of a tree estimation method and its implications were explored, and it is suggested that, while the efficiency (or sampling error) of a tree estimation method is a very important property, statistical consistency of the method over a wide range of, if not all, parameter values is a prerequisite.

17.
Schafer DW. Biometrics 2001, 57(1):53-61
This paper presents an EM algorithm for semiparametric likelihood analysis of linear, generalized linear, and nonlinear regression models with measurement errors in explanatory variables. A structural model is used in which probability distributions are specified for (a) the response and (b) the measurement error. A distribution is also assumed for the true explanatory variable but is left unspecified and is estimated by nonparametric maximum likelihood. For various types of extra information about the measurement error distribution, the proposed algorithm makes use of available routines that would be appropriate for likelihood analysis of (a) and (b) if the true x were available. Simulations suggest that the semiparametric maximum likelihood estimator retains a high degree of efficiency relative to the structural maximum likelihood estimator based on correct distributional assumptions and can outperform maximum likelihood based on an incorrect distributional assumption. The approach is illustrated on three examples with a variety of structures and types of extra information about the measurement error distribution.

18.
Estimates of quantitative trait loci (QTL) effects derived from complete genome scans are biased if no assumptions are made about the distribution of QTL effects. Bias should be reduced if estimates are derived by maximum likelihood, with the QTL effects sampled from a known distribution. The parameters of the distributions of QTL effects for nine economic traits in dairy cattle were estimated from a daughter design analysis of the Israeli Holstein population including 490 marker-by-sire contrasts. A separate gamma distribution was derived for each trait. Estimates for both the α and β parameters and their SE decreased as a function of heritability. The maximum likelihood estimates derived for the individual QTL effects using the gamma distributions for each trait were regressed relative to the least squares estimates, but the regression factor decreased as a function of the least squares estimate. On simulated data, the mean of the least squares estimates for effects with nominal 1% significance was more than twice the simulated values, while the mean of the maximum likelihood estimates was slightly lower than the mean of the simulated values. The coefficient of determination for the maximum likelihood estimates was five times the corresponding value for the least squares estimates.

19.
BACKGROUND: Comparing distributions of data is an important goal in many applications. For example, determining whether two samples (e.g., a control and test sample) are statistically significantly different is useful to detect a response, or to provide feedback regarding instrument stability by detecting when collected data vary significantly over time.
METHODS: We apply a variant of the chi-squared statistic to comparing univariate distributions. In this variant, a control distribution is divided such that an equal number of events falls into each of the divisions, or bins. This approach is thereby a mini-max algorithm, in that it minimizes the maximum expected variance for the control distribution. The control-derived bins are then applied to test sample distributions, and a normalized chi-squared value is computed. We term this algorithm Probability Binning.
RESULTS: Using a Monte Carlo simulation, we determined the distribution of chi-squared values obtained by comparing sets of events derived from the same distribution. Based on this distribution, we derive a conversion of any given chi-squared value into a metric that is analogous to a t-score, i.e., it can be used to estimate the probability that a test distribution is different from a control distribution. We demonstrate that this metric scales with the difference between two distributions and can be used to rank samples according to similarity to a control. Finally, we demonstrate the applicability of this metric to ranking immunophenotyping distributions, suggesting that it can indeed be used to objectively determine the relative distance of distributions compared to a single control.
CONCLUSION: Probability Binning, as shown here, provides a useful metric for determining the probability that two or more flow cytometric data distributions are different. This metric can also be used to rank distributions to identify which are most similar or dissimilar. In addition, the algorithm can be used to quantitate contamination of even highly overlapping populations. Finally, as demonstrated in an accompanying paper, Probability Binning can be used to gate on events that represent significantly different subsets from a control sample. Published 2001 Wiley-Liss, Inc.
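The binning step itself is straightforward to sketch: control quantiles define bins with equal expected counts, and a chi-squared value is computed for the test sample in those bins. The t-score conversion from the paper's Monte Carlo calibration is not reproduced here.

```python
# Minimal Probability Binning sketch: equal-frequency bins from the control,
# then a chi-squared statistic for a test sample counted in the same bins.
import numpy as np

def probability_binning_chi2(control, test, n_bins=20):
    edges = np.quantile(control, np.linspace(0, 1, n_bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf          # cover the full real line
    obs = np.histogram(test, bins=edges)[0] / len(test)
    exp = np.full(n_bins, 1.0 / n_bins)            # equal expected frequency
    return len(test) * np.sum((obs - exp) ** 2 / exp)

rng = np.random.default_rng(4)
control = rng.normal(0, 1, 10_000)
same = rng.normal(0, 1, 5_000)                     # drawn from the control model
shifted = rng.normal(0.3, 1, 5_000)                # a genuinely different sample
print(probability_binning_chi2(control, same))     # small value expected
print(probability_binning_chi2(control, shifted))  # much larger value expected
```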

20.
Molecular divergence time analyses often rely on the age of fossil lineages to calibrate node age estimates. Most divergence time analyses are now performed in a Bayesian framework, where fossil calibrations are incorporated as parametric prior probabilities on node ages. It is widely accepted that an ideal parameterization of such node age prior probabilities should be based on a comprehensive analysis of the fossil record of the clade of interest, but there is currently no generally applicable approach for calculating such informative priors. We provide here a simple and easily implemented method that employs fossil data to estimate the likely amount of missing history prior to the oldest fossil occurrence of a clade, which can be used to fit an informative parametric prior probability distribution on a node age. Specifically, our method uses the extant diversity and the stratigraphic distribution of fossil lineages confidently assigned to a clade to fit a branching model of lineage diversification. Conditioning this on a simple model of fossil preservation, we estimate the likely amount of missing history prior to the oldest fossil occurrence of the clade. The likelihood surface of missing history can then be translated into a parametric prior probability distribution on the age of the clade of interest. We show that the method performs well with simulated fossil distribution data, but that the likelihood surface of missing history can at times be too complex for the distribution-fitting algorithm employed by our software tool. An empirical application of our method is presented, estimating echinoid node ages. A simulation-based sensitivity analysis using the echinoid data set shows that node age prior distributions estimated under poor preservation rates are significantly less informative than those estimated under high preservation rates.
