
1.

Statistical test for detecting overdispersion in offspring number based on kinship information

Tetsuya Akita 《Population Ecology》2018,60(4):297-308

In this paper, we develop a theory of a new statistic that tests overdispersion in offspring number on the basis of exactly known kinship relationships. The statistic utilizes the sample size and the number of kinship pairs found in a sample, specifically the number of mother–offspring (MO) and maternal–half-sibling (MHS) pairs. Given a sufficiently large sample size, the statistic proposed in this paper approximately follows a standard-normal distribution under non-overdispersed conditions (Poisson variance). We found that (1) the value of the statistic (\(\ge 2\) or \(<2\)) reasonably indicates whether reproduction is overdispersed at the 5% significance level; (2) the power of the statistic is determined primarily by the balance between the degree of overdispersion and the sample size; (3) in many cases, if the number of kinship pairs can be approximated by a normal distribution, false-positive and false-negative situations can be avoided. The proposed method can detect moderate to weak levels of overdispersion that produce few MHS pairs in a sample because the effect of the population size (which determines the number of detected MHS pairs) is canceled by the detection of the number of MO pairs. Once the kinship determination procedure is established, this indirect measurement will be readily applicable to species even with weak overdispersion, expanding the available opportunities for understanding how overdispersion in offspring number affects ecological processes.

2.

Identifying and Quantifying Heterogeneity in High Content Analysis: Application of Heterogeneity Indices to Drug Discovery

Albert H. Gough Ning Chen Tong Ying Shun Timothy R. Lezon Robert C. Boltz Celeste E. Reese Jacob Wagner Lawrence A. Vernetti Jennifer R. Grandis Adrian V. Lee Andrew M. Stern Mark E. Schurdak D. Lansing Taylor 《PloS one》2014,9(7)

One of the greatest challenges in biomedical research, drug discovery and diagnostics is understanding how seemingly identical cells can respond differently to perturbagens including drugs for disease treatment. Although heterogeneity has become an accepted characteristic of a population of cells, in drug discovery it is not routinely evaluated or reported. The standard practice for cell-based, high content assays has been to assume a normal distribution and to report a well-to-well average value with a standard deviation. To address this important issue we sought to define a method that could be readily implemented to identify, quantify and characterize heterogeneity in cellular and small organism assays to guide decisions during drug discovery and experimental cell/tissue profiling. Our study revealed that heterogeneity can be effectively identified and quantified with three indices that indicate diversity, non-normality and percent outliers. The indices were evaluated using the induction and inhibition of STAT3 activation in five cell lines where the systems response including sample preparation and instrument performance were well characterized and controlled. These heterogeneity indices provide a standardized method that can easily be integrated into small and large scale screening or profiling projects to guide interpretation of the biology, as well as the development of therapeutics and diagnostics. Understanding the heterogeneity in the response to perturbagens will become a critical factor in designing strategies for the development of therapeutics including targeted polypharmacology.

3.

Population trends from count data: Handling environmental bias, overdispersion and excess of zeroes

《Ecological Informatics》2022

The assessment of population trends is a key point in wildlife conservation. Survey data collected over long periods may not be comparable due to the presence of environmental biases (i.e. inadequate representation of the variability of environmental covariates in the study area). Moreover, count data may be affected by both overdispersion (i.e. the variance is larger than the mean) and excess of zero counts (potentially leading to zero inflation). The aim of this study was to define a modelling procedure to assess long-term population trends that addressed these three issues and to shed light on the effects of environmental bias, overdispersion, and zero inflation on trend estimates. To test our procedure, we used six bird species whose data were collected in northern Italy from 1992 to 2019. We designed a multi-step approach. First, using generalised additive models (GAMs), we implemented a full factorial design of models (eight models per species) that did or did not take into account environmental bias (including or not including environmental covariates, respectively), overdispersion (using a negative binomial distribution or a Poisson distribution, respectively), and zero inflation (using or not using zero-inflated models, respectively). Models were ranked according to the Akaike Information Criterion. Second, annual population indices (median and 95% confidence interval of the number of breeding pairs per point count) were predicted through a parametric bootstrap procedure. Third, long-term population trends were assessed and tested for significance by fitting weighted least-squares linear regression models to the predicted annual indices. To evaluate the effect of environmental bias, overdispersion, and zero inflation on trend estimates, an average discrepancy index was calculated for each model group.
The results showed that environmental bias was the most important driver in determining different trend estimates, although overlooking overdispersion and zero inflation could lead to misleading results. For five species, zero-inflated GAMs were the best models for predicting annual population indices. Our findings suggested a mutual interaction between zero inflation and overdispersion, with overdispersion arising in non-zero-inflated models. Moreover, for species having flocking foraging and/or colonial breeding behaviours, overdispersed and zero-inflated models may be more adequate. In conclusion, properly handling environmental bias, which may affect several data sets coming from long-term monitoring programs, is crucial to obtain reliable estimates of population trends. Furthermore, the extent to which overdispersion and zero inflation may affect trend estimates should be assessed by comparing different models, rather than presumed on the basis of statistical assumptions.
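
As a rough illustration of the overdispersion definition used in this abstract (variance larger than the mean), the sketch below computes a variance-to-mean ratio for a list of counts. The function name `dispersion_index` and the example counts are hypothetical and do not come from the study, which compared GAMs rather than using this simple diagnostic.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Return the sample variance of the counts divided by their mean.

    Values well above 1 suggest overdispersion relative to a Poisson
    distribution, for which the variance equals the mean.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("mean count is zero; the ratio is undefined")
    return variance(counts) / m


# Hypothetical point counts: mostly zeroes plus a few large flocks.
counts = [0, 0, 0, 1, 0, 12, 0, 3, 0, 0]
ratio = dispersion_index(counts)
print(f"variance-to-mean ratio: {ratio:.2f}")
```

A ratio near 1 is what a Poisson model would predict; this crude check says nothing about whether the excess variance comes from zero inflation, missing covariates, or both, which is the distinction the abstract is concerned with.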

4.

Modeling Exon-Specific Bias Distribution Improves the Analysis of RNA-Seq Data

Xuejun Liu Li Zhang Songcan Chen 《PloS one》2015,10(10)


5.

Exact inference for matched case-control studies (Total citations: 1; self-citations: 0; citations by others: 1)

K F Hirji C R Mehta N R Patel 《Biometrics》1988,44(3):803-814

In an epidemiological study with a small sample size or a sparse data structure, the use of an asymptotic method of analysis may not be appropriate. In this paper we present an alternative method of analyzing data for case-control studies with a matched design that does not rely on large-sample assumptions. A recursive algorithm to compute the exact distribution of the conditional sufficient statistics of the parameters of the logistic model for such a design is given. This distribution can be used to perform exact inference on model parameters, the methodology of which is outlined. To illustrate the exact method, and compare it with the conventional asymptotic method, analyses of data from two case-control studies are also presented.

6.

THE DIRICHLET-MULTINOMIAL MODEL: ACCOUNTING FOR INTER-TRIAL VARIATION IN REPLICATED RATINGS

DANIEL M. ENNIS JIAN BI 《Journal of sensory studies》1999,14(3):321-345

Differences in sensory acuity and hedonic reactions to products lead to latent groups in pooled ratings data. Manufacturing locations and time differences also are sources of rating heterogeneity. Intensity and hedonic ratings are ordered categorical data. Categorical responses follow a multinomial distribution and this distribution can be applied to pooled data over trials if the multinomial probabilities are constant from trial to trial. The common test statistic used for comparing vectors of proportions or frequencies is the Pearson chi-square statistic. When ratings data are obtained from repeated ratings experiments or from a cluster sampling procedure, the covariance matrix for the vector of category proportions can differ dramatically from the one assumed for the multinomial model because of inter-trial variation. This effect is referred to as overdispersion. The standard multinomial model does not fit overdispersed multinomial data. The practical implication of this is that an inflated Type I error can result in a seriously erroneous conclusion. Another implication is that overdispersion is a measurable quantity that may be of interest because it can be used to signal the presence of latent segments. The Dirichlet-Multinomial (DM) model is introduced in this paper to fit overdispersed intensity and hedonic ratings data. Methods for estimating the parameters of the DM model and the test statistics based on them to test against a specified vector or compare vectors of proportions are given. A novel theoretical contribution of this paper is a method for calculating the power of the tests. This method is useful both in evaluating the tests and determining sample size and the number of trials. A test for goodness of fit of the multinomial model against the DM model is also given. The DM model can be extended further to the Generalized Dirichlet-Multinomial (GDM) model, in which multiple sources of variation are considered.
The GDM model and its applications are discussed in this paper. Applications of the DM and GDM models in sensory and consumer research are illustrated using numerical examples.
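
To make the Dirichlet-Multinomial model mentioned above more concrete, here is a minimal sketch of its log-probability mass function in the standard concentration-parameter form (a vector `alpha`). This parameterization is an assumption made for illustration; the paper may express the model through category proportions and a separate dispersion parameter. The function name `dm_log_pmf` is hypothetical.

```python
from math import exp, lgamma


def dm_log_pmf(counts, alpha):
    """Log-probability of a vector of category counts under a
    Dirichlet-Multinomial model with concentration parameters alpha.

    log P = log n! + log G(A) - log G(n + A)
            + sum_k [log G(x_k + a_k) - log x_k! - log G(a_k)]
    where G is the gamma function, n = sum(counts), A = sum(alpha).
    """
    if len(counts) != len(alpha):
        raise ValueError("counts and alpha must have the same length")
    n = sum(counts)
    a_total = sum(alpha)
    log_p = lgamma(n + 1) + lgamma(a_total) - lgamma(n + a_total)
    for x, a in zip(counts, alpha):
        log_p += lgamma(x + a) - lgamma(x + 1) - lgamma(a)
    return log_p


# Illustrative check: with two categories and alpha = (1, 1), every split
# of n = 2 ratings is equally likely, so each has probability 1/3.
probs = [exp(dm_log_pmf((x, 2 - x), (1.0, 1.0))) for x in range(3)]
print(probs)
```

Smaller values of `sum(alpha)` correspond to more trial-to-trial variation in the category probabilities, which is the overdispersion the abstract describes; as `sum(alpha)` grows the model approaches the ordinary multinomial.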

7.

A novel method for quantifying overdispersion in count data and its application to farmland birds


Barry J. McMahon Gordon Purvis Helen Sheridan Gavin M. Siriwardena Andrew C. Parnell 《Ibis》2017,159(2):406-414

The statistical modelling of count data permeates the discipline of ecology. Such data often exhibit overdispersion compared with a standard Poisson distribution, so that the variance of the counts is greater than the mean. Whereas modelling to reveal the effects of explanatory variables on the mean is commonplace, overdispersion is generally regarded as a nuisance parameter to be accounted for and subsequently ignored. Instead, we propose a method that models the overdispersion as a biologically interesting property of a data set and show how novel inference is provided as a result. We adapted the double hierarchical generalized linear model approach to create an easily extendible model structure that quantifies the influence of explanatory variables on the overdispersion of count data, and apply it to farmland birds. These data were from a study within Irish agricultural ecosystems, in which total bird species abundance and the abundance of farmland indicator species were compared on dairy and non‐dairy farms in the winter and breeding seasons. In general, overdispersion in bird counts was greater on dairy farms than on non‐dairy farms, and for total bird numbers, overdispersion was greatest on dairy farms in winter. Our code is fitted using the Bayesian package RStan, and we make all code and data available in a GitHub repository. Within a Bayesian framework, this approach facilitates a meaningful quantification of the effects of categorical explanatory variables on any response variable with a tendency to overdispersion that has a meaningful biological or ecological explanation.

8.

A new statistical approach for assessing similarity of species composition with incidence and abundance data (Total citations: 14; self-citations: 0; citations by others: 14)

Anne Chao Robin L. Chazdon Robert K. Colwell Tsung-Jen Shen 《Ecology letters》2005,8(2):148-159

The classic Jaccard and Sørensen indices of compositional similarity (and other indices that depend upon the same variables) are notoriously sensitive to sample size, especially for assemblages with numerous rare species. Further, because these indices are based solely on presence–absence data, accurate estimators for them are unattainable. We provide a probabilistic derivation for the classic, incidence‐based forms of these indices and extend this approach to formulate new Jaccard‐type or Sørensen‐type indices based on species abundance data. We then propose estimators for these indices that include the effect of unseen shared species, based on either (replicated) incidence‐ or abundance‐based sample data. In sampling simulations, these new estimators prove to be considerably less biased than classic indices when a substantial proportion of species are missing from samples. Based on species‐rich empirical datasets, we show how incorporating the effect of unseen shared species not only increases accuracy but also can change the interpretation of results.
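
For readers unfamiliar with the classic incidence-based indices that this abstract starts from, the sketch below computes them from two species lists. It covers only the textbook presence-absence forms; the abundance-based indices and the unseen-species estimators proposed in the paper are not reproduced here. Function names and species labels are hypothetical.

```python
def jaccard(site_a, site_b):
    """Classic Jaccard index: shared species / species in either site."""
    a, b = set(site_a), set(site_b)
    union = a | b
    if not union:
        raise ValueError("both species lists are empty")
    return len(a & b) / len(union)


def sorensen(site_a, site_b):
    """Classic Sorensen index: 2 * shared / (richness A + richness B)."""
    a, b = set(site_a), set(site_b)
    total = len(a) + len(b)
    if total == 0:
        raise ValueError("both species lists are empty")
    return 2 * len(a & b) / total


# Hypothetical presence lists for two sites sharing two species.
site_1 = ["sp1", "sp2", "sp3", "sp4"]
site_2 = ["sp3", "sp4", "sp5"]
print(jaccard(site_1, site_2), sorensen(site_1, site_2))
```

Because both functions see only which species were detected, any shared species missed in one sample lowers the index, which is the sample-size sensitivity the abstract highlights.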

9.

Enhancing the performance and interpretation of freshwater biological indices: An application in arid zone streams

《Ecological Indicators》2014

We assessed the performance of biological indices developed for invertebrate assemblages occurring in arid zone streams: a multimetric index (MMI) and an O/E index of taxonomic completeness. Our overall goal was to advance our understanding of the factors that affect performance and interpretation of biological indices. Our specific objectives were to (1) develop biological indices that are insensitive to natural environmental gradients, (2) develop a general method to determine if the biological potential of an assessed site is adequately represented by the population of reference sites, (3) develop a robust method to select metrics for inclusion in MMIs that ensures maximum independence of metrics, and (4) determine if a fundamental sample property (the evenness of taxa counts within a sample) affects index performance. Random Forest modeling revealed that both individual metrics and taxa composition were strongly associated with natural environmental heterogeneity, which meant both the MMI and O/E index needed to be based on site-specific expectations. We produced a precise, responsive, and ecologically robust MMI by using principal components analysis to identify 7 statistically independent metrics from a list of 31 candidate assemblage-level metrics. However, the O/E index we developed was relatively imprecise compared with O/E indices developed for other regions. This imprecision may be the consequence of low predictability in local taxa composition associated with the relatively high spatial isolation of aquatic habitats within arid regions. We were also able to assess the likelihood that the biological potential of assessed sites were adequately characterized by the population of reference sites by developing and applying a multivariate, nearest-neighbor test that determined if an assessed site occurred within the environmental space of the reference site network. This approach is robust and applicable to all biological indices. 
We also demonstrate that the evenness of taxa counts within a sample is positively related to estimates of sample taxa richness and thus the scores of both indices. The relationship between richness and sample evenness can potentially compromise inferences regarding biological condition, and post hoc adjustments for the effects of evenness on index scores might be desirable. Further improvements in the performance and interpretation of biological indices will require simultaneous consideration of the effects of incomplete sampling on characterization of biological assemblages and the physical and biological factors that influence community assembly.

10.

Factors associated with annual-interval mammography for women in their 40s

Jennifer M. Gierisch Suzanne C. O’Neill Barbara K. Rimer Jessica T. DeFrank J. Michael Bowling Celette Sugg Skinner 《Cancer epidemiology》2009,33(1):72-78

Background: Evidence is mounting that annual mammography for women in their 40s may be the optimal schedule to reduce morbidity and mortality from breast cancer. Few studies have assessed predictors of repeat mammography on an annual interval among these women. Methods: We assessed mammography screening status among 596 insured Black and Non-Hispanic white women ages 43–49. Adherence was defined as having a second mammogram 10–14 months after a previous mammogram. We examined socio-demographic, medical and healthcare-related variables on receipt of annual-interval repeat mammograms. We also assessed barriers associated with screening. Results: 44.8% of the sample were adherent to annual-interval mammography. A history of self-reported abnormal mammograms, family history of breast cancer and never having smoked were associated with adherence. Saying they had not received mammography reminders and reporting barriers to mammography were associated with non-adherence. Four barrier categories were associated with women's non-adherence: lack of knowledge/not thinking mammograms are needed, cost, being too busy, and forgetting to make/keep appointments. Conclusions: Barriers we identified are similar to those found in other studies. Health professionals may need to take extra care in discussing mammography screening risk and benefits due to ambiguity about screening guidelines for women in their 40s, especially for women without family histories of breast cancer or histories of abnormal mammograms. Reminders are important in promoting mammography and should be coupled with other strategies to help women maintain adherence to regular mammography.

11.

Combining counts and incidence data: an efficient approach for estimating the log-normal species abundance distribution and diversity indices

Bellier E Grøtan V Engen S Schartau AK Diserud OH Finstad AG 《Oecologia》2012,170(2):477-488

Obtaining accurate estimates of diversity indices is difficult because the number of species encountered in a sample increases with sampling intensity. We introduce a novel method that requires the presence of species in a sample to be assessed, while counts of the number of individuals per species are required for only a small part of the sample. To account for species included as incidence data in the species abundance distribution, we modify the likelihood function of the classical Poisson log-normal distribution. Using simulated community assemblages, we contrast diversity estimates based on a community sample, a subsample randomly extracted from the community sample, and a mixture sample where incidence data are added to a subsample. We show that the mixture sampling approach provides more accurate estimates than the subsample and at little extra cost. Diversity indices estimated from a freshwater zooplankton community sampled using the mixture approach show the same pattern of results as the simulation study. Our method efficiently increases the accuracy of diversity estimates and comprehension of the left tail of the species abundance distribution. We show how to choose the scale of sample size needed for a compromise between information gained, accuracy of the estimates and cost expended when assessing biological diversity. The sample size estimates are obtained from key community characteristics, such as the expected number of species in the community, the expected number of individuals in a sample and the evenness of the community.

12.

Fast inference in generalized linear models via expected log-likelihoods

Alexandro D. Ramirez Liam Paninski 《Journal of computational neuroscience》2014,36(2):215-234

Generalized linear models play an essential role in a wide variety of statistical applications. This paper discusses an approximation of the likelihood in these models that can greatly facilitate computation. The basic idea is to replace a sum that appears in the exact log-likelihood by an expectation over the model covariates; the resulting “expected log-likelihood” can in many cases be computed significantly faster than the exact log-likelihood. In many neuroscience experiments the distribution over model covariates is controlled by the experimenter and the expected log-likelihood approximation becomes particularly useful; for example, estimators based on maximizing this expected log-likelihood (or a penalized version thereof) can often be obtained with orders of magnitude computational savings compared to the exact maximum likelihood estimators. A risk analysis establishes that these maximum EL estimators often come with little cost in accuracy (and in some cases even improved accuracy) compared to standard maximum likelihood estimates. Finally, we find that these methods can significantly decrease the computation time of marginal likelihood calculations for model selection and of Markov chain Monte Carlo methods for sampling from the posterior parameter distribution. We illustrate our results by applying these methods to a computationally-challenging dataset of neural spike trains obtained via large-scale multi-electrode recordings in the primate retina.

13.

Evaluating Dependence Among Mule Deer Siblings in Fetal and Neonatal Survival Analyses

CHAD J. BISHOP GARY C. WHITE PAUL M. LUKACS 《The Journal of wildlife management》2008,72(5):1085-1093

Abstract: The assumption of independent sample units is potentially violated in survival analyses where siblings comprise a high proportion of the sample. Violation of the independence assumption causes sample data to be overdispersed relative to a binomial model, which leads to underestimates of sampling variances. A variance inflation factor, c, is therefore required to obtain appropriate estimates of variances. We evaluated overdispersion in fetal and neonatal mule deer (Odocoileus hemionus) datasets where more than half of the sample units were comprised of siblings. We developed a likelihood function for estimating fetal survival when the fates of some fetuses are unknown, and we used several variations of the binomial model to estimate neonatal survival. We compared theoretical variance estimates obtained from these analyses with empirical variance estimates obtained from data-bootstrap analyses to estimate the overdispersion parameter, c. Our estimates of c for fetal survival ranged from 0.678 to 1.118, which indicate little to no evidence of overdispersion. For neonatal survival, 3 different models indicated that ĉ ranged from 1.1 to 1.4 and averaged 1.24–1.26, providing evidence of limited overdispersion (i.e., limited sibling dependence). Our results indicate that fates of sibling mule deer fetuses and neonates may often be independent even though they have the same dam. Predation tends to act independently on sibling neonates because of dam-neonate behavioral adaptations. The effect of maternal characteristics on sibling fate dependence is less straightforward and may vary by circumstance. We recommend that future neonatal survival studies incorporate additional sampling intensity to accommodate modest overdispersion (i.e., ĉ = 1.25), which would facilitate a corresponding ĉ adjustment in a model selection analysis using quasi-likelihood without a reduction in power. 
Our computational approach could be used to evaluate sample unit dependence in other studies where fates of individually marked siblings are monitored.
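
The comparison of theoretical and bootstrap variances described in this abstract can be sketched roughly as follows. This is not the authors' code: it assumes the resampling unit is the litter (all siblings of one dam), uses a plain binomial variance as the theoretical value, and runs on made-up fates. The function name `c_hat` and its parameters are hypothetical.

```python
import random
from statistics import variance


def c_hat(litters, n_boot=2000, seed=1):
    """Estimate a variance inflation factor for a survival proportion.

    litters: list of lists of 0/1 fates, one inner list per dam.
    Returns (bootstrap variance of the survival proportion) divided by
    (binomial variance p * (1 - p) / n that assumes independent fates).
    """
    rng = random.Random(seed)
    fates = [f for litter in litters for f in litter]
    n = len(fates)
    p = sum(fates) / n
    theoretical = p * (1 - p) / n
    if theoretical == 0:
        raise ValueError("all fates identical; the ratio is undefined")
    boot = []
    for _ in range(n_boot):
        # Assumption: resample whole litters so sibling dependence is kept.
        sample = rng.choices(litters, k=len(litters))
        flat = [f for litter in sample for f in litter]
        boot.append(sum(flat) / len(flat))
    return variance(boot) / theoretical


# Made-up extreme case: 100 twin litters whose siblings always share a fate.
twins = [[1, 1], [0, 0]] * 50
c = c_hat(twins)
print(f"c-hat for fully dependent twins: {c:.2f}")
```

With fully dependent twins the effective sample size is halved, so the ratio should come out close to 2; values near 1, like those the abstract reports for mule deer, indicate that sibling fates behave almost independently.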

14.

Evaluating the Effect of Disturbed Ensemble Distributions on SCFG Based Statistical Sampling of RNA Secondary Structures

A Scheid ME Nebel 《BMC bioinformatics》2012,13(1):159

ABSTRACT: BACKGROUND: Over the past years, statistical and Bayesian approaches have become increasingly appreciated to address the long-standing problem of computational RNA structure prediction. Recently, a novel probabilistic method for the prediction of RNA secondary structures from a single sequence has been studied which is based on generating statistically representative and reproducible samples of the entire ensemble of feasible structures for a particular input sequence. This method samples the possible foldings from a distribution implied by a sophisticated (traditional or length-dependent) stochastic context-free grammar (SCFG) that mirrors the standard thermodynamic model applied in modern physics-based prediction algorithms. Specifically, that grammar represents an exact probabilistic counterpart to the energy model underlying the Sfold software, which employs a sampling extension of the partition function (PF) approach to produce statistically representative subsets of the Boltzmann-weighted ensemble. Although both sampling approaches have the same worst-case time and space complexities, it has been indicated that they differ in performance (both with respect to prediction accuracy and quality of generated samples), where neither of these two competing approaches generally outperforms the other. RESULTS: In this work, we will consider the SCFG based approach in order to perform an analysis on how the quality of generated sample sets and the corresponding prediction accuracy changes when different degrees of disturbances are incorporated into the needed sampling probabilities. This is motivated by the fact that if the results prove to be resistant to large errors on the distinct sampling probabilities (compared to the exact ones), then it will be an indication that these probabilities do not need to be computed exactly, but it may be sufficient and more efficient to approximate them.
Thus, it might then be possible to decrease the worst-case time requirements of such an SCFG based sampling method without significant accuracy losses. If, on the other hand, the quality of sampled structures can be observed to strongly react to slight disturbances, there is little hope for improving the complexity by heuristic procedures. We hence provide a reliable test for the hypothesis that a heuristic method could be implemented to improve the time scaling of RNA secondary structure prediction in the worst-case -- without sacrificing much of the accuracy of the results. CONCLUSIONS: Our experiments indicate that absolute errors generally lead to the generation of useless sample sets, whereas relative errors seem to have only small negative impact on both the predictive accuracy and the overall quality of resulting structure samples. Based on these observations, we present some useful ideas for developing a time-reduced sampling method guaranteeing an acceptable predictive accuracy. We also discuss some inherent drawbacks that arise in the context of approximation. The key results of this paper are crucial for the design of an efficient and competitive heuristic prediction method based on the increasingly accepted and attractive statistical sampling approach. This has indeed been indicated by the construction of prototype algorithms (see [25]).

15.

An exact test of the Hardy-Weinberg law. (Total citations: 4; self-citations: 0; citations by others: 4)

W Chapco 《Biometrics》1976,32(1):183-189

An exact distribution of a finite sample drawn from an infinite population in Hardy-Weinberg Equilibrium is described for k-alleles. Accordingly, an exact test of the law is presented and compared with two \(\chi^2\)-tests for two and three alleles. For two alleles, it is shown that the "classical" \(\chi^2\)-test is very adequate for sample sizes as small as ten. For three alleles, it is shown that a simpler formulation based on Levene's distribution approximates the exact test of this paper rather closely. However, it is recommended that researchers continue to employ the standard \(\chi^2\)-test for all sample sizes and abide by it if the corresponding probability value is not "too close" to the critical level; otherwise, an exact test should be used.
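
The standard chi-square test that this abstract recommends as a first step can be sketched for the two-allele case as below. This is the textbook goodness-of-fit test with one degree of freedom, not the exact test developed in the paper. The function name `hwe_chi_square`, the constant name, and the genotype counts are illustrative assumptions.

```python
# Upper 5% critical value of the chi-square distribution with 1 degree of freedom.
CHI2_CRITICAL_5PCT_DF1 = 3.841


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic for Hardy-Weinberg proportions, two alleles.

    n_aa, n_ab, n_bb: observed counts of the genotypes AA, AB and BB.
    Expected counts are N*p^2, 2*N*p*q and N*q^2, where p is the
    frequency of allele A estimated from the sample and q = 1 - p.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    if p == 0 or q == 0:
        raise ValueError("only one allele observed; the test is undefined")
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Illustrative genotype counts (not taken from the paper).
stat = hwe_chi_square(298, 489, 213)
print(stat, stat > CHI2_CRITICAL_5PCT_DF1)
```

Following the abstract's advice, a statistic that lands near the critical value would be the cue to switch to an exact test rather than rely on the chi-square approximation.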

16.

Testing approaches for overdispersion in Poisson regression versus the generalized Poisson model (Total citations: 1; self-citations: 0; citations by others: 1)

Yang Z Hardin JW Addy CL Vuong QH 《Biometrical journal. Biometrische Zeitschrift》2007,49(4):565-584

Overdispersion is a common phenomenon in Poisson modeling, and the negative binomial (NB) model is frequently used to account for overdispersion. Testing approaches (Wald test, likelihood ratio test (LRT), and score test) for overdispersion in the Poisson regression versus the NB model are available. Because the generalized Poisson (GP) model is similar to the NB model, we consider the former as an alternate model for overdispersed count data. The score test has an advantage over the LRT and the Wald test in that the score test only requires that the parameter of interest be estimated under the null hypothesis. This paper proposes a score test for overdispersion based on the GP model and compares the power of the test with the LRT and Wald tests. A simulation study indicates that the score test based on the asymptotic standard normal distribution is more appropriate in practical applications because of its higher empirical power; however, it underestimates the nominal significance level, especially in small-sample situations. Examples illustrate the results of comparing the candidate tests between the Poisson and GP models. A bootstrap test is also proposed to adjust the underestimation of the nominal level in the score statistic when the sample size is small. The simulation study indicates that the bootstrap test has a significance level closer to the nominal size and has uniformly greater power than the score test based on the asymptotic standard normal distribution. From a practical perspective, we suggest that, if the score test gives even a weak indication that the Poisson model is inappropriate, say at the 0.10 significance level, the more accurate bootstrap procedure be used as a better test of whether the GP model is more appropriate than the Poisson model. Finally, the Vuong test is illustrated to choose between GP and NB2 models for the same dataset.

17.

Competition matters: Determining the drivers of land snail community assembly among limestone karst areas in northern Vietnam


Parm Viktor von Oheimb Katharina C. M. von Oheimb Takahiro Hirano Tu Van Do Hao Van Luong Jonathan Ablett Sang Van Pham Fred Naggs 《Ecology and evolution》2018,8(8):4136-4149

The insular limestone karsts of northern Vietnam harbor a very rich biodiversity. Many taxa are strongly associated with these environments, and individual species communities can differ considerably among karst areas. The exact processes that have shaped the biotic composition of these habitats, however, remain largely unknown. In this study, the role of two major processes for the assembly of snail communities on limestone karsts was investigated, interspecific competition and filtering of taxa due to geographical factors. Communities of operculate land snails of the genus Cyclophorus were studied using the dry and fluid‐preserved specimen collections of the Natural History Museum, London. Phylogenetic distances (based on a Bayesian analysis using DNA sequence data) and shell characters (based on 200 semilandmarks) were used as proxies for ecological similarity and were analyzed to reveal patterns of overdispersion (indicating competition) or clustering (indicating filtering) in observed communities compared to random communities. Among the seven studied karst areas, a total of 15 Cyclophorus lineages were found. Unique communities were present in each area. The analyses revealed phylogenetic overdispersion in six and morphological overdispersion in four of seven karst areas. The pattern of frequent phylogenetic overdispersion indicated that competition among lineages is the major process shaping the Cyclophorus communities studied. The Coastal Area, which was phylogenetically overdispersed, showed a clear morphological clustering, which could have been caused by similar ecological adaptations among taxa in this environment. Only the community in the Cuc Phuong Area showed a pattern of phylogenetic clustering, which was partly caused by an absence of a certain, phylogenetically very distinct group in this region. Filtering due to geographical factors could have been involved here. 
This study shows how museum collections can be used to examine community assembly and contributes to the understanding of the processes that have shaped karst communities in Vietnam.
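
The comparison of observed with random communities can be sketched in code. This is a minimal illustration, not the authors' analysis: the abstract does not name the metric, so the sketch assumes a standardized effect size of mean pairwise distance, computed against random draws of equal richness from the regional pool. The function names and the nested-dictionary distance format are hypothetical.

```python
import itertools
import random
import statistics


def mean_pairwise_distance(members, dist):
    """Mean distance over all unordered pairs of lineages in a community."""
    pairs = list(itertools.combinations(sorted(members), 2))
    return sum(dist[a][b] for a, b in pairs) / len(pairs)


def ses_mpd(community, pool, dist, n_null=999, seed=0):
    """Standardized effect size of the observed mean pairwise distance.

    Positive values mean members are more distant than random draws
    (overdispersion); negative values mean they are more similar (clustering).
    `pool` must be a list of all lineages available in the region.
    """
    rng = random.Random(seed)
    observed = mean_pairwise_distance(community, dist)
    # Null communities: same richness, lineages drawn at random from the pool.
    null = [
        mean_pairwise_distance(rng.sample(pool, len(community)), dist)
        for _ in range(n_null)
    ]
    sd = statistics.pstdev(null)
    if sd == 0:
        raise ValueError("null distribution has zero variance")
    return (observed - statistics.mean(null)) / sd
```

The same function applies to morphological distances (for example, distances between shell-shape scores) by passing a different `dist` table.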

18.

THE BETA-BINOMIAL MODEL: ACCOUNTING FOR INTER-TRIAL VARIATION IN REPLICATED DIFFERENCE AND PREFERENCE TESTS (cited by: 1; self-citations: 0; citations by others: 1)

DANIEL M. ENNIS JIAN BI 《Journal of sensory studies》1998,13(4):389-412

Binomial tests are commonly used in sensory difference and preference testing under the assumptions that choices are independent and choice probabilities do not vary from trial to trial. This paper addresses violations of the latter assumption (often referred to as overdispersion) and accounts for variation in inter-trial choice probabilities following the Beta distribution. Such variation could arise as a result of differences in test substrate from trial to trial, differences in sensory acuity among subjects or the existence of latent preference segments. In fact, it is likely that overdispersion occurs ubiquitously in product testing. Using the Binomial model for data in which there is inter-trial variation may lead to seriously misleading conclusions from a sensory difference or preference test. A simulation study in this paper based on product testing experience showed that when using a Binomial model for overdispersed Binomial data, Type I error may be 0.44 for a Binomial test specification corresponding to a level of 0.05. Underestimation of Type I error using the Binomial model may seriously undermine legal claims of product superiority in situations where overdispersion occurs. The Beta-Binomial (BB) model, an extension of the Binomial distribution, was developed to fit overdispersed Binomial data. Procedures for estimating and testing the parameters as well as testing for goodness of fit are discussed. Procedures for determining sample size and for calculating estimate precision and test power based on the BB model are given. Numerical examples and simulation results are also given in the paper. The BB model should improve the validity of sensory difference and preference testing.
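
The inflation of Type I error under overdispersion can be illustrated with a small simulation. This is a minimal sketch, not the paper's simulation design, and it is not meant to reproduce the reported 0.44. It assumes a one-sided exact Binomial test on the pooled count, a null probability equal to the Beta mean, and one choice probability per subject drawn from Beta(a, b). All function names and parameter values are illustrative.

```python
import math
import random


def binom_upper_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def critical_value(n, p0, alpha):
    """Smallest k such that P(X >= k | p0) <= alpha (one-sided test)."""
    for k in range(n + 2):  # k = n + 1 gives an empty tail, so the loop always returns
        if binom_upper_tail(n, k, p0) <= alpha:
            return k


def simulate_type1(n_subjects, n_reps, a, b, alpha=0.05, n_sim=2000, seed=0):
    """Rejection rate of the pooled Binomial test when the null mean is true
    but each subject's choice probability varies as Beta(a, b)."""
    rng = random.Random(seed)
    total = n_subjects * n_reps
    p0 = a / (a + b)  # null value equals the Beta mean, so every rejection is a false positive
    k_crit = critical_value(total, p0, alpha)
    rejections = 0
    for _ in range(n_sim):
        count = 0
        for _ in range(n_subjects):
            p = rng.betavariate(a, b)  # subject-specific choice probability
            count += sum(rng.random() < p for _ in range(n_reps))
        if count >= k_crit:
            rejections += 1
    return rejections / n_sim


# Strong inter-subject variation versus almost none (hypothetical design: 20 subjects x 10 replications).
print(simulate_type1(20, 10, 1.0, 1.0))
print(simulate_type1(20, 10, 1000.0, 1000.0))
```

With large a and b the Beta distribution concentrates near its mean, so the second call approximates the ordinary Binomial case and serves as a baseline for the first.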

19.

Evaluating Functional Diversity: Missing Trait Data and the Importance of Species Abundance Structure and Data Transformation

Maria Májeková Taavi Paal Nichola S. Plowman Michala Bryndová Liis Kasari Anna Norberg Matthias Weiss Tom R. Bishop Sarah H. Luke Katerina Sam Yoann Le Bagousse-Pinguet Jan Lepš Lars Götzenberger Francesco de Bello 《PloS one》2016,11(2)

Functional diversity (FD) is an important component of biodiversity that quantifies the difference in functional traits between organisms. However, FD studies are often limited by the availability of trait data and FD indices are sensitive to data gaps. The distribution of species abundance and trait data, and its transformation, may further affect the accuracy of indices when data are incomplete. Using an existing approach, we simulated the effects of missing trait data by gradually removing data from a plant, an ant and a bird community dataset (12, 59, and 8 plots containing 62, 297 and 238 species respectively). We ranked plots by FD values calculated from full datasets and then from our increasingly incomplete datasets and compared the ranking between the original and virtually reduced datasets to assess the accuracy of FD indices when used on datasets with increasingly missing data. Finally, we tested the accuracy of FD indices with and without data transformation, and the effect of missing trait data per plot or per the whole pool of species. FD indices became less accurate as the amount of missing data increased, with the loss of accuracy depending on the index. However, where transformation improved the normality of the trait data, FD values from incomplete datasets were more accurate than before transformation. The distribution of data and its transformation are therefore as important as data completeness and can even mitigate the effect of missing data. Since the effect of missing trait values pool-wise or plot-wise depends on the data distribution, the method should be decided case by case. Data distribution and data transformation should be given more careful consideration when designing, analysing and interpreting FD studies, especially where trait data are missing. To this end, we provide the R package “traitor” to facilitate assessments of missing trait data.
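
The ranking comparison described above can be sketched as follows. This is a hedged illustration and not the workflow of the “traitor” package. It assumes a single numeric trait, uses the variance of known trait values in a plot as a stand-in FD index, removes trait values pool-wise at random, and scores agreement with Spearman's rank correlation. All names are hypothetical.

```python
import random
import statistics


def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def spearman(x, y):
    """Spearman rank correlation (Pearson correlation of the ranks)."""
    rx, ry = ranks(x), ranks(y)
    mx, my = statistics.mean(rx), statistics.mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)  # undefined if either ranking is constant


def trait_variance(plot, traits):
    """Stand-in FD index: variance of the known trait values in one plot."""
    known = [traits[s] for s in plot if traits.get(s) is not None]
    return statistics.pvariance(known) if len(known) > 1 else 0.0


def rank_agreement(plots, traits, frac_missing, seed=0):
    """Rank correlation between FD from full and from reduced trait data."""
    rng = random.Random(seed)
    species = sorted(traits)
    # Pool-wise removal: a species loses its trait value in every plot.
    dropped = set(rng.sample(species, int(frac_missing * len(species))))
    reduced = {s: (None if s in dropped else v) for s, v in traits.items()}
    full = [trait_variance(p, traits) for p in plots]
    part = [trait_variance(p, reduced) for p in plots]
    return spearman(full, part)
```

Calling `rank_agreement` over a grid of `frac_missing` values, with raw and with transformed trait values, would give the kind of accuracy curve the abstract describes.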

20.

Modeling spatiotemporal abundance of mobile wildlife in highly variable environments using boosted GAMLSS hurdle models

Adam Smith Benjamin Hofner Juliet S. Lamb Jason Osenkowski Taber Allison Giancarlo Sadoti Scott R. McWilliams Peter Paton 《Ecology and evolution》2019,9(5):2346-2364

Modeling organism distributions from survey data involves numerous statistical challenges, including accounting for zero‐inflation, overdispersion, and selection and incorporation of environmental covariates. In environments with high spatial and temporal variability, addressing these challenges often requires numerous assumptions regarding organism distributions and their relationships to biophysical features. These assumptions may limit the resolution or accuracy of predictions resulting from survey‐based distribution models. We propose an iterative modeling approach that incorporates a negative binomial hurdle, followed by modeling of the relationship of organism distribution and abundance to environmental covariates using generalized additive models (GAM) and generalized additive models for location, scale, and shape (GAMLSS). Our approach accounts for key features of survey data by separating binary (presence‐absence) from count (abundance) data, separately modeling the mean and dispersion of count data, and incorporating selection of appropriate covariates and response functions from a suite of potential covariates while avoiding overfitting. We apply our modeling approach to surveys of sea duck abundance and distribution in Nantucket Sound (Massachusetts, USA), which has been proposed as a location for offshore wind energy development. Our model results highlight the importance of spatiotemporal variation in this system, as well as identifying key habitat features including distance to shore, sediment grain size, and seafloor topographic variation. Our work provides a powerful, flexible, and highly repeatable modeling framework with minimal assumptions that can be broadly applied to the modeling of survey data with high spatiotemporal variability. Applying GAMLSS models to the count portion of survey data allows us to incorporate potential overdispersion, which can dramatically affect model results in highly dynamic systems. 
Our approach is particularly relevant to systems in which little a priori knowledge is available regarding relationships between organism distributions and biophysical features, since it incorporates simultaneous selection of covariates and their functional relationships with organism responses.
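
The separation of presence-absence from abundance can be made concrete with the probability mass function of a negative binomial hurdle. This is a minimal sketch under stated assumptions, not the authors' boosted GAMLSS implementation: the occupancy probability, mean and dispersion are fixed numbers here, whereas the paper models them as functions of covariates. The mean-dispersion parameterization (variance = mu + mu^2 / theta) and the function names are assumptions.

```python
import math


def nb_pmf(y, mu, theta):
    """Negative binomial pmf with mean mu and dispersion theta
    (variance = mu + mu**2 / theta)."""
    log_p = (
        math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
        + theta * math.log(theta / (theta + mu))
        + y * math.log(mu / (theta + mu))
    )
    return math.exp(log_p)


def hurdle_nb_pmf(y, p_present, mu, theta):
    """Hurdle model: a binary part decides presence, and a zero-truncated
    negative binomial describes the count given presence."""
    if y == 0:
        return 1.0 - p_present
    # Renormalize the count distribution over y >= 1.
    return p_present * nb_pmf(y, mu, theta) / (1.0 - nb_pmf(0, mu, theta))


# Hypothetical values: 30% of survey segments occupied, overdispersed counts.
print([round(hurdle_nb_pmf(y, 0.3, 5.0, 0.8), 4) for y in range(4)])
```

Smaller values of `theta` correspond to stronger overdispersion in the count part, which is the quantity the abstract says is modeled separately from the mean.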