Similar articles
 20 similar articles found
1.

Objective

Recent studies have shown the relevance of cerebral grey matter involvement in multiple sclerosis (MS). The number of new cortical lesions (CLs), detected by specific MRI sequences, has the potential to become a new research outcome in longitudinal MS studies. The aim of this study is to identify the statistical model that best describes the distribution of new CLs developed over 12 and 24 months in patients with relapsing-remitting (RR) MS.

Methods

Four models were tested (Poisson, Negative Binomial, zero-inflated Poisson, and zero-inflated Negative Binomial) on a group of 191 RRMS patients, untreated or treated with 3 different disease-modifying therapies. Sample sizes for clinical trials based on this new outcome measure were estimated by a bootstrap resampling technique.

Results

According to the Akaike criterion, the zero-inflated Poisson model gave the best fit to the observed distribution of new CLs developed over 12 and 24 months, both in each treatment group and in the whole group of RRMS patients after adjusting for treatment effect.

Conclusions

The sample size calculations based on the zero-inflated Poisson model indicate that randomized clinical trials using this new MRI marker as an outcome are feasible.
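The model comparison described in this abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' code: it fits only two of the four candidate models (Poisson and zero-inflated Poisson), ignores treatment covariates, and uses a plain EM iteration for the zero-inflated fit. The function names and the toy counts are hypothetical, and the data are assumed to contain at least one positive count.

```python
import math

def poisson_loglik(counts):
    """Maximised Poisson log-likelihood; the MLE of the rate is the sample mean."""
    lam = sum(counts) / len(counts)
    ll = sum(y * math.log(lam) - lam - math.lgamma(y + 1) for y in counts)
    return ll, lam

def zip_fit(counts, iters=1000, tol=1e-10):
    """Fit a zero-inflated Poisson by EM (assumes at least one positive count).

    pi  = probability of a structural (excess) zero
    lam = Poisson rate of the non-structural component
    """
    n = len(counts)
    n0 = sum(1 for y in counts if y == 0)
    total = sum(counts)
    pi, lam = 0.5 * n0 / n, total / n  # crude starting values
    for _ in range(iters):
        # E-step: probability that an observed zero is a structural zero
        z = pi / (pi + (1 - pi) * math.exp(-lam))
        # M-step: update the mixing weight and the Poisson rate
        new_pi = n0 * z / n
        new_lam = total / (n - n0 * z)
        converged = abs(new_pi - pi) + abs(new_lam - lam) < tol
        pi, lam = new_pi, new_lam
        if converged:
            break
    ll = 0.0
    for y in counts:
        if y == 0:
            ll += math.log(pi + (1 - pi) * math.exp(-lam))
        else:
            ll += math.log(1 - pi) + y * math.log(lam) - lam - math.lgamma(y + 1)
    return ll, pi, lam

def aic(loglik, n_params):
    """Akaike information criterion: smaller is better."""
    return 2 * n_params - 2 * loglik

# Hypothetical lesion counts with many more zeros than a Poisson would predict.
counts = [0] * 60 + [1] * 5 + [2] * 10 + [3] * 15 + [4] * 10
ll_pois, _ = poisson_loglik(counts)
ll_zip, pi_hat, lam_hat = zip_fit(counts)
print("AIC Poisson:", aic(ll_pois, 1))
print("AIC ZIP:    ", aic(ll_zip, 2))
```

With excess zeros like these, the zero-inflated model should obtain the lower AIC despite its extra parameter. A real analysis would also fit the two negative binomial variants and adjust for treatment group.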

2.
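The distribution pattern of the Huperzia serrata population in Camellia oleifera communities of western Hunan was studied using the variance/mean ratio, the negative binomial parameter K, the Cassie index, the clumping index, mean crowding, the patchiness index, the Green index, and χ² goodness-of-fit tests against the Poisson, negative binomial, and Neyman distributions. The results show that the population has an aggregated distribution that conforms to the Neyman distribution with n = 1. The formation of this pattern is closely related to the reproductive biology of the species, specific environmental conditions, symbiotic fungi, and other factors.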
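The aggregation indices named in this abstract can all be computed from the mean and variance of quadrat counts. The sketch below uses the textbook definitions as an assumption, since the abstract does not state its formulas: the sample variance with an n - 1 denominator, the moment estimate of K, and Green's index scaled by the total number of individuals. The function name and the toy data are hypothetical.

```python
from statistics import mean, variance

def aggregation_indices(counts):
    """Common aggregation indices for a list of per-quadrat counts."""
    m = mean(counts)
    s2 = variance(counts)          # sample variance (n - 1 denominator)
    total = sum(counts)            # total number of individuals
    c = s2 / m                     # variance/mean ratio (1 for a Poisson pattern)
    clumping = c - 1               # clumping index I
    cassie = (s2 - m) / (m * m)    # Cassie index, equal to 1/K
    k = 1 / cassie if cassie > 0 else None  # negative binomial K (moment estimate)
    m_star = m + clumping          # mean crowding m*
    patchiness = m_star / m        # patchiness index m*/m
    green = clumping / (total - 1) # Green index
    return {
        "variance_mean_ratio": c,
        "clumping_index": clumping,
        "K": k,
        "cassie_index": cassie,
        "mean_crowding": m_star,
        "patchiness_index": patchiness,
        "green_index": green,
    }

# Toy example: all ten individuals fall in one of five quadrats.
for name, value in aggregation_indices([0, 0, 0, 0, 10]).items():
    print(name, value)
```

Values of the variance/mean ratio above 1, a positive clumping index, and a patchiness index above 1 all point toward aggregation. A Green index of 1 corresponds to the extreme case in the toy data, where every individual sits in a single quadrat.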

3.

Background  

Serial Analysis of Gene Expression (SAGE) produces gene expression measurements on a discrete scale, due to the finite number of molecules in the sample. This means that part of the variance in SAGE data should be understood as the sampling error in a binomial or Poisson distribution, whereas other variance sources, in particular biological variance, should be modeled using a continuous distribution function, i.e. a prior on the intensity of the Poisson distribution. One challenge is that such a model predicts a large number of genes with zero counts, which cannot be observed.
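As a worked illustration of a continuous prior on the Poisson intensity, suppose the prior is a gamma distribution with shape k and mean μ. The gamma choice is an assumption made here for concreteness; the abstract does not name the prior. Integrating out the intensity then gives a negative binomial count distribution:

```latex
\begin{aligned}
y \mid \lambda &\sim \mathrm{Poisson}(\lambda), \qquad
\lambda \sim \mathrm{Gamma}\bigl(\text{shape } k,\ \text{rate } k/\mu\bigr),\\[4pt]
P(y) &= \int_0^\infty \frac{e^{-\lambda}\lambda^{y}}{y!}\,
        \frac{(k/\mu)^{k}}{\Gamma(k)}\,\lambda^{k-1}e^{-k\lambda/\mu}\,d\lambda
      = \frac{\Gamma(y+k)}{\Gamma(k)\,y!}
        \left(\frac{k}{k+\mu}\right)^{k}\left(\frac{\mu}{k+\mu}\right)^{y},\\[4pt]
\mathrm{E}[y] &= \mu, \qquad
\mathrm{Var}[y] = \underbrace{\mu}_{\text{sampling}} + \underbrace{\mu^{2}/k}_{\text{biological}}, \qquad
P(y=0) = \left(\frac{k}{k+\mu}\right)^{k}.
\end{aligned}
```

The variance splits into a Poisson sampling term and a term contributed by the prior. The zero-count probability can be large when μ is small or k is small, which is the unobservable zero class the abstract refers to.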

4.
Environmental DNA (eDNA) analysis of water samples is on the brink of becoming a standard monitoring method for aquatic species. This method has improved detection rates over conventional survey methods and thus has demonstrated effectiveness for estimation of site occupancy and species distribution. The frontier of eDNA applications, however, is to infer species density. Building upon previous studies, we present and assess a modeling approach that aims at inferring animal density from eDNA. The modeling combines eDNA and animal count data from a subset of sites to estimate species density (and associated uncertainties) at other sites where only eDNA data are available. As a proof of concept, we first perform a cross‐validation study using experimental data on carp in mesocosms. In these data, fish densities are known without error, which allows us to test the performance of the method with known data. We then evaluate the model using field data from a study on a stream salamander species to assess the potential of this method to work in natural settings, where density can never be known with absolute certainty. Two alternative distributions (Normal and Negative Binomial) to model variability in eDNA concentration data are assessed. Assessment based on the proof of concept data (carp) revealed that the Negative Binomial model provided much more accurate estimates than the model based on a Normal distribution, likely because eDNA data tend to be overdispersed. Greater imprecision was found when we applied the method to the field data, but the Negative Binomial model still provided useful density estimates. We call for further model development in this direction, as well as further research targeted at sampling design optimization. It will be important to assess these approaches on a broad range of study systems.

5.
Both ecological field studies and attempts to extrapolate from laboratory experiments to natural populations generally encounter the high degree of natural variability and chaotic behavior that typify natural ecosystems. Regardless of this variability and non-normal distribution, most statistical models of natural systems use normal error, which assumes independence between the variance and mean. However, environmental data are often random or clustered and are better described by probability distributions which have more realistic variance-to-mean relationships. Until recently, statistical software packages modeled only with normal error, and researchers had to assume approximate normality on the original or transformed scale of measurement and had to live with the consequences of often incorrectly assuming independence between the variance and mean. Recent developments in statistical software allow researchers to use generalized linear models (GLMs), and analysis can now proceed with probability distributions from the exponential family which more realistically describe natural conditions: binomial (even distribution with variance less than the mean), Poisson (random distribution with variance equal to the mean), and negative binomial (clustered distribution with variance greater than the mean). GLMs fit parameters on the original scale of measurement and eliminate the need for obfuscating transformations, reduce bias for proportions with unequal sample size, and provide realistic estimates of variance which can increase the power of tests. Because GLMs permit modeling according to the non-normal behavior of natural systems and obviate the need for normality assumptions, they will likely become a widely used tool for analyzing toxicity data.
To demonstrate the broad-scale utility of GLMs, we present several examples where the use of GLMs improved the statistical power of field and laboratory studies to document the rapid ecological recovery of Prince William Sound following the Exxon Valdez oil spill.

6.
7.
ABSTRACT Count data with means <2 are often assumed to follow a Poisson distribution. However, in many cases these kinds of data, such as number of young fledged, are more appropriately considered to be multinomial observations due to naturally occurring upper truncation of the distribution. We evaluated the performance of several versions of multinomial regression, plus Poisson and normal regression, for analysis of count data with means <2 through Monte Carlo simulations. Simulated data mimicked observed counts of number of young fledged (0, 1, 2, or 3) by California spotted owls (Strix occidentalis occidentalis). We considered size and power of tests to detect differences among 10 levels of a categorical predictor, as well as tests for trends across 10-year periods. We found regular regression and analysis of variance procedures based on a normal distribution to perform satisfactorily in all cases we considered, whereas failure rate of multinomial procedures was often excessively high, and the Poisson model demonstrated inappropriate test size for data where the variance/mean ratio was <1 or >1.2. Thus, managers can use simple statistical methods with which they are likely already familiar to analyze the kinds of count data we described here.

8.
9.
Binomial tests are commonly used in sensory difference and preference testing under the assumptions that choices are independent and choice probabilities do not vary from trial to trial. This paper addresses violations of the latter assumption (often referred to as overdispersion) and accounts for variation in inter-trial choice probabilities following the Beta distribution. Such variation could arise as a result of differences in test substrate from trial to trial, differences in sensory acuity among subjects or the existence of latent preference segments. In fact, it is likely that overdispersion occurs ubiquitously in product testing. Using the Binomial model for data in which there is inter-trial variation may lead to seriously misleading conclusions from a sensory difference or preference test. A simulation study in this paper based on product testing experience showed that when using a Binomial model for overdispersed Binomial data, Type I error may be 0.44 for a Binomial test specification corresponding to a level of 0.05. Underestimation of Type I error using the Binomial model may seriously undermine legal claims of product superiority in situations where overdispersion occurs. The Beta-Binomial (BB) model, an extension of the Binomial distribution, was developed to fit overdispersed Binomial data. Procedures for estimating and testing the parameters as well as testing for goodness of fit are discussed. Procedures for determining sample size and for calculating estimate precision and test power based on the BB model are given. Numerical examples and simulation results are also given in the paper. The BB model should improve the validity of sensory difference and preference testing.
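The inflation of Type I error described in this abstract can be illustrated with the Beta-Binomial probability mass function. The sketch below is a simplified single-level example, not the paper's simulation design: the choice probability is drawn once from a Beta(a, b) distribution with mean 0.5, and the rejection threshold comes from an ordinary one-sided Binomial test. The function names and the parameter values n = 20, a = b = 2 are assumptions chosen for illustration.

```python
import math

def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_binomial_pmf(k, n, a, b):
    """P(X = k) when X | p ~ Binomial(n, p) and p ~ Beta(a, b)."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_choose + log_beta(k + a, n - k + b) - log_beta(a, b))

def beta_binomial_upper_tail(k_min, n, a, b):
    """P(X >= k_min) under the Beta-Binomial model."""
    return sum(beta_binomial_pmf(k, n, a, b) for k in range(k_min, n + 1))

def binomial_upper_tail(k_min, n, p):
    """P(X >= k_min) under the plain Binomial model."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))

n, k_crit = 20, 15  # reject "no preference" when at least 15 of 20 choose product A
print("Nominal level (Binomial, p = 0.5):", binomial_upper_tail(k_crit, n, 0.5))
print("Actual level (Beta-Binomial, a = b = 2):", beta_binomial_upper_tail(k_crit, n, 2.0, 2.0))
```

Both models have the same mean choice probability of 0.5, yet the Beta-Binomial puts considerably more mass in the rejection region, so a test calibrated under the Binomial model rejects far more often than its nominal level suggests.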

10.
Aims: Fits of species-abundance distributions to empirical data are increasingly used to evaluate models of diversity maintenance and community structure and to infer properties of communities, such as species richness. Two distributions predicted by several models are the Poisson lognormal (PLN) and the negative binomial (NB) distribution; however, at least three different ways to parameterize the PLN have been proposed, which differ in whether unobserved species contribute to the likelihood and in whether the likelihood is conditional upon the total number of individuals in the sample. Each of these has an analogue for the NB. Here, we propose a new formulation of the PLN and NB that includes the number of unobserved species as one of the estimated parameters. We investigate the performance of parameter estimates obtained from this reformulation, as well as the existing alternatives, for drawing inferences about the shape of species abundance distributions and estimation of species richness. Methods: We simulate the random sampling of a fixed number of individuals from lognormal and gamma community relative abundance distributions, using a previously developed 'individual-based' bootstrap algorithm. We use a range of sample sizes, community species richness levels and shape parameters for the species abundance distributions that span much of the realistic range for empirical data, generating 1000 simulated data sets for each parameter combination. We then fit each of the alternative likelihoods to each of the simulated data sets, and we assess the bias, sampling variance and estimation error for each method. Important findings: Parameter estimates behave reasonably well for most parameter values, exhibiting modest levels of median error. However, for the NB, median error becomes extremely large as the NB approaches either of two limiting cases. For both the NB and PLN, >90% of the variation in the error in model parameters across parameter sets is explained by three quantities that correspond to the proportion of species not observed in the sample, the expected number of species observed in the sample and the discrepancy between the true NB or PLN distribution and a Poisson distribution with the same mean. There are relatively few systematic differences between the four alternative likelihoods. In particular, failing to condition the likelihood on the total sample size does not appear to systematically increase the bias in parameter estimates. Indeed, overall, the classical likelihood performs slightly better than the alternatives. However, our reparameterized likelihood, for which species richness is a fitted parameter, has important advantages over existing approaches for estimating species richness from fitted species-abundance models.

11.
Aquatic Botany, 2007, 86(4): 377-384
We evaluated six methods to estimate species richness at extrapolated sample sizes using presence–absence data for aquatic macrophyte assemblages. Methods suitable for assemblages involving terrestrial and non-clonal (unitary) organisms may not be valid for aquatic macrophytes. The extrapolation of a species accumulation curve using a logarithmic function or using a linear model on the log of accumulated sampling units consistently overestimated species richness. The newly proposed Total-Species method gave similar results. The Negative Binomial and Logarithmic Series methods and the recently proposed Binomial Mixture Model were unbiased and accurate. We conclude that current extrapolation techniques are valid for estimation of species richness in macrophyte assemblages, and recommend the Logarithmic Series, Negative Binomial or Binomial Mixture Model methods.

12.
Overdispersion is a common phenomenon in Poisson modeling, and the negative binomial (NB) model is frequently used to account for overdispersion. Testing approaches (Wald test, likelihood ratio test (LRT), and score test) for overdispersion in the Poisson regression versus the NB model are available. Because the generalized Poisson (GP) model is similar to the NB model, we consider the former as an alternate model for overdispersed count data. The score test has an advantage over the LRT and the Wald test in that the score test only requires that the parameter of interest be estimated under the null hypothesis. This paper proposes a score test for overdispersion based on the GP model and compares the power of the test with the LRT and Wald tests. A simulation study indicates that the score test based on the asymptotic standard Normal distribution is more appropriate in practical applications because of its higher empirical power; however, it underestimates the nominal significance level, especially in small samples. Examples illustrate the results of comparing the candidate tests between the Poisson and GP models. A bootstrap test is also proposed to adjust for the underestimation of the nominal level in the score statistic when the sample size is small. The simulation study indicates that the bootstrap test has a significance level closer to the nominal size and has uniformly greater power than the score test based on the asymptotic standard Normal distribution. From a practical perspective, if the score test gives even a weak indication that the Poisson model is inappropriate, say at the 0.10 significance level, we advise using the more accurate bootstrap procedure to test whether the GP model is more appropriate than the Poisson model. Finally, the Vuong test is illustrated to choose between GP and NB2 models for the same dataset.
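The combination of a score statistic with a bootstrap reference distribution can be sketched as follows. This is not the GP-based statistic proposed in the abstract, whose exact form is not given here. As a stand-in, the sketch uses the familiar score statistic for overdispersion against a negative binomial alternative in an intercept-only model, T = Σ[(y − μ)² − y] / sqrt(2 Σ μ²) with μ set to the sample mean, and calibrates it by parametric bootstrap under the fitted Poisson. All function names, the seed, and the toy counts are hypothetical.

```python
import math
import random

def score_statistic(counts):
    """Score statistic for overdispersion, intercept-only Poisson null.

    Large positive values suggest overdispersion; returns 0.0 for an
    all-zero sample, where the statistic is undefined.
    """
    n = len(counts)
    mu = sum(counts) / n
    if mu == 0:
        return 0.0
    numerator = sum((y - mu) ** 2 - y for y in counts)
    return numerator / math.sqrt(2 * n * mu * mu)

def poisson_draw(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def bootstrap_p_value(counts, n_boot=999, seed=1):
    """One-sided parametric bootstrap p-value under the fitted Poisson model."""
    rng = random.Random(seed)
    n = len(counts)
    mu = sum(counts) / n
    observed = score_statistic(counts)
    exceed = 0
    for _ in range(n_boot):
        sample = [poisson_draw(mu, rng) for _ in range(n)]
        if score_statistic(sample) >= observed:
            exceed += 1
    return (1 + exceed) / (n_boot + 1)

counts = [0, 0, 0, 0, 10]  # small, strongly clustered sample
print("score statistic:", score_statistic(counts))
print("bootstrap p-value:", bootstrap_p_value(counts))
```

The bootstrap replaces the asymptotic standard Normal reference with the statistic's simulated distribution at the actual sample size, which is the motivation the abstract gives for preferring it in small samples.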

13.
The spatial distribution of bottom animals in a large, flat, and unexposed bottom area (about 8 km²) in central Lake Mälaren was investigated. As far as could be seen from preliminary tests, the area was homogeneous in most respects. Working from the ice, a total of 42 tube samples and 6 Ekman samples were obtained from randomly distributed quadrats in a grid measuring 100 × 50 m2. A 0.3 mm gauge net was used for sieving. The mud surface area covered by the tube samples was about 40 cm². Different methods of illustration were applied to demonstrate how many samples were actually required to show certain characteristics of the bottom fauna. For most purposes the accumulated mean value (cf. Kajak, 1963) was close enough to the stable mean value after only two or three tube samples for both chironomids and tubificids. In order to get a good idea of the five major constituents amongst tubificids (about 96% of all tubificids), two tube samples were found to be quite enough in this particular test. For the seven major constituents (about 99% of all tubificids), four tube samples were sufficient, or one single Ekman sample. The main constituents were well represented after two tube samples or one Ekman sample. Very little further information about the major constituents was obtained after the minimum number of samples mentioned above had been taken. A statistical study based on the same material was performed by a group of statisticians at the Institute of Statistics, University of Stockholm. Different statistical tests were applied to the material. Due to the relatively small number of observations, nothing definite could be said about the actual distribution of tubificids or chironomids in the area. The chironomids seemed to conform better to the Poisson distribution than to the Negative Binomial distribution, while the tubificids showed greater agreement with the Negative Binomial distribution.

14.
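 The coastal area of Qingdao is the northernmost limit of the distribution of Camellia japonica in China, and Changmenyan Island is the main site where the species occurs in this area. On the island, the plants are concentrated at elevations of 20 to 80 m, with diameters at breast height of 2 to 45 cm and heights of 0.4 to 4.5 m. The life table shows that few individuals belong to the small and large size classes; most individuals are of intermediate size. The size pyramid indicates a declining population in the sense of Bodenheimer. The survivorship curve shows that mortality is high in the small size class. Three theoretical distribution models (Poisson, negative binomial, and Neyman) were applied to study the distribution pattern. The observed frequencies of the population conform to the negative binomial distribution, so the distribution pattern is judged to be aggregated. The aggregation intensity indices calculated for the population were: negative binomial parameter (K) 0.6291, dispersion coefficient (C) 17.7372, clumping index (I) 16.7372, Cassie index (1/K) 1.5896, and patchiness index (m*/m) 2.5896, all of which indicate strong aggregation.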
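The reported index values can be cross-checked against one another, assuming the standard definitions (which the abstract does not spell out): I = C − 1, the Cassie index is 1/K, and the patchiness index is m*/m = 1 + 1/K.

```latex
\begin{aligned}
I &= C - 1 = 17.7372 - 1 = 16.7372,\\
\frac{1}{K} &= \frac{1}{0.6291} \approx 1.5896,\\
\frac{m^{*}}{m} &= 1 + \frac{1}{K} \approx 1 + 1.5896 = 2.5896.
\end{aligned}
```

All three agree with the figures in the abstract. If K was obtained as the moment estimate K = m / (C − 1), which is a further assumption, the implied mean density would be m = K · I ≈ 0.6291 × 16.7372 ≈ 10.53 individuals per quadrat.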

15.
Current practice in the normalization of microbiome count data is inefficient in the statistical sense. For apparently historical reasons, the common approach is either to use simple proportions (which does not address heteroscedasticity) or to use rarefying of counts, even though both of these approaches are inappropriate for detection of differentially abundant species. Well-established statistical theory is available that simultaneously accounts for library size differences and biological variability using an appropriate mixture model. Moreover, specific implementations for DNA sequencing read count data (based on a Negative Binomial model for instance) are already available in RNA-Seq focused R packages such as edgeR and DESeq. Here we summarize the supporting statistical theory and use simulations and empirical data to demonstrate substantial improvements provided by a relevant mixture model framework over simple proportions or rarefying. We show how both proportions and rarefied counts result in a high rate of false positives in tests for species that are differentially abundant across sample classes. Regarding microbiome sample-wise clustering, we also show that the rarefying procedure often discards samples that can be accurately clustered by alternative methods. We further compare different Negative Binomial methods with a recently-described zero-inflated Gaussian mixture, implemented in a package called metagenomeSeq. We find that metagenomeSeq performs well when there is an adequate number of biological replicates, but it nevertheless tends toward a higher false positive rate. Based on these results and well-established statistical theory, we advocate that investigators avoid rarefying altogether. We have provided microbiome-specific extensions to these tools in the R package, phyloseq.

16.
This work is about assessing model adequacy for negative binomial (NB) regression, particularly (1) assessing the adequacy of the NB assumption, and (2) assessing the appropriateness of models for NB dispersion parameters. Tools for the first are appropriate for NB regression generally; those for the second are primarily intended for RNA sequencing (RNA-Seq) data analysis. The typically small number of biological samples and large number of genes in RNA-Seq analysis motivate us to address the trade-offs between robustness and statistical power using NB regression models. One widely-used power-saving strategy, for example, is to assume some commonalities of NB dispersion parameters across genes via simple models relating them to mean expression rates, and many such models have been proposed. As RNA-Seq analysis is becoming ever more popular, it is appropriate to make more thorough investigations into power and robustness of the resulting methods, and into practical tools for model assessment. In this article, we propose simulation-based statistical tests and diagnostic graphics to address model adequacy. We provide simulated and real data examples to illustrate that our proposed methods are effective for detecting the misspecification of the NB mean-variance relationship as well as judging the adequacy of fit of several NB dispersion models.

17.
MOTIVATION: Microarray techniques provide a valuable way of characterizing the molecular nature of disease. Unfortunately expense and limited specimen availability often lead to studies with small sample sizes. This makes accurate estimation of variability difficult, since variance estimates made on a gene by gene basis will have few degrees of freedom, and the assumption that all genes share equal variance is unlikely to be true. RESULTS: We propose a model by which the within gene variances are drawn from an inverse gamma distribution, whose parameters are estimated across all genes. This results in a test statistic that is a minor variation of those used in standard linear models. We demonstrate that the model assumptions are valid on experimental data, and that the model has more power than standard tests to pick up large changes in expression, while not increasing the rate of false positives. AVAILABILITY: This method is incorporated into BRB-ArrayTools version 3.0 (http://linus.nci.nih.gov/BRB-ArrayTools.html). SUPPLEMENTARY MATERIAL: ftp://linus.nci.nih.gov/pub/techreport/RVM_supplement.pdf

18.
Data on Pythium-induced cavity spot from experimental plots and commercial fields were collated to study the frequency distribution of cavities on carrots. It was shown that cavities tended to occur in clusters, with strong evidence that they followed a Polya-Aeppli distribution rather than the Negative Binomial distribution often associated with clustering. Possible biological interpretations of the Polya-Aeppli distribution are discussed, along with their implications for estimation of inoculum potential and analysis of cavity spot field trials.
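For readers unfamiliar with the Polya-Aeppli distribution, the sketch below computes its probability mass function under one common parameterisation, stated here as an assumption: the number of clusters is Poisson with mean lam, and each cluster contains a geometrically distributed number of cavities on 1, 2, 3, ... with P(size = s) = (1 − theta) · theta^(s − 1). The function name and parameter values are hypothetical and are not fitted to the carrot data.

```python
import math

def polya_aeppli_pmf(k, lam, theta):
    """P(X = k) for a Poisson(lam) number of clusters with geometric sizes.

    theta in [0, 1) controls cluster size; theta = 0 gives every cluster
    size 1, which reduces the distribution to a plain Poisson(lam).
    """
    if k == 0:
        return math.exp(-lam)
    total = 0.0
    for j in range(1, k + 1):  # j = number of clusters that make up k cavities
        poisson_weight = math.exp(j * math.log(lam) - math.lgamma(j + 1))
        # Probability that j geometric cluster sizes add up to exactly k
        size_prob = math.comb(k - 1, j - 1) * (1 - theta) ** j * theta ** (k - j)
        total += poisson_weight * size_prob
    return math.exp(-lam) * total

lam, theta = 2.0, 0.5
probs = [polya_aeppli_pmf(k, lam, theta) for k in range(81)]
mean = sum(k * p for k, p in enumerate(probs))
var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
print("mean:", mean, "expected:", lam / (1 - theta))
print("variance/mean ratio:", var / mean, "expected:", (1 + theta) / (1 - theta))
```

Under this parameterisation the mean is lam / (1 − theta) and the variance/mean ratio is (1 + theta) / (1 − theta), so any theta above zero produces clustering. One natural reading is that lam describes how many infection foci occur while theta describes how many cavities each focus produces.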

19.
If substitutions in DNA sequences follow a Poisson process, the ratio of the variance in the number of substitutions to the mean number of substitutions (the index of dispersion) should equal 1. In this paper, the robustness of the commonly applied estimator of the index of dispersion in replacement sites and silent sites to various assumptions regarding DNA evolution is explored using simulation methods. The estimate of the index of dispersion may be strongly biased if the assumptions of the model of substitution are violated. However, the results of this study support the conclusions of studies by Gillespie and Ohta that the process of substitution in replacement sites is overdispersed. This result contradicts those of a recent study and shows that the high index of dispersion for replacement sites is not an artifact caused by the method of estimation.

20.
A simple extension of the Poisson process results in binomially distributed counts of events in a time interval. A further extension generalises this to probability distributions under‐ or over‐dispersed relative to the binomial distribution. Substantial levels of under‐dispersion are possible with this modelling, but only modest levels of over‐dispersion – up to Poisson‐like variation. Although simple analytical expressions for the moments of these probability distributions are not available, approximate expressions for the mean and variance are derived, and used to re‐parameterise the models. The modelling is applied in the analysis of two published data sets, one showing under‐dispersion and the other over‐dispersion. More appropriate assessment of the precision of estimated parameters and reliable model checking diagnostics follow from this more general modelling of these data sets.
