Similar Articles
20 similar articles found
1.
Ridout M, Hinde J, Demétrio CG. Biometrics 2001;57(1):219-223
Count data often show a higher incidence of zero counts than would be expected if the data were Poisson distributed. Zero-inflated Poisson regression models are a useful class of models for such data, but parameter estimates may be seriously biased if the nonzero counts are overdispersed in relation to the Poisson distribution. We therefore provide a score test for testing zero-inflated Poisson regression models against zero-inflated negative binomial alternatives.
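A minimal sketch in R of this comparison, using the pscl package and simulated data. This is not the authors' score test; a likelihood-ratio comparison of fitted ZIP and ZINB models is a rough stand-in, and since the Poisson null lies on the parameter boundary the chi-square p-value is conservative.

```r
library(pscl)

set.seed(42)
x <- runif(200)
# hypothetical zero-inflated, overdispersed counts
y <- rbinom(200, 1, 0.7) * rnbinom(200, mu = exp(1 + x), size = 1.5)

zip  <- zeroinfl(y ~ x | 1, dist = "poisson")  # zero-inflated Poisson
zinb <- zeroinfl(y ~ x | 1, dist = "negbin")   # zero-inflated negative binomial

# likelihood-ratio statistic for the extra dispersion parameter
lr <- 2 * (as.numeric(logLik(zinb)) - as.numeric(logLik(zip)))
pchisq(lr, df = 1, lower.tail = FALSE)
```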

2.
Cui Y, Kim DY, Zhu J. Genetics 2006;174(4):2159-2172
Statistical methods for mapping quantitative trait loci (QTL) have been extensively studied. While most existing methods assume a normal distribution of the phenotype, the normality assumption can easily be violated when phenotypes are measured in counts. One natural choice to deal with count traits is to apply the classical Poisson regression model. However, conditional on covariates, the Poisson assumption of mean-variance equality may not be valid when data are potentially under- or overdispersed. In this article, we propose an interval-mapping approach for phenotypes measured in counts. We model the effects of QTL through a generalized Poisson regression model and develop efficient likelihood-based inference procedures. This approach, implemented with the EM algorithm, allows for a genomewide scan for the existence of QTL throughout the entire genome. The performance of the proposed method is evaluated through extensive simulation studies along with comparisons with existing approaches such as the Poisson regression and the generalized estimating equation approach. An application to a rice tiller number data set is given. Our approach provides a standard procedure for mapping QTL involved in the genetic control of complex traits measured in counts.
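A sketch of the core distributional idea in R: direct maximum likelihood for a generalized Poisson regression via optim(), under the common Consul parameterization. The paper's interval-mapping and EM machinery is not reproduced here.

```r
# generalized Poisson log-likelihood: P(y) = th*(th+d*y)^(y-1)*exp(-(th+d*y))/y!
gp_nll <- function(par, y, X) {
  beta  <- par[-length(par)]
  delta <- par[length(par)]          # delta < 0: under-, delta > 0: overdispersion
  theta <- exp(X %*% beta)           # log link for the rate parameter
  if (delta >= 1 || any(theta + delta * y <= 0)) return(1e10)
  -sum(log(theta) + (y - 1) * log(theta + delta * y) -
         (theta + delta * y) - lgamma(y + 1))
}

set.seed(1)
X <- cbind(1, rnorm(300))
y <- rpois(300, exp(0.5 + 0.3 * X[, 2]))   # equidispersed data for illustration
fit <- optim(c(0, 0, 0), gp_nll, y = y, X = X)
fit$par                                     # beta0, beta1, delta (delta ~ 0 here)
```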

3.
We analyze a real data set pertaining to reindeer fecal pellet-group counts obtained from a survey conducted in a forest area in northern Sweden. In the data set, over 70% of counts are zeros, and there is high spatial correlation. We use conditionally autoregressive random effects for modeling of spatial correlation in a Poisson generalized linear mixed model (GLMM), quasi-Poisson hierarchical generalized linear model (HGLM), zero-inflated Poisson (ZIP), and hurdle models. The quasi-Poisson HGLM allows for both under- and overdispersion with excessive zeros, while the ZIP and hurdle models allow only for overdispersion. In analyzing the real data set, we see that the quasi-Poisson HGLMs can perform better than the other commonly used models, for example, ordinary Poisson HGLMs, spatial ZIP, and spatial hurdle models, and that the underdispersed Poisson HGLMs with spatial correlation fit the reindeer data best. We develop R code for fitting these models using a unified algorithm for the HGLMs. Spatial count responses with an extremely high proportion of zeros and underdispersion can be successfully modeled using the quasi-Poisson HGLM with spatial random effects.
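As a non-spatial sketch, the dispersion component alone can be examined with a quasi-Poisson GLM in R (hypothetical covariates and simulated counts; the paper's CAR spatial random effects are omitted):

```r
set.seed(7)
d <- data.frame(habitat = gl(3, 50), elevation = rnorm(150))
d$pellets <- rpois(150, exp(0.2 + 0.4 * d$elevation))   # simulated pellet counts
fit <- glm(pellets ~ habitat + elevation, family = quasipoisson, data = d)
summary(fit)$dispersion   # < 1 suggests under-, > 1 overdispersion
```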

4.
The question of how to characterize the bacterial density in a body of water when data are available as counts from a number of small-volume samples was examined for cases where either the Poisson or negative binomial probability distributions could be used to describe the bacteriological data. The suitability of the Poisson distribution when replicate analyses were performed under carefully controlled conditions and of the negative binomial distribution for samples collected from different locations and over time were illustrated by two examples. In cases where the negative binomial distribution was appropriate, a procedure was given for characterizing the variability by dividing the bacterial counts into homogeneous groups. The usefulness of this procedure was illustrated for the second example based on survey data for Lake Erie. A further illustration of the difference between results based on the Poisson and negative binomial distributions was given by calculating the probability of obtaining all samples sterile, assuming various bacterial densities and sample sizes.
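The final calculation can be illustrated in a few lines of R (illustrative numbers, not the paper's Lake Erie values): with the same mean count per sample, clumping makes an all-sterile outcome more likely under the negative binomial than under the Poisson.

```r
mu <- 0.5   # mean organisms per sample
n  <- 10    # number of samples
k  <- 0.8   # negative binomial dispersion (smaller = more clumped)

p0_pois <- exp(-mu)              # P(single sample sterile), Poisson
p0_nb   <- (k / (k + mu))^k      # P(single sample sterile), negative binomial
c(poisson = p0_pois^n, negbin = p0_nb^n)   # P(all n samples sterile)
```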

5.
We discuss the identification of genes that are associated with an outcome in RNA sequencing and other sequence-based comparative genomic experiments. RNA-sequencing data take the form of counts, so models based on the Gaussian distribution are unsuitable. Moreover, normalization is challenging because different sequencing experiments may generate quite different total numbers of reads. To overcome these difficulties, we use a log-linear model with a new approach to normalization. We derive a novel procedure to estimate the false discovery rate (FDR). Our method can be applied to data with quantitative, two-class, or multiple-class outcomes, and the computation is fast even for large data sets. We study the accuracy of our approaches for significance calculation and FDR estimation, and we demonstrate that our method has potential advantages over existing methods that are based on a Poisson or negative binomial model. In summary, this work provides a pipeline for the significance analysis of sequencing data.
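A generic sketch of the per-gene testing step in R, with library size as an offset and Benjamini-Hochberg adjustment standing in for the paper's own normalization and FDR procedure (simulated null data):

```r
set.seed(3)
G <- 500; n <- 8
class   <- gl(2, n / 2)
libsize <- rpois(n, 1e6)
counts  <- matrix(rpois(G * n, rep(libsize, each = G) * 1e-5), nrow = G)

pvals <- apply(counts, 1, function(y) {
  fit <- glm(y ~ class + offset(log(libsize)), family = poisson)
  summary(fit)$coefficients["class2", "Pr(>|z|)"]
})
fdr <- p.adjust(pvals, method = "BH")
sum(fdr < 0.1)   # genes called at 10% FDR (~0 expected under the null)
```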

6.
Overdispersion or extra-Poisson variation is very common for count data. This phenomenon arises when the variability of the counts greatly exceeds the mean under the Poisson assumption, resulting in substantial bias for the parameter estimates. To detect whether count data are overdispersed in the Poisson regression setting, various tests have been proposed and among them, the score tests derived by Dean (1992) are popular and easy to implement. However, such tests can be sensitive to anomalous or extreme observations. In this paper, diagnostic measures are proposed for assessing the sensitivity of Dean's score test for overdispersion in Poisson regression. Applications to the well-known fabric faults and Ames salmonella assay data sets illustrate the usefulness of the diagnostics in analyzing overdispersed count data.
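One common form of this score statistic, sketched in R (a minimal version; the paper's sensitivity diagnostics are not reproduced):

```r
set.seed(11)
d <- data.frame(x = rnorm(100))
d$y <- rnbinom(100, mu = exp(1 + 0.5 * d$x), size = 2)   # overdispersed counts

fit   <- glm(y ~ x, family = poisson, data = d)
mu    <- fitted(fit)
Tstat <- sum((d$y - mu)^2 - d$y) / sqrt(2 * sum(mu^2))   # ~ N(0,1) under Poisson
pnorm(Tstat, lower.tail = FALSE)   # one-sided p-value for overdispersion
```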

7.
We introduce a new scoring method for calculation of alignments of optical maps. Missing cuts, false cuts, and sizing errors present in optical maps are addressed by our alignment score through calculation of corresponding likelihoods. The size error model is derived through application of the Central Limit Theorem and validated by residual plots collected from real data. Missing cuts and false cuts are modeled as Bernoulli and Poisson events, respectively, as suggested by previous studies. Likelihoods are used to derive an alignment score through calculation of likelihood ratios for a certain hypothesis test. This allows us to achieve maximal discriminative power for the alignment score. Our scoring method is naturally embedded within a well-known dynamic programming (DP) framework for finding optimal alignments.
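An illustrative R sketch of a score built from these three likelihood components: Gaussian sizing error, Bernoulli missing cuts, and Poisson false cuts. The parameter values and the square-root standard deviation model are hypothetical, not the paper's fitted ones.

```r
align_loglik <- function(obs_frag, ref_frag, n_missed, n_ref_sites, n_false,
                         cv = 0.1, p_miss = 0.15, false_per_kb = 0.005) {
  # sizing error: observed fragment lengths scatter around reference lengths
  ll_size  <- sum(dnorm(obs_frag, mean = ref_frag,
                        sd = cv * sqrt(ref_frag), log = TRUE))
  # missing cuts: Bernoulli at each reference restriction site
  ll_miss  <- dbinom(n_missed, size = n_ref_sites, prob = p_miss, log = TRUE)
  # false cuts: Poisson along the aligned length
  ll_false <- dpois(n_false, lambda = false_per_kb * sum(ref_frag), log = TRUE)
  ll_size + ll_miss + ll_false
}

# two aligned fragments (kb), 1 missed cut of 5 sites, no false cuts
align_loglik(c(48.2, 101.5), c(50, 100), n_missed = 1, n_ref_sites = 5, n_false = 0)
```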

8.
We can see at light intensities much lower than an average of one photon per rod photoreceptor, demonstrating that rods must be able to transmit a signal after absorption of a single photon. However, activation of one rhodopsin molecule (Rh*) hyperpolarizes a mammalian rod by just 1 mV. Based on the properties of the voltage-dependent Ca2+ channel and data on [Ca2+] in the rod synaptic terminal, the 1 mV hyperpolarization should reduce the rate of release of quanta of neurotransmitter by only 20%. If quantal release were Poisson, the distributions of quantal count in the dark and in response to one Rh* would overlap greatly. Depending on the threshold quantal count, the overlap would generate too frequent false positives in the dark, too few true positives in response to one Rh*, or both. Therefore, quantal release must be regular, giving narrower distributions of quantal count that overlap less. We model regular release as an Erlang process, essentially a mechanism that counts many Poisson events before release of a quantum of neurotransmitter. The combination of appropriately narrow distributions of quantal count and a suitable threshold can give few false positives and appropriate (e.g., 35%) efficiency for one Rh*.
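A small R simulation of the central point (hypothetical rates): with the mean release rate held fixed, Erlang-regularized inter-release intervals give a much narrower quantal-count distribution than a Poisson process.

```r
set.seed(5)
count_events <- function(rate, k, T_win) {
  # inter-release intervals ~ Erlang(k, k*rate), so the mean rate is unchanged
  t <- cumsum(rgamma(ceiling(10 * rate * T_win) + 50, shape = k, rate = k * rate))
  sum(t <= T_win)
}
r <- 40; T_win <- 0.2   # ~8 quanta per window on average
pois   <- replicate(5000, count_events(r, k = 1,  T_win))   # Poisson release
erlang <- replicate(5000, count_events(r, k = 20, T_win))   # regularized release
c(var_pois = var(pois), var_erlang = var(erlang))           # Erlang is far narrower
```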

9.
Moderated statistical tests for assessing differences in tag abundance (total citations: 2; self-citations: 0; citations by others: 2)
MOTIVATION: Digital gene expression (DGE) technologies measure gene expression by counting sequence tags. They are sensitive technologies for measuring gene expression on a genomic scale, without the need for prior knowledge of the genome sequence. As the cost of sequencing DNA decreases, the number of DGE datasets is expected to grow dramatically. Various tests of differential expression have been proposed for replicated DGE data using binomial, Poisson, negative binomial or pseudo-likelihood (PL) models for the counts, but none of these is usable when the number of replicates is very small. RESULTS: We develop tests using the negative binomial distribution to model overdispersion relative to the Poisson, and use conditional weighted likelihood to moderate the level of overdispersion across genes. Not only is our strategy applicable even with the smallest number of libraries, but it also proves to be more powerful than previous strategies when more libraries are available. The methodology is equally applicable to other counting technologies, such as proteomic spectral counts. AVAILABILITY: An R package can be accessed from http://bioinf.wehi.edu.au/resources/
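A minimal call sequence for edgeR, the R package implementing this methodology (simulated counts; function names follow the current API):

```r
library(edgeR)
set.seed(9)
counts <- matrix(rnbinom(1000 * 6, mu = 20, size = 3), nrow = 1000)
group  <- factor(c(1, 1, 1, 2, 2, 2))

y  <- DGEList(counts = counts, group = group)
y  <- calcNormFactors(y)   # library-size normalization
y  <- estimateDisp(y)      # moderated tagwise NB dispersions
et <- exactTest(y)         # NB exact test between the two groups
topTags(et)
```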

10.
In a 3-year period, four series of simulated water samples containing selected test strains were distributed to more than 50 laboratories in The Netherlands for bacteriological testing. Participating laboratories examined the samples by enrichment or membrane filtration methods, or both, for total coliform organisms, thermotolerant coliform organisms, faecal streptococci and standard plate counts (37°C and 22°C) according to Dutch standard methods. The results were quantitatively satisfactory: the distribution of positive and negative results with subsamples conformed to stochastic variation; the standard deviation of membrane or plate counts was usually in the range which may be expected from a Poisson distribution, and there was good correspondence between average counts in participating laboratories and those expected from controls in the organizing laboratory. Problems of a qualitative nature were frequently encountered, however. Among them were a false positive response with a strain of Enterobacter cloacae in the thermotolerant coliform test; a false positive result with Clostridium perfringens in enrichment tests for total or thermotolerant coliform organisms and false positive results with Micrococcus varians in the faecal streptococcus test by membrane filtration. It is concluded that quality assessment should be a consistent activity in water microbiology laboratories. For this purpose, stable and well characterized reference materials are needed.
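The Poisson check described here amounts to a dispersion-index test, easily sketched in R (hypothetical replicate counts):

```r
y <- c(52, 47, 55, 49, 51, 46, 53)    # replicate colony counts on one sample
D <- sum((y - mean(y))^2) / mean(y)   # dispersion index, ~ chi-square(n-1) under Poisson
pchisq(D, df = length(y) - 1, lower.tail = FALSE)   # large p: consistent with Poisson
```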

11.
Analysis of count data is required in many areas of biometric interest. Often the simple Poisson distribution is not appropriate, since an excess number of zero counts occurs in the data. Some current approaches to the problem are reviewed. It will be argued that these situations can often be easily modeled using the zero-inflated Poisson distribution. A variety of applications are considered in which this occurs. Approaches for validating the zero-inflated Poisson model are outlined, including a comparison with the nonparametric Poisson mixture maximum likelihood estimator.
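The zero-inflated Poisson density referred to throughout, as a small R function: a point mass at zero with weight pi0 mixed with a Poisson(lambda) component.

```r
dzip <- function(y, pi0, lambda) {
  ifelse(y == 0,
         pi0 + (1 - pi0) * dpois(0, lambda),   # structural + sampling zeros
         (1 - pi0) * dpois(y, lambda))
}
dzip(0:4, pi0 = 0.3, lambda = 1.5)   # note the inflated mass at zero
```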

12.
The objective of the study was to provide a general procedure for mapping species abundance when data are zero-inflated and spatially correlated counts. The bivalve species Macoma balthica was observed on a 500×500 m grid in the Dutch part of the Wadden Sea. In total, 66% of the 3451 counts were zeros. A zero-inflated Poisson mixture model was used to relate counts to environmental covariates. Two models were considered, one with relatively fewer covariates (model “small”) than the other (model “large”). The models contained two processes: a Bernoulli (species prevalence) and a Poisson (species intensity, when the Bernoulli process predicts presence). The model was used to make predictions for sites where only environmental data are available. Predicted prevalences and intensities show that the model “small” predicts a lower mean prevalence and a higher mean intensity than the model “large”. Yet, the product of prevalence and intensity, which might be called the unconditional intensity, is very similar. Cross-validation showed that the model “small” performed slightly better, but the difference was small. The proposed methodology might be generally applicable, but is computer intensive.
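The two-process decomposition can be sketched non-spatially in R with pscl (simulated data; the paper's spatial model is not reproduced): prevalence times intensity gives the unconditional intensity.

```r
library(pscl)
set.seed(2)
d <- data.frame(depth = rnorm(500))
d$y <- rbinom(500, 1, plogis(0.5 - d$depth)) * rpois(500, exp(1 + 0.3 * d$depth))
fit <- zeroinfl(y ~ depth | depth, data = d)

prev      <- 1 - predict(fit, type = "zero")    # Bernoulli process: P(presence)
intensity <- predict(fit, type = "count")       # Poisson process: mean if present
head(prev * intensity - predict(fit, type = "response"))  # ~0: product = unconditional
```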

13.
Proficiency testing of water microbiology laboratories in The Netherlands (total citations: 1; self-citations: 1; citations by others: 0)
In a 3-year period, four series of simulated water samples containing selected test strains were distributed to more than 50 laboratories in The Netherlands for bacteriological testing. Participating laboratories examined the samples by enrichment or membrane filtration methods, or both, for total coliform organisms, thermotolerant coliform organisms, faecal streptococci and standard plate counts (37°C and 22°C) according to Dutch standard methods. The results were quantitatively satisfactory: the distribution of positive and negative results with subsamples conformed to stochastic variation; the standard deviation of membrane or plate counts was usually in the range which may be expected from a Poisson distribution, and there was good correspondence between average counts in participating laboratories and those expected from controls in the organizing laboratory. Problems of a qualitative nature were frequently encountered, however. Among them were a false positive response with a strain of Enterobacter cloacae in the thermotolerant coliform test; a false positive result with Clostridium perfringens in enrichment tests for total or thermotolerant coliform organisms and false positive results with Micrococcus varians in the faecal streptococcus test by membrane filtration. It is concluded that quality assessment should be a consistent activity in water microbiology laboratories. For this purpose, stable and well characterized reference materials are needed.

14.
15.
Markov regression models for time series: a quasi-likelihood approach (total citations: 6; self-citations: 0; citations by others: 6)
Zeger SL, Qaqish B. Biometrics 1988;44(4):1019-1031
This paper discusses a quasi-likelihood (QL) approach to regression analysis with time series data. We consider a class of Markov models, referred to by Cox (1981, Scandinavian Journal of Statistics 8, 93-115) as "observation-driven" models in which the conditional means and variances given the past are explicit functions of past outcomes. The class includes autoregressive and Markov chain models for continuous and categorical observations as well as models for counts (e.g., Poisson) and continuous outcomes with constant coefficient of variation (e.g., gamma). We focus on Poisson and gamma data for illustration. Analogous to QL for independent observations, large-sample properties of the regression coefficients depend only on correct specification of the first conditional moment.
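A minimal observation-driven example in this model class, sketched in R: the conditional mean depends on the past outcome through a lagged term, fitted with ordinary quasi-Poisson machinery (simulated data, not the paper's).

```r
set.seed(8)
n <- 200; x <- rnorm(n); y <- numeric(n); y[1] <- 1
for (t in 2:n)
  y[t] <- rpois(1, exp(0.3 + 0.2 * x[t] + 0.5 * log(y[t - 1] + 1)))

d <- data.frame(y = y[-1], x = x[-1], lag_y = log(y[-n] + 1))
fit <- glm(y ~ x + lag_y, family = quasipoisson, data = d)
summary(fit)$coefficients   # recovers roughly (0.3, 0.2, 0.5)
```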

16.
Ghosh S, Gelfand AE, Zhu K, Clark JS. Biometrics 2012;68(3):878-885
Many applications involve count data from a process that yields an excess number of zeros. Zero-inflated count models, in particular, zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB) models, along with Poisson hurdle models, are commonly used to address this problem. However, these models struggle to explain extreme incidence of zeros (say more than 80%), especially to find important covariates. In fact, the ZIP may struggle even when the proportion is not extreme. To redress this problem we propose the class of k-ZIG models. These models allow more flexible modeling of both the zero-inflation and the nonzero counts, allowing interplay between these two components. We develop the properties of this new class of models, including reparameterization to a natural link function. The models are straightforwardly fitted within a Bayesian framework. The methodology is illustrated with simulated data examples as well as a forest seedling dataset obtained from the USDA Forest Service's Forest Inventory and Analysis program.

17.
Count data with means <2 are often assumed to follow a Poisson distribution. However, in many cases these kinds of data, such as number of young fledged, are more appropriately considered to be multinomial observations due to naturally occurring upper truncation of the distribution. We evaluated the performance of several versions of multinomial regression, plus Poisson and normal regression, for analysis of count data with means <2 through Monte Carlo simulations. Simulated data mimicked observed counts of number of young fledged (0, 1, 2, or 3) by California spotted owls (Strix occidentalis occidentalis). We considered size and power of tests to detect differences among 10 levels of a categorical predictor, as well as tests for trends across 10-year periods. We found regular regression and analysis of variance procedures based on a normal distribution to perform satisfactorily in all cases we considered, whereas failure rate of multinomial procedures was often excessively high, and the Poisson model demonstrated inappropriate test size for data where the variance/mean ratio was <1 or >1.2. Thus, managers can use simple statistical methods with which they are likely already familiar to analyze the kinds of count data we described here.
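A compact R version of the kind of Monte Carlo check reported here, simplified to two groups: empirical test size of ANOVA versus Poisson regression for upper-truncated counts with mean below 2 and no true group effect.

```r
set.seed(4)
n_sim <- 2000; n <- 40; groups <- gl(2, n / 2)
p_aov <- p_pois <- numeric(n_sim)
for (i in seq_len(n_sim)) {
  y <- pmin(rpois(n, 0.8), 3)   # counts truncated at 3, like fledgling data
  p_aov[i]  <- summary(aov(y ~ groups))[[1]][["Pr(>F)"]][1]
  p_pois[i] <- anova(glm(y ~ groups, family = poisson),
                     test = "Chisq")[["Pr(>Chi)"]][2]
}
c(aov = mean(p_aov < 0.05), poisson = mean(p_pois < 0.05))   # nominal size 0.05
```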

18.
Zero-truncated data arise in various disciplines where counts are observed but the zero-count category cannot be observed during sampling. Maximum likelihood estimation can be used to model these data; however, due to its nonstandard form it cannot be easily implemented using well-known software packages, and additional programming is often required. Motivated by the Rao–Blackwell theorem, we develop a weighted partial likelihood approach to estimate model parameters for zero-truncated binomial and Poisson data. The resulting estimating function is equivalent to a weighted score function for standard count data models, and allows for applying readily available software. We evaluate the efficiency of this new approach and show that it performs almost as well as maximum likelihood estimation. The weighted partial likelihood approach is then extended to regression modelling and variable selection. We examine the performance of the proposed methods through simulation and present two case studies using real data.
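The maximum likelihood benchmark the authors compare against can be sketched directly in R via optim() (their weighted partial likelihood itself is not reproduced):

```r
# zero-truncated Poisson log-likelihood: y*log(lam) - lam - log(1 - exp(-lam)) - log(y!)
zt_nll <- function(beta, y, X) {
  lam <- exp(X %*% beta)
  -sum(y * log(lam) - lam - log(1 - exp(-lam)) - lgamma(y + 1))
}

set.seed(6)
X <- cbind(1, rnorm(400))
y_all <- rpois(400, exp(0.4 + 0.5 * X[, 2]))
keep  <- y_all > 0                    # zeros are unobservable by design
fit <- optim(c(0, 0), zt_nll, y = y_all[keep], X = X[keep, ])
fit$par                               # close to the generating (0.4, 0.5)
```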

19.
This paper discusses a two-state hidden Markov Poisson regression (MPR) model for analyzing longitudinal data of epileptic seizure counts, which allows the rate of the Poisson process to depend on covariates through an exponential link function and to change according to the states of a two-state Markov chain, with transition probabilities associated with covariates through a logit link function. This paper also considers a two-state hidden Markov negative binomial regression (MNBR) model as an alternative, using the negative binomial instead of the Poisson distribution in the proposed MPR model when there exists extra-Poisson variation conditional on the states of the Markov chain. The two proposed models relax the stationarity requirement of the Markov chain, allow for overdispersion relative to the usual Poisson regression model and for correlation between repeated observations. The proposed methodology provides a plausible analysis for the longitudinal data of epileptic seizure counts, and the MNBR model fits the data much better than the MPR model. Maximum likelihood estimation using the EM and quasi-Newton algorithms is discussed. A Monte Carlo study for the proposed MPR model investigates the reliability of the estimation method, the choice of probabilities for the initial states of the Markov chain, and some finite sample behaviors of the maximum likelihood estimates, suggesting that (1) the estimation method is accurate and reliable as long as the total number of observations is reasonably large, and (2) the choice of probabilities for the initial states of the Markov process has little impact on the parameter estimates.
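A two-state hidden Markov Poisson regression can be sketched with the generic depmixS4 package in R (simulated data; this is not the authors' implementation, and the EM details are handled internally):

```r
library(depmixS4)
set.seed(10)
d <- data.frame(seizures = rpois(120, rep(c(2, 8), each = 60)),
                trt = rbinom(120, 1, 0.5))
mod <- depmix(seizures ~ trt, data = d, nstates = 2, family = poisson())
fm  <- fit(mod)         # EM estimation
summary(fm)             # state-specific regression coefficients
head(posterior(fm))     # decoded states per observation
```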

20.
Current practice in the normalization of microbiome count data is inefficient in the statistical sense. For apparently historical reasons, the common approach is either to use simple proportions (which does not address heteroscedasticity) or to use rarefying of counts, even though both of these approaches are inappropriate for detection of differentially abundant species. Well-established statistical theory is available that simultaneously accounts for library size differences and biological variability using an appropriate mixture model. Moreover, specific implementations for DNA sequencing read count data (based on a Negative Binomial model for instance) are already available in RNA-Seq focused R packages such as edgeR and DESeq. Here we summarize the supporting statistical theory and use simulations and empirical data to demonstrate substantial improvements provided by a relevant mixture model framework over simple proportions or rarefying. We show how both proportions and rarefied counts result in a high rate of false positives in tests for species that are differentially abundant across sample classes. Regarding microbiome sample-wise clustering, we also show that the rarefying procedure often discards samples that can be accurately clustered by alternative methods. We further compare different Negative Binomial methods with a recently-described zero-inflated Gaussian mixture, implemented in a package called metagenomeSeq. We find that metagenomeSeq performs well when there is an adequate number of biological replicates, but it nevertheless tends toward a higher false positive rate. Based on these results and well-established statistical theory, we advocate that investigators avoid rarefying altogether. We have provided microbiome-specific extensions to these tools in the R package, phyloseq.
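In outline, the advocated workflow in R: pass un-rarefied counts from a phyloseq object into DESeq2's negative binomial framework (a minimal sketch with simulated data; the sample variable name Class is hypothetical).

```r
library(phyloseq); library(DESeq2)
set.seed(12)
otu <- otu_table(matrix(rnbinom(200 * 6, mu = 30, size = 0.5), nrow = 200,
                        dimnames = list(paste0("OTU", 1:200), paste0("S", 1:6))),
                 taxa_are_rows = TRUE)
sam <- sample_data(data.frame(Class = factor(rep(c("A", "B"), each = 3)),
                              row.names = paste0("S", 1:6)))
ps  <- phyloseq(otu, sam)

dds <- phyloseq_to_deseq2(ps, design = ~ Class)
dds <- DESeq(dds)                  # size factors + NB dispersions + Wald tests
res <- results(dds)
head(res[order(res$padj), ])       # candidate differentially abundant taxa
```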

