1.

A statistical model for under- or overdispersed clustered and longitudinal count data

Grunwald GK, Bruce SL, Jiang L, Strand M, Rabinovitch N 《Biometrical journal. Biometrische Zeitschrift》2011,53(4):578-594

We propose a likelihood-based model for correlated count data that display under- or overdispersion within units (e.g. subjects). The model is capable of handling correlation due to clustering and/or serial correlation, in the presence of unbalanced, missing or unequally spaced data. A family of distributions based on birth-event processes is used to model within-subject underdispersion. A computational approach is given to overcome a parameterization difficulty with this family, and this allows use of common Markov Chain Monte Carlo software (e.g. WinBUGS) for estimation. Application of the model to daily counts of asthma inhaler use by children shows substantial within-subject underdispersion, between-subject heterogeneity and correlation due to both clustering of measurements within subjects and serial correlation of longitudinal measurements. The model provides a major improvement over Poisson longitudinal models, and diagnostics show that the model fits well.
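
As a rough illustration of what within-subject under- or overdispersion means, the sketch below computes the sample variance-to-mean ratio of a subject's counts. This is a simple descriptive check, not the birth-event model of the paper; the function name and the example counts are hypothetical.

```python
from statistics import mean, variance

def dispersion_index(counts):
    """Return the sample variance-to-mean ratio of a sequence of counts.

    A ratio near 1 is consistent with a Poisson model, a ratio below 1
    suggests underdispersion, and a ratio above 1 suggests overdispersion.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("dispersion index is undefined when the mean is zero")
    return variance(counts) / m

# Hypothetical daily inhaler-use counts for two subjects (made-up numbers).
steady_user = [2, 2, 3, 2, 2, 3, 2, 2]    # tightly clustered around 2
erratic_user = [0, 6, 1, 0, 7, 0, 5, 1]   # bursts of use

steady_ratio = dispersion_index(steady_user)
erratic_ratio = dispersion_index(erratic_user)
```

With these made-up numbers the first subject's ratio falls below 1 and the second subject's ratio falls above 1, which is the kind of contrast a Poisson model cannot represent.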

2.

A statistical model for iTRAQ data analysis

Hill EG, Schwacke JH, Comte-Walters S, Slate EH, Oberg AL, Eckel-Passow JE, Therneau TM, Schey KL 《Journal of proteome research》2008,7(8):3091-3101

We describe biological and experimental factors that induce variability in reporter ion peak areas obtained from iTRAQ experiments. We demonstrate how these factors can be incorporated into a statistical model for use in evaluating differential protein expression and highlight the benefits of using analysis of variance to quantify fold change. We demonstrate the model's utility based on an analysis of iTRAQ data derived from a spike-in study.

3.

Mutation parameters from DNA sequence data using graph theoretic measures on lineage trees

Magori-Cohen R, Louzoun Y, Kleinstein SH 《Bioinformatics (Oxford, England)》2006,22(14):e332-e340

MOTIVATION: B cells responding to antigenic stimulation can fine-tune their binding properties through a process of affinity maturation composed of somatic hypermutation, affinity-selection and clonal expansion. The mutation rate of the B cell receptor DNA sequence, and the effect of these mutations on affinity and specificity, are of critical importance for understanding immune and autoimmune processes. Unbiased estimates of these properties are currently lacking due to the short time-scales involved and the small numbers of sequences available. RESULTS: We have developed a bioinformatic method based on a maximum likelihood analysis of phylogenetic lineage trees to estimate the parameters of a B cell clonal expansion model, which includes somatic hypermutation with the possibility of lethal mutations. Lineage trees are created from clonally related B cell receptor DNA sequences. Important links between tree shapes and underlying model parameters are identified using mutual information. Parameters are estimated using a likelihood function based on the joint distribution of several tree shapes, without requiring a priori knowledge of the number of generations in the clone (which is not available for rapidly dividing populations in vivo). A systematic validation on synthetic trees produced by a mutating birth-death process simulation shows that our estimates are precise and robust to several underlying assumptions. These methods are applied to experimental data from autoimmune mice to demonstrate the existence of hypermutating B cells in an unexpected location in the spleen.

4.

A conditional density error model for the statistical analysis of microarray data (cited by: 1; self-citations: 0; citations by others: 1)

Love B, Rank DR, Penn SG, Jenkins DA, Thomas RS 《Bioinformatics (Oxford, England)》2002,18(8):1064-1072

5.

The statistical analysis of the differential blood count

H. D. Unkelbach 《Biometrical journal. Biometrische Zeitschrift》1980,22(6):545-552

The differential blood count obtained by unbiased sampling mathematically follows a multinomial distribution. The variation between individuals can be formalized by a mixing distribution on the parameters of the multinomial distribution. With Dirichlet distributions as mixing distributions, the main phenomena of the observed data can be described, and these distributions are useful in estimating and testing treatment effects. If the correlation between the different types of leucocytes is taken into account in an appropriate manner, univariate test procedures can also be applied.
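
The two-level structure described here (a Dirichlet draw per individual, then a multinomial count given that draw) can be sketched as a small simulation. This is an illustrative sketch only: the function names, the five cell types, and the `alpha` values are hypothetical and are not taken from the paper.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one probability vector from a Dirichlet(alpha) distribution
    by normalizing independent gamma variates."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]

def sample_differential_count(alpha, n_cells, rng):
    """Simulate one individual's differential count of n_cells leucocytes.

    The individual's cell-type proportions are drawn from Dirichlet(alpha);
    the counts are then multinomial given those proportions.
    """
    probs = sample_dirichlet(alpha, rng)
    draws = rng.choices(range(len(alpha)), weights=probs, k=n_cells)
    return [draws.count(k) for k in range(len(alpha))]

rng = random.Random(0)
alpha = [60.0, 30.0, 6.0, 3.0, 1.0]  # hypothetical values for five cell types
counts = sample_differential_count(alpha, 100, rng)
```

Smaller `alpha` values (with the same ratios) give more variation between individuals; very large values approach a plain multinomial model with fixed proportions.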

6.

Improved analysis of lek count data using N-mixture models

Rebecca McCaffery, J. Joshua Nowak, Paul M. Lukacs 《The Journal of wildlife management》2016,80(6):1011-1021

7.

The statistical analysis of insect phenology

Murtaugh PA, Emerson SC, McEvoy PB, Higgs KM 《Environmental entomology》2012,41(2):355-361

We introduce two simple methods for the statistical comparison of the temporal pattern of life-cycle events between two populations. The methods are based on a translation of stage-frequency data into individual 'times in stage'. For example, if the stage-k individuals in a set of samples consist of three individuals counted at time t(1) and two counted at time t(2), the observed times in stage k would be (t(1), t(1), t(1), t(2), t(2)). Times in stage can then be compared between two populations by performing stage-specific t-tests or by testing for equality of regression lines of time versus stage between the two populations. Simulations show that our methods perform at close to the nominal level, have good power against a range of alternatives, and have much better operating characteristics than a widely used phenology model from the literature.
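
The translation from stage-frequency data to times in stage can be sketched in a few lines. The expansion step follows the example in the abstract; the choice of Welch's t statistic, the function names, and the numeric sampling times are assumptions made for illustration, since the abstract does not say which form of t-test is used.

```python
from math import sqrt
from statistics import mean, variance

def times_in_stage(sample_times, stage_counts):
    """Expand stage-frequency data into one sampling time per individual."""
    times = []
    for t, n in zip(sample_times, stage_counts):
        times.extend([t] * n)
    return times

def welch_t(x, y):
    """Welch's two-sample t statistic (does not assume equal variances)."""
    se = sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / se

# The abstract's example with hypothetical times t(1) = 10 and t(2) = 17:
# three individuals at t(1) and two at t(2).
pop_a = times_in_stage([10, 17], [3, 2])
# A hypothetical second population observed in the same stage.
pop_b = times_in_stage([10, 17, 24], [1, 2, 3])
t_stat = welch_t(pop_a, pop_b)
```

A negative statistic here indicates that individuals of the first population were found in this stage earlier, on average, than those of the second.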

8.

A generalized model for overdispersed count data

Hiroshi Okamura, André E. Punt, Tatsuya Amano 《Population Ecology》2012,54(3):467-474

Overdispersed count data are very common in ecology. The negative binomial model has been used widely to represent such data. Ecological data often vary considerably, and traditional approaches are likely to be inefficient or incorrect due to underestimation of uncertainty and poor predictive power. We propose a new statistical model to account for excessive overdispersion. It is the combination of two negative binomial models, where the first determines the number of clusters and the second the number of individuals in each cluster. Simulations show that this model often performs better than the negative binomial model. This model also fitted catch and effort data for southern bluefin tuna better than other models according to AIC. A model that explicitly and properly accounts for overdispersion should contribute to robust management and conservation for wildlife and plants.
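
The two-layer construction described above can be sketched as a simulation: one negative binomial draw for the number of clusters, then one negative binomial draw per cluster for its size. This is a minimal sketch under assumptions: the mean/size parameterization, the parameter values, and the function names are hypothetical, and the paper may restrict or parameterize the cluster-size distribution differently (for example, this sketch allows empty clusters).

```python
import random
from statistics import mean, variance

def sample_poisson(lam, rng):
    """Poisson draw: count unit-rate exponential arrivals in [0, lam]."""
    count, elapsed = 0, rng.expovariate(1.0)
    while elapsed <= lam:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count

def sample_negbin(mu, size, rng):
    """Negative binomial draw with mean mu and variance mu + mu**2 / size,
    generated as a gamma-mixed Poisson."""
    lam = rng.gammavariate(size, mu / size)
    return sample_poisson(lam, rng)

def sample_compound_negbin(cluster_mu, cluster_size, indiv_mu, indiv_size, rng):
    """Total count: NB number of clusters, NB individuals within each cluster."""
    n_clusters = sample_negbin(cluster_mu, cluster_size, rng)
    return sum(sample_negbin(indiv_mu, indiv_size, rng) for _ in range(n_clusters))

rng = random.Random(42)
draws = [sample_compound_negbin(3.0, 1.0, 4.0, 2.0, rng) for _ in range(2000)]
ratio = variance(draws) / mean(draws)  # variance-to-mean ratio of the totals
```

Comparing `ratio` with the same quantity for single-layer draws from `sample_negbin` gives a feel for how much extra dispersion the clustering layer adds.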

9.

A discrete-time model for the statistical analysis of infectious disease incidence data

A H Rampey, I M Longini, M Haber, A S Monto 《Biometrics》1992,48(1):117-128

A discrete-time model is devised for the per-time-unit distribution of infectious disease cases in a sample of households. Using the time at which an individual is identified (e.g., when illness symptoms appear) as a marker for being infected, the probabilities of becoming infected from the community or from a single infectious household member are estimated for various risk factor levels. Maximum likelihood procedures for estimating the model parameters are given. An individual may be classified with regard to level of susceptibility and level of infectiousness. The model is fitted to a combination of symptom and viral culture data from a rhinovirus epidemic in Tecumseh, Michigan. In general, it is observed that decreasing risk of infection is associated with increasing age.
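
One common way to combine a community infection probability with a per-contact household infection probability is to multiply the escape probabilities. The sketch below shows that calculation for a single time unit. The independence assumption, the function name, and the parameter names are assumptions made for illustration; the abstract does not give the paper's exact formulation.

```python
def infection_probability(p_community, p_household, n_infectious):
    """Per-time-unit infection probability for one susceptible individual.

    Assumes (not stated in the abstract) that the individual escapes the
    community source and each infectious household member independently.
    """
    escape = (1.0 - p_community) * (1.0 - p_household) ** n_infectious
    return 1.0 - escape

# Hypothetical values: 1% community risk, 10% risk per infectious housemate.
risk_alone = infection_probability(0.01, 0.10, 0)
risk_two_contacts = infection_probability(0.01, 0.10, 2)
```

In a likelihood-based fit, probabilities of this form would be evaluated for every susceptible individual in every time unit, with the two parameters allowed to depend on risk factor levels.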

10.

An information theory analysis of gene-environmental interactions in count/rate data

Knights J, Ramanathan M 《Human heredity》2012,73(3):123-138

11.

The use of a mixture model in the analysis of count data (cited by: 1; self-citations: 0; citations by others: 1)

V T Farewell, D A Sprott 《Biometrics》1988,44(4):1191-1194

A mixture model is presented for the analysis of data on premature ventricular contractions. The analysis is shown to be straightforward and the conclusions relatively simple.

12.

A Bayesian generalized random regression model for estimating heritability using overdispersed count data

Colette Mair, Michael Stear, Paul Johnson, Matthew Denwood, Joaquin Prada Jimenez de Cisneros, Thorsten Stefan, Louise Matthews 《Genetics Selection Evolution》2015,47(1)

Background

Faecal egg counts are a common indicator of nematode infection and, since they are a heritable trait, they provide a marker for selective breeding. However, since resistance to disease changes as the adaptive immune system develops, quantifying temporal changes in heritability could help improve selective breeding programs. Faecal egg counts can be extremely skewed and difficult to handle statistically. Therefore, previous heritability analyses have log transformed faecal egg counts to estimate heritability on a latent scale. However, such transformations may not always be appropriate. In addition, analyses of faecal egg counts have typically used univariate rather than multivariate analyses such as random regression that are appropriate when traits are correlated. We present a method for estimating the heritability of untransformed faecal egg counts over the grazing season using random regression.

Results

Replicating standard univariate analyses, we showed the dependence of heritability estimates on choice of transformation. Then, using a multitrait model, we exposed temporal correlations, highlighting the need for a random regression approach. Since random regression can sometimes involve the estimation of more parameters than observations or result in computationally intractable problems, we chose to investigate reduced rank random regression. Using standard software (WOMBAT), we discuss the estimation of variance components for log transformed data using both full and reduced rank analyses. Then, we modelled the untransformed data, assuming them to be negative binomially distributed, and used Metropolis-Hastings to fit a generalized reduced rank random regression model with an additive genetic, permanent environmental and maternal effect. These three variance components explained more than 80% of the total phenotypic variation, whereas the variance components for the log transformed data accounted for considerably less. The heritability, on a link scale, increased from around 0.25 at the beginning of the grazing season to around 0.4 at the end.

Conclusions

Random regressions are a useful tool for quantifying sources of variation across time. Our MCMC (Markov chain Monte Carlo) algorithm provides a flexible approach to fitting random regression models to non-normal data. Here we applied the algorithm to negative binomially distributed faecal egg count data, but this method is readily applicable to other types of overdispersed data.

13.

Joint analysis of panel count and interval-censored data using distribution-free frailty analysis

Chi-Chung Wen, Yi-Hau Chen, Chi-Hong Tseng 《Biometrical journal. Biometrische Zeitschrift》2020,62(5):1164-1175

We propose a joint analysis of recurrent and nonrecurrent event data subject to general types of interval censoring. The proposed analysis allows for general semiparametric models, including the Box–Cox transformation and inverse Box–Cox transformation models for the recurrent and nonrecurrent events, respectively. A frailty variable is used to account for the potential dependence between the recurrent and nonrecurrent event processes, while leaving the distribution of the frailty unspecified. We apply the pseudolikelihood for interval-censored recurrent event data, usually termed panel count data, and the sufficient likelihood for interval-censored nonrecurrent event data by conditioning on the sufficient statistic for the frailty and using the working assumption of independence over examination times. Large sample theory and a computation procedure for the proposed analysis are established. We illustrate the proposed methodology by a joint analysis of the numbers of occurrences of basal cell carcinoma over time and time to the first recurrence of squamous cell carcinoma based on a skin cancer dataset, as well as a joint analysis of the numbers of adverse events and time to premature withdrawal from study medication based on a scleroderma lung disease dataset.

14.

Some measures of information arising in statistical games

Hans W. Gottinger 《Biological cybernetics》1974,15(2):111-116

This paper discusses some measures of information which naturally arise in the context of statistical games (games against nature). Some useful inequalities are proven relating the entropy to the value of information provided by experiments. Two other measures, based on the notion of a metric as informational distance and that of a diameter value, are also discussed.

15.

Differential expression analysis for sequence count data (cited by: 22; self-citations: 0; citations by others: 22)

Anders S, Huber W 《Genome biology》2010,11(10):R106

High-throughput sequencing assays such as RNA-Seq, ChIP-Seq or barcode counting provide quantitative readouts in the form of count data. To infer differential signal in such data correctly and with good statistical power, estimation of data variability throughout the dynamic range and a suitable error model are required. We propose a method based on the negative binomial distribution, with variance and mean linked by local regression, and present an implementation, DESeq, as an R/Bioconductor package.

16.

Practical measures of integrated information for time-series data

Barrett AB, Seth AK 《PLoS computational biology》2011,7(1):e1001052

A recent measure of 'integrated information', Φ(DM), quantifies the extent to which a system generates more information than the sum of its parts as it transitions between states, possibly reflecting levels of consciousness generated by neural systems. However, Φ(DM) is defined only for discrete Markov systems, which are unusual in biology; as a result, Φ(DM) can rarely be measured in practice. Here, we describe two new measures, Φ(E) and Φ(AR), that overcome these limitations and are easy to apply to time-series data. We use simulations to demonstrate the in-practice applicability of our measures, and to explore their properties. Our results provide new opportunities for examining information integration in real and model systems and carry implications for relations between integrated information, consciousness, and other neurocognitive processes. However, our findings pose challenges for theories that ascribe physical meaning to the measured quantities.

17.

Overdispersed fungus germination data: statistical analysis using R

Maíra Blumer Fatoretto, Rafael de Andrade Moral, Clarice Garcia Borges Demétrio, Christopher Silva de Pádua, Vinicius Menarin, Víctor Manuel Arévalo Rojas 《Biocontrol Science and Technology》2018,28(11):1034-1053

Proportion data from dose-response experiments are often overdispersed, characterised by a larger variance than assumed by the standard binomial model. Here, we present different models proposed in the literature that incorporate overdispersion. We also discuss how to select the best model to describe the data and present, using R software, specific code used to fit and interpret binomial, quasi-binomial, beta-binomial, and binomial-normal models, as well as to assess goodness-of-fit. We illustrate applications of these generalized linear models and generalized linear mixed models with a case study from a biological control experiment, where different isolates of Isaria fumosorosea (Hypocreales: Cordycipitaceae) were used to assess which ones presented higher resistance to UV-B radiation. We show how to test for differences between isolates and also how to statistically group isolates presenting a similar behaviour.
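
For orientation, the standard variance functions of three of the models named above can be written side by side. The notation is an assumption introduced here, not taken from the paper: Y is the number of successes out of n, π is the success probability, φ is a dispersion parameter and ρ is an intra-class correlation.

```latex
\begin{align*}
\text{binomial:}       \quad & \operatorname{Var}(Y) = n\pi(1-\pi) \\
\text{quasi-binomial:} \quad & \operatorname{Var}(Y) = \phi\, n\pi(1-\pi) \\
\text{beta-binomial:}  \quad & \operatorname{Var}(Y) = n\pi(1-\pi)\,\bigl[1 + (n-1)\rho\bigr]
\end{align*}
```

With φ > 1 or ρ > 0 the variance exceeds the binomial variance, which is the overdispersion the abstract refers to. The quasi-binomial inflation is constant in n, whereas the beta-binomial inflation grows with n.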

18.

Using E-mail to retrieve insect gene information on the Internet

徐广, 郭予元, 宋福平 《昆虫知识》2000,37(4):255-255

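In insect molecular biology research, it is often necessary to look up insect gene sequences or to analyze and compare gene sequences. With the rapid development of the Internet, the entomological resources available online have become increasingly abundant. E-mail can be used to obtain, free of charge and quickly, information on nucleic acid and protein sequences and structures, as well as publications, provided by the US National Center for Biotechnology Information, including information on insects. One only needs to send an e-mail written in the format described below to the server at query@ncbi.nlm.nih.gov; if the network is running smoothly, a reply e-mail answering the query arrives within a few minutes. The search scope offered by this server covers almost all of the world's well-known relevant databases; the nucleic acid sequence databases include GenBank, EMBL, DDBJ, db…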

19.

A statistical model for testing the pleiotropic control of phenotypic plasticity for a count trait

Ma CX, Yu Q, Berg A, Drost D, Novaes E, Fu G, Yap JS, Tan A, Kirst M, Cui Y, Wu R 《Genetics》2008,179(1):627-636

The differences of a phenotypic trait produced by a genotype in response to changes in the environment are referred to as phenotypic plasticity. Despite its importance in the maintenance of genetic diversity via genotype-by-environment interactions, little is known about the detailed genetic architecture of this phenomenon, thus limiting our ability to predict the pattern and process of microevolutionary responses to changing environments. In this article, we develop a statistical model for mapping quantitative trait loci (QTL) that control the phenotypic plasticity of a complex trait through differentiated expressions of pleiotropic QTL in different environments. In particular, our model focuses on count traits that represent an important aspect of biological systems, controlled by a network of multiple genes and environmental factors. The model was derived within a multivariate mixture model framework in which QTL genotype-specific mixture components are modeled by a multivariate Poisson distribution for a count trait expressed in multiple clonal replicates. A two-stage hierarchic EM algorithm is implemented to obtain the maximum-likelihood estimates of the Poisson parameters that specify environment-specific genetic effects of a QTL and residual errors. By approximating the number of sylleptic branches on the main stems of poplar hybrids by a Poisson distribution, the new model was applied to map QTL that contribute to the phenotypic plasticity of a count trait. The statistical behavior of the model and its utilization were investigated through simulation studies that mimic the poplar example used. This model will provide insights into how genomes and environments interact to determine the phenotypes of complex count traits.
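
To give a feel for how an EM algorithm estimates Poisson mixture parameters, the sketch below fits a much simpler model than the one in the paper: a univariate Poisson mixture with freely estimated mixing weights. It is not the two-stage hierarchic EM or the multivariate Poisson model described above; the function names and the example counts are hypothetical, and the code assumes every component keeps a positive mean.

```python
from math import exp, lgamma, log

def poisson_pmf(k, lam):
    """Poisson probability mass at k for mean lam > 0 (computed on the log scale)."""
    return exp(k * log(lam) - lam - lgamma(k + 1))

def em_poisson_mixture(counts, lam_init, n_iter=200):
    """EM for a univariate Poisson mixture with len(lam_init) components.

    Returns the estimated mixing weights and component means.
    """
    lams = list(lam_init)
    weights = [1.0 / len(lams)] * len(lams)
    for _ in range(n_iter):
        # E-step: posterior probability that each count came from each component.
        resp = []
        for k in counts:
            joint = [w * poisson_pmf(k, lam) for w, lam in zip(weights, lams)]
            total = sum(joint)
            resp.append([j / total for j in joint])
        # M-step: update mixing weights and weighted-average Poisson means.
        for c in range(len(lams)):
            mass = sum(r[c] for r in resp)
            weights[c] = mass / len(counts)
            lams[c] = sum(r[c] * k for r, k in zip(resp, counts)) / mass
    return weights, lams

# Hypothetical branch counts from two latent groups (made-up numbers).
counts = [1, 2, 1, 0, 2, 1, 3, 2, 9, 11, 10, 12, 8, 10, 11, 9]
weights, lams = em_poisson_mixture(counts, lam_init=[1.0, 8.0])
```

In a QTL mapping setting the mixing weights would typically be tied to marker genotype information rather than estimated freely, which is one of the ways this sketch differs from the model in the abstract.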

20.

Deriving sustainability measures using statistical data: A case study from the Eisenwurzen, Austria

Friedrich Putzhuber, Hubert Hasenauer 《Ecological Indicators》2010,10(1):32-38

Within the past two decades sustainability has become a key term in emphasizing and understanding relationships between economic progress and the protection of the environment. One key difficulty is in the definition of sustainability indicators based on information at different spatial and temporal scales. In this paper we formalize statistical models for the assessment of sustainability impact indicators using a public data source provided by the Austrian government. Our application example is the Eisenwurzen region in Austria, an old and famous mining area within the Alps. The total area covers 5,743 km² and includes 99 municipalities. In our study we define 15 impact indicators covering economic, social and environmental impacts. For each of the impact indicators we develop response functions using the available public data sources. The results suggest that the available data are an important source for deriving sustainable impact indicators within specific regions. The presented approach may serve as a diagnostic tool to provide insights into the regional drivers for assessing sustainability indicators.