Similar Articles
20 similar articles found.
1.
A log-linear model for estimating the size of a closed population is defined for inverse multiple-recapture sampling with dependent samples. Efficient estimators of the log-linear model parameters and the population size are obtained by the method of minimum chi-square. A chi-square test of the general linear hypothesis regarding the log-linear model parameters is defined.
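A minimal sketch of the basic log-linear capture-recapture idea is given below; it uses two samples, an independence model, and hypothetical counts, and does not reproduce the abstract's minimum chi-square estimators for dependent inverse sampling.

```python
# Minimal sketch of log-linear capture-recapture estimation for a closed
# population, assuming two samples and independence between captures.
# Counts are hypothetical.
import numpy as np

# Observed cells of the incomplete 2x2 capture table:
# n11 = caught in both samples, n10 = first sample only, n01 = second only.
n11, n10, n01 = 40, 60, 55

# Under the independence log-linear model, log(m_ij) = u + u1(i) + u2(j),
# the unobserved cell is projected as m00 = n10 * n01 / n11.
m00 = n10 * n01 / n11

# Population size estimate: observed animals plus the projected missing cell
# (equivalent to the Lincoln-Petersen estimator in the two-sample case).
N_hat = n11 + n10 + n01 + m00
print(f"projected unseen cell: {m00:.1f}, estimated population size: {N_hat:.1f}")
```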

2.
Green PE, Park T. Biometrics 2003, 59(4):886-896.
Log-linear models have been shown to be useful for smoothing contingency tables when categorical outcomes are subject to nonignorable nonresponse. A log-linear model can be fit to an augmented data table that includes an indicator variable designating whether subjects are respondents or nonrespondents. Maximum likelihood estimates calculated from the augmented data table are known to suffer from instability due to boundary solutions. Park and Brown (1994, Journal of the American Statistical Association 89, 44-52) and Park (1998, Biometrics 54, 1579-1590) developed empirical Bayes models that tend to smooth estimates away from the boundary. In those approaches, estimates for nonrespondents were calculated using an EM algorithm by maximizing a posterior distribution. As an extension of their earlier work, we develop a Bayesian hierarchical model that incorporates a log-linear model in the prior specification. In addition, due to uncertainty in the variable selection process associated with just one log-linear model, we simultaneously consider a finite number of models using a stochastic search variable selection (SSVS) procedure due to George and McCulloch (1997, Statistica Sinica 7, 339-373). The integration of the SSVS procedure into a Markov chain Monte Carlo (MCMC) sampler is straightforward, and leads to estimates of cell frequencies for the nonrespondents that are averages resulting from several log-linear models. The methods are demonstrated with a data example involving serum creatinine levels of patients who survived renal transplants. A simulation study is conducted to investigate properties of the model.

3.
The case-crossover design was introduced in epidemiology 15 years ago as a method for studying the effects of a risk factor on a health event using only cases. The idea is to compare a case's exposure immediately prior to or during the case-defining event with that same person's exposure at otherwise similar "reference" times. An alternative approach to the analysis of daily exposure and case-only data is time series analysis. Here, log-linear regression models express the expected total number of events on each day as a function of the exposure level and potential confounding variables. In time series analyses of air pollution, smooth functions of time and weather are the main confounders. Time series and case-crossover methods are often viewed as competing methods. In this paper, we show that case-crossover using conditional logistic regression is a special case of time series analysis when there is a common exposure such as in air pollution studies. This equivalence provides computational convenience for case-crossover analyses and a better understanding of time series models. Time series log-linear regression accounts for overdispersion of the Poisson variance, while case-crossover analyses typically do not. This equivalence also permits model checking for case-crossover data using standard log-linear model diagnostics.
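As a hedged illustration of the time-series side of this equivalence, the sketch below fits a Poisson log-linear regression of simulated daily counts on an exposure and simple seasonal terms, with a quasi-Poisson scale adjustment for overdispersion; the variable names and data are hypothetical, not from the paper.

```python
# Time-series log-linear (Poisson) regression sketch with simulated daily data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
days = np.arange(365)
exposure = 10 + 5 * np.sin(2 * np.pi * days / 365) + rng.normal(0, 1, days.size)
log_mu = 2.0 + 0.02 * exposure + 0.1 * np.sin(2 * np.pi * days / 365)
counts = rng.poisson(np.exp(log_mu))

# Design matrix: intercept, exposure, and simple seasonal terms standing in
# for the smooth functions of time and weather used as confounders.
X = sm.add_constant(np.column_stack([
    exposure,
    np.sin(2 * np.pi * days / 365),
    np.cos(2 * np.pi * days / 365),
]))

# scale="X2" rescales standard errors by the Pearson chi-square dispersion
# estimate, a quasi-Poisson style allowance for overdispersion.
fit = sm.GLM(counts, X, family=sm.families.Poisson()).fit(scale="X2")
print("exposure coefficient:", fit.params[1], "overdispersion-adjusted SE:", fit.bse[1])
```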

4.
The Grizzle-Starmer-Koch (GSK) model is extended to include the traditional log-linear model and a general class of Poisson and conditional Poisson distributions. Estimators of the model parameters are defined under general exact and stochastic linear constraints.

5.
One approach frequently used for identifying genetic factors involved in a complex disease is the comparison of patients and controls for a number of genetic markers near a candidate gene. The analysis of such association studies raises some specific problems because genotypic, rather than gametic, data are generally available. We present a log-linear-model analysis providing a valid method for analyzing such studies. When studying the association of disease with one marker locus, the log-linear model allows one to test for differences in allelic frequencies between affected and unaffected individuals, Hardy-Weinberg (H-W) equilibrium in both groups, and interaction between the association of alleles at the marker locus and disease. This interaction provides information about the dominance of the disease susceptibility locus, with dominance defined using the epidemiological notion of the odds ratio. The degree of dominance measured at the marker locus depends on the strength of linkage disequilibrium between the marker locus and the disease locus. When studying the association of disease with several linked markers, the model rapidly becomes complex and uninterpretable unless it is assumed that the affected and unaffected populations are in H-W equilibrium at each locus. This hypothesis must be tested before proceeding with the analysis. If it is not rejected, the log-linear model offers a stepwise method for identifying the parameters causing the difference between populations. This model can be extended to any number of loci, alleles, or populations.

6.
A generalized self-thinning curve for plants is derived from the modified von Bertalanffy equation. When an asymptotic relation between photosynthesis per unit of leaf area and stocking density is assumed, the self-thinning curve thus derived is also asymptotic on a log-log scale but is fitted quite well by a log-linear approximation. The model predicts that the slope of the log-linear approximation is a function of (a) photosynthetic response to density and (b) the relation between leaf area and total aboveground biomass. The intercept of the log-linear approximation is a function of these factors plus maximum attainable biomass, site productivity, the density at which maximum photosynthesis is attained, and the nature of carbon loss within the plant community. Linkages between the various parameters within the model act to reduce differences in slope and intercept for species with different life histories and physiological requirements.
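The sketch below fits the kind of log-linear approximation described here to a simulated self-thinning trajectory (log mean biomass against log density); the slope and intercept values are illustrative assumptions, not estimates from the paper.

```python
# Fit a log-linear self-thinning approximation to simulated stand data.
import numpy as np

rng = np.random.default_rng(5)
log_density = np.linspace(2, 6, 25)                               # log stems per unit area
log_biomass = 10.0 - 1.5 * log_density + rng.normal(0, 0.1, 25)   # simulated around slope -3/2

slope, intercept = np.polyfit(log_density, log_biomass, 1)
print(f"log-linear self-thinning slope: {slope:.2f}, intercept: {intercept:.2f}")
```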

7.
Goodman LA. Biometrics 1983, 39(1):149-160.
To analyze the dependence of a qualitative (dichotomous or polytomous) response variable upon one or more qualitative explanatory variables, log-linear models for frequencies are compared with log-linear models for odds, when the categories of the response variable are ordered and the categories of each explanatory variable may be either ordered or unordered. The log-linear models for odds express the odds (or log odds) pertaining to adjacent response categories in terms of appropriate multiplicative (or additive) factors. These models include the 'null log-odds model', the 'uniform log-odds model', the 'parallel log-odds model', and other log-linear models for the odds. With these models, the dependence of the response variable (with ordered categories) can be analyzed in a manner analogous to the usual multiple regression analysis and related analysis of variance and analysis of covariance. Application of log-linear models for the odds sheds light on earlier applications of log-linear models for the frequencies in contingency tables with ordered categories.
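As a rough illustration of a uniform (linear-by-linear) association model of the kind discussed here, the sketch below fits a Poisson log-linear model with integer scores to a small hypothetical table; with unit-spaced scores, all adjacent-category log odds ratios share a single parameter.

```python
# Linear-by-linear association log-linear model fitted via Poisson regression.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rows, cols = 3, 4
grid = [(i, j) for i in range(rows) for j in range(cols)]
counts = [20, 15, 10, 5, 12, 18, 14, 8, 6, 10, 16, 20]   # hypothetical 3x4 table

data = pd.DataFrame(grid, columns=["x", "y"])
data["count"] = counts
data["xy"] = data["x"] * data["y"]   # linear-by-linear term with integer scores

# Log-linear model: log(m_ij) = mu + row_i + col_j + beta * i * j.
fit = smf.glm("count ~ C(x) + C(y) + xy", data=data,
              family=sm.families.Poisson()).fit()

# exp(beta) is the common odds ratio for any 2x2 block of adjacent categories.
print("common adjacent-category log odds ratio:", fit.params["xy"])
```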

8.
Coull BA, Agresti A. Biometrics 1999, 55(1):294-301.
We examine issues in estimating population size N with capture-recapture models when there is variable catchability among subjects. We focus on a logistic-normal mixed model, for which the logit of the probability of capture is an additive function of a random subject effect and a fixed sampling occasion parameter. When the probability of capture is small or the degree of heterogeneity is large, the log-likelihood surface is relatively flat and it is difficult to obtain much information about N. We also discuss a latent class model and a log-linear model that account for heterogeneity and show that the log-linear model has greater scope. Models assuming homogeneity provide much narrower intervals for N but are usually overly optimistic, the actual coverage probability being much lower than the nominal level.

9.
Doubling time has been widely used to represent the growth pattern of cells. A traditional method for finding the doubling time is based on gray-scaled cells, for which a logarithmically transformed scale is used. As an alternative statistical method, the log-linear model was recently proposed, for which actual cell numbers are used instead of the transformed gray-scaled cells. In this paper, I extend the log-linear model and propose the extended log-linear model. This model is designed for extra-Poisson variation, for which the log-linear model produces a less appropriate estimate of the doubling time. Moreover, I compare the statistical properties of the gray-scaled method, the log-linear model, and the extended log-linear model. For this purpose, I perform a Monte Carlo simulation study with three data-generating models: the additive error model, the multiplicative error model, and the overdispersed Poisson model. From the simulation study, I found that the gray-scaled method depends strongly on the normality assumption of the gray-scaled cells; hence, this method is appropriate when the error model is multiplicative with log-normally distributed errors. However, it is less efficient for other types of error distributions, especially when the error model is additive or the errors follow the Poisson distribution. The estimated standard error for the doubling time is not accurate in this case. The log-linear model was found to be efficient when the errors follow the Poisson distribution or a nearly Poisson distribution. The efficiency of the log-linear model decreased as the overdispersion increased, compared to the extended log-linear model. When the error model is additive or multiplicative with Gamma-distributed errors, the log-linear model is more efficient than the gray-scaled method. The extended log-linear model performs well overall for all three data-generating models. A loss of efficiency of the extended log-linear model is observed only when the error model is multiplicative with log-normally distributed errors, where the gray-scaled method is appropriate. However, the extended log-linear model is more efficient than the log-linear model in this case.
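A minimal sketch of the log-linear approach to doubling-time estimation is given below, using simulated counts; the quasi-Poisson scale adjustment merely stands in for the extended model's treatment of extra-Poisson variation and is not the paper's exact estimator.

```python
# Doubling time from a Poisson log-linear model of cell counts over time.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
hours = np.arange(0, 48, 4).astype(float)
true_rate = np.log(2) / 12.0                       # true doubling time: 12 hours
counts = rng.poisson(200 * np.exp(true_rate * hours))

X = sm.add_constant(hours)

# Log-linear model: log E[N_t] = b0 + b1 * t, so doubling time = log(2) / b1.
poisson_fit = sm.GLM(counts, X, family=sm.families.Poisson()).fit()
# Quasi-Poisson style fit: same point estimates, standard errors scaled by the
# Pearson chi-square dispersion estimate (a stand-in for handling overdispersion).
quasi_fit = sm.GLM(counts, X, family=sm.families.Poisson()).fit(scale="X2")

b1 = poisson_fit.params[1]
print("estimated doubling time (h):", np.log(2) / b1)
print("Poisson SE:", poisson_fit.bse[1], "quasi-Poisson SE:", quasi_fit.bse[1])
```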

10.
Espeland MA, Hui SL. Biometrics 1987, 43(4):1001-1012.
Misclassification is a common source of bias and reduced efficiency in the analysis of discrete data. Several methods have been proposed to adjust for misclassification using information on error rates (i) gathered by resampling the study population, (ii) gathered by sampling a separate population, or (iii) assumed a priori. We present unified methods for incorporating these types of information into analyses based on log-linear models and maximum likelihood estimation. General variance expressions are developed. Examples from epidemiologic studies are used to demonstrate the proposed methodology.
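The sketch below illustrates only the simplest moment-style version of case (iii), with a misclassification matrix assumed a priori and hypothetical counts; the paper's log-linear maximum likelihood machinery and variance expressions are not reproduced.

```python
# Matrix-inversion correction of observed category counts for known
# misclassification rates (rates and counts are hypothetical).
import numpy as np

# M[i, j] = P(classified as category i | true category j); columns sum to 1.
M = np.array([[0.9, 0.2],
              [0.1, 0.8]])
observed = np.array([530.0, 470.0])   # observed category counts

# Expected observed counts are M @ true counts, so a moment-style correction
# inverts M (maximum likelihood refines this and yields proper variances).
true_est = np.linalg.solve(M, observed)
print("misclassification-adjusted counts:", true_est)
```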

11.
The effect of incubation temperature, before and after a heat shock, on the thermotolerance of Listeria monocytogenes at 58°C was investigated. Exposing cells grown at 10°C and 30°C to a heat shock resulted in similar rises in thermotolerance, while the increase was significantly higher when cells were grown at 4°C prior to the heat shock. Cells held at 4°C and 10°C after the heat shock maintained heat shock-induced thermotolerance for longer than cells held at 30°C. The growth temperature prior to inactivation had a negligible effect on the persistence of heat shock-induced thermotolerance. Concurrent with the measurements of thermotolerance were measurements of the levels of heat shock-induced proteins. Major proteins showing increased synthesis upon heat shock had approximate molecular weights of 84, 74, 63, 25 and 19 kDa. There was little correlation between the loss of thermotolerance after the heat shock and the levels of these proteins. The thermotolerance of heat-shocked and non-heat-shocked cells was described by traditional log-linear kinetics and by a model describing a sigmoidal death curve (logistic model). Employing log-linear kinetics resulted in a poor fit to a major part of the data, whereas a good fit was achieved by the use of a logistic model.
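The sketch below fits the two curve forms mentioned here, log-linear kinetics and a logistic (sigmoidal) death curve, to hypothetical survivor data; the parameterizations are common textbook forms and may differ from those used in the study.

```python
# Compare log-linear and logistic survivor curves on hypothetical data.
import numpy as np
from scipy.optimize import curve_fit

t = np.arange(9, dtype=float)                                     # minutes at 58 C
log_survivors = np.array([8.0, 7.9, 7.5, 6.5, 5.0, 3.5, 2.4, 2.0, 1.9])

def log_linear(t, logN0, D):
    # Straight-line survivor curve with decimal reduction time D.
    return logN0 - t / D

def logistic(t, logN0, logNend, k, tm):
    # Sigmoidal (logistic) death curve between an upper and a lower asymptote.
    return logNend + (logN0 - logNend) / (1.0 + np.exp(k * (t - tm)))

p_lin, _ = curve_fit(log_linear, t, log_survivors, p0=[8.0, 1.0])
p_log, _ = curve_fit(logistic, t, log_survivors, p0=[8.0, 2.0, 1.5, 4.0])

for name, pred in [("log-linear", log_linear(t, *p_lin)),
                   ("logistic", logistic(t, *p_log))]:
    rss = np.sum((log_survivors - pred) ** 2)
    print(f"{name} residual sum of squares: {rss:.3f}")
```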

12.
In Bayesian divergence time estimation methods, incorporating calibrating information from the fossil record is commonly done by assigning prior densities to ancestral nodes in the tree. Calibration prior densities are typically parametric distributions offset by minimum age estimates provided by the fossil record. Specification of the parameters of calibration densities requires the user to quantify his or her prior knowledge of the age of the ancestral node relative to the age of its calibrating fossil. The values of these parameters can, potentially, result in biased estimates of node ages if they lead to overly informative prior distributions. Accordingly, determining parameter values that lead to adequate prior densities is not straightforward. In this study, I present a hierarchical Bayesian model for calibrating divergence time analyses with multiple fossil age constraints. This approach applies a Dirichlet process prior as a hyperprior on the parameters of calibration prior densities. Specifically, this model assumes that the rate parameters of exponential prior distributions on calibrated nodes are distributed according to a Dirichlet process, whereby the rate parameters are clustered into distinct parameter categories. Both simulated and biological data are analyzed to evaluate the performance of the Dirichlet process hyperprior. Compared with fixed exponential prior densities, the hierarchical Bayesian approach results in more accurate and precise estimates of internal node ages. When this hyperprior is applied using Markov chain Monte Carlo methods, the ages of calibrated nodes are sampled from mixtures of exponential distributions and uncertainty in the values of calibration density parameters is taken into account.

13.
The consistency of the species abundance distribution across diverse communities has attracted widespread attention. In this paper, I argue that the consistency of pattern arises because diverse ecological mechanisms share a common symmetry with regard to measurement scale. By symmetry, I mean that different ecological processes preserve the same measure of information and lose all other information in the aggregation of various perturbations. I frame these explanations of symmetry, measurement, and aggregation in terms of a recently developed extension to the theory of maximum entropy. I show that the natural measurement scale for the species abundance distribution is log-linear: the information in observations at small population sizes scales logarithmically and, as population size increases, the scaling of information grades from logarithmic to linear. Such log-linear scaling leads naturally to a gamma distribution for species abundance, which matches well with the observed patterns. Much of the variation between samples can be explained by the magnitude at which the measurement scale grades from logarithmic to linear. This measurement approach can be applied to the similar problem of allelic diversity in population genetics and to a wide variety of other patterns in biology.
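The sketch below illustrates the gamma form implied by a log-linear measurement scale by fitting a gamma distribution to a simulated abundance vector; the data and parameter values are assumptions, not results from the paper.

```python
# Gamma species-abundance sketch: p(n) ~ n^(a-1) * exp(-b*n) combines a
# logarithmic term (dominant at small abundance) and a linear term
# (dominant at large abundance). The abundance vector here is simulated.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
abundances = rng.gamma(shape=0.8, scale=50.0, size=500)   # hypothetical community

# Fit a gamma distribution (location fixed at zero) to the abundances.
shape, loc, scale = stats.gamma.fit(abundances, floc=0)
print(f"log-term exponent (shape - 1): {shape - 1:.2f}, linear rate 1/scale: {1/scale:.4f}")
# A shape parameter below 1 corresponds to a community dominated by rare species.
```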

14.
Tamura RN, Young SS. Biometrics 1986, 42(2):343-349.
Empirical Bayes procedures for incorporating historical control information in bioassay carcinogenesis studies are receiving attention in the literature. In general, the empirical Bayes methods fail to take into account the error in estimating the parameters of a prior distribution. The implications of this are studied for the beta prior of Tarone (1982, Biometrics 38, 215-220). Using simulations, we show that the skewness in the maximum likelihood estimators for the parameters of the beta prior increases the false positive rate in the test of dose-related trend.

15.
Analysis of variance (ANOVA) and log-linear analyses of time-budget data from a study of sloth bear enclosure utilization are compared. Two sampling models that plausibly underlie such data are discussed. Either could lead to an analysis of variance, but only one to a log-linear analysis. Given an appropriate sampling model and appropriate data, there is much to recommend log-linear analysis, despite its unfamiliarity to most animal behaviorists. One need not worry whether distribution assumptions are violated. Moreover, the data analyzed are the data collected, not estimates derived from those data, and thus no power is lost through a data reduction step. No matter what analysis is used, effect size should be taken into consideration. Multiple R2 can be used for ANOVA, but no directly comparable statistic exists for log-linear analyses. One possible candidate for a log-linear R2 analog is discussed here, and appears to give sensible and interpretable results.
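One commonly used analogue, the proportional reduction in deviance R2 = 1 - D(model)/D(null), is sketched below on simulated time-budget counts; it is not necessarily the specific candidate statistic proposed in the paper.

```python
# Deviance-based R2 analogue for a Poisson log-linear model of simulated
# time-budget counts (visits by enclosure zone and time of day).
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
zones, periods = ["A", "B", "C", "D"], ["morning", "midday", "evening"]
df = pd.DataFrame([(z, p) for z in zones for p in periods], columns=["zone", "period"])
df["count"] = rng.poisson([30, 10, 25, 5, 5, 10, 40, 20, 35, 8, 6, 12])

full = smf.glm("count ~ zone + period", data=df, family=sm.families.Poisson()).fit()
null = smf.glm("count ~ 1", data=df, family=sm.families.Poisson()).fit()

r2_dev = 1 - full.deviance / null.deviance
print(f"deviance-based R2 analogue: {r2_dev:.3f}")
```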

16.
In studies of complex health conditions, mixtures of discrete outcomes (event time, count, binary, ordered categorical) are commonly collected. For example, studies of skin tumorigenesis record latency time prior to the first tumor, increases in the number of tumors at each week, and the occurrence of internal tumors at the time of death. Motivated by this application, we propose a general underlying Poisson variable framework for mixed discrete outcomes, accommodating dependency through an additive gamma frailty model for the Poisson means. The model has log-linear, complementary log-log, and proportional hazards forms for count, binary and discrete event time outcomes, respectively. Simple closed form expressions can be derived for the marginal expectations, variances, and correlations. Following a Bayesian approach to inference, conditionally-conjugate prior distributions are chosen that facilitate posterior computation via an MCMC algorithm. The methods are illustrated using data from a Tg.AC mouse bioassay study.

17.
In biomarker discovery, applying domain knowledge is an effective approach to eliminating false positive features, prioritizing functionally impactful markers and facilitating the interpretation of predictive signatures. Several computational methods have been developed that formulate knowledge-based biomarker discovery as a feature selection problem guided by prior information. These methods often require that prior information be encoded as a single score, and the algorithms are optimized for biological knowledge of a specific type. In practice, however, domain knowledge from diverse resources can provide complementary information, and no current methods can integrate heterogeneous prior information for biomarker discovery. To address this problem, we developed the Know-GRRF (knowledge-guided regularized random forest) method that enables dynamic incorporation of domain knowledge from multiple disciplines to guide feature selection. Know-GRRF embeds domain knowledge in a regularized random forest framework. It combines prior information from multiple domains in a linear model to derive a composite score, which, together with other tuning parameters, controls the regularization of the random forests model. Know-GRRF concurrently optimizes the weight given to each type of domain knowledge and other tuning parameters to minimize the AIC of out-of-bag predictions. The objective is to select a compact feature subset that has high discriminative power and strong functional relevance to the biological phenotype. Via rigorous simulations, we show that Know-GRRF guided by multiple-domain prior information outperforms feature selection methods guided by single-domain prior information or no prior information. We then applied Know-GRRF to a real-world study to identify prognostic biomarkers of prostate cancer. We evaluated the combination of cancer-related gene annotations, evolutionary conservation and pre-computed statistical scores as the prior knowledge to assemble a panel of biomarkers. We discovered a compact set of biomarkers with significant improvements in prediction accuracy. Know-GRRF is a powerful novel method for incorporating knowledge from multiple domains for feature selection. It has a broad range of applications in biomarker discovery. We implemented this method and released the KnowGRRF package on CRAN.

18.
Risk mapping in epidemiology enables areas with a low or high risk of disease contamination to be localized and provides a measure of risk differences between these regions. Risk mapping models for pooled data currently used by epidemiologists focus on the estimated risk for each geographical unit. They are based on a Poisson log-linear mixed model with a latent intrinsic continuous hidden Markov random field (HMRF) generally corresponding to a Gaussian autoregressive spatial smoothing. Risk classification, which is necessary to draw clearly delimited risk zones (in which protection measures may be applied), generally must be performed separately. We propose a method for direct classified risk mapping based on a Poisson log-linear mixed model with a latent discrete HMRF. The discrete hidden field (HF) corresponds to the assignment of each spatial unit to a risk class. The risk values attached to the classes are parameters and are estimated. When mapping risk using HMRFs, the conditional distribution of the observed field is modeled with a Poisson rather than a Gaussian distribution as in image segmentation. Moreover, abrupt changes in risk levels are rare in disease maps. The spatial hidden model should favor smoothed out risks, but conventional discrete Markov random fields (e.g. the Potts model) do not impose this. We therefore propose new potential functions for the HF that take into account class ordering. We use a Monte Carlo version of the expectation-maximization algorithm to estimate parameters and determine risk classes. We illustrate the method's behavior on simulated and real data sets. Our method appears particularly well adapted to localize high-risk regions and estimate the corresponding risk levels.

19.
Schweder T. Biometrics 2003, 59(4):974-983.
Maximum likelihood estimates of abundance are obtained from repeated photographic surveys of a closed stratified population with naturally marked and unmarked individuals. Capture intensities are assumed log-linear in stratum, year, and season. In the chosen model, an approximate confidence distribution for total abundance of bowhead whales, with an accompanying likelihood reduced of nuisance parameters, is found from a parametric bootstrap experiment. The confidence distribution depends on the assumed study protocol. A confidence distribution that is exact (except for the effect of discreteness) is found by conditioning in the unstratified case without unmarked individuals.

20.
The heterogeneous Poisson process with a discretized exponential quadratic rate function is considered. Maximum likelihood estimates of the parameters of the rate function are derived for the case when the data consist of numbers of occurrences in consecutive equal time periods. A likelihood ratio test of the null hypothesis of an exponential quadratic rate is presented. Its power against exponential linear rate functions is estimated using Monte Carlo simulation. The maximum likelihood method is compared with a log-linear least-squares technique. An application of the technique to the analysis of mortality rates due to congenital malformations is presented.
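A sketch of the estimation and test described here, on simulated data: counts in consecutive equal periods are modeled as Poisson with a log rate quadratic in time, and the quadratic model is compared with the linear one by a likelihood ratio test; all numbers are illustrative.

```python
# Poisson process with exponential quadratic rate: MLE via Poisson GLM and a
# likelihood ratio test of quadratic versus linear log rate.
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(4)
t = np.arange(30, dtype=float)                         # consecutive equal time periods
true_log_rate = 1.5 + 0.15 * t - 0.004 * t**2
counts = rng.poisson(np.exp(true_log_rate))

X_quad = sm.add_constant(np.column_stack([t, t**2]))
X_lin = sm.add_constant(t)

quad = sm.GLM(counts, X_quad, family=sm.families.Poisson()).fit()
lin = sm.GLM(counts, X_lin, family=sm.families.Poisson()).fit()

# Likelihood ratio statistic = difference in deviances, chi-square with 1 df.
lr = lin.deviance - quad.deviance
p_value = stats.chi2.sf(lr, df=1)
print(f"LR statistic: {lr:.2f}, p-value: {p_value:.4f}")
```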
