Similar articles
 20 similar articles found.
1.
The aggregate data study design (Prentice and Sheppard, 1995, Biometrika 82, 113-125) estimates individual-level exposure effects by regressing population-based disease rates on covariate data from survey samples in each population group. In this work, we further develop the aggregate data model to allow for residual spatial correlation among disease rates across populations. Geographical variation that is not explained by model predictors and has a spatial component often arises in studies of rare chronic diseases, such as breast cancer. We combine the aggregate and Bayesian disease-mapping models to provide an intuitive approach to the modeling of spatial effects while drawing correct inference regarding the exposure effect. Based on the results of simulation studies, we suggest guidelines for use of the proposed model.

2.
Lin, Pei-Sheng. Biometrika 2008, 95(4):847-858
We use the quasilikelihood concept to propose an estimating equation for spatial data with correlation across the study region in a multi-dimensional space. With appropriate mixing conditions, we develop a central limit theorem for a random field under various Lp metrics. The consistency and asymptotic normality of quasilikelihood estimators can then be derived. We also conduct simulations to evaluate the performance of the proposed estimating equation, and a dataset from East Lansing Woods is used to illustrate the method.

3.
Summary. In this article, we present new methods to analyze data from an experiment using rodent models to investigate the role of p27, an important cell-cycle mediator, in early colon carcinogenesis. The responses modeled here are essentially functions nested within a two-stage hierarchy. Standard functional data analysis literature focuses on a single stage of hierarchy and conditionally independent functions with near white noise. However, in our experiment, there is substantial biological motivation for the existence of spatial correlation among the functions, which arise from the locations of biological structures called colonic crypts: this possible functional correlation is a phenomenon we term crypt signaling. Thus, as a point of general methodology, we require an analysis that allows for functions to be correlated at the deepest level of the hierarchy. Our approach is fully Bayesian and uses Markov chain Monte Carlo methods for inference and estimation. Analysis of this data set gives new insights into the structure of p27 expression in early colon carcinogenesis and suggests the existence of significant crypt signaling. Our methodology uses regression splines, and because of the hierarchical nature of the data, dimension reduction of the covariance matrix of the spline coefficients is important: we suggest simple methods for overcoming this problem.

4.
Significance testing for correlated binary outcome data (total citations: 1; self-citations: 0; citations by others: 1)
B Rosner, R C Milton. Biometrics 1988, 44(2):505-512
Multiple logistic regression is a commonly used multivariate technique for analyzing data with a binary outcome. One assumption needed for this method of analysis is the independence of outcome for all sample points in a data set. In ophthalmologic data and other types of correlated binary data, this assumption is often grossly violated and the validity of the technique becomes an issue. A technique has been developed (Rosner, 1984) that utilizes a polychotomous logistic regression model to allow one to look at multiple exposure variables in the context of a correlated binary data structure. This model is an extension of the beta-binomial model, which has been widely used to model correlated binary data when no covariates are present. In this paper, a relationship is developed between the two techniques, whereby it is shown that use of ordinary logistic regression in the presence of correlated binary data can result in true significance levels that are considerably larger than nominal levels in frequently encountered situations. This relationship is explored in detail in the case of a single dichotomous exposure variable. In this case, the appropriate test statistic can be expressed as an adjusted chi-square statistic based on the 2 X 2 contingency table relating exposure to outcome. The test statistic is easily computed as a function of the ordinary chi-square statistic and the correlation between eyes (or more generally between cluster members) for outcome and exposure, respectively. This generalizes some previous results obtained by Koval and Donner (1987, in Festschrift for V. M. Joshi, I. B. MacNeill (ed.), Vol. V, 199-224. (ABSTRACT TRUNCATED AT 250 WORDS)

5.
Bhoj, Dinesh S. Biometrika 1984, 71(3):639-641

6.
On testing equality of means of correlated variables with incomplete data (total citations: 1; self-citations: 0; citations by others: 1)
Naik, Umesh D. Biometrika 1975, 62(3):615-622

7.
Community data are often transformed or standardized to meet the requirements and assumptions of multivariate analysis. While these methods are usually appropriate for abundance data, they are seldom applied to presence-absence data. Here, a method of transforming a binary matrix using the binomial probability is described. Number of trials (n), number of successes (x) and probability of success (p) are necessary to compute the binomial probability. Successes were defined as the number of sites where the species occurrence can be considered; trials were equal to or greater than the number of successes. The actual occurrence of each species along the gradient was considered the probability of success. The Mantel statistic associated with the binomially transformed distance matrix and the distance matrix based on binary data were used to choose an appropriate binomial transformation. The chosen binomial transformation gave greater value to species indicating habitat typologies. Binomially transformed data rendered results closer to expectations.
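
The binomial probability named in this abstract can be sketched as follows. This is only the standard binomial formula; the function name and the example values are hypothetical, and how the authors map sites and occurrences onto n, x and p for each species is not reproduced here.

```python
from math import comb


def binomial_probability(x: int, n: int, p: float) -> float:
    """Return P(X = x) for X ~ Binomial(n, p)."""
    if not 0 <= x <= n:
        raise ValueError("x must satisfy 0 <= x <= n")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    # Number of ways to place x successes among n trials,
    # times the probability of any one such arrangement.
    return comb(n, x) * p**x * (1.0 - p) ** (n - x)


# Hypothetical species: 3 successes in 10 trials, success probability 0.3.
example_value = binomial_probability(3, 10, 0.3)
```

A transformed matrix would then replace each binary entry with a value derived from such probabilities; the exact rule is left to the original paper.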

8.
The use of survival models involving a random effect or 'frailty' term is becoming more common. Usually the random effects are assumed to represent different clusters, and clusters are assumed to be independent. In this paper, we consider random effects corresponding to clusters that are spatially arranged, such as clinical sites or geographical regions. That is, we might suspect that random effects corresponding to strata in closer proximity to each other might also be similar in magnitude. Such spatial arrangement of the strata can be modeled in several ways, but we group these ways into two general settings: geostatistical approaches, where we use the exact geographic locations (e.g. latitude and longitude) of the strata, and lattice approaches, where we use only the positions of the strata relative to each other (e.g. which counties neighbor which others). We compare our approaches in the context of a dataset on infant mortality in Minnesota counties between 1992 and 1996. Our main substantive goal here is to explain the pattern of infant mortality using important covariates (sex, race, birth weight, age of mother, etc.) while accounting for possible (spatially correlated) differences in hazard among the counties. We use the GIS ArcView to map resulting fitted hazard rates, to help search for possible lingering spatial correlation. The DIC criterion (Spiegelhalter et al., Journal of the Royal Statistical Society, Series B 2002, to appear) is used to choose among various competing models. We investigate the quality of fit of our chosen model, and compare its results when used to investigate neonatal versus post-neonatal mortality. We also compare use of our time-to-event outcome survival model with the simpler dichotomous outcome logistic model. Finally, we summarize our findings and suggest directions for future research.

9.
Summary. An index to assess trophic diversity from presence-absence food data is proposed. The index is computed according to the expression , where the 's are the frequencies of occurrence of the various prey categories. The upper and lower limits of D are derived. A test of the reliability of D was carried out by comparing D and H (Shannon's information function) values obtained from a set of twenty-three food analyses from vertebrate animals. Results show that, although a significant correlation exists between D and H, only a small fraction of H-variation is explained by D-variation. D contains two kinds of information, one referred to species richness and another relative to the degree of between-samples heterogeneity. The former is shared in common with H and this presumably explains the fairly weak correlation found between both measures.
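
The comparison measure H mentioned in this abstract is Shannon's information function. A minimal sketch of H is given below, assuming natural logarithms and raw counts per prey category as input (both are assumptions; the abstract does not state the base or the input form). The index D itself is not sketched because its expression is missing from this listing.

```python
from math import log


def shannon_h(counts):
    """Shannon's H = -sum(p_i * ln p_i) over categories with nonzero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    # Convert counts to proportions, skipping empty categories (0 * ln 0 := 0).
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * log(p) for p in proportions)


# Hypothetical diet with four equally frequent prey categories.
h_even = shannon_h([5, 5, 5, 5])
```

With equal proportions H reaches its maximum, ln(k) for k categories; a diet concentrated on one category gives H = 0.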

10.
GeneMerge--post-genomic analysis, data mining, and hypothesis testing (total citations: 6; self-citations: 0; citations by others: 6)
SUMMARY: GeneMerge is a web-based and standalone program written in PERL that returns a range of functional and genomic data for a given set of study genes and provides statistical rank scores for over-representation of particular functions or categories in the data set. Functional or categorical data of all kinds can be analyzed with GeneMerge, facilitating regulatory and metabolic pathway analysis, tests of population genetic hypotheses, cross-experiment comparisons, and tests of chromosomal clustering, among others. GeneMerge can perform analyses on a wide variety of genomic data quickly and easily and facilitates both data mining and hypothesis testing. AVAILABILITY: GeneMerge is available free of charge for academic use over the web and for download from: http://www.oeb.harvard.edu/hartl/lab/publications/GeneMerge.html.
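
One common way to score over-representation of a category in a study set is a hypergeometric tail probability. The sketch below illustrates that general technique only; it is not GeneMerge's code, and whether it matches GeneMerge's exact computation or any multiple-testing correction is an assumption. All names and numbers are hypothetical.

```python
from math import comb


def overrepresentation_p_value(population: int, annotated: int,
                               study: int, study_annotated: int) -> float:
    """P(X >= study_annotated) when `study` genes are drawn without
    replacement from `population` genes, `annotated` of which carry
    the category of interest."""
    upper = min(annotated, study)
    total_draws = comb(population, study)
    # Sum the hypergeometric probabilities from the observed count upward.
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, study - i)
        for i in range(study_annotated, upper + 1)
    )
    return favourable / total_draws


# Hypothetical example: 40 of 6000 genes carry a category; 8 of 100 study genes do.
p_example = overrepresentation_p_value(6000, 40, 100, 8)
```

A small value suggests that the category occurs in the study set more often than random sampling from the population would explain.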

11.
Demographic studies focusing on age-specific mortality rates are becoming increasingly common throughout the fields of life-history evolution, ecology and biogerontology. Well-defined statistical techniques for quantifying patterns of mortality within a cohort and identifying differences in age-specific mortality among cohorts are needed. Here I discuss using maximum likelihood (ML) statistical methods to estimate the parameters of mathematical models, which are used to describe the change in mortality with age. ML provides a convenient and powerful framework for choosing an adequate mortality model, estimating model parameters and testing hypotheses about differences in parameters among experimental or ecological treatments. Simulations suggest that experiments designed to estimate age-specific mortality should involve at least 100-500 individuals per cohort per treatment. Significant bias in the estimation of model parameters is introduced when the mortality model is misspecified and samples are too small to detect the true mortality pattern. Furthermore, the lack of simple and efficient procedures for comparing different mortality models has forced the use of the Gompertz model, which specifies an exponentially increasing mortality with age, and which may not apply to the majority of experimental systems.
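
The Gompertz model and its likelihood can be sketched as below, assuming the usual parameterization with hazard a*exp(b*x) and exact, uncensored ages at death. The parameter names `a` and `b` and the sample ages are illustrative assumptions, not taken from the paper.

```python
from math import exp, log


def gompertz_hazard(age: float, a: float, b: float) -> float:
    """Mortality rate at a given age: a * exp(b * age)."""
    return a * exp(b * age)


def gompertz_survival(age: float, a: float, b: float) -> float:
    """Probability of surviving to `age`: exp(-(a / b) * (exp(b * age) - 1))."""
    return exp(-(a / b) * (exp(b * age) - 1.0))


def gompertz_log_likelihood(death_ages, a: float, b: float) -> float:
    """Log-likelihood of exact death ages: sum of log hazard + log survival."""
    return sum(
        log(a) + b * x - (a / b) * (exp(b * x) - 1.0)
        for x in death_ages
    )


# Hypothetical cohort: compare two candidate parameter sets on the same data.
ages = [12.0, 18.0, 21.0, 25.0, 30.0]
ll_first = gompertz_log_likelihood(ages, a=0.005, b=0.12)
ll_second = gompertz_log_likelihood(ages, a=0.05, b=0.01)
```

Maximizing this function over `a` and `b` with a numerical optimizer gives the ML estimates; likelihoods of nested models fitted this way can then be compared with a likelihood ratio test.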

12.
Zhu J, Eickhoff JC, Yan P. Biometrics 2005, 61(3):674-683
Observations of multiple-response variables across space and over time occur often in environmental and ecological studies. Compared to purely spatial models for a single response variable in the exponential family of distributions, fewer statistical tools are available for multiple-response variables that are not necessarily Gaussian. An exception is a common-factor model developed for multivariate spatial data by Wang and Wall (2003, Biostatistics 4, 569-582). The purpose of this article is to extend this multivariate space-only model and develop a flexible class of generalized linear latent variable models for multivariate spatial-temporal data. For statistical inference, maximum likelihood estimates and their standard deviations are obtained using a Monte Carlo EM algorithm. We also use a novel way to automatically adjust the Monte Carlo sample size, which facilitates the convergence of the Monte Carlo EM algorithm. The methodology is illustrated by an ecological study of red pine trees in response to bark beetle challenges in a forest stand of Wisconsin.

13.
14.
15.

Aim

To assess whether flexible species distribution models that perform well at nearby testing locations still perform strongly when evaluated on spatially separated testing data.

Location

Australian Wet Tropics (AWT), Ontario, Canada (CAN), north-east New South Wales, Australia (NSW), New Zealand (NZ), five countries of South America (SA), and Switzerland (SWI).

Time period

Most species data were collected between 1950 and 2000.

Major taxa studied

Birds, mammals, plants and reptiles.

Methods

We compared 10 species distribution modelling methods with varying flexibility in terms of the allowed complexity of their fitted functions [boosted regression trees (BRT), generalized additive model (GAM), multivariate adaptive regression splines (MARS), maximum entropy (MaxEnt), support vector machine (SVM), variants of generalized linear model (GLM) and random forest (RF), and an Ensemble model]. We used established practices for model selection to avoid overfitting, including parameter tuning in learning methods. Models were trained on presence–background data for 171 species and tested on presence–absence data. Training and testing data were separated using both random and spatial partitioning, the latter based on 75-km blocks. We calculated the average performance and mean rank of the methods (focussing on the area under the receiver operating characteristic and precision-recall gain curves, and correlation) and assessed the statistical significance of the differences between them.

Results

The ranking of methods did not change when evaluated on spatially separated testing data. Methods with the strongest predictive performance were nonparametric methods known to be flexible. An ensemble formed by averaging predictions of five pre-selected modelling methods was the best model in both random and spatial partitioning, followed by MaxEnt and a variant of random forest.

Main conclusions

Whilst some modellers expect methods limited to simple smooth functions to predict better spatially separated data, we found no evidence of that using blocks of 75 km. We conclude that flexible models that are tuned well enough to avoid overfitting are effective at predicting to spatially distinct areas.
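
Spatial partitioning by 75-km blocks, as described in the Methods of this abstract, might look roughly like the sketch below. It assumes coordinates already projected to kilometres and assigns blocks to folds in round-robin order; the fold-assignment rule, the function names, and the sample points are assumptions, since the abstract does not say how blocks were allocated.

```python
from math import floor


def block_id(x_km: float, y_km: float, size_km: float = 75.0):
    """Index of the square block (column, row) containing a point."""
    return (floor(x_km / size_km), floor(y_km / size_km))


def spatial_folds(points, n_folds: int = 5, size_km: float = 75.0):
    """Assign each point a fold so that all points in one block share a fold."""
    blocks = sorted({block_id(x, y, size_km) for x, y in points})
    # Round-robin allocation of blocks to folds (an illustrative choice).
    fold_of_block = {b: i % n_folds for i, b in enumerate(blocks)}
    return [fold_of_block[block_id(x, y, size_km)] for x, y in points]


# Hypothetical occurrence records in projected kilometre coordinates.
records = [(10.0, 10.0), (20.0, 30.0), (80.0, 10.0), (160.0, 10.0)]
folds = spatial_folds(records, n_folds=2)
```

Because whole blocks are held out together, test points are separated from training points by block boundaries, unlike a random split where neighbours can fall on both sides.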

16.
17.
Modeling functional data with spatially heterogeneous shape characteristics (total citations: 1; self-citations: 0; citations by others: 1)
We propose a novel class of models for functional data exhibiting skewness or other shape characteristics that vary with spatial or temporal location. We use copulas so that the marginal distributions and the dependence structure can be modeled independently. Dependence is modeled with a Gaussian or t-copula, so that there is an underlying latent Gaussian process. We model the marginal distributions using the skew t family. The mean, variance, and shape parameters are modeled nonparametrically as functions of location. A computationally tractable inferential framework for estimating heterogeneous asymmetric or heavy-tailed marginal distributions is introduced. This framework provides a new set of tools for increasingly complex data collected in medical and public health studies. Our methods were motivated by and are illustrated with a state-of-the-art study of neuronal tracts in multiple sclerosis patients and healthy controls. Using the tools we have developed, we were able to find those locations along the tract most affected by the disease. However, our methods are general and highly relevant to many functional data sets. In addition to the application to one-dimensional tract profiles illustrated here, higher-dimensional extensions of the methodology could have direct applications to other biological data including functional and structural magnetic resonance imaging (MRI).

18.
This article examines group testing procedures where units within a group (or pool) may be correlated. The expected number of tests per unit (i.e., efficiency) of hierarchical- and matrix-based procedures is derived based on a class of models of exchangeable binary random variables. The effect on efficiency of the arrangement of correlated units within pools is then examined. In general, when correlated units are arranged in the same pool, the expected number of tests per unit decreases, sometimes substantially, relative to arrangements that ignore information about correlation.

19.
Simple analytical models assuming homogeneous space have been used to examine the effects of habitat loss and fragmentation on metapopulation size. The models predict an extinction threshold, a critical amount of suitable habitat below which the metapopulation goes deterministically extinct. The consequences of non-random loss of habitat for species with localized dispersal have been studied mainly numerically. In this paper, we present two analytical approaches to the study of habitat loss and its metapopulation dynamic consequences incorporating spatial correlation both in metapopulation dynamics and in the pattern of habitat destruction. One approach is based on a measure called metapopulation capacity, given by the dominant eigenvalue of a "landscape" matrix, which encapsulates the effects of landscape structure on population extinctions and colonizations. The other approach is based on pair approximation. These models allow us to examine analytically the effects of spatial structure in habitat loss on the equilibrium metapopulation size and the threshold condition for persistence. In contrast to the pair approximation based approaches, the metapopulation capacity based approach allows us to consider species with long as well as short dispersal range and landscapes with spatial correlation at different scales. The two methods make dissimilar assumptions, but the broad conclusions concerning the consequences of spatial correlation in the landscape structure are the same. Our results show that increasing correlation in the spatial arrangement of the remaining habitat increases patch occupancy, that this increase is more evident for species with short-range than long-range dispersal, and that to be most beneficial for metapopulation size, the range of spatial correlation in landscape structure should be at least a few times greater than the dispersal range of the species.
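
Metapopulation capacity, described here as the dominant eigenvalue of a landscape matrix, can be sketched with plain power iteration. The matrix form used below (off-diagonal entries exp(-alpha * d_ij) * A_i * A_j, zero diagonal) is an assumption made for illustration and may differ from the matrix used in the paper; the patch coordinates, areas and `alpha` are hypothetical.

```python
from math import exp, hypot, sqrt


def landscape_matrix(coords, areas, alpha: float):
    """Assumed landscape matrix: exp(-alpha * distance) * A_i * A_j, zero diagonal."""
    n = len(coords)
    return [
        [
            0.0 if i == j else
            exp(-alpha * hypot(coords[i][0] - coords[j][0],
                               coords[i][1] - coords[j][1])) * areas[i] * areas[j]
            for j in range(n)
        ]
        for i in range(n)
    ]


def dominant_eigenvalue(matrix, iterations: int = 500) -> float:
    """Power iteration for a symmetric non-negative matrix."""
    n = len(matrix)
    v = [1.0 / sqrt(n)] * n  # unit-length starting vector
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0  # e.g. a single isolated patch
        v = [x / norm for x in w]
    # Rayleigh quotient of the unit-length vector v.
    mv = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(v[i] * mv[i] for i in range(n))


# Hypothetical landscape of three patches (coordinates in km, areas in ha).
patches = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
capacity = dominant_eigenvalue(landscape_matrix(patches, [1.0, 2.0, 1.5], alpha=1.0))
```

Under this assumed matrix, moving the same patches closer together increases every off-diagonal entry and therefore the dominant eigenvalue.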

20.
Efficient measurement error correction with spatially misaligned data (total citations: 1; self-citations: 0; citations by others: 1)
Association studies in environmental statistics often involve exposure and outcome data that are misaligned in space. A common strategy is to employ a spatial model such as universal kriging to predict exposures at locations with outcome data and then estimate a regression parameter of interest using the predicted exposures. This results in measurement error because the predicted exposures do not correspond exactly to the true values. We characterize the measurement error by decomposing it into Berkson-like and classical-like components. One correction approach is the parametric bootstrap, which is effective but computationally intensive since it requires solving a nonlinear optimization problem for the exposure model parameters in each bootstrap sample. We propose a less computationally intensive alternative termed the "parameter bootstrap" that only requires solving one nonlinear optimization problem, and we also compare bootstrap methods to other recently proposed methods. We illustrate our methodology in simulations and with publicly available data from the Environmental Protection Agency.
