Found 20 similar articles (search time: 15 ms)
Summary: Spatial cluster detection is an important methodology for identifying regions with excessive numbers of adverse health events without making strong model assumptions on the underlying spatial dependence structure. Previous work has focused on point or individual-level outcome data, and few advances have been made when the outcome data are reported at an aggregated level, for example, at the county or census-tract level. This article proposes a new class of spatial cluster detection methods for point or aggregate data, comprising continuous, binary, and count data. Compared with existing spatial cluster detection methods, it has the following advantages. First, it readily incorporates region-specific weights, for example, based on a region's population or a region's outcome variance, which is key for aggregate data. Second, the established general framework allows for area-level and individual-level covariate adjustment. A simulation study is conducted to evaluate the performance of the method. The proposed method is then applied to assess spatial clustering of high Body Mass Index in a health maintenance organization population in the Seattle, Washington, USA area.

Spatial cluster detection has become an important methodology in quantifying the effect of hazardous exposures. Previous methods have focused on cross-sectional outcomes that are binary or continuous. There are virtually no spatial cluster detection methods proposed for longitudinal outcomes. This paper proposes a new spatial cluster detection method for repeated outcomes using cumulative geographic residuals. A major advantage of this method is its ability to readily incorporate information on study participants' relocation, which most cluster detection statistics cannot. Application of these methods is illustrated using the Home Allergens and Asthma prospective cohort study, analyzing the relationship between environmental exposures and a repeatedly measured outcome, occurrence of wheeze in the last 6 months, while taking into account mobile locations.

Although many methods have been proposed for analysing point locations for spatial pattern, previous methods have concentrated on clumping and spacing. The study of anisotropy (changes in spatial pattern with direction) in point patterns has been limited by lack of methods explicitly designed for these data and this purpose; researchers have been constrained to choosing arbitrary test directions or converting their data into quadrat counts and using methods designed for continuously distributed data. Wavelet analysis, a booming approach to studying spatial pattern, widely used in mathematics and physics for signal analysis, has started to make its way into the ecological literature. A simple adaptation of wavelet analysis is proposed for the detection of anisotropy in point patterns. The method is illustrated with both simulated and field data. This approach can easily be used for both global and local spatial analysis.

As a useful tool for geographical cluster detection of events, the spatial scan statistic is widely applied in many fields and plays an increasingly important role. The classic version of the spatial scan statistic for the binary outcome was developed by Kulldorff, based on the Bernoulli or the Poisson probability model. In this paper, we apply the Hypergeometric probability model to construct the likelihood function under the null hypothesis. Compared with existing methods, the likelihood function under the null hypothesis is an alternative and indirect method to identify the potential cluster, and the test statistic is the extreme value of the likelihood function. Similar to Kulldorff's methods, we adopt a Monte Carlo test for the test of significance. Both methods are applied to detect spatial clusters of Japanese encephalitis in Sichuan province, China, in 2009, and the detected clusters are identical. A simulation on independent benchmark data indicates that the test statistic based on the Hypergeometric model outperforms Kulldorff's statistics for clusters of high population density or large size; otherwise Kulldorff's statistics are superior.
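For orientation, the baseline that this abstract compares against can be sketched as follows. This is a minimal illustration of Kulldorff's Poisson log-likelihood ratio with a Monte Carlo significance test, not the Hypergeometric statistic proposed in the paper. The function names, the list-of-index-lists representation of candidate zones, and the assumption that every region has a positive population are all choices made here for illustration only.

```python
import math
import random


def poisson_llr(c, e, total):
    """Kulldorff's Poisson log-likelihood ratio for one candidate zone.

    c: observed cases inside the zone
    e: expected cases inside the zone under the null (must be > 0)
    total: total observed cases in the study area
    """
    if c <= e:
        return 0.0  # only zones with elevated risk are of interest
    inside = c * math.log(c / e)
    # When every case falls inside the zone, the outside term vanishes.
    outside = (total - c) * math.log((total - c) / (total - e)) if c < total else 0.0
    return inside + outside


def max_llr(cases, population, zones):
    """Scan statistic: the largest LLR over all candidate zones."""
    total_cases = sum(cases)
    total_pop = sum(population)
    best = 0.0
    for zone in zones:
        c = sum(cases[i] for i in zone)
        e = total_cases * sum(population[i] for i in zone) / total_pop
        best = max(best, poisson_llr(c, e, total_cases))
    return best


def monte_carlo_p_value(cases, population, zones, n_sim=999, seed=0):
    """Monte Carlo p-value: redistribute the cases under the null and re-scan."""
    rng = random.Random(seed)
    observed = max_llr(cases, population, zones)
    total_cases = sum(cases)
    regions = range(len(cases))
    exceed = 0
    for _ in range(n_sim):
        # Null model: each case lands in a region with probability
        # proportional to that region's population.
        sim = [0] * len(cases)
        for i in rng.choices(regions, weights=population, k=total_cases):
            sim[i] += 1
        if max_llr(sim, population, zones) >= observed:
            exceed += 1
    return (exceed + 1) / (n_sim + 1)
```

In practice the candidate zones are usually circles of increasing radius around each region centroid; here they are passed in explicitly so that the sketch stays independent of any particular geometry.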

Gangnon RE. Biometrics. 2012;68(1):174-182
The spatial scan statistic is an important and widely used tool for cluster detection. It is based on the simultaneous evaluation of the statistical significance of the maximum likelihood ratio test statistic over a large collection of potential clusters. In most cluster detection problems, there is variation in the extent of local multiplicity across the study region. For example, using a fixed maximum geographic radius for clusters, urban areas typically have many overlapping potential clusters, whereas rural areas have relatively few. The spatial scan statistic does not account for local multiplicity variation. We describe a previously proposed local multiplicity adjustment based on a nested Bonferroni correction and propose a novel adjustment based on a Gumbel distribution approximation to the distribution of a local scan statistic. We compare the performance of all three statistics in terms of power and a novel unbiased cluster detection criterion. These methods are then applied to the well-known New York leukemia dataset and a Wisconsin breast cancer incidence dataset.

Summary: Identifying homogeneous groups of individuals is an important problem in population genetics. Recently, several methods have been proposed that exploit spatial information to improve clustering algorithms. In this article, we develop a Bayesian clustering algorithm based on the Dirichlet process prior that uses both genetic and spatial information to classify individuals into homogeneous clusters for further study. We study the performance of our method using a simulation study and use our model to cluster wolverines in Western Montana using microsatellite data.

Place cells, spatially responsive hippocampal cells, provide the neural substrate supporting navigation and spatial memory. Historically most studies of these neurons have used electrophysiological recordings from implanted electrodes but optical methods, measuring intracellular calcium, are becoming increasingly common. Several methods have been proposed as a means to identify place cells based on their calcium activity but there is no common standard and it is unclear how reliable different approaches are. Here we tested four methods that have previously been applied to two-photon hippocampal imaging or electrophysiological data, using both model datasets and real imaging data. These methods use different parameters to identify place cells, including the peak activity in the place field, compared to other locations (the Peak method); the stability of cells’ activity over repeated traversals of an environment (Stability method); a combination of these parameters with the size of the place field (Combination method); and the spatial information held by the cells (Information method). The methods performed differently from each other on both model and real data. In real datasets, vastly different numbers of place cells were identified using the four methods, with little overlap between the populations identified as place cells. Therefore, choice of place cell detection method dramatically affects the number and properties of identified cells. Ultimately, we recommend the Peak method be used in future studies to identify place cell populations, as this method is robust to moderate variations in place field within a session, and makes no inherent assumptions about the spatial information in place fields, unless there is an explicit theoretical reason for detecting cells with more narrowly defined properties.

Single Molecule Localization Microscopy techniques like PhotoActivated Localization Microscopy, with their sub-diffraction-limit spatial resolution, have been widely used to characterize the spatial organization of membrane proteins by means of quantitative cluster analysis. However, such quantitative studies remain challenged by the techniques' inherent sources of error, such as a limited detection efficiency of less than 60%, due to incomplete photo-conversion, and a limited localization precision in the range of 10-30 nm, which varies across the detected molecules and depends mainly on the number of photons collected from each. We provide analytical methods to estimate the effect of these errors in cluster analysis and to correct for them. These methods, based on Ripley's L(r) - r or the Pair Correlation Function, both widely used by the community, can facilitate potentially breakthrough results in quantitative biology by providing a more accurate and precise quantification of protein spatial organization.
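As a reminder of what the L(r) - r statistic mentioned above measures, the following is a minimal sketch of the naive two-dimensional estimator. It applies no edge correction and none of the detection-efficiency or localization-precision corrections that the abstract describes; the function name and argument layout are assumptions made for illustration. Under complete spatial randomness the returned values should be close to zero, and positive values suggest clustering at that scale.

```python
import math


def ripley_l_minus_r(points, radii, area):
    """Naive estimate of Ripley's L(r) - r for 2D points (no edge correction).

    points: sequence of (x, y) coordinates
    radii: distances r at which to evaluate the statistic
    area: area of the observation window
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    # Distance for every unordered pair of points.
    dists = [
        math.dist(points[i], points[j])
        for i in range(n)
        for j in range(i + 1, n)
    ]
    result = []
    for r in radii:
        # Each unordered pair counts twice in the ordered-pair sum of K(r).
        ordered_pairs = 2 * sum(d <= r for d in dists)
        k = area * ordered_pairs / (n * (n - 1))
        result.append(math.sqrt(k / math.pi) - r)
    return result
```

Because points near the window boundary have unobserved neighbours, this uncorrected estimator is biased downward at larger r; real analyses normally add an edge correction.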

A spatial scan statistic for multiple clusters (cited by: 1; self-citations: 0; citations by others: 1)
Spatial scan statistics are commonly used for geographical disease surveillance and cluster detection. When multiple clusters coexist in the study area, they become difficult to detect because of the clusters' shadowing effect on each other. The recently proposed sequential method showed better power for detecting the second, weaker cluster, but did not improve the ability to detect the first, stronger cluster, which is more important than the second one. We propose a new extension of the spatial scan statistic that can be used to detect multiple clusters. By constructing two or more clusters in the alternative hypothesis, our proposed method accounts for other coexisting clusters in the detection and evaluation process. The performance of the proposed method is compared to the sequential method through an intensive simulation study, in which our proposed method shows better power in terms of both rejecting the null hypothesis and accurately detecting the coexisting clusters. In a real study of hand-foot-mouth disease data in Pingdu city, a true cluster town is successfully detected by our proposed method; the standard method cannot find it statistically significant because of another cluster's shadowing effect.

In observational studies, subjects are often nested within clusters. In medical studies, patients are often treated by doctors and therefore patients are regarded as nested or clustered within doctors. A concern that arises with clustered data is that cluster-level characteristics (e.g., characteristics of the doctor) are associated with both treatment selection and patient outcomes, resulting in cluster-level confounding. Measuring and modeling cluster attributes can be difficult and statistical methods exist to control for all unmeasured cluster characteristics. An assumption of these methods however is that characteristics of the cluster and the effects of those characteristics on the outcome (as well as probability of treatment assignment when using covariate balancing methods) are constant over time. In this paper, we consider methods that relax this assumption and allow for estimation of treatment effects in the presence of unmeasured time-dependent cluster confounding. The methods are based on matching with the propensity score and incorporate unmeasured time-specific cluster effects by performing matching within clusters or using fixed- or random-cluster effects in the propensity score model. The methods are illustrated using data to compare the effectiveness of two total hip devices with respect to survival of the device and a simulation study is performed that compares the proposed methods. One method that was found to perform well is matching within surgeon clusters partitioned by time. Considerations in implementing the proposed methods are discussed.

In surveillance studies of periodontal disease, the relationship between disease and other health and socioeconomic conditions is of key interest. To determine whether a patient has periodontal disease, multiple clinical measurements (e.g., clinical attachment loss, alveolar bone loss, and tooth mobility) are taken at the tooth level. Researchers often create a composite outcome from these measurements or analyze each outcome separately. Moreover, patients have varying numbers of teeth, with those who are more prone to the disease having fewer teeth compared to those with good oral health. Such dependence between the outcome of interest and cluster size (number of teeth) is called informative cluster size, and results obtained from fitting conventional marginal models can be biased. We propose a novel method to jointly analyze multiple correlated binary outcomes for clustered data with informative cluster size using the class of generalized estimating equations (GEE) with cluster-specific weights. We compare our proposed multivariate outcome cluster-weighted GEE results to those from the conventional GEE using the baseline data from the Veterans Affairs Dental Longitudinal Study. In an extensive simulation study, we show that our proposed method yields estimates with minimal relative biases and excellent coverage probabilities.

Functional neuroimaging, including positron emission tomography (PET) and functional magnetic resonance imaging (fMRI), plays an important role in identifying specific brain regions associated with experimental stimuli or psychiatric disorders such as schizophrenia. PET and fMRI produce massive data sets that contain both temporal correlations from repeated scans and complex spatial correlations. Several methods exist for handling temporal correlations, some of which rely on transforming the response data to induce either a known or an independence covariance structure. Despite the presence of spatial correlations between the volume elements (voxels) comprising a brain scan, conventional methods perform voxel-by-voxel analyses of measured brain activity. We propose a two-stage spatio-temporal model for the estimation and testing of localized activity. Our second-stage model specifies a spatial auto-regression, capturing correlations within neural processing clusters defined by a data-driven cluster analysis. We use maximum likelihood methods to estimate parameters from our spatial autoregressive model. Our model protects against type-I errors, enables the detection of both localized and regional activations (including volume of interest effects), provides information on functional connectivity in the brain, and establishes a framework to produce spatially smoothed maps of distributed brain activity for each individual. We illustrate the application of our model using PET data from a study of working memory in individuals with schizophrenia.

In cluster detection of disease, local cluster detection tests (CDTs) are in common use. These methods aim both at locating likely clusters and testing for their statistical significance. New or improved CDTs are regularly proposed to epidemiologists and must be subjected to performance assessment. Because location accuracy has to be considered, performance assessment goes beyond the raw estimation of type I or II errors. As no consensus exists for performance evaluations, heterogeneous methods are used, and therefore studies are rarely comparable. A global indicator of performance, which assesses both spatial accuracy and usual power, would facilitate the exploration of CDTs' behaviour and help between-study comparisons. The Tanimoto coefficient (TC) is a well-known measure of similarity that can assess location accuracy but only for one detected cluster. In a simulation study, performance is measured for many tests. From the TC, we here propose two statistics, the averaged TC and the cumulated TC, as indicators able to provide a global overview of CDT performance for both usual power and location accuracy. We demonstrate the properties of these two indicators and the superiority of the cumulated TC for assessing performance. We used these indicators to conduct a systematic spatial assessment displayed through performance maps.
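The Tanimoto coefficient that these indicators build on can be illustrated with a short sketch. It covers only the basic coefficient for a single true cluster and a single detected cluster, each given as a set of spatial-unit identifiers; the averaged and cumulated TC are not defined in the abstract and are therefore not reproduced here. The function name and the convention for two empty sets are assumptions made for illustration.

```python
def tanimoto(true_cluster, detected_cluster):
    """Tanimoto coefficient between a true and a detected cluster.

    Both arguments are iterables of spatial-unit identifiers.
    Returns |intersection| / |union|, a value between 0 and 1.
    """
    true_set = set(true_cluster)
    detected_set = set(detected_cluster)
    union = true_set | detected_set
    if not union:
        return 1.0  # assumed convention: two empty clusters match perfectly
    return len(true_set & detected_set) / len(union)
```

For example, a true cluster of units {1, 2, 3} and a detected cluster {2, 3, 4} share two of four distinct units, giving a coefficient of 0.5; a detection that misses the true cluster entirely scores 0.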

Local spatial autocorrelation in biological variables (cited by: 2; self-citations: 0; citations by others: 2)
Spatial autocorrelation (SA) methods have recently been extended to include the detection of local spatial autocorrelation at individual sampling stations. We review the formulas for these statistics and report on the results of an extensive population-genetic simulation study we have published elsewhere to test the applicability of these methods in spatially distributed biological data. We find that most biological variables exhibit global SA, and that in such cases the methods proposed for testing the significance of local SA coefficients reject the null hypothesis excessively. When global SA is absent, permutational methods for testing significance yield reliable results. Although standard errors have been published for the local SA coefficients, their employment using an asymptotically normal approach leads to unreliable results; permutational methods are preferred. In addition to significance tests of suspected non-stationary localities, we can use these methods in an exploratory manner to find and identify hotspots (places with positive local SA) and coldspots (negative local SA) in a dataset. We illustrate the application of these methods in three biological examples from plant population biology, ecology and population genetics. The examples range from the study of single variables to the joint analysis of several variables and can lead to successful demographic and evolutionary inferences about the populations studied.

One goal of cluster analysis is to sort characteristics into groups (clusters) so that those in the same group are more highly correlated with each other than they are with those in other groups. An example is the search for groups of genes whose expression of RNA is correlated in a population of patients. These genes would be of greater interest if their common level of RNA expression were additionally predictive of the clinical outcome. This issue arose in the context of a study of trauma patients on whom RNA samples were available. The question of interest was whether there were groups of genes that were behaving similarly, and whether each gene in the cluster would have a similar effect on who would recover. For this, we develop an algorithm to simultaneously assign characteristics (genes) into groups of highly correlated genes that have the same effect on the outcome (recovery). We propose a random effects model where the genes within each group (cluster) equal the sum of a random effect, specific to the observation and cluster, and an independent error term. The outcome variable is a linear combination of the random effects of each cluster. To fit the model, we implement a Markov chain Monte Carlo algorithm based on the likelihood of the observed data. We evaluate the effect of including outcome in the model through simulation studies and describe a strategy for prediction. These methods are applied to trauma data from the Inflammation and Host Response to Injury research program, revealing a clustering of the genes that is informed by the recovery outcome.

Density-dependent processes are fundamental in the understanding of species population dynamics. Whereas the benefits of considering the spatial dimension in population biology are widely acknowledged, the implications of doing so for the statistical detection of spatial density dependence have not been examined. The outcome of traditional tests may therefore differ from those that include ecologically relevant locational information on both the prey species and natural enemy. Here, we explicitly incorporate spatial information on individual counts when testing for density dependence between an insect herbivore and its parasitoids. The spatially explicit approach used identified significant density dependence more frequently and in different instances than traditional methods. The form of density dependence detected also differed between methods. These results demonstrate that the explicit consideration of patch location in density-dependence analyses is likely to significantly alter current understanding of the prevalence and form of spatial density dependence in natural populations.

A vast literature has recently been concerned with the analysis of variation in disease counts recorded across geographical areas with the aim of detecting clusters of regions with homogeneous behavior. Most of the proposed modeling approaches have been discussed for the univariate case, and only very recently have spatial models been extended to predict more than one outcome simultaneously. In this paper we extend the standard finite mixture models to the analysis of multiple, spatially correlated counts. Dependence among outcomes is modeled using a set of correlated random effects, and estimation is carried out by numerical integration through an EM algorithm without assuming any specific parametric distribution for the random effects. The spatial structure is captured by the use of a Gibbs representation for the prior probabilities of component membership through a Strauss-like model. The proposed model is illustrated using real data. (© 2009 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim)

Bayesian multimodel inference for geostatistical regression models (cited by: 2; self-citations: 0; citations by others: 2)
Johnson DS, Hoeting JA. PLoS ONE. 2011;6(11):e25677
The problem of simultaneous covariate selection and parameter inference for spatial regression models is considered. Previous research has shown that failure to take spatial correlation into account can influence the outcome of standard model selection methods. A Markov chain Monte Carlo (MCMC) method is investigated for the calculation of parameter estimates and posterior model probabilities for spatial regression models. The method can accommodate normal and non-normal response data and a large number of covariates. Thus the method is very flexible and can be used to fit spatial linear models, spatial linear mixed models, and spatial generalized linear mixed models (GLMMs). The Bayesian MCMC method also allows a priori unequal weighting of covariates, which is not possible with many model selection methods such as Akaike's information criterion (AIC). The proposed method is demonstrated on two data sets. The first is the whiptail lizard data set which has been previously analyzed by other researchers investigating model selection methods. Our results confirmed the previous analysis suggesting that sandy soil and ant abundance were strongly associated with lizard abundance. The second data set concerned pollution tolerant fish abundance in relation to several environmental factors. Results indicate that abundance is positively related to Strahler stream order and a habitat quality index. Abundance is negatively related to percent watershed disturbance.

Dynamic model-based clustering for time-course gene expression data (cited by: 1; self-citations: 0; citations by others: 1)
Microarray technology has produced a huge body of time-course gene expression data. Such gene expression data have proved useful in genomic disease diagnosis and genomic drug design. The challenge is how to uncover useful information in such data. Cluster analysis has played an important role in analyzing gene expression data. Many distance/correlation- and static model-based clustering techniques have been applied to time-course expression data. However, these techniques are unable to account for the dynamics of such data. It is the dynamics that characterize the data and that should be considered in cluster analysis so as to obtain high-quality clustering. This paper proposes a dynamic model-based clustering method for time-course gene expression data. The proposed method regards a time-course gene expression dataset as a set of time series generated by a number of stochastic processes. Each stochastic process defines a cluster and is described by an autoregressive model. A relocation-iteration algorithm is proposed to identify the model parameters, and posterior probabilities are employed to assign each gene to an appropriate cluster. A bootstrapping method and an average adjusted Rand index (AARI) are employed to measure the quality of clustering. Computational experiments are performed on one synthetic and three real time-course gene expression datasets to investigate the proposed method. The results show that our method yields better-quality clustering than other clustering methods (e.g. k-means) for time-course gene expression data, and thus it is a useful and powerful tool for analyzing time-course gene expression data.
