Similar literature
1.
Gangnon RE, Clayton MK. Biometrics 2000, 56(3):922-935
Many current statistical methods for disease clustering studies are based on a hypothesis testing paradigm. These methods typically do not produce useful estimates of disease rates or cluster risks. In this paper, we develop a Bayesian procedure for drawing inferences about specific models for spatial clustering. The proposed methodology incorporates ideas from image analysis, from Bayesian model averaging, and from model selection. With our approach, we obtain estimates for disease rates and allow for greater flexibility in both the type of clusters and the number of clusters that may be considered. We illustrate the proposed procedure through simulation studies and an analysis of the well-known New York leukemia data.

2.
Tango [Biometrics 40:15 (1984)] proposed an index for detecting disease clustering in time applicable to grouped data obtained from a population that remains fairly stable over the study period. This index has received considerable attention in the literature including the suggestion that it be used to detect the space-time clustering of diseases and the suggestion to use similar test statistics to detect disease clustering in space and/or time while accounting for a changing population size over the study period. This paper concerns the related question of measuring the severity of the disease clustering once it has been determined that cases are not randomly distributed over space and/or time. A family of alternatives to randomness is proposed in which space and/or time versions of Tango's index are sufficient statistics for the parameters measuring the severity of the clustering. For the special case of temporal clustering, an unbiased estimator of the clustering parameter and its sampling variance is derived, and a particularly simple interpretation of this estimator is suggested. These latter results are based on some asymptotic approximations due to Tango [Biometrics 46:351 (1990)]. An application to the trisomy data given by Wallenstein [Am. J. Epidemiol. 111:367 (1980)] is discussed.
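
For orientation, the following is a minimal sketch of a Tango-style temporal clustering index. It assumes the commonly cited quadratic form C = r'Ar, where r holds the relative case frequencies per time interval and the closeness weights are a_ij = exp(-|i - j|). The abstract does not state the formula, so this form, the function name `tango_index`, and the example counts are all assumptions made for illustration.

```python
import math


def tango_index(counts):
    """Quadratic-form clustering index C = r' A r for grouped temporal counts.

    Assumption: closeness weights a_ij = exp(-|i - j|), with i and j indexing
    consecutive time intervals. Larger values mean cases are more concentrated
    in nearby intervals.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("need at least one case")
    r = [c / total for c in counts]  # relative frequencies per interval
    m = len(r)
    return sum(
        r[i] * r[j] * math.exp(-abs(i - j))
        for i in range(m)
        for j in range(m)
    )


# Hypothetical data: the same 20 cases spread evenly versus packed into one interval.
even = tango_index([5, 5, 5, 5])
peaked = tango_index([0, 20, 0, 0])
```

The index reaches its maximum of 1 when all cases fall in a single interval and decreases as cases spread out, which is why it can serve as a measure of how severe the clustering is.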

3.
Agrin induces discrete high-density patches of acetylcholine receptors (AChRs) and other synaptic components on cultured myotubes in a manner that resembles synaptic differentiation. Furthermore, agrin-like molecules are present at developing neuromuscular junctions in vivo. This provides us with a unique opportunity to manipulate AChR patching in order to examine the role of cytoskeletal components. Cultured chick myotubes were fixed and labeled to visualize the distributions of actin, alpha-actinin, filamin, tropomyosin, and vinculin. Overnight exposure to agrin caused a small amount of alpha-actinin, filamin, and vinculin to reorganize into discrete clusters. Double-labeling studies revealed that 78% of the AChR clusters were associated with detectable concentrations of filamin, 70% with alpha-actinin, and 58% with vinculin. Filamin even showed congruence to AChRs within clustered regions. By contrast, actin (visualized with fluorescein-phalloidin) and tropomyosin did not show specific associations with agrin-induced AChR clusters. The accumulation of cytoskeletal components at AChR clusters raised the possibility that cytoskeletal rearrangements direct AChR clustering. However, a time course of agrin-induced clustering that focused on filamin revealed that most of the early AChR clusters (3-6 h) were not associated with detectable amounts of cytoskeletal material. The accumulation of cytoskeletal material at later times (12-18 h) may imply a role in maintenance and stabilization, but it appears unlikely that these cytoskeletal elements initiate AChR clustering on myotubes.

4.
Complex human diseases commonly differ in their phenotypic characteristics, e.g., Crohn’s disease (CD) patients are heterogeneous with regard to disease location and disease extent. The genetic susceptibility to Crohn’s disease is widely acknowledged and has been demonstrated by identification of over 100 CD associated genetic loci. However, relating CD subphenotypes to disease susceptibility loci has proven to be a difficult task. In this paper we discuss the use of cluster analysis on genetic markers to identify genetic-based subgroups while taking into account possible confounding by population stratification. We show that it is highly relevant to consider the confounding nature of population stratification, so that detected clusters are not strongly related to population groups instead of disease-specific groups. Therefore, we explain the use of principal components to correct for population stratification while clustering affected individuals into genetic-based subgroups. The principal components are obtained using 30 ancestry informative markers (AIM), and the first two PCs are determined to discriminate between continental origins of the affected individuals. Genotypes on 51 CD associated single nucleotide polymorphisms (SNPs) are used to perform latent class analysis, hierarchical and Partitioning Around Medoids (PAM) cluster analysis within a sample of affected individuals with and without the use of principal components to adjust for population stratification. Without correction for population stratification, clusters seem to be influenced by population stratification, while with correction clusters are unrelated to the continental origin of individuals.

5.
Cancer is a complex genetic disease, resulting from defects of multiple genes. Development of microarray techniques makes it possible to survey the whole genome and detect genes that have influential impacts on the progression of cancer. Statistical analysis of cancer microarray data is challenging because of the high dimensionality and clustered nature of gene expressions. Here, clusters are composed of genes with coordinated pathological functions and/or correlated expressions. In this article, we consider cancer studies where a censored survival endpoint is measured along with microarray gene expressions. We propose a hybrid clustering approach, which uses both pathological pathway information retrieved from KEGG and statistical correlations of gene expressions, to construct gene clusters. Cancer survival time is modeled as a linear function of gene expressions. We adopt the clustering threshold gradient directed regularization (CTGDR) method for simultaneous gene cluster selection, within-cluster gene selection, and predictive model building. Analysis of two lymphoma studies shows that the proposed approach (hybrid gene clustering, a linear regression model for survival, and clustering regularized estimation with CTGDR) can effectively identify gene clusters and genes within selected clusters that have satisfactory predictive power for censored cancer survival outcomes.

6.
The formation of dynamical clusters of proteins is ubiquitous in cellular membranes and is in part regulated by the recycling of membrane components. We show, using stochastic simulations and analytic modeling, that the out-of-equilibrium cluster size distribution of membrane components undergoing continuous recycling is strongly influenced by lateral confinement. This result has significant implications for the clustering of plasma membrane proteins whose mobility is hindered by cytoskeletal “corrals” and for protein clustering in cellular organelles of limited size that generically support material fluxes. We show how the confinement size can be sensed through its effect on the size distribution of clusters of membrane heterogeneities and propose that this could be regulated to control the efficiency of membrane-bound reactions. To illustrate this, we study a chain of enzymatic reactions sensitive to membrane protein clustering. The reaction efficiency is found to be a non-monotonic function of the system size, and can be optimal for sizes comparable to those of cellular organelles.

7.
Pei-Sheng Lin, Jun Zhu. Biometrics 2020, 76(2):403-413
Mapping of disease incidence has long been of importance to epidemiology and public health. In this paper, we consider identification of clusters of spatial units with elevated disease rates and develop a new approach that estimates the relative disease risk in association with potential risk factors and simultaneously identifies clusters corresponding to elevated risks. A heterogeneity measure is proposed to enable the comparison of a candidate cluster and its complement under a pair of complementary models. A quasi-likelihood procedure is developed for estimating the model parameters and identifying the clusters. An advantage of our approach over traditional spatial clustering methods is the identification of clusters that can have arbitrary shapes due to abrupt or noncontiguous changes while accounting for risk factors and spatial correlation. Asymptotic properties of the proposed methodology are established and a simulation study shows empirically sound finite-sample properties. The mapping and clustering of enterovirus 71 infections in Taiwan are carried out for illustration.

9.
Lin X, Carroll RJ. Biometrics 1999, 55(2):613-619
In the analysis of clustered data with covariates measured with error, a problem of common interest is to test for correlation within clusters and heterogeneity across clusters. We examined this problem in the framework of generalized linear mixed measurement error models. We propose using the simulation extrapolation (SIMEX) method to construct a score test for the null hypothesis that all variance components are zero. A key feature of this SIMEX score test is that no assumptions need to be made regarding the distributions of the random effects and the unobserved covariates. We illustrate this test by analyzing Framingham heart disease data and evaluate its performance by simulation. We also propose individual SIMEX score tests for testing the variance components separately. Both tests can be easily implemented using existing statistical software.

10.
Growing public awareness of environmental hazards has led to an increased demand for public health authorities to investigate geographical clustering of diseases. Although such cluster analysis is nearly always ineffective in identifying causes of disease, it often has to be used to address public concern about environmental hazards. Interpreting the resulting data is not straightforward, however, and this paper presents a guide for the non-specialist. The pitfalls include the fact that cluster analyses are usually done post hoc, and not as a result of a prior hypothesis. This is particularly true for investigations prompted by reported clusters, which have the inherent danger of overestimating the disease rate through "boundary shrinkage" of the population from which the cases are assumed to have arisen. In disease surveillance the problem of making multiple comparisons can be overcome by testing for clustering and autocorrelation. When rates of disease are illustrated in disease maps, undue focus on areas where random fluctuation is greatest can be minimised by smoothing techniques. Despite the fact that cluster analyses rarely prove fruitful in identifying causation, they may, like single case reports, have the potential to generate new knowledge.

11.
The conventional approach of candidate gene studies in complex diseases is to look at the effect of one gene at a time. However, as the outcome of chronic diseases is influenced by a large number of alleles, simultaneous analysis is needed. We demonstrate the application of multivariate regression and cluster analysis to a multiple sclerosis (MS) dataset with genotypes for 489 patients at 11 candidate genes selected on their involvement in the immune response. Using multivariate regression, we observed that different sets of genes were associated with different disease characteristics that reflect different aspects of disease. Out of 15 polymorphisms, we identified one that contributed to the severity of disease. In addition, the set of 15 polymorphisms was predictive for yearly increase in lesion volume as seen on T1-weighted MRI (p=0.044). From this set, no individual polymorphisms could be identified after adjustment for multiple hypothesis testing. By means of a cluster analysis, we aimed to identify subgroups of patients with different pathogenic subtypes of MS on the basis of their genetic profile. We constructed genetic profiles from the genotypes at the 11 candidate genes. The approach proved to be feasible. We observed three clusters in the sample of patients. In this study, we observed no significant differences in the usual clinical and MRI outcome measures between the different clusters. However, a number of consistent trends indicated that this clustering might be related to the course of disease. With a larger number of genes regulating the course of disease, we may be able to identify clinically relevant clusters. The analyses are easily implemented and will be applicable to candidate gene studies of complex traits in general.

12.

Background

Cancer is a heterogeneous disease caused by genomic aberrations and characterized by significant variability in clinical outcomes and response to therapies. Several subtypes of common cancers have been identified based on alterations of individual cancer genes, such as HER2, EGFR, and others. However, cancer is a complex disease driven by the interaction of multiple genes, so the copy number status of individual genes is not sufficient to define cancer subtypes and predict responses to treatments. A classification based on genome-wide copy number patterns would be better suited for this purpose.

Method

To develop a more comprehensive cancer taxonomy based on genome-wide patterns of copy number abnormalities, we designed an unsupervised classification algorithm that identifies genomic subgroups of tumors. This algorithm is based on a modified genomic Non-negative Matrix Factorization (gNMF) algorithm and includes several additional components, namely a pilot hierarchical clustering procedure to determine the number of clusters, a multiple random initiation scheme, a new stop criterion for the core gNMF, as well as a 10-fold cross-validation stability test for quality assessment.

Result

We applied our algorithm to identify genomic subgroups of three major cancer types: non-small cell lung carcinoma (NSCLC), colorectal cancer (CRC), and malignant melanoma. High-density SNP array datasets for patient tumors and established cell lines were used to define genomic subclasses of the diseases and identify cell lines representative of each genomic subtype. The algorithm was compared with several traditional clustering methods and showed improved performance. To validate our genomic taxonomy of NSCLC, we correlated the genomic classification with disease outcomes. Overall survival time and time to recurrence were shown to differ significantly between the genomic subtypes.

Conclusions

We developed an algorithm for cancer classification based on genome-wide patterns of copy number aberrations and demonstrated its superiority to existing clustering methods. The algorithm was applied to define genomic subgroups of three cancer types and identify cell lines representative of these subgroups. Our data enabled the assembly of representative cell line panels for testing drug candidates.

13.

Introduction

The pathology of ankylosing spondylitis (AS) suggests that certain cytokines and matrix metalloproteinases (MMPs) might provide useful markers of disease activity. Serum levels of some cytokines and MMPs have been found to be elevated in active disease, but there is a general lack of information about biomarker profiles in AS and how these are related to disease activity and function. The purpose of this study was to investigate whether clinical measures of disease activity and function in AS are associated with particular profiles of circulating cytokines and MMPs.

Methods

Measurement of 30 cytokines, five MMPs and four tissue inhibitors of metalloproteinases was carried out using Luminex® technology on a well-characterised population of AS patients (n = 157). The relationship between biomarker levels and measures of disease activity (Bath ankylosing spondylitis disease activity index (BASDAI)), function (Bath ankylosing spondylitis functional index) and global health (Bath ankylosing spondylitis global health) was investigated. Principal component analysis was used to reduce the large number of biomarkers to a smaller set of independent components, which were investigated for their association with clinical measures. Further analyses were carried out using hierarchical clustering, multiple regression or multivariate logistic regression.

Results

Principal component analysis identified eight clusters consisting of various combinations of cytokines and MMPs. The strongest association with the BASDAI was found with a component consisting of MMP-8, MMP-9, hepatocyte growth factor and CXCL8, and was independent of C-reactive protein levels. This component was also associated with current smoking. Hierarchical clustering revealed two distinct patient clusters that could be separated on the basis of MMP levels. The high MMP cluster was associated with increased C-reactive protein, the BASDAI and the Bath ankylosing spondylitis functional index.

Conclusions

A profile consisting of high levels of MMP-8, MMP-9, hepatocyte growth factor and CXCL8 is associated with increased disease activity in AS. High MMP levels are also associated with smoking and worse function in AS.

14.

Background  

Clustering techniques are routinely used in gene expression data analysis to organize the massive data. Clustering techniques arrange a large number of genes or assays into a few clusters while maximizing the intra-cluster similarity and inter-cluster separation. While clustering of genes facilitates learning the functions of un-characterized genes using their association with known genes, clustering of assays reveals the disease stages and subtypes. Many clustering algorithms require the user to specify the number of clusters a priori. A wrong specification of number of clusters generally leads to either failure to detect novel clusters (disease subtypes) or unnecessary splitting of natural clusters.

15.
The ProtoNet site provides an automatic hierarchical clustering of the SWISS-PROT protein database. The clustering is based on an all-against-all BLAST similarity search. The similarities' E-score is used to perform a continuous bottom-up clustering process by applying alternative rules for merging clusters. The outcome of this clustering process is a classification of the input proteins into a hierarchy of clusters of varying degrees of granularity. ProtoNet (version 1.3) is accessible in the form of an interactive web site at http://www.protonet.cs.huji.ac.il. ProtoNet provides navigation tools for monitoring the clustering process with a vertical and horizontal view. Each cluster at any level of the hierarchy is assigned a statistical index, indicating the level of purity based on biological keywords such as those provided by SWISS-PROT and InterPro. ProtoNet can be used for function prediction, for defining superfamilies and subfamilies and for large-scale protein annotation purposes.
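
To make the idea of a bottom-up clustering process concrete, here is a minimal sketch of agglomerative clustering over a pairwise dissimilarity matrix. It is not ProtoNet's procedure: the average-linkage merging rule, the function name `agglomerate`, the labels P1 to P4, and the dissimilarity values are all assumptions chosen for illustration (in a ProtoNet-like setting the dissimilarities would be derived from BLAST E-scores).

```python
def agglomerate(dist, labels):
    """Average-linkage bottom-up clustering.

    dist is a symmetric matrix of pairwise dissimilarities and labels names
    the items. Returns the merges in order as (left members, right members,
    average dissimilarity at which they were merged).
    """
    clusters = {i: [i] for i in range(len(labels))}  # every item starts alone
    merges = []
    while len(clusters) > 1:
        best = None
        keys = sorted(clusters)
        for pos, a in enumerate(keys):
            for b in keys[pos + 1:]:
                pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
                avg = sum(dist[i][j] for i, j in pairs) / len(pairs)
                if best is None or avg < best[0]:
                    best = (avg, a, b)
        avg, a, b = best
        merges.append((
            sorted(labels[i] for i in clusters[a]),
            sorted(labels[i] for i in clusters[b]),
            avg,
        ))
        clusters[a] = clusters[a] + clusters.pop(b)  # merge b into a
    return merges


# Hypothetical dissimilarities between four proteins.
labels = ["P1", "P2", "P3", "P4"]
dist = [
    [0, 1, 8, 9],
    [1, 0, 7, 10],
    [8, 7, 0, 2],
    [9, 10, 2, 0],
]
merges = agglomerate(dist, labels)
```

Reading the merge list from first to last traces the hierarchy from fine to coarse granularity. Swapping the average for a minimum or maximum over the pairs gives single or complete linkage, which is one way to realise alternative merging rules.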

16.
Current clustering methods are routinely applied to gene expression time course data to find genes with similar activation patterns and ultimately to understand the dynamics of biological processes. As the dynamic unfolding of a biological process often involves the activation of genes at different rates, successful clustering in this context requires dealing with varying time and shape patterns simultaneously. This motivates the combination of a novel pairwise warping with a suitable clustering method to discover expression shape clusters. We develop a novel clustering method that combines an initial pairwise curve alignment to adjust for time variation within likely clusters. The cluster-specific time synchronization method shows excellent performance over standard clustering methods in terms of cluster quality measures in simulations and for yeast and human fibroblast data sets. In the yeast example, the discovered clusters have high concordance with the known biological processes.

17.
Previously, we observed that without using prior information about individual sampling locations, a clustering algorithm applied to multilocus genotypes from worldwide human populations produced genetic clusters largely coincident with major geographic regions. It has been argued, however, that the degree of clustering is diminished by use of samples with greater uniformity in geographic distribution, and that the clusters we identified were a consequence of uneven sampling along genetic clines. Expanding our earlier dataset from 377 to 993 markers, we systematically examine the influence of several study design variables (sample size, number of loci, number of clusters, assumptions about correlations in allele frequencies across populations, and the geographic dispersion of the sample) on the “clusteredness” of individuals. With all other variables held constant, geographic dispersion is seen to have comparatively little effect on the degree of clustering. Examination of the relationship between genetic and geographic distance supports a view in which the clusters arise not as an artifact of the sampling scheme, but from small discontinuous jumps in genetic distance for most population pairs on opposite sides of geographic barriers, in comparison with genetic distance for pairs on the same side. Thus, analysis of the 993-locus dataset corroborates our earlier results: if enough markers are used with a sufficiently large worldwide sample, individuals can be partitioned into genetic clusters that match major geographic subdivisions of the globe, with some individuals from intermediate geographic locations having mixed membership in the clusters that correspond to neighboring regions.

18.
Traditional k-means and most k-means variants are still computationally expensive for large datasets, such as microarray data, which have a large dimension size d. In k-means clustering, we are given a set of n data points in d-dimensional space R^d and an integer k. The problem is to determine a set of k points in R^d, called centers, so as to minimize the mean squared distance from each data point to its nearest center. In this work, we develop a novel k-means algorithm, which is simple but more efficient than the traditional k-means and the recent enhanced k-means. Our new algorithm is based on the recently established relationship between principal component analysis and k-means clustering. We provide a correctness proof for this algorithm. Results obtained from testing the algorithm on three biological datasets and six non-biological datasets (three of these datasets are real, while the other three are simulated) also indicate that our algorithm is empirically faster than other known k-means algorithms. We assessed the quality of our algorithm's clusters against the clusters of a known structure using the Hubert-Arabie Adjusted Rand index (ARI_HA). We found that when k is close to d, the quality is good (ARI_HA > 0.8) and when k is not close to d, the quality of our new k-means algorithm is excellent (ARI_HA > 0.9). In this paper, the emphasis is on reducing the time requirement of the k-means algorithm and on its application to microarray data, motivated by the desire to create a tool for clustering and malaria research. However, the new clustering algorithm can be used for other clustering needs as long as an appropriate measure of distance between the centroids and the members is used. This has been demonstrated in this work on the six non-biological datasets.
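
The k-means objective described above (the mean squared distance from each point to its nearest center) can be sketched with the traditional Lloyd-style iteration. This is the baseline method, not the PCA-based algorithm the abstract proposes; the function names, the starting centers, and the toy points are assumptions made for illustration.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points in R^d."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(point, centers):
    """Index of the center closest to point."""
    return min(range(len(centers)), key=lambda c: sq_dist(point, centers[c]))


def kmeans(points, centers, iterations=20):
    """Traditional k-means: alternate assignment and center update.

    Returns the final centers and the mean squared distance from each
    point to its nearest center (the quantity being minimized).
    """
    centers = [tuple(c) for c in centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            groups[nearest(p, centers)].append(p)
        new_centers = []
        for center, group in zip(centers, groups):
            if group:
                # New center is the coordinate-wise mean of its members.
                new_centers.append(tuple(sum(x) / len(group) for x in zip(*group)))
            else:
                new_centers.append(center)  # keep a center that lost all points
        if new_centers == centers:
            break  # assignments are stable
        centers = new_centers
    cost = sum(sq_dist(p, centers[nearest(p, centers)]) for p in points) / len(points)
    return centers, cost


# Hypothetical data: two well-separated pairs in R^2, with k = 2.
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
centers, cost = kmeans(points, centers=[(0, 0), (10, 0)])
```

Each pass costs on the order of n * k * d distance operations, which is the expense that grows with the dimension d and that faster variants aim to reduce.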

19.

Background  

Uncovering subtypes of disease from microarray samples has important clinical implications such as survival time and sensitivity of individual patients to specific therapies. Unsupervised clustering methods have been used to classify this type of data. However, most existing methods focus on clusters with compact shapes and do not reflect the geometric complexity of the high dimensional microarray clusters, which limits their performance.

20.

Background

In case control studies disease risk not explained by the significant risk factors is the unexplained risk. Considering unexplained risk for specific populations, places and times can reveal the signature of unidentified risk factors and risk factors not fully accounted for in the case-control study. This potentially can lead to new hypotheses regarding disease causation.

Methods

Global, local and focused Q-statistics are applied to data from a population-based case-control study of 11 southeast Michigan counties. Analyses were conducted using both year- and age-based measures of time. The analyses were adjusted for arsenic exposure, education, smoking, family history of bladder cancer, occupational exposure to bladder cancer carcinogens, age, gender, and race.

Results

Significant global clustering of cases was not found. Such a finding would indicate large-scale clustering of cases relative to controls through time. However, highly significant local clusters were found in Ingham County near Lansing, in Oakland County, and in the City of Jackson, Michigan. The Jackson City cluster was observed in working-ages and is thus consistent with occupational causes. The Ingham County cluster persists over time, suggesting a broad-based geographically defined exposure. Focused clusters were found for 20 industrial sites engaged in manufacturing activities associated with known or suspected bladder cancer carcinogens. Set-based tests that adjusted for multiple testing were not significant, although local clusters persisted through time and temporal trends in probability of local tests were observed.

Conclusion

Q analyses provide a powerful tool for unpacking unexplained disease risk from case-control studies. This is particularly useful when the effect of risk factors varies spatially, through time, or through both space and time. For bladder cancer in Michigan, the next step is to investigate causal hypotheses that may explain the excess bladder cancer risk localized to areas of Oakland and Ingham counties, and to the City of Jackson.
