Similar articles
 Found 20 similar articles; search time: 10 ms
1.
Clustering analysis is a promising data-driven method for the analysis of functional magnetic resonance imaging (fMRI) data. Its heavy computational load, however, makes it difficult to use in practice. We use affinity propagation clustering (APC), a recent clustering algorithm designed especially for large data sets, to detect brain functional activation from fMRI. It considers all data points as possible exemplars and, through the minimisation of an energy function within a message-passing architecture, obtains the optimal set of exemplars and their corresponding clusters. Four simulation studies and three in vivo fMRI studies reveal that brain functional activation can be effectively detected and that different response patterns can be distinguished using this method. Our results demonstrate that APC is superior to k-centres clustering, as measured by the weighted Jaccard coefficient and the average squared error. These results suggest that the proposed APC will be useful in detecting brain functional activation from fMRI data.
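The message-passing idea described in this abstract can be illustrated with a minimal sketch of textbook affinity propagation. This is not the authors' fMRI pipeline: the function names, the negative squared distance used as similarity, the median preference, the damping factor and the iteration count are all assumptions chosen for illustration, and at least two data points are assumed.

```python
import statistics


def similarity_matrix(points):
    """Negative squared distance between 1-D points (an assumed similarity).
    The diagonal ("preference") is set to the median off-diagonal value."""
    n = len(points)
    S = [[-(points[i] - points[j]) ** 2 for j in range(n)] for i in range(n)]
    off_diagonal = [S[i][j] for i in range(n) for j in range(n) if i != j]
    preference = statistics.median(off_diagonal)
    for k in range(n):
        S[k][k] = preference
    return S


def affinity_propagation(S, damping=0.5, iterations=200):
    """Plain affinity propagation on a similarity matrix S (n >= 2 points).
    Returns (exemplar indices, label per point); labels are exemplar indices."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # Responsibility: how well k suits i compared with the best rival.
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            for k in range(n):
                rival = max(scores[j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - rival)
        # Availability: how much support candidate exemplar k has gathered.
        for k in range(n):
            positive = [max(0.0, R[i][k]) for i in range(n)]
            support = sum(positive) - positive[k]  # excludes k itself
            for i in range(n):
                if i == k:
                    new = support
                else:
                    new = min(0.0, R[k][k] + support - positive[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    if not exemplars:
        return [], [None] * n
    labels = [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
              for i in range(n)]
    return exemplars, labels


# Toy usage: two well-separated groups of 1-D points.
points = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
exemplars, labels = affinity_propagation(similarity_matrix(points))
```

Every point starts as a candidate exemplar, and the preference on the diagonal controls how many exemplars survive; for fMRI time series the similarity would be computed between voxel time courses rather than between scalars.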

2.
Unsupervised clustering represents a powerful technique for self-organized segmentation of biomedical image time series data describing groups of pixels exhibiting similar properties of local signal dynamics. The theoretical background is presented in the beginning, followed by several medical applications demonstrating the flexibility and conceptual power of these techniques. These applications range from functional MRI data analysis to dynamic contrast-enhanced perfusion MRI and breast MRI. For fMRI, these methods can be employed to identify and separate time courses of interest, along with their associated spatial patterns. When applied to dynamic perfusion MRI, they identify groups of voxels associated with time courses that are clinically informative and straightforward to interpret. In breast MRI, a segmentation of the lesion is achieved and in addition a subclassification is obtained within the lesion with regard to regions characterized by different MRI signal time courses. In the present paper, we conclude that unsupervised clustering techniques provide a robust method for blind analysis of time series image data in the important and current field of functional and dynamic MRI.

3.
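A blind analysis method for epilepsy fMRI based on temporal clustering analysis and independent component analysis   Total citations: 3, self-citations: 0, citations by others: 3
We propose a blind analysis method for epilepsy fMRI data based on temporal clustering analysis and independent component analysis (ICA), combining the two methods to extract spatiotemporal information on interictal epileptic fMRI activation. The method first uses temporal clustering analysis to obtain an activation-related temporal kurtosis feature curve, which serves as the temporal reference. Spatial ICA then decomposes the fMRI signal into spatially independent components. Finally, the time course of each independent component is correlated with the reference curve to extract the corresponding brain activation maps. The proposed method requires no prior assumptions about epilepsy fMRI, effectively solves the problem of ordering the independent components, and achieves blind analysis of the data. Simulation results demonstrate the validity and reliability of the method, and results on epilepsy data show spatial localization accuracy superior to that of statistical parametric mapping.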
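The final step of this method, ordering independent components by how well their time courses match a reference curve, can be sketched as follows. This is a hedged illustration only: the use of the absolute Pearson correlation, the function names `pearson` and `rank_components`, and the toy data are assumptions, and the temporal clustering and ICA steps themselves are not reproduced.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def rank_components(time_courses, reference):
    """Return component indices ordered by |correlation| with the reference,
    strongest first. The sign is ignored because ICA components have
    arbitrary polarity."""
    scored = [(abs(pearson(tc, reference)), idx)
              for idx, tc in enumerate(time_courses)]
    return [idx for _, idx in sorted(scored, reverse=True)]


# Toy usage: component 1 is a scaled, sign-flipped copy of the reference.
reference = [0, 0, 1, 0, 0, 1, 0, 0]
components = [
    [1, 2, 1, 2, 1, 2, 1, 2],
    [0, 0, -3, 0, 0, -3, 0, 0],
    [5, 4, 3, 2, 1, 0, -1, -2],
]
order = rank_components(components, reference)
```

Ranking by correlation with a data-derived reference is what removes the need to inspect and order the components manually.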

4.
This paper considers the problem of clustering physical step count data recorded on wearable devices. Clustering step data gives insight into an individual's activity status and provides the groundwork for health-related policies. However, classical methods, such as K-means clustering and hierarchical clustering, are not suitable for step count data, which are typically high-dimensional and zero-inflated. This paper presents a new clustering method for step data based on a novel combination of ensemble clustering and binning. We first construct multiple sets of binned data by changing the size and starting position of the bin, and then merge the clustering results from the binned data using a voting method. The advantage of binning, as a critical component, is that it substantially reduces the dimension of the original data while preserving the essential characteristics of the data. As a result, combining clustering results from multiple binned data sets can provide an improved clustering result that reflects both local and global structures of the data. Simulation studies and real data analysis were carried out to evaluate the empirical performance of the proposed method and demonstrate its general utility.
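The two ingredients named in this abstract, binning with a varying size and starting position and merging several clusterings by voting, can be sketched as below. The abstract does not specify the voting rule, so the co-association majority vote used here is an assumption, as are the function names and the threshold of 0.5. The base clustering applied to each binned data set is left out.

```python
def bin_counts(series, size, start=0):
    """Sum step counts in consecutive bins of `size` samples.
    Bins begin at index `start`; samples before `start` form one leading bin."""
    bins = []
    if start > 0:
        bins.append(sum(series[:start]))
    for lo in range(start, len(series), size):
        bins.append(sum(series[lo:lo + size]))
    return bins


def co_association(labelings):
    """Fraction of clusterings in which each pair of samples shares a cluster."""
    n, m = len(labelings[0]), len(labelings)
    return [[sum(lab[i] == lab[j] for lab in labelings) / m for j in range(n)]
            for i in range(n)]


def consensus_labels(labelings, threshold=0.5):
    """Link samples that co-occur in more than `threshold` of the clusterings
    and return the connected components as consensus clusters."""
    n = len(labelings[0])
    C = co_association(labelings)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if C[i][j] > threshold:
                parent[find(i)] = find(j)
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


# Toy usage: three clusterings of four samples, e.g. from three binnings.
votes = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 1, 1, 1]]
merged = consensus_labels(votes)
```

Because cluster labels are arbitrary across runs, the vote is taken on pairs of samples rather than on the labels themselves.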

5.
Styrene is a widely used bulk chemical produced by dehydrogenation of ethylbenzene (EB). Purification of styrene to contain < 100 ppm EB is not cost-effective by conventional separation methods. One separation method is extractive distillation with an ionic liquid (IL) as a binding agent for one of the components, thereby lowering the vapour pressure of this component. In this study, using quantum density functional theory (DFT), we have simulated 22 IL anion–cation pairs, styrene and EB affinities to them, and ion-pair dimer affinities of the ILs. These are compared with the experimental liquid–liquid equilibrium studies of M.T.G. Jongmans, B. Schuur, and A.B. de Haan, Ind. Eng. Chem. Res. 50 (2011), pp. 10800–10810. It is shown that experimental selectivity and distribution coefficients of styrene and EB in the ILs are related to computed gas phase anion–cation stabilisation energies and ion-pair–ion-pair dimer affinities. The inverse of molar volume is found to strongly correlate with the selectivity. The computational results also qualitatively correlate with molar volume, and consequently, it is possible to use DFT calculations as a qualitative prediction tool in screening of ILs for this separation process. This tool does not account for effects caused by long alkyl chains, as the chain length does not seem to affect dimer stabilisation energy beyond the ethyl group.

6.
Serban N, Jiang H. Biometrics, 2012, 68(3): 805-814
Summary: In this article, we investigate clustering methods for multilevel functional data, which consist of repeated random functions observed for a large number of units (e.g., genes) at multiple subunits (e.g., bacteria types). To describe the within- and between-unit variability induced by the hierarchical structure in the data, we take a multilevel functional principal component analysis (MFPCA) approach. We develop and compare a hard clustering method applied to the scores derived from the MFPCA and a soft clustering method using an MFPCA decomposition. In a simulation study, we assess the estimation accuracy of the clustering membership and the cluster patterns under a series of settings: small versus moderate number of time points; various noise levels; and varying number of subunits per unit. We demonstrate the applicability of the clustering analysis to a real data set consisting of expression profiles from genes activated by immune system cells. Prevalent response patterns are identified by clustering the expression profiles using our multilevel clustering analysis.

7.
Qin LX, Self SG. Biometrics, 2006, 62(2): 526-533
Identification of differentially expressed genes and clustering of genes are two important and complementary objectives addressed with gene expression data. For the differential expression question, many "per-gene" analytic methods have been proposed. These methods can generally be characterized as using a regression function to independently model the observations for each gene; various adjustments for multiplicity are then used to interpret the statistical significance of these per-gene regression models over the collection of genes analyzed. Motivated by this common structure of per-gene models, we proposed a new model-based clustering method, the clustering of regression models method, which groups genes that share a similar relationship to the covariate(s). This method provides a unified approach for a family of clustering procedures and can be applied for data collected with various experimental designs. In addition, when combined with per-gene methods for assessing differential expression that employ the same regression modeling structure, an integrated framework for the analysis of microarray data is obtained. The proposed methodology was applied to two microarray data sets, one from a breast cancer study and the other from a yeast cell cycle study.

8.
Metabolomics and other omics tools are generally characterized by large data sets with many variables obtained under different environmental conditions. Clustering methods, and more specifically two-mode clustering methods, are excellent tools for analyzing this type of data. Two-mode clustering methods allow for analysis of the behavior of subsets of metabolites under different experimental conditions. In addition, the results are easily visualized. In this paper we introduce a two-mode clustering method based on a genetic algorithm that uses a criterion that searches for homogeneous clusters. Furthermore, we introduce a cluster stability criterion to validate the clusters and we provide an extended knee plot to select the optimal number of clusters in both the experimental and the metabolite modes. The genetic algorithm-based two-mode clustering gave biologically relevant results when it was applied to two real-life metabolomics data sets. It was, for instance, able to identify a catabolic pathway for growth on several of the carbon sources. J. A. Hageman and R. A. van den Berg contributed equally to this paper.

9.
One goal of precision oncology is to re-classify cancer based on molecular features rather than tissue of origin. Integrative clustering of large-scale multi-omics data is an important approach to molecule-based cancer classification. Data heterogeneity and the complexity of inter-omics variations are two major challenges for integrative clustering analysis. According to the strategies they use to deal with these difficulties, we group the clustering methods into three major categories: direct integrative clustering, clustering of clusters and regulatory integrative clustering. A few practical considerations on data pre-processing, post-clustering analysis and pathway-based analysis are also discussed.

10.
Phytoplankton functional traits can represent particular environmental conditions in complex aquatic ecosystems. Categorizing phytoplankton species into functional groups is challenging and time-consuming, and requires high-level expertise in species autecology. In this study, we introduced an affinity analysis to aid the identification of candidate associations of phytoplankton from two data sets comprising phytoplankton and environmental information. In the Huaihe River Basin in China, which has a drainage area of 270,000 km², samples were collected from 217 selected sites during the low-water period in May 2013; monthly samples were collected during 2006–2011 in a man-made pond, Dishui Lake. Our results indicated that the affinity analysis can be used to define some meaningful functional groups. The identified phytoplankton associations reflect the ecological preferences of phytoplankton in terms of light and nutrient acquisition. Advantages and disadvantages of applying the affinity analysis to identify phytoplankton associations are discussed with perspectives on their utility in ecological assessment.

11.
Gene function annotation remains a key challenge in modern biology. This is especially true for high-throughput techniques such as gene expression experiments. Vital information about genes is available electronically from biomedical literature in the form of full texts and abstracts. In addition, various publicly available databases (such as GenBank, Gene Ontology and Entrez) provide access to gene-related information at different levels of biological organization, granularity and data format. This information is being used to assess and interpret the results from high-throughput experiments. To improve keyword extraction for annotational clustering and other types of analyses, we have developed a novel text mining approach, which is based on keywords identified at the level of gene annotation sentences (in particular sentences characterizing biological function) instead of entire abstracts. Further, to improve the expressiveness and usefulness of gene annotation terms, we investigated the combination of sentence-level keywords with terms from the Medical Subject Headings (MeSH) and Gene Ontology (GO) resources. We find that sentence-level keywords combined with MeSH terms outperform the typical 'baseline' set-up (term frequencies at the level of abstracts) by a significant margin, whereas the addition of GO terms improves matters only marginally. We validated our approach on a manually annotated corpus of 200 abstracts generated on the basis of 2 cancer categories and 10 genes per category. We applied the method in the context of three sets of differentially expressed genes obtained from pediatric brain tumor samples. This analysis suggests novel interpretations of discovered gene expression patterns.

12.
Summary: Multivariate analysis of plant community data has three goals: summarization of redundancy, identification of outliers, and elucidation of relationships. The first two are handled conveniently by initial fast clustering, and the third by subsequent ordination and hierarchical clustering, and perhaps table arrangement. Initial clustering algorithms should achieve within-cluster homogeneity and require minimal computer resources. However, algorithmic uniqueness and a hierarchy are not needed. Computing time should be proportional to the amount of data, with no higher dependencies on the number of samples. A method meeting these requirements is presented here, called composite clustering and implemented in a FORTRAN program called COMPCLUS. The computer time required for COMPCLUS clustering is on the order of the time required merely to read the data, regardless of the number of samples. Several large field data sets were analyzed effectively by using COMPCLUS to reduce redundancy and identify outliers, and then ordinating the resulting composite clusters by detrended correspondence analysis (DECORANA). Various clusterings of the same data set can be compared using a percent mutual matches (PMM) index, and a matrix of such values can be ordinated for simultaneous comparison of a number of clusterings. This paper benefited at many points from discussions with Mark O. Hill and Robert H. Whittaker. Mark Hill suggested condensed data storage. This work was done under a National Science Foundation grant to Robert Whittaker. I also appreciate technical assistance from Timothy F. Mason and Steven B. Singer.

13.
Clustering of microarray gene expression data is performed routinely, for genes as well as for samples. Clustering of genes can reveal functional relationships between genes; clustering of samples, on the other hand, is important for finding, e.g., disease subtypes, relevant patient groups for stratification or related treatments. Usually this is done by first filtering the genes for high variance, under the assumption that they carry most of the information needed for separating different sample groups. If this assumption is violated, important groupings in the data might be lost. Furthermore, classical clustering methods do not facilitate the biological interpretation of the results. Therefore, we propose to methodologically integrate the clustering algorithm with prior biological information. This differs from other approaches in that knowledge about classes of genes can be used directly to ease the interpretation of the results and possibly boost clustering performance. Our approach computes dendrograms that resemble decision trees, with gene classes used to split the data at each node, which can help to find biologically meaningful differences between the sample groups. We have tested the proposed method on both simulated and real data and conclude that it is useful as a complementary method, especially when the assumptions of few differentially expressed genes and an informative mapping of genes to different classes are met.

14.
Functional magnetic resonance imaging (fMRI) data-processing methods in the time domain include correlation analysis and the general linear model, among others. Many fMRI processing strategies utilise temporal information and ignore or pay little attention to phase information, resulting in an unnecessary loss of efficiency. We proposed a novel method named Hilbert phase entropy imaging (HPEI) that used the discrete Hilbert transform of the magnitude time series to detect brain functional activation. The data from two simulation studies and two in vivo fMRI studies, which contained both block-design and event-related experiments, revealed that the HPEI method enabled the effective detection of brain functional activation and the distinction of different response patterns. Our results demonstrate that this method is useful as a complementary, though hypothesis-constrained, analysis for revealing additional information regarding the complex nature of fMRI time series.

15.
Analysis of large-scale gene expression data.   Total citations: 10, self-citations: 0, citations by others: 10
DNA microarray technology has resulted in the generation of large, complex data sets, such that the bottleneck in biological investigation has shifted from data generation to data analysis. This review discusses some of the algorithms and tools for the analysis and organisation of microarray expression data, including clustering methods, partitioning methods, and methods for correlating expression data to other biological data.

16.
Graphical models play an important role in neuroscience studies, particularly in brain connectivity analysis. Typically, observations/samples are from several heterogeneous groups and the group membership of each observation/sample is unavailable, which poses a great challenge for graph structure learning. In this paper, we propose a method which can achieve Simultaneous Clustering and Estimation of Heterogeneous Graphs (briefly denoted as SCEHG) for matrix-variate functional magnetic resonance imaging (fMRI) data. Unlike the conventional clustering methods which rely on the mean differences of various groups, the proposed SCEHG method fully exploits the group differences of conditional dependence relationships among brain regions for learning cluster structure. In essence, by constructing individual-level between-region network measures, we formulate clustering as penalized regression with grouping and sparsity pursuit, which transforms the unsupervised learning into supervised learning. A modified difference of convex programming with the alternating direction method of multipliers (DC-ADMM) algorithm is proposed to solve the corresponding optimization problem. We also propose a generalized criterion to specify the number of clusters. Extensive simulation studies illustrate the superiority of the SCEHG method over some state-of-the-art methods in terms of both clustering and graph recovery accuracy. We also apply the SCEHG procedure to analyze fMRI data associated with attention-deficit hyperactivity disorder (ADHD), which illustrates its empirical usefulness.

17.
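An fMRI-based study of the resting-state visual network in anisometropic amblyopia   Total citations: 2, self-citations: 1, citations by others: 1
Using resting-state functional magnetic resonance imaging, we studied the resting-state visual network in patients with anisometropic amblyopia and analyzed how the function of the visual cortex is affected in these patients. Independent component analysis (ICA), a data-driven method, was used to decompose the resting-state data of 8 patients with anisometropic amblyopia and 11 normal controls, and goodness-of-fit scores were used to select the resting-state visual network; the results were then subjected to within-group and between-group analyses. The results show that, in the resting-state visual network of anisometropic amblyopia, visual cortex at multiple levels exhibits marked functional impairment: the extent and strength of functional connectivity are significantly lower than in the normal group, and the higher-level extrastriate cortex is more severely impaired than the lower-level striate cortex. Resting-state fMRI provides a new approach for investigating in depth the mechanisms of functional impairment in lower- and higher-level visual cortex in amblyopia.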

18.
Besides the problem of searching for effective methods of data analysis, there are additional problems in handling data of high uncertainty. Uncertainty problems often arise in the analysis of ecological data, e.g. in the cluster analysis of ecological data. Conventional clustering methods based on Boolean logic ignore the continuous nature of ecological variables and the uncertainty of ecological data. That can result in misclassification or misinterpretation of the data structure. Clusters with fuzzy boundaries better reflect the continuous character of ecological features. The problem, however, is that the common clustering methods (like the fuzzy c-means method) are designed only for crisp data; that is, they provide a fuzzy partition only for crisp data (e.g. exact measurement data). This paper presents the extension and implementation of the method of fuzzy clustering of fuzzy data proposed by Yang and Liu [Yang, M.-S. and Liu, H.-H., 1999. Fuzzy clustering procedures for conical fuzzy vector data. Fuzzy Sets and Systems, 106, 189-200]. The imprecise data can be defined as multidimensional fuzzy sets without sharply defined boundaries (in the form of the so-called conical fuzzy vectors). They can then be used for the fuzzy clustering together with crisp data. That can be particularly useful when information about the variances which describe the accuracy of the data is not available and probabilistic approaches are impossible. The method proposed by Yang and Liu has been extended and implemented for the Fuzzy Clustering System EcoFucs developed at the University of Kiel. As an example, the paper presents the fuzzy cluster analysis of chemicals according to their ecotoxicological properties. The uncertainty and imprecision of ecotoxicological data are very high because of the use of various data sources, various investigation tests and the difficulty of comparing these data. The implemented method can be very helpful in searching for an adequate partition of ecological data into clusters with similar properties.
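For orientation, the crisp-data baseline that this abstract contrasts with, the fuzzy c-means membership update, can be sketched as follows. This is the standard textbook formula for crisp points, not the conical fuzzy vector extension of Yang and Liu or the EcoFucs implementation; the function name, the fuzzifier default `m=2.0` and the handling of zero distances are assumptions.

```python
import math


def fcm_memberships(point, centers, m=2.0):
    """Fuzzy c-means membership of one crisp point in each cluster:
    u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)), with d_i the distance
    to center i and m > 1 the fuzzifier."""
    dists = [math.dist(point, c) for c in centers]
    if 0.0 in dists:
        # The point coincides with one or more centers: share membership
        # equally among those centers and give the rest zero.
        hits = [d == 0.0 for d in dists]
        return [1.0 / sum(hits) if h else 0.0 for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]


# Toy usage: a point one unit from the first center and two from the second.
u = fcm_memberships((1.0, 0.0), [(0.0, 0.0), (3.0, 0.0)])
```

The memberships sum to one and vary continuously with distance, which is the "fuzzy boundary" property the abstract refers to; handling imprecise inputs additionally requires a distance defined between fuzzy vectors, which this sketch does not attempt.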

19.
Although many numerical clustering algorithms have been applied to gene expression data analysis, the essential step is still biological interpretation by manual inspection. What biologists expect to find is a correlation between genetic co-regulation and affiliation to a common biological process. Here, we introduce some clustering algorithms that are based on a graph structure constituted by biological knowledge. After applying them to a widely used dataset, we compared the resulting clusters of two of these algorithms in terms of the homogeneity of clusters, the coherence of annotation and the matching ratio. The results show that the clusters of knowledge-guided analysis are the kernel parts of the clusters of the Gene Ontology (GO)-Cluster software, containing the genes that are most strongly correlated in expression and most consistent in biological function. Moreover, knowledge-guided analysis seems much more applicable than GO-Cluster to larger datasets.

20.
Multi-class clustering and prediction in the analysis of microarray data   Total citations: 1, self-citations: 0, citations by others: 1
DNA microarray technology provides tools for studying the expression profiles of a large number of distinct genes simultaneously. This technology has been applied to sample clustering and sample prediction. Because a large number of genes are measured, many of the genes in the original data set are irrelevant to the analysis. Selection of discriminatory genes is critical to the accuracy of clustering and prediction. This paper considers a statistical significance testing approach to selecting discriminatory gene sets for multi-class clustering and prediction of experimental samples. A toxicogenomic data set with nine treatments (a control and eight metals, As, Cd, Ni, Cr, Sb, Pb, Cu, and AsV, with a total of 55 samples) is used to illustrate a general framework of the approach. Among four selected gene sets, a gene set omega(I), formed by the intersection of the F-test set and the union of the one-versus-all t-test sets, performs the best in terms of clustering as well as prediction. Hierarchical and two modified partition (k-means) methods all show that the set omega(I) is able to group the 55 samples into seven clusters reasonably well, in which the As and AsV samples are considered as one cluster (the same group), as are the Cd and Cu samples. With respect to prediction, the overall accuracy for the gene set omega(I) using the nearest neighbors algorithm to predict 55 samples into one of the nine treatments is 85%.
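The construction of the gene set omega(I) described here, the intersection of the F-test selection with the union of the one-versus-all t-test selections, amounts to two set operations. The sketch below assumes the significance tests have already been run and their selected genes collected; the function name `omega_intersection` and the gene identifiers are hypothetical.

```python
def omega_intersection(f_test_genes, one_vs_all_genes):
    """Genes selected by the F-test AND by at least one one-versus-all t-test.
    `one_vs_all_genes` maps each treatment to the genes its t-test selected."""
    union_of_t_tests = set().union(*one_vs_all_genes.values())
    return set(f_test_genes) & union_of_t_tests


# Toy usage with made-up gene identifiers and two of the treatments.
f_selected = {"g1", "g2", "g3", "g7"}
t_selected = {
    "As": {"g1", "g4"},
    "Cd": {"g2", "g5"},
}
omega_i = omega_intersection(f_selected, t_selected)
```

Requiring agreement between the overall F-test and at least one treatment-specific test keeps genes that are both globally discriminatory and attributable to a particular treatment.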
